Jira issue: MODKBEKBJ-260
...
When a user clicks 'Export package CSV', a modal window is shown. The user can select all or several Package fields, Title fields, and additional fields to export. When the user presses the 'Export' button, the modal disappears and the export process starts. A green toast message is displayed, showing the name of the file being generated and the approximate duration of the export (30 mins).
The name of the generated file depends on what is going to be exported from the packages:
- Package details only export (when a user chooses fields in the Package dropdown only, staying on the Package export page) - <<YYYY_MM_DD_hh_mm_ss>>_<<Package name>>_packagedetails.csv, for example, 2022_04_11_09_10_45_WileyOnlineLibrary_packagedetails.csv
- Title-package details export (when a user also chooses fields in the Titles dropdown, staying on the Package export page) - <<YYYY_MM_DD_hh_mm_ss>>_<<Title name>>_packagetitles.csv, for example, 2022_04_11_15_02_03_WileyOnlineLibrary_packagetitles.csv
So, the Package export produces 1 CSV file when only package details are exported, and 2 CSV files when package details and title-package details are exported.
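The naming rule above can be sketched in plain Java. This is only an illustration, not the module's actual code: the class and method names are hypothetical, the 24-hour pattern is inferred from the examples (15 and 23 appear as hours), and stripping whitespace from the record name is an assumption based on the "WileyOnlineLibrary" examples.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of mod-data-export-worker.
final class ExportFileNames {
    // YYYY_MM_DD_hh_mm_ss from the naming rule, using a 24-hour clock.
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyy_MM_dd_HH_mm_ss");

    // suffix is one of: packagedetails, packagetitles, titledetails.
    static String build(LocalDateTime startedAt, String recordName, String suffix) {
        // Assumption: whitespace is removed from the package/title name.
        String compactName = recordName.replaceAll("\\s+", "");
        return STAMP.format(startedAt) + "_" + compactName + "_" + suffix + ".csv";
    }

    public static void main(String[] args) {
        LocalDateTime startedAt = LocalDateTime.of(2022, 4, 11, 9, 10, 45);
        System.out.println(build(startedAt, "Wiley Online Library", "packagedetails"));
    }
}
```

With the sample timestamp, the helper reproduces the first example file name given above.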
Title Package export
Works similarly to Package export. When a user presses the 'Export' button, the modal disappears and the export process starts. A green toast message is displayed, showing the name of the file being generated and the approximate duration of the export (30 mins).
The name of the generated file is always <<YYYY_MM_DD_hh_mm_ss>>_<<Title name>>_titledetails.csv, for example, 2022_04_11_23_15_31_WileyOnlineLibrary_titledetails.csv
...
- Multiple records separator: pipe
- The extension is: csv
- Delimiter for an array of values: comma
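One possible reading of these separators, sketched in Java: values that form an array inside a single record are joined with a comma, and several records that land in the same CSV cell are joined with a pipe. Both this interpretation and the class and method names are assumptions made for illustration. Because the joined cell may contain commas, the sketch also wraps it in double quotes, as CSV requires.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper illustrating one reading of the separator rules.
final class CsvCells {
    // Joins array values with "," and multiple records with "|".
    static String joinRecords(List<List<String>> records) {
        return records.stream()
                .map(values -> String.join(",", values))
                .collect(Collectors.joining("|"));
    }

    // Quotes a cell so embedded commas do not split the CSV column.
    static String quote(String cell) {
        return "\"" + cell.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        String cell = joinRecords(List.of(List.of("Print", "Online"), List.of("Online")));
        System.out.println(quote(cell));
    }
}
```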
...
Solution
The Export Manager application can satisfy the given requirements. It provides functionality to process batches of data in a flexible and configurable manner. There are sources we can reuse to retrieve various objects, generate CSV files, upload files into vendor-specific storage, and share access to the stored files. The application can manage 'immediate' export jobs and 'scheduled' export jobs, which have to be configured before they run. Export Manager consists of the backend modules mod-data-export-spring and mod-data-export-worker, and the UI module ui-export-manager.
Learning: https://www.toptal.com/spring/spring-batch-tutorial
https://docs.spring.io/spring-batch/docs/current/reference/html/index.html
ui-export-manager shows a list of jobs, job status, job type, and other information. Here users can see the result of a job execution and download files. This module uses the REST API of mod-data-export-spring to retrieve jobs.
...
mod-data-export-spring is designed to manage, configure, and run jobs. This module is the entry point for starting a data export: it asks mod-data-export-worker to execute jobs by sending events to the mod-data-export-worker's Kafka topic.
What should be done in this module:
- add new export type - 'eHoldings';
- add new request parameters (to ExportTypeSpecificParameters.json), needed to pass export fields, search params for titles search, and other params;
- add new JobCommandBuilder, needed to take request parameters to pass in Kafka event;
mod-data-export-worker is intended to receive events from mod-data-export-spring and execute its jobs. The module is built on the Spring Batch framework, and each job is configured as a set of steps. A job executes in 3 stages: retrieve data, process data, and write data to a temporary file. Uploading files to vendor-specific storage (an AWS S3 bucket) is already preconfigured by the listener and happens once the file is completely written.
What should be done in this module:
- create a Reader extending base functionality (CsvItemReader.java, see CirculationLogCsvItemReader.java as an example). The reader should retrieve packages/titles using REST clients, taking search parameters from the incoming Kafka event (from job parameters);
- create a Processor (implementing ItemProcessor). The processor has to take only selected fields for export from the incoming packages/titles. The list of fields for export comes from job parameters;
- create a Writer extending base functionality (we can just use CsvWriter.java if nothing special is needed);
- create a Configuration to build a job and set the Reader, Writer, and Processor (see CirculationLogJobConfig.java);
- configure a cleaner to purge deprecated files (that we generated more than 30 days back);
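The Processor step from the list above can be sketched without any Spring dependency. This is a minimal illustration only: the real class would implement Spring Batch's ItemProcessor and work with the module's own package/title types, while here the item is assumed to be a flat map of field name to value, and the class name and field names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an ItemProcessor that keeps only selected fields.
final class SelectedFieldsProcessor {
    private final List<String> exportFields; // would come from job parameters

    SelectedFieldsProcessor(List<String> exportFields) {
        this.exportFields = List.copyOf(exportFields);
    }

    // Returns the selected fields in the order the user chose them.
    // Assumption: a field missing from the item is exported as an empty value.
    Map<String, String> process(Map<String, String> item) {
        Map<String, String> row = new LinkedHashMap<>();
        for (String field : exportFields) {
            row.put(field, item.getOrDefault(field, ""));
        }
        return row;
    }

    public static void main(String[] args) {
        SelectedFieldsProcessor processor =
                new SelectedFieldsProcessor(List.of("packageName", "holdingsStatus"));
        System.out.println(processor.process(Map.of(
                "packageName", "Wiley Online Library",
                "packageType", "Complete",
                "holdingsStatus", "Selected")));
    }
}
```

Keeping the selection order in a LinkedHashMap means the Writer can emit columns in the same order as the fields chosen in the UI.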
ui-eholdings will be able to create and send the export jobs from UI
What should be done in this module:
- store the fields available to export (see attached user story);
- when a user clicks on the 'Actions' button, show a modal window with multi-select controls filled with title & package fields.
NOTE: it will be useful to have one task to display only eHoldings fields, and another task to display non-eHoldings fields also;
- when a user clicks on the 'Export' button, the UI creates a Job by sending a request to mod-data-export-spring:
"methods": [ "POST" ], "pathPattern": "/data-export-spring/jobs", "permissionsRequired": [ "data-export.job.item.post" ], "modulePermissions": []
The request payload follows this schema: https://github.com/folio-org/folio-export-common/blob/master/schemas/job.json
"type" equals 'E_HOLDINGS', and "exportTypeSpecificParameters" follows this schema: https://github.com/folio-org/folio-export-common/blob/master/schemas/eholdings/eHoldingsExportConfig.json
- when the export has started, the modal window disappears and a green toast message appears, showing the name of the file being generated and an approximate duration of the export. The duration is estimated from the number of records to export;
We will test the export with various numbers of records to determine the approximate duration.
- when a job is completed, it contains a link to download the CSV file
mod-kb-ebsco-java stores packages & titles, and provides a REST API to retrieve these objects. The REST methods needed by mod-data-export-worker are already provided:
Retrieve package details: "methods": ["GET"], "pathPattern": "/eholdings/packages/{packageId}", "permissionsRequired": ["kb-ebsco.packages.item.get"]
Retrieve package titles: "methods": ["GET"], "pathPattern": "/eholdings/packages/{packageId}/resources", "permissionsRequired": ["kb-ebsco.package-resources.collection.get"]
Integration tests
No integration tests have been written for the ExportManager module yet. We will add the folder 'mod-data-export-spring' and create test cases to cover the eHoldings export feature.
...
Application reliability and performance
- The generated CSV file is stored in a temporary folder (java.io.tmpdir) on the file system. The available volume of the temporary folder depends on how the Java Virtual Machine is configured. When this folder overflows, the job stops with the FAILED execution status, and the description shows the exact error. The system currently provides no way to control the volume of the temporary folder; however, we can easily add such an ability when we need it.
- mod-data-export-worker persists job execution parameters in the underlying database (table BATCH_JOB_EXECUTION_PARAMS). Developers from other teams increased the size of the column that stores request parameters (10000 symbols for now). We should be careful about passing a lot of request parameters from the UI (export fields and other request parameters).
- Storing files on Amazon Cloud incurs some cost. We will set up a cleaner that purges deprecated files, which will help us keep the storage in good condition. The cleaning frequency and the file retention time are currently hardcoded. If we need to control these and set the parameters on application startup, we can easily add such an ability.
- No performance tests have been done and documented yet. We can run and document such tests, at least to know how long the export of some fixed number of records takes.
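The concern about the size of persisted job parameters could be addressed with a simple guard before a job is submitted. The sketch below is an assumption about how such a check might look; the class and method names are hypothetical, and only the 10000-symbol limit comes from the text above.

```java
// Hypothetical guard, not part of mod-data-export-spring or mod-data-export-worker.
final class JobParameterGuard {
    // Column size mentioned above for BATCH_JOB_EXECUTION_PARAMS (10000 symbols for now).
    static final int MAX_PARAMETER_LENGTH = 10_000;

    // Fails fast instead of letting the database reject an oversized value.
    static void requireFits(String name, String serializedValue) {
        if (serializedValue.length() > MAX_PARAMETER_LENGTH) {
            throw new IllegalArgumentException(
                    "Job parameter '" + name + "' is " + serializedValue.length()
                            + " symbols long; the limit is " + MAX_PARAMETER_LENGTH);
        }
    }

    public static void main(String[] args) {
        requireFits("exportFields", "packageName,holdingsStatus");
        System.out.println("parameters fit");
    }
}
```

A check like this would surface the problem as a clear validation error at job creation time rather than as a database failure during execution.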
...
Questions to the story:
- Q(from Igor): Should a user be automatically directed to the Export Manager after pressing the 'Export' button? A(from Khalilah): No, the user just starts export, a modal panel disappears, a green toast message appears, and then nothing happens on UI;
- Q(from Igor): Should the list of package & title fields be configured in Settings? Or will it always be hardcoded?
...
- the question is still open and will be discussed;
- Q(from Igor): Does it make sense to add a time to the generated file name (to make it unique)? A(from Khalilah): yes, it does;
- Q (from Khalilah): Does this approach export selected packages or titles only? A(from Igor): This approach will work for both selected and non-selected packages/titles
- Q (from Khalilah): We need to create large data load functional testing stories. We need to determine if export must have record size/row size/file size limit. And if export has any system/hosting requirements