Jira issue: MODKBEKBJ-260

...

When a user clicks 'Export package CSV', a modal window is shown. The user can select all or multiple Package fields, Title fields, and additional fields to export. When the user presses the 'Export' button, the modal panel disappears and the export process starts. A green toast message is displayed, showing the name of the file being generated and the approximate duration of the export (30 mins). Is the user automatically directed to Export Manager?

...



The name of the generated file depends on what is going to be exported from the packages:

  • Package details only export (when a user chooses fields in the Package dropdown ONLY, staying on the Package export page) - <<YYYY_MM_DD_hh_mm_ss>>_<<Package name>>_packagedetails.csv, for example 2022_04_11_09_10_45_WileyOnlineLibrary_packagedetails.csv
  • Title-package details export (when a user ALSO chooses fields in the Titles dropdown, staying on the Package export page) - <<YYYY_MM_DD_hh_mm_ss>>_<<Title name>>_packagetitles.csv, for example 2022_04_11_15_02_03_WileyOnlineLibrary_packagetitles.csv

So the Package export produces one CSV file when only package details are exported, and two CSV files when both package details and title-package details are exported.

Title Package export

Works similarly to Package export. When a user presses the 'Export' button, the modal panel disappears and the export process starts. A green toast message is displayed, showing the name of the file being generated and the approximate duration of the export (30 mins). Is the user automatically directed to Export Manager?

...


The name of the generated file is always <<YYYY_MM_DD_hh_mm_ss>>_<<Title name>>_titledetails.csv, for example 2022_04_11_23_15_31_WileyOnlineLibrary_titledetails.csv
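
The file naming patterns described above can be sketched in Java as follows. This is only an illustration, not the module's actual code: the class and method names are hypothetical, and removing whitespace from the name is an assumption inferred from the 'WileyOnlineLibrary' examples.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for illustration; not part of mod-data-export-worker.
final class ExportFileNames {
    // 24-hour clock, matching examples such as 2022_04_11_15_02_03.
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyy_MM_dd_HH_mm_ss");

    static String packageDetails(LocalDateTime startedAt, String name) {
        return build(startedAt, name, "packagedetails");
    }

    static String packageTitles(LocalDateTime startedAt, String name) {
        return build(startedAt, name, "packagetitles");
    }

    static String titleDetails(LocalDateTime startedAt, String name) {
        return build(startedAt, name, "titledetails");
    }

    private static String build(LocalDateTime startedAt, String name, String suffix) {
        // Assumption: whitespace is stripped from the package/title name.
        String compactName = name.replaceAll("\\s+", "");
        return STAMP.format(startedAt) + "_" + compactName + "_" + suffix + ".csv";
    }
}
```

For example, ExportFileNames.packageDetails(LocalDateTime.of(2022, 4, 11, 9, 10, 45), "Wiley Online Library") returns 2022_04_11_09_10_45_WileyOnlineLibrary_packagedetails.csv.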

...

  • Multiple records separator: pipe 
  • The extension is: csv
  • Delimiter for an array of values: comma


...

Solution

The Export Manager application can satisfy the given requirements. It provides functionality to process batches of data in a flexible and configurable manner. There are sources we can reuse to retrieve various objects, generate CSV files, upload files into vendor-specific storage, and share access to the stored files. This application can manage 'immediate' export jobs and 'scheduled' export jobs, which have to be configured before the run. Export Manager consists of the backend modules mod-data-export-spring and mod-data-export-worker, and the UI module ui-export-manager.


Learning: https://www.toptal.com/spring/spring-batch-tutorial

https://docs.spring.io/spring-batch/docs/current/reference/html/index.html


ui-export-manager shows a list of jobs with their status, type, and other information. Here users can see the result of a job execution and download files. This module uses the REST API of mod-data-export-spring to retrieve jobs.


This module should be able to display and filter the new eHoldings jobs that we will use to export packages & titles.
What should be done in this module:
      - add a new job type - 'eHoldings';


mod-data-export-spring

...

is designed to manage, configure, and run jobs. This module is the entry point for starting a data export: it has mod-data-export-worker execute jobs by sending events to a Kafka topic.
What should be done in this module:
      - add a new export type - 'eHoldings';
      - add new request parameters (to ExportTypeSpecificParameters.json) needed to pass the export fields, the search params for the titles search, and other params;
      - add a new JobCommandBuilder that takes the request parameters and passes them in the Kafka event;

Jira issue: MODEXPS-94


mod-data-export-worker

...

is intended to receive events from mod-data-export-spring and execute its jobs. The module is built on the Spring Batch framework, and jobs are configured as a set of steps. A job executes in three stages: retrieve data, process data, and write data to a temporary file. Uploading files to vendor-specific storage (an AWS S3 bucket) is already preconfigured by a listener and happens once the file is completely written.

What should be done in this module:
      - create a Reader extending base functionality (CsvItemReader.java, see CirculationLogCsvItemReader.java as an example). The reader should retrieve packages/titles using REST clients, taking search parameters from the incoming Kafka event (from job parameters);
      - create a Processor (implementing ItemProcessor). The processor has to take only selected fields for export from the incoming packages/titles. The list of fields for export comes from job parameters;
      - create a Writer extending base functionality (we can just use CsvWriter.java if nothing special is needed);
      - create a Configuration to build a job and set Reader, Writer, and Processor (see CirculationLogJobConfig.java);
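
The Processor step listed above could look roughly like the sketch below. To stay dependency-free it is written in plain Java; in the module it would presumably implement Spring Batch's ItemProcessor interface instead. The class name, the use of a Map as the record type, and the CSV escaping rules are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; the real processor would implement
// org.springframework.batch.item.ItemProcessor and use the module's own DTOs.
final class SelectedFieldsProcessor {
    private final List<String> exportFields;

    // exportFields would come from the job parameters.
    SelectedFieldsProcessor(List<String> exportFields) {
        this.exportFields = List.copyOf(exportFields);
    }

    // Keeps only the selected fields, in the order they were requested.
    List<String> process(Map<String, Object> record) {
        List<String> row = new ArrayList<>(exportFields.size());
        for (String field : exportFields) {
            Object value = record.get(field);
            row.add(value == null ? "" : String.valueOf(value));
        }
        return row;
    }

    // Joins one row into a CSV line, quoting values where needed.
    static String toCsvLine(List<String> row) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < row.size(); i++) {
            if (i > 0) {
                line.append(',');
            }
            line.append(escape(row.get(i)));
        }
        return line.toString();
    }

    private static String escape(String value) {
        boolean needsQuotes = value.contains(",") || value.contains("\"")
                || value.contains("\n") || value.contains("\r");
        return needsQuotes ? "\"" + value.replace("\"", "\"\"") + "\"" : value;
    }
}
```

Keeping the field selection separate from reading and writing means the same reader can serve any combination of export fields chosen in the UI.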


Jira issues: MODEXPW-107, MODEXPW-122, MODEXPW-108, MODEXPW-111, MODEXPW-112, MODEXPW-113


ui-eholdings will be able to create and send export jobs from the UI.
What should be done in this module:
      - when a user clicks on the 'Actions' button, a modal window appears and shows multi-select controls filled with title & package fields.
        NOTE: it will be useful to have one task to display only eHoldings fields, and another task to also display non-eHoldings fields;

      - when a user clicks on the 'Export' button, the UI creates a job by sending a request to mod-data-export-spring:
        "methods": [ "POST" ], "pathPattern": "/data-export-spring/jobs", "permissionsRequired": [ "data-export.job.item.post" ], "modulePermissions": []
        Request payload follows this schema: https://github.com/folio-org/folio-export-common/blob/master/schemas/job.json
        "type" equals 'E_HOLDINGS', and for "exportTypeSpecificParameters" see https://github.com/folio-org/folio-export-common/blob/master/schemas/eholdings/eHoldingsExportConfig.json

      - when the export has started, the modal window disappears and a green toast message appears, showing the name of the file being generated and the approximate duration of the export.
         We will test the export with various numbers of records to determine the approximate duration.

       - when a job is completed it contains the link to download the csv file 
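
As a rough illustration of the job-creation request described in the steps above, the sketch below assembles a payload for POST /data-export-spring/jobs. Only "type" and "exportTypeSpecificParameters" are taken from the text; the nested "eHoldingsExportConfig" key and the property names recordId, recordType, packageFields, and titleFields are assumptions that should be checked against the linked schemas. The class name and the sample values are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch; property names below are assumptions,
// verify them against job.json and eHoldingsExportConfig.json.
final class EHoldingsJobPayload {

    static String build(String recordId, String recordType,
                        List<String> packageFields, List<String> titleFields) {
        return "{"
                + "\"type\":\"E_HOLDINGS\","
                + "\"exportTypeSpecificParameters\":{"
                + "\"eHoldingsExportConfig\":{"
                + "\"recordId\":" + quote(recordId) + ","
                + "\"recordType\":" + quote(recordType) + ","
                + "\"packageFields\":" + array(packageFields) + ","
                + "\"titleFields\":" + array(titleFields)
                + "}}}";
    }

    // Renders a list of strings as a JSON array.
    private static String array(List<String> values) {
        return values.stream()
                .map(EHoldingsJobPayload::quote)
                .collect(Collectors.joining(",", "[", "]"));
    }

    // Minimal JSON string escaping: backslashes and double quotes only.
    private static String quote(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }
}
```

In a real client a JSON library would be a safer choice than string concatenation; the manual version is used here only to keep the sketch self-contained.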


mod-kb-ebsco-java

...

ui-eholdings ToDo

...

stores packages & titles, and provides a REST API to retrieve these objects. The REST methods needed by mod-data-export-worker are already provided:
        Retrieve package details: "methods": ["GET"], "pathPattern": "/eholdings/packages/{packageId}", "permissionsRequired": ["kb-ebsco.packages.item.get"]
        Retrieve package titles (resource): "methods": ["GET"], "pathPattern": "/eholdings/packages/{packageId}/resources", "permissionsRequired": ["kb-ebsco.package-resources.collection.get"]

Integration tests

There are no integration tests written for the Export Manager module yet. We will add the folder 'mod-data-export-spring' and create test cases to cover the eHoldings export feature.

Jira issue: FAT-1659


...

Application reliability and performance

  • The generated CSV file is stored in a temporary folder (java.io.tmpdir) on the file system. The space available in the temporary folder depends on how the Java Virtual Machine is configured. When this folder overflows, the job stops with the FAILED execution status, and the description shows the exact error. Currently the system provides no way to control the size of the temporary folder; however, we can easily add such an ability when we need it.
  • mod-data-export-worker persists job execution parameters in the underlying database (table BATCH_JOB_EXECUTION_PARAMS). Developers from other teams increased the size of the column that stores request parameters (10000 symbols for now). We should be careful about passing a lot of request parameters from the UI (export fields and other request parameters).
  • Storing files in the Amazon cloud incurs costs. We will set up a cleaner that purges outdated files, which will help us keep the storage in good condition. Currently the parameters for the cleaning frequency and the file retention time are hardcoded. If we need to control them and set them on application startup, we can easily add such an ability.
  • No performance tests have been done and documented yet. We can run and document such tests, at least to learn how long it takes to export a fixed number of records.
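
The first two risks above could be guarded against with simple pre-flight checks, sketched below. This is a hypothetical helper, not existing module code: the class and method names are invented, and applying the 10000-symbol limit to a single serialized parameter value is an assumption.

```java
import java.io.File;

// Hypothetical pre-flight checks; not part of mod-data-export-worker.
final class ExportPreflight {
    // Column size quoted above; assumed to apply to one parameter value.
    static final int MAX_PARAMETER_LENGTH = 10_000;

    // True when the serialized parameter fits into the database column.
    static boolean fitsParameterColumn(String serializedParameter) {
        return serializedParameter.length() <= MAX_PARAMETER_LENGTH;
    }

    // True when the JVM temporary folder has at least requiredBytes free.
    static boolean tempDirHasSpace(long requiredBytes) {
        File tempDir = new File(System.getProperty("java.io.tmpdir"));
        return tempDir.getUsableSpace() >= requiredBytes;
    }
}
```

Rejecting an oversized request before the job starts would give the user a clearer error than a job that fails later with a database or disk error.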


...

 Questions to the story:

  1. Q(from Igor): Should a user be automatically directed to the Export Manager after pressing the 'Export' button? A(from Khalilah): No, the user just starts export, a modal panel disappears, a green toast message appears, and then nothing happens on UI;
  2. Q(from Igor): Should the list of package & title fields be configured in Settings, or will it always be hardcoded? A: the question is still open and will be discussed;
  3. Q(from Igor): Does it make sense to add a time to the generated file name (to make it unique)? A(from Khalilah): yes, it does;
  4. Q (from Khalilah): Does this approach export selected packages or titles only? A(from Igor): This approach will work for both selected and non-selected packages/titles
  5. Q (from Khalilah): We need to create large data load functional testing stories. We need to determine if export must have record size/row size/file size limit. And if export has any system/hosting requirements