Jira: MODKBEKBJ-260

...


mod-data-export-spring is designed to manage, configure, and run jobs. This module is the entry point for starting a data export: it calls mod-data-export-worker to execute jobs by sending events to a Kafka topic.
What should be done in this module:
      - add a new export type, 'eHoldings';
      - add new request parameters (to ExportTypeSpecificParameters.json) needed to pass the export fields, the search parameters for the titles search, and other parameters;
      - add a new JobCommandBuilder that takes the request parameters and passes them in the Kafka event.
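
As a rough illustration of the last item, the sketch below flattens request parameters into string job parameters that could travel in a Kafka event. It is plain Java and does not use the real JobCommandBuilder interface; the class names and all parameter names (recordId, packageFields, titleFields, titleSearchFilters) are assumptions made for this example, not the actual schema.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical holder for the new request parameters (field names are illustrative only).
final class EHoldingsExportParams {
    final String recordId;
    final List<String> packageFields;
    final List<String> titleFields;
    final String titleSearchFilters;

    EHoldingsExportParams(String recordId, List<String> packageFields,
                          List<String> titleFields, String titleSearchFilters) {
        this.recordId = recordId;
        this.packageFields = packageFields;
        this.titleFields = titleFields;
        this.titleSearchFilters = titleSearchFilters;
    }
}

// Hypothetical builder: turns the request parameters into flat string job parameters.
final class EHoldingsJobCommandBuilderSketch {
    static Map<String, String> buildJobParameters(EHoldingsExportParams params) {
        Map<String, String> jobParameters = new LinkedHashMap<>();
        jobParameters.put("recordId", params.recordId);
        // Lists are joined with commas so that each job parameter stays a single string.
        jobParameters.put("packageFields", String.join(",", params.packageFields));
        jobParameters.put("titleFields", String.join(",", params.titleFields));
        jobParameters.put("titleSearchFilters", params.titleSearchFilters);
        return jobParameters;
    }
}
```

Keeping every value a single string makes it easy to measure how much space the parameters take once they are persisted with the job.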

Jira: MODEXPS-94


mod-data-export-worker is intended to receive events from mod-data-export-spring and execute its jobs. The module is built on the Spring Batch framework, and jobs are configured as a set of steps. A job executes in three stages: retrieve data, process data, and write data to a temporary file. Uploading files to vendor-specific storage (an AWS S3 bucket) is already preconfigured in the listener and happens once the file is completely written.

What should be done in this module:
      - create a Reader extending the base functionality (CsvItemReader.java; see CirculationLogCsvItemReader.java as an example). The reader should retrieve packages/titles using REST clients, taking the search parameters from the incoming Kafka event (from job parameters);
      - create a Processor (implementing ItemProcessor). The processor must keep only the fields selected for export from the incoming packages/titles. The list of export fields comes from job parameters; [Use 2 processors]
      - create a Writer extending the base functionality (we can use CsvWriter.java as is if nothing special is needed);
      - create a Configuration that builds the job and sets the Reader, Writer, and Processor (see CirculationLogJobConfig.java).
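
The process and write stages described above can be sketched in plain Java, without Spring Batch, to show the intended data flow. This is only an illustration under assumptions: records are modelled as simple string maps, the "reader" is replaced by an already retrieved list, and the class and method names are hypothetical rather than taken from mod-data-export-worker.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class EHoldingsExportSketch {

    // "Processor" stage: keep only the fields selected for export, in the requested order.
    static List<String> selectFields(Map<String, String> record, List<String> exportFields) {
        List<String> values = new ArrayList<>();
        for (String field : exportFields) {
            values.add(record.getOrDefault(field, ""));
        }
        return values;
    }

    // "Writer" stage: format one CSV line, quoting values that need it.
    static String toCsvLine(List<String> values) {
        return values.stream().map(EHoldingsExportSketch::escape).collect(Collectors.joining(","));
    }

    private static String escape(String value) {
        if (value.contains(",") || value.contains("\"") || value.contains("\n") || value.contains("\r")) {
            return "\"" + value.replace("\"", "\"\"") + "\"";
        }
        return value;
    }

    // Runs the process and write stages over records that a "reader" has already retrieved.
    static List<String> export(List<Map<String, String>> records, List<String> exportFields) {
        List<String> lines = new ArrayList<>();
        lines.add(toCsvLine(exportFields)); // header row
        for (Map<String, String> record : records) {
            lines.add(toCsvLine(selectFields(record, exportFields)));
        }
        return lines;
    }
}
```

In the real module each stage would be a separate Spring Batch component wired together by the job configuration; the sketch only shows what each stage is responsible for.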


Jira: MODEXPW-107
Jira: MODEXPW-122
Jira: MODEXPW-108
Jira: MODEXPW-111
Jira: MODEXPW-112
Jira: MODEXPW-113


ui-eholdings will be able to create and send export jobs from the UI.
What should be done in this module:

      - store the fields available for export (see attached user story): when a user clicks the 'Actions' button, a modal window appears and shows multi-select controls filled with title and package fields.
        NOTE: it will be useful to have one task to display only eHoldings fields, and another task to also display non-eHoldings fields;

      - send jobs: when a user clicks the 'Export' button, the UI creates a job by sending a request to mod-data-export-spring:
        "methods": [ "POST" ], "pathPattern": "/data-export-spring/jobs", "permissionsRequired": [ "data-export.job.item.post" ], "modulePermissions": []
        The request payload follows this schema: https://github.com/folio-org/folio-export-common/blob/master/schemas/job.json
        "type" equals 'E_HOLDINGS'; for "exportTypeSpecificParameters", see https://github.com/folio-org/folio-export-common/blob/master/schemas/eholdings/eHoldingsExportConfig.json

      - show a green toast message: when the export has started, the modal window disappears and a green toast message appears. It shows the name of the file being generated and the approximate duration of the export. The duration is estimated from the number of records to export;
        We will test the export with various numbers of records to determine the approximate duration.

      - when a job is completed, it contains a link to download the CSV file.
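
To make the job creation request above more concrete, here is a minimal sketch that builds (but does not send) such a request. The UI itself would do this in JavaScript; Java is used here only for illustration. The path, the "type" value, and the "exportTypeSpecificParameters" key come from the text above. The nested key eHoldingsExportConfig, the fields recordId and recordType, and the base URL used in the usage note are assumptions that must be checked against the linked schemas.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class ExportJobRequestSketch {

    // Builds the job payload as a JSON string.
    // The nested field names are placeholders, not the confirmed schema.
    // No JSON escaping is done: the inputs are assumed to contain no quotes or backslashes.
    static String buildPayload(String recordId, String recordType) {
        return "{"
            + "\"type\":\"E_HOLDINGS\","
            + "\"exportTypeSpecificParameters\":{"
            + "\"eHoldingsExportConfig\":{"
            + "\"recordId\":\"" + recordId + "\","
            + "\"recordType\":\"" + recordType + "\""
            + "}}}";
    }

    // Builds a POST request to the job endpoint; baseUrl is supplied by the caller.
    static HttpRequest buildRequest(String baseUrl, String payload) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/data-export-spring/jobs"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(payload))
            .build();
    }
}
```

For example, `ExportJobRequestSketch.buildRequest("http://localhost:9130", ExportJobRequestSketch.buildPayload("123", "PACKAGE"))` yields a POST request for `/data-export-spring/jobs`; a real call would also need whatever tenant and authentication headers the deployment requires.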


mod-kb-ebsco-java stores packages and titles and provides a REST API to retrieve these objects. The REST methods that mod-data-export-worker needs are already provided:
        Retrieve package details: "methods": ["GET"], "pathPattern": "/eholdings/packages/{packageId}", "permissionsRequired": ["kb-ebsco.packages.collectionitem.get"]
        Retrieve package titles (resources): "methods": ["GET"], "pathPattern": "/eholdings/packages/{packageId}/resources", "permissionsRequired": ["kb-ebsco.package-resources.collection.get"]

Integration tests

No integration tests have been written for the ExportManager module yet. We will add the folder 'mod-data-export-spring' and create test cases to cover the eHoldings export feature.

Jira: FAT-1659

...

  • The generated CSV file is stored in a temporary folder (java.io.tmpdir) on the file system. The available volume of the temporary folder depends on how the Java Virtual Machine is configured. When this folder overflows, the job stops with the FAILED execution status, and the description shows the exact error. Currently the system provides no way to control the volume of the temporary folder; however, we can easily add this ability when we need it.
  • mod-data-export-worker persists job execution parameters in the underlying database (table BATCH_JOB_EXECUTION_PARAMS). Developers from other teams increased the size of the column that stores request parameters (10000 characters for now). We should be careful about passing many request parameters from the UI (export fields and other request parameters).
  • Storing files in the Amazon cloud incurs costs. We will set up a cleaner that purges outdated files to keep the storage in good condition. Currently the cleaning frequency and the file retention time are hardcoded. If we need to set these parameters on application startup, we can easily add this ability.
  • No performance tests have been done and documented yet. We can run and document such tests, at least to learn how long it takes to export a fixed number of records.
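
Two of the risks above (the temporary folder filling up and the parameter column overflowing) could be guarded by simple checks before a job starts. The sketch below is one possible shape for such checks, not existing module code. The class and method names are hypothetical, and treating the 10000-character limit as inclusive is an assumption.

```java
import java.io.File;

class ExportPreflightSketch {

    // Column size mentioned above for the persisted request parameters (assumed inclusive).
    static final int MAX_PARAMETER_LENGTH = 10_000;

    // True if the serialized request parameters fit into the column.
    static boolean fitsParameterColumn(String serializedParameters) {
        return serializedParameters.length() <= MAX_PARAMETER_LENGTH;
    }

    // Usable bytes in the JVM temporary directory (system property java.io.tmpdir).
    static long usableTempSpaceBytes() {
        return new File(System.getProperty("java.io.tmpdir")).getUsableSpace();
    }

    // True if the temporary directory can currently hold a file of the expected size.
    static boolean hasTempSpaceFor(long expectedFileSizeBytes) {
        return usableTempSpaceBytes() >= expectedFileSizeBytes;
    }
}
```

A free-space check of this kind is only a snapshot: other jobs writing to the same folder can still exhaust it later, so the FAILED status handling described above remains necessary.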


...

 Questions to the story:

  1. Q (from Igor): Should a user be automatically directed to the Export Manager after pressing the 'Export' button? A (from Khalilah): No, the user just starts the export, the modal panel disappears, a green toast message appears, and then nothing else happens in the UI;
  2. Q (from Igor): Should the list of package and title fields be configurable in Settings, or will it always be hardcoded? The question is still open; we will discuss it;
  3. Q (from Igor): Does it make sense to add a time to the generated file name (to make it unique)? A (from Khalilah): Yes, it does;
  4. Q (from Khalilah): Does this approach export selected packages or titles only? A (from Igor): This approach will work for both selected and non-selected packages/titles;
  5. Q (from Khalilah): We need to create large data load functional testing stories. We need to determine whether the export must have a record size/row size/file size limit, and whether the export has any system/hosting requirements.
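
For question 3, one possible way to add a time to the generated file name is sketched below. The naming pattern, the UTC time zone, the prefix, and the class name are all assumptions for illustration; the actual format has not been decided here.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

class ExportFileNameSketch {

    // Timestamp with one-second resolution, rendered in UTC.
    private static final DateTimeFormatter TIMESTAMP =
        DateTimeFormatter.ofPattern("yyyyMMdd_HHmmss").withZone(ZoneOffset.UTC);

    // Builds a name such as <prefix>_<timestamp>.csv for the given creation time.
    static String fileName(String prefix, Instant createdAt) {
        return prefix + "_" + TIMESTAMP.format(createdAt) + ".csv";
    }
}
```

With one-second resolution, two exports started within the same second would still get the same name, so a job identifier may need to be added if strict uniqueness is required.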