Spike: MODKBEKBJ-260 - Ability to export package and title+package details

MODKBEKBJ-260

Introduction

This page describes the ability to export a library's holdings details from the eHoldings application. Both package export and title export will be available. In addition to exporting an entire package/title, it will be possible to export only selected fields.

A user starts the export of a holdings detail (package or title) by pressing the 'Actions' button, then 'Export package/title (CSV)'. The result of the export is a generated CSV file that is available for librarians to download. See the UI mockups for a deeper understanding.

Package detail export

When a user clicks 'Export package CSV', a modal window is shown. The user can select all or multiple Package fields, Title fields, and additional fields to export. When the user presses the 'Export' button, the modal panel disappears and the export process starts. A green toast message is displayed showing the name of the generated file and the approximate duration of the export (30 minutes).

 


The name of the generated file depends on what is exported from the package:

  • Package details only export (when a user chooses fields in the Package dropdown ONLY, staying on the Package export page) - <<YYYY_MM_DD_hh_mm_ss>>_<<Package name>>_packagedetails.csv, for example 2022_04_11_09_10_45_WileyOnlineLibrary_packagedetails.csv
  • Title-package details export (when a user ALSO chooses fields in the Titles dropdown while staying on the Package export page) - <<YYYY_MM_DD_hh_mm_ss>>_<<Package name>>_packagetitles.csv, for example 2022_04_11_15_02_03_WileyOnlineLibrary_packagetitles.csv

So the Package detail export produces one CSV file when only package details are exported, and two CSV files when Package details + Title-package details are exported.
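For illustration, a minimal sketch of how such a file name could be built (ExportFileNameBuilder is a hypothetical helper, not existing code; the timestamp pattern follows the examples above):

    import java.time.LocalDateTime;
    import java.time.format.DateTimeFormatter;

    // Hypothetical helper: builds export file names following the convention above.
    public final class ExportFileNameBuilder {

      private static final DateTimeFormatter TIMESTAMP =
          DateTimeFormatter.ofPattern("yyyy_MM_dd_HH_mm_ss");

      private ExportFileNameBuilder() { }

      // e.g. buildFileName("Wiley Online Library", "packagedetails")
      //      -> 2022_04_11_09_10_45_WileyOnlineLibrary_packagedetails.csv
      public static String buildFileName(String recordName, String suffix) {
        String name = recordName.replaceAll("\\s+", ""); // spaces are stripped in the examples
        return TIMESTAMP.format(LocalDateTime.now()) + "_" + name + "_" + suffix + ".csv";
      }
    }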

Title Package export

This works similarly to the Package export. When a user presses the 'Export' button, the modal panel disappears and the export process starts. A green toast message is displayed showing the name of the generated file and the approximate duration of the export (30 minutes).

The name of the generated file is always <<YYYY_MM_DD_hh_mm_ss>>_<<Title name>>_titledetails.csv, for example 2022_04_11_23_15_31_WileyOnlineLibrary_titledetails.csv.

Additional details

    - Which titles to include in the export? If the user is on the package detail record and wants to export title information, THEN include the titles returned by the Titles accordion search. For example, if the user is on a package record and conducts a title search within it that returns 100 titles out of the 1000 titles in the package, then the export should only include the 100 titles returned by the search.

    - Requirements for generated files (an illustrative row follows this list):

  • Multiple records separator: pipe
  • File extension: csv
  • Delimiter for an array of values: comma
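
For illustration only, a hypothetical CSV fragment showing one reading of these rules (the column names and values are invented): within a single cell, multiple records are separated by a pipe, and the array of values inside each record is separated by commas:

    Package Name,Package Id,Subjects
    Wiley Online Library,123356,"history,science | medicine,biology"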



Solution

The Export Manager application can satisfy the given requirements. It provides functionality to process batches of data in a flexible, configurable manner. There are sources we can reuse to retrieve various objects, generate CSV files, upload files to vendor-specific storage, and share access to the stored files. The application can manage 'immediate' export jobs as well as 'scheduled' export jobs, which have to be configured before the run. Export Manager consists of the backend modules mod-data-export-spring and mod-data-export-worker, and the UI module ui-export-manager.


Learning: https://www.toptal.com/spring/spring-batch-tutorial

https://docs.spring.io/spring-batch/docs/current/reference/html/index.html


ui-export-manager shows a list of jobs, their statuses, types, and other information. Here users can see the result of job execution and download files. This module uses the REST API of mod-data-export-spring to retrieve jobs.

This module should be able to display and filter the new eHoldings jobs that we will use to export packages & titles.
What should be done in this module:
      - add a new job type - 'eHoldings' (see the sketch after this list);
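
The export type itself lives in the shared folio-export-common schemas; a minimal sketch of the addition, assuming the type is an enum value in the export type schema (the surrounding enum values and file layout here are assumptions):

    "exportType": {
      "description": "Data export type",
      "type": "string",
      "enum": [ "CIRCULATION_LOG", "BURSAR_FEES_FINES", "E_HOLDINGS" ]
    }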


mod-data-export-spring is designed to manage, configure, and run jobs. This module is the entry point for starting a data export; it calls mod-data-export-worker to execute jobs by sending events to a Kafka topic.
What should be done in this module:
      - add a new export type - 'eHoldings';
      - add new request parameters (to ExportTypeSpecificParameters.json) needed to pass the export fields, the search params for the titles search, and other params;
      - add a new JobCommandBuilder needed to take the request parameters and pass them in the Kafka event (see the sketch after this list);
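
A minimal sketch of what the new builder could look like (the class shape, method, and parameter keys are assumptions modeled on the description above, not the actual module code):

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Hypothetical builder: converts the incoming eHoldings export request into
    // the job parameters that will be sent to mod-data-export-worker via Kafka.
    public class EHoldingsJobCommandBuilder {

      public Map<String, String> buildJobParameters(EHoldingsExportConfig config) {
        Map<String, String> params = new HashMap<>();
        params.put("recordId", config.recordId());                     // package or title id
        params.put("recordType", config.recordType());                 // e.g. PACKAGE or RESOURCE
        params.put("titleSearchFilters", config.titleSearchFilters()); // Titles accordion search params
        params.put("packageFields", String.join(",", config.packageFields()));
        params.put("titleFields", String.join(",", config.titleFields()));
        return params;
      }

      // Hypothetical config mirroring eHoldingsExportConfig.json.
      public record EHoldingsExportConfig(String recordId, String recordType,
          String titleSearchFilters, List<String> packageFields, List<String> titleFields) { }
    }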

MODEXPS-94


mod-data-export-worker receives events from mod-data-export-spring and executes the jobs. The module is built on the Spring Batch framework, and jobs are configured as a set of steps. The execution of a job happens in 3 stages: retrieve data, process data, and write data to a temporary file. Uploading files to vendor-specific storage (an AWS S3 bucket) is already preconfigured by a listener and happens when the file is completely written.


What should be done in this module:
      - create a Reader extending the base functionality (CsvItemReader.java; see CirculationLogCsvItemReader.java as an example). The reader should retrieve packages/titles using REST clients, taking the search parameters from the incoming Kafka event (from the job parameters);
      - create a Processor (implementing ItemProcessor). The processor has to keep only the fields selected for export from the incoming packages/titles; the list of fields comes from the job parameters;
      - create a Writer extending the base functionality (we can just use CsvWriter.java if nothing special is needed);
      - create a Configuration to build the job and wire up the Reader, Writer, and Processor (see CirculationLogJobConfig.java and the sketch after this list);
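
A minimal sketch of how the job configuration could look (EHoldingsRecord and the chunk size are assumptions; the real classes would follow the existing CirculationLog* examples):

    import org.springframework.batch.core.Job;
    import org.springframework.batch.core.Step;
    import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
    import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
    import org.springframework.batch.item.ItemProcessor;
    import org.springframework.batch.item.ItemReader;
    import org.springframework.batch.item.ItemWriter;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;

    // Hypothetical job configuration, modeled on CirculationLogJobConfig.java.
    @Configuration
    public class EHoldingsJobConfig {

      @Bean
      public Job getEHoldingsJob(JobBuilderFactory jobBuilderFactory, Step eHoldingsStep) {
        return jobBuilderFactory.get("E_HOLDINGS")
            .start(eHoldingsStep)
            .build();
      }

      @Bean
      public Step eHoldingsStep(StepBuilderFactory stepBuilderFactory,
                                ItemReader<EHoldingsRecord> reader,
                                ItemProcessor<EHoldingsRecord, EHoldingsRecord> processor,
                                ItemWriter<EHoldingsRecord> writer) {
        return stepBuilderFactory.get("eHoldingsStep")
            .<EHoldingsRecord, EHoldingsRecord>chunk(100) // chunk size is an assumption
            .reader(reader)        // fetches packages/titles from mod-kb-ebsco-java
            .processor(processor)  // keeps only the fields selected for export
            .writer(writer)        // appends CSV rows to the temporary file
            .build();
      }
    }

    // Hypothetical DTO representing one exported package/title row.
    class EHoldingsRecord { }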


MODEXPW-107
MODEXPW-122
MODEXPW-108
MODEXPW-111
MODEXPW-112
MODEXPW-113


ui-eholdings will be able to create and send export jobs from the UI.
      What should be done in this module:
      - when a user clicks the 'Actions' button, a modal window appears showing multi-select controls filled with title & package fields.
        NOTE: it will be useful to have one task to display only eHoldings fields, and another task to also display non-eHoldings fields;

      - when a user clicks the 'Export' button, the UI creates a Job by sending a request to mod-data-export-spring (see the sample payload after this list):
        "methods": [ "POST" ], "pathPattern": "/data-export-spring/jobs", "permissionsRequired": [ "data-export.job.item.post" ], "modulePermissions": []
        The request payload follows this schema: https://github.com/folio-org/folio-export-common/blob/master/schemas/job.json
        "type" equals 'E_HOLDINGS', and for "exportTypeSpecificParameters" see https://github.com/folio-org/folio-export-common/blob/master/schemas/eholdings/eHoldingsExportConfig.json

      - when the export has started, the modal window disappears and a green toast message appears showing the name of the generated file and the approximate duration of the export.
         We will test the export with various numbers of records to determine the approximate duration.

      - when a job is completed, it contains the link to download the CSV file.
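
For illustration, a hypothetical request payload (the field values are invented; the exact structure is defined by the schemas linked above):

    {
      "type": "E_HOLDINGS",
      "exportTypeSpecificParameters": {
        "eHoldingsExportConfig": {
          "recordId": "123356",
          "recordType": "PACKAGE",
          "titleSearchFilters": "filter[name]=history",
          "packageFields": ["packageName", "packageId"],
          "titleFields": ["titleName", "titleId"]
        }
      }
    }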


mod-kb-ebsco-java stores packages & titles and provides a REST API to retrieve these objects. The REST methods needed by mod-data-export-worker are already provided (a client sketch follows):
        Retrieve package details: "methods": ["GET"], "pathPattern": "/eholdings/packages/{packageId}", "permissionsRequired": ["kb-ebsco.packages.item.get"]
        Retrieve package titles (resources): "methods": ["GET"], "pathPattern": "/eholdings/packages/{packageId}/resources", "permissionsRequired": ["kb-ebsco.package-resources.collection.get"]
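
The worker's Reader could call these endpoints through a declarative REST client; a minimal sketch, assuming Spring Cloud OpenFeign (the client name, query parameters, and String return types are placeholders, not the actual module code):

    import org.springframework.cloud.openfeign.FeignClient;
    import org.springframework.web.bind.annotation.GetMapping;
    import org.springframework.web.bind.annotation.PathVariable;
    import org.springframework.web.bind.annotation.RequestParam;

    // Hypothetical client for the mod-kb-ebsco-java endpoints listed above.
    @FeignClient(name = "eholdings")
    public interface KbEbscoClient {

      // GET /eholdings/packages/{packageId} - package details
      @GetMapping("/eholdings/packages/{packageId}")
      String getPackageById(@PathVariable("packageId") String packageId);

      // GET /eholdings/packages/{packageId}/resources - titles of the package,
      // narrowed by the same search parameters as the Titles accordion search
      @GetMapping("/eholdings/packages/{packageId}/resources")
      String getPackageResources(@PathVariable("packageId") String packageId,
                                 @RequestParam("filter[name]") String titleFilter,
                                 @RequestParam("page") int page);
    }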

Integration tests

There are no integration tests written for the Export Manager modules yet. We will add the folder 'mod-data-export-spring' and create test cases to cover the eHoldings export feature.

FAT-1659



Application reliability and performance

  • The generated CSV file is stored in a temporary folder (java.io.tmpdir) on the file system. The available volume of the temporary folder depends on how the Java Virtual Machine is configured. When this folder overflows, the job stops with the FAILED execution status, and the description shows the exact error. For now the system doesn't provide a way to control the volume of the temporary folder; however, we can easily add such an ability when we need it.
  • mod-data-export-worker persists job execution parameters in the underlying database (table BATCH_JOB_EXECUTION_PARAMS). Developers from other teams have increased the size of the column that stores request parameters (10,000 characters for now). We should be careful about passing a lot of request parameters from the UI (export fields and other request parameters).
  • Storing files in the Amazon cloud will incur some costs. We will set up a cleaner that purges outdated files, which will help keep the storage in good condition. For now the cleaning frequency and the time to keep files are hardcoded. If we need to control this and set the parameters at application startup, we can easily add such an ability.
  • No performance tests have been done or documented yet. We can run and document such tests, at least to know how much time the export of some fixed number of records takes.



Questions on the story:

  1. Q (from Igor): Should a user be automatically directed to the Export Manager after pressing the 'Export' button? A (from Khalilah): No, the user just starts the export, the modal panel disappears, a green toast message appears, and nothing else happens in the UI;
  2. Q (from Igor): Should the list of package & title fields be configurable in Settings, or will it always be hardcoded? The question is still open; we will discuss it;
  3. Q (from Igor): Does it make sense to add a time to the generated file name (to make it unique)? A (from Khalilah): Yes, it does;
  4. Q (from Khalilah): Does this approach support exporting only selected packages or titles? A (from Igor): This approach will work for both selected and non-selected packages/titles;
  5. Q (from Khalilah): We need to create large-data-load functional testing stories. We need to determine whether the export must have a record-count/row-size/file-size limit, and whether the export has any system/hosting requirements.