...
mod-data-export-worker receives events from mod-data-export-spring and executes its jobs. The module is built on the Spring Batch framework, and each job is configured as a set of steps. A job runs in three stages: retrieve data, process data, and write data to a temporary file. Uploading files to vendor-specific storage (an AWS S3 bucket) is already configured through a listener and happens once all three stages are complete and the file is written.
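The three stages can be pictured with a simplified plain-Java loop. This is only a sketch under assumptions: the class name ThreeStageSketch and its run method are hypothetical, Spring Batch itself processes items in chunks and is not used here, and the upload step is left out.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;
import java.util.function.Function;

// Hypothetical illustration of the read -> process -> write flow.
public class ThreeStageSketch {

    // Reads every item, converts it to a text line, and appends it to a temporary file.
    public static <I> Path run(Iterator<I> reader, Function<I, String> processor) {
        try {
            // The file is created in the folder named by the java.io.tmpdir property.
            Path tempFile = Files.createTempFile("export-", ".csv");
            try (BufferedWriter writer = Files.newBufferedWriter(tempFile)) {
                while (reader.hasNext()) {
                    I item = reader.next();               // stage 1: retrieve data
                    String line = processor.apply(item);  // stage 2: process data
                    writer.write(line);                   // stage 3: write data
                    writer.newLine();
                }
            }
            // In the module, a listener would upload the finished file at this point.
            return tempFile;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, `ThreeStageSketch.run(List.of("a", "b").iterator(), String::toUpperCase)` returns the path of a temporary file with one line per item.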
What should be done in this module:
- create a Reader extending base functionality (CsvItemReader.java, see CirculationLogCsvItemReader.java as an example). The reader should retrieve packages/titles using REST clients, taking search parameters from the incoming Kafka event (from job parameters);
- create a Processor (implementing ItemProcessor). The processor has to take only selected fields for export from the incoming packages/titles. The list of fields for export comes from job parameters;
- create a Writer extending base functionality (we can just use CsvWriter.java if nothing special is needed);
- create a Configuration to build a job and set Reader, Writer, and Processor (see CirculationLogJobConfig.java);
- configure a cleaner to purge outdated files (those generated more than 30 days ago);
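The Processor step in the list above (keeping only the fields selected for export) could look roughly like the following. This is a sketch, not the module's code: ExportFieldSelector is a hypothetical name, records are modelled as plain maps rather than the real package/title classes, and in the module the same logic would live in a class implementing Spring Batch's ItemProcessor.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;

// Hypothetical sketch: keeps only the fields selected for export.
public class ExportFieldSelector {

    // Field names to export, assumed to be parsed from the job parameters.
    private final List<String> exportFields;

    public ExportFieldSelector(List<String> exportFields) {
        this.exportFields = List.copyOf(exportFields);
    }

    // Returns a new map with only the requested fields, in the requested order.
    // Missing or null values are exported as empty strings.
    public Map<String, String> process(Map<String, String> source) {
        Map<String, String> selected = new LinkedHashMap<>();
        for (String field : exportFields) {
            selected.put(field, Objects.toString(source.get(field), ""));
        }
        return selected;
    }

    // Joins the selected values into one CSV line, quoting every value
    // and doubling any embedded quote characters.
    public String toCsvLine(Map<String, String> selected) {
        return selected.values().stream()
                .map(v -> "\"" + v.replace("\"", "\"\"") + "\"")
                .collect(Collectors.joining(","));
    }
}
```

Keeping the selection separate from the CSV formatting means the base CsvWriter could still be used for output if nothing special is needed.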
...
mod-kb-ebsco-java stores packages and titles and provides a REST API to retrieve them. The REST methods that mod-data-export-worker needs are already available:
Retrieve package details: GET /eholdings/packages/{packageId}
Retrieve package titles: GET /eholdings/titles/{titleId}
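A minimal sketch of how requests to these two endpoints might be built with the JDK's HTTP client API. The class name EholdingsRequests and the base URL are assumptions, tenant and authentication headers are omitted, and the module's actual REST clients may look different.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper that builds GET requests for the endpoints listed above.
public class EholdingsRequests {

    // Base URL of the API gateway; the value is supplied by the caller (assumption).
    private final String baseUrl;

    public EholdingsRequests(String baseUrl) {
        // Drop a trailing slash so that paths can be appended safely.
        this.baseUrl = baseUrl.endsWith("/")
                ? baseUrl.substring(0, baseUrl.length() - 1)
                : baseUrl;
    }

    // GET /eholdings/packages/{packageId}
    public HttpRequest packageById(String packageId) {
        return get("/eholdings/packages/" + packageId);
    }

    // GET /eholdings/titles/{titleId}
    public HttpRequest titleById(String titleId) {
        return get("/eholdings/titles/" + titleId);
    }

    // URI.create throws IllegalArgumentException if the id contains characters
    // that are not valid in a URI; ids are assumed to be URL-safe here.
    private HttpRequest get(String path) {
        return HttpRequest.newBuilder(URI.create(baseUrl + path)).GET().build();
    }
}
```

The resulting HttpRequest can be passed to `java.net.http.HttpClient.send(...)` together with a body handler.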
...
Application Reliability and Performance
- The generated temporary file is stored in the temporary folder (java.io.tmpdir) on the file system. The space available in this folder depends on how the Java Virtual Machine is configured. If the folder fills up, the job stops with FAILED execution status, and the description shows the exact error. The size of the temporary folder is configurable: we can detect when the folder is full and set a larger size in system variables;
- mod-data-export-worker persists the job execution parameters to the underlying database (table BATCH_JOB_EXECUTION_PARAMS). Developers from another team have increased the size of the column that stores request parameters (10000 symbols for now);
- There are no performance tests done and documented yet.
- AWS
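For the temporary-folder point above, a minimal sketch of how free space could be checked before a job writes its file. The class and method names are hypothetical, and the required size is left to the caller. Only the java.io.tmpdir system property and File.getUsableSpace() are standard Java.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for inspecting the JVM temporary folder.
public class TempDirCheck {

    // Resolves the folder the JVM uses for temporary files.
    public static Path tempDir() {
        return Paths.get(System.getProperty("java.io.tmpdir"));
    }

    // Returns true if the partition holding 'dir' reports at least
    // 'requiredBytes' of space usable by this JVM.
    public static boolean hasRoom(Path dir, long requiredBytes) {
        return dir.toFile().getUsableSpace() >= requiredBytes;
    }

    public static void main(String[] args) {
        Path dir = tempDir();
        System.out.println("Temporary folder: " + dir);
        System.out.println("Usable bytes: " + dir.toFile().getUsableSpace());
    }
}
```

The reported value is a hint rather than a guarantee, so a job can still fail if other processes fill the folder while it runs.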
...
Questions to the story:
- Should a user be automatically directed to the Export Manager after pressing the 'Export' button?
- Should the list of package and title fields be configurable in Settings, or is it hardcoded?
- Performance concerns: ToDo
...