Note: This decision has been migrated to the Technical Council's Decision Log as part of a consolidation effort. See: DR-000008 - Data export by using Spring Batch (aka Export Manager)
Terms
Data export - a general solution that should be applied to all new features intended to export data from FOLIO modules to a destination (file, database, etc.).
Potential applications of this solution are not limited to the SRS/Inventory Data export feature (ui-data-export and mod-data-export modules); it should be a system-wide solution leveraged for all data export business cases.
Business goals
The new Data export approach was designed for the following features:
- Export orders in bulk via delimited file format: https://folio-org.atlassian.net/browse/UXPROD-2318
- Build a report with rollover errors and store in CSV file: https://folio-org.atlassian.net/browse/MODFISTO-173
- Circulation log export to CSV: https://folio-org.atlassian.net/browse/UXPROD-2691
- Cornell Library's go-live requirements to transfer fees/fines to the Cornell bursar system: https://folio-org.atlassian.net/browse/UXPROD-2862
- The ability to import/export fund updates via CSV file in order to bulk edit funds: https://folio-org.atlassian.net/browse/UXPROD-199
There is no intent to replace any existing data export solution for now. If a requirement later arises to significantly extend an existing data export solution, the new approach should be applied.
Architecture design
To eliminate the limitations of the existing mod-data-export module (see Data export by using mod-data-export) and speed up development, two new modules should be implemented based on the Spring way approach (see Pic. 1):
...
- Good separation of concerns. It has concepts of:
- Jobs
- Steps
- Tasks
- Task partitions
- Readers
- Writers
- etc.
It makes it easy to implement Batch export workers and extend them later.
- Chunk-based processing. Spring Batch is designed to process batch jobs chunk by chunk: each chunk is extracted from a data source, transformed (if required), and loaded to a destination file/storage (see the sketch after this list).
- Parallel Job execution using multiple threads. A Task can be split into smaller parts by a Partitioner, then each part can be processed by a separate thread.
- Ability to save job state in a database and continue execution from the point at which the job was interrupted.
- It also provides a lot of different hooks and custom steps, which makes it easy to create a new job or extend an existing one.
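To make the chunk model concrete, below is a minimal sketch of a chunk-oriented Spring Batch job, assuming the Spring Batch 5 builder API; the Order record, bean names, and the in-memory source are illustrative and not the actual mod-data-export-worker code:

```java
import java.util.Iterator;
import java.util.List;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.job.builder.JobBuilder;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.step.builder.StepBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.transaction.PlatformTransactionManager;

@Configuration
public class OrderExportJobConfig {

  // Illustrative item type; a real worker would map a FOLIO domain object.
  record Order(String id, String vendor) {}

  // Chunk-oriented step: read -> process -> write, committed every 100 items.
  @Bean
  public Step exportOrdersStep(JobRepository jobRepository, PlatformTransactionManager txManager) {
    // Hypothetical in-memory source; a real reader would page through a module API or DB.
    Iterator<Order> source = List.of(new Order("1", "Vendor A"), new Order("2", "Vendor B")).iterator();
    return new StepBuilder("exportOrdersStep", jobRepository)
        .<Order, String>chunk(100, txManager)                   // commit interval = 100 items
        .reader(() -> source.hasNext() ? source.next() : null)  // null signals end of input
        .processor(order -> order.id() + "," + order.vendor())  // transform each item into a CSV line
        .writer(lines -> lines.forEach(System.out::println))    // load the chunk to the destination
        .build();
  }

  @Bean
  public Job exportOrdersJob(JobRepository jobRepository, Step exportOrdersStep) {
    return new JobBuilder("exportOrdersJob", jobRepository)
        .start(exportOrdersStep)
        .build();
  }
}
```

Each commit interval (here, 100 items) is a unit of progress: with a restartable reader, Spring Batch can persist the step's position and resume from the last committed chunk after an interruption.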
NOTE: mod-data-export-worker can be used with a PostgreSQL database for Spring Batch. According to the investigation in scope of MODEXPW-215, using PostgreSQL with Spring Batch requires:
- Setting the READ_COMMITTED isolation level in the Spring Batch job repository configuration.
- Using a unique set of job parameters for each job launch, since Spring Batch does not allow running several instances of a job with the same job parameters.
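A minimal configuration sketch covering both points, assuming Spring Batch's JobRepositoryFactoryBean; the class and helper names are illustrative, not the actual module code:

```java
import javax.sql.DataSource;

import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.repository.support.JobRepositoryFactoryBean;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.transaction.PlatformTransactionManager;

@Configuration
public class BatchRepositoryConfig {

  // Relax the isolation level Spring Batch uses when creating job executions
  // to READ_COMMITTED, as required for PostgreSQL per MODEXPW-215.
  @Bean
  public JobRepository jobRepository(DataSource dataSource,
                                     PlatformTransactionManager transactionManager) throws Exception {
    JobRepositoryFactoryBean factory = new JobRepositoryFactoryBean();
    factory.setDataSource(dataSource);
    factory.setTransactionManager(transactionManager);
    factory.setIsolationLevelForCreate("ISOLATION_READ_COMMITTED");
    factory.afterPropertiesSet();
    return factory.getObject();
  }

  // Spring Batch refuses to start a new instance of a job whose identifying
  // parameters match a previous run, so add a unique parameter per launch.
  public static JobParameters uniqueJobParameters(String exportType) {
    return new JobParametersBuilder()
        .addString("exportType", exportType)
        .addLong("launchTime", System.currentTimeMillis()) // makes every launch unique
        .toJobParameters();
  }
}
```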
Kafka
The following topics should be created in Kafka:
- dataExportJobCommandsTopic: <tenant>.data-export.job.command
The topic to send start-job commands from mod-data-export-spring to mod-data-export-workers. Each worker gets messages from dedicated topic partition(s).
This helps to make sure that the same Job isn't executed by multiple workers. The worker sends a message-processing acknowledgment only after the Job is finished.
If the worker fails in the middle of the job execution, it can retrieve uncommitted messages from Kafka again and re-process the Job (see the sketch below).
- dataExportJobExecutionUpdatesTopic: <tenant>.data-export.job.update
The topic to send Job status updates from mod-data-export-workers to mod-data-export-spring. The latter updates the Job status in the PostgreSQL database according to the received messages.
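To illustrate the acknowledge-after-completion contract on the command topic, here is a minimal worker-side sketch, assuming Spring Kafka with manual acknowledgment (spring.kafka.listener.ack-mode=manual); the listener class, job bean, and parameter names are illustrative, not the actual mod-data-export-worker code:

```java
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.support.Acknowledgment;
import org.springframework.stereotype.Component;

@Component
public class JobCommandListener {

  private final JobLauncher jobLauncher;
  private final Job exportJob; // illustrative: a real worker resolves the Job per command type

  public JobCommandListener(JobLauncher jobLauncher, Job exportJob) {
    this.jobLauncher = jobLauncher;
    this.exportJob = exportJob;
  }

  // Each worker instance is assigned dedicated partition(s) of the command topic,
  // so a given start-job command is consumed by exactly one worker.
  @KafkaListener(topicPattern = ".+\\.data-export\\.job\\.command")
  public void onJobCommand(String command, Acknowledgment ack) throws Exception {
    JobExecution execution = jobLauncher.run(exportJob, toJobParameters(command));
    if (execution.getStatus().isUnsuccessful()) {
      // Leave the offset uncommitted so the command can be re-consumed and re-processed.
      throw new IllegalStateException("Job failed: " + execution.getExitStatus());
    }
    // Acknowledge only after the Job has finished, per the contract described above.
    ack.acknowledge();
  }

  private JobParameters toJobParameters(String command) {
    return new JobParametersBuilder()
        .addString("command", command)
        .addLong("launchTime", System.currentTimeMillis()) // keeps each launch unique
        .toJobParameters();
  }
}
```

Because the default JobLauncher runs the job synchronously, run() returns only after the Job finishes, so the offset is committed only for completed Jobs; a worker that dies mid-execution leaves the offset uncommitted and the command is re-consumed on restart.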
...