...
The existing approach, in which the user uploads a source file directly to the Data Import app, will be removed. Eliminating it will make the mod-data-import module stateless and allow the module to scale horizontally (Stateless, Horizontal scaling, and High Availability), making it HA-compliant.
The second improvement is to implement slicing logic for large data import files in the Data Import application as well.
...
- The max chunk file size or the max number of source records in the chunk file must be configurable.
- Records would need to be chunked and named based on the sequential order of the records in the original file, e.g. records 1-1000 in chunk file_1, records 1001-2000 in chunk file_2, etc.
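The chunking rules above can be sketched as follows. This is a minimal illustration only; the names (`chunk_records`, `MAX_RECORDS_PER_CHUNK`) are hypothetical and not part of the actual mod-data-import code base.

```python
# Illustrative sketch of the slicing requirement: split source records into
# sequentially numbered chunk files of a configurable size, preserving the
# original record order. Names here are hypothetical, not the real API.
from typing import Iterable, Iterator

MAX_RECORDS_PER_CHUNK = 1000  # must be configurable per the requirement

def chunk_records(records: Iterable[str],
                  max_per_chunk: int = MAX_RECORDS_PER_CHUNK) -> Iterator[tuple[str, list[str]]]:
    """Yield (chunk_file_name, records) pairs in original record order."""
    batch: list[str] = []
    chunk_no = 1
    for record in records:
        batch.append(record)
        if len(batch) == max_per_chunk:
            yield f"file_{chunk_no}", batch
            chunk_no += 1
            batch = []
    if batch:  # trailing partial chunk
        yield f"file_{chunk_no}", batch
```

With the default size of 1000, records 1-1000 land in `file_1`, records 1001-2000 in `file_2`, and so on, matching the naming requirement above.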
Non-functional requirements
- The implementation must be decoupled from the mod-data-import main code base and simple enough that backporting it to previous releases costs at most half (in person-days) of the original development effort. TBD: define the list of releases for backporting.
- The usage of the S3-like storage must not be vendor-locked and must support different types of storage (AWS S3, MinIO).
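One common way to satisfy the vendor-lock requirement is to hide the storage behind a minimal interface. The sketch below is purely illustrative (the `ObjectStorage` protocol and `InMemoryStorage` class are hypothetical names, not the module's API); in practice AWS S3 and MinIO both expose the S3 API, so a real implementation may only need a configurable endpoint URL.

```python
# Hypothetical storage abstraction: decouples the module from a concrete
# S3 vendor. An S3/MinIO-backed class would implement the same protocol;
# the in-memory class below stands in as a test double.
from typing import Protocol

class ObjectStorage(Protocol):
    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...

class InMemoryStorage:
    """Test double standing in for an S3/MinIO-backed implementation."""
    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]
```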
Assumptions
- Garbage collection (removing already processed files and chunk files) is out of the scope of the feature. It can be done by configuring appropriate retention policies on S3-like storage.
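The retention approach assumed above can be expressed as an S3 lifecycle rule. The prefix and expiry period below are illustrative values, not settings prescribed by this design:

```json
{
  "Rules": [
    {
      "ID": "expire-processed-import-files",
      "Filter": { "Prefix": "data-import/" },
      "Status": "Enabled",
      "Expiration": { "Days": 7 }
    }
  ]
}
```

MinIO supports the same lifecycle configuration format (e.g. via `mc ilm`), so the approach works for either storage type.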
Implementation
The solution will be implemented as a part of the mod-data-import and ui-data-import modules.
...
Uploading to S3-like storage directly from a FOLIO UI application can be implemented following this guide: https://aws.amazon.com/blogs/compute/uploading-to-amazon-s3-directly-from-a-web-or-mobile-application/. The initial call to acquire the uploadURL must be made by the back-end mod-data-import module.
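The core idea from the guide above is that the back end signs the upload target with a secret the browser never sees, and the UI then PUTs the file straight to storage. The sketch below illustrates that shape with a simplified HMAC scheme; it is not real S3 SigV4 presigning (a real deployment would use the storage SDK's presigner), and the endpoint and function names are assumptions.

```python
# Simplified illustration of presigned-URL upload: the back end issues a
# signed, expiring URL; the storage side can verify it without any state.
# This HMAC scheme is illustrative only, not AWS SigV4.
import hashlib
import hmac
import time

SECRET_KEY = b"server-side-secret"                # never shipped to the UI
STORAGE_ENDPOINT = "https://storage.example.org"  # hypothetical endpoint

def generate_upload_url(bucket: str, key: str, expires_in: int = 900) -> str:
    """Return a URL the UI can PUT the source file to, valid until expiry."""
    expires_at = int(time.time()) + expires_in
    payload = f"PUT\n{bucket}\n{key}\n{expires_at}".encode()
    signature = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return (f"{STORAGE_ENDPOINT}/{bucket}/{key}"
            f"?expires={expires_at}&signature={signature}")

def verify_upload_url(bucket: str, key: str, expires: int, signature: str) -> bool:
    """Storage-side check: signature matches and the URL has not expired."""
    payload = f"PUT\n{bucket}\n{key}\n{expires}".encode()
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature) and time.time() < expires
```

Because the URL is self-describing and verifiable, the module issuing it keeps no upload state, which is what allows mod-data-import to stay stateless.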
The diagram below shows the direct upload flow in detail.
Simultaneous launch of a large number of Data Import Jobs
...