...
Splitting files by a separate tool will not bring the expected reliability to the Data Import process because the file upload step will still be included in the process.
The first idea is to let the Data Import app download the source file from S3-like storage instead of acting as the server that receives the upload. The initial stage of Data Import will then look as follows:
...
This change will make the initial stage of the Data Import application more reliable and prevent a potential denial of service (DoS) attack in which a threat actor fills up disk space. It also eliminates the risk of uncontrolled resource consumption when multiple Data Import file uploads run simultaneously.
The existing approach, in which the user uploads a source file directly to the Data Import app, will be removed, because removing it makes the mod-data-import module stateless and allows us to scale this module horizontally (Stateless, Horizontal scaling, and High Availability), making it HA compliant.
The second improvement is to implement the slicing of large data import files in the Data Import application itself.
...
- The max chunk file size or the max number of source records in the chunk file must be configurable.
- Records would need to be chunked and named based on the sequential order of the records in the original file, e.g. records 1-1000 in chunk file_1, records 1001-2000 in chunk file_2, etc.
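The chunking and naming rule above can be sketched as follows. This is a minimal illustration, not the module's implementation: the function name `chunk_records`, the `base_name` parameter, and the in-memory record list are assumptions made for the example, and only the "max number of records" limit is shown (a size-based limit would work the same way).

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple


def chunk_records(
    records: Iterable[bytes],
    max_records: int,
    base_name: str = "file",  # hypothetical naming prefix, matching "file_1", "file_2"
) -> Iterator[Tuple[str, List[bytes]]]:
    """Yield (chunk_name, records) pairs, preserving the original record order."""
    if max_records < 1:
        raise ValueError("max_records must be at least 1")
    iterator = iter(records)
    index = 1
    while True:
        # Take up to max_records records from the source, in order.
        chunk = list(islice(iterator, max_records))
        if not chunk:
            return
        yield f"{base_name}_{index}", chunk
        index += 1


# Example: 2500 records with a configured limit of 1000 records per chunk.
records = [f"record-{n}".encode() for n in range(1, 2501)]
chunks = list(chunk_records(records, max_records=1000))
```

With these inputs, records 1-1000 land in `file_1`, records 1001-2000 in `file_2`, and the remaining 500 in `file_3`. Reading the source through an iterator keeps memory use bounded by one chunk at a time.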
Non-functional requirements
- The implementation must be decoupled from the main mod-data-import code base and simple enough that backporting it to previous releases costs at most half of the original development effort (in man-days). TBD: define the list of releases for backporting.
Implementation
The solution will be implemented as a part of the mod-data-import and ui-data-import modules.
High-level operation overview
...
Direct uploading (1, 2)
Uploading to Amazon S3-like storage directly from a FOLIO UI application can be implemented using the following guide: https://aws.amazon.com/blogs/compute/uploading-to-amazon-s3-directly-from-a-web-or-mobile-application/. The initial call to acquire the uploadURL must be done by the back-end mod-data-import module.
The diagram below shows the direct upload flow in detail.
Simultaneous launch of a large number of Data Import Jobs (9)
...