Summary
...
Uploading to S3-like storage directly from a FOLIO UI application can be implemented by following the AWS guide (https://aws.amazon.com/blogs/compute/uploading-to-amazon-s3-directly-from-a-web-or-mobile-application/). The initial call that acquires the uploadURL must be made by the back-end mod-data-import module.
The diagram below shows the Direct upload flow in detail.
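As a rough code-level companion to the flow, the two steps (get the uploadURL from the back end, then send the file straight to storage) can be sketched as follows. This is only an illustration: the function names, the `storage.example.org` URL, and the file name are hypothetical, and the back-end call and the network transfer are replaced by stand-in functions so that the sketch runs without any services. The PUT method follows the AWS guide linked above.

```python
import urllib.request


def build_upload_request(upload_url, payload):
    """Build the HTTP PUT that sends the file bytes straight to storage."""
    return urllib.request.Request(upload_url, data=payload, method="PUT")


def direct_upload(get_upload_url, send, filename, payload):
    """Run the two steps of the flow; both callables are injected."""
    # Step 1: the back end (mod-data-import) hands out the uploadURL.
    upload_url = get_upload_url(filename)
    # Step 2: the UI sends the file to that URL, bypassing the back end.
    return send(build_upload_request(upload_url, payload))


# Stand-ins (hypothetical) for the real back-end call and network transfer.
def fake_get_upload_url(filename):
    return "https://storage.example.org/di-bucket/" + filename + "?signature=placeholder"


sent_requests = []


def fake_send(request):
    sent_requests.append(request)
    return 200  # pretend the storage accepted the upload


status = direct_upload(fake_get_upload_url, fake_send, "records.bin", b"file bytes")
```

In a real deployment the first stand-in would be the mod-data-import call that issues the uploadURL, and the second would be the browser's upload request.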
An alternative would be an SFTP-enabled server from the AWS Transfer Family service, but it has the following drawbacks:
- It is an extra service that should be set up and configured
- This service will require an extra security configuration (it requires a separate identity provider to manage user access to the SFTP server)
- The price in us-east-1 is $0.30 per hour per endpoint (~$216 for a 30-day month) plus $0.04 per gigabyte (GB) transferred
Based on the above, direct upload is preferable to a managed SFTP server.
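The cost figures quoted above can be checked with a few lines of arithmetic. The sketch below assumes a 30-day month with the endpoint running continuously; the function name and the 500 GB example volume are illustrative only.

```python
ENDPOINT_USD_PER_HOUR = 0.30  # price per endpoint quoted above
TRANSFER_USD_PER_GB = 0.04    # transfer price quoted above
HOURS_PER_MONTH = 24 * 30     # assumption: 30-day month, endpoint always on


def monthly_sftp_cost_usd(gb_transferred, endpoints=1):
    """Fixed endpoint cost plus volume-based transfer cost, in USD."""
    fixed = endpoints * ENDPOINT_USD_PER_HOUR * HOURS_PER_MONTH
    variable = gb_transferred * TRANSFER_USD_PER_GB
    return round(fixed + variable, 2)


idle_cost = monthly_sftp_cost_usd(0)      # endpoint cost alone
busy_cost = monthly_sftp_cost_usd(500)    # hypothetical 500 GB month
```

Even with no traffic the endpoint costs about $216 per month, and a hypothetical 500 GB of transfers adds only about $20, so the fixed hourly charge dominates.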
Simultaneous launch of a large number of Data Import Jobs (9)
To smooth the spike in resource consumption by the mod-data-import module when a large number of Data Import Jobs are started at once, jobs for which there are not enough resources must be placed in a queue, which prevents resource depletion.
The recommended approach is to create a DB table that stores job details and use it to organize the queue. This table must be established in a dedicated schema, distinct from tenant-specific schemas, thereby enabling job data for all tenants to be stored in a centralized location. This straightforward method allows for easy retrieval of jobs for each tenant, with the next Data Import Job selected based on either priority or complexity/size. Consequently, Data Import Jobs with a smaller number of records may be given higher priority, and be processed in between chunks of larger Data Import Jobs.
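A minimal sketch of such a queue table is shown below. It uses an in-memory SQLite database purely as a stand-in for the real database, and the table name, column names, status values, and ordering rule (priority first, then fewer records, then submission order) are all assumptions made for illustration, not an agreed design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the shared, non-tenant schema
conn.execute(
    """
    CREATE TABLE di_job_queue (
        seq          INTEGER PRIMARY KEY AUTOINCREMENT,  -- submission order
        job_id       TEXT NOT NULL UNIQUE,
        tenant_id    TEXT NOT NULL,
        record_count INTEGER NOT NULL,
        priority     INTEGER NOT NULL DEFAULT 0,
        status       TEXT NOT NULL DEFAULT 'WAITING'
    )
    """
)


def enqueue(job_id, tenant_id, record_count, priority=0):
    """Register a job (or a chunk of a job) for any tenant in the one table."""
    conn.execute(
        "INSERT INTO di_job_queue (job_id, tenant_id, record_count, priority) "
        "VALUES (?, ?, ?, ?)",
        (job_id, tenant_id, record_count, priority),
    )


def take_next_job():
    """Pick the next waiting job and mark it as started; None if queue is empty."""
    row = conn.execute(
        "SELECT job_id FROM di_job_queue WHERE status = 'WAITING' "
        "ORDER BY priority DESC, record_count ASC, seq ASC LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    conn.execute(
        "UPDATE di_job_queue SET status = 'IN_PROGRESS' WHERE job_id = ?", (row[0],)
    )
    return row[0]


enqueue("large-chunk-1", "tenant_a", 10000)
enqueue("large-chunk-2", "tenant_a", 10000)
enqueue("small-job", "tenant_b", 200)

order = [take_next_job(), take_next_job(), take_next_job(), take_next_job()]
```

With this ordering the small job from `tenant_b` runs before the remaining chunks of the large job. With several module instances reading the same table, the select-then-update step would need to be made atomic, for example with PostgreSQL's `SELECT ... FOR UPDATE SKIP LOCKED`.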
Result aggregation
To streamline the aggregation of results from the Data Import Jobs that process chunk files, each of these jobs must be linked to a primary Data Import Job. The primary Data Import Job must therefore be created before any Data Import Job for a chunk file is started. With this link in place, all the logs that pertain to the source file can be retrieved together. All the necessary data structures and relationships are already in place in the Data Import app (mod-source-record-manager).
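The linkage described above can be illustrated with a small sketch in which every chunk job carries the identifier of its primary job and the results are rolled up by that identifier. The class, field names, and status values here are hypothetical and are not taken from mod-source-record-manager.

```python
from dataclasses import dataclass


@dataclass
class ChunkJob:
    job_id: str
    parent_job_id: str  # link to the primary Data Import Job
    status: str         # hypothetical values: IN_PROGRESS, COMMITTED, ERROR
    processed: int
    failed: int


def aggregate(parent_job_id, chunk_jobs):
    """Roll up the chunk jobs linked to one primary job into a single summary."""
    children = [j for j in chunk_jobs if j.parent_job_id == parent_job_id]
    statuses = {j.status for j in children}
    if not children:
        overall = "NO_CHUNKS"
    elif "IN_PROGRESS" in statuses:
        overall = "IN_PROGRESS"
    elif "ERROR" in statuses:
        overall = "ERROR"
    else:
        overall = "COMMITTED"
    return {
        "chunks": len(children),
        "processed": sum(j.processed for j in children),
        "failed": sum(j.failed for j in children),
        "status": overall,
    }


jobs = [
    ChunkJob("chunk-1", "primary-1", "COMMITTED", 1000, 0),
    ChunkJob("chunk-2", "primary-1", "ERROR", 990, 10),
    ChunkJob("chunk-9", "primary-2", "IN_PROGRESS", 50, 0),
]
summary = aggregate("primary-1", jobs)
```

Because the primary job exists before any chunk job starts, the roll-up can be computed at any time, including while some chunks are still running.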
Rough WBS
Task | Comment | Size |
---|---|---|
UI | | |
Select DI Job Profile | | Small |
Get UploadURL | | VS |
Implement file Select dialog with filtering | | Medium |
Upload a file to S3-like storage | | Small |
Invoke Start Data Import Job | | Small |
Navigate back to a Landing page | | VS |
Disable old file upload functionality. Do not remove | Configuration parameter to switch between implementations | Medium |
Backend | | |
Integrate folio-s3-client | Add dependency, configuration | Small |
Implement S3 interaction for uploadURL | | Medium |
Add endpoint to accept Start Import Job | | Large |
Implement logic to slice the file into chunks | | Large |
Implement logic to start chunk processing using HTTP calls | Master DI Job must be created. All chunk jobs must be linked to the master DI Job | Medium |
Implement DI queue management and processing | | XL |
Karate tests | | |
E2E tests | | |
Size | Estimate |
---|---|
Very Small (VS) | < 1 day |
Small | < 3 days |
Medium | < 5 days |
Large | < 10 days |
XL | < 15 days |
XXL | < 30 days |