MDEXP-168 Investigate maximum file size for the exported records
In the current implementation, all records are exported to a single file. This page documents the technical limitations of that approach.
1. Should there be a limit set on the number of records one file can contain?
I tested mod-data-export by generating a 20 GB binary file on localhost using snapshot-backend-core; even a 20 GB file does not break the module.
Exporting 1 million SRS records took 01h 01m 40s, tested on bugfest-goldenrod.
To answer the question "to limit the file size or not", we need to know how frequently users will run exports and what data volumes they will generate. For now, it may be enough not to set a limit at all and instead update CleanupFileStorageService to clean the file system much more frequently.
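As a rough aid for reasoning about data volumes, the bugfest-goldenrod measurement can be turned into a throughput figure and extrapolated. This is only a sketch: the class `ExportThroughput` and its methods are hypothetical and not part of mod-data-export, and the extrapolation assumes export time scales linearly with the number of records, which has not been verified.

```java
import java.time.Duration;

/** Hypothetical helper for back-of-the-envelope estimates; not part of mod-data-export. */
final class ExportThroughput {
    // Measurement from bugfest-goldenrod: 1 million SRS records in 01h 01m 40s.
    static final long MEASURED_RECORDS = 1_000_000L;
    static final Duration MEASURED_DURATION =
            Duration.ofHours(1).plusMinutes(1).plusSeconds(40);

    /** Average records exported per second during the measured run. */
    static double recordsPerSecond() {
        return (double) MEASURED_RECORDS / MEASURED_DURATION.getSeconds();
    }

    /** Estimated export time for the given record count, ASSUMING linear scaling. */
    static Duration estimateDuration(long records) {
        long seconds = records * MEASURED_DURATION.getSeconds() / MEASURED_RECORDS;
        return Duration.ofSeconds(seconds);
    }

    public static void main(String[] args) {
        System.out.printf("~%.0f records/s%n", recordsPerSecond());
        System.out.println("10M records: " + estimateDuration(10_000_000L));
    }
}
```

Under the linear-scaling assumption, 10 million records would take about 37,000 seconds, a little over 10 hours.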
2. If yes, how can we determine the number?
The observed ratio is roughly 20 GB per 10 million SRS records, so 1 million records produce a file of about 2 GB. We can use this ratio to determine the file size at which to stop writing to the current file and start a new one.
This setting could be exposed on the Settings page so the user can decide the size limit of a single file.
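The idea of stopping at a size threshold and starting a new file could be sketched as follows. Everything here is illustrative: `RollingExportWriter`, its methods, the `.mrc` naming scheme, and the use of decimal gigabytes (about 2,000 bytes per record) are assumptions, not the module's actual implementation.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of size-based file splitting; not part of mod-data-export. */
final class RollingExportWriter implements AutoCloseable {
    // Ratio from the tests above: ~20 GB per 10 million SRS records (decimal GB assumed).
    static final long AVG_BYTES_PER_RECORD = 20_000_000_000L / 10_000_000L;

    private final Path dir;
    private final String baseName;
    private final long maxBytesPerFile;
    private final List<Path> files = new ArrayList<>();
    private OutputStream current;
    private long bytesInCurrent;

    RollingExportWriter(Path dir, String baseName, long maxBytesPerFile) {
        this.dir = dir;
        this.baseName = baseName;
        this.maxBytesPerFile = maxBytesPerFile;
    }

    /** Approximate number of records that fit in one file of the given size. */
    static long estimatedRecordsPerFile(long maxBytesPerFile) {
        return maxBytesPerFile / AVG_BYTES_PER_RECORD;
    }

    /** Writes one serialized record, starting a new file if the limit would be exceeded. */
    void write(byte[] record) throws IOException {
        boolean wouldOverflow = bytesInCurrent > 0
                && bytesInCurrent + record.length > maxBytesPerFile;
        if (current == null || wouldOverflow) {
            openNextFile();
        }
        // A single record larger than the limit still goes into its own file.
        current.write(record);
        bytesInCurrent += record.length;
    }

    private void openNextFile() throws IOException {
        if (current != null) {
            current.close();
        }
        Path next = dir.resolve(baseName + "-" + (files.size() + 1) + ".mrc");
        current = Files.newOutputStream(next);
        files.add(next);
        bytesInCurrent = 0;
    }

    /** Files created so far, in the order they were written. */
    List<Path> files() {
        return new ArrayList<>(files);
    }

    @Override
    public void close() throws IOException {
        if (current != null) {
            current.close();
            current = null;
        }
    }
}
```

With a 2 GB limit, `estimatedRecordsPerFile(2_000_000_000L)` gives about 1 million records per file. Records are never split across files in this sketch, so a file can end slightly below the limit.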
3. What would be the impact of generating multiple files on making them accessible to the user, and how would it affect the UI?
If the UI needs to show all the links for a particular job, job entries would be displayed with different heights in the UI.
Alternative approach: do not provide multiple links; just provide a link to the entire bucket containing all the MARC files.
4. More stories need to be created:
- story to check how many jobs can run in parallel without breaking SRS/Inventory through intensive data consumption; system requirements are needed (discussed with Taras Spashchenko)
- story to implement the idea of exporting records into multiple files
- story to check how much memory the module needs to generate at least one average-size file; system requirements are needed (discussed with Taras Spashchenko)