Metadata Record Export (UXPROD-652)

[UXPROD-3444] Split large exports into smaller files Created: 09/Dec/21  Updated: 10/Aug/23

Status: Open
Project: UX Product
Components: None
Affects versions: None
Fix versions: Umbrellaleaf (R2 2025)
Parent: Metadata Record Export

Type: New Feature Priority: P3
Reporter: Magda Zacharska Assignee: Magda Zacharska
Resolution: Unresolved Votes: 0
Labels: LC6, SolutionArchitecture, firebird-po-share, loc, requires-discussion
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original estimate: Not Specified

Issue links:
Cloners
clones UXPROD-3443 Purging data export logs Draft
Defines
is defined by MDEXP-235 Implement the idea to export records ... Closed
is defined by MDEXP-312 Limit the records on Export by CQL Closed
is defined by MDEXP-497 Spike: How to start multiple job in p... Closed
is defined by MDEXP-498 Spike - Investigate placement of the ... Closed
is defined by MDEXP-634 Slicer component: configuration support Closed
Potential Workaround: Users trigger exports with smaller batches of identifiers
Epic Link: Metadata Record Export
Front End Estimate: XXL < 30 days
Front End Estimator: Magda Zacharska
Front-End Confidence factor: 20%
Back End Estimate: XXL < 30 days
Back End Estimator: Magda Zacharska
Back-End Confidence factor: 20%
Development Team: Firebird
PO Rank: 0
Rank: Cornell (Full Sum 2021): R3
Solution Architect: Taras Spashchenko

 Description   

Current situation or problem:
The number of records that can be successfully exported depends on the mapping profile and on how the export is triggered. Exporting a couple of million records with the default mapping profile normally completes successfully, but when the same list of records is exported with additional data from holdings or item records defined in a custom profile, performance plummets. Performance deteriorates further for exports triggered by a CQL query. In the Kiwi release the recommended maximum is 300K records, but this limit is not enforced. During the Data Export Working Group meeting on Dec 9, 2021 it was proposed that the system automatically split the requested records into batches that stay within the supported number of records.

In scope

  • Add a tenant-level configuration value that sets the limit for data export; the default value will be 100K.
  • During the export, the system will split the generated files so that each file contains a maximum of 100K records.
  • Users can trigger an export by submitting a list of identifiers or a CQL query, and the system will automatically split larger exports.
  • All files generated during the export will be stored in one directory and compressed.
  • The UI will display a single link to the directory containing the files.
  • If errors occur, they will be listed in one error log associated with the given export job.
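
The splitting step described above can be sketched as a simple partitioning of the submitted identifier list. This is an illustrative sketch only; `ExportSlicer` and its method names are hypothetical and not the actual mod-data-export API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: partition a list of record identifiers into batches
// of at most `batchSize` records, one batch per generated export file.
public class ExportSlicer {

    static <T> List<List<T>> slice(List<T> ids, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < ids.size(); start += batchSize) {
            int end = Math.min(start + batchSize, ids.size());
            // Copy the sublist so each batch is independent of the source list.
            batches.add(new ArrayList<>(ids.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < 250_000; i++) ids.add(i);
        List<List<Integer>> batches = slice(ids, 100_000);
        System.out.println(batches.size());        // 3 files
        System.out.println(batches.get(2).size()); // last file holds 50000 records
    }
}
```

With the proposed 100K default, a 250K-record export would produce three files (100K, 100K, 50K), all written to one directory and compressed together.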

Questions

  1. Should the number of records per file be configurable? If yes, should it be configurable at the tenant level (affecting all exports) or at the job profile level (affecting all exports of the same type)? The initial implementation will support configuration at the tenant level.
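
For the tenant-level option chosen above, resolving the per-file limit could look like the sketch below. The setting key `data-export.batch.size` and the map-based lookup are assumptions for illustration; only the 100K default comes from the ticket.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: resolve the per-file record limit from tenant
// configuration, falling back to the 100K default when no value is set
// or the stored value is invalid.
public class BatchSizeResolver {
    static final int DEFAULT_BATCH_SIZE = 100_000;

    static int resolve(Map<String, String> tenantConfig) {
        return Optional.ofNullable(tenantConfig.get("data-export.batch.size"))
                .map(value -> {
                    try {
                        return Integer.parseInt(value);
                    } catch (NumberFormatException e) {
                        return null; // fall through to the default
                    }
                })
                .filter(n -> n > 0)
                .orElse(DEFAULT_BATCH_SIZE);
    }

    public static void main(String[] args) {
        System.out.println(resolve(Map.of()));                                   // 100000
        System.out.println(resolve(Map.of("data-export.batch.size", "50000")));  // 50000
    }
}
```

A job-profile-level override could later be layered on top of this lookup without changing the default behavior.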


 Comments   
Comment by Magda Zacharska [ 23/May/23 ]

Adding LC2 label based on prioritization during the LCAP MM meeting on May 17, 2023

Comment by Magda Zacharska [ 01/Aug/23 ]

Relevant documentation: https://folio-org.atlassian.net/wiki/display/FOLIJET/Data+Export+redesign

Generated at Fri Feb 09 00:32:03 UTC 2024 using Jira 1001.0.0-SNAPSHOT#100246-sha1:7a5c50119eb0633d306e14180817ddef5e80c75d.