Metadata Record Export (UXPROD-652)

[UXPROD-4127] Improve Data export performance Created: 13/Mar/23  Updated: 05/Feb/24

Status: In Progress
Project: UX Product
Components: None
Affects versions: None
Fix versions: Quesnelia (R1 2024)
Parent: Metadata Record Export

Type: New Feature Priority: P2
Reporter: Magda Zacharska Assignee: Magda Zacharska
Resolution: Unresolved Votes: 0
Labels: LC-priority2, galileo, loc
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original estimate: Not Specified

Issue links:
Defines
is defined by PERF-493 Data export -Performance dependencies... Open
is defined by MDEXP-110 Return list of users who run the comp... Open
is defined by MDEXP-677 Files upload flow rework In Progress
is defined by MDEXP-641 Quick-Export: Rewrite/clean up existi... In Progress
is defined by MDEXP-678 Naming convention for files generated... Closed
is defined by MDEXP-109 Return list of the job profiles used ... Closed
is defined by MDEXP-612 SRS/Inventory - Clients implementation Closed
is defined by MDEXP-623 Migrate RAML API definitions to OpenA... Closed
is defined by MDEXP-624 Create liqubase scripts for new DB ob... Closed
is defined by MDEXP-634 Slicer component: configuration support Closed
is defined by MDEXP-642 Files Download: Rewrite/clean up exis... Closed
is defined by MDEXP-672 Reference Data Service: Rewrite/clean... Closed
is defined by MDEXP-622 Create a project skeleton in a new br... Closed
is defined by MDEXP-626 /data-export/export: Instances Strategy Closed
is defined by MDEXP-627 /data-export/export: Authority Strategy Closed
is defined by MDEXP-628 /data-export/export: Holdings Strategy Closed
is defined by MDEXP-629 /data-export/file-definitions/{id}/up... Closed
is defined by MDEXP-630 /data-export/export: Single File Proc... Closed
is defined by MDEXP-631 Logs API, Transformation fields API: ... Closed
is defined by MDEXP-632 Spike: Cross-schemas Views Creation Closed
is defined by MDEXP-640 Data-Export: Rewrite/clean up existin... Closed
is defined by MDEXP-621 Export all endpoint In QA
Gantt End to Start
has to be done after UXPROD-4110 Investigate Data export performance i... Closed
Relates
relates to MDEXP-611 Spike: HTTP vs View mechanism for SRS Closed
Epic Link: Metadata Record Export
Front End Estimate: Medium < 5 days
Front End Estimator: Magda Zacharska
Front-End Confidence factor: 20%
Back End Estimate: XXXL: 30-45 days
Back End Estimator: Viachaslau Khandramai (Inactive)
Back-End Confidence factor: 80%
Development Team: Firebird
PO Rank: 0
Rank: Cornell (Full Sum 2021): R1

 Description   

Current situation or problem:
With the existing implementation, users can export up to 2M records with the default mapping profile, but significantly fewer when exporting with a custom mapping profile that includes data from holdings and item records. Performance deteriorates further when the export is triggered by a CQL query, where the recommended number of records is 300K. However, these limits are not enforced programmatically, which creates additional work for librarians, who must manually create files containing the specified number of record UUIDs to trigger the export.
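The manual workaround described above amounts to cutting one long list of record UUIDs into files that each stay under the recommended limit. A minimal sketch of that step, assuming the UUIDs are already available as an iterable of strings; the function name `chunk_uuids` and the parameter `max_per_file` are hypothetical and are not part of any FOLIO module:

```python
from itertools import islice
from typing import Iterable, Iterator, List


def chunk_uuids(uuids: Iterable[str], max_per_file: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most max_per_file UUIDs.

    Hypothetical helper for illustration only; it mirrors what a
    librarian currently does by hand before triggering an export.
    """
    if max_per_file <= 0:
        raise ValueError("max_per_file must be positive")
    iterator = iter(uuids)
    while True:
        # Take the next batch without loading the whole input into memory.
        chunk = list(islice(iterator, max_per_file))
        if not chunk:
            return
        yield chunk
```

For example, `chunk_uuids(all_uuids, 300_000)` would produce batches no larger than the 300K figure mentioned above, each of which could be written to its own upload file.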

In scope

  • Performance improvements for exporting:
    • instances and SRS records
    • holdings
    • authority records
  • Export of 22M SRS records in 24 hours without significantly impacting the performance of other FOLIO modules (cataloging, check in, check out, data import)
  • Support multiple concurrent exports:
    • daily
    • monthly
    • annually
  • Large exports are split into smaller files:
    • Number of records per file can be configured on the tenant level through the API (nice to have)
    • The number of records in the file is consistent throughout all exports
  • Compress all files generated in one export job into one directory and provide a link to it, as is currently done for single files.
  • Performance confirmed by PTF tests meets expectations.
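To put the in-scope targets in concrete terms, the arithmetic below derives the sustained throughput implied by "22M SRS records in 24 hours" and the number of output files a sliced export would produce. This is a rough sketch: the slice size of 100,000 records per file is an assumed value chosen only for illustration (the ticket does not specify one), and the function names are hypothetical.

```python
def required_rate(total_records: int, hours: float = 24.0) -> float:
    """Average records per second needed to finish within the time window."""
    return total_records / (hours * 3600)


def file_count(total_records: int, records_per_file: int) -> int:
    """Number of files when every file holds at most records_per_file records."""
    if records_per_file <= 0:
        raise ValueError("records_per_file must be positive")
    # Integer ceiling division: the last file may be only partially filled.
    return -(-total_records // records_per_file)


TARGET_RECORDS = 22_000_000          # from the ticket
ASSUMED_RECORDS_PER_FILE = 100_000   # assumption, not from the ticket

rate = required_rate(TARGET_RECORDS)
files = file_count(TARGET_RECORDS, ASSUMED_RECORDS_PER_FILE)
print(f"{rate:.1f} records/s, {files} files")
```

Under these assumptions the export has to sustain roughly 255 records per second on average, and the job would produce 220 files to be compressed into one directory.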

Out of scope
  • Support an easy way for triggering the export of all instances/SRS records without the need to first list UUIDs of matching records.



 Comments   
Comment by Khalilah Gambrell [ 14/May/23 ]

Magda Zacharska - is this PTF only work?

Comment by Magda Zacharska [ 15/May/23 ]

Khalilah Gambrell - this feature will require some functional changes to support full export and performance improvements, so that exports larger than 1M records can be supported at once. The stories will be added after MDEXP-594 (Closed) is completed. This will be a major rework of data export.

Comment by Viachaslau Khandramai (Inactive) [ 28/Jun/23 ]

Re-estimated to XXXL taking into account refactoring https://folio-org.atlassian.net/wiki/display/FOLIJET/Data+Export+redesign + Data export performance improvement https://folio-org.atlassian.net/browse/MDEXP-612

Comment by Khalilah Gambrell [ 10/Jul/23 ]

Hey Magda Zacharska. Will this feature be done for "Q" or Poppy release?

Comment by Taras Spashchenko [ 29/Jan/24 ]
  • Support multiple concurrent exports:
    • daily
    • monthly
    • annually

Magda Zacharska, it would be great if we could agree on a particular number here regarding how many concurrent export processes should be supported per tenant.

Comment by Magda Zacharska [ 31/Jan/24 ]

Taras Spashchenko please see: https://folio-org.atlassian.net/wiki/x/WX0V

Generated at Fri Feb 09 00:37:29 UTC 2024 using Jira 1001.0.0-SNAPSHOT#100246-sha1:7a5c50119eb0633d306e14180817ddef5e80c75d.