Metadata Record Export (UXPROD-652)

[UXPROD-2330] Scheduling export of inventory records through an API Created: 18/Mar/20  Updated: 07/Aug/23

Status: Open
Project: UX Product
Components: None
Affects versions: None
Fix versions: Trillium (R1 2025)
Parent: Metadata Record Export

Type: New Feature Priority: P2
Reporter: Magda Zacharska Assignee: Magda Zacharska
Resolution: Unresolved Votes: 0
Labels: LC5, export, inventory, loc, metadatamanagement, po-mvp, requires-discussion, round_iv
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original estimate: Not Specified

Attachments: Text File AutomatingDataExport.txt     Text File AutomatingDataExport2.txt    
Issue links:
Defines
is defined by MDEXP-323 Implement config API to store usernam... Open
is defined by MDEXP-324 Implement CRUD APIs to handle schedul... Open
is defined by MDEXP-320 Create Schemas for scheduling exports Open
is defined by MDEXP-597 Verify workaround for scheduling jobs Closed
is defined by MDEXP-321 Edit job Profile to add FTP details Draft
is defined by MDEXP-322 Implement Scheduled Exports Draft
is defined by MDEXP-325 Create an API to push to FTP Draft
Gantt End to Start
has to be done before UXPROD-2718 Data export - scheduling the job thro... Draft
has to be done after UXPROD-2333 Export Manager - triggering export of... Closed
Relates
relates to UXPROD-2065 Export Manager - data export landing ... Closed
Potential Workaround: *Start the export:*
1. Post a file definition to /data-export/fileDefinition (this is the equivalent of uploading a list of UUIDs or a CQL query)
2. Start the export via /data-export/export
*Retrieve the files generated by the export:*
1. Get the completed jobExecutionId and fileId from /data-export/jobExecution by querying on the completed timestamp
2. Download the file via /data-export/jobExecutions/{jobExecutionId}/download/{fileId}
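The start-export half of this workaround can be sketched in Python. This is illustrative only: the Okapi base URL is a placeholder, and the JSON field names (fileName, uploadFormat, fileDefinitionId, jobProfileId) are assumptions to be checked against the mod-data-export API docs for the release in use.

```python
# Sketch of the start-export half of the workaround. OKAPI_URL and the
# JSON field names are placeholder assumptions -- verify against the
# mod-data-export API docs for your release.
import json

OKAPI_URL = "https://okapi.example.org"  # placeholder Okapi gateway


def file_definition_request(file_name):
    """Step 1: request for POST /data-export/fileDefinition
    (the equivalent of uploading a list of UUIDs or a CQL query)."""
    url = OKAPI_URL + "/data-export/fileDefinition"
    body = json.dumps({"fileName": file_name, "uploadFormat": "csv"})
    return url, body


def export_request(file_definition_id, job_profile_id):
    """Step 2: request for POST /data-export/export, which starts the job."""
    url = OKAPI_URL + "/data-export/export"
    body = json.dumps({"fileDefinitionId": file_definition_id,
                       "jobProfileId": job_profile_id})
    return url, body
```

In a real batch script these URL/body pairs would be sent with the usual x-okapi-tenant and x-okapi-token headers.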
Release: Not Scheduled
Epic Link: Metadata Record Export
Front End Estimate: XXL < 30 days
Front End Estimator: Magda Zacharska
Front-End Confidence factor: Low
Back End Estimate: XXXL: 30-45 days
Back End Estimator: Magda Zacharska
Back-End Confidence factor: 20%
Development Team: Firebird
PO Rank: 90
PO Ranking Note: Needed for automated exports
Rank: Chicago (MVP Sum 2020): R1
Rank: Cornell (Full Sum 2021): R4
Rank: Duke (Full Sum 2021): R1
Rank: 5Colleges (Full Jul 2021): R1
Rank: GBV (MVP Sum 2020): R1
Rank: Grand Valley (Full Sum 2021): R2
Rank: hbz (TBD): R1
Rank: Lehigh (MVP Summer 2020): R1
Rank: MO State (MVP June 2020): R1
Rank: TAMU (MVP Jan 2021): R1
Rank: U of AL (MVP Oct 2020): R3

 Description   

In the existing implementation the data export can only be triggered manually. For exports that recur on a regular basis (such as an incremental export of all records added or modified since the last export), the application will need to provide an API so that the export can be triggered by an external custom export script.

This feature covers the backend work to support a scenario in which a library has an export job that needs to run on a regular basis against data identified in a consistent way. Such jobs are mostly run when the exported data is needed for integration with external services, and the file generated by the export may need to be FTP-ed to a specific location.

The user should be able to:

  1. schedule when the job will run (quarterly, monthly, weekly, daily, or at a specified time)
  2. indicate whether this is a recurring export job
  3. have the files generated by the export stored in a standard location
  4. associate the job with a mapping profile that determines the required data manipulation
  5. identify the data to be exported either by a CQL query that can take system parameters (for example, the date of the last execution), or by providing a list of UUIDs if static data needs to be exported
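Item 5 above, a CQL query parameterized by a system value such as the last run date, might look like the following sketch. The index name metadata.updatedDate and the query shape are assumed examples for illustration, not a confirmed FOLIO schema.

```python
# Illustrative only: builds an incremental-export CQL query from the
# date of the last execution. The index name "metadata.updatedDate"
# is an assumed example, not a confirmed FOLIO schema field.
from datetime import datetime, timezone


def incremental_export_cql(last_run: datetime) -> str:
    """CQL selecting records added or modified since the last export."""
    stamp = last_run.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.000")
    return f'metadata.updatedDate>="{stamp}"'
```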

Additional information:
Updated workaround: https://issues.folio.org/secure/attachment/61166/AutomatingDataExport.txt



 Comments   
Comment by Jenn Colt [ 05/Oct/20 ]

Hi- This mentions being able to submit a CQL query via this api. Can we have a parameter that allows us to send a list of UUIDs instead? The use case would be if we do a search in LDP first and then want to bring that result list to the export app.

Comment by Magda Zacharska [ 05/Oct/20 ]

Jenn Colt - this feature covers the functionality for triggering the exports automatically so I'm not sure how you would provide the list of UUIDs from LDP. Does LDP have API that would provide the list of UUIDs that fulfill the search criteria?

Comment by Jenn Colt [ 05/Oct/20 ]

Hi- Yes I would expect there are different ways I might come up with a list of UUIDs that I would want to schedule the export for. I just wanted to make sure this won't be limited to CQL.

Comment by Nick Cappadona [ 05/Oct/20 ]

Magda Zacharska I think there's still some ambiguity in this feature description. Currently, it is described as follows:

the application will need to provide API so that the export could be triggered by the external custom script export

Technically this endpoint already exists with data-export/export.

Perhaps we can be more explicit and say that this will provide an API endpoint to schedule and automate future data export job executions, along with details on what types of scheduling will be supported.

I also think the mention of providing CQL in this description is confusing, because, while related, it's more a single example of the type of job that could be scheduled; plus, the CQL feature is already covered separately via UXPROD-2333 Closed .

P.S. Jenn Colt I believe specifying the records to export via a file containing a list of UUIDs is already supported via data-export/file-definitions.

Comment by Magda Zacharska [ 06/Oct/20 ]

Nick Cappadona - the scheduling API is not in the scope of this feature, as it assumes that the user will trigger their scripts through batch jobs and use existing schedulers. After reviewing the functionality with the dev team, the proposed functionality is already in place. The high-level flow would be as follows:

Start the export:
1. Post a file definition to /data-export/fileDefinition (this is the equivalent of uploading a list of UUIDs or a CQL query)
2. Start the export via /data-export/export

Retrieve the files generated by the export:
1. Get the completed jobExecutionId and fileId from /data-export/jobExecution by querying on the completed timestamp
2. Download the file by /data-export/jobExecutions/{jobExecutionId}/download/{fileId}
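The retrieval half of this flow can be sketched the same way. The status/completedDate query shape is an assumption for illustration, and (as noted elsewhere in this ticket) some endpoint paths changed between releases, so verify against the API docs for your version.

```python
# Sketch of retrieving export results. Only the endpoint paths quoted
# in the comment above come from the ticket; the status/completedDate
# query shape and OKAPI_URL are placeholder assumptions.
from urllib.parse import quote

OKAPI_URL = "https://okapi.example.org"  # placeholder Okapi gateway


def completed_jobs_url(since_iso: str) -> str:
    """Step 1: query /data-export/jobExecution for jobs completed after
    a given timestamp, to obtain the jobExecutionId and fileId."""
    cql = f'status=="COMPLETED" and completedDate>="{since_iso}"'
    return f"{OKAPI_URL}/data-export/jobExecution?query={quote(cql)}"


def download_url(job_execution_id: str, file_id: str) -> str:
    """Step 2: URL for downloading the generated file."""
    return (f"{OKAPI_URL}/data-export/jobExecutions/"
            f"{job_execution_id}/download/{file_id}")
```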

Comment by Nick Cappadona [ 07/Oct/20 ]

Hi Magda Zacharska. Thanks for clarifying. I was under the impression that this feature was exactly about implementing the scheduling API for data-export, so this comes as a bit of a surprise to me, especially considering that the existing API endpoints we identify above have been in place since v1.0.0 of mod-data-export, released on 2020-03-13, prior to the creation of this issue.

API Docs for v1.0
Data export API
Data export File Definition API

It is very possible that I am alone in this misconception, but since every institution has this feature ranked at the highest priority, can we add this to our agenda for tomorrow's call to ensure everyone is on the same page?

Comment by Magda Zacharska [ 07/Oct/20 ]

Nick Cappadona the APIs have been available since v1.0, but their functionality has definitely increased since the first version, so things that are possible after the Goldenrod and Honeysuckle releases were not possible in Fameflower. For example, in v1.0 you could only export inventory instance records from SRS. Now you can associate exports with different mapping profiles that pull the data you need for the export.

I will put this on the agenda for the meeting on 10/08/2020, so we can discuss it further if needed.

Comment by Magda Zacharska [ 07/Oct/20 ]

Attaching Monica Arnold's notes related to automating data export.

Comment by Monica Arnold [ 22/Oct/20 ]

AutomatingDataExport2.txt Attached is a slightly modified version of the steps needed for automating data export. Note that this document is specific to Goldenrod. Some API endpoints are changing in Honeysuckle.

Comment by Kruthi Vuppala [ 27/Oct/20 ]

Under the scope of this feature,

  • Should there be ability to view the scheduled jobs?
  • Ability to cancel/delete the scheduled job overall
  • Ability to skip a single cycle- say on a monthly job just skip for a single month?

Comment by Magda Zacharska [ 27/Oct/20 ]

Kruthi Vuppala since this is a backend-only feature, the scheduled jobs will be part of data-export/job-execution, I assume.
There is a feature for cancelling, pausing and resuming jobs ( UXPROD-2328 Draft ) that would cover cancelling and pausing (skipping) a job. Deleting jobs will need to be handled by purging the running jobs for now.

Comment by Magda Zacharska [ 23/May/23 ]

Adding LC2 label based on prioritization during the LCAP MM meeting on May 17, 2023. The priority may change depending on the ease of workaround implementation.

Generated at Fri Feb 09 00:23:03 UTC 2024 using Jira 1001.0.0-SNAPSHOT#100246-sha1:7a5c50119eb0633d306e14180817ddef5e80c75d.