Implement Async Records Mapping Mechanism for MARC Migration (Process Chunks)
Activity
Viacheslav Kolesnyk, March 29, 2024 at 6:04 PM
Tested on a local environment. The process is described in https://github.com/folio-org/mod-marc-migrations/pull/30/. Further testing planned in a scope of
Done
Details
Assignee: Viacheslav Kolesnyk
Reporter: Pavlo Smahin
Priority: P3
Story Points: 8
Sprint: None
Development Team: Spitfire
Release: Ramsons (R2 2024)
Created December 12, 2023 at 11:11 AM
Updated July 5, 2024 at 12:57 PM
Resolved March 29, 2024 at 6:04 PM
Overview:
This story lays the groundwork for asynchronous data mapping of segmented chunks in the MARC migration. It covers creating a table to manage chunk steps and preparing the infrastructure to process chunks in parallel, so that data mapping can run efficiently.
Requirements/Scope:
Establish a database table specifically for managing chunk steps during the data mapping process.
Initiate data mapping once the operation status changes to "DATA_MAPPING".
For each chunk, perform data mapping in parallel, preparing and storing necessary files while handling any potential errors.
Approach:
Create Table for Chunk Step Management:
Develop a database table structure named "chunk_step" to monitor and manage the progress of chunk-based data mapping.
Trigger Parallel Chunk Processing on Status Change:
Start the data mapping process for chunks in parallel once the status transitions to "DATA_MAPPING" for the registered MARC migration operation.
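The trigger described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `OperationStatus`, `ChunkDispatcher`, the `processor` callback and the `parallelism` parameter are hypothetical names, not the module's actual API, and the choice of a fixed thread pool is an assumption as well.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical status enum; only the DATA_MAPPING* names come from the ticket.
enum OperationStatus { NEW, DATA_MAPPING, DATA_MAPPING_COMPLETED, DATA_MAPPING_FAILED }

final class ChunkDispatcher {

  private ChunkDispatcher() { }

  /**
   * Runs the processor over every chunk ID in parallel, but only when the
   * operation is in DATA_MAPPING. Returns one flag per chunk (true = processed
   * without an exception), in the same order as chunkIds. For any other
   * status nothing is started and an empty list is returned.
   */
  static List<Boolean> dispatch(OperationStatus status, List<String> chunkIds,
                                Function<String, Boolean> processor, int parallelism) {
    if (status != OperationStatus.DATA_MAPPING) {
      return List.of();
    }
    ExecutorService pool = Executors.newFixedThreadPool(parallelism);
    try {
      // Submit all chunks first so they actually run concurrently.
      List<CompletableFuture<Boolean>> futures = chunkIds.stream()
          .map(id -> CompletableFuture.supplyAsync(() -> processor.apply(id), pool)
              // A chunk that throws is reported as failed instead of aborting the others.
              .exceptionally(ex -> false))
          .collect(Collectors.toList());
      // Then wait for every chunk to finish.
      return futures.stream().map(CompletableFuture::join).collect(Collectors.toList());
    } finally {
      pool.shutdown();
    }
  }
}
```

Collecting the futures into a list before joining matters here: joining inside the same stream pipeline would submit and wait for one chunk at a time.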
Process Individual Chunks:
Prepare local files for the authority entities, invalid MARC records, and error records for each chunk.
Stream records from the database and perform mapping operations for each record.
Save successfully mapped records to the entity file, store problematic records in the invalid records file, and log error causes in the error records file.
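A minimal sketch of the per-chunk steps above, assuming records arrive as strings and the mapper signals an invalid record by throwing a runtime exception. `ChunkProcessor`, `Result`, the file-name suffixes and the one-line-per-record layout are all hypothetical choices made for illustration; the real module may model records and files differently.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;
import java.util.function.Function;
import java.util.stream.Stream;

final class ChunkProcessor {

  private ChunkProcessor() { }

  /** Paths of the three per-chunk files plus simple counters (hypothetical type). */
  static final class Result {
    final Path entityFile;
    final Path invalidFile;
    final Path errorFile;
    final int mapped;
    final int failed;

    Result(Path entityFile, Path invalidFile, Path errorFile, int mapped, int failed) {
      this.entityFile = entityFile;
      this.invalidFile = invalidFile;
      this.errorFile = errorFile;
      this.mapped = mapped;
      this.failed = failed;
    }
  }

  /** Streams records, maps each one, and routes the outcome to one of three local files. */
  static Result process(String chunkId, Stream<String> records,
                        Function<String, String> mapper, Path dir) throws IOException {
    Path entityFile = dir.resolve(chunkId + "_entity");
    Path invalidFile = dir.resolve(chunkId + "_invalid");
    Path errorFile = dir.resolve(chunkId + "_error");
    int mapped = 0;
    int failed = 0;
    try (BufferedWriter entities = Files.newBufferedWriter(entityFile);
         BufferedWriter invalid = Files.newBufferedWriter(invalidFile);
         BufferedWriter errors = Files.newBufferedWriter(errorFile);
         Stream<String> source = records) {
      Iterator<String> it = source.iterator();
      while (it.hasNext()) {
        String record = it.next();
        String entity;
        try {
          entity = mapper.apply(record);
        } catch (RuntimeException e) {
          // Keep the problematic record and its error cause, then move on.
          invalid.write(record);
          invalid.newLine();
          errors.write(String.valueOf(e.getMessage()));
          errors.newLine();
          failed++;
          continue;
        }
        entities.write(entity);
        entities.newLine();
        mapped++;
      }
    }
    return new Result(entityFile, invalidFile, errorFile, mapped, failed);
  }
}
```

A mapping failure for one record does not stop the chunk; only an I/O failure on the local files propagates to the caller, which could then mark the chunk step as failed.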
Store Files in Persistent Storage and Update Tables:
Persist the generated files (authority entities, invalid records, error records) in the designated persistent storage.
Register the file names and associated details (chunk ID, file paths) in both the chunk and chunk_step tables for tracking purposes.
Update the status of the operation to DATA_MAPPING_COMPLETED or DATA_MAPPING_FAILED.
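The ticket does not state the rule for choosing between the two final statuses. The sketch below assumes the operation counts as completed only when every chunk step completed. `ChunkStep`, `StepStatus`, `MappingOutcome` and `OperationStatusResolver` are hypothetical names, and `ChunkStep` is just an in-memory stand-in for a chunk_step row with an assumed set of columns.

```java
import java.util.List;

// Hypothetical step statuses; the ticket does not list them.
enum StepStatus { IN_PROGRESS, COMPLETED, FAILED }

// The two final operation statuses named in the ticket.
enum MappingOutcome { DATA_MAPPING_COMPLETED, DATA_MAPPING_FAILED }

/** In-memory stand-in for one row of the chunk_step table (assumed columns). */
final class ChunkStep {
  final String chunkId;
  final String entityFileName;
  final String errorFileName;
  final StepStatus status;

  ChunkStep(String chunkId, String entityFileName, String errorFileName, StepStatus status) {
    this.chunkId = chunkId;
    this.entityFileName = entityFileName;
    this.errorFileName = errorFileName;
    this.status = status;
  }
}

final class OperationStatusResolver {

  private OperationStatusResolver() { }

  /**
   * Assumed rule: completed only if every step completed; any failed or
   * unfinished step makes the whole operation DATA_MAPPING_FAILED.
   */
  static MappingOutcome resolve(List<ChunkStep> steps) {
    boolean allCompleted = steps.stream()
        .allMatch(step -> step.status == StepStatus.COMPLETED);
    return allCompleted
        ? MappingOutcome.DATA_MAPPING_COMPLETED
        : MappingOutcome.DATA_MAPPING_FAILED;
  }
}
```

Under this rule an operation with no chunk steps resolves to DATA_MAPPING_COMPLETED; a stricter implementation might treat that case as an error.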
Additional info:
POC for defining chunks and mapping: https://github.com/folio-org/mod-entities-links/commits/MODELINKS-84-long-term-poc/
S3-like client lib: https://github.com/folio-org/folio-s3-client
Acceptance Criteria:
Covered by unit tests.