Authority remapping (MODMARCMIG-10)

[MODMARCMIG-7] Implement Async Records Mapping Mechanism for MARC Migration (Process Chunks)
Created: 12/Dec/23
Updated: 07/Feb/24

Status: In Progress
Project: mod-marc-migrations
Components: None
Affects versions: None
Fix versions: 1.0.0
Parent: Authority remapping

Type: Story
Priority: P3
Reporter: Pavlo Smahin
Assignee: Viacheslav Kolesnyk
Resolution: Unresolved
Votes: 0
Labels: back-end, epam-spitfire
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original estimate: Not Specified

Issue links:
Gantt End to Start
has to be done before MODMARCMIG-9 Implement POST Endpoint for Data-Savi... Open
has to be done after MODMARCMIG-6 Implement Async Records Mapping Mecha... Closed
Relates
relates to UXPROD-4082 Long term solution for applying mappi... In Progress
Sprint: Spitfire Sprint 184
Story Points: 8
Development Team: Spitfire
Release: Quesnelia (R1 2024)
Epic Link: Authority remapping

Description

Overview:

This story lays the groundwork for asynchronous data mapping of the chunks produced for a MARC migration. It covers creating a table to track chunk steps and preparing the infrastructure to process chunks in parallel, so that data mapping can run efficiently.
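
A minimal sketch of what a chunk-step table could look like. The ticket names the table but does not define its schema, so every column below, the IN_PROGRESS and COMPLETED step statuses, and the helper functions are assumptions for illustration. SQLite is used only so the sketch runs standalone.

```python
import sqlite3
import uuid

# Hypothetical schema: the ticket only gives the table name "chunk_step".
CHUNK_STEP_DDL = """
CREATE TABLE chunk_step (
    id           TEXT PRIMARY KEY,
    operation_id TEXT NOT NULL,
    chunk_id     TEXT NOT NULL,
    step         TEXT NOT NULL,
    status       TEXT NOT NULL,
    entity_file  TEXT,
    error_file   TEXT,
    num_errors   INTEGER NOT NULL DEFAULT 0
)
"""


def create_chunk_step_table(conn):
    """Create the (assumed) chunk_step table."""
    conn.execute(CHUNK_STEP_DDL)


def start_step(conn, operation_id, chunk_id, step="DATA_MAPPING"):
    """Register a new step for a chunk and return the step id."""
    step_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO chunk_step (id, operation_id, chunk_id, step, status) "
        "VALUES (?, ?, ?, ?, ?)",
        (step_id, operation_id, chunk_id, step, "IN_PROGRESS"),
    )
    return step_id


def finish_step(conn, step_id, status, entity_file, error_file, num_errors):
    """Record the outcome of a step together with the files it produced."""
    conn.execute(
        "UPDATE chunk_step SET status = ?, entity_file = ?, error_file = ?, "
        "num_errors = ? WHERE id = ?",
        (status, entity_file, error_file, num_errors, step_id),
    )
```

One row per chunk and step keeps the progress of each chunk queryable independently of the overall operation status.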

Requirements/Scope:

  1. Establish a database table specifically for managing chunk steps during the data mapping process.
  2. Initiate data mapping once the operation status changes to "DATA_MAPPING".
  3. For each chunk, perform data mapping in parallel, preparing and storing necessary files while handling any potential errors.
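
Requirements 2 and 3 can be sketched roughly as follows, using a thread pool. The function names, the per-chunk COMPLETED and FAILED statuses, and the pool size are assumptions for illustration, not the module's actual design.

```python
from concurrent.futures import ThreadPoolExecutor


def map_chunk(chunk_id):
    """Placeholder for the per-chunk mapping work (hypothetical)."""
    return chunk_id, "COMPLETED"


def on_status_change(operation_status, chunk_ids, worker=map_chunk, max_workers=4):
    """Run the worker for every chunk in parallel, but only in DATA_MAPPING."""
    if operation_status != "DATA_MAPPING":
        return {}

    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit all chunks first so they run concurrently.
        futures = {pool.submit(worker, cid): cid for cid in chunk_ids}
        for future, cid in futures.items():
            try:
                _, status = future.result()
            except Exception:
                # A failure in one chunk must not stop the remaining chunks.
                status = "FAILED"
            results[cid] = status
    return results
```

Catching the exception per future keeps one broken chunk from aborting the others; the caller can then decide how the collected statuses affect the operation as a whole.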

Approach:

  1. Create Table for Chunk Step Management:
    • Create a database table named "chunk_step" to track and manage the progress of chunk-based data mapping.
  2. Trigger Parallel Chunk Processing on Status Change:
    • Once the registered MARC migration operation transitions to the "DATA_MAPPING" status, start data mapping for its chunks in parallel.
  3. Process Individual Chunks:
    • Prepare local files for the authority entities, invalid MARC records, and error records for each chunk.
    • Stream records from the database and perform mapping operations for each record.
    • Save successfully mapped records to the entity file, store problematic records in the invalid records file, and log error causes in the error records file.
  4. Store Files in Persistent Storage and Update Tables:
    • Persist the generated files (authority entities, invalid records, error records) in the designated persistent storage.
    • Register the file names and associated details (chunk ID, file paths) in both the chunk and chunk_step tables for tracking purposes.
    • Update the operation status to DATA_MAPPING_COMPLETED or DATA_MAPPING_FAILED.
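
Step 3 of the approach could be sketched as below. The record shape, the mapper, the JSON Lines file format, and the file naming are all assumptions for illustration. Uploading the files to persistent storage and updating the tables (step 4) are left out.

```python
import json
import os


def map_record(record):
    """Hypothetical mapper from a MARC-like dict to an authority entity."""
    if "001" not in record:
        raise ValueError("missing control field 001")
    return {"id": record["001"], "heading": record.get("100", "")}


def process_chunk(chunk_id, records, out_dir):
    """Map every record of a chunk and write the three local files.

    `records` can be any iterable, for example a database cursor that
    streams rows, so the whole chunk never has to sit in memory.
    """
    paths = {
        kind: os.path.join(out_dir, f"{chunk_id}_{kind}.jsonl")
        for kind in ("entity", "invalid", "error")
    }
    num_errors = 0
    with open(paths["entity"], "w", encoding="utf-8") as entity_f, \
         open(paths["invalid"], "w", encoding="utf-8") as invalid_f, \
         open(paths["error"], "w", encoding="utf-8") as error_f:
        for record in records:
            try:
                entity = map_record(record)
            except Exception as exc:
                # Keep the problematic record and log the cause separately.
                invalid_f.write(json.dumps(record) + "\n")
                error_f.write(json.dumps({"chunk": chunk_id, "cause": str(exc)}) + "\n")
                num_errors += 1
            else:
                entity_f.write(json.dumps(entity) + "\n")
    return {"chunk_id": chunk_id, "files": paths, "num_errors": num_errors}
```

The returned dictionary carries the details that the approach says should be registered in the chunk and chunk_step tables (chunk ID and file paths).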

Additional info:
POC for defining chunks and mapping: https://github.com/folio-org/mod-entities-links/commits/MODELINKS-84-long-term-poc/

S3-like client lib: https://github.com/folio-org/folio-s3-client

Acceptance Criteria:

  • Covered by unit tests

Generated at Thu Feb 08 22:31:54 UTC 2024 using Jira 1001.0.0-SNAPSHOT#100246-sha1:7a5c50119eb0633d306e14180817ddef5e80c75d.