Implement Async Records Mapping Mechanism for MARC Migration (Process Chunks)

Description

Overview:

This story lays the groundwork for asynchronous data mapping of the chunks produced during MARC migration. It covers creating a table to track chunk steps and preparing the infrastructure that processes chunks in parallel.

Requirements/Scope:

  1. Establish a database table specifically for managing chunk steps during the data mapping process.

  2. Start data mapping once the operation status changes to "DATA_MAPPING".

  3. Map chunks in parallel; for each chunk, prepare and store the required files and handle any errors.
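
Requirements 2 and 3 can be illustrated with a minimal sketch. It is written in Python for brevity and is not the module's actual implementation. The names `on_status_change` and `map_chunk`, the worker count, and the rule that negative chunk ids fail are all hypothetical; only the status name "DATA_MAPPING" comes from this story.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

DATA_MAPPING = "DATA_MAPPING"  # status name taken from this story


def map_chunk(chunk_id):
    """Hypothetical stand-in for per-chunk mapping; fails for negative ids."""
    if chunk_id < 0:
        raise ValueError(f"cannot map chunk {chunk_id}")
    return f"chunk-{chunk_id}-mapped"


def on_status_change(new_status, chunk_ids, max_workers=4):
    """Run chunk mapping in parallel only when the status is DATA_MAPPING.

    Returns (results, errors), both dicts keyed by chunk id.
    """
    results, errors = {}, {}
    if new_status != DATA_MAPPING:
        return results, errors  # other statuses do not trigger mapping
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(map_chunk, cid): cid for cid in chunk_ids}
        for future in as_completed(futures):
            cid = futures[future]
            try:
                results[cid] = future.result()
            except Exception as exc:
                # A failing chunk is recorded and does not stop the others.
                errors[cid] = str(exc)
    return results, errors
```

Collecting errors per chunk, instead of letting the first exception propagate, keeps one bad chunk from aborting the whole run; whether the real service should behave this way is a design decision this story leaves open.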

Approach:

  1. Create Table for Chunk Step Management:

    • Develop a database table structure named "chunk_step" to monitor and manage the progress of chunk-based data mapping.

  2. Trigger Parallel Chunk Processing on Status Change:

    • Once the registered MARC migration operation transitions to "DATA_MAPPING", start mapping its chunks in parallel.

  3. Process Individual Chunks:

    • Prepare local files for the authority entities, invalid MARC records, and error records for each chunk.

    • Stream records from the database and perform mapping operations for each record.

    • Save successfully mapped records to the entity file, store problematic records in the invalid records file, and log error causes in the error records file.

  4. Store Files in Persistent Storage and Update Tables:

    • Persist the generated files (authority entities, invalid records, error records) in the designated persistent storage.

    • Register the file names and associated details (chunk ID, file paths) in both the chunk and chunk_step tables for tracking purposes.

    • Update the operation status to DATA_MAPPING_COMPLETED or DATA_MAPPING_FAILED.
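
Steps 1, 3 and 4 above can be sketched for a single chunk as follows. This is an illustrative Python sketch under stated assumptions, not the module's code. The story only names the table "chunk_step", so every column is hypothetical, as are `map_record`, `process_chunk`, the file naming scheme, and the rule that a record without a `001` field is invalid. A local directory stands in for persistent storage, and the upload through an S3-like client is omitted.

```python
import json
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical columns; the story only names the table "chunk_step".
CHUNK_STEP_DDL = """
CREATE TABLE chunk_step (
    id            INTEGER PRIMARY KEY,
    chunk_id      TEXT NOT NULL,
    entity_file   TEXT NOT NULL,
    invalid_file  TEXT NOT NULL,
    error_file    TEXT NOT NULL,
    num_of_errors INTEGER NOT NULL
)
"""


def map_record(record):
    """Hypothetical mapper: requires a '001' field, otherwise raises."""
    if "001" not in record:
        raise ValueError("missing 001 field")
    return {"naturalId": record["001"]}


def process_chunk(chunk_id, records, out_dir, db):
    """Map one chunk into three files and register them in chunk_step."""
    out_dir = Path(out_dir)
    paths = {kind: out_dir / f"{chunk_id}_{kind}.jsonl"
             for kind in ("entity", "invalid", "error")}
    failed = 0
    with paths["entity"].open("w") as entity_f, \
            paths["invalid"].open("w") as invalid_f, \
            paths["error"].open("w") as error_f:
        # 'records' can be any iterable, so a database cursor can be streamed.
        for record in records:
            try:
                entity_f.write(json.dumps(map_record(record)) + "\n")
            except ValueError as exc:
                failed += 1
                invalid_f.write(json.dumps(record) + "\n")  # the bad record
                error_f.write(f"{exc}\n")                   # why it failed
    db.execute(
        "INSERT INTO chunk_step "
        "(chunk_id, entity_file, invalid_file, error_file, num_of_errors) "
        "VALUES (?, ?, ?, ?, ?)",
        (chunk_id, str(paths["entity"]), str(paths["invalid"]),
         str(paths["error"]), failed),
    )
    db.commit()
    return failed


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute(CHUNK_STEP_DDL)
    with tempfile.TemporaryDirectory() as tmp:
        print(process_chunk("c1", [{"001": "n1"}, {"245": "x"}], tmp, db))
```

Writing one line per record keeps the invalid-records file and the error file aligned, so the n-th error line explains the n-th invalid record.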

Additional info:
POC for defining chunks and mapping: https://github.com/folio-org/mod-entities-links/commits/MODELINKS-84-long-term-poc/

S3-like client lib: https://github.com/folio-org/folio-s3-client

Acceptance Criteria:

  • Covered by unit tests.

Environment

None

Potential Workaround

None

Activity

Viacheslav Kolesnyk, March 29, 2024 at 6:04 PM

Tested on a local environment. The process is described in https://github.com/folio-org/mod-marc-migrations/pull/30/. Further testing is planned in the scope of

Done

Details

Development Team

Spitfire

Release

Ramsons (R2 2024)

Created December 12, 2023 at 11:11 AM
Updated July 5, 2024 at 12:57 PM
Resolved March 29, 2024 at 6:04 PM