Introduction

MARC Migration is the process of updating and transferring MARC records and related FOLIO records within the FOLIO system, particularly when mapping rules change. This documentation is intended for developers, system administrators, and library staff who manage and migrate MARC records. It covers using the MARC Migration API, known limitations, performance tuning, and troubleshooting common issues.

...

  • Registering a Migration Operation:

    POST /marc-migrations
    Content-Type: application/json
    X-Okapi-Token: <your_access_token>
    
    {
      "entityType": "authority",
      "operationType": "remapping"
    }

    This call registers a new MARC Migration operation. The response contains the operationId required by subsequent calls. Set entityType to "authority" or "instance", depending on the records being migrated.

  • Tracking Migration Operation:

    GET /marc-migrations/{operationId}
    X-Okapi-Token: <your_access_token>

    Replace {operationId} with the ID received from the response of the POST call. This endpoint allows you to track the progress and status of an ongoing MARC Migration operation. Possible statuses include:

    • "new": The operation has been initialized but not yet started.

    • "data_mapping": The operation is currently mapping data.

    • "data_mapping_completed": Data mapping is complete.

    • "data_mapping_failed": Data mapping has failed.

    • "data_saving": Data is currently being saved.

    • "data_saving_completed": Data saving is complete.

    • "data_saving_failed": Data saving has failed.

  • Initiating Data Saving Phase:

    PUT /marc-migrations/{operationId}
    Content-Type: application/json
    X-Okapi-Token: <your_access_token>
    
    {
      "status": "data_saving",
      "publishEvents": false
    }

    Use this call to start the data saving phase of a MARC Migration operation once the data mapping phase is complete. Replace {operationId} with the ID of your registered operation. The publishEvents field controls whether domain events are published. For large datasets, it is recommended to disable domain events and run a re-index afterwards to index the changes introduced during migration, as this performs better.
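
The three calls above can be chained into one script. The following Python sketch is illustrative only and rests on several assumptions that this page does not confirm: the registration response exposes the operation ID in an "id" field, the tracking response exposes the status in a "status" field, status values match the list above when lowercased, and the helper names (build_request, send, wait_for, run_migration) are hypothetical. Adjust these to the actual API responses before use.

```python
import json
import time
import urllib.request

# Failure statuses taken from the status list above.
FAILED = {"data_mapping_failed", "data_saving_failed"}


def build_request(base_url, token, method, path, body=None):
    """Build (but do not send) a request for the MARC Migration API."""
    headers = {"X-Okapi-Token": token}
    data = None
    if body is not None:
        headers["Content-Type"] = "application/json"
        data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(base_url + path, data=data,
                                  headers=headers, method=method)


def send(request):
    """Send a request and decode the JSON response (empty body gives {})."""
    with urllib.request.urlopen(request) as response:
        raw = response.read()
        return json.loads(raw) if raw else {}


def wait_for(fetch_status, done, poll_seconds, sleep):
    """Poll until the status equals `done`; raise on a failure status."""
    while True:
        status = fetch_status().lower()
        if status == done:
            return status
        if status in FAILED:
            raise RuntimeError(f"migration stopped with status {status}")
        sleep(poll_seconds)


def run_migration(base_url, token, entity_type, send=send,
                  poll_seconds=30, sleep=time.sleep):
    """Register an operation, wait for mapping, then trigger and await saving."""
    # 1. Register the operation.
    operation = send(build_request(
        base_url, token, "POST", "/marc-migrations",
        {"entityType": entity_type, "operationType": "remapping"}))
    # ASSUMPTION: the operation ID is returned in an "id" field.
    path = f"/marc-migrations/{operation['id']}"

    # ASSUMPTION: the tracking response carries the status in a "status" field.
    def fetch_status():
        return send(build_request(base_url, token, "GET", path))["status"]

    # 2. Track the operation until data mapping is complete.
    wait_for(fetch_status, "data_mapping_completed", poll_seconds, sleep)

    # 3. Start the data saving phase without publishing domain events.
    send(build_request(base_url, token, "PUT", path,
                       {"status": "data_saving", "publishEvents": False}))
    return wait_for(fetch_status, "data_saving_completed", poll_seconds, sleep)
```

A call such as run_migration("https://okapi.example.org", token, "authority") would block until the operation reaches data_saving_completed or raise if either phase fails. The host name here is a placeholder. The send and sleep parameters are injectable so the flow can be exercised without a live system.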

Known Limitations

The MARC Migration system currently has several limitations that users should be aware of when planning and executing migrations:

...

The most efficient test utilized a CHUNK_FETCH_IDS_COUNT of 12,000 and a RECORDS_CHUNK_SIZE of 4,000, reducing the total migration duration to about 4 hours (3 hours 35 minutes for data mapping and 27 minutes for data saving).

Recommendations

  1. Resource Allocation and Usage: For mod-entities-links and mod-marc-migrations, either increase the default CPU allocation or set it to 0 to handle the additional load. mod-marc-migrations is a utility designed for administrators to perform remapping tasks; it is not a standard module meant for continuous operation. Its primary function is to execute remapping during updates or upon request, so there is no need to limit its resource consumption. Instead, allocate the maximum amount of resources that the module can effectively use. Once the remapping process is complete and the module is no longer needed, it can be safely turned off.

  2. Additional File Space: The folder where files are stored is configured through the LOCAL_FILE_STORAGE_PATH environment variable. Administrators should point it to a location with sufficient free space.

  3. Optimal Chunk Sizes: Use CHUNK_FETCH_IDS_COUNT=12000 and RECORDS_CHUNK_SIZE=4000 to decrease migration time. Note that this configuration may cause mod-entities-links to use an additional 25% CPU.

  4. Performance Optimization and Dependencies: Remapping operations are parallelized within a single instance of the module. Removing CPU limits and allocating 8 GB of RAM can significantly improve its performance. Because the module writes data through direct calls to mod-inventory-storage, increase the number of mod-inventory-storage and mod-entities-links instances to prevent bottlenecks. The optimal number of instances depends on the resources allocated to mod-marc-migrations and should be determined through performance testing.

  5. Data Handling: While data mapping runs, files are stored directly in the working mod-marc-migrations container and later moved to an S3 bucket. If no S3 bucket is provided, data mapping will fail. If the container fails during data mapping, all files will be lost, and the mapping process will hang indefinitely.
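
Recommendations 2 and 3 can be checked before a migration starts. The Python sketch below is a hypothetical pre-flight helper, not part of mod-marc-migrations: the function name preflight and the min_free_bytes threshold are assumptions, and the amount of free space actually required is not stated on this page, so the caller must supply it. Only the environment variable names and the tested chunk sizes come from the text above.

```python
import os
import shutil

# Chunk sizes from the most efficient test reported on this page.
TESTED_VALUES = {"CHUNK_FETCH_IDS_COUNT": 12000, "RECORDS_CHUNK_SIZE": 4000}


def preflight(env, min_free_bytes, disk_usage=shutil.disk_usage):
    """Return a list of warnings; an empty list means nothing was flagged."""
    warnings = []

    # Recommendation 2: the storage path should have enough free space.
    path = env.get("LOCAL_FILE_STORAGE_PATH")
    if not path:
        warnings.append("LOCAL_FILE_STORAGE_PATH is not set; the module default applies")
    else:
        free = disk_usage(path).free
        if free < min_free_bytes:
            warnings.append(
                f"{path} has {free} bytes free, below the {min_free_bytes} requested")

    # Recommendation 3: compare chunk sizes with the tested configuration.
    for name, tested in TESTED_VALUES.items():
        value = env.get(name)
        if value is None:
            warnings.append(f"{name} is not set; the module default applies")
        elif int(value) != tested:
            warnings.append(f"{name}={value} differs from the tested value {tested}")

    return warnings
```

Calling preflight(os.environ, min_free_bytes) in the environment that launches the container prints nothing by itself; the caller decides whether to log the returned warnings or abort. The disk_usage parameter exists only so the check can be tested without touching a real file system.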

...

Change Log

  • 2024-10-26: Initial release of the documentation.