Introduction
MARC Migration involves the process of updating and transferring MARC records and related FOLIO records within the FOLIO system, particularly when mapping rules change. This documentation is intended for developers, system administrators, and library personnel involved in the management and migration of MARC records. It provides comprehensive guidelines on utilizing the MARC Migration API, handling known limitations, optimizing performance, and troubleshooting common issues.
...
Registering a Migration Operation:
    POST /marc-migrations
    Content-Type: application/json
    X-Okapi-Token: <your_access_token>

    { "entityType": "authority", "operationType": "remapping" }
This call registers a new MARC Migration operation. The operationId needed for subsequent calls will be provided in the response of this request. The entityType can be either "authority" or "instance", depending on the records being migrated.
Tracking Migration Operation:
    GET /marc-migrations/{operationId}
    X-Okapi-Token: <your_access_token>
Replace {operationId} with the ID received from the response of the POST call. This endpoint allows you to track the progress and status of an ongoing MARC Migration operation. Possible statuses include:
"new": The operation has been initialized but not yet started.
"data_mapping": The operation is currently mapping data.
"data_mapping_completed": Data mapping is complete.
"data_mapping_failed": Data mapping has failed.
"data_saving": Data is currently being saved.
"data_saving_completed": Data saving is complete.
"data_saving_failed": Data saving has failed.
Initiating Data Saving Phase:
    PUT /marc-migrations/{operationId}
    Content-Type: application/json
    X-Okapi-Token: <your_access_token>

    { "status": "data_saving", "publishEvents": false }
Use this call to initiate the data saving phase for a MARC Migration operation once the data mapping phase is complete. Replace {operationId} with the ID of your registered operation. The publishEvents field defines whether domain events should be published. For large datasets, it is suggested not to publish domain events and instead to run a re-index to index the changes introduced during migration, as this is more performant.
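The three calls above can be chained into one small client. The following Python sketch is illustrative only and rests on several assumptions that the text above does not confirm: that the registration response carries the operation ID in a field named "id", that the GET response carries the current state in a field named "status", that data mapping starts on its own after registration, and that the gateway base URL is supplied by the caller. The transport is passed in as a function so the flow can be exercised without a live system.

```python
import json
import time
import urllib.request
from typing import Callable, Optional

# Statuses from the list above that end each phase.
MAPPING_DONE = {"data_mapping_completed", "data_mapping_failed"}
SAVING_DONE = {"data_saving_completed", "data_saving_failed"}

# A transport takes (method, path, body) and returns the decoded JSON response.
Transport = Callable[[str, str, Optional[dict]], dict]


def make_http_transport(base_url: str, token: str) -> Transport:
    """Build a transport for a real gateway. base_url is a caller-supplied assumption."""
    def send(method: str, path: str, body: Optional[dict] = None) -> dict:
        data = json.dumps(body).encode("utf-8") if body is not None else None
        request = urllib.request.Request(base_url + path, data=data, method=method)
        request.add_header("X-Okapi-Token", token)
        if data is not None:
            request.add_header("Content-Type", "application/json")
        with urllib.request.urlopen(request) as response:
            raw = response.read()
        return json.loads(raw) if raw else {}
    return send


def register_operation(send: Transport, entity_type: str) -> str:
    """POST /marc-migrations and return the operation ID."""
    if entity_type not in ("authority", "instance"):
        raise ValueError("entity_type must be 'authority' or 'instance'")
    body = {"entityType": entity_type, "operationType": "remapping"}
    response = send("POST", "/marc-migrations", body)
    return response["id"]  # assumption: the ID field is named "id"


def wait_for(send: Transport, operation_id: str, done: set,
             poll_seconds: float = 30.0, max_polls: Optional[int] = None,
             sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll GET /marc-migrations/{operationId} until a status in `done` appears."""
    polls = 0
    while max_polls is None or polls < max_polls:
        # assumption: the state field is named "status"
        status = send("GET", f"/marc-migrations/{operation_id}", None)["status"]
        if status in done:
            return status
        polls += 1
        sleep(poll_seconds)
    raise TimeoutError(f"operation {operation_id} did not finish in {polls} polls")


def start_saving(send: Transport, operation_id: str, publish_events: bool = False) -> None:
    """PUT /marc-migrations/{operationId} to begin the data saving phase."""
    body = {"status": "data_saving", "publishEvents": publish_events}
    send("PUT", f"/marc-migrations/{operation_id}", body)


def run_migration(send: Transport, entity_type: str, poll_seconds: float = 30.0,
                  sleep: Callable[[float], None] = time.sleep) -> str:
    """Register, wait for mapping, start saving, wait for saving; return the final status."""
    operation_id = register_operation(send, entity_type)
    status = wait_for(send, operation_id, MAPPING_DONE, poll_seconds, sleep=sleep)
    if status != "data_mapping_completed":
        return status  # do not start saving after a failed mapping phase
    start_saving(send, operation_id, publish_events=False)
    return wait_for(send, operation_id, SAVING_DONE, poll_seconds, sleep=sleep)
```

Against a real environment, the transport would be created with something like make_http_transport("https://okapi.example.org", token), where the URL is a placeholder. Setting max_polls in wait_for gives the caller a way out if an operation never reaches a final status.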
Known Limitations
The MARC Migration system currently has several limitations that users should be aware of when planning and executing migrations:
...
The most efficient test utilized a CHUNK_FETCH_IDS_COUNT of 12,000 and a RECORDS_CHUNK_SIZE of 4,000, reducing the total migration duration to about 4 hours (3 hours 35 minutes for data mapping and 27 minutes for data saving).
Recommendations
Increase CPU Resource Allocation: For services mod-entities-links and mod-marc-migrations, either increase the default CPU allocation or set it to 0 to handle additional load.
Usage: This module is a utility designed for administrators to perform remapping tasks; it is not a standard module meant for continuous operation. Its primary function is to execute remapping during updates or upon request. Therefore, there is no need to limit its resource consumption. Instead, it is recommended to allocate the maximum amount of resources that the module can effectively utilize. Once the remapping process is complete and the module is no longer needed, it can be safely turned off.
Additional File Space: The path to the folder where files will be stored is configured through the environment variable LOCAL_FILE_STORAGE_PATH. Administrators should specify a path with sufficient free space.
Optimal Chunk Sizes: Use CHUNK_FETCH_IDS_COUNT=12000 and RECORDS_CHUNK_SIZE=4000 to decrease migration time. Note that this configuration may cause mod-entities-links to use an additional 25% CPU.
Container Configuration: Use only one container for mod-marc-migrations to optimize resource usage.
Performance Optimization and Dependencies: Remapping operations are parallelized within a single instance of the module. By removing CPU limitations and allocating 8 GB of RAM, you can significantly enhance its performance. Since the module writes data through direct calls to mod-inventory-storage, it's important to increase the number of mod-inventory-storage and mod-entities-links instances to prevent any bottlenecks. The optimal number of module instances depends on the resources allocated to mod-marc-migrations and should be determined through performance testing.
Data Handling: While data mapping runs, files are stored directly in the working mod-marc-migrations container and later moved to an S3 bucket. If no S3 bucket is provided, data mapping will fail. If the container fails during data mapping, all files will be lost, and the mapping process will hang indefinitely.
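Before starting a migration, the environment settings named in these recommendations can be sanity-checked. The Python sketch below is a hypothetical preflight helper, not part of the module: it compares CHUNK_FETCH_IDS_COUNT and RECORDS_CHUNK_SIZE against the values from the most efficient test and checks free space under LOCAL_FILE_STORAGE_PATH. The free-space threshold is an assumption that the caller must supply, since no required amount is stated here.

```python
import os
import shutil
from typing import List, Mapping

# Values from the most efficient test run described above.
TESTED_VALUES = {"CHUNK_FETCH_IDS_COUNT": 12000, "RECORDS_CHUNK_SIZE": 4000}


def preflight(env: Mapping[str, str], min_free_bytes: int) -> List[str]:
    """Return human-readable warnings; an empty list means nothing was flagged.

    min_free_bytes is a caller-chosen threshold (an assumption, not a documented limit).
    """
    warnings: List[str] = []

    for name, tested in TESTED_VALUES.items():
        raw = env.get(name)
        if raw is None:
            warnings.append(f"{name} is not set (tested value: {tested})")
        elif not raw.isdigit() or int(raw) <= 0:
            warnings.append(f"{name}={raw!r} is not a positive integer")
        elif int(raw) != tested:
            warnings.append(f"{name}={raw} differs from the tested value {tested}")

    path = env.get("LOCAL_FILE_STORAGE_PATH")
    if not path:
        warnings.append("LOCAL_FILE_STORAGE_PATH is not set")
    elif not os.path.isdir(path):
        warnings.append(f"LOCAL_FILE_STORAGE_PATH={path!r} is not an existing directory")
    else:
        # disk_usage reports total, used, and free bytes for the filesystem holding path.
        free = shutil.disk_usage(path).free
        if free < min_free_bytes:
            warnings.append(
                f"only {free} bytes free under {path!r}, below the requested {min_free_bytes}"
            )

    return warnings
```

A typical call would be preflight(os.environ, min_free_bytes=...), with the threshold sized to the dataset being migrated.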
...
Change Log
2024-10-26: Initial release of the documentation.