...

  • Registering a Migration Operation:

    POST /marc-migrations
    Content-Type: application/json
    X-Okapi-Token: <your_access_token>
    
    {
      "entityType": "authority",
      "operationType": "remapping"
    }

    This call registers a new MARC Migration operation. The response contains the operationId required by the subsequent calls. Set entityType to either "authority" or "instance", depending on the records being migrated.

  • Tracking Migration Operation:

    GET /marc-migrations/{operationId}
    X-Okapi-Token: <your_access_token>

    Replace {operationId} with the ID returned by the POST call. This endpoint reports the progress and status of an ongoing MARC Migration operation. Possible statuses include:

    • "new": The operation has been initialized but not yet started.

    • "data_mapping": The operation is currently mapping data.

    • "data_mapping_completed": Data mapping is complete.

    • "data_mapping_failed": Data mapping has failed.

    • "data_saving": Data is currently being saved.

    • "data_saving_completed": Data saving is complete.

    • "data_saving_failed": Data saving has failed.

  • Initiating Data Saving Phase:

    PUT /marc-migrations/{operationId}
    Content-Type: application/json
    X-Okapi-Token: <your_access_token>
    
    {
      "status": "data_saving",
      "publishEvents": false
    }

    Use this call to start the data saving phase of a MARC Migration operation once data mapping is complete (status "data_mapping_completed"). Replace {operationId} with the ID of your registered operation. The publishEvents field controls whether domain events are published. For large datasets, it is recommended not to publish domain events and instead run a re-index afterwards to index the changes introduced by the migration, because this performs better.
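The three calls above can be chained into a single script. The Python sketch below uses only the standard library to register an operation, poll until data mapping completes, start the data saving phase, and poll until saving finishes. It assumes that the POST response returns the operation ID in an `id` field and that the GET response returns the current state in a `status` field; these field names, the base URL, the poll interval, and the helper names (`build_request`, `wait_for`, `run_migration`) are illustrative assumptions, not confirmed by this page. The HTTP transport (`send`) is injectable, so the flow can be exercised without a live FOLIO environment.

```python
import json
import time
import urllib.request
from typing import Callable, Optional

# Status values taken from the tracking section above.
MAPPING_COMPLETED = "data_mapping_completed"
SAVING_COMPLETED = "data_saving_completed"
FAILED_STATUSES = {"data_mapping_failed", "data_saving_failed"}


def build_request(base_url: str, token: str, method: str, path: str,
                  body: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a request to the /marc-migrations API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(base_url.rstrip("/") + path, data=data, method=method)
    req.add_header("X-Okapi-Token", token)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


def default_send(req: urllib.request.Request) -> dict:
    """Send the request; decode a JSON body, or return {} if the body is empty."""
    with urllib.request.urlopen(req) as resp:
        raw = resp.read()
    return json.loads(raw) if raw else {}


def wait_for(send: Callable, req_factory: Callable, target: str,
             poll_seconds: float, sleep: Callable) -> str:
    """Poll the tracking endpoint until `target` is reached or a failure status appears."""
    while True:
        status = send(req_factory())["status"]  # ASSUMPTION: field is named "status"
        if status in FAILED_STATUSES:
            raise RuntimeError(f"migration failed with status {status!r}")
        if status == target:
            return status
        sleep(poll_seconds)


def run_migration(base_url: str, token: str, entity_type: str = "authority",
                  publish_events: bool = False, send: Callable = default_send,
                  poll_seconds: float = 30.0, sleep: Callable = time.sleep) -> str:
    """Hypothetical end-to-end driver for one remapping operation."""
    if entity_type not in ("authority", "instance"):
        raise ValueError("entityType must be 'authority' or 'instance'")

    # 1. Register the operation. ASSUMPTION: the response holds the ID in "id".
    created = send(build_request(base_url, token, "POST", "/marc-migrations",
                                 {"entityType": entity_type, "operationType": "remapping"}))
    path = f"/marc-migrations/{created['id']}"

    def track() -> urllib.request.Request:
        return build_request(base_url, token, "GET", path)

    # 2. Wait for data mapping to finish.
    wait_for(send, track, MAPPING_COMPLETED, poll_seconds, sleep)

    # 3. Start the data saving phase (events off, as suggested for large datasets).
    send(build_request(base_url, token, "PUT", path,
                       {"status": "data_saving", "publishEvents": publish_events}))

    # 4. Wait for data saving to finish.
    return wait_for(send, track, SAVING_COMPLETED, poll_seconds, sleep)
```

Any failure status stops the script with an exception rather than continuing to the saving phase, because the PUT call is only meaningful after mapping has completed.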

Known Limitations

The MARC Migration system currently has several limitations that users should be aware of when planning and executing migrations:

...

  1. Resource Allocation and Usage: This module is a utility designed for administrators to perform remapping tasks; it is not a standard module meant for continuous operation. Its primary function is to execute remapping during updates or upon request. Therefore, there is no need to limit its resource consumption. Instead, it is recommended to allocate the maximum amount of resources that the module can effectively utilize. Once the remapping process is complete and the module is no longer needed, it can be safely turned off.

  2. Additional File Space: Files are stored in the folder configured by the LOCAL_FILE_STORAGE_PATH environment variable. Administrators should point this variable to a location with sufficient free space.

  3. Optimal Chunk Sizes: Use CHUNK_FETCH_IDS_COUNT=12000 and RECORDS_CHUNK_SIZE=4000 to decrease migration time. Note that this configuration may cause mod-entities-links to use an additional 25% CPU.

  4. Performance Optimization and Dependencies: Remapping operations are parallelized within a single instance of the module. By removing CPU limitations and allocating 8 GB of RAM, you can significantly enhance its performance. Since the module writes data through direct calls to mod-inventory-storage, it's important to increase the number of mod-inventory-storage and mod-entities-links instances to prevent any bottlenecks. The optimal number of module instances depends on the resources allocated to mod-marc-migrations and should be determined through performance testing.

  5. Data Handling: While data mapping runs, files are stored directly in the running mod-marc-migrations container and later moved to an S3 bucket. If no S3 bucket is provided, data mapping will fail. If the container fails during data mapping, all files will be lost and the mapping process will hang indefinitely.
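Limitations 2 and 3 translate into a few settings worth checking before a run. The Python sketch below reads LOCAL_FILE_STORAGE_PATH, CHUNK_FETCH_IDS_COUNT and RECORDS_CHUNK_SIZE from an environment mapping and reports problems. The `preflight` helper and its minimum-free-space threshold are hypothetical (this page does not define a threshold), and the disk check only inspects the host where the script runs, so it would need to run inside the mod-marc-migrations container to reflect that container's storage.

```python
import os
import shutil
from typing import Callable, List, Mapping

# Values recommended under "Optimal Chunk Sizes" above.
RECOMMENDED_CHUNKS = {"CHUNK_FETCH_IDS_COUNT": 12000, "RECORDS_CHUNK_SIZE": 4000}


def preflight(env: Mapping[str, str], min_free_bytes: int,
              disk_usage: Callable = shutil.disk_usage) -> List[str]:
    """Return human-readable problems with the migration settings (empty list = OK).

    min_free_bytes is a HYPOTHETICAL, operator-chosen threshold; the module
    itself does not define one.
    """
    problems: List[str] = []

    # Limitation 2: the local file storage path must exist and have enough space.
    path = env.get("LOCAL_FILE_STORAGE_PATH")
    if not path:
        problems.append("LOCAL_FILE_STORAGE_PATH is not set")
    else:
        try:
            free = disk_usage(path).free
        except OSError as exc:
            problems.append(f"cannot inspect {path}: {exc}")
        else:
            if free < min_free_bytes:
                problems.append(f"{path} has {free} bytes free, need {min_free_bytes}")

    # Limitation 3: compare chunk sizes with the recommended values.
    for name, recommended in RECOMMENDED_CHUNKS.items():
        raw = env.get(name)
        if raw is None:
            problems.append(f"{name} is not set (recommended: {recommended})")
            continue
        try:
            value = int(raw)
        except ValueError:
            problems.append(f"{name}={raw!r} is not an integer")
            continue
        if value != recommended:
            problems.append(f"{name}={value} differs from recommended {recommended}")
    return problems


if __name__ == "__main__":
    # HYPOTHETICAL threshold of 50 GiB; size it to your own dataset.
    for problem in preflight(os.environ, min_free_bytes=50 * 1024**3):
        print("WARNING:", problem)
```

A non-recommended chunk size is reported as a warning rather than an error because of the trade-off noted above: the larger chunks shorten migration time but increase mod-entities-links CPU usage.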

...