Summary

There are significant difficulties with updating a large set of MARC authority records after mapping rule changes have been applied, as well as with migrating authority records from other systems to FOLIO. Updating authority records to apply mapping rule changes is a time-consuming process that takes multiple days, and using Data Import (DI) does not scale well when libraries are attempting to do other work with data import at the same time. Failure to update these records leads to data inconsistency. With the implementation of manual authority control/linking in Orchid, this issue may become even more problematic for libraries, and as automated linking is introduced it will become more challenging still.

The proposed solution addresses remapping and initial record migration, taking into account the expected data volumes and performance requirements.

Requirements

Functional requirements

  • The solution should not use data import, as the core issue lies in migrating a large volume of records. Data import is not designed to handle millions of records.
  • Operate in the background without disrupting other FOLIO workflows, including data import.
  • Support UTF-8 encoding.
  • Support the optimistic locking mechanism.
  • Enforce data validation rules during the creation of MARC authority records to prevent duplicates.
  • Apply the same data validation rules enforced during the updating of MARC authority records.
  • Provide a response that includes details on records that failed to create or update in SRS and Inventory.
  • Support two types of migration:
      • migrating all authority records from one release to another (remapping);
      • migrating a library with more than 500,000 authority records to FOLIO.
  • For migration cases, the import file format is MARC21.
  • Provide users with a straightforward way to obtain and analyze all errors.
  • Provide users with a simple way to retrieve files containing the records that were not processed successfully.

Related Jira issues (FOLIO Issue Tracker): ARCH-38, ARCH-46

Non-functional requirements

...

The solution will be implemented as a separate FOLIO module.

The main idea is to process large volumes of data in two separate steps.
The first step reads data from the source file or database, applies the new mapping rules (using the existing data-import-processing-core library), and stores the resulting entities in a file.
The second step delivers the file of entities to the appropriate back-end module, which stores them in the database one by one using the existing update implementation. A sketch of this flow is shown below.
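
A minimal sketch of the two-step flow in Java follows. The RemappingPipeline, MappingProcessor, BackendClient, and MarcSource names are illustrative assumptions rather than the actual data-import-processing-core or back-end API; the sketch only demonstrates the file-based handoff between the mapping step and the persistence step.

    import java.io.BufferedWriter;
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.Iterator;
    import java.util.stream.Stream;

    // Hypothetical collaborators; the real module would rely on the
    // data-import-processing-core mapping API and an HTTP client for the
    // back-end module.
    interface MappingProcessor { String mapToEntityJson(String marcRecord); }
    interface BackendClient { void upsertAuthority(String entityJson); }
    interface MarcSource extends Iterator<String>, AutoCloseable { }

    public class RemappingPipeline {
        private final MappingProcessor mappingProcessor;
        private final BackendClient backendClient;

        public RemappingPipeline(MappingProcessor mappingProcessor, BackendClient backendClient) {
            this.mappingProcessor = mappingProcessor;
            this.backendClient = backendClient;
        }

        // Step 1: stream MARC records from the source, apply the new mapping
        // rules, and write the resulting entities to an intermediate file,
        // one JSON entity per line.
        public Path mapToFile(MarcSource source) throws Exception {
            Path entitiesFile = Files.createTempFile("entities", ".ndjson");
            try (source; BufferedWriter writer = Files.newBufferedWriter(entitiesFile)) {
                while (source.hasNext()) {
                    writer.write(mappingProcessor.mapToEntityJson(source.next()));
                    writer.newLine();
                }
            }
            return entitiesFile;
        }

        // Step 2: deliver the file to the back-end module, which stores the
        // entities one by one using the existing update implementation.
        public void persist(Path entitiesFile) throws IOException {
            try (Stream<String> lines = Files.lines(entitiesFile)) {
                lines.forEach(backendClient::upsertAuthority);
            }
        }
    }

Streaming records line by line through an intermediate file keeps memory usage flat in both steps and lets the two steps be retried independently.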

Notes

  • To support significant volumes of data, the Data Access Layer should be implemented with plain JDBC; Spring Data cannot be used because of its high memory consumption at such volumes and its lower performance compared to the plain-JDBC approach (see the sketch below).
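
For illustration, a brief sketch of what the plain-JDBC write path could look like, assuming a hypothetical authority table with a single jsonb content column (the actual schema belongs to the back-end module). A single reused PreparedStatement with periodic batch flushes keeps memory consumption bounded regardless of the number of records:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.util.Iterator;

    public class AuthorityJdbcWriter {
        private static final int BATCH_SIZE = 1000; // illustrative batch size

        // Streams entities into the database with JDBC batching; the table
        // and column names are hypothetical placeholders.
        public void saveAll(Connection connection, Iterator<String> entityJsons) throws SQLException {
            String sql = "INSERT INTO authority (content) VALUES (?::jsonb)";
            try (PreparedStatement ps = connection.prepareStatement(sql)) {
                int count = 0;
                while (entityJsons.hasNext()) {
                    ps.setString(1, entityJsons.next());
                    ps.addBatch();
                    if (++count % BATCH_SIZE == 0) {
                        ps.executeBatch(); // flush periodically to bound memory use
                    }
                }
                ps.executeBatch(); // flush the remaining records
            }
        }
    }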

API

This module will provide the following RESTful endpoints:

  • start an import/remapping operation
  • check the status of an import/remapping operation
  • get operation errors
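
As a rough illustration, assuming the module is built with Spring (as newer FOLIO back-end modules are), the endpoints could take the following shape. The paths, payloads, and status values below are assumptions, not a confirmed API contract:

    import java.util.UUID;

    import org.springframework.http.ResponseEntity;
    import org.springframework.web.bind.annotation.GetMapping;
    import org.springframework.web.bind.annotation.PathVariable;
    import org.springframework.web.bind.annotation.PostMapping;
    import org.springframework.web.bind.annotation.RequestBody;
    import org.springframework.web.bind.annotation.RequestMapping;
    import org.springframework.web.bind.annotation.RestController;

    // Sketch only: the /authority-migration paths and response bodies are
    // illustrative, not the module's actual API.
    @RestController
    @RequestMapping("/authority-migration")
    public class MigrationOperationController {

        // Start an import/remapping operation; returns the new operation id.
        @PostMapping("/operations")
        public ResponseEntity<String> startOperation(@RequestBody String request) {
            String operationId = UUID.randomUUID().toString();
            return ResponseEntity.accepted().body(operationId);
        }

        // Check the status of an import/remapping operation.
        @GetMapping("/operations/{id}")
        public ResponseEntity<String> getStatus(@PathVariable String id) {
            return ResponseEntity.ok("{\"id\":\"" + id + "\",\"status\":\"IN_PROGRESS\"}");
        }

        // Get the errors recorded for an operation.
        @GetMapping("/operations/{id}/errors")
        public ResponseEntity<String> getErrors(@PathVariable String id) {
            return ResponseEntity.ok("[]");
        }
    }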

...