Notes on data migration
PAGE IN PROGRESS
Content note
This page is intended to share tips and experiences on data migration. Anyone can participate.
This inital description assumes migration from an Excel/CSV source file. There are numerous forms in which the source data can exist, e.g. in a relational database. Therefore there are also numerous ways of data transformation, e.g. via scripts which pull the data directly from a database (like ILS). However, in all cases, knowledge of the interrelationships within OA as attempted to be presented here may be necessary.
Table of Contents
Set up the target system (Reference data)
When starting with the Open Access app there are a number of planning tasks and system setup tasks that will need to be completed in advance. A detailed overview of them is available >>> here <<<.
This includes:
- Request and Publication setup
- Request numbering
- Closure reasons
- Publishers
- Licenses
- Publication identifiers
- Publication subtypes
- Charge setup
- Charge category
- Default tax rate
- Payer
- People setup
- Institution level
- Correspondence setup
- Correspondence category
Correspondence mode
Correspondence status
- Correspondence category
- Funding details setup
- Aspect funded
- Funder
- Checklist setup
- Checklist items
There ist also a wonderful description how to add pick list (aka reference data) values via the API >>> here <<<.
In some fields of the data to be imported, references to settings values have to be made in form of UUID or string value references. For this reason, all settings that play a role in the migration must be present in the target system before data import.
Important! UUIDs are very individual. Each Folio instance creates own UUIDs for settings and data. Preparing for migration on a system other than the target system results in the need to recreate the UUID references!
Data refers to other data
Some data depends on already existing data. This is true for the relationships to settings (reference data), but also to other classes of information within the OA domain. The following dependencies exist:
Class | UI field | JSON Field | Relationship / Dependency | Type of reference | Example |
---|---|---|---|---|---|
Journals (works) API: oa/works/citation | Instance / Namespace | /"instances"/"ids"/"ns" | UI: Settings/Pick list: PublicationIdentifier.Type API: /oa/refdata | string | issn |
/"instances"/"subType" | |||||
/"indexedInDOAJ"/ | |||||
/"oaStatus"/ | |||||
People (party) API: oa/party | Institution level 1 | /"institutionLevel1"/"id" | UI: Settings/Pick list: Party.InstitutionLevel1 API: /oa/refdata | UUID | |
Requests API: /oa/publicationRequest | Publisher | /"publisher"/"id" | UI: Settings/Pick list: PublicationRequest.Publisher API: /oa/refdata | UUID | |
Corresponding author | /"correspondingAuthor"/"partyOwner"/"id" | UI: People / person API: /oa/party JSON: /"id"/ | UUID | ||
Institution level 1 | /"correspondingAuthor"/"partyOwner"/"institutionLevel1"/"id" | UI: Settings/Pick list: Party.InstitutionLevel1 API: /oa/refdata | UUID | ||
/"correspondingInstitutionLevel1"/"id" | UI: Settings/Pick list: Party.InstitutionLevel1 API: /oa/refdata | UUID | |||
Corresponding author (implicit) | /"correspondingAuthor"/"role" | UI: Settings/Pick list: RequestParty.Role API: /oa/refdata | string | corresponding_author request_contact (not customizable) | |
Request contact | /"requestContact"/"partyOwner"/"id" | UI: People / person API: /oa/party JSON: /"id"/ | UUID | ||
/"requestContact"/"role" | UI: Settings/Pick list: RequestParty.Role API: /oa/refdata | string | corresponding_author request_contact (not customizable) | ||
Status | /"requestStatus"/ | UI: Settings/Pick list: PublicationRequest.RequestStatus | string | open, closed (not customizable) | |
Publication status (repeatable) | /"publicationStatuses"[n]/"publicationStatus"/"id" | UI: Settings/Pick list: PublicationStatus.PublicationStatus API: /oa/refdata | UUID | ||
License | /"license"/"id" | UI: Settings/Pick list: PublicationRequest.License API: /oa/refdata | UUID | ||
Publication type | /"publicationType"/"id" | UI: Settings/Pick list: PublicationRequest.Subtype API: /oa/refdata | UUID | ||
Journal | /"work"/"id" | UI: Journal API: oa/works/citation JSON: /"id"/ | UUID | ||
Journal in DOAJ | /"workIndexedInDOAJ"/"id" | UI: - API: /oa/refdata | UUID | (not customizable) | |
Journal OA status | /"workOAStatus"/"id" | UI: Settings/Pick list: Work.OaStatus | UUID | ||
Charges | TO DO | ||||
Checklist | TO DO |
Not all possible relationships are listed here, e.g. links to Agreements are not considered for data migration (Leipzig).
Know your UUIDs / references
As mentioned above, the JSON source file (and thus the CSV source file from which the JSON is created) must contain the UUID and string value references.
So how do we know which UUIDs and values to use?
There are two main ways to find out the UUID and reference values: via an API query or via a database query.
Via API query:
method: GET
OKAPI endpoint: /oa/refdata/
Optional parameters that can be appended to endpoint:
- max=<integer> ← maximum number of refdata categories to include in the response
- offset=<integer> ← in the case of paging through a list of refdata categories (i.e. if max is lower than the total number of categories), where to retrieve the next page of results from
Anyone with a Folio account on the system and the appropriate permissions can access this API endpoint. It is more a question of "how". An GUI API client such as Postman can be used.
An excerpt from the API response looks like this:
Via database query:
Direct database access is controlled by the system's administrator. A database tool like DBeaver can be used.
The query via direct database access can also provide a concise overview.
Preparing the source files
With everything prepared and knowing the UUID reference values, the source files for the transformation can be created. In this example we are assuming an Excel/CSV files.
The source files contains all OA relevant data which can be mapped in the OA app (hopefully all). This data needs to be enriched with the UUID and other reference values to have them ready for transformation into JSON.
In Leipzig we created six source files for:
- Journals
- People
- Publication requests, Charges and Checklist items (with iterative enrichment of UUIDs for
The source files from Leipzig look like this:
Above data must exist to import a Publication request.
From the column F:correspondingAuthorPartyOwnerId it can be seen that the data of the persons must first exist, the UUIDs must be extracted from the Folio system and enriched in the source file. Extracting the UUIDs can also be done with a direct database query. Here it is important that we have a matching point with the source data, in this case the email address. The enrichment can be done with the Excel function VLOOKUP, in German SVERWEIS.
Such lookup tabels may be necessary for columns U:workId and BI:chargeOwnerId as well.
Above data must exist to import a Charge, since Charges are linked to Requests. Therefore, we use the same source file enriched with column BI:chargeOwnerId (last column):
This applies also to Checklist items, since Checklist items are 'updated' information of the actual request.
OA data schema
The following are JSON files (schemas) and how the data correlate with the Leipzig Excel/CSV source file.
All we need is JSON conversion
Where to POST, PUT, DELETE?
All available OA API endpoints and methods are listed here: https://github.com/folio-org/mod-oa/blob/master/service/src/main/okapi/ModuleDescriptor-template.json
Mainly these might be of interest:
API | Use |
---|---|
GET {{OKAPI}}/oa/refdata?max=100 | to get all reference data values and UUIDs |
POST {{OKAPI}}/oa/works/citation | to post journal data |
GET {{OKAPI}}/oa/works | to get journal data |
GET {{OKAPI}}/oa/works/{{UUID}} | to get journal data based on a specific journal UUID |
POST {{OKAPI}}/oa/party | to post people data |
GET {{OKAPI}}/oa/party | to get people data |
GET {{OKAPI}}/oa/party/{{UUID}} | to get people data based on a specific person UUID |
POST {{OKAPI}}/oa/publicationRequest | to post publication requests |
GET {{OKAPI}}/oa/publicationRequest | to get publication requests |
PUT {{OKAPI}}/oa/publicationRequest | to update requests, i.e. to import checklist items |
GET {{OKAPI}}/oa/publicationRequest/{{UUID}} | to get a publication request based on a specific request UUID |
POST {{OKAPI}}/oa/charges | to post charges |
The actual import can be done e.g. with Postman (Runner function) or ask your local IT, systems librarian or library network.
How to know and learn what the API needs?
The Folio frontend also uses the API (OKAPI) to create, update, and delete records. Using the Firefox Developer Tools / Network Analysis, the corresponding API calls from the frontend to OKAPI can be viewed, i.e. API method, endpoint and the actual payload (JSON). This can help to understand how to prepare corresponding imports for migration.