Notes on data migration

Notes on data migration

PAGE IN PROGRESS

Content note

This page is intended to share tips and experiences on data migration. Anyone can participate.

 

This inital description assumes migration from an Excel/CSV source file. There are numerous forms in which the source data can exist, e.g. in a relational database. Therefore there are also numerous ways of data transformation, e.g. via scripts which pull the data directly from a database (like ILS). However, in all cases, knowledge of the interrelationships within OA as attempted to be presented here may be necessary.

Leipzig examples

Notes in green boxes contain examples from the Leipzig implementation, which may not apply everywhere.

Table of Contents

 

Set up the target system (Reference data)

When starting with the Open Access app there are a number of planning tasks and system setup tasks that will need to be completed in advance. A detailed overview of them is available >>> here <<<. 

This includes:

  • Request and Publication setup

    • Request numbering

    • Closure reasons

    • Publishers

    • Licenses

    • Publication identifiers

    • Publication subtypes

  • Charge setup

    • Charge category

    • Default tax rate

    • Payer

  • People setup

    • Institution level

  • Correspondence setup

    • Correspondence category
      Correspondence mode
      Correspondence status

  • Funding details setup

    • Aspect funded

    • Funder

  • Checklist setup

    • Checklist items

There ist also a wonderful description how to add pick list (aka reference data) values via the API >>> here <<<.

In some fields of the data to be imported, references to settings values have to be made in form of UUID or string value references. For this reason, all settings that play a role in the migration must be present in the target system before data import. 

Important! UUIDs are very individual. Each Folio instance creates own UUIDs for settings and data. Preparing for migration on a system other than the target system results in the need to recreate the UUID references!

Data refers to other data

Some data depends on already existing data. This is true for the relationships to settings (reference data), but also to other classes of information within the OA domain. The following dependencies exist:

 

Class

UI field

JSON Field

Relationship / Dependency

Type of reference

Example

Class

UI field

JSON Field

Relationship / Dependency

Type of reference

Example

Journals (works)

API: oa/works/citation

Instance / Namespace

/"instances"/"ids"/"ns"

UI: Settings/Pick list: PublicationIdentifier.Type

API: /oa/refdata

string

issn

 

 

/"instances"/"subType"

 

 

 

 

 

/"indexedInDOAJ"/

 

 

 

 

 

/"oaStatus"/

 

 

 

People (party)

API: oa/party

Institution level 1

/"institutionLevel1"/"id"

UI: Settings/Pick list: Party.InstitutionLevel1

API: /oa/refdata

UUID

 

Requests

API: /oa/publicationRequest

Publisher

/"publisher"/"id"

UI: Settings/Pick list: PublicationRequest.Publisher

API: /oa/refdata

UUID

 

 

Corresponding author

/"correspondingAuthor"/"partyOwner"/"id"

UI: People / person

API: /oa/party

JSON: /"id"/

UUID

 

 

Institution level 1

/"correspondingAuthor"/"partyOwner"/"institutionLevel1"/"id"

UI: Settings/Pick list: Party.InstitutionLevel1

API: /oa/refdata

UUID

 

 

 

/"correspondingInstitutionLevel1"/"id"

UI: Settings/Pick list: Party.InstitutionLevel1

API: /oa/refdata

UUID

 

 

Corresponding author

(implicit)

/"correspondingAuthor"/"role"

UI: Settings/Pick list: RequestParty.Role

API: /oa/refdata

string

corresponding_author

request_contact

(not customizable)

 

Request contact

/"requestContact"/"partyOwner"/"id"

UI: People / person

API: /oa/party

JSON: /"id"/

UUID

 

 

 

/"requestContact"/"role"

UI: Settings/Pick list: RequestParty.Role

API: /oa/refdata

string

corresponding_author

request_contact

(not customizable)

 

Status

/"requestStatus"/

UI: Settings/Pick list: PublicationRequest.RequestStatus

string

open, closed

(not customizable)

 

Publication status

(repeatable)

/"publicationStatuses"[n]/"publicationStatus"/"id"

 

UI: Settings/Pick list: PublicationStatus.PublicationStatus

API: /oa/refdata

UUID

 

 

License

/"license"/"id"

UI: Settings/Pick list: PublicationRequest.License

API: /oa/refdata

UUID

 

 

Publication type

/"publicationType"/"id"

UI: Settings/Pick list: PublicationRequest.Subtype

API: /oa/refdata

UUID

 

 

Journal

/"work"/"id"

UI: Journal

API: oa/works/citation

JSON: /"id"/

UUID

 

 

Journal in DOAJ

/"workIndexedInDOAJ"/"id"

UI: -

API: /oa/refdata

UUID

(not customizable)

 

Journal OA status

/"workOAStatus"/"id"

UI: Settings/Pick list: Work.OaStatus

UUID

 

Charges

to do

 

 

 

 

 

 

 

 

 

 

Checklist

to do

 

 

 

 

Not all possible relationships are listed here, e.g. links to Agreements are not considered for data migration (Leipzig).

 

Know your UUIDs / references

As mentioned above, the JSON source file (and thus the CSV source file from which the JSON is created) must contain the UUID and string value references.

So how do we know which UUIDs and values to use?

There are two main ways to find out the UUID and reference values: via an API query or via a database query.

Via API query:

method: GET
OKAPI endpoint: /oa/refdata/
Optional parameters that can be appended to endpoint:
- max=<integer> ← maximum number of refdata categories to include in the response
- offset=<integer> ← in the case of paging through a list of refdata categories (i.e. if max is lower than the total number of categories), where to retrieve the next page of results from

Anyone with a Folio account on the system and the appropriate permissions can access this API endpoint. It is more a question of "how". An GUI API client such as Postman can be used.

An excerpt from the API response looks like this:

Leipzig example

more data ...

{
        "id""8aaa80fe859fe8ee0185a0ea5d980023",
        "desc""Work.OaStatus",
        "internal"false,
        "values": [
            {
                "id""8aaa80fe859fe8ee0185a0ea5d9b0024",    ← UUID for OA status "Gold"
                "value""gold",
                "label""Gold"
            },
            {
                "id""8aaa80fe859fe8ee0185a0ea5da00025",    ← UUID for OA status "Hybrid"
                "value""hybrid",
                "label""Hybrid"
            },
            {
                "id""8aaa80fe859fe8ee0185bac6b6c9004f",    ← UUID for OA status "Subscribe to Open"
                "value""subscribe_to_open",
                "label""Subscribe to Open (S2O)"
            }
        ]
    },

... more data

Via database query:

Direct database access is controlled by the system's administrator. A database tool like DBeaver can be used.

The query via direct database access can also provide a concise overview.

Preparing the source files

With everything prepared and knowing the UUID reference values, the source files for the transformation can be created. In this example we are assuming an Excel/CSV files.

The source files contains all OA relevant data which can be mapped in the OA app (hopefully all). This data needs to be enriched with the UUID and other reference values to have them ready for transformation into JSON.

In Leipzig we created six source files for:

  • Journals

  • People

  • Publication requests, Charges and Checklist items (with iterative enrichment of UUIDs for 

The source files from Leipzig look like this:

Leipzig example: Journal source file

Leipzig example: People

Above data must exist to import a Publication request.

Leipzig example: Publication requests


This table include all data about Publication requests, Charges and Checklist items, including enriched reference values.
But it does not yet include BI:chargeOwnerId, which is the UUID of the Request itself.

 

From the column F:correspondingAuthorPartyOwnerId it can be seen that the data of the persons must first exist, the UUIDs must be extracted from the Folio system and enriched in the source file. Extracting the UUIDs can also be done with a direct database query. Here it is important that we have a matching point with the source data, in this case the email address. The enrichment can be done with the Excel function VLOOKUP, in German SVERWEIS. 

Such lookup tabels may be necessary for columns U:workId and BI:chargeOwnerId as well.

Leipzig rexample: lookup table People eMail<->UUID

 

 

 

 

 

Above data must exist to import a Charge, since Charges are linked to Requests. Therefore, we use the same source file enriched with column BI:chargeOwnerId (last column):

Leipzig example: Same source file like Publication requests but now including BI:chargeOwnerId

 

 

This applies also to Checklist items, since Checklist items are 'updated' information of the actual request.

OA data schema

The following are JSON files (schemas) and how the data correlate with the Leipzig Excel/CSV source file. 

Leipzig example: Journal JSON mapping

{ "title": "SPALTE-A", "instances": [ { "ids": [ { "ns": "issn", "id": "SPALTE-B" } ], "subType": "electronic" }, { "ids": [ { "ns": "issn", "id": "SPALTE-C" } ], "subType": "print" } ], "indexedInDOAJ": "SPALTE-E", "oaStatus": "SPALTE-D", "type": "serial" }

Leipzig example: People JSON mapping

to do

Leipzig example: Request JSON mapping

{"publicationStatuses":[{"publicationStatus":{"id":"Y:publicationStatus1Id"},"statusDate":"Z:statusDate1","statusNote":"AA:statusNote1"},{"publicationStatus":{"id":"AB:publicationStatus2Id"},"statusDate":"AC:statusDate2","statusNote":"AD:statusNote2"},{"publicationStatus":{"id":"AE:publicationStatus3Id"},"statusDate":"AF:statusDate3","statusNote":"AG:statusNote3"},{"publicationStatus":{"id":"AH:publicationStatus4Id"},"statusDate":"AI:statusDate4"}],"fundings":[],"requestStatus":"A:requestStatus","closureReason":null,"requestDate":"B:requestDate","externalRequestIds":[{"externalId":"C:externalId1"},{"externalId":"D:externalId2"}],"correspondingAuthor":{"partyOwner":{"id":"F:correspondingAuthorPartyOwnerId","alternateEmails":[],"institutionLevel1":{"id":"L:correspondingInstitutionLevel1Id"}},"role":"G:correspondingAuthorRole"},"correspondingInstitutionLevel1":{"id":"L:correspondingInstitutionLevel1Id"},"useCorrespondingAuthor":false,"requestContact":{"partyOwner":{"id":"I:requestContactPartyOwnerId","alternateEmails":[]},"role":"J:requestContactRole"},"correspondingInstitutionLevel2":"M:correspondingInstitutionLevel2","doi":"N:doi","license":{"id":"O:licenseId"},"publicationTitle":"P:publicationTitle","publicationType":{"id":"Q:publicationTypeId"},"publisher":{"id":"S:publisherId"},"work":{"id":"U:workId"},"workIndexedInDOAJ":{"id":"X:workIndexedInDOAJId"},"workOAStatus":{"id":"V:workOAStatusId"},"requestNumber":"BH:requestNumber","subtype":null,"bookDateOfPublication":null,"bookPlaceOfPublication":null,"agreement":null}

Charge1 JSON mapping

{"tax":"AR:charge1Tax","discountType":{"id":"8aaa80fe859fe8ee0185a0ea5d5e0017"},"exchangeRate":{"coefficient":1,--->>>type:number"toCurrency":"EUR"},"amount":{"baseCurrency":"AQ:charge1AmountBaseCurrency","value":"AP:charge1AmountValue"},"category":{"id":"AN:charge1CategoryId"},"chargeStatus":{"id":"AO:charge1StatusId"},"description":"OA-Migration","discount":0,--->>>type:number"discountNote":"","paymentPeriod":"2023","payers":[{"payerAmount":AT:charge1PayersPayer1Amount,--->>>type:number"payer":{"id":"AS:charge1PayersPayer1Id"},"payerNote":"AU:charge1PayersPayer1Note"},{"payerAmount":AW:charge1PayersPayer2Amount,--->>>type:number"payer":{"id":"AV:charge1PayersPayer2Id"},"payerNote":"AX:charge1PayersPayer2Note"}],"owner":{"id":"BI:chargeOwnerId"}}

Checklist item JSON mapping

{"id":"BI:chargeOwnerId","checklist":[{"outcome":{"value":"AK:checklist1OutcomeValue"},"notes":[],"definition":{"id":"AJ:checklist1DefinitionId"}},{"outcome":{"value":"AM:checklist2OutcomeValue"},"notes":[],"definition":{"id":"AL:checklist2DefinitionId"}}]}

All we need is JSON conversion

 

Where to POST, PUT, DELETE?

All available OA API endpoints and methods are listed here: https://github.com/folio-org/mod-oa/blob/master/service/src/main/okapi/ModuleDescriptor-template.json

Mainly these might be of interest:

API

Use

API

Use

GET {{OKAPI}}/oa/refdata?max=100

to get all reference data values and UUIDs

POST {{OKAPI}}/oa/works/citation

to post journal data

GET {{OKAPI}}/oa/works

to get journal data

GET {{OKAPI}}/oa/works/{{UUID}}

to get journal data based on a specific journal UUID

POST {{OKAPI}}/oa/party

to post people data

GET {{OKAPI}}/oa/party

to get people data

GET {{OKAPI}}/oa/party/{{UUID}}

to get people data based on a specific person UUID

POST {{OKAPI}}/oa/publicationRequest

to post publication requests

GET {{OKAPI}}/oa/publicationRequest

to get publication requests

PUT {{OKAPI}}/oa/publicationRequest

to update requests, i.e. to import checklist items

GET {{OKAPI}}/oa/publicationRequest/{{UUID}}

to get a publication request based on a specific request UUID

POST {{OKAPI}}/oa/charges

to post charges

The actual import can be done e.g. with Postman (Runner function) or ask your local IT, systems librarian or library network.

How to know and learn what the API needs?

The Folio frontend also uses the API (OKAPI) to create, update, and delete records. Using the Firefox Developer Tools / Network Analysis, the corresponding API calls from the frontend to OKAPI can be viewed, i.e. API method, endpoint and the actual payload (JSON). This can help to understand how to prepare corresponding imports for migration.