Notes on data migration

PAGE IN PROGRESS

Content note

This page is intended to share tips and experiences on data migration. Anyone can participate.


This inital description assumes migration from an Excel/CSV source file. There are numerous forms in which the source data can exist, e.g. in a relational database. Therefore there are also numerous ways of data transformation, e.g. via scripts which pull the data directly from a database (like ILS). However, in all cases, knowledge of the interrelationships within OA as attempted to be presented here may be necessary.

Leipzig examples

Notes in green boxes contain examples from the Leipzig implementation, which may not apply everywhere.

Table of Contents


Set up the target system (Reference data)

When starting with the Open Access app there are a number of planning tasks and system setup tasks that will need to be completed in advance. A detailed overview of them is available >>> here <<<. 

This includes:

  • Request and Publication setup
    • Request numbering
    • Closure reasons
    • Publishers
    • Licenses
    • Publication identifiers
    • Publication subtypes
  • Charge setup
    • Charge category
    • Default tax rate
    • Payer
  • People setup
    • Institution level
  • Correspondence setup
    • Correspondence category
      Correspondence mode
      Correspondence status
  • Funding details setup
    • Aspect funded
    • Funder
  • Checklist setup
    • Checklist items

There ist also a wonderful description how to add pick list (aka reference data) values via the API >>> here <<<.

In some fields of the data to be imported, references to settings values have to be made in form of UUID or string value references. For this reason, all settings that play a role in the migration must be present in the target system before data import. 

Important! UUIDs are very individual. Each Folio instance creates own UUIDs for settings and data. Preparing for migration on a system other than the target system results in the need to recreate the UUID references!

Data refers to other data

Some data depends on already existing data. This is true for the relationships to settings (reference data), but also to other classes of information within the OA domain. The following dependencies exist:


Class

UI field

JSON Field

Relationship / Dependency

Type of reference

Example

Journals (works)

API: oa/works/citation

Instance / Namespace

/"instances"/"ids"/"ns"

UI: Settings/Pick list: PublicationIdentifier.Type

API: /oa/refdata

string

issn



/"instances"/"subType"




/"indexedInDOAJ"/




/"oaStatus"/


People (party)

API: oa/party

Institution level 1/"institutionLevel1"/"id"

UI: Settings/Pick list: Party.InstitutionLevel1

API: /oa/refdata

UUID

Requests

API: /oa/publicationRequest

Publisher/"publisher"/"id"

UI: Settings/Pick list: PublicationRequest.Publisher

API: /oa/refdata

UUID

Corresponding author/"correspondingAuthor"/"partyOwner"/"id"

UI: People / person

API: /oa/party

JSON: /"id"/

UUID

Institution level 1/"correspondingAuthor"/"partyOwner"/"institutionLevel1"/"id"

UI: Settings/Pick list: Party.InstitutionLevel1

API: /oa/refdata

UUID


/"correspondingInstitutionLevel1"/"id"

UI: Settings/Pick list: Party.InstitutionLevel1

API: /oa/refdata

UUID

Corresponding author

(implicit)

/"correspondingAuthor"/"role"

UI: Settings/Pick list: RequestParty.Role

API: /oa/refdata

string

corresponding_author

request_contact

(not customizable)


Request contact/"requestContact"/"partyOwner"/"id"

UI: People / person

API: /oa/party

JSON: /"id"/

UUID


/"requestContact"/"role"

UI: Settings/Pick list: RequestParty.Role

API: /oa/refdata

string

corresponding_author

request_contact

(not customizable)


Status/"requestStatus"/UI: Settings/Pick list: PublicationRequest.RequestStatusstring

open, closed

(not customizable)


Publication status

(repeatable)

/"publicationStatuses"[n]/"publicationStatus"/"id"


UI: Settings/Pick list: PublicationStatus.PublicationStatus

API: /oa/refdata

UUID

License/"license"/"id"

UI: Settings/Pick list: PublicationRequest.License

API: /oa/refdata

UUID

Publication type/"publicationType"/"id"

UI: Settings/Pick list: PublicationRequest.Subtype

API: /oa/refdata

UUID

Journal/"work"/"id"

UI: Journal

API: oa/works/citation

JSON: /"id"/

UUID

Journal in DOAJ/"workIndexedInDOAJ"/"id"

UI: -

API: /oa/refdata

UUID(not customizable)

Journal OA status/"workOAStatus"/"id"UI: Settings/Pick list: Work.OaStatusUUID
Charges

TO DO











Checklist

TO DO





Not all possible relationships are listed here, e.g. links to Agreements are not considered for data migration (Leipzig).


Know your UUIDs / references

As mentioned above, the JSON source file (and thus the CSV source file from which the JSON is created) must contain the UUID and string value references.

So how do we know which UUIDs and values to use?

There are two main ways to find out the UUID and reference values: via an API query or via a database query.

Via API query:

method: GET
OKAPI endpoint: /oa/refdata/
Optional parameters that can be appended to endpoint:
- max=<integer> ← maximum number of refdata categories to include in the response
- offset=<integer> ← in the case of paging through a list of refdata categories (i.e. if max is lower than the total number of categories), where to retrieve the next page of results from

Anyone with a Folio account on the system and the appropriate permissions can access this API endpoint. It is more a question of "how". An GUI API client such as Postman can be used.

An excerpt from the API response looks like this:

Leipzig example

more data ...

{
        "id""8aaa80fe859fe8ee0185a0ea5d980023",
        "desc""Work.OaStatus",
        "internal"false,
        "values": [
            {
                "id""8aaa80fe859fe8ee0185a0ea5d9b0024",    ← UUID for OA status "Gold"
                "value""gold",
                "label""Gold"
            },
            {
                "id""8aaa80fe859fe8ee0185a0ea5da00025",    ← UUID for OA status "Hybrid"
                "value""hybrid",
                "label""Hybrid"
            },
            {
                "id""8aaa80fe859fe8ee0185bac6b6c9004f",    ← UUID for OA status "Subscribe to Open"
                "value""subscribe_to_open",
                "label""Subscribe to Open (S2O)"
            }
        ]
    },

... more data

Via database query:

Direct database access is controlled by the system's administrator. A database tool like DBeaver can be used.

The query via direct database access can also provide a concise overview.

Preparing the source files

With everything prepared and knowing the UUID reference values, the source files for the transformation can be created. In this example we are assuming an Excel/CSV files.

The source files contains all OA relevant data which can be mapped in the OA app (hopefully all). This data needs to be enriched with the UUID and other reference values to have them ready for transformation into JSON.

In Leipzig we created six source files for:

  • Journals
  • People
  • Publication requests, Charges and Checklist items (with iterative enrichment of UUIDs for 

The source files from Leipzig look like this:

Leipzig example: Journal source file

A:Title
B:eISSN
C:pISSN
D:OA Status
E:Indexed in DOAJ
Frontiers in Aerospace Engineering
2813-2831
goldno

Leipzig example: People

A:givenNames
B:familyName
C:mainEmail
D:institutionLevel1Id
Lukas
Müller
xxx@uni-leipzig.de
8aaa80aa85c4472e0185e331db060010

Above data must exist to import a Publication request.

Leipzig example: Publication requests


This table include all data about Publication requests, Charges and Checklist items, including enriched reference values.
But it does not yet include BI:chargeOwnerId, which is the UUID of the Request itself.
A:requestStatus
B:requestDate
C:externalId1
D:externalId2
E:correspondingAuthor:partyOwner:mainEmail
F:correspondingAuthorPartyOwnerId
G:correspondingAuthorRole
requestContact: partyOwner: mainEmail
I:requestContactPartyOwnerId
J:requestContactRole
correspondingInstitutionLevel1
L:correspondingInstitutionLevel1Id
M:correspondingInstitutionLevel2
N:doi
O:licenseId
P:publicationTitle
Q:publicationTypeId
publisher
S:publisherId
work: issn: electronic
U:workId
V:workOAStatusId
workIndexedInDOAJ: value
X:workIndexedInDOAJId
Y:publicationStatus1Id
Z:statusDate1
AA:statusNote1
AB:publicationStatus2Id
AC:statusDate2
AD:statusNote2
AE:publicationStatus3Id
AF:statusDate3
AG:statusNote3
AH:publicationStatus4Id
AI:statusDate4
AJ:checklist1DefinitionId
AK:checklist1OutcomeValue
AL:checklist2DefinitionId
AM:checklist2OutcomeValue
AN:charge1CategoryId
AO:charge1StatusId
AP:charge1AmountValue
AQ:charge1AmountBaseCurrency
AR:charge1Tax
AS:charge1PayersPayer1Id
AT:charge1PayersPayer1Amount
AU:charge1PayersPayer1Note
AV:charge1PayersPayer2Id
AW:charge1PayersPayer2Amount
AX:charge1PayersPayer2Note
AY:charge2CategoryId
AZ:charge2StatusId
BA:charge2AmountValue
BB:charge2AmountBaseCurrency
BC:charge2Tax
BD:charge2PayersPayer1Id
BE:charge2PayersPayer1Amount
BF:charge2PayersPayer2Id
BG:charge2PayersPayer2Amount
BH:requestNumber
BI:chargeOwnerId
open
2023-01-03
OAP-2023-190

1234@uni-leipzig.de
5c3052b8-6bba-49b7-bbd3-ed6938c7aea3
corresponding_author
1234@uni-leipzig.de
5c3052b8-6bba-49b7-bbd3-ed6938c7aea3
request_contact
Wirtschaftswissenschaften
8aaa80aa85c4472e0185e331fd270011
Institut für Service und Relationship Management
10.3390/su15032272
8aaa809385efeeed0185f3644cb5000d
Motivational Facets of Sustainable Entrepreneurship: A Systematic Review and Future Research Agenda
8aaa80fe859fe8ee0185a0ea5e06003e
Multidisciplinary Digital Publishing Institute (MDPI)
8aaa844885f4499b01862b148293000f
2071-1050
1d6571a4-de0b-422a-b8f4-1751f95be5e5
8aaa80fe859fe8ee0185a0ea5d9b0024
yes
8aaa80fe859fe8ee0185a0ea5d8e0021
8aaa809385efeeed0185f2b8b09f0007
2023-01-01




8aaa809385efeeed0185f2b835e50004
2023-01-26
Rechnung liegt vor, Splitt beantragt/26.1.
8aaa809385efeeed0185f2b85e5e0005
2023-01-25
39adb789-3e95-4590-80df-860e0ae540ca
yes
da97441a-a990-4320-b3a5-56782dfa3ed7
yes
8aaa80fe859fe8ee0185a0ea5d81001e
8aaa80fe859fe8ee0185a0ea5d70001a
2001
EUR
19
8aaa809385efeeed0185f37391370015
2000

8aaa809385efeeed0185f374547e0017
381.19
Dr. XXX XXX; XXX XXX , XXX, Norwegen; Rechnung liegt vor, Splitt beantragt/26.1.2023









OAM-173


From the column F:correspondingAuthorPartyOwnerId it can be seen that the data of the persons must first exist, the UUIDs must be extracted from the Folio system and enriched in the source file. Extracting the UUIDs can also be done with a direct database query. Here it is important that we have a matching point with the source data, in this case the email address. The enrichment can be done with the Excel function VLOOKUP, in German SVERWEIS. 

Such lookup tabels may be necessary for columns U:workId and BI:chargeOwnerId as well.

Leipzig rexample: lookup table People eMail<->UUID


p_main_email
p_id
1234@uni-leipzig.de
5c3052b8-6bba-49b7-bbd3-ed6938c7aea3

identifier
work_id
2071-1050
1d6571a4-de0b-422a-b8f4-1751f95be5e5

pr_request_number
pr_id
OAM-173
a5b1e770-bc56-4b93-ba91-e95d46d3b42c


Above data must exist to import a Charge, since Charges are linked to Requests. Therefore, we use the same source file enriched with column BI:chargeOwnerId (last column):

Leipzig example: Same source file like Publication requests but now including BI:chargeOwnerId


A:requestStatus
B:requestDate
C:externalId1
D:externalId2
E:correspondingAuthor:partyOwner:mainEmail
F:correspondingAuthorPartyOwnerId
G:correspondingAuthorRole
requestContact: partyOwner: mainEmail
I:requestContactPartyOwnerId
J:requestContactRole
correspondingInstitutionLevel1
L:correspondingInstitutionLevel1Id
M:correspondingInstitutionLevel2
N:doi
O:licenseId
P:publicationTitle
Q:publicationTypeId
publisher
S:publisherId
work: issn: electronic
U:workId
V:workOAStatusId
workIndexedInDOAJ: value
X:workIndexedInDOAJId
Y:publicationStatus1Id
Z:statusDate1
AA:statusNote1
AB:publicationStatus2Id
AC:statusDate2
AD:statusNote2
AE:publicationStatus3Id
AF:statusDate3
AG:statusNote3
AH:publicationStatus4Id
AI:statusDate4
AJ:checklist1DefinitionId
AK:checklist1OutcomeValue
AL:checklist2DefinitionId
AM:checklist2OutcomeValue
AN:charge1CategoryId
AO:charge1StatusId
AP:charge1AmountValue
AQ:charge1AmountBaseCurrency
AR:charge1Tax
AS:charge1PayersPayer1Id
AT:charge1PayersPayer1Amount
AU:charge1PayersPayer1Note
AV:charge1PayersPayer2Id
AW:charge1PayersPayer2Amount
AX:charge1PayersPayer2Note
AY:charge2CategoryId
AZ:charge2StatusId
BA:charge2AmountValue
BB:charge2AmountBaseCurrency
BC:charge2Tax
BD:charge2PayersPayer1Id
BE:charge2PayersPayer1Amount
BF:charge2PayersPayer2Id
BG:charge2PayersPayer2Amount
BH:requestNumber
BI:chargeOwnerId
open
2023-01-03
OAP-2023-190

1234@uni-leipzig.de
5c3052b8-6bba-49b7-bbd3-ed6938c7aea3
corresponding_author
1234@uni-leipzig.de
5c3052b8-6bba-49b7-bbd3-ed6938c7aea3
request_contact
Wirtschaftswissenschaften
8aaa80aa85c4472e0185e331fd270011
Institut für Service und Relationship Management
10.3390/su15032272
8aaa809385efeeed0185f3644cb5000d
Motivational Facets of Sustainable Entrepreneurship: A Systematic Review and Future Research Agenda
8aaa80fe859fe8ee0185a0ea5e06003e
Multidisciplinary Digital Publishing Institute (MDPI)
8aaa844885f4499b01862b148293000f
2071-1050
1d6571a4-de0b-422a-b8f4-1751f95be5e5
8aaa80fe859fe8ee0185a0ea5d9b0024
yes
8aaa80fe859fe8ee0185a0ea5d8e0021
8aaa809385efeeed0185f2b8b09f0007
2023-01-01




8aaa809385efeeed0185f2b835e50004
2023-01-26
Rechnung liegt vor, Splitt beantragt/26.1.
8aaa809385efeeed0185f2b85e5e0005
2023-01-25
39adb789-3e95-4590-80df-860e0ae540ca
yes
da97441a-a990-4320-b3a5-56782dfa3ed7
yes
8aaa80fe859fe8ee0185a0ea5d81001e
8aaa80fe859fe8ee0185a0ea5d70001a
2001
EUR
19
8aaa809385efeeed0185f37391370015
2000

8aaa809385efeeed0185f374547e0017
381.19
Dr. XXX XXX; XXX XXX , XXX, Norwegen; Rechnung liegt vor, Splitt beantragt/26.1.2023









OAM-173
a5b1e770-bc56-4b93-ba91-e95d46d3b42c

This applies also to Checklist items, since Checklist items are 'updated' information of the actual request.

OA data schema

The following are JSON files (schemas) and how the data correlate with the Leipzig Excel/CSV source file. 

Leipzig example: Journal JSON mapping

{
  "title": "SPALTE-A",
  "instances": [
    {
      "ids": [
        {
          "ns": "issn",
          "id": "SPALTE-B" 
        }
      ],
      "subType": "electronic" 
    },
    {
      "ids": [
        {
          "ns": "issn",
          "id": "SPALTE-C" 
        }
      ],
      "subType": "print" 
    }
  ],
  "indexedInDOAJ": "SPALTE-E",
  "oaStatus": "SPALTE-D",
  "type": "serial" 
}

Leipzig example: People JSON mapping

to do

Leipzig example: Request JSON mapping

{
    "publicationStatuses": [
        {
            "publicationStatus": {
                "id": "Y:publicationStatus1Id" 
            },
            "statusDate": "Z:statusDate1",
            "statusNote": "AA:statusNote1" 
        },
        {
            "publicationStatus": {
                "id": "AB:publicationStatus2Id" 
            },
            "statusDate": "AC:statusDate2",
            "statusNote": "AD:statusNote2" 
        },
        {
            "publicationStatus": {
                "id": "AE:publicationStatus3Id" 
            },
            "statusDate": "AF:statusDate3",
            "statusNote": "AG:statusNote3" 
        },
        {
            "publicationStatus": {
                "id": "AH:publicationStatus4Id" 
            },
            "statusDate": "AI:statusDate4" 
        }
    ],
    "fundings": [],
    "requestStatus": "A:requestStatus",
    "closureReason": null,
    "requestDate": "B:requestDate",
    "externalRequestIds": [
        {
            "externalId": "C:externalId1" 
        },
        {
            "externalId": "D:externalId2" 
        }
    ],
    "correspondingAuthor": {
        "partyOwner": {
            "id": "F:correspondingAuthorPartyOwnerId",
            "alternateEmails": [],
            "institutionLevel1": {
                "id": "L:correspondingInstitutionLevel1Id" 
            }
        },
        "role": "G:correspondingAuthorRole" 
    },
    "correspondingInstitutionLevel1": {
        "id": "L:correspondingInstitutionLevel1Id" 
    },
    "useCorrespondingAuthor": false,
    "requestContact": {
        "partyOwner": {
            "id": "I:requestContactPartyOwnerId",
            "alternateEmails": []
        },
        "role": "J:requestContactRole" 
    },
    "correspondingInstitutionLevel2": "M:correspondingInstitutionLevel2",
    "doi": "N:doi",
    "license": {
        "id": "O:licenseId" 
    },
    "publicationTitle": "P:publicationTitle",
    "publicationType": {
        "id": "Q:publicationTypeId" 
    },
    "publisher": {
        "id": "S:publisherId" 
    },
    "work": {
        "id": "U:workId" 
    },
    "workIndexedInDOAJ": {
        "id": "X:workIndexedInDOAJId" 
    },
    "workOAStatus": {
        "id": "V:workOAStatusId" 
    },
    "requestNumber": "BH:requestNumber",
    "subtype": null,
    "bookDateOfPublication": null,
    "bookPlaceOfPublication": null,
    "agreement": null
}

Charge1 JSON mapping

{
    "tax": "AR:charge1Tax",
    "discountType": {
        "id": "8aaa80fe859fe8ee0185a0ea5d5e0017" 
    },
    "exchangeRate": {
        "coefficient": 1,                               --->>> type:number
        "toCurrency": "EUR" 
    },
    "amount": {
        "baseCurrency": "AQ:charge1AmountBaseCurrency",
        "value": "AP:charge1AmountValue" 
    },
    "category": {
        "id": "AN:charge1CategoryId" 
    },
    "chargeStatus": {
        "id": "AO:charge1StatusId" 
    },
    "description": "OA-Migration",
    "discount": 0,                                     --->>> type:number
    "discountNote": "",
    "paymentPeriod": "2023",
    "payers": [{
        "payerAmount": AT:charge1PayersPayer1Amount,   --->>> type:number
        "payer": {
            "id": "AS:charge1PayersPayer1Id" 
        },
        "payerNote": "AU:charge1PayersPayer1Note" 
    }, {
        "payerAmount": AW:charge1PayersPayer2Amount,   --->>> type:number
        "payer": {
            "id": "AV:charge1PayersPayer2Id" 
        },
        "payerNote": "AX:charge1PayersPayer2Note" 
    }],
    "owner": {
        "id": "BI:chargeOwnerId" 
    }
}

Checklist item JSON mapping

{
    "id": "BI:chargeOwnerId",
    "checklist": [
        {
            "outcome": {
                "value": "AK:checklist1OutcomeValue" 
            },
            "notes": [],
            "definition": {
                "id": "AJ:checklist1DefinitionId" 
            }
        },
        {
            "outcome": {
                "value": "AM:checklist2OutcomeValue" 
            },
            "notes": [],
            "definition": {
                "id": "AL:checklist2DefinitionId" 
            }
        }
    ]
}

All we need is JSON conversion


Where to POST, PUT, DELETE?

All available OA API endpoints and methods are listed here: https://github.com/folio-org/mod-oa/blob/master/service/src/main/okapi/ModuleDescriptor-template.json

Mainly these might be of interest:

APIUse
GET {{OKAPI}}/oa/refdata?max=100to get all reference data values and UUIDs
POST {{OKAPI}}/oa/works/citationto post journal data
GET {{OKAPI}}/oa/worksto get journal data
GET {{OKAPI}}/oa/works/{{UUID}}to get journal data based on a specific journal UUID
POST {{OKAPI}}/oa/partyto post people data
GET {{OKAPI}}/oa/partyto get people data
GET {{OKAPI}}/oa/party/{{UUID}}to get people data based on a specific person UUID
POST {{OKAPI}}/oa/publicationRequestto post publication requests
GET {{OKAPI}}/oa/publicationRequestto get publication requests
PUT {{OKAPI}}/oa/publicationRequestto update requests, i.e. to import checklist items
GET {{OKAPI}}/oa/publicationRequest/{{UUID}}to get a publication request based on a specific request UUID
POST {{OKAPI}}/oa/chargesto post charges

The actual import can be done e.g. with Postman (Runner function) or ask your local IT, systems librarian or library network.

How to know and learn what the API needs?

The Folio frontend also uses the API (OKAPI) to create, update, and delete records. Using the Firefox Developer Tools / Network Analysis, the corresponding API calls from the frontend to OKAPI can be viewed, i.e. API method, endpoint and the actual payload (JSON). This can help to understand how to prepare corresponding imports for migration.