...
Automate linking in quickMARC app
Introduction
This document outlines the design of a backend feature, which will allow users to automatically validate/update/create links for MARC bib fields to an authority record when editing a MARC bib record.
More details on the feature: [UXPROD-3874]; spike story: [MODELINKS-79]
Requirements
Functional Requirements
- The API must allow finding all applicable MARC authorities to control the MARC bib record based on the $0 subfield of the bib's fields.
- The API must provide only suggestions for links and must not save any new data; saving will be performed by the user.
- The API must allow saving the MARC bib record even if linking a MARC bib field to a MARC authority record was unsuccessful.
- The API must allow sending a MARC record for links assignment with already existing links.
- The API must allow sending a MARC record for links assignment with not saved links and changes to the bib record.
- The API must report which fields were successfully linked and which fields that are applicable for linking failed to link.
- The API must provide different types of failures for failed fields.
Non-Functional Requirements
- Automate linking should take no longer than ~2 seconds
Architecture
An API endpoint will be implemented in mod-quick-marc to provide the UI with suggested links for the record. mod-quick-marc will work as a proxy module that calls a newly created mod-entities-links endpoint. All main linking logic will be implemented in mod-entities-links. mod-search will be used as a search service for finding applicable MARC authorities. mod-source-record-storage will be used for fetching the data needed to construct controllable fields in the MARC bib record. All interaction between modules is via HTTP.
Code Block |
---|
title | Component Diagram Source |
---|
collapse | true |
---|
|
@startuml
skinparam componentStyle rectangle
[User Interface]
package "Backend" {
[Okapi] --> [mod-entities-links]
[mod-entities-links] ..> [Okapi]
[Okapi] --> [mod-search]
[mod-search] ..> [Okapi]
[Okapi] --> [mod-quick-marc]
[mod-quick-marc] ..> [Okapi]
[Okapi] --> [mod-source-record-storage]
[mod-source-record-storage] ..> [Okapi]
}
[User Interface] --> [Okapi]
[Okapi] ..> [User Interface]
database "PostgreSql" {
[entities-links]
[source-record-storage]
}
database "OpenSearch/ElasticSearch" {
[authorities]
}
[mod-entities-links] <--> [entities-links]
[mod-source-record-storage] <--> [source-record-storage]
[mod-search] <--> [authorities]
@enduml |
Data Flow and Processing
Code Block |
---|
language | text |
---|
title | Sequence Diagram Source |
---|
collapse | true |
---|
|
@startuml
title QuickMarc Autolinking
participant UI as ui
participant "quick-marc" as qm
participant "entities-links" as el
participant "search" as ms
participant "source-record-storage" as rs
autonumber
ui -> qm ++ : request record\nwith links
qm -> qm: convert
qm -> el ++ : request assign links
el -> el : get linking rules
el -> el : extract $0s
el -> ms ++ : search authorities (without count)
ms --> el -- : authorities
el -> rs ++ : get records by external ids
rs --> el -- : source records
el -> el : set links data
el --> qm -- : record with links
qm -> qm -- : convert
qm --> ui : record with links
@enduml |
- The UI sends a request to the backend API.
- mod-quick-marc receives the request and converts the record into an SRS-like format.
- mod-quick-marc sends a request to the mod-entities-links API.
- mod-entities-links receives the request and fetches the linking rules from the database, using a cache.
- $0 subfield values are extracted from the MARC bib fields that are applicable for linking according to the linking rules.
- The $0 values are used to search for authorities in mod-search. The current mod-search endpoint also calculates the number of already existing links in the instance index; this should be omitted to speed up the process. TBD: the authority naturalId exists in the internal database table 'authority_data'. Should we use this data before doing a search in mod-search?
- mod-entities-links receives a collection of authority records and prepares a request to the mod-source-record-storage.
- mod-entities-links sends a request to the mod-source-record-storage bulk endpoint.
- mod-entities-links receives a collection of authority source records.
- mod-entities-links analyzes the results, prepares link data according to the linking rules, and sets the constructed links into the record.
- mod-quick-marc receives the record with links.
- mod-quick-marc converts the record into the appropriate format.
- UI receives the record with suggested links.
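The $0 extraction step above can be illustrated with a minimal Python sketch. It assumes the SRS-like record shape shown in the mod-entities-links request body; `LINKABLE_TAGS` and `extract_zero_values` are hypothetical names, and the hard-coded tag set only stands in for the real linking rules, which are stored in the database.

```python
from typing import Dict, List, Set

# Hypothetical stand-in for the linking rules: bib tags that may be linked.
LINKABLE_TAGS: Set[str] = {"100", "110", "130", "600"}


def extract_zero_values(record: Dict) -> Dict[str, List[str]]:
    """Collect $0 values per linkable tag from an SRS-like record."""
    found: Dict[str, List[str]] = {}
    for field in record.get("fields", []):
        for tag, body in field.items():
            # Control fields such as 001 carry a plain string, not subfields.
            if tag not in LINKABLE_TAGS or not isinstance(body, dict):
                continue
            for subfield in body.get("subfields", []):
                if "0" in subfield:
                    found.setdefault(tag, []).append(subfield["0"])
    return found


record = {
    "fields": [
        {"001": "393893"},
        {"100": {"ind1": "/", "ind2": "/",
                 "subfields": [{"a": "Mozart, Wolfgang Amadeus,"}, {"0": "12345"}]}},
        {"110": {"ind1": "/", "ind2": "/", "subfields": [{"a": "Mozart"}]}},
    ]
}
zero_values = extract_zero_values(record)
```

In this example only field 100 contributes a value; field 110 is linkable but has no $0, so it would later be a candidate for an "ERROR" status.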
API Design
mod-quick-marc
POST /records-editor/links/suggestion
This endpoint will be used to find and provide UI with valid links for a record. The request will include a JSON payload with the record data:
Code Block |
---|
title | Request body |
---|
collapse | true |
---|
|
{
"marcFormat": "BIBLIOGRAPHIC",
"leader": "01587ccm a2200361 4500",
"fields": [
{
"tag": "001",
"content": "393893"
},
{
"tag": "100",
"content": "$a 393893 $b test $0 n1234567890 $9 312da284-a8fd-4c84-ae90-927539d6df93",
"indicators": [
"1",
"2"
],
"link": {
"authorityId": "312da284-a8fd-4c84-ae90-927539d6df93",
"authorityNaturalId": "n1234567890",
"linkingRuleId": 1,
"status": "ACTUAL"
}
},
{
"tag": "100",
"content": "$a 393893 $b test $0 n1234567890 $9 312da284-a8fd-4c84-ae90-927539d6df93",
"indicators": [
"1",
"2"
],
"link": {
"authorityId": "312da284-a8fd-4c84-ae90-927539d6df93",
"authorityNaturalId": "n1234567890",
"linkingRuleId": 1,
"status": "ERROR"
}
},
{
"tag": "600",
"content": "$a 393893 $b test",
"indicators": [
"1",
"2"
]
}
]
} |
The response will include: suggested links with the status "NEW"; corrected data and the status "ACTUAL" for links that had the status "ERROR"; and links with the status "ERROR" and an error cause code for fields where a link cannot be assigned.
Code Block |
---|
title | Response body |
---|
collapse | true |
---|
|
{
"marcFormat": "BIBLIOGRAPHIC",
"leader": "01587ccm a2200361 4500",
"fields": [
{
"tag": "001",
"content": "393893"
},
{
"tag": "100",
"content": "$a 393893 $b test $0 n1234567890 $9 312da284-a8fd-4c84-ae90-927539d6df93",
"indicators": [
"1",
"2"
],
"link": {
"authorityId": "312da284-a8fd-4c84-ae90-927539d6df93",
"authorityNaturalId": "n1234567890",
"linkingRuleId": 1,
"status": "ACTUAL"
}
},
{
"tag": "110",
"content": "$a 393893 $b updated $0 n1234567890 $9 312da284-a8fd-4c84-ae90-927539d6df93",
"indicators": [
"1",
"2"
],
"link": {
"authorityId": "312da284-a8fd-4c84-ae90-927539d6df93",
"authorityNaturalId": "n1234567890",
"linkingRuleId": 1,
"status": "NEW"
}
},
{
"tag": "600",
"content": "$a 393893 $b test",
"indicators": [
"1",
"2"
],
"link": {
"status": "ERROR",
"errorCauseCode": "101"
}
}
]
} |
Error cause types:
Error cause code | Description |
---|
101 | applicable authority was not found |
102 | 2 or more applicable authorities were found |
103 | auto linking feature is disabled |
TBD |
|
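As an illustration of how the error cause codes could be assigned, here is a minimal Python sketch. The codes come from the table above; the function name, the `auto_linking_enabled` flag, and the shape of the `matches` items (a dict with an `id` key) are assumptions made for this example only, and `linkingRuleId` is omitted for brevity.

```python
from typing import Dict, List

# Codes taken from the table above; the constant names are illustrative.
AUTHORITY_NOT_FOUND = "101"
MULTIPLE_AUTHORITIES_FOUND = "102"
AUTO_LINKING_DISABLED = "103"


def suggest_link(natural_id: str,
                 matches: List[Dict],
                 auto_linking_enabled: bool = True) -> Dict[str, str]:
    """Build the "link" object for one field from its authority matches."""
    if not auto_linking_enabled:
        return {"status": "ERROR", "errorCauseCode": AUTO_LINKING_DISABLED}
    if not matches:
        return {"status": "ERROR", "errorCauseCode": AUTHORITY_NOT_FOUND}
    if len(matches) > 1:
        return {"status": "ERROR", "errorCauseCode": MULTIPLE_AUTHORITIES_FOUND}
    # Exactly one applicable authority: suggest a new link.
    return {
        "authorityId": matches[0]["id"],
        "authorityNaturalId": natural_id,
        "status": "NEW",
    }
```

Only the case with exactly one applicable authority yields a suggestion; every other case leaves the field unlinked and tells the UI why.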
mod-entities-links
POST /links-suggestions/marc
Code Block |
---|
title | Request body |
---|
collapse | true |
---|
|
{
"records": [
{
"fields": [
{
"001": "393893"
},
{
"100": {
"ind1": "/",
"ind2": "/",
"subfields": [
{
"a": "Mozart, Wolfgang Amadeus,"
},
{
"d": "1756-1791."
},
{
"0": "12345"
},
{
"9": "b9a5f035-de63-4e2c-92c2-07240c88b817"
}
],
"linkStatus": "ACTUAL"
}
},
{
"110": {
"ind1": "/",
"ind2": "/",
"subfields": [
{
"a": "Mozart"
}
]
}
}
],
"leader": "01706ccm a2200361 4500"
}
]
} |
The response will include: suggested links with the status "NEW"; corrected data and the status "ACTUAL" for links that had the status "ERROR"; and links with the status "ERROR" and an error cause code for fields where a link cannot be assigned.
Code Block |
---|
title | Response body |
---|
collapse | true |
---|
|
{
"records": [
{
"fields": [
{
"001": "393893"
},
{
"100": {
"ind1": "/",
"ind2": "/",
"subfields": [
{
"a": "Mozart, Wolfgang Amadeus,"
},
{
"d": "1756-1791."
},
{
"0": "12345"
},
{
"9": "b9a5f035-de63-4e2c-92c2-07240c88b817"
}
],
"linkStatus": "ACTUAL"
}
},
{
"110": {
"ind1": "/",
"ind2": "/",
"subfields": [
{
"a": "Mozart"
},
{
"0": "12345"
},
{
"9": "b9a5f035-de63-4e2c-92c2-07240c88b817"
}
],
"linkStatus": "NEW"
}
},
{
"130": {
"ind1": "/",
"ind2": "/",
"subfields": [
{
"a": "Mozart"
}
],
"linkStatus": "ERROR",
"errorStatusCode": "101"
}
}
],
"leader": "01706ccm a2200361 4500"
}
]
} |
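Comparing the mod-quick-marc and mod-entities-links payloads above, the conversion that mod-quick-marc performs before calling mod-entities-links could look roughly like the following Python sketch. The function name and the regular expression are assumptions for illustration; in particular, the sketch assumes that subfield values never contain a literal `$`.

```python
import re
from typing import Dict

# "$a value $b value ..." -> pairs of (subfield code, value)
SUBFIELD_PATTERN = re.compile(r"\$(\w)\s*([^$]*)")


def to_srs_like_field(field: Dict) -> Dict:
    """Convert one quickMARC field into the SRS-like field shape."""
    tag = field["tag"]
    indicators = field.get("indicators")
    if not indicators:
        # Control field: the content is kept as a plain string.
        return {tag: field["content"]}
    subfields = [
        {code: value.strip()}
        for code, value in SUBFIELD_PATTERN.findall(field["content"])
    ]
    body = {"ind1": indicators[0], "ind2": indicators[1], "subfields": subfields}
    link = field.get("link")
    if link:
        body["linkStatus"] = link["status"]
    return {tag: body}


converted = to_srs_like_field({
    "tag": "100",
    "content": "$a 393893 $b test $0 n1234567890 $9 312da284-a8fd-4c84-ae90-927539d6df93",
    "indicators": ["1", "2"],
    "link": {"status": "ACTUAL"},
})
```

The reverse conversion (SRS-like format back to the quickMARC format) would join the subfields into a single content string in the same order.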
mod-source-record-storage
POST /source-storage/batch/parsed-records/fetch
Code Block |
---|
title | Request body |
---|
collapse | true |
---|
|
{
"conditions": {
"ids": [
"312da284-a8fd-4c84-ae90-927539d6df93",
"934fee76-89e5-4046-89f0-d812e5368e1c"
],
"idType": "EXTERNAL"
},
"data": {
"fieldsRange": "010,100-199"
},
"recordType": "MARC_AUTHORITY"
}
|
The response will include the collection of records that match the conditions. Each record will contain all IDs related to it and only the fields covered by the fieldsRange value.
Code Block |
---|
title | Response body |
---|
collapse | true |
---|
|
{
"records": [
{
"id": "c56b70ce-4ef6-47ef-8bc3-c470bafa0b8c",
"externalIdsHolder": {
"authorityId": "b9a5f035-de63-4e2c-92c2-07240c89b817"
},
"recordType": "MARC_AUTHORITY",
"recordState": "ACTUAL",
"parsedRecord": {
"id": "c9db5d7a-e1d4-11e8-9f32-f2801f1b9fd1",
"content": {
"fields": [
{
"010": {
"ind1": " ",
"ind2": " ",
"subfields": [
{
"a": "2001000234"
}
]
}
},
{
"100": {
"ind1": "/",
"ind2": "/",
"subfields": [
{
"a": "Mozart, Wolfgang Amadeus"
},
{
"d": "1756-1791"
}
]
}
},
{
"110": {
"ind1": "1",
"ind2": "0",
"subfields": [
{
"a": "Works"
}
]
}
}
],
"leader": "01706ccm a2200361 4500"
}
}
}
],
"totalRecords": 1
}
|
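To make the fieldsRange semantics concrete, here is a minimal Python sketch of how a value such as "010,100-199" could be parsed and applied to a parsed record's content. It illustrates the filtering idea only and is not the mod-source-record-storage implementation; both function names are hypothetical, and non-numeric tags are simply dropped.

```python
from typing import Dict, List, Tuple


def parse_fields_range(fields_range: str) -> List[Tuple[int, int]]:
    """Turn "010,100-199" into inclusive numeric ranges: [(10, 10), (100, 199)]."""
    ranges = []
    for part in fields_range.split(","):
        start, _, end = part.strip().partition("-")
        ranges.append((int(start), int(end or start)))
    return ranges


def retain_fields(content: Dict, fields_range: str) -> Dict:
    """Return a copy of the record content with only the fields in range."""
    ranges = parse_fields_range(fields_range)
    kept = [
        field for field in content["fields"]
        if any(tag.isdigit() and low <= int(tag) <= high
               for tag in field
               for low, high in ranges)
    ]
    return {**content, "fields": kept}


content = {
    "fields": [{"001": "393893"}, {"010": {}}, {"100": {}}, {"400": {}}],
    "leader": "01706ccm a2200361 4500",
}
filtered = retain_fields(content, "010,100-199")
```

With the sample range, fields 010 and 100 are kept, fields 001 and 400 are dropped, and the leader is left untouched.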
mod-search
GET /search/authorities
New query parameter to add:
Parameter | Type | Note |
---|
includeNumberOfTitles | boolean (default = true) | If false, the search for the number of linked instances is not performed |
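A request that uses the new parameter might be built as in the Python sketch below. The `naturalId` query field, the CQL-style syntax, and the `limit` parameter are assumptions for illustration and should be checked against the actual mod-search API; only the path and `includeNumberOfTitles` come from this document.

```python
from typing import List
from urllib.parse import urlencode


def build_authority_search_url(natural_ids: List[str], limit: int = 100) -> str:
    """Build a relative URL that searches authorities by several $0 values at once."""
    quoted = " or ".join(f'"{natural_id}"' for natural_id in natural_ids)
    params = {
        "query": f"naturalId==({quoted})",  # assumed query syntax
        "limit": limit,                     # assumed paging parameter
        "includeNumberOfTitles": "false",   # skip counting linked instances
    }
    return "/search/authorities?" + urlencode(params)


url = build_authority_search_url(["n1234567890", "n79021425"])
```

Sending all $0 values in a single query keeps the number of HTTP calls to mod-search at one per suggestion request.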
Performance
Considerations
- Using mod-search to search by naturalId, instead of searching directly in mod-source-record-storage, is expected to decrease response time when the system holds more than 1M records. (mod-search will also be needed for possible future requirements to automate linking not only by $0 but by other data.)
- Disabling linked-instance counting for the mod-search authority request is expected to decrease response time.
- Returning only the required fields in the mod-source-record-storage response will decrease the amount of data transferred via HTTP. Whether this is necessary should be tested, since such processing may itself decrease performance. There are two options: get the record as jsonb from the marc_records table and retain only the needed fields, or construct the record field by field from the marc_indexers partitioned table.
- Using the mod-search and mod-source-record-storage bulk endpoints will decrease response time.
Testing
Performance testing has to be done in an environment with:
- > 1M authority records
- > 1M MARC-based instance records
- Prepared MARC bib records with more than 50 fields applicable for linking, each with a $0 value that matches an authority existing in the system.
Tests are needed for:
- 1 request/sec
- 10 requests/sec
- 100 requests/sec
- 1000 requests/sec
Info |
---|
Based on the testing results, performance improvements may be suggested if required. |