Automate linking in quickMARC app
Introduction
This document outlines the design of a backend feature that allows users to automatically validate, update, or create links between MARC bib fields and authority records when editing a MARC bib record.
More details on the feature: [UXPROD-3874]; spike story: [MODELINKS-79]
Requirements
Functional Requirements
- The API must allow finding all applicable MARC authorities to control the MARC bib record based on the $0 subfield of the bib's fields.
- The API must only provide suggestions for links and must not save any new data; saving will be performed by the user.
- The API must allow saving the MARC bib record even if linking a MARC bib field to a MARC authority record was unsuccessful.
- The API must allow sending a MARC record that already has existing links for link assignment.
- The API must allow sending a MARC record with unsaved links and unsaved changes to the bib record for link assignment.
- The API must report which fields were successfully linked and which fields that are applicable for linking failed to link.
- The API must report the type of failure for each field that failed to link.
Non-Functional Requirements
- Automated linking should take no longer than ~2 seconds.
Architecture
An API endpoint will be implemented in mod-quick-marc to provide the UI with suggested links for the record. mod-quick-marc will work as a proxy module that calls a newly created mod-entities-links endpoint. All main linking logic will be implemented in mod-entities-links. mod-search will be used as a search service for finding applicable MARC authorities. mod-source-record-storage will be used for fetching the data needed to construct controllable fields in the MARC bib record. All interaction between modules is via HTTP.
Data Flow and Processing
- The UI sends a request to the backend API.
- mod-quick-marc receives the request and converts the record into an SRS-like format.
- mod-quick-marc sends a request to the mod-entities-links API.
- mod-entities-links receives the request and fetches the linking rules from the database, using a cache.
- From MARC bib fields that are applicable for linking according to linking rules, $0 subfield values are extracted.
- $0 values are used to search for authorities in mod-search. The current mod-search endpoint also calculates the number of already existing links in the instance index; this should be omitted to speed up the process. TBD: authority naturalId values already exist in the internal database table 'authority_data'. Should we use this data before doing a search in mod-search?
- mod-entities-links receives a collection of authority records and prepares a request to mod-source-record-storage.
- mod-entities-links sends a request to the mod-source-record-storage bulk endpoint.
- mod-entities-links receives a collection of authority source records.
- mod-entities-links analyzes the results, prepares data for links according to the linking rules, and sets the constructed links in the record.
- mod-quick-marc receives the record with links.
- mod-quick-marc converts the record into the appropriate format.
- UI receives the record with suggested links.
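The $0 extraction step in the flow above can be sketched as follows. This is a minimal illustration, not the module's implementation: it assumes the record is in a MARC-in-JSON layout (a `fields` list of single-key objects), and it represents the linking rules as a plain set of bib tags. The function name, the sample record, and the sample identifiers are hypothetical.

```python
from typing import List, Set, Tuple


def extract_subfield_zero_values(record: dict, linkable_tags: Set[str]) -> List[Tuple[str, str]]:
    """Return (tag, $0 value) pairs for fields that the linking rules allow.

    Assumption: each entry in record["fields"] is a single-key dict such as
    {"100": {"ind1": "1", "ind2": " ", "subfields": [{"a": "..."}, {"0": "..."}]}}.
    Control fields (for example {"001": "..."}) hold a plain string and are skipped.
    """
    values: List[Tuple[str, str]] = []
    for field in record.get("fields", []):
        for tag, content in field.items():
            if tag not in linkable_tags or not isinstance(content, dict):
                continue
            for subfield in content.get("subfields", []):
                if "0" in subfield:
                    values.append((tag, subfield["0"]))
    return values


# Hypothetical sample data, used only to exercise the sketch.
sample_record = {
    "fields": [
        {"001": "in00000000001"},
        {"100": {"ind1": "1", "ind2": " ", "subfields": [{"a": "Twain, Mark"}, {"0": "n0000001"}]}},
        {"245": {"ind1": "1", "ind2": "0", "subfields": [{"a": "A title"}]}},
        {"650": {"ind1": " ", "ind2": "0", "subfields": [{"a": "Rivers"}, {"0": "sh0000002"}]}},
    ]
}
sample_rules = {"100", "650"}  # stands in for the tags defined by the linking rules

zero_values = extract_subfield_zero_values(sample_record, sample_rules)
```

The collected values would then be sent to mod-search in a single request rather than one request per field.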
API Design
mod-quick-marc
POST /records-editor/links/suggestion
This endpoint will be used to find valid links for a record and provide them to the UI. The request will include a JSON payload with the record data.
The response will include: suggested links with the status "NEW"; corrected data and the status "ACTUAL" for links that had the status "ERROR"; and links with the status "ERROR" and a cause type for fields where a link cannot be assigned.
Error cause types:
Error cause code | Description |
---|---|
101 | applicable authority was not found |
102 | 2 or more applicable authorities were found |
TBD |
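
One possible way to map the search result for a single field to a status and an error cause code from the table above is sketched below. The `LinkSuggestion` type and the `suggest_link` function are hypothetical names. The sketch covers only the "NEW" and "ERROR" outcomes; handling of already existing links (status "ACTUAL") is left out.

```python
from dataclasses import dataclass
from typing import List, Optional

NOT_FOUND = "101"       # applicable authority was not found
MULTIPLE_FOUND = "102"  # 2 or more applicable authorities were found


@dataclass
class LinkSuggestion:
    """Hypothetical result for one bib field."""
    status: str
    authority_id: Optional[str] = None
    error_cause: Optional[str] = None


def suggest_link(matching_authority_ids: List[str]) -> LinkSuggestion:
    """Decide the outcome from the authorities that matched a field's $0 value."""
    if not matching_authority_ids:
        return LinkSuggestion(status="ERROR", error_cause=NOT_FOUND)
    if len(matching_authority_ids) > 1:
        return LinkSuggestion(status="ERROR", error_cause=MULTIPLE_FOUND)
    return LinkSuggestion(status="NEW", authority_id=matching_authority_ids[0])
```

Exactly one match is the only case that produces a suggestion; both failure cases leave the field unlinked so that the record can still be saved.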
mod-entities-links
POST /links-suggestions/marc
The response will include: suggested links with the status "NEW"; corrected data and the status "ACTUAL" for links that had the status "ERROR"; and links with the status "ERROR" and a cause type for fields where a link cannot be assigned.
mod-source-record-storage
POST /source-storage/batch/parsed-records/fetch
The response will include the collection of records that match the conditions. Each record will contain all IDs related to the record and only the fields that are included in the fieldsRange field.
mod-search
GET /search/authorities
New query parameter to add:
Parameter | Type | Note |
---|---|---|
includeNumberOfTitles | boolean (default = true) | If false, the search for the number of linked instances is not performed |
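
A request that skips the linked-instances count might be built as sketched below. The sketch assumes that passing `includeNumberOfTitles=false` disables the count, as the parameter name suggests, and that the search is expressed as a CQL expression on `naturalId` in the `query` parameter. The exact query syntax and the function name are assumptions, not confirmed details of the endpoint.

```python
from typing import List
from urllib.parse import urlencode


def build_authority_search_url(natural_ids: List[str]) -> str:
    """Build a relative URL that searches authorities by several naturalId values at once.

    Assumption: the CQL form naturalId==("a" or "b") is accepted by the endpoint.
    """
    quoted = " or ".join(f'"{natural_id}"' for natural_id in natural_ids)
    params = {
        "query": f"naturalId==({quoted})",
        "includeNumberOfTitles": "false",  # skip counting linked instances
    }
    return "/search/authorities?" + urlencode(params)


search_url = build_authority_search_url(["n0000001", "sh0000002"])
```

Sending all $0 values in one query keeps the number of HTTP calls per linking request constant, regardless of how many fields are applicable for linking.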
Performance
Considerations
- Using mod-search to search by naturalId, instead of searching directly in mod-source-record-storage, should decrease response time when the system holds more than 1M records. (mod-search will also be needed for possible future requirements to automate linking not only by $0 but by other data.)
- Disabling the linked instances count in the mod-search authority request should decrease response time.
- Returning only the required fields in the mod-source-record-storage response will decrease the size of the data transferred via HTTP. Whether this is worthwhile should be tested, because the extra processing may itself decrease performance. There are 2 options: get the record as jsonb from the marc_records table and retain only the needed fields, or construct the record field-by-field from the marc_indexers partitioned table.
- Using mod-search and mod-source-record-storage bulk endpoints will decrease response time.
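The first option for trimming the mod-source-record-storage response (load the whole record and retain only the needed fields) can be illustrated with the sketch below. It assumes a MARC-in-JSON record layout with a `fields` list of single-key objects; the function name and sample data are hypothetical, and in the real module this filtering could equally be done in SQL.

```python
from typing import Set


def retain_fields(record: dict, needed_tags: Set[str]) -> dict:
    """Return a copy of the record that keeps only fields whose tag is needed."""
    kept = [
        field
        for field in record.get("fields", [])
        if any(tag in needed_tags for tag in field)
    ]
    # Other top-level keys (for example the leader) are preserved unchanged.
    return {**record, "fields": kept}


# Hypothetical authority record used only to exercise the sketch.
authority_record = {
    "leader": "00000nz  a2200000n  4500",
    "fields": [
        {"001": "n0000001"},
        {"100": {"ind1": "1", "ind2": " ", "subfields": [{"a": "Twain, Mark"}]}},
        {"670": {"ind1": " ", "ind2": " ", "subfields": [{"a": "Source note"}]}},
    ],
}

trimmed_record = retain_fields(authority_record, {"100"})
```

The input record is not modified, so the same parsed record could be trimmed differently for different callers.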
Testing
Performance testing has to be done in an environment with:
- > 1M authority records
- > 1M MARC-based instance records
- Prepared MARC bib records that have >50 fields applicable for linking, where all of these fields have $0 values that match authorities existing in the system.
Tests are needed for:
- 1 request/sec
- 10 requests/sec
- 100 requests/sec
- 1000 requests/sec