Authority API Moving

Overview

The goal of this document is to present the benefits of moving the Authority API from the mod-inventory-storage module to the mod-entities-links module.

NFRs:

  1. Data integrity: Centralized storage for authority data should remove possible inconsistencies between microservices.
  2. Maintainability: Implementing features for handling and managing authorities should not require the overhead of cross-checking between mod-inventory-storage and mod-entities-links.
  3. Performance: Removal of HTTP requests and Kafka interaction overhead should improve the performance of linking and import.

Baseline Architecture

Interactions between the mod-entities-links and mod-inventory-storage modules:

  1. Event Consumption: The mod-entities-links module consumes events from the mod-inventory-storage module when authority records are updated or deleted. This ensures that the links stay synchronized with any changes happening in the inventory module.

  2. Statistical Authority Data Storage: The mod-entities-links module stores authority statistics data by copying the subset of fields used for linking, creating and updating links, and automated linking. As a result, when most authorities are linked, the module maintains an almost full copy of the authority data in the inventory database.

  3. Integration with mod-search: The mod-entities-links module calls the mod-search module to retrieve authority data for suggesting links in the automated linking feature.
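The event consumption and partial-copy behaviour described above can be illustrated with a minimal sketch. It is not the module's actual code: the event shape (`type`, `id`, `new`), the copied field names, and the `LinkStore` class are all hypothetical and chosen only to show why the local copy has to be kept in sync.

```python
from dataclasses import dataclass, field


@dataclass
class LinkStore:
    """In-memory stand-in for the partial authority copy (hypothetical)."""
    authorities: dict = field(default_factory=dict)  # authority id -> copied fields
    links: dict = field(default_factory=dict)        # authority id -> set of instance ids

    def handle_event(self, event: dict) -> None:
        """Apply one authority event (assumed shape) to the local copy."""
        authority_id = event["id"]
        if event["type"] == "UPDATE":
            # Copy only the fields needed for linking, not the whole record.
            self.authorities[authority_id] = {
                "naturalId": event["new"]["naturalId"],
                "heading": event["new"]["heading"],
            }
        elif event["type"] == "DELETE":
            # A deleted authority can no longer be the target of any link.
            self.authorities.pop(authority_id, None)
            self.links.pop(authority_id, None)


store = LinkStore()
store.links["a1"] = {"instance-1"}
store.handle_event({"type": "UPDATE", "id": "a1",
                    "new": {"naturalId": "n123", "heading": "Example heading"}})
store.handle_event({"type": "DELETE", "id": "a1"})
```

Every field added to the copy has to be mirrored in the event handler, which is the synchronization overhead the target architecture aims to remove.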

Target Architecture

1. Required

  1. Rename mod-entities-links to mod-authority-manager.
  2. Fully move the Authority API, Authority Note Types API, and Authority Source Files API from mod-inventory-storage to mod-authority-manager. These APIs provide only CRUD operations and do not contain any business logic.
  3. Move authority reindex API.
  4. Adjust mod-authority-manager to use an internal database instead of interacting with mod-inventory-storage and mod-search.
  5. Disable the above APIs in mod-inventory-storage, remove their implementation there, and enable them in mod-authority-manager. The dependent UI and BE modules will not experience any differences.
  6. Create a migration script for existing authorities.
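As a rough illustration of step 6, the sketch below copies authority rows from one table to another in batches and can be re-run safely. Everything in it is an assumption made for the example: the table names `legacy_authority` and `authority`, the `payload` column, and the use of SQLite so the snippet is self-contained. The real modules use their own schemas, and the actual migration may look quite different.

```python
import sqlite3


def migrate_authorities(conn: sqlite3.Connection, batch_size: int = 1000) -> int:
    """Copy rows from a hypothetical legacy table into a hypothetical new table.

    Returns the number of rows actually inserted. Rows that already exist in
    the target are skipped, so the function is safe to re-run.
    """
    changes_before = conn.total_changes
    last_id = ""
    while True:
        # Keyset pagination: fetch the next batch ordered by id.
        rows = conn.execute(
            "SELECT id, payload FROM legacy_authority "
            "WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            break
        conn.executemany(
            "INSERT OR IGNORE INTO authority (id, payload) VALUES (?, ?)", rows
        )
        conn.commit()
        last_id = rows[-1][0]
    return conn.total_changes - changes_before
```

Committing per batch keeps transactions small, and skipping existing rows means an interrupted run can simply be restarted.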

Estimation

Point | Estimation (SPs)
1.1   | 1
1.2   | 5
1.3   | 3
1.4   | 5
1.5   | 2
1.6   | 5
Total | 21

2. Optional/Future

  1. Consume data-import authority events to increase the performance of authority data-import flow. (8 SPs)
  2. Simplify authority statistics generation.
  3. Move mapping rules.

Benefits

Moving the Authority API from the mod-inventory-storage module to the mod-entities-links module can bring several benefits, particularly in terms of reducing dependencies, minimizing interactions, and eliminating duplication of authorities. The arguments in favor of the move are:

  1. Dependency Reduction: By moving the Authority API to the mod-entities-links module, the number of dependencies required by the mod-inventory-storage module can be reduced. This streamlining of dependencies can lead to a more modular and efficient architecture.

  2. Minimizing Interactions: Currently, mod-entities-links and mod-inventory-storage modules may have significant interactions related to authorities. This can result in increased complexity and potential performance bottlenecks. By consolidating the Authority API within the mod-entities-links module, the number of interactions between the modules can be minimized, leading to improved performance and better code maintenance.

  3. Eliminating Duplication: Having authorities stored in both the mod-entities-links and mod-inventory-storage databases can lead to data redundancy and potential synchronization issues. Merging the Authority API into the mod-entities-links module allows for a single source of truth for authority-related data. This eliminates the need for duplicating authorities and ensures data consistency across the system.

  4. Simplified Development and Maintenance: Moving the Authority API to the mod-entities-links module can simplify the development and maintenance process. Developers will have a clearer understanding of where to find and update authority-related functionality, leading to improved code maintainability and reduced development efforts.

  5. Improved Scalability and Extensibility: With a more streamlined architecture and reduced dependencies, the mod-entities-links module can become more scalable and extensible. The separation of concerns and elimination of duplication allows for easier integration of new features and enhancements, facilitating future system growth.

  6. Enhanced Data Integrity: By centralizing authority-related operations within the mod-entities-links module, stricter data integrity checks and validations can be enforced. This ensures that authority data remains consistent and accurate throughout the system.

  7. New functionality: Moving the Authority API from the mod-inventory-storage module to the mod-entities-links module will enable effective management of links and provide new capabilities, such as generating blind authority reports and creating a facet in the MARC Authority app for linked and not-linked authorities.

  8. Timeliness: It is essential to start the migration of the Authority API now, before further authority functionality is implemented. As the system evolves, additional features increase complexity and the potential for conflicts. Moving the Authority API at this stage establishes a solid foundation and ensures that future authority-related functionality aligns with the consolidated approach. This saves time and effort by avoiding rework and compatibility issues later.

Overall, moving the Authority API from the mod-inventory-storage module to the mod-entities-links module offers the advantages of reduced dependencies, minimized interactions, elimination of duplication, simplified development and maintenance, improved scalability, and enhanced data integrity. These benefits contribute to a more efficient, maintainable, and robust system architecture. 
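To make benefit 7 more concrete: once authorities and links live in the same module, a linked / not-linked facet reduces to a simple membership check. The sketch below is only an illustration; the function name, the facet keys, and the `(instance_id, authority_id)` link shape are hypothetical.

```python
def linked_facet(authority_ids, links):
    """Count linked and not-linked authorities.

    authority_ids: iterable of authority ids.
    links: iterable of (instance_id, authority_id) pairs (assumed shape).
    """
    linked_ids = {authority_id for _, authority_id in links}
    counts = {"linked": 0, "notLinked": 0}
    for authority_id in authority_ids:
        counts["linked" if authority_id in linked_ids else "notLinked"] += 1
    return counts


print(linked_facet(["a1", "a2", "a3"], [("i1", "a1"), ("i2", "a1")]))
# prints {'linked': 1, 'notLinked': 2}
```

With the data split across two modules, the same count would require either a cross-module call or a synchronized copy.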

Open development questions:

  1. How to handle permissions?
  2. How to handle Poppy release migrations? 


Questions for POs


Area | Question | Answer
Duplicate Identifier

Do we want to implement authority validations that prevent saving an authority record if a similar authority already exists in the system, based on either the identifier (naturalId or 001/010a) or the heading?

KG - Not for this initial implementation. There is some logic we need to support for LOC related to 010 always having 12 characters. Also, based on looking at some of the National Library of Poland authority records, we might have a situation where the 001/010a is the same as LOC. We need to do more authority file analysis before implementation.

MM: We have spoken with NUKAT (Poland) and suggested that they move fields 010 to 035 because of FOLIO rules. So far we have no feedback from them. If it were my system, I would prevent saving a record with the same content in fields 001/010a, but not prevent saving a record with the same heading.

KG: Marcin Mystkowski, I agree, but I think we need to do some analysis of the libraries that have already loaded authority records into FOLIO. Also, we will need to do this for LOC, as it is a requirement to prevent duplicates. So Pavlo Smahin - we want to do it, but I think we need more requirements analysis. Is it okay to implement this requirement as a phase 2 so we have time to define requirements?

Multiple Headings and Types

Do we want to enforce a validation rule that restricts saving an authority record if it contains multiple headings of the same type (e.g., several personal names) or multiple types (e.g., a combination of personal name and geographic title)?

KG: Yes. We need a rule that the authority record can only have one 1XX. I thought this rule was already in place. Or did I misunderstand the question?   Pavlo Smahin, before we implement, let me check on handling a few of the 1XX we do not support now.  

NOTE - We will support more than the 1XX values we support today. I have received feedback that some customers have authority records whereby 1XX is not on the list outlined in MARC authority documentation: 100, 110, 111, 130, 147, 148, 150, 151, 155, 162, 180, 181, 182, and 185  (https://www.loc.gov/marc/authority/ad1xx3xx.html)

MM: Yes. See UXPROD-4375, point 5.2.
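The "only one 1XX" rule from the answers above could be checked along the following lines. This is a sketch under assumptions: the record is represented as a list of `(tag, value)` pairs, the function names are hypothetical, and treating a record with no 1XX field as invalid is an assumption not stated in the answers. The check deliberately accepts any 1XX tag rather than a fixed list, since the note above says more 1XX values will be supported.

```python
def heading_fields(fields):
    """Return the 1XX fields of a record given as (tag, value) pairs."""
    return [(tag, value) for tag, value in fields
            if len(tag) == 3 and tag.startswith("1")]


def validate_single_heading(fields):
    """Return a list of error messages; an empty list means the record passes."""
    headings = heading_fields(fields)
    if len(headings) == 1:
        return []
    if not headings:
        # Assumption: a record without any 1XX field is also rejected.
        return ["record has no 1XX heading field"]
    tags = ", ".join(tag for tag, _ in headings)
    return [f"record has {len(headings)} 1XX heading fields ({tags}); "
            "exactly one is allowed"]


record = [("001", "n123"), ("100", "Smith, John"), ("151", "Paris")]
print(validate_single_heading(record))
```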

Tracing Field Consistency

Do we want to implement a validation that ensures the "see from" and "see also from" tracing fields accurately reflect the heading? For example, if the heading is a personal name, should the tracing field be a meeting name?

KG: So the question is: should we apply a validation rule that if the heading is 100, then the 4XX must be 400 and the 5XX must be 500? No. See examples:

https://lccn.loc.gov/no2007000953

https://lccn.loc.gov/no2019024399

MM: We did it in our legacy system, and after a few months we had to give up on this validation. So no. Check https://lccn.loc.gov/n79043402

NOTE - When we allow for creating a local authority via UI or support creating local authority records via DI then we should consider tenant level MARC validation rules related to this question. 

Duplicate Headings

Do we want to prevent saving an authority record if a similar heading with the same heading type already exists in the system?

For example, having 2 records that have "Apple Inc." in 110 field.

KG: 

Pavlo Smahin  - can you provide an example? 

Hey Pavlo Smahin - Allow the save. Eventually we will have a duplicate headings report that allows the cataloger to make the correction if necessary.
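A duplicate headings report of the kind mentioned in the answer could be built by grouping authorities on heading type and heading. The sketch below is illustrative only: the field names (`id`, `headingType`, `heading`) and the normalization used to decide that two headings are "the same" (collapsing whitespace and ignoring case) are assumptions, and the real matching rules would need to be defined with the POs.

```python
from collections import defaultdict


def duplicate_headings_report(authorities):
    """Group authority ids by (heading type, normalized heading).

    Only groups with more than one authority are returned. Saving is never
    blocked; the report is meant for catalogers to review afterwards.
    """
    groups = defaultdict(list)
    for authority in authorities:
        # Assumed normalization: collapse whitespace and ignore case.
        normalized = " ".join(authority["heading"].split()).casefold()
        groups[(authority["headingType"], normalized)].append(authority["id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


authorities = [
    {"id": "a1", "headingType": "corporateName", "heading": "Apple Inc."},
    {"id": "a2", "headingType": "corporateName", "heading": "apple  inc."},
    {"id": "a3", "headingType": "personalName", "heading": "Apple Inc."},
]
print(duplicate_headings_report(authorities))
# prints {('corporateName', 'apple inc.'): ['a1', 'a2']}
```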

Duplicate Tracing Fields

Do we want to address duplicates in search results by cleaning up duplicate tracing fields? For instance, if multiple 400 fields with the same values exist in a MARC record, should the search results remove the duplicates?

KG: No. The example seems very edge case. I cannot imagine this will happen very often. We can always support this as a tenant-level MARC validation rule check and/or a report that allows catalogers to correct these issues.

MM: If the record has the same 4XX with the same content, it looks like a mistake. If we remove duplicates, the librarian won't be able to see the mistake and easily correct the data.
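In line with both answers, duplicates can be reported rather than removed, so the cataloger still sees the mistake. The following is a minimal sketch assuming a record is a list of `(tag, value)` pairs; the function name and the exact-match comparison are hypothetical choices for illustration.

```python
from collections import Counter


def duplicate_tracing_fields(fields, prefix="4"):
    """Return (tag, value) pairs that occur more than once among the tracing
    fields whose tag starts with `prefix`. The record itself is not modified.
    """
    counts = Counter(
        (tag, value) for tag, value in fields if tag.startswith(prefix)
    )
    return sorted(key for key, count in counts.items() if count > 1)


record = [
    ("100", "Smith, John"),
    ("400", "Smith, J."),
    ("400", "Smith, J."),
    ("400", "Smith, Johnny"),
    ("500", "Smith, J."),
]
print(duplicate_tracing_fields(record))
# prints [('400', 'Smith, J.')]
```

Passing `prefix="5"` would run the same check on "see also from" fields.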