Validation Service
Summary
Goal: When a user creates or updates a record, more robust MARC validation should be applied. The feature intent is to define default validation rules and to allow the library to set custom rules as a tenant-level setting. The "Q" release will focus on defining the technical design approach.
To be considered:
The tech design should not be MARC-centric: it should be flexible enough to assign a set of validation rules per record type and format.
The tech design should also allow any flow that creates or updates MARC records to opt in to using these validation rules.
This feature will, however, focus on MARC. Additional features must be created to support other formats.
Technical design will focus on MARC bibliographic and MARC authority records.
Requirements
Functional Requirements
Prod tickets:
UXPROD-3940: MARC21 record validation support - Phase 1 (Closed)
UXPROD-4549: MARC21 record validation support - Phase 2 - Implementation (In Review)
Arch tickets:
Non-functional Requirements
NFR Scorecard: UXPROD-4549 NFR Scorecard
Reusability:
validation should be reusable by DI or UI
Extensibility:
support for different formats
Performance:
UI validation should support at least 30 simultaneous users in both ECS and non-ECS environments
should not impact DI process significantly
Configurability:
tenants should be able to configure additional validation rules
Solution Options
# | Option | Status | Description | Pros | Cons
---|---|---|---|---|---
1 | Extend Existing Validation mechanisms | | | |
2 | Create Validation Library | | | |
3 | Create Validation Module | Target | | |
Target Architecture
Option 2. Create Validation Library
The target architecture aims to provide a loosely coupled solution that can be reused by different FOLIO modules and that offers a common approach to validation logic. It requires the following actions:
Extract validation logic into a library
Provide the capability to configure the set of rules through external configuration, and extend configuration by user-defined rules
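As an illustration of the two actions above, here is a minimal sketch of what a library entry point could look like. All type names, the format/record-type key, and both sample rules are assumptions made for this example; they are not taken from an existing FOLIO library.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical names throughout; none of these types come from an existing FOLIO library.
enum RecordType { BIBLIOGRAPHIC, AUTHORITY }

/** Simplified record: format label, record type, and field tag -> values. */
record SourceRecord(String format, RecordType type, Map<String, List<String>> fields) {}

/** A single rule returns an error message, or null when the record passes. */
interface ValidationRule {
    String check(SourceRecord rec);
}

/** Holds default and tenant-defined rules, keyed by format + record type. */
class RuleRegistry {
    private final Map<String, List<ValidationRule>> rules = new HashMap<>();

    private static String key(String format, RecordType type) {
        return format + "/" + type;
    }

    void register(String format, RecordType type, ValidationRule rule) {
        rules.computeIfAbsent(key(format, type), k -> new ArrayList<>()).add(rule);
    }

    /** Runs every rule registered for the record's format and type, in registration order. */
    List<String> validate(SourceRecord rec) {
        List<String> errors = new ArrayList<>();
        for (ValidationRule rule : rules.getOrDefault(key(rec.format(), rec.type()), List.of())) {
            String error = rule.check(rec);
            if (error != null) {
                errors.add(error);
            }
        }
        return errors;
    }
}

class ValidationLibrarySketch {
    /** Builds a registry with one default rule and one tenant-level custom rule. */
    static RuleRegistry sampleRegistry() {
        RuleRegistry registry = new RuleRegistry();
        // Assumed default rule: a MARC bibliographic record needs a 245 field.
        registry.register("MARC", RecordType.BIBLIOGRAPHIC,
            r -> r.fields().containsKey("245") ? null : "Field 245 is required");
        // Assumed tenant rule: field 001 must not repeat.
        registry.register("MARC", RecordType.BIBLIOGRAPHIC,
            r -> r.fields().getOrDefault("001", List.of()).size() > 1
                ? "Field 001 is not repeatable" : null);
        return registry;
    }

    public static void main(String[] args) {
        SourceRecord rec = new SourceRecord("MARC", RecordType.BIBLIOGRAPHIC,
            Map.of("001", List.of("in001", "in002")));
        System.out.println(sampleRegistry().validate(rec));
    }
}
```

Keying the registry by format and record type keeps the lookup independent of MARC, so rules for another format could be registered without changing the validation loop.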
Component diagram
UI Sequence Diagram
For UI interactions, validation results should be persisted in the database only when the MARC document is saved. This means that when validation errors are returned, there is no need to persist the validation results.
Option 3. Create Validation Module
The target architecture aims to provide a loosely coupled solution that can be reused by different FOLIO modules and that can be extended with other bibliographic formats such as BIBFRAME. It requires the following actions:
Extract validation logic into a new module
Provide the capability to configure the set of rules through external configuration, and extend configuration by user-defined rules
Provide an asynchronous way of communication for the data-import module to reduce the performance impact on the procedure
Component Diagram
The target solution aims to provide an independent mechanism for validating different formats. The reasons for implementing the validation mechanism as an independent module are as follows:
Independent scalability: The data import procedure might create a significant load on the validation mechanism and thus affect validation functionality for UI interactions.
Context boundaries: The new module provides the capability to support different types of configurable validations for multiple data formats. The synchronous and asynchronous APIs would allow other modules to reuse the module capabilities without affecting MARC-specific functionality that is used in mod-quick-marc.
UI Sequence Diagram
For UI interactions, validation results should be persisted in the database only when the MARC document is saved. This means that when validation errors are returned, there is no need to persist the validation results.
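The persistence rule for the UI flow can be sketched as follows. This is only an illustration under stated assumptions: the in-memory lists stand in for the database, and `UiRuleResult`, `UiSaveFlowSketch`, and `validateAndSave` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical result of applying one rule to the edited document. */
record UiRuleResult(String ruleName, boolean passed) {}

class UiSaveFlowSketch {
    // In-memory lists standing in for database tables (assumption for this sketch).
    final List<String> savedDocuments = new ArrayList<>();
    final List<UiRuleResult> savedResults = new ArrayList<>();

    /**
     * Validates the document and returns the results to the caller.
     * The document and its results are persisted only when every rule passed;
     * on errors nothing is written and the UI just displays the returned results.
     */
    List<UiRuleResult> validateAndSave(String document,
                                       Function<String, List<UiRuleResult>> validator) {
        List<UiRuleResult> results = validator.apply(document);
        boolean hasErrors = results.stream().anyMatch(r -> !r.passed());
        if (!hasErrors) {
            savedDocuments.add(document);
            savedResults.addAll(results);
        }
        return results;
    }

    public static void main(String[] args) {
        UiSaveFlowSketch flow = new UiSaveFlowSketch();
        // Assumed rule for the demo: the document text must not be blank.
        Function<String, List<UiRuleResult>> validator =
            doc -> List.of(new UiRuleResult("not-blank", !doc.isBlank()));
        flow.validateAndSave(" ", validator);
        flow.validateAndSave("245 $a Title", validator);
        System.out.println(flow.savedDocuments.size() + " document(s) saved");
    }
}
```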
DI Sequence Diagram
Data-import validation should be performed asynchronously through a Kafka topic. Validation results should be persisted together with the document identifier and the data import job identifier.
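The decoupling described here can be sketched without a broker. In the example below an in-memory queue stands in for the Kafka topic and a map stands in for the results table; the consumer is invoked explicitly rather than running on its own thread. The message fields, the key format, and the placeholder check are assumptions made for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

/** Request published by data import; field names are assumptions. */
record DiValidationRequest(String jobId, String documentId, String content) {}

/** Result stored together with the document and job identifiers. */
record DiValidationOutcome(String jobId, String documentId, List<String> errors) {}

class AsyncValidationSketch {
    // In-memory queue standing in for the Kafka topic.
    final BlockingQueue<DiValidationRequest> topic = new LinkedBlockingQueue<>();
    // In-memory map standing in for the results table, keyed by jobId:documentId.
    final Map<String, DiValidationOutcome> store = new ConcurrentHashMap<>();

    /** Producer side: data import publishes the request and continues without waiting. */
    void publish(DiValidationRequest request) {
        topic.add(request);
    }

    /** Consumer side: drains pending requests, validates them, and persists the results. */
    int consumePending() {
        int processed = 0;
        DiValidationRequest request;
        while ((request = topic.poll()) != null) {
            // Placeholder check; a real consumer would apply the configured rules.
            List<String> errors = request.content().isBlank()
                ? List.of("Record content is empty") : List.of();
            store.put(request.jobId() + ":" + request.documentId(),
                new DiValidationOutcome(request.jobId(), request.documentId(), errors));
            processed++;
        }
        return processed;
    }

    public static void main(String[] args) {
        AsyncValidationSketch sketch = new AsyncValidationSketch();
        sketch.publish(new DiValidationRequest("job-1", "doc-1", "245 $a Title"));
        sketch.publish(new DiValidationRequest("job-1", "doc-2", " "));
        System.out.println("processed: " + sketch.consumePending());
        System.out.println(sketch.store.get("job-1:doc-2"));
    }
}
```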
ERD Diagram
The approach is aimed at persisting validation profiles and validation rules in the database. Persisting validation rules in the database allows flexibility to configure custom rules by a tenant. The diagram below describes the entity relationships:
Tenant - is represented in the module as a database schema name
Validation profile - distinguishes the interaction method used by other modules or UI modules. Because these methods are represented by different interfaces, profiles can be omitted from the database structure.
Validation Rule (Marc Rule) - represents preconfigured rules and custom tenant-level rules for validations in the database. Preconfigured rules should be provided as a configuration file in the resources of the module or as a DB migration script; custom rules should be persisted through the REST APIs.
Validation Result - represents a result of validation for a single validation rule applied to a single object.
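To make the rule/result relationship concrete, here is a minimal sketch of the two persisted entities as Java records. The column names and the "required tag" rule shape are assumptions for this example; they do not reflect the actual schema.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical rule row; "custom" separates tenant-defined rules from preconfigured ones. */
record RuleRow(String id, String requiredTag, boolean custom) {}

/** Hypothetical result row: one rule applied to one object. */
record ResultRow(String ruleId, String objectId, boolean passed) {}

class ErdSketch {
    /** Produces exactly one result row per rule for the given object. */
    static List<ResultRow> apply(List<RuleRow> rules, String objectId, Set<String> presentTags) {
        List<ResultRow> results = new ArrayList<>();
        for (RuleRow rule : rules) {
            boolean passed = presentTags.contains(rule.requiredTag());
            results.add(new ResultRow(rule.id(), objectId, passed));
        }
        return results;
    }

    public static void main(String[] args) {
        List<RuleRow> rules = List.of(
            new RuleRow("r1", "245", false),  // preconfigured rule
            new RuleRow("r2", "999", true));  // tenant-level custom rule
        apply(rules, "doc-1", Set.of("001", "245")).forEach(System.out::println);
    }
}
```

The tenant does not appear as a column because, as noted above, it is represented by the database schema name.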
Sample DB Schema
REST API for tenant-level rules configuration
The REST API domain model should follow the specification approach already used in other ILSs, aligned with FOLIO specifics. Samples:
https://format.gbv.de/schema/avram/specification#record-validation
MARC21 structure in JSON - Metadata Quality Assessment Framework
POC:
The goal of the POC is to understand how to technically allow custom rules configuration and which mechanisms would allow customization and extension of validations for different types of rules and document formats. The results are as follows:
Format for system-level and custom validations can be represented in JSON format. Below is an example:
Serialization and deserialization of different types of rules can be achieved with the JsonTypeInfo and JsonSubTypes annotations of the FasterXML Jackson library.
POC Source Code:
Validation Rules in JSON format:
Questions:
How and in what form should validation results be displayed? As a report or in the UI?
UI validation results should be represented as described in https://folio-org.atlassian.net/browse/MODQM-414
Is it required to save historical data for validations of different versions of the same document?
No. Only the latest version validation results should be present