Validation Service

Summary

Goal: When a user creates or updates a record, more robust MARC validation should take place. The feature defines default validation rules and allows each library to set custom rules as a tenant-level setting. The "Q" release will focus on defining the technical design approach.

To be considered:

  • The tech design should not be MARC-centric: it should be flexible enough to assign a set of validation rules per record type and format.

  • The tech design should also allow any flow that creates or updates MARC records to opt in to these validation rules.

  • This feature will, however, focus on MARC. Additional features must be created to support other formats.

  • The technical design will focus on MARC bibliographic and MARC authority records.

Requirements

Functional Requirements

MARC validation - Logic

Non-functional Requirements

  1. Reusability - validation should be reusable from both data import (DI) and the UI

  2. Extensibility - support for different formats

  3. Performance - validation should not significantly impact the DI process

  4. Configurability - tenants should be able to configure additional validation rules

Solution Options

| Option | Status | Description | Pros | Cons |
|---|---|---|---|---|
| 1. Extend Existing Validation mechanisms | | | | Hard-to-reuse mechanism |
| 2. Create Validation Library | | | | Persistence of the validation results requires additional implementation in a particular module |
| 3. Create Validation Module | Target | | Allows extensibility: the module can be extended with validations for additional formats (e.g. BIBFRAME) | Inter-service communication lag |

Target Architecture

Option 2. Create Validation Library

The target architecture aims to provide a loosely coupled solution that different FOLIO modules can reuse, giving them a common approach to validation logic. It requires the following actions:

  • Extract validation logic into a library

  • Provide the capability to configure the set of rules through external configuration, and extend configuration by user-defined rules
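As an illustration of the two actions above, the following is a minimal sketch of how a library could combine preconfigured rules with user-defined ones. All names (`RuleSetSketch`, `Rule`, `Finding`, `merge`, `validate`) are hypothetical and do not come from an existing FOLIO library. The sketch also assumes that a tenant rule replaces a default rule for the same tag, which this page does not specify.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names come from an existing FOLIO library.
final class RuleSetSketch {

  enum Severity { ERROR, WARNING }

  // A rule targets one tag; "required" is the only check modeled here.
  record Rule(String tag, boolean required, Severity severity) {}

  record Finding(String tag, Severity severity, String message) {}

  // Assumption of this sketch: a tenant rule replaces the default rule for the same tag.
  static List<Rule> merge(List<Rule> defaults, List<Rule> tenantRules) {
    Map<String, Rule> byTag = new LinkedHashMap<>();
    for (Rule rule : defaults) {
      byTag.put(rule.tag(), rule);
    }
    for (Rule rule : tenantRules) {
      byTag.put(rule.tag(), rule);
    }
    return new ArrayList<>(byTag.values());
  }

  // Reports every required tag that is absent from the record.
  static List<Finding> validate(List<String> tagsInRecord, List<Rule> rules) {
    List<Finding> findings = new ArrayList<>();
    for (Rule rule : rules) {
      if (rule.required() && !tagsInRecord.contains(rule.tag())) {
        findings.add(new Finding(rule.tag(), rule.severity(),
            "Field " + rule.tag() + " is required"));
      }
    }
    return findings;
  }

  public static void main(String[] args) {
    List<Rule> defaults = List.of(
        new Rule("001", true, Severity.ERROR),
        new Rule("245", true, Severity.ERROR));
    List<Rule> tenant = List.of(
        new Rule("245", true, Severity.WARNING),
        new Rule("999", true, Severity.ERROR));
    List<Rule> effective = merge(defaults, tenant);
    validate(List.of("001", "100"), effective).forEach(System.out::println);
  }
}
```

Because the merge happens inside the library, each consuming module would only need to supply the tenant rules and the record to validate.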

Component diagram

UI Sequence Diagram

For UI communication, validation results should be persisted in the database only when the MARC record is saved. This means that when validation errors are returned, there is no need to persist the validation results.

 

Option 3. Create Validation Module

The target architecture aims to provide a loosely coupled solution that different FOLIO modules can reuse and that can be extended with other bibliographic formats such as BIBFRAME. It requires the following actions:

  • Extract validation logic into a new module

  • Provide the capability to configure the set of rules through external configuration, and extend configuration by user-defined rules

  • Provide an asynchronous communication channel for the data-import module to reduce the performance impact on the import process

Component Diagram

The target solution aims to provide an independent validation mechanism for different formats. The reasons for implementing validation as an independent module are as follows:

  1. Independent scalability: The data import procedure might put a significant load on the validation mechanism and thereby affect validation for UI interactions. A separate module can be scaled independently to absorb that load.

  2. Context boundaries: The new module supports different types of configurable validations for multiple data formats. Its synchronous and asynchronous APIs allow other modules to reuse these capabilities without affecting the MARC-specific functionality used in mod-quick-marc.



UI Sequence Diagram

For UI communication, validation results should be persisted in the database only when the MARC record is saved. This means that when validation errors are returned, there is no need to persist the validation results.
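The persistence rule above can be pictured with a minimal sketch. It assumes that a result of type ERROR blocks the save while a WARNING does not; this page implies that behavior but does not state it. The class and method names are hypothetical, and an in-memory list stands in for the database.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the UI save flow; the names are illustrative only.
final class UiSaveFlowSketch {

  enum ResultType { ERROR, WARNING }

  record ValidationResult(ResultType type, String message) {}

  // In-memory stand-in for the table that stores validation results.
  private final List<ValidationResult> persistedResults = new ArrayList<>();

  // Returns true when the record was saved; results are persisted only in that case.
  boolean save(List<ValidationResult> results) {
    boolean hasErrors = results.stream().anyMatch(r -> r.type() == ResultType.ERROR);
    if (hasErrors) {
      // Assumption: errors go back to the UI and nothing is written.
      return false;
    }
    // Only warnings can reach this point, and they are stored with the save.
    persistedResults.addAll(results);
    return true;
  }

  List<ValidationResult> persistedResults() {
    return List.copyOf(persistedResults);
  }

  public static void main(String[] args) {
    UiSaveFlowSketch flow = new UiSaveFlowSketch();
    ValidationResult error = new ValidationResult(ResultType.ERROR, "Field 001 is required");
    ValidationResult warning = new ValidationResult(ResultType.WARNING, "Field 999 is missing");
    System.out.println(flow.save(List.of(error, warning)));
    System.out.println(flow.save(List.of(warning)));
    System.out.println(flow.persistedResults());
  }
}
```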



DI Sequence Diagram

Data-import validation should be done asynchronously through a Kafka topic. Validation results should be persisted together with the document identifier and the data import job identifier.
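The asynchronous flow can be sketched as follows. This is not Kafka client code: an in-memory queue stands in for the topic and a list stands in for the results table, so the sketch only shows the shape of the exchange. All names are hypothetical, and the single leader-length check is borrowed from the POC rules purely as an example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: a queue replaces the Kafka topic, a list replaces the database.
final class AsyncImportSketch {

  record ValidationRequest(UUID jobId, UUID documentId, String leader) {}

  // Each stored result carries both identifiers, as the design requires.
  record StoredResult(UUID jobId, UUID documentId, String resultType, String message) {}

  private final BlockingQueue<ValidationRequest> topic = new LinkedBlockingQueue<>();
  private final List<StoredResult> store = new ArrayList<>();

  // Producer side: data import publishes a request and continues without waiting.
  void publish(ValidationRequest request) {
    topic.add(request);
  }

  // Consumer side: drains the queued requests, validates each, and stores any findings.
  // A real consumer would run in the validation module, not in the caller's thread.
  int consumeAll() {
    int processed = 0;
    ValidationRequest request;
    while ((request = topic.poll()) != null) {
      if (request.leader().length() != 24) {
        store.add(new StoredResult(request.jobId(), request.documentId(),
            "ERROR", "Leader must be 24 characters long"));
      }
      processed++;
    }
    return processed;
  }

  List<StoredResult> resultsForJob(UUID jobId) {
    return store.stream().filter(r -> r.jobId().equals(jobId)).toList();
  }

  public static void main(String[] args) {
    AsyncImportSketch sketch = new AsyncImportSketch();
    UUID jobId = UUID.randomUUID();
    sketch.publish(new ValidationRequest(jobId, UUID.randomUUID(), "0".repeat(24)));
    sketch.publish(new ValidationRequest(jobId, UUID.randomUUID(), "too short"));
    sketch.consumeAll();
    sketch.resultsForJob(jobId).forEach(System.out::println);
  }
}
```

Keying results by job identifier lets the import side collect the findings for a whole job after the consumer has caught up.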

ERD Diagram

The approach persists validation profiles and validation rules in the database. Keeping the rules in the database gives each tenant the flexibility to configure custom rules. The diagram below describes the entity relationships:

  • Tenant - represented in the module as a database schema name.

  • Validation profile - distinguishes the interaction method (calls from other modules versus UI modules). Because these interaction methods are exposed through different interfaces, profiles can be omitted from the database structure.

  • Validation Rule (Marc Rule) - represents preconfigured rules and custom tenant-level rules in the database. Preconfigured rules should be provided as a configuration file in the module's resources or as a DB migration script; custom rules should be persisted through the REST APIs.

  • Validation Result - represents the result of a single validation rule applied to a single object.

Sample DB Schema

 

 

 

[Diagram: sample DB schema (diagram-20240410-055046.png)]

 

create table marc_rules (
  id          uuid primary key,
  source      varchar not null,  -- enum of <<bib, authority>>
  tag         varchar not null,
  type        varchar not null,  -- enum of <<system-level, local, non-local>>
  label       varchar not null,
  repeatable  boolean not null,
  required    boolean not null,
  help_url    varchar,
  validations jsonb,             -- [json] list of applicable validations
  result_type varchar,           -- enum of <<ERROR, WARNING>>
  unique (source, tag, type)
);

create table indicators (
  id        uuid primary key,
  rule_id   uuid not null,
  indicator smallint,
  label     varchar,
  codes     jsonb,               -- [json] map of label-value
  foreign key (rule_id) references marc_rules (id)
);

create table subfields (
  id          uuid primary key,
  rule_id     uuid not null,
  subfield    char,
  label       varchar not null,
  repeatable  boolean not null,
  required    boolean not null,
  help_url    varchar,
  validations jsonb,             -- [json] list of applicable validations
  result_type varchar,           -- enum of <<ERROR, WARNING>>
  codes       jsonb,             -- [json] map of label-value
  foreign key (rule_id) references marc_rules (id)
);

create table validation_results (
  id          uuid primary key,
  rule_id     uuid not null,
  request_id  uuid,              -- external id
  result_type varchar,           -- enum of <<ERROR, WARNING>>
  message     varchar,
  foreign key (rule_id) references marc_rules (id)
);

REST API for tenant-level rules configuration

The REST API domain model should follow the specification approach already used in other ILSs, aligned with FOLIO specifics. Samples:

POC:

The goal of the POC is to understand how custom rules configuration can be supported technically, and which mechanisms would allow validations to be customized and extended for different types of rules and document formats. The results are the following:

  1. Both system-level and custom validations can be represented in JSON (see the example under item 4).

  2. Serialization and deserialization of different types of rules can be achieved with the @JsonTypeInfo and @JsonSubTypes annotations of the FasterXML Jackson library.

  3. POC Source Code:

  4. Validation Rules in JSON format:

{
  "rules": [
    {
      "id": "no-empty-fields",
      "type": "MARC",
      "fieldType": "NON_LOCAL",
      "fieldSelector": "ANY",
      "validations": [
        { "type": "NOT_EMPTY" }
      ]
    },
    {
      "id": "leader-validations",
      "type": "MARC",
      "fieldSelector": "LDR",
      "fieldType": "SYSTEM",
      "validations": [
        { "type": "LENGTH", "length": 24 },
        { "type": "ALLOWED_VALUES", "position": 5, "values": [ "a", "c", "d" ] }
      ]
    },
    {
      "id": "1xx-unique",
      "type": "MARC",
      "fieldSelector": "101",
      "fieldType": "SYSTEM",
      "validations": [
        { "type": "NON_REPEATABLE" }
      ]
    },
    {
      "id": "system-001",
      "type": "MARC",
      "fieldSelector": "001",
      "fieldType": "SYSTEM",
      "validations": [
        { "type": "REQUIRED" },
        { "type": "NON_REPEATABLE" }
      ]
    },
    {
      "id": "system-010",
      "type": "MARC",
      "fieldSelector": "010",
      "fieldType": "SYSTEM",
      "validations": [
        { "type": "SUBFIELD_EXACTLY_ONE", "subfield": "a" }
      ]
    },
    {
      "id": "custom",
      "type": "MARC",
      "fieldSelector": "999",
      "fieldType": "LOCAL",
      "validations": [
        { "type": "REQUIRED", "dependsOn": { "tag": "010", "subfield": "a" } }
      ]
    }
  ]
}
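To show how rules of this shape might be evaluated, here is a minimal sketch in plain Java. It is not the POC source code and does not use Jackson: the rules are built by hand, with one class per "type" value to mirror the subtype mapping described in item 2. All names are hypothetical. The sketch assumes that "position" is zero-based, as MARC leader positions are, and it covers only four of the validation types; the "ANY" selector, subfields, and "dependsOn" are left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: one class per validation "type"; not the POC implementation.
final class RuleEvaluationSketch {

  // Simplified field: a tag plus its raw content (indicators and subfields are not modeled).
  record Field(String tag, String value) {}

  interface Validation {
    // Returns an error message, or null when the check passes.
    String check(String selector, List<Field> matching);
  }

  record Required() implements Validation {
    public String check(String selector, List<Field> matching) {
      return matching.isEmpty() ? selector + " is required" : null;
    }
  }

  record NonRepeatable() implements Validation {
    public String check(String selector, List<Field> matching) {
      return matching.size() > 1 ? selector + " is not repeatable" : null;
    }
  }

  record Length(int length) implements Validation {
    public String check(String selector, List<Field> matching) {
      for (Field field : matching) {
        if (field.value().length() != length) {
          return selector + " must be " + length + " characters long";
        }
      }
      return null;
    }
  }

  // Assumption: position is zero-based.
  record AllowedValues(int position, List<String> values) implements Validation {
    public String check(String selector, List<Field> matching) {
      for (Field field : matching) {
        boolean tooShort = field.value().length() <= position;
        if (tooShort || !values.contains(String.valueOf(field.value().charAt(position)))) {
          return selector + " position " + position + " must be one of " + values;
        }
      }
      return null;
    }
  }

  record Rule(String id, String fieldSelector, List<Validation> validations) {}

  // Applies every validation of every rule to the fields whose tag equals the selector.
  static List<String> validate(List<Field> fields, List<Rule> rules) {
    List<String> messages = new ArrayList<>();
    for (Rule rule : rules) {
      List<Field> matching = fields.stream()
          .filter(f -> f.tag().equals(rule.fieldSelector()))
          .toList();
      for (Validation validation : rule.validations()) {
        String message = validation.check(rule.fieldSelector(), matching);
        if (message != null) {
          messages.add(rule.id() + ": " + message);
        }
      }
    }
    return messages;
  }

  public static void main(String[] args) {
    List<Rule> rules = List.of(
        new Rule("leader-validations", "LDR",
            List.of(new Length(24), new AllowedValues(5, List.of("a", "c", "d")))),
        new Rule("system-001", "001", List.of(new Required(), new NonRepeatable())));
    // A 24-character leader whose position 5 holds a value outside the allowed set.
    List<Field> fields = List.of(new Field("LDR", "00000x" + "0".repeat(18)));
    validate(fields, rules).forEach(System.out::println);
  }
}
```

Adding a new validation type in this layout means adding one class, which is the same extension point the annotation-based mapping would provide.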

Questions:

  1. How and in what form should validation results be displayed: as a report or in the UI?

    1. UI validation results should be represented as described in https://folio-org.atlassian.net/browse/MODQM-414

  2. Must historical validation data be kept for different versions of the same document?

    1. No. Only the validation results for the latest version should be kept.
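The answer to question 2 amounts to replace-on-write storage. A minimal sketch, assuming results are keyed by document identifier and held in memory rather than in the database (all names are hypothetical):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch: storing results for a document replaces any earlier results.
final class LatestResultsSketch {

  record ValidationResult(String resultType, String message) {}

  private final Map<UUID, List<ValidationResult>> resultsByDocument = new HashMap<>();

  // Overwrites whatever was stored for this document, so no history accumulates.
  void store(UUID documentId, List<ValidationResult> results) {
    resultsByDocument.put(documentId, List.copyOf(results));
  }

  // Returns the results of the most recent validation, or an empty list if none exist.
  List<ValidationResult> latest(UUID documentId) {
    return resultsByDocument.getOrDefault(documentId, List.of());
  }

  public static void main(String[] args) {
    LatestResultsSketch results = new LatestResultsSketch();
    UUID documentId = UUID.randomUUID();
    results.store(documentId, List.of(new ValidationResult("ERROR", "Field 001 is required")));
    results.store(documentId, List.of(new ValidationResult("WARNING", "Field 999 is missing")));
    System.out.println(results.latest(documentId));
  }
}
```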