Source Record Matching

This document illustrates a design for enhancements to matching in Data Import. This design will allow Data Import to support multiple results being returned by a Match Profile during a Data Import job execution.

 

Design

Source Record Storage Matching by REST API

A new endpoint will be introduced to allow matching of source records in mod-source-records-storage.

URL: /source-storage/records/matching

Method: POST

Body:

| Property | Description | Valid Values | Default Value |
| --- | --- | --- | --- |
| logical_operator | Logical operator used to combine the defined filters | "AND", "OR" | "AND" |
| filters → values | Collection of values to match for equality | | |
| filters → field | MARC field to match | "000" - "999" | |
| filters → indicator1 | MARC indicator1 to match | "0" - "9", "a" - "z", "*" | |
| filters → indicator2 | MARC indicator2 to match | "0" - "9", "a" - "z", "*" | |
| filters → subfield | MARC subfield to match | single character | |
| filters → matchType | Match type to consider for the query | EXACTLY_MATCHES, EXISTING_VALUE_CONTAINS_INCOMING_VALUE, INCOMING_VALUE_CONTAINS_EXISTING_VALUE, EXISTING_VALUE_BEGINS_WITH_INCOMING_VALUE, INCOMING_VALUE_BEGINS_WITH_EXISTING_VALUE, EXISTING_VALUE_ENDS_WITH_INCOMING_VALUE, INCOMING_VALUE_ENDS_WITH_EXISTING_VALUE | EXACTLY_MATCHES |
| filters → qualifier | Match qualifier | BEGINS_WITH, ENDS_WITH, CONTAINS | |
| filters → comparisonPartType | Only compare part of a value | NUMERICS_ONLY, ALPHANUMERICS_ONLY | Null |
| limit | Maximum number of records to return | | |
| offset | Starting point when returning records | | |

Sample Request
POST /source-storage/records/matching

```json
{
  "logical_operator": "AND",
  "filters": [
    {
      "values": ["science", "technology"],
      "field": "245",
      "indicator1": "0",
      "subfield": "a",
      "matchType": "EXACTLY_MATCHES",
      "qualifier": "BEGINS_WITH",
      "comparisonPartType": "NUMERICS_ONLY"
    },
    {
      "values": ["biology"],
      "field": "650",
      "subfield": "a",
      "matchType": "EXACTLY_MATCHES"
    }
  ],
  "limit": 5,
  "offset": 0
}
```
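To illustrate the semantics of the request body, here is a minimal sketch of how a single filter's matchType and comparisonPartType could be evaluated against an existing record value, and how filter results combine under logical_operator. The function names and the filter shape mirror the request body above; they are assumptions for illustration, not the actual mod-source-record-storage implementation.

```python
import re

def value_matches(existing, incoming, match_type="EXACTLY_MATCHES",
                  comparison_part_type=None):
    # Reduce both values to the compared part, if requested.
    if comparison_part_type == "NUMERICS_ONLY":
        existing = re.sub(r"[^0-9]", "", existing)
        incoming = re.sub(r"[^0-9]", "", incoming)
    elif comparison_part_type == "ALPHANUMERICS_ONLY":
        existing = re.sub(r"[^0-9A-Za-z]", "", existing)
        incoming = re.sub(r"[^0-9A-Za-z]", "", incoming)

    if match_type == "EXACTLY_MATCHES":
        return existing == incoming
    if match_type == "EXISTING_VALUE_CONTAINS_INCOMING_VALUE":
        return incoming in existing
    if match_type == "INCOMING_VALUE_CONTAINS_EXISTING_VALUE":
        return existing in incoming
    if match_type == "EXISTING_VALUE_BEGINS_WITH_INCOMING_VALUE":
        return existing.startswith(incoming)
    if match_type == "INCOMING_VALUE_BEGINS_WITH_EXISTING_VALUE":
        return incoming.startswith(existing)
    if match_type == "EXISTING_VALUE_ENDS_WITH_INCOMING_VALUE":
        return existing.endswith(incoming)
    if match_type == "INCOMING_VALUE_ENDS_WITH_EXISTING_VALUE":
        return incoming.endswith(existing)
    raise ValueError(f"unknown matchType: {match_type}")

def filter_matches(existing, filter_):
    # "values" is a collection: any one value matching satisfies the filter.
    return any(value_matches(existing, v,
                             filter_.get("matchType", "EXACTLY_MATCHES"),
                             filter_.get("comparisonPartType"))
               for v in filter_["values"])

def combine(filter_results, logical_operator="AND"):
    # Filter results are combined with the request's logical_operator.
    return all(filter_results) if logical_operator == "AND" else any(filter_results)
```

Note that per-filter values combine with OR inside the filter, while the top-level logical_operator combines the filters themselves.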

 

Match Processing

Location

Here is a list of the different Match Profiles possible at the time of writing, and where processing occurs:

| Match Profile | Processing Location |
| --- | --- |
| MARC BIB TO MARC BIB | mod-source-record-storage |
| MARC BIB TO (Instance/Holdings/Item) | mod-inventory |
| MARC AUTH TO MARC AUTH | mod-source-record-storage |
| Static Value TO MARC BIB | mod-source-record-storage |
| Static Value TO MARC AUTH | mod-source-record-storage |
| Static Value TO (Instance/Holdings/Item) | mod-inventory |

The assignments above do not work well when multiple results can be returned by a match profile. This is further complicated by sub-matches that could occur in processing locations other than the originating match profile's processing location. Transporting multiple result objects in one Kafka message is not ideal due to size constraints, and splitting one match profile result across multiple Kafka messages would introduce more complexity. Match profile result retrieval should therefore shift from Kafka messages to REST API calls.
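Shifting result retrieval to REST implies that consumers page through match results using the endpoint's limit and offset parameters rather than reading them from an event payload. A minimal sketch of that paging loop, with the HTTP transport abstracted behind a hypothetical fetch_page callable (the callable and page size are assumptions, not a defined interface):

```python
def iter_match_results(fetch_page, body, page_size=50):
    """Yield all records matching `body`, one REST page at a time.

    fetch_page(body) -> list of records; it performs the POST to
    /source-storage/records/matching (transport is out of scope here).
    """
    offset = 0
    while True:
        page = fetch_page(dict(body, limit=page_size, offset=offset))
        if not page:
            return
        yield from page
        if len(page) < page_size:
            # A short page means there are no further results.
            return
        offset += page_size
```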

Eligible locations for processing will be defined for each match profile. Inventory objects will be supported by REST calls to mod-inventory-storage, while MARC records will be supported by REST calls to mod-source-record-storage.

Location Determination

A group of match profiles begins at the first encountered match profile, continues through its sub-matches if available, and terminates at the match profiles immediately preceding action profiles. For example:

In the job profile diagram provided, there are two types of profiles: match profiles and action profiles. One match profile serves as a sub-match to another. The specific match profiles to be considered are "MARC TO MARC" and "STATIC VALUE TO INSTANCE." These profiles have predefined eligible locations. It needs to be determined whether mod-inventory can process this group. If so, the "MARC TO MARC" match profile will be forwarded to mod-inventory.

When the members of a match profile group cannot all be processed by a single DI module, an error should be thrown before the job profile is saved.
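The location determination described above can be sketched as an intersection of each member profile's eligible-location set. The eligibility sets below are hypothetical values chosen so that the example group resolves to mod-inventory; they are not the real configuration, and the function name and group representation are likewise assumptions.

```python
# Hypothetical eligible processing locations per match profile type.
ELIGIBLE_LOCATIONS = {
    "MARC BIB TO MARC BIB": {"mod-source-record-storage", "mod-inventory"},
    "MARC BIB TO (Instance/Holdings/Item)": {"mod-inventory"},
    "MARC AUTH TO MARC AUTH": {"mod-source-record-storage"},
    "Static Value TO (Instance/Holdings/Item)": {"mod-inventory"},
}

def determine_location(profile_group):
    # A group (a match profile plus its chain of sub-matches) can be
    # forwarded to one module only if the intersection of all members'
    # eligible locations is non-empty; otherwise the job profile is
    # rejected at save time.
    common = set.intersection(*(ELIGIBLE_LOCATIONS[p] for p in profile_group))
    if not common:
        raise ValueError(
            "match profile group cannot be processed by a single DI module")
    return next(iter(common))
```

Under these assumed eligibility sets, a group containing "MARC BIB TO MARC BIB" and "Static Value TO (Instance/Holdings/Item)" intersects to mod-inventory, matching the example above.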