Source Record Matching

This document describes a design for enhancing matching in Data Import. The design allows a Match Profile to return multiple results during a Data Import job execution.


Design

Source Record Storage Matching by REST API

A new endpoint will be introduced to allow matching of source records in mod-source-record-storage.

URL: /source-storage/records/matching

Method: POST

Body:

Property | Description | Valid Values | Default Value
--- | --- | --- | ---
logical_operator | Logical operator that will be used to combine defined filters | "AND", "OR" | "AND"
filters → values | Collection of values to match for equality | |
filters → field | MARC field to match | "000" - "999" |
filters → indicator1 | MARC indicator1 to match | "0" - "9", "a" - "z", "*" |
filters → indicator2 | MARC indicator2 to match | "0" - "9", "a" - "z", "*" |
filters → subfield | MARC subfield to match | single character |
filters → matchType | Match type to consider for the query | EXACTLY_MATCHES, EXISTING_VALUE_CONTAINS_INCOMING_VALUE, INCOMING_VALUE_CONTAINS_EXISTING_VALUE, EXISTING_VALUE_BEGINS_WITH_INCOMING_VALUE, INCOMING_VALUE_BEGINS_WITH_EXISTING_VALUE, EXISTING_VALUE_ENDS_WITH_INCOMING_VALUE, INCOMING_VALUE_ENDS_WITH_EXISTING_VALUE | EXACTLY_MATCHES
filters → qualifier | Match qualifier | BEGINS_WITH, ENDS_WITH, CONTAINS |
filters → comparisonPartType | Only compare part of a value | NUMERICS_ONLY, ALPHANUMERICS_ONLY | Null
limit | Maximum number of records to return | |
offset | Starting point when returning records | |
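
The matchType, comparisonPartType and logical_operator values listed above can be read as simple string comparisons. The following Python sketch shows one plausible interpretation of those names. It is not the module's implementation: the function names (normalize, value_matches, combine) are hypothetical, the comparison is assumed to be case-sensitive, and the qualifier property is left out because its interaction with matchType is not described here.

```python
import re


def normalize(value, comparison_part_type=None):
    """Keep only the part of the value named by comparisonPartType (assumed meaning)."""
    if comparison_part_type == "NUMERICS_ONLY":
        return re.sub(r"[^0-9]", "", value)
    if comparison_part_type == "ALPHANUMERICS_ONLY":
        return re.sub(r"[^0-9A-Za-z]", "", value)
    return value


def value_matches(existing, incoming, match_type="EXACTLY_MATCHES",
                  comparison_part_type=None):
    """Compare a stored (existing) value with an incoming value."""
    e = normalize(existing, comparison_part_type)
    i = normalize(incoming, comparison_part_type)
    checks = {
        "EXACTLY_MATCHES": lambda: e == i,
        "EXISTING_VALUE_CONTAINS_INCOMING_VALUE": lambda: i in e,
        "INCOMING_VALUE_CONTAINS_EXISTING_VALUE": lambda: e in i,
        "EXISTING_VALUE_BEGINS_WITH_INCOMING_VALUE": lambda: e.startswith(i),
        "INCOMING_VALUE_BEGINS_WITH_EXISTING_VALUE": lambda: i.startswith(e),
        "EXISTING_VALUE_ENDS_WITH_INCOMING_VALUE": lambda: e.endswith(i),
        "INCOMING_VALUE_ENDS_WITH_EXISTING_VALUE": lambda: i.endswith(e),
    }
    if match_type not in checks:
        raise ValueError(f"unknown matchType: {match_type}")
    return checks[match_type]()


def combine(filter_results, logical_operator="AND"):
    """Combine per-filter outcomes with the request's logical_operator."""
    if logical_operator == "AND":
        return all(filter_results)
    if logical_operator == "OR":
        return any(filter_results)
    raise ValueError(f"unknown logical_operator: {logical_operator}")


# The stored value begins with the incoming value, so this is True.
begins = value_matches("science fiction", "science",
                       "EXISTING_VALUE_BEGINS_WITH_INCOMING_VALUE")
```

With NUMERICS_ONLY, values such as "ocm00123" and "(OCoLC)00123" would both reduce to "00123" before comparison under this reading.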


Sample Request
POST /source-storage/records/matching

{
  "logical_operator": "AND",
  "filters": [
    {
      "values": ["science", "technology"],
      "field": "245",
      "indicator1": "0",
      "subfield": "a",
      "matchType": "EXACTLY_MATCHES",
      "qualifier": "BEGINS_WITH",
      "comparisonPartType": "NUMERICS_ONLY"
    },
    {
      "values": ["biology"],
      "field": "650",
      "subfield": "a",
      "matchType": "EXACTLY_MATCHES"
    }
  ],
  "limit": 5,
  "offset": 0
}
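
A caller may want to check a request body against the valid values listed for each property before sending it. The Python sketch below is one way to do that. It is an illustration under assumptions: validate_request is a hypothetical helper, not part of any FOLIO module, and it only checks properties that are present, because this document does not state which properties are required. limit and offset are not checked.

```python
import re

MATCH_TYPES = {
    "EXACTLY_MATCHES",
    "EXISTING_VALUE_CONTAINS_INCOMING_VALUE",
    "INCOMING_VALUE_CONTAINS_EXISTING_VALUE",
    "EXISTING_VALUE_BEGINS_WITH_INCOMING_VALUE",
    "INCOMING_VALUE_BEGINS_WITH_EXISTING_VALUE",
    "EXISTING_VALUE_ENDS_WITH_INCOMING_VALUE",
    "INCOMING_VALUE_ENDS_WITH_EXISTING_VALUE",
}
QUALIFIERS = {"BEGINS_WITH", "ENDS_WITH", "CONTAINS"}
PART_TYPES = {"NUMERICS_ONLY", "ALPHANUMERICS_ONLY"}


def validate_request(body):
    """Return a list of problems; an empty list means no violation was found."""
    errors = []
    # logical_operator defaults to "AND" when omitted.
    if body.get("logical_operator", "AND") not in ("AND", "OR"):
        errors.append("logical_operator must be AND or OR")
    for n, f in enumerate(body.get("filters", [])):
        where = f"filters[{n}]"
        # MARC field: three digits, "000" to "999".
        if "field" in f and not re.fullmatch(r"[0-9]{3}", f["field"]):
            errors.append(f"{where}.field must be 000-999")
        # Indicators: one digit, one lowercase letter, or "*".
        for key in ("indicator1", "indicator2"):
            if key in f and not re.fullmatch(r"[0-9a-z*]", f[key]):
                errors.append(f"{where}.{key} must be 0-9, a-z or *")
        if "subfield" in f and len(f["subfield"]) != 1:
            errors.append(f"{where}.subfield must be a single character")
        # matchType defaults to EXACTLY_MATCHES when omitted.
        if f.get("matchType", "EXACTLY_MATCHES") not in MATCH_TYPES:
            errors.append(f"{where}.matchType is not a known match type")
        if "qualifier" in f and f["qualifier"] not in QUALIFIERS:
            errors.append(f"{where}.qualifier is not a known qualifier")
        # comparisonPartType defaults to null (None).
        if f.get("comparisonPartType") not in (PART_TYPES | {None}):
            errors.append(f"{where}.comparisonPartType is not a known part type")
    return errors
```

Applied to the sample request above, this sketch reports no problems. A filter with "field": "24" or "indicator1": "#" would each produce one entry in the returned list.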


Match Processing

Location

Here is a list of the different Match Profiles possible at the time of writing, and where processing occurs:

Match Profile | Processing Location
--- | ---
MARC BIB TO MARC BIB | mod-source-record-storage
MARC BIB TO (Instance/Holdings/Item) | mod-inventory
MARC AUTH TO MARC AUTH | mod-source-record-storage
Static Value TO MARC BIB | mod-source-record-storage
Static Value TO MARC AUTH | mod-source-record-storage
Static Value TO (Instance/Holdings/Item) | mod-inventory

The assignments above do not work well when a match profile can return multiple results. Sub-matches complicate this further, because they may need to be processed in a location other than the originating match profile's processing location. Transporting multiple result objects in one Kafka message is not ideal because of message size constraints, and using multiple Kafka messages to represent one match profile result would add complexity. Match profile result retrieval should therefore shift from Kafka messages to REST API calls.

Eligible locations for processing will be defined for each match profile. Inventory objects will be supported by REST calls to mod-inventory-storage, while MARC records will be supported by REST calls to mod-source-record-storage.

Location Determination

A group of match profiles will begin at the first encountered match profile, continue through its sub-matches if available and terminate at match profiles right before action profiles. For example:

In the job profile diagram provided, there are two types of profiles: match profiles and action profiles. One match profile serves as a sub-match to another. The specific match profiles to be considered are "MARC TO MARC" and "STATIC VALUE TO INSTANCE." These profiles have predefined eligible locations. It needs to be determined whether mod-inventory can process this group. If so, the "MARC TO MARC" match profile will be forwarded to mod-inventory.

When no single DI module can process all members of a match profile group, an error should be thrown before the job profile is saved.
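
The location check for a group can be pictured as a set intersection over the eligible locations of its members. The Python sketch below illustrates that idea only. The function names and the eligibility mapping are hypothetical: this document says eligible locations will be defined per match profile but does not list them, so the values in the example are assumptions chosen to mirror the "MARC TO MARC" and "STATIC VALUE TO INSTANCE" case.

```python
def common_locations(group, eligible):
    """Intersect the eligible processing locations of every profile in the group."""
    sets = [set(eligible[profile]) for profile in group]
    return set.intersection(*sets) if sets else set()


def validate_group(group, eligible):
    """Return the modules able to process the whole group, or raise if none can.

    Intended to run before the job profile is saved.
    """
    locations = common_locations(group, eligible)
    if not locations:
        raise ValueError(
            "No single DI module can process match profile group: "
            + ", ".join(group)
        )
    return locations


# Assumed eligibility, for illustration only (not defined in this document).
ELIGIBLE = {
    "MARC TO MARC": {"mod-source-record-storage", "mod-inventory"},
    "STATIC VALUE TO INSTANCE": {"mod-inventory"},
}

# Under these assumptions the only shared module is mod-inventory,
# so the whole group would be forwarded there.
shared = validate_group(["MARC TO MARC", "STATIC VALUE TO INSTANCE"], ELIGIBLE)
```

If the two profiles had no module in common, validate_group would raise, which corresponds to rejecting the job profile at save time.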