Source Record Matching
This document describes a design for enhancements to matching in Data Import. The design will allow Data Import to support multiple results returned by a Match Profile during a Data Import job execution.
Design
Source Record Storage Matching by REST API
A new endpoint will be introduced in mod-source-record-storage to allow matching of source records.
URL: /source-storage/records/matching
Method: POST
Body:
Property | Description | Valid Values | Default Value |
---|---|---|---|
logical_operator | Logical operator used to combine the defined filters | "AND", "OR" | "AND" |
filters → values | Collection of values to match for equality | | |
filters → field | MARC field to match | "000" - "999" | |
filters → indicator1 | MARC indicator1 to match | "0" - "9", "a" - "z", "*" | |
filters → indicator2 | MARC indicator2 to match | "0" - "9", "a" - "z", "*" | |
filters → subfield | MARC subfield to match | single character | |
filters → matchType | Match type to consider for the query | EXACTLY_MATCHES, EXISTING_VALUE_CONTAINS_INCOMING_VALUE, INCOMING_VALUE_CONTAINS_EXISTING_VALUE, EXISTING_VALUE_BEGINS_WITH_INCOMING_VALUE, INCOMING_VALUE_BEGINS_WITH_EXISTING_VALUE, EXISTING_VALUE_ENDS_WITH_INCOMING_VALUE, INCOMING_VALUE_ENDS_WITH_EXISTING_VALUE | EXACTLY_MATCHES |
filters → qualifier | Match qualifier | BEGINS_WITH, ENDS_WITH, CONTAINS | |
filters → comparisonPartType | Only compare part of a value | NUMERICS_ONLY, ALPHANUMERICS_ONLY | Null |
limit | Maximum number of records to return | | |
offset | Starting point when returning records | | |
POST /source-storage/records/matching

{
  "logical_operator": "AND",
  "filters": [
    {
      "values": ["science", "technology"],
      "field": "245",
      "indicator1": "0",
      "subfield": "a",
      "matchType": "EXACTLY_MATCHES",
      "qualifier": "BEGINS_WITH",
      "comparisonPartType": "NUMERICS_ONLY"
    },
    {
      "values": ["biology"],
      "field": "650",
      "subfield": "a",
      "matchType": "EXACTLY_MATCHES"
    }
  ],
  "limit": 5,
  "offset": 0
}
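For illustration only, the sketch below shows one way the matchType and comparisonPartType options could be interpreted when comparing an existing record value against an incoming value. The enum and method names are hypothetical and are not part of the proposed API contract; only three matchType variants are shown, and the BEGINS_WITH/ENDS_WITH variants would follow the same pattern.

```java
// Hypothetical illustration of filter semantics; names are not part of the API contract.
enum MatchType { EXACTLY_MATCHES, EXISTING_VALUE_CONTAINS_INCOMING_VALUE, INCOMING_VALUE_CONTAINS_EXISTING_VALUE }
enum ComparisonPartType { NUMERICS_ONLY, ALPHANUMERICS_ONLY }

class FilterSemantics {

  // Reduce a value to the part being compared, per comparisonPartType (null means compare the whole value).
  static String applyPart(String value, ComparisonPartType part) {
    if (part == null) {
      return value;
    }
    return switch (part) {
      case NUMERICS_ONLY -> value.replaceAll("[^0-9]", "");
      case ALPHANUMERICS_ONLY -> value.replaceAll("[^0-9A-Za-z]", "");
    };
  }

  // Compare an existing (stored) value against an incoming value for a single filter.
  static boolean matches(String existing, String incoming, MatchType type, ComparisonPartType part) {
    String e = applyPart(existing, part);
    String i = applyPart(incoming, part);
    return switch (type) {
      case EXACTLY_MATCHES -> e.equals(i);
      case EXISTING_VALUE_CONTAINS_INCOMING_VALUE -> e.contains(i);
      case INCOMING_VALUE_CONTAINS_EXISTING_VALUE -> i.contains(e);
    };
  }
}
```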
Match Processing
Location
Here is a list of the different Match Profiles possible at the time of writing, and where processing occurs:
Match Profile | Processing Location |
---|---|
MARC BIB TO MARC BIB | mod-source-record-storage |
MARC BIB TO (Instance/Holdings/Item) | mod-inventory |
MARC AUTH TO MARC AUTH | mod-source-record-storage |
Static Value TO MARC BIB | mod-source-record-storage |
Static Value TO MARC AUTH | mod-source-record-storage |
Static Value TO (Instance/Holdings/Item) | mod-inventory |
The assignments above do not work well when multiple results can be returned by a match profile. This is further complicated by sub-matches that could occur in processing locations other than the originating match profile's processing location. Transporting multiple result objects in a single Kafka message is not ideal due to message size constraints, and using multiple Kafka messages to represent one match profile result would introduce more complexity. Match profile result retrieval should therefore shift from Kafka messages to REST API calls.
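As a rough sketch of that shift, the example below retrieves match results from the proposed endpoint and pages through them with limit and offset instead of reading them out of a Kafka payload. The class name and endpoint host are assumptions for illustration, the response schema is not defined in this document, and Okapi tenant/token headers are omitted for brevity.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Illustrative only: fetches match results over REST instead of carrying them in a Kafka message.
class MatchResultClient {

  // Assumed host; a real caller would route through Okapi and supply tenant/token headers.
  private static final String MATCHING_URL =
      "http://mod-source-record-storage/source-storage/records/matching";

  private final HttpClient client = HttpClient.newHttpClient();

  // Page through match results using limit/offset so no single payload has to carry every record.
  String fetchPage(String filtersJson, int limit, int offset) throws Exception {
    String body = """
        {"logical_operator":"AND","filters":%s,"limit":%d,"offset":%d}
        """.formatted(filtersJson, limit, offset);

    HttpRequest request = HttpRequest.newBuilder(URI.create(MATCHING_URL))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(body))
        .build();

    HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
    return response.body();
  }
}
```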
Eligible processing locations will be defined for each match profile. Inventory objects will be supported by REST calls to mod-inventory-storage, while MARC records will be supported by REST calls to mod-source-record-storage.
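One possible way to express this is a static registry that maps each match profile type to the set of modules eligible to process it, as sketched below. The names and the example entries are placeholders, not a specification.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.Set;

// Hypothetical registry of eligible processing locations per match profile type.
// The concrete sets below are placeholders, not a specification.
class EligibleLocations {

  enum Module { MOD_SOURCE_RECORD_STORAGE, MOD_INVENTORY }

  enum MatchProfileType { MARC_BIB_TO_MARC_BIB, STATIC_VALUE_TO_INSTANCE /* ... */ }

  static final Map<MatchProfileType, Set<Module>> ELIGIBLE = new EnumMap<>(Map.of(
      // e.g. MARC-to-MARC matching could run in either module once the SRS matching REST API exists
      MatchProfileType.MARC_BIB_TO_MARC_BIB, Set.of(Module.MOD_SOURCE_RECORD_STORAGE, Module.MOD_INVENTORY),
      // e.g. Inventory-target matching runs where inventory records are reachable
      MatchProfileType.STATIC_VALUE_TO_INSTANCE, Set.of(Module.MOD_INVENTORY)
  ));
}
```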
Location Determination
A group of match profiles begins at the first encountered match profile, continues through its sub-matches if present, and terminates at the match profiles immediately preceding action profiles. For example:
In the job profile diagram provided, there are two types of profiles: match profiles and action profiles. One match profile serves as a sub-match to another. The specific match profiles to be considered are "MARC TO MARC" and "STATIC VALUE TO INSTANCE." These profiles have predefined eligible locations. It needs to be determined whether mod-inventory can process this group. If so, the "MARC TO MARC" match profile will be forwarded to mod-inventory.
If the members of a match profile group cannot all be processed by a single Data Import module, an error should be thrown before the job profile is saved.
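A minimal sketch of that validation, reusing the hypothetical EligibleLocations registry from the earlier sketch: intersect the eligible modules of every match profile in the group and reject the job profile when the intersection is empty.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical validation performed before a job profile is saved.
class MatchGroupValidator {

  // Returns the single module chosen to process the whole match profile group.
  static EligibleLocations.Module resolveProcessingLocation(List<EligibleLocations.MatchProfileType> group) {
    Set<EligibleLocations.Module> common = EnumSet.allOf(EligibleLocations.Module.class);
    for (EligibleLocations.MatchProfileType profile : group) {
      // Keep only the modules eligible for every profile seen so far.
      common.retainAll(EligibleLocations.ELIGIBLE.getOrDefault(profile, Set.of()));
    }
    if (common.isEmpty()) {
      throw new IllegalArgumentException("Match profile group cannot be processed by a single Data Import module");
    }
    return common.iterator().next();
  }
}
```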