Done
Details
Assignee
Doug Loynes
Reporter
Doug Loynes
Labels
Priority
Story Points
5
Sprint
None
Development Team
Citation
Created December 17, 2024 at 10:37 PM
Updated March 4, 2025 at 7:51 PM
Resolved January 9, 2025 at 9:30 PM
Current situation: The presence of punctuation in some MARC fields affects the fingerprinting algorithms, so the system creates new resources for records that would otherwise match existing resources.
The purpose of this card is to apply normalization rules to remove punctuation from original MARC records to help prevent creating new resources unintentionally. This will help to keep the data graph smaller and cleaner.
The scope of this card is limited to applying punctuation normalization to MARC fields that have been fully mapped within the Linked Data Editor.
The JSON file, copied below, is an exhaustive list of the normalization routines used in BiblioGraph.
Each rule maps a MARC tag and subfield code to a bracketed list of one or more punctuation marks. The rule instructs the system to check the subfield immediately preceding the named subfield for trailing punctuation. If that preceding subfield ends in any of the listed punctuation marks, strip the mark from the end of its value.
Example
"240$f": ["."] → if the incoming MARC record has a 240 $f field, check the subfield immediately preceding the 240 $f for the presence of a trailing period (“.”). If the preceding subfield ends in a period, strip the period from the value of the preceding subfield.
NOTE: Always ignore the first subfield of each MARC tag.
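The rule and the note above can be sketched in Python as follows. This is a minimal illustration, not the BiblioGraph implementation: the function names, the representation of a field as a list of (code, value) pairs, and the trimming of leftover whitespace are all assumptions. It also reads the note as meaning that the first subfield never triggers a rule, since nothing precedes it.

```python
def strip_trailing(value, marks):
    """Remove one trailing punctuation mark if value ends with any of marks."""
    for mark in marks:
        if value.endswith(mark):
            # Assumption: also trim whitespace left behind, e.g. "Title /" -> "Title".
            return value[: -len(mark)].rstrip()
    return value


def normalize_field(tag, subfields, rules):
    """Apply exact 'TAG$code' rules to one MARC field.

    subfields is a list of (code, value) pairs in record order (hypothetical shape).
    rules maps keys such as '240$f' to a list of punctuation marks.
    """
    result = list(subfields)
    # Start at index 1: the first subfield has no preceding subfield to clean.
    for i in range(1, len(result)):
        code, _ = result[i]
        marks = rules.get("{}${}".format(tag, code))
        if marks:
            prev_code, prev_value = result[i - 1]
            result[i - 1] = (prev_code, strip_trailing(prev_value, marks))
    return result


rules = {"240$f": ["."]}
field = [("a", "Works."), ("f", "1999.")]
normalized = normalize_field("240", field, rules)
```

Here the presence of 240 $f causes the trailing period to be removed from the preceding $a, while the value of $f itself is left untouched.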
Multiple rules could apply to any given subfield. Where there are multiple rules, apply the rules one at a time following this priority list:
lookups = [
    'XX{}${}'.format(tag[2], code),
    '{}XX$X'.format(tag[0]),
    '{}XX${}'.format(tag[0], code),
    '{}X$X'.format(tag[:2]),
    '{}X${}'.format(tag[:2], code),
    '{}${}'.format(tag, code),
]
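To make the priority list concrete, the sketch below wraps it in a function and collects the rules that match a given tag and subfield code, in list order. The names `lookup_keys` and `matching_rules` and the sample rule set are hypothetical and used only for illustration; the real rules come from the JSON file.

```python
def lookup_keys(tag, code):
    """Return the candidate rule keys for a 3-character MARC tag and a subfield code."""
    return [
        'XX{}${}'.format(tag[2], code),
        '{}XX$X'.format(tag[0]),
        '{}XX${}'.format(tag[0], code),
        '{}X$X'.format(tag[:2]),
        '{}X${}'.format(tag[:2], code),
        '{}${}'.format(tag, code),
    ]


def matching_rules(tag, code, rules):
    """Return (key, marks) pairs for every key present in rules, in priority-list order."""
    return [(key, rules[key]) for key in lookup_keys(tag, code) if key in rules]


# Hypothetical rule set for illustration only.
sample_rules = {"240$f": ["."], "2XX$X": [","]}
keys = lookup_keys("240", "f")
matches = matching_rules("240", "f", sample_rules)
```

For tag 240 and subfield code f, `keys` is `['XX0$f', '2XX$X', '2XX$f', '24X$X', '24X$f', '240$f']`, and `matches` contains the `2XX$X` rule followed by the `240$f` rule, each to be applied one at a time.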
List of punctuation normalization rules, by MARC tag.
For the following MARC tags, remove the punctuation from the last subfield.