STORY: Normalization MARC Authority record

Description

covers normalization rules for removing punctuation in MARC bib records.

And adds rules for parentheses and square brackets.

The purpose of this card is to apply comparable normalization rules to MARC Authority records, so that MARC data is transformed into data graph data consistently.

Punctuation normalization rule - preceding subfield

1. Check the MARC Authority record for the presence of a subfield from the list below.

"XX0$c": [","],

"XX0$d": [" ;", ","],

"XX0$v": [" ;", ","],

"110$b": ["."],

"111$c": [" :"],

"111$d": [","],

"111$q": ["."]

2. When a subfield from the list exists in the MARC Authority record, check for any trailing punctuation in the subfield that precedes the matching subfield.

e.g. "111$d": [","] → If the MARC Authority record includes a 111$d subfield, the rule says to look for a trailing comma in the subfield that immediately precedes the 111$d subfield.

Where there is a match, strip the trailing punctuation from the preceding subfield. Otherwise, keep the value intact.

3. Repeat the process for each punctuation normalization rule that applies to a particular MARC field.

NOTE: Normalization rules involving a period mark (.) need further inspection before stripping a trailing period mark from a subfield. These rules are outside the scope of this card and addressed, separately, in . For the scope of this card, treat trailing periods like any other punctuation per the normalization rules.
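The three steps above can be sketched in Python as follows. This is only an illustration, not the mod-linked-data implementation. The names RULES, tag_matches, marks_for and normalize_field are hypothetical, a field is modeled as a list of (subfield code, value) pairs, and "X" in a rule key is assumed to be a wildcard for any single digit (so "XX0" covers 100, 110, 130 and so on). Punctuation is matched literally, so " ;" only matches a semicolon preceded by a space. Per the note above, a trailing period is treated like any other punctuation.

```python
# Rule table copied from this card. Key: "<tag pattern>$<subfield code>".
# Value: punctuation to strip from the end of the PRECEDING subfield.
RULES = {
    "XX0$c": [","],
    "XX0$d": [" ;", ","],
    "XX0$v": [" ;", ","],
    "110$b": ["."],
    "111$c": [" :"],
    "111$d": [","],
    "111$q": ["."],
}


def tag_matches(pattern, tag):
    """Assumption: 'X' in a rule key stands for any single character of the tag."""
    return len(pattern) == len(tag) and all(
        p in ("X", t) for p, t in zip(pattern, tag)
    )


def marks_for(tag, code):
    """Step 1: collect the punctuation of every rule that applies to tag + code."""
    marks = []
    for key, rule_marks in RULES.items():
        pattern, rule_code = key.split("$")
        if rule_code == code and tag_matches(pattern, tag):
            marks.extend(rule_marks)
    return marks


def normalize_field(tag, subfields):
    """Steps 2 and 3: for each subfield with a rule, strip the listed trailing
    punctuation from the subfield immediately before it. Returns a new list."""
    values = [value for _, value in subfields]
    for i in range(1, len(subfields)):
        for mark in marks_for(tag, subfields[i][0]):
            if values[i - 1].endswith(mark):
                values[i - 1] = values[i - 1][: -len(mark)]
                break  # at most one punctuation mark is stripped per rule match
    return [(code, value) for (code, _), value in zip(subfields, values)]
```

For instance, with a made-up 111 field, normalize_field("111", [("a", "Example Conference,"), ("d", "2001")]) strips the comma from $a because of the "111$d" rule, while a field with no matching subfield is returned unchanged.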

 

Examples:

Naoe, Hiroji

Original MARC Authority record

Check for punctuation normalization rule

From the list, there is a match on the rule for 100 $d

"XX0$d": [" ;", ","],

which says to check for a trailing semicolon (;) or comma (,) in the preceding subfield.

Inspect preceding subfield for trailing punctuation that matches punctuation normalization rule

The preceding subfield (100 $a) does have a trailing comma, which should be stripped to normalize the value.

Rinse and repeat: Check for any other punctuation normalization rules for the record

No other punctuation normalization rules apply

Complete normalization

MARC Authority record after applying the normalization rule:

100 1 $a Naoe, Hiroji $d 1917-1994

 

Gautama Buddha

Original MARC Authority record

Check for normalization rules

Matches on 100 $c and 100 $d

"XX0$c": [","],

"XX0$d": [" ;", ","],

Inspect preceding subfield for trailing punctuation that matches normalization rule

For 100 $c rule, check for trailing comma in preceding subfield → 100 $b

Rinse and repeat

For 100 $d rule, check for either trailing semicolon or trailing comma in preceding subfield → 100 $c

MARC Authority record after applying both normalization rules:

100 3 $a Gautama Buddha $b II $c Holy Roman Emperor $d 1194-1250 $j Jfield $q (Claudius Ceccon) $u Ufield
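The walkthrough for this record can be replayed with a small Python sketch that logs each rule as it fires. The original record is not reproduced in this card, so the input below is an assumption: it is the normalized record with a trailing comma restored on $b and $c, which is what the two rules look for. The function name trace_normalization is hypothetical, only the two rules that match this record are included, and "X" is assumed to be a single-character wildcard in the tag.

```python
# Only the two rules that match this record, copied from the card.
RULES = {"XX0$c": [","], "XX0$d": [" ;", ","]}


def trace_normalization(tag, subfields):
    """Apply the rules in subfield order and record each change that is made."""
    values = [value for _, value in subfields]
    steps = []
    for i in range(1, len(subfields)):
        code = subfields[i][0]
        for key, marks in RULES.items():
            pattern, rule_code = key.split("$")
            if rule_code != code:
                continue
            if not all(p in ("X", t) for p, t in zip(pattern, tag)):
                continue
            for mark in marks:
                if values[i - 1].endswith(mark):
                    values[i - 1] = values[i - 1][: -len(mark)]
                    steps.append(
                        f"{key}: stripped {mark!r} from ${subfields[i - 1][0]}"
                    )
                    break
    normalized = [(c, v) for (c, _), v in zip(subfields, values)]
    return normalized, steps


# Assumed original: trailing commas on $b and $c are a reconstruction.
assumed_original = [
    ("a", "Gautama Buddha"),
    ("b", "II,"),
    ("c", "Holy Roman Emperor,"),
    ("d", "1194-1250"),
    ("j", "Jfield"),
    ("q", "(Claudius Ceccon)"),
    ("u", "Ufield"),
]

normalized, steps = trace_normalization("100", assumed_original)
for step in steps:
    print(step)
```

Under that assumed input, the log has two entries, one for the 100 $c rule acting on $b and one for the 100 $d rule acting on $c, and the resulting subfield values match the normalized record shown above.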

Environment

None

Potential Workaround

None

Attachments

10

Activity

Tetiana Kovalchuk January 22, 2025 at 11:32 AM

Tested on edev diku.

Build version: mod-linked-data-1.0.1-SNAPSHOT.0257b6f

Test cases and evidence attached.

Done

Details

Release

Sunflower (R1 2025)

Created January 2, 2025 at 9:46 PM
Updated March 13, 2025 at 9:20 PM
Resolved January 24, 2025 at 12:13 PM