STORY: Normalization - Parentheses and Square Brackets

Description

The purpose of this card is to apply a normalization rule to MARC bibliographic records that removes parentheses and square brackets from selected MARC subfields, so that the Linked Data Editor is consistent with Builde.

Currently, if an incoming MARC bibliographic record includes parentheses, they are retained as part of the data graph. For the purposes of creating new resources (or matching to existing resources) in a data graph, Builde states that parentheses should be stripped during processing.

For example, consider author M. H. Abrams, who wrote Natural supernaturalism in 1971.

The incoming MARC bibliographic record includes 100$q, which lists an alternative name - Meyer Howard - in parentheses.

For consistency with Builde, the parentheses around the alternative name value should be stripped during processing, so that the end value is “Meyer Howard”.

Normalization rules

1. Remove parentheses / brackets

  • opening and/or closing parentheses: ( )

  • opening and/or closing square brackets: [ ]

wherever they occur in the MARC subfields listed below.

2. Remove whitespace

  • Specifically, remove any enclosing whitespace remaining after step 1

 

Example:

Original MARC record

$260 $a [ Washington, DC] : $b John Snow International

After Step 1:

$260 $a  Washington, DC : $b John Snow International

(260$a has an extra space after removing square brackets around Washington, DC)

After Step 2:

$260 $a Washington, DC : $b John Snow International
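
The two steps above can be sketched in a few lines of Python. This is only an illustration, not the module's actual implementation: the function name normalize_subfield is hypothetical, and the sketch assumes that "enclosing whitespace" means leading and trailing whitespace of the subfield value, so whitespace inside the value is left untouched.

```python
import re

# Characters the card asks to strip: parentheses and square brackets.
_BRACKETS = re.compile(r"[()\[\]]")


def normalize_subfield(value: str) -> str:
    """Apply the two normalization steps to one subfield value (hypothetical helper)."""
    # Step 1: remove every parenthesis or square bracket, paired or not.
    without_brackets = _BRACKETS.sub("", value)
    # Step 2: trim whitespace left at either end of the value (assumed reading
    # of "enclosing whitespace"); interior whitespace is not changed.
    return without_brackets.strip()


print(normalize_subfield("[ Washington, DC] :"))  # Washington, DC :
print(normalize_subfield("(Meyer Howard)"))       # Meyer Howard
print(normalize_subfield("Paris (France"))        # Paris France
```

The last call shows an unpaired parenthesis in the middle of a value, which the sketch removes in the same way as a paired one.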

 

 

LIST OF MARC SUBFIELDS FOR NORMALIZATION

Generalized MARC subfields

XX0$q

1XX$e

1XX$4

6XX$e

7XX$4

71X$e

79X$e

Specific MARC subfields

100$d

100$2

100$3

100$4

100$5

110$b

110$2

110$3

110$4

110$5

111$c

111$d

111$n

111$2

111$3

111$4

111$5

130$s

130$0

130$2

130$3

130$5

245$h

250$a

260$a

260$b

260$c

260$e

260$f

260$g

264$a

264$b

264$c

336$$

337$$

338$$

490$a

611$c

611$d

611$n

630$0

700$e

700$q

700$2

711$c

711$d

711$n

730$h

730$0

740$h

760$h

765$h

800$d

800$2

810$d

810$2

811$d

811$2

830$h

830$0
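
One possible way to decide whether a given tag and subfield code fall under the list above is sketched below. It assumes that "X" in the generalized entries stands for any single digit (the usual MARC shorthand, though the card does not say so explicitly). The names PATTERNS and is_normalized are hypothetical, and only a small subset of the list is included for brevity.

```python
import re

# Hypothetical subset of the card's list; "X" is assumed to mean any one digit.
PATTERNS = [
    "XX0$q", "1XX$e", "1XX$4", "6XX$e", "7XX$4", "71X$e", "79X$e",
    "245$h", "260$a", "264$b",
]


def _compile(pattern: str) -> re.Pattern:
    """Turn an entry such as 'XX0$q' into a regex over 'tag$code' strings."""
    tag, code = pattern.split("$", 1)
    # Replace each X with a digit wildcard and keep other characters literal.
    tag_regex = "".join(r"\d" if ch == "X" else re.escape(ch) for ch in tag)
    return re.compile(tag_regex + r"\$" + re.escape(code))


_COMPILED = [_compile(p) for p in PATTERNS]


def is_normalized(tag: str, code: str) -> bool:
    """Return True if tag$code matches any listed entry in full."""
    key = f"{tag}${code}"
    return any(rx.fullmatch(key) for rx in _COMPILED)


print(is_normalized("100", "q"))  # True  (matches XX0$q)
print(is_normalized("710", "e"))  # True  (matches 71X$e)
print(is_normalized("245", "a"))  # False (not in this subset)
```

Compiling the entries once keeps the per-subfield check to a simple loop; a real implementation would load the complete list rather than this subset.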

NOTE: Parentheses or brackets do not have to be “paired” in a subfield to trigger this normalization rule, nor do they need to be the leading or trailing characters in the subfield.

Environment

None

Potential Workaround

None



Activity


Tetiana Kovalchuk January 23, 2025 at 12:17 PM
Edited

Tested on the FOLIO sprint test environment

Build version: Linked Data Module (mod-linked-data-1.0.1-SNAPSHOT.99)

Verified for supported fields (260, 264, 100, 610, 711, etc.)

Test cases and evidence added.

Done

Created January 2, 2025 at 5:32 PM
Updated March 4, 2025 at 7:51 PM
Resolved January 24, 2025 at 12:11 PM