MODQM-112 Spike: Support a way to express a literal $


Overview

A QM (mod-quick-marc) record has MARC fields with the following schema:

quickMarc Field
"properties": {
	"tag": {
		"description": "MARC record tag name",
		"type": "string",
		"pattern": "^[0-9]{3}$",
		"example": "001"
	},
	"content": {
		"description": "MARC record tag content",
		"type": "object",
		"example": "$a 393893 $b 4545"
	},
	"indicators": {
		"description": "MARC record tag indicator values",
		"type": "array",
		"items": {
			"type": "string"
		},
		"minItems": 0,
		"maxItems": 2
	}
}

Examples:

Example 1
{
	"tag": "006",
	"content": {
		"Type": "p",
		"Form": "a"
	},
	"indicators": []
}
Example 2
{
	"tag": "001",
	"content": "393893",
	"indicators": []
}
Example 3
{
	"tag": "100",
	"content": "$a Mozart, Wolfgang Amadeus, $d 1756-1791.",
	"indicators": [
		"1",
		" "
	]
}

In fields like Example 3, a '$' followed by an alphabetic or numeric character is used as the subfield delimiter. On the UI it looks like this:

[Screenshot of the field in the QM UI]

An SRS (mod-source-record-storage) record uses a different schema for MARC fields. Examples:

Example 1
{
	"001": "393893"
}
Example 2
{
	"100": {
		"ind1": "1",
		"ind2": " ",
		"subfields": [
			{
				"a": "Mozart, Wolfgang Amadeus,"
			},
			{
				"d": "1756-1791."
			}
		]
	}
}

When QM receives a request to edit a MARC record, it parses each field's content to detect the subfield codes and the subfield data. If the subfield data contains a '$' sign, QM treats it as a delimiter and sends the wrongly split subfields to SRS, which corrupts the data.
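The failure can be reproduced with a minimal sketch. The regular expression and the function name `naive_parse` are assumptions made for illustration, not the actual mod-quick-marc code; the sketch only assumes that a '$' followed by a letter or digit starts a new subfield.

```python
import re

# Assumed delimiter rule: '$' followed by one lowercase letter or digit.
DELIMITER = re.compile(r"\$([0-9a-z])")

def naive_parse(content):
    """Split QM field content into SRS-style subfields (illustrative only)."""
    # re.split with a capturing group yields [prefix, code, data, code, data, ...]
    parts = DELIMITER.split(content)
    return [{code: data.strip()} for code, data in zip(parts[1::2], parts[2::2])]

# Content without a literal '$' parses as expected.
print(naive_parse("$a Mozart, Wolfgang Amadeus, $d 1756-1791."))

# A literal '$' in the data is mistaken for a delimiter: '$1' becomes subfield '1'.
print(naive_parse("$a How to earn $100 $b today"))
```

In the second call the title "How to earn $100" is split into subfield "a" with the value "How to earn" and a spurious subfield "1" with the value "00".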

Goals

  1. Find a way to treat a literal $ as data rather than as a subfield delimiter
  2. Identify any impact on records already stored in SRS. Is a plan needed to support existing records, or is re-import the only option?

Approach

Option 1: Use another symbol or combination of symbols as the subfield delimiter, or escape $ in the subfield's data

PROS

  • Minimum time effort

CONS

  • The user is responsible for entering the symbols correctly
  • Does not cover all cases, because any symbol or combination of symbols can occur in the subfield's data
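A minimal sketch of the escaping variant of Option 1 follows. The escape convention (a backslash before '$') and the name `parse_with_escape` are assumptions chosen for illustration; the spike does not prescribe a particular escape sequence.

```python
import re

# Assumption: a backslash before '$' marks it as literal data, not a delimiter.
# The negative lookbehind skips any '$' that is preceded by a backslash.
DELIMITER = re.compile(r"(?<!\\)\$([0-9a-z])")

def parse_with_escape(content):
    """Parse field content, treating an escaped '$' as part of the data."""
    parts = DELIMITER.split(content)
    return [
        # Remove the escape character once the subfield boundaries are known.
        {"code": code, "value": data.strip().replace("\\$", "$")}
        for code, data in zip(parts[1::2], parts[2::2])
    ]

# The user escaped the literal '$', so the title survives intact.
print(parse_with_escape(r"$a How to earn \$100 $b today"))

# The user forgot the escape, so the data is still split incorrectly.
print(parse_with_escape("$a How to earn $100"))
```

The second call illustrates the first CON: correctness depends entirely on the user typing the escape. Data that legitimately contains the escape sequence itself would need yet another rule, which is the second CON.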

Option 2: Change the QM field schema and update the UI

1. Change QM field schema

There are three types of fields that can appear in a MARC record in QM:

  • data field (has subfields and indicators)
  • variable control field (has variable-length string content)
  • fixed control field (has fixed-length content that is split into fixed-length items)

Each of these field types should have its own schema, with "tag" as the only common property. The subfields of a data field will be separate objects. With this approach there is no need to parse a QM record field to detect subfields, and every symbol in a subfield is saved as is.

Base field schema

{
	"$id": "https://example.com/arrays.schema.json",
	"$schema": "http://json-schema.org/draft-04/schema#",
	"description": "Base MARC record field",
	"type": "object",
	"properties": {
		"tag": {
			"description": "MARC record tag name",
			"type": "string",
			"pattern": "^[0-9]{3}$",
			"example": "001"
		}
	},
	"required": [
		"tag"
	],
	"additionalProperties": false
}

Variable control field schema 

{
	"$id": "https://example.com/arrays.schema.json",
	"$schema": "http://json-schema.org/draft-04/schema#",
	"description": "Variable control field",
	"type": "object",
	"allOf": [{
		"$ref": "field.json"
	}],
	"properties": {
		"content": {
			"description": "Variable control field content",
			"type": "string",
			"example": "393893"
		}
	},
	"required": [
		"content"
	],
	"additionalProperties": false
}

Fixed control field schema

{
	"$id": "https://example.com/arrays.schema.json",
	"$schema": "http://json-schema.org/draft-04/schema#",
	"description": "Fixed control field",
	"type": "object",
	"allOf": [{
		"$ref": "field.json"
	}],
	"properties": {
		"content": {
			"description": "Fixed control field content",
			"type": "array",
			"items": {
				"$ref": "fixedLengthItem.json"
			}
		}
	},
	"required": [
		"content"
	],
	"additionalProperties": false
}

{
	"$id": "https://example.com/arrays.schema.json",
	"$schema": "http://json-schema.org/draft-04/schema#",
	"description": "Fixed length item",
	"type": "object",
	"properties": {
		"code": {
			"description": "Fixed length item code",
			"type": "string",
			"example": "AccM"
		},
		"name": {
			"description": "Fixed length item name",
			"type": "string",
			"example": "Accompanying Matter",
			"readOnly": true
		},
		"position": {
			"description": "Fixed length item position",
			"type": "integer",
			"example": 7,
			"readOnly": true
		},
		"length": {
			"description": "Fixed length item length",
			"type": "integer",
			"example": 6,
			"readOnly": true
		},
		"isArray": {
			"description": "Indicates that item should display as array",
			"type": "boolean",
			"example": true,
			"readOnly": true
		},
		"content": {
			"description": "Fixed length item content",
			"type": "string",
			"example": "abcdef"
		}
	},
	"required": [
		"code",
		"content"
	],
	"additionalProperties": false
}

Data field schema

{
	"$id": "https://example.com/arrays.schema.json",
	"$schema": "http://json-schema.org/draft-04/schema#",
	"description": "Data field",
	"type": "object",
	"allOf": [{
		"$ref": "field.json"
	}],
	"properties": {
		"subfields": {
			"description": "Data field subfields",
			"type": "array",
			"items": {
				"$ref": "subfield.json"
			}
		},
		"indicators": {
			"description": "Data field indicators",
			"type": "array",
			"items": {
				"type": "string",
				"maxLength": 1
			},
			"minItems": 0,
			"maxItems": 2
		}
	},
	"required": [
		"subfields",
		"indicators"
	],
	"additionalProperties": false
}

{
	"$id": "https://example.com/arrays.schema.json",
	"$schema": "http://json-schema.org/draft-04/schema#",
	"description": "MARC record data field subfield",
	"type": "object",
	"properties": {
		"code": {
			"description": "Subfield code",
			"type": "string",
			"pattern": "^[0-9a-z]$",
			"example": "a"
		},
		"value": {
			"description": "Subfield value",
			"type": "string",
			"example": "native"
		}
	},
	"required": [
		"code",
		"value"
	],
	"additionalProperties": false
}

Example of QM record after field schema change

{
	"parsedRecordId": "c9db5d7a-e1d4-11e8-9f32-f2801f1b9fd1",
	"parsedRecordDtoId": "c56b70ce-4ef6-47ef-8bc3-c470bafa0b8c",
	"suppressDiscovery": false,
	"instanceId": "b9a5f035-de63-4e2c-92c2-07240c89b817",
	"leader": "01706ccm\\a2200361\\\\ 4500",
	"fields": [
		{
			"tag": "001",
			"content": "in0000123"
		},
		{
			"tag": "005",
			"content": "20141107001016.0"
		},
		{
			"tag": "006",
			"content": [
				{
					"code": "Type",
					"name": "Type",
					"content": "c",
					"position": 0,
					"length": 1,
					"isArray": false
				},
				{
					"code": "AccM",
					"name": "Accompanying Matter",
					"content": "ghjdsa",
					"position": 7,
					"length": 6,
					"isArray": true
				}
			]
		},
		{
			"tag": "035",
			"subfields": [
				{
					"code": "a",
					"value": "(OCoLC)63611770"
				},
				{
					"code": "b",
					"value": "393893"
				}
			],
			"indicators": [
				"i",
				"d"
			]
		},
		{
			"tag": "245",
			"subfields": [
				{
					"code": "a",
					"value": "How to earn $100"
				},
				{
					"code": "b",
					"value": "today"
				}
			],
			"indicators": [
				"i",
				"d"
			]
		}
	]
}
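With subfields delivered as separate objects, converting a QM field to the SRS shape shown earlier becomes a plain copy of codes and values. The sketch below is illustrative: the function name `to_srs_field` and the padding of missing indicators with blanks are assumptions, and fixed control fields are left out because they need position-based assembly.

```python
def to_srs_field(field):
    """Convert a QM field in the proposed schema to the SRS shape (sketch)."""
    tag = field["tag"]
    if "subfields" in field:
        # Assumption: missing indicators are padded with blanks.
        ind1, ind2 = (field["indicators"] + [" ", " "])[:2]
        return {tag: {
            "ind1": ind1,
            "ind2": ind2,
            # Codes and values are copied as is; no '$' parsing is involved.
            "subfields": [{sf["code"]: sf["value"]} for sf in field["subfields"]],
        }}
    if isinstance(field["content"], str):
        # Variable control field: the content is already a plain string.
        return {tag: field["content"]}
    # Fixed control fields need position-based assembly, omitted in this sketch.
    raise NotImplementedError("fixed control fields are out of scope here")

title = {
    "tag": "245",
    "subfields": [
        {"code": "a", "value": "How to earn $100"},
        {"code": "b", "value": "today"},
    ],
    "indicators": ["i", "d"],
}
print(to_srs_field(title))
print(to_srs_field({"tag": "001", "content": "in0000123"}))
```

The literal '$' in the 245 field reaches the SRS structure unchanged because the value is never scanned for delimiters.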

2. UI updates

Change logic to the new schemas

The backend will send data to the UI and receive requests formatted according to the new schemas, so the logic on the UI must be rewritten.

UI update vision

In the QM UI, each field that supports subfields will have a button such as "+". To add a new subfield, the user presses "+", enters the subfield code, and only then enters the subfield content.

This or a similar UI change is needed so that the UI can work with the new schemas.

PROS

  • Solves all possible problems with symbols
  • Dividing fields by their type will help future development and support for other types of records
  • Dropping the current schemas should probably improve performance, since fields will no longer be parsed with regular expressions

CONS

  • More time and effort from the BE and FE teams


Answering questions

Question: Is there any impact of the schema changes on other modules?
Answer: No.

Question: Is there any impact on records already stored in SRS? Is a plan needed to support existing records, or is re-import the only option?
Answer: Records that are already stored in SRS and have a '$' sign in a subfield are already corrupted. There is no way to fix them. Re-import is the only option.

Question: Identify all modules/services involved.
Answer: mod-quick-marc and ui-quick-marc only.

Question: How do data import and SRS work with the '$' sign?
Answer: MARC records arrive at data import in a raw format with a special structure in which it is known exactly where data starts and ends; special non-printable symbols mark these boundaries. SRS stores records in a format where each subfield is a separate field.
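The non-printable symbols mentioned above can be illustrated with a short sketch. In the raw MARC (ISO 2709) format, subfields are introduced by the byte 0x1F and fields end with 0x1E, so a printable '$' is ordinary data. The function name `parse_raw_data_field` is hypothetical, and the sketch handles only the body of a single data field, not the leader or directory of a full record.

```python
SUBFIELD_DELIMITER = "\x1f"  # ISO 2709 subfield delimiter
FIELD_TERMINATOR = "\x1e"    # ISO 2709 field terminator

def parse_raw_data_field(raw):
    """Parse one raw data field: two indicators, then delimited subfields."""
    body = raw.rstrip(FIELD_TERMINATOR)
    # The first chunk holds the two indicator characters.
    indicators, *chunks = body.split(SUBFIELD_DELIMITER)
    return {
        "ind1": indicators[0],
        "ind2": indicators[1],
        # Each chunk is a one-character code followed by the subfield data.
        "subfields": [{chunk[0]: chunk[1:]} for chunk in chunks if chunk],
    }

raw = "10\x1faHow to earn $100\x1fbtoday\x1e"
print(parse_raw_data_field(raw))
```

Because the delimiter is a control character that cannot be typed into the data, the '$' in the title does not affect how the field is split.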