Goal and requirements
The goals of importing MARC Authority records into Source Record Storage (SRS) are:
- Create new MARC authority records
- Update/Modify existing MARC authority records using matching/mapping profiles
- Search imported MARC authority records using the SRS MARC Query API
Requirements
- Supported file extensions/formats: All files with Data type = MARC
- Honor MARC Field protection entries
- Support mapping profiles to populate authority records
- Support match profiles that allow matching MARC authority records
- Support job profiles to execute import requests
- Import actions to support:
  - Create record
  - Update entire record
  - Modify record (not for initial release)
  - Delete record (not for initial release)
- When a user imports records, the system should:
  - generate an HRID
  - generate a UUID
  - store the imported authority records in SRS
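The two identifiers named in the requirements could be produced roughly as follows. This is a minimal sketch only: HRID generation is still TBD in this spike, so the "au" prefix and the in-memory counter are hypothetical; the 11-digit zero padding merely mirrors the instance HRID "in00000000001" in the sample record below, and UUID.randomUUID() is the standard JDK call for a random (version 4) UUID.

```java
import java.util.UUID;
import java.util.concurrent.atomic.AtomicLong;

// Minimal sketch of identifier generation for imported authority records.
// The prefix and the in-memory counter are hypothetical placeholders; a real
// implementation would need a persistent, tenant-scoped sequence.
public class AuthorityIdentifiers {

  private final String hridPrefix;
  private final AtomicLong sequence;

  public AuthorityIdentifiers(String hridPrefix, long startNumber) {
    this.hridPrefix = hridPrefix;
    this.sequence = new AtomicLong(startNumber);
  }

  /** Returns the next HRID: the prefix followed by an 11-digit zero-padded number. */
  public String nextHrid() {
    return String.format("%s%011d", hridPrefix, sequence.getAndIncrement());
  }

  /** Returns a random UUID string to be used as the record id. */
  public static String newRecordId() {
    return UUID.randomUUID().toString();
  }

  public static void main(String[] args) {
    AuthorityIdentifiers ids = new AuthorityIdentifiers("au", 1); // hypothetical prefix
    System.out.println(ids.nextHrid()); // au00000000001
    System.out.println(newRecordId());
  }
}
```

An atomic counter keeps the sketch safe for concurrent callers within one JVM, but it says nothing about uniqueness across module instances, which is exactly what the TBD design would have to settle.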
Creation of MARC authority records
Creating new MARC authority records should be similar to creating new MARC Bib records, since both record types share the same format, which SRM/SRS can accept. Any extensions that customize MARC Bib record creation are done either by external modules (like mod-inventory) or hidden behind general interfaces (like EventProcessor). With that in mind, it can be assumed that MARC Authority record creation should follow the same pattern as MARC Bib. So let's review a very high-level view of the creation flow.
High level description of existing MARC Bib records creation flow
1. SRM receives a request to process a batch of MARC records in raw format (JSON/MARC/XML)
   - entry point: EventDrivenChunkProcessingServiceImpl
   - the existing job execution entry is initialized if necessary and its status is set to PARSING_IN_PROGRESS
   - incoming records are parsed from raw format and kept in objects of the Record type along with other supplemental data
   - a DI_RAW_MARC_BIB_RECORDS_CHUNK_PARSED event is sent to notify that the raw records have been parsed
   - journal records are created for the parsed records
2. SRS retrieves the parsed records from the appropriate event topic and saves them to the MARC records database
   - entry point: ParsedMarcChunksKafkaHandler
   - parsed records are transformed into the DB representation and saved
   - raw records are saved
   - the initial generation is created
   - a DI_PARSED_MARC_BIB_RECORDS_CHUNK_SAVED event is sent to notify that the new records have been saved
3. SRM gets the saved records from the event topic and notifies other consumers that the records have been created
   - entry point: StoredMarcChunksKafkaHandler
   - a DI_SRS_MARC_BIB_RECORD_CREATED event is sent for each parsed record with the following payload; its context property contains:
     - the encoded Record object mapped to the MARC_BIBLIOGRAPHIC entity type, for example:
```json
{
  "id": "f9d76822-ae0e-4a65-9b3b-47b7e991e5d0",
  "snapshotId": "06efa5b0-6d59-41bc-8207-801df6fbf22f",
  "matchedId": "f9d76822-ae0e-4a65-9b3b-47b7e991e5d0",
  "generation": 0,
  "recordType": "MARC",
  "rawRecord": {
    "id": "f9d76822-ae0e-4a65-9b3b-47b7e991e5d0",
    "content": "01240cas a2200397 4500001000700000005001700007008004100024010001700065022001400082035002600096035002200122035001100144035001900155040004400174050001500218082001100233222004200244245004300286260004700329265003800376300001500414310002200429321002500451362002300476570002900499650003300528650004500561655004200606700004500648853001800693863002300711902001600734905002100750948003700771950003400808\u001e366832\u001e20141106221425.0\u001e750907c19509999enkqr p 0 a0eng d\u001e \u001fa 58020553 \u001e \u001fa0022-0469\u001e \u001fa(CStRLIN)NYCX1604275S\u001e \u001fa(NIC)notisABP6388\u001e \u001fa366832\u001e \u001fa(OCoLC)1604275\u001e \u001fdCtY\u001fdMBTI\u001fdCtY\u001fdMBTI\u001fdNIC\u001fdCStRLIN\u001fdNIC\u001e0 \u001faBR140\u001fb.J6\u001e \u001fa270.05\u001e04\u001faThe Journal of ecclesiastical history\u001e04\u001faThe Journal of ecclesiastical history.\u001e \u001faLondon,\u001fbCambridge University Press [etc.]\u001e \u001fa32 East 57th St., New York, 10022\u001e \u001fav.\u001fb25 cm.\u001e \u001faQuarterly,\u001fb1970-\u001e \u001faSemiannual,\u001fb1950-69\u001e0 \u001fav. 1- Apr. 1950-\u001e \u001faEditor: C. W. Dugmore.\u001e 0\u001faChurch history\u001fxPeriodicals.\u001e 7\u001faChurch history\u001f2fast\u001f0(OCoLC)fst00860740\u001e 7\u001faPeriodicals\u001f2fast\u001f0(OCoLC)fst01411641\u001e1 \u001faDugmore, C. W.\u001fq(Clifford William),\u001feed.\u001e03\u001f81\u001fav.\u001fi(year)\u001e40\u001f81\u001fa1-49\u001fi1950-1998\u001e \u001fapfnd\u001fbLintz\u001e \u001fa19890510120000.0\u001e2 \u001fa20141106\u001fbm\u001fdbatch\u001felts\u001fxaddfast\u001e \u001flOLIN\u001faBR140\u001fb.J86\u001fh01/01/01 N\u001e\u001d"
  },
  "parsedRecord": {
    "id": "f9d76822-ae0e-4a65-9b3b-47b7e991e5d0",
    "content": "{\"leader\":\"01338cas a2200409 4500\",\"fields\":[{\"001\":\"in00000000001\"},{\"008\":\"750907c19509999enkqr p 0 a0eng d\"},{\"005\":\"20210213170746.7\"},{\"010\":{\"subfields\":[{\"a\":\" 58020553 \"}],\"ind1\":\" \",\"ind2\":\" \"}},{\"022\":{\"subfields\":[{\"a\":\"0022-0469\"}],\"ind1\":\" \",\"ind2\":\" \"}},{\"035\":{\"subfields\":[{\"a\":\"(CStRLIN)NYCX1604275S\"}],\"ind1\":\" \",\"ind2\":\" \"}},{\"035\":{\"subfields\":[{\"a\":\"(NIC)notisABP6388\"}],\"ind1\":\" \",\"ind2\":\" \"}},{\"035\":{\"subfields\":[{\"a\":\"366832\"}],\"ind1\":\" \",\"ind2\":\" \"}},{\"035\":{\"subfields\":[{\"a\":\"(OCoLC)1604275\"}],\"ind1\":\" \",\"ind2\":\" \"}},{\"040\":{\"subfields\":[{\"d\":\"CtY\"},{\"d\":\"MBTI\"},{\"d\":\"CtY\"},{\"d\":\"MBTI\"},{\"d\":\"NIC\"},{\"d\":\"CStRLIN\"},{\"d\":\"NIC\"}],\"ind1\":\" \",\"ind2\":\" \"}},{\"050\":{\"subfields\":[{\"a\":\"BR140\"},{\"b\":\".J6\"}],\"ind1\":\"0\",\"ind2\":\" \"}},{\"082\":{\"subfields\":[{\"a\":\"270.05\"}],\"ind1\":\" \",\"ind2\":\" \"}},{\"222\":{\"subfields\":[{\"a\":\"The Journal of ecclesiastical history\"}],\"ind1\":\"0\",\"ind2\":\"4\"}},{\"245\":{\"subfields\":[{\"a\":\"The Journal of ecclesiastical history.\"}],\"ind1\":\"0\",\"ind2\":\"4\"}},{\"260\":{\"subfields\":[{\"a\":\"London,\"},{\"b\":\"Cambridge University Press [etc.]\"}],\"ind1\":\" \",\"ind2\":\" \"}},{\"265\":{\"subfields\":[{\"a\":\"32 East 57th St., New York, 10022\"}],\"ind1\":\" \",\"ind2\":\" \"}},{\"300\":{\"subfields\":[{\"a\":\"v.\"},{\"b\":\"25 cm.\"}],\"ind1\":\" \",\"ind2\":\" \"}},{\"310\":{\"subfields\":[{\"a\":\"Quarterly,\"},{\"b\":\"1970-\"}],\"ind1\":\" \",\"ind2\":\" \"}},{\"321\":{\"subfields\":[{\"a\":\"Semiannual,\"},{\"b\":\"1950-69\"}],\"ind1\":\" \",\"ind2\":\" \"}},{\"362\":{\"subfields\":[{\"a\":\"v. 1- Apr. 1950-\"}],\"ind1\":\"0\",\"ind2\":\" \"}},{\"570\":{\"subfields\":[{\"a\":\"Editor: C. W. Dugmore.\"}],\"ind1\":\" \",\"ind2\":\" \"}},{\"650\":{\"subfields\":[{\"a\":\"Church history\"},{\"x\":\"Periodicals.\"}],\"ind1\":\" \",\"ind2\":\"0\"}},{\"650\":{\"subfields\":[{\"a\":\"Church history\"},{\"2\":\"fast\"},{\"0\":\"(OCoLC)fst00860740\"}],\"ind1\":\" \",\"ind2\":\"7\"}},{\"655\":{\"subfields\":[{\"a\":\"Periodicals\"},{\"2\":\"fast\"},{\"0\":\"(OCoLC)fst01411641\"}],\"ind1\":\" \",\"ind2\":\"7\"}},{\"700\":{\"subfields\":[{\"a\":\"Dugmore, C. W.\"},{\"q\":\"(Clifford William),\"},{\"e\":\"ed.\"}],\"ind1\":\"1\",\"ind2\":\" \"}},{\"853\":{\"subfields\":[{\"8\":\"1\"},{\"a\":\"v.\"},{\"i\":\"(year)\"}],\"ind1\":\"0\",\"ind2\":\"3\"}},{\"863\":{\"subfields\":[{\"8\":\"1\"},{\"a\":\"1-49\"},{\"i\":\"1950-1998\"}],\"ind1\":\"4\",\"ind2\":\"0\"}},{\"902\":{\"subfields\":[{\"a\":\"pfnd\"},{\"b\":\"Lintz\"}],\"ind1\":\" \",\"ind2\":\" \"}},{\"905\":{\"subfields\":[{\"a\":\"19890510120000.0\"}],\"ind1\":\" \",\"ind2\":\" \"}},{\"948\":{\"subfields\":[{\"a\":\"20141106\"},{\"b\":\"m\"},{\"d\":\"batch\"},{\"e\":\"lts\"},{\"x\":\"addfast\"}],\"ind1\":\"2\",\"ind2\":\" \"}},{\"950\":{\"subfields\":[{\"l\":\"OLIN\"},{\"a\":\"BR140\"},{\"b\":\".J86\"},{\"h\":\"01/01/01 N\"}],\"ind1\":\" \",\"ind2\":\" \"}},{\"999\":{\"subfields\":[{\"s\":\"f9d76822-ae0e-4a65-9b3b-47b7e991e5d0\"},{\"i\":\"f8c62933-95ab-4a16-bbe5-5b27558c758a\"}],\"ind1\":\"f\",\"ind2\":\"f\"}}]}",
    "formattedContent": "LEADER 01293cas a2200409 4500\n001 366832\n005 20141106221425.0\n008 750907c19509999enkqr p 0 a0eng d\n010 $a 58020553 \n022 $a0022-0469\n035 $a(CStRLIN)NYCX1604275S\n035 $a(NIC)notisABP6388\n035 $a366832\n035 $a(OCoLC)1604275\n040 $dCtY$dMBTI$dCtY$dMBTI$dNIC$dCStRLIN$dNIC\n050 0 $aBR140$b.J6\n082 $a270.05\n222 04$aThe Journal of ecclesiastical history\n245 04$aThe Journal of ecclesiastical history.\n260 $aLondon,$bCambridge University Press [etc.]\n265 $a32 East 57th St., New York, 10022\n300 $av.$b25 cm.\n310 $aQuarterly,$b1970-\n321 $aSemiannual,$b1950-69\n362 0 $av. 1- Apr. 1950-\n570 $aEditor: C. W. Dugmore.\n650 0$aChurch history$xPeriodicals.\n650 7$aChurch history$2fast$0(OCoLC)fst00860740\n655 7$aPeriodicals$2fast$0(OCoLC)fst01411641\n700 1 $aDugmore, C. W.$q(Clifford William),$eed.\n853 03$81$av.$i(year)\n863 40$81$a1-49$i1950-1998\n902 $apfnd$bLintz\n905 $a19890510120000.0\n948 2 $a20141106$bm$dbatch$elts$xaddfast\n950 $lOLIN$aBR140$b.J86$h01/01/01 N\n999 ff$sf9d76822-ae0e-4a65-9b3b-47b7e991e5d0\n\n"
  },
  "deleted": false,
  "order": 0,
  "externalIdsHolder": {
    "instanceId": "f8c62933-95ab-4a16-bbe5-5b27558c758a",
    "instanceHrid": "in00000000001"
  },
  "additionalInfo": {
    "suppressDiscovery": false
  },
  "state": "ACTUAL"
}
```
     - mapping rules
     - mapping parameters
4. SRS consumes each created record and applies the applicable event processors to it
   - entry point: DataImportKafkaHandler
   - registered event processors are applied if they are applicable to the record:
     - InstancePostProcessingEventHandler
     - ModifyRecordEventHandler
     - MarcBibliographicMatchEventHandler
   - once all profiles have been applied to the record, a DI_COMPLETED event is sent to signal that the import process is finished (in case of an error, a DI_ERROR event is published)
5. SRM accepts the completion event and finalizes the process for the record
   - entry point: RecordProcessedEventHandlingServiceImpl
   - the overall job execution progress is updated with the record result
   - a journal record is saved if necessary
   - the final job execution status is updated if the record is the last one to be imported
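Step 4 can be pictured as applying, for each step of the job profile, the first registered handler that declares itself eligible. The sketch below is a simplified, self-contained model, not the folio-processing-core API: the Handler record, the profile step names ("MATCH", "MODIFY") and the eligibility conditions are hypothetical; only the handler names and the DI_COMPLETED/DI_ERROR events come from the flow above. In this model a step without an eligible handler ends in DI_ERROR, which is what happens when Bib-only handlers meet an authority record.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

// Simplified model of step 4 (hypothetical types, not FOLIO's real interfaces).
public class ProfileProcessingSketch {

  /** A handler that is eligible for a given (profile step, record type) pair. */
  public record Handler(String name, BiPredicate<String, String> isEligible) {}

  /**
   * Applies one eligible handler per profile step and returns the applied handler
   * names followed by the final event: DI_COMPLETED when every step was handled,
   * DI_ERROR as soon as a step has no eligible handler.
   */
  public static List<String> process(List<String> profileSteps, String recordType,
                                     List<Handler> handlers) {
    List<String> trace = new ArrayList<>();
    for (String step : profileSteps) {
      Handler eligible = handlers.stream()
          .filter(h -> h.isEligible().test(step, recordType))
          .findFirst()
          .orElse(null);
      if (eligible == null) {
        trace.add("DI_ERROR");
        return trace;
      }
      trace.add(eligible.name());
    }
    trace.add("DI_COMPLETED");
    return trace;
  }

  public static void main(String[] args) {
    List<Handler> handlers = List.of(
        new Handler("MarcBibliographicMatchEventHandler",
            (step, type) -> step.equals("MATCH") && type.equals("MARC_BIBLIOGRAPHIC")),
        new Handler("ModifyRecordEventHandler",
            (step, type) -> step.equals("MODIFY") && type.equals("MARC_BIBLIOGRAPHIC")));

    // [MarcBibliographicMatchEventHandler, ModifyRecordEventHandler, DI_COMPLETED]
    System.out.println(process(List.of("MATCH", "MODIFY"), "MARC_BIBLIOGRAPHIC", handlers));
    // [DI_ERROR]
    System.out.println(process(List.of("MATCH"), "MARC_AUTHORITY", handlers));
  }
}
```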
Creation flow modification
In general, the current flow should be applicable to importing MARC Authority records. Some issues have been identified, but the list is not complete and further investigation might reveal other problems. The known limitations are related to the implicit use of the MARC Bib record type:
- SRM always publishes Data Import Events with the Record mapped to the MARC Bib type
- Event names include the MARC Bib type explicitly. Naming supports no other types, and there is no mechanism to identify the required event by record type
SRM always publishes Data Import Events with MARC Bib type
SRM contains a class that the service uses to publish record events: RecordsPublishingServiceImpl. This class prepares the event payload in the following way:
```java
private DataImportEventPayload prepareEventPayload(Record record, ProfileSnapshotWrapper profileSnapshotWrapper,
                                                   JsonObject mappingRules, MappingParameters mappingParameters,
                                                   OkapiConnectionParams params, String eventType) {
  HashMap<String, String> dataImportEventPayloadContext = new HashMap<>();
  dataImportEventPayloadContext.put(MARC_BIBLIOGRAPHIC.value(), Json.encode(record)); // key is always MARC_BIBLIOGRAPHIC
  dataImportEventPayloadContext.put("MAPPING_RULES", mappingRules.encode());
  dataImportEventPayloadContext.put("MAPPING_PARAMS", Json.encode(mappingParameters));

  return new DataImportEventPayload()
    .withEventType(eventType)
    .withProfileSnapshot(profileSnapshotWrapper)
    .withCurrentNode(profileSnapshotWrapper.getChildSnapshotWrappers().get(0))
    .withJobExecutionId(record.getSnapshotId())
    .withContext(dataImportEventPayloadContext)
    .withOkapiUrl(params.getOkapiUrl())
    .withTenant(params.getTenantId())
    .withToken(params.getToken());
}
```
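The code in line #5, dataImportEventPayloadContext.put(MARC_BIBLIOGRAPHIC.value(), Json.encode(record)), encodes the Record and places it into the context map with the key "MARC_BIBLIOGRAPHIC". This is done regardless of the actual record type, even though the EntityType enumeration defines other MARC types as well: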
```java
public enum EntityType {
  MARC_BIBLIOGRAPHIC("MARC_BIBLIOGRAPHIC"),
  MARC_HOLDINGS("MARC_HOLDINGS"),
  MARC_AUTHORITY("MARC_AUTHORITY"),
  EDIFACT_INVOICE("EDIFACT_INVOICE"),
  DELIMITED("DELIMITED"),
  INSTANCE("INSTANCE"),
  HOLDINGS("HOLDINGS"),
  ITEM("ITEM"),
  ORDER("ORDER"),
  INVOICE("INVOICE"),
  STATIC_VALUE("STATIC_VALUE");

  private final String value;

  EntityType(String value) {
    this.value = value;
  }

  // String value used as the context key, e.g. MARC_BIBLIOGRAPHIC.value()
  public String value() {
    return value;
  }
}
```
The enumeration already defines the required type for MARC Authority records (MARC_AUTHORITY); the type just has to be detected from the record and placed into the payload. A detection mechanism is implemented in MarcRecordAnalyzer.java in the data-import-utils module. It allows a record to be put into the context with the appropriate type value as the key.
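The idea can be illustrated by deriving the type from the record itself and using it as the context key instead of the hard-coded MARC_BIBLIOGRAPHIC. This is a standalone sketch, not the MarcRecordAnalyzer API (its methods are not reproduced here): it reads Leader/06 ("type of record") as defined by MARC 21, where "z" denotes authority data, "u", "v", "x" and "y" denote holdings data, and the remaining listed codes denote bibliographic data. The class, the MarcType enum and the plain String map used for the context are hypothetical simplifications.

```java
import java.util.HashMap;
import java.util.Map;

// Standalone sketch (hypothetical names): detect the MARC record type from
// Leader/06 and store the encoded record under that type instead of a fixed key.
public class RecordTypeContextSketch {

  /** The MARC-related subset of EntityType values. */
  public enum MarcType { MARC_BIBLIOGRAPHIC, MARC_HOLDINGS, MARC_AUTHORITY }

  /** Maps MARC 21 Leader/06 ("type of record") to a MarcType. */
  public static MarcType detectType(String leader) {
    if (leader == null || leader.length() < 7) {
      throw new IllegalArgumentException("Leader is too short: " + leader);
    }
    char typeOfRecord = leader.charAt(6);
    switch (typeOfRecord) {
      case 'z':
        return MarcType.MARC_AUTHORITY;
      case 'u': case 'v': case 'x': case 'y':
        return MarcType.MARC_HOLDINGS;
      case 'a': case 'c': case 'd': case 'e': case 'f': case 'g': case 'i':
      case 'j': case 'k': case 'm': case 'o': case 'p': case 'r': case 't':
        return MarcType.MARC_BIBLIOGRAPHIC;
      default:
        throw new IllegalArgumentException("Unsupported Leader/06 value: " + typeOfRecord);
    }
  }

  /** Builds the payload context keyed by the detected record type. */
  public static Map<String, String> buildContext(String leader, String encodedRecord,
                                                 String mappingRules, String mappingParams) {
    Map<String, String> context = new HashMap<>();
    context.put(detectType(leader).name(), encodedRecord);
    context.put("MAPPING_RULES", mappingRules);
    context.put("MAPPING_PARAMS", mappingParams);
    return context;
  }

  public static void main(String[] args) {
    System.out.println(detectType("01240cas a2200397 4500"));   // MARC_BIBLIOGRAPHIC
    System.out.println(detectType("00000nz  a2200000n  4500")); // MARC_AUTHORITY
  }
}
```

Because the enum constant names match the EntityType string values, name() yields the same keys ("MARC_AUTHORITY", "MARC_BIBLIOGRAPHIC") that downstream handlers would look up.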
Event naming includes MARC Bib type only, no support for other types
Data import defines the list of available events in the DataImportEventTypes enum:
```java
public enum DataImportEventTypes {
  DI_RAW_MARC_BIB_RECORDS_CHUNK_READ("DI_RAW_MARC_BIB_RECORDS_CHUNK_READ"),
  DI_MARC_BIB_FOR_UPDATE_RECEIVED("DI_MARC_BIB_FOR_UPDATE_RECEIVED"),
  DI_RAW_MARC_BIB_RECORDS_CHUNK_PARSED("DI_RAW_MARC_BIB_RECORDS_CHUNK_PARSED"),
  DI_PARSED_MARC_BIB_RECORDS_CHUNK_SAVED("DI_PARSED_MARC_BIB_RECORDS_CHUNK_SAVED"),
  DI_SRS_MARC_BIB_RECORD_CREATED("DI_SRS_MARC_BIB_RECORD_CREATED"),
  DI_SRS_MARC_BIB_INSTANCE_HRID_SET("DI_SRS_MARC_BIB_INSTANCE_HRID_SET"),
  DI_SRS_MARC_AUTHORITY_RECORD_CREATED("DI_SRS_MARC_AUTHORITY_RECORD_CREATED"),
  DI_SRS_MARC_BIB_RECORD_UPDATED("DI_SRS_MARC_BIB_RECORD_UPDATED"),
  DI_SRS_MARC_BIB_RECORD_MODIFIED("DI_SRS_MARC_BIB_RECORD_MODIFIED"),
  DI_SRS_MARC_BIB_RECORD_MODIFIED_READY_FOR_POST_PROCESSING("DI_SRS_MARC_BIB_RECORD_MODIFIED_READY_FOR_POST_PROCESSING"),
  DI_SRS_MARC_BIB_RECORD_MATCHED("DI_SRS_MARC_BIB_RECORD_MATCHED"),
  DI_SRS_MARC_BIB_RECORD_MATCHED_READY_FOR_POST_PROCESSING("DI_SRS_MARC_BIB_RECORD_MATCHED_READY_FOR_POST_PROCESSING"),
  DI_SRS_MARC_BIB_RECORD_NOT_MATCHED("DI_SRS_MARC_BIB_RECORD_NOT_MATCHED"),
  DI_INVENTORY_INSTANCE_CREATED("DI_INVENTORY_INSTANCE_CREATED"),
  DI_INVENTORY_INSTANCE_MATCHED("DI_INVENTORY_INSTANCE_MATCHED"),
  DI_INVENTORY_INSTANCE_NOT_MATCHED("DI_INVENTORY_INSTANCE_NOT_MATCHED"),
  DI_INVENTORY_INSTANCE_UPDATED_READY_FOR_POST_PROCESSING("DI_INVENTORY_INSTANCE_UPDATED_READY_FOR_POST_PROCESSING"),
  DI_INVENTORY_INSTANCE_UPDATED("DI_INVENTORY_INSTANCE_UPDATED"),
  DI_INVENTORY_INSTANCE_CREATED_READY_FOR_POST_PROCESSING("DI_INVENTORY_INSTANCE_CREATED_READY_FOR_POST_PROCESSING"),
  DI_INVENTORY_ITEM_CREATED("DI_INVENTORY_ITEM_CREATED"),
  DI_INVENTORY_ITEM_MATCHED("DI_INVENTORY_ITEM_MATCHED"),
  DI_INVENTORY_ITEM_NOT_MATCHED("DI_INVENTORY_ITEM_NOT_MATCHED"),
  DI_INVENTORY_ITEM_UPDATED("DI_INVENTORY_ITEM_UPDATED"),
  DI_INVENTORY_HOLDING_CREATED("DI_INVENTORY_HOLDING_CREATED"),
  DI_INVENTORY_HOLDING_MATCHED("DI_INVENTORY_HOLDING_MATCHED"),
  DI_INVENTORY_HOLDING_NOT_MATCHED("DI_INVENTORY_HOLDING_NOT_MATCHED"),
  DI_INVENTORY_HOLDING_UPDATED("DI_INVENTORY_HOLDING_UPDATED"),
  DI_ERROR("DI_ERROR"),
  DI_COMPLETED("DI_COMPLETED");

  // constructor omitted for brevity
}
```
Some events are issued upon general-purpose actions. Examples of such events are DI_RAW_MARC_BIB_RECORDS_CHUNK_PARSED and DI_PARSED_MARC_BIB_RECORDS_CHUNK_SAVED.
They differ from other events in that they apply to MARC records of any type. For instance, parsing an incoming MARC record into the common format does not depend on the record type. The same is true for saving a MARC record in the SRS database.
Because of this, it makes sense to generalize some events by removing the "_BIB_" part from their names, for instance:
- DI_RAW_MARC_BIB_RECORDS_CHUNK_PARSED becomes DI_RAW_MARC_RECORDS_CHUNK_PARSED
- DI_PARSED_MARC_BIB_RECORDS_CHUNK_SAVED becomes DI_PARSED_MARC_RECORDS_CHUNK_SAVED
Once the general flow completes, the process can be customized for a particular record type with specific events that include "_BIB_", "_AUTH_" or other qualifiers. This separation is supposed to happen after record creation, so the events that notify that a record has been successfully created have to include the type qualifier:
- DI_SRS_MARC_BIB_RECORD_CREATED
- DI_SRS_MARC_AUTH_RECORD_CREATED
This should give more control over customizing record processing and reduce unwanted executions of services that are interested in one record type but not in another.
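The proposed split between type-independent and type-qualified events can be sketched as follows. The event names are the ones proposed in this section; the ProposedEvent and RecordType enums, the createdEventFor helper and the subscription check are hypothetical, and holdings are left out because this spike does not name a created event for them.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch of the proposed naming: general events for steps that do
// not depend on the record type, type-qualified events from record creation on.
public class ProposedEventNaming {

  public enum RecordType { MARC_BIBLIOGRAPHIC, MARC_AUTHORITY }

  public enum ProposedEvent {
    // Parsing and saving do not depend on the record type.
    DI_RAW_MARC_RECORDS_CHUNK_PARSED,
    DI_PARSED_MARC_RECORDS_CHUNK_SAVED,
    // After creation, events carry the record type.
    DI_SRS_MARC_BIB_RECORD_CREATED,
    DI_SRS_MARC_AUTH_RECORD_CREATED
  }

  /** Chooses the type-qualified "record created" event for a record type. */
  public static ProposedEvent createdEventFor(RecordType type) {
    switch (type) {
      case MARC_BIBLIOGRAPHIC:
        return ProposedEvent.DI_SRS_MARC_BIB_RECORD_CREATED;
      case MARC_AUTHORITY:
        return ProposedEvent.DI_SRS_MARC_AUTH_RECORD_CREATED;
      default:
        throw new IllegalArgumentException("No created event defined for " + type);
    }
  }

  /** A consumer interested only in authority records subscribes to its own event. */
  public static boolean authorityConsumerReceives(ProposedEvent event) {
    Set<ProposedEvent> subscriptions = EnumSet.of(ProposedEvent.DI_SRS_MARC_AUTH_RECORD_CREATED);
    return subscriptions.contains(event);
  }

  public static void main(String[] args) {
    System.out.println(createdEventFor(RecordType.MARC_AUTHORITY)); // DI_SRS_MARC_AUTH_RECORD_CREATED
    System.out.println(authorityConsumerReceives(createdEventFor(RecordType.MARC_BIBLIOGRAPHIC))); // false
  }
}
```

An authority-only consumer never sees Bib creation events under this scheme, which is the reduction in unwanted executions the proposal aims for.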
Creation profile
As with MARC Bib records, there has to be a default data import profile for MARC Authority records. Unlike bibliographic record import, though, authority record import does not need to create any additional entities such as Inventory records. So the profile could be simplified to a job profile that includes a single action: Create MARC Authority in SRS.
Mock-ups for the default profile:
- Job profile
- Action profile
For reference, the default Job/Action profiles for MARC Bib record import:
```json
{
  "id": "6409dcff-71fa-433a-bc6a-e70ad38a9604",
  "name": "Default - Create instance and SRS MARC Bib",
  "description": "This job profile creates SRS MARC Bib records and corresponding Inventory Instances using the library's default MARC-to-Instance mapping. It can be edited, duplicated, or deleted.",
  "dataType": "MARC",
  "deleted": false,
  "userInfo": {
    "firstName": "System",
    "lastName": "System",
    "userName": "System"
  },
  "parentProfiles": [],
  "childProfiles": [],
  "metadata": {
    "createdDate": "2021-01-14T14:00:00.000+00:00",
    "createdByUserId": "00000000-0000-0000-0000-000000000000",
    "updatedDate": "2021-01-14T15:00:00.462+00:00",
    "updatedByUserId": "00000000-0000-0000-0000-000000000000"
  }
}
```
```json
{
  "action": "CREATE",
  "childProfiles": [],
  "deleted": false,
  "description": "This action profile is used with FOLIO's default job profile for creating Inventory Instances and SRS MARC Bibliographic records. It can be edited, duplicated, or deleted.",
  "folioRecord": "INSTANCE",
  "id": "f8e58651-f651-485d-aead-d2fa8700e2d1",
  "metadata": {
    "createdByUserId": "00000000-0000-0000-0000-000000000000",
    "createdDate": "2021-01-14T14:00:00.000+00:00",
    "updatedByUserId": "00000000-0000-0000-0000-000000000000",
    "updatedDate": "2021-01-14T15:00:00.462+00:00"
  },
  "name": "Default - Create instance",
  "parentProfiles": [],
  "userInfo": {
    "firstName": "System",
    "lastName": "System",
    "userName": "System"
  }
}
```
Open questions:
- Do we need mapping rules and mapping parameters? If so, can the rules be empty to avoid code modifications?
Generation of HRID
TBD
Uncovered Areas
Below are the business areas that are mentioned in the requirements but not covered by this spike:
- Updating MARC Authority records
  - the existing flow and the modifications required to support Authority records
  - matching profiles
  - mapping rules and parameters
- Searching MARC Authority records with the SRS MARC Query API