Import Inventory Instances and MARCcat bib records in MARC format

Description

Q2 2019 Data Import Priority 1 of 8

Note: UI input is preferred, but this feature initially covers only command-line input, using the standard MARC data mapper.

Requirements:

  1. CLI ability to load MARC bibs into SRS, with accompanying instructions

  2. For now, use the existing MARC mapper for Instance. UXPROD-1577 will update that mapping to the newer default MARC bib-to-Instance mapping.

  3. Sort out the 001/003/035/999 handling for newly created SRS records:

    • Move non-FOLIO 001 to 035 $a field (only if unique 035)

    • Move non-FOLIO 003 to prefix on 035 $a field? e.g. 001 12345 and 003 OCoLC becomes 035 (OCoLC)12345 [A-M doublechecking with DI workgroup]

    • Create 001 using the sequential HRID generator

    • Create 003 using the tenant default (new MARC settings area, with tenant-level 003 and subfield preferences)

    • Create 999 field with $s [SRS UUID] (will add 999 $i with Instance UUID and 999 $m with MARCcat UUID when they are created), so complete 999 field will look like: 999 $i [Instance UUID] $m [MARCcat UUID] $s [SRS UUID] (see MODSOURMAN-95 and MODSOURMAN-96)

  4. Figure out what happens with the old mod-data-loader - retire? repurpose?
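
The 001/003/035/999 handling in requirement 3 can be sketched roughly as follows. This is a minimal illustration in Python, not the actual FOLIO implementation: the dict-based record shape, the function name `renumber_incoming`, the tenant code `FOLIO-TENANT`, and the plain-integer HRID are all hypothetical, and every incoming record is assumed to carry a non-FOLIO 001/003.

```python
import uuid
from itertools import count


def renumber_incoming(record, hrid_counter, tenant_003):
    """Apply the proposed 001/003/035/999 handling to one incoming record.

    `record` is a hypothetical plain-dict stand-in for a MARC bib:
      {"control": {tag: value}, "data": [(tag, {subfield_code: value}), ...]}
    """
    control = dict(record.get("control", {}))
    data = list(record.get("data", []))

    # Take the incoming (assumed non-FOLIO) 001 and 003 out of the record.
    old_001 = control.pop("001", None)
    old_003 = control.pop("003", None)

    if old_001:
        # 001 12345 + 003 OCoLC becomes 035 $a (OCoLC)12345.
        value = f"({old_003}){old_001}" if old_003 else old_001
        existing = {subfields.get("a") for tag, subfields in data if tag == "035"}
        # Only add the 035 if the same value is not already present.
        if value not in existing:
            data.append(("035", {"a": value}))

    # New 001 from a sequential HRID generator (format is an assumption).
    control["001"] = str(next(hrid_counter))
    # New 003 from the tenant-level default.
    control["003"] = tenant_003
    # 999 $s carries the SRS UUID; $i and $m would be added later.
    data.append(("999", {"s": str(uuid.uuid4())}))

    return {"control": control, "data": data}


if __name__ == "__main__":
    hrids = count(1)
    incoming = {"control": {"001": "12345", "003": "OCoLC"}, "data": []}
    print(renumber_incoming(incoming, hrids, "FOLIO-TENANT"))
```

Because the 035 value is built only from the old 001 and 003, two source databases that both use plain sequential 001 values and share a 003 would produce identical 035 $a values under this sketch.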

Per Harry, aim for at least these performance metrics:

  • 100 records/second

  • or 1,000 records/second on an empty system (turn off indexing, load, then reindex)
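
To put the two targets in perspective, the arithmetic below estimates wall-clock load time at each rate. The collection size of one million records is an illustrative assumption, not a figure from this feature, and the helper name `load_time_seconds` is hypothetical.

```python
def load_time_seconds(record_count, records_per_second):
    """Return the time in seconds to load record_count at a steady rate."""
    if records_per_second <= 0:
        raise ValueError("records_per_second must be positive")
    return record_count / records_per_second


if __name__ == "__main__":
    collection_size = 1_000_000  # assumed example size
    for rate in (100, 1000):
        hours = load_time_seconds(collection_size, rate) / 3600
        print(f"{rate} records/s: {hours:.2f} hours")
```

At these rates a one-million-record load would take roughly 2.78 hours and 0.28 hours respectively, not counting the reindex step in the second case.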

May need to review mod-data-loader and decide whether it is replaced by the data-import-loader or whether its mapping just needs updating.

Note that the MARCcat part will not happen until SRS integration with MARCcat (UXPROD-1595) is done.

Development Team

Folijet

Parent Field Value

None

Parent Status

None

Activity

Ann-Marie Breaux June 26, 2019 at 4:53 AM

Split the backend pubsub work into its own feature (https://folio-org.atlassian.net/browse/UXPROD-1806#icft=UXPROD-1806)

Anne L. Highsmith April 11, 2019 at 12:36 PM

Yes, I envision it as you describe it.

Ann-Marie Breaux April 11, 2019 at 5:09 AM

Yes, thank you for the clarification. So as part of data migration, to deal with having totally different records with 001 of 1001, you're thinking some sort of qualifier to be able to distinguish them, so main1001 vs med1001, or something like that - and then that qualifier would carry down into the 035 (along with whatever 003 value) when the records came into FOLIO, and the FOLIO HRID was assigned. Is that right? And to be clear, since you'll just have 1 Inventory/MARC Database in FOLIO (right?) then the previous main1001 and med1001 records would end up with completely new and different FOLIO HRIDs in the 001 field.

And then dealing with bibliographic duplicates once all the bib records are together in FOLIO is a completely separate issue (and not one I think we've worked through yet, but probably OK to park it for now). My guess is that we might need something similar to OCLC (or Jira for that matter), where a previous HRID is still accessible and searchable, even if that record is merged with another record.

Anne L. Highsmith April 10, 2019 at 4:49 PM

We haven't yet made any decisions about how to handle bibliographic duplicates, so I can't answer your question.

This may be obvious, but in case not, let me clarify my example of why this is an issue for us. It doesn't have to do with bibliographic duplicates; it has to do with the fact that the 001 field as created in and exported from Voyager is simply a sequential integer. TAMU Main Libraries bib record with field 001 = 1001 may be a record for an engineering dissertation done in 1953; TAMU Medical Sciences Library bib record with field 001 = 1001 might be the serial record for JAMA. But they're both field 001 = 1001, so the 035 $a would be identical under the original proposal, without some kind of qualifier.

Ann-Marie Breaux April 10, 2019 at 6:50 AM

Thanks for your comments; I have adjusted the number handling proposal based on your suggestions. One question: when you load your existing MARC records to FOLIO, if there is a record in both databases for the same resource, will you be consolidating those into 1 MARC record (and surfacing as 1 Instance), or will you be loading those as 2 separate MARC records (and surfacing as 2 separate Instances)?

I think the next step will be to review/confirm this with the Data Import subgroup and maybe with MM SIG. My goal is to get the requirements finalized by the end of the week, so that I can start writing the developer stories for this feature.

Done

Details

Estimation Notes and Assumptions

Wayne and Shale (and MARCcat) may have many of the bits and pieces for this already, but I'm not sure how much we have and how much is missing. I'm assuming analysis and implementation work is to be done in the area of batch loading on top of existing records. To the extent this is to be user configurable, the estimate assumes this is covered by some of the other UXPROD issues.

Analysis Estimate

Large < 10 days

Front End Estimate

Small < 3 days

Front-End Confidence factor

Low

Back End Estimate

XXL < 30 days

Rank: FLO (MVP Sum 2020)

R1

Rank: 5Colleges (Full Jul 2021)

R1

Rank: Cornell (Full Sum 2021)

R1

Rank: Chalmers (Impl Aut 2019)

R1

Rank: GBV (MVP Sum 2020)

R1

Rank: hbz (TBD)

R1

Rank: TAMU (MVP Jan 2021)

R1

Rank: Chicago (MVP Sum 2020)

R1

Rank: Leipzig (ERM Aut 2019)

R1

Rank: MO State (MVP June 2020)

R1

Rank: U of AL (MVP Oct 2020)

R1

Rank: Leipzig (Full TBD)

R1

Rank: Lehigh (MVP Summer 2020)

R1

Created May 25, 2018 at 8:33 AM
Updated September 16, 2020 at 9:17 PM
Resolved June 26, 2019 at 4:23 PM