Batch Importer (Bib/Acq) (UXPROD-47)

[UXPROD-665] Import Inventory Instances and MARCcat bib records in MARC format Created: 25/May/18  Updated: 16/Sep/20  Resolved: 26/Jun/19

Status: Closed
Project: UX Product
Components: None
Affects versions: None
Fix versions: Q2 2019
Parent: Batch Importer (Bib/Acq)

Type: New Feature Priority: P2
Reporter: Ann-Marie Breaux (Inactive) Assignee: Ann-Marie Breaux (Inactive)
Resolution: Done Votes: 0
Labels: crossrmapps, data-import, instances, marccat, marcimport, split
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original estimate: Not Specified

Issue links:
Defines
defines UXPROD-47 Batch Importer (Bib/Acq) Analysis Complete
is defined by MODSOURMAN-95 SPIKE: Figure out 999 Instance and SR... Closed
is defined by UIDATIMP-204 MARC Bib load results in invalid date... Closed
is defined by UIDATIMP-209 Show all records in log including err... Closed
is defined by MODDATAIMP-105 SPIKE: Event payload container Closed
is defined by MODDATAIMP-106 SPIKE: Job processing engine Closed
is defined by MODDATAIMP-107 SPIKE: Create event handler Closed
is defined by MODDATAIMP-110 SPIKE: Find "MARC -> Inventory Instan... Closed
is defined by MODDATAIMP-111 Set status for upload definition as n... Closed
is defined by MODDATAIMP-112 Data Import to Inventory integration ... Closed
is defined by MODDATAIMP-113 SPIKE: Investigate Inventory API (mod... Closed
is defined by MODDATAIMP-114 SPIKE: Check Data Import performance. Closed
is defined by MODDATAIMP-115 SPIKE: Check Inventory performance. Closed
is defined by MODDATAIMP-116 Update fileDefinition status on uploa... Closed
is defined by MODDATAIMP-117 Finalize PoC with event approach and ... Closed
is defined by MODDATAIMP-126 Add readers for all files that can co... Closed
is defined by MODDICONV-11 Populate sample data Closed
is defined by MODSOURCE-40 Inventory instance’s data structure s... Closed
is defined by MODSOURMAN-93 Design and develop a bespoke processo... Closed
is defined by MODSOURMAN-94 Integrate extracted logic from data-l... Closed
is defined by MODSOURMAN-99 MARC 999 field: Put SRS UUID Closed
is defined by UIDATIMP-185 Create temporary MARC Bib Load option... Closed
is defined by UIDATIMP-186 Landing page changes when temporary M... Closed
is defined by UIDATIMP-187 Add Log button to log entries in seco... Closed
is defined by UIDATIMP-198 Handle broken tests issue caused by s... Closed
Relates
relates to UXPROD-1805 SRS MARC-Inventory Instance relations... Closed
relates to UXPROD-2078 SRS MARC-Inventory Instance relations... Closed
relates to UXPROD-2207 SRS MARC-Inventory Instance relations... Closed
relates to UXPROD-1479 Simple UI for edit of the default MAR... Draft
relates to UXPROD-1397 Generating a MARC bibliographic recor... Closed
relates to UXPROD-1806 NFR: Data Import Pub-Sub (Event Drive... Closed
relates to UXPROD-2012 NFR: Data Import Pub-Sub (Event Drive... Closed
relates to UXPROD-2115 Define human readable identifiers (HR... Closed
relates to MODSOURMAN-115 After Instances are created the field... Closed
relates to UXPROD-1499 Accept files in JSON for intital migr... Closed
Epic Link: Batch Importer (Bib/Acq)
Analysis Estimate: Large < 10 days
Analysis Estimator: Niels Erik Nielsen
Front End Estimate: Small < 3 days
Front End Estimator: Viktor Soroka
Front-End Confidence factor: Low
Back End Estimate: XXL < 30 days
Back End Estimator: Taras Spashchenko
Estimation Notes and Assumptions: Wayne and Shale (and MARCcat) may already have many of the bits and pieces for this, but I'm not sure how much we have and how much is missing.

I'm assuming that analysis and implementation work will be needed in the area of batch loading on top of existing records.

To the extent this is to be user-configurable, the estimate assumes that work is covered by other UXPROD issues.
Development Team: Folijet
Rank: Chalmers (Impl Aut 2019): R1
Rank: Chicago (MVP Sum 2020): R1
Rank: Cornell (Full Sum 2021): R1
Rank: Duke (Full Sum 2021): R1
Rank: 5Colleges (Full Jul 2021): R1
Rank: FLO (MVP Sum 2020): R1
Rank: GBV (MVP Sum 2020): R1
Rank: hbz (TBD): R1
Rank: Lehigh (MVP Summer 2020): R1
Rank: Leipzig (Full TBD): R1
Rank: Leipzig (ERM Aut 2019): R1
Rank: MO State (MVP June 2020): R1
Rank: TAMU (MVP Jan 2021): R1
Rank: U of AL (MVP Oct 2020): R1

 Description   

Q2 2019 Data Import Priority 1 of 8

Note: While UI input is preferred, this will only cover command-line input to start, using the standard MARC data mapper.

Requirements:

  1. CLI ability to load MARC bibs to SRS (plus some instructions)
  2. For now, use the existing MARC mapper for Instance. In UXPROD-1577, that mapping will be updated to the newer default MARC bib-to-Instance mapping.
  3. Sort out the 001/003/035/999 handling for newly created SRS records:
    • Move non-FOLIO 001 to 035 $a field (only if the 035 is unique)
    • Move non-FOLIO 003 to a prefix on the 035 $a field? For example, 001 12345 and 003 OCoLC become 035 (OCoLC)12345 [A-M double-checking with DI workgroup]
    • Create 001 using the sequential HRID generator
    • Create 003 using the tenant default (new MARC settings area, with tenant-level 003 and subfield preferences)
    • Create 999 field with $s [SRS UUID] (999 $i with the Instance UUID and 999 $m with the MARCcat UUID will be added when those records are created), so the complete 999 field will look like: 999 $i [Instance UUID] $m [MARCcat UUID] $s [SRS UUID] (see MODSOURMAN-95 and MODSOURMAN-96)
  4. Figure out what happens with the old mod-data-loader - retire? repurpose?
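The 001/003/035/999 handling in requirement 3 can be sketched as a per-record transform. This is a minimal Python sketch, not FOLIO code: the dict-based record model, the `make_hrid_generator` and `normalize_ids` helpers, the `in` HRID prefix, the tenant 003 value `FOLIO`, and reading "only if the 035 is unique" as "unique across the whole load" are all assumptions for illustration.

```python
import uuid
from itertools import count
from typing import Callable, Dict, Optional, Set


def make_hrid_generator(prefix: str = "in", start: int = 1) -> Callable[[], str]:
    """Hypothetical sequential HRID generator (prefix and padding are assumptions)."""
    counter = count(start)
    return lambda: f"{prefix}{next(counter):08d}"


def normalize_ids(
    record: Dict[str, list],
    next_hrid: Callable[[], str],
    tenant_003: str,
    seen_035a: Set[str],
    srs_id: Optional[str] = None,
) -> Dict[str, list]:
    """Return a copy of `record` with the proposed 001/003/035/999 handling.

    `record` maps a MARC tag to a list of values: control fields (001, 003)
    hold strings, data fields hold lists of (subfield_code, value) tuples.
    `seen_035a` collects legacy 035 $a values already loaded; the uniqueness
    check is assumed to apply across the whole load.
    """
    out = {tag: list(values) for tag, values in record.items()}
    old_001: Optional[str] = (out.pop("001", None) or [None])[0]
    old_003: Optional[str] = (out.pop("003", None) or [None])[0]

    if old_001:
        # Use the legacy 003 as a parenthesized prefix, e.g. (OCoLC)12345.
        legacy = f"({old_003}){old_001}" if old_003 else old_001
        if legacy not in seen_035a:
            seen_035a.add(legacy)
            out.setdefault("035", []).append([("a", legacy)])
        # Otherwise the legacy number is not copied; a real loader would
        # presumably report it rather than drop it silently.

    out["001"] = [next_hrid()]  # new FOLIO HRID
    out["003"] = [tenant_003]   # tenant-level default
    # Only $s is known at this point; $i and $m would be added later.
    # Any incoming 999 is simply replaced in this sketch.
    out["999"] = [[("s", srs_id or str(uuid.uuid4()))]]
    return out


next_hrid = make_hrid_generator()
seen: Set[str] = set()
rec = {"001": ["12345"], "003": ["OCoLC"], "245": [[("a", "Example title")]]}
print(normalize_ids(rec, next_hrid, "FOLIO", seen, srs_id="srs-uuid-1"))
```

Passing the shared `seen_035a` set into every call is what makes the uniqueness check span the load; a per-record check would never catch the cross-database overlap raised in the comments below.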

Per Harry, aim for at least these performance metrics:

  • 100 records/second
  • or 1,000 records/second on an empty system (turn off indexing, load, and then reindex)
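For a rough sense of scale, the sketch below converts a record count and a sustained rate into wall-clock time. The one-million-record load and the `load_duration` helper are hypothetical, and the 1,000 records/second case does not include the time to reindex afterwards.

```python
from typing import Tuple


def load_duration(records: int, rate_per_second: int) -> Tuple[int, int, int]:
    """Return (hours, minutes, seconds) to load `records` at a steady rate."""
    seconds = -(-records // rate_per_second)  # round up to whole seconds
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return hours, minutes, secs


# Hypothetical load of one million bib records at each target rate.
for rate in (100, 1_000):
    h, m, s = load_duration(1_000_000, rate)
    print(f"{rate:>5} records/s: {h} h {m} min {s} s")
```

For one million records this works out to 10,000 seconds (2 h 46 min 40 s) at 100 records/second and 1,000 seconds (16 min 40 s) at 1,000 records/second.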

We may need to review mod-data-loader and decide whether it is replaced by the data-import-loader or whether its mapping just needs updating.

Note that the MARCcat part will not happen until SRS integration with MARCcat (UXPROD-1595) is done.



 Comments   
Comment by Cate Boerema (Inactive) [ 11/Jun/18 ]

Ann-Marie Breaux, Wayne Schneider, shale99: I am trying to understand how this differs from UXPROD-145. Won't any Inventory instances and MARCcat bib records being re-imported into FOLIO just be MARC records and, as such, be covered by UXPROD-145? Or is this about matching existing records for record updates as opposed to new records? Thanks!

Comment by Ann-Marie Breaux (Inactive) [ 19/Jun/18 ]

Hi Cate Boerema, UXPROD-145 is an older feature that encompassed lots of different things: inventory instances, holdings, items, and authorities, in MARC and delimited formats. The newer features (in the 600s) break 145 into more discrete elements. Should we maybe add 145's details to the Batch Importer epic (UXPROD-47) and delete 145?

Comment by Cate Boerema (Inactive) [ 20/Jun/18 ]

We definitely don't want duplicate and overlapping features, so I think that's a good idea. We should wait until after the Gap Analysis is complete (should be by June 29th), as UXPROD-145 has already received some prioritizations. Can you put an alert on your calendar to take care of this afterwards? Thanks!

Comment by Hkaplanian [ 17/Jul/18 ]

Can I assume this is MARC-21 only and we will need to add another feature for MARCXML?

Comment by Anne L. Highsmith [ 08/Apr/19 ]

I'd like to comment on the requirements for point 3, especially the sub-point "Move non-FOLIO 001 to 035 $a field (only if unique 035)". The requirement for the 001 to be unique would create a serious problem for Texas A&M because, in the course of migrating to FOLIO, we are migrating bibliographic records from two different Voyager databases. That means that the 001 fields from the bib records, MARC holdings records, and authority records will overlap between the two databases, so I cannot guarantee them to be unique.

I would also suggest that, in creating an 035 from the 001 and 003, you follow the convention of using the 003 as a prefix to the 001 rather than putting the 003 in 035 $be. Although I can't see that the LC MARC documentation requires it, it is a very common means of dealing with 001/003 data.

Example:
001 00964332
003 (OCoLC)
035 after transform and load: (OCoLC)00964332

If you drop the requirement that subfield a be unique and use the 003 data as a prefix to the 001, then I can put the database code in the 003, create a unique 035 subfield a and still be able to tell the records from both databases apart.

Comment by Ann-Marie Breaux (Inactive) [ 10/Apr/19 ]

Thanks for your comments, Anne L. Highsmith. I have adjusted the number handling proposal based on your suggestions. One question: when you load your existing MARC records to FOLIO, if there is a record in both databases for the same resource, will you be consolidating those into one MARC record (surfacing as one Instance), or will you be loading them as two separate MARC records (surfacing as two separate Instances)?

I think the next step will be to review/confirm this with the Data Import subgroup and maybe with MM SIG. My goal is to get the requirements finalized by the end of the week, so that I can start writing the developer stories for this feature.

Comment by Anne L. Highsmith [ 10/Apr/19 ]

We haven't yet made any decisions about how to handle bibliographic duplicates, so I can't answer your question.

This may be obvious, but in case not, let me clarify my example of why this is an issue for us. It doesn't have to do with bibliographic duplicates; it has to do with the fact that the 001 field as created in and exported from Voyager is simply a sequential integer. A TAMU Main Libraries bib record with field 001 = 1001 may be a record for an engineering dissertation done in 1953; a TAMU Medical Sciences Library bib record with field 001 = 1001 might be the serial record for JAMA. But they're both field 001 = 1001, so the 035 $a would be identical under the original proposal without some kind of qualifier.
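To illustrate why the 003-prefix convention resolves this overlap, here is a small Python sketch. It assumes the 003 holds a bare code (like OCoLC in requirement 3) and that the parentheses are added when the 035 is built; the `legacy_035a` helper and the database codes `main` and `med` are hypothetical.

```python
from typing import Optional


def legacy_035a(f001: str, f003: Optional[str]) -> str:
    """Build an 035 $a value using the 003 as a parenthesized prefix to the 001."""
    return f"({f003}){f001}" if f003 else f001


# Two unrelated Voyager records that share the sequential 001 value 1001.
# "main" and "med" are hypothetical database codes placed in the 003.
records = [
    {"db": "main", "001": "1001"},  # e.g. a 1953 engineering dissertation
    {"db": "med", "001": "1001"},   # e.g. the serial record for JAMA
]

plain = [legacy_035a(r["001"], None) for r in records]
prefixed = [legacy_035a(r["001"], r["db"]) for r in records]

print("001 only:  ", plain, "unique:", len(set(plain)) == len(plain))
print("003 prefix:", prefixed, "unique:", len(set(prefixed)) == len(prefixed))
```

With the 001 alone both records yield the same 035 $a; with the database code as a prefix they stay distinguishable.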

Comment by Ann-Marie Breaux (Inactive) [ 11/Apr/19 ]

Yes, Anne L. Highsmith, thank you for the clarification. So, as part of data migration, to deal with having totally different records with an 001 of 1001, you're thinking of some sort of qualifier to distinguish them, such as main1001 vs. med1001, and that qualifier would then carry down into the 035 (along with whatever 003 value) when the records come into FOLIO and the FOLIO HRID is assigned. Is that right? And to be clear, since you'll just have one Inventory/MARC database in FOLIO (right?), the previous main1001 and med1001 records would end up with completely new and different FOLIO HRIDs in the 001 field.

And then dealing with bibliographic duplicates once all the bib records are together in FOLIO is a completely separate issue (and not one I think we've worked through yet, but probably OK to park it for now). My guess is that we might need something similar to OCLC (or Jira for that matter), where a previous HRID is still accessible and searchable, even if that record is merged with another record.
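The OCLC-style idea of keeping a previous HRID accessible after a merge amounts to a redirect table. The sketch below is a hypothetical in-memory illustration (the `HridIndex` class and the `in001`-style HRIDs are assumptions, not an existing FOLIO API): a merged-away HRID points at the record it was merged into, and lookups follow the chain to the current record.

```python
from typing import Dict, Optional, Set


class HridIndex:
    """Hypothetical index that keeps retired HRIDs resolvable after merges."""

    def __init__(self) -> None:
        self._live: Set[str] = set()
        self._redirects: Dict[str, str] = {}  # retired HRID -> HRID it merged into

    def add(self, hrid: str) -> None:
        # A retired HRID is never reused, which keeps redirect chains acyclic.
        if hrid in self._redirects:
            raise ValueError(f"{hrid} was retired by a merge")
        self._live.add(hrid)

    def merge(self, retired: str, survivor: str) -> None:
        """Merge `retired` into `survivor`; the retired HRID stays resolvable."""
        if retired == survivor or retired not in self._live or survivor not in self._live:
            raise ValueError("both HRIDs must be distinct, live records")
        self._live.remove(retired)
        self._redirects[retired] = survivor

    def resolve(self, hrid: str) -> Optional[str]:
        """Follow merge redirects to the current HRID, or return None if unknown."""
        while hrid in self._redirects:
            hrid = self._redirects[hrid]
        return hrid if hrid in self._live else None


idx = HridIndex()
for h in ("in001", "in002", "in003"):
    idx.add(h)
idx.merge("in002", "in001")  # in002 folded into in001
idx.merge("in001", "in003")  # later, in001 folded into in003
print(idx.resolve("in002"))
```

Because every redirect points at a record that was live when the merge happened, and retired HRIDs cannot be re-added, the chains cannot loop and `resolve` always terminates.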

Comment by Anne L. Highsmith [ 11/Apr/19 ]

Yes, Ann-Marie Breaux, I envision it as you describe it.

Comment by Ann-Marie Breaux (Inactive) [ 26/Jun/19 ]

Split the backend pubsub work into its own feature (UXPROD-1806).

Generated at Fri Feb 09 00:09:30 UTC 2024 using Jira 1001.0.0-SNAPSHOT#100246-sha1:7a5c50119eb0633d306e14180817ddef5e80c75d.