Migration Tools (UXPROD-850)

[UXPROD-1713] Data Migration. Migrate bibliographic records with pre-determined HRIDs Created: 20/May/19  Updated: 26/Oct/20  Resolved: 23/Oct/20

Status: Closed
Project: UX Product
Components: None
Affects versions: None
Fix versions: None
Parent: Migration Tools

Type: New Feature Priority: P3
Reporter: Charlotte Whitt Assignee: Ian Walls
Resolution: Done Votes: 0
Labels: Inventory, SRS, marccat, migration-load, po-mvp, round_iv
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original estimate: Not Specified

Issue links:
Relates
relates to UXPROD-2115 Define human readable identifiers (HR... Closed
relates to UXPROD-2116 Define human readable identifiers (HR... Closed
relates to UXPROD-2114 Define human readable identifiers (HR... Closed
Epic Link: Migration Tools
PO Rank: 119
PO Ranking Note: Use of legacy IDs is necessary to maintain links between records. Comes immediately after being able to load biblio/holdings data (UXPROD-559)
Rank: Chalmers (Impl Aut 2019): R5
Rank: Chicago (MVP Sum 2020): R1
Rank: Cornell (Full Sum 2021): R4
Rank: Duke (Full Sum 2021): R1
Rank: 5Colleges (Full Jul 2021): R1
Rank: GBV (MVP Sum 2020): R2
Rank: Lehigh (MVP Summer 2020): R1
Rank: TAMU (MVP Jan 2021): R1
Rank: U of AL (MVP Oct 2020): R2

 Description   

Usecase:
Several libraries (uChicago, Cornell and others) need to be able to take their existing ILS bibids with them, when migrating to FOLIO.

This Jira feature originates from:
Gap Analysis 2019 Missing Features - https://docs.google.com/spreadsheets/d/1b3VY1EUOAyoySuEaT0lJfKUBYLMUPZM6PDSBnGiMxYk/edit#gid=739330010

Original order # 45



 Comments   
Comment by Anya [ 23/May/19 ]

FC: would this be in inventory or in Marc data store? We thought we would store this in a 9xx field.

Comment by Charlotte Whitt [ 24/May/19 ]

Anya - yes, correct: stored in SRS in MARC storage in the MARC tag 999 $i, and populated in Inventory.

I updated the Epic link to: Migration Tools from Legacy ILS – patty.wanninger would that be the correct epic to link to, when a feature is about Data Migration?

Comment by Christopher Creswell [ 05/Jun/19 ]

It's worth mentioning that identifiers from previous systems can be stored in FOLIO in the "Identifiers" section in Inventory. It's also possible to specify the HRID for an instance record in Inventory by disabling database triggers during migration. The automatically generated ones are created by a database trigger currently. I believe Dale Arntson has done this. This doesn't apply to instances that are "surfacing" from source record storage, though, I believe.

Comment by Ann-Marie Breaux (Inactive) [ 06/Jun/19 ]

Christopher Creswell Thank you for this reminder. We still need to sort out the 001/003/035 handling in SRS MARC records, which we'll do in Q3.

I'd definitely like to advocate that unless there is a compelling reason to keep your previous system ID as the FOLIO HRID, that the previous ID moves to an 035 in the MARC record and Other Identifier with type Control number or System control number in the Instance record. New system, new number can hopefully be the default, but if we have to retain the previous system number as the FOLIO HRID, then we will make sure that we can.
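
As an illustration of the "new system, new number" option described in this comment, here is a minimal sketch. It assumes a simplified in-memory representation of a MARC record (a list of tag/value pairs) and hypothetical function names; the `identifiers` entry shape (`value`, `identifierTypeId`) reflects one reading of the Instance schema and should be checked against the Inventory version in use. It is not the data import implementation.

```python
def move_legacy_id(fields, new_hrid, legacy_prefix):
    """Put new_hrid in the 001 and preserve the previous 001 as an 035 $a.

    fields: list of (tag, value) pairs; value is a str for control fields
    and a dict of subfields for data fields (a simplification, not a real
    MARC library structure).
    """
    legacy_ids = [value for tag, value in fields if tag == "001"]
    kept = [(tag, value) for tag, value in fields if tag != "001"]
    result = [("001", new_hrid)] + kept
    for old_id in legacy_ids:
        # The 035 is appended at the end; a real tool would likely keep tag order.
        result.append(("035", {"a": f"({legacy_prefix}){old_id}"}))
    return result


def legacy_identifier(old_id, identifier_type_id):
    """Build an Instance identifier entry for the previous system ID.

    identifier_type_id is a placeholder for the UUID of a type such as
    'System control number' in the target tenant.
    """
    return {"value": old_id, "identifierTypeId": identifier_type_id}
```

Libraries that must keep the previous ID in the 001 would skip this step entirely, which is the case the rest of this issue is about.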

Comment by Jason Kovari [ 06/Jun/19 ]

We have used bibids for various activities, including the association of digitized-to-original-physical materials, various digital preservation workflows, URLs in our discovery environment and more. While one could point out that this practice was based on a false assumption of persistence, it is the case nonetheless. Moving these to another field either in MARC storage OR in Inventory will create a case whereby we need to map our legacy bibids to another identifier in all of the various use cases where bibids entered our workflows - this will entail significant effort and would miss all possible places where these bibids have been used (especially for users who may have lists of our catalog URLs stored in their documentation). There is a middle ground whereby we create one set of logic for materials acquired prior to FOLIO whereby a Voyager bibid is used from whichever field in whichever store AND a different set of logic for materials acquired post-FOLIO implementation; however, this seems like a terrible solution. I understand that migrating legacy system identifiers entails migration work and development; however, this would simply shift this significant effort post-implementation for us… and I suspect possibly for others, as well.

Comment by Jacquie Samples [ 06/Jun/19 ]

Duke would also like to rank this as a go-live requirement. Similarly to the situation at Cornell, we have long used system numbers as identifiers among linked entities. We would like to migrate our system numbers (in MARC 001) in order to maintain links. Moving these to another field in MARC storage would cause our records to become dis-associated, meaning that related records, such as bound-withs and boxed-withs, will lose their "parents" and "children" records. This will cause some resources to be as good as lost. We also need to have the holding system numbers migrate "in place" in the MARC 001 for the same types of reasons.

Comment by Ann-Marie Breaux (Inactive) [ 06/Jun/19 ]

Jason Kovari and Jacquie Samples I definitely hear you, and know that some (maybe most?) libraries will need to retain previous system IDs in the MARC 001 and as the primary HRID in the Inventory records. I didn't mean to imply anything else. And most everyone has ranked it as go-live, so we'll definitely be working on it. We'll sort out management of FOLIO-generated and previous-system-migrated 001/003/035 in Q3. I just haven't had a chance to get it sorted out yet, but I'll turn my attention back to it shortly. And in the meantime, we do have the handling for the UUIDs sorted out, so we're covered fine from a FOLIO/technical perspective.

Comment by Jason Kovari [ 06/Jun/19 ]

Glad to hear the commitment, Ann-Marie Breaux; this is certainly important. I was perhaps reading your advocacy for new FOLIO HRIDs as more of an indication than I should. Thanks for hearing the need!

Comment by Theodor Tolstoy (One-Group.se) [ 04/Jul/19 ]

Jacquie Samples IMHO, I would propose that you try to replace your legacy ID:s with new ones as part of the migrations process, as your use cases seem to be around connecting things that would reside in Inventory and can/should/must be expressed with UUID:s anyway.

The case for external systems is another question, and with the Chalmers implementation I certainly would like FOLIO to treat the ID:s (001:s) from the Union catalog as first-class-citizen-ids.

Comment by Jacquie Samples [ 11/Jul/19 ]

Hi Theodor Tolstoy (One-Group.se),

Thanks for your note and suggestion. We have hundreds of thousands of records in our database that use the current IDs as linking fields based on these system numbers; that is hundreds of thousands of Legacy IDs used as the primary linking data. The current situation with linked entities, which I mentioned above, means that replacing the system numbers while retaining the contextual linking is not easily undertaken. It is the context of multiple bibliographical entities related to one item record, for example, that makes replacing our legacy IDs problematic unless a very sophisticated tool is created in order to facilitate the creation of UUIDs, and replacing the legacy IDs in every place they exist currently (BIB, HOL, ITM) to retain the relationships between and among these entities.

Comment by Charlotte Whitt [ 30/Jul/19 ]

Hi Ian Walls - I noticed that you have only given this 'core function' PO rank "2". That seems very low for a feature with PO-mvp.

Comment by Ian Walls [ 30/Jul/19 ]

Apologies; misunderstood the order of PO ranks. Re-ranking

Comment by Jenn Colt [ 26/Aug/19 ]

We've lowered our rank because we've been told this will be part of data import.

Comment by Charlotte Whitt [ 27/Aug/19 ]

Ann-Marie Breaux - will you confirm?

... this will be part of data import

As you know, then this feature is important for uChicago and several other libraries.

Thanks much

Comment by Jenn Colt [ 27/Sep/19 ]

I think the message I got today is that we actually won't be able to do this through data import.

Comment by Ann-Marie Breaux (Inactive) [ 30/Sep/19 ]

From Christie Thomas in Slack: We are turning off triggers for loading data at data migration, but would not want to do that on an ongoing basis; there may be a better way ...

From jroot in Slack: I can tell you at A&M, that’s what we intend to do from my understanding. This is a part of our migration tool project.

And additional conversation from Slack:
jroot We are building a “mod-external-reference-resolver” for this process.

Jenn Colt
For the HRID/001 or for UUIDs?
Is the bib to instance mapping maintained in there as well?

jroot
Jenn Colt let me forward your question to Jeremy Huff or Ryan Laddusaw as I do not know.

Jenn Colt
Okay. Nick here has been working with them as well. I'll catch up with folks next week. I'm just finding rebuilding something that already exists a little mind bending.

jroot
My understanding is this is to help along our migration process/efforts using the workflow engine, data extraction and data import modules.

Jenn Colt
That makes sense. I just think hitting endpoints as part of a workflow is different from rebuilding the functionality of those endpoints. AM's team has done a whole bunch of work on the mapping, all of which we will need to rebuild outside of folio if we want to keep our HRIDs is how this is sounding to me.

Ryan Laddusaw
mod-external-reference-resolver was created to help us keep track of external references (likely Voyager ids in our case) and the corresponding FOLIO UUIDs. We wanted to be able to do our migration in pieces, but needed to be able to resolve the FOLIO UUIDs of things that had already been migrated.

Jenn Colt
so you are not needing to keep your existing bibids as the bibid in FOLIO?

Ryan Laddusaw
I'm not sure about that. We haven't really started working on bib records.

Jenn Colt
(and yes the resolving between for instance voyager item ids and FOLIO item uuids we're doing but just with a table at the moment)

Ryan Laddusaw
This was intended to help in cases like migrating a purchase order when the vendor was already migrated so we can look up the FOLIO UUID for the vendor and make sure the PO links up with the right vendor.

Jenn Colt
Yeah that makes sense
The issue with the bibids in particular is that they are a public ID that has been used in a bunch of places so we want to maintain them, as opposed to things like vendor ids that are just private

Ryan Laddusaw
Our migration workflows let us setup processors that we use to build the request object we want to create in FOLIO. I don't think there would be a problem with maintaining the 001 field's value.

Jenn Colt
I have been told that at the moment if you put an HRID in the request FOLIO throws it away unless you turn off database triggers. It sounds like Chicago feels turning them off during migration is okay. Maybe turning them off is part of your loading process too?

Ryan Laddusaw
I don't think we've gotten to the point where we've encountered that issue yet. We would definitely want FOLIO to respect the HRID if we set it in a request.
Sounds like we'd need to turn them off or modify the storage module to respect provided HRIDs

Jenn Colt
Yeah
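
The legacy-ID-to-UUID lookup discussed in the Slack conversation above (described there both as mod-external-reference-resolver and as "just a table") can be sketched roughly as follows. This is an illustrative in-memory table with hypothetical class and method names; it is not the API of mod-external-reference-resolver, which is not documented in this issue.

```python
import uuid


class ReferenceResolver:
    """Tracks which FOLIO UUID was assigned to each legacy record."""

    def __init__(self):
        # Key: (record type, legacy id as string); value: FOLIO UUID string.
        self._table = {}

    def register(self, record_type, legacy_id, folio_uuid=None):
        """Record a mapping, minting a UUID if none is supplied.

        Registering the same legacy record twice returns the first UUID,
        so a migration run in pieces stays consistent.
        """
        key = (record_type, str(legacy_id))
        if key not in self._table:
            self._table[key] = folio_uuid or str(uuid.uuid4())
        return self._table[key]

    def resolve(self, record_type, legacy_id):
        """Return the FOLIO UUID for a legacy record, or None if not migrated yet."""
        return self._table.get((record_type, str(legacy_id)))
```

For example, a purchase order migration step could call `resolve("vendor", legacy_vendor_id)` to find the UUID of a vendor that was loaded earlier. Note that this solves record linking only; it does not address keeping the public bibid as the 001/HRID.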

Comment by Jenn Colt [ 30/Sep/19 ]

I just want to insert that I don't think "turn off database triggers" should be considered a way to close this issue until someone posts here that they've done it and explored the consequences and think it's fine, including fine in the "I'm willing to do this two days before I go live" way. As far as I can tell, Chicago is the only institution who has gotten far enough to say.

Comment by Ann-Marie Breaux (Inactive) [ 30/Sep/19 ]

Hi Jenn Colt I wasn't trying to suggest that it definitely was the right solution - just wanted to preserve the Slack convo in this relevant Jira.

I do think this would be a good topic for the Sys-Ops/Data Migration subgroup and/or the Implementers group. What say you, Dale Arntson Ian Walls Karen Newbery?

Thank you!

Comment by Ian Walls [ 30/Sep/19 ]

I think this is worth talking about at Data Migration.

For my part, I don't think migration tools should be messing with your MARC data, and that includes the 001. We need a commonly-known key value to link up data; it doesn't have to be Human Readable, so long as it's unique. UUIDs are great for that, and if there is a way to provide a comprehensive map from legacy ID to FOLIO ID that can be handed back to the migration tool, fine, but that does add an extra mapping step to the migration process that will need to be programmed in.

Comment by Jenn Colt [ 30/Sep/19 ]

Human readable refers to the HRID on an instance.

The mapping between legacy to FOLIO UUID can be (is being) handled in different ways, that is not my primary concern or the topic of this issue (as far as I can tell). Our goal is to take the 001 on our current MARC records (which is also called the bibid) and keep that as the 001 on the FOLIO MARC record, and also have that 001 become the HRID on its corresponding instance (give or take a prefix on the HRID). Matching the 001 and instance HRID at migration will create behavior that is consistent with new records imported later on by data import. Just to summarize my concerns and then I'll let others have at it:

  • Data import is currently specced to always move an incoming 001 to an 035, therefore preventing us from using that process at migration if we want to keep our 001 and requiring us to build a parallel process outside of FOLIO that maps our MARC to instances for migration. People have this underway in varying stages. No one has a complete system AFAIK.
  • If you decide to create your own JSON for instances and add the instances and MARC records separately to FOLIO, you cannot currently set the HRID on the JSON you construct to match the 001 unless you disable database triggers. This, to me, feels like a hack. I am open to being wrong on that point.

Comment by Ian Walls [ 22/Oct/20 ]

For my migrations so far, I've had no problem mapping the MARC 001 to the FOLIO Instance HRID and having it persist when loaded into the database. This is making use of the bulk Instance loading API (/instance-storage/batch/synchronous) and the code written by Theodor Tolstoy (One-Group.se) housed at https://github.com/FOLIO-FSE/MARC21-To-FOLIO.
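
A minimal sketch of what such a load might look like, under stated assumptions: the function names are hypothetical, the Instance field names (`id`, `hrid`, `source`, `title`, `instanceTypeId`) and the `{"instances": [...]}` request wrapper reflect one reading of the inventory storage schema and should be verified against the deployed version, and the optional prefix follows the "give or take a prefix" remark earlier in this issue. This is not the code in MARC21-To-FOLIO.

```python
import uuid


def instance_from_marc(control_number, title, instance_type_id, hrid_prefix=""):
    """Build an Instance record whose HRID is taken from the MARC 001.

    instance_type_id is a placeholder for a resource type UUID that
    exists in the target tenant.
    """
    return {
        "id": str(uuid.uuid4()),
        "hrid": f"{hrid_prefix}{control_number}",  # legacy bibid kept as HRID
        "source": "MARC",
        "title": title,
        "instanceTypeId": instance_type_id,
    }


def batch_payload(instances):
    """Wrap Instance records in the body assumed for
    POST /instance-storage/batch/synchronous."""
    return {"instances": list(instances)}
```

The payload would then be sent with an authenticated POST to the tenant's `/instance-storage/batch/synchronous` endpoint; request headers and batch size limits are deployment details not covered in this issue.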

Comment by Ian Walls [ 23/Oct/20 ]

Making use of the bulk Instance loading endpoint enables this, so marking as Done.

Generated at Fri Feb 09 00:17:53 UTC 2024 using Jira 1001.0.0-SNAPSHOT#100246-sha1:7a5c50119eb0633d306e14180817ddef5e80c75d.