2024-1-17 Data Import Subgroup meeting
Recordings are posted Here (2022+) and Here (pre-2022). Slack channel for Q&A and discussion between meetings.
Requirements details Here. Additional discussion topics in Subgroup parking lot.
Attendees: Ryan Taylor, Jennifer Eustis, Aaron Neslin, Christie Thomas, Raegan Wiechert, Autumn Faulkner, Ellis Butler, Lynne Fors, Robert Pleshar, Taylor Smith, Sara Colglazier
Notetaker: Jennifer Eustis, Corrie Hutchinson, Christie Thomas
Links:
- Poppy import planning dashboard
- Poppy timeline
- Quesnelia import planning dashboard (still being defined)
- Quesnelia timeline
- Folijet Current Development Board
- Folijet (Data import) Bug Dashboard
Agenda:
- Quesnelia Status Update
- Quick update on Quesnelia progress so far. The focus is on reliability and stability and on reducing the number of bugs. Recently, the team has been working on tickets for fixes released as CSP for Poppy. In progress: UXPROD-4471 (remove the step of initially saving incoming records to SRS), which is part of the architecture support for stability and reliability, and UXPROD-4303 (set instance/bib record for deletion). Upcoming: bug fixes in support of MARC-to-MARC matching and continued architecture work (UXPROD-4257).
- Question: Are all the bugs really bugs, or are some of them new features or changes to a feature? Ryan is evaluating these bugs with the team and will notify the SMEs about any changes.
- Question: These discussions are difficult to have. As users, we are frustrated and dissatisfied. The lack of documentation and of recorded decisions makes this hard. Who gets to decide what counts as functionality? There is a concern about trying to fix data import and user dissatisfaction without a common understanding: the tool is inconsistent and undocumented, and there is no shared understanding of the expected functionality. We should prioritize getting there and return to last week's discussion. Ryan agrees that we need to focus on documentation. If it isn't on the agenda, please bring it up and make it a priority. We can hold a brainstorming session on documentation.
- Question: People like the idea of spending next week on documentation. Are we clarifying things for our group only or for all users? What is the audience? End-user documentation is important. We need a roadmap with a list of all the functionality and how it is supposed to work. The functional roadmap is a separate piece of documentation. This is more like a functionality audit that helps our work and will ultimately help end-user documentation. We should focus on the functionality audit, for example as a spreadsheet that we build out. Ryan is looking at the higher-level roadmap.
Action:
- Get volunteers to create a spreadsheet and start brainstorming
- UXPROD-2742: MARC-MARC matching enhancements: Narrowing multiple matches & Bug work
- Discuss use case scenario provided by Jennifer Eustis.
- The 5 Colleges consortium has a shared instance, and we often share bibliographic records. To keep track of these records, we use what we call a container code in the MARC field 035 9\$a. This code has the prefix of one of the schools in the consortium (if the eResource is for more than one school, we use 5c or FC as the prefix), followed by some designation and a unique number. For example, UMass' Springer container code is (UMSPR)document_identifier. For our JSTOR open access, we use FCJSTOROA followed by the linking text. Sometimes we purchase packages that don't have exactly the same coverage. Typically in that case, each school adds its own container code to the shared resource. For example, a bib record might have (UMSPR)uniqueID and (ACSPING)uniqueID and so on. We need to be able to match on the existing 035 9\$a as an exact match where the existing MARC SRS record can have more than one 035 9\$a, and our match might be the 1st, 2nd, or 3rd of those 035 9\$a's in the existing record. The issue is being able to match on a repeating field. This applies to a MARC-to-MARC match as well as a MARC-to-instance system control number match.
- The issue is the repeating field. For Chicago, what has worked is to use a "contains" match rather than an exact match. Another example from 5C is the duplication of the numbers. In Sara's case, there were two 035 9\ fields that were different enough. She was trying to use the 035 9\ as a match point to create additional items/holdings, and the job gets stuck on multiple 035 9\ fields. Sara has to make sure there are no such 035 9\ fields for that process. Is the problem simply having multiple 035 9\ fields, or having multiple 035 9\ fields to match on? No matter how many 035 9\ fields are in the incoming and existing SRS records, the exact match should be found. The scenario: create records with two 035 9\ fields that are different. Create the SRS record and instance. Then use an additional file to create holdings/items for that bib for each school, trying to match on a unique 035 9\ in a record that has two unique 035 9\ fields. If the non-unique 035 9\ is there, the job fails. Another example is the OCLC number, where we have a lot of duplicates (OCLC numbers with or without the ocn, ocm, or on prefix).
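The expected behavior described above (an exact match found in any of the repeated 035 9\$a values, regardless of position) can be sketched roughly as follows. This is only an illustration of the requirement, not how FOLIO implements matching; the `Field` class, the function names, and the sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Simplified MARC data field (hypothetical model, not a FOLIO class)."""
    tag: str
    ind1: str
    ind2: str  # a blank indicator is stored as a single space
    subfields: tuple  # ((code, value), ...)


def subfield_values(record, tag="035", ind1="9", ind2=" ", code="a"):
    """Collect the subfield values from every repeat of the given field."""
    return [
        value
        for field in record
        if (field.tag, field.ind1, field.ind2) == (tag, ind1, ind2)
        for sub_code, value in field.subfields
        if sub_code == code
    ]


def has_exact_match(incoming, existing):
    """True if any incoming 035 9\\ $a equals any existing one, in any position."""
    existing_values = set(subfield_values(existing))
    return any(value in existing_values for value in subfield_values(incoming))


# Made-up sample data: an existing record carrying two container codes.
existing = [
    Field("245", "1", "0", (("a", "Sample title"),)),
    Field("035", "9", " ", (("a", "(UMSPR)uniqueID"),)),
    Field("035", "9", " ", (("a", "(ACSPING)uniqueID"),)),
]
incoming = [Field("035", "9", " ", (("a", "(ACSPING)uniqueID"),))]
unrelated = [Field("035", "9", " ", (("a", "(UMSPR)otherID"),))]

print(has_exact_match(incoming, existing))   # True: equals the second 035 9\ $a
print(has_exact_match(unrelated, existing))  # False: no existing value is equal
```

Comparing against the set of all existing values, rather than only the first occurrence, is what makes the position of the matching 035 irrelevant.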
- MODDATAIMP-879: Data Import removes duplicate 856s in SRS
- Field protections come into play here. We need to understand and define deduplication. Last week we talked about deleting the duplicate from the SRS record and deduplicating on the display record. Do we know whether a field is deduplicated only if every subfield is exactly the same? Robert will include that in the tests from Chicago. It is critical to document what is happening currently. The 655s are deduplicated when mapped from the SRS record to the instance. We need to know what FOLIO does today in order to plan changes or added functionality.
- Chicago: If we have two fields in the incoming record that are very similar, say 856s that are the same except for the $x data, then we would not want to deduplicate the 856. If I have an incoming record with multiple 856s, I would not want FOLIO to deduplicate at all unless my program indicated that. Two such 856s indicate that we have two owned versions of that ebook, for instance on the same platform but as part of two different collections.
- Clarify expectations on removal of duplicates.
- Need to clarify internally what we mean by de-duplication: de-duplication on the mapping to the inventory or holdings record vs. removing duplicate data from the incoming SRS records.
- Also need to make sure that the requirements distinguish between de-duplicating within an import record and de-duplicating protected fields when overlaying an existing FOLIO record with a new record being imported.
- Need to clarify what we mean by de-duplicate. It can be unclear what we are talking about.
- Christie Thomas will create some dummy data to illustrate deduping 856s.
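One possible definition of de-duplication raised in the discussion, where a field is dropped only if its tag, indicators, and every subfield are identical to an earlier field, can be sketched as below. This is an assumption offered to make the question concrete, not a description of what FOLIO currently does (which the notes identify as undocumented); the function name, the tuple layout, and the sample URLs are hypothetical.

```python
def dedupe_exact(fields):
    """Keep the first copy of each field; drop only fully identical repeats.

    Each field is a tuple: (tag, ind1, ind2, ((code, value), ...)).
    """
    seen = set()
    kept = []
    for field in fields:
        if field not in seen:  # tuples compare on tag, indicators, all subfields
            seen.add(field)
            kept.append(field)
    return kept


# Made-up sample data: two 856s differ only in $x, and the third repeats the first.
incoming_856s = [
    ("856", "4", "0", (("u", "https://example.org/ebook"), ("x", "Collection A"))),
    ("856", "4", "0", (("u", "https://example.org/ebook"), ("x", "Collection B"))),
    ("856", "4", "0", (("u", "https://example.org/ebook"), ("x", "Collection A"))),
]

deduped = dedupe_exact(incoming_856s)
print(len(deduped))  # 2: the $x variants both survive, the exact repeat is dropped
```

Under this definition the Chicago case (same URL, different $x) keeps both 856s. Whether even the exact repeat should be removed from the SRS record, or only from the mapped instance, is the open question in the notes.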
Upcoming meetings/agenda topics:
Chat: