2024-4-24 Data Import Subgroup meeting

Recordings are posted Here (2022+) and Here (pre-2022)

Slack channel for Q&A, discussion between meetings

Requirements details Here

Additional discussion topics in Subgroup parking lot


Attendees: Taylor Smith, Jennifer Eustis, Raegan Wiechert, Christie Thomas, Ryan Taylor, Corrie Hutchinson, Yael Hod, Robert Scheier, Peter Martinez, Jeanette Kalchik, Lynne Fors, Jenn Colt, Ellis Butler

Notetaker: Corrie Hutchinson

Links:

Agenda: 

Topic | Who | Meeting Notes | Related Jira | Decisions and Actions

Announcements:


Poppy CSP 4 is out now: https://folio-org.atlassian.net/jira/dashboards/10405

A CSP#5 is planned ; what is included and when it will be released is TBD.

Request from Raegan for a presentation or data on the difference between an ECS & non-ECS environment for a variety of modules/functions.

  • For Data Import, no big difference.

  • Ryan to talk to other POs about ECS request.
Agenda Cleanup

Review of items under 'Notes from previous meetings....' section.

  • Do we want to go through the Google spreadsheet related to UXPROD-4303?
    • Yes, keep topic on the agenda.
  • MARC Modifications 
    • Keep on agenda
    • Jennifer E to test in Quesnelia as time allows
  • Look to start spreading out some of these items into the next few meetings.
  • Removing UXPROD-4704 for now ; will reintroduce once it's back on the schedule
  • MODDATAIMP-897 : discuss at the same time as MARC Modifications
  • Removing 'Missing Action Profiles in Job Profile after Poppy Migration' from the list as addressed
  • MODDICORE-386 (matching on qualifier) 
    • Keep on agenda ; Ryan to research and come back to the group 
  • Documentation
    • A recurring conversation/topic ; removing from list as a specific agenda item
  • De-duplication (MODDATAIMP-879)
    • Keep on agenda ; Ryan to research and come back to the group

Question on how the agenda is determined? 

  • Latest topics on Slack?  Discussion/consensus on what to discuss?  How do we prioritize the backlog?  General discussion on how to pick topics.
  • Good idea?  Ryan to start planning agenda items a few weeks out and post on SIG page(s).

Future agenda items : 

  • Results of priority (voting) exercise for Ramsons tickets ; Ryan planning to discuss on 5/1/2024
  • Start discussion of tickets & issues to be addressed in Sunflower

  • Ryan : start planning agenda items a few weeks ahead.
  • Ryan : look into MODDICORE-386
  • Ryan : look into MODDATAIMP-879
Review issue reported by Christie:
Quantity and Orders via Data Import (Currently planned for Poppy CSP)
all

Latest as of 4/24:

  • Decision was made to revert Quantity/Location 'Check' & 'Overlay' processes that were introduced in Poppy.
  • This will essentially restore Order + Inventory entity Job profile functionality to the way it behaves in Orchid today.

Additional aspect for Subgroup to discuss:

  • A similar Check & Overlay process that occurs today is to align Material Type on the Order with Material Type of created Items (when Items are created as part of Data Import job)
    • Current functionality: If Purchase Order Status is set to 'Open', then Material Type field is greyed out in Order mapping. If Items are created as part of Data Import job, then the Material Type values in the Items will be used to fill in the Material Type field in POL.
    • Proposed functionality: If Purchase Order Status is set to 'Open', then Material Type field will not be greyed out and will remain mappable. This will allow the Order to remain source of truth for POL details and aligns with how we will be handling Holdings & Items against Quantity/Location discrepancies.
      • NOTE: In the event that the Material Type of created Items are different from the Material Type mapped in the Order, this could trigger Mod-orders to create additional redundant Items (similar to behavior seen if Order Location & Holdings Locations mismatch). So the responsibility is on the user to ensure that Order details and created Item details match.
  • Feedback from subgroup : 
    • Request for documentation on the checks performed by Data Import (why, when, how, etc.)
    • Hard to respond without knowing more about why DI was designed as it is now.
    • Ryan : we are learning together ; thank you for your patience.
    • Ryan : to dig up instructions on how to create multiple items
    • No objections to proposed functionality ; Ryan to move forward and keep everyone in the group informed as the work progresses. 
      • Proposed functionality centers on previous discussions around the order record being the source of truth.

Previous notes from 4/3:


Proposed next steps to address:


Scenario - Items - Differing Quantities: 
  • Given a Data Import Job profile is run to create Orders, Instances, Holdings, & Items
  • And the Item quantity within Order details differs from quantity of created Items
  • Then FOLIO defers to mod-orders to create Item entities based on Order mapping, which will be linked to the Purchase Order Line (POL)
Scenario - Holdings:
  • Given a Data Import Job profile is run to create Orders, Instances, Holdings, & Items
  • And the Location details within the Order differ from those of created Holdings
  • Then FOLIO defers to mod-orders to create new Holdings entities based on Order mapping, which will be linked to the Purchase Order Line (POL)
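
The two scenarios above can be read as a single rule. The sketch below is one simplified reading of these notes, not the actual mod-orders logic; the function name, argument names, and returned keys are all hypothetical and exist only to make the rule concrete.

```python
def reconcile(order_location, order_quantity, di_holdings_location, di_item_count):
    """Model who ends up linked to the POL, per the scenarios in these notes.

    Assumption: Data Import output is treated as matching the Order only when
    both the location and the item count agree with the Order details.
    """
    matches_order = (
        di_holdings_location == order_location
        and di_item_count == order_quantity
    )
    if matches_order:
        # Records created by Data Import satisfy the Order, so nothing extra is needed.
        return {
            "linked_to_pol": "data-import",
            "created_by_mod_orders": 0,
            "unlinked_items": 0,
        }
    # Mismatch: mod-orders creates records from the Order mapping and links those.
    # The records created by Data Import are left over and are not linked to the POL.
    return {
        "linked_to_pol": "mod-orders",
        "created_by_mod_orders": order_quantity,
        "unlinked_items": di_item_count,
    }


# Hypothetical example: the Order asks for 3 copies, the job profile created 1 Item.
print(reconcile("Main", 3, "Main", 1))
```

The mismatch branch is the case the subgroup flagged: the Order stays the source of truth, but the user is left with redundant records to clean up.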


If we move forward with the above, please consider the following details when using Data Import to create Orders with Inventory entities:

  • If locations & quantities differ between Order details and the Holdings & Items created by Data Import, then mod-orders will be triggered to create new Holdings & Items that do match the locations & quantities described in the Order. 
  • This scenario would lead to creation of erroneous/redundant Holdings or Items that are not linked to the POL of the Order.


Meeting notes from 4/3/2024 - 

Ryan Taylor shared a MIRO diagram for processing Orders via Data Import. 

Q: How are multiple items created as a part of the current process? Ryan - it is dictated by the inventory import profiles. 

Q: What is the end scenario when a job profile is set to create only an instance and holdings? The end state would be that an additional holdings and item are created because the quantity does not match the number of items. If items are not defined, an unnecessary item should not be created, but there is suspicion that an additional holdings would be created. It is thought that if the location matches what is in the profile, a second holdings / item will not be created. Clarification is needed on whether additional holdings / items will be created. 

Q: Why does the process not respect the quantity in the order import profile? Why can the order quantity and the location quantity not be the same? May need to look at this from a different perspective. 

It was noted that the business logic requires the connection of the order to Inventory at the point of creation, so it makes sense for the order to "have control" of the processing. 

It was also noted that the only difference between creating an order in pending vs open is that the mapping of the inventory records should come from the field mapping profiles rather than the defaults in the order import app. 

The Orders (ol and pol) have a certain business logic where the quantity is linked to what is in Inventory in addition to Receiving and Finance apps. Orders should be the source of truth as this would allow for this.

Ryan has some questions to bring back to the dev team. How can we make it simpler? Can you start in Inventory and then create the Order and associate the inventory with the POL?


Previous notes from 3/27:

  • 3/27 - Review related scenario raised by Christie in Slack
    Christie presented her DI job profile in her local environment on Poppy CSP 2. The profile worked as expected for single-volume monograph orders. This profile creates an order with quantity physical mapped from 980$q, an instance with "uncataloged" status, a holdings, and an item. When the quantity physical is greater than 1, the quantity physical in the POL is 1 rather than the number from the mapping in the 984$q. We can write this up as a separate bug and let the developers determine whether it is part of MODDATAIMP-1010 or a separate bug. The fact that only 1 item is being created even when the quantity is more than 1 needs to be investigated. This is different from, but related to, MODDATAIMP-1010. This could be related to an issue in CSP 2 where, if conditional mapping was used, the number of holdings and items created was incorrect. Could it also be that there is only one 980, so that just 1 holdings and 1 item are created? Could it be looking for the presence of multiple 980s? This could make sense in regular bib imports but not when creating orders. What happens if there is no conditional in the mapping for quantity? Christie will try this out. When there is no conditional, the quantity is still one. When the create actions for items and holdings are removed, the quantity is what is being mapped. The location maps correctly from the mapping in orders to the holdings record.

What happened and how do we move forward? What is the underlying issue? This is a complex scenario where we need to drill into an architecture level. Can we document what we have in a way that can be understood? Does it make sense for real-life scenarios? In the immediate time frame, the goal is to make sure we don't get quantity zero. If it is thread and if it is a bigger issue, do we have short-term workarounds? At the moment, it is not clear how this will be addressed, as there seems to be a divide as to how this works. How large a task is it to investigate and avoid the issues that we're seeing in the short term?

We need to talk to Dennis because location and cost quantity are tied together. What would happen if this was severed? This makes sense.

There's a bigger issue that needs to be unpacked. We have the immediacy of continuing to do our work. We need an understanding of what we expect.

If we order 10 copies of something, then the quantity should say 10.

For an order to be open, then you have to create inventory. For an order to be in the pending state, then the Order App does the creation of the inventory records.



Previous notes from 3/20:


In discussing this issue with the Folijet team, I've learned that the described behavior is a result of requirements to help avoid Item duplicates in support of the Multiples enhancements found in Poppy. 

Would the following logic/scenario make sense as a possible path forward?

  • If Job profile contains 'Create' Action profiles for Orders, Instance, and/or Holdings, then POL Quantity value should be controlled by mapping.
  • If Job profile contains 'Create' Action profiles for Orders, Items, and/or Instance, and/or Holdings , then POL Quantity value should be controlled by the number of Items created.
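
To make the two bullets above concrete, here is a minimal sketch of the proposed rule. It assumes the only thing that distinguishes the two cases is whether the job profile includes a 'Create' action for Items; the function and argument names are hypothetical.

```python
def pol_quantity(create_actions, mapped_quantity, items_created):
    """Return the POL Quantity under the logic proposed in these notes.

    create_actions: set of record types the job profile creates,
                    e.g. {"Order", "Instance", "Holdings"}.
    mapped_quantity: quantity taken from the Order field mapping.
    items_created: number of Items the Data Import job created.
    """
    if "Item" in create_actions:
        # Second bullet: Items are created, so their count controls the quantity.
        return items_created
    # First bullet: no Items are created, so the mapping controls the quantity.
    return mapped_quantity


# Hypothetical example: 10 copies mapped, job profile creates no Items.
print(pol_quantity({"Order", "Instance", "Holdings"}, 10, 0))
```

Under this rule a profile that creates Holdings but not Items keeps the mapped quantity, which avoids the quantity of 0 reported in the bug.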

Discussion :

  • Based on previous conversation, assumption is that the first bullet is the ideal logic.
  • Questions : what scenario is requiring this complication?  Why wouldn't the quantity in the order always match the quantity in the ingest file?
  • Reasoning for current situation is unclear.
  • Cost and quantity in an ingest file are related.  Using the # of items instead of the values in the incoming file breaks the logic and expectation of a user.
    • Standing orders are a good example : one (quantity = 1) set for $X dollars instead of $X dollars per item
    • Orders can be for a set or a part; practice varies by library & vendor
    • Controlling the quantity by the mapping leaves these values up to the library ; maximum flexibility
  • Confounding variable could be that the quantity ordered must match the quantity by location.
    •  Recommendation to talk to Dennis (PO of Acquisitions) for clarity, further information.
  • Question from Ryan : should the number of holdings records created have an impact on quantity?
    • The quantity in the POL should always come from the mapping provided by a library.
    • There is a situation where two items could be ordered, destined for different locations corresponding to multiple holdings records, but both locations aren't known during the time of order.   
    • Locations are often assigned as part of the cataloging process, not the acquisitions process.
    • The order record is the source of truth.  


Previous notes from 3/13:

  • Was this behaving differently in Orchid?
  • Behavior seen in this bug is that if Create profiles for Holdings or Items are included in the Job profile, then Quantity value is controlled by the number of Items created. So if Holdings profile is included, but not Items, then Quantity will result as 0.
    • Does this make sense to you that Quantity should be controlled by Items created as part of job or should it always be controlled by the Order mapping?


Previous notes from 2/28:


When electronic orders are created through Data Import, the electronic resource quantity is not being mapped from MARC or from a default value in the mapping profile, and the funds are not being encumbered. Tested in Poppy CSP1 in the local UChicago environment and in Poppy bugfest. See MODDATAIMP-1010.

For Stanford, Acquisition method is not mapping for Purchase. They are also not getting quantity or encumbrances. Also, order type is not mapping for Purchase when it is provided as a default in the import profile for orders. 

Discussion notes:

No one has used the order imports in Orchid. 

Feedback that the quantity should come from the profile/order and not the number of items. Common scenarios where items are not going to be created. 

When processing orders the quantity is always controlled from the order.

MODDATAIMP-1010

Earlier related work:

UXPROD-2741

MODORDERS-876

MODORDERS-881

  • Ryan: provide link to multiple item creation instructions.
Mark Arnold's question from 4/9/24: Error message when loading record with 710 2\$w

Original post in Slack: https://folio-project.slack.com/archives/CA39M62BZ/p1712679155100209

Error Mark posted : 
io.vertx.core.impl.NoStackTraceThrowable: {"errors":[{"message":"must not be null","type":"1","code":"javax.validation.constraints.NotNull.message","parameters":[{"key":"contributors[9].name","value":"null"}]}]}


"I managed to get the record to load. Originally it had a 710 field that looked like this:
=710 2\$wnne
Subfield 'w' is not defined for 710, so I changed the field to this:
=710 2\$awnne"

  • Theories / input :
    • Subfield 'w' is not valid.  But DI isn't doing validation?
    • An issue with the display of the 710?  
    • The MARC → Instance mapping of the 710 expects certain fields.
    • Similar to an issue at Chicago with the 100 $4 (MODSOURMAN-1085)
      • The fix for this ticket is in Quesnelia.  Has testing been done on this issue in Quesnelia-bugfest to see if it is addressed as part of this ticket?  (larger issue)
  • General discussion on how/when FOLIO validates MARC
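
One of the theories above is that the MARC → Instance mapping expects name subfields in the 710, so a field carrying only $w yields a contributor with a null name. The sketch below shows how a file could be pre-checked for that before loading. It is not Data Import code: the set of name subfields is an assumption made for illustration (the real set is defined by the FOLIO mapping rules), and both function names are hypothetical.

```python
# Assumption: subfields that contribute to a contributor name.
# The actual list comes from the MARC-to-Instance mapping rules and may differ.
NAME_SUBFIELDS = {"a", "b", "c", "d"}


def parse_mnemonic_field(line):
    """Parse a mnemonic MARC line such as '=710 2\\$wnne'.

    Returns (tag, indicators, [(subfield_code, value), ...]).
    """
    tag = line[1:4]
    parts = line[5:].split("$")
    indicators = parts[0]
    subfields = [(part[0], part[1:]) for part in parts[1:] if part]
    return tag, indicators, subfields


def contributor_name(line):
    """Return the name built from the name subfields, or None if there are none."""
    _, _, subfields = parse_mnemonic_field(line)
    name_parts = [value for code, value in subfields if code in NAME_SUBFIELDS]
    return " ".join(name_parts) or None


# The field as originally loaded has only $w, so no name can be built.
print(contributor_name(r"=710 2\$wnne"))
# After Mark's edit the same text sits in $a and a name is produced.
print(contributor_name(r"=710 2\$awnne"))
```

A check like this only flags the records; it does not decide whether Data Import should reject, skip, or repair such a field.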







Notes from previous meetings...





Review/Discuss: UXPROD-4303: Set instance/bib record for deletion
Ryan

Review and discuss feedback from initial testing of new 'Set record for deletion' action for Instances.
See notes from DI Lab sessions here: 

Discussion : 

  • Ryan has reviewed the spreadsheet of feedback and shared some already with the dev team.  
  • MODSOURCE-756 : newly submitted ticket 
    • Folijet is reviewing the ticket and options for addressing it
    • Hoping to address with a simple BE update
  • In some instances, neither 'suppress' nor 'delete' covers all needs.  
  • Advocacy for a separate section in the 'Actions' button for the deletion actions.
  • Lots of open questions concerning dependencies (outlined on the spreadsheet).  Important for future phases of this project (not part of Phase 1).
    • Note : ensure that item status is accounted for in reviewing dependencies ; i.e. items on loan 
  • Should the default behavior be that the user is expected to delete holdings and items before the instance? i.e. dependencies aren't as important
    • Use case could be that holdings and items are marked for deletion separately; last step is to mark instance for deletion and clean all of them out at that point
    • Discussion on various internal procedures for withdrawing and deleting records
    • Practices in place in FOLIO to address the current inability to delete records leads some libraries to prefer the idea of deleting instances, holdings, and items in one step.
    • Practices can vary between libraries based on type of material: physical vs. electronic
  • No consensus on when to use a 'hard' vs. 'soft' delete.
  • Discussion on ability and need to delete the SRS & Instance separately. 


Notes from 3/13 meeting:


Set bib record for deletion is in place in snapshot. 

New 'Set record for deletion' action. A specific permission is needed to access it (separate from the Inventory 'All permissions' set).

Click the action to suppress the record from discovery, staff suppress the record, and set the LDR 05 to d. New SRS property "deleted" set to true. 
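
As a rough illustration of the four changes listed above, the sketch below applies them to plain dictionaries. It is not the FOLIO implementation: the property names `discoverySuppress`, `staffSuppress`, `leader`, and `deleted` are assumptions about how the Instance and SRS records are shaped, and the function name is hypothetical.

```python
def set_record_for_deletion(instance, srs_record):
    """Apply the changes described in the notes to two dict-shaped records."""
    # Suppress the Instance from discovery and from staff view.
    instance["discoverySuppress"] = True
    instance["staffSuppress"] = True
    # MARC leader position 05 (record status) becomes 'd' for deleted.
    leader = srs_record["leader"]
    srs_record["leader"] = leader[:5] + "d" + leader[6:]
    # New SRS property described in the notes.
    srs_record["deleted"] = True


# Hypothetical records for illustration only.
instance = {"discoverySuppress": False, "staffSuppress": False}
srs_record = {"leader": "00000nam a2200000 a 4500", "deleted": False}
set_record_for_deletion(instance, srs_record)
print(instance, srs_record)
```

Nothing here touches Holdings, Items, or Orders, which matches the concern raised in the questions that follow.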

Staff suppress will now be defaulted to No, so will not show up in search. 

Q: What happens to attached orders? A: Nothing right now, but Ryan will check on impact on associated records. 

Q: Can you do this batch with a delete file in Data Import? A: No. Just an individual, manual action to take. Step 1 of longer term plans for full deletion options. 

Q: Are other steps going to be continued before this becomes part of a release? A: It will be included as is in Quesnelia. Holdings and items should not be affected as part of this release. Concern: Having holdings and items available while the instance is deleted. Cannot imagine a situation in which you want to delete an instance and still have the holdings and item information available in a search. Also, if an item is checked out, on hold, or in course reserves, it should block you from deleting a bib. There is no interactive link with inventory or circulation control. 

Follow up Q: Would there be a script available to get this added to Instance records a library has already marked for delete? 

Q: What is the push to release this in its partially developed state? A: Someone would use it to delete duplicate instances in the system. These things usually do not have holdings or items.

Q: What would happen if a deleted record was matched and updated as a part of a data import? A: Believe that these records are not discoverable via data import matching and updates, but will check on that.

Q: How do holdings and items show up in search if the instance is not available in the search? A: notetaker is unclear of the answer to this question

Q: Can you reverse this process or undelete things? A: Manually edit the instance to undo the suppression, but cannot edit the marc once it has been set to deletion. Triggered by the status to deletion. Question about whether we need to revisit this. This functionality should be robustly described in the documentation so users fully understand the implications of deleting something. 

Suggestion: The warning toast should be delete rather than just suppressed from discovery and staff suppressed because there are implications like not being able to edit the srs marc

Suggestion to postpone until Ramsons.

UXPROD-4303

MODSOURCE-756


MARC Modification Testing
Jennifer Eustis

Link to spreadsheet:

  • Making DI UI consistent with quickMarc and Bulk Edit. For example, blanks are denoted by the \ rather than a space.
  • Significant challenges with using MARC modifications with updates. One idea shared in lab was to see if we want to start with a baseline approach of how and where we need to have MARC modifications, develop that, and then build on that foundation.


Bug Review: 

MODDATAIMP-897 - Adding MARC modifications to single record overlay doesn't respect field protections

  • Discuss expectations of using Modify actions
Ryan/Jennifer/All

A number of issues were seen with doing updates. It seems there might be regressions along with functionality that doesn't work:

On Create:

  • Adding a string to the beginning or end doesn't work
  • Adding a field with multiple subfields works EXCEPT for the indicators which aren't mapped

On Update:

  • Adding a string to the beginning or end doesn't work
  • Adding a field with multiple subfields works EXCEPT for the indicators which aren't mapped
  • Modification at the end of the file didn't remove the field from the incoming record, and the field mapped for removal was still in the existing SRS record at the completion of the job
  • Modifications at the end of a job don't seem to work

Logs:

  • On Update, encountered a number of errors such as
    • incoming file may contain duplicates when the file only had 1 record
    • 2 rows in the log when there was only 1 record in the file and where the summary had 1 update and 1 no action error

If all this work is being done, should we create Jiras for all the ways in which MARC modifications don't work? Should we create Jiras for modifications that worked in Orchid and no longer work in Poppy? It makes sense to create Jiras for things that were working and aren't working in Poppy. Let's hold off on deciding which MARC modifications should work until we get the developers' findings.


Previous notes from 2/22 and 2/14:


Discussion notes:

Last week, the DI lab started a spreadsheet to track this functionality. The group is still working on this and expects to at the 2/22 meeting.

RT: How are the protections working and how are they expected to work? 

Example from Jennifer Eustis:  Use Case: Export, Transform, Load. Import profile includes a marc modification to delete fields, such Matches on 999 ff $i.  Ideal to have a marc modification to remove unwanted marc fields: 029, 983, etc. Then a match on instance and update of instance. Marc modification at the end of the record results in marc modification. See screenshot of profile.


Comments that MARC modifications were implemented with the expectation that they should be at the beginning of the job and should act on the incoming record. 

That is true, but past conversations in the Data Import subgroup drew out two use cases: 1) to modify an incoming record before any actions are taken and 2) to modify the final SRS record after all of the actions are taken. (Delete 9xx data after it is used to update the holdings and item, for example.)

General experience right now is that MARC modifications are working as expected with creates, but on updates they are either not working or working with corruption (such as the deletion of protected fields).

RT: Is part of the problem how we are approaching updates vs modifications? Updates are designed to work with FOLIO records and modifications are designed to work on incoming records.  Should updates have the same potential actions as marc modifications applying the logic to the updated record?

Right now dependencies between srs and instance and the explicit nature of the updates on instance vs marc is problematic. It is difficult to understand what is happening with updates. Process is to put them anywhere to see where they work. Whether we are updating srs, instance or both we should be able to do the same thing. 

RT: You will see different behavior from MARC modifications depending on their placement in the profile. Need a deeper dive into how the behavior changes depending on placement. 

This would be a good candidate for the functionality / documentation audit.  If development dives into this and the DI lab group dives into this, we could then come together to identify the best way forward

MODDATAIMP-897

UXPROD-4709

This would be a good candidate for the functionality / documentation audit. 

If development dives into this and the DI lab group dives into this, we could then come together to identify the best way forward

  • Development Review
  • DI Lab Group Review

Partial Matching:

Subject raised by Yael

Not discussed at the 2/21 meeting.  

Previous notes from 1/31:


Partial matching, e.g. begins with, ends with, is required but it does not function as it should regardless of how it is configured.

  • System behaves as though it only looks for exact matches.
  • Examples of use include prefixes/suffixes to an 035 added by a vendor or library to designate the source of the record.
  • University of Chicago has had same issue.  Corrie submitted MODDICORE-386 on their behalf.
  • Question as to whether this is a bug or how the system is intended to function.  Documentation is needed.
  • #12 on the Data Import Issue Tracker.  
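
The behavior the subgroup expects from partial matching can be sketched as below. This describes the expectation recorded in these notes, not how the matching code is written; the criterion names and the sample 035 values are hypothetical.

```python
def values_match(incoming, existing, criterion="exact"):
    """Compare an incoming value against an existing value.

    Assumption: the criterion describes how the existing value relates
    to the incoming value (existing begins with / ends with / contains it).
    """
    if criterion == "exact":
        return existing == incoming
    if criterion == "begins_with":
        return existing.startswith(incoming)
    if criterion == "ends_with":
        return existing.endswith(incoming)
    if criterion == "contains":
        return incoming in existing
    raise ValueError(f"Unknown criterion: {criterion}")


# Hypothetical 035 values: the existing record carries a source prefix.
existing_035 = "(VENDOR)12345"
incoming_035 = "12345"
print(values_match(incoming_035, existing_035, "exact"))
print(values_match(incoming_035, existing_035, "ends_with"))
```

The reported problem is that every configuration behaves like the `exact` branch, so a prefixed or suffixed 035 never matches.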



MODDICORE-386

Ryan will :

  • Review Jira with Folijet leads to understand current design and identify requirement gaps.
De-duplication: Continue conversation from previous session to clarify what we expect from de-duplication of field values when a record is loaded into FOLIO via Data Import.
All

Ryan has discussed this with the team. He will get this in writing and will share this when done. Christie did some work as well in Poppy bugfest.

Not discussed at the 2/21 meeting.  

Previous notes from 1/24 meeting:



Jennifer Eustis and Aaron Neslin found comments in the data-import-processing-core code that provides details about expected behavior for de-duplication.

These comments align with the behavior we are seeing except for when there is duplicate data in the incoming record. Data is being removed from the incoming record on update as well. 

Consensus seems to be that FOLIO should not be de-duplicating within the incoming record unless it is explicitly defined in an import profile.
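
The consensus above amounts to making de-duplication opt-in. A minimal sketch of that expectation follows; it is not taken from data-import-processing-core, and the function name, the `dedupe` flag, and the sample 856 values are hypothetical.

```python
def incoming_field_values(values, dedupe=False):
    """Return the field values to keep from an incoming record.

    By default every value is kept, including repeats. Duplicates are
    dropped only when the (hypothetical) profile option asks for it.
    """
    if not dedupe:
        return list(values)
    seen = set()
    kept = []
    for value in values:
        if value not in seen:
            seen.add(value)
            kept.append(value)  # keep the first occurrence, preserve order
    return kept


# Hypothetical incoming record with the same 856 $u twice.
urls = ["https://example.org/a", "https://example.org/a", "https://example.org/b"]
print(incoming_field_values(urls))
print(incoming_field_values(urls, dedupe=True))
```

Keeping the default at "no de-duplication" matches the subgroup's view that the incoming record should load as supplied unless a profile says otherwise.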

Q: Is de-duplication something that should be able to be deactivated on a field by field basis? R: Sounds like a reasonable approach. There is also some concern that this would complicate an already complicated situation. 

Possible solution: de-duplicate in a separate tool rather than within Data Import.

Suggestion to start with the functionality audit. RT can connect with the developers as a part of this audit. 

Q: Are we starting with how we as users expect functionality to work or with how the developers expect it to work? R: Really should have both for each feature. Start from the perceived / desired functionality of the users and add to it with the designed functionality.  Suggestion to provide examples to the developers so that it is clear what we are expecting.

Pilot functionality audit with de-duplication and start with our understanding and then get input from the developers.



MODDATAIMP-879: Data Import removes duplicate 856s in SRS
  • RYAN: Clarify current behavior of field value de-duplication.
  • Define desired behavior of field value de-duplication (if different).
  • Christie Thomas will create some dummy data to illustrate deduping 856s.

Upcoming meetings/agenda topics:


Chat: