2024-5-1 Data Import Subgroup meeting

Recordings are posted Here (2022+) and Here (pre-2022)

Slack channel for Q&A, discussion between meetings

Requirements details Here

Additional discussion topics in Subgroup parking lot


Attendees: Christie Thomas, Robert Pleshar, Taylor Smith, Raegan Wiechert, Tess Amram, Ryan Taylor, Whitney Christopher, Kimberly Wiljanen, Corrie Hutchinson, Yael Hod, Aaron Neslin

Notetaker: Jennifer Eustis

Links:

Agenda: 

Topic | Who | Meeting Notes | Related Jira | Decisions and Actions

Announcements:


CSP #5 is coming out soon. The dashboard isn't working right now (https://folio-org.atlassian.net/jira/dashboards/10409)
  • Ryan will reach out to find the link and see if that link can be added to the wiki page
Review issue reported by Christie:
Quantity and Orders via Data Import (Currently planned for Poppy CSP)
all

5/1 Follow-up:

Different items need to be in separate MARC fields as shown in the example below.

Question: This isn't compatible with the Order import. Can there be more than one line that is mapping to an order? At the point of order, we don't have this information in terms of barcode, copy number, etc. Has this been tested with orders?

It has been tested only generally, not with Orders. This is new to Poppy. The functionality being described may not work with the workflows around Orders. Having the quantity drive the number of holdings and items seems better.

It seems good that quantity drives the number of items created. If there is an order for 3 copies from GOBI, then the amount encumbered will be 1 and we won't know until someone pays the invoice. What information do we need to share with our vendors to make this work?

Christie tested multiple items but got an error about missing material types. Because of MODDATAIMP-1010, MODORDERS-881 (https://folio-org.atlassian.net/browse/MODORDERS-881) was reopened. See also https://folio-org.atlassian.net/browse/UIDATIMP-1626. These refer to treating the order as the source of truth, which is resolved in CSP #5.

Current documentation on creating multiple Items & Holdings via Data Import can be found at "Import of MARC Bibs to create/update multiple Holdings and Items".

  • Item data for different items must be in separate MARC fields, not concatenated in the same MARC field.
  • Sample record with multiple holdings/item data:

    =LDR 01262nam a2200301Ia 4500
    =001 ocm54341618\
    =003 OCoLC
    =005 20070103101904.0
    =008 010330s1798\\\\enkaf\\\\\\\\\000\0\eng\d
    =035 \\$a(Sirsi) a551407
    =035 \\$a(Sirsi) o54341618
    =049 \\$aDRUM
    =040 \\$aCUD$beng$cCUD$dDRU$dMvI
    =090 \\$aBS2095$b.S33 1798
    =130 0\$aBible.$pNew Testament$lEnglish.$sScarlett.$f1798.
    =245 12$aA translation of the New Testament from the original Greek /$chumbly attempted by Nathaniel Scarlett, assisted by men of piety & literature ; with notes.
    =260 \\$aLondon :$bPrinted by T. Gillet; and sold by Nathaniel Scarlett, No. 349, near Exeter 'Change, Strand; also F. & C. Rivington, St. Paul's Church Yard,$c1798.
    =300 \\$axi, 483, vi p., [1] folded leaf of plates :$bill. ;$c19 cm.
    =500 \\$aEngraved t.p.
    =500 \\$aIncludes Observations on some terms used in this translation: vi p. at end.
    =510 3\$aDarlow-Moule-Herbert 1433
    =700 1\$aScarlett, Nathaniel,$d1753-1802.
    =740 02$aObservations on some terms used in this translation.
    =945 \\$a34678234678246423786427$b1$hKU/CC/DI/M <===== $a = item barcode, $b = copy number, $h = holdings permanent location
    =945 \\$a34678234678246423786428$b2$hKU/CC/DI/M
    =945 \\$a34678234678246423786429$b1$hKU/CC/DI/A
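To make the "one 945 per item" rule concrete, here is a minimal Python sketch that reads the three 945 fields from the sample record and groups items by holdings location. It is only an illustration of the data layout, not FOLIO's implementation: the function names are hypothetical, the third field assumes `$a…$b1$h…` was the intended subfield layout, and the idea that one holdings is created per distinct $h is an assumption about how a field mapping profile might be set up.

```python
from collections import defaultdict

# The three item-level 945 fields from the sample record (mnemonic format).
# Assumption: the third field is meant to read $a<barcode>$b1$h<location>.
FIELDS_945 = [
    r"=945 \\$a34678234678246423786427$b1$hKU/CC/DI/M",
    r"=945 \\$a34678234678246423786428$b2$hKU/CC/DI/M",
    r"=945 \\$a34678234678246423786429$b1$hKU/CC/DI/A",
]


def parse_subfields(line):
    """Return {subfield code: value} for one mnemonic-format MARC line.

    Assumes each subfield code appears once and values contain no "$".
    """
    _, _, data = line.partition("$")  # drop the tag and indicators
    return {chunk[0]: chunk[1:] for chunk in data.split("$") if chunk}


def group_items_by_location(lines):
    """Group item data by holdings permanent location ($h)."""
    holdings = defaultdict(list)
    for line in lines:
        sub = parse_subfields(line)
        holdings[sub["h"]].append({"barcode": sub["a"], "copy": sub["b"]})
    return dict(holdings)


grouped = group_items_by_location(FIELDS_945)
for location, items in grouped.items():
    print(location, len(items))
# prints:
# KU/CC/DI/M 2
# KU/CC/DI/A 1
```

Under these assumptions the sample record describes two holdings locations and three items. If the item data were concatenated into a single 945, the grouping above would see only one item.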

Previous notes from 4/24:


Latest as of 4/24:

  • Decision was made to revert Quantity/Location 'Check' & 'Overlay' processes that were introduced in Poppy.
  • This will essentially restore Order + Inventory entity Job profile functionality to the way it behaves in Orchid today.

Additional aspect for Subgroup to discuss:

  • A similar Check & Overlay process that occurs today is to align Material Type on the Order with Material Type of created Items (when Items are created as part of Data Import job)
    • Current functionality: If Purchase Order Status is set to 'Open', then Material Type field is greyed out in Order mapping. If Items are created as part of Data Import job, then the Material Type values in the Items will be used to fill in the Material Type field in POL.
    • Proposed functionality: If Purchase Order Status is set to 'Open', then Material Type field will not be greyed out and will remain mappable. This will allow the Order to remain source of truth for POL details and aligns with how we will be handling Holdings & Items against Quantity/Location discrepancies.
      • NOTE: In the event that the Material Type of created Items are different from the Material Type mapped in the Order, this could trigger Mod-orders to create additional redundant Items (similar to behavior seen if Order Location & Holdings Locations mismatch). So the responsibility is on the user to ensure that Order details and created Item details match.
  • Feedback from subgroup : 
    • Request for documentation on the checks performed by Data Import (why, when, how, etc.)
    • Hard to respond without knowing more about why DI was designed as it is now.
    • Ryan : we are learning together ; thank you for your patience.
    • Ryan : to dig up instructions on how to create multiple items
    • No objections to proposed functionality ; Ryan to move forward and keep everyone in the group as the work progresses. 
      • Proposed functionality centers on previous discussions around the order record being the source of truth.


Previous notes from 4/3:


Proposed next steps to address:


Scenario - Items - Differing Quantities: 
  • Given a Data Import Job profile is run to create Orders, Instances, Holdings, & Items
  • And the Item quantity within Order details differs from quantity of created Items
  • Then FOLIO defers to mod-orders to create Item entities based on Order mapping, which will be linked to the Purchase Order Line (POL)
Scenario - Holdings:
  • Given a Data Import Job profile is run to create Orders, Instances, Holdings, & Items
  • And the Location details within the Order differ from those of created Holdings
  • Then FOLIO defers to mod-orders to create new Holdings entities based on Order mapping, which will be linked to the Purchase Order Line (POL)


If we move forward with the above, please consider the following details when using Data Import to create Orders with Inventory entities:

  • If locations & quantities differ between Order details and the Holdings & Items created by Data Import, then mod-orders will be triggered to create new Holdings & Items that do match the locations & quantities described in the Order. 
  • This scenario would lead to creation of erroneous/redundant Holdings or Items that are not linked to the POL of the Order.
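The mismatch described in the two bullets above could be checked before running a job. The following Python sketch is a hypothetical pre-import check, not part of FOLIO or mod-orders: `find_mismatches` and its inputs are illustrative names, and it assumes the order's quantity per location and the locations of the items Data Import would create are both known in advance.

```python
from collections import Counter


def find_mismatches(order_quantities, item_locations):
    """Compare order quantities per location with the items to be created.

    order_quantities: {location code: quantity on the order line}
    item_locations:   one location code per item Data Import would create
    Returns {location: (ordered, created)} for every location that differs.
    """
    created = Counter(item_locations)
    mismatches = {}
    for location in set(order_quantities) | set(created):
        ordered = order_quantities.get(location, 0)
        made = created.get(location, 0)
        if ordered != made:
            mismatches[location] = (ordered, made)
    return mismatches


# Order says 3 copies at one location; the import file describes
# 2 items there and 1 item at a different location.
result = find_mismatches(
    {"KU/CC/DI/M": 3},
    ["KU/CC/DI/M", "KU/CC/DI/M", "KU/CC/DI/A"],
)
print(result)
```

An empty result would mean the Order details and the created Items agree. Any entry in the result marks a location where, per the notes above, mod-orders might be triggered to create additional Holdings or Items.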


Meeting notes from 4/3/2024 - 

Ryan Taylor shared a MIRO diagram for processing Orders via Data Import. 

Q: How are multiple items created as a part of the current process? Ryan - it is dictated by the inventory import profiles. 

Q: What is the end scenario when a job profile is set to create only an instance and holdings? The end state would be that an additional holdings and item are created because the quantity does not match the number of items. If items are not defined, an unnecessary item should not be created, but there is suspicion that an additional holdings would be. It is thought that if the location matches what is in the profile, a second holdings/item will not be created. Clarification is needed on whether additional holdings/items will be created.

Q: Why does the process not respect the quantity in the order import profile? Why can the order quantity and the location quantity not be the same? May need to look at this from a different perspective.

It was noted that the business logic requires the connection of the order to Inventory at the point of creation, so it makes sense for the order to "have control" of the processing. 

It was also noted that the only difference between creating an order in pending vs open is that the mapping of the inventory records should come from the field mapping profiles rather than the defaults in the order import app. 

The Orders (ol and pol) have a certain business logic where the quantity is linked to what is in Inventory in addition to Receiving and Finance apps. Orders should be the source of truth as this would allow for this.

Ryan has some questions to bring back to the dev team. How can we make it simpler? Can you start in Inventory and then create the Order and associate the inventory with the POL?


Previous notes from 3/27:

  • 3/27 - Review related scenario raised by Christie in Slack
    Christie presented her DI job profile in her local environment in Poppy CSP 2. The profile worked as expected for single-volume monograph orders. This profile creates an order (with quantity physical mapped from 980$q), an instance with "uncataloged" status, a holdings, and an item. When quantity physical is greater than 1, the POL shows a quantity physical of 1 rather than the number from the mapping in the 984$q. We can write this up as a separate bug and let the developers determine whether it is part of MODDATAIMP-1010 or a separate bug. The fact that only 1 item is being created even when the quantity is more than 1 needs to be investigated. This is different from, but related to, MODDATAIMP-1010. This could be related to an issue in CSP2 where, if conditional mapping was used, the number of holdings and items created was incorrect. Could it also be that there is only one 980, so that just one holdings and one item are created? Could it be looking for the presence of multiple 980s? This could make sense in regular bib imports but not when creating orders. What happens if there is no conditional in the mapping for quantity? Christie will try this out. When there is no conditional, the quantity is still one. When the create actions for items and holdings are removed, the quantity is what is being mapped. The location maps correctly from the mapping in orders to the holdings record.

What happened and how do we move forward? What is the underlying issue? This is a complex scenario where we need to drill down to the architecture level. Can we document what we have in a way that can be understood? Does it make sense for real-life scenarios? In the immediate time frame, the goal is to make sure we don't get quantity zero. If it is thread and if it is a bigger issue, do we have short-term workarounds? At the moment, it is not clear how this will be addressed, as there seems to be a divide as to how this works. How large a task is it to investigate and avoid the issues that we're seeing in the short term?

We need to talk to Dennis because location and cost quantity are tied together. What would happen if this were severed? This makes sense.

There's a bigger issue that needs to be unpacked. We have the immediacy of continuing to do our work. We need an understanding of what we expect.

If we order 10 copies of something, then the quantity should say 10.

For an order to be open, then you have to create inventory. For an order to be in the pending state, then the Order App does the creation of the inventory records.



Previous notes from 3/20:


In discussing this issue with the Folijet team, I've learned that the described behavior is a result of requirements to help avoid Item duplicates in support of the Multiples enhancements found in Poppy. 

Would the following logic/scenario make sense as a possible path forward?

  • If Job profile contains 'Create' Action profiles for Orders, Instance, and/or Holdings, then POL Quantity value should be controlled by mapping.
  • If Job profile contains 'Create' Action profiles for Orders, Items, and/or Instance and/or Holdings, then POL Quantity value should be controlled by the number of Items created.
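The two bullets above can be read as a single rule. The Python sketch below restates that proposed rule only to make it easier to discuss; it is not how Data Import or mod-orders is implemented, and the function name, the action labels, and the parameters are all hypothetical.

```python
def proposed_pol_quantity(create_actions, mapped_quantity, items_created):
    """Return the POL Quantity under the logic floated on 3/20.

    create_actions:  set of record types the job profile creates
    mapped_quantity: quantity from the Order field mapping
    items_created:   number of Items the job actually created
    """
    if "Item" in create_actions:
        # Second bullet: Items are created, so Item count controls Quantity.
        return items_created
    # First bullet: no Item creation, so the mapping controls Quantity.
    return mapped_quantity


# A file orders 3 copies, but the job creates only 1 Item.
without_items = proposed_pol_quantity({"Order", "Instance", "Holdings"}, 3, 0)
with_items = proposed_pol_quantity({"Order", "Instance", "Holdings", "Item"}, 3, 1)
print(without_items, with_items)  # prints: 3 1
```

The second call shows the case the group objected to in the discussion that follows: the file asks for 3 copies, yet the POL would record 1.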

Discussion :

  • Based on previous conversation, assumption is that the first bullet is the ideal logic.
  • Questions : what scenario is requiring this complication?  Why wouldn't the quantity in the order always match the quantity in the ingest file?
  • Reasoning for current situation is unclear.
  • Cost and quantity in an ingest file are related.  Using the # of items instead of the values in the incoming file breaks the logic and expectation of a user.
    • Standing orders are a good example : one (quantity = 1) set for $X dollars instead of $X dollars per item
    • Orders can be for a set or a part; practice varies by library & vendor
    • Controlling the quantity by the mapping leaves these values up to the library ; maximum flexibility
  • Confounding variable could be that the quantity ordered must match the quantity by location.
    •  Recommendation to talk to Dennis (PO of Acquisitions) for clarity, further information.
  • Question from Ryan : should the number of holdings records created have an impact on quantity?
    • The quantity in the POL should always come from the mapping provided by a library.
    • There is a situation where two items could be ordered, destined for different locations corresponding to multiple holdings records, but both locations aren't known during the time of order.   
    • Locations are often assigned as part of the cataloging process, not the acquisitions process.
    • The order record is the source of truth.  


Previous notes from 3/13:

  • Was this behaving differently in Orchid?
  • Behavior seen in this bug is that if Create profiles for Holdings or Items are included in the Job profile, then Quantity value is controlled by the number of Items created. So if Holdings profile is included, but not Items, then Quantity will result as 0.
    • Does this make sense to you that Quantity should be controlled by Items created as part of job or should it always be controlled by the Order mapping?


Previous notes from 2/28:


When electronic orders are created through Data Import, the electronic resource quantity is not being mapped from MARC or from a default value in the mapping profile, and the funds are not being encumbered. Tested in Poppy CSP1 in the local UChicago environment and in Poppy Bugfest. See MODDATAIMP-1010.

For Stanford, Acquisition method is not mapping for Purchase. They are also not getting quantity or encumbrances. Also, order type is not mapping for Purchase when it is provided as a default in the import profile for orders. 

Discussion notes:

No one has used the order imports in Orchid. 

Feedback that the quantity should come from the profile/order and not the number of items. Common scenarios where items are not going to be created. 

When processing orders the quantity is always controlled from the order.

MODDATAIMP-1010

Earlier related work:

UXPROD-2741

MODORDERS-876

MODORDERS-881

  • Ryan: provide link to multiple item creation instructions.
  • Ryan will clean up the instructions.
  • Ryan will make some more diagrams for the orders and how to move forward with enhancements
  • Ryan to create spike in FOLIJET team for enhancing multiple holdings and items for MARC orders
De-duplication: Continue conversation from previous session to clarify what we expect from de-duplication of field values when a record is loaded into FOLIO via Data Import.
All

5/1: Folijet leads have tested update imports with duplicated 5xx, 8xx, and 9xx fields and confirmed that we currently de-duplicate all. This, so far as I can tell, is by design based on original requirements.

Question: What is the desired functionality of de-duplication in Data Import?

Ryan spoke to the team. The 500s, 800s, and 900s were being deduplicated. This seems to be the design based on original requirements. Do we want this every time? Do we want to control this? Do we want FOLIO to make this type of judgement call?

Are we looking at only incoming records or existing records that are being updated? Is the deduplication a 100% match or partial? Does it take into account indicators and subfields?

Corrie did testing and found all subfields have to match.
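As an illustration of what "all subfields have to match" could mean, here is a minimal Python sketch of exact-match de-duplication. It is not taken from the data-import-processing-core code. The function name and field representation are hypothetical, and comparing indicators as well as subfields is an assumption, since whether indicators are taken into account was still an open question in this meeting.

```python
def dedupe_exact(fields):
    """Drop repeated fields, keeping the first occurrence of each.

    Each field is (tag, indicators, [(subfield code, value), ...]).
    Two fields count as duplicates only if the tag, the indicators
    (an assumption), and every subfield match in code, value, and order.
    """
    seen = set()
    kept = []
    for tag, indicators, subfields in fields:
        key = (tag, indicators, tuple(subfields))
        if key not in seen:
            seen.add(key)
            kept.append((tag, indicators, subfields))
    return kept


# Example URLs are placeholders, not real records.
fields = [
    ("856", "40", [("u", "https://example.org/a")]),
    ("856", "40", [("u", "https://example.org/a")]),  # exact duplicate
    ("856", "40", [("u", "https://example.org/a"), ("z", "Access")]),
]
result = dedupe_exact(fields)
print(len(result))  # prints: 2
```

Under this reading, the third 856 is kept because its extra $z makes it a partial match rather than an exact one.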

Christie has done testing and has observed that protected fields on overlay create two identical 856s. The deduplication happens on the incoming record after the process to dedupe the FOLIO record and incoming record. In the mapping, if there are two identical 856s, the duplicate is removed. We don't want it to be removed from the record. We need clarification on the types and levels of deduplication. There is concern about removing data in an automated way.

There might be scenarios where we want deduplication whether manual or automated at the instance level but not at the marc srs bib level. This is like subject headings. Could this be handled by the mapping of the instance from MARC? If there are multiple 856s in the srs marc bib, then the discovery system knows to display only one of the duplicates? At Chicago, it shows all of them. 

With protected fields, we want to dedupe. With no protected fields, we want to retain all duplicates in the marc srs bib but not in the instance.


Previous notes from 1/24 meeting:


Jennifer Eustis and Aaron Neslin found comments in the data-import-processing-core code that provide details about expected behavior for de-duplication.

These comments align with the behavior we are seeing except for when there is duplicate data in the incoming record. Data is being removed from the incoming record on update as well. 

Consensus seems to be that FOLIO should not be de-duplicating within the incoming record unless it is explicitly defined in an import profile.

Q: Is de-duplication something that should be able to be deactivated on a field by field basis? R: Sounds like a reasonable approach. There is also some concern that this would complicate an already complicated situation. 

Possible solution: deduplicate in a separate tool rather than within Data Import.

Suggestion to start with the functionality audit. RT can connect with the developers as a part of this audit. 

Q: Are we starting with how we as users expect functionality to work or with how the developers expect it to work? R: Really should have both for each feature. Start from perceived / desired functionality of the users and add to it with designed functionality. Suggestion to provide examples to the developers so that it is clear what we are expecting.

Pilot functionality audit with de-duplication and start with our understanding and then get input from the developers.



MODDATAIMP-879: Data Import removes duplicate 856s in SRS
  • RYAN: Clarify current behavior of field value de-duplication.
  • Define desired behavior of field value de-duplication (if different).
  • Christie Thomas will create some dummy data to illustrate deduping 856s.
  • More discussion on whether we could have a setting to dedupe on instance only rather than on marc srs bib level

Review results of feature prioritization exercise

Notes from previous meetings...

Upcoming meetings/agenda topics:


Chat: