2024-2-28 Data Import Subgroup meeting

Recordings are posted Here (2022+) and Here (pre-2022)
Slack channel for Q&A, discussion between meetings

Requirements details Here
Additional discussion topics in Subgroup parking lot


Attendees: Robert Pleshar, Raegan Wiechert, Christie Thomas, Yael Hod, Linh Chang, Aaron Neslin, Tess Amram, Lynne Fors, Taylor Smith

Notetaker: Jennifer

Links:

Agenda: 

Topic | Who | Meeting Notes | Related Jira | Decisions and Actions

Announcements

Jennifer/all

3 new features added to the DI Topic Tracker

  • Enforce an order of deletion: Right now we can't delete at the job level if the links are still there. See the Topic Tracker for details.
  • Warning or error when job contains no action profiles: This might already be covered.
  • Ability to change the link to a profile

Discussion notes: 

Ryan will create new Jiras when a new topic does not have one. 

Order of deletion: put in guard rails to prevent deleting a job profile that still has match or action profiles linked. Once the job profile has been deleted, the profiles that were linked to it can no longer be deleted. Another aspect is how to clean up the orphaned profiles that cannot be deleted. A potential future enhancement is to delete child profiles at the same time as the job profile. There may be a way to clean these up via the API, but not all institutions have the expertise to work with the APIs.
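
The guard rail discussed here could be sketched roughly as follows. This is only an illustration in Python, not FOLIO's implementation: the `JobProfile` class, its `linked_profile_ids` field, and the `delete_job_profile` function are hypothetical names invented for the example.

```python
from dataclasses import dataclass, field


class ProfileStillLinkedError(Exception):
    """Raised when a job profile still has match or action profiles linked."""


@dataclass
class JobProfile:
    # Hypothetical model: a job profile and the IDs of its linked child profiles.
    profile_id: str
    linked_profile_ids: list[str] = field(default_factory=list)


def delete_job_profile(store: dict[str, JobProfile], profile_id: str) -> None:
    """Delete a job profile only when no match or action profiles are linked to it."""
    job = store[profile_id]
    if job.linked_profile_ids:
        # Refuse the deletion so the child profiles never become orphans.
        raise ProfileStillLinkedError(
            f"Unlink {len(job.linked_profile_ids)} profile(s) before deleting {profile_id}"
        )
    del store[profile_id]
```

With a check like this, the user is forced to unlink (or separately delete) the child profiles first, which is the order of deletion the group wants enforced.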

Q: Does this go hand in hand with the ability to delete profiles in data export? Ryan will check in with Magda about connections to the data export profiles.


  • Ryan will create Jira issues for these so the group can review them
Closed Jira Issues

Jennifer/all
  • Duplicate records in incoming file causes problems after overlay process with no error reported. Poppy (R2 2023)
  • Data Import log displays disconcertingly ambiguous information for successful matches on anything but 999i/instance UUID (Quesnelia)
  • Job Summary: error column does not display errors (Quesnelia)
  • Incorrect quantity is displayed in the cell of the "No Action" and "Error" rows at the individual import job's log (Quesnelia)
  • DI log for update srs bib records does not include instance update status or link to instance  (Quesnelia)
  • Status of instance is '-' in the import log after uploading file. Numbers of updated srs and instance aren't displayed in summary table (Quesnelia)

MODSOURCE-530

MODSOURMAN-848

UIDATIMP-1590

MODSOURMAN-1117

MODDATAIMP-984

MODSOURMAN-1106


Review issue reported by Christie:
Quantity Electronic seems to always be zero (In Poppy with CSP 1)
Christie Thomas

When electronic orders are created through data import, the electronic resource quantity is not mapped from MARC or from a default value in the mapping profile, and the funds are not encumbered. Tested in Poppy CSP 1 in the local UChicago environment and in Poppy Bugfest. See MODDATAIMP-1010.

For Stanford, Acquisition method is not mapping for Purchase. They are also not getting quantity or encumbrances. Also, order type is not mapping for Purchase when it is provided as a default in the import profile for orders. 

MODDATAIMP-1010

Feature/Bug Review: 

UXPROD-4704: Stop processing the job after it was canceled by user (FKA MODSOURMAN-970)

Ryan/All

Previous notes from 2/21:


MODSOURMAN-970 was transitioned to UXPROD-4704 after a review by development. The fixes needed to address the issue involve multiple modules, which is why it is now tracked as a new feature.

Target release is Ramsons and will be included on the priority review spreadsheet Ryan hopes to send out by the end of this week.  

Question: how does this affect, or how is it affected by, the new data slicing functionality?

  • Answer: unknown; Ryan to investigate
UXPROD-4704
  • Ryan Taylor: investigate any impacts on or by data slicing

    Answer 3/6:
    It will not have any effect on data slicing, since slicing happens prior to the processing in mod-data-import. This feature will focus on stopping the processing of records that starts in mod-source-record-manager and continues beyond it.
Bug Review: 

MODDATAIMP-897 - Adding MARC modifications to single record overlay doesn't respect field protections

  • Discuss expectations of using Modify actions
Ryan/Jennifer/All

A number of issues were seen when testing MARC modifications. There appear to be regressions as well as functionality that simply doesn't work:

On Create:

  • Adding a string to the beginning or end doesn't work
  • Adding a field with multiple subfields works EXCEPT for the indicators which aren't mapped

On Update:

  • Adding a string to the beginning or end doesn't work
  • Adding a field with multiple subfields works EXCEPT for the indicators which aren't mapped
  • A modification at the end of the file did not remove the field from the incoming record, and the field mapped for removal was still in the existing SRS record when the job completed
  • Modifications at the end of a job don't seem to work

Logs:

  • On Update, encountered a number of errors, such as
    • "incoming file may contain duplicates" when the file had only 1 record
    • 2 rows in the log when there was only 1 record in the file, with a summary showing 1 update and 1 no action error

If all this work is being done, should we create Jiras for all the ways in which MARC modifications don't work? Should we create Jiras for modifications that worked in Orchid and no longer work in Poppy? It makes sense to create Jiras for things that were working and aren't working in Poppy. Let's hold off on deciding which MARC modifications should work until we get the developers' findings.


Previous notes from 2/22 and 2/14:


Discussion notes:

Last week, the DI lab started a spreadsheet to track this functionality. The group is still working on this and expects to continue at the 2/22 meeting.

RT: How are the protections working and how are they expected to work? 

Example from Jennifer Eustis: Use case: Export, Transform, Load. The import profile includes a MARC modification to delete fields and matches on 999 ff $i. Ideally there would be a MARC modification to remove unwanted MARC fields (029, 983, etc.), then a match on instance and an update of the instance. MARC modification at the end of the record results in MARC modification. See screenshot of profile.


Comment: MARC modifications were implemented with the expectation that they would be placed at the beginning of the job and would act on the incoming record.

That is true, but past conversations in the data import subgroup drew out two use cases: 1) to modify an incoming record before any actions are taken, and 2) to modify the final SRS record after all of the actions are taken (delete 9xx data after it is used to update the holdings and item, for example).
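
The two use cases can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not how Data Import is implemented: a record is modelled as a list of (tag, value) pairs, `remove_fields` and `run_job` are hypothetical helpers, and the 949 field stands in for any local 9xx field that feeds holdings or item data.

```python
# A MARC record is modelled here as a list of (tag, value) pairs.
Record = list[tuple[str, str]]


def remove_fields(record: Record, tags: set[str]) -> Record:
    """Return a copy of the record without the given tags."""
    return [(tag, value) for tag, value in record if tag not in tags]


def run_job(incoming: Record, pre_remove: set[str], post_remove: set[str]) -> tuple[dict, Record]:
    """Apply a modification before the actions and another one after them."""
    # Use case 1: modify the incoming record before any action runs.
    working = remove_fields(incoming, pre_remove)
    # Stand-in for a holdings/item action that reads the (hypothetical) 949 field.
    holdings = {"call_number": next((v for t, v in working if t == "949"), None)}
    # Use case 2: modify the final record after all actions have run.
    final = remove_fields(working, post_remove)
    return holdings, final
```

For example, `run_job(record, {"029"}, {"949"})` drops the 029 up front, lets the stand-in action read the 949, and only then removes the 949 from the record that would be stored.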

General experience right now is that MARC modifications work as expected with creates, but on updates they either do not work or work with corruption (such as the deletion of protected fields).

RT: Is part of the problem how we are approaching updates vs modifications? Updates are designed to work with FOLIO records and modifications are designed to work on incoming records.  Should updates have the same potential actions as marc modifications applying the logic to the updated record?

Right now the dependencies between SRS and instance, and the explicit nature of the updates on instance vs. MARC, are problematic. It is difficult to understand what is happening with updates. The current process is to put modifications anywhere to see where they work. Whether we are updating SRS, instance, or both, we should be able to do the same thing.

RT: You will see different behavior from MARC modifications depending on their placement in the profile. We need to do a deeper dive into how the behavior changes depending on placement.

This would be a good candidate for the functionality / documentation audit.  If development dives into this and the DI lab group dives into this, we could then come together to identify the best way forward

MODDATAIMP-897

UXPROD-4709

This would be a good candidate for the functionality / documentation audit. 

If development dives into this and the DI lab group dives into this, we could then come together to identify the best way forward

  • Development Review
  • DI Lab Group Review

Missing Action Profiles in Job Profile after Poppy migration: As called out in Poppy Release Notes, there is a known issue that's been observed in which some links to reusable Action Profiles might be missing from Job Profiles after Poppy migration. 


Release notes recommend the following:

  • After migration, review existing Job Profiles to verify they migrated correctly. Pay attention to reusable Action Profiles. In case issues are found, Job Profile can be updated manually. For additional information on links created for that Job Profile - execute script #15 (or follow the link), notify support.

The recommended script provides a list of Action profiles to help users manually recreate any affected Job profiles.


All

MODDICONV-365 is part of CSP #2.

Previous notes from 2/7:


There are 2 issues: the experience of unlinking post migration and the experience of unlinking during migration. MODDICONV-361 is a P1 with the hope that it will be released in CSP #1. MODDICONV-365 is being investigated.

It looks like FOLIO system job profiles are being affected in terms of actions being unlinked. 5C saw that the default ISRI overlay wasn't working correctly. When we checked the default system job profile, there were no action profiles.

Ryan confirmed this issue only affects Action profiles. It is difficult to know how common this is. For 361, the behavior seems consistent. But for 365, this seems to be less common and different tenants have the issue occur on different jobs.

This is the 3rd or 4th time that the issue in MODDICONV-361 has appeared during a flower release. The unlinking/linking issues date back several releases.

To gather more information, it is worth keeping the corrupted jobs and creating replacements.

A job with no action profiles or an empty job can be run. There are no error messages when such a job is run. This is something we shouldn't be able to do. Perhaps a warning or an error message is needed.
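
A pre-run check along these lines could produce the warning the group is asking for. This is a hedged sketch only: the `validate_job_profile` function and the dictionary keys `name` and `action_profiles` are hypothetical and do not reflect the actual job profile schema.

```python
def validate_job_profile(job_profile: dict) -> list[str]:
    """Return warnings for a job profile; an empty list means no problems were found."""
    warnings = []
    # An empty or missing list of action profiles is flagged before the job runs.
    if not job_profile.get("action_profiles"):
        name = job_profile.get("name", "?")
        warnings.append(f"Job profile '{name}' has no action profiles.")
    return warnings
```

Whether the result should be shown as a warning or block the job as an error is the open question noted above.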


Previous notes from 1/31:


Overview: Action profiles connected to multiple job profiles are 'unlinked' from job profiles after migration to Poppy.

  • It isn't happening for every library after migration or for every re-used profile.
  • However, it is happening often enough that libraries should be aware and check.  
  • General confusion on how or if the migration to Poppy is causing this issue.  Root cause is not migration, but the migration process does cause the profiles to unlink.
  • Script #15 (noted in the topic column) provides a list of profiles that need to be fixed.  It does not fix the links.  That must be done manually by the library.  
  • The action profiles actually disappear from the job profile, not just 'unlink'.  They must be re-added.

Comments:

  • Lots of work for libraries to recreate job profiles manually.
  • Should be a CSP candidate. High priority for correction.
  • Could be a blocker for some libraries to migrate to Poppy.
  • The "unlinking from one unlinks them all" issue has popped up multiple times.

Sidebar discussion in chat on how job profiles are deleted spurred #42 in the Data Import Issue Tracker.  

Until MODDICONV-361 is fixed, any time a re-used action profile is unlinked in a job profile it will be unlinked in all other job profiles.  Fixing it after migration doesn't stop it from happening again should a re-used action profile be unlinked.  

The development team will be adding new test cases to their workflow to test this type of scenario (re-used profiles) going forward.
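
A test case for the re-used profile scenario might look something like the sketch below. It only expresses the expected behavior and says nothing about the actual root cause or the real data model: links are modelled as hypothetical (job profile ID, action profile ID) pairs, and `unlink` is an invented helper.

```python
def unlink(links: set[tuple[str, str]], job_id: str, action_id: str) -> set[tuple[str, str]]:
    """Remove one job-to-action link and leave every other link untouched."""
    return links - {(job_id, action_id)}


def test_unlinking_reused_action_profile_keeps_other_links() -> None:
    # The same action profile is re-used by two job profiles.
    links = {("job-A", "action-1"), ("job-B", "action-1")}
    remaining = unlink(links, "job-A", "action-1")
    # Only job-A loses the link; job-B must still have it.
    assert remaining == {("job-B", "action-1")}
```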

MODDICONV-361: Issue specific to unlinking of Action profiles when used by multiple Job profiles after Poppy migration. Ticket now closed and included within Poppy CSP #1.

MODDICONV-365: Issue specific to unlinking of Action profiles during migration, as reported by Cornell. A plan to address it has been identified and the ticket is currently In Code Review.


Partial Matching:

Subject raised by Yael Hod 

Not discussed at the 2/21 meeting.  

Previous notes from 1/31:


Partial matching, e.g. begins with, ends with, is required but it does not function as it should regardless of how it is configured.

  • System behaves as though it only looks for exact matches.
  • Examples of use include prefixes/suffixes to an 035 added by a vendor or library to designate the source of the record.
  • University of Chicago has had same issue.  Corrie submitted MODDICORE-386 on their behalf.
  • Question as to whether this is a bug or how the system is intended to function.  Documentation is needed.
  • #12 on the Data Import Issue Tracker.  
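
What users expect from partial matching can be sketched as follows. This is an illustration of the expected comparison only, not the Data Import matching code: `partial_match`, its `mode` values, and the direction of the comparison (incoming value begins or ends with the existing value) are assumptions made for the example.

```python
def partial_match(incoming: str, existing: str, mode: str) -> bool:
    """Return True when the incoming value matches the existing value under `mode`.

    Assumed semantics for this sketch: the incoming value exactly equals,
    begins with, or ends with the existing value.
    """
    if mode == "exact":
        return incoming == existing
    if mode == "begins_with":
        return incoming.startswith(existing)
    if mode == "ends_with":
        return incoming.endswith(existing)
    raise ValueError(f"Unknown match mode: {mode}")
```

For an 035 value with a vendor prefix, such as a made-up `(VENDOR)12345` compared against an existing `12345`, an "ends with" match is expected to succeed while an exact match is not. The reported behavior is that only the exact comparison ever takes effect.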



MODDICORE-386

Ryan will:

  • Review Jira with Folijet leads to understand current design and identify requirement gaps.

Documentation: The group has identified a need for new, enhanced, or reorganized documentation around Data Import.

  • In a previous session, we agreed that completing a functionality audit spreadsheet would be a good first step
All

Not discussed at the 2/21 meeting.  


Previous notes from 1/24 meeting:

In lab session on 1/18/2024, we created a wiki page, Data Import Topic Tracker, with guidelines on how to contribute and a spreadsheet to track issues. This is based on the work done in the Acquisitions SIG. An archive area was also created where we could archive outdated pages such as the Archived Data Import Implementers and Feature Discussion Topics.

The idea was to put down issues whether they were linked to a Jira issue or not. Some of the important information that we wanted to track was if there was a linked Jira and in particular when the issue was discussed in the working group and the decision(s) made in regard to that issue.

The spreadsheet is still being developed. Before we add more issues, the group in lab wanted to know:

  • Do we adopt this page and spreadsheet? If yes, do we have volunteers to populate it?
  • To make sure this page is maintained, the group suggested that the working group look at it once a month to see what is outstanding or new. Is this a practice we want to adopt?

Discussion: 

A link to the new Data Import topic tracker is at the top of the page. Format was worked on at last week's data import session. 

Q: Is this only to track Jira tickets, or will other topics be added to the agenda? R: In Acq/RM, individuals add stories to the topic tracker and the Jira may only be added to the spreadsheet later. (Many think this is a good idea.)

Can reference the Acq/Resource Management implementers topic tracker.

Perhaps add widgets that bring in Jiras automatically based on the tag. 

Q: How to add "Click here and expand" text. R: Put the cursor where you want the text block to begin and use Insert Macro function. Type "Expand" to locate the Expand Macro.

Agreed to use the de-duplication discussion to work on building a useful functionality framework. 

N/A
  • Get volunteers to create a spreadsheet and start brainstorming - DONE
De-duplication: Continue conversation from previous session to clarify what we expect from de-duplication of field values when a record is loaded into FOLIO via Data Import.

All

Ryan has discussed this with the team. He will get this in writing and will share this when done. Christie did some work as well in Poppy bugfest.

Not discussed at the 2/21 meeting.  

Previous notes from 1/24 meeting:



Jennifer Eustis and Aaron Neslin found comments in the data-import-processing-core code that provide details about expected behavior for de-duplication.

These comments align with the behavior we are seeing except for when there is duplicate data in the incoming record. Data is being removed from the incoming record on update as well. 

Consensus seems to be that FOLIO should not be de-duplicating within the incoming record unless it is explicitly defined in an import profile.
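
The consensus could be expressed as an opt-in switch, sketched below. This is not the data-import-processing-core logic: `prepare_incoming_values` and its `dedupe` flag are hypothetical, standing in for a setting that an import profile would have to define explicitly.

```python
def prepare_incoming_values(values: list[str], dedupe: bool = False) -> list[str]:
    """Return the incoming field values, de-duplicated only when explicitly requested."""
    if not dedupe:
        # Default: leave the incoming record exactly as supplied, duplicates included.
        return list(values)
    # Opt-in: drop repeated values while keeping the first occurrence and the order.
    return list(dict.fromkeys(values))
```

With the default, two identical 856 values in an incoming record would both survive; they would collapse to one only when the flag is set.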

Q: Is de-duplication something that should be able to be deactivated on a field by field basis? R: Sounds like a reasonable approach. There is also some concern that this would complicate an already complicated situation. 

Possible solution: deduplicate in a separate tool rather than within Data Import.

Suggestion to start with the functionality audit. RT can connect with the developers as a part of this audit. 

Q: Are we starting with how we as users expect the functionality to work, or with how the developers expect it to work? R: We really should have both for each feature. Start from the perceived / desired functionality of the users and add the designed functionality to it. Suggestion to provide examples to the developers so that it is clear what we are expecting.

Pilot functionality audit with de-duplication and start with our understanding and then get input from the developers.



MODDATAIMP-879: Data Import removes duplicate 856s in SRS
  • RYAN: Clarify current behavior of field value de-duplication.
  • Define desired behavior of field value de-duplication (if different).
  • Christie Thomas will create some dummy data to illustrate deduping 856s.

Upcoming meetings/agenda topics:


Chat: