2025-02-05 Data Import Subgroup meeting
Recordings are posted Here (2022+) and Here (pre-2022). Slack channel for Q&A and discussion between meetings.
Requirements details Here. Additional discussion topics in the Subgroup parking lot.
Attendees: Ryan Taylor, Christie Thomas, Jennifer Eustis
Notetaker: Jennifer Eustis
Links:
- Data Import Topic Tracker
- Data Import Roadmap
- Poppy import planning dashboard
- Poppy timeline
- Quesnelia import planning dashboard
- Quesnelia timeline
- Folijet Current Development Board
- Folijet (Data import) Bug Dashboard
Agenda:
Topic | Who | Meeting Notes | Related Jira | Decisions and Actions |
---|---|---|---|---|
Announcements | | We are looking for a new group of volunteers to take notes. Please consider helping out. Save the date for WOLFcon 2025, 9/23-9/25. EBSCO FOLIO Day is coming up May 2, 2025. | | |
Documentation Discussion: Identify gaps/needs | Ryan/All | Spreadsheet for tracking documentation needs: | ||
Data Import Release Notes for Ramsons | All | Links to documentation are also included there when available. This time around, Ramsons issues for Data Export were added as well, since that working group no longer meets. | | |
Review Topics Tracker | All | Data Import Implementers Topic Tracker
This topic is from Univ. of Chicago, which hasn't implemented file splitting because of how the logs work: you see a log entry for each split part, but no log entry for the file that was uploaded before splitting. Without file splitting, they are limited to 1,000 records at a time, and the logs are not working as intended; they don't have the resources to review each log corresponding to the parts. There is also a performance issue with logs. The current approach to file splitting is considered OK as a permanent solution.
Sara: Could this be a tenant-level setting to switch to one log entry per uploaded file instead of one log per split file? That's fair. The other issue is that the default log page shows 100 lines and there's a lot of noise.
Sara: I've found it helpful to have the logs for each split file so I can focus on the log that has errors. I haven't found the need to review all the logs. When other logs hog the page, I just page over to find my stuff.
Rob: For Chicago, the log doesn't tell you there is an error if no action was taken on a record. If there is a 100 field with an e but no e, then it doesn't load. We can't open a log bigger than a thousand records; there seems to be a performance issue when a large log loads in the UI.
Christie: We have a number of examples where records aren't loading but they don't show as errors, so it is necessary to look at all the logs. Data Import has been improved, but we don't really know what the jobs actually do, and we rely on QA tasks for that; looking at the logs is part of it.
Jennifer: We should really look at the logs and overhaul them.
Christie: The logs do need to be reviewed; we encounter a lot of issues, especially with MARC orders. Ideally there needs to be a setting. DI needs to be performant. I'm concerned about the thin implementation, as it won't save time. We need a statement from the community.
Jennifer: Here, DI has been working well and we can import more than we ever have before.
Christie: We didn't turn on splitting, and DI is worse than ever. If splitting is the long-term solution, then we have to turn it on. We haven't turned it on because of the logs. If splitting is on, then we need to rely on Metadb.
Rob: Chicago is on Quesnelia CSP 6. Another question we need to ask is whether other places are importing smaller files because they know larger ones won't work.
Sara: We try not to step on our own toes. We base imports on size and how often a set is updated or deleted. We also use EDS custom catalogs for some sets because of this.
Raegan: We do have consortia that have to deal with a large scale. The highest record count is 22 records. One log versus multiple logs isn't a concern. If we could do authority work, this might change.
Robert: Is the reason you don't load Safari Books Online or large ebook collections that you load them into a shared discovery layer?
Sara: All these things are shared. It's because the sets are highly changeable.
Christie: This has been a problem within the community. We need to make a decision about whether DI can support large transactional imports. Chicago can't keep trying to make DI work for them.
Ryan: This has been a good conversation. We need to be explicit about the expectations for large transactional imports. We need to look into a setting, when file splitting is enabled, to consolidate the logs for a job. On a smaller scale, a more general review of where logs are for us today: being able to change the default display, and the fact that "no action" isn't logged as an error.
Jennifer: We should learn from Bulk Edit and how its logs work, such as being able to download a CSV of log entries.
Ryan: How do we get this into the backlog? We could create a general feature for DI log enhancements and a separate feature for consolidated logs when splitting is turned on.
They ran the script; now all the records they couldn't edit are editable. However, when they did an overlay, the same issue recurred and the record was again not editable. What does the script do? The fix cleans up multiple SRS MARC bibs that have state "Actual"; it doesn't address the underlying issue. They are running Quesnelia CSP 8. The overall fix is for Ramsons, and the script was for Quesnelia, to get people through to Ramsons. Ryan will bring this back to Folijet.
| ||
Notes from previous meetings...
Upcoming meetings/agenda topics: --
Chat: