2022-10-24 Reporting SIG Meeting notes

Date

24 Oct 2022

Attendees

Present?	Name	Organization	Present?	Name	Organization
x	Arthur Aguilera	University of Colorado, Boulder		Eric Luhrs	Lehigh University
	Sharon Beltaine	Cornell University	x	Linda Miller	Cornell University
	Erin Block	University of Colorado, Boulder		Nassib Nassar	Index Data
	Nancy Bolduc	Cornell University	x	Elena O'Malley	Emerson
	Shannon Burke	Texas A&M		Tod Olson	University of Chicago
	Suzette Caneda	Stanford University	x	Jean Pajerek	Cornell University
	Lloyd Chittenden	Marmot		Michael Patrick	The University of Alabama
x	Tim Dannay	Mount Holyoke College		Eric Pennington	Texas A&M
x	Axel Doerrer	University Mainz		Scott Perry	University of Chicago
	Shelley Doljack	Stanford University	x	Natalya Pikulik	Cornell University
	Stefan Dombek	Leipzig University	x	Vandana Shah	Cornell University
	Jennifer Eustis	U. Massachusetts Amherst / Five College		Linnea Shieh	Stanford University
x	Lisa Furubotten	Texas A&M		Clare Spitzer	Stanford University
	Alissa Hafele	Stanford University		Amelia Sutton	U. Massachusetts
	Jeanette Kalchik	Stanford University		Simona Tabacaru	Texas A&M
	Kevin Kishimoto	Stanford University		Huey-Ning Tan	Stanford University
	Ingolf Kuss	hbz		Vitus Tang	Stanford University
	Joanne Leary	Cornell University		Irina Trapido	Stanford University
x	Eliana Lima	Fenway Library Organization		Kevin Walker	The University of Alabama
	Alexander Lao	Stanford University	x	Angela Zoss	Duke University

Discussion Items

Item	Who	Notes
Attendance & Notes	Angela	Today's attendance-taker: Linda (or substitute). Today's note-takers: team leads for project updates.
Announcements / Reminders	Angela	Announcement about the Item State Architecture working group, from Erin Nettifee: "Hi all - I am spinning up a small group to work on requirements for two features - https://folio-org.atlassian.net/browse/UXPROD-1530 and https://folio-org.atlassian.net/browse/UXPROD-1590 - meant to establish the other two parts of the item state architecture. I want to be clear that I am not taking over as the item state PO - I don't have the bandwidth to do that permanently, and I really only have capacity for the next 3-5 months to do more PO work - if a more permanent PO steps forward, I will gladly hand this over. I also want to be clear that there is no development team for this work, so all we are doing is working on requirements, and we have no guarantee as to when development might happen. But there are very few requirements for either JIRA issue right now, so no matter what, hopefully we can make it so that when resources do appear, they don't have to start the work from scratch. The commitment is likely to be something like a biweekly meeting plus additional ad hoc work depending on the topic, for a limited period of time (the next 3-5 months)."
Feature freeze coming up for FOLIO Analytics: finishing up query work for Nolana by November 7.
How to find our latest recordings: recording details are available on our Reporting SIG Meetings and Notes page and Reporting SIG Subteam Meetings page.
Recent recordings of interest: Thursday, October 20: SIG meeting, including a Foliage demo from Mike Hucka and an LDLite demo from Angela.
(Always) recruiting new query developers: The Reporting SIG is always on the lookout for new query developers. Please let us know if you are interested in doing query development or if there are others at your institution who might be a good fit.
Follow-up on SysOps conversation about LDP Hosting		Notes from meeting, recording
Angela's notes:
Wayne from Index Data presented (slides).
Questions about refreshes and backups: Index Data runs on Amazon RDS for hosting, which includes disaster recovery (7 days of snapshots).
EBSCO is also using Amazon AWS (RDS Postgres) and has come up with a list of strong and weak points on hosting.
Pros: a pretty reliable process (if you set it up the way you want, it will work until you want to change something); very quick and good support from developers (thanks to Nassib!).
Cons:
Data transfer time gets longer as the data in FOLIO increases.
More resources will be needed as the FOLIO database grows, since the LDP will grow too; the VM instance type will need to be upgraded to keep everything working properly.
Silent failures are also a problem: there is no good, detailed logging when problems happen (we have had to ask Nassib for help and couldn't find all of the information about logging in the documentation).
We currently use a container approach, and all of the logs are stored inside the container, so we had to implement additional functionality to store the logs on the host so they can be read.
The documentation in the ldp1 repo is mostly oriented to developers, so it's not easy or fast to find the necessary information if you're not familiar with the system; you don't necessarily know whether you need to do something yourself or the system already supports it.
There is no automatic recovery: when the process fails, you need to re-run it manually.
An incremental update takes more time than a full reload, especially when there are a lot of updates or data were imported into FOLIO from another source.
The database update process takes too long: 14 hours for a database with 10 million inventory records.
Response from Index Data:
ID uses container log aggregation (CloudWatch or Elastic), so the logs don't go away when the container goes away.
ID is not seeing 14-hour update times for 10 million instances; the LDP update for its largest customer (8.5 million instance records) takes 3.5-4 hours, ldpmarc is quick (10 minutes), and derived tables take another 4 hours.
14 hours doesn't seem normal, and people shouldn't expect that, especially if you are running with network proximity between the FOLIO and LDP databases.
EBSCO: The 14 hours is LDP1, ldpmarc, and derived tables all together. It depends on the customer's dataset. One customer has a dataset similar to Cornell's, but its LDP update is shorter by 2-3 hours.
TAMU: We run all of the LDP processes as cron jobs on Kubernetes (we have our own container for FOLIO Analytics) and are seeing times similar to Index Data. Incremental updates are quick; a full update is 8-10 hours for 4.5 million records. The production database is a standalone Postgres VM with dedicated storage, used only for FOLIO reporting. We don't have as much memory on the production server, only 12-16 GB, but it has a 4-core CPU. Very interesting that you can't get ldpmarc incremental updates working. We used to run LDP1, then derived tables, then ldpmarc, but it seems to be more reliable to run LDP1, then ldpmarc, then derived tables. We use Fluentd for logs, which then get pushed to Splunk. Previously, logs were going to a file, and we had to change that to go to standard error/out so they can go to Splunk. We don't have alerts set up right now for when those jobs fail (haven't had time yet), so each morning we just check them all manually. For production, we rarely get a failed job; when we do, it's usually just a mistake from a recent TAMU developer change. Sometimes we would lose the connection between the LDP and FOLIO, but the admin guide has advice and new versions have a fix for that.
Security concerns: If you don't have things set up securely, sure, but otherwise... not sure what is meant by security concerns. For some enterprise institutions that rely on SSO, SSO is not supported; it's just regular PostgreSQL security, but that is pretty battle-tested, so it's more of an integration problem than a security problem.
TAMU: We make sure the LDP IP address is only accessible from certain subnets, and our K8s network is namespace-isolated. We don't do any data anonymizing, and we need that data. We have a lot of power users in various departments who like to build their own reports. We've set up CloudBeaver instances that use read-only users, and we have those set up for SSO: users log in with their library account and only have access to the things they should see based on their roles. From a security standpoint, no, there isn't SSL all the way down to the schema, but we take care of that by controlling access, and FOLIO has the same issue. We have separate CloudBeaver instances for production, dev/test, and the FOLIO database. For heavier work, there are reporting workflows in the workflow engine, which runs scheduled queries against the LDP and sends data to a website where results can be downloaded. We use SSO for that too, so it's also restricted to specific people.
Testing/upgrades: LDP1 and ldpmarc are both fairly independent of FOLIO releases; new versions come out on their own cadence. What we have for all tenants is production and staging. When there is a new version of LDP1 or ldpmarc, we implement it in staging. We don't necessarily invite users to test, since upgrades might be minor, but if it's more than a point upgrade, we might invite users to test. Then we follow that with an upgrade of production. Invoke the upgrade with the new command, and the next time you run an update, you'll get the new version. For ldpmarc, you usually need to run a full update, but the release notes say whether you need that. FOLIO Analytics is tied to FOLIO releases, so we tie it to the FOLIO flower release. FOLIO testing should include LDP1 as well.
TAMU: Pretty much the same. We only ask users to look at upgrade testing for new FOLIO Analytics versions. We don't update the FOLIO systems very frequently (every 6 months to a year), and we don't update the LDP much either; we just do everything at once, including testing.
EBSCO: We wait until the flower release to update LDP1 and ldpmarc unless there is a specific problem with the current setup. We ask others whether the errors are solved by a new version, and if so, might update. We usually have a special job called "LDP engine" that includes all three components, and we use that Docker image for all setups for the current flower release. To prepare it, we check the repos to find the current versions, build a new Docker image, and run smoke tests (how does it work, do we get errors, etc.). If it works fine, we create recommendations for the upcoming flower release (i.e., which versions of the three components to use) and use them until the next FOLIO release.
Additional accolades for Nassib! Very responsive and active about documenting issues as they come up. Containers are awesome and save a lot of time. We also appreciate Nassib's help on queries.
SIG discussion:
The good news is that none of this is really new; the SysOps group does not consider the security concerns valid or share them.
This is all specific to LDP1, but some of the security discussions will be the same for Metadb, so we've already heard about the best practices for that.
Did Index Data express concern that they are doing more work than others? So far it seems like they are employing a standard OSS model (free software, paid hosting); how can we support their visibility as a hosting provider?
At Cornell, we decided that FOLIO and LDP updates should not happen at the same time. No one at the hosting service has time to look at LDP issues when they are focused on making sure the app upgrade is fine. We learned this the hard way.
Could the SIG help compile this information into a "Tips and Tricks for Hosting LDP1" wiki page? It could be written for people who are not highly technical, as opposed to the actual software documentation.
Review of In-Progress Projects		Build a directory of extracted and derived tables: TBD
Build Training Program for FOLIO Data Model: subteams making progress
Investigate LDP Hosting Support: follow-up (above)
Port LDP derived tables to Metadb: subteams making progress, but we are behind on reviews
Recruit from new institutions not represented: we have been reviewing the list of implementers; the next step will be to identify contacts and reach out
Review reporting channels on FOLIO Slack: the proposal was approved by Nassib, who actually thinks we might be able to transition the sysadmin channel as well; we made posts in these channels to announce the changes, with a deadline of Friday, Oct 28 for feedback
Updates and Query Demonstrations from Various Reporting-Related Groups and Efforts	Community & Coordination, Reporting Subgroup Leads	Project updates
Reporting development is using small subgroups to address priorities and complete work on report queries. Each week, these groups will share reports/queries with the Reporting SIG. Reporting development team leads are encouraged to enter a summary of their work group activities below.
RA/UM Working Group
Meetings have become more of a lab session, working through specific problems.
Contact Angela if you would like to join these meetings; second Wednesdays at 1pm Eastern.
Context
Meeting notes: https://docs.google.com/document/d/1UnzG64tl917LOH2FtWhCEPlSOsnJWKL-0eu88Ouo1DU/edit
Current status of RA/UM issues: https://github.com/folio-org/folio-analytics/issues?q=is%3Aissue+is%3Aopen+RA%2FUM
Status of LDP → Metadb query porting: https://docs.google.com/spreadsheets/d/1efLc6QMJyGuXyQM26pZbti-B6wFhJuZt7s7FJbhdXYo/edit#gid=0
MM Working Group
Meetings are the 1st Tuesday of the month, 12-1pm ET, via Zoom using the usual FOLIO password. Our lab sessions are open to everyone. Please bring your questions, examples, and comments about reporting and metadata.
We are still looking for reviewers and testers for ldpmarc.
We have moved our Slack channel to the LDP Slack workspace.
ERM Working Group
FOLIO Data Model Training (in progress)
ERM Query Development Status
Coming soon: Open Access reporting. Report requests are being collected in the OA SIG. The Reporting SIG test environment will get the OA app, and test data will be imported.
Meetings are biweekly on Tuesdays at 11am ET, alternating with the RM Working Group. The next meeting will be on Oct 25. Contact Stefan Dombek if you would like a calendar invitation.
RM Working Group
The group is working on RM-related derived tables for Metadb and adding comments to columns for folio-analytics 1.5; see RM Derived Tables to Convert from LDP to Metadb.
For the latest query development updates, see RM Prototype and Query Development Status.
Meetings are biweekly on Tuesdays, 11am-noon ET; contact Sharon Beltaine if you would like a calendar invitation.
Meeting recordings are here: FOLIO ERM-RM Report Dev Teams
Reporting SIG Documentation Subgroup
Lotus documentation is live on https://docs.folio.org/docs/
Morning Glory documentation is complete and submitted.
Nolana documentation will be in progress soon.
Additional context: The Reporting SIG has representation on the Documentation Working Group, which is building end-user documentation for https://docs.folio.org/docs/ (mostly linking to existing documentation on GitHub).
External Statistics Working Group
No updates currently.
New organizational/tracking scheme for JIRA, with pointers to queries in the folio-analytics repository.
New organizational structure for External Statistics reports:
External statistics reports (e.g., ACRL) typically require running queries from different functional reporting areas.
These reports will be captured in JIRA under one UXPROD-XXXX report cluster issue, whose description will point to each of the queries required to run them in the folio-analytics repository.
Institutions will need to rank each of these 8 new UXPROD-XXXX report cluster issues.
Each reporting development team will take responsibility for the queries in its area for the external statistics clusters.
Product Council
For all recent work on FOLIO Reporting SQL development: https://github.com/folio-org/folio-analytics/commits/main
Topics for Future Meetings	All	How to deal with External Stats reports? Maybe subteam leads check in about that; probably wait until after the Metadb conversion is more complete.
Ask for presenters: hosting experiences from implementers.
Annual Reporting Goals
(in progress) Support the transition from LDP to Metadb (e.g., update derived table and report queries, update documentation, outreach, new training)
(ready to start) Develop training/onboarding for new SIG members/report users (esp. FOLIO-specific data model and transformation topics)
Improve communication between the SIG and app developers so we hear about data model changes in advance; continued advocacy on the part of the SIG to governance groups
(ready to start) Review JIRA issues, clean up, and revisit strategy for JIRA
Regular review of Milestones
Explore new recruitment/onboarding strategies (e.g., buddy system)
Demo the latest version of the LDP app: any new features?
Training: Using APIs
More work on asynchronous collaboration: how to engage in discussions and question answering more broadly. Consider connecting discuss.folio.org with a Slack channel, to make sure any forum topics get highlighted on Slack as well?
Open question: should we update the FOLIO LDP1-based Reporting First Implementers Grid?
Review and update Topics for Future Reporting SIG Meetings
