Missouri State was the first US institution to go live with FOLIO
had reports generated in the previous system that library staff wanted to retain
that info is not really available via the APIs
setting up LDLite was something we wanted to do anyway, and it gave us a chance to do other things
MSU is an EBSCO-hosted library, so doesn't have access to LDP
if we wanted to do anything using SQL, LDLite was the only option
don't mind doing stuff with the APIs, enjoy that, but really like SQL
having things in an SQL format gives us some additional options we wouldn't normally have
using the variation Nassib mentioned - MSU is taking data from LDLite and putting it into a Postgres database
database 1 - complete harvest of all data; not updated often; with 1.4 million bib records, a full harvest takes about 33 hours on a VM with 4 processors, 8 GB RAM, and 300 GB storage, including SRS (the SRS records take about 15 hours by themselves)
set that up first just to see what it was like
soon after, a librarian needed to do some stats; didn't want them to have all the data, so created a smaller database and only harvested the data they need (collections, circulation) - that harvest took about 9 hours
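Roughly how such a targeted harvest can look with LDLite writing into Postgres. This is a hedged sketch: the URL, tenant, credentials, DSN, and the chosen endpoints are placeholders, and the connection method names follow the LDLite README and may differ between LDLite versions.

```python
import ldlite

ld = ldlite.LDLite()

# FOLIO/Okapi connection; URL, tenant, and credentials are placeholders
ld.connect_okapi(url='https://folio.example.edu/okapi',
                 tenant='example_tenant',
                 user='ldlite_user',
                 password='********')

# Write into a Postgres database instead of LDLite's default local file database
ld.connect_db_postgresql(dsn='host=localhost dbname=folio_stats user=ldlite')

# Smaller "stats" database: harvest only the endpoints the librarian needs
ld.query(table='loans', path='/loan-storage/loans',
         query='cql.allRecords=1 sortby id')
ld.query(table='items', path='/item-storage/items',
         query='cql.allRecords=1 sortby id')
```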
have been around libraries since 1993; back in the mid-'90s, if you wanted a free scripting language, PHP and MySQL were the thing. don't have a background in Postgres or Python, but have been able to figure it out.
have some scripts that run the harvest - start it end of day, let it run overnight.
working on something else, with help from Nassib: incremental updates - a script that only brings in changes since its last run; the hope is to run it in the morning and have the data ready for work in the afternoon (see the sketch below)
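A rough sketch of what an incremental pull might look like, assuming the endpoint supports CQL filtering on metadata.updatedDate (not every FOLIO endpoint does). Note that LDLite's query() rebuilds its target table, so the delta would still have to be merged into the main tables as a separate SQL step; the state file and table names here are hypothetical.

```python
from datetime import datetime, timezone
import ldlite

ld = ldlite.LDLite()
ld.connect_okapi(url='https://folio.example.edu/okapi', tenant='example_tenant',
                 user='ldlite_user', password='********')   # placeholders
ld.connect_db_postgresql(dsn='host=localhost dbname=folio_stats user=ldlite')

STATE_FILE = 'last_harvest.txt'          # hypothetical: holds the previous run time

with open(STATE_FILE) as f:
    last_run = f.read().strip()          # e.g. '2022-02-01T00:00:00.000Z'

run_started = datetime.now(timezone.utc).strftime('%Y-%m-%dT%H:%M:%S.000Z')

# Pull only records changed since the last run into a separate "delta" table
cql = f'metadata.updatedDate > "{last_run}" sortby id'
ld.query(table='loans_delta', path='/loan-storage/loans', query=cql)

# The rows in loans_delta would then be upserted into the main loans tables
# with SQL before recording this run's start time for the next pass.
with open(STATE_FILE, 'w') as f:
    f.write(run_started)
```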
like the idea of using multiple databases to control access - just don't even include sensitive data in certain databases
the script in use is a variation of the script Nassib has on the GitHub page
have added the call that writes into the database
have also created a text output: it records when the harvest starts and ends, and writes the endpoint and the tables created for that endpoint; could also have it print the table structure
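A minimal sketch of that kind of text log wrapped around the harvest calls; the endpoints, log path, and connection details are placeholders, and the assumption that query() returns the names of the tables it created follows the LDLite README.

```python
from datetime import datetime
import ldlite

ld = ldlite.LDLite()
ld.connect_okapi(url='https://folio.example.edu/okapi', tenant='example_tenant',
                 user='ldlite_user', password='********')   # placeholders
ld.connect_db_postgresql(dsn='host=localhost dbname=folio_stats user=ldlite')

endpoints = {
    'loans': '/loan-storage/loans',
    'items': '/item-storage/items',
}

with open('harvest.log', 'a') as log:
    log.write(f'harvest started {datetime.now().isoformat()}\n')
    for table, path in endpoints.items():
        # query() returns the names of the tables it created for this endpoint
        created = ld.query(table=table, path=path,
                           query='cql.allRecords=1 sortby id')
        log.write(f'{path}: wrote {created}\n')
    log.write(f'harvest ended {datetime.now().isoformat()}\n')
```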
have DBeaver connected to the VM for running queries; the VM is an Ubuntu server and everything is done on the command line, so DBeaver provides an interface for queries
Questions: are other institutions accessing legacy data through LDP?
Cornell has saved off legacy data in a Postgres database and hopes to bring it into the LDP; data has been pulled out of the legacy system, and a bunch of it has already been pulled into the local schema
have been pulling requests data from Ares into the LDP to blend with FOLIO
have been talking about getting the data from the Voyager archive and using that; would really love that
have been talking about getting bits and pieces of the Voyager data separately (it's static, so just have to do it once)
getting the data from the legacy system into a format that will work is a big project; would have been nice to bring it in just before implementation
during migration, some things may have accidentally gotten skipped, so it's good to have that backup; it would be easier if it were already in the LDP
hard to query out the circ data and then bring it into the LDP
would be easier if it were all there
Stanford is likely to go straight to Metadb, which will eventually have a concept of multiple input sources; FOLIO would be one input source, but you could possibly set up the legacy ILS as a second input source, and Metadb could pull everything automatically (or a defined subset); and then if you needed to run both systems at the same time, it could continue to pull data from the legacy system
any lessons learned for implementation?
not terribly difficult, just install LDLite (one dependency, maybe); had more trouble getting Postgres installed and configured than LDLite
originally tried just using LDLite without a database server, but that's not great for harvesting absolutely everything; if you are going to use it for a lot of data, the recommendation is to go ahead and install Postgres and connect LDLite to that
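The difference in setup is small; roughly as follows, with method names taken from the LDLite README (they may vary by version) and connection details as placeholders.

```python
import ldlite

ld = ldlite.LDLite()
ld.connect_okapi(url='https://folio.example.edu/okapi', tenant='example_tenant',
                 user='ldlite_user', password='********')   # placeholders

# Option 1: LDLite's embedded, file-based database - fine for small, ad hoc pulls
ld.connect_db(filename='ldlite.db')

# Option 2: a Postgres server - better suited to harvesting everything
# ld.connect_db_postgresql(dsn='host=localhost dbname=folio_full user=ldlite')
```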
that means you'll have to learn Postgres, at least a little
have gotten everything done with Google
If you're using LDLite in place of an LDP or Metadb instance, it probably makes sense to have a shared database that mimics an LDP and use a regular schedule to have LDLite update the data; this probably works better for a smaller library where you can download the data regularly; a larger university probably needs the full LDP or Metadb
if Mark can get the incremental updates working, the plan is to write up the script and schedule it to run every night, which would give the users access to yesterday's data just like LDP users
what kind of documentation is there? you talk about having different databases that cover different things, how do you share that with users?
there are only two people using it right now at MSU, so not too many people need to know; the other user was just given the database connection details and went ahead with DBeaver; he doesn't know about the other database he doesn't have access to
we're a small library - 45 employees, 3 IT staff, and Mark is the only one doing FOLIO stuff; documentation is just for him
running it against the FOLIO sample data has been problematic; the sample data has duplicated identifiers, which prevents us from tracking which tables connect to each other
so this approach needs to be used with a real database rather than something like FOLIO snapshot
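One hedged way to see that problem is to look for id values that occur more than once in a harvested table. The table name, column, and connection details below are illustrative only; LDLite normally creates a transformed "__t" table alongside the raw JSON table.

```python
import psycopg2

# Placeholder connection string for the Postgres database LDLite writes into
conn = psycopg2.connect('host=localhost dbname=folio_full user=ldlite')

with conn, conn.cursor() as cur:
    # Any id appearing more than once points at duplicated sample-data identifiers
    cur.execute("""
        SELECT id, count(*) AS copies
        FROM users__t
        GROUP BY id
        HAVING count(*) > 1
    """)
    for record_id, copies in cur.fetchall():
        print(record_id, copies)
```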
for Metadb there is a new version that works a bit differently
Nassib willing to talk to Cornell about trying to help with that
still trying to figure out relationships between tables in the various areas
hoping to start creating diagrams for derived tables
would also be great to have for each functional area/query subteam
The Reporting SIG is always on the look-out for new query developers. Please let us know if you are interested in doing query development or if there are others at your institution who might be a good fit.
Updates and Query Demonstrations from Various Reporting Related Groups and Efforts
Community & Coordination, Reporting Subgroup Leads
Project updates
Reporting development is using small subgroups to address priorities and complete work on report queries. Each week, these groups will share reports/queries with the Reporting SIG. Reporting development team leads are encouraged to enter a summary of their work group activities below.
RA/UM Working Group
Meetings have become more of a lab session, working through specific problems
Angela has been attending RA SIG meetings to open the lines of communication, now once a month
Contact Angela if you would like to join these meetings; second Wednesdays at 1pm Eastern (but for Feb, actually Tuesday the 8th at 2pm Eastern)
Meetings are 1st Tuesday of the month, 12-1pm ET via zoom using the usual FOLIO password. Our lab sessions are open to everyone. Please bring your questions, examples, and comments about reporting and metadata.
We have submitted our slate of derived tables for Metadb for Lotus R1 2022. We completed 16 derived tables for Metadb!
We are turning our attention to finishing the porting of derived tables to Metadb. We have a sign-up sheet and are looking for reviewers.
In our next lab session, we will take the LDP instance identifiers derived table and go through the process of porting it over to Metadb, contributing a pull request with the changes, updating runlist.txt, and going through a review.
ERM Working Group
Fixed data types for foreign keys: #554, #551, #550
Will now refocus on documentation and new derived table development
Iris documentation is in progress, due December 15
Additional Context
The Reporting SIG has representation on the Documentation Working Group, which is building end-user documentation for https://docs.folio.org/docs/ (mostly linking to existing documentation over on GitHub)
External Statistics Working Group
no updates currently
new organizational/tracking scheme for JIRA, with pointers to queries in folio-analytics repository
external statistics reports (e.g., ACRL) typically require running queries from different functional reporting areas
these reports will be captured in JIRA under one UXPROD-XXXX report cluster issue, and the issue descriptions will point to each of the queries in the folio-analytics repository required to run them
institutions will need to rank each of these 8 new UXPROD-XXXX report cluster issues
each reporting development team will take responsibility for the queries in their area for the external statistics clusters
Product Council
For all recent work on FOLIO Reporting SQL development: