2023-03-13 Reporting SIG Meeting notes

Date

13 Mar 2023

Attendees

Present?	Name	Organization	Present?	Name	Organization
x	Arthur Aguilera	University of Colorado, Boulder	x	Eliana Lima	Fenway Library Organization
x	Sharon Beltaine	Cornell University		Eric Luhrs	Lehigh University
	Erin Block	University of Colorado, Boulder	x	Linda Miller	Cornell University
	Nancy Bolduc	Cornell University		Nassib Nassar	Index Data
x	Shannon Burke	Texas A&M	x	Elena O'Malley	Emerson
	Suzette Caneda	Stanford University		Tod Olson	University of Chicago
	Lloyd Chittenden	Marmot		Jean Pajerek	Cornell University
	Tim Dannay	Mount Holyoke College	x	Kimberly Pamplin	Texas A&M University
x	Axel Doerrer	University Mainz		Scott Perry	University of Chicago
	Shelley Doljack	Stanford University		Natalya Pikulik	Cornell University
x	Stefan Dombek	Leipzig University		Bob Scheier	Holy Cross
	Jennifer Eustis	U. Massachusetts Amherst / Five College		Vandana Shah	Cornell University
	Lynne Fors	Wellesley College		Linnea Shieh	Stanford University
	Lisa Furubotten	Texas A&M		Clare Spitzer	Stanford University
	Alissa Hafele	Stanford University	x	Amelia Sutton	U. Massachusetts
	Kara Hart	Wellesley College		Simona Tabacaru	Texas A&M
	Corrie Hutchinson	Index Data		Huey-Ning Tan	Stanford University
x	Jamie Jesanis	Wellesley College		Vitus Tang	Stanford University
	Jeanette Kalchik	Stanford University		Irina Trapido	Stanford University
	Kevin Kishimoto	Stanford University		Catherine Tuohy	Emmanuel College
x	Ingolf Kuss	hbz		Kevin Walker	The University of Alabama
	Alexander Lao	Stanford University	x	Angela Zoss	Duke University
	Joanne Leary	Cornell University	Guest	James Creel	Texas A&M University
			Guest	Jeremy Huff	Texas A&M University
			Guest	Jason Root	Texas A&M University
			x	Kimberly Smith	?

Discussion Items

Item

Who

Notes

Attendance & Notes

Sharon

Attendance & Notes

Today's attendance-taker: Linda (or substitute)
Today's note-takers: Team Leads for project updates

Announcements / Reminders

Sharon

Upcoming meeting topics

March 16: ACQ-ERM Data Model Training Part 1 (focus on ACQ)
March 27: ACQ-ERM Data Model Training Part 2 (focus on ERM and areas that overlap between ACQ and ERM)
April 6 and 10: Onboarding/re-onboarding meetings

Updating Directory of Reporting SIG Members

Our directory of Reporting SIG members is out of date. Please take a moment to update/add/fix entries:

See entries on the Reporting SIG Home page (scroll down to get to the list of names)

BugFest Testers for LDP reporting app needed

BugFest is open until March 17

Any new members?

Welcome/introductions

How to find our latest recordings

Recording details available on our Reporting SIG Meetings and Notes page and Reporting SIG Subteam Meetings page

(Always) Recruiting New Query Developers

The Reporting SIG is always on the look-out for new query developers. Please let us know if you are interested in doing query development or if there are others at your institution who might be a good fit.

Texas A&M's FOLIO Reporting Environment Demonstration/Presentation

Jeremy Huff, James Creel, Jason Root

Have utilized workflows heavily (Camunda)
- Jeremy presented at WOLFCon
Also expose reports to end user through a custom dashboard
DBeaver, Superset
MIS home-grown dashboard interface
- Have put together a PHP application to host reports that need to be consumed by end users
- it's behind SSO, and users have various roles and levels of access; can differentiate between users, and administrators can elevate users
- reports are bespoke, and then can be launched through the dashboard
- some depend on a workflow, and clicking just launches the workflow and sends an email
- can specify parameters
- experience from report to report can vary
- workflow might run a cron job whose output gets written to disk and can then be downloaded through the interface
- try to indicate if it triggers a workflow by including that in the name
- dashboard groups reports by area (e.g., Acq, Cataloging, etc.)
- developed to provide a reporting solution similar to what staff had in Voyager, which had built up over a few decades
- For most reports, much of the data is coming from LDP
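To make the role-based listing and grouping by area concrete, here is a minimal sketch in Python. The actual MIS dashboard is a PHP application and its code was not shown, so this is not TAMU's implementation. The role names, the role ranking, and the area assigned to each report are assumptions for illustration; only the report names come from the discussion.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical role ranking; the real role names were not given in the meeting.
ROLE_LEVEL = {"staff": 1, "supervisor": 2, "admin": 3}


@dataclass(frozen=True)
class Report:
    name: str
    area: str      # functional area used for grouping, e.g. "Acq"
    min_role: str  # lowest role allowed to see the report (assumed field)


def visible_reports(reports, user_role):
    """Return the reports a user may see, grouped by functional area."""
    level = ROLE_LEVEL[user_role]
    grouped = defaultdict(list)
    for report in reports:
        if ROLE_LEVEL[report.min_role] <= level:
            grouped[report.area].append(report.name)
    return dict(grouped)


# Area and role assignments below are illustrative assumptions.
REPORTS = [
    Report("Open Orders", "Acq", "staff"),
    Report("Shelflist (Holdings Level) workflow", "Cataloging", "supervisor"),
]
```

With these assumed roles, visible_reports(REPORTS, "staff") lists only the Acq report, while an "admin" user sees both areas.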
Superset
- haven't gone into production with this, have just done some end of year reports
- since the custom dashboard is not very standardized, thought Superset might smooth that out a bit
- in Superset, create a dataset, which can either be a SQL query or an uploaded dataset
- Superset can cache the data, so performance is good
- With the dataset specified, can build charts, and then those can be grouped into a dashboard
- Can build tabular views of the data and expose those through a public URL
- Could use the existing home-grown dashboard to point to Superset
- Could also be cool to open up Superset so other staff could be building charts and dashboards from the datasets without any help from developers
- One wrinkle could be launching workflows; not sure if those can be launched from Superset
CloudBeaver
- Some people want direct access to data, so they are running CloudBeaver (read-only instances, so not too worried about opening up access)
- Developers can use that to test out a query, and then when it's working can pull that into a report for the home-grown interface
- In addition to CloudBeaver, can also provide access to a VM with DBeaver that has more than just read access, a bit more careful about that
- Even in CloudBeaver, can limit access for certain people to certain databases, not everyone will see everything
Camunda
- building a workflow
- Camunda is open source
- there is also mod-workflow for a generic workflow engine, and then mod-camunda is specific to Camunda
- other people might be using Airflow, etc.
- currently have 17 workflows registered
- can generate incident reports that show what went wrong if something doesn't work (stack trace, etc.)
- there is a visual representation of the nodes that are happening
- If the workflow is slow enough, you can actually watch the graphics to see what is happening right now
- A workflow can be triggered by a cronjob, but also other triggers
- workflows are based on BPMN in Camunda
- mod-workflow is a representation of the workflow, then mod-camunda takes the workflow object and turns it into BPMN so Camunda understands it
- how would you build a workflow around purchase orders?
- this is TAMU's most complicated workflow: log into FOLIO, complete tasks to gather data, start a subroutine that will iterate, results in new instances in FOLIO
- this isn't a report, it's a workflow, but the home-grown dashboard is basically the only logical place to share the workflow trigger
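As a rough illustration of turning a workflow object into BPMN, the sketch below builds a linear BPMN 2.0 process from an ordered list of task names using only the Python standard library. It is not mod-workflow's or mod-camunda's actual code or data model: the function name, the ids, and the targetNamespace are hypothetical, and a process that Camunda could really deploy would need more detail (for example, an implementation for each service task).

```python
import xml.etree.ElementTree as ET

# Namespace of the BPMN 2.0 model schema.
BPMN_NS = "http://www.omg.org/spec/BPMN/20100524/MODEL"


def tag(name):
    """Qualify an element name with the BPMN namespace."""
    return f"{{{BPMN_NS}}}{name}"


def workflow_to_bpmn(workflow_id, task_names):
    """Build a linear BPMN process: start -> one task per name -> end."""
    ET.register_namespace("bpmn", BPMN_NS)
    definitions = ET.Element(
        tag("definitions"),
        {"targetNamespace": "http://example.org/workflows"},  # placeholder
    )
    process = ET.SubElement(
        definitions, tag("process"), {"id": workflow_id, "isExecutable": "true"}
    )

    # Nodes: a start event, one service task per step, and an end event.
    node_ids = ["start"]
    ET.SubElement(process, tag("startEvent"), {"id": "start"})
    for index, name in enumerate(task_names, start=1):
        task_id = f"task_{index}"
        ET.SubElement(process, tag("serviceTask"), {"id": task_id, "name": name})
        node_ids.append(task_id)
    ET.SubElement(process, tag("endEvent"), {"id": "end"})
    node_ids.append("end")

    # Sequence flows connect each node to the one that follows it.
    for index, (source, target) in enumerate(zip(node_ids, node_ids[1:]), start=1):
        ET.SubElement(
            process,
            tag("sequenceFlow"),
            {"id": f"flow_{index}", "sourceRef": source, "targetRef": target},
        )
    return ET.tostring(definitions, encoding="unicode")
```

Calling workflow_to_bpmn("demo", ["gather data", "create instances"]) returns an XML string with two service tasks joined by three sequence flows.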
When is a report a workflow, rather than just a simple query?
- Open Orders query runs fast enough and returns a small enough result set that people should be able to get it from the browser (execute at time of request)
- Shelflist (Holdings Level) workflow - it takes forever, so it's not a good user experience; have user input an email, and then that and all of the other parameters get handed off to a workflow
- Workflow is also useful for scheduling reports, even if it's a fast query
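The rule of thumb above (fast, small queries run at request time; slow or scheduled ones are handed to a workflow along with an email address) can be sketched as a small decision function. This is an illustration only: the thresholds, field names, and the idea of having runtime and row estimates up front are assumptions, not values or code from TAMU.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReportRequest:
    name: str
    expected_seconds: float      # rough runtime estimate (assumed to be known)
    expected_rows: int           # rough result-set size (assumed to be known)
    scheduled: bool = False      # True if the report should run on a schedule
    email: Optional[str] = None  # where a workflow would send the results


# Illustrative thresholds only; the notes give no actual cut-offs.
MAX_INLINE_SECONDS = 10
MAX_INLINE_ROWS = 5_000


def choose_execution(request):
    """Return "inline" to run at request time, or "workflow" to hand off."""
    if request.scheduled:
        # Scheduling is a reason to use a workflow even for a fast query.
        return "workflow"
    if (request.expected_seconds <= MAX_INLINE_SECONDS
            and request.expected_rows <= MAX_INLINE_ROWS):
        return "inline"
    if request.email is None:
        raise ValueError("slow reports need an email address for delivery")
    return "workflow"
```

For example, an Open Orders request estimated at a couple of seconds would come back as "inline", while a Shelflist (Holdings Level) request with a long runtime and an email address would come back as "workflow".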
What kinds of users do you have? Are you limiting the amount of work people have to do to build their own reports?
- Don't have any pressure to limit, but people do have different appetites for that kind of work; some people really want access to the data, some people are intimidated. Superset seems like it would enable people to build their own reports. CloudBeaver also creates an opportunity for people to build their own reports. And there are many environments, so even if the data is a bit old, people can practice a report in older environments.
- For people who aren't as comfortable, MIS report site is limited by user role, so they don't see everything necessarily. Don't want to overwhelm people.
Rancher
- have four FOLIO environments, all have LDP updated nightly by cron jobs governed by Rancher
- for SysOps, Rancher is the main interaction point
- have Docker containers
- can look at the clusters for each, check out resources
- this represents VMs(?) in data cluster that are tagged to run different parts of the FOLIO puzzle
- can even drill down to see what artifacts are running - edge, Elasticsearch, Kafka, and also the LDP cron jobs
- LDP cron jobs are kicked off every night, keep 7 days of history
- if someone reports a problem, can check whether it kicked off or if some error happened
- order: LDP data update (LDP platform, runs against FOLIO pre), ldpmarc incremental, derived tables update
- not everyone is comfortable with command lines, so this is a nice visual interface, can do things with buttons
- can even view the log for previous execution inside the process list
- as it's running you can see a graph of how many resources it's using
- some artifacts are coming right out of GitHub, e.g., Docker container
- derived tables are homegrown, do build their own images to select the SQL queries; have their own container registry for that, not public
- if a module dies, we have a backup module that can do what needs to be done
- FOLIO has 90-something modules now
- LDP production is an actual VM, but all others are containerized
- if you are a developer and want to check in on the box, can click on the container and see how heavily hit it is, look at CPU utilization, memory utilization, disk I/O, etc. Can also shell into it like you would with a terminal, and could send a psql command straight to it. Don't normally have to dive in this way, since they can use CloudBeaver or DBeaver, but if those are down, someone can still get in and check things out.
- also useful to check on ownership, which is important for who will have access through the other interfaces.
- just using public instances for PostgreSQL (Crunchy Data); use those for FOLIO production instances as well, over Ansible
- pre-production is scaled the same way as production and is running the same number of modules, so it's a good test of what impact a report is going to have on a system
- librarians also like to log into pre to kick off something to see if data import/export is really going to have the right result before they do it in prod
- same is true for reports and workflows, just check with pre
- mod-camunda and mod-workflow, those are Docker artifacts running right alongside FOLIO in the cluster
- mod-data-mig handled migration from Voyager to FOLIO, also handled nightly user updates from campus system
- logs help to troubleshoot things like network interruptions
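The nightly job order noted above (LDP data update, then ldpmarc incremental, then derived tables update) could be expressed as a simple sequential runner. This is a sketch under stated assumptions, not TAMU's Rancher configuration: the commands are placeholders that only print a label, and stopping at the first failure is an assumed policy based on the later steps reading data the earlier ones produce.

```python
import subprocess
import sys

# Placeholder commands so the sketch runs anywhere; real steps would invoke
# the LDP update, ldpmarc, and derived-table jobs instead.
STEPS = [
    ("LDP data update", [sys.executable, "-c", "print('ldp update')"]),
    ("ldpmarc incremental", [sys.executable, "-c", "print('ldpmarc')"]),
    ("derived tables update", [sys.executable, "-c", "print('derived tables')"]),
]


def run_nightly(steps):
    """Run steps in order; stop at the first failure and return a run log."""
    log = []
    for name, command in steps:
        result = subprocess.run(command, capture_output=True, text=True)
        log.append((name, result.returncode))
        if result.returncode != 0:
            break  # assumed policy: skip later steps once one has failed
    return log
```

The returned log of (step name, exit code) pairs is the kind of record that helps answer whether a job kicked off or hit an error.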
Any testing with multi-tenant? How to set it up with LDP?
- TAMU is single tenant, so haven't gotten to test that much
- sometimes start by setting up DIKU, but don't connect that to an LDP
- do use multi-tenancy standards anyway, so everything is correctly formatted by tenant; would be compatible with multi-tenancy but don't test it
LDP Query Builder app?
- check it out every so often to see its features, but the result set limit of 1000 records is an issue, so while it seems like a good idea, still waiting for more advanced features
- also have reports that need to pull data from multiple data sources, which isn't supported
- think it's a good idea though, and everything that can be brought into FOLIO sphere is a great idea
- if it could build the queries we need to run, would consider using
Any change for Metadb?
- Not sure yet, excited to find this out
- Do expect to need to rework things; could impact all of the reports
- Might be a good opportunity to make a switch to Superset, since other things will be changing as well
- Nassib has been very helpful fine-tuning LDP databases so they are as optimized as possible, but even with that it takes each LDP about half a day to update
K8s jobs - only for import? Or also database? Or is that part of the Crunchy PG cluster?
- LDP is separate from the Crunchy PG cluster; it's not as mission critical if it goes down
- the LDP dbs are backed up, but not as highly available as FOLIO
- data update cron jobs are just LDP update jobs; they take data from a running instance of FOLIO and put it in the LDP database... all handled by cron
- Jeremy's workflows that use cron are independent from the K8s cluster; they tend to live on the dev and test machines, which have a joint developer VM with code and the ability to kick off cron jobs
- no workflow crons live in FOLIO cluster, but they could
- one long-term goal is to move FOLIO-related bits of code all into the same organizing "pane of glass"

Recurring Items (Updated weekly, but not always discussed in meeting)

Item

Who

Notes

Review of In-Progress Projects (Recurring)

Build a directory of extracted and derived tables
- update coming (Feb 27)
Build Metadb query cookbook-style documentation
- has been opened and sent to the development teams for planning/prioritization
- haven't settled on a style exactly, but have some examples
Build Training Program for FOLIO Data Model
- expecting ACQ/ERM presentation in March
Port LDP1 derived tables to Metadb
- expecting to finish for Orchid release
Recruit from new institutions not represented
- waiting to see about FOLIO-wide onboarding efforts
- recommend that the development teams reach out to SIGs for an update, invite new members

Review the release notes for FOLIO Analytics, LDP1, LDLite, LDP Reporting App, ldpmarc, Metadb Projects (Recurring)

LDP1 release notes
Metadb release notes (via tags)
LDLite release notes (via tags)
ldpmarc release notes
LDP Reporting App release notes (mod-ldp, ui-ldp) and JIRA issues (mod-ldp, ui-ldp)
FOLIO Analytics release notes

Updates and Query Demonstrations from Various Reporting Related Groups and Efforts Projects (Recurring)

Community & Coordination, Reporting Subgroup Leads

Project updates

Reporting development is using small subgroups to address priorities and complete work on report queries. Each week, these groups will share reports/queries with the Reporting SIG. Reporting development team leads are encouraged to enter a summary of their work group activities below.

RA/UM Working Group

Meetings have become more of a lab session, working through specific problems
Contact Angela if you would like to join these meetings; second Tuesdays at 1pm Eastern
Context
- Meeting notes: https://docs.google.com/document/d/1UnzG64tl917LOH2FtWhCEPlSOsnJWKL-0eu88Ouo1DU/edit
- Current status of RA/UM issues: https://github.com/folio-org/folio-analytics/issues?q=is%3Aissue+is%3Aopen+RA%2FUM
- Status of LDP1 → Metadb querying porting: https://docs.google.com/spreadsheets/d/1efLc6QMJyGuXyQM26pZbti-B6wFhJuZt7s7FJbhdXYo/edit#gid=0

MM Working Group

Meetings are 1st Tuesday of the month, 12-1pm ET via Zoom using the usual FOLIO password. Our lab sessions are open to everyone. Please bring your questions, examples, and comments about reporting and metadata. MM notes
We have begun working on the MM Data Model for FOLIO. We are using these slides for our work.
We are looking for a convenor for the group. If interested, contact Jennifer Eustis on LDP/Metadb slack.

ERM Working Group

Current topics
- Reporting for FOLIO Apps
  - Open Access
  - eUsage
  - eHoldings
- FOLIO Data Model Training
Meetings are biweekly on Tuesdays 11am ET alternating with ACQ Working Group
- Next meeting will be on March 14
- Contact Stefan Dombek if you would like to get a calendar invitation

ACQ Working Group

group is working with ERM group on ACQ-ERM FOLIO Data Model Training
group is working on ACQ-related (Acquisitions) derived tables for the next FOLIO Analytics release
- see ACQ Derived Tables to Convert from LDP1 to Metadb
for latest query development updates, see RM Prototype and Query Development Status
Meetings are biweekly on Tuesdays 11am-noon ET; contact Sharon Markus if you would like a calendar invitation
Meeting Recordings are here: FOLIO ERM-RM Report Dev Teams

Reporting SIG Documentation Subgroup

Morning Glory documentation is live on https://docs.folio.org/docs/
Nolana documentation is in review
Orchid documentation will be in progress soon, and plans are underway to include beta-level documentation for Metadb
Additional Context
- The Reporting SIG has representation on the Documentation Working Group, which is building end-user documentation for https://docs.folio.org/docs/ (mostly linking to existing documentation over on GitHub)

External Statistics Working Group

no updates currently
new organizational/tracking scheme for JIRA, with pointers to queries in folio-analytics repository
New organizational structure for External Statistics reports
- external statistics reports (e.g., ACRL) typically require running queries from different functional reporting areas
- these reports will be captured in JIRA under one UXPROD-XXXX report cluster issue, then the descriptions will point to each of the queries required to run them on the folio-analytics repository
- institutions will need to rank each of these 8 new UXPROD-XXXX report cluster issues
- each reporting development team will take responsibility for the queries in their area for the external statistics clusters

D-A-CH Working Group (D-Reporting)

Group for reporting in Germany
Current topics
- The onboarding training has just started
- DBS statistics
  - formulate a goal / visions / milestones
  - Overview of work already done (Issues etc.)
Meetings are currently held by appointment
- Next meeting will be on March 27 (1 pm CET)
- Contact Stefan Dombek if you would like to get a calendar invitation

Product Council

(December 8 meeting) PC shared a report of the SIG check-ins
would love for PC to check in yearly

For all recent work on FOLIO Reporting SQL development:

https://github.com/folio-org/folio-analytics/commits/main