2023-03-13 Reporting SIG Meeting notes

Date

2023-03-13

Attendees

Present? | Name | Organization

x | Arthur Aguilera | University of Colorado, Boulder
x | Sharon Beltaine | Cornell University
  | Erin Block | University of Colorado, Boulder
  | Nancy Bolduc | Cornell University
x | Shannon Burke | Texas A&M
  | Suzette Caneda | Stanford University
  | Lloyd Chittenden | Marmot
  | Tim Dannay | Mount Holyoke College
x | Axel Doerrer | University Mainz
  | Shelley Doljack | Stanford University
x | Stefan Dombek | Leipzig University
  | Jennifer Eustis | U. Massachusetts Amherst / Five College
  | Lynne Fors | Wellesley College
  | Lisa Furubotten | Texas A&M
  | Alissa Hafele | Stanford University
  | Kara Hart | Wellesley College
  | Corrie Hutchinson | Index Data
x | Jamie Jesanis | Wellesley College
  | Jeanette Kalchik | Stanford University
  | Kevin Kishimoto | Stanford University
x | Ingolf Kuss | hbz
  | Alexander Lao | Stanford University
  | Joanne Leary | Cornell University
x | Eliana Lima | Fenway Library Organization
  | Eric Luhrs | Lehigh University
x | Linda Miller | Cornell University
  | Nassib Nassar | Index Data
x | Elena O'Malley | Emerson
  | Tod Olson | University of Chicago
  | Jean Pajerek | Cornell University
x | Kimberly Pamplin | Texas A&M University
  | Scott Perry | University of Chicago
  | Natalya Pikulik | Cornell University
  | Bob Scheier | Holy Cross
  | Vandana Shah | Cornell University
  | Linnea Shieh | Stanford University
  | Clare Spitzer | Stanford University
x | Amelia Sutton | U. Massachusetts
  | Simona Tabacaru | Texas A&M
  | Huey-Ning Tan | Stanford University
  | Vitus Tang | Stanford University
  | Irina Trapido | Stanford University
  | Catherine Tuohy | Emmanuel College
  | Kevin Walker | The University of Alabama
x | Angela Zoss | Duke University
Guest | Jason Creel | Texas A&M University
Guest | Jeremy Huff | Texas A&M University
Guest | Jason Root | Texas A&M University
x | Kimberly Smith | ?

Discussion Items

Item | Who | Notes

Attendance & Notes | Sharon

  • Today's attendance-taker: Linda (or substitute)
  • Today's note-takers:  Team Leads for project updates

Announcements / Reminders | Sharon

Upcoming meeting topics

  • March 16: ACQ-ERM Data Model Training Part 1 (focus on ACQ)
  • March 27: ACQ-ERM Data Model Training Part 2 (focus on ERM and areas that overlap between ACQ and ERM)
  • April 6 and 10: Onboarding/re-onboarding meetings


Updating Directory of Reporting SIG Members 

Our directory of Reporting SIG members is out of date. Please take a moment to update/add/fix entries:


BugFest Testers for LDP reporting app needed

  • BugFest is open until March 17


Any new members?

  • Welcome/introductions


How to find our latest recordings


(Always) Recruiting New Query Developers

  • The Reporting SIG is always on the lookout for new query developers. Please let us know if you are interested in doing query development or if there are others at your institution who might be a good fit.


Texas A&M's FOLIO Reporting Environment Demonstration/Presentation | Jeremy Huff, James Creel, Jason Root
  • Have utilized workflows heavily (Camunda)
    • Jeremy presented at WOLFCon
  • Also expose reports to end user through a custom dashboard
  • DBeaver, Superset
  • MIS home-grown dashboard interface
    • Have put together a PHP application to host reports that need to be consumed by end users
    • it's behind SSO, and users have various roles and levels of access; can differentiate between users, and administrators can elevate users
    • reports are bespoke, and then can be launched through the dashboard
    • some depend on a workflow, and clicking just launches the workflow and sends an email
    • can specify parameters
    • experience from report to report can vary
    • a workflow might run a cron job whose output is written to disk and can then be downloaded through the interface
    • try to indicate if it triggers a workflow by including that in the name
    • dashboard groups reports by area (e.g., Acq, Cataloging, etc.)
    • developed to provide a reporting solution similar to what staff had in Voyager, which had built up over a few decades
    • For most reports, much of the data is coming from LDP
  • Superset
    • haven't gone into production with this, have just done some end of year reports
    • since the custom dashboard is not very standardized, thought Superset might smooth that out a bit
    • in Superset, create a dataset, which can either be a SQL query or an uploaded dataset
    • Superset can cache the data, so performance is good
    • With the dataset specified, can build charts, and then those can be grouped into a dashboard
    • Can build tabular views of the data and expose those through a public URL
    • Could use the existing home-grown dashboard to point to Superset
    • Could also be cool to open up Superset so other staff could be building charts and dashboards from the datasets without any help from developers
    • One wrinkle could be launching workflows; not sure if those can be launched from Superset 
  • CloudBeaver
    • Some people want direct access to data, so they are running CloudBeaver (read-only instances, so not too worried about opening up access)
    • Developers can use that to test out a query, and then when it's working can pull that into a report for the home-grown interface
    • In addition to CloudBeaver, can also provide access to a VM with DBeaver that has more than just read access, a bit more careful about that
    • Even in CloudBeaver, can limit access for certain people to certain databases, not everyone will see everything
  • Camunda
    • building a workflow
    • Camunda is open source
    • there is also mod-workflow for a generic workflow engine, and then mod-camunda is specific to Camunda
    • other people might be using Airflow, etc.
    • currently have 17 workflows registered
    • can generate incident reports that show what went wrong if something doesn't work (stack trace, etc.)
    • there is a visual representation of the nodes that are happening
    • If the workflow is slow enough, you can actually watch the graphics to see what is happening right now
    • A workflow can be triggered by a cron job, but also by other triggers
    • workflows are based on BPMN in Camunda
    • mod-workflow is a representation of the workflow, then mod-camunda takes the workflow object and turns it into BPMN so Camunda understands it
    • how would you build a workflow around purchase orders?
    • this is TAMU's most complicated workflow: log into FOLIO, complete tasks to gather data, start a subroutine that will iterate, results in new instances in FOLIO
    • this isn't a report, it's a workflow, but the home-grown dashboard is basically the only logical place to share the workflow trigger
  • When is a report a workflow, rather than just a simple query?
    • Open Orders query runs fast enough and returns a small enough result set that people should be able to get it from the browser (execute at time of request)
    • Shelflist (Holdings Level) workflow - it takes forever, so it's not a good user experience; have user input an email, and then that and all of the other parameters get handed off to a workflow
    • Workflow is also useful for scheduling reports, even if it's a fast query
  • What kinds of users do you have? Are you limiting the amount of work people have to do to build their own reports?
    • Don't have any pressure to limit, but people do have different appetites for that kind of work; some people really want access to the data, some people are intimidated. Superset seems like it would enable people to build their own reports. CloudBeaver also creates an opportunity for people to build their own reports. And there are many environments, so even if the data is a bit old, people can practice a report in older environments.
    • For people who aren't as comfortable, MIS report site is limited by user role, so they don't see everything necessarily. Don't want to overwhelm people.
  • Rancher 
    • have four FOLIO environments, all have LDP updated nightly by cron jobs governed by Rancher
    • for SysOps, Rancher is the main interaction point
    • have Docker containers
    • can look at the clusters for each, check out resources
    • this represents VMs(?) in data cluster that are tagged to run different parts of the FOLIO puzzle
    • can even drill down to see what artifacts are running - edge, Elasticsearch, Kafka, and also the LDP cron jobs
    • LDP cron jobs are kicked off every night, keep 7 days of history
    • if someone reports a problem, can check whether it kicked off or if some error happened
    • order: LDP data update (LDP platform, runs against FOLIO pre), ldpmarc incremental, derived tables update
    • not everyone is comfortable with command lines, so this is a nice visual interface, can do things with buttons
    • can even view the log for previous execution inside the process list
    • as it's running you can see a graph of how many resources it's using
    • some artifacts are coming right out of GitHub, e.g., Docker container
    • derived tables are homegrown, do build their own images to select the SQL queries; have their own container registry for that, not public
    • if a module dies, we have a backup module that can do what needs to be done
    • FOLIO has 90-something modules now
    • LDP production is an actual VM, but all others are containerized
    • if you are a developer and want to check in on the box, can click on the container and see how heavily hit it is (CPU utilization, memory utilization, disk I/O, etc.); can also shell into it like you would with a terminal and send a psql command straight to it. Don't normally have to dive in this way, since CloudBeaver or DBeaver can be used, but if those are down, someone can still get in and check things out.
    • also useful to check on ownership, which is important for who will have access through the other interfaces.
    • just using public instances for PostgreSQL (Crunchy Data); use those for FOLIO production instances as well, over Ansible
    • pre-production is scaled the same way as production and is running the same number of modules, so it's a good test of what impact a report is going to have on a system
    • librarians also like to log into pre to kick off something to see if a data import/export is really going to have the right result before they do it in prod
    • same is true for reports and workflows, just check with pre
    • mod-camunda and mod-workflow are Docker artifacts running right alongside FOLIO in the cluster
    • mod-data-mig handled migration from Voyager to FOLIO, also handled nightly user updates from campus system
    • logs help to troubleshoot things like network interruptions
  • Any testing with multi-tenant setups? How would that be set up with LDP?
    • TAMU is single tenant, so haven't gotten to test that much
    • sometimes start by setting up DIKU, but don't connect that to an LDP
    • do use multi-tenancy standards anyway, so everything is correctly formatted by tenant; should be compatible with multi-tenancy, but don't test it
  • LDP Query Builder app?
    • check it out every so often to see its features, but the result set limit of 1000 records is an issue, so while it seems like a good idea, still waiting for more advanced features
    • also have reports that need to pull data from multiple data sources, which isn't supported
    • think it's a good idea though, and everything that can be brought into FOLIO sphere is a great idea
    • if it could build the queries we need to run, would consider using
  • Any changes expected for Metadb?
    • Not sure yet, excited to find this out
    • Do expect to need to rework things; could impact all of the reports
    • Might be a good opportunity to make a switch to Superset, since other things will be changing as well
    • Nassib has been very helpful fine-tuning LDP databases so they are as optimized as possible, but even with that it takes each LDP about half a day to update
  • K8s jobs - only for import? or also database? or is that part of crunchy PG cluster?
    • LDP is separate from crunchy PG cluster; it's not as mission critical if it goes down
    • the LDP dbs are backed up, but not as highly available as FOLIO
    • data update cron jobs are just LDP update jobs; they take data from a running instance of FOLIO and put it in the LDP database... all handled by cron
    • Jeremy's workflows that use cron are independent from the K8s cluster; they tend to live on the dev and test machines, which have a joint developer VM with code and the ability to kick off cron jobs
    • no workflow crons live in FOLIO cluster, but they could
    • one long-term goal is to move FOLIO-related bits of code all into the same organizing "pane of glass"
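
The nightly update order described above (LDP data update, then ldpmarc incremental, then derived tables update, with 7 days of history kept for troubleshooting) could be sketched roughly as follows. This is a minimal illustration only, not TAMU's actual setup: the function name run_nightly, the RunRecord type, and the step callables are hypothetical stand-ins for whatever the cron jobs really invoke.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, List, Tuple


@dataclass
class RunRecord:
    """One history entry: which step ran, when, and whether it succeeded."""
    started: datetime
    step: str
    ok: bool
    detail: str = ""


def run_nightly(
    steps: List[Tuple[str, Callable[[], None]]],
    history: List[RunRecord],
    now: datetime,
    keep_days: int = 7,
) -> bool:
    """Run the steps in order, stop at the first failure, prune old history.

    Returns True only if every step of tonight's run succeeded.
    """
    for name, action in steps:
        try:
            action()  # hypothetical stand-in for the real job
            history.append(RunRecord(now, name, True))
        except Exception as exc:
            # Record the error so someone can later check what went wrong.
            history.append(RunRecord(now, name, False, str(exc)))
            break  # later steps depend on the earlier ones
    # Keep only the most recent keep_days of history (7 days in the notes).
    cutoff = now - timedelta(days=keep_days)
    history[:] = [r for r in history if r.started >= cutoff]
    return all(r.ok for r in history if r.started == now)
```

A caller would pass the three steps in the order given in the notes, for example [("LDP data update", update_ldp), ("ldpmarc incremental", run_ldpmarc), ("derived tables update", update_derived)], where the three callables are again hypothetical. Stopping at the first failure is a design assumption here, on the reasoning that derived tables built on a failed update would be stale.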


Recurring Items (Updated weekly, but not always discussed in meeting)

Item | Who | Notes
Review of In-Progress Projects (Recurring)
Review the release notes for FOLIO Analytics, LDP1, LDLite, LDP Reporting App, ldpmarc, Metadb Projects (Recurring)
Updates and Query Demonstrations from Various Reporting Related Groups and Efforts Projects (Recurring) | Community & Coordination, Reporting Subgroup Leads

Project updates

Reporting development is using small subgroups to address priorities and complete work on report queries.  Each week, these groups will share reports/queries with the Reporting SIG.  Reporting development team leads are encouraged to enter a summary of their work group activities below.


RA/UM Working Group


MM Working Group

  • Meetings are on the 1st Tuesday of the month, 12-1pm ET, via Zoom using the usual FOLIO password. Our lab sessions are open to everyone. Please bring your questions, examples, and comments about reporting and metadata. MM notes
  • We have begun working on the MM Data Model for FOLIO. We are using these slides for our work.
  • We are looking for a convenor for the group. If interested, contact Jennifer Eustis on LDP/Metadb Slack.


ERM Working Group

  • Current topics
  • Meetings are bi-weekly on Tuesdays at 11am ET, alternating with the ACQ Working Group
    • Next meeting will be on March 14
    • Contact Stefan Dombek if you would like to get a calendar invitation


ACQ Working Group


Reporting SIG Documentation Subgroup

  • Morning Glory documentation is live on https://docs.folio.org/docs/
  • Nolana documentation is in review
  • Orchid documentation will be in progress soon, and plans are underway to include beta-level documentation for Metadb
  • Additional Context
    • The Reporting SIG has representation on the Documentation Working Group, which is building end-user documentation for https://docs.folio.org/docs/ (mostly linking to existing documentation over on GitHub)


External Statistics Working Group

  • no updates currently
  • new organizational/tracking scheme for JIRA, with pointers to queries in folio-analytics repository
  • New organizational structure for External Statistics reports
    • external statistics reports (e.g., ACRL) typically require running queries from different functional reporting areas
    • these reports will be captured in JIRA under one UXPROD-XXXX report cluster issue, then the descriptions will point to each of the queries required to run them on the folio-analytics repository
    • institutions will need to rank each of these 8 new UXPROD-XXXX report cluster issues
    • each reporting development team will take responsibility for the queries in their area for the external statistics clusters


D-A-CH Working Group (D-Reporting)

  • Group for reporting in Germany
  • Current topics
    • The onboarding training has just started
    • DBS statistics
      • formulate a goal / visions / milestones
      • Overview of work already done (Issues etc.)
  • Meetings are currently held by appointment
    • Next meeting will be on March 27 (1 pm CET)
    • Contact Stefan Dombek if you would like to get a calendar invitation


Product Council


For all recent work on FOLIO Reporting SQL development: