2021-06-07 Reporting SIG Meeting notes

Date

Attendees

Present?  Name               Organization
X         Sharon Beltaine    Cornell University
          Nancy Bolduc       Cornell University
X         Axel Doerrer       University Mainz
X         Stefan Dombek      Leipzig University
X         Jennifer Eustis    U. Massachusetts Amherst / Five College
X         Ingolf Kuss        hbz
          Jesse Lambertson   University of Chicago
X         Eliana Lima        Fenway Library Organization
X         Linda Miller       Cornell University
X         Nassib Nassar      Index Data
X         Elena O'Malley     Emerson
X         Tod Olson          University of Chicago
          Jean Pajerek       Cornell University
          Michael Patrick    The University of Alabama
X         Eric Pennington    Texas A&M
          Scott Perry        University of Chicago
          Natalya Pikulik    Cornell University
X         Vandana Shah       Cornell University
          Amelia Sutton      U. Massachusetts
X         Simona Tabacaru    Texas A&M
X         Kevin Walker       The University of Alabama
X         Angela Zoss        Duke University

Discussion Items

Item

Who

Notes

Attendance & Notes

Angela

Attendance & Notes

  • Today's attendance-taker: Linda Miller
  • Today's note-takers: team leads for project updates

Announcements / Reminders

Angela

Recruiting New Query Developers

  • The Reporting SIG is always on the lookout for new query developers. Please let us know if you are interested in doing query development or if there are others at your institution who might be a good fit.


Test Data

FOLIO Reporting developers and Reporting SIG members are encouraged to use this new page to share test data cases they have entered into FOLIO reference environments to keep us all aware of what data to expect when we test our queries.


Cluster Ranking

New Report Clusters are added on a regular basis, so it is important to make sure your institution is reviewing these clusters and ranking them to establish report development priorities. If you rank reports for your institution, please follow the instructions below. If someone else ranks, please pass this information along to that person so your institution's vote can be included.

  • Action: Please review Reporting SIG-All Report Clusters (57 issues) in JIRA and rank each report cluster for your institution (R1-R5).
  • For reporting, institutions only need to rank the UXPROD Report Cluster JIRA issues. All reporting requirements, which are captured in REP-XXX issues, roll up to the UXPROD Report Clusters. Report clusters cover one or more report (REP-XXX issue) requirements.
Updates and Query Demonstrations from Various Reporting Related Groups and Efforts

Community & Coordination, Reporting Subgroup Leads

Project updates

Reporting development is using small subgroups to address priorities and complete work on report queries.  Each week, these groups will share reports/queries with the Reporting SIG.  Reporting development team leads are encouraged to enter a summary of their work group activities below.


Development


Reporting SIG Documentation Subgroup

  • Request for documentation review:

The Reporting SIG subgroup on documentation has just completed a draft of end-user documentation (that is, documentation for novice FOLIO users) focused on reporting. This documentation will eventually be hosted at https://docs.folio.org/docs/, but first we are looking for feedback from our stakeholders.

Between now and Friday, June 11, we would appreciate it if members of the FOLIO reporting community could review the draft documentation. We are looking especially for errors, gaps, and places where the explanations are confusing. Where there is an example workflow, we would appreciate help testing the workflow to make sure the instructions are correct across systems.

Some things to note about this documentation:

        • We do not want to duplicate documentation that is directly connected with our query repository or the LDP software, as that documentation is (and should be) maintained on GitHub.
        • We generally do not want to point end-users to the Reporting SIG wiki, which should be restricted to SIG business.
        • At this time, we are documenting the Honeysuckle release and the versions of our repositories that connect most closely to that release. Features released more recently will not show up in this version of the documentation, but as we begin developing the documentation for Iris and future versions of FOLIO, those newer features will start showing up.
        • We are trying to follow the Google developer documentation style guide, so we may not be able to incorporate feedback about document style that does not conform to that guide.
        • At this time, the documentation is meant to be predominantly text based, but you can make a note if there are places where additional images would be helpful.

To provide feedback on the documentation, read through any or all of the six documents and use the commenting and suggesting features in Google Docs to add notes or propose changes. Each document has a table of contents at the bottom to facilitate easy navigation between docs.

We are excited to hear both from people who know the systems well and people who have less direct experience with reporting. I hope you’ll be able to take some time to review this material. Let me know if you have any questions!


  • Additional Context


RA/UM Working Group


MM Working Group

  • Those writing queries will do this on their own. The group will not be meeting, in order to give more time to those going live.
  • We are working on these reports:
    • OCLC synchronization report
    • Duplicate orders reports
    • HathiTrust reports


ERM Working Group


RM Working Group

  • no meeting last week
  • working on queries for the 1.2 FOLIO Analytics release
    • voucher_lines_with_po_line and voucher_summary
    • reorganizing REP issues from the RM-Costs Cluster into more specific report clusters
    • subscriptions cost and subscriptions count
    • more with invoices and orders
    • more with funds
  • for latest updates, see RM Prototype and Query Development Status


External Statistics Working Group

  • no updates currently
  • ACRL query set included in FOLIO-Analytics 1.1 release
  • new organizational/tracking scheme for JIRA, with pointers to queries in folio-analytics repository
  • New organizational structure for External Statistics reports
    • external statistics reports (e.g., ACRL) typically require running queries from different functional reporting areas
    • these reports will be captured in JIRA under one UXPROD-XXXX report cluster issue, and the issue descriptions will point to each of the queries in the folio-analytics repository that are required to run them
    • institutions will need to rank each of these 8 new UXPROD-XXXX report cluster issues
    • each reporting development team will take responsibility for the queries in their area for the external statistics clusters


For all recent work on FOLIO Reporting SQL development:


Updates from WOLFcon

All
  • Recordings available
  • Session highlights
    • Question: what is the difference between LDP and metadb and pre-metadb?
      • Some confusion comes from using "LDP" for the project, the software, and the app. Metadb is a totally new codebase, based on real-time streaming. So the shift to metadb will be a big shift from the LDP software that is currently used to create the database tied to FOLIO Snapshot. Pre-metadb was just used for the WOLFcon talk.
      • LDP app - yes, might be nice to rename that, but haven't had time to worry about that yet. "app" shows that it's an app that runs inside FOLIO or ReShare. Better name might be LDP Query Builder app. 
      • Maybe when metadb comes out, we can do a presentation; maybe even have a page somewhere that explains the difference
      • Did add metadb and LDP into the new FOLIO glossary
      • After metadb, will we have to query LDP for physical circulation and metadb for ERM?
      • No, everything will be in metadb, but the ERM group has already been using metadb because they need the new system. There is some additional JSON transformation code that needs to be implemented in metadb before it makes sense to move other groups over.
      • As we transition to metadb, will there be a lot of query transformation?
      • We do need to start working through that; have some decisions to make. Hopefully, though, people could be running LDP and metadb alongside each other if institutions need to have a gradual transition
      • If we are building derived tables for our own purposes, will there be ways for those derived tables to take advantage of the streaming aspect of metadb? Would that be support for views?
      • Derived tables now are run overnight (or as often as you want), and right now the tables are dropped and rebuilt completely (just like the rest of LDP). Real-time updates give you the ability to begin using streaming data, but you might not be able to immediately transition to real-time data for everything. In metadb, there are some things that might supplant the derived tables, like less need for array extraction and other JSON query techniques. But for queries that need real-time data, you could focus on rewriting those for metadb or have certain derived tables that update more frequently. There are still concerns with views, so probably better to just run queries at time of need or have smaller derived tables. Can use CTEs (subqueries) to simulate a view inside a query. But hopefully some of the derived tables will go away soon because they're only extracting arrays from a JSON object.
      • Looking forward, could see something where you define a normalization by describing it to metadb, and then metadb could use that to create a derived table it updates in real time.
      • Some kind of plug-in to the metadb replication logic?
      • Something analogous to the transformations metadb runs in real time, a transformation plug-in. Maybe denormalizing derived tables could be generated by metadb. If JSON transformation takes care of arrays and more of the fields in general and, increasingly, denormalization, that covers a lot of our current derived tables.
      • Could potentially create a plug-in that just has access to that incoming stream and does other things with it.
      • Current LDP software is C++, metadb is in Go
      • What about new input sources?
      • If we install it today, we can add FOLIO as a source (the FOLIO database streams data to metadb) and can also add a ReShare source to the same database. Can you add other sources? Yes, you can even do that today. Right now you can stream in any PostgreSQL database and probably any MySQL database. (CC+ is also looking at using metadb.) If you have a PostgreSQL database, you can pull that in at near real time. Support for other databases could be added. Apart from that, you would use the Kafka API, but there will be a more detailed specification for how to put data onto the Kafka stream so that it comes into metadb and is parsed correctly.
      • Importing Excel files? Importing legacy data?
      • Legacy data - metadb has concept of tracking tables; the tables it tracks are the ones it updates. You can create other tables and schemas in the same database, and as long as metadb isn't tracking them, there's no conflict, they'll just live alongside the tracked tables. Just have to be careful about naming scheme. FOLIO schemas start with folio_, and ReShare schemas start with reshare_. Sysadmin could do large scale data imports. (Maybe put legacy data in their own schema.)
      • Bringing in a simple Excel file, like a list of barcodes, could be a good fit for the LDP app. Maybe needs permissions to control who can do CSV import.
      • Maybe could have two different features: one full CSV import that saves in the LDP/metadb, and another CSV import that is temporary and just pipes into a query to filter by a temporary list. The next question would be whether it can hook into a query in another FOLIO app.
      • For legacy data migration, should we wait for metadb?
      • No real need to do that. Custom tables will just sit alongside other data, should work same for LDP and metadb. All the design choices about that can just happen locally. 
      • Maybe also think about permissions. In the current LDP software, there is one user by default. Metadb does something similar, but in metadb someone will need to manage permissions a bit more. For legacy data, probably want to protect it so it can't be messed up accidentally: turn off write permissions.
      • Can we have permissions that are limited to certain tables? For example, person 1 can see the data from users, but person 2 cannot.
      • By default, it's a shared workspace. On a table, you can tell the database system who can SELECT, who can INSERT, who can UPDATE, who can DELETE. Can do something similar for schemas: usage permission, update/create. Default setup is that if you log in as the default ldp user, the user can read and write in the local schema but everything else is read only. Maybe create two different categories of users who have different permissions. But current code doesn't have sophisticated permissions management. Would have to periodically run a script to update permissions.
      • Could have the default LDP user and two additional users. If one user creates new tables, others wouldn't be able to see those tables.
  • Side conversations about LDP/reporting alternatives
    • Vendors who aren't supporting LDP hosting
    • Non-LDP reporting tools
      • ERM dashboard
      • ByWater
      • Google Sheets add-on
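
The session highlights above mention that a CTE can stand in for a view inside a single query, so nothing has to be created or rebuilt in the database. The following is only a minimal sketch of that idea, not a query from the folio-analytics repository. It uses Python's built-in sqlite3 module as a stand-in for the PostgreSQL database behind LDP, and the table name `loans`, its columns, and the sample rows are all hypothetical.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for an LDP database.
# The "loans" table and its contents are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE loans (
        id INTEGER PRIMARY KEY,
        item_id TEXT,
        status TEXT
    );
    INSERT INTO loans (item_id, status) VALUES
        ('item-1', 'Open'),
        ('item-2', 'Closed'),
        ('item-1', 'Open'),
        ('item-3', 'Open');
    """
)

# The WITH clause defines "open_loans" for the duration of this one query.
# It plays the role a view would play, but is never stored in the database.
QUERY = """
WITH open_loans AS (
    SELECT item_id
    FROM loans
    WHERE status = 'Open'
)
SELECT item_id, COUNT(*) AS open_loan_count
FROM open_loans
GROUP BY item_id
ORDER BY item_id;
"""

rows = conn.execute(QUERY).fetchall()
for item_id, open_loan_count in rows:
    print(item_id, open_loan_count)
```

Because the CTE is evaluated each time the query runs, it always reflects the current state of the underlying table, which is the property the discussion was looking for when weighing views against derived tables.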
Topics for Future Meetings

All
  • Alternate FOLIO reporting systems
    • Google Sheets add-on (June 14)
    • ByWater system (June 28)
  • Rollout plans from institutions (keep asking for volunteers)
  • Follow-up on MARC status, Quickmarc/Data Import conflicts
  • Reporting SIG representation on Support SIG
  • How-Tos:
    • adding test data in FOLIO snapshot
  • Cornell's report ticketing system
  • Ask someone on the sysadmin side to talk about LDP administration (Jason Root?)
  • Show and tell - how are institutions using the LDP
  • Interest in training?
    • How to do ad hoc querying with the derived tables
    • How to use the LDP app


Review and update Topics for Future Reporting SIG Meetings 





  • A test Action Item (Ingolf)