2021-11-08 Reporting SIG Meeting notes

2021-11-08 Reporting SIG Meeting notes

Date

Nov 8, 2021

Attendees

Present?

Name

Organization

Present?

Name

Organization

Present?

Name

Organization

Present?

Name

Organization

x

Arthur Aguilera

University of Colorado, Boulder

x

Linda Miller

Cornell University

x

Sharon Beltaine

Cornell University

 

Nassib Nassar

Index Data

 

Nancy Bolduc

Cornell University

x

Elena O'Malley

Emerson

x

Lloyd Chittenden

Marmot

x

Tod Olson

University of Chicago

x

Axel Doerrer

University Mainz

x

Jean Pajerek

Cornell University

x

Shelley Doljack

Stanford University

x

Michael Patrick

The University of Alabama

x

Stefan Dombek

Leipzig University

x

Eric Pennington

Texas A&M

x

Jennifer Eustis

U. Massachusetts Amherst / Five College

x

Scott Perry

University of Chicago

x

Alissa Hafele

Stanford University

x

Natalya Pikulik

Cornell University

 

Ingolf Kuss

hbz

x

Vandana Shah

Cornell University

 

Kim Laine

Cornell University

x

Amelia Sutton

U. Massachusetts

 

Joanne Leary

Cornell University

x

Simona Tabacaru

Texas A&M

x

Eliana Lima

Fenway Library Organization

 

Kevin Walker

The University of Alabama

x

Eric Luhrs

Lehigh University

x

Angela Zoss

Duke University

Discussion Items

Item

Who

Notes

Item

Who

Notes

Attendance & Notes

Angela

Attendance & Notes

  • Today's attendance-taker: Linda Miller

  • Today's note-takers:  Team Leads for project updates

Announcements /
Reminders

Angela

Recruiting New Query Developers

  • The Reporting SIG is always on the look-out for new query developers. Please let us know if you are interested in doing query development or if there are others at your institution who might be a good fit.

Mermaid diagram software for query documentation

Stefan

  • https://mermaid-js.github.io/mermaid/#/

  • "Mermaid" or "Mermaid.js"

  • under MIT license, so open source program

  • can copy the program from GitHub

  • can use the program to create graphics and diagrams using short scripts

  • good opportunity to document derived table - don't have to create graphics with graphic program, just type in script and the program generates the diagrams automatically

  • can create flow charts, sequence diagrams, gantt diagrams, class diagrams, entity relationship diagrams

  • can use it as a command line interface program, but can also use the API by adding a script to your HTML document (API is probably better than command line, has more diagrams)

  • Also a live editor: https://mermaid-js.github.io/mermaid-live-editor (no installation required)

  • after adding JS files to HTML, just need to create a div with the class "mermaid"

  • similar tools:

  • could manage all of this on github

  • documentation for ER diagrams: https://mermaid-js.github.io/mermaid/#/entityRelationshipDiagram

  • what could we use this for? how is this different from SchemaSpy

    • automated vs. manual

  • could it be automated?

    • especially as we go to MetaDB, is automated even a useful starting point?

  • should we be really trying to prioritize this work?

    • maybe try to do a pilot with a small area?

  • where to host the diagrams? confluence vs. GitHub?

  • possibility for automation: https://dev.to/sualeh/how-to-generate-mermaid-diagrams-for-your-database-33bn

Updates and Query Demonstrations from Various Reporting Related Groups and Efforts

Community & Coordination, Reporting Subgroup Leads

Project updates

Reporting development is using small subgroups to address priorities and complete work on report queries.  Each week, these groups will share reports/queries with the Reporting SIG.  Reporting development team leads are encouraged to enter a summary of their work group activities below.

RA/UM Working Group

 

MM Working Group

  • At our last lab session,

    • We saw Cornell's HathiTrust report for serials. The entire process used 9 temporary tables to final create the final report. Data was being pulled from MARC SRS records. A lot of exceptions had to be taken into account. One issue raised about the template queries for the community was that institutions implement metadata slightly differently. Those in metadata tend to want to see institutional reports rather than templates to see how someone else ran the report and to get ideas on how to implement local metadata practices.

    • We helped out with a query to get back a range of instance HRIDs that had an error. The query that needed help was pulling the instance HRID from the public.srs.marctab (SM) as an integer using this where SM.instance.hrid :: INTEGER between '15077113' and '15077535'. This worked for another query at that person's institution. The difference was the query that worked pulled the instance HRIDs from the public.inventory_instances table. We suggested that the person join the srs.marctab to the inventory_instances table to construct the where with the instance HRIDs from the inventory_instances table. The query worked with this change. We noted that there is some inconsistency in FOLIO and this is something to watch out for.

 

  • The zoom link for the meeting is https://umass-amherst.zoom.us/j/99254861410. Usual FOLIO password. Our lab sessions are open to everyone. Please bring your questions, examples, and comments about reporting and metadata.

 

ERM Working Group

  • On going investigation on extracting ERM data with LDlite

    • will try to rebuild existing derived tables with LDlite

    • might show results next week

  • Started work on report of agreements by cancellation dates

  • Investigated mermaid as tool for dta model documentation

  • ERM Prototype and Query Development Status

  • Bi-weekly schedule (next meeting 9th, November 9:00 AM ET)

    • Zoomlink (password is the common zoom password in the FOLIO community)

 

RM Working Group

  • subscription costs and subscription counts queries PR posted to folio-analytics

  • exploring ways to make it easier to find options for reporting parameters in your LDP (e.g., a list of locations, item statuses, etc.)

  • several queries for RM completed, but still need documentation, testing, and review

  • for latest updates, see RM Prototype and Query Development Status

 

Reporting SIG Documentation Subgroup

  • No changes

  • Additional Context

 

External Statistics Working Group

  • no updates currently

  • new organizational/tracking scheme for JIRA, with pointers to queries in folio-analytics repository

  • New organizational structure for External Statistics reports

    • external statistics reports (e.g., ACRL) typically require running queries from different functional reporting areas

    • these reports will be captured in JIRA under one UXPROD-XXXX report cluster issue, then the descriptions will point to each of the queries required to run them on the folio-analytics repository

    • institutions will need to rank each of these 8 new UXPROD-XXXX report cluster issues

    • each reporting development team will take responsibility for the queries in their area for the external statistics clusters

 

Product Council

 

 

For all recent work on FOLIO Reporting SQL development:

 

Advanced topic: querying non-Roman scripts

Stefan

  • content of database usually stored in unicode or UTF8

  • want to query, e.g., chinese symbols directly in query, can write them in the query

  • if you are looking for whitespace in string, you can use hexadecimal or byte level representation

  • can use functions to transform strings: encode, decode, convert_from

    • look at official postgresql documentation for these functions

  • sometimes something looks like whitespace but isn't, can't just tell by looking

  • DBeaver has a hex view (or bytea?)

    • can click on a result in the data table and see the expression for each character

  • can look in UTF-8 and unicode tables to find the expression for each symbol

  • difficult to do this on byte level, easier at hexidecimal level because the tables of the expressions are easy to find on the internet

  • if you have to save the query file, save as UTF-8 and not ASCII

  • Documentation: Non latin symbols

Topics for Future Meetings

All

  • Follow-up on MARC status, Quickmarc/Data Import conflicts

  • How to strengthen connections to SIGs and their developers to be kept in the loop about changes to the data model

  • Show and tell

    • how are institutions using the LDP

    • examples of using the local schema

    • Cornell's report ticketing system

    • Rollout plans from institutions

    • Ask someone on the sysadmin side to talk about LDP administration (Jason Root?)

    • What is done in JIRA? (JIRA clean up)

  • Training topics

    • adding test data in FOLIO snapshot

    • How to do ad hoc querying with the derived tables

    • How to use the LDP app

    • using KNIME to build reports (LDP edition)

    • use of local schema for custom tables

    • more on MARC (ask Jennifer)

    • using different applications (other than DBeaver)

    • Insomnia for API queries?

    • SQL

    • LDP to metadb

    • non-Roman script querying (Stefan)

    • Metadb implications

      • if schema changes, will that be relatively seamless? old fields still in history, but new fields in current?

      • if fields are deprecated and removed, what happens if they come back again?

      • how will deleted records show up?

    • GitHub

    • Panorama

    • LDLite (again?)

  • Discussion:

    • how to rank clusters with institutional rankings going away?

    • consortia SIG is talking about a central office app of some kind; that app might need to deal with consortia-wide reporting

    • how to train:

      • wiki pages with examples? form a small project team? (e.g., how to pull from marc, how to use local schema)

      • each dev team create training about the data structures in that area?

    • Query style (ask Nassib)

  • Upcoming:

    • Nov 8: querying non-Roman scripts (Stefan)

    • query optimization (Axel?)

 

Review and update Topics for Future Reporting SIG Meetings 

 

 

 

 

A test Action Item (Ingolf)