2021-11-08 Reporting SIG Meeting notes

Date

Attendees

Present?

Name

Organization

Present?

Name

Organization

xArthur AguileraUniversity of Colorado, BoulderxLinda MillerCornell University
xSharon BeltaineCornell University
Nassib NassarIndex Data

Nancy BolducCornell UniversityxElena O'MalleyEmerson
xLloyd ChittendenMarmotxTod OlsonUniversity of Chicago
xAxel DoerrerUniversity MainzxJean PajerekCornell University
xShelley DoljackStanford UniversityxMichael PatrickThe University of Alabama

x

Stefan DombekLeipzig UniversityxEric PenningtonTexas A&M
xJennifer EustisU. Massachusetts Amherst / Five CollegexScott PerryUniversity of Chicago
xAlissa HafeleStanford UniversityxNatalya PikulikCornell University

Ingolf KusshbzxVandana ShahCornell University

Kim LaineCornell UniversityxAmelia SuttonU. Massachusetts

Joanne LearyCornell UniversityxSimona TabacaruTexas A&M
xEliana LimaFenway Library Organization
Kevin WalkerThe University of Alabama
xEric LuhrsLehigh UniversityxAngela ZossDuke University

Discussion Items

Item

Who

Notes

Attendance & NotesAngela

Attendance & Notes

  • Today's attendance-taker: Linda Miller
  • Today's note-takers:  Team Leads for project updates

Announcements /
Reminders

Angela

Recruiting New Query Developers

  • The Reporting SIG is always on the look-out for new query developers. Please let us know if you are interested in doing query development or if there are others at your institution who might be a good fit.
Mermaid diagram software for query documentationStefan
  • https://mermaid-js.github.io/mermaid/#/
  • "Mermaid" or "Mermaid.js"
  • under MIT license, so open source program
  • can copy the program from GitHub
  • can use the program to create graphics and diagrams using short scripts
  • good opportunity to document derived table - don't have to create graphics with graphic program, just type in script and the program generates the diagrams automatically
  • can create flow charts, sequence diagrams, gantt diagrams, class diagrams, entity relationship diagrams
  • can use it as a command line interface program, but can also use the API by adding a script to your HTML document (API is probably better than command line, has more diagrams)
  • Also a live editor: https://mermaid-js.github.io/mermaid-live-editor (no installation required)
  • after adding JS files to HTML, just need to create a div with the class "mermaid"
  • similar tools:
  • could manage all of this on github
  • documentation for ER diagrams: https://mermaid-js.github.io/mermaid/#/entityRelationshipDiagram
  • what could we use this for? how is this different from SchemaSpy
    • automated vs. manual
  • could it be automated?
    • especially as we go to MetaDB, is automated even a useful starting point?
  • should we be really trying to prioritize this work?
    • maybe try to do a pilot with a small area?
  • where to host the diagrams? confluence vs. GitHub?
  • possibility for automation: https://dev.to/sualeh/how-to-generate-mermaid-diagrams-for-your-database-33bn
Updates and Query Demonstrations from Various Reporting Related Groups and EffortsCommunity & Coordination, Reporting Subgroup Leads

Project updates

Reporting development is using small subgroups to address priorities and complete work on report queries.  Each week, these groups will share reports/queries with the Reporting SIG.  Reporting development team leads are encouraged to enter a summary of their work group activities below.

RA/UM Working Group


MM Working Group

  • At our last lab session,
    • We saw Cornell's HathiTrust report for serials. The entire process used 9 temporary tables to final create the final report. Data was being pulled from MARC SRS records. A lot of exceptions had to be taken into account. One issue raised about the template queries for the community was that institutions implement metadata slightly differently. Those in metadata tend to want to see institutional reports rather than templates to see how someone else ran the report and to get ideas on how to implement local metadata practices.
    • We helped out with a query to get back a range of instance HRIDs that had an error. The query that needed help was pulling the instance HRID from the public.srs.marctab (SM) as an integer using this where SM.instance.hrid :: INTEGER between '15077113' and '15077535'. This worked for another query at that person's institution. The difference was the query that worked pulled the instance HRIDs from the public.inventory_instances table. We suggested that the person join the srs.marctab to the inventory_instances table to construct the where with the instance HRIDs from the inventory_instances table. The query worked with this change. We noted that there is some inconsistency in FOLIO and this is something to watch out for.

  • The zoom link for the meeting is https://umass-amherst.zoom.us/j/99254861410. Usual FOLIO password. Our lab sessions are open to everyone. Please bring your questions, examples, and comments about reporting and metadata.


ERM Working Group

  • On going investigation on extracting ERM data with LDlite
    • will try to rebuild existing derived tables with LDlite
    • might show results next week
  • Started work on report of agreements by cancellation dates
  • Investigated mermaid as tool for dta model documentation
  • ERM Prototype and Query Development Status
  • Bi-weekly schedule (next meeting 9th, November 9:00 AM ET)
    • Zoomlink (password is the common zoom password in the FOLIO community)


RM Working Group

  • subscription costs and subscription counts queries PR posted to folio-analytics
  • exploring ways to make it easier to find options for reporting parameters in your LDP (e.g., a list of locations, item statuses, etc.)
  • several queries for RM completed, but still need documentation, testing, and review
  • for latest updates, see RM Prototype and Query Development Status


Reporting SIG Documentation Subgroup

  • No changes
  • Additional Context


External Statistics Working Group

  • no updates currently
  • new organizational/tracking scheme for JIRA, with pointers to queries in folio-analytics repository
  • New organizational structure for External Statistics reports
    • external statistics reports (e.g., ACRL) typically require running queries from different functional reporting areas
    • these reports will be captured in JIRA under one UXPROD-XXXX report cluster issue, then the descriptions will point to each of the queries required to run them on the folio-analytics repository
    • institutions will need to rank each of these 8 new UXPROD-XXXX report cluster issues
    • each reporting development team will take responsibility for the queries in their area for the external statistics clusters


Product Council



For all recent work on FOLIO Reporting SQL development:


Advanced topic: querying non-Roman scriptsStefan
  • content of database usually stored in unicode or UTF8
  • want to query, e.g., chinese symbols directly in query, can write them in the query
  • if you are looking for whitespace in string, you can use hexadecimal or byte level representation
  • can use functions to transform strings: encode, decode, convert_from
    • look at official postgresql documentation for these functions
  • sometimes something looks like whitespace but isn't, can't just tell by looking
  • DBeaver has a hex view (or bytea?)
    • can click on a result in the data table and see the expression for each character
  • can look in UTF-8 and unicode tables to find the expression for each symbol
  • difficult to do this on byte level, easier at hexidecimal level because the tables of the expressions are easy to find on the internet
  • if you have to save the query file, save as UTF-8 and not ASCII
  • Documentation: Non latin symbols
Topics for Future MeetingsAll
  • Follow-up on MARC status, Quickmarc/Data Import conflicts
  • How to strengthen connections to SIGs and their developers to be kept in the loop about changes to the data model
  • Show and tell
    • how are institutions using the LDP
    • examples of using the local schema
    • Cornell's report ticketing system
    • Rollout plans from institutions
    • Ask someone on the sysadmin side to talk about LDP administration (Jason Root?)
    • What is done in JIRA? (JIRA clean up)
  • Training topics
    • adding test data in FOLIO snapshot
    • How to do ad hoc querying with the derived tables
    • How to use the LDP app
    • using KNIME to build reports (LDP edition)
    • use of local schema for custom tables
    • more on MARC (ask Jennifer)
    • using different applications (other than DBeaver)
    • Insomnia for API queries?
    • SQL
    • LDP to metadb
    • non-Roman script querying (Stefan)
    • Metadb implications
      • if schema changes, will that be relatively seamless? old fields still in history, but new fields in current?
      • if fields are deprecated and removed, what happens if they come back again?
      • how will deleted records show up?
    • GitHub
    • Panorama
    • LDLite (again?)
  • Discussion:
    • how to rank clusters with institutional rankings going away?
    • consortia SIG is talking about a central office app of some kind; that app might need to deal with consortia-wide reporting
    • how to train:
      • wiki pages with examples? form a small project team? (e.g., how to pull from marc, how to use local schema)
      • each dev team create training about the data structures in that area?
    • Query style (ask Nassib)
  • Upcoming:
    • Nov 8: querying non-Roman scripts (Stefan)
    • query optimization (Axel?)


Review and update Topics for Future Reporting SIG Meetings 





  • A test Action Item (Ingolf)