2018-09-10 Reporting SIG notes

Date

Attendees

Present? | Name | Organization
X | Sharon Beltaine | Cornell University
 | Peter Murray | Index Data
 | Elizabeth Berney | Duke University
 | Erin Nettifee | Duke University
 | Joyce Chapman | Duke University
 | Karen Newbery | Duke University
 | Elizabeth Edwards | University of Chicago
X | Tod Olson | University of Chicago
X | Claudius Herkt-Januschek | SUB Hamburg
X | Scott Perry | University of Chicago
X | Doreen Herold | Lehigh University
 | Robert Sass | Qulto
 | Anne L. Highsmith | Texas A&M
X | Simona Tabacaru | Texas A&M
 | Vince Bareau | EBSCO
 | Mark Veksler | EBSCO
 | Harry Kaplanian | EBSCO
X | Kevin Walker | The University of Alabama
X | Ingolf Kuss | hbz
 | Charlotte Whitt | Index Data
 | Lina Lakhia | SOAS
X | Michael Winkler | OLE
X | Joanne Leary | Cornell University
 | Uschi Klute | GBV
X | Michael Patrick | The University of Alabama
X | Holly Mistlebauer | Cornell University
X | Nassib Nassar | Index Data
X | Angela Zoss | Duke University
X | Veit Köppen | University Magdeburg




Discussion items

Item | Who | Notes
Assign Notetaker, Take Attendance, Review agenda | Sharon

Today's notetaker: Ingolf

Last week's notetaker: Tod Olson, Sharon Beltaine


Welcome New Members | Angela, Veit

Please welcome our new Reporting SIG members, Angela Zoss from Duke University and Veit Köppen from the Library at the University of Magdeburg.

-introductions, background

Angela is the Assessment and Data Visualization Analyst in the Assessment and User Experience department at Duke University Libraries. She has been working on Data Visualization for many years.

Veit has been working at the University of Magdeburg for 4 years. He is an expert and lecturer on Data Warehouse Technologies. He is also the Head of IT Applications at the University Libraries.

Data Warehouse Architecture | Nassib

Nassib will walk the Reporting SIG through "A Library Data Platform Architecture for FOLIO", which outlines the requirements, architecture, and implementation of an environment to support the extensive data analysis and reporting needs of institutions implementing FOLIO. It is the next step after the groundwork laid in our planning for a reference data warehouse environment. Nassib would like your feedback, so please bring your questions.

Meeting Notes

Nassib provides an overview of the Library Data Platform Architecture for FOLIO. The architecture is divided into FOLIO Core, the Data Analysis Platform and the Reporting Tool. In addition, there may be external data sources that stream data into the platform.

An important concept of the microservice architecture is that not all data are stored in a single location.

There is transaction processing versus analytical processing. In transaction processing, operational data are fragmented across fairly granular databases. This is very responsive and quick, which is achieved through database indexes. Analytical processing, in contrast, serves reporting: one is interested in a set of columns and therefore looks at those columns across many records. Running analytical processing against the operational databases creates resource contention and is very slow.
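The contrast can be sketched with a toy in-memory model. The records and field names below are hypothetical, not FOLIO's schemas. The transactional side keeps whole records plus an index for fast single-record lookups; an aggregate over one column has to visit every record in the row layout, whereas a column-oriented layout lets the same query read only the column it needs.

```python
from collections import Counter

# Hypothetical loan records, stored row by row as an operational store would.
rows = [
    {"loan_id": "L1", "item_id": "I1", "patron_group": "staff", "charge_date": "2018-09-01"},
    {"loan_id": "L2", "item_id": "I2", "patron_group": "undergrad", "charge_date": "2018-09-02"},
    {"loan_id": "L3", "item_id": "I3", "patron_group": "undergrad", "charge_date": "2018-09-03"},
]

# Transaction processing: an index maps a key straight to its record.
index_by_loan_id = {row["loan_id"]: row for row in rows}


def lookup_loan(loan_id):
    """Point lookup through the index, without scanning the records."""
    return index_by_loan_id[loan_id]


def count_by_group_rows(records):
    """Analytical query on the row layout: every whole record is visited."""
    return Counter(record["patron_group"] for record in records)


# Column-oriented layout of the same data: one list per column.
columns = {key: [row[key] for row in rows] for key in rows[0]}


def count_by_group_columns(cols):
    """The same aggregate, reading only the patron_group column."""
    return Counter(cols["patron_group"])


print(lookup_loan("L2")["item_id"])  # I2
print(count_by_group_columns(columns))
```

Both layouts hold the same data; they differ only in which access pattern is cheap, which is why a separate reporting store can be organized differently from the operational one.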

Most library data are highly structured and therefore suitable for databases, but they have to be extracted via ETL. Because the data live in multiple storage modules, referential integrity is a concern that has to be taken into account.
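Because no single database enforces foreign keys across separate storage modules, an ETL process may have to check referential integrity itself. A minimal sketch, assuming hypothetical extracts where loans point to items by an `item_id` field:

```python
# Hypothetical extracts from two separate storage modules.
items = [{"id": "I1"}, {"id": "I2"}]
loans = [
    {"id": "L1", "item_id": "I1"},
    {"id": "L2", "item_id": "I2"},
    {"id": "L3", "item_id": "I9"},  # points to an item missing from the extract
]


def dangling_references(children, parents, fk, pk="id"):
    """Return child records whose foreign key matches no parent key.

    The check runs in the ETL step because the two modules keep their
    data in different databases, so neither one can enforce it.
    """
    parent_keys = {parent[pk] for parent in parents}
    return [child for child in children if child[fk] not in parent_keys]


orphans = dangling_references(loans, items, fk="item_id")
print([orphan["id"] for orphan in orphans])  # ['L3']
```

What to do with the orphans (reject, flag, or load with a placeholder) is a policy decision the sketch leaves open.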

There are two types of data extraction from the storage modules for reporting. The first is Batch ETL, which extracts data from the storage modules in batch mode; Batch ETL will also be used to extract external data. The second is Streaming ETL, which streams the data produced by operational transactions (in Okapi) through a message queue, applies some ETL to them, and feeds the result into the Reporting Database. Streaming ETL can be time-consuming to implement, and maintenance is also costly, so the development team would need more developers to do this.
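The two extraction styles can be contrasted in a small sketch. It assumes an in-memory SQLite database as a stand-in for the Reporting Database and Python's `queue.Queue` as a stand-in for the message queue; the record fields (`id`, `itemId`, `action`) and the extract function are hypothetical. Both paths share the same transform and load steps and differ only in how records arrive.

```python
import queue
import sqlite3

# Stand-in for the Reporting Database (assumption: in-memory SQLite).
reporting_db = sqlite3.connect(":memory:")
reporting_db.execute("CREATE TABLE loans (id TEXT PRIMARY KEY, item_id TEXT, action TEXT)")


def transform(record):
    """The 'T' in ETL: reshape a source record into a reporting row."""
    return (record["id"], record["itemId"], record.get("action", "checkedout"))


def load(rows):
    # INSERT OR REPLACE keeps loads idempotent when a record arrives twice.
    reporting_db.executemany("INSERT OR REPLACE INTO loans VALUES (?, ?, ?)", rows)
    reporting_db.commit()


def batch_etl(extract_all):
    """Batch ETL: pull a full extract and load it in one pass."""
    load([transform(record) for record in extract_all()])


def streaming_etl(message_queue):
    """Streaming ETL: transform and load each event as it is consumed.

    In this sketch a None message stops the consumer.
    """
    while True:
        event = message_queue.get()
        if event is None:
            break
        load([transform(event)])


# Batch: a hypothetical extract function standing in for a storage module.
batch_etl(lambda: [{"id": "L1", "itemId": "I1"}, {"id": "L2", "itemId": "I2"}])

# Streaming: operational transactions published to a queue.
events = queue.Queue()
events.put({"id": "L2", "itemId": "I2", "action": "checkedin"})
events.put({"id": "L3", "itemId": "I3"})
events.put(None)
streaming_etl(events)

print(reporting_db.execute("SELECT * FROM loans ORDER BY id").fetchall())
```

The streaming path has to handle ordering, retries, and duplicate events that the batch path mostly avoids, which is one reason it costs more to build and maintain.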

A "star schema" is an example of a data model for analytical processing. Another idea is to use column stores, which are well suited to analytical processing.

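A star schema places a central fact table (one row per event) among dimension tables that hold descriptive attributes. A minimal sketch in SQLite, with hypothetical table and column names rather than any agreed FOLIO reporting model:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- Dimension tables: descriptive attributes, one row per entity.
    CREATE TABLE dim_patron_group (group_key INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_library (library_key INTEGER PRIMARY KEY, name TEXT);
    -- Fact table: one row per loan, pointing at the dimensions.
    CREATE TABLE fact_loan (
        loan_id TEXT PRIMARY KEY,
        group_key INTEGER REFERENCES dim_patron_group(group_key),
        library_key INTEGER REFERENCES dim_library(library_key),
        charge_date TEXT
    );
    INSERT INTO dim_patron_group VALUES (1, 'faculty'), (2, 'undergrad');
    INSERT INTO dim_library VALUES (1, 'Main'), (2, 'Science');
    INSERT INTO fact_loan VALUES
        ('L1', 1, 1, '2018-09-01'),
        ('L2', 2, 1, '2018-09-02'),
        ('L3', 2, 2, '2018-09-03');
""")

# A typical analytical query: join the fact table to its dimensions, then aggregate.
query = """
    SELECT l.name, g.name, COUNT(*)
    FROM fact_loan f
    JOIN dim_patron_group g ON f.group_key = g.group_key
    JOIN dim_library l ON f.library_key = l.library_key
    GROUP BY l.name, g.name
    ORDER BY l.name, g.name
"""
for row in db.execute(query):
    print(row)
```

Every report query follows the same shape (filter and group by dimension attributes, count or sum facts), which is what makes the model convenient for reporting tools.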
Discussion / Concerns

  • (Veit) Where in the diagram is the metadata repository? One will be needed for a data warehouse solution.
  • (Michael) A resolution step is needed: data can be encoded, so dereferencing is required. Response: this will be part of the ETL process; we want the data unfragmented at this point.
  • (Sharon) Where do the modeling of the processes and the actual transformations take place?


Report Prototyping Update | Report Prototype Workgroup

To support the initial steps for development of a reporting data warehouse environment, a small workgroup has formed to prototype some simple reports in the functional areas of loans, inventory, and users. As part of this effort, the workgroup has mapped and diagrammed the data elements required for 2 circulation reports with the assistance of Emma Boettcher, PO for Loans, and Charlotte Whitt, PO for Inventory. This effort lays the groundwork for the Reporting SIG to begin prototyping additional reports to support the development required for our future reporting environment. Members of the workgroup will describe the steps taken to create these initial report prototypes. Feedback and questions are encouraged as we step through this process.

Meeting Notes

Joanne, Charlotte and Emma worked on an operational structure in which we can develop the prototypes. They first focused on loans, inventory and users. They present the first two prototypes to the group. These are two basic types of reports:

  1. Circulation Detail Report
  2. Item Detail Report

See the documents in the Google Drive folder of the Report Prototypes Group.

The Circulation Detail Report has owning library, item ID, charge date and patron group as selection fields. One needs to look at the loan rules, since they capture much of this information.
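Once the data are de-referenced into a reporting store, the selection fields become simple filters. A minimal sketch, assuming hypothetical row fields and treating the charge date as an optional range; any criterion left as `None` is not applied:

```python
from datetime import date

# Hypothetical, already de-referenced loan rows in a reporting store.
loans = [
    {"owning_library": "Main", "item_id": "I1", "charge_date": date(2018, 9, 1), "patron_group": "faculty"},
    {"owning_library": "Main", "item_id": "I2", "charge_date": date(2018, 9, 5), "patron_group": "undergrad"},
    {"owning_library": "Science", "item_id": "I3", "charge_date": date(2018, 9, 7), "patron_group": "undergrad"},
]


def circulation_detail(rows, owning_library=None, item_id=None,
                       charge_from=None, charge_to=None, patron_group=None):
    """Filter rows on the report's selection fields; None means 'any'."""
    def keep(row):
        return ((owning_library is None or row["owning_library"] == owning_library)
                and (item_id is None or row["item_id"] == item_id)
                and (charge_from is None or row["charge_date"] >= charge_from)
                and (charge_to is None or row["charge_date"] <= charge_to)
                and (patron_group is None or row["patron_group"] == patron_group))
    return [row for row in rows if keep(row)]


result = circulation_detail(loans, owning_library="Main", patron_group="undergrad")
print([row["item_id"] for row in result])  # ['I2']
```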

The Item Detail Report needs many details from Inventory. It lists titles which have been changed in a specific date range.
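Selecting titles changed in a date range depends on each record carrying a last-updated timestamp. The sketch below assumes a hypothetical `updated` field holding ISO 8601 timestamps and uses a half-open range [start, end), so consecutive report runs neither miss nor double-count a change that falls exactly on a boundary:

```python
from datetime import datetime, timezone

# Hypothetical inventory records carrying a last-updated timestamp.
instances = [
    {"title": "Title A", "updated": "2018-08-30T12:00:00+00:00"},
    {"title": "Title B", "updated": "2018-09-03T08:15:00+00:00"},
    {"title": "Title C", "updated": "2018-09-10T00:00:00+00:00"},
]


def changed_between(records, start, end):
    """Titles whose timestamp falls in the half-open range [start, end)."""
    return [record["title"] for record in records
            if start <= datetime.fromisoformat(record["updated"]) < end]


start = datetime(2018, 9, 1, tzinfo=timezone.utc)
end = datetime(2018, 9, 10, tzinfo=timezone.utc)
print(changed_between(instances, start, end))  # ['Title B']
```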

We need to locate the data elements for these reports in the JSON schemas and RAML definitions in order to build a data dictionary (Tod). The schemas and definitions need to be updated by the developers.
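A first-pass data dictionary could be generated directly from the JSON schemas. A minimal sketch, assuming an abbreviated, hypothetical loan schema (not the actual FOLIO schema); it flattens nested `properties` into dotted paths and reads each field's type, description, and whether it is listed under `required`:

```python
import json

# Hypothetical, abbreviated JSON schema for a loan record.
loan_schema = json.loads("""
{
  "type": "object",
  "properties": {
    "id": {"type": "string", "description": "Unique ID of the loan"},
    "itemId": {"type": "string", "description": "ID of the item lent"},
    "loanDate": {"type": "string", "format": "date-time"},
    "status": {
      "type": "object",
      "properties": {"name": {"type": "string", "description": "Loan status"}}
    }
  },
  "required": ["itemId", "loanDate"]
}
""")


def data_dictionary(schema, prefix=""):
    """Flatten schema properties into (path, type, required, description) entries."""
    required = set(schema.get("required", []))
    entries = []
    for name, prop in schema.get("properties", {}).items():
        path = prefix + name
        entries.append((path, prop.get("type", ""), name in required,
                        prop.get("description", "")))
        if prop.get("type") == "object":
            # Recurse into nested objects, e.g. status -> status.name.
            entries.extend(data_dictionary(prop, path + "."))
    return entries


for entry in data_dictionary(loan_schema):
    print(entry)
```

Fields that come out with an empty description mark the places where the schemas would need to be filled in by the developers.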

Discussion about de-referencing data

Concern (Tod): "I expect a direct link to the patron itself, not just a link to the loan rule. It would be cleaner if we didn't have to parse the loan rule. The loan transaction has to have a reference to a patron and to an item."

Response (Joanne): "A circ matrix ID links to the loan rule; it creates a link to the patron group. The item ID is recorded with the Circ Transaction directly."
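The difference between the two links can be sketched as follows, with hypothetical field names (`item_id`, `circ_matrix_id`) and a hypothetical loan-rule lookup table. The item is read straight off the loan, while the patron group requires a second lookup through the loan rule; a direct patron reference on the loan, as Tod suggests, would remove that extra step.

```python
# Hypothetical reference data: loan rules keyed by circ matrix ID.
loan_rules = {"CM1": {"patron_group": "faculty"}, "CM2": {"patron_group": "undergrad"}}

# Hypothetical loan transaction: item ID recorded directly,
# patron group reachable only via the circ matrix ID.
loan = {"id": "L1", "item_id": "I1", "circ_matrix_id": "CM2"}


def patron_group_of(loan, rules):
    """Two-step de-reference: loan -> loan rule -> patron group."""
    rule = rules.get(loan["circ_matrix_id"])
    if rule is None:
        raise KeyError(f"loan {loan['id']} references unknown rule {loan['circ_matrix_id']}")
    return rule["patron_group"]


row = {
    "loan_id": loan["id"],
    "item_id": loan["item_id"],  # direct link, no lookup needed
    "patron_group": patron_group_of(loan, loan_rules),
}
print(row)
```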

We may want to identify the candidates for stable reference data. It would be nice to have a fair amount of data that do not need to be de-referenced.


Topics for Future Meetings | All

Review and update Topics for Future Reporting SIG Meetings

-during our September 17 meeting, Holly Mistlebauer will walk us through the upload of our data warehouse report information into the FOLIO JIRA system

Other Topics? | All | Any other topics to discuss today?

Action items

  •