2018-09-10 Reporting SIG notes

Date

10 Sep 2018

Attendees

Present?	Name	Organization	Present?	Name	Organization
X	Sharon Beltaine	Cornell University		Peter Murray	Index Data
	Elizabeth Berney	Duke University		Erin Nettifee	Duke University
	Joyce Chapman	Duke University		Karen Newbery	Duke University
	Elizabeth Edwards	University of Chicago	X	Tod Olson	University of Chicago
X	Claudius Herkt-Januschek	SUB Hamburg	X	Scott Perry	University of Chicago
X	Doreen Herold	Lehigh University		Robert Sass	Qulto
	Anne L. Highsmith	Texas A&M	X	Simona Tabacaru	Texas A&M
	Vince Bareau	EBSCO		Mark Veksler	EBSCO
	Harry Kaplanian	EBSCO	X	Kevin Walker	The University of Alabama
X	Ingolf Kuss	hbz		Charlotte Whitt	Index Data
	Lina Lakhia	SOAS	X	Michael Winkler	OLE
X	Joanne Leary	Cornell University		Uschi Klute	GBV
X	Michael Patrick	The University of Alabama	X	Holly Mistlebauer	Cornell University
X	Nassib Nassar	Index Data	X	Angela Zoss	Duke University
X	Veit Köppen	University Magdeburg

Discussion items

Item	Who	Notes
Assign Notetaker, Take Attendance, Review agenda	Sharon	Today's notetaker: Ingolf. Last week's notetakers: Tod Olson, Sharon Beltaine.
Welcome New Members	Angela, Veit	Please welcome our new Reporting SIG members, Angela Zoss from Duke University and Veit Köppen from the Library at the University of Magdeburg. Introductions and background: Angela is the Assessment and Data Visualization Analyst in the Assessment and User Experience department at Duke University Libraries. She has been working on data visualization for many years. Veit has been working at the University of Magdeburg for 4 years. He is an expert and lecturer on data warehouse technologies. He is also the Head of IT Applications at the University Libraries.
Data Warehouse Architecture	Nassib	Nassib will walk the Reporting SIG through "A Library Data Platform Architecture for FOLIO", which provides an outline for the requirements, architecture, and implementation of an environment to support the extensive data analysis and reporting needs of institutions implementing FOLIO. This provides the next step after the groundwork laid in our planning for a reference data warehouse environment. Nassib would like your feedback, so please bring your questions. Meeting Notes: Nassib provides an overview of the Library Data Platform Architecture for FOLIO. The architecture is divided into FOLIO Core, the Data Analysis Platform and the Reporting Tool. In addition, there might be external data sources which can stream data into the platform. An important concept (of the microservice architecture) is that not all data are stored in a single location. There is a distinction between transaction processing and analytical processing. For transaction processing, operational data are fragmented across fairly granular databases. This is very responsive and quick, and it is realized via database indexes. Analytical processing, in contrast, is done for reporting. One is interested in a set of columns and thus looks at columns across the records. Analytical processing leads to resource contention and is very slow. Most library data are highly structured and suitable for databases. An ETL (extract, transform, load) process is needed for them. Referential integrity is a concern that has to be taken into account, because the data are spread over multiple database storages. There are two types of data extraction (for reporting) from the storage modules. One type is Batch ETL, which extracts data from the storage modules in batch mode. Batch ETL will also be used to extract external data. The other type is Streaming ETL. This will stream the data that arise in operational transactions (in Okapi) through a message queue, perform some kind of ETL on them, and feed the result into the reporting database.
Streaming ETL can be time-consuming to implement. Maintenance is also costly. The development team will need more developers to do this. A "star schema" is an example of a design for analytical processing. Another idea is to use column stores, which are more suitable for analytical processing. Discussion / Concerns: (Veit) Where (in the diagram) is a metadata repository? This will be needed for a data warehouse solution. (Michael) A resolution step is needed: data can be encoded, so dereferencing is needed. This will be part of the ETL process. We like the data unfragmented at this point. (Sharon) Where do the modeling of the processes and the actual transformations occur?
Report Prototyping Update	Report Prototype Workgroup	To support the initial steps for development of a reporting data warehouse environment, a small workgroup has formed to prototype some simple reports in the functional areas of loans, inventory, and users. As part of this effort, the workgroup has mapped and diagrammed the data elements required for 2 circulation reports with the assistance of Emma Boettcher, PO for Loans, and Charlotte Whitt, PO for Inventory. This effort lays the groundwork for the Reporting SIG to begin prototyping additional reports to support the development required for our future reporting environment. Members of the workgroup will describe the steps taken to create these initial report prototypes. Feedback and questions are encouraged as we step through this process. Meeting Notes: Joanne, Charlotte and Emma worked on an operational structure in which we can develop the prototypes. They first focused on loans, inventory and users. They present the first two prototypes to the group. These are two basic types of reports: the Circulation Detail Report and the Item Detail Report. See the documents in the Google Drive folder of the Report Prototypes Group. The Circulation Detail Report has owning library, item ID, charge date and patron group as selection fields. One needs to look at the Loan Rules, which capture much information. The Item Detail Report needs many details from Inventory. It lists titles which have been changed in a specific date range. We need to locate the data for these reports in the JSON schemas and the RAML definitions to build a data dictionary (Tod). The schemas and definitions need to be updated by the developers. Discussion about de-referencing data: Concern (Tod): "I expect a direct link to the patron itself, not just a link to the loan rule. It would be cleaner if we didn't have to parse the loan rule. The loan transaction has to have a reference to a patron and to an item."
Response (Joanne): "A circ matrix ID links to the loan rule; it creates a link to the patron group. The item ID is recorded with the Circ Transaction directly." We may want to identify the candidates for stable reference data. It would be nice to have a fair amount of data that need not be de-referenced.
Topics for Future Meetings	All	Review and update Topics for Future Reporting SIG Meetings. During our September 17 meeting, Holly Mistlebauer will walk us through the upload of our data warehouse report information into the FOLIO JIRA system.
Other Topics?	All	Any other topics to discuss today?
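
Illustration of the de-referencing step discussed under Data Warehouse Architecture and Report Prototyping Update: a loan reaches its patron group through the loan rule, while the item ID is recorded on the loan directly. The following is a minimal batch ETL sketch under stated assumptions. All record layouts and field names (loan_rule_id, patron_group_id, owning_library, and so on) are hypothetical and chosen for this example; they are not the actual FOLIO schemas or module APIs.

```python
# Hypothetical extracts from separate storage modules.
# Field names are illustrative assumptions, not the FOLIO JSON schemas.
loans = [
    {"id": "loan-1", "item_id": "item-1", "loan_rule_id": "rule-1", "loan_date": "2018-09-01"},
    {"id": "loan-2", "item_id": "item-2", "loan_rule_id": "rule-2", "loan_date": "2018-09-03"},
]
loan_rules = {
    "rule-1": {"patron_group_id": "pg-1"},
    "rule-2": {"patron_group_id": "pg-2"},
}
patron_groups = {"pg-1": "Undergraduate", "pg-2": "Faculty"}
items = {
    "item-1": {"owning_library": "Main"},
    "item-2": {"owning_library": "Law"},
}


def dereference_loan(loan):
    """Resolve the IDs on one loan into a flat row for the reporting database.

    Unresolvable references become None instead of raising, so a batch run
    can continue and the gaps can be reviewed afterwards.
    """
    rule = loan_rules.get(loan["loan_rule_id"], {})
    item = items.get(loan["item_id"], {})
    return {
        "loan_id": loan["id"],
        "item_id": loan["item_id"],
        "charge_date": loan["loan_date"],
        "owning_library": item.get("owning_library"),
        "patron_group": patron_groups.get(rule.get("patron_group_id")),
    }


def batch_etl(loan_records):
    """Transform a batch of loans into de-referenced reporting rows."""
    return [dereference_loan(loan) for loan in loan_records]


report_rows = batch_etl(loans)
```

In this sketch, report_rows holds one flat row per loan, so a report such as the Circulation Detail Report could select on owning library, charge date, or patron group without parsing the loan rule at query time.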
