2018-03-05 Reporting SIG Notes

Date

05 Mar 2018

Attendees

Present?	Name	Organization	Present?	Name	Organization
X	Vince Bareau	EBSCO	X	Matt Reno	EBSCO
X	Sharon Beltaine	Cornell University		John McDonald	EBSCO
	Elizabeth Berney	Duke University		Peter Murray	Index Data
	Ginny Boyer	Duke University		Erin Nettifee	Duke University
	Joyce Chapman	Duke University	X	Karen Newbery	Duke University
	Elizabeth Edwards	University of Chicago	X	Tod Olson	University of Chicago
	Claudius Herkt-Januschek	SUB Hamburg	X	Scott Perry	University of Chicago
X	Doreen Herold	Lehigh University		Robert Sass	Qulto
X	Anne L. Highsmith	Texas A&M		Simona Tabacuru	Texas A&M
	Filip Jakobsen	Index Data	X	Mark Veksler	EBSCO
X	Harry Kaplanian	EBSCO	X	Kevin Walker	The University of Alabama
X	Ingolf Kuss	hbz		Charlotte Whitt	Index Data
	Lina Lakhia	SOAS		Michael Winkler	Cornell University
X	Joanne Leary	Cornell University		Christine Wise	SOAS
X	Michael Patrick	The University of Alabama	X	Chris Creswell	Lehigh

Discussion items

Item	Who	Notes
Assign Notetaker, Take Attendance, Review agenda	Sharon Markus	Previous Notetaker: Sharon Beltaine; Today's Notetaker: Tod Olson
POC Data Lake	Matt Reno, Tod Olson, Anne L. Highsmith, Mark Veksler, Christopher Creswell	Update on the progress of the Proof-of-Concept Data Lake project. Agenda questions: Test data loaded into FOLIO? Documenting the process? Report in BIRT? What questions do the Data Lake Proof of Concept Recommendations need to answer? Other issues?
BIRT report against FOLIO data: Christopher Creswell has written a BIRT report directly against the FOLIO data, found it pretty easy. Cross-referencing inventory, circulation domains. SQL query, with special Postgres syntax to deal with the JSON columns from Postgres dumps. Requires direct SQL connection. In a hosted environment, would we also need to host the BIRT tool?
VBar: This illustrates the sort of reporting we can achieve with in-app reporting. Some caution about the SQL approach: there's no guarantee that the underlying structures will not change, especially if we try to scale to cross-domain. One of the promises of a microservices approach is rapid development facilitated by the isolation, but that means data models may change quickly. This is the direction we should go to explore reporting, but remember that the guarantee is at the API level, not the data model.
Data Lake demo:
Goals:
  Create Data Lake on AWS
  Load sample transactional data from FOLIO to Data Lake
  (stretch goal) Load live transactional data
  Produce similar report as above through a different method
Using Kinesis data stream to capture data from FOLIO into Data Lake bucket. Visualization provided by Amazon's Athena plus a service called Glue to access that data. All using serverless technology on AWS, does not require standing up an application. (Could extend this design to use different specific technologies.)
Demo by Matt Reno: Current PoC adds a service to Okapi to route transaction information to a data collector. Runs script to generate loans, can watch in the Okapi logs. AWS tools have some buffering, so it takes about a minute for data to be available. Run a tool to process the data stream. Can use BIRT to run a report similar to the one above. Some data not available for de-reference, will need to use
There's a line over what goes to the data lake.
Conclusions:
  Transactional data can be extracted in real time
  BIRT can integrate with Data Lake to produce reports
    will need enhancement to resolve UUIDs
    resolved report can be produced directly as in-app report, if all data is available in the Domain
  Does not impact existing apps
  Filtering and partitioning of data is required
    tenant context
    avoid security leaks, GDPR privacy concerns
    minimize data size, optimize the transactions of interest
  A similar mechanism could be used for auditing
    e.g. capture operational errors in audit
Next steps:
  Develop more permanent mechanisms to enable data extraction from Okapi
    support selective transactions for reporting
    (separately) support transactions for auditing
    do not extract chatter traffic
  Develop general-purpose Stream Source from FOLIO (e.g. Message Queue)
    allows avoiding data buffering in case of temporary disconnect
    avoids hardwiring to the Kinesis system and allows other back ends
  Develop External APIs for FOLIO to support resolution of Identifiers (UUIDs) for reports
    role is to make internal information available for consumption by tools like BIRT
    hide internal handshakes, etc. from outside tools that do not need it
    will have to be implemented in modules or data domains
VBar will create JIRAs and stories for:
  Mechanism in Okapi to extract data
  Implement Message Queue for the stream to the data lake
  Supplement with APIs for external systems to resolve identifiers (UUIDs)
These steps will allow support for different technologies for Data Lakes, not hardwired to AWS Kinesis.
Sharon Markus: Seems like next steps should include getting access to such an environment where we can collaboratively build and test specific reports, to build out a testing plan. Would be helpful to send a set of use cases to the developers so they can figure out how the development can proceed.
Tags Subgroup of RM SIG	Sharon	Anne-Marie Breaux has formed a Tags subgroup and is looking for a representative from the Reporting SIG to join. Who can attend?
Reporting Tools	Ingolf Kuss	Reporting Tools used in Germany
Future Topics	Sharon Markus	Topics for Future Reporting SIG Meetings
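
Sketch for the POC Data Lake item above: the conclusions call for filtering and partitioning extracted data by tenant, leaving out chatter traffic, and resolving UUIDs before a report is readable. The Python below is only a minimal illustration of those three ideas, not the PoC's code. The event shape (tenant, type, payload), the set of reportable types, the field names, and the in-memory lookup table standing in for a future external API are all hypothetical.

```python
from collections import defaultdict

# Hypothetical set of transaction types considered worth reporting on.
REPORTING_TYPES = {"loan", "request"}


def partition_for_reporting(events, reporting_types=REPORTING_TYPES):
    """Group reportable event payloads by tenant and drop everything else."""
    by_tenant = defaultdict(list)
    for event in events:
        if event.get("type") not in reporting_types:
            continue  # chatter traffic is not extracted
        tenant = event.get("tenant")
        if not tenant:
            continue  # no tenant context: skip rather than risk a leak
        by_tenant[tenant].append(event["payload"])
    return dict(by_tenant)


def resolve_uuids(row, lookups):
    """Replace UUID values with labels where a lookup table knows them."""
    resolved = {}
    for field, value in row.items():
        table = lookups.get(field, {})
        resolved[field] = table.get(value, value)  # unknown UUIDs pass through
    return resolved


# Made-up sample events; a real stream would carry FOLIO's own structures.
events = [
    {"tenant": "tenant_a", "type": "loan",
     "payload": {"itemId": "uuid-1", "userId": "uuid-9"}},
    {"tenant": "tenant_b", "type": "loan",
     "payload": {"itemId": "uuid-2", "userId": "uuid-8"}},
    {"tenant": "tenant_a", "type": "healthcheck", "payload": {}},
]

# Stand-in for an external identifier-resolution API (hypothetical data).
lookups = {"itemId": {"uuid-1": "Sample title"}}

partitions = partition_for_reporting(events)
report_rows = [resolve_uuids(row, lookups) for row in partitions["tenant_a"]]
```

In this sketch each tenant's rows stay in their own partition, and any UUID without a known label is left unresolved in the report row, which mirrors the note that some data was not available for de-reference.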
