2018-03-05 Reporting SIG Notes

Date

2018-03-05

Attendees

Present? | Name | Organization | Present? | Name | Organization
X | Vince Bareau | EBSCO | X | Matt Reno | EBSCO
X | Sharon Beltaine | Cornell University |  | John McDonald | EBSCO
 | Elizabeth Berney | Duke University |  | Peter Murray | Index Data
 | Ginny Boyer | Duke University |  | Erin Nettifee | Duke University
 | Joyce Chapman | Duke University | X | Karen Newbery | Duke University
 | Elizabeth Edwards | University of Chicago | X | Tod Olson | University of Chicago
 | Claudius Herkt-Januschek | SUB Hamburg | X | Scott Perry | University of Chicago
X | Doreen Herold | Lehigh University |  | Robert Sass | Qulto
X | Anne L. Highsmith | Texas A&M |  | Simona Tabacuru | Texas A&M
 | Filip Jakobsen | Index Data | X | Mark Veksler | EBSCO
X | Harry Kaplanian | EBSCO | X | Kevin Walker | The University of Alabama
X | Ingolf Kuss | hbz |  | Charlotte Whitt | Index Data
 | Lina Lakhia | SOAS |  | Michael Winkler | Cornell University
X | Joanne Leary | Cornell University |  | Christine Wise | SOAS
X | Michael Patrick | The University of Alabama | X | Chris Creswell | Lehigh

Discussion items

Item | Who | Notes
Assign Notetaker, Take Attendance, Review agenda

Previous Notetaker: Sharon Beltaine

Today's Notetaker: Tod Olson

POC Data Lake

update on the progress of the Proof-of-Concept Data Lake project

Christopher Creswell has written a BIRT report directly against the FOLIO data and found it fairly easy. The report cross-references the inventory and circulation domains, using a SQL query with special Postgres syntax to deal with the JSON columns from the Postgres dumps. This requires a direct SQL connection; in a hosted environment, would the BIRT tool also need to be hosted?
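
For context, a query of this sort might look roughly like the sketch below. This is not Chris's actual report; the schema, table, and JSON field names (e.g. diku_mod_circulation_storage.loan) are assumptions about a typical FOLIO Postgres deployment, not confirmed details of his work.

```python
# Hypothetical sketch: cross-referencing circulation and inventory data
# directly in Postgres, in the spirit of the BIRT report described above.
# Schema, table, and JSON field names are assumptions about a FOLIO deployment.
import psycopg2

conn = psycopg2.connect(host="localhost", dbname="okapi_modules",
                        user="folio_admin", password="secret")

SQL = """
SELECT
    l.jsonb ->> 'userId'            AS user_id,
    l.jsonb ->> 'loanDate'          AS loan_date,
    i.jsonb ->> 'barcode'           AS item_barcode,
    i.jsonb -> 'status' ->> 'name'  AS item_status
FROM diku_mod_circulation_storage.loan AS l
JOIN diku_mod_inventory_storage.item   AS i
  ON (l.jsonb ->> 'itemId')::uuid = i.id
WHERE (l.jsonb ->> 'loanDate')::timestamptz >= %s
"""

with conn, conn.cursor() as cur:
    cur.execute(SQL, ("2018-01-01",))
    for row in cur.fetchall():
        print(row)
```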

VBar: this illustrates the sort of reporting we can achieve with in-app reporting. Some caution about the SQL approach: there is no guarantee that the underlying structures will not change, especially if we try to scale to cross-domain reporting. One of the promises of a microservices approach is rapid development facilitated by isolation, but that means data models may change quickly. This is the direction we should go to explore reporting, but remember that the guarantee is at the API level, not the data model.

Data Lake demo:

Goals:

  • Create Data Lake on AWS
  • Load sample transactional data from FOLIO into the Data Lake
  • (stretch goal) Load live transactional data
  • Produce a similar report as above through a different method.

Using a Kinesis data stream to capture data from FOLIO into a Data Lake bucket. Visualization is provided by Amazon's Athena, plus a service called Glue to access that data. All of this uses serverless technology on AWS and does not require standing up an application. (This design could be extended to use different specific technologies.)
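
A minimal sketch of the capture side, assuming a boto3 client and an illustrative stream name and record layout (none of which are taken from the PoC code), might look like this:

```python
# Hypothetical sketch of sending a FOLIO transaction record to a Kinesis
# data stream feeding the Data Lake bucket. Stream name, tenant id, and
# record layout are assumptions for illustration only.
import json
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

loan_event = {
    "tenant": "diku",                 # tenant context travels with the record
    "domain": "circulation",
    "type": "loan.created",
    "payload": {
        "id": "11111111-1111-1111-1111-111111111111",
        "userId": "22222222-2222-2222-2222-222222222222",
        "itemId": "33333333-3333-3333-3333-333333333333",
        "loanDate": "2018-03-05T14:00:00Z",
    },
}

kinesis.put_record(
    StreamName="folio-reporting-poc",           # assumed stream name
    Data=json.dumps(loan_event).encode("utf-8"),
    PartitionKey=loan_event["tenant"],           # partition by tenant
)
```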

Demo by Matt Reno. The current PoC adds a service to Okapi to route transaction information to a data collector. He runs a script to generate loans, which can be watched in the Okapi logs. The AWS tools have some buffering, so it takes about a minute for data to become available. A tool is then run to process the data stream, and BIRT can be used to run a report similar to the one above. Some data is not available for dereferencing; UUIDs will need to be resolved (see conclusions below).
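
On the query side, a tool sitting where BIRT does could issue an Athena query against the captured events. The database, table, and S3 output location below are illustrative assumptions, not the PoC's actual names:

```python
# Hypothetical sketch of querying the captured loan events through Athena,
# roughly what a reporting tool would do behind the scenes. Database, table,
# and S3 output location are assumptions.
import boto3

athena = boto3.client("athena", region_name="us-east-1")

response = athena.start_query_execution(
    QueryString="""
        SELECT payload.userid, payload.itemid, payload.loandate
        FROM loan_events
        WHERE tenant = 'diku'
          AND type = 'loan.created'
    """,
    QueryExecutionContext={"Database": "folio_reporting_poc"},
    ResultConfiguration={"OutputLocation": "s3://folio-reporting-poc/athena-results/"},
)
print("Query execution id:", response["QueryExecutionId"])
```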

There is a line to be drawn over what goes to the data lake.

Conclusions:

  • Transactional data can be extracted in real time
  • BIRT can integrate with the Data Lake to produce reports
    • will need enhancement to resolve UUIDs
    • the resolved report can be produced directly as an in-app report, if all data is available in the domain
  • Does not impact existing apps
  • Filtering and partitioning of data is required
    • tenant context
    • avoid security leaks, GDPR privacy concerns
    • minimize data size, optimize for the transactions of interest
  • A similar mechanism could be used for auditing
    • e.g. capture operational errors in an audit

Next steps:

  • Develop more permanent mechanisms to enable data extraction from Okapi
    • support selective transactions for reporting
    • (separately) support transactions for auditing
    • do not extract chatter traffic
  • Develop a general-purpose Stream Source from FOLIO (e.g. a Message Queue)
    • allows data to be buffered in case of a temporary disconnect
    • avoids hardwiring to the Kinesis system and allows other back ends
  • Develop External APIs for FOLIO to support resolution of Identifiers (UUIDs) for reports (see the sketch after this list)
    • Role is to make internal information available for consumption by tools like BIRT
    • Hide internal handshakes, etc. from outside tools that do not need them
    • Will have to be implemented in modules or data domains
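
A minimal sketch of what such a resolution call might look like from a reporting tool, assuming standard Okapi headers and a user lookup endpoint; the Okapi URL, token, and field handling here are illustrative assumptions:

```python
# Hypothetical sketch: resolving a UUID from a report row into a
# human-readable value by calling FOLIO through Okapi. The Okapi URL,
# token, and endpoint choice are assumptions for illustration.
import requests

OKAPI_URL = "https://okapi.example.org"
HEADERS = {
    "X-Okapi-Tenant": "diku",
    "X-Okapi-Token": "<auth token>",
    "Accept": "application/json",
}

def resolve_user(user_id: str) -> str:
    """Return a display name for a user UUID, or the UUID itself if lookup fails."""
    resp = requests.get(f"{OKAPI_URL}/users/{user_id}", headers=HEADERS)
    if resp.status_code != 200:
        return user_id
    personal = resp.json().get("personal", {})
    name = ", ".join(p for p in (personal.get("lastName"), personal.get("firstName")) if p)
    return name or user_id

print(resolve_user("22222222-2222-2222-2222-222222222222"))
```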

VBar will create JIRAs and stories for:

  1. Mechanism in Okapi to extract data
  2. Implement Message Queue for the stream to the data lake
  3. Supplement with APIs for external systems to resolve UUIDs

These steps will allow support for different technologies for Data Lakes, not hardwired to AWS Kinesis.

Sharon Markus: It seems like next steps should include getting access to such an environment where we can collaboratively build and test specific reports, in order to build out a testing plan. It would be helpful to send a set of use cases to the developers so they can figure out how development can proceed.

Tags Subgroup of RM SIG | Sharon | Anne-Marie Breaux has formed a Tags subgroup and is looking for a representative from the Reporting SIG to join. Who can attend?
Reporting Tools |  | Reporting Tools used in Germany
Future Topics |  | Topics for Future Reporting SIG Meetings

Action items

  •