2018-11-19 Reporting SIG notes

Date

Attendees

Present? | Name | Organization
X | Sharon Beltaine | Cornell University
 | Peter Murray | Index Data
 | Elizabeth Berney | Duke University
 | Erin Nettifee | Duke University
 | Joyce Chapman | Duke University
 | Karen Newbery | Duke University
 | Elizabeth Edwards | University of Chicago
X | Tod Olson | University of Chicago
X | Claudius Herkt-Januschek | SUB Hamburg
 | Scott Perry | University of Chicago
X | Doreen Herold | Lehigh University
 | Robert Sass | Qulto
X | Anne L. Highsmith | Texas A&M
 | Simona Tabacaru | Texas A&M
 | Vince Bareau | EBSCO
 | Mark Veksler | EBSCO
 | Harry Kaplanian | EBSCO
X | Kevin Walker | The University of Alabama
X | Ingolf Kuss | hbz
 | Charlotte Whitt | Index Data
 | Lina Lakhia | SOAS
X | Michael Winkler | OLE
 | Joanne Leary | Cornell University
 | Uschi Klute | GBV
X | Michael Patrick | The University of Alabama
X | Holly Mistlebauer | Cornell University
X | Nassib Nassar | Index Data
X | Angela Zoss | Duke University
X | Veit Köppen | University Magdeburg
 | Anna Knyazeva | New attendee


Discussion items

Item | Who | Notes
Assign Notetaker, Take Attendance, Review agenda | Sharon

Today's notetaker: Angela Zoss

Last week's notetaker: Tod Olson

Action Items | Sharon

Update on Action Items from November 12, 2018 meeting

  • ARES-related Resource Access reports with real-time data requirements:

    • Darcy setting up meeting with RA-SIG to look at Pull Slips reports that link to ARES
    • Sharon to review these reports with Darcy and bring notes back to Reporting SIG
  • Sharon will meet this week with Ann-Marie Breaux on Import-Export reports and bring notes back to Reporting SIG

Real-time data in Reports | Nassib

Nassib will discuss the development impacts of providing reports with real-time data in the Folio reporting database. This is important to understand as we review each of our reports.

Whether a report uses real-time or day-old data affects development effort, but the data warehouse is not limited to day-old reports; some real-time reporting is possible
Still good to check with other SIGs to learn what they are taking care of
Just need to identify where real-time data is needed

(slide from Nassib) Three categories of types of services we could use to query FOLIO data:

  • FOLIO modules (in-app reporting). Would have access to all attributes in "real-time" (there is technically still some delay, ideally seconds or minutes).
  • LDP (reporting) database with batch load - currently in development. Would have attributes prioritized for reporting. Updates overnight.
  • LDP database with streaming update - when developed, greater frequency of update, but requires message queue. (Message queue being developed by another group, supposed to be ready in January, then we need to implement staging function and incremental update.) Could stream just select attributes. Ideally, attributes would be available in seconds or minutes, but different elements still might be different ages.

Note: reporting database is not going to have all data attributes available at first, going to start with elements prioritized for reporting. Example: we prioritize certain circulation reports, then make sure reporting database is using data elements needed for those reports

We could also select just some attributes to be implemented for streaming, this might make it more feasible to implement streaming, but still a question about when to do this development

Need to keep doing work to identify which reports need real-time data, where those reports will live

Even with streaming data implemented, probably not real-time enough for operational uses (patron notices). Only use in-app reporting for operational or mission-critical functions.

Tod: compared to data lake - with LDP batch we have a stable snapshot, things are consistent and normalized, but cost is that there is a delay. Thinking about data lake differently - might get more current data, but don't have any of the integrations. Would be up to individuals to trust that everything is consistent, all messages have come in.

Nassib: with the data lake, the implementation for streaming updates would be different - don't need staging or incremental update, but still have message queue, and that queue could still introduce time delay, so data lake doesn't remove this issue. Staging is needed to resolve out-of-order problem.
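The out-of-order problem that staging addresses can be pictured with a minimal sketch. It assumes each queue message carries a record id and an increasing version number; the `Message` shape, `stage`, and `apply_incremental` are hypothetical names for illustration, since the notes do not define the message format. Staging keeps only the newest version per record, so a late-arriving older message cannot overwrite newer data during the incremental update.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    # Hypothetical message shape; the real queue format is not defined in these notes.
    record_id: str
    version: int
    payload: dict


def stage(messages):
    """Collapse a batch of possibly out-of-order messages to the newest version per record."""
    staged = {}
    for msg in messages:
        current = staged.get(msg.record_id)
        if current is None or msg.version > current.version:
            staged[msg.record_id] = msg
    return staged


def apply_incremental(table, staged):
    """Apply staged messages to the reporting table, skipping anything older than what is stored."""
    for record_id, msg in staged.items():
        stored = table.get(record_id)
        if stored is None or msg.version > stored["version"]:
            table[record_id] = {"version": msg.version, **msg.payload}
    return table
```

Here `table` is a plain dictionary standing in for a reporting table; a real implementation would do the same comparison inside the database.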

If someone thinks a report needs real-time data, need to look at context, but by default probably means it should be in-app. "In-app" reports - querying modules directly - doesn't technically have to be in app. Could use external scripts (R or python) to connect to modules through HTTP. Doesn't necessarily have to be wrapped as a FOLIO module, but having it as a FOLIO module means it has a nice user interface, can be part of the broader system.
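As a rough illustration of the external-script option, the sketch below builds an HTTP request against a module endpoint using only the Python standard library. It assumes the modules are reached through the Okapi gateway with tenant and token headers; the host name, the `/loan-storage/loans` path, the `limit` parameter, and the helper names are placeholders for illustration, not details confirmed in this meeting.

```python
import json
import urllib.request


def build_request(base_url, path, tenant, token, limit=100):
    """Build a GET request for a module endpoint behind the gateway."""
    url = f"{base_url.rstrip('/')}{path}?limit={limit}"
    headers = {
        # Assumed gateway headers; check the deployment's API documentation.
        "X-Okapi-Tenant": tenant,
        "X-Okapi-Token": token,
        "Accept": "application/json",
    }
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch_json(request, timeout=30):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A script would call something like `fetch_json(build_request("https://folio.example.org", "/loan-storage/loans", "mytenant", token))` and then summarize the returned records. This gets current data without a user interface, which is the trade-off noted above.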

Sharon: Resource Mgmt may be doing something like this for export

Report Prototype Subgroup | Sharon, Nassib, Kevin, Tod, Angela

Update from group on experiences with connecting to the FOLIO reporting database to test out building the loan report from BIRT, Tableau, Aqua Data Studio, R, Access, and possibly other applications (Excel). Last meeting was Wed Nov 7 from 3-4pm EST. Next meetings scheduled for Mon Nov 12 and Mon Nov 19.

Tod: we have the first few tables in the reporting database that we can query. Nassib has provided direct SQL access for specific IP addresses. We have been testing different tools, including Aqua Data Studio, by creating a report based on Nassib's specifications. Have confirmed that we can make the connection and produce the report. Some IDEs let you pull back a certain amount of data, but they want everything in memory, so they may not be good for millions of rows; it might not be possible to get all of that into the client. In Aqua Data Studio, there is a scripting language in the background (it builds a script based on your query), so we might be able to create canned versions of some things. It might be easier to get large data with scripting, and results can definitely be output to a file. Testing and documenting output formats as well.
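The point about scripting large results to a file can be sketched as follows. This is a minimal example using Python's generic database cursor interface, with the built-in `sqlite3` module standing in for the reporting database; the `loans` table, its columns, and the function names are made up for the demonstration. Rows are fetched and written in batches, so the script never holds the whole result in a list.

```python
import csv
import os
import sqlite3
import tempfile


def export_query_to_csv(connection, sql, out_path, batch_size=10_000):
    """Write query results to a CSV file in batches; returns the number of rows written."""
    cursor = connection.cursor()
    cursor.execute(sql)
    column_names = [col[0] for col in cursor.description]
    total = 0
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(column_names)
        while True:
            rows = cursor.fetchmany(batch_size)
            if not rows:
                break
            writer.writerows(rows)
            total += len(rows)
    cursor.close()
    return total


def demo():
    """Export a tiny in-memory table; sqlite3 stands in for the reporting database."""
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE loans (id INTEGER, status TEXT)")
    connection.executemany(
        "INSERT INTO loans VALUES (?, ?)", [(i, "Open") for i in range(25)]
    )
    out_path = os.path.join(tempfile.mkdtemp(), "loans.csv")
    count = export_query_to_csv(
        connection, "SELECT id, status FROM loans ORDER BY id", out_path, batch_size=10
    )
    connection.close()
    return count, out_path
```

Some database drivers buffer the full result on the client unless a server-side cursor is requested, so the driver's documentation is worth checking before relying on this pattern for millions of rows.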

Sharon: still getting connected with BIRT - can make connection, now loading test SQL, trying to get it to see the data. Working through that this afternoon. Workgroup will meet this afternoon 3-4.

Nassib: hope to add more data today

Angela: testing a bunch (Tableau, R, Google Data Studio, pgAdmin4, Mode Studio), still working on documentation

Kevin: Access works; Crystal Reports and Excel coming soon

Real-Time Data Reports | All

Reporting SIG members were asked at last week's meeting to identify reports that require real-time data which have not been identified as in-app, and bring those forward. We will review and discuss these reports.

  • reports are highlighted in yellow on the Reporting SIG Master Spreadsheet
  • themes: batch jobs, Ares reports, need to evaluate Consortia reports
  • more work required on data warehouse infrastructure to provide reports with real time data
  • streaming will not be ready until January 2019; "state changes" provided by streaming data would need to be added to the data model
  • for now, Reporting SIG will review the reports that are not defined as in-app reports and do require real-time data with these questions in mind:
    • does the report truly require real-time data?
    • can the report be provided as an in-app report (requires talking with functional SIGs)?
    • is the report functionality already being covered via other functionality (e.g., error reports from batch jobs)?
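
The three review questions can be read as a small decision procedure. The sketch below is one possible ordering of those questions, not a process the SIG has agreed on; the `ReportReview` fields, the `Placement` labels, and the `triage` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Placement(Enum):
    COVERED_ELSEWHERE = "already covered by other functionality"
    REPORTING_DATABASE = "reporting database (day-old data is acceptable)"
    IN_APP = "in-app report (confirm with the functional SIG)"
    UNRESOLVED = "needs real-time data but has no in-app home yet"


@dataclass
class ReportReview:
    # Answers to the three review questions for one report.
    needs_real_time: bool
    can_be_in_app: bool
    covered_elsewhere: bool


def triage(review):
    """Suggest where a highlighted report should go, based on the review answers."""
    if review.covered_elsewhere:
        return Placement.COVERED_ELSEWHERE
    if not review.needs_real_time:
        return Placement.REPORTING_DATABASE
    if review.can_be_in_app:
        return Placement.IN_APP
    return Placement.UNRESOLVED
```

Reports that end up as `UNRESOLVED` are the ones that would depend on the streaming work or on further discussion with the functional SIGs.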

Holly is in communication with the POs about in-app reports, we should make sure we bring her into the loop on any of these. If there is a question about whether a report is in-app, just make the JIRA ticket anyhow. If it is in-app, Holly can move it.

U. Alabama is working on an Assessment module with some canned reports. See the recording of the Nov 1 FOLIO Forum at timestamp 47:12.

We don't have a clear idea of what reporting is planned for import/export. Sharon Markus will reach out to Ann-Marie Breaux.

Expect cross-app real-time needs, especially where User Management and Resource Access interact.

Still working on import/export and Ares; reports still highlighted in yellow may need to go to the functional SIGs

Prioritizing Your Institution's Reports | Sharon Beltaine

In order to determine the order in which we will prototype our reports for the development of data models for the data warehouse, we need to prioritize them within each functional area (e.g., RM, RA, etc.). Please see the Prioritizing Reports page in the wiki for detailed instructions.

  • review status of progress, discuss any issues and how to address them

Discussion came up last week about lack of user management reports, UM SIG hasn't yet discussed reporting

Resource Access SIG happy to join in, User Management also happy to take a look

May try to schedule a meeting between conveners of RA and UM, additional members of RA, UM, and Reporting as desired

For new reports in spreadsheet - check for duplicates, let Holly know and she will mention to Product Owners

Assigning Yourself to JIRA Reports | Holly Mistlebauer

Holly walked us through the process of assigning ourselves to our reports in the FOLIO JIRA System. Instructions are provided on this wiki page, which will also contain other JIRA info:  Working on a JIRA Issue

  • review status of progress, discuss any issues and how to address them

Topics for Future Meetings | All

Review and update Topics for Future Reporting SIG Meetings

Other Topics? | All | Any other topics to discuss today?

Action items
