2018-11-19 Reporting SIG notes

Date

Nov 19, 2018

Attendees

Present?  Name                      Organization
X         Sharon Beltaine           Cornell University
          Peter Murray              Index Data
          Elizabeth Berney          Duke University
          Erin Nettifee             Duke University
          Joyce Chapman             Duke University
          Karen Newbery             Duke University
          Elizabeth Edwards         University of Chicago
X         Tod Olson                 University of Chicago
X         Claudius Herkt-Januschek  SUB Hamburg
          Scott Perry               University of Chicago
X         Doreen Herold             Lehigh University
          Robert Sass               Qulto
X         Anne L. Highsmith         Texas A&M
          Simona Tabacaru           Texas A&M
          Vince Bareau              EBSCO
          Mark Veksler              EBSCO
          Harry Kaplanian           EBSCO
X         Kevin Walker              The University of Alabama
X         Ingolf Kuss               hbz
          Charlotte Whitt           Index Data
          Lina Lakhia               SOAS
X         Michael Winkler           OLE
          Joanne Leary              Cornell University
          Uschi Klute               GBV
X         Michael Patrick           The University of Alabama
X         Holly Mistlebauer         Cornell University
X         Nassib Nassar             Index Data
X         Angela Zoss               Duke University
X         Veit Köppen               University Magdeburg
          Anna Knyazeva             New attendee

Discussion items

Item

Who

Notes

Assign Notetaker, Take Attendance, Review agenda

Sharon

Today's notetaker: Angela Zoss

Last week's notetaker: Tod Olson

Action Items

Sharon

Update on Action Items from November 12, 2018 meeting

  • ARES-related Resource Access reports with real-time data requirements:

    • Darcy setting up meeting with RA-SIG to look at Pull Slips reports that link to ARES

    • Sharon to review these reports with Darcy and bring notes back to Reporting SIG

  • Sharon will meet this week with Ann-Marie Breaux on Import-Export reports and bring notes back to Reporting SIG

Real-time data in Reports

Nassib

Nassib will discuss the development impacts of providing reports with real-time data in the FOLIO reporting database. This is important to understand as we review each of our reports.

Real-time data vs. day-old data makes an impact on development, but it's not the case that we can't produce any real-time reports in data warehouse
Still good to check with other SIGs to learn what they are taking care of
Just need to identify where real-time data is needed

(slide from Nassib) Three categories of types of services we could use to query FOLIO data:

  • FOLIO modules (in-app reporting). Would have access to all attributes in "real-time" (there is technically still some delay, ideally seconds or minutes).

  • LDP (reporting) database with batch load - currently in development. Would have attributes prioritized for reporting. Updates overnight.

  • LDP database with streaming update - when developed, greater frequency of update, but requires message queue. (Message queue being developed by another group, supposed to be ready in January, then we need to implement staging function and incremental update.) Could stream just select attributes. Ideally, attributes would be available in seconds or minutes, but different elements still might be different ages.

Note: reporting database is not going to have all data attributes available at first, going to start with elements prioritized for reporting. Example: we prioritize certain circulation reports, then make sure reporting database is using data elements needed for those reports

We could also select just some attributes to be implemented for streaming, this might make it more feasible to implement streaming, but still a question about when to do this development

Need to keep doing work to identify which reports need real-time data, where those reports will live

Even with streaming data implemented, probably not real-time enough for operational uses (patron notices). Only use in-app reporting for operational or mission-critical functions.

Tod: compared to data lake - with LDP batch we have a stable snapshot, things are consistent and normalized, but cost is that there is a delay. Thinking about data lake differently - might get more current data, but don't have any of the integrations. Would be up to individuals to trust that everything is consistent, all messages have come in.

Nassib: with the data lake, the implementation for streaming updates would be different - don't need staging or incremental update, but still have message queue, and that queue could still introduce time delay, so data lake doesn't remove this issue. Staging is needed to resolve out-of-order problem.
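To illustrate why staging matters for the out-of-order problem, here is a minimal sketch of a staging buffer that holds change messages until they can be applied in sequence. It assumes each message carries a gap-free sequence number; the notes do not say how the message queue will order or number its messages, so `Staging`, `receive`, and the loan values are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Staging:
    """Holds out-of-order change messages until they can be applied in sequence."""

    next_seq: int = 1                              # next sequence number expected (assumed scheme)
    pending: dict = field(default_factory=dict)    # seq -> (record_id, value), waiting for a gap to fill
    table: dict = field(default_factory=dict)      # stand-in for a reporting table

    def receive(self, seq, record_id, value):
        """Stage one message, then apply every message that is now in order."""
        if seq < self.next_seq:
            return  # duplicate or already-applied message; ignore it
        self.pending[seq] = (record_id, value)
        # Incremental update: drain the staging area for as long as there is no gap.
        while self.next_seq in self.pending:
            rid, val = self.pending.pop(self.next_seq)
            self.table[rid] = val
            self.next_seq += 1


staging = Staging()
# Three changes to the same (hypothetical) loan arrive in the order 1, 3, 2.
staging.receive(1, "loan-1", "checked out")
staging.receive(3, "loan-1", "returned")
staging.receive(2, "loan-1", "renewed")
print(staging.table["loan-1"])  # prints "returned"
```

Applying the same three messages directly in arrival order would leave the loan as "renewed", which is the inconsistency staging is meant to prevent.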

If someone thinks a report needs real-time data, need to look at context, but by default that probably means it should be in-app. "In-app" reports (querying modules directly) don't technically have to be in an app. Could use external scripts (R or Python) to connect to modules through HTTP. Doesn't necessarily have to be wrapped as a FOLIO module, but having it as a FOLIO module means it has a nice user interface and can be part of the broader system.
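As a rough sketch of the external-script option, the following Python builds an HTTP request against a FOLIO module using only the standard library. The base URL, tenant, and token are placeholders, and the `/loan-storage/loans` path and `X-Okapi-*` headers are assumptions about how a module is exposed through the gateway; none of these were specified in the meeting, so check them against your own installation.

```python
import json
import urllib.parse
import urllib.request


def build_request(base_url, tenant, token, path, query=None, limit=100):
    """Build a GET request for a module endpoint (all argument values are placeholders)."""
    params = {"limit": limit}
    if query:
        params["query"] = query
    url = f"{base_url.rstrip('/')}{path}?{urllib.parse.urlencode(params)}"
    headers = {
        "X-Okapi-Tenant": tenant,   # assumed tenant header
        "X-Okapi-Token": token,     # assumed auth token header
        "Accept": "application/json",
    }
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch_json(request, timeout=30):
    """Send the request and decode the JSON body (requires network access)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Hypothetical values for illustration only.
req = build_request(
    "https://folio.example.org", "diku", "TOKEN", "/loan-storage/loans", limit=10
)
print(req.full_url)
# Against a live system: loans = fetch_json(req)
```

A script like this reads current data straight from the module, at the cost of having no user interface and no integration with the rest of the system.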

Sharon: Resource Mgmt may be doing something like this for export

Report Prototype Subgroup

Sharon, Nassib, Kevin, Tod, Angela

Update from group on experiences with connecting to the FOLIO reporting database to test out building the loan report from BIRT, Tableau, Aqua Data Studio, R, Access, and possibly other applications (Excel). Last meeting was Wed Nov 7 from 3-4pm EST. Next meetings scheduled for Mon Nov 12 and Mon Nov 19.

Tod: we have the first few tables in the reporting database that we can query. Nassib has provided direct SQL access for specific IP addresses. We have been testing different tools - Aqua Data Studio, creating a report based on Nassib's specifications. Have confirmed that we can make the connection and produce the report. Some IDEs let you pull back a certain amount of data, but they want everything in memory, so they may not be good for millions of rows. Might not be able to get all of that into the client. In Aqua Data Studio, there is a scripting language in the background (it builds a script based on your query), so we might be able to save some things as canned scripts. Might be easier to get large data with scripting; can definitely output results to a file. Testing/documenting output formats as well.
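One way scripting can sidestep the in-memory limit is to fetch rows in batches and write them straight to a file. The sketch below uses Python's built-in `sqlite3` purely as a stand-in so it runs anywhere; the `loans` table, its columns, and the `export_query_to_csv` helper are hypothetical and not the actual reporting database schema. With a real database driver, a server-side cursor may also be needed, since some drivers buffer the full result on the client.

```python
import csv
import os
import sqlite3
import tempfile


def export_query_to_csv(conn, sql, out_path, batch_size=1000):
    """Run a query and write the results to CSV one batch at a time."""
    cur = conn.cursor()
    cur.execute(sql)
    total = 0
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        # Header row taken from the cursor's column metadata.
        writer.writerow([col[0] for col in cur.description])
        while True:
            rows = cur.fetchmany(batch_size)  # only one batch is held in memory
            if not rows:
                break
            writer.writerows(rows)
            total += len(rows)
    return total


# Stand-in database with a hypothetical loans table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loans (id INTEGER PRIMARY KEY, item TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO loans (item, status) VALUES (?, ?)",
    [(f"item-{i}", "open") for i in range(2500)],
)

out_path = os.path.join(tempfile.mkdtemp(), "loans_export.csv")
rows_written = export_query_to_csv(
    conn, "SELECT id, item, status FROM loans ORDER BY id", out_path
)
print(rows_written)  # prints 2500
```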

Sharon: still getting connected with BIRT - can make connection, now loading test SQL, trying to get it to see the data. Working through that this afternoon. Workgroup will meet this afternoon 3-4.

Nassib: hope to add more data today

Angela: testing a bunch (Tableau, R, Google Data Studio, pgAdmin4, Mode Studio), still working on documentation

Kevin: Access yes; Crystal Reports and Excel soon

Real-Time Data Reports

All

Reporting SIG members were asked at last week's meeting to identify reports that require real-time data which have not been identified as in-app, and bring those forward. We will review and discuss these reports.

  • reports are highlighted in yellow on the Reporting SIG Master Spreadsheet

  • themes: batch jobs, Ares reports, need to evaluate Consortia reports

  • more work required on data warehouse infrastructure to provide reports with real-time data

  • streaming will not be ready until January 2019; "state changes" provided by streaming data would need to be added to the data model

  • for now, Reporting SIG will review the reports that are not defined as in-app reports and do require real-time data with these questions in mind:

    • does the report truly require real-time data?

    • can the report be provided as an in-app report (requires talking with functional SIGs)?

    • is the report functionality already being covered via other functionality (e.g., error reports from batch jobs)?

Holly is in communication with the POs about in-app reports, we should make sure we bring her into the loop on any of these. If there is a question about whether a report is in-app, just make the JIRA ticket anyhow. If it is in-app, Holly can move it.

U. Alabama working on an Assessment module with some canned reports. See recording of Nov 1 FOLIO Forum at timestamp 47:12.

We don't have a clear idea of what reporting is planned for import/export. @Sharon Markus will reach out to @Ann-Marie Breaux.

Expect cross-app real-time needs, especially where User Management and Resource Access interact.

Still working on import/export and Ares; reports still highlighted in yellow may need to go to the functional SIGs

Prioritizing Your Institution's Reports

Sharon Beltaine

In order to determine the order in which we will prototype our reports for the development of data models for the data warehouse, we need to prioritize them within each functional area (e.g., RM, RA, etc.). Please see the Prioritizing Reports page in the wiki for detailed instructions.

  • review status of progress, discuss any issues and how to address them

Discussion came up last week about lack of user management reports, UM SIG hasn't yet discussed reporting

Resource Access SIG happy to join in, User Management also happy to take a look

May try to schedule a meeting between conveners of RA and UM, additional members of RA, UM, and Reporting as desired

For new reports in spreadsheet - check for duplicates, let Holly know and she will mention to Product Owners

Assigning Yourself to JIRA Reports

Holly Mistlebauer

Holly walked us through the process of assigning ourselves to our reports in the FOLIO JIRA System. Instructions are provided on this wiki page, which will also contain other JIRA info:  Working on a JIRA Issue

  • review status of progress, discuss any issues and how to address them

Topics for Future Meetings

All

Review and update Topics for Future Reporting SIG Meetings

Other Topics?

All

Any other topics to discuss today?

Action items