2018-10-22 Reporting SIG notes

Date

Attendees

Present?  Name                      Organization               Present?  Name                      Organization
X         Sharon Beltaine           Cornell University                   Peter Murray              Index Data
          Elizabeth Berney          Duke University                      Erin Nettifee             Duke University
          Joyce Chapman             Duke University                      Karen Newbery             Duke University
          Elizabeth Edwards         University of Chicago      X         Tod Olson                 University of Chicago
X         Claudius Herkt-Januschek  SUB Hamburg                X         Scott Perry               University of Chicago
X         Doreen Herold             Lehigh University                    Robert Sass               Qulto
X         Anne L. Highsmith         Texas A&M                  X         Simona Tabacaru           Texas A&M
          Vince Bareau              EBSCO                                Mark Veksler              EBSCO
X         Harry Kaplanian           EBSCO                      X         Kevin Walker              The University of Alabama
X         Ingolf Kuss               hbz                                  Charlotte Whitt           Index Data
          Lina Lakhia               SOAS                       X         Michael Winkler           OLE
X         Joanne Leary              Cornell University                   Uschi Klute               GBV
X         Michael Patrick           The University of Alabama  X         Holly Mistlebauer         Cornell University
X         Nassib Nassar             Index Data                 X         Angela Zoss               Duke University
          Veit Köppen               University Magdeburg




Discussion items

Item    Who    Notes

Assign Notetaker, Take Attendance, Review agenda

Sharon

Today's notetaker: Ingolf Kuss

Last week's notetaker: Simona Tabacaru

Glint

Nassib Nassar

Nassib will discuss the https://glintcore.net project, which provides a foundation for the library data platform architecture for the FOLIO reference data warehouse. Glint is open source software for communicating, describing, and integrating data.

Notes

Glint and the LDP (slides by Nassib) (LDP = Library Data Platform)

Glint: open source software for sharing, curating, and integrating data.

Features:

  • lightweight protocol
  • interoperability layer
  • new approach to managing scientific data

Data curation = describing data. Data curation usually takes place at a late stage, after data collection, data pipelines and sharing with a research team. Glint takes care of the data curation at an earlier stage. This helps to integrate and re-use the data.

Glint is kept simple (lightweight) to be able to run in many different contexts.

Glint has two interfaces:

  • web based interface
  • command line interface

Glint could be configured to store the data in a repository. The data is posted to Glint (e.g. by means of the command line interface) and is assigned a URL. The data is user-modifiable, and there is a simple language for modifying it.

Jason Skomorowski has written a UI for integrating data in FOLIO (called ui-datasets, a FOLIO client for Glint).

LDP, the Library Data Platform, is built for reporting. It consists of classical batch ETL and a database (on the left-hand side of the diagram, for a classic reporting tool), but also streaming ETL, distributed databases, and other databases (on the right-hand side of the diagram, for other data analysis tools). One might use Glint as a data platform.

Questions that come up with LDP:

  • How can requests be made available to people who cannot connect to the database directly? → One can address that with Glint.
  • How will data outside of the database be accessed?
  • Will data analysis tools such as R be supported?
  • What about privacy and permissions?

How Glint can help:

Glint allows a lot of flexibility in how data are stored:

  • file system
  • network database

Glint can be used to offer an interface for accessing reports.

With Glint, one can access data that are within as well as outside of a database.

Integration with R and Python: Glint can be called from Python; a Glint library has been developed for Python.

The LDP can be made accessible to FOLIO modules in the form of Glint data sets when direct database access is not feasible, e.g. if it would put too much load on the database.

What Glint doesn't do:

  • Glint doesn't give you a full query language. It focuses on describing and sharing data, and assumes that querying will be done through database views.
  • Its permissions model is less complex than the one in FOLIO.


Discussion

Q: Does Glint replace the database? Will Glint be the primary source when writing reports?

A: Glint would not replace the direct database connection.

Glint would share reports and database views. It gives access to the data to people who do not have direct access to the database.

Glint is for data scientists, not so much for data analysts (the latter would prefer to use SQL).

Glint can be used for data that do not fit in a database. It is also suited to sharing data with the public (it could share views or generated reports).


(discussed in the TC:) A Reporting SIG Data Warehouse might be redundant for institutions which already have a data warehouse (e.g. needed for legal reasons or for some national reporting needs). But most of us are not in the situation where we already have a data warehouse.

LDP is a hybrid solution for evolving needs for data analysis. The model (LDP and Glint) is superior in some ways but inferior in many ways (to classical solutions). It would be a more modern system, but some things will have to be re-invented.

Glint addresses a missing piece. However, there is nothing pressing in terms of development right now.

Q: What is the data flow like? Would Glint query the underlying FOLIO databases, or would the data flow from FOLIO into Glint?

A: Glint could sit on top of either FOLIO databases or FOLIO modules. Glint might make in-app reporting easier. Glint could sit directly on top of the reporting database. Glint will generate materialized views from the database. This would work for daily data updates, but not for continuous updates.
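
The daily-refresh pattern described in this answer can be sketched roughly as follows. This is not Glint's API and not how Glint is implemented; it is a minimal illustration using Python's built-in sqlite3 module. SQLite has no native materialized views, so a snapshot table that is rebuilt on a schedule stands in for one. The table and column names (loans, loans_per_day, loan_date) and the function names are hypothetical.

```python
import sqlite3

# In-memory database standing in for a reporting database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loans (id INTEGER PRIMARY KEY, loan_date TEXT)")
conn.executemany(
    "INSERT INTO loans (loan_date) VALUES (?)",
    [("2018-10-20",), ("2018-10-20",), ("2018-10-21",)],
)
conn.commit()


def refresh_snapshot(conn):
    """Rebuild the snapshot table from the live data.

    This emulates refreshing a materialized view: the query result is
    stored as a table and only changes when this function runs again.
    """
    conn.execute("DROP TABLE IF EXISTS loans_per_day")
    conn.execute(
        "CREATE TABLE loans_per_day AS "
        "SELECT loan_date, COUNT(*) AS loan_count "
        "FROM loans GROUP BY loan_date"
    )
    conn.commit()


def read_snapshot(conn):
    """Return the stored rows without touching the live loans table."""
    return conn.execute(
        "SELECT loan_date, loan_count FROM loans_per_day ORDER BY loan_date"
    ).fetchall()


refresh_snapshot(conn)          # e.g. run once per night
before = read_snapshot(conn)

# New data arrives during the day; the snapshot does not see it yet.
conn.execute("INSERT INTO loans (loan_date) VALUES ('2018-10-22')")
conn.commit()
stale = read_snapshot(conn)

refresh_snapshot(conn)          # the next scheduled refresh picks it up
after = read_snapshot(conn)
```

Here `stale` equals `before` even though a new loan was inserted, which is why this approach suits daily updates but not continuous ones.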

Q: How would Glint work for textual data?

A: There is no problem, whether the data is text or numeric. Nassib has some experience with that. One can use "slicing and dicing" as in SQL. We are talking about a database on top of and beside an SQL database.

Q: What possibilities does the Glint interface have?

A: Glint isn't really an interface. Anyone can run their own dashboard. Glint is more like middleware.


Prioritizing Your Institution's Reports

Sharon Beltaine

In order to determine the order in which we will prototype our reports for the development of data models for the data warehouse, we need to prioritize them within each functional area (e.g., RM, RA, etc.). Please see the Prioritizing Reports page in the wiki for detailed instructions.

-This week, we will review our group's progress on prioritizing reports in the "import and export," "external statistics," and "resource mgmt" worksheets in the Reporting SIG Master Spreadsheet

-Monday November 12, 2018 is our target completion date

Notes: tabled.

Assigning Yourself to JIRA Reports

Holly Mistlebauer

Holly walked us through the process of assigning ourselves to our reports in the FOLIO JIRA System. Instructions are provided on this wiki page, which will also contain other JIRA info:  Working on a JIRA Issue

-review of JIRA report assignments

-Monday November 12, 2018 is our target completion date

Notes: tabled.

Topics for Future Meetings

All

Review and update Topics for Future Reporting SIG Meetings

Other Topics?

All

Any other topics to discuss today?


We will meet next week, Oct 29, as usual at 9 AM Eastern U.S. time. This will be 2 PM in central Europe, because most European countries turn their clocks back on Oct 28.

We will meet on Nov 5 as usual at 9 AM Eastern U.S. time (by then EST rather than daylight saving time, EDT). This will be 3 PM in central Europe, as usual.
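
The two meeting times above can be double-checked with a short script. This is only a sketch: it assumes Python 3.9 or later with a time zone database available, and it uses Europe/Berlin as a representative central European zone and America/New_York for Eastern U.S. time. The function name is hypothetical.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# Representative zones (assumptions): Eastern U.S. and central Europe.
EASTERN = ZoneInfo("America/New_York")
CENTRAL_EUROPE = ZoneInfo("Europe/Berlin")


def meeting_time_in_central_europe(year, month, day, hour=9):
    """Convert a meeting start given in Eastern U.S. time to central European time."""
    start = datetime(year, month, day, hour, tzinfo=EASTERN)
    return start.astimezone(CENTRAL_EUROPE)


oct_29 = meeting_time_in_central_europe(2018, 10, 29)
nov_5 = meeting_time_in_central_europe(2018, 11, 5)

print(oct_29.strftime("%b %d %H:%M %Z"))  # Oct 29 14:00 CET
print(nov_5.strftime("%b %d %H:%M %Z"))   # Nov 05 15:00 CET
```

On Oct 29 the U.S. is still on daylight saving time while central Europe is not, so the gap is five hours; by Nov 5 both have switched and the usual six-hour gap returns.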

Action items

  •