2019-10-04 - System Operations and Management Agenda and SIG Notes
Date
Attendees
- Christopher Creswell
- Greg Delisle
- Robert Douglas
- jackie.gottlieb@duke.edu (Deactivated)
- Hkaplanian
- Tod Olson
- spampell
- jroot
- Vandana Shah
- Catherine Smith
- Brandon Tharp
- patty.wanninger
- Dale Arntson
Goals
- Getting to know the FOLIO Reporting Library Data Platform
Discussion items
Time | Item | Who | Notes |
---|---|---|---|
5 | Welcome | Ingolf |
|
45 | The FOLIO Reporting Library Data Platform (LDP) | Nassib is the Product Owner of FOLIO Reporting. He will give an overview of the FOLIO Reporting Library Data Platform (LDP), the database on which the FOLIO reports are based. Nassib will also talk about the SysOps part of FOLIO Reporting: setting up the FOLIO LDP.
Meeting Notes: Nassib explained the SysOps issues related to the LDP to us.
- Data is extracted from the FOLIO databases once a day, via the module APIs.
- The beta version of the LDP is coming out in January; v1.0 will be out in summer 2020.
- Data in the LDP can be queried with SQL. This allows data analysts to build queries across modules and on historical data, which cannot be done in the FOLIO system itself.
- The LDP will allow for the integration of external, non-FOLIO data sources.
- The LDP will be scalable from the start.
- The LDP will be delivered with a set of common report queries.
- The LDP is not intended for:
- The LDP is intended for use with reporting tools like Tableau, R etc.
Database: Three schemas in the database (Postgres and Redshift are supported):
- One would need a different instance of the LDP for each tenant.
- The LDP anonymizes attributes in /users, with exceptions such as aggregate user data (patron group).
Hardware:
- Suggested hardware: 32 GB memory, spinning disk, 2 TB HDD (SSDs have a limited lifetime of writes).
- Software requirements: Linux, Go 1.13, Postgres 9.6 or later, or Redshift.
- Two database users: ldpadmin (for loading and administering the data) and ldp (for the data analysts).
Debugging & Testing options:
- Can keep temporary files containing extracted data for testing / debugging.
- "Unsafe" options: load data from the file system, don't use encryption (--nossl).
Outlook:
- At the moment, data from 22 FOLIO interfaces is being loaded into the LDP by the loader. An exception is ERM data, which works differently. Nassib is in contact with the ERM Subgroup.
- Loading from the interfaces can be made configurable (not only concerning the loading of personal data).
- At the moment there is no personal data in the LDP, so European libraries can use it. GDPR compliance of the LDP is planned for the future, by adding technical means (erasure of data, fulfillment of the right of access etc.).
- Manual specification of columns, as it is done now, will go away in a future release. The loader will be more intelligent then.
- After January 2020, no large changes to the LDP are expected.
- https://github.com/folio-org/ldp/analytics - Repository for report queries developed by members of the Reporting SIG.
Discussion:
- Nassib/Dale: Could use tags as field names. Chris: At Lehigh, we are doing it the same way as Dale describes, with a row for every subfield.
- Question about the JSON Path query language, which is much more expressive. Nassib: Yes, but it is not compatible with Redshift. We want to keep the LDP compatible with Redshift from the start, for institutions that are dealing with large amounts of data.
- Q: Which of the LDP modules are going to be dockerized? A: The LDP has no modules in the sense of FOLIO modules. It is external to FOLIO and is governed by a command line tool. You could dockerize that tool.
- Question about the performance of the loader. A: It is as efficient as it could be. Historical data takes a little time.
Thank you, Nassib Nassar, for this comprehensive overview and for answering our questions. | |
Next Meetings | Ingolf |
|
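To illustrate the point from the meeting notes that the LDP allows SQL queries across modules, here is a minimal sketch of such a query. It is not taken from the LDP itself: the table and column names (demo_loans, demo_users, demo_patron_groups) are hypothetical, and an in-memory SQLite database stands in for Postgres or Redshift so that the example runs on its own.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with hypothetical tables.

    The table and column names are assumptions for illustration only;
    they are not the actual LDP schema.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE demo_patron_groups (id TEXT PRIMARY KEY, group_name TEXT);
        CREATE TABLE demo_users (id TEXT PRIMARY KEY, patron_group TEXT);
        CREATE TABLE demo_loans (id TEXT PRIMARY KEY, user_id TEXT, loan_date TEXT);
        INSERT INTO demo_patron_groups VALUES ('g1', 'faculty'), ('g2', 'undergrad');
        INSERT INTO demo_users VALUES ('u1', 'g1'), ('u2', 'g2'), ('u3', 'g2');
        INSERT INTO demo_loans VALUES
            ('l1', 'u1', '2019-09-01'),
            ('l2', 'u2', '2019-09-02'),
            ('l3', 'u3', '2019-09-03'),
            ('l4', 'u3', '2019-10-01');
        """
    )
    return conn


def loans_per_patron_group(conn):
    """Count loans per patron group by joining circulation-like and user-like tables."""
    query = """
        SELECT g.group_name, COUNT(l.id) AS loan_count
        FROM demo_loans AS l
        JOIN demo_users AS u ON u.id = l.user_id
        JOIN demo_patron_groups AS g ON g.id = u.patron_group
        GROUP BY g.group_name
        ORDER BY g.group_name
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for group_name, loan_count in loans_per_patron_group(build_demo_db()):
        print(group_name, loan_count)
```

The query only uses the patron group, which matches the note that aggregate user data is kept while other user attributes are anonymized. Against a real LDP instance, the table names and the connection would have to be replaced with the ones of that installation.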
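The notes say that the LDP anonymizes attributes in /users, with exceptions such as the patron group. How the LDP does this was not described in the meeting; the following is only a sketch of one possible approach, an allowlist filter. The field names and the choice to keep the record id (so that loans can still be joined to a patron group) are assumptions made for this example.

```python
# Hypothetical allowlist: only these attributes survive anonymization.
# Keeping "id" is an assumption so that records can still be joined.
KEEP_FIELDS = frozenset({"id", "patronGroup", "active"})


def anonymize_user(record, keep=KEEP_FIELDS):
    """Return a copy of a user record containing only allowlisted attributes."""
    return {key: value for key, value in record.items() if key in keep}


if __name__ == "__main__":
    user = {
        "id": "u1",
        "username": "jdoe",
        "patronGroup": "g1",
        "active": True,
        "personal": {"lastName": "Doe", "email": "jdoe@example.org"},
    }
    print(anonymize_user(user))
```

An allowlist is used here rather than a blocklist so that newly added personal attributes are dropped by default.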