2019-10-04 - System Operations and Management Agenda and SIG Notes

Date

Attendees

Goals

  • Getting to know the FOLIO Reporting Library Data Platform

Discussion items

Time | Item | Who | Notes
5 | Welcome | Ingolf |
  • Introductions
  • Note taker: Ingolf
45 | The FOLIO Reporting Library Data Platform (LDP) | |

Nassib is the Product Owner of FOLIO Reporting.

He will give an overview of the FOLIO Reporting Library Data Platform (LDP).

The LDP serves as the database on which the FOLIO reports are based.

Nassib will also talk about the SysOps part of FOLIO Reporting: setting up the FOLIO LDP.

Meeting Notes:

Nassib explained SysOps-related issues concerning the LDP to us.

Data is extracted from the FOLIO databases once a day, via the module APIs.

The beta version of the LDP is coming out in January 2020; v1.0 will be out in summer 2020.

Data in the LDP can be queried with SQL. This allows data analysts to build queries across modules and on historical data, neither of which can be done in the FOLIO system itself.
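
As a minimal sketch of the kind of cross-module query described above, the following Python snippet uses an in-memory SQLite database as a stand-in for the LDP database (which, per these notes, runs on Postgres or Redshift). The table names `loans` and `user_groups`, their columns, and the sample rows are hypothetical and are not taken from the LDP schema.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two hypothetical tables,
    each standing in for data loaded from a different FOLIO module."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE user_groups (id TEXT PRIMARY KEY, group_name TEXT)")
    conn.execute(
        "CREATE TABLE loans (id TEXT PRIMARY KEY, patron_group_id TEXT, loan_date TEXT)"
    )
    conn.executemany(
        "INSERT INTO user_groups VALUES (?, ?)",
        [("g1", "faculty"), ("g2", "undergrad")],
    )
    conn.executemany(
        "INSERT INTO loans VALUES (?, ?, ?)",
        [
            ("l1", "g1", "2019-09-01"),
            ("l2", "g2", "2019-09-02"),
            ("l3", "g2", "2019-09-03"),
        ],
    )
    return conn


def loans_per_group(conn):
    """Count loans per patron group by joining the two tables."""
    query = """
        SELECT g.group_name, COUNT(l.id) AS loan_count
        FROM loans AS l
        JOIN user_groups AS g ON g.id = l.patron_group_id
        GROUP BY g.group_name
        ORDER BY g.group_name
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    print(loans_per_group(build_demo_db()))
```

The join works only because both tables live in one database; in the FOLIO system itself the corresponding data sits behind separate module APIs.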

The LDP will allow for the integration of external, non-FOLIO data sources.

The LDP will be scalable from the start.

The LDP will be delivered with a set of common report queries.

The LDP is not intended for:

  • operational reporting
  • real-time access to FOLIO data
  • access to personal data

Database views are not supported, because schemas will change over time.

The LDP is intended for use with reporting tools such as Tableau and R.

  • Performance benefits from having the data in columns
  • The full data remains accessible, because the JSON data is kept exactly as it comes from FOLIO
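
The two points above (selected fields as columns, with the original JSON retained) can be illustrated with a short Python sketch. The record layout, the `to_row` helper, and the column names are assumptions made for illustration; they are not taken from the LDP loader.

```python
import json


def to_row(raw_json, columns):
    """Turn one JSON record into a flat row plus the untouched JSON text.

    Top-level fields named in `columns` become individual values; the
    original JSON string is kept alongside them so nothing is lost.
    Fields missing from the record are stored as None.
    """
    record = json.loads(raw_json)
    row = {name: record.get(name) for name in columns}
    row["data"] = raw_json  # full record exactly as received
    return row


# Hypothetical record, as it might arrive from a module API.
raw = '{"id": "l1", "action": "checkedout", "item": {"barcode": "123"}}'
row = to_row(raw, ["id", "action"])

# Column values are directly available for filtering and grouping ...
print(row["id"], row["action"])
# ... while nested fields can still be reached through the stored JSON.
print(json.loads(row["data"])["item"]["barcode"])
```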

Database

There are three schemas in the database (Postgres and Redshift are supported):

  1. public - this schema contains one table per FOLIO interface
  2. historical - this schema contains historical data that was previously in public
  3. local - local data, which is ignored by the LDP loader
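
One way to picture the relationship between `public` and `historical` is sketched below in Python. This is an assumption about the general idea only (a record that changes or disappears between daily loads has its previous version moved to history); the function `apply_daily_load` and the dictionary-based "tables" are hypothetical and do not describe the actual loader.

```python
def apply_daily_load(public, history, extracted, load_date):
    """Return the new public snapshot and archive superseded records.

    public:    dict mapping record id -> record (current snapshot)
    history:   list of (load_date, record id, record) tuples; appended to
    extracted: dict mapping record id -> record from today's extract
    """
    for rec_id, old in public.items():
        new = extracted.get(rec_id)
        if new != old:  # changed or deleted since the last load
            history.append((load_date, rec_id, old))
    return dict(extracted)


# Hypothetical data: one record changes, one stays the same.
public = {"u1": {"group": "faculty"}, "u2": {"group": "staff"}}
history = []
todays_extract = {"u1": {"group": "emeritus"}, "u2": {"group": "staff"}}

public = apply_daily_load(public, history, todays_extract, "2019-10-04")
print(public)
print(history)
```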

One would need a different instance of the LDP for each tenant.

The LDP anonymizes attributes in /users, with exceptions such as aggregate user data (patron group).
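
A minimal sketch of this kind of anonymization, in Python: personal attributes of a user record are dropped, while a small allow-list of non-identifying attributes such as the patron group is kept. The field names and the contents of the allow-list are assumptions for illustration; the notes do not specify which attributes the LDP retains.

```python
# Hypothetical allow-list of fields considered safe to keep.
KEEP_FIELDS = {"id", "patronGroup", "active"}


def anonymize_user(user):
    """Return a copy of `user` containing only allow-listed fields."""
    return {key: value for key, value in user.items() if key in KEEP_FIELDS}


# Hypothetical user record with personal and non-personal attributes.
user = {
    "id": "u1",
    "username": "jdoe",
    "patronGroup": "faculty",
    "personal": {"lastName": "Doe", "email": "jdoe@example.org"},
    "active": True,
}
print(anonymize_user(user))
```

An allow-list is used here rather than a block-list so that any attribute added to the record later is excluded by default.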

Hardware

Suggested hardware: 32 GB memory and a 2 TB spinning disk (HDD); SSDs have a limited lifetime of writes.

Software requirements:

Linux, Go 1.13, and either Postgres 9.6 or later, or Redshift

Two database users: ldpadmin (for loading and administering the data) and ldp (for the data analysts).

Debugging & Testing options

Temporary files containing extracted data can be kept for testing and debugging.

"Unsafe" options: load data from the file system, don't use encryption (–nossl)


Outlook

At the moment, data from 22 FOLIO interfaces is being loaded into the LDP by the loader.

An exception is ERM data, which works differently. Nassib is in contact with the ERM Subgroup.

Loading from the interfaces can be made configurable (not only concerning the loading of personal data).

At the moment, there is no personal data in the LDP, so European libraries can use it. For the future, GDPR compliance of the LDP through added technical means (erasure of data, fulfillment of the right of access, etc.) is being planned.

Manual specification of columns, as required now, will go away in a future release, when the loader will be more intelligent.

After January 2020, no large changes to the LDP are expected.

https://github.com/folio-org/ldp/analytics - Repository for report queries developed by members of the Reporting SIG.


Discussion

Nassib/Dale: Could use tags as field names. Chris: At Lehigh, we are doing it the same way Dale describes: a row for every subfield.

Question about the JSONPath query language, which is much more expressive. Nassib: Yes, but it is not compatible with Redshift. We want to keep the LDP compatible with Redshift for institutions that are dealing with large amounts of data from the start.

Q: Which of the LDP modules are going to be dockerized?

A: The LDP has no modules in the sense of FOLIO modules. It is external to FOLIO and is governed by a command line tool. You could dockerize that tool.

Q: What about the performance of the loader?

A: It is as efficient as it could be. Historical data takes a little time.


Thank you, Nassib Nassar, for this comprehensive overview and for answering our questions.

Next Meetings | Ingolf
  • No meeting next week
  • 2019/10/18: Security Issues, Policies and Processes in FOLIO. How does the community react when a critical security breach becomes known? This will have been discussed prior to this meeting in a subgroup of the TC, cf. Security Issue Policies and Processes

Action items
