2021-09-16 Resource Access Meeting Notes

Reporting SIG is helping to design the Library Data Platform LDP
Data base to pull data from FOLIO apps and stores it into a relational data base to then write queries against data from multiple FOLIO apps
to write queries that are large, but don’t slow down the operational system of FOLIO
and it is organized – so that queries that you write are easier than the native data format for FOLIO (JSON objects (JavaScript Object Notation objects) - a lot of hierarchy and complexity)
the data is stored in tables that can be combined in queries
difference between LDP and Metadb (rewrite of software to take streaming data from FOLIO almost real time)
There is not a specific date for a formal release of Metadb / two consortia with more than 60 libraries went live with ReShare and Metadb and are testing it now, Metadb will be deployed for Index Data customers and after that the release is recommended
Question: APIs are used to connect to e.g. Google sheets. Can you explain the difference between the API based reporting and the query based?
- microservices approach where each app has its own function supported by the use of APIs
- so API gives access to either retrieve data from FOLIO app or push data to FOLIO app
- can be useful for reporting - you get immediate real time results
- It takes practice getting used to post man or coding to send that request to the API and to get the data back into process it
- immediate direct access to data / you need to have permissions and have the ability to kind of construct those queries those API requests
- downsides of using that as your only reporting solution: APIs are limited to a specific app, or a specific part of the data model within the app
- sometimes more than one request to get the data / sometimes data is needed from more than one app, so another request is necessary / have to download each of those JSON data / transform the JSON data into table
- really big queries through the API, query would be running for a long time and potentially really slow down your operational database
- you might have to segment that query, sending lots of requests
- APIs are not really designed to do those big queries
- best practice having a separate database for reporting so it never slows down your operational database
- using the LDP to store not just FOLIO data, but also legacy data from their previous ILS or ILL data
- querying data from the LDP is a little more natural using SQL, because the relational database and SQL is kind of designed to work with a relational database
Demo LDP App (snapshot)
- need to have an LDP installed, pulling data from a FOLIO instance for LDP App to work, and this App connects to that LDP instance
- sending an SQL query to the LDP database / database with lots of different tables
- data is organized into schemas - in snapshot 2 schemas "FOLIO_reporting" and "public"
- public schema: drop down menu called table, filled with all of the tables that are available in that schema / first part of the name is e.g. inventory or finance or circulation
- e.g. table "circulation_loans" - by clicking submit und can start to look at the data
- tailoring to current reporting needs - using filter by column - submit - limit this table to the filters set
- icon "floppy disc" - export option as CSV
- working on being able to save a query and share that query with other members of your organization
- we've created a series of what we call derived tables that join up tables from the public schema in ways that we know people want
- "umbrella" queries = a lot of data, that you can than reduce to what you want
- FOLIO_reporting schema: this is the schema that has tables that we call derived tables and this derived tables were made with queries that combine and transform the data from the public schema
- e.g. "loan_items" - like the checkout service point ID, we can connect that up to the checkout service point name so now if you want to filter on a specific checkout service point by name, that column is already available.
- brought in the name of the loan policy which again and alone record is just ID. This is a much bigger table now it has a lot more columns, and it's a lot more complex, but because it's an umbrella query
- limit it to exactly the columns and rows that is wanted - this one table can serve the needs of different circulation queries
- https://github.com/folio-org/folio-analytics There are two different folders for our SQL right - SQL plain and SQL Metadb / so all of our LDP queries are in the plains sequel folder / divided by derived tables and report queries
- So each of the columns in our derived tables have an index / when you want to use a filter on those columns or search for specific records in those columns, it's much faster
- third schema called "local": to create their own derived tables or bring in legacy data
- Question: in the app when you export the data - Do you only get the selected data/columns or do you get all data? - the data you selected
- right now it may have an upper limit of 1000 - this might be increased or find other ways to deal with large data
maybe a topic for next time: Metadb data structure