Missouri State was the first US institution to go live with FOLIO
had reports generated in the previous system that library staff wanted to retain
that info is not really available via the APIs
setting up LDLite was something we wanted to do anyway, and it gave us a chance to do other things
MSU is an EBSCO-hosted library, so doesn't have access to LDP
if we wanted to do anything using SQL, LDLite was the only option
don't mind doing stuff with the APIs, enjoy that, but really like SQL
having things in an SQL format gives us some additional options we wouldn't normally have
using the variation Nassib mentioned - MSU is taking data from LDLite and putting it into a Postgres database
database 1 - complete harvest of all data; not updated often; with 1.4 million bib records, a full harvest takes about 33 hours on a VM with 4 processors, 8 GB RAM, and 300 GB storage, including SRS (the SRS records take about 15 hours by themselves)
set that up first just to see what it was like
soon after, a librarian needed to do some stats; didn't want them to have all the data, so created a smaller database and only harvested the data they need (collections, circulation) - that harvest took about 9 hours
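Roughly how such a targeted harvest can look with LDLite writing into Postgres. This is a hedged sketch: the URL, tenant, credentials, DSN, and the chosen endpoints are placeholders, and the connection method names follow the LDLite README and may differ between LDLite versions.

```python
import ldlite

ld = ldlite.LDLite()

# FOLIO/Okapi connection; URL, tenant, and credentials are placeholders
ld.connect_okapi(url='https://folio.example.edu/okapi',
                 tenant='example_tenant',
                 user='ldlite_user',
                 password='********')

# Write into a Postgres database instead of LDLite's default local file database
ld.connect_db_postgresql(dsn='host=localhost dbname=folio_stats user=ldlite')

# Smaller "stats" database: harvest only the endpoints the librarian needs
ld.query(table='loans', path='/loan-storage/loans',
         query='cql.allRecords=1 sortby id')
ld.query(table='items', path='/item-storage/items',
         query='cql.allRecords=1 sortby id')
```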
have been around libraries since 1993; back in the mid-'90s, if you wanted a free scripting language, PHP and MySQL were the thing. don't have a background in Postgres or Python, but have been able to figure it out.
have some scripts that run the harvest - start it end of day, let it run overnight.
working on something else, with help from Nassib: incremental updates - a script that only brings in changes since its last run; the hope is to run it in the morning and have the data ready for work in the afternoon (see the sketch below)
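A rough sketch of what an incremental pull might look like, assuming the endpoint supports CQL filtering on metadata.updatedDate (not every FOLIO endpoint does). Note that LDLite's query() rebuilds its target table, so the delta would still have to be merged into the main tables as a separate SQL step; the state file and table names here are hypothetical.

```python
from datetime import datetime, timezone
import ldlite

ld = ldlite.LDLite()
ld.connect_okapi(url='https://folio.example.edu/okapi', tenant='example_tenant',
                 user='ldlite_user', password='********')   # placeholders
ld.connect_db_postgresql(dsn='host=localhost dbname=folio_stats user=ldlite')

STATE_FILE = 'last_harvest.txt'          # hypothetical: holds the previous run time

with open(STATE_FILE) as f:
    last_run = f.read().strip()          # e.g. '2022-02-01T00:00:00.000Z'

run_started = datetime.now(timezone.utc).strftime('%Y-%m-%dT%H:%M:%S.000Z')

# Pull only records changed since the last run into a separate "delta" table
cql = f'metadata.updatedDate > "{last_run}" sortby id'
ld.query(table='loans_delta', path='/loan-storage/loans', query=cql)

# The rows in loans_delta would then be upserted into the main loans tables
# with SQL before recording this run's start time for the next pass.
with open(STATE_FILE, 'w') as f:
    f.write(run_started)
```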
like the idea of using multiple databases to control access - just don't even include sensitive data in certain databases
the script in use is a variation of the script Nassib has on the GitHub page
have added the call that writes into the database
have also created a text output: it records when the harvest starts and ends, and writes the endpoint and the tables created for that endpoint; could also have it print the table structure
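A minimal sketch of that kind of text log wrapped around the harvest calls; the endpoints, log path, and connection details are placeholders, and the assumption that query() returns the names of the tables it created follows the LDLite README.

```python
from datetime import datetime
import ldlite

ld = ldlite.LDLite()
ld.connect_okapi(url='https://folio.example.edu/okapi', tenant='example_tenant',
                 user='ldlite_user', password='********')   # placeholders
ld.connect_db_postgresql(dsn='host=localhost dbname=folio_stats user=ldlite')

endpoints = {
    'loans': '/loan-storage/loans',
    'items': '/item-storage/items',
}

with open('harvest.log', 'a') as log:
    log.write(f'harvest started {datetime.now().isoformat()}\n')
    for table, path in endpoints.items():
        # query() returns the names of the tables it created for this endpoint
        created = ld.query(table=table, path=path,
                           query='cql.allRecords=1 sortby id')
        log.write(f'{path}: wrote {created}\n')
    log.write(f'harvest ended {datetime.now().isoformat()}\n')
```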
have DBeaver connected to the VM for running queries; the VM is an Ubuntu server and everything is done on the command line, so DBeaver provides an interface for queries
Questions: are other institutions accessing legacy data through LDP?
Cornell has saved off legacy data in a Postgres database and hopes to bring it into the LDP; data has been pulled out of the legacy system, and a bunch of it has already been pulled into the local schema
have been pulling requests data from Ares into the LDP to blend with FOLIO
have been talking about getting the data from the Voyager archive and using that; would really love that
have been talking about getting bits and pieces of the Voyager data separately (it's static, so just have to do it once)
getting the data from the legacy system into a format that will work is a big project; would have been nice to bring it in just before implementation
during migration, some things may have accidentally gotten skipped, so it's good to have that backup; it would be easier if it were already in the LDP
hard to query out the circ data and then bring it into the LDP
would be easier if it were all there
Stanford is likely to go straight to Metadb, which will eventually have a concept of multiple input sources; FOLIO would be one input source, but you could possibly set up the legacy ILS as a second input source, and Metadb could pull everything automatically (or a defined subset); and then if you needed to run both systems at the same time, it could continue to pull data from the legacy system
any lessons learned for implementation?
not terribly difficult, just install LDLite (one dependency, maybe); had more trouble getting Postgres installed and configured than LDLite
originally tried just using LDLite without a database server, but that's not great for harvesting absolutely everything; if you are going to use it for a lot of data, the recommendation is to go ahead and install Postgres and connect LDLite to that
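The difference in setup is small; roughly as follows, with method names taken from the LDLite README (they may vary by version) and connection details as placeholders.

```python
import ldlite

ld = ldlite.LDLite()
ld.connect_okapi(url='https://folio.example.edu/okapi', tenant='example_tenant',
                 user='ldlite_user', password='********')   # placeholders

# Option 1: LDLite's embedded, file-based database - fine for small, ad hoc pulls
ld.connect_db(filename='ldlite.db')

# Option 2: a Postgres server - better suited to harvesting everything
# ld.connect_db_postgresql(dsn='host=localhost dbname=folio_full user=ldlite')
```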
that means you'll have to learn Postgres, at least a little
have gotten everything done with Google
If you're using LDLite in place of an LDP or Metadb instance, it probably makes sense to have a shared database that mimics an LDP and use a regular schedule to have LDLite update the data; this probably works better for a smaller library where you can download the data regularly; a larger university probably needs the full LDP or Metadb
if Mark can get the incremental updates working, the plan is to write up the script and schedule it to run every night, which would give the users access to yesterday's data just like LDP users
what kind of documentation is there? you talk about having different databases that cover different things, how do you share that with users?
there are only two people using it right now at MSU, so not too many people need to know; the other user was just given the database connection details and went ahead with DBeaver; he doesn't know about the other database he doesn't have access to
we're a small library - 45 employees, 3 IT staff, and Mark is the only one doing FOLIO stuff; documentation is just for him
running it against the FOLIO sample data has been problematic; the sample data has duplicated identifiers, which prevents us from tracking which tables connect to each other
so this approach needs to be used with a real database rather than something like FOLIO snapshot
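One hedged way to see that problem is to look for id values that occur more than once in a harvested table. The table name, column, and connection details below are illustrative only; LDLite normally creates a transformed "__t" table alongside the raw JSON table.

```python
import psycopg2

# Placeholder connection string for the Postgres database LDLite writes into
conn = psycopg2.connect('host=localhost dbname=folio_full user=ldlite')

with conn, conn.cursor() as cur:
    # Any id appearing more than once points at duplicated sample-data identifiers
    cur.execute("""
        SELECT id, count(*) AS copies
        FROM users__t
        GROUP BY id
        HAVING count(*) > 1
    """)
    for record_id, copies in cur.fetchall():
        print(record_id, copies)
```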
for Metadb there is a new version that works a bit differently
Nassib willing to talk to Cornell about trying to help with that
still trying to figure out relationships between tables in the various areas
hoping to start creating diagrams for derived tables
would also be great to have for each functional area/query subteam
The Reporting SIG is always on the look-out for new query developers. Please let us know if you are interested in doing query development or if there are others at your institution who might be a good fit.
Updates and Query Demonstrations from Various Reporting Related Groups and Efforts
Community & Coordination, Reporting Subgroup Leads
Project updates
Reporting development is using small subgroups to address priorities and complete work on report queries. Each week, these groups will share reports/queries with the Reporting SIG. Reporting development team leads are encouraged to enter a summary of their work group activities below.
RA/UM Working Group
Meetings have become more of a lab session, working through specific problems
Angela has been attending RA SIG meetings to open the lines of communication, now once a month
Contact Angela if you would like to join these meetings; second Wednesdays at 1pm Eastern (but for Feb, actually Tuesday the 8th at 2pm Eastern)
Meetings are 1st Tuesday of the month, 12-1pm ET via zoom using the usual FOLIO password. Our lab sessions are open to everyone. Please bring your questions, examples, and comments about reporting and metadata.
We have submitted our slate of derived tables for Metadb for Lotus R1 2022. We completed 16 derived tables for Metadb!
We are turning our attention to finishing the porting of derived tables to Metadb. We have a sign-up sheet and are looking for reviewers.
In our next lab session, we will take the LDP instance identifiers derived table and go through the process of porting it over to Metadb, contributing a pull request with the changes, updating runlist.txt, and going through a review.
ERM Working Group
Fixed data types for foreign keys: #554, #551, #550
Will now refocus on documentation and new derived table development
Iris documentation is in progress, due December 15
Additional Context
The Reporting SIG has representation on the Documentation Working Group, which is building end-user documentation for https://docs.folio.org/docs/ (mostly linking to existing documentation over on GitHub)
External Statistics Working Group
no updates currently
new organizational/tracking scheme for JIRA, with pointers to queries in folio-analytics repository
external statistics reports (e.g., ACRL) typically require running queries from different functional reporting areas
these reports will be captured in JIRA under one UXPROD-XXXX report cluster issue, and the issue descriptions will point to each of the queries in the folio-analytics repository required to run them
institutions will need to rank each of these 8 new UXPROD-XXXX report cluster issues
each reporting development team will take responsibility for the queries in their area for the external statistics clusters
Product Council
For all recent work on FOLIO Reporting SQL development: