2022-10-21 - Sys Ops & Management SIG Agenda and Meeting notes

Topic

  • Hosting the FOLIO LDP

Attendees

Time | Item | Who | Notes
5 | Welcome | Ingolf | We will only talk about LDP hosting today. The Metadb interface is still undergoing a lot of changes.

Hosting the FOLIO LDP

Following a discussion in the Reporting SIG, Ingolf committed to bringing the topic of LDP hosting to this group.

Points of discussion should be:

  • Who is self-hosting LDP or Metadb?
  • How do you self-host it?
  • How do EBSCO or Index Data host it? (LDP hosting experts from those companies need to be invited to this session → Ingolf)
  • How do you secure your LDP/Metadb implementation?
  • How do you do testing (of LDP/Metadb)?
  • How do you (plan to) handle upgrades?


Wayne

Presentation that Wayne plans to give: LDP deployment and operations 2022-10 (Google Slides)

  • Scope: grew out of the work of the FOLIO Reporting SIG, but is now broader; includes ReShare, possibly others.
    • Takes a data warehouse approach to reporting data
    • Separates the reporting database from the transactional data, so reporting avoids contending with production transactions.

LDP: Library Data Platform, an incubator project of the Open Library Foundation (OLF)

    • LDP1: first generation of the software
      • copies all of the data into the reporting database
      • updates periodically, often once per day
    • ldpmarc: breaks MARC data into a tabular form more convenient for reporting
    • LDLite: command-line tool that makes reporting queries against the Okapi APIs
    • Metadb: second generation
      • extracts data from production
      • uses PostgreSQL streaming replication and Kafka streaming to transform the data into a more normalized form
    • community contributions of analytics/reports
    • FOLIO/ReShare LDP app
      • mod-ldp, ui-ldp: a lightweight, very simple SQL interface
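The tools above all hinge on the same step: turning nested JSON from the Okapi APIs into flat tables that are convenient for SQL reporting. A minimal sketch of that idea follows; the sample record, field names, and the kept-JSON column are illustrative, not the exact LDP schema:

```python
import json

# Sketch of the JSON-to-tabular step that LDP1, ldpmarc, and LDLite
# perform in different forms: top-level scalar fields become columns,
# and the full JSON is kept alongside for anything not extracted.
# The sample record below is invented for illustration.

def flatten(record: dict) -> dict:
    """Extract top-level scalar fields into a flat row."""
    row = {}
    for key, value in record.items():
        if isinstance(value, (str, int, float, bool)) or value is None:
            row[key] = value
    # keep the complete JSON, similar to LDP1's data column
    row["data"] = json.dumps(record, sort_keys=True)
    return row

instance = {
    "id": "69640328-788e-43fc-9c3c-af39e243f3b7",
    "title": "Example title",
    "discoverySuppress": False,
    "contributors": [{"name": "Doe, Jane"}],  # nested: stays in the JSON column
}
row = flatten(instance)
print(row["title"])  # Example title
```

Nested structures (like the contributors array) are what the derived tables and ldpmarc then unpack into further tables.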

Operational notes:

  • LDP1:
    • Can run multiple tenants on a server instance, but needs one Postgres database per tenant
    • Needs a minimal number of users for admin and query; additional users can be added
    • Need to schedule daily builds
    • Deploy from source or use the project-provided containers
    • Simple JSON config, documented in the admin guide
  • ldpmarc
    • Requires significant storage to duplicate the MARC records
    • Need to do an initial build, but then daily incremental updates are fast (~10 min. for UChicago)
  • folio_reporting tables
    • community-contributed SQL/derived tables
    • No views in LDP1; the data is rebuilt repeatedly, so views are not convenient
    • derived tables make subsequent queries fairly fast
  • mod-ldp & LDP reporting/query builder
    • backend is proxied by Okapi and communicates with LDP reporting DB
    • frontend can be included in FOLIO/Stripes webpack
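To make the "simple JSON config" item above concrete, an LDP1 ldpconf.json might look roughly like the following; the keys and values here are an illustrative approximation, not copied from the admin guide, so check the guide for the authoritative format:

```json
{
    "deployment_environment": "production",
    "ldp_database": {
        "database_name": "ldp",
        "database_host": "ldp-db.example.edu",
        "database_port": 5432,
        "database_user": "ldpadmin",
        "database_password": "(secret)",
        "database_sslmode": "require"
    },
    "enable_sources": ["my_library"],
    "sources": {
        "my_library": {
            "okapi_url": "https://folio.example.edu:9130",
            "okapi_tenant": "diku",
            "okapi_user": "ldp_user",
            "okapi_password": "(secret)"
        }
    }
}
```

One such file per LDP1 instance; with multiple tenants on one server, each tenant's Postgres database gets its own configuration.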

Questions

  • When you clone the FOLIO prod environment, do you also clone the LDP data over? Or do you just re-build?
    • TAMU does not, just rebuilds
    • Index Data: for pre-prod or test, do not attempt to preserve, just rebuild
  • What do you do for a PostgreSQL backup solution for prod LDP? And at what cadence?
    • TAMU keeps 28 days of backups, but does not ship them offsite
    • Index Data uses Amazon RDS for PostgreSQL, which has some disaster recovery capabilities. They only keep 7 days of backups.
      • No one has yet expressed the need for "I need the LDP from 3 weeks ago"
      • LDP preserves some of the earlier state of objects internally anyway.
  • EBSCO notes
    • Similar for disaster recovery
    • Strong points
      • Very reliable process; if you set everything up correctly, it will work until you need to change or upgrade
      • Quick and good support from developers, Nassib has provided a lot of support
    • Weak points
      • data transfer times get longer as the size of the data increases
      • will need more resources as the data size grows; will need to increase the size of the VM
      • Silent failures; logging often does not catch the failure
        • The container approach means logs live inside the containers. Implemented additional functionality to store logs outside of the containers.
        • Note: Index Data uses container log aggregation. ID uses CloudWatch, but there are other options, like ELK. Even if the container goes away, the logs are still there. Might not fit all environments.
        • TAMU also uses container logs, which they can view in real time through the Rancher interface and ship out to Splunk. Note: they have not yet set up alerts on failed jobs, so they check logs manually. Most failures have been the result of something the devs changed.
      • LDP1 documentation is mostly directed to developers, difficult if you do not already know how the system works
      • No automatic recovery, when process fails need to re-run manually
      • For the EBSCO environment, incremental updates can take more time than a full rebuild if there are a lot of updates
      • the process for big databases takes too long; in our case it takes 14 hours for a DB with 10M inventory records
        • ID sees quicker indexing, but still substantial: 8.5M records take 3.5 hours, ldpmarc incremental 10 min., folio-analytics ~4 hours, so a little over half the time
        • TAMU runs all LDP processes as K8s containers and sees numbers similar to the above; can run 8-12 hours on 4M records, depending on what is going on. Does not have as much RAM on the LDP servers, but has 4 cores. Reordered the sequence to be more reliable.
  • Security concerns?
    • We keep hearing rumors of security concerns, but are not certain what these concerns are.
    • There are some basic things: secure the network, secure PII information, but that's not specific to LDP
    • PostgreSQL on-wire security has been good, but it doesn't do SSO, so that could be an issue for some sites.
    • TAMU uses CloudBeaver to provide SSO and read-only access to LDP for power users. Also use their workflow engine to run some reports on a schedule and post output to a secure website.
  • Testing / Upgrades
    • Is there an upgrade sequence?
    • Index Data: LDP1 and ldpmarc are largely independent of FOLIO versions. For all tenants, they have a production environment and a staging environment. When there is a new release of LDP1 or ldpmarc, they upgrade the staging environment first. For point releases they usually do not do user testing, but for anything bigger they invite users to test. They roll to production when satisfied with the results. ldpmarc does not have a real "upgrade" process; it needs a rebuild. folio-analytics is much more tightly connected to FOLIO, so its upgrades are tied to flower releases.
    • TAMU is very similar. They do not upgrade their FOLIO instance very often, every 6 months or a year. Currently on Lotus.
    • EBSCO: they usually do not upgrade LDP components until flower releases, unless there is a problem, in which case they talk to Nassib and others. They have a special Jenkins job which builds something called an "LDP engine" with the three components and creates a Docker container for them. They do smoke testing for the release, then make a recommendation for the specific versions of the three components and use those versions until the next flower release.
  • Many expressions of gratitude to Nassib for all of his work and support!
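The backup cadences discussed above (28 days at TAMU, 7 days at Index Data) amount to a nightly dump plus a retention sweep. A minimal sketch in Python; the paths, database name, and the use of pg_dump are all assumptions, not anyone's documented setup:

```python
import datetime
import pathlib
import subprocess

# Sketch of a nightly LDP backup with a retention window.
# backup_filename() and expired() are hypothetical helpers for illustration.

def backup_filename(day: datetime.date) -> str:
    return f"ldp-{day.isoformat()}.dump"

def expired(filenames, today: datetime.date, keep_days: int = 28):
    """Return the backup files older than the retention window."""
    cutoff = today - datetime.timedelta(days=keep_days)
    out = []
    for name in filenames:
        day = datetime.date.fromisoformat(name[len("ldp-"):-len(".dump")])
        if day < cutoff:
            out.append(name)
    return out

def run_backup(backup_dir: pathlib.Path, db_name: str = "ldp") -> None:
    """Dump the reporting database, then delete backups past retention."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / backup_filename(datetime.date.today())
    with open(target, "wb") as f:
        # pg_dump custom format; connection settings come from the environment
        subprocess.run(["pg_dump", "-Fc", db_name], stdout=f, check=True)
    for name in expired([p.name for p in backup_dir.glob("ldp-*.dump")],
                        datetime.date.today()):
        (backup_dir / name).unlink()
```

Since LDP data can simply be rebuilt (as TAMU and Index Data noted for non-prod), a short retention window like this is usually sufficient.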

Action items