Texas A&M's FOLIO Reporting Environment Demonstration/Presentation
Presenters: Jeremy Huff, James Creel, Jason Root

- Have utilized workflows heavily (Camunda)
- Jeremy presented at WOLFCon
- Also expose reports to end user through a custom dashboard
- DBeaver, Superset
- MIS home-grown dashboard interface
- Have put together a PHP application to host reports that need to be consumed by end users
- it's behind SSO; users have various roles and levels of access, so the app can differentiate between users, and administrators can elevate users
- reports are bespoke, and then can be launched through the dashboard
- some depend on a workflow, and clicking just launches the workflow and sends an email
- can specify parameters
- experience from report to report can vary
- a workflow might run as a cron job whose output is written to disk and can then be downloaded through the interface
- try to indicate when a report triggers a workflow by including that in its name
- dashboard groups reports by area (e.g., Acq, Cataloging, etc.)
- developed to provide a reporting solution similar to what staff had in Voyager, which had built up over a few decades
- For most reports, much of the data is coming from LDP
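A minimal sketch of the kind of SQL such a report might run against LDP; the table and JSON field names (public.circulation_loans with a json "data" column, queried via json_extract_path_text) follow LDP 1.x conventions but are assumptions here, not TAMU's actual report:

```sql
-- Hypothetical example: count loans by status, LDP 1.x style.
SELECT
    json_extract_path_text(data, 'status', 'name') AS loan_status,
    count(*) AS loan_count
FROM public.circulation_loans
GROUP BY loan_status
ORDER BY loan_count DESC;
```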
- Superset
- haven't gone into production with this; have just done some end-of-year reports
- since the custom dashboard is not very standardized, thought Superset might smooth that out a bit
- in Superset, create a dataset, which can be either a SQL query or an uploaded dataset (see the sketch after this section)
- Superset can cache the data, so performance is good
- With the dataset specified, can build charts, and then those can be grouped into a dashboard
- Can build tabular views of the data and expose those through a public URL
- Could use the existing home-grown dashboard to point to Superset
- Could also be cool to open up Superset so other staff could build charts and dashboards from the datasets without any help from developers
- One wrinkle could be launching workflows; not sure if those can be launched from Superset
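For illustration, a SQL-backed Superset dataset over LDP might look something like the sketch below (e.g., as the basis for a loans-per-month chart); the names follow LDP conventions and are assumptions, not a dataset TAMU has built:

```sql
-- Hypothetical virtual dataset: loans per month from LDP.
SELECT
    date_trunc('month',
        json_extract_path_text(data, 'loanDate')::timestamptz)::date AS loan_month,
    count(*) AS loan_count
FROM public.circulation_loans
GROUP BY 1
ORDER BY 1;
```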
- CloudBeaver
- Some people want direct access to data, so they are running CloudBeaver (read-only instances, so not too worried about opening up access)
- Developers can use that to test out a query, and then when it's working can pull that into a report for the home-grown interface
- In addition to CloudBeaver, can also provide access to a VM with DBeaver that has more than just read access, a bit more careful about that
- Even in CloudBeaver, can limit certain people's access to certain databases; not everyone will see everything (see the role sketch below)
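A sketch of the Postgres side of such a setup: a role that can only SELECT, suitable for backing a read-only CloudBeaver connection. The role, database, and schema names are assumptions, not TAMU's configuration:

```sql
-- Hypothetical read-only role for reporting access.
CREATE ROLE ldp_reader LOGIN PASSWORD 'change-me';
GRANT CONNECT ON DATABASE ldp TO ldp_reader;
GRANT USAGE ON SCHEMA public, folio_reporting TO ldp_reader;
GRANT SELECT ON ALL TABLES IN SCHEMA public, folio_reporting TO ldp_reader;
-- Also cover tables recreated by future derived-table runs:
ALTER DEFAULT PRIVILEGES IN SCHEMA folio_reporting
    GRANT SELECT ON TABLES TO ldp_reader;
```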
- Camunda
- building a workflow
- Camunda is open source
- there is also mod-workflow for a generic workflow engine, and then mod-camunda is specific to Camunda
- other people might be using Airflow, etc.
- currently have 17 workflows registered
- can generate incident reports that show what went wrong if something doesn't work (stack trace, etc.)
- there is a visual representation of the workflow's nodes as they execute
- If the workflow is slow enough, you can actually watch the diagram to see what is happening in real time
- A workflow can be triggered by a cron job, but also by other triggers
- workflows are based on BPMN in Camunda
- mod-workflow is a representation of the workflow, then mod-camunda takes the workflow object and turns it into BPMN so Camunda understands it
- how would you build a workflow around purchase orders?
- this is TAMU's most complicated workflow: it logs into FOLIO, completes tasks to gather data, and starts a subroutine that iterates, resulting in new instances in FOLIO
- this isn't a report, it's a workflow, but the home-grown dashboard is basically the only logical place to share the workflow trigger
- When is a report a workflow, rather than just a simple query?
- Open Orders query runs fast enough and returns a small enough result set that people should be able to get it from the browser (executed at time of request; see the sketch after this list)
- Shelflist (Holdings Level) workflow - it takes forever, so running it live is not a good user experience; instead, the user inputs an email address, and that and all of the other parameters get handed off to a workflow
- Workflow is also useful for scheduling reports, even if it's a fast query
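As a sketch of the fast, request-time kind of query, an open-orders pull might look like the following; the table and field names (public.po_purchase_orders, workflowStatus) follow FOLIO/LDP conventions and are assumptions, not TAMU's actual Open Orders report:

```sql
-- Hypothetical open-orders query, small enough to run at request time.
SELECT
    json_extract_path_text(data, 'poNumber') AS po_number,
    json_extract_path_text(data, 'metadata', 'createdDate') AS created_date
FROM public.po_purchase_orders
WHERE json_extract_path_text(data, 'workflowStatus') = 'Open'
ORDER BY po_number;
```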
- What kinds of users do you have? Are you limiting the amount of work people have to do to build their own reports?
- Don't have any pressure to limit, but people do have different appetites for that kind of work; some people really want access to the data, some are intimidated. Superset seems like it would enable people to build their own reports. CloudBeaver also really creates opportunities for people to build their own reports. And there are many environments, so even if the data is a bit old, people can practice a report in the older environments.
- For people who aren't as comfortable, MIS report site is limited by user role, so they don't see everything necessarily. Don't want to overwhelm people.
- Rancher
- have four FOLIO environments, all have LDP updated nightly by cron jobs governed by Rancher
- for SysOps, Rancher is the main interaction point
- have Docker containers
- can look at the clusters for each, check out resources
- this represents VMs(?) in the data cluster that are tagged to run different parts of the FOLIO puzzle
- can even drill down to see what artifacts are running - edge, Elasticsearch, Kafka, and also the LDP cron jobs
- LDP cron jobs are kicked off every night, keep 7 days of history
- if someone reports a problem, can check whether it kicked off or if some error happened
- order: LDP data update (LDP platform, runs against FOLIO pre), ldpmarc incremental, derived tables update
- not everyone is comfortable with command lines, so this is a nice visual interface, can do things with buttons
- can even view the log for a previous execution inside the process list
- as it's running you can see a graph of how many resources it's using
- some artifacts come straight from GitHub, e.g., Docker images
- derived tables are homegrown; they build their own images to bundle the SQL queries, and have their own (non-public) container registry for that (see the sketch below)
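A sketch of the shape such a homegrown derived-table script might take, following the folio-analytics convention of rebuilding tables in the folio_reporting schema on each nightly run; this particular table is an assumption:

```sql
-- Hypothetical derived table, rebuilt nightly.
DROP TABLE IF EXISTS folio_reporting.instance_titles;
CREATE TABLE folio_reporting.instance_titles AS
SELECT
    id AS instance_id,
    json_extract_path_text(data, 'hrid') AS instance_hrid,
    json_extract_path_text(data, 'title') AS title
FROM public.inventory_instances;
```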
- if a module dies, there is a backup module that can do what needs to be done
- FOLIO has 90-something modules now
- LDP production is an actual VM, but all others are containerized
- if you are a developer and want to check in on the box, you can click on the container and see how heavily it is being hit: CPU utilization, memory utilization, disk I/O, etc.
- can also shell into a container as you would with a terminal, and could send a psql command straight to it
- don't normally have to dive in this way (CloudBeaver or DBeaver usually suffice), but if those are down, someone can still get in and check things out
- also useful to check on ownership, which is important for who will have access through the other interfaces.
- just using public instances for PostgreSQL (Crunchy Data); use those for FOLIO production instances as well, managed via Ansible
- pre-production is scaled the same way as production and is running the same number of modules, so it's a good test of what impact a report is going to have on a system
- librarians also like to log into pre to kick off something to see if a data import/export is really going to have the right result before they do it in prod
- same is true for reports and workflows, just check with pre
- mod-camunda and mod-workflow are Docker artifacts running right alongside FOLIO in the cluster
- mod-data-mig handled the migration from Voyager to FOLIO; it also handles nightly user updates from the campus system
- logs help to troubleshoot things like network interruptions
- Any testing with multi-tenant? How would it be set up with LDP?
- TAMU is single tenant, so haven't gotten to test that much
- sometimes start by setting up DIKU, but don't connect that to an LDP
- do follow multi-tenancy standards anyway, so everything is correctly formatted by tenant; it would be compatible with multi-tenancy, but they don't test it
- LDP Query Builder app?
- check it out every so often to see its features, but the result-set limit of 1,000 records is an issue, so while it seems like a good idea, still waiting for more advanced features
- also have reports that need to pull data from multiple data sources, which isn't supported
- think it's a good idea though, and anything that can be brought into the FOLIO sphere is great
- if it could build the queries we need to run, would consider using it
- Any changes for Metadb?
- Not sure yet, excited to find this out
- Do expect to need to rework things; could impact all of the reports (see the illustration below)
- Might be a good opportunity to make a switch to Superset, since other things will be changing as well
- Nassib has been very helpful fine-tuning the LDP databases so they are as optimized as possible, but even with that, each LDP takes about half a day to update
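To illustrate the kind of rework a Metadb move could imply, here is the same instance query in LDP style versus Metadb style; the Metadb names (schema folio_inventory, transformed table instance__t) follow Metadb's documented conventions but are assumptions here:

```sql
-- LDP 1.x style: extract fields from the json "data" column.
SELECT json_extract_path_text(data, 'title') AS title
FROM public.inventory_instances;

-- Metadb style: transformed "__t" tables expose extracted columns directly.
SELECT title
FROM folio_inventory.instance__t;
```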
- K8s jobs - only for import? Or also the database? Or is that part of the Crunchy PG cluster?
- LDP is separate from crunchy PG cluster; it's not as mission critical if it goes down
- the LDP dbs are backed up, but not as highly available as FOLIO
- data update cron jobs are just LDP update jobs; they take data from the running instance of FOLIO and put it in the LDP database... all handled by cron
- Jeremy's workflows that use cron are independent from the K8s cluster; they tend to live on the dev and test machines, where the team has a joint developer VM with the code and the ability to kick off cron jobs
- no workflow crons live in FOLIO cluster, but they could
- one long-term goal is to move FOLIO-related bits of code all into the same organizing "pane of glass"