[UXPROD-949] Rewrite Cornell's Web based LS-Tools Created: 14/Jun/18  Updated: 18/Apr/23  Resolved: 18/Apr/23

Status: Closed
Project: UX Product
Components: None
Affects versions: None
Fix versions: Q4 2019

Type: Epic Priority: P3
Reporter: Hkaplanian Assignee: Unassigned
Resolution: Won't Do Votes: 0
Labels: sysops_mgt
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original estimate: Not Specified

Attachments: PNG File LSTools Features Desc.png     PNG File LSTools Main Screen.png     Microsoft Word LStools priorities with FOLIO issues.xlsx     Microsoft Word LStools_Folio.docx    
Issue links:
Relates
relates to UXPROD-597 Files App Open
relates to UXPROD-868 Bulk Edit In Progress
relates to UXPROD-47 Batch Importer (Bib/Acq) Analysis Complete
relates to UXPROD-120 Bulk edit Inventory records In Progress
relates to UXPROD-594 Scheduler Closed
relates to UX-52 Workflow app Closed
relates to UXPROD-600 Workflow App with To-Do App Integration In Review
Epic Name: Rewrite Cornell's Web based LS-Tools (batch ETL tool)
Score: 14.5
Start date (migrated):
End date:
Epic Color: ghx-label-7

 Description   

LS-Tools = A tool set at Cornell that allows technical services staff to perform library automation tasks through a common web interface.
It is essentially an ETL tool, comprising cleanup, export, and import jobs. Each job has a written specification that describes who owns it, how it is executed, what it needs, etc. The app also allows MARC data to be searched with highly granular selection criteria. It is the Swiss Army knife of technical services activities and has brought a large increase in productivity for technical services staff. Reimplementing it in FOLIO will be a challenge.
Some combination of the workflow engine and the MARC batch loader could replace LS-Tools. The workflow engine is meant to be a platform on which to build these kinds of applications.
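
One way to keep the per-job specification described above machine-readable is a small structured record, so that the same definition can drive both scheduled and manual runs. This is a minimal sketch; the `JobSpec` class, its field names, and the example job are hypothetical and are not drawn from Cornell's actual specifications.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

JOB_KINDS = {"cleanup", "export", "import"}


@dataclass(frozen=True)
class JobSpec:
    """Hypothetical structured form of one LS-Tools job specification."""

    name: str
    owner: str                     # who is responsible for the job
    kind: str                      # one of JOB_KINDS
    cron: Optional[str] = None     # schedule expression; None = manual runs only
    needs: Tuple[str, ...] = ()    # inputs the job depends on (files, queries, ...)

    def __post_init__(self) -> None:
        # Reject kinds outside the three job families named in the description.
        if self.kind not in JOB_KINDS:
            raise ValueError(f"unknown job kind: {self.kind!r}")

    @property
    def is_scheduled(self) -> bool:
        return self.cron is not None


# Hypothetical example: a nightly vendor load that can also be run by hand.
vendor_load = JobSpec(
    name="vendor-record-load",
    owner="Acquisitions",
    kind="import",
    cron="0 2 * * *",
    needs=("vendor file on SFTP drop",),
)
```

Keeping the schedule optional reflects the point that any job can be run either from cron or on demand from the web interface.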



 Comments   
Comment by Chris Manly [ 09/Jul/18 ]

Here is my attempt to capture what LSTools is in a way that will facilitate figuring out what would be needed to support it in FOLIO:

LSTools is a platform for automated library technical services, centered around batch processing of MARC metadata and feeding it into Cornell's installation of Voyager. It was not originally designed as such, but instead grew organically out of a collection of scripts and programs that did library automation tasks. The programmer responsible for those scripts realized that by putting a web interface in front of the automation jobs, they could be invoked by the technical services staff directly. Over time, patterns emerged in the types of jobs being run, and the programmer consolidated disparate scripts into fewer, more capable programs that could be adjusted via parameters for specific situations.

In one way, LSTools can be viewed as a highly specialized ETL tool, in that it extracts, transforms, and loads data. It does not, however, play the role that ETL tools generally do in transforming data to feed a data warehouse.

The most effective way to replicate the functionality of LSTools within FOLIO would be to identify and replicate its generalized capabilities as part of the platform, and then glue them together with an app that provides an optimized UI for managing those capabilities. Those capabilities are:

1. Advanced querying of metadata records. A key component of LSTools is a custom program called 'Harvest', which uses flat-file extracts of binary MARC data from Voyager to perform detailed searches of metadata. It can select records on multiple criteria (including the absence of certain fields, specific subfields, etc.) and then output either a list of record IDs or specified fields from the records as tabular data. Replicating Harvest directly would not be desirable, and is probably not even technically possible: no specification for the original program exists, and previous attempts to re-implement the C program in Perl did not succeed. A modern implementation could likely be built on a Solr or Elasticsearch index with an appropriate configuration.
2. Algorithmic modification of metadata. The "transform" step of each process can be highly variable, if not unique. Ideally, the framework would allow arbitrary code functions to be inserted to manipulate the metadata of the records selected in the first step. Some common cleanup tasks could be predefined, but the ability for technical staff to craft new transformations here is key. This also goes beyond mere batch editing of the sort that can be done with MarcEdit. Rather, it involves making policy/practice decisions up front and encoding those decisions in an automated engine, greatly reducing the time spent on the day-to-day implementation of those practices.
3. Bulk loading of records into the system. Currently, LSTools interfaces with the Voyager BULKIMPORT tool to load MARC records into Voyager. An equivalent capability is needed here.
4. Scheduling of work. Currently, all jobs defined in LSTools can be run either as a cron job or manually via the web interface. This allows regular activity to be pre-scheduled, but also allows one-off runs of a job to be invoked. That is useful for both testing new jobs and for re-running a job that failed.
5. Reporting of results. The success (or failure) of any given job needs to be reported to appropriate staff, along with relevant information about the data handled by the job (number of records, etc.). Currently this is done via e-mail, but could potentially be done within FOLIO. Failures/problems should be escalated automatically for analysis and remediation.
6. Remote file interactions. A common function of LSTools jobs is to fetch metadata from vendors for loading into Voyager, or to send to an external party a set of records that was the result of a Harvest query. The ability to fetch and push data sets via FTP, SFTP, and other relevant protocols is needed.
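
Capabilities 1 and 2 (select records on multiple criteria, including the absence of a field, then run staff-written transforms over the selection) can be illustrated with a minimal in-memory sketch. The record shape, the function names, and the example transform are hypothetical; a real implementation would presumably query a Solr or Elasticsearch index rather than scan a Python list, and would parse MARC with a proper library.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

# Hypothetical in-memory shape for a MARC record:
#   {"id": "123", "fields": [("245", [("a", "Title"), ("c", "Author")]), ...]}
# Subfields are a list of (code, value) pairs because codes may repeat.
Record = Dict[str, Any]
Predicate = Callable[[Record], bool]
Transform = Callable[[Record], Record]


def has_field(tag: str) -> Predicate:
    return lambda rec: any(t == tag for t, _ in rec["fields"])


def lacks_field(tag: str) -> Predicate:
    # Absence of a field is a first-class criterion, as described for Harvest.
    return lambda rec: not any(t == tag for t, _ in rec["fields"])


def has_subfield(tag: str, code: str) -> Predicate:
    return lambda rec: any(
        t == tag and any(c == code for c, _ in subs) for t, subs in rec["fields"]
    )


def select(records: Iterable[Record], *criteria: Predicate) -> List[Record]:
    """Keep records that satisfy every criterion (logical AND)."""
    return [rec for rec in records if all(p(rec) for p in criteria)]


def tabulate(records: Iterable[Record], tag: str, code: str) -> List[Tuple[str, str]]:
    """Output (record id, first tag $code value or "") rows as tabular data."""
    return [
        (
            rec["id"],
            next(
                (v for t, subs in rec["fields"] if t == tag for c, v in subs if c == code),
                "",
            ),
        )
        for rec in records
    ]


# Transform step: staff-written functions registered by name so a job can refer to them.
TRANSFORMS: Dict[str, Transform] = {}


def transform(name: str) -> Callable[[Transform], Transform]:
    def register(fn: Transform) -> Transform:
        TRANSFORMS[name] = fn
        return fn
    return register


@transform("normalize-space-245a")
def normalize_space_245a(rec: Record) -> Record:
    """Hypothetical cleanup: collapse runs of whitespace in 245 $a; returns a new record."""
    fields = [
        (t, [(c, " ".join(v.split()) if (t, c) == ("245", "a") else v) for c, v in subs])
        for t, subs in rec["fields"]
    ]
    return {**rec, "fields": fields}
```

For example, `select(records, has_field("245"), lacks_field("650"))` would find records that have a title but no topical subject heading, `tabulate(...)` would turn such a selection into rows for export, and a job definition could name a registered transform such as `"normalize-space-245a"` instead of embedding the code itself.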

Some of these capabilities may already exist within FOLIO or be on the roadmap. The envisioned Workflow engine might provide the ability to drive jobs based on a schedule or a triggering event, and/or provide escalation response to a failed job. A general-purpose metadata loader could likely serve as the interface for loading records in this context as well. To the extent that these capabilities overlap with existing FOLIO tools, we should make use of existing functionality.
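
The division of responsibilities described above (a schedule, a person, or an event triggers a job; its outcome is always reported; failures are escalated) can be sketched independently of whichever engine ends up providing it. This is a minimal sketch under stated assumptions: `run_job`, `JobReport`, the trigger names, and the `notify`/`escalate` hooks are hypothetical, and nothing here reflects an actual FOLIO Workflow API.

```python
from dataclasses import dataclass
from typing import Callable, Optional

TRIGGERS = {"schedule", "manual", "event"}


@dataclass
class JobReport:
    job: str
    trigger: str                  # what started the run
    ok: bool
    records: int = 0              # number of records handled, when known
    error: Optional[str] = None


def run_job(
    name: str,
    job: Callable[[], int],
    trigger: str,
    notify: Callable[[JobReport], None],
    escalate: Callable[[JobReport], None],
) -> JobReport:
    """Run one job, always report the outcome, and escalate failures.

    `job` returns the number of records it processed. `notify` and `escalate`
    are hypothetical hooks (e-mail today, perhaps a FOLIO workflow task later).
    """
    if trigger not in TRIGGERS:
        raise ValueError(f"unknown trigger: {trigger!r}")
    try:
        report = JobReport(name, trigger, ok=True, records=job())
    except Exception as exc:  # a failed job is reported, not propagated
        report = JobReport(name, trigger, ok=False, error=f"{type(exc).__name__}: {exc}")
    notify(report)
    if not report.ok:
        escalate(report)
    return report
```

Because the same `run_job` path is used whether a cron entry, a staff member, or an event starts the job, a one-off rerun of a failed job is reported exactly like the scheduled run.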

Comment by Ann-Marie Breaux (Inactive) [ 12/Jul/18 ]

In the Batch Loader subgroup meeting today, we talked about LS Tools, Bulk Edit, and OpenRefine. Chris Manly, I linked LS Tools and Bulk Edit, since any development on these apps/services should be done in the context of each other. Christie Thomas, we also need to take scripts into account, and we will need to think about how this fits with the Workflows app.

Comment by Philip Robinson [ 28/Mar/19 ]

Screenshot of main screen in LSTools

Comment by Philip Robinson [ 28/Mar/19 ]

LSTools "Business Analysis" screen describing the system's main functions

Comment by Philip Robinson [ 29/Mar/19 ]

LSTools priorities spreadsheet from Cornell
LStools priorities with FOLIO issues.xlsx

Comment by Philip Robinson [ 29/Mar/19 ]

LSTools / FOLIO critical needs statement from Cornell
LStools_Folio.docx

Comment by Erin Nettifee [ 17/Apr/23 ]

Thomas Trutt, should this be closed?

Comment by Erin Nettifee [ 18/Apr/23 ]

Per Thomas at Cornell, this can be closed.

Generated at Fri Feb 09 00:11:42 UTC 2024 using Jira 1001.0.0-SNAPSHOT#100246-sha1:7a5c50119eb0633d306e14180817ddef5e80c75d.