2024-02-23 Sys Ops & Management SIG Agenda and Meeting notes

Date and time

9-10 CT

https://openlibraryfoundation.zoom.us/j/591934220?pwd=dXhuVFZoSllHU09qamZoZzZiTWhmQT09

Topics

Using ChatGPT with FOLIO

Attendees

TimeItemWhoNotes
5WelcomeIngolf

45Using ChatGPT with FOLIO

Jeremy Nelson

Jeremy will deliver a speech which he has held on several occasions already elsewhere.

One use case that he will describe will be the use of Artificial Intelligence to populate records of a FOLIO instance.


Notes:

https://ai4lam.github.io/catalog-chat/

Jeremy Nelson
Co.Chair of metadata working group
Focus is on AI
Experimenting with ChatGPT
Talk from AI4LAM: (Libraries, Archives, Museums) Conference in Vancouver
Using Py-Script
runs within the webbrowser / a static web site / all interaction happens in the webbrowser
https://ai4lam.github.io/catalog-chat/
https://github.com/AI4LAM/catalog-chat
ChatGPT does a statistical guess (best match) based on your prompted data
ChatGPT takes a MARC record and converts it to a FOLIO instance JSON record.
"Prompt Engineering" - pre-ceding the prompt
A workable proof-of-concept application for FOLIO.
Go from BIBFRAME to MARC, than get it into FOLIO; work in Sinopia.
But we go directly from BIBFRAME to FOLIO.
BIBFRAME doesn't really have an author. But it did produce some valid RDF.
RDF graph to jsonld.
Tod Olson 16:23 :
Would that have been valid BF at the time training data was gathered? or did ChatGPT hallucinate the bf:author?
Jeremy: It's valid RDF, but not valid BIBFRAME. It's a reasonable guess.
3 Workflows. Some functionality which is not available through the web interface of ChatGPT.
Using the folio demo site https://folio-nolana.dev.folio.org to use this. Need ChatGPT API Key.
ChatGPT "Temperature" from 0 to 2. At zero, the prediction is not so good.
System message: "You are an expert cataloger, return any records as FOLIO JSON". Then provide Additional Context (type in metadata of a book).
You can see what is going on in your web browsers analytic tools
Jeremy Nelson 16:36
https://folio-nolana.dev.folio.org/inventory/view/52d2f5b4-dcf2-4534-9e74-aac202b0a6e5
Ian Walls an Alle 16:40
could you automatically prompt ChatGPT with a sample of records from your own FOLIO system? That is, select X records, either at random, based on a CQL query, or by UUID, and prompt with those before the interaction begins?
Root, Jason M 16:43
“Generative” ;)

Jeremy: It's not a lot of code
Tod Olson 16:48
Thank you for this talk, interesting work and will like to hear how this continues to develop and what you learn.

Jeremy: Using RAG (this technique) reduces the amount of hallucination. RAG = Retrieval Automated Generation
There is a Python command line, I can run Python code directly in my web browser.
Its an open source project. Please add issues.

Uwe Reh: The approach to do conversion sounds horrible. It is not semantic, but syntax based.
Catalogue metadata often contain authority records, which are not directly found in the source of your data.
If you have a scan of your frontpage, it does not contain authority records. That would be a clearer use case.
Uwe Reh: How could it be possible to integrate a system which is not a vector database into the GPT ecosystem.
FOLIO is not vector based. World cat would be.
Jeremy: LCSH subject headings is a use case that we have been looking at.
Uwe Reh: Most library systems are not vector based.
Ian Walls 17:01
or, better, we in the library work together to provide an open alternative to OCLC, and make the vector representation one of the features
Uwe: I am doing Research in Lucene Search Engine; Elasticsearch & Solr
ChatGPT uses Vector databases ; data is tokenized and then converted to integers or floats; uses 3D space. Then it makes a best match. This is how ChatGPT works.





--
5Topics for next meetings


Action items

  • Type your task here, using "@" to assign to a user and "//" to select a due date