Jeremy Nelson Co.Chair of metadata working group Focus is on AI Experimenting with ChatGPT Talk from AI4LAM: (Libraries, Archives, Museums) Conference in Vancouver Using Py-Script runs within the webbrowser / a static web site / all interaction happens in the webbrowser https://ai4lam.github.io/catalog-chat/ https://github.com/AI4LAM/catalog-chat ChatGPT does a statistical guess (best match) based on your prompted data ChatGPT takes a MARC record and converts it to a FOLIO instance JSON record. "Prompt Engineering" - pre-ceding the prompt A workable proof-of-concept application for FOLIO. Go from BIBFRAME to MARC, than get it into FOLIO; work in Sinopia. But we go directly from BIBFRAME to FOLIO. BIBFRAME doesn't really have an author. But it did produce some valid RDF. RDF graph to jsonld. Tod Olson 16:23 : Would that have been valid BF at the time training data was gathered? or did ChatGPT hallucinate the bf:author? Jeremy: It's valid RDF, but not valid BIBFRAME. It's a reasonable guess. 3 Workflows. Some functionality which is not available through the web interface of ChatGPT. Using the folio demo site https://folio-nolana.dev.folio.org to use this. Need ChatGPT API Key. ChatGPT "Temperature" from 0 to 2. At zero, the prediction is not so good. System message: "You are an expert cataloger, return any records as FOLIO JSON". Then provide Additional Context (type in metadata of a book). You can see what is going on in your web browsers analytic tools Jeremy Nelson 16:36 https://folio-nolana.dev.folio.org/inventory/view/52d2f5b4-dcf2-4534-9e74-aac202b0a6e5 Ian Walls an Alle 16:40 could you automatically prompt ChatGPT with a sample of records from your own FOLIO system? That is, select X records, either at random, based on a CQL query, or by UUID, and prompt with those before the interaction begins? Root, Jason M 16:43 “Generative” ;)
Jeremy: It's not a lot of code Tod Olson 16:48 Thank you for this talk, interesting work and will like to hear how this continues to develop and what you learn.
Jeremy: Using RAG (this technique) reduces the amount of hallucination. RAG = Retrieval Automated Generation There is a Python command line, I can run Python code directly in my web browser. Its an open source project. Please add issues.
Uwe Reh: The approach to do conversion sounds horrible. It is not semantic, but syntax based. Catalogue metadata often contain authority records, which are not directly found in the source of your data. If you have a scan of your frontpage, it does not contain authority records. That would be a clearer use case. Uwe Reh: How could it be possible to integrate a system which is not a vector database into the GPT ecosystem. FOLIO is not vector based. World cat would be. Jeremy: LCSH subject headings is a use case that we have been looking at. Uwe Reh: Most library systems are not vector based. Ian Walls 17:01 or, better, we in the library work together to provide an open alternative to OCLC, and make the vector representation one of the features Uwe: I am doing Research in Lucene Search Engine; Elasticsearch & Solr ChatGPT uses Vector databases ; data is tokenized and then converted to integers or floats; uses 3D space. Then it makes a best match. This is how ChatGPT works.