2023-07-20 Metadata Management Meeting notes

Date

~ 20-25

Recordings of meetings can be found in the Metadata_Management_SIG > Recordings folder on AWS from 2022 onwards: https://recordings.openlibraryfoundation.org/folio/metadata-management-sig/

Discussion items

Notetaker


Christie Thomas
Announcements


At the August 10 meeting, the Templates for Inventory Record discussion will be continued.

Volunteers needed to help with MM adjacent documentation for the wiki. If you are able to contribute to maintaining the documentation for docs.folio.org, reach out to Laura E Daniels.

The App Interaction SIG is meeting to discuss enhancements with a FOLIO-wide impact. Check out the App Interaction wiki page or the App Interaction Slack channel for updates about meeting times and when these items are scheduled to be discussed.

Entity Management Working Group - there will be an update coming to MM SIG soon. TBD. Entity Management work is wrapping up and there is discussion about whether the group should be disbanded and oversight handed to the MM SIG.

PC update

There will not be a PC meeting this week due to multiple vacations. We will reconvene on Thurs, 7/27!

Searching in Inventory (mod-search)

Several questions around Inventory searching using mod-search have come up on Slack and in Jira tickets. We will go over the different questions and try to find consensus on how search should work in these specific areas.

There is a Slack channel for this discussion: #test-msearch-inventory.

MSEARCH-478

Inventory. Title search (all). Support Phrase search.

Discussion of requirements. Consensus that they accurately represent the expectations for the behavior. More examples were provided.

MSEARCH-549

Remove word stemming and fuzzy logic from title (all) searches - draft

Need to make sure that fuzzy logic and word stemming are removed from title (all) searches and reserved for keyword searches. Suggestion that fuzzy logic could also be enabled in advanced search at the discretion of the user. Fuzzy logic works well for full-text searching, but not with structured data. Alternatively, users could opt in to fuzzy logic or stemming in advanced or title (all) searches via conventions like truncation, with automatic stemming or fuzzy logic applied only in keyword searches. The ticket will go back to draft for more revision.

It was also noted that this is an issue for other fields, not just title (all). Subject was given as an example of a field where an exact match is expected. Identifier searches are expected to have no fuzzy logic or stemming at all.

Question / concern: How can keyword searching, which includes identifiers, be handled when identifier matching should be exact? Maybe it is okay for an identifier search in keyword to not be exact, as long as an explicit search of the identifier index is an exact search.
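To illustrate the concern about stemming on structured fields, here is a minimal Python sketch. It is not how mod-search or OpenSearch analyzes text; `naive_stem` is a hypothetical toy suffix stripper and the titles are invented for the example. It only shows how a stemmed search can return a title that an exact-token search would not.

```python
def naive_stem(word: str) -> str:
    """Toy suffix stripper (hypothetical stand-in for a real stemmer)."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokens(text: str, stem: bool) -> list[str]:
    """Lowercase and split on whitespace, optionally stemming each word."""
    words = text.lower().split()
    return [naive_stem(w) for w in words] if stem else words


def matches(query: str, title: str, stem: bool) -> bool:
    """True when every query token occurs among the title tokens."""
    title_tokens = set(tokens(title, stem))
    return all(t in title_tokens for t in tokens(query, stem))


# Invented sample titles, not real records.
titles = ["Organizing libraries", "The organ builders", "Organs of the body"]

stemmed_hits = [t for t in titles if matches("organs", t, stem=True)]
exact_hits = [t for t in titles if matches("organs", t, stem=False)]
```

In this sketch the stemmed search for "organs" also returns "The organ builders", while the unstemmed search returns only "Organs of the body".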

MSEARCH-567

Remove fuzzy logic and word-stemming from phrase CQL queries in Inventory

See notes from conversation above re: MSEARCH-549. 

MSEARCH-486 (Does anyone have specific examples of records where this is not working as expected?)

Keyword & identifier (all) search not returning results with the leading period

This may no longer be an issue. There will be systematic testing and the ticket will be closed if it is confirmed to no longer be a problem. 

MSEARCH-507

Inventory. Holdings and Item > Search by call number, eye readable should be case insensitive

Consensus is that case insensitivity should be the default for all searching. 
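As a small illustration of the expected behavior, the following Python sketch compares call numbers without regard to case. The function name and the sample call number are hypothetical, and exact equality is used only to keep the example short; it does not describe how mod-search implements the comparison.

```python
def same_call_number(query: str, stored: str) -> bool:
    """Compare two call numbers while ignoring upper/lower case."""
    return query.casefold() == stored.casefold()


# Invented sample values.
lower_matches_upper = same_call_number("pr6056 .a82", "PR6056 .A82")
different_number = same_call_number("pr6056 .a82", "PR6056 .A83")
```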

MSEARCH-512 (This is a known issue in discovery systems as well, e. g.

Inventory. Search on contributor names results in irrelevant noise

What is expected? Is it a phrase search? 

The main issue is that the contributor name is a single string. This complicates searching names by different orders of family name / personal name.

Expectation that it is a Begins with search. 

There are 2 different use cases. One with the way contributor search now works and the second to have an exact search. The default right now is a contains any search, but the default behavior can be changed. 

What about relevancy ranking to make sure that exact matches appear first? One opinion was that ranking by relevancy should be optional; another was that it would not help: if you know what you are looking for, you want an exact match or a begins-with search, not a keyword search. Comment in favor of left-anchored searching for known titles. Example: If I search for "National Geographic" I want results that have "National Geographic" and not anything that has national or geographic.
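The three behaviors discussed here (contains any, phrase, begins with) can be contrasted with a minimal Python sketch. The function names and sample titles are hypothetical and the matching is plain whitespace tokenization, not the analysis mod-search actually performs.

```python
def words(text: str) -> list[str]:
    """Lowercase and split on whitespace."""
    return text.lower().split()


def contains_any(query: str, title: str) -> bool:
    """True when at least one query word occurs in the title."""
    title_words = set(words(title))
    return any(w in title_words for w in words(query))


def contains_phrase(query: str, title: str) -> bool:
    """True when the query words occur together, in order, in the title."""
    q, t = words(query), words(title)
    return any(t[i:i + len(q)] == q for i in range(len(t) - len(q) + 1))


def begins_with(query: str, title: str) -> bool:
    """True when the title starts with the query words (left-anchored)."""
    q = words(query)
    return words(title)[:len(q)] == q


# Invented sample titles.
titles = [
    "National Geographic",
    "National parks of Canada",
    "Geographic information systems",
    "The National Geographic atlas",
]

any_hits = [t for t in titles if contains_any("National Geographic", t)]
phrase_hits = [t for t in titles if contains_phrase("National Geographic", t)]
anchored_hits = [t for t in titles if begins_with("National Geographic", t)]
```

With these sample titles, the contains-any search returns all four, the phrase search returns two, and the left-anchored search returns only "National Geographic".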

Comment in chat that "If I'm doing a tokenized search (not phrase), I would appreciate seeing titles that have term-A and term-B in the same contributor appear before titles where term-A is in one field and term-B is found in another contributor."
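The ranking preference in the chat comment can be sketched in Python as follows. This is one possible reading, not an agreed requirement: the scoring values, function name, and sample records are all hypothetical. Records where every search term falls within a single contributor score higher than records where the terms are spread across different contributors.

```python
def contributor_score(terms: list[str], contributors: list[str]) -> int:
    """2 = all terms in one contributor, 1 = terms split across
    contributors, 0 = at least one term missing entirely."""
    wanted = {t.lower() for t in terms}
    per_field = [set(c.lower().replace(",", " ").split()) for c in contributors]
    if any(wanted <= field for field in per_field):
        return 2
    if wanted <= set().union(*per_field):
        return 1
    return 0


# Invented sample records: title -> contributor strings.
records = {
    "Title A": ["Smith, John", "Doe, Jane"],
    "Title B": ["Doe, John"],
    "Title C": ["Smith, Jane"],
}

scores = {t: contributor_score(["john", "doe"], c) for t, c in records.items()}
ranked = sorted((t for t in records if scores[t] > 0),
                key=lambda t: scores[t], reverse=True)
```

Here "Title B" ranks first because both terms appear in one contributor, "Title A" follows because the terms are split across two contributors, and "Title C" is dropped.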

This is a highly complex topic. We can return to it in a future meeting, along with a demo of what is coming in search for future releases.

It was also suggested to ask about how previous systems have handled the issue. 


Better documentation of search behavior

Question on Slack

Former user (Deleted) 

There is search documentation on the Tips and Tricks page: Search - using Elasticsearch (or OpenSearch)

More technical documentation is available in the mod-search README on GitHub: https://github.com/folio-org/mod-search#readme

Questions arose around:

- how to generate your own indexes
- a description of the way the FOLIO community uses OpenSearch
- the general description of how the OpenSearch software comes out of the box of the "3rd party"

7/20 meeting follow-up question: Is mod-search a module on top of Elasticsearch? - This question will also be addressed after the meeting.

KG responses: 

Thread
test-msearch-inventory  1 day ago

While trying to figure out how to search in FOLIO Inventory, we perused the documentation regarding ElasticSearch.
We came to wonder who exactly is responsible for the way the different search keys "behave", because in the meeting notes of the Shanghai library FOLIO project from Nov 23rd, 2022, we found the following paragraph about the way ElasticSearch/OpenSearch is maintained in the FOLIO community:
"The FOLIO project doesn’t take responsibility for the 3rd party tools. So it doesn’t distribute or deeply integrate with any of those tools. It does have the client which is used to facilitate the integration. But it doesn’t attempt to manage the indexes on the search engine. The tuning of the search engine, how you do your indexes, is not necessarily going to be managed by FOLIO. It’s done externally. The responsibility of the search engine and the tools comes with it to let the hosting providers tune that as needed for different conditions."
(https://folio-org.atlassian.net/wiki/display/FOLIJET/Meeting+Notes?preview=%2F79464218%2F96415976%2FMeeting+Notes_11_23_2022.pdf)
There is also documentation to be found on Github (https://github.com/folio-org/mod-search#api) about the Search API, but it isn't quite clear to us if this is:
- to be used for generating our own indexes
- a description of the way the FOLIO community uses OpenSearch
- the general description of how the OpenSearch software comes out of the box of the "3rd party"
We are also unsure what the role of the FOLIO product owner for ElasticSearch is.
We know there have been changes in the indexing environment, so how does that fit in with the abovementioned statement, that FOLIO does not manage the indexes?
Does anybody have answers for us?



  10 hours ago

@Rita Albrecht We're going to discuss multiple open tickets related to search today and I have added these documentation questions to the agenda as well: https://folio-org.atlassian.net/wiki/x/TD9H

  8 hours ago

@Felix Thanks, we appreciate that and hope for a fruitful discussion!

  15 minutes ago

September 2022: Technical council concluded that the license for Elastic Search was unacceptable to the FOLIO community. And that FOLIO would only support Open Search from now on. See Open Search FAQs for details - https://opensearch.org/faq
OpenSearch is a fork of open source Elasticsearch 7.10. As such, it provides backwards REST APIs for ingest, search, and management. The query syntax and responses are also the same. In addition, OpenSearch can use indices from Elasticsearch versions 6.0 up to 7.10. We also aim to support the existing Elasticsearch clients that work with Elasticsearch 7.10.
Note that while the OpenSearch API is backwards compatible, some clients or tools may include code, such as version checks, that may cause the client or tool to not work with OpenSearch.


  12 minutes ago

I am unaware of any functional differences between OpenSearch and ElasticSearch.

  12 minutes ago

Also, if a self-hosting organization is interested in implementing a different search capability they are able to do that. They can change the search engine and rewrite mod-search to their taste. At which point they have the ability to hack the indexes any way they want.

  6 minutes ago

FOLIO's mod-search module creates and manages indexes. Spitfire is the team that owns this module and Christine and I are the POs for this team.  The README file provides a good overview of the Open Search functionality we use in FOLIO https://github.com/folio-org/mod-search/blob/master/README.md. If you feel there are areas that are unclear or need more details, please create a JIRA ticket and assign to the development team = Spitfire.

Notes from discussion:

Khalilah Gambrell: For search, the team that manages mod-search is responsible for creating new indexes. Khalilah and Christine will work on the requirements and work with the team to decide on and implement changes. - NB: a fuller response will be provided in writing after the meeting via Slack and the meeting notes.

Is this all a topic for a future MM SIG meeting? MM SIG can decide after the responses are posted.

Suggestion: Document how truncation works and how wildcards can be used. Maybe keep a list of issues that need to be documented in GitHub, since the Tips and Tricks page is just for getting started. User documentation is still to come.

Khalilah suggests creating a wiki page with all these questions to start with, then use that to figure out if we want to address some topics at MM meetings. (like MARC+implementers) --Felix will follow up









MM Dashboard with Bulk Edit

Chat: