Fulltext Search (UXPROD-1045)

[UXPROD-1135] Locale-driven search Created: 25/Sep/18  Updated: 16/Sep/20

Status: Open
Project: UX Product
Components: None
Affects versions: None
Fix versions: None
Parent: Fulltext Search

Type: New Feature Priority: P3
Reporter: Jakub Skoczen Assignee: Jakub Skoczen
Resolution: Unresolved Votes: 0
Labels: NFR, suppress-from-capplan
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original estimate: Not Specified

Issue links:
Relates
relates to UXPROD-745 Tenant Sort Order Setting Open
Epic Link: Fulltext Search
Back End Estimate: XL < 15 days
Back End Estimator: Jakub Skoczen
Development Team: Prokopovych
Rank: 5Colleges (Full Jul 2021): R4
Rank: FLO (MVP Sum 2020): R5
Rank: GBV (MVP Sum 2020): R4
Rank: Lehigh (MVP Summer 2020): R4
Rank: U of AL (MVP Oct 2020): R4

 Description   

Companion item to UXPROD-745 (Open), which is about locale-driven sorting. This item will focus on ensuring that locale-specific dictionary configuration is used for searching. It will also include work to ensure that document indexing and matching rules respect locale-specific needs.



 Comments   
Comment by Jakub Skoczen [ 15/Oct/18 ]

Shifting this to Q1 2019 due to lower priority from Chalmers.

Comment by Massoud Alshareef [ 29/Nov/19 ]

Adding to Search Enhancements / Advanced Search conversation.

This is a critical topic for Arabic searching and retrieval in FOLIO. Arabic searching requires stripping prefix and suffix letters from Arabic words before they are indexed, because those letters are attached to the word. For example, in English you write "The Book", where "The" and "Book" are two separate words; when indexed, only "Book" is inserted into the index table, and "The" is likely to be left out since it is a stop word. In Arabic, "The Book" is written "Alkitab الكتاب", which is composed of two words, Al and Kitab, corresponding to The and Book in English. This means there must be some mechanism to separate "Kitab" from its prefix and/or suffix letters first, so that only the core word is indexed. There are more than ten prefixes like "Al" that can be attached to an Arabic word, and sometimes as many as three prefixes precede a word. The same applies to suffixes, which trail Arabic words and must likewise be stripped before indexing, so that when an Arabic word is searched for, it appears in the search results in all of its forms, preceded and/or trailed by any prefix or suffix letters applicable to that word.
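The stripping step described above can be illustrated with a minimal Python sketch of a "light stemmer". Everything in it is an assumption made for illustration: the prefix and suffix lists are short and incomplete, compound prefixes are listed as single units rather than stripped one by one, and the minimum stem length is an arbitrary guard against over-stripping. It is not how FOLIO or any particular search engine implements Arabic analysis.

```python
# Illustrative lists only (assumed, incomplete); a real analyzer needs a vetted set.
# Longer prefixes come first so that compound forms such as "wa" + "al" are
# matched before the bare definite article.
PREFIXES = ["وال", "بال", "كال", "فال", "لل", "ال", "و"]
SUFFIXES = ["ها", "ات", "ون", "ين"]

# Assumed lower bound on what must remain after stripping.
MIN_STEM_LENGTH = 3


def light_stem(word: str) -> str:
    """Strip at most one listed prefix and one listed suffix from a word."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM_LENGTH:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LENGTH:
            word = word[: -len(suffix)]
            break
    return word


def index_terms(text: str) -> list[str]:
    """Split on whitespace and stem each token, as an indexer might before storing terms."""
    return [light_stem(token) for token in text.split()]
```

With this sketch, the same function would be applied to both the indexed text and the search query, so that "Alkitab" and "Kitab" reduce to the same index term.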

Based on our 20 years of experience with ILSs such as Unicorn/Symphony and Koha, searching for Arabic words and phrases requires a full-text retrieval (FTR) engine for these words to be handled properly. FTR engines such as BRS (Unicorn/Symphony), Solr (Sierra and VuFind), Zebra (Koha), and now ElasticSearch (Koha) have been popular in the library ILS business for a long time. In fact, the only new addition in Sierra that differentiates it from Millennium is Apache Solr via Encore. Therefore, all new versions of the global ILSs are building their search and retrieval capabilities on FTR engines.

FTR engines provide a very easy way to define prefix and suffix letters, and infix letters if needed, for a language using the schema approach. We have been doing this with Solr and Elastic FTR engines for many years now, supporting DSpace IR, VuFind Federated Search and Discovery, and Koha ILS. To my knowledge, EBSCO EDS's unique support for Arabic searching and retrieval is also due to its use of Apache Solr. We look forward to seeing Solr and/or ElasticSearch readily integrable with FOLIO core indexing as an optional feature for libraries to use.
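As a rough illustration of the schema approach mentioned above, the following Python sketch builds an Elasticsearch index definition that applies Arabic stop-word removal, normalization, and stemming through the engine's built-in token filters. The analyzer name `rebuilt_arabic` and the `title` field are placeholders chosen for this example, and nothing here reflects an actual FOLIO index configuration.

```python
import json


def arabic_index_settings() -> dict:
    """Return an index definition with a custom Arabic analyzer (illustrative only)."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # Built-in Arabic stop-word list.
                    "arabic_stop": {"type": "stop", "stopwords": "_arabic_"},
                    # Built-in Arabic stemmer, which handles common affixes.
                    "arabic_stemmer": {"type": "stemmer", "language": "arabic"},
                },
                "analyzer": {
                    "rebuilt_arabic": {
                        "tokenizer": "standard",
                        "filter": [
                            "lowercase",
                            "decimal_digit",
                            "arabic_stop",
                            "arabic_normalization",
                            "arabic_stemmer",
                        ],
                    }
                },
            }
        },
        "mappings": {
            "properties": {
                # "title" is a hypothetical field name.
                "title": {"type": "text", "analyzer": "rebuilt_arabic"}
            }
        },
    }


if __name__ == "__main__":
    # Print the JSON body that would be sent when creating the index.
    print(json.dumps(arabic_index_settings(), ensure_ascii=False, indent=2))
```

The point of the sketch is that affix handling lives in declarative analyzer configuration rather than in application code, so a library could adjust it per language without changing the indexing service.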

Massoud.
