[UXPROD-1045] Fulltext Search Created: 28/Aug/18  Updated: 13/Apr/23  Resolved: 27/Jun/19

Status: Closed
Project: UX Product
Components: None
Affects versions: None
Fix versions: None

Type: Epic Priority: P3
Reporter: Jakub Skoczen Assignee: Jakub Skoczen
Resolution: Duplicate Votes: 1
Labels: NFR, suppress-from-capplan
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original estimate: Not Specified

Issue links:
Blocks
is blocked by FOLIO-1246 Implement Postgres Full Text Search f... Closed
Relates
relates to UXPROD-745 Tenant Sort Order Setting Open
relates to UXPROD-746 Performance Closed
relates to UXPROD-869 Advanced Search (within apps) Closed
relates to UXPROD-1015 Boolean/Query Search for Users Draft
Epic Name: Fulltext Search
Back End Estimate: XXL < 30 days
Estimation Notes and Assumptions: Includes only backend estimate for remaining work. Front-end estimates are within specific epics.
Start date (migrated):
End date:
Epic Color: ghx-label-1

 Description   

Integrated solution to provide advanced full-text search and sort capabilities across base apps, including:

  • High performance on large result sets (millions of records)
  • Language/locale specific stemming and collation
  • Efficient filters and facets
  • Boolean search support

The solution will also provide a foundation for potential future features:

  • Relevancy ranking
  • Spell checking, synonyms and "did you mean" suggestions


 Comments   
Comment by Massoud Alshareef [ 13/Feb/19 ]

Providing the full-text search feature in FOLIO is critical for Arabic search and retrieval. Arabic search requires stripping prefix and suffix letters from Arabic words before they are indexed, because those letters are attached to the word. For example, in English you write "The Book", where "The" and "Book" are two separate words; when they are indexed, only "Book" goes into the index, and "The" is usually left out because it is a stop word. In Arabic, "The Book" is written "Alkitab الكتاب", which is composed of two words joined into one: Al and Kitab, representing The and Book in English. This means there must be a mechanism to separate "Kitab" from its prefix letter(s) and/or suffix letter(s) first, so that only the core word gets indexed. There are more than ten prefixes like "Al" that can be attached to an Arabic word, and as many as three prefixes can precede the same word. The same applies to suffix letters that follow Arabic words: they must be stripped before indexing, so that when an Arabic word is searched for, it appears in the search results in all of its forms, preceded and/or followed by any prefixes or suffixes applicable to that word.
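A toy illustration of the affix-stripping step described above, in Python. This is a minimal sketch only: the prefix and suffix lists are a small hypothetical subset chosen for illustration (not a complete inventory of Arabic affixes), the three-letter minimum stem length is an assumption, and `light_stem`, `build_index` and `search` are hypothetical helper names, not part of FOLIO, Solr, Elasticsearch or Postgres. It shows why indexing the stem rather than the surface form lets a query for كتاب match الكتاب, والكتاب and كتابها.

```python
import re
from collections import defaultdict

# Hypothetical, illustrative affix lists (an assumption for this sketch),
# tried longest first so that "وال" wins over "و".
PREFIXES = sorted(["وال", "بال", "كال", "فال", "ال", "لل", "و"], key=len, reverse=True)
SUFFIXES = sorted(["ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي"], key=len, reverse=True)
MIN_STEM = 3  # assumed: never strip an affix if fewer than 3 letters would remain

DIACRITICS = re.compile("[\u064B-\u0652\u0640]")  # short-vowel marks and tatweel
ALEF_VARIANTS = str.maketrans("أإآ", "ااا")       # map hamza/madda alef forms to bare alef


def normalize(word: str) -> str:
    """Remove diacritics and tatweel, then unify alef variants."""
    return DIACRITICS.sub("", word).translate(ALEF_VARIANTS)


def light_stem(word: str) -> str:
    """Strip at most one prefix and one suffix, longest match first."""
    word = normalize(word)
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            word = word[: -len(suffix)]
            break
    return word


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Inverted index keyed by stem instead of by the word as written."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[light_stem(token)].add(doc_id)
    return index


def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Stem the query the same way, then look it up."""
    return set(index.get(light_stem(query), set()))


if __name__ == "__main__":
    docs = {1: "الكتاب الجديد", 2: "قرأت كتابها", 3: "والكتاب على الطاولة"}
    idx = build_index(docs)
    print(sorted(search(idx, "كتاب")))  # all three documents contain a form of كتاب
```

Stripping at most one prefix and one suffix (a "light" stemming approach) keeps the sketch conservative: the minimum stem length stops it from eating into short words such as ورق ("paper"), whose leading و is part of the root. Production analyzers typically use larger rule sets and fuller normalization.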

Based on our 20+ years of experience with ILSs such as Unicorn/Symphony and Koha, searching for Arabic words and phrases requires a full-text retrieval engine (FTR) for these words to be handled properly. FTRs such as BRS (Unicorn/Symphony), Solr (Sierra and VuFind), Zebra (Koha) and now Elasticsearch (Koha) have been popular in the library ILS business for a long time. In fact, the only new feature that differentiates Sierra from Millennium is the addition of Apache Solr via Encore, yet they like to call it an LSP! In short, almost all global ILSs build their search and retrieval capabilities on FTR engines.

FTRs provide a very easy way to define prefix and suffix letters, and infix letters if needed, to support word stemming for a language through the FTR schema model. We have been doing this with Solr and Elasticsearch for many years now, supporting DSpace IR, VuFind federated search and discovery, and Koha ILS. To my knowledge, EBSCO EDS's distinctive support for Arabic search and retrieval is also due to its use of Apache Solr.

We hope to see Solr and/or Elasticsearch easy to integrate with FOLIO core indexing as an optional feature for libraries to use.

All the best,
Massoud AlShareef
KnowledgeWare Technologies Est.

Comment by Jakub Skoczen [ 27/Jun/19 ]

Closing this Epic, as future items of this kind will be grouped in a broader "Platform and DevOps" epic.

Generated at Fri Feb 09 00:12:30 UTC 2024 using Jira 1001.0.0-SNAPSHOT#100246-sha1:7a5c50119eb0633d306e14180817ddef5e80c75d.