[UXPROD-1045] Fulltext Search Created: 28/Aug/18 Updated: 13/Apr/23 Resolved: 27/Jun/19 |
|
| Status: | Closed |
| Project: | UX Product |
| Components: | None |
| Affects versions: | None |
| Fix versions: | None |
| Type: | Epic | Priority: | P3 |
| Reporter: | Jakub Skoczen | Assignee: | Jakub Skoczen |
| Resolution: | Duplicate | Votes: | 1 |
| Labels: | NFR, suppress-from-capplan |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original estimate: | Not Specified |
| Issue links: |
|
| Epic Name: | Fulltext Search |
| Back End Estimate: | XXL < 30 days |
| Estimation Notes and Assumptions: | Includes only backend estimate for remaining work. Front-end estimates are within specific epics. |
| Start date (migrated): | |
| End date: | |
| Epic Color: | ghx-label-1 |
| Description |
|
Integrated solution to provide advanced full text search and sort abilities across base apps, including:
The solution will also provide a foundation for potential future features:
|
| Comments |
| Comment by Massoud Alshareef [ 13/Feb/19 ] |
|
Providing full-text search in FOLIO is critical for Arabic search and retrieval. Arabic searching requires stripping prefix and suffix letters from words before they are indexed, because those letters are attached to the word. For example, in English you write "The Book", where "The" and "Book" are two separate words; only "Book" goes into the index, while "The" is likely to be left out because it is a stop word. In Arabic, "The Book" is written "Alkitab الكتاب", which is composed of two words joined into one: Al and Kitab, representing The and Book in English. This means there must be some mechanism to separate "Kitab" from its prefix and/or suffix letters first, so that only the core word gets indexed. More than ten prefixes like "Al" can be attached to an Arabic word, and as many as three prefixes can precede the same word. The same applies to suffix letters trailing Arabic words: they must be stripped off before the word is indexed, so that a search for an Arabic word returns it in all of its forms, preceded and/or trailed by any of the prefix or suffix letters applicable to that word.
Based on our 20+ years of experience with ILSs such as Unicorn/Symphony and Koha, searching for Arabic words and phrases requires a full text and retrieval (FTR) engine to handle these words properly. FTR engines such as BRS (Unicorn/Symphony), Solr (Sierra and VuFind), Zebra (Koha), and now ElasticSearch (Koha) have long been popular in the library ILS business. In fact, the only new addition in Sierra that differentiates it from Millennium is Apache Solr via Encore, but they like to call it an LSP!
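A minimal Python sketch of the prefix and suffix stripping described above, offered only as an illustration. The prefix and suffix lists, the minimum stem length, and the function names `light_stem`, `build_index`, and `search` are all assumptions made for this example; they are far from exhaustive and are not taken from FOLIO, Solr, or ElasticSearch.

```python
from collections import defaultdict

# Illustrative (not exhaustive) affix lists; assumptions for this sketch only.
# Longer prefixes come first so that "wal-" is tried before "al-".
PREFIXES = ("وال", "بال", "كال", "فال", "لل", "ال")
SUFFIXES = ("ات", "ون", "ين", "ها", "ة")
MIN_STEM_LEN = 3  # assumed guard against stripping a word down to nothing


def light_stem(word):
    """Strip at most one known prefix and one known suffix from a word."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM_LEN:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            word = word[:-len(suffix)]
            break
    return word


def build_index(docs):
    """Map each stemmed token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[light_stem(token)].add(doc_id)
    return index


def search(index, query):
    """Stem the query the same way as the indexed tokens, then look it up."""
    return index.get(light_stem(query), set())


# Three one-word "documents": "the book", "and the book", "pen".
docs = {1: "الكتاب", 2: "والكتاب", 3: "قلم"}
index = build_index(docs)
matches = search(index, "كتاب")  # query is the bare word "book"
```

Because both the indexed tokens and the query pass through the same `light_stem` function, a search for the bare word matches documents that contain it with an attached prefix. A production stemmer would also need normalization and a much larger affix inventory.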
Therefore, almost all global ILSs build their search and retrieval capabilities on FTR engines. FTR engines provide a very easy way to define prefix and suffix letters, and infix letters if needed, to support word stemming for a language through the FTR schema model. We have been doing this with Solr and Elastic for many years now, supporting DSpace IR, VuFind Federated Search and Discovery, and Koha ILS. To my knowledge, the unique support for Arabic searching and retrieval in EBSCO EDS is also due to its use of Apache Solr. We hope to see Solr and/or ElasticSearch readily integrable with FOLIO core indexing as an optional feature for libraries to use. All the best, |
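To illustrate what defining Arabic stemming through an engine's schema can look like, here is a hedged Python sketch that builds an ElasticSearch index body as a plain dictionary. The `stop` filter with `_arabic_` stopwords, the `arabic_normalization` filter, and the `stemmer` filter with language `arabic` are standard ElasticSearch analysis components; the analyzer name `title_arabic`, the field name `title`, and the helper `settings_json` are hypothetical names chosen for this example and say nothing about how FOLIO would actually integrate an engine.

```python
import json

# Hypothetical index body; analyzer and field names are illustrative only.
ARABIC_INDEX_BODY = {
    "settings": {
        "analysis": {
            "filter": {
                # Drops Arabic stop words using the engine's built-in list.
                "arabic_stop": {"type": "stop", "stopwords": "_arabic_"},
                # Strips common Arabic prefixes and suffixes.
                "arabic_stemmer": {"type": "stemmer", "language": "arabic"},
            },
            "analyzer": {
                "title_arabic": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase",
                        "decimal_digit",
                        "arabic_stop",
                        "arabic_normalization",
                        "arabic_stemmer",
                    ],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            # The assumed "title" field is analyzed with the chain above.
            "title": {"type": "text", "analyzer": "title_arabic"}
        }
    },
}


def settings_json(body=ARABIC_INDEX_BODY):
    """Serialize the index body as JSON, keeping non-ASCII text readable."""
    return json.dumps(body, ensure_ascii=False, indent=2)
```

The serialized body could be sent when creating an index. Stop-word removal runs before normalization and stemming here so that stop words are matched in their original written form.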
| Comment by Jakub Skoczen [ 27/Jun/19 ] |
|
Closing this Epic, as future items of this kind will be grouped in a broader "Platform and DevOps" epic. |