Scope
In October 2020, the Technical Council recommended investigating a possible Elasticsearch implementation to address the poor performance of the inventory search. The scope was determined to include the following deliverables for the Iris release:
Back-end:
- Send update notification messages from inventory and source record storage (SRS)
- Provide Inventory and SRS APIs for fetching, by ID, the views used for indexing
- Extract a common library for use in other modules
- Build infrastructure necessary to support Elasticsearch
...
- Add an Elasticsearch cluster to CI/CD and set it up on environments (k8s conf)
- Check configuration of existing Kafka cluster
In December 2020, it was determined that Elasticsearch is not the right tool for querying MARC records. SRS search has been removed from the scope of the Elasticsearch POC and has become a separate feature (UXPROD-2791).
Delivered functionality
Work delivered by the Falcon team as part of the Iris release includes:
Back-end:
- Implemented add/update/delete message publication to Kafka in mod-inventory-storage
- Built Search APIs for searching and faceting
- Combined instances + holding + items into a single index
- Implemented re-index process for existing inventory DB
- Spring-based implementation that supports:
- Up to five language-specific analyzers configured on the tenant level
- Near real-time inserts, updates and deletions
- Boolean operators (AND, OR, NOT)
- Nested search using brackets
- All or Any keyword search
- Exact phrase search
- Left- and right-hand truncation, wildcard searches in some fields
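The Boolean, nested, and all/any features listed above can be illustrated with a minimal sketch that assembles query strings in the same form as the queries shown in the performance comparison table (for example `keyword all "April" sortby title`). The helper names `term`, `group`, and `search_params` are hypothetical and are not part of mod-search; only the `query`, `limit`, and `offset` parameters are taken from the examples in this document.

```python
from typing import Optional
from urllib.parse import urlencode


def term(index: str, relation: str, value: str) -> str:
    """Render one clause, e.g. keyword all "April" (helper name is hypothetical)."""
    # Escape backslashes and double quotes so the value stays a single quoted string.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{index} {relation} "{escaped}"'


def group(operator: str, *clauses: str) -> str:
    """Join clauses with a Boolean operator and wrap them in brackets for nesting."""
    if operator not in {"and", "or", "not"}:
        raise ValueError(f"unsupported operator: {operator}")
    return "(" + f" {operator} ".join(clauses) + ")"


def search_params(query: str, sort_by: Optional[str] = None,
                  limit: int = 100, offset: int = 0) -> str:
    """URL-encode a query together with the paging parameters used in the examples."""
    if sort_by:
        query = f"{query} sortby {sort_by}"
    return urlencode({"query": query, "limit": limit, "offset": offset})


# A nested query: either keyword matches, restricted to one source.
nested = group(
    "and",
    group("or", term("keyword", "all", "agency"), term("keyword", "any", "bill set")),
    "source=FOLIO",
)
```

Here `nested` evaluates to `((keyword all "agency" or keyword any "bill set") and source=FOLIO)`, and passing it to `search_params` yields a query string that could be appended to a search endpoint URL.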
Front-end:
Due to the rigid structure of the existing Inventory app's Search component, it was not possible to make any changes that would allow for switching between the PostgreSQL and Elasticsearch implementations. To provide users with an easy way to evaluate the back-end work, we built an alternative UI (Inventory ES app) that allowed non-technical users to compare the behavior of the existing search with that of the search powered by Elasticsearch. The new UI introduced:
- New UI components for advanced search that include:
- Auto-resized textbox
- Supported fields and operators auto-suggestion
- Boolean operators support
- Nested search using brackets
- New UI components for filters and facets
- Default results sort by ranking
- Preserved other non-search related Inventory app functionality
Infrastructure:
- Added Elasticsearch cluster to CI/CD and set it up on the reference environments
- Updated existing Kafka cluster configuration
- Introduced the option of setting up a performance testing environment in the community
...
Instance | Holdings | Items | |
Keyword search (title, contributor, identifier) | Keyword search (title, contributor, identifier) | Keyword search (title, contributor, identifier) | |
Contributors | ISBN | Barcode | |
Title (all) | ISSN | ISBN | |
Identifiers (all) | Call Number | ISSN | |
ISBN | Holdings HRIDMaterial type | Call Number | |
ISSNCall | NumberItem HRID | ||
Subject | Item HRID | ||
Instance HRID | |||
Instance UUID | |||
Notes (public)* | |||
Electronic access (all fields) |
...
Effective location | Effective location (item) | Item Status |
Language | Holdings permanent location | Effective location |
Resource type | Suppress from discovery | Holdings permanent location |
Format* | Tags | Material type |
Mode of issuance | Suppress from discovery | |
Nature of content | Tags | |
Staff suppress | ||
Suppress from discovery | ||
Date created (from, to)* | ||
Date updated (from, to)* | ||
Source | ||
Tags |
*Back-end only
POC evaluation results
Evaluation of the POC took place from April 5 to April 9, 2021, and was conducted in the Bugfest environment (~8 million records) by eight librarians representing:
...
Almost the entire evaluation was done through the UI, and 75% of those who participated found the POC successful. All participants, however, suggested some improvements. The team addressed the following reported issues:
Issue | Solution |
Noisy search results | Implemented searches supporting keyword “all” or “any” limiting the number of matches: MSEARCH-91 |
Expected results not found | All provided examples were related to the special characters in the Title that were searched using ASCII representation. The problem will be addressed in scope of MSEARCH-67 |
Bug in sorting by title | |
Support phrase search | |
Ranking refinement | Refinement of the default ranking system will require further analysis to be in the scope of a separate feature |
Discrepancy in saving UUIDs from Action menu | MSEARCH-93 and UISEES-58 |
UI enhancements and bug fixes | UISEES-47, UISEES-57, UISEES-61, UISEES-62, UISEES-48, UISEES-49 |
Two evaluators determined that the POC did not meet their expectations and provided the following reasons:
- Expected to perform complex queries of multiple fields and across record types (including MARC fields)
- Expected a different UI, more like a catalog or discovery system advanced search
- Expected support for additional operators (not equal to, starts with, etc.)
- UI not user friendly
- Preferred a simple left-anchored search to the provided relevancy ranking
Search performance comparison
The table below compares response times for the same query executed in the Inventory app and the Inventory ES app in an environment with 8 million instances:
Query | mod-inventory (PostgreSQL), s | Results Count | mod-search (Elasticsearch), s | Results Count |
---|---|---|---|---|
keyword all "April" sortby title&limit=100&offset=0 | 4 | 37408 | 1 | 41268 |
*keyword all "April" sortby title&limit=100&offset=1001 | 5 | 37408 | 1 | 41268 |
keyword all "agency" and source=FOLIO sortby title&limit=100&offset=0 | 3.5 | 1000 | 0.8 | 3536 |
keyword all "bill" sortby title&limit=100&offset=0 | 5 | 50149 | 0.6 | 60992 |
keyword all "set" sortby title&limit=100&offset=0 | 7 | 307 | 0.8 | 156751 |
For all Elasticsearch queries, after a query is called the first time, every subsequent call of the same query takes less than 250 ms (due to Elasticsearch out-of-the-box caching).
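A minimal sketch of how the first-call versus subsequent-call timing could be checked is shown below. It is not the procedure used for the measurements above: `time_repeated` and `warm_calls_within` are hypothetical helpers, and `run_query` stands in for whatever function issues the search request in a given environment. The 0.25 s default threshold mirrors the 250 ms figure quoted above.

```python
import time
from typing import Callable, List, Sequence


def time_repeated(run_query: Callable[[], object], repeats: int = 4) -> List[float]:
    """Run the same query several times and return each call's duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()  # hypothetical callable that executes one search request
        durations.append(time.perf_counter() - start)
    return durations


def warm_calls_within(durations: Sequence[float], threshold_s: float = 0.25) -> bool:
    """Check that every call after the first finished under the threshold."""
    return all(d < threshold_s for d in durations[1:])
```

The first duration is deliberately excluded from the check, because that call is the one expected to populate the cache.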
Average time
Examples are taken from PERF-44.
Proposed next steps
- Incorporate UI components created in scope of POC into Stripes components
- Redesign Inventory UI Search component so that it can include new UI components created by POC, especially filters and facets
- Conduct usability study for advanced search textbox
- Use mod-search endpoints for searching
- Conduct analysis of ranking refinements (weights and boosts)
- Conduct analysis of further search refinements
- Define and prioritize work for cross app/cross record types searches
- Define UI for cross app/cross record types searches
- Define requirements for cross-tenant searches
Additional links
- TC Meeting on October 14, 2020
- UXPROD-2791 SRS MARC Query API
- Search – Technical documentation
- POC Evaluation survey
- Detailed responses to raised issues
- POC Overview for MM SIG (slides) and (recording)
- 110-111 Sprints Demo
- Supported search types