Inventory Search - POC Overview

Scope

To address the poor performance of the inventory search, the Technical Council recommended investigating a possible Elasticsearch implementation. The scope of the work was determined to include the following deliverables for the Iris release:

Back-end:

  • Sending update notification messages from Inventory and source record storage (SRS)
  • Providing Inventory and SRS APIs for fetching, by ID, the record views to be indexed
  • Extracting a common library for use in other modules
  • Building the infrastructure necessary to support Elasticsearch
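
One way to read the first two back-end deliverables is as a notify-then-fetch flow: a notification carries only a record ID, and the indexer fetches the current view by ID before writing it to the index. The sketch below illustrates that reading with an in-memory dictionary standing in for the Elasticsearch index. The names `ChangeEvent`, `Indexer`, `fetch_view` and the event kinds are hypothetical and are not taken from the FOLIO modules.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class ChangeEvent:
    kind: str       # hypothetical event kinds: "CREATE", "UPDATE" or "DELETE"
    record_id: str  # the notification carries only the ID, not the record


class Indexer:
    """Applies change notifications to an in-memory stand-in for the index."""

    def __init__(self, fetch_view: Callable[[str], Optional[dict]]):
        # fetch_view stands in for a "fetch view for indexing by ID" API call.
        self._fetch_view = fetch_view
        self.index: Dict[str, dict] = {}

    def handle(self, event: ChangeEvent) -> None:
        if event.kind == "DELETE":
            self.index.pop(event.record_id, None)
            return
        view = self._fetch_view(event.record_id)
        if view is None:
            # The record disappeared between notification and fetch.
            self.index.pop(event.record_id, None)
        else:
            # Creates and updates are both handled as an upsert.
            self.index[event.record_id] = view


# Usage with a dictionary acting as the record storage.
storage = {"id-1": {"title": "April in Paris"}}
indexer = Indexer(storage.get)
indexer.handle(ChangeEvent("CREATE", "id-1"))
storage["id-1"] = {"title": "April in Paris (2nd ed.)"}
indexer.handle(ChangeEvent("UPDATE", "id-1"))
```

Fetching by ID at indexing time means a late or duplicated notification still results in the latest stored state being indexed, at the cost of one extra read per event.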

Front-end:

  • Update Stripes Components to support the new Search API
  • Provide a “switch” to allow using either Elasticsearch or the existing search

Infrastructure:

  • Add an Elasticsearch cluster to CI/CD and set it up on the environments (k8s configuration)
  • Check the configuration of the existing Kafka cluster

In December 2020, it was determined that Elasticsearch is not the right tool for querying MARC records. SRS search was removed from the scope of the Elasticsearch POC and became a separate feature (UXPROD-2791).

Delivered functionality

The work delivered by the Falcon team as part of the Iris release includes:

Back-end:

  • Sending add/update/delete notification messages from Inventory
  • Built Search APIs for searching and faceting
  • Combined instances + holding + items into a single index
  • Implemented re-index process for existing inventory DB
  • Spring-based implementation that supports:
    • Up to five language-specific analyzers configured on the tenant level
    • Near real-time inserts, updates and deletions
    • Boolean operators (AND, OR, NOT)
    • Nested search using brackets
    • All or Any keyword search
    • Exact phrase search
    • Left- and right-hand truncation and wildcard searches in some fields
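
The search capabilities listed above can be illustrated with a few query strings. The sketch below composes them in the style of the queries shown in the performance comparison on this page (`keyword all "..."`, `source=FOLIO`). The helper functions, the use of `==` for an exact phrase, the `title` search index, the `*` truncation character and the `MARC` source value are assumptions made for illustration and may not match what mod-search accepts.

```python
def clause(index: str, relation: str, value: str) -> str:
    """Build one query clause, e.g. keyword all "April"."""
    # Escape backslashes first, then double quotes, so the value stays one quoted term.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{index} {relation} "{escaped}"'


def group(operator: str, *clauses: str) -> str:
    """Join clauses with a Boolean operator (and, or, not) and wrap them in brackets."""
    return "(" + f" {operator} ".join(clauses) + ")"


# "All" requires every word to match, "any" requires at least one.
all_words = clause("keyword", "all", "civil war")
any_word = clause("keyword", "any", "civil war")

# Exact phrase and right-hand truncation (relation and wildcard are assumed).
phrase = clause("title", "==", "civil war")
truncated = clause("title", "all", "econom*")

# Nested search using brackets, combined with Boolean operators.
nested = group(
    "and",
    clause("keyword", "all", "agency"),
    group("or", clause("source", "==", "FOLIO"), clause("source", "==", "MARC")),
)
print(nested)
# prints: (keyword all "agency" and (source == "FOLIO" or source == "MARC"))
```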

Front-end:

Due to the rigid structure of the existing Inventory search component, it was not possible to make changes that would allow switching between the PostgreSQL and Elasticsearch implementations. To give users a way to evaluate the back-end work, we built an alternative UI (the Inventory ES app) that allowed non-technical users to compare performance between the existing search and the search powered by Elasticsearch. The new UI introduced:

  • New UI components for advanced search that include:
    • Auto-resized textbox
    • Auto-suggestion of supported fields and operators
    • Boolean operators support
    • Nested search using brackets
  • New UI components for filters and facets
  • Default results sort by ranking
  • Preserved other non-search related Inventory app functionality

Infrastructure:

  • Added Elasticsearch cluster to CI/CD and set it up on the reference environments
  • Updated existing Kafka cluster configuration
  • Introduced the option of setting up a performance testing environment in the community


As a result, the following search options and filters are supported:

Search options:

Instance:

  • Keyword search (title, contributor, identifier)
  • Contributors
  • Title (all)
  • Identifiers (all)
  • ISBN
  • ISSN
  • Subject
  • Instance HRID
  • Instance UUID
  • Notes (public)*
  • Electronic access (all fields)

Holdings:

  • Keyword search (title, contributor, identifier)
  • ISBN
  • ISSN
  • Call Number
  • Holdings HRID

Items:

  • Keyword search (title, contributor, identifier)
  • Barcode
  • ISBN
  • ISSN
  • Call Number
  • Item HRID

Filters and facets:

Instance:

  • Effective location
  • Language
  • Resource type
  • Format*
  • Mode of issuance
  • Nature of content
  • Staff suppress
  • Suppress from discovery
  • Date created (from, to)*
  • Date updated (from, to)*
  • Source
  • Tags

Holdings:

  • Effective location (item)
  • Holdings permanent location
  • Suppress from discovery
  • Tags

Items:

  • Item Status
  • Effective location
  • Holdings permanent location
  • Material type
  • Suppress from discovery
  • Tags

*Back-end only

POC evaluation results

The evaluation of the POC took place from April 5 to April 9, 2021. It was conducted in the Bugfest environment (~8 million records) by eight librarians representing:

  • Chicago University (2 participants)
  • Duke University
  • Missouri State University (2 participants)
  • Simmons University
  • EBSCO
  • Index Data

Almost the entire evaluation was done through the UI, and 75% of the participants found the POC successful. All participants, however, saw room for improvement in both the front end and the back end. The team addressed the following reported issues:

  • Issue: Noisy search results
    • Solution: Implemented searches supporting the “all” or “any” keywords to limit the number of matches: MSEARCH-91
  • Issue: Expected results not found
    • Solution: All provided examples were related to special characters in the Title that were searched using their ASCII representation. The problem will be addressed in the scope of MSEARCH-67
  • Issue: Bug in sorting by title
    • Solution: MSEARCH-99
  • Issue: Support phrase search
    • Solution: MSEARCH-92
  • Issue: Ranking refinement
    • Solution: Refinement of the default ranking system requires further analysis, which will be in the scope of a separate feature
  • Issue: Discrepancy in saving UUIDs from Action menu
    • Solution: MSEARCH-93 and UISEES-58
  • Issue: UI enhancements and bug fixes
    • Solution: UISEES-47, UISEES-57, UISEES-61, UISEES-62, UISEES-48, UISEES-49


The evaluators who deemed the POC a failure gave the following reasons:

  • Expected to perform complex queries of multiple fields and across record types (including MARC fields)
  • Expected a different UI more like a catalog or discovery system advanced search
  • Expected support for additional operators (not equal to, starts with, etc.)
  • UI was not user-friendly
  • Preferred a simple left-anchored search over the provided relevancy ranking

Search performance comparison

query
mod-inventory (postgres), s
mod-search (elasticsearch), s

keyword all "April" sortby title&limit=100&offset=0

437408141268

*keyword all "April" sortby title&limit=100&offset=1001

537408141268

keyword all "agency" and source=FOLIO sortby title&limit=100&offset=0

3.510000.83536

keyword all "bill" sortby title&limit=100&offset=0

5501490.660992

keyword all "set" sortby title&limit=100&offset=0

73070.8156751


For all Elasticsearch queries, once a query has been called for the first time, all subsequent calls take less than 250 ms (due to Elasticsearch's out-of-the-box caching).
Average time

Examples are taken from PERF-44.
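
The queries in the comparison contain spaces and double quotes, so they need to be URL-encoded before being sent. The sketch below shows one way to build such a request URL with the Python standard library. The base URL and the `/search/instances` path are placeholders assumed for illustration and are not taken from this page; the `query`, `limit` and `offset` values follow the examples above.

```python
from urllib.parse import urlencode

# Placeholder path, assumed for illustration only.
SEARCH_PATH = "/search/instances"


def search_url(base_url: str, query: str, limit: int = 100, offset: int = 0) -> str:
    """Return a search URL with the query, limit and offset URL-encoded."""
    params = urlencode({"query": query, "limit": limit, "offset": offset})
    return f"{base_url}{SEARCH_PATH}?{params}"


url = search_url("http://localhost:8081", 'keyword all "April" sortby title')
print(url)
# prints: http://localhost:8081/search/instances?query=keyword+all+%22April%22+sortby+title&limit=100&offset=0
```

When timing such requests, the first call and the subsequent calls are best recorded separately, since the page reports that repeated Elasticsearch queries are served much faster than the first one.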

Proposed next steps

  • Redesign the Inventory UI Search component so that it can include the new UI components created for the POC, especially filters and facets
  • Conduct usability study for advanced search textbox
  • Use mod-search endpoints for searching
  • Conduct analysis of ranking refinements (weights and boosts)
  • Conduct analysis of further search refinements
  • Define and prioritize work for cross app/cross record types searches
  • Define UI for cross app/cross record types searches
  • Define requirements for cross-tenant searches