CallNumber Browse Refactoring

Summary

Current situation or problem: To continue building on the call number browse functionality (including browsing by call number type and by instance classification), the current implementation needs to be refactored.

Business expectations

  • Easily navigate through large datasets.

    • For example, when multiple call numbers share the same first 10 characters of their shelving order.

  • Address preceding and succeeding navigation, especially with large datasets.

    • Handling navigation away from the first and last pages, as well as exact and non-exact matches

  • Address effective location facet issues

  • Address type-specific browsing issues (i.e., sorting and finding exact matches)

Technical expectations

  • Streamline the code: significantly reduce its complexity so that new enhancements are much easier to implement.

Requirements

Functional Requirements

Call number browse requirements overview - DRAFT

Non-functional Requirements

  • Configurability - The solution should allow enabling/disabling call number indexing via a feature flag.

  • Maintainability - The solution should allow changes for different call number types, searching by prefixes/suffixes, etc.

  • Performance - The solution should not significantly increase reindexing time.

[TBD: Create an NFR page]

Assumptions

Baseline Architecture

https://github.com/folio-org/mod-search/blob/master/doc/browsing.md#call-number-browsing

Target Architecture

Summary

The solution is based on the new reindexing approach proposed in Reindex Improvements. The key points of the proposed solution are:

  • In the mod-search PostgreSQL database:

    • Create tables for call numbers

      • The callnumber table should contain the following fields:

        • callnumber_id

        • effective_callnumber_components - set of components for a call number

          • callnumber

          • prefix

          • suffix

          • callnumber_type_id

        • volume

        • enumeration

        • chronology

        • copynumber

      • The callnumber_instances table should contain the following fields:

        • callnumber_id

        • item_id

        • instance_id

        • shared

        • tenant_id

        • location_id

    • Create a new procedure that extracts call numbers from items on item create/update/delete events

  • Adjust the reindexing procedure and the consumption of ongoing domain events for items

  • Create a separate index for call numbers

  • Refactor browse queries to use search_after/search_before queries

  • Instance titles for browse results can be queried on the fly, either from the instances table or from the instances search index
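The two tables listed above can be sketched as Python record types to make the data model concrete. This is a minimal illustration under stated assumptions, not the actual schema: the column types, the use of None for missing values, and the hypothetical `callnumber_id_for` helper (which derives a deterministic ID by hashing all call number fields, so identical call numbers on different items collapse into one callnumber row) are not defined in this document.

```python
import hashlib
from dataclasses import astuple, dataclass
from typing import Optional


@dataclass(frozen=True)
class CallNumber:
    """One row of the callnumber table (column types are assumptions)."""
    callnumber: str
    prefix: Optional[str] = None
    suffix: Optional[str] = None
    callnumber_type_id: Optional[str] = None
    volume: Optional[str] = None
    enumeration: Optional[str] = None
    chronology: Optional[str] = None
    copynumber: Optional[str] = None


@dataclass(frozen=True)
class CallNumberInstance:
    """One row of the callnumber_instances table, linking a call number to an item."""
    callnumber_id: str
    item_id: str
    instance_id: str
    shared: bool
    tenant_id: str
    location_id: Optional[str]


def callnumber_id_for(cn: CallNumber) -> str:
    """Hypothetical helper: derive a stable ID from all call number fields.

    Identical call numbers on different items get the same ID, so the
    callnumber table stores each distinct call number only once.
    None and "" are treated as the same (empty) value.
    """
    # Join with a unit separator that is unlikely to appear in call numbers.
    raw = "\x1f".join("" if v is None else v for v in astuple(cn))
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:32]
```

Deriving the ID from the content (rather than generating a random one) is one way to let concurrent batches agree on the same callnumber_id without a lookup; whether the real tables do this is an open design choice.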


Indexing Sequence Diagram

Following the current reindexing approach, call number indexing is split into two main phases: merge and upload. The merge phase already exists in the reindexing procedure.

For performance reasons, call numbers should be extracted (step 16) on the database side. The current approach inserts items into the table with batch inserts; instead, a new PL/pgSQL procedure is proposed to extract call numbers. The next section describes this procedure in detail.
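To illustrate the upload phase, the sketch below assumes that it reads the merged rows back from the callnumber and callnumber_instances tables and builds one document per call number for the separate call number index. The row shapes (plain dicts keyed by column name) and the document shape (an `instances` list with per-instance, per-location item counts) are hypothetical; the actual index mapping is not defined in this document.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def build_index_documents(
    callnumbers: Dict[str, dict],
    instance_rows: Iterable[dict],
) -> List[dict]:
    """Build one search document per call number (hypothetical document shape).

    callnumbers maps callnumber_id -> callnumber row (as a dict);
    instance_rows are callnumber_instances rows (as dicts).
    """
    # callnumber_id -> (instance_id, tenant_id, location_id, shared) -> item count
    counts: Dict[str, Dict[tuple, int]] = defaultdict(lambda: defaultdict(int))
    for row in instance_rows:
        key = (row["instance_id"], row["tenant_id"], row["location_id"], row["shared"])
        counts[row["callnumber_id"]][key] += 1

    documents = []
    for cn_id in sorted(counts):  # deterministic order for the bulk upload
        doc = {"id": cn_id, **callnumbers[cn_id]}
        # Inner entries keep the order in which rows were first seen.
        doc["instances"] = [
            {"instanceId": i, "tenantId": t, "locationId": loc, "shared": s, "count": n}
            for (i, t, loc, s), n in counts[cn_id].items()
        ]
        documents.append(doc)
    return documents
```

Aggregating per call number keeps the index at one document per distinct call number, which is what browsing iterates over, while the per-location entries leave room for an effective location facet.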

Extract Call Numbers Activity Diagram

The diagram below describes the procedure that should be created for inserting items and extracting call numbers in the PostgreSQL database. The procedure should be used instead of bulk inserts. Key aspects:

  • The flag that controls whether call numbers are extracted and stored should be kept in the database

  • Inside the procedure, arrays of records mirroring the table structures should be created to hold call numbers and call number instances

  • Rows from the arrays should be inserted into the main tables in a consistent order (for example, sorted by key) to avoid deadlocks on the main table indexes
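The key aspects above can be illustrated with a Python sketch of the procedure's logic; the real implementation is meant to be a PL/pgSQL procedure. The item field names (loosely modeled on FOLIO item records: `effectiveCallNumberComponents`, `volume`, `enumeration`, `chronology`, `copyNumber`, `effectiveLocationId`, plus an assumed `instanceId`), the `extract_enabled` parameter standing in for the database-stored flag, and the hash-based ID are all assumptions.

```python
import hashlib
from typing import List, Optional, Tuple

# Assumed item field names, loosely based on FOLIO item records.
CN_FIELDS = ("callNumber", "prefix", "suffix", "typeId")
ITEM_FIELDS = ("volume", "enumeration", "chronology", "copyNumber")


def _callnumber_row(item: dict) -> Optional[tuple]:
    """Build a callnumber row (id first) from an item, or None if it has no call number."""
    comps = item.get("effectiveCallNumberComponents") or {}
    if not comps.get("callNumber"):
        return None  # nothing to browse for this item
    values = tuple(comps.get(f) for f in CN_FIELDS) + tuple(item.get(f) for f in ITEM_FIELDS)
    # Hypothetical deterministic ID: identical call numbers share one row.
    cn_id = hashlib.sha256("\x1f".join(v or "" for v in values).encode()).hexdigest()[:32]
    return (cn_id,) + values


def extract_call_numbers(
    items: List[dict], tenant_id: str, shared: bool, extract_enabled: bool
) -> Tuple[List[tuple], List[tuple]]:
    """Return (callnumber rows, callnumber_instances rows) for a batch of items."""
    if not extract_enabled:  # stands in for the flag stored in the database
        return [], []
    callnumbers = {}  # in-memory "array": callnumber_id -> row, deduplicated per batch
    instances = set()  # in-memory "array" of callnumber_instances rows
    for item in items:
        row = _callnumber_row(item)
        if row is None:
            continue
        callnumbers[row[0]] = row
        instances.add((row[0], item["id"], item["instanceId"], shared, tenant_id,
                       item.get("effectiveLocationId")))
    # Sorting by key gives every concurrent batch the same insert order,
    # which avoids lock-order deadlocks on the main table indexes.
    return (sorted(callnumbers.values(), key=lambda r: r[0]),
            sorted(instances, key=lambda r: (r[0], r[1])))
```

In PL/pgSQL, the same idea would typically map to collecting rows into arrays and inserting them with an ordered `INSERT ... SELECT`; the exact statements are left to the implementation.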

Browsing Sequence Diagram

Call number browsing follows the approach used for the classification browse feature (Browse Instance classification numbers - Phase 1 POC).
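The search_after/search_before approach can be illustrated with an in-memory simulation. OpenSearch supports a `search_after` parameter but has no `search_before`; a preceding page is commonly obtained by reversing the sort order, using `search_after`, and reversing the hits. The sketch assumes unique shelving keys sorted ascending and a hypothetical `shelving_order` field name; it is not the mod-search implementation.

```python
import bisect
from typing import List, Tuple


def search_after(keys: List[str], after: str, size: int, descending: bool = False) -> List[str]:
    """Emulate search_after on an ascending-sorted key list (exclusive of `after`)."""
    if descending:
        end = bisect.bisect_left(keys, after)  # keys strictly less than `after`
        return list(reversed(keys[max(0, end - size):end]))
    start = bisect.bisect_right(keys, after)  # keys strictly greater than `after`
    return keys[start:start + size]


def browse_around(keys: List[str], anchor: str, size: int) -> Tuple[List[str], bool, List[str]]:
    """Return (preceding, exact_match, succeeding) for a page centered on `anchor`."""
    half = size // 2
    # "search_before": sort descending, search_after the anchor, restore ascending order.
    preceding = list(reversed(search_after(keys, anchor, half, descending=True)))
    # search_after is exclusive, so the anchor itself is looked up separately.
    pos = bisect.bisect_left(keys, anchor)
    exact = pos < len(keys) and keys[pos] == anchor
    remaining = size - half - (1 if exact else 0)
    succeeding = ([anchor] if exact else []) + search_after(keys, anchor, remaining)
    return preceding, exact, succeeding


def preceding_request(anchor: str, size: int) -> dict:
    """Request body for the preceding page; `shelving_order` is a hypothetical field."""
    return {"size": size, "sort": [{"shelving_order": "desc"}], "search_after": [anchor]}
```

For example, with keys `["A1", "A2", "B1", "B2", "C1"]`, browsing around "B1" with a page size of 4 yields "A1" and "A2" before the anchor, an exact match, and "B1", "B2" after it; an anchor before "A1" yields an empty preceding list, which is the first-page case the business expectations call out.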

Holding-level Call Numbers

Problem statement

Some libraries do not create items for holdings, and they need to be able to browse call numbers on holdings records. The solution should provide the capability to browse call numbers in the following situations:

  1. A library has only holdings related to instances

  2. A library has both holdings and items related to instances

  3. A library has holdings, but some holdings do not have items

Solution Options

Option 1: Dedicated holdings/items call number search indexes

Description: Holdings call number browse and item call number browse are separate features that can be enabled/disabled per tenant through configuration flags.

Pros:

  • Configurability

  • Minimal impact on performance

Cons:

  • Does not address case 3

Option 2: One search index for all call numbers

Description: If an instance has items, only item call numbers are indexed. If an instance has holdings but no items, the index is filled with call numbers from the holdings.

Pros:

  • Covers case 3

Cons:

  • Significant impact on reindex performance
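The selection rule of the "one search index for all call numbers" option can be sketched per instance. This is a minimal illustration assuming simplified records with a single `callNumber` field; real holdings and item records carry more call number components.

```python
from typing import Dict, List


def callnumbers_for_instance(items: List[dict], holdings: List[dict]) -> List[str]:
    """Use item call numbers if the instance has items, otherwise holdings call numbers.

    Records are simplified dicts with an assumed `callNumber` key.
    """
    source = items if items else holdings
    # Skip records without a call number; keep first-seen order without duplicates.
    seen: Dict[str, None] = {}
    for record in source:
        cn = record.get("callNumber")
        if cn:
            seen.setdefault(cn, None)
    return list(seen)
```

Deciding the source requires knowing whether an instance has any items, which is one reason this option is expected to cost more during reindexing.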

Addressing the holding-level call number browsing

The proposed solution should reuse an approach similar to item-level indexing and use the same call number tables. Currently, on the mod-inventory-storage side, items that have no call number inherit the call number of their holdings record, so there is no need to insert these inherited call numbers into the call number table separately. This requires indexing items before holdings during the merge stage.
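One possible reading of why items must be merged before holdings is sketched below: items (including those inheriting a holdings call number) always produce rows, and a holdings call number produces a holdings-level row (no item ID) only if no item of that instance already carries it. The simplified records (`id`, `instanceId`, `holdingsRecordId`, `callNumber`) and the dedup rule are assumptions for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple

Row = Tuple[str, str, Optional[str]]  # (instance_id, call number, item_id or None)


def effective_call_number(item: dict, holding: dict) -> Optional[str]:
    """Item call number, falling back to the holdings call number (inheritance)."""
    return item.get("callNumber") or holding.get("callNumber")


def merge_call_numbers(holdings: List[dict], items: List[dict]) -> List[Row]:
    """Merge items first, then add holdings call numbers not already covered by an item."""
    holdings_by_id: Dict[str, dict] = {h["id"]: h for h in holdings}
    covered: Set[Tuple[str, str]] = set()
    rows: List[Row] = []
    # Pass 1: every item with an effective call number gets its own row.
    for item in items:
        holding = holdings_by_id[item["holdingsRecordId"]]
        cn = effective_call_number(item, holding)
        if cn:
            covered.add((holding["instanceId"], cn))
            rows.append((holding["instanceId"], cn, item["id"]))
    # Pass 2: a holdings call number is added (without an item) only if no item
    # already carries it; this check relies on pass 1 having run first.
    for holding in holdings:
        cn = holding.get("callNumber")
        key = (holding["instanceId"], cn)
        if cn and key not in covered:
            covered.add(key)
            rows.append((holding["instanceId"], cn, None))
    return rows
```

If holdings were merged first under the same rule, a holdings-level row would be written and the item inheriting that call number would then add a second row, double-counting the call number.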

Risks