ARCH-35 Technical approach for Browsing by call number type functionality

Overview

(This approach addresses feature UXPROD-3569.)

Items must be filterable by Call Number Type (LC, NLM, Dewey Decimal, SuDoc and Local Call Numbers), represented as aggregations (facets). Currently, the Browse functionality supports browsing by Call Number with one available facet: Effective location (item).

Current UI/UX

Expected UI/UX

(NOTE: according to the most recent requirements, there is no need to build an aggregation view; a dropdown list is used instead. See [UIIN-2358] Add new browse options to limit browse by call number type - FOLIO Issue Tracker.)

Context on Call Number Types - Technical Designs and Decisions - FOLIO Wiki

Scope

Back-End implementation (considering OpenSearch (mod-search) or (mod-inventory-storage) Postgres schema change, with corresponding amendments on mod-quick-marc)

Front-End representation ui-quick-marc

Potential performance impact analysis

Current state

Browsing is done by call number, refined by effective location facet filtering. Extra capabilities of filtering by call number type have not been implemented yet. 

Nested objects mappings haven't been introduced in any of existing indexes (implies implementation from scratch).

Another concern is the generation of the shelving order: browsing is performed on the first 10 characters of the call number only, which can cause inconsistent search results.


Least efforts approach

Add facet capabilities for call number types, with codebase changes on both the UI and the BE.

  • Back-end:
    • mod-search: add an extra mapping for the nested objects relationship instance → items → call number entry
    • mod-search: add an API for nested aggregations and queries, with a corresponding CQL query change (nested elements need a distinct handling strategy)
  • UI:
    • ui-inventory: add UI facet elements (drop-down items are considered as another option) for browsing by Call Number Types

Example of request response:

Facet request callnumber
// response on facet search request with value "callNumberType"
//    /search/instances/facets?query=...&facet=callNumberType
    {
        "facets": {
            "callNumberTypes": {
                "values": [
                    {
                        "id": "NLM",
                        "totalRecords": 1500
                    },
                    {
                        "id": "Local Call Number",
                        "totalRecords": 579
                    },
                    {
                        "id": "Dewey Decimal",
                        "totalRecords": 789
                    }
                ],
                "totalRecords": 2868
            }
        }
    }

Technical impediments / implications / concerns

1) Introducing another (nested) mapping for items (items belong to holdings, and holdings in turn belong to an instance) may affect the performance of indexing and of nested queries, because Elasticsearch implicitly creates a separate document for each nested object.

  • A prototype of the nested objects mapping and querying, with measurements of performance and space usage, is available. (link)

2) The marc4j.jar library in use (v2.9.2) implements only two kinds of Call Number Types: LC and Dewey. SuDoc, NLM and others are not covered. As a solution, the shelf key for the remaining types could be generated separately (see getValidShelfKey within ItemEffectiveShelvingOrderProcessor#getFieldValue).

3) It's possible to encounter multiple call number types per instance: when there are multiple holdings with items that have different call number types.

4) CQL syntax processing does not distinguish between nested and non-nested queries. User expectations are to be clarified.

5) Local call number types (with source "local") are not distinguished from each other during indexing and when filling the new field typedCallNumber.

Solutions

Solution 1: Application-side join

Browsing by call number with a type specified consists of the following steps in this approach:

  1. Check the number of instances for the call number type
  2. Get the instances matched by call number
  3. Filter by call number type on the application side
  4. If, after filtering, the number of remaining instances is less than expected and less than the total, continue the search in two directions, backward and forward from the anchor, with the corresponding offsets, filtering by call number type.
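
The filtering loop in steps 2-4 can be sketched roughly as follows. This is a minimal illustration, not the mod-search implementation: the Hit class, the PageSource interface and the browseByType method are hypothetical names, the page source stands in for a real query to mod-search, and only the forward direction from the anchor is shown.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of Solution 1: page through call-number matches and filter by type in the application. */
final class ApplicationSideJoinSketch {

  /** Minimal stand-in for a browse hit; the field names are illustrative only. */
  static final class Hit {
    final String callNumber;
    final String callNumberTypeId;

    Hit(String callNumber, String callNumberTypeId) {
      this.callNumber = callNumber;
      this.callNumberTypeId = callNumberTypeId;
    }
  }

  /** Stand-in for a paged query; a real client would call the browse API instead. */
  interface PageSource {
    List<Hit> fetch(int offset, int limit);
  }

  /**
   * Collects up to {@code wanted} hits of the requested type, re-querying page by page
   * until enough hits are found or the source is exhausted.
   */
  static List<Hit> browseByType(PageSource source, String typeId, int wanted, int pageSize) {
    List<Hit> result = new ArrayList<>();
    int offset = 0;
    while (result.size() < wanted) {
      List<Hit> page = source.fetch(offset, pageSize);
      if (page.isEmpty()) {
        break; // source exhausted: fewer matches exist than requested
      }
      for (Hit hit : page) {
        if (typeId.equals(hit.callNumberTypeId) && result.size() < wanted) {
          result.add(hit);
        }
      }
      offset += page.size();
    }
    return result;
  }
}
```

The number of fetch calls grows with the share of hits that do not match the requested type, which is the source of the "excessive queries" concern for this solution.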

Pros
  1. No need to reimplement indexing or mapping.
     Comment: The initial query is done by call number, which is a regular query to mod-search (Elasticsearch implied).
  2. No impact on other search / browsing functionality.
     Comment: Because the index stays the same, there is no impact on other functionality tied to mod-search interaction.
  3. No complex queries to Elasticsearch.
     Comment: No nested or other types of queries are introduced, as it is a regular query with post-handling on the application side.
Cons
  1. Potentially excessive queries.
     Comment: If many records are found and they exceed the paging limit, they need to be filtered page by page, re-querying Elasticsearch for the next portion of records and filtering those by call number type.
  2. Time-consuming operation in some cases.
     Comment: With millions of indexed records and limited resources (CPU, RAM), it can perform slowly, because all records found by call number have to be processed in order to filter by call number type.
  3. May eventually fail to return valid results.
     Comment: In some cases the process can take a while (in comparison to nested queries, for example), and paging the results could be complicated (all filtered instances need to be stored in memory, or the queries and handling need to be repeated for the next pages).

Cons to be analyzed more.

Solution 2: Nested objects, nested queries and nested aggregation

With nested objects it becomes possible to browse by both datasets (call numbers and call number types), since they belong to the same implicitly indexed document (e.g. an item nested in a holding, with the holding correspondingly nested in an instance; similar mappings can be considered if required). See Nested Objects concept explained - Technical Designs and Decisions - FOLIO Wiki.


Pros
  1. Quick and effective search by nested queries and nested aggregations.
     Comment: Because indexing creates separate documents for nested objects, querying them is performed effectively.
  2. No need to post-process the found results or filter them afterwards.
     Comment: The instances returned by a regular nested query or nested aggregation are sufficient as they are.
Cons
  1. Direct impact on other search / browse functionality due to indexing / mapping reimplementation.
     Comment: Introducing a nested structure for inner objects within an instance (like items and holdings) will break queries across the whole search functionality for instances, holdings and items. Their implementation needs to be refined.
  2. Query complexity increases due to the introduction of nested queries.
     Comment: Queries could be shown in examples or in the Elasticsearch docs.
  3. More space for index allocation.
     Comment: A nested structure takes more space, as nested objects are implicitly separate documents. Actual measurements are to be provided.
  4. Potentially slower indexing.
     Comment: This assumption needs to be confirmed by actual performance testing.
Trade-off
  An additional mapping could be used instead of substitution.
  Comment: To keep backward compatibility for all potentially affected searches, a sibling mapping for items (e.g. items_nested) could be added, with corresponding nested mapping(s) for the effectiveCallNumberComponents entry.

The actual impact on indexing performance (and ways to speed indexing up) needs further analysis; however, according to what was tested while running the prototype, the impact on indexing performance is far from critical.

Implementation details (Solution 2)

ui-inventory



The following changes are implied for the ui-inventory module:

  • Alter the browsing request(s) (nested queries) to include an optional call number type parameter. The request semantics stay the same, except for the extra parameter that refines the request.
  • (Facet requirements are considered skipped, see UIIN-2358.) For facet filtering, a nested bucket aggregation query is implied. It could be performed in two ways:
    • the 1st is to count all instances grouped by call number type
    • the 2nd is to break the results of browsing by search term into groups by call number type (note that only results for the current, previous and next pages are counted; if aggregation functionality is implemented, it needs to be defined whether aggregations must count all instance records or only those around the browsing value).
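
To make the first option concrete, the expected counts can be illustrated in memory. This is only a sketch of the counting semantics, not an Elasticsearch aggregation: the class and method names are hypothetical, and each instance is represented simply as the set of call number type names found on its items, so an instance with items of several types is counted once under each of them.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical illustration of "count all instances grouped by call number type". */
final class CallNumberTypeFacetSketch {

  /**
   * Counts instances per call number type.
   * Each list element is the set of type names present on the items of one instance.
   */
  static Map<String, Integer> countInstancesByType(List<Set<String>> typesPerInstance) {
    Map<String, Integer> counts = new TreeMap<>();
    for (Set<String> types : typesPerInstance) {
      for (String type : types) {
        counts.merge(type, 1, Integer::sum); // one count per instance and type
      }
    }
    return counts;
  }
}
```

Because one instance can contribute to several buckets, the bucket counts may add up to more than the number of instances.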

Implied changes in instance index:

  • change items type to "nested" (current is object)
  • change effectiveCallNumberComponents type to "nested"
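
As a rough illustration of the two changes above, the sketch below renders a mapping fragment and a matching chained nested query as JSON strings. The field name typeId, its keyword type and the class and method names are assumptions made for the example; the actual mod-search mapping may differ.

```java
/** Hypothetical sketch: JSON bodies for a nested items mapping and a nested query by call number type. */
final class NestedMappingSketch {

  /** Mapping fragment declaring items and effectiveCallNumberComponents as nested. */
  static String itemsMapping() {
    return String.join("\n",
        "{",
        "  \"properties\": {",
        "    \"items\": {",
        "      \"type\": \"nested\",",
        "      \"properties\": {",
        "        \"effectiveCallNumberComponents\": {",
        "          \"type\": \"nested\",",
        "          \"properties\": {",
        "            \"typeId\": { \"type\": \"keyword\" }",
        "          }",
        "        }",
        "      }",
        "    }",
        "  }",
        "}");
  }

  /**
   * Chained nested query for instances having an item with the given call number type id.
   * The id is assumed to be a UUID, so no JSON escaping is applied.
   */
  static String nestedTypeQuery(String typeId) {
    return String.join("\n",
        "{",
        "  \"query\": {",
        "    \"nested\": {",
        "      \"path\": \"items\",",
        "      \"query\": {",
        "        \"nested\": {",
        "          \"path\": \"items.effectiveCallNumberComponents\",",
        "          \"query\": {",
        "            \"term\": { \"items.effectiveCallNumberComponents.typeId\": \"" + typeId + "\" }",
        "          }",
        "        }",
        "      }",
        "    }",
        "  }",
        "}");
  }
}
```

Queries that currently address items as plain objects would need the same nested wrapping, which is the main compatibility cost of this solution.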

Alternative - separate index creation could be considered. See instance_subject as a reference.


endpoints change

Browsing by Call number API

Changes in browsing endpoints imply handling additional parameter within cql-query request param:

  • for browsing functionality additional optional parameter to be handled within cql-query (callNumberType):
    e.g.: (callNumber > F or callNumber < F) and callNumberTypeId = "<UUID(NLM)>"

To support aggregation (facet) functionality by call number types, an extra endpoint is required:

  • instance counts grouped by call number type: /count/call-number-types/instances

Performance measurements

Measurements were done on a data set of 5M records.

According to the prototype results, nested search queries perform effectively (in less than 1.5 sec):

 Examples of nested queries

Solution 3: additional field in instance index for compound call number and call number type

Another option is to build the call number type into the computed (long) call number.


Pros
  1. Minimal impact on multiple scopes (requests, index reimplementation).
     Comment: Only one extra (long) field is added, so there is no need to adjust the many existing search and browse queries.
  2. Quick, non-resource-consuming solution.
     Comment: An extra field in the instance index mapping does not imply creating a separate explicit or implicit (as with nested objects) index. Thus, filling another (long) field will not cause significant extra resource consumption.
  3. No impact on re-index speed.
     Comment: Close to zero impact on indexing speed is expected, as no new indexes (explicit or implicit) or documents are created, just one additional (long) field.
  4. Fast query.
Cons
  1. The search term maximum will be 9 characters instead of the current 10.
     Comment: Only 9 symbols remain because one symbol is allocated specifically for the call number type.
  2. Overhead of keeping the same int order value for the basic types.
  3. Aggregation (count per classification) requires as many requests as there are call number types, or a Painless script.
     Comment: Because the call number type is built in, it cannot be extracted by a regular aggregation request, so several requests are needed (e.g. F >= 100...0 and F < 200...0 to count only the records within this range for the LC classification, in case the LC type is mapped to value 1). According to the most recent requirements, there is no need to build aggregation functionality.
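
The per-type count range mentioned in the last comment can be sketched as follows. It assumes the type value is multiplied by 52^10, as described in the implementation details of this solution; the class, the method names and the field name passed in the usage example are hypothetical.

```java
/** Hypothetical sketch: value range occupied by one call number type in the typed (long) call number. */
final class TypeRangeSketch {
  static final long BASE = 52L;
  static final int TYPE_EXPONENT = 10; // the type value is multiplied by 52^10

  /** 52^10, computed with exact long arithmetic. */
  static long typeMultiplier() {
    long result = 1L;
    for (int i = 0; i < TYPE_EXPONENT; i++) {
      result = Math.multiplyExact(result, BASE);
    }
    return result;
  }

  /** Inclusive lower bound of typed values for the given type value. */
  static long lowerBound(int typeValue) {
    return Math.multiplyExact((long) typeValue, typeMultiplier());
  }

  /** Exclusive upper bound of typed values for the given type value. */
  static long upperBound(int typeValue) {
    return Math.multiplyExact(typeValue + 1L, typeMultiplier());
  }

  /** Range condition that selects only records of one type, e.g. for a count request. */
  static String countRangeQuery(String field, int typeValue) {
    return field + " >= " + lowerBound(typeValue) + " and " + field + " < " + upperBound(typeValue);
  }
}
```

For example, countRangeQuery("typedCallNumber", 1) produces the condition for LC if LC is mapped to value 1; one such request per type would be needed to emulate a facet.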

Typed call number value generation view

Implementation details on additional typed call number field

Current implementation of computing call number is:

Algorithm decimal call number / shelving order
  public Long getCallNumberAsLong(String callNumber) {
    var normalizedCallNumber = normalizeValue(callNumber);
    var cleanCallNumber = normalizedCallNumber.substring(0, Math.min(MAX_CHARS, normalizedCallNumber.length()));
    long result = 0L;
    for (int i = 0; i < cleanCallNumber.length(); i++) {
      var characterValue = getIntValue(cleanCallNumber.charAt(i), 0);
      result += characterValue * (long) Math.pow(52, (double) MAX_CHARS - i);
    }
    return result;
  }

Call number types order

To include the call number type in the highest order (the first cell, i.e. the first digit of the decimal representation of the call number), the constant MAX_CHARS needs to be decreased to 9 (currently 10), and the call number types need to be enumerated and mapped to an actual digit, e.g.:

  • LC = 1
  • Dewey = 2
  • NLM = 3
  • SuDoc = 4
  • Local Call Number Type = 0 (any call number type with Source equal to local)
  • Other Scheme = 5

Special cases - Other Scheme and Local

Other Scheme is also defined as a separate call number type with source "basic" (read-only). Special clarification is required to disambiguate the meaning of this call number type.

Initially, browsing by "Other Scheme" was considered to mean browsing by all call number types whose source differs from "local" or "basic".

However, the final decision is to introduce another basic call number type with the value "Other Scheme" and to index it according to the particular call number type setup.

This means that the Other Scheme browse option does not comprise all call number types except those whose source is "local" or "basic"; instead, it browses all call numbers with the built-in type "Other Scheme".

Local Call Number Type is an aggregated type that comprises all created call number types with source value "local". The typed call number value is indexed with the same value (mapped to the "local" call number type order).

mod-inventory-storage call number types entries

(Note: the foregoing types are implied to be mapped to the corresponding basic call number type ids in the mod-inventory-storage call_number_type table.)

Then either an additional parameter needs to be added to getCallNumberAsLong, or a new method getCallNumberTypedAsLong(String callNumber, CallNumberType type) needs to be created.

During computation, the result variable is initially assigned type.getValue() * (long) Math.pow(52, 10).

The maximum number of possible call number types in this case is supposed to be 62, so as not to exceed the long maximum value.
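
A self-contained sketch of such a typed computation is shown below. It is not the mod-search code: normalizeValue and getIntValue are not shown in this document, so the sketch substitutes a hypothetical alphabet (space, digits, A-Z) whose character positions serve as values below 52, and maps unknown characters to 0. Under these assumptions the largest type value that still fits into a long is 62, in line with the limit stated above.

```java
import java.util.Locale;

/** Hypothetical sketch of a typed call number value: type value in the highest position, 9 characters below it. */
final class TypedCallNumberSketch {
  static final int MAX_CHARS = 9; // reduced from 10 to make room for the type
  static final long BASE = 52L;
  // Assumed alphabet: the index of a character is its numeric value (all values are below 52).
  static final String ALPHABET = " 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";

  /** 52 raised to a non-negative exponent, using exact long arithmetic. */
  static long pow(int exponent) {
    long result = 1L;
    for (int i = 0; i < exponent; i++) {
      result = Math.multiplyExact(result, BASE);
    }
    return result;
  }

  /** Largest type value t for which every value below (t + 1) * 52^10 still fits into a long. */
  static long maxTypeValue() {
    return Long.MAX_VALUE / pow(MAX_CHARS + 1) - 1;
  }

  static long getCallNumberTypedAsLong(String callNumber, int typeValue) {
    String normalized = callNumber.toUpperCase(Locale.ROOT);
    String clean = normalized.substring(0, Math.min(MAX_CHARS, normalized.length()));
    // The type occupies the highest position: typeValue * 52^10.
    long result = Math.multiplyExact((long) typeValue, pow(MAX_CHARS + 1));
    for (int i = 0; i < clean.length(); i++) {
      int characterValue = Math.max(ALPHABET.indexOf(clean.charAt(i)), 0); // unknown characters count as 0
      result = Math.addExact(result, characterValue * pow(MAX_CHARS - i));
    }
    return result;
  }
}
```

Since the character part is always smaller than 52^10, values of different types never overlap, and ordering within one type follows the first 9 characters.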

NOTE: To facilitate browsing by all call numbers current call number long value is filled without changes. 


In the inventory storage schema there is a call_number_type table that contains call number types in the following format:

{
  "id": "03dd64d0-5626-4ecd-8ece-4531e0069f35",
  "name": "Dewey Decimal classification",
  "source": "folio",
  "metadata": {
    "createdDate": "2023-02-28T16:19:14.812Z",
    "updatedDate": "2023-02-28T16:19:14.812Z",
    "createdByUserId": "8efbfcf6-c55e-4c6a-99b1-d2425db66bd3",
    "updatedByUserId": "8efbfcf6-c55e-4c6a-99b1-d2425db66bd3"
  }
}

Thus, the basic call number types must already be part of the initial data; during indexing they are mapped as reference data from the item, and also as a numeric type for the built-in call number type solution.

An extra field callNumberTypeId is implied to be filled with the UUID of the basic call number type.

Call number types settings

In the Inventory settings, call number types are specified with a Source attribute, which indicates whether a specific call number type is basic, local or has another source name.

NOTE: call number types that have source "basic" must be read-only.

5 basic call number types must always be defined: LC, Dewey, NLM, SuDoc and "Other Scheme".

For browsing by all call numbers, the original callNumber decimal field is retained.

Example query of browsing by SuDoc call number type could be as follows:

query="(effectiveShelvingOrder >= F_anchor OR effectiveShelvingOrder < F_anchor) AND callNumberTypeId=<UUID>"

where callNumberTypeId is the id from the call_number_type table of the inventory-storage schema.

The F_anchor value represents the call number within the range of the preselected call number type and is computed as:

F_anchor = typedCallNumber(callNumberStr), where the algorithm counts the first 9 symbols (instead of 10).


Endpoint changes are the same as for Solution 2, plus one new endpoint to fetch all basic types: a mapping of UUIDs, names and their order values (0-5 for the current types). Note the change to the jsonb schema of inventory_storage.call_number_type: a new int field for the order of basic call number types must be added. See call number types order.

Data migration plan

To migrate the existing data, the following is needed:

  1. Update the existing mapping by adding a new long field typedCallNumber and a string field callNumberTypeId. The Elasticsearch Update mapping API (Update mapping API | Elasticsearch Guide [8.6] | Elastic) could be utilized.
  2. Create the five defined ordered basic call number types (with source equal to "basic") in the inventory_storage.call_number_type table. Guarantee immutability of the basic call number types.
  3. Create the altered algorithm to fill the typedCallNumber and callNumberTypeId fields, taking the value as the UUID from mod-inventory-storage that matches the specific basic call number type, and embed it as a part of the DI process.
  4. Create a method for a partial reindex that fills exactly the newly created fields, or consider full re-indexing with corresponding filling of the new fields (typedCallNumber and callNumberTypeId).
  5. Ensure that generation of the original shelving order (long call number) remains in place, so that browsing by all call numbers keeps working without change.
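
For step 1, the request body for the update mapping call could look roughly like the one rendered below. Placing both fields at the top level of the instance mapping and using the keyword type for the string field are assumptions made for this sketch; the class and method names are hypothetical.

```java
/** Hypothetical sketch: request body adding the two new fields through the update mapping API. */
final class MappingUpdateSketch {

  /** JSON body for PUT /<index>/_mapping; the index name is deployment-specific and not shown here. */
  static String newFieldsMapping() {
    return String.join("\n",
        "{",
        "  \"properties\": {",
        "    \"typedCallNumber\": { \"type\": \"long\" },",
        "    \"callNumberTypeId\": { \"type\": \"keyword\" }",
        "  }",
        "}");
  }
}
```

Adding new fields this way does not populate them for documents that are already indexed, which is why step 4 still requires a partial or full reindex.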

LOE

Solution 2

BE
  Units: mod-search; mod-inventory-storage (to be skipped for Sol 2); mod-inventory (to be skipped for Sol 2)
  Implementation details and estimates:
    1. mod-search (total 3-4 sprints)
      1. spike for nested objects prototype: 1 sprint
      2. changing the mapping for items and effectiveCallNumberComponents to type "nested": 1 sprint
      3. changing the codebase for processing nested queries: 1-2 sprints
FE
  Units: ui-inventory
  Implementation details: in case of nested objects mapping implementation, change requests accordingly to nested queries (range queries by shelving order and call number types)
  Estimate: 2 sprints
Performance
  Implementation details: tests on data sets of 1-10M items on:
    1) forming facets by distinct call number types
    2) measuring the load speed of aggregated data
  Estimate: 1 sprint

Solution 3

BE
  Units: mod-search and mod-inventory-storage
  Implementation details and estimates:
    1. mod-search and mod-inventory-storage (in total 3-4 sprints)
      1. add new fields typedCallNumber and callNumberTypeId to the instance index: 1 sprint
      2. fill typedCallNumber during the DI process or on item / holding creation and alteration: 1 sprint
      3. introduce the "basic" source for call number types in the mod_inventory_storage call_number_type table and fill it with the initial basic entries: 2 sprints
      4. migration of existing data: 1-2 sprints
FE
  Units: ui-inventory
  Implementation details:
    • add a new dropdown list element in the Browse tab to browse by call number type
    • make basic call number types read-only (in inventory settings)
    • adjust the browse query to pass an extra parameter that accounts for the call number type. This could be done in two ways:
      • compute the typed call number on the FE side for a search term of 9 symbols
      • pass two params, the search term and the call number type, to the Back-End (recommended as more comprehensive; it avoids mismatched mapping between FE and BE)
  Estimate: 2 sprints

Rationale

Two solutions are recommended for consideration: Solution 2 or Solution 3.

Solution 2 (the "nested objects approach") would also benefit a number of other valuable search issues, and the performance of reindexing and queries is not significantly affected. However, more scopes are affected in comparison to Solution 3.

Solution 3 is more concise and less impactful, with agreed trade-offs; thus Solution 3 is considered the preferable one.

ARCH-35

[UXPROD-3569] Browse call numbers by their type - FOLIO Issue Tracker

Browse call numbers by type - Metadata Management - FOLIO Wiki

Browse Use Cases - Metadata Management - FOLIO Wiki

2022-02-17 Metadata Management Meeting notes - Metadata Management - FOLIO Wiki

ARCH-35: Browse call number by type questions - Spitfire Development Team - FOLIO Wiki

https://github.com/folio-org/mod-search/blob/master/doc/browsing.md#call-number-browsing

MSEARCH-301 Browsing by LC, DDC and Other type numbers - Folio Development Teams - FOLIO Wiki

Call Numbers Browse
