MSEARCH-301 Browsing by LC, DDC and Other type numbers

The existing approach for browsing by LC numbers assumes that every effective shelf key can be converted to a numeric value. With the introduction of Other type numbers this approach is no longer applicable, and one of the following options must be used instead:

  1. Browsing by call-number using existing instance index with script sorting
  2. Browsing by call-numbers using the dedicated index and storing the effective location id and call-number type
  3. Browsing by call-number using PostgreSQL index on call-number values

Initial dataset

| Name | Value |
| --- | --- |
| Amount of instances | 2,500,000 instances with 5-7 items (effectiveShelvingOrder is randomly generated) |
| Amount of call-numbers | ~7,500,000 |

Browsing by call-number using existing instance index with script sorting

This approach assumes that browsing can be done using the following query on the instance index:

{
  "from": 0,
  "size": 10,
  "query": {
    "exists": {
      "field": "items.effectiveShelvingOrder"
    }
  },
  "_source": [
    "id",
    "items.effectiveShelvingOrder"
  ],
  "explain":true,
  "search_after": ["NA 22"],
  "sort": {
    "_script": {
      "type": "string",
      "script": {
        "source": "def fields = doc['items.effectiveShelvingOrder'];def a = Collections.binarySearch(fields, params['cn']); if (a > 0) return fields[a]; a = - a - 2; return fields[a < 0 ? 0 : a]",
        "params": {
          "cn": "NA 22"
        }        
      }
    }
  }
}

The items of each instance must be ordered during indexing, which makes the script sorting slightly faster.
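To make the sort script easier to follow, here is a minimal Python sketch of the same selection logic outside Elasticsearch. The names `java_binary_search` and `script_sort_key` are hypothetical. The sketch assumes that the values of `items.effectiveShelvingOrder` arrive sorted, and it emulates the return convention of Java's `Collections.binarySearch` (the index when the key is found, `-(insertion point) - 1` otherwise).

```python
from bisect import bisect_left


def java_binary_search(values, key):
    """Emulate java.util.Collections.binarySearch on a sorted list."""
    i = bisect_left(values, key)
    if i < len(values) and values[i] == key:
        return i
    # Not found: encode the insertion point the way Java does.
    return -i - 1


def script_sort_key(shelf_keys, cn):
    """Return the single shelf key the sort script picks for one instance."""
    a = java_binary_search(shelf_keys, cn)
    if a > 0:
        return shelf_keys[a]
    # Step back to the key just below the insertion point (or the first key).
    a = -a - 2
    return shelf_keys[0 if a < 0 else a]


keys = ["NA 22 A112", "NA 22 A113"]
print(script_sort_key(keys, "NA 22"))       # NA 22 A112
print(script_sort_key(keys, "NA 22 A113"))  # NA 22 A113
```

Each instance contributes exactly one sort key: the closest shelf key at or below the anchor, or the first key when none is lower. This is why two close values within one instance cannot both appear in the results.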

Pros and cons of this approach:

| Advantages | Disadvantages |
| --- | --- |
| Indexing speed is not affected | It is slow: ~5 seconds per request. Browsing around a call-number takes 2 requests, so the first page will not load in less than 9-10 seconds |
| Storage size increases only slightly, by adding a new field with a keyword mapping | Result collapsing is not supported; it must be done manually, which means that the query must return more records than requested |
| | If effectiveShelvingOrder contains close values (NA 22 A112, NA 22 A113), only one is used and the second is ignored |
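The manual collapsing mentioned above could look roughly like the following Python sketch. It is an illustration only: `collapse_hits` and its input shape (a list of sort keys, one per hit, already ordered by the sort) are assumptions, not part of the existing implementation.

```python
from itertools import groupby


def collapse_hits(sort_keys, size):
    """Collapse consecutive equal sort keys into (call_number, count) rows."""
    # groupby only merges adjacent equal values, so the input must be sorted.
    rows = [(key, sum(1 for _ in group)) for key, group in groupby(sort_keys)]
    return rows[:size]


hits = ["NA 22", "NA 22", "NA 23", "NA 24", "NA 24"]
print(collapse_hits(hits, 2))  # [('NA 22', 2), ('NA 23', 1)]
```

Five hits were needed here to produce two collapsed rows, which is the over-fetching cost noted above. The count of the last group on a page may also be incomplete if the page ends in the middle of that group.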

Browsing by call-numbers using the dedicated index and storing the effective location id and call-number type

This approach assumes that browsing can be done in the same way as subject browsing:

Elasticsearch query:

{
  "from": 0,
  "size": 10,
  "query": {
    "match_all": {}
  },
  "search_after": [
    "NA 22"
  ],
  "sort": [
    {
      "callNumber": "asc"
    }
  ]
}
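
Browsing around a call-number needs rows on both sides of the anchor, so two requests are involved. The Python sketch below builds both request bodies from the query above. It assumes that the preceding rows are fetched by reversing the sort order while keeping the same `search_after` value; the function name `build_browse_queries` and its parameters are hypothetical.

```python
def build_browse_queries(anchor, size, field="callNumber"):
    """Build the (preceding, succeeding) request bodies for one browse page."""

    def body(order):
        return {
            "from": 0,
            "size": size,
            "query": {"match_all": {}},
            "search_after": [anchor],
            "sort": [{field: order}],
        }

    # "desc" walks backwards from the anchor, "asc" walks forwards.
    return body("desc"), body("asc")


preceding, succeeding = build_browse_queries("NA 22", 5)
print(preceding["sort"])   # [{'callNumber': 'desc'}]
print(succeeding["sort"])  # [{'callNumber': 'asc'}]
```

Because `search_after` is exclusive, a document whose sort value equals the anchor is returned by neither request, so an exact match would need to be handled separately.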

Pros and cons of this approach:

| Advantages | Disadvantages |
| --- | --- |
| It is faster than the 1st approach (the first page should load in 3-4 seconds) | Elasticsearch will require 10-15% more storage space to keep the dedicated index |
| Close values of the effective shelving order within the same instance work correctly | If effectiveShelvingOrder contains close values (NA 22 A112, NA 22 A113), only one is used and the second is ignored |
| No need for result collapsing, but an additional terms aggregation is required to retrieve resource counts | Indexing is slowed down by a factor of two or more (more detailed research is needed) |
| | It is complicated to add filters by location and type (it will require an optimistic lock to save every event) |
| | To populate instance fields, an additional request is needed to retrieve them |

Browsing by call-number using PostgreSQL index on call-number values

To be researched.
