MSEARCH-301 Browsing by LC, DDC and Other type numbers
The existing approach to browsing by LC numbers assumes that every effective shelf key can be converted to a numeric value. Introducing Other type numbers breaks this assumption, so the existing approach is no longer applicable and one of the following options must be used instead:
- Browsing by call-number using existing instance index with script sorting
- Browsing by call-numbers using the dedicated index and storing the effective location id and call-number type
- Browsing by call-number using PostgreSQL index on call-number values
Initial dataset
Name | Value |
---|---|
Number of instances | 2,500,000 instances with 5-7 items each (effectiveShelvingOrder is randomly generated) |
Number of call-numbers | ~7,500,000 |
Browsing by call-number using existing instance index with script sorting
This approach assumes that browsing can be done using the following query on the instance index:
{ "from": 0, "size": 10, "query": { "exists": { "field": "items.effectiveShelvingOrder" } }, "_source": [ "id", "items.effectiveShelvingOrder" ], "explain":true, "search_after": ["NA 22"], "sort": { "_script": { "type": "string", "script": { "source": "def fields = doc['items.effectiveShelvingOrder'];def a = Collections.binarySearch(fields, params['cn']); if (a > 0) return fields[a]; a = - a - 2; return fields[a < 0 ? 0 : a]", "params": { "cn": "NA 22" } } } } }
The items of each instance must be ordered during indexing, which makes the script sorting slightly faster.
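The sort script in the query above picks a single sort key per instance. A minimal Python sketch of the same selection logic, assuming the per-instance values are non-empty and sorted in ascending order (the function name `browse_sort_key` is illustrative only):

```python
from bisect import bisect_left


def browse_sort_key(shelf_keys, cn):
    """Return the key an instance is sorted by for the anchor call-number `cn`.

    `shelf_keys` must be non-empty and sorted in ascending order.
    """
    pos = bisect_left(shelf_keys, cn)  # insertion point of the anchor
    if pos < len(shelf_keys) and shelf_keys[pos] == cn:
        return shelf_keys[pos]  # exact match
    # Otherwise take the closest key below the anchor, falling back to
    # the first key when every key sorts above the anchor.
    return shelf_keys[max(pos - 1, 0)]
```

For an instance holding `NA 22 A112` and `NA 22 A113` and the anchor `NA 23`, the function returns only `NA 22 A113`, so the instance appears once in the sorted result regardless of how many shelf keys it has.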
Pros and cons of this approach:
Advantages | Disadvantages |
---|---|
Indexing speed is not affected | It is slow: ~5 seconds per request. Browsing around a call-number takes 2 requests, so the first page will not load in less than 9-10 seconds |
Storage size increases only slightly (a new field with a keyword mapping is added) | Result collapsing is not supported. It must be done manually, which means that the query must return more records than requested |
 | If effectiveShelvingOrder contains close values (NA 22 A112, NA 22 A113), only one will be used and the second one will be ignored |
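Manual result collapsing, as mentioned in the table above, could look roughly like the following sketch. It assumes the hits have already been reduced to `(sort_key, instance_id)` pairs in sort order; that shape and the name `collapse_hits` are assumptions made for illustration.

```python
from itertools import groupby


def collapse_hits(hits, size):
    """Collapse sorted hits into at most `size` (call_number, instance_ids) rows.

    `hits` is a list of (sort_key, instance_id) pairs, already sorted by sort_key.
    """
    rows = []
    # groupby merges consecutive hits that share the same sort key.
    for call_number, group in groupby(hits, key=lambda hit: hit[0]):
        rows.append((call_number, [instance_id for _, instance_id in group]))
        if len(rows) == size:
            break
    return rows
```

Because several hits can collapse into one row, the caller has to request more than `size` hits to be able to fill a page, which is the overhead the table refers to.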
Browsing by call-numbers using the dedicated index and storing the effective location id and call-number type
This approach assumes that browsing can be done in the same way as subject browsing:
Elasticsearch query:
{ "from": 0, "size": 10, "query": { "match_all": {} }, "search_after": [ "NA 22" ], "sort": [ { "callNumber": "asc" } ] }
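Browsing around an anchor needs one request per direction. The sketch below builds both request bodies from the query above; the descending sort for the preceding page and the even split of the page size are assumptions, not confirmed details of the implementation.

```python
def browse_query(anchor, size, forward=True):
    """Build a search_after request body for one browse direction."""
    return {
        "from": 0,
        "size": size,
        "query": {"match_all": {}},
        "search_after": [anchor],
        # Ascending order walks forward from the anchor, descending walks back.
        "sort": [{"callNumber": "asc" if forward else "desc"}],
    }


def browse_around_queries(anchor, size):
    """Return (preceding, succeeding) request bodies for browsing around `anchor`."""
    preceding = size // 2
    return (
        browse_query(anchor, preceding, forward=False),
        browse_query(anchor, size - preceding, forward=True),
    )
```

Note that `search_after` returns only hits strictly after the given sort value, so a record that exactly matches the anchor is returned by neither request and would need separate handling.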
Advantages | Disadvantages |
---|---|
It is faster than the 1st approach (the first page should load in 3-4 seconds) | Elasticsearch storage will require 10-15% more space to keep a dedicated index |
Close values in the same instance for effective shelf order will work correctly | If effectiveShelvingOrder contains close values (NA 22 A112, NA 22 A113), only one will be used and the second one will be ignored |
No need for result collapsing, but an additional terms aggregation is required to retrieve resource counts | Indexing is slowed down by a factor of two or more (needs more detailed research) |
 | It is complicated to add filters by location and type (it will require an optimistic lock to save every event) |
 | Populating instance fields requires an additional request to retrieve them |
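The additional terms aggregation for resource counts could be built along these lines. This is only a sketch: it assumes the counts are taken from the instance index over the `items.effectiveShelvingOrder` keyword field, and the aggregation name `resourceCounts` is made up for the example.

```python
def resource_count_query(call_numbers, field="items.effectiveShelvingOrder"):
    """Build a request body that counts instances per browsed call-number.

    `call_numbers` must be a non-empty list of exact call-number values.
    """
    return {
        "size": 0,  # only the aggregation buckets are needed, not the hits
        "query": {"terms": {field: call_numbers}},
        "aggs": {
            "resourceCounts": {
                "terms": {
                    "field": field,
                    # Keep only buckets for the call-numbers on the page.
                    "include": call_numbers,
                    "size": len(call_numbers),
                }
            }
        },
    }
```

Each returned bucket would then carry a `doc_count` for one call-number on the current page.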
Browsing by call-number using PostgreSQL index on call-number values
To be researched.