MSEARCH-301 Browsing by LC, DDC and Other type numbers
The existing approach for browsing by LC numbers assumes that every effective shelf key can be converted to a numeric value. Introducing Other type call-numbers makes this approach inapplicable, so other options must be considered:

- Browsing by call-number using existing instance index with script sorting
- Browsing by call-numbers using the dedicated index and storing the effective location id and call-number type
- Browsing by call-number using PostgreSQL index on call-number values
Initial dataset
| Name | Value |
|---|---|
| Number of instances | 2,500,000 instances, each with 5-7 items (effectiveShelvingOrder values are randomly generated) |
| Number of call-numbers | ~7,500,000 |
Browsing by call-number using existing instance index with script sorting
This approach assumes that browsing can be done using the following query on the instance index:
```json
{
  "from": 0,
  "size": 10,
  "query": {
    "exists": {
      "field": "items.effectiveShelvingOrder"
    }
  },
  "_source": [
    "id",
    "items.effectiveShelvingOrder"
  ],
  "explain": true,
  "search_after": ["NA 22"],
  "sort": {
    "_script": {
      "type": "string",
      "script": {
        "source": "def fields = doc['items.effectiveShelvingOrder']; def a = Collections.binarySearch(fields, params['cn']); if (a >= 0) return fields[a]; a = -a - 2; return fields[a < 0 ? 0 : a]",
        "params": {
          "cn": "NA 22"
        }
      }
    }
  }
}
```

The items of each instance must be ordered during indexing to make the script sorting slightly faster.
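The value-selection step of the sort script can be sketched in plain Java for experimenting outside Elasticsearch. This is a hypothetical helper (`ShelfKeySelector.selectSortKey` is not part of any existing module), and it assumes each instance's effectiveShelvingOrder values arrive as a non-empty, ascending list, which the binary search requires.

```java
import java.util.Collections;
import java.util.List;

public class ShelfKeySelector {

    /**
     * Picks the single value the script sort would use for one instance:
     * the exact match for the anchor if present, otherwise the greatest key
     * below the anchor, falling back to the first key when all keys are above it.
     * Assumes shelfKeys is sorted ascending and non-empty (hypothetical helper).
     */
    static String selectSortKey(List<String> shelfKeys, String anchor) {
        int idx = Collections.binarySearch(shelfKeys, anchor);
        if (idx >= 0) {
            return shelfKeys.get(idx); // exact match, including index 0
        }
        // binarySearch returns -(insertionPoint) - 1 when the key is absent,
        // so the element just before the insertion point is at -idx - 2.
        int floor = -idx - 2;
        return shelfKeys.get(Math.max(floor, 0));
    }

    public static void main(String[] args) {
        List<String> keys = List.of("NA 10", "NA 22", "NA 22 A112", "NA 22 A113", "NB 5");
        System.out.println(selectSortKey(keys, "NA 22")); // exact match
        System.out.println(selectSortKey(keys, "NA 30")); // greatest key below the anchor
        System.out.println(selectSortKey(keys, "MA 1"));  // anchor precedes every key
    }
}
```

Because the helper returns exactly one value per instance, an instance holding both NA 22 A112 and NA 22 A113 contributes only one of them to the sort, which is the close-values limitation listed below.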
Pros and cons of this approach:
| Advantages | Disadvantages |
|---|---|
| Indexing speed is not affected | It is slow: ~5 seconds per request, and browsing around an anchor requires about 2 requests, so the first page cannot load in less than 9-10 seconds |
| Storage size increases only slightly (one new field with keyword mapping) | Result collapsing is not supported; it must be done manually, which means the query must return more records than requested |
| | If effectiveShelvingOrder contains close values (NA 22 A112, NA 22 A113), only one is used and the other is ignored |
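The manual collapsing listed as a disadvantage can be sketched as follows, assuming the hits come back already sorted by the script-selected shelving order. `Hit`, `BrowseRow`, `CollapsedPage` and `collapse` are hypothetical names. Hits that share a shelf key are folded into one row with an instance count; if the last group reaches the end of the fetched hits, it may continue beyond them, so the sketch reports that more records are needed. This is why the query has to over-fetch.

```java
import java.util.ArrayList;
import java.util.List;

public class ManualCollapse {

    /** One hit from the script-sorted query (hypothetical shape). */
    record Hit(String instanceId, String shelfKey) {}

    /** One browse row: a shelf key and the number of instances sharing it. */
    record BrowseRow(String shelfKey, int instanceCount) {}

    /** Collapsed rows plus a flag telling the caller to fetch more hits. */
    record CollapsedPage(List<BrowseRow> rows, boolean needsMoreHits) {}

    /** Collapses hits already sorted by shelfKey into at most `size` complete rows. */
    static CollapsedPage collapse(List<Hit> sortedHits, int size) {
        List<BrowseRow> rows = new ArrayList<>();
        int i = 0;
        while (i < sortedHits.size() && rows.size() < size) {
            String key = sortedHits.get(i).shelfKey();
            int j = i;
            // Advance j past every consecutive hit with the same shelf key.
            while (j < sortedHits.size() && sortedHits.get(j).shelfKey().equals(key)) {
                j++;
            }
            if (j == sortedHits.size()) {
                // The group touches the end of the fetched hits, so instances
                // with the same key may exist beyond them: ask for more.
                return new CollapsedPage(rows, true);
            }
            rows.add(new BrowseRow(key, j - i));
            i = j;
        }
        return new CollapsedPage(rows, rows.size() < size);
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(
                new Hit("i1", "NA 22"),
                new Hit("i2", "NA 22"),
                new Hit("i3", "NA 23"),
                new Hit("i4", "NA 24"),
                new Hit("i5", "NA 24"));
        System.out.println(collapse(hits, 2)); // two complete rows, no refetch needed
        System.out.println(collapse(hits, 3)); // NA 24 group may be cut off: refetch
    }
}
```

With five fetched hits, a page of two rows can be built, but a page of three cannot, because the NA 24 group may continue past the last fetched hit.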
Browsing by call-numbers using the dedicated index and storing the effective location id and call-number type
This approach assumes that browsing can be done in the same way as subject browsing:
Elasticsearch query:
```json
{
  "from": 0,
  "size": 10,
  "query": {
    "match_all": {}
  },
  "search_after": [
    "NA 22"
  ],
  "sort": [
    {
      "callNumber": "asc"
    }
  ]
}
```

Pros and cons of this approach:

| Advantages | Disadvantages |
|---|---|
| It is faster than the first approach (the first page is expected to load in 3-4 seconds) | Elasticsearch requires 10-15% more storage space to keep the dedicated index |
| Close effective shelving order values in the same instance are handled correctly | If effectiveShelvingOrder contains close values (NA 22 A112, NA 22 A113), only one is used and the other is ignored |
| No result collapsing is needed, but an additional terms aggregation is required to retrieve resource counts | Indexing speed drops by a factor of two or more (needs more detailed research) |
| | Adding filters by location and type is complicated (it would require an optimistic lock when saving every event) |
| | Populating instance fields requires an additional request to retrieve them |
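Browsing around an anchor takes two requests, one for the preceding and one for the succeeding call-numbers. The sketch below models this in memory, assuming the dedicated index behaves like a sorted map from call-number to resource count; the Integer value stands in for the count that the additional terms aggregation would return. `BrowseAround` and `browseAround` are hypothetical names, and a `java.util.NavigableMap` stands in for Elasticsearch: the descending slice models a request sorted by `callNumber` desc with `search_after`, and the ascending slice models the asc request above. Since `search_after` skips documents whose sort value equals the anchor, including the anchor row in the second slice is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class BrowseAround {

    /**
     * Returns up to `before` call-numbers preceding the anchor followed by up to
     * `after` call-numbers starting at the anchor, all in ascending order.
     */
    static List<Map.Entry<String, Integer>> browseAround(
            NavigableMap<String, Integer> index, String anchor, int before, int after) {
        // Request 1 (modeled): sort desc, keys strictly below the anchor.
        List<Map.Entry<String, Integer>> preceding = new ArrayList<>();
        for (Map.Entry<String, Integer> e : index.headMap(anchor, false).descendingMap().entrySet()) {
            if (preceding.size() == before) {
                break;
            }
            preceding.add(e);
        }
        Collections.reverse(preceding); // restore ascending order for display

        // Request 2 (modeled): sort asc, keys at or after the anchor.
        List<Map.Entry<String, Integer>> page = new ArrayList<>(preceding);
        int taken = 0;
        for (Map.Entry<String, Integer> e : index.tailMap(anchor, true).entrySet()) {
            if (taken == after) {
                break;
            }
            page.add(e);
            taken++;
        }
        return page;
    }

    public static void main(String[] args) {
        NavigableMap<String, Integer> index = new TreeMap<>();
        index.put("NA 10", 1);
        index.put("NA 21", 3);
        index.put("NA 22", 2);
        index.put("NA 22 A112", 1);
        index.put("NB 5", 4);
        for (Map.Entry<String, Integer> row : browseAround(index, "NA 22", 2, 2)) {
            System.out.println(row.getKey() + " (" + row.getValue() + ")");
        }
    }
}
```

Running `main` prints NA 10 (1), NA 21 (3), NA 22 (2) and NA 22 A112 (1), one per line. In a real index each slice would be a separate Elasticsearch request, so the page cost is two round trips plus the aggregation for counts.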
Browsing by call-number using PostgreSQL index on call-number values
To be researched.