MSEARCH-333 Optimize query for subject browsing
Approach
The current query that retrieves subject counts returns results in 4-6 seconds. It can be optimized using the following approaches:
- Move the filters from the aggregation to the query level.
- Use a single msearch request to retrieve subject counts without aggregation, using a basic term query for each subject.
Performance test configuration
Property | Value |
---|---|
Environment | |
mod-search | 3 nodes with CPU limit = 1024m and memoryLimit = 1200MB |
okapi | 3 nodes with CPU limit = 256m and memoryLimit = 536MB |
mod-authtoken | 2 nodes with CPU limit = 128m and memoryLimit = 360MB |
mod-login | 2 nodes with CPU limit = 128m and memoryLimit = 536MB |
mod-permissions | 2 nodes with CPU limit = 128m and memoryLimit = 536MB |
Elasticsearch | AWS-based; number of nodes: 4; resources: r6g.large (16 GiB of memory, 2 vCPUs, EBS only, 64-bit Arm platform); properties: timeout=30 |
Search queries | Terms |
Count of resources | 8,180,456 (bugfest kiwi dataset) |
Count of subjects | 4,078,882 |
Performance test duration | 300 sec (5 min) |
V_USERS | 5 |
RAMP_UP | 5 sec |
HOSTNAME | falcon-perf-okapi.ci.folio.org |
Aggregated Results
Current solution
5_curr_5min_subjectBrowse_20220420.csv
Elasticsearch query:
{ "from": 0, "size": 0, "query": { "match_all": { "boost": 1.0 } }, "aggregations": { "subjects": { "terms": { "field": "plain_subjects", "size": 2, "min_doc_count": 1, "shard_min_doc_count": 0, "show_term_doc_count_error": false, "order": [ { "_count": "desc" }, { "_key": "asc" } ], "include": [ "s1", "s2" ] } } } }
Optimized query with terms filter
5_opt1_5min_subjectBrowse_20220420.csv
Elasticsearch query example
{ "from": 0, "size": 0, "query": { "bool": { "filter": [ { "terms": { "plain_subjects": [ "s1", "s2" ], "boost": 1.0 } } ], "adjust_pure_negative": true, "boost": 1.0 } }, "aggregations": { "subjects": { "terms": { "field": "plain_subjects", "size": 2, "min_doc_count": 1, "shard_min_doc_count": 0, "show_term_doc_count_error": false, "order": [ { "_count": "desc" }, { "_key": "asc" } ], "include": [ "s1", "s2" ] } } } }
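The request body above can be assembled programmatically for any list of subjects. The following is a minimal Python sketch, not the mod-search implementation: the function name `build_terms_filter_query` is hypothetical, and the default-valued options from the example (`boost`, `min_doc_count`, and so on) are omitted on the assumption that Elasticsearch applies the same defaults.

```python
import json


def build_terms_filter_query(subjects, field="plain_subjects"):
    """Build a search body that filters by subject at the query level.

    Hypothetical helper for illustration; only the request shape is taken
    from the example query on this page.
    """
    subjects = list(subjects)
    return {
        "from": 0,
        "size": 0,  # no hits are needed, only the aggregation buckets
        "query": {
            # The terms filter narrows the document set before aggregating.
            "bool": {"filter": [{"terms": {field: subjects}}]}
        },
        "aggregations": {
            "subjects": {
                "terms": {
                    "field": field,
                    "size": len(subjects),
                    # Keep only buckets for the requested subjects.
                    "include": subjects,
                    "order": [{"_count": "desc"}, {"_key": "asc"}],
                }
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_terms_filter_query(["s1", "s2"]), indent=2))
```

The `include` list stays in the aggregation because a matching document may carry other subjects as well, and those would otherwise appear as extra buckets.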
Term query based counts (msearch)
5_opt2_5min_subjectBrowse_20220420.csv
Elasticsearch multisearch request example
{"index": "folio_instance_folio"}
{"from":0,"size":0,"query":{"term":{"plain_subjects":{"value":"s1","boost":1.0}}},"track_total_hits":true}
{"index": "folio_instance_folio"}
{"from":0,"size":0,"query":{"term":{"plain_subjects":{"value":"s2","boost":1.0}}},"track_total_hits":true}
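A multisearch body is newline-delimited JSON: one header line and one query line per subject, with a trailing newline. The Python sketch below shows one way to build that body and to read the counts back. The helper names `build_msearch_body` and `extract_counts` are hypothetical, and the response shape (`responses[i].hits.total.value`, returned in request order) assumes Elasticsearch 7 or later.

```python
import json


def build_msearch_body(subjects, index="folio_instance_folio",
                       field="plain_subjects"):
    """Return an NDJSON msearch body with one term query per subject."""
    lines = []
    for subject in subjects:
        lines.append(json.dumps({"index": index}))  # header line
        lines.append(json.dumps({
            "from": 0,
            "size": 0,  # only the total hit count is needed
            "query": {"term": {field: {"value": subject}}},
            "track_total_hits": True,  # exact count instead of a capped one
        }))
    # The msearch API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


def extract_counts(subjects, msearch_response):
    """Map each subject to its total hit count.

    Relies on msearch returning responses in the same order as the requests.
    """
    responses = msearch_response["responses"]
    return {
        subject: response["hits"]["total"]["value"]
        for subject, response in zip(subjects, responses)
    }


if __name__ == "__main__":
    print(build_msearch_body(["s1", "s2"]), end="")
    # Hand-written response with made-up counts, for illustration only.
    sample = {"responses": [
        {"hits": {"total": {"value": 3, "relation": "eq"}}},
        {"hits": {"total": {"value": 0, "relation": "eq"}}},
    ]}
    print(extract_counts(["s1", "s2"], sample))
```

Because each subject is resolved by a plain term query, no aggregation buckets have to be computed; the count comes directly from the total hits of each sub-request.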