MSEARCH-333 Optimize query for subject browsing

Approach

The current query that retrieves subject counts returns results in 4-6 seconds. It can be optimized using the following approaches:

  • Move filters from the aggregation to the query level

  • Use a single msearch request to retrieve subject counts without aggregation, using a basic term query for each subject

Performance test configuration

Property

Value

Environment

https://falcon-perf-okapi.ci.folio.org

mod-search

3 nodes with CPU limit = 1024m and memoryLimit = 1200MB

okapi

3 nodes with CPU limit = 256m and memoryLimit = 536MB

mod-authtoken

2 nodes with CPU limit = 128m and memoryLimit = 360MB

mod-login

2 nodes with CPU limit = 128m and memoryLimit = 536MB

mod-permissions

2 nodes with CPU limit = 128m and memoryLimit = 536MB

Elasticsearch

AWS based

Number of nodes: 4

Resources: r6g.large, 16 GiB of Memory, 2 vCPUs, EBS only, 64-bit Arm platform

Properties: timeout=30

Search queries

Terms
washington africa independence media disk comparison computer series system housing trainer airport priority emotion possession topic appointment mixture committee awareness way hospital success addition hearing worker combination effort fortune interaction hospital lady confusion music throat agency science wedding exam honey pizza leadership marketing road initiative audience poem fortune elevator area dirt height breath charity two success engine length quantity suggestion supermarket reputation shopping administration outcome promotion mode profession shirt funeral hair cheek patience independence psychology effort engineering drawer reflection army resource people writing people volume reception scene son pie population combination weakness impression gate worker computer song proposal history fact love contract

 

Count of resources

8,180,456 (bugfest kiwi dataset)

Count of subjects

4,078,882

Performance Test duration

300 sec (5 min)

V_USERS

5

RAMP_UP

5 sec

HOSTNAME

falcon-perf-okapi.ci.folio.org

Aggregated Results

Current solution

Elasticsearch query:

{
  "from": 0,
  "size": 0,
  "query": { "match_all": { "boost": 1.0 } },
  "aggregations": {
    "subjects": {
      "terms": {
        "field": "plain_subjects",
        "size": 2,
        "min_doc_count": 1,
        "shard_min_doc_count": 0,
        "show_term_doc_count_error": false,
        "order": [ { "_count": "desc" }, { "_key": "asc" } ],
        "include": [ "s1", "s2" ]
      }
    }
  }
}
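Reading counts back from this aggregation can be sketched as follows. This is a minimal illustration, not the mod-search implementation: the function name and the sample response values are hypothetical, and the response shape assumed is the standard terms-aggregation format (`aggregations.subjects.buckets`, each bucket carrying `key` and `doc_count`). Because the query sets `min_doc_count` to 1, subjects with no matching documents produce no bucket, so the sketch fills them in with 0.

```python
def subject_counts_from_aggregation(response, subjects):
    """Map each requested subject to its doc_count (0 when no bucket is returned)."""
    buckets = (
        response.get("aggregations", {})
        .get("subjects", {})
        .get("buckets", [])
    )
    found = {bucket["key"]: bucket["doc_count"] for bucket in buckets}
    # Preserve the order of the requested subjects, not the bucket order.
    return {subject: found.get(subject, 0) for subject in subjects}


# Made-up response fragment for illustration only (not measured data).
sample_response = {
    "aggregations": {
        "subjects": {"buckets": [{"key": "s1", "doc_count": 12}]}
    }
}
counts = subject_counts_from_aggregation(sample_response, ["s1", "s2"])
```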

Optimized query with terms filter

Elasticsearch query example

{
  "from": 0,
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "terms": { "plain_subjects": [ "s1", "s2" ], "boost": 1.0 } }
      ],
      "adjust_pure_negative": true,
      "boost": 1.0
    }
  },
  "aggregations": {
    "subjects": {
      "terms": {
        "field": "plain_subjects",
        "size": 2,
        "min_doc_count": 1,
        "shard_min_doc_count": 0,
        "show_term_doc_count_error": false,
        "order": [ { "_count": "desc" }, { "_key": "asc" } ],
        "include": [ "s1", "s2" ]
      }
    }
  }
}
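A request body of this shape could be generated for an arbitrary list of subjects roughly as shown below. This is a sketch under assumptions: the helper name is hypothetical, and default-valued options from the example (`boost`, `adjust_pure_negative`, `shard_min_doc_count`, `show_term_doc_count_error`) are omitted for brevity. The `include` list is kept in the aggregation because, if `plain_subjects` holds several values per document, the documents matched by the filter may also carry subjects that were not requested.

```python
import json


def build_filtered_subject_query(subjects, field="plain_subjects"):
    """Build a terms aggregation whose input is narrowed by a query-level terms filter."""
    subjects = list(subjects)
    return {
        "from": 0,
        "size": 0,  # no hits needed, only aggregation buckets
        "query": {"bool": {"filter": [{"terms": {field: subjects}}]}},
        "aggregations": {
            "subjects": {
                "terms": {
                    "field": field,
                    "size": len(subjects),
                    "min_doc_count": 1,
                    "order": [{"_count": "desc"}, {"_key": "asc"}],
                    "include": subjects,
                }
            }
        },
    }


query = build_filtered_subject_query(["s1", "s2"])
print(json.dumps(query, indent=2))
```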

Term query based counts (msearch)

Elasticsearch multisearch request example

{"index": "folio_instance_folio"}
{"from":0,"size":0,"query":{"term":{"plain_subjects":{"value":"s1","boost":1.0}}},"track_total_hits":true}
{"index": "folio_instance_folio"}
{"from":0,"size":0,"query":{"term":{"plain_subjects":{"value":"s2","boost":1.0}}},"track_total_hits":true}
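The multisearch body is newline-delimited JSON: one header line and one query line per subject, with a trailing newline at the end. A minimal sketch of building that body and reading the counts back is given below. The function names and the sample response values are hypothetical, and the response shape assumed is the Elasticsearch 7+ format, where `hits.total` is an object with a `value` field. Responses come back in the same order as the requests, which is what lets the counts be paired with the subjects.

```python
import json


def build_msearch_body(subjects, index="folio_instance_folio", field="plain_subjects"):
    """Return an NDJSON msearch body with one count-only term query per subject."""
    lines = []
    for subject in subjects:
        lines.append(json.dumps({"index": index}))  # header line
        lines.append(json.dumps({
            "from": 0,
            "size": 0,
            "query": {"term": {field: {"value": subject}}},
            "track_total_hits": True,  # exact totals instead of a capped estimate
        }))
    return "\n".join(lines) + "\n"  # the body must end with a newline


def counts_from_msearch(subjects, response):
    """Pair each subject with the total hit count of its positional response."""
    return {
        subject: item["hits"]["total"]["value"]
        for subject, item in zip(subjects, response["responses"])
    }


body = build_msearch_body(["s1", "s2"])

# Made-up response fragment for illustration only (not measured data).
sample_response = {
    "responses": [
        {"hits": {"total": {"value": 12, "relation": "eq"}}},
        {"hits": {"total": {"value": 0, "relation": "eq"}}},
    ]
}
msearch_counts = counts_from_msearch(["s1", "s2"], sample_response)
```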