MSEARCH-333 Optimize query for subject browsing

Approach

The current query for retrieving subject counts returns results in 4-6 seconds. It can be optimized using the following approaches:

  • Move filters from the aggregation to the query level
  • Use a single msearch request to retrieve subject counts without aggregation, using a basic term query per subject

Performance test configuration

Environment: https://falcon-perf-okapi.ci.folio.org

  • mod-search: 3 nodes with CPU limit = 1024m and memoryLimit = 1200MB
  • okapi: 3 nodes with CPU limit = 256m and memoryLimit = 536MB
  • mod-authtoken: 2 nodes with CPU limit = 128m and memoryLimit = 360MB
  • mod-login: 2 nodes with CPU limit = 128m and memoryLimit = 536MB
  • mod-permissions: 2 nodes with CPU limit = 128m and memoryLimit = 536MB

ElasticSearch: AWS based

  • Number of nodes: 4
  • Resources: r6g.large, 16 GiB of Memory, 2 vCPUs, EBS only, 64-bit Arm platform
  • Properties: timeout=30

Search queries (terms):
washington
africa
independence
media
disk
comparison
computer
series
system
housing
trainer
airport
priority
emotion
possession
topic
appointment
mixture
committee
awareness
way
hospital
success
addition
hearing
worker
combination
effort
fortune
interaction
hospital
lady
confusion
music
throat
agency
science
wedding
exam
honey
pizza
leadership
marketing
road
initiative
audience
poem
fortune
elevator
area
dirt
height
breath
charity
two
success
engine
length
quantity
suggestion
supermarket
reputation
shopping
administration
outcome
promotion
mode
profession
shirt
funeral
hair
cheek
patience
independence
psychology
effort
engineering
drawer
reflection
army
resource
people
writing
people
volume
reception
scene
son
pie
population
combination
weakness
impression
gate
worker
computer
song
proposal
history
fact
love
contract

Count of resources: 8,180,456 (Bugfest Kiwi dataset)
Count of subjects: 4,078,882
Performance test duration: 300 sec (5 min)
V_USERS: 5
RAMP_UP: 5 sec
HOSTNAME: falcon-perf-okapi.ci.folio.org

Aggregated Results

Current solution

5_curr_5min_subjectBrowse_20220420.csv

Elasticsearch query:

{
  "from": 0,
  "size": 0,
  "query": {
    "match_all": { "boost": 1.0 }
  },
  "aggregations": {
    "subjects": {
      "terms": {
        "field": "plain_subjects",
        "size": 2,
        "min_doc_count": 1,
        "shard_min_doc_count": 0,
        "show_term_doc_count_error": false,
        "order": [ { "_count": "desc" }, { "_key": "asc" } ],
        "include": [ "s1", "s2" ]
      }
    }
  }
}

Optimized query with terms filter

5_opt1_5min_subjectBrowse_20220420.csv

Elasticsearch query example

{
  "from": 0,
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "terms": { "plain_subjects": [ "s1", "s2" ], "boost": 1.0 } }
      ],
      "adjust_pure_negative": true,
      "boost": 1.0
    }
  },
  "aggregations": {
    "subjects": {
      "terms": {
        "field": "plain_subjects",
        "size": 2,
        "min_doc_count": 1,
        "shard_min_doc_count": 0,
        "show_term_doc_count_error": false,
        "order": [ { "_count": "desc" }, { "_key": "asc" } ],
        "include": [ "s1", "s2" ]
      }
    }
  }
}
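
The request above could be assembled for an arbitrary list of subjects roughly as follows. This is a minimal sketch, not the mod-search implementation: the helper name `build_filtered_subject_query` is hypothetical, and settings that match Elasticsearch defaults (`boost`, `adjust_pure_negative`, `shard_min_doc_count`, `show_term_doc_count_error`) are left out for brevity.

```python
import json


def build_filtered_subject_query(subjects, field="plain_subjects"):
    """Build a terms aggregation restricted by a query-level terms filter.

    Hypothetical helper for illustration only; not part of mod-search.
    """
    subjects = list(subjects)
    return {
        "from": 0,
        "size": 0,  # only aggregation buckets are needed, no hits
        "query": {
            "bool": {
                # The filter narrows the document set before the aggregation runs,
                # instead of aggregating over every document via match_all.
                "filter": [{"terms": {field: subjects}}]
            }
        },
        "aggregations": {
            "subjects": {
                "terms": {
                    "field": field,
                    "size": len(subjects),
                    "min_doc_count": 1,
                    "order": [{"_count": "desc"}, {"_key": "asc"}],
                    # Matching documents may carry other subjects too,
                    # so keep only buckets for the requested values.
                    "include": subjects,
                }
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_filtered_subject_query(["s1", "s2"]), indent=2))
```

The `include` list is still needed in this variant because a document matched by the filter can contain subjects other than the requested ones.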

Term query based counts (msearch)

5_opt2_5min_subjectBrowse_20220420.csv

Elasticsearch multisearch request example

{"index": "folio_instance_folio"}
{"from":0,"size":0,"query":{"term":{"plain_subjects":{"value":"s1","boost":1.0}}},"track_total_hits":true}
{"index": "folio_instance_folio"}
{"from":0,"size":0,"query":{"term":{"plain_subjects":{"value":"s2","boost":1.0}}},"track_total_hits":true}
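
A multisearch body of this shape, and the mapping of its responses back to subjects, might be produced as sketched below. The function names `build_msearch_body` and `extract_counts` are hypothetical, and the response parsing assumes the Elasticsearch 7.x layout in which each entry of `responses` exposes `hits.total.value`; neither is taken from mod-search.

```python
import json


def build_msearch_body(subjects, index="folio_instance_folio", field="plain_subjects"):
    """Return an NDJSON body: one header line and one count query per subject.

    Hypothetical helper for illustration only.
    """
    lines = []
    for subject in subjects:
        lines.append(json.dumps({"index": index}))
        lines.append(json.dumps({
            "from": 0,
            "size": 0,  # hits are not needed, only the total count
            "query": {"term": {field: {"value": subject}}},
            "track_total_hits": True,  # exact count instead of a capped estimate
        }))
    # Every line of a multisearch body, including the last, ends with a newline.
    return "".join(line + "\n" for line in lines)


def extract_counts(subjects, msearch_response):
    """Pair each subject with the total of its response (responses keep request order)."""
    return {
        subject: item["hits"]["total"]["value"]
        for subject, item in zip(subjects, msearch_response["responses"])
    }


if __name__ == "__main__":
    print(build_msearch_body(["s1", "s2"]), end="")
```

Because each sub-request is a plain term query with `size` 0, no aggregation is executed; the count comes from the total hits of each response.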