MSEARCH-333 Optimize query for subject browsing
Approach
The current query for retrieving subject counts returns results in 4-6 seconds. It can be optimized using the following approaches:
- Move the filters from the aggregation to the query level
- Use a single msearch request to retrieve subject counts without aggregation, using a basic term query for each subject
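The first approach can be sketched as a small request builder. This is a minimal illustration, not the mod-search implementation: the function names `build_filtered_subject_counts_query` and `counts_from_aggregation` are hypothetical, the field name `plain_subjects` is taken from the query examples on this page, and setting the aggregation `size` to the number of requested subjects is an assumption based on those examples (two subjects, `size` of 2). The list of subjects is assumed to be non-empty.

```python
def build_filtered_subject_counts_query(subjects):
    """Build a search body that filters documents at the query level
    before the terms aggregation counts them (hypothetical helper)."""
    subjects = list(subjects)
    return {
        "from": 0,
        "size": 0,  # no hits are needed, only the aggregation buckets
        "query": {
            "bool": {
                # Only documents containing at least one requested subject
                # reach the aggregation phase.
                "filter": [{"terms": {"plain_subjects": subjects}}]
            }
        },
        "aggregations": {
            "subjects": {
                "terms": {
                    "field": "plain_subjects",
                    "size": len(subjects),  # assumption: one bucket per subject
                    "min_doc_count": 1,
                    # A matching document may carry other subjects as well,
                    # so the buckets are still limited to the requested values.
                    "include": subjects,
                }
            }
        },
    }


def counts_from_aggregation(response):
    """Map each bucket key to its doc_count in a terms aggregation response."""
    buckets = response["aggregations"]["subjects"]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}
```

The `include` clause stays in the aggregation because the query-level filter only narrows the set of documents; it does not restrict which subject values of those documents are bucketed.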
Performance test configuration
Property | Value |
|---|---|
Environment | |
mod-search | 3 nodes with CPU limit = 1024m and memoryLimit = 1200MB |
okapi | 3 nodes with CPU limit = 256m and memoryLimit = 536MB |
mod-authtoken | 2 nodes with CPU limit = 128m and memoryLimit = 360MB |
mod-login | 2 nodes with CPU limit = 128m and memoryLimit = 536MB |
mod-permissions | 2 nodes with CPU limit = 128m and memoryLimit = 536MB |
Elasticsearch | AWS based |
Search queries | Terms: washington, africa, independence, media, disk, comparison, computer, series, system, housing, trainer, airport, priority, emotion, possession, topic, appointment, mixture, committee, awareness, way, hospital, success, addition, hearing, worker, combination, effort, fortune, interaction, hospital, lady, confusion, music, throat, agency, science, wedding, exam, honey, pizza, leadership, marketing, road, initiative, audience, poem, fortune, elevator, area, dirt, height, breath, charity, two, success, engine, length, quantity, suggestion, supermarket, reputation, shopping, administration, outcome, promotion, mode, profession, shirt, funeral, hair, cheek, patience, independence, psychology, effort, engineering, drawer, reflection, army, resource, people, writing, people, volume, reception, scene, son, pie, population, combination, weakness, impression, gate, worker, computer, song, proposal, history, fact, love, contract |
Count of resources | 8,180,456 (bugfest kiwi dataset) |
Count of subjects | 4,078,882 |
Performance test duration | 300 sec (5 min) |
V_USERS | 5 |
RAMP_UP | 5 sec |
HOSTNAME | |
Aggregated Results
Current solution
Elasticsearch query:
{
  "from": 0,
  "size": 0,
  "query": {
    "match_all": { "boost": 1.0 }
  },
  "aggregations": {
    "subjects": {
      "terms": {
        "field": "plain_subjects",
        "size": 2,
        "min_doc_count": 1,
        "shard_min_doc_count": 0,
        "show_term_doc_count_error": false,
        "order": [ { "_count": "desc" }, { "_key": "asc" } ],
        "include": [ "s1", "s2" ]
      }
    }
  }
}
Optimized query with terms filter
Elasticsearch query example
{
  "from": 0,
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "terms": { "plain_subjects": [ "s1", "s2" ], "boost": 1.0 }
        }
      ],
      "adjust_pure_negative": true,
      "boost": 1.0
    }
  },
  "aggregations": {
    "subjects": {
      "terms": {
        "field": "plain_subjects",
        "size": 2,
        "min_doc_count": 1,
        "shard_min_doc_count": 0,
        "show_term_doc_count_error": false,
        "order": [ { "_count": "desc" }, { "_key": "asc" } ],
        "include": [ "s1", "s2" ]
      }
    }
  }
}
Term query based counts (msearch)
Elasticsearch multisearch request example
{"index": "folio_instance_folio"}
{"from":0,"size":0,"query":{"term":{"plain_subjects":{"value":"s1","boost":1.0}}},"track_total_hits":true}
{"index": "folio_instance_folio"}
{"from":0,"size":0,"query":{"term":{"plain_subjects":{"value":"s2","boost":1.0}}},"track_total_hits":true}
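
The multisearch request above can be assembled and its response read with a few lines of code. The sketch below is illustrative only: `build_msearch_body` and `extract_counts` are hypothetical helper names, not part of mod-search, and the index name is simply the one used in the example. It relies on two properties of the Elasticsearch `_msearch` API: the body is newline-delimited JSON that must end with a newline, and the entries in `responses` come back in the same order as the submitted searches.

```python
import json


def build_msearch_body(subjects, index="folio_instance_folio"):
    """Return an NDJSON body with one header line and one count query per subject."""
    lines = []
    for subject in subjects:
        # Header line: selects the index for the search that follows.
        lines.append(json.dumps({"index": index}))
        # Body line: size 0 returns no hits; track_total_hits makes
        # hits.total.value an exact count instead of a capped estimate.
        lines.append(json.dumps({
            "from": 0,
            "size": 0,
            "query": {"term": {"plain_subjects": {"value": subject}}},
            "track_total_hits": True,
        }))
    # The _msearch endpoint requires the last line to end with a newline.
    return "\n".join(lines) + "\n"


def extract_counts(subjects, msearch_response):
    """Pair each subject with hits.total.value of the response at the same position."""
    responses = msearch_response["responses"]
    return {
        subject: response["hits"]["total"]["value"]
        for subject, response in zip(subjects, responses)
    }
```

Unlike the aggregation-based variants, a subject with no matching documents still appears in the result here, with a count of 0, because every term query produces its own response entry.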