mod-search: Test Reindexing of Instances on consortium environment (Poppy)
Overview
The purpose of this document is to assess reindexing performance on a consortium environment by measuring reindex time and index size.
Recommendations & Jiras
- Original ticket to test: PERF-635
- Additional info: ElasticSearch Reindex Performance Recommendations
Test Summary
- Reindexing process for consortium environment takes:
- 3 hours for 3 tenants in parallel (1.77M instances);
- 2 hours for central tenant reindexing (cs00000int, 1.21M instances);
- 1 hour for secondary tenant reindexing (cs00000int_0001, 353K instances);
- 20 minutes for secondary tenant reindexing (cs00000int_0002, 200K instances).
- Duration depends not only on the number of instances but also on their type (source). The cs00000int_0001 tenant has many shared instances (343K) but far fewer unshared ones (10K) than cs00000int_0002 (200K unshared). Data details can be found here: Data structure
- High CPU utilization was observed on the nginx-okapi module: up to 413% during the 3-tenant test.
- CPU utilization for mod-inventory-storage reached 102% during the test on the central tenant. mod-search CPU utilization stayed between 13% and 27% across all tests.
- No memory leaks suspected.
Test Runs / Results
Test # | Instances number | Test conditions (reindexing on Poppy release, consortium environment) | Duration * | Notes |
---|---|---|---|---|
1. 2023-11-28 09:20 - 12:20 UTC | 1766108 | In parallel: 3 tenants | 3 hours | |
2. 2023-11-29 08:50 - 10:50 UTC | 1212927 | Sequential: cs00000int | 2 hours | |
3. 2023-11-29 14:05 - 15:05 UTC | 353179 | Sequential: cs00000int_0001 | 1 hour | |
4. 2023-11-29 15:22 - 15:42 UTC | 200002 | Sequential: cs00000int_0002 | 20 min | |
*Duration depends not only on the number of instances but also on their type (source). Data details can be found here: Data structure
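To make the durations comparable across runs, the figures in the table can be converted to an average throughput. The following is a minimal sketch: the instance counts and durations are copied from the table, the durations are the rounded values reported there, and the function and variable names are illustrative only.

```python
# Figures copied from the Test Runs table: (label, instances, duration in minutes).
TEST_RUNS = [
    ("Test 1: 3 tenants in parallel", 1_766_108, 180),
    ("Test 2: cs00000int", 1_212_927, 120),
    ("Test 3: cs00000int_0001", 353_179, 60),
    ("Test 4: cs00000int_0002", 200_002, 20),
]


def instances_per_second(instances: int, minutes: int) -> float:
    """Average reindex throughput over the whole run."""
    return instances / (minutes * 60)


# Map each test label to its average throughput.
throughput = {label: instances_per_second(n, m) for label, n, m in TEST_RUNS}

for label, rate in throughput.items():
    print(f"{label}: {rate:.0f} instances/s")
```

With these rounded durations, cs00000int_0001 comes out at roughly 98 instances per second, while the other three runs are in the range of 164 to 168 instances per second, which is consistent with the note that duration depends on instance type as well as count.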
Indexing size
All the data in the tables below were captured after each test finished.
health | status | index | uuid | pri | rep | docs.count | docs.deleted | store.size | pri.store.size |
---|---|---|---|---|---|---|---|---|---|
In parallel: 3 tenants | |||||||||
green | open | pcon_instance_subject_cs00000int | oo6lG2KjRBm68SlF4cQf-A | 4 | 2 | 955467 | 188613 | 3.4gb | 1.1gb |
green | open | pcon_contributor_cs00000int | D9izLpcOQWmqZFJhl6kyyA | 4 | 2 | 929045 | 231525 | 1.8gb | 685mb |
green | open | pcon_authority_cs00000int | 5CbWdgFrQSSBRlcNHPk7-A | 4 | 2 | 0 | 0 | 2.4kb | 832b |
green | open | pcon_instance_cs00000int | 6i5vkM6kRHOOu4ahRqESkA | 4 | 2 | 1422889 | 1498 | 23.7gb | 7.8gb |
Results from get-request for reindex monitoring:
Tenant | Reindex id | Get request reindex |
---|---|---|
cs00000int | 5dfca883-6236-438c-b4a4-bfd2274cdc0b | 1212891 |
cs00000int_0001 | 86a89e41-2858-4a9c-9e58-578ccf677413 | 353179 |
cs00000int_0002 | 01ec9821-aca8-4a28-8c02-844807191ca9 | 200002 |
SUM | | 1766072 |
health | status | index | uuid | pri | rep | docs.count | docs.deleted | store.size | pri.store.size |
---|---|---|---|---|---|---|---|---|---|
Sequential: cs00000int | |||||||||
green | open | pcon_instance_subject_cs00000int | XMqrUxTkTNKz4rIS1fi9Ug | 4 | 2 | 862411 | 106300 | 3.2gb | 1gb |
green | open | pcon_contributor_cs00000int | TGA0HARfRLGXMDuFyQcmfg | 4 | 2 | 835533 | 162792 | 1.5gb | 560.8mb |
green | open | pcon_authority_cs00000int | 5CbWdgFrQSSBRlcNHPk7-A | 4 | 2 | 0 | 0 | 2.4kb | 832b |
green | open | pcon_instance_cs00000int | R_Goc_w2T8CRnfvfbbsxHg | 4 | 2 | 1212891 | 0 | 25.9gb | 8.5gb |
Sequential: cs00000int_0001 | |||||||||
green | open | pcon_instance_subject_cs00000int | XMqrUxTkTNKz4rIS1fi9Ug | 4 | 2 | 865366 | 83466 | 4.6gb | 1.5gb |
green | open | pcon_contributor_cs00000int | TGA0HARfRLGXMDuFyQcmfg | 4 | 2 | 839478 | 90380 | 2.3gb | 806mb |
green | open | pcon_authority_cs00000int | 5CbWdgFrQSSBRlcNHPk7-A | 4 | 2 | 0 | 0 | 2.4kb | 832b |
green | open | pcon_instance_cs00000int | R_Goc_w2T8CRnfvfbbsxHg | 4 | 2 | 1222891 | 161584 | 28.5gb | 9.4gb |
Sequential: cs00000int_0002 | |||||||||
green | open | pcon_instance_subject_cs00000int | XMqrUxTkTNKz4rIS1fi9Ug | 4 | 2 | 955467 | 126534 | 4.5gb | 1.2gb |
green | open | pcon_contributor_cs00000int | TGA0HARfRLGXMDuFyQcmfg | 4 | 2 | 929045 | 189949 | 2.7gb | 1gb |
green | open | pcon_authority_cs00000int | 5CbWdgFrQSSBRlcNHPk7-A | 4 | 2 | 0 | 0 | 2.4kb | 832b |
green | open | pcon_instance_cs00000int | R_Goc_w2T8CRnfvfbbsxHg | 4 | 2 | 1422889 | 161588 | 32.6gb | 10.9gb |
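The store.size column covers primaries plus replicas, so with rep = 2 it should be roughly three times pri.store.size. The sketch below checks this for the pcon_instance_cs00000int index after each sequential test. The size strings are copied from the tables above; the helper name and the 1024-based unit multipliers are assumptions made for illustration.

```python
import re

# Multipliers for the size suffixes that appear in the tables above (assumed 1024-based).
UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3}


def size_to_bytes(text: str) -> float:
    """Convert a size string such as '25.9gb' or '832b' to bytes."""
    match = re.fullmatch(r"([\d.]+)(b|kb|mb|gb)", text.strip().lower())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    value, unit = match.groups()
    return float(value) * UNITS[unit]


# (store.size, pri.store.size) of pcon_instance_cs00000int after each sequential test.
INSTANCE_INDEX_SIZES = {
    "cs00000int": ("25.9gb", "8.5gb"),
    "cs00000int_0001": ("28.5gb", "9.4gb"),
    "cs00000int_0002": ("32.6gb", "10.9gb"),
}

# Ratio of total store size to primary store size per test.
ratios = {
    tenant: size_to_bytes(total) / size_to_bytes(primary)
    for tenant, (total, primary) in INSTANCE_INDEX_SIZES.items()
}

for tenant, ratio in ratios.items():
    print(f"after {tenant}: store.size / pri.store.size = {ratio:.2f}")
```

All three ratios come out close to 3, as expected for one primary copy plus two replicas.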
Results from get-request for reindex monitoring:
Tenant | Reindex id | Get request reindex |
---|---|---|
cs00000int | bb944cf4-b99f-4aa3-b13e-f5c92dc630ed | 1212891 |
cs00000int_0001 | cf943d63-50db-4085-9629-783d7acdc67b | 353179 |
cs00000int_0002 | c62e2662-6a21-47fd-9bb8-eea11364c2c1 | 200002 |
SUM | | 1766072 |
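The counts returned by the monitoring request can be reconciled against the instance numbers in the Test Runs table. This is a small sketch using only figures from this document. Tenant identifiers follow the Data structure table and the variable names are illustrative. It reports the difference without attempting to explain it.

```python
# Counts returned by the reindex monitoring get-request, per tenant.
PUBLISHED = {
    "cs00000int": 1_212_891,
    "cs00000int_0001": 353_179,
    "cs00000int_0002": 200_002,
}

# Instance numbers per tenant from the Test Runs table.
INSTANCES = {
    "cs00000int": 1_212_927,
    "cs00000int_0001": 353_179,
    "cs00000int_0002": 200_002,
}

# Per-tenant difference between the instance number and the monitored count.
differences = {tenant: INSTANCES[tenant] - PUBLISHED[tenant] for tenant in INSTANCES}

total_published = sum(PUBLISHED.values())
total_instances = sum(INSTANCES.values())

for tenant, diff in differences.items():
    print(f"{tenant}: difference {diff}")
print(f"totals: {total_instances} instances, {total_published} from get-request")
```

The two secondary tenants match exactly. The totals differ by 36, all of it on the central tenant cs00000int.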
Service CPU Utilization
Test #1 (3 tenants in parallel)
Maximum CPU utilization:
nginx-okapi - 413%
mod-inventory-storage - 95%
okapi - 73%
mod-search - 27%
Test #2 (cs00000int main tenant, sequential)
Maximum CPU utilization:
nginx-okapi - 332%
mod-inventory-storage - 102%
okapi - 63%
mod-search - 27%
Test #3 and #4 (cs00000int_0001, cs00000int_0002 secondary tenants, sequential)
Maximum CPU utilization:
nginx-okapi - 285%
mod-inventory-storage - 98%
okapi - 63%
mod-search - 13%
Memory Utilization
Test #1 (3 tenants in parallel)
Memory utilization:
mod-search - 38% → 50%
mod-inventory-storage - 11% → 31%
Test #2 (cs00000int main tenant, sequential)
Memory utilization:
mod-search - 37% → 50%
mod-inventory-storage - 13% → 21%
Test #3 and #4 (cs00000int_0001, cs00000int_0002 secondary tenants, sequential)
Memory utilization:
mod-search - 33% → 49%
mod-inventory-storage - 26% → 31%
DB CPU Utilization
Test #1 (3 tenants in parallel)
Maximum DB CPU utilization - 37%
Test #2 (cs00000int main tenant, sequential)
Maximum DB CPU utilization - 56%
Test #3 and #4 (cs00000int_0001, cs00000int_0002 secondary tenants, sequential)
Maximum DB CPU utilization - 36%
DB Connections
Test #1 (3 tenants in parallel)
Test #2 (cs00000int main tenant, sequential)
Test #3 and #4 (cs00000int_0001, cs00000int_0002 secondary tenants, sequential)
Open Search CPU
Test #1 (3 tenants in parallel)
Maximum CPU utilization - 57%
Test #2 (cs00000int main tenant, sequential)
Maximum CPU utilization - 47%
Test #3 and #4 (cs00000int_0001, cs00000int_0002 secondary tenants, sequential)
Maximum CPU utilization - 53%
Open Search Indexing Data Rate
Test #1 (3 tenants in parallel)
Test #2 (cs00000int main tenant, sequential)
Test #3 and #4 (cs00000int_0001, cs00000int_0002 secondary tenants, sequential)
Open Search Indexing Latency
Test #1 (3 tenants in parallel)
Test #2 (cs00000int main tenant, sequential)
Test #3 and #4 (cs00000int_0001, cs00000int_0002 secondary tenants, sequential)
Appendix
Infrastructure
PTF-environment pcon
- 10 m6g.2xlarge EC2 instances located in US East (N. Virginia), us-east-1
- 2 db.r6g.xlarge database instances, one reader and one writer
- MSK ptf-kakfa-3
- 4 m5.2xlarge brokers in 2 zones
- Apache Kafka version 2.8.0
- EBS storage volume per broker 300 GiB
- auto.create.topics.enable=true
- log.retention.minutes=480
- default.replication.factor=3
Data structure
Tenant | Source | Instance number | Instances sum |
---|---|---|---|
cs00000int | FOLIO | 115035 | 1212927 |
 | MARC | 1097892 | |
cs00000int_0001 | CONSORTIUM-FOLIO | 38712 | 353179 |
 | CONSORTIUM-MARC | 304467 | |
 | FOLIO | 1000 | |
 | MARC | 9000 | |
cs00000int_0002 | CONSORTIUM-MARC | 4 | 200002 |
 | FOLIO | 30000 | |
 | MARC | 169998 | |
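The shared and unshared totals quoted in the Test Summary can be derived from this table. The sketch below assumes that sources prefixed with CONSORTIUM- are the shared instances, which matches the 343K shared and 10K unshared figures given for cs00000int_0001. The function name is illustrative.

```python
# Instance counts per tenant and source, copied from the Data structure table.
DATA_STRUCTURE = {
    "cs00000int": {"FOLIO": 115_035, "MARC": 1_097_892},
    "cs00000int_0001": {
        "CONSORTIUM-FOLIO": 38_712,
        "CONSORTIUM-MARC": 304_467,
        "FOLIO": 1_000,
        "MARC": 9_000,
    },
    "cs00000int_0002": {"CONSORTIUM-MARC": 4, "FOLIO": 30_000, "MARC": 169_998},
}


def split_shared(sources: dict[str, int]) -> tuple[int, int]:
    """Return (shared, unshared) counts; CONSORTIUM-* sources are treated as shared."""
    shared = sum(n for source, n in sources.items() if source.startswith("CONSORTIUM-"))
    return shared, sum(sources.values()) - shared


for tenant, sources in DATA_STRUCTURE.items():
    shared, unshared = split_shared(sources)
    print(f"{tenant}: shared={shared}, unshared={unshared}, total={shared + unshared}")
```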
Module versions
Methodology/Approach
- Use consortium cluster for testing (pcon in our case).
- Configure the environment according to the Infrastructure parameters so that it matches the configuration FSE commonly uses.
- Run the reindex and record the indexing time and index size. See Steps for testing process#Reindex for details.
- Compare results.
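The reindex and monitoring requests mentioned above could be prepared along the following lines. This is a sketch under stated assumptions, not the exact procedure used in these tests. The gateway URL, the token placeholder and the helper names are hypothetical. The endpoint paths and the request body fields are assumptions based on the mod-search and mod-inventory-storage documentation and should be verified against the deployed module versions. Whether the tests recreated the index is not stated in this report, so it is left as a parameter. The requests are only built here, not sent.

```python
import json
import urllib.request

OKAPI_URL = "https://okapi.example.org"  # hypothetical gateway address


def okapi_headers(tenant: str, token: str) -> dict[str, str]:
    """Headers commonly required by Okapi-fronted modules."""
    return {
        "x-okapi-tenant": tenant,
        "x-okapi-token": token,
        "Content-Type": "application/json",
    }


def build_reindex_request(tenant: str, token: str, recreate_index: bool = False) -> urllib.request.Request:
    """Prepare the POST that starts a reindex (path and body are assumptions)."""
    body = json.dumps({"recreateIndex": recreate_index, "resourceName": "instance"})
    return urllib.request.Request(
        f"{OKAPI_URL}/search/index/inventory/reindex",
        data=body.encode("utf-8"),
        headers=okapi_headers(tenant, token),
        method="POST",
    )


def build_status_request(tenant: str, token: str, reindex_id: str) -> urllib.request.Request:
    """Prepare the GET that monitors a reindex job (path is an assumption)."""
    return urllib.request.Request(
        f"{OKAPI_URL}/instance-storage/reindex/{reindex_id}",
        headers=okapi_headers(tenant, token),
        method="GET",
    )


start = build_reindex_request("cs00000int", "<token>")
status = build_status_request("cs00000int", "<token>", "bb944cf4-b99f-4aa3-b13e-f5c92dc630ed")
print(start.get_method(), start.full_url)
print(status.get_method(), status.full_url)
# To actually send a request: urllib.request.urlopen(start)
```

Building the requests separately from sending them keeps the sketch safe to run anywhere and makes it easy to inspect the URL and payload before pointing it at a real environment.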