Testing reindex operation for BugFest dataset
Environment configuration
1 Elasticsearch
Property | Value |
---|---|
Mode | AWS service |
Instance type | m5.xlarge.elasticsearch (CPU 4, RAM 16 GiB) |
Replica count | 2 |
Volume | 500 GiB SSD |
Version | 7.9 |
2 Mod-inventory-storage
Property | Value |
---|---|
Replica count | 10 for the first million records, 15 afterwards |
CPU | 0.128 |
RAM | 800 MB |
Version | 20.1.0 |
3 Mod-search
Property | Value |
---|---|
Replica count | 10 for the first million records, 15 afterwards |
CPU | 0.5 |
RAM | 512 MB |
Version | 1.1.0 |
Testing results
All instance IDs were published to Kafka in 16 minutes.
Full index:
Records processed (cumulative) | Time spent since previous row |
---|---|
1 000 000 | 24 minutes |
2 000 000 | 17 minutes (faster because additional instances were deployed) |
3 000 000 | 15 minutes |
5 000 000 | 29 minutes for 2 000 000 records (~15 minutes/million) |
6 000 000 | 17 minutes |
7 214 799 | 26 minutes |
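
The rows above can be normalised to minutes per million records to make them directly comparable. The following is a minimal sketch, not part of the original test tooling. It assumes that each "Time spent" value covers only the records added since the previous row, which is what the "~15 minutes/million" remark on the 5 000 000 row suggests. The names `MILESTONES`, `segment_rates` and `total_minutes` are illustrative.

```python
# Cumulative milestones and minutes per step, copied from the table above.
# Assumption: each time value covers only the records added since the previous row.
MILESTONES = [
    (1_000_000, 24),
    (2_000_000, 17),
    (3_000_000, 15),
    (5_000_000, 29),
    (6_000_000, 17),
    (7_214_799, 26),
]


def segment_rates(milestones):
    """Return (records_in_segment, minutes, minutes_per_million) for each row."""
    rates = []
    previous = 0
    for cumulative, minutes in milestones:
        records = cumulative - previous
        rates.append((records, minutes, minutes / (records / 1_000_000)))
        previous = cumulative
    return rates


def total_minutes(milestones):
    """Sum the time spent over all segments."""
    return sum(minutes for _, minutes in milestones)


if __name__ == "__main__":
    for records, minutes, per_million in segment_rates(MILESTONES):
        print(f"{records:>9,} records in {minutes:>2} min -> {per_million:.1f} min/million")
    total = total_minutes(MILESTONES)
    print(f"Total: {total} min, {MILESTONES[-1][0] / total:,.0f} records/min")
```

Under that assumption the segment times add up to 128 minutes for 7 214 799 records.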
Summary
Scaling out the module instances speeds up the reindex, but at some point Postgres and OKAPI become the bottleneck and adding more replicas no longer helps. In our case, deploying 5 additional instances (10 to 15 replicas) improved the speed by about 30%.
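
The 30% figure can be checked against the results table. The sketch below assumes the figure refers to the first two rows of the full index results: 24 minutes for the first million records (10 replicas) and 17 minutes for the second million (15 replicas). The function names are illustrative.

```python
def time_reduction(before_min, after_min):
    """Fraction of wall-clock time saved per million records."""
    return (before_min - after_min) / before_min


def throughput_gain(before_min, after_min):
    """Relative increase in records processed per minute."""
    return before_min / after_min - 1


# Assumption: 24 min/million with 10 replicas, 17 min/million with 15 replicas.
before, after = 24, 17
print(f"time reduction:  {time_reduction(before, after):.1%}")
print(f"throughput gain: {throughput_gain(before, after):.1%}")
```

Measured as time saved per million records, the improvement is about 29%, which matches the reported 30%. Measured as throughput, the same change is a gain of about 41%, so it is worth stating which of the two a reported percentage refers to.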