Testing reindex operation for BugFest dataset

Environment configuration

1 Elasticsearch:

Property        Value
Mode            AWS service
Instance type   m5.xlarge.elasticsearch (4 CPU, 16 GB RAM)
Replica count   2
Volume          500 GiB SSD
Version         7.9


2 Mod-inventory-storage

Property        Value
Replica count   10 for the first million records, 15 after that
CPU             0.128
RAM             800 MB
Version         20.1.0


3 Mod-search

Property        Value
Replica count   10 for the first million records, 15 after that
CPU             0.5
RAM             512 MB
Version         1.1.0

Testing results 

All instance IDs were published to Kafka in 16 minutes.

Full index: 

Records processed   Time spent
1 000 000           24 minutes
2 000 000           17 minutes (faster, because additional instances deployed)
3 000 000           15 minutes
5 000 000           29 minutes (~15 minutes/million)
6 000 000           17 minutes
7 214 799           26 minutes

Summary 

As the results show, scaling instances speeds up the process. At some point Postgres and OKAPI will become bottlenecks, and adding more replicas will no longer help. In our case, however, deploying 5 new instances gave us a 30% speed improvement.
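
The arithmetic behind these figures can be checked with a short Python sketch. It assumes that the "Records processed" column of the Full index table is cumulative and that each "Time spent" value covers only the records added since the previous row; the report does not state either explicitly. The names STEPS, per_step_rates and time_reduction are illustrative only and are not part of mod-search or any other module. Comparing the first and second million is also an assumption about which measurements the 30% figure refers to.

```python
# (cumulative records processed, minutes spent on that step),
# copied from the "Full index" table. Treating the counts as
# cumulative is an assumption, not something the report states.
STEPS = [
    (1_000_000, 24),
    (2_000_000, 17),
    (3_000_000, 15),
    (5_000_000, 29),
    (6_000_000, 17),
    (7_214_799, 26),
]


def per_step_rates(steps):
    """Return (records_in_step, minutes, records_per_minute) per table row."""
    rates = []
    previous_total = 0
    for total, minutes in steps:
        records = total - previous_total
        rates.append((records, minutes, records / minutes))
        previous_total = total
    return rates


def time_reduction(before_minutes, after_minutes):
    """Fraction of time saved when a step drops from before to after."""
    return (before_minutes - after_minutes) / before_minutes


if __name__ == "__main__":
    for records, minutes, rate in per_step_rates(STEPS):
        print(f"{records:>9} records in {minutes:>2} min -> {rate:,.0f} records/min")
    print(f"total indexing time: {sum(m for _, m in STEPS)} min")
    # First million (10 replicas) vs second million (15 replicas).
    print(f"time reduction: {time_reduction(24, 17):.0%}")
```

Under these assumptions the per-step times add up to 128 minutes, and the drop from 24 to 17 minutes per million is a reduction of about 29%, in line with the roughly 30% improvement quoted above.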