Testing reindex operation for BugFest dataset

Environment configuration

1 Elasticsearch:

Property         Value
Mode             AWS service
Instance type    m5.xlarge.elasticsearch (4 CPU, 16 GB RAM)
Replica count    2
Volume           500 GiB SSD
Version          7.9

 

2 Mod-inventory-storage

Property         Value
Replica count    10 for the 1st million records, 15 after
CPU              0.128
RAM              800 MB
Version          20.1.0

 

3 Mod-search

Property         Value
Replica count    10 for the 1st million records, 15 after
CPU              0.5
RAM              512 MB
Version          1.1.0

Testing results 

All instance IDs were published to Kafka in 16 minutes.

Full index: 

Records processed (cumulative)    Time spent on this step
1 000 000                         24 minutes
2 000 000                         17 minutes (faster, because additional instances were deployed)
3 000 000                         15 minutes
5 000 000                         29 minutes (~15 minutes/million)
6 000 000                         17 minutes
7 214 799                         26 minutes
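The per-step figures above can be turned into throughput numbers with a few lines of arithmetic. The sketch below assumes that the "Records processed" column is cumulative and that each time value covers only the records added since the previous row (the "~15 minutes/million" note on the 5 000 000 row suggests this reading). The names `steps` and `step_throughput` are illustrative and not part of any module.

```python
# (cumulative records processed, minutes spent on that step), copied from the table.
# Assumption: the record counts are cumulative and the times are per step.
steps = [
    (1_000_000, 24),
    (2_000_000, 17),
    (3_000_000, 15),
    (5_000_000, 29),
    (6_000_000, 17),
    (7_214_799, 26),
]


def step_throughput(steps):
    """Return (records_in_step, minutes, records_per_minute) for each step."""
    rows = []
    previous_total = 0
    for cumulative, minutes in steps:
        records = cumulative - previous_total
        rows.append((records, minutes, records / minutes))
        previous_total = cumulative
    return rows


rows = step_throughput(steps)
total_minutes = sum(minutes for _, minutes, _ in rows)
total_records = steps[-1][0]

for records, minutes, rate in rows:
    print(f"{records:>9,} records in {minutes:>2} min -> {rate:,.0f} records/min")
print(
    f"Total: {total_records:,} records in {total_minutes} min "
    f"({total_records / total_minutes:,.0f} records/min on average)"
)
```

Under this reading the six steps add up to 128 minutes. The total covers only the steps listed in the table; how the 16 minutes spent publishing IDs to Kafka overlaps with the first step is not stated in the results.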

Summary 

As the results show, scaling out instances speeds up the process. At some point Postgres and OKAPI will become bottlenecks, and adding replicas will no longer help. In our case, however, deploying 5 new instances gave roughly a 30% speed improvement.
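The improvement figure can be checked against the measured times. The sketch below assumes that the comparison is between the first million records (24 minutes, 10 replicas) and the second million (17 minutes, 15 replicas); the report does not state explicitly which rows were compared, and `relative_improvement` is an illustrative helper.

```python
def relative_improvement(before_minutes, after_minutes):
    """Fraction of time saved when a step drops from before_minutes to after_minutes."""
    return (before_minutes - after_minutes) / before_minutes


# Assumption: 24 min per million with 10 replicas vs 17 min per million with 15 replicas.
improvement = relative_improvement(24, 17)
print(f"Time saved per million records: {improvement:.1%}")  # prints 29.2%
```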