Table of Contents

...

  • Reindex of 10 million instances completed in 1 hour 25 minutes with db.r6g.8xlarge, which is 6.8 times faster than the Poppy release with db.r6g.xlarge. With db.r6g.xlarge (the same database size as in Poppy testing), reindex took 2 hours 25 minutes, which is 4 times faster than Poppy.

  • Reindex can also run on the small database instance (xlarge): duration -- hours for 10 million records.

  • Parallel multitenant reindex is not possible: when reindex is started in parallel for 3 tenants, between 1 and 3 of the reindex jobs fail.

  • Service CPU utilization was up to 50% for mod-search and 40% for mod-inventory-storage. For all other services CPU did not exceed 20%.

  • Memory utilization was stable and no memory leaks or OOM issues were observed.

  • RDS CPU utilization was about 90% for the database db.r6g.xlarge and up to 35% for db.r6g.8xlarge.

  • A larger database instance type typically results in faster reindexing. However, for db.r6g.8xlarge and db.r6g.4xlarge the reindexing duration is nearly identical. Therefore, for reindexing 10 million instance records, db.r6g.4xlarge or db.r6g.2xlarge is the more efficient choice.

Recommendations & Jiras

Parallel multitenant reindex is not possible: when reindex is started in parallel for 3 tenants, between 1 and 3 of the reindex jobs fail.

Jira: MSEARCH-868

...

| Test # | Start time | End time | Instances number | Test Conditions (reindexing on Poppy release, consortium environment) | Duration * | Notes |
|---|---|---|---|---|---|---|
| 1 | 2024-10-17T12:41:14 | 2024-10-17T14:06:46 | 10,099,620 | Sequential: fs07000001 | 1 hour 25 min | mod-search: task count = 4, Mem Hard Limit = 2592, Mem Soft Limit = 2480, Xmx = -XX:MaxRAMPercentage=85.0; mod-inventory-storage task count = 4; OpenSearch data nodes instance scaled up to r6g.4xlarge.search |
| 2 | 2024-10-17T14:35:59 | 2024-10-17T19:49:26 | 27,957,839 | Sequential: fs09000000 | 5 hours 14 min | |
| 3 | 2024-10-17T19:58:07 | 2024-10-17T20:12:24 | 1,210,000 | Sequential: fs07000002 | 14 min | |
| 4 | 2024-10-17T20:21:23 | 2024-10-17T22:46:34 | | In parallel: 3 tenants | All tenants reindex FAILED in 2 hours 25 min "MERGE_FAILED" | |
| 5 | 2024-10-16T14:38:08 | 2024-10-16T15:53:37 | 10,099,620 | Sequential: fs07000001 | 1 hour 15 min | mod-search: task count = 4, Mem Hard Limit = 4592, Mem Soft Limit = 4480, Xmx = -XX:MaxRAMPercentage=85.0; mod-inventory-storage task count = 4; OpenSearch data nodes instance scaled up to r6g.4xlarge.search |
| 6 | 2024-10-16T16:11:53 | 2024-10-16T21:08:00 | 27,957,839 | Sequential: fs09000000 | 4 hours 57 min | |
| 7 | 2024-10-17T06:20:22 | 2024-10-17T06:34:04 | 1,210,000 | Sequential: fs07000002 | 14 min | |
| 8 | 2024-10-17T06:40:22 | 2024-10-17T09:12:00 | | In parallel: 3 tenants | Reindex FAILED for 1 tenant [tenantId: fs09000000, error: java.util.concurrent.CompletionException: org.folio.search.exception.FolioIntegrationException: Failed to publish reindex records range] | |
| 9 | 2024-10-15T10:16:50 | 2024-10-15T12:04:51 | 10,099,620 | Sequential: fs07000001 | 1 hour 48 min | mod-search:4.0.0-SNAPSHOT.278 |
| 10 | 2024-10-16T08:01:14 | 2024-10-16T12:32:19 | 10,099,620 | Sequential: fs07000001 | 4 hours 31 min | Data nodes instance scaled down to r6g.large.search |
| 11 | 2024-10-21T09:49:10 | 2024-10-21T12:14:38 | 10,099,620 | Sequential: fs07000001 | 2 hours 25 min | 2 instances of database db.r6g.xlarge (reader and writer) |
| 12 | 2024-10-21T16:36:18 | 2024-10-21T18:26:15 | 10,099,620 | Sequential: fs07000001 | 1 hour 50 min | 1 instance of database db.r6g.2xlarge |
| 13 | 2024-10-22T09:16:23 | 2024-10-22T10:38:20 | 10,099,620 | Sequential: fs07000001 | 1 hour 22 min | 1 instance of database db.r6g.4xlarge |
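The reported durations can be cross-checked against the start and end timestamps. The following is a minimal sketch, not part of the original test tooling; the helper name `reindex_duration` is hypothetical, and the two sample rows are copied from tests #1 and #11 above.

```python
from datetime import datetime, timedelta


def reindex_duration(start: str, end: str) -> timedelta:
    """Return the elapsed time between two ISO 8601 timestamps."""
    return datetime.fromisoformat(end) - datetime.fromisoformat(start)


# Start/end times copied from the results table (tests #1 and #11).
TESTS = {
    1: ("2024-10-17T12:41:14", "2024-10-17T14:06:46"),
    11: ("2024-10-21T09:49:10", "2024-10-21T12:14:38"),
}

for test_no, (start, end) in TESTS.items():
    print(f"Test #{test_no}: {reindex_duration(start, end)}")
# Test #1: 1:25:32
# Test #11: 2:25:28
```

The exact values agree with the rounded figures in the Duration column (1 hour 25 min and 2 hours 25 min).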

Indexing size

All data in the tables below were captured after each test. The results come from the reindex monitoring request GET /search/index/instance-records/reindex/status:

...
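The status request mentioned above could be issued from a script along the following lines. This is only a sketch under stated assumptions: the base URL, tenant ID, and token are placeholders, the `x-okapi-tenant` and `x-okapi-token` headers are assumed to be what the environment expects, and the response is assumed to be JSON whose structure is not interpreted here. The function names are hypothetical.

```python
import json
import urllib.request

STATUS_PATH = "/search/index/instance-records/reindex/status"


def build_status_request(base_url: str, tenant: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the GET request for the reindex status."""
    return urllib.request.Request(
        base_url.rstrip("/") + STATUS_PATH,
        headers={
            "x-okapi-tenant": tenant,  # assumption: Okapi-style tenant header
            "x-okapi-token": token,    # assumption: Okapi-style auth token header
            "Accept": "application/json",
        },
        method="GET",
    )


def fetch_status(base_url: str, tenant: str, token: str):
    """Send the request and return the decoded JSON body as-is."""
    request = build_status_request(base_url, tenant, token)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


# Example (placeholder values, requires a reachable environment):
# status = fetch_status("https://okapi.example.org", "fs07000001", "<token>")
# print(json.dumps(status, indent=2))
```

Keeping request construction separate from sending makes it easy to inspect the URL and headers without touching a live environment.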

| | Ramsons | Poppy | Delta (absolute) | Delta |
|---|---|---|---|---|
| Compared to the database 8xlarge for Ramsons | 1 hour 25 min | 9 hours 38 min | 8 hours 13 min | 6.8 times |
| Compared to 2 instances of database xlarge for Ramsons (the same as for Poppy testing) | 2 hours 25 min | 9 hours 38 min | 7 hours 13 min | 4 times |
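The comparison figures follow from simple arithmetic on the durations. The sketch below reproduces them; the helper names are hypothetical and the input values are the Ramsons and Poppy durations reported in this section.

```python
def to_minutes(hours: int, minutes: int) -> int:
    """Convert an 'H hours M min' duration to whole minutes."""
    return hours * 60 + minutes


def speedup(baseline_min: int, new_min: int) -> float:
    """How many times faster the new run is than the baseline."""
    return baseline_min / new_min


poppy = to_minutes(9, 38)              # 578 min
ramsons_8xlarge = to_minutes(1, 25)    # 85 min
ramsons_xlarge = to_minutes(2, 25)     # 145 min

for label, ramsons in [("8xlarge", ramsons_8xlarge), ("xlarge", ramsons_xlarge)]:
    delta_h, delta_m = divmod(poppy - ramsons, 60)
    print(f"{label}: delta {delta_h} h {delta_m} min, "
          f"{speedup(poppy, ramsons):.1f} times faster")
# 8xlarge: delta 8 h 13 min, 6.8 times faster
# xlarge: delta 7 h 13 min, 4.0 times faster
```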

Reindex duration and database size correlation:

A larger database instance type typically results in faster reindexing. However, for db.r6g.8xlarge and db.r6g.4xlarge the reindexing duration is nearly identical. Therefore, for reindexing 10 million instance records, db.r6g.4xlarge or db.r6g.2xlarge is the more efficient choice.

| Database size | Duration |
|---|---|
| 2 instances of database db.r6g.xlarge | 2 hours 25 min |
| database db.r6g.2xlarge | 1 hour 50 min |
| database db.r6g.4xlarge | 1 hour 22 min |
| database db.r6g.8xlarge | 1 hour 25 min |
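To make the diminishing returns explicit, the step-by-step change in duration between adjacent instance sizes can be computed from the durations reported above. This is an illustrative sketch only; the function name is hypothetical, and note that the xlarge run used 2 instances (reader and writer) while the others used 1.

```python
# Durations in minutes, taken from the reported results.
DURATIONS_MIN = [
    ("db.r6g.xlarge (2 instances)", 145),
    ("db.r6g.2xlarge", 110),
    ("db.r6g.4xlarge", 82),
    ("db.r6g.8xlarge", 85),
]


def step_changes(durations):
    """Percent change in duration for each step up in instance size."""
    changes = []
    for (prev_name, prev_min), (name, cur_min) in zip(durations, durations[1:]):
        pct = (cur_min - prev_min) / prev_min * 100
        changes.append((prev_name, name, round(pct, 1)))
    return changes


for prev_name, name, pct in step_changes(DURATIONS_MIN):
    print(f"{prev_name} -> {name}: {pct:+.1f}%")
# db.r6g.xlarge (2 instances) -> db.r6g.2xlarge: -24.1%
# db.r6g.2xlarge -> db.r6g.4xlarge: -25.5%
# db.r6g.4xlarge -> db.r6g.8xlarge: +3.7%
```

Each of the first two steps shortens the reindex by roughly a quarter, while the step from 4xlarge to 8xlarge brings no improvement in these runs.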

Resource utilization

Service CPU Utilization

Service CPU utilization was up to 50% for mod-search and 40% for mod-inventory-storage. For all other services, CPU did not exceed 20%.

...

The database used the same average number of connections.

...

Open Search metrics

CPU utilization percentage for all data nodes (Average).

...

Memory usage percentage for all data nodes (Average).

...


Test #11: 2 instances of the db.r6g.xlarge database (writer and reader instances).

...

Infrastructure

PTF-environment rcp1

  • 10 m6g.2xlarge EC2 instances located in US East (N. Virginia), us-east-1

  • 1 instance of db.r6g.8xlarge database, writer instance.

  • MSK 
    4 kafka.m7g.xlarge brokers in 2 zones

    • Apache Kafka version 3.7.x

    • EBS storage volume per broker 300 GiB

    • auto.create.topics.enable=true

    • log.retention.minutes=480

    • default.replication.factor=3

  • OpenSearch cluster

    • OpenSearch version 2.13

    • Data nodes

      • Availability Zone(s) - 2-AZ without standby

      • Instance type - r6g.4xlarge.search

      • Number of nodes - 4

      • EBS volume size (GiB) - 300

      • Provisioned IOPS - 3000 IOPS

      • Provisioned Throughput (MiB/s) - 250 MiB/s

    • Dedicated master nodes
      Enabled - No

...