Run index on Rancher env with bugfest dataset


Steps:

Create a namespace with Bugfest dataset

For successful indexation on extensive datasets (such as Bugfest), you need an environment (namespace) with the following resources:

  • DB: RDS

  • Kafka: Shared (AWS MSK). A built-in Kafka is also possible, with at least 50 GB of disk space and 2 brokers

  • OpenSearch: Shared (AWS OpenSearch)

Use the Project job to provision a namespace restored from an RDS snapshot.

Upgrade the environment to the latest version, if needed.

Check Kafka topics

Before starting indexation, ensure that the topics for the modules responsible for indexation have 50 partitions.

You can check this in Kafka UI.

Topics:

  • inventory.instance

  • search.instance-contributor

  • search.instance-subject


Pic. 1 Example "Kafka UI topics & partitions"
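The partition check can also be scripted once the counts have been read from Kafka UI. The following is a minimal sketch, not part of the original procedure: the function name, the `env.tenant.` topic prefix and the sample counts are hypothetical, and topics are matched by suffix on the assumption that real topic names may carry an environment/tenant prefix.

```python
REQUIRED_PARTITIONS = 50

# Topic names as listed above; real names may carry a prefix (assumption).
INDEXING_TOPICS = (
    "inventory.instance",
    "search.instance-contributor",
    "search.instance-subject",
)


def find_misconfigured_topics(partition_counts, required=REQUIRED_PARTITIONS):
    """Return {topic: partition_count} for indexing topics that do not have
    exactly `required` partitions. A topic that is missing entirely is
    reported under its suffix with a count of 0."""
    problems = {}
    for suffix in INDEXING_TOPICS:
        matches = {
            name: count
            for name, count in partition_counts.items()
            if name.endswith(suffix)
        }
        if not matches:
            problems[suffix] = 0
            continue
        for name, count in matches.items():
            if count != required:
                problems[name] = count
    return problems


# Hypothetical counts copied by hand from Kafka UI.
observed = {
    "env.tenant.inventory.instance": 50,
    "env.tenant.search.instance-contributor": 50,
    "env.tenant.search.instance-subject": 10,
}
print(find_misconfigured_topics(observed))
```

An empty result means all three topics have the expected number of partitions.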

Adjust Kafka messages retention (OPTIONAL)

Before starting indexation, find the log.retention.minutes property (used if log.retention.ms is null) and set it to 24 hours (1440 minutes). This can be set at the broker level.

If you decide to change retention at the topic level only, change log.retention.ms instead: it takes precedence over log.retention.minutes and is usually already set to some value.

Tune mod-search config (REQUIRED)

KAFKA_EVENTS_CONCURRENCY (default: 2). A higher value can speed up the instance reindex.

KAFKA_CONTRIBUTORS_CONCURRENCY (default: 2). A higher value can speed up the instance reindex.

KAFKA_SUBJECTS_CONCURRENCY (default: 2). A higher value can speed up the instance reindex.

There is no point in making the total number of consumers higher than the number of topic partitions, because at most one consumer is created per partition.

For example, with 50 partitions and 4 mod-search instances, KAFKA_SUBJECTS_CONCURRENCY can be set to 13: 4 * 13 = 52, so roughly 12-13 active consumers end up on each app instance.
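The arithmetic above can be reproduced with a short calculation. This is only an illustrative sketch; the helper name `concurrency_per_instance` is hypothetical and not part of mod-search.

```python
import math


def concurrency_per_instance(partitions, app_instances):
    """Smallest per-instance concurrency whose total covers every partition."""
    if partitions <= 0 or app_instances <= 0:
        raise ValueError("partitions and app_instances must be positive")
    return math.ceil(partitions / app_instances)


partitions = 50
app_instances = 4

concurrency = concurrency_per_instance(partitions, app_instances)
total_consumers = concurrency * app_instances
idle_consumers = total_consumers - partitions

print(concurrency, total_consumers, idle_consumers)  # 13 52 2
```

Rounding up leaves a couple of consumers without a partition, while rounding down (12 * 4 = 48) would leave two partitions sharing consumers with others.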

Because there should always be more subjects/contributors than instances, normally only the subjects/contributors concurrency needs tuning. If you observe that subjects/contributors are read from their topics faster than they are published, consider tuning the instances topic as well.

Scale-up backend modules (REQUIRED)

For better performance, please scale up backend modules.

You can perform this operation via Rancher in the Deployment section.

Modules: 

  • mod-search (1 → 4) (or 2 → 4 for namespaces with HA mode)

  • mod-inventory-storage (1 → 2) (or not scale up for namespaces with HA mode)

Pic. 2 Example "Backend module scale up"

For ECS Consortia tenants


In pgAdmin, run this query to identify the current value, then change the value to false as shown in the screenshot:

SELECT feature_id, enabled
FROM cs00000int_mod_search.feature_config;

Start index

After completing all prerequisite steps, trigger the index with a POST request (for example, from Postman).

URI: /search/index/inventory/reindex

Headers: X-Okapi-Tenant & X-Okapi-Token

Possible values for resourceName: instance, authority, location

Body: 

{
  "recreateIndex": true,
  "resourceName": "instance",
  "indexSettings": {
    "numberOfShards": 1,
    "numberOfReplicas": 1
  }
}
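For those who prefer a script over Postman, the same request can be assembled with the Python standard library. This is a sketch under stated assumptions: the Okapi base URL `https://okapi.example.org`, the tenant value and the token are placeholders, and the helper name `build_reindex_request` is hypothetical. The request is only built here, not sent.

```python
import json
import urllib.request

ALLOWED_RESOURCES = ("instance", "authority", "location")


def build_reindex_request(okapi_url, tenant, token, resource_name="instance",
                          recreate_index=True, shards=1, replicas=1):
    """Build (but do not send) the POST request that triggers a reindex."""
    if resource_name not in ALLOWED_RESOURCES:
        raise ValueError(f"resourceName must be one of {ALLOWED_RESOURCES}")
    body = {
        "recreateIndex": recreate_index,
        "resourceName": resource_name,
        "indexSettings": {
            "numberOfShards": shards,
            "numberOfReplicas": replicas,
        },
    }
    return urllib.request.Request(
        url=okapi_url.rstrip("/") + "/search/index/inventory/reindex",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-Okapi-Tenant": tenant,
            "X-Okapi-Token": token,
        },
    )


# Placeholder values: replace with the real Okapi URL, tenant and token.
request = build_reindex_request("https://okapi.example.org", "fs09000000", "<token>")
print(request.get_method(), request.full_url)
```

To actually send it, pass the object to `urllib.request.urlopen(request)` and inspect the response.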

Help

More information about the index and requests can be found here:

Adjust indices settings

An additional configuration that improves indexation duration and stability is disabling replication and the refresh interval for the indices.

On the Rancher environment, you can easily do this with OpenSearch Dashboards in the "Dev Tools" section.

You need to perform a PUT request to modify the settings of each newly created index.

Indices:

  • instance

  • instance_subject

  • contributor

 

// Request
PUT /folio-testing-sprint_instance_fs09000000/_settings
{
  "index": {
    "number_of_replicas": "0",
    "refresh_interval": "-1"
  }
}

// Response
{ "acknowledged": true }
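Since the same PUT has to be repeated for each index, the Dev Tools requests can be generated rather than typed by hand. The sketch below assumes the index names follow the pattern `<environment>_<resource>_<tenant>`, inferred only from the single example above; verify the actual index names in OpenSearch Dashboards before running anything. The helper name `settings_requests` is hypothetical.

```python
import json

# Settings taken from the example request above.
BULK_LOAD_SETTINGS = {
    "index": {
        "number_of_replicas": "0",
        "refresh_interval": "-1",
    }
}

RESOURCES = ("instance", "instance_subject", "contributor")


def settings_requests(env_prefix, tenant):
    """Return one Dev Tools-style PUT request per index.

    The name pattern <env_prefix>_<resource>_<tenant> is an assumption.
    """
    body = json.dumps(BULK_LOAD_SETTINGS, indent=2)
    return [
        f"PUT /{env_prefix}_{resource}_{tenant}/_settings\n{body}"
        for resource in RESOURCES
    ]


for text in settings_requests("folio-testing-sprint", "fs09000000"):
    print(text, end="\n\n")
```

Each printed block can be pasted into the "Dev Tools" console as is.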

Wait for completion

Now, just wait for the index to complete.

There are 3 possible ways to track index progress and completion: