...

Steps:

Create a namespace with

...

Bugfest dataset

For successful indexation of extensive datasets (such as Bugfest), you need an environment (namespace) with the following resources:

DB: RDS
Kafka: Shared (AWS MSK). A built-in Kafka is also possible, with at least 50 GB of disk space and 2 brokers
OpenSearch: Shared (AWS OpenSearch)

...

Pic. 1 Example "Kafka UI topics & partitions"

Scale-up OpenSearch

As indexation is a heavy process with high CPU and memory consumption, it is required to scale up the shared AWS OpenSearch service.

...

Adjust Kafka messages retention (OPTIONAL)

Before starting indexation, find the log.retention.minutes property (it applies if log.retention.ms is null) and set it to 24 hours (1440 minutes). This can be set at the broker level.

If you decide to do this at the topic level only, change log.retention.ms instead, because it takes precedence over log.retention.minutes and is usually already set to some value.

Image Added
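The precedence described above can be illustrated with a small sketch. The function name effective_retention_ms is hypothetical and only models how Kafka chooses between the properties (log.retention.ms, then log.retention.minutes, then log.retention.hours with its default of 168); it does not read or change any broker configuration.

```python
from typing import Optional


def effective_retention_ms(
    retention_ms: Optional[int],
    retention_minutes: Optional[int],
    retention_hours: int = 168,  # Kafka's default for log.retention.hours
) -> int:
    """Return the retention (in ms) that wins under Kafka's precedence rules.

    Hypothetical helper for illustration: ms beats minutes, minutes beats hours.
    """
    if retention_ms is not None:
        return retention_ms
    if retention_minutes is not None:
        return retention_minutes * 60 * 1000
    return retention_hours * 60 * 60 * 1000


# log.retention.ms is null, so the 1440 minutes (24 hours) take effect.
print(effective_retention_ms(None, 1440))
# log.retention.ms is set to 8 hours, so changing minutes alone has no effect.
print(effective_retention_ms(8 * 60 * 60 * 1000, 1440))
```

This is why setting only the minutes property has no visible effect when the ms property already has a value.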

Tune mod-search config (REQUIRED)

KAFKA_EVENTS_CONCURRENCY (default - 2): a higher value could speed up reindexing of instances.

KAFKA_CONTRIBUTORS_CONCURRENCY (default - 2): a higher value could speed up reindexing of contributors.

KAFKA_SUBJECTS_CONCURRENCY (default - 2): a higher value could speed up reindexing of subjects.

There is no point in setting these values higher than the number of topic partitions, because at most one consumer is created per partition.

For example, with 50 partitions and 4 mod-search instances, KAFKA_SUBJECTS_CONCURRENCY may be set to 13: 4*13 = 52, which covers all 50 partitions, and 12-13 consumers will be created for each app instance.
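The arithmetic in the example above can be sketched as follows. Both function names are hypothetical, and the per-instance split assumes that Kafka spreads partitions evenly across consumers; the actual assignment depends on the partition assignor in use.

```python
import math
from typing import List


def concurrency_per_instance(partitions: int, instances: int) -> int:
    """Smallest concurrency value that lets all instances together cover every partition."""
    if partitions < 1 or instances < 1:
        raise ValueError("partitions and instances must be positive")
    return math.ceil(partitions / instances)


def active_consumers_per_instance(partitions: int, instances: int) -> List[int]:
    """Approximate number of consumers that actually own a partition on each instance."""
    base, extra = divmod(partitions, instances)
    return [base + 1 if i < extra else base for i in range(instances)]


print(concurrency_per_instance(50, 4))       # 13
print(active_consumers_per_instance(50, 4))  # [13, 13, 12, 12]
```

Rounding up rather than down avoids leaving partitions that have to be shared by an already busy consumer.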

Because there are always more subjects/contributors than instances, usually only the subjects/contributors settings need tuning. If you observe that subjects/contributors are read from their topics faster than they are published, you may also want to tune the instances topic.

Scale-up backend modules (REQUIRED)

For better performance, scale up the backend modules.

You can perform this operation via Rancher in the Deployment section.

Modules:

mod-search (1 → 4) (or 2 → 4 for namespaces with HA mode)
mod-inventory-storage (1 → 2) (do not scale up for namespaces with HA mode)

Pic. 2 Example "Backend module scale up"

For ECS Consortia tenants

In pgAdmin, run this query to identify the current value, then change the value to false as in the screenshot:

SELECT feature_id, enabled
FROM cs00000int_mod_search.feature_config;

Image Added

Start index

After completing all prerequisite steps, trigger the index with a POST request from Postman.

URI: /search/index/inventory/reindex

Headers: X-Okapi-Tenant & X-Okapi-Token

Possible values for resourceName: instance, authority, location

Body:

Code Block

language	groovy

{
  "recreateIndex": true,
  "resourceName": "instance",
  "indexSettings": {
    "numberOfShards": 1,
    "numberOfReplicas": 1
  }
}
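For those who prefer a script over Postman, the same request could be built roughly as below. This is a minimal sketch: build_reindex_request is a hypothetical helper, the Okapi base URL and token are placeholders, and the Content-Type header is an assumption that is not listed above.

```python
import json
import urllib.request


def build_reindex_request(okapi_url: str, tenant: str, token: str,
                          resource_name: str = "instance",
                          recreate_index: bool = True) -> urllib.request.Request:
    """Build (but do not send) the POST request that triggers a reindex."""
    body = {
        "recreateIndex": recreate_index,
        "resourceName": resource_name,
        "indexSettings": {"numberOfShards": 1, "numberOfReplicas": 1},
    }
    return urllib.request.Request(
        url=okapi_url.rstrip("/") + "/search/index/inventory/reindex",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "X-Okapi-Tenant": tenant,
            "X-Okapi-Token": token,
            "Content-Type": "application/json",  # assumption, not stated on this page
        },
        method="POST",
    )


# Placeholder values; replace them with the real Okapi URL, tenant, and token.
request = build_reindex_request("https://okapi.example.org", "fs09000000", "<token>")
print(request.get_method(), request.full_url)
```

Passing the resulting object to urllib.request.urlopen would actually send it; the sketch stops short of that so it can be inspected safely first.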

Info

title	Help

More information about the index and requests can be found here:

Adjust indices settings

An additional way to improve indexation duration and stability is to switch off replication and the refresh interval for the indices.

On the Rancher environment, you can easily do this with OpenSearch Dashboards in the "Dev Tools" section.

You need to perform a PUT request to modify the settings of each newly created index.

Indices:

instance
instance_subject
contributor

Code Block

language	groovy

// Request
PUT /folio-testing-sprint_instance_fs09000000/_settings
{
    "index": {
        "number_of_replicas": "0",
        "refresh_interval": "-1"
    }
}

// Response
{
  "acknowledged": true
}
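Since the same PUT has to be repeated for each of the three indices, the Dev Tools commands can be generated instead of typed. The sketch below assumes the index naming pattern <environment>_<resource>_<tenant> seen in the example above; settings_requests is a hypothetical helper that only prints text to paste into Dev Tools.

```python
import json

# Indices listed in this section.
INDEX_RESOURCES = ("instance", "instance_subject", "contributor")


def settings_requests(env: str, tenant: str, replicas: str, refresh_interval: str) -> str:
    """Return Dev Tools console text with one PUT _settings request per index."""
    body = json.dumps(
        {"index": {"number_of_replicas": replicas, "refresh_interval": refresh_interval}},
        indent=2,
    )
    return "\n\n".join(
        f"PUT /{env}_{resource}_{tenant}/_settings\n{body}" for resource in INDEX_RESOURCES
    )


# Values used while indexation is running.
print(settings_requests("folio-testing-sprint", "fs09000000", "0", "-1"))
```

Calling the same function with "1" and "1s" produces the requests that restore the settings after indexation.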

Wait for completion

Now, just wait for the index to complete.

There are three ways to track index progress and completion:

mod-search and mod-inventory-storage logs
Kafka messages number for topics mentioned above
OpenSearch "Indexing Data Rate" in AWS Management Console

Info

title	Tip

On 07.13.2023 and in all previous runs, the "Indexing Data Rate" on the Rancher environment followed the pattern below. If you see something similar in the AWS Management Console for the OpenSearch service, indexation is going well.

Image AddedImage Added

Adjust indices settings (Part II)

!Important

After the indexation process is finished, do not forget to restore the replica count and refresh interval settings of the indices.

Indices:

instance
instance_subject
contributor

Code Block

language	groovy

// Request
PUT /folio-testing-sprint_instance_fs09000000/_settings
{
    "index": {
        "number_of_replicas": "1",
        "refresh_interval": "1s"
    }
}

// Response
{
  "acknowledged": true
}

Scale-down backend modules (REQUIRED)

After the indexation process is finished, do not forget to scale down the backend modules in Rancher.

Modules:

mod-search (4 → 1) (or 4 → 2 for namespaces with HA mode)
mod-inventory-storage (2 → 1) (do not scale down for namespaces with HA mode)

Adjust Kafka messages retention back (OPTIONAL, if previously modified)

Restore the log.retention property to its previous value (usually 8 hours).

Tune mod-search config back (REQUIRED)

Return the modified env variables to their default values.

An additional approach in case the reindex doesn't work properly (failing, stuck, etc.)

1. Recreate Kafka topics from KafkaUI

Image Added

2. Remove existing indexes from OpenSearch

Image Added

3. Send PUT and POST requests from OpenSearch to clone the indexes. Do this for all the necessary tenants: select all the rows and send the request.

Image Added

Example for tenant fs09000000:

PUT /general_instance_subject_fs09000000/_block/write
PUT /general_instance_fs09000000/_block/write
PUT /general_contributor_fs09000000/_block/write
PUT /general_authority_fs09000000/_block/write

POST /general_instance_fs09000000/_clone/folio-testing-sprint_instance_fs09000000
POST /general_instance_subject_fs09000000/_clone/folio-testing-sprint_instance_subject_fs09000000
POST /general_contributor_fs09000000/_clone/folio-testing-sprint_contributor_fs09000000
POST /general_authority_fs09000000/_clone/folio-testing-sprint_authority_fs09000000
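When several tenants are involved, the block and clone requests above can be generated rather than edited by hand. The sketch below is an illustration only: clone_requests is a hypothetical helper, and it assumes that every tenant follows the same naming pattern as the fs09000000 example (source prefix general, target prefix folio-testing-sprint).

```python
from typing import Iterable, List

# Resources cloned in the example above.
RESOURCES = ("instance", "instance_subject", "contributor", "authority")


def clone_requests(source_env: str, target_env: str, tenants: Iterable[str]) -> List[str]:
    """Return Dev Tools console lines: write-block every source index, then clone it."""
    lines: List[str] = []
    for tenant in tenants:
        # A source index must be write-blocked before it can be cloned.
        for resource in RESOURCES:
            lines.append(f"PUT /{source_env}_{resource}_{tenant}/_block/write")
        for resource in RESOURCES:
            source = f"{source_env}_{resource}_{tenant}"
            target = f"{target_env}_{resource}_{tenant}"
            lines.append(f"POST /{source}/_clone/{target}")
    return lines


print("\n".join(clone_requests("general", "folio-testing-sprint", ["fs09000000"])))
```

The printed lines can be pasted into Dev Tools, selected, and sent together, as described in step 3.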
