Run index on Rancher env with bugfest dataset

Steps:

Create a namespace with bugfest dataset

Successful indexation of extensive datasets (such as bugfest) requires an environment (namespace) with the following resources:

  • DB: RDS
  • Kafka: Shared (AWS MSK). The built-in Kafka is also possible, with at least 50 GB of disk space
  • OpenSearch: Shared (AWS OpenSearch)

Use the Project job to provision a namespace restored from an RDS snapshot.

Upgrade the environment to the latest version, if needed.

Check Kafka topics

Before starting indexation, ensure that the topics for the modules responsible for indexation have 50 partitions.

You can check this in Kafka UI.

Topics:

  • inventory.instance
  • search.instance-contributor
  • search.instance-subject

Example

Pic. 1 Example "Kafka UI topics & partitions"
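
The check above can also be sketched in code. This is a minimal illustration, not part of the environment tooling: it assumes you have already copied the partition counts from Kafka UI into a dictionary, and that real topic names end with the names listed above (any environment or tenant prefix is an assumption). The function name `topics_needing_attention` is hypothetical.

```python
REQUIRED_PARTITIONS = 50

# Topic names from the list above; real topics may carry a prefix (assumption).
INDEXING_TOPICS = (
    "inventory.instance",
    "search.instance-contributor",
    "search.instance-subject",
)


def topics_needing_attention(partition_counts, required=REQUIRED_PARTITIONS):
    """Return topics that are missing (value None) or whose partition
    count differs from `required` (value is the actual count)."""
    problems = {}
    for suffix in INDEXING_TOPICS:
        # Match on the end of the name so prefixed topics are found too.
        matches = {
            name: count
            for name, count in partition_counts.items()
            if name.endswith(suffix)
        }
        if not matches:
            problems[suffix] = None
        for name, count in matches.items():
            if count != required:
                problems[name] = count
    return problems
```

An empty result means every listed topic was found with 50 partitions.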

Scale-up OpenSearch

Indexation is a heavy process with high CPU and memory consumption, so it is necessary to scale up the shared AWS OpenSearch service before starting.

r6g.xlarge → r6g.2xlarge

Scale-up backend modules

For better performance, scale up the backend modules.

You can do this via Rancher in the Deployment section.

Modules: 

  • mod-search (1 → 4)
  • mod-inventory-storage (1 → 2)

Pic. 2 Example "Backend module scale up"
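
If you have `kubectl` access to the cluster behind Rancher, the same scaling can be sketched as commands. This is an assumption-laden illustration: it presumes the Kubernetes deployments are named exactly like the modules and that your kubeconfig points at the right cluster. The helper `kubectl_scale_commands` is hypothetical; it only builds the commands and does not run them.

```python
# Replica targets taken from this page; adjust for HA namespaces.
SCALE_UP = {"mod-search": 4, "mod-inventory-storage": 2}
SCALE_DOWN = {"mod-search": 1, "mod-inventory-storage": 1}


def kubectl_scale_commands(namespace, replicas_by_module):
    """Build one `kubectl scale` argument list per module.

    Assumes each deployment carries the module's name (not confirmed here).
    """
    return [
        [
            "kubectl", "scale", "deployment", module,
            f"--replicas={count}",
            "--namespace", namespace,
        ]
        for module, count in replicas_by_module.items()
    ]


if __name__ == "__main__":
    # "my-namespace" is a placeholder; print the commands for review.
    for cmd in kubectl_scale_commands("my-namespace", SCALE_UP):
        print(" ".join(cmd))
```

Printing the commands first lets you review them before running anything; the `SCALE_DOWN` mapping covers the reverse step once indexation has finished.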

Start index

After completing all prerequisite steps, trigger the index with a POST request (for example, from Postman).

URI: /search/index/inventory/reindex

Headers: X-Okapi-Tenant & X-Okapi-Token

Body: 

{
  "recreateIndex": true,
  "resourceName": "authority"
}
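
For those who prefer a script to Postman, the request above could be assembled roughly as follows. This is a sketch using only the Python standard library; the base URL, tenant and token are placeholders you must supply, and the `Content-Type` header is an assumption that is not stated on this page. The function `build_reindex_request` is a hypothetical helper, not part of any module.

```python
import json
import urllib.request


def build_reindex_request(okapi_url, tenant, token,
                          resource_name="authority", recreate_index=True):
    """Build (but do not send) the POST request described above."""
    body = json.dumps(
        {"recreateIndex": recreate_index, "resourceName": resource_name}
    ).encode("utf-8")
    return urllib.request.Request(
        url=okapi_url.rstrip("/") + "/search/index/inventory/reindex",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",  # assumption, not from this page
            "X-Okapi-Tenant": tenant,
            "X-Okapi-Token": token,
        },
    )


if __name__ == "__main__":
    # Placeholder values; replace with your environment's URL and credentials.
    req = build_reindex_request("https://okapi.example.org", "my-tenant", "my-token")
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```

The request is built separately from sending so that the URL, headers and body can be inspected before anything reaches the environment.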

Help

More information about the index and its requests can be found here:

Adjust indices settings

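An additional way to improve indexation duration and stability is to switch off replication and the refresh interval for the indices while indexing runs. In OpenSearch this corresponds to setting number_of_replicas to 0 and refresh_interval to -1; the values 1 and 1s in the example request below are the OpenSearch defaults.

On the Rancher environment, you can easily do this with OpenSearch Dashboards in the "Dev Tools" section.

You need to perform a PUT request to modify the settings of each newly created index.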

Indices:

  • instance
  • instance_subject
  • contributor


// Request
PUT /folio-testing-sprint_instance_fs09000000/_settings
{
    "index": {
        "number_of_replicas": "1",
        "refresh_interval": "1s"
    }
}

// Response
{
    "acknowledged": true
}
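
Because the same request has to be repeated for each index, the paths and bodies can be generated instead of typed by hand. The sketch below assumes index names follow the pattern `<environment>_<index>_<tenant>`, which is inferred only from the single example above; the helpers `settings_requests` and `as_console` are hypothetical. The output can be pasted into "Dev Tools".

```python
import json

# Index names from the list above.
INDEX_NAMES = ("instance", "instance_subject", "contributor")


def settings_requests(env, tenant, replicas, refresh_interval):
    """Return (path, body) pairs for the _settings request of each index.

    The "<env>_<index>_<tenant>" naming is an assumption based on the
    example request on this page.
    """
    body = {
        "index": {
            "number_of_replicas": str(replicas),
            "refresh_interval": refresh_interval,
        }
    }
    return [(f"/{env}_{name}_{tenant}/_settings", body) for name in INDEX_NAMES]


def as_console(path, body):
    """Format one request in Dev Tools console style."""
    return f"PUT {path}\n{json.dumps(body, indent=4)}"


if __name__ == "__main__":
    # Same environment, tenant and values as the example request above.
    for path, body in settings_requests("folio-testing-sprint", "fs09000000", 1, "1s"):
        print(as_console(path, body))
        print()
```

Pass whichever replica count and refresh interval you intend to apply; the values in the usage lines simply mirror the example request.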

Wait for completion

Now, just wait for index completion.

There are 3 possible ways to track index progress and completeness:

  • mod-search and mod-inventory-storage logs
  • Kafka messages number for topics mentioned above
  • OpenSearch "Indexing Data Rate" in AWS Management Console


Tip

As of 07.13.2023, the Indexing Data Rate on the Rancher environment has the following pattern. If you see something similar in the AWS Management Console for the OpenSearch service, indexation is going well.

Scale-down backend modules

After indexation has completed successfully, do not forget to scale down the backend modules in Rancher.

Modules: 

  • mod-search (4 → 1) (or 4 → 2 for namespaces with HA mode)
  • mod-inventory-storage (2 → 1) (or do not scale down for namespaces with HA mode)

Scale-down OpenSearch
