Steps:
Create a namespace with the Bugfest dataset
To index a large dataset (such as Bugfest) successfully, you need an environment (namespace) with the following resources:
- DB: RDS
- Kafka: Shared (AWS MSK). Built-in Kafka also works, given at least 50 GB of disk space and 2 brokers
- OpenSearch: Shared (AWS OpenSearch)
Use the Project job to provision a namespace restored from an RDS snapshot.
Upgrade the environment to the latest version, if needed.
Check Kafka topics
Before starting indexation, ensure that the topics used by the modules responsible for indexation have 50 partitions each.
You can check this in Kafka UI.
Topics:
- inventory.instance
- search.instance-contributor
- search.instance-subject
Pic. 1 Example "Kafka UI topics & partitions"
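If you prefer a scripted check over reading Kafka UI by eye, the partition rule above can be sketched as follows. This is a minimal sketch, not part of the original procedure: the function name find_misconfigured_topics is hypothetical, the partition counts have to be collected by you (for example, copied from Kafka UI), and matching by suffix assumes that real topic names carry an environment and tenant prefix in front of the names listed above.

```python
REQUIRED_PARTITIONS = 50

# Topic names exactly as listed in this guide.
INDEXING_TOPICS = (
    "inventory.instance",
    "search.instance-contributor",
    "search.instance-subject",
)


def find_misconfigured_topics(partition_counts, required=REQUIRED_PARTITIONS):
    """Return the indexing topics that do not have `required` partitions.

    partition_counts maps full topic names to their partition counts.
    A full name matches when it equals one of INDEXING_TOPICS or ends with
    "." + that name (assumption: topics are prefixed per environment/tenant).
    Topics that are missing entirely are reported with a count of None.
    """
    problems = {}
    for suffix in INDEXING_TOPICS:
        matches = {
            name: count
            for name, count in partition_counts.items()
            if name == suffix or name.endswith("." + suffix)
        }
        if not matches:
            problems[suffix] = None
            continue
        for name, count in matches.items():
            if count != required:
                problems[name] = count
    return problems
```

An empty result means all three topics were found with 50 partitions. Any entry in the result should be fixed before you start the index.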
Scale-up OpenSearch
Indexation is a heavy process that consumes a lot of CPU and memory, so scale up the shared OpenSearch AWS service before starting.
r6g.xlarge → r6g.2xlarge
Scale-up backend modules
For better performance, scale up the backend modules.
You can do this in Rancher, in the Deployment section.
Modules:
- mod-search (1 → 4) (or 2 → 4 for namespaces with HA mode)
- mod-inventory-storage (1 → 2) (no scale-up needed for namespaces with HA mode)
Pic. 2 Example "Backend module scale up"
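The replica counts above can also be written down as data, which helps if you want to script the change instead of clicking through Rancher. The sketch below only builds kubectl commands and does not run them. It assumes that you have kubectl access to the cluster behind Rancher and that each Deployment is named after its module; neither is stated in this guide, and the function name scale_commands is hypothetical.

```python
# (normal, indexing) replica counts taken from the module list in this guide.
# None means the module is left untouched in that mode.
REPLICA_PLAN = {
    "mod-search": {"default": (1, 4), "ha": (2, 4)},
    "mod-inventory-storage": {"default": (1, 2), "ha": None},
}


def scale_commands(namespace, scale_up=True, ha_mode=False):
    """Build one `kubectl scale` command (as an argument list) per module.

    scale_up=True targets the indexing replica count, scale_up=False the
    normal one. Assumes Deployment names equal module names.
    """
    mode = "ha" if ha_mode else "default"
    commands = []
    for module, plan in REPLICA_PLAN.items():
        counts = plan[mode]
        if counts is None:
            continue  # not scaled in this mode
        normal, indexing = counts
        replicas = indexing if scale_up else normal
        commands.append([
            "kubectl", "scale", f"deployment/{module}",
            f"--replicas={replicas}", "--namespace", namespace,
        ])
    return commands
```

Each returned list can be passed to subprocess.run(cmd, check=True). Calling the function with scale_up=False gives the commands for the scale-down step at the end of the procedure.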
Start index
After completing all prerequisite steps, trigger the index with a POST request (for example, from Postman).
URI: /search/index/inventory/reindex
Headers: X-Okapi-Tenant & X-Okapi-Token
Body:
{ "recreateIndex": true, "resourceName": "inventory" }
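If you would rather send the request from a script than from Postman, the call above can be sketched with the Python standard library. This is a sketch under assumptions: the function name build_reindex_request is hypothetical, you must supply your own Okapi base URL, tenant and token, and the Content-Type header is added because the body is JSON (this guide lists only the two Okapi headers).

```python
import json
import urllib.request


def build_reindex_request(okapi_url, tenant, token):
    """Build (but do not send) the POST request that triggers the index."""
    # Body exactly as given in this guide.
    body = {"recreateIndex": True, "resourceName": "inventory"}
    return urllib.request.Request(
        url=okapi_url.rstrip("/") + "/search/index/inventory/reindex",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",  # assumption: JSON body
            "X-Okapi-Tenant": tenant,
            "X-Okapi-Token": token,
        },
    )
```

To send it, pass the result to urllib.request.urlopen(request) and check the response status. Because recreateIndex is true, run this only against the namespace you actually intend to index.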
Help
You can find more information about the index and its requests here:
Adjust indices settings
To shorten indexation and make it more stable, you can additionally switch off replication and the refresh interval for the indices.
On the Rancher environment, you can do this in OpenSearch Dashboards, in the "Dev Tools" section.
Send a PUT request to modify the settings of each newly created index.
Indices:
- instance
- instance_subject
- contributor
// Request
PUT /folio-testing-sprint_instance_fs09000000/_settings
{
  "index": {
    "number_of_replicas": "0",
    "refresh_interval": "-1"
  }
}
// Response
{ "acknowledged": true }
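The example shows the request for the instance index only, while the same settings have to be applied to all three indices. The sketch below generates the three requests as text that you can paste into "Dev Tools". It assumes that every index follows the <environment>_<resource>_<tenant> pattern visible in the example name; the helper names are hypothetical. Check the real index names first (for example, with GET _cat/indices) if you are unsure.

```python
import json

# Index resources as listed in this guide.
INDEX_RESOURCES = ("instance", "instance_subject", "contributor")

# Settings used while indexation is running (values from the example request).
BULK_LOAD_SETTINGS = {
    "index": {"number_of_replicas": "0", "refresh_interval": "-1"}
}


def index_name(env, resource, tenant):
    """Assumed naming pattern, derived from the example index name."""
    return f"{env}_{resource}_{tenant}"


def settings_requests(env, tenant, settings=BULK_LOAD_SETTINGS):
    """Return one (path, json_body) pair per index."""
    body = json.dumps(settings, indent=2)
    return [
        (f"/{index_name(env, resource, tenant)}/_settings", body)
        for resource in INDEX_RESOURCES
    ]


def as_dev_tools_console(requests):
    """Render the requests in the 'PUT <path>' + body form used by Dev Tools."""
    return "\n\n".join(f"PUT {path}\n{body}" for path, body in requests)
```

For example, print(as_dev_tools_console(settings_requests("folio-testing-sprint", "fs09000000"))) prints three PUT requests, the first of which matches the example request.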
Wait for completion
Now wait for the index to complete.
There are three ways to track its progress and completion:
- mod-search and mod-inventory-storage logs
- Number of Kafka messages in the topics mentioned above
- OpenSearch "Indexing Data Rate" in AWS Management Console
Tip
On the 07.13.2023 run and all previous runs on the Rancher environment, "Indexing Data Rate" showed the following pattern. If you see something similar in the AWS Management Console for the OpenSearch service, indexation is going well.
Adjust indices settings (Part II)
!Important
After the indexation process is finished, do not forget to restore the replica and refresh interval settings of the indices.
Indices:
- instance
- instance_subject
- contributor
// Request
PUT /folio-testing-sprint_instance_fs09000000/_settings
{
  "index": {
    "number_of_replicas": "1",
    "refresh_interval": "1s"
  }
}
// Response
{ "acknowledged": true }
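Since this step is easy to forget for one of the three indices, you may want to verify the result. The sketch below checks the JSON returned by GET /<index>/_settings (or by a comma-separated list of indices) against the values from the request above. The function name unrestored_settings is hypothetical, and the check is an optional extra rather than part of the original procedure.

```python
# Values that the restore request above sets on each index.
EXPECTED_SETTINGS = {"number_of_replicas": "1", "refresh_interval": "1s"}


def unrestored_settings(settings_response, expected=EXPECTED_SETTINGS):
    """Return {index: {setting: actual_value}} for every mismatch.

    settings_response is the parsed JSON of a GET _settings call, shaped as
    {index_name: {"settings": {"index": {...}}}}. A setting that is absent
    from the response is reported with the value None.
    """
    problems = {}
    for index, payload in settings_response.items():
        actual = payload.get("settings", {}).get("index", {})
        diff = {
            key: actual.get(key)
            for key, wanted in expected.items()
            if actual.get(key) != wanted
        }
        if diff:
            problems[index] = diff
    return problems
```

An empty result means every index in the response has its replica and refresh interval settings back.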
Scale-down backend modules
After the indexation process is finished, do not forget to scale the backend modules back down in Rancher.
Modules:
- mod-search (4 → 1) (or 4 → 2 for namespaces with HA mode)
- mod-inventory-storage (2 → 1) (no scale-down needed for namespaces with HA mode)
Scale-down OpenSearch
After the indexation process is finished, do not forget to scale the shared OpenSearch AWS service back down.
r6g.2xlarge → r6g.xlarge (if no other conditions or constraints apply)