Reindex Improvements

Arch ticket: https://folio-org.atlassian.net/browse/ARCH-273

Summary

Implementing the classification browse feature required a new search index, and adding that index degraded the performance of the reindexing procedure. For tenants with large datasets, the reindexing procedure now exceeds the maintenance window.

The impact is especially significant in ECS environments, where data must be aggregated across the inventory storage of multiple tenants. In the current reindexing event model, mod-search receives "create/update" domain events that carry only the identifiers of the related instances, and the module then fetches the full entity through HTTP requests. This is the root cause of the slow reindexing. The proposed solution addresses the issue by replacing HTTP communication with database schema-to-schema communication.
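The cost difference between the two approaches can be illustrated with a minimal sketch. It is not mod-search code: the in-memory `INVENTORY` dictionary, the `RoundTripCounter` class, the function names, and the batch size of 100 are all hypothetical stand-ins, chosen only to count how many round trips each approach needs for the same set of instances.

```python
from typing import Dict, List

# Hypothetical stand-in for inventory storage: instance id -> full record.
INVENTORY: Dict[str, dict] = {
    f"inst-{i}": {"id": f"inst-{i}", "title": f"Title {i}"} for i in range(1000)
}


class RoundTripCounter:
    """Counts simulated network or database round trips."""

    def __init__(self) -> None:
        self.calls = 0


def fetch_one_over_http(instance_id: str, counter: RoundTripCounter) -> dict:
    # Simulates one HTTP request per instance id (the event-driven model).
    counter.calls += 1
    return INVENTORY[instance_id]


def fetch_batch_from_db(ids: List[str], counter: RoundTripCounter) -> List[dict]:
    # Simulates one database query returning many records at once
    # (assumed shape of the schema-to-schema approach).
    counter.calls += 1
    return [INVENTORY[i] for i in ids]


def reindex_event_driven(ids: List[str], counter: RoundTripCounter) -> List[dict]:
    # Each domain event carries only an id, so every record costs a round trip.
    return [fetch_one_over_http(i, counter) for i in ids]


def reindex_batched(
    ids: List[str], counter: RoundTripCounter, batch_size: int = 100
) -> List[dict]:
    # Records are read in batches, so round trips scale with len(ids) / batch_size.
    docs: List[dict] = []
    for start in range(0, len(ids), batch_size):
        docs.extend(fetch_batch_from_db(ids[start:start + batch_size], counter))
    return docs
```

Under these assumptions, both functions return the same documents, but the event-driven variant needs one round trip per instance while the batched variant needs one per batch.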

Requirements

Functional requirements

  1. There should be no impact on the current behavior of search capabilities.

  2. The event model for indexing newly created/updated documents should remain as-is.

Non-functional requirements

  1. Performance

  2. ECS Support

Baseline Architecture

The baseline architecture is described here:

  1. Search indexing procedure architecture

  2. ECS indexing procedure

Drawbacks of existing solution:

  1. HTTP calls to inventory impact the latency of indexing of a single instance.

  2. Slow-running “upsert scripts” for partial updates in OpenSearch/Elasticsearch.

  3. The need to aggregate instances across multiple tenants in an ECS environment requires multiple updates for every instance.

  4. Data duplication in the consortium_instance table might add overhead to Postgres performance for large datasets.
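Drawback 3 can be illustrated with a small sketch. It assumes, hypothetically, that each tenant's record for a shared instance triggers its own partial update of the same search document, and it compares that with merging the per-tenant records first and writing each document once. The tenant names, the record shape, and both functions are illustrative only and do not reflect the actual mod-search data model.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical (tenant, instance id) pairs: inst-1 is shared by three tenants.
RECORDS: List[Tuple[str, str]] = [
    ("central", "inst-1"),
    ("college", "inst-1"),
    ("university", "inst-1"),
    ("college", "inst-2"),
]


def update_per_record(records: List[Tuple[str, str]]) -> Tuple[Dict[str, dict], int]:
    """Every tenant record causes a separate partial update of the shared document."""
    index: Dict[str, dict] = {}
    writes = 0
    for tenant, instance_id in records:
        doc = index.setdefault(instance_id, {"id": instance_id, "tenants": []})
        doc["tenants"].append(tenant)
        writes += 1  # one write to the search index per tenant record
    return index, writes


def merge_then_index(records: List[Tuple[str, str]]) -> Tuple[Dict[str, dict], int]:
    """Aggregate records per instance first, then write each document once."""
    merged: Dict[str, List[str]] = defaultdict(list)
    for tenant, instance_id in records:
        merged[instance_id].append(tenant)
    index: Dict[str, dict] = {}
    writes = 0
    for instance_id, tenants in merged.items():
        index[instance_id] = {"id": instance_id, "tenants": tenants}
        writes += 1  # one write to the search index per instance
    return index, writes
```

In this toy data set the resulting index is identical in both cases, but the per-record variant issues one write per tenant record while the merged variant issues one write per instance.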

Solution Options

#

Option

Description

Pros

Cons

Decision

0

Existing architecture

The reindexing procedure is based on the domain event model.

 

  • Reaches the limits of the maintenance window

  • A change to the search mapping requires full reindexing

  • Domain events during reindexing contain only the instance ID, so an additional HTTP request is required for each instance to get the whole entity

 

1

Database-to-Database query

The reindexing is split into “merge” and “indexing” stages and the