ARCH-326 - Investigate solution to create separate indexes for holdings and items

https://folio-org.atlassian.net/browse/ARCH-326

 

1. Executive Summary

The current FOLIO Search architecture (mod-search) uses a monolithic index model where every Instance record embeds all associated Holdings and Items into a single, large JSON document. While simple in its original design, this approach has reached its technical limits. It causes performance degradation in large-scale environments, logical errors in consortial (ECS) search results, and blocks the delivery of several high-priority product requirements.

This document proposes splitting the single Instance index into three specialized indices: Instance (Accelerator), Holdings (Record), and Item (Record). The resulting Three-Index Bridge Model introduces a relational search capability that returns logically exact results for cross-record queries while maintaining the high-speed search performance that library operations require.

Key Outcomes

| Dimension | Current State | Proposed State |
|---|---|---|
| Index Structure | 1 monolithic index (Instance + nested Holdings + nested Items) | 3 specialized indices (Instance, Holdings, Item) |
| Query Logic | "OR" relationships across record types | "AND" logic with precise boolean intersections |
| Write Performance | Updating 1 item re-indexes 1,000+ siblings | Atomic, single-document updates |
| Scalability | OOM risk for large instances (5,000+ items) | Small, flat documents with predictable memory usage |

2. Problem Statement

2.1 The Monolithic Model

In the current architecture, mod-search stores all data in a single OpenSearch index per resource type. An Instance document with 500 holdings and 2,000 items becomes a single JSON document containing all nested records. This creates several critical limitations.

2.2 Core Limitations

Performance Degradation with Scale
When a single Instance has thousands of items, the nested document becomes extremely large. Every update to any child record (item barcode change, status update) triggers a full re-index of the entire Instance document, including all sibling items and holdings that were not modified. This leads to high CPU/memory consumption and write contention.

Incorrect Boolean Logic in ECS Environments
The nested model produces "OR" relationships where "AND" logic is required. For example, filtering for "Missing items at My Library" actually returns instances where any tenant has a missing item AND any tenant matches the library filter, regardless of whether they are the same tenant. This is a fundamental architectural limitation, not a bug that can be patched.
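The difference between the two behaviours can be reproduced with a small in-memory simulation. This is an illustrative sketch only: the record shape and the function names `matches_flattened` and `matches_per_record` are assumptions made for the example, not mod-search code. It models one shared Instance whose items belong to two tenants.

```python
# Hypothetical in-memory model of one shared Instance; field names are illustrative.
instance = {
    "id": "inst-1",
    "items": [
        {"tenantId": "tenant_a", "status": "Available"},
        {"tenantId": "tenant_b", "status": "Missing"},
    ],
}


def matches_flattened(inst, tenant_id, status):
    """Instance-level matching: each filter may be satisfied by a different item."""
    any_tenant = any(i["tenantId"] == tenant_id for i in inst["items"])
    any_status = any(i["status"] == status for i in inst["items"])
    return any_tenant and any_status


def matches_per_record(inst, tenant_id, status):
    """Record-level matching: a single item must satisfy both filters."""
    return any(
        i["tenantId"] == tenant_id and i["status"] == status
        for i in inst["items"]
    )


# Tenant A has no missing item, yet the flattened match still returns the Instance.
print(matches_flattened(instance, "tenant_a", "Missing"))   # True (false positive)
print(matches_per_record(instance, "tenant_a", "Missing"))  # False
```

The first function mirrors the behaviour described above; the second is the behaviour that per-record documents make possible.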

Blocked Product Requirements
The following JIRA features cannot be delivered with the current monolithic model:

  • UXPROD-5687: Direct holdings and item search (requires record-level indices)

  • UXPROD-5787: Record-specific results list and routing (requires items as first-class search results)

  • UXPROD-5788: Item and holdings usability updates including true numeric sort (requires flat item documents)

  • UXPROD-5789: Cross-record queries with AND facet logic (requires isolated record indices)

  • UXPROD-4906: Analysis confirmed that the nested-object approach is a dead end

The "Needle in a Haystack" Problem
Finding a specific "Missing" item in an Instance that has 1,000 items requires the user to open the Instance, expand the accordion, and manually scroll through pages of holdings and items. There is no way to search within the holding/item list server-side.

3. Goals and Strategic Benefits

3.1 Logical Precision

Transition from implicit "OR" logic (Instance-centric filters) to explicit "AND" logic (Record-centric boolean intersections). Filters across different record types (e.g., Tenant A + Missing Status) return only exact matches.

3.2 Operational Scalability

Eliminate Out-of-Memory risks associated with massive Instances. Break them into small, flat documents with predictable and bounded memory usage. Adding one item to an Instance with 1,000 items requires indexing only that single item document, not the entire Instance.

3.3 Zero-Downtime Resilience

Enable atomic updates to items and holdings without re-indexing the parent Instance or any sibling records. This drastically reduces system load and eliminates write contention from concurrent modifications.

3.4 User Experience

Provide "direct-to-record" navigation (barcode scan to item detail view), true numeric sorting for serials/enumeration, and server-side searching within an Instance's items and holdings.

3.5 ECS Consortium Accuracy

Guarantee tenant-level isolation for cross-record queries in multi-tenant environments. Eliminate "tenant leakage" that currently produces incorrect results in consortial search.

4. Proposed Architecture: The Three-Index Bridge Model

The architecture separates the monolithic Instance index into three purpose-built indices connected by relational keys.

4.1 Instance Index (The "Accelerator")

  • Role: The primary search target for approximately 90% of queries. Contains all bibliographic metadata plus "Hot Field" summary arrays that represent aggregated holdings/item data.

  • Document Count: One document per unique Instance UUID.

  • Key Characteristic: Retains flattened summaries of high-frequency item/holdings fields (barcodes, statuses, locations) so that most searches remain single-index operations.

4.2 Holdings Index (The "Record")

  • Role: Direct search, browse, and management of individual holdings records.

  • Document Count: One document per unique Holdings UUID.

  • Key Characteristic: Contains denormalized instanceId and instanceTitle for display context and Bridge joins.

4.3 Item Index (The "Record")

  • Role: Direct search, precise filtering (ECS), and true numeric sorting of individual items.

  • Document Count: One document per unique Item UUID.

  • Key Characteristic: Contains denormalized instanceId, holdingsRecordId, and instanceTitle for display context and Bridge joins.

4.4 Relational Links

| From Index | Link Field | To Index | Relationship |
|---|---|---|---|
| Item | instanceId | Instance | Many-to-One |
| Item | holdingsRecordId | Holdings | Many-to-One |
| Holdings | instanceId | Instance | Many-to-One |
| Instance | Hot Field summaries | Items/Holdings | Aggregated One-to-Many |

5. The "Hot Fields" Strategy (Instance Accelerator)

5.1 Rationale

To ensure that the majority of user searches remain instantaneous single-index operations, the Instance index retains a set of "Hot Fields" — high-frequency search and facet fields aggregated from child holdings and items into flat summary arrays. These fields are selected based on analysis of search frequency, user behavior patterns documented in UXPROD requirements, and consortial logic needs.

5.2 Selected Hot Fields

| Field | Type | Source | Selection Rationale |
|---|---|---|---|
| itemBarcodes | keyword[] | All items | Most common exact-match search. Librarians scan barcodes expecting instant results. Without this field, every barcode lookup would require a two-step Bridge join. |
| itemHrids | keyword[] | All items | Human-Readable IDs are frequently used for quick lookups and copy/paste searches. |
| holdingsHrids | keyword[] | All holdings | Same rationale as item HRIDs: direct lookup by Holdings identifier. |
| itemStatusNames | keyword[] | All items | The most-used facet in Inventory UI. Users filter by "Available", "Missing", "Withdrawn" to assess collection health. |
| itemMaterialTypeIds | keyword[] | All items | Primary format facet (Book, DVD, Microform). Used in acquisition decisions and patron discovery. |
| itemLocationIds | keyword[] | All items | Branch/library filter. Critical for ECS "Held by my institution" queries. Prevents consortial noise in multi-tenant searches. |
| holdingsCallNumbers | keyword[] | All holdings | Shelf-list search and call number range browsing. Frequent in cataloging workflows. Enables high-speed prefix matching. |
| holdingTenantIds | keyword[] | All holdings | ECS-specific. Array of all tenant IDs that have holdings for this Instance. Enables instant "Held by my institution" filtering without joins. |

5.3 Use Case Detail

Barcode Scan ("The Quick Find")
A librarian scans a barcode. Because the barcode exists in the Instance Accelerator, the system immediately finds the parent Instance in one query. The response includes the instanceId, from which the UI can route to the Item Detail View. Without this Hot Field, every barcode scan would require first querying the Item index, extracting the instanceId, and then querying the Instance index — doubling latency.

Collection Cleanup ("Status Filtering")
A manager filters by "Status: Missing" in the Instance results list. The itemStatusNames summary array allows this to be a standard term filter on the Instance index. The result is a list of all Instances that have at least one missing item. The user can then drill into the specific Instance to see which items are missing.

Format-Based Discovery ("Material Type Facet")
A patron searches for "The Great Gatsby" and filters by "DVD". The itemMaterialTypeIds field in the Instance Accelerator returns only Instances that contain at least one DVD item, without querying the Item index.

ECS Institutional Filter ("Held by My Library")
In a consortium, a staff member clicks the "Held by my institution" facet. The system adds holdingTenantIds: "Member_A" to the query. Only Instances where Member A owns at least one holding appear. No join required.
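How the summary arrays might be derived from child records can be sketched as follows. This is a minimal illustration, not the mod-search implementation: the function name `build_hot_fields` and the input field names (`barcode`, `hrid`, `status.name`, `materialTypeId`, `effectiveLocationId`, `callNumber`, `tenantId`) are assumptions; only the output keys come from the table in 5.2.

```python
def build_hot_fields(holdings, items):
    """Aggregate child-record values into flat, de-duplicated summary arrays."""

    def distinct(values):
        # Drop empty values and duplicates; sort for deterministic output.
        return sorted({v for v in values if v})

    return {
        "itemBarcodes": distinct(i.get("barcode") for i in items),
        "itemHrids": distinct(i.get("hrid") for i in items),
        "itemStatusNames": distinct((i.get("status") or {}).get("name") for i in items),
        "itemMaterialTypeIds": distinct(i.get("materialTypeId") for i in items),
        "itemLocationIds": distinct(i.get("effectiveLocationId") for i in items),
        "holdingsHrids": distinct(h.get("hrid") for h in holdings),
        "holdingsCallNumbers": distinct(h.get("callNumber") for h in holdings),
        "holdingTenantIds": distinct(h.get("tenantId") for h in holdings),
    }


holdings = [{"hrid": "ho1", "callNumber": "PS3511", "tenantId": "Member_A"}]
items = [
    {"barcode": "111", "hrid": "it1", "status": {"name": "Available"}},
    {"barcode": "222", "hrid": "it2", "status": {"name": "Missing"}},
    {"barcode": "333", "hrid": "it3", "status": {"name": "Missing"}},
]
summary = build_hot_fields(holdings, items)
print(summary["itemStatusNames"])  # ['Available', 'Missing']
```

Because the arrays hold distinct values, a term filter on them answers "at least one child has this value" and nothing more specific, which matches the use cases above.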

6. Query Execution and the Bridge Strategy

The system uses a Query Router component within mod-search to analyze incoming CQL queries and determine the optimal execution path. The router classifies queries into one of four strategies.

6.1 Strategy A: Direct Record Search

When: The user is explicitly searching for Items or Holdings (e.g., in the "Item" tab or "Holdings" tab).

Execution:

  1. The CQL query is sent directly to the item or holdings index.

  2. OpenSearch returns a flat, paginated list of records.

  3. Each document includes denormalized instanceTitle for display context.

Example: "Find all items with status Missing"

CQL: items.status.name == "Missing"
Target: item index
Query:
    { "term": { "status.name": "Missing" } }

Benefit: Results are actual Item records, not Instance records containing items. Each row in the UI represents one unique barcode, enabling true pagination, sorting, and UUID export.

6.2 Strategy B: Accelerator-Only Search

When: The query targets only Instance-level fields or Hot Field summaries, which covers approximately 90% of typical searches.

Execution:

  1. The CQL query is executed against the Instance index only.

  2. Hot Field summaries enable filtering by item/holdings data without joins.

Example: "Find Instances with language English and at least one Missing item"

CQL: languages == "eng" AND items.status.name == "Missing"
Target: instance index
Query:
    {
      "bool": {
        "must": [
          { "term": { "languages": "eng" } },
          { "term": { "itemStatusNames": "Missing" } }
        ]
      }
    }

Benefit: Single-index query. No Bridge join required. Millisecond response times preserved.

6.3 Strategy C: Bridge Join (Cross-Record Query)

When: The query contains fields that span multiple record types and the Hot Fields alone cannot satisfy the precision requirement.

Execution (Two-Phase):

  1. Phase 1: Query the item or holdings index with the record-level filter. Extract the list of instanceIds from the results.

  2. Phase 2: Query the instance index with the bibliographic filter, adding id IN [extracted instanceIds] as an additional constraint.

Example: "Find English Instances where a specific item is On Order at Location X"

Phase 1 (item index):
    {
      "bool": {
        "must": [
          { "term": { "status.name": "On order" } },
          { "term": { "effectiveLocationId": "location_X_uuid" } }
        ]
      }
    }
    → Returns instanceIds: ["uuid-1", "uuid-5", "uuid-12"]

Phase 2 (instance index):
    {
      "bool": {
        "must": [
          { "term": { "languages": "eng" } },
          { "terms": { "id": ["uuid-1", "uuid-5", "uuid-12"] } }
        ]
      }
    }

Benefit: Produces 100% accurate AND logic. There is zero possibility of "tenant leakage" or "OR confusion" because the two-phase join enforces strict boolean intersection.
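The two phases can be sketched as a single function. This is a simplified illustration under stated assumptions: `bridge_join` and the `search(index, body)` callable are hypothetical names, `fake_search` is an in-memory stand-in for OpenSearch that understands only `term` and `terms` clauses, and the sketch ignores paging of Phase 1 results, which a real implementation would have to handle.

```python
def bridge_join(search, item_filters, instance_filters):
    """Two-phase join: filter items first, then instances restricted to their ids."""
    # Phase 1: record-level filter on the item index; only instanceId is needed.
    phase1 = {"query": {"bool": {"must": item_filters}}, "_source": ["instanceId"]}
    instance_ids = sorted({hit["instanceId"] for hit in search("item", phase1)})
    if not instance_ids:
        return []  # no item matched, so no Instance can match

    # Phase 2: bibliographic filter intersected with the ids from Phase 1.
    must = list(instance_filters) + [{"terms": {"id": instance_ids}}]
    return search("instance", {"query": {"bool": {"must": must}}})


# In-memory stand-in for OpenSearch (illustrative data, flat field names).
DOCS = {
    "item": [
        {"instanceId": "uuid-1", "status.name": "On order", "effectiveLocationId": "loc-x"},
        {"instanceId": "uuid-2", "status.name": "On order", "effectiveLocationId": "loc-y"},
        {"instanceId": "uuid-2", "status.name": "Available", "effectiveLocationId": "loc-x"},
    ],
    "instance": [
        {"id": "uuid-1", "languages": "eng"},
        {"id": "uuid-2", "languages": "eng"},
    ],
}


def fake_search(index, body):
    def clause_matches(doc, clause):
        kind, condition = next(iter(clause.items()))
        field, expected = next(iter(condition.items()))
        if kind == "terms":
            return doc.get(field) in expected
        return doc.get(field) == expected

    clauses = body["query"]["bool"]["must"]
    return [d for d in DOCS[index] if all(clause_matches(d, c) for c in clauses)]


result = bridge_join(
    fake_search,
    [{"term": {"status.name": "On order"}}, {"term": {"effectiveLocationId": "loc-x"}}],
    [{"term": {"languages": "eng"}}],
)
print([doc["id"] for doc in result])  # ['uuid-1']
```

Instance uuid-2 has one item that is On order and a different item at loc-x. Summary arrays alone would match it, whereas the join excludes it because no single item satisfies both conditions.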

6.4 Strategy D: Scoped Search Within a Record

When: The user is viewing an Instance Detail page and wants to search or filter within its items or holdings.

Execution:

  1. Query the item index with instanceId: "X" AND [user filter].

  2. Results are paginated server-side.

Example: "Find all Available items within Instance X at Main Library"

Target: item index
Query:
    {
      "bool": {
        "must": [
          { "term": { "instanceId": "instance-uuid-X" } },
          { "term": { "status.name": "Available" } },
          { "term": { "effectiveLocationId": "main-library-uuid" } }
        ]
      }
    }

Benefit: Solves the "Needle in a Haystack" problem. Users no longer need to scroll through accordion lists of 1,000+ items in the browser; search, filter, and sort within a specific Instance all run server-side.
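One way the Query Router could select among the four strategies is sketched below. The classification rule is an assumption made for illustration: the document does not specify how the router decides that Hot Fields "cannot satisfy the precision requirement", so this sketch treats more than one record-level condition as requiring a join. The names `Strategy`, `HOT_FIELD_BACKED`, and `choose_strategy`, and the CQL field names in the set, are hypothetical.

```python
from enum import Enum


class Strategy(Enum):
    DIRECT_RECORD = "A"
    ACCELERATOR_ONLY = "B"
    BRIDGE_JOIN = "C"
    SCOPED_SEARCH = "D"


# Hypothetical CQL field names that have a Hot Field counterpart on the Instance index.
HOT_FIELD_BACKED = {
    "items.barcode",
    "items.hrid",
    "items.status.name",
    "items.materialTypeId",
    "items.effectiveLocationId",
    "holdings.hrid",
    "holdings.callNumber",
    "holdings.tenantId",
}


def choose_strategy(target, fields, scoped_instance_id=None):
    """Classify a query by its target tab, the fields it uses, and optional scope."""
    if scoped_instance_id is not None:
        return Strategy.SCOPED_SEARCH
    if target in ("item", "holdings"):
        return Strategy.DIRECT_RECORD

    record_fields = {f for f in fields if f.startswith(("items.", "holdings."))}
    if not record_fields:
        return Strategy.ACCELERATOR_ONLY
    # Assumption: one Hot-Field-backed condition is safe on the summary arrays;
    # several record-level conditions must hold on the same record, so they join.
    if len(record_fields) == 1 and record_fields <= HOT_FIELD_BACKED:
        return Strategy.ACCELERATOR_ONLY
    return Strategy.BRIDGE_JOIN


print(choose_strategy("instance", {"languages", "items.status.name"}).name)
# ACCELERATOR_ONLY
```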

7. Real-Time Update Strategy

7.1 Overview

Maintaining consistency between the three indices during real-time operations (item status changes, barcode updates, new items added) is critical. The design uses a combination of atomic document updates and targeted partial updates to keep all indices synchronized.

7.2 Event-Driven Architecture

All updates are driven by Kafka events emitted by mod-inventory. The ResourceEvent schema carries both the old and new states of the modified record:

ResourceEvent:
  properties:
    id: Resource UUID
    type: CREATE | UPDATE | DELETE
    tenant: Tenant ID
    new: Complete new state of the record (JSON object)
    old: Complete previous state of the record (JSON object)

The presence of both old and new states enables surgical, field-level updates without requiring full document retrieval.
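As an illustration of why carrying both states is useful, the sketch below computes which top-level fields changed without fetching anything from the index. The `ResourceEvent` dataclass mirrors the properties listed above; the helper `changed_fields` and the sample record fields are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResourceEvent:
    id: str
    type: str  # "CREATE" | "UPDATE" | "DELETE"
    tenant: str
    new: Optional[dict] = None
    old: Optional[dict] = None


def changed_fields(event: ResourceEvent) -> dict:
    """Return {field: (old_value, new_value)} for top-level fields that differ."""
    old, new = event.old or {}, event.new or {}
    return {
        key: (old.get(key), new.get(key))
        for key in sorted(old.keys() | new.keys())
        if old.get(key) != new.get(key)
    }


event = ResourceEvent(
    id="item-1",
    type="UPDATE",
    tenant="Member_A",
    old={"barcode": "OLD_123", "status": {"name": "Available"}},
    new={"barcode": "NEW_456", "status": {"name": "Available"}},
)
print(changed_fields(event))  # {'barcode': ('OLD_123', 'NEW_456')}
```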

7.3 Strategy A: Surgical Scripted Update (Preferred)

For high-frequency, single-field changes (barcode update, status change), the system sends an OpenSearch Painless script to modify only the affected value in the Instance Accelerator's Hot Field arrays.

Example: Barcode Change

When an item's barcode changes from OLD_123 to NEW_456:

  1. Item Index: Atomic document replacement (standard index operation).

  2. Instance Accelerator: Scripted partial update:

POST /instance/_update/{instanceId}
{
  "script": {
    "source": "ctx._source.itemBarcodes.removeIf(it -> it == params.oldVal); ctx._source.itemBarcodes.add(params.newVal);",
    "params": {
      "oldVal": "OLD_123",
      "newVal": "NEW_456"
    }
  }
}

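Advantages: mod-search does not need to fetch the Instance document before updating it, because the script runs on the OpenSearch node that holds the document. Execution is expected to be very fast. Only the targeted array changes; other fields are untouched.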
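A request body like the one above could be generated from an event's old and new values. The helper name `swap_value_update` is hypothetical; the script text is the one from the example, with the array name substituted. Validating the field name before interpolating it into the script source is a precaution added by this sketch.

```python
def swap_value_update(field, old_val, new_val):
    """Build an _update request body that swaps one value in an array field."""
    if not field.isidentifier():
        # The field name is interpolated into script source, so restrict it.
        raise ValueError(f"unexpected field name: {field!r}")
    source = (
        f"ctx._source.{field}.removeIf(it -> it == params.oldVal); "
        f"ctx._source.{field}.add(params.newVal);"
    )
    return {
        "script": {
            "source": source,
            "lang": "painless",
            "params": {"oldVal": old_val, "newVal": new_val},
        }
    }


body = swap_value_update("itemBarcodes", "OLD_123", "NEW_456")
print(body["script"]["params"])  # {'oldVal': 'OLD_123', 'newVal': 'NEW_456'}
```

Passing the values through `params` rather than embedding them in the source keeps the script text identical across events for the same field.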

7.4 Strategy B: Re-flattening (Safety Net)

For complex changes (item moved between holdings, multiple simultaneous updates) or when the old state is ambiguous, the system performs a full summary refresh:

  1. Query the item index for all items belonging to the instanceId.

  2. Recalculate all Hot Field summary arrays from the current item set.

  3. Send a partial document update to the Instance Accelerator.

POST /instance/_update/{instanceId}
{
  "doc": {
    "itemBarcodes": ["barcode1", "barcode2", "barcode3"],
    "itemStatusNames": ["Available", "Missing"],
    "itemLocationIds": ["loc-uuid-1", "loc-uuid-2"]
  }
}

Advantages: Guarantees 100% consistency. Self-healing — if a previous event was missed, this approach corrects the summary. Acceptable latency for lower-frequency operations.

7.5 Update Scenarios

| Operation | Item Index | Holdings Index | Instance Accelerator |
|---|---|---|---|
| Item Created | Create new document | No change | Script: add barcode, status, location to summary arrays |
| Item Deleted | Delete document | No change | Script: remove values from summary arrays |
| Barcode Updated | Replace document | No change | Script: swap old/new barcode in itemBarcodes |
| Status Changed | Replace document | No change | Script: update itemStatusNames array |
| Item Moved (Holdings) | Update holdingsRecordId | No change | Re-flatten: recalculate summaries for old and new Instance |
| Holdings Created | No change | Create new document | Script: add HRID, call number, tenantId to summaries |
| Holdings Deleted | Delete related items | Delete document | Re-flatten: recalculate all summaries |
| Instance Updated | No change | No change | Standard document update |
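The scenarios above can be read as a decision function that picks the Accelerator action for each event. The sketch below is one possible encoding, not the actual mod-search logic: the function name `accelerator_action` and the string constants are hypothetical, and the handling of holdings updates (which the scenarios do not list) is an assumption that falls back to the re-flatten safety net.

```python
SCRIPTED = "scripted"
REFLATTEN = "re-flatten"
STANDARD = "standard"


def accelerator_action(resource, event_type, old=None, new=None):
    """Pick how the Instance Accelerator is brought up to date for one event."""
    if resource == "instance":
        return STANDARD
    if resource == "holdings":
        # Holdings updates are not listed in the scenarios; using the safety
        # net for them is an assumption of this sketch.
        return SCRIPTED if event_type == "CREATE" else REFLATTEN
    if resource == "item":
        if event_type in ("CREATE", "DELETE"):
            return SCRIPTED
        if old is None or new is None:
            return REFLATTEN  # old state is ambiguous
        if old.get("holdingsRecordId") != new.get("holdingsRecordId"):
            return REFLATTEN  # item moved between holdings
        return SCRIPTED
    raise ValueError(f"unknown resource type: {resource!r}")


moved = accelerator_action(
    "item", "UPDATE",
    old={"holdingsRecordId": "h1", "barcode": "1"},
    new={"holdingsRecordId": "h2", "barcode": "1"},
)
print(moved)  # re-flatten
```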

7.6 Consistency Guarantee

The IndexConsistencyManager component within mod-search implements the following rules:

  1. Primary Write: The record-level index (Item or Holdings) is always updated first. This is the source of truth.

  2. Secondary Write: The Instance Accelerator summary is updated after the primary write succeeds.

  3. Failure Recovery: If the secondary write fails, the system logs the inconsistency and schedules a re-flattening task. The Instance Accelerator may be temporarily stale, but no data is lost. The next full reindex or explicit re-flatten corrects the state.
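The ordering rules above can be sketched as a small function. This is a simplified illustration: `apply_event`, the callables passed to it, and the in-memory `reflatten_queue` are hypothetical stand-ins for whatever the IndexConsistencyManager would use in practice.

```python
import logging

logger = logging.getLogger("index-consistency")


def apply_event(instance_id, write_primary, write_secondary, reflatten_queue):
    """Run the primary write, then the secondary; queue a re-flatten on failure."""
    # Rule 1: the record-level index is written first. If this raises, the
    # Accelerator is never touched and the error propagates to the caller.
    write_primary()
    try:
        # Rule 2: the summary is updated only after the primary write succeeded.
        write_secondary()
    except Exception:
        # Rule 3: record the inconsistency and schedule a re-flatten.
        logger.warning("Accelerator update failed for %s; re-flatten scheduled", instance_id)
        reflatten_queue.append(instance_id)
        return False
    return True


def failing_secondary():
    raise RuntimeError("simulated OpenSearch failure")


queue = []
ok = apply_event("inst-1", lambda: None, failing_secondary, queue)
print(ok, queue)  # False ['inst-1']
```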

8. (ECS) Multi-Tenant Architecture

8.1 Current ECS Model and Its Limitations

In the current architecture, ECS builds all shared records into a single consolidated index per resource type. A consortium with 10 member tenants has one instance index containing records from all members, with tenantId embedded at every level (Instance, Holdings, Items).

The fundamental limitation is that nested documents within a single Instance do not enforce cross-field tenant isolation. A filter for "Tenant A" combined with "Status: Missing" executes as:

"Find an Instance that has ANY holding from Tenant A AND ANY item (from any tenant) that is Missing."

This produces incorrect results — items from Tenant B appear because the filter matched at the Instance level, not the record level.

8.2 Proposed ECS Architecture: Consolidated Accelerator / Isolated Records

The Three-Index Bridge Model introduces a layered strategy for consortia.

Instance Index (Consolidated Accelerator)

  • One document per unique Instance UUID across the entire consortium.

  • Contains shared bibliographic data (title, author, subjects).

  • Hot Field summary arrays contain values from all member tenants (all barcodes, all statuses, all locations).

  • New field holdingTenantIds lists every tenant that owns at least one holding for the Instance.

  • Standard tenantId and shared fields control visibility.

Holdings and Item Indices (Isolated Records)

  • One document per Holdings/Item UUID.

  • Every document contains a mandatory tenantId field.

  • Queries against these indices naturally enforce tenant isolation because tenantId is a first-class filterable field on a flat document.

8.3 ECS Query Execution

Simple Search (From a Member Tenant)

Every query from a member tenant against the Instance Accelerator is automatically wrapped in a tenant-visibility filter:

    {
      "bool": {
        "must": [
          { "match": { "title": "Gatsby" } },
          {
            "bool": {
              "should": [
                { "term": { "shared": true } },
                { "term": { "tenantId": "Member_A" } }
              ]
            }
          }
        ]
      }
    }

This ensures Member A can see shared/consortium records and its own local records, but never another member's private records.
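The wrapping step could be expressed as a helper that takes any user query and adds the visibility clause. The function name `with_tenant_visibility` is hypothetical. The sketch sets `minimum_should_match` explicitly; this matches the default for a bool query that contains only `should` clauses and is spelled out here to make the intent visible.

```python
def with_tenant_visibility(user_query, tenant_id):
    """Wrap a query so it matches only shared records or the tenant's own records."""
    visibility = {
        "bool": {
            "should": [
                {"term": {"shared": True}},
                {"term": {"tenantId": tenant_id}},
            ],
            "minimum_should_match": 1,
        }
    }
    return {"bool": {"must": [user_query, visibility]}}


wrapped = with_tenant_visibility({"match": {"title": "Gatsby"}}, "Member_A")
print(wrapped["bool"]["must"][1]["bool"]["should"][1])
# {'term': {'tenantId': 'Member_A'}}
```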

"Held by My Institution" Facet

When a user activates the "Held by my institution" facet:

    {
      "bool": {
        "must": [
          { "match": { "title": "Gatsby" } },
          { "term": { "holdingTenantIds": "Member_A" } }
        ]
      }
    }

This returns only Instances where Member A physically holds at least one copy. Single-index, millisecond response.

Precise Cross-Record ECS Query (Bridge Join)

"Find English books that are Missing at My Library"

  1. Phase 1 (Item Index): tenantId: "Member_A" AND status.name: "Missing" → Returns instanceIds.

  2. Phase 2 (Instance Index): languages: "eng" AND id IN [instanceIds from Phase 1].

Result: Only Instances where Member A's own items are Missing. Zero tenant leakage. 100% boolean AND accuracy.

8.4 ECS Query Logic Summary

| Search Scenario | Execution Path | Target Indices |
|---|---|---|
| Simple keyword search | Tenant visibility filter on Instance | Instance |
| "Held by my institution" facet | holdingTenantIds filter on Instance | Instance |
| Item-level filter (status, barcode) | Direct query with tenantId constraint | Item |
| Cross-record filter (English + Missing at my library) | Bridge: Item → Instance join | Item → Instance |
| Exact barcode scan | Direct lookup with tenantId | Item |
| Holdings call number search | Direct query with tenantId | Holdings |

8.5 ECS Real-Time Update Flow

When a member tenant updates a record in a consortium:

  1. Event: ITEM_UPDATED arrives from Member B's Kafka topic with old and new states.

  2. Item Index: Atomic update to the specific Item document (isolated to Member B's record).

  3. Instance Accelerator: Scripted update to the shared Instance document's Hot Field summaries.

    • The update adds Member B's new values and removes old values from the consortium-wide summary arrays.

    • The holdingTenantIds field is updated if the change affects holdings ownership.

Write Contention Mitigation: In the current monolithic model, two tenants updating different items on the same shared Instance both compete to update the same massive document. In the new model, each tenant updates only its own small Item document atomically. The Instance Accelerator receives lightweight scripted updates that operate on specific array elements, not the entire document.

8.6 Challenges Solved by the ECS Architecture