DRAFT: MOD-SEARCH Refactor: Streaming Reindex with a Dual-Index Model for Operational and Search Workloads
Executive Summary
This document presents a comprehensive reimagining of mod-search. It combines two critical initiatives—a streaming-based, zero-downtime reindexing system and a Dual-Index Architecture that separates operational updates from search queries—into a single, cohesive system. This new architecture moves mod-search from a complex, batch-oriented system with a rigid data structure to a resilient and performant service.
The proposed architecture introduces a dual-index system: a flat Operational Index optimized for fast, granular updates (e.g., item status changes) and a nested Search View Index, automatically generated via OpenSearch Transforms, that preserves the existing complex data structure required for bibliographic search. The initial population of this system will be handled by a new streaming reindex process that pulls data directly from mod-inventory-storage with zero downtime, eliminating the current two-phase batch system and its associated overhead.
Key benefits of this unified approach include:
Instantaneous Operational Updates: Item and holdings changes are reflected quickly without reindexing the entire parent instance.
Zero-Downtime Reindexing: Searches continue uninterrupted on the current index while a new one is populated in the background.
Preserved Search Functionality: The Search View Index is identical in structure to the current index, ensuring 100% compatibility with all existing CQL queries, faceting, and browsing features.
System-Wide Simplification: Eliminates complex state management tables and intermediate data storage in Postgres, reducing operational complexity and load.
Guaranteed Data Consistency: A Delta Reconciliation phase ensures that any in-flight updates during reindexing are synchronized, eliminating race conditions and data staleness.
Reduced Load on Postgres & Kafka: Postgres load drops because intermediate-table data processing is no longer needed, and Kafka is bypassed because reindex data flows over HTTP instead. This also means reduced hosting costs.
Incremental Updates
Full Reindex
Search Queries
- Executive Summary
  - Incremental Updates
  - Full Reindex
  - Search Queries
- 1. Current Architecture and Problem Statement
- 2. Proposed Architecture
- 3. Detailed Workflows
- 4. Critical Considerations & Mitigation
- 5. Conclusion
1. Current Architecture and Problem Statement
The existing mod-search architecture suffers from two fundamental, interconnected problems: an inefficient reindexing process and a monolithic index structure that is poorly suited for both fast updates and complex queries.
Current Limitations
Inefficient Reindexing Process: The current two-phase batch process (Merge and Upload) is complex, requires storing the entire dataset in intermediate Postgres tables, and necessitates index downtime. It places a high batch load on mod-inventory-storage. Parallelism is limited by:
Strict Sequencing of the Merge & Upload Phases: The upload phase cannot begin until 100% of the merge phase is done.
Database Contention as a Central Bottleneck: Both phases are heavily dependent on intermediate tables in Postgres, leading to row locking, I/O bottlenecks, and database CPU saturation. Even with 20 parallel workers, they all end up waiting for their turn to access the database.
Complex State Management Overhead: The system has to track the status (IN_PROGRESS, COMPLETED, FAILED) of hundreds or thousands of individual range jobs in the database. This coordination creates its own overhead: each worker constantly queries and updates the status tables, which adds to the database contention described above and slows down the actual work of processing data.
Significant Kafka Storage Requirements: Re-indexing consumes significant Kafka storage. When a failure forces another reindex, that storage remains occupied until it is cleared by an administrator or ages out. There have been cases where Kafka storage had to be expanded just to let a reindex continue.
Slow Even Without Data: In a developer environment with minimal or zero data, the reindex process still takes about a minute because Kafka messages for every range must be consumed even when there is no data.
Monolithic Index Structure: The use of a single, deeply nested document for instances, holdings, and items creates severe performance bottlenecks. Any update to a single item or holding (e.g., a checkout) triggers a full re-fetch and reindex of the entire instance document, causing high memory pressure and update latency. There is no mechanism for granular updates.
Cascading Reindexing: This monolithic structure leads to cascading reindexing operations, where a simple holdings location change can trigger the reindexing of thousands of associated items. The system assembles the two large, flat arrays (holdings and items) into the single instance document and sends it to OpenSearch for a full replacement.
2. Proposed Architecture
The new architecture is founded on four pillars: a dual-index structure to separate workloads, automated data transformation within OpenSearch, a streaming ingestion pipeline from mod-inventory-storage to mod-search, and zero-downtime alias management.
2.1. Dual-Index Architecture for Operational and Search Workloads
A single index cannot be optimized for both high-throughput, granular writes and complex, hierarchical reads. We will solve this by implementing a dual-index architecture to serve these two distinct workloads.
2.1.1. The Operational Index ({env}_{resource}_{tenant}_operational_{stream_id})
A flat, denormalized index designed for fast, granular updates. Each instance, holding, and item will exist as a separate document. This is the primary destination for all real-time data changes from Kafka events.
Purpose: To handle fast CRUD operations from operational workflows (circulation, inventory). Optimized for writes.
Structure: Flat documents with document_type, instanceId, holdingsId, and itemId fields to express relationships.
Refresh Interval: Fast (e.g., 1s) for near real-time ingestion.
(See Appendix A for the Operational Index Mapping)
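For orientation, a sketch of what flat Operational Index documents might look like. Only document_type, instanceId, holdingsId, and itemId come from this design; the remaining fields and all values are illustrative (Appendix A is authoritative):

{ "document_type": "instance", "instanceId": "2a1f5e8b-0000-7000-8000-000000000003", "tenant_id": "diku", "title": "Example Title" }
{ "document_type": "holdings", "holdingsId": "4b9a7c6d-0000-7000-8000-000000000002", "instanceId": "2a1f5e8b-0000-7000-8000-000000000003", "tenant_id": "diku" }
{ "document_type": "item", "itemId": "8f3c1d2e-0000-7000-8000-000000000001", "holdingsId": "4b9a7c6d-0000-7000-8000-000000000002", "instanceId": "2a1f5e8b-0000-7000-8000-000000000003", "tenant_id": "diku", "status": "Checked out" }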
2.1.2. The Search View Index ({env}_{resource}_{tenant}_view_{stream_id})
A nested index whose structure is identical to the current instance.json model. This index is a materialized view, automatically built and maintained by OpenSearch based on the data in the Operational Index.
Purpose: To serve all existing user-facing search queries (CQL, faceting, browsing). Optimized for reads.
Structure: Nested instance-holdings-item hierarchy.
Refresh Interval: Slower (e.g., 30s) as it is updated in batches by the transform.
(See Appendix B for the View Index Mapping)
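By contrast, a minimal sketch of the corresponding view document, following the instance-with-flat-holdings-and-items shape described in Section 1 (field names beyond the IDs are illustrative; Appendix B is authoritative):

{
  "id": "2a1f5e8b-0000-7000-8000-000000000003",
  "title": "Example Title",
  "holdings": [ { "id": "4b9a7c6d-0000-7000-8000-000000000002" } ],
  "items": [ { "id": "8f3c1d2e-0000-7000-8000-000000000001", "status": "Checked out" } ]
}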
2.2. OpenSearch Index Transforms
The "glue" between the Operational and Search View indices. An OpenSearch Index Transform is a background job that continuously queries the operational index, aggregates the flat data into the required nested structure, and writes the result to the view index.
Continuous Mode: After the initial build, the transform runs on a schedule (e.g., every 60s), processing only the data that has changed in the Operational Index. This is highly efficient.
Automation: The process of building and maintaining the search hierarchy is offloaded to the OpenSearch cluster, simplifying application logic. Postgres no longer has to maintain duplicate inventory data in both the mod-inventory-storage and mod-search schemas, or be queried for the full inventory hierarchy on every update. This provides more headroom for the database to handle production load (e.g., Data Import, OAI-PMH) while indexing is ongoing.
Resilience: Transforms are managed by OpenSearch and will retry on failure.
(See Appendix C for Index Transform Configuration)
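As an illustrative sketch only (Appendix C holds the real configuration), creating such a continuous transform might look like the following. The transform ID, schedule, and page size are assumptions, and the nested holdings/items assembly would likely require a scripted_metric aggregation, shown here with placeholder scripts:

PUT _plugins/_transform/folio_instance_diku_{stream_id}
{
  "transform": {
    "enabled": false,
    "continuous": true,
    "schedule": { "interval": { "start_time": 1700000000000, "period": 1, "unit": "Minutes" } },
    "source_index": "folio_instance_diku_operational_{stream_id}",
    "target_index": "folio_instance_diku_view_{stream_id}",
    "page_size": 1000,
    "groups": [
      { "terms": { "source_field": "instanceId", "target_field": "instanceId" } }
    ],
    "aggregations": {
      "holdings": { "scripted_metric": { "init_script": "...", "map_script": "...", "combine_script": "...", "reduce_script": "..." } }
    }
  }
}

Note that "enabled": false matches the workflow in Section 3.1, where each new transform is created paused and only started after the alias switch.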
2.3. Streaming Data Ingestion
To perform a full reindex, we will replace the batch system with a direct streaming pipeline from mod-inventory-storage.
2.3.1. New Cursor-Based Endpoint in mod-inventory-storage
A new endpoint will be introduced to allow mod-search to stream all instance records efficiently.
GET /instance-storage/instances/stream?cursor={cursor}&limit={limit}
Content-Type: application/x-ndjson
X-Next-Cursor: 01936b304d2a7c5e8000abcdef123456
X-Has-More: true

Other resource types (authorities, holdings, items) can follow this pattern. The main point is to stream data via HTTP instead of Kafka.
Because this endpoint does not follow strict RMB standards, exists solely for mod-search, and FOLIO has no formal private/public API distinction, the recommendation is to place it behind an _internal path, e.g. GET /_internal/instance-storage/instances/stream. Ample warning should be added to the API documentation stressing that this is an internal endpoint that should not be integrated against and whose output can change.
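A consumer simply follows the cursor until the stream is exhausted. A sketch of the request sequence (the limit value is illustrative):

GET /_internal/instance-storage/instances/stream?limit=1000
  -> 1000 NDJSON records; X-Next-Cursor: 01936b304d2a7c5e8000abcdef123456, X-Has-More: true

GET /_internal/instance-storage/instances/stream?cursor=01936b304d2a7c5e8000abcdef123456&limit=1000
  -> next 1000 records; repeat until X-Has-More: false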
This streaming approach provides significant benefits across the entire data pipeline:
Efficient Client-Side Processing (mod-search): The NDJSON (Newline Delimited JSON) stream allows mod-search to process records one by one. This results in minimal and constant memory overhead, as the entire dataset is never buffered in memory. The cursor provides a simple, stateless mechanism for resuming an interrupted stream without data loss or duplication.
Reduced Memory Pressure on mod-inventory-storage: The benefits extend beyond the client. Instead of loading large batches of records into application memory before sending them, mod-inventory-storage can stream results directly from the database to the HTTP response. This keeps the memory footprint minimal, reduces garbage-collection pressure, and makes the service more stable under reindexing load while it continues to serve regular FOLIO traffic.
Greater Database Efficiency: The current range-based approach (WHERE id >= X AND id <= Y) forces the database to repeatedly plan and execute queries for each data segment. A single, cursor-based streaming query is more efficient: the database creates one execution plan for the entire result set and simply streams the rows back to the application, minimizing query-planning overhead and making better use of its internal caches. A LIMIT will still be set, but it can be a larger value than before (see the query sketch after this list).
Decoupling Reindexing from Kafka: This architecture fundamentally removes Kafka from the full-reindex data path, which is a critical improvement for the stability and purpose of the event bus:
Eliminates Topic Bloat: Inventory data is no longer persisted to Kafka topics simply to support reindex operations. The inventory topics are no longer flooded with millions of temporary reindexing events and can be reserved for their intended purpose: communicating real-time business entity changes.
Prevents Duplicate Events on Retry: Because the data transfer is a direct, stateless HTTP stream, repeated reindex attempts after failures will not cause mod-inventory-storage to publish duplicate events into Kafka.
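A minimal sketch of the difference at the SQL level, assuming the cursor is backed by an id-ordered keyset query (table and column names follow RMB conventions but are illustrative):

-- Current: one planned-and-executed query per range segment
SELECT jsonb FROM instance WHERE id >= :range_start AND id <= :range_end;

-- Proposed: a single ordered scan, resumable from the last id seen
SELECT id, jsonb FROM instance WHERE id > :last_seen_id ORDER BY id LIMIT :limit;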
2.3.2. Streaming Reindex Service in mod-search
This new service orchestrates the full reindexing process by calling the streaming endpoint and populating the new versioned Operational Index.
2.3.3. Delta Reconciliation System
To address the race condition where records can be updated via Kafka events during a multi-hour reindex process, a Delta Reconciliation phase is implemented:
Change Logging: During active reindexing, all resource IDs that are modified via Kafka events are logged in a reindex_deltas database table.
Reconciliation Phase: After streaming completes, the system re-fetches the latest state of all changed resources from their source of truth and updates the new index.
Data Consistency Guarantee: This ensures the new index contains the absolute latest data before going live, eliminating the race condition where live updates go to the old index while the new index contains stale versions.
2.4. Zero-Downtime Index & Alias Management
Stable Alias: A main alias (e.g., folio_instance_diku) will always point to the active Search View Index. All search queries will target this alias.
Versioned Physical Indices: Each reindex operation creates a new set of versioned indices identified by a time-ordered UUIDv7 (stream_id):
folio_instance_diku_operational_01936b2f8c7e...
folio_instance_diku_view_01936b2f8c7e...
Atomic Switch: Once the new Search View Index is fully populated and verified, the main alias is atomically switched to point to it (see the sketch after this list). The switch is instantaneous and invisible to users.
Cleanup: The old set of indices can be safely removed after a retention period.
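In OpenSearch, the switch is a single _aliases call whose actions are applied atomically; the index names below are placeholders:

POST _aliases
{
  "actions": [
    { "remove": { "index": "folio_instance_diku_view_{old_stream_id}", "alias": "folio_instance_diku" } },
    { "add": { "index": "folio_instance_diku_view_{new_stream_id}", "alias": "folio_instance_diku" } }
  ]
}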
3. Detailed Workflows
3.1. Workflow: Full Reindexing (Streaming)
Initiation: A reindex is triggered via API call.
State Creation: A record is created in the reindex_stream_state table with a new stream_id (UUIDv7); a schema sketch follows this workflow. The status is INITIALIZING.
Index Preparation:
New physical indices are created: ..._operational_{stream_id} and ..._view_{stream_id}.
A new continuous transform is created for the new index pair, initially paused to prevent premature updates during the streaming phase.
Streaming: The StreamingReindexService begins streaming data from mod-inventory-storage. As records arrive, they are converted to the flat operational format and bulk-indexed into the ..._operational_{stream_id} index. The reindex_stream_state table is updated periodically with the latest cursor. Status: STREAMING.
Delta Reconciliation: After streaming completes, the status changes to RECONCILING. The system processes any resources that were updated during the streaming phase by re-fetching their latest state and updating the new operational index.
Transform Build: Once reconciliation is complete, a one-time, full transform is triggered to populate the ..._view_{stream_id} index from the now-complete Operational Index. Status: SWITCHING.
Alias Switch: After the transform completes, the main search alias is atomically switched to point to the new ..._view_{stream_id}.
Completion: The state is marked COMPLETED. The new continuous transform is activated to handle incremental updates going forward. The old continuous transform is stopped and the old indices are scheduled for cleanup.
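The reindex_stream_state schema is not spelled out in this document; a minimal sketch consistent with the foreign key in Section 3.3.1 (columns beyond stream_id and status are assumptions):

CREATE TABLE reindex_stream_state (
    stream_id VARCHAR(255) PRIMARY KEY,   -- time-ordered UUIDv7
    tenant_id VARCHAR(255) NOT NULL,
    status VARCHAR(50) NOT NULL,          -- INITIALIZING, STREAMING, RECONCILING, SWITCHING, COMPLETED
    last_cursor VARCHAR(255),             -- latest cursor acknowledged from the streaming endpoint
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);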
3.2. Workflow: Incremental Updates (Kafka Events)
This workflow completely replaces the current indexInstancesById flow for child updates.
Event Reception: KafkaMessageListener receives an event for an item, holding, or instance.
Reindex State Check: The system checks if there is an active reindex stream for the tenant.
If no active reindex: Proceed with normal indexing to the current live index.
If an active reindex exists: Update only the old, currently live index (to ensure users see real-time changes) and log the resource ID in the reindex_deltas table for later reconciliation.
Direct Indexing: The event handler converts the payload into the appropriate flat document structure.
Operational Update: The document is indexed directly into the appropriate Operational Index. This is a single, fast operation. For an item status update, only the item document is touched.
Automatic Propagation: The OpenSearch Index Transform will automatically pick up this change on its next run and update the corresponding nested document in the Search View Index. The typical delay for this propagation to search results will be 30-60 seconds.
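For example, an item status change reduces to one document write against the Operational Index; the document-ID scheme and field values here are assumptions:

PUT folio_instance_diku_operational_{stream_id}/_doc/item-8f3c1d2e-0000-7000-8000-000000000001
{
  "document_type": "item",
  "itemId": "8f3c1d2e-0000-7000-8000-000000000001",
  "holdingsId": "4b9a7c6d-0000-7000-8000-000000000002",
  "instanceId": "2a1f5e8b-0000-7000-8000-000000000003",
  "tenant_id": "diku",
  "status": "Checked out"
}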
3.3. Workflow: Delta Reconciliation
This critical workflow ensures data consistency during reindexing by handling the race condition where records are updated while streaming is in progress.
3.3.1. Database Schema: reindex_deltas Table
CREATE TABLE reindex_deltas (
id BIGSERIAL PRIMARY KEY,
stream_id VARCHAR(255) NOT NULL REFERENCES reindex_stream_state(stream_id) ON DELETE CASCADE,
resource_type VARCHAR(50) NOT NULL, -- 'instance', 'holdings', 'item', etc
resource_id VARCHAR(255) NOT NULL,
tenant_id VARCHAR(255) NOT NULL,
created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX idx_reindex_deltas_stream_id ON reindex_deltas(stream_id);
CREATE INDEX idx_reindex_deltas_lookup ON reindex_deltas(stream_id, resource_type);

3.3.2. Change Logging During Active Reindex
Kafka Event Processing: When KafkaMessageListener processes events for a tenant with an active reindex:
Extract resource IDs and determine the resource type from the events:
Instance events → resource_type: 'instance', resource_id: instanceId
Holdings events → resource_type: 'holdings', resource_id: holdingsId
Item events → resource_type: 'item', resource_id: itemId
Create ReindexDelta objects with the stream_id, resource_type, resource_id, and tenant_id.
Bulk-save these objects to the reindex_deltas table using ReindexDeltaRepository.saveAll().
Continue processing the event normally (updating the live index only).
3.3.3. Reconciliation Phase Execution
Trigger: After the streaming phase completes, StreamingReindexService updates the status to RECONCILING and invokes ReconciliationService.reconcileDeltas(activeStream).
Fetch Changed Resources: Query the reindex_deltas table to get the list of unique combinations of resource_type and resource_id that changed during the reindex window (see the query sketch after this list).
Process in Batches: If changes exist, partition the resources into manageable batches by type (e.g., 100 instances, 100 holdings, 100 items per batch).
Re-fetch Latest Data: For each batch:
Call mod-inventory-storage to fetch the absolute latest versions of these resources.
Convert the fresh records into the flat operational document format.
Bulk-index these documents into the new ..._operational_{stream_id} index, overwriting any stale versions.
Completion: After all batches are processed, the reconciliation phase is complete and the workflow proceeds to the Transform Build phase.
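The delta lookup is a simple query against the schema in 3.3.1; a sketch:

-- Unique resources modified while the stream was running
SELECT DISTINCT resource_type, resource_id
FROM reindex_deltas
WHERE stream_id = :stream_id
ORDER BY resource_type, resource_id;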
3.4. Workflow: Consortium Reindexing
Consortium reindexing extends the standard reindexing workflow to handle multi-tenant consortia where a central tenant maintains shared indices for all member tenants. This workflow can only be initiated by the central tenant.
3.4.1. Consortium Architecture Overview
In a FOLIO consortium:
Central Tenant: The primary tenant that maintains all physical indices (both Operational and Search View). Owns the database schemas, indices, and transforms.
Member Tenants: Secondary tenants that share the central tenant's indices but have their own independent data. Do not have their own indices or database schemas. Only have Kafka topics and system users for event processing.
Shared vs. Tenant-Specific Data: Resources in the Operational Index include a tenant_id field to distinguish ownership. Resources from the central tenant are additionally marked with shared: true to indicate consortium-wide visibility.
3.4.2. Tenant Identification and Validation
Tenant Type Discovery: When a reindex is initiated, the system determines the tenant's role:
Queries the user-tenants API to retrieve consortium membership information
Extracts centralTenantId to determine if the current tenant is central or a member
A tenant is considered "central" only if its ID matches the centralTenantId returned by the API
Authorization Check: Before initiating a full reindex:
The system validates that the requesting tenant is the central tenant
If a member tenant attempts to initiate reindex, the request is rejected with an error indicating member tenants are not allowed to trigger reindexing
Only the central tenant has permission to trigger consortium-wide reindexing
3.4.3. Consortium Reindex Workflow
When the central tenant initiates a full reindex, the workflow includes both central and member tenant data:
Initiation: The central tenant calls POST /search/index/instance-records/reindex/full
State Creation: A single reindex_stream_state record is created with a new stream_id for the entire consortium.
Index Preparation:
New versioned indices are created: {env}_instance_{centralTenant}_operational_{stream_id} and {env}_instance_{centralTenant}_view_{stream_id}
A new continuous transform is created for the central tenant index pair, initially paused
Important: No indices are created for member tenants. All consortium data resides in the central tenant's indices.
Multi-Tenant Streaming:
Central Tenant Stream: The streaming service fetches all instances from the central tenant's inventory storage:
Documents are marked with tenant_id: {centralTenant} and shared: true
Indexed into ..._operational_{stream_id}
Member Tenant Streams: For each member tenant:
The system switches execution context to the member tenant using system-user credentials
The streaming service fetches instances from that member tenant's inventory storage
Documents are marked with tenant_id: {memberTenant} and shared: false
Indexed into the same central tenant operational index: {env}_instance_{centralTenant}_operational_{stream_id}
The cursor position for each tenant stream is tracked independently in the reindex_stream_state table or a related tenant-specific tracking structure
Consortium-Wide Delta Reconciliation:
During the streaming phase, any Kafka events from any tenant (central or member) that modify resources are logged in the reindex_deltas table with their respective tenant_id
After all tenant streams complete, the reconciliation phase re-fetches the latest state of changed resources from each tenant's inventory and updates the new operational index
This ensures consistency across all tenants in the consortium
Transform Build:
A one-time, full transform is triggered to populate the ..._view_{stream_id} index
The transform aggregates data from the operational index, grouping by instanceId and preserving the tenant_id and shared fields in the nested output
The resulting Search View Index contains instances from all consortium tenants, properly tagged for filtering
Alias Switch:
The main search alias (e.g., folio_instance_{centralTenant}) is atomically switched to point to the new ..._view_{stream_id}
Search queries from any consortium tenant now target this unified index
Consortium search logic applies tenant-specific filters to show appropriate results based on user affiliation
Cleanup:
The old operational and view indices are removed after a retention period
The new continuous transform is activated to handle incremental updates from all consortium tenants
The old continuous transform is stopped
3.4.4. Consortium Incremental Updates
Kafka events from member tenants are processed identically to central tenant events:
Event Reception: The event listener receives an event from any consortium tenant (central or member)
Tenant Context: The event includes the originating tenant_id in the Kafka headers or payload
Reindex State Check:
For member tenants, the system checks whether the central tenant has an active reindex stream
Updates go to the central tenant's current live Operational Index
Direct Indexing: The event is converted to a flat document with tenant_id and shared fields set appropriately
Operational Update: The document is indexed into the central tenant's ..._operational_{current} index
Automatic Propagation: The central tenant's transform picks up the change and updates the Search View Index within 30-60 seconds
3.4.5. Consortium-Specific Considerations
Search Filtering: The search service ensures that queries automatically filter results based on the following (a filter sketch follows this list):
User's home tenant and affiliated tenants
Central tenant resources marked with shared: true are visible to all consortium members
Member tenant resources are only visible to that specific tenant unless explicitly shared
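A minimal sketch of such a filter using the tenant_id and shared fields from 3.4.1 (the exact clause the search service appends is an implementation detail; the affiliated tenant IDs are placeholders):

{
  "bool": {
    "should": [
      { "term": { "shared": true } },
      { "terms": { "tenant_id": ["member_tenant_a", "member_tenant_b"] } }
    ],
    "minimum_should_match": 1
  }
}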
Transform Configuration: The transform aggregation for consortium indices must handle the multi-tenant nature:
Group by instanceId (IDs may be duplicated across tenants, but combined with tenant_id they are unique)
Preserve tenant_id and shared fields in the output documents
Apply size limits that account for consortium scale (e.g., if 5 member tenants each have 2,000 holdings on a shared instance, the limit must accommodate 10,000+ holdings per instance)
Failure Isolation: If a member tenant's stream fails during reindexing:
The failure is logged but does not abort the entire consortium reindex
The central and other member tenant streams continue
A retry mechanism or manual intervention can re-stream the failed tenant's data
Database Schema Isolation: Member tenants do not have their own database schemas for reindex tracking tables.
4. Critical Considerations & Mitigation
4.1. Transform Aggregation Size Limits
Risk: The OpenSearch terms aggregation has a default size limit (e.g., 10,000). If an instance has more items or holdings than this limit, the transform will silently truncate the data, leading to incomplete records in the Search View Index.
Mitigation:
Increase Limits: The size can be increased, but this has memory implications for the OpenSearch cluster.
Monitoring: We must implement proactive monitoring to query the Operational Index and alert when any instance approaches the configured limits (e.g., at 80% capacity); a query sketch follows this list.
Validation: A daily validation job will compare counts between the operational and view indices to detect discrepancies.
Policy: For extreme edge cases (e.g., a journal with 50,000+ items), a policy of splitting the bibliographic record may be required.
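One way to implement the monitoring check is an aggregation over the Operational Index that surfaces the instances with the most children; a sketch (index name and the alerting threshold are assumptions):

POST folio_instance_diku_operational_{stream_id}/_search
{
  "size": 0,
  "query": { "term": { "document_type": "item" } },
  "aggs": {
    "items_per_instance": {
      "terms": { "field": "instanceId", "size": 10, "order": { "_count": "desc" } }
    }
  }
}

An alerting job would compare the top bucket counts against 80% of the configured transform size limit.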
4.2. Deletion Propagation Lag
Trade-off: When an item or holding is deleted, it is removed from the Operational Index instantly. However, it will remain visible in the Search View Index (and thus search results) until the next transform run removes it (a 30-60 second delay).
Mitigation:
Transform Tuning: The transform's delay can be tuned to balance between performance and propagation speed.
4.3. Reindex Data Consistency (Race Conditions)
Risk: During multi-hour reindexing operations, records that are streamed early can be subsequently updated via Kafka events. Without intervention, the new index would contain stale data while the live updates only reach the old index, creating a race condition where the new index becomes inconsistent upon activation.
Impact: When the alias switches to the new index, users would see outdated information for records that were modified during the reindex window, potentially causing operational confusion and data integrity issues.
Mitigation - Delta Reconciliation:
Change Tracking: All resource modifications during active reindexing are logged in the reindex_deltas table.
Reconciliation Phase: Before the new index goes live, the system re-fetches the latest state of all changed resources from the authoritative source.
Consistency Guarantee: The new index is updated with fresh data, ensuring 100% consistency when it becomes active.
Automated Cleanup: The delta tracking data is automatically purged when the reindex operation completes via ON DELETE CASCADE constraints.
4.4. Live Search Continuity During Reindexing
Design Principle: Users must continue to see up-to-date search results throughout the entire reindexing process, which can take several hours for large tenants.
Transform Management Strategy: