[TRILIUM] ECS-member tenant reindex
- 1 Overview
- 2 Summary
- 3 Baseline results
- 3.1 Reindex status
- 3.2 DB metrics
- 3.3 Services metrics
- 3.4 Open Search metrics
- 4 Full reindex (ECS reindex branch)
- 4.1 Reindex status
- 4.2 DB metrics
- 4.3 Services metrics
- 4.4 Open Search metrics
- 5 Member tenant reindex
- 5.1 Reindex status
- 5.2 DB metrics
- 5.3 Services metrics
- 5.4 Open Search metrics
- 6 Appendix
Overview
PERF-1236: [ECS] Reindex for mod-search (ECS member reindex), status: In Review
The purpose of this report is to present the performance results of the ECS member tenant reindex. This testing has the following goals:
Test, establish, and describe a baseline for full reindex results using a master branch snapshot.
Run a full reindex with the ECS member tenant reindex branch and compare it with the baseline (acceptance criterion: results comparable with the baseline).
Summary
Baseline
The baseline test finished successfully in 3 hr 49 min (rerun durations may vary between 3.5 and 4.5 hours due to the complexity of the process).
Merge phase finished in 2 hr 14 min.
All ranges merged successfully. NOTABLE OBSERVATION: the duration of the instance merge cannot be determined, because once the merge finishes, the INSTANCE row starts tracking the upload and overwrites the merge timestamps (see the table below).
Upload phase finished in 1 hr 35 min.
Two upload ranges ('subject' and 'contributor') got a failed status due to timeouts.
NOTABLE OBSERVATION: if any range fails to upload, the entity status changes to UPLOAD_FAILED and never changes back, even if the retry succeeds.
Resource usage:
DB CPU spikes up to 40% during the merge phase (scaling the DB down to 2XL may be considered).
Only two main modules are involved in the process: mod-search and mod-inventory-storage.
mod-search max CPU 360% (during upload phase)
mod-inventory-storage max CPU 110%
Full reindex (ECS reindex branch)
Full reindex with the member tenant reindex branch finished successfully (tested 2 times) with comparable performance. Full duration: 3 hr 35 min.
Merge phase finished in 2 hr 15 min.
Upload phase finished in 1 hr 20 min.
For the most part, resource usage looks the same, except for CPU usage on mod-search, which grew due to the latest released fixes.
Member tenant reindex
The member tenant reindex (tenant 001, with 6.5M instances) completed successfully.
Total duration: 1 hour 25 minutes 18 seconds
Staging phase (parallel): ~17 minutes
Merge phase: 37 minutes
Upload phase: ~30 minutes
Resource usage follows the same patterns as in the full reindex.
Baseline results
Reindex status
To track the reindex status, use the following query:
SELECT entity_type, status, total_merge_ranges, processed_merge_ranges, total_upload_ranges, processed_upload_ranges, start_time_merge, end_time_merge, start_time_upload, end_time_upload
FROM [tenant]_mod_search.reindex_status;
Merge phase finished in 2 hr 14 min.
Upload phase finished in 1 hr 35 min.
Total duration: 3 hr 49 min.
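As a cross-check, the phase durations above can be reproduced from the reindex_status timestamps shown in the table below. This is a minimal Python sketch, assuming that the merge phase spans from the earliest start_time_merge to the latest end_time_merge, and the upload phase likewise for the upload columns. The timestamps are copied by hand and only have minute precision, so computed values may differ from the real ones by about a minute. BASELINE_ROWS, span_minutes and the other names are illustrative, not part of mod-search.

```python
from datetime import datetime
from typing import List

FMT = "%m/%d/%y %H:%M"  # timestamps as displayed in the table, e.g. "4/6/26 14:23"

# (entity_type, start_time_merge, end_time_merge, start_time_upload, end_time_upload)
# Values copied from the baseline reindex_status output; None means an empty cell.
BASELINE_ROWS = [
    ("ITEM", "4/6/26 14:23", "4/6/26 16:23", None, None),
    ("HOLDINGS", "4/6/26 14:23", "4/6/26 16:37", None, None),
    ("SUBJECT", None, None, "4/6/26 16:37", "4/6/26 16:43"),
    ("CONTRIBUTOR", None, None, "4/6/26 16:37", "4/6/26 16:49"),
    ("CLASSIFICATION", None, None, "4/6/26 16:37", "4/6/26 17:50"),
    ("CALL_NUMBER", None, None, "4/6/26 16:37", "4/6/26 17:54"),
    ("INSTANCE", None, None, "4/6/26 16:37", "4/6/26 18:12"),
]


def parse(value: str) -> datetime:
    return datetime.strptime(value, FMT)


def column(rows, idx: int) -> List[str]:
    """Non-empty values of one timestamp column."""
    return [r[idx] for r in rows if r[idx] is not None]


def span_minutes(starts: List[str], ends: List[str]) -> int:
    """Whole minutes from the earliest start to the latest end."""
    delta = max(map(parse, ends)) - min(map(parse, starts))
    return int(delta.total_seconds() // 60)


def as_hr_min(minutes: int) -> str:
    return f"{minutes // 60} hr {minutes % 60} min"


# Only ITEM and HOLDINGS keep merge timestamps (the INSTANCE row's merge values
# are overwritten by the upload), so the merge span is derived from those rows.
merge = span_minutes(column(BASELINE_ROWS, 1), column(BASELINE_ROWS, 2))
upload = span_minutes(column(BASELINE_ROWS, 3), column(BASELINE_ROWS, 4))
total = span_minutes(column(BASELINE_ROWS, 1), column(BASELINE_ROWS, 4))

print(as_hr_min(merge), "|", as_hr_min(upload), "|", as_hr_min(total))
```

With these rows the sketch yields 2 hr 14 min, 1 hr 35 min and 3 hr 49 min, matching the durations reported above.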
Entity type | Status | Total merge ranges | Processed merge ranges | Total upload ranges | Processed upload ranges | Start time merge | End time merge | Start time upload | End time upload |
ITEM | MERGE_COMPLETED | 57871 | 57871 | 0 | 0 | 4/6/26 14:23 | 4/6/26 16:23 | | |
HOLDINGS | MERGE_COMPLETED | 46275 | 46275 | 0 | 0 | 4/6/26 14:23 | 4/6/26 16:37 | | |
SUBJECT | UPLOAD_FAILED* | 0 | 0 | 65536 | 65536 | | | 4/6/26 16:37 | 4/6/26 16:43 |
CONTRIBUTOR | UPLOAD_FAILED* | 0 | 0 | 65536 | 65536 | | | 4/6/26 16:37 | 4/6/26 16:49 |
CLASSIFICATION | UPLOAD_COMPLETED | 0 | 0 | 65536 | 65536 | | | 4/6/26 16:37 | 4/6/26 17:50 |
CALL_NUMBER | UPLOAD_COMPLETED | 0 | 0 | 65536 | 65536 | | | 4/6/26 16:37 | 4/6/26 17:54 |
INSTANCE | UPLOAD_COMPLETED | 0 | 0 | 20827 | 20827 | | | 4/6/26 16:37 | 4/6/26 18:12 |
*NOTE: although the 'subject' and 'contributor' uploads are marked as failed, their total and processed upload range counts are equal. This means the failed ranges were retried and all ranges were uploaded successfully.
To verify that no ranges failed, run:
SELECT id, entity_type, lower, upper, created_at, finished_at, status, fail_cause
FROM [tenant]_mod_search.upload_range WHERE status != 'SUCCESS';
The query result (screenshot above) showed no failed ranges.
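Because the UPLOAD_FAILED status does not reset after a successful retry, a monitoring script would need to combine both queries above to decide whether an upload actually finished. The following is a hypothetical Python sketch of that rule: it assumes the caller has already fetched one reindex_status row and counted the upload_range rows with status != 'SUCCESS' for the same entity type. ReindexStatusRow and upload_effectively_complete are illustrative names, not mod-search APIs.

```python
from dataclasses import dataclass


@dataclass
class ReindexStatusRow:
    """Subset of the columns returned by the reindex_status query."""
    entity_type: str
    status: str
    total_upload_ranges: int
    processed_upload_ranges: int


UPLOAD_STATUSES = {"UPLOAD_COMPLETED", "UPLOAD_FAILED"}


def upload_effectively_complete(row: ReindexStatusRow, non_success_ranges: int) -> bool:
    """Hypothetical check: the upload counts as complete when every range was
    processed and the upload_range query returned no non-SUCCESS ranges for this
    entity type, even if the (sticky) status still says UPLOAD_FAILED."""
    if row.status not in UPLOAD_STATUSES:
        return False  # merge-only rows such as ITEM/HOLDINGS have no upload phase
    return (
        row.total_upload_ranges > 0
        and row.processed_upload_ranges == row.total_upload_ranges
        and non_success_ranges == 0
    )


subject = ReindexStatusRow("SUBJECT", "UPLOAD_FAILED", 65536, 65536)
print(upload_effectively_complete(subject, non_success_ranges=0))  # True
```

The count of non-SUCCESS ranges is passed in rather than queried, so the sketch stays independent of any particular database driver.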
DB metrics
DB CPU is stable and peaked at 40% during the merge phase, so scaling the DB down to 2XL may be considered.
Database load is predictable and shows no anomalies.
Locks visible on the screenshot occurred on the reindex_status table because of constant monitoring while mod-search was updating statuses; they did not affect performance or the process itself.
No DB deadlocks or other anomalies were observed. With previous versions, deadlocks occurred on the subject table during reindex. They were tolerated because they did not break the process itself (when a deadlock occurs, mod-search falls back to processing records one by one), but they severely degraded the performance of the instance merge.
Services metrics
mod-search
Snapshot #401 (build date: Apr 5) from the deadlocks-improvements branch.
mod-inventory-storage
Open Search metrics
Other charts showed nothing significant and are not included in this report.
Indexing rate
CPU utilization (data nodes)
Full reindex (ECS reindex branch)
Reindex status
Full reindex completed successfully without failed merge ranges or upload ranges.
Merge phase completed in 2 hr 15 min
Upload phase completed in 1 hr 20 min
Performance is comparable with the baseline: 3 hr 49 min in the baseline test vs 3 hr 35 min with the ECS member tenant reindex branch.
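The acceptance criterion ("comparable with baseline") can be made explicit by computing the relative change per phase. This is a minimal Python sketch using the durations reported in this document; the 10% maximum-slowdown threshold is a hypothetical choice for illustration, not a criterion defined in PERF-1236.

```python
# Durations in minutes, as reported in this document.
BASELINE = {"merge": 2 * 60 + 14, "upload": 1 * 60 + 35, "total": 3 * 60 + 49}
ECS_BRANCH = {"merge": 2 * 60 + 15, "upload": 1 * 60 + 20, "total": 3 * 60 + 35}

# Hypothetical acceptance rule: the branch may be at most 10% slower than baseline.
# Being faster than baseline is always acceptable.
MAX_SLOWDOWN = 0.10


def relative_change(baseline: int, candidate: int) -> float:
    """Positive = slower than baseline, negative = faster."""
    return (candidate - baseline) / baseline


changes = {phase: relative_change(BASELINE[phase], ECS_BRANCH[phase]) for phase in BASELINE}
acceptable = all(change <= MAX_SLOWDOWN for change in changes.values())

for phase, change in changes.items():
    print(f"{phase:>6}: {change:+.1%}")
print("comparable:", acceptable)
```

The merge phase is practically unchanged (1 minute longer), while the shorter upload phase accounts for the 14-minute overall difference.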
Entity type | Status | Total merge ranges | Processed merge ranges | Total upload ranges | Processed upload ranges | Start time merge | End time merge | Start time upload | End time upload |
ITEM | MERGE_COMPLETED | 57871 | 57871 | 0 | 0 | 4/8/26 9:21 | 4/8/26 11:24 | | |
HOLDINGS | MERGE_COMPLETED | 46275 | 46275 | 0 | 0 | 4/8/26 9:21 | 4/8/26 11:37 | | |
SUBJECT | UPLOAD_COMPLETED | 0 | 0 | 4096 | 4096 | | | 4/8/26 11:37 | 4/8/26 12:25 |
CLASSIFICATION | UPLOAD_COMPLETED | 0 | 0 | 4096 | 4096 | | | 4/8/26 11:37 | 4/8/26 12:27 |
CONTRIBUTOR | UPLOAD_COMPLETED | 0 | 0 | 4096 | 4096 | | | 4/8/26 11:37 | 4/8/26 12:27 |
CALL_NUMBER | UPLOAD_COMPLETED | 0 | 0 | 4096 | 4096 | | | 4/8/26 11:37 | 4/8/26 12:29 |
INSTANCE | UPLOAD_COMPLETED | 0 | 0 | 20827 | 20827 | | | 4/8/26 11:38 | 4/8/26 12:52 |
DB metrics
The locks visible on the screenshot below occurred due to constant monitoring of the reindex status.
Services metrics
Open Search metrics
indexing rate
CPU utilization
Member tenant reindex
Reindex status
The member tenant reindex completed successfully. No issues were found during the test.
Total duration: 1 hour 25 minutes 18 seconds
Staging phase (parallel): ~17 minutes
Merge phase: 37 minutes
Upload phase: ~30 minutes
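Assuming the three phases run one after another, their reported durations should add up to roughly the total duration. This minimal Python sketch checks that arithmetic; the staging and upload values are approximate ("~") in the report, so a gap of a minute or two is expected rounding rather than unaccounted work. The variable names are illustrative.

```python
# Reported member tenant (001) phase durations in minutes; staging and upload are approximate.
PHASES_MIN = {"staging": 17, "merge": 37, "upload": 30}
REPORTED_TOTAL_S = 1 * 3600 + 25 * 60 + 18  # 1 hour 25 minutes 18 seconds

phase_sum_s = sum(PHASES_MIN.values()) * 60
unaccounted_s = REPORTED_TOTAL_S - phase_sum_s

for phase, minutes in PHASES_MIN.items():
    share = minutes * 60 / REPORTED_TOTAL_S
    print(f"{phase:>7}: {minutes} min ({share:.0%} of total)")
print(f"sum of phases: {phase_sum_s // 60} min, unaccounted: {unaccounted_s} s")
```

The phases sum to 84 minutes against a total of about 85.3 minutes, consistent with sequential phases given the rounded inputs.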
Entity type | Status | Total merge ranges | Processed merge ranges | Total upload ranges | Processed upload ranges | Start time merge | End time merge | Start time upload | End time upload | Tenant ID | Start time staging | End time staging |
CONTRIBUTOR | UPLOAD_COMPLETED | 0 | 0 | 4096 | 4096 | | | 4/10/26 22:51 | 4/10/26 22:53 | cs00000int_0001 | | |
ITEM | STAGING_COMPLETED | 19927 | 19927 | 0 | 0 | 4/10/26 21:57 | 4/10/26 22:11 | | | cs00000int_0001 | 4/10/26 22:14 | 4/10/26 22:51 |
HOLDINGS | STAGING_COMPLETED | 14560 | 14561 | 0 |