Problem statement
Issue: - MSEARCH-899Getting issue details... STATUS
Description: With the introduction of the improvements to the Reindex the procedure was split into two different phases. The merge phase is aimed at preprocessing all the documents from the inventory and storing them in the Postgres database. The second phase is an “upload” phase and aims to load preprocessed search documents to the OpenSearch database. This approach allowed us to run only the second phase when there were no major changes to inventory.
However, the ongoing indexing creates the issue mentioned in the ticket because the “merge” and “upload” phases should happen sequentially for each document and the aggregate query should be run on each insert/delete operation. That creates a heavy load on the Postgres database because of a large number of simultaneous write/read operations to the same tables.
Solution options
Option | Description | Pros & Cons | Decision | |
---|---|---|---|---|
1 | Query optimization | Creation of additional indexes and/or reorganization of the query | Small effort | DECLINED due to low to none impact on the performance |
2 | Extracting upload into a separate thread | The extracting upload operation will allow reducing the amount of the same or similar aggregate queries on each insert/delete operation | Pros:
Cons:
| IN PROGRESS |
Extracting upload into a separate thread
The approach is as follows (using contributors as an example):
Add
last_updated_date
to thecontributors
tableCreate the
sub_resources_locks
table(id, entity_type, locked_flag, last_updated_date
) and fill it for all types of subresourcesCreate a job that every minute gets the list of tenants
select t.tenantjson -> 'descriptor' ->> 'id' from public.tenants t where t.tenantjson -> 'descriptor' ->> 'id' <> 'supertenant';
and for each of the tenants:Tries to
update sub_resources_locks set locked_flag=true where entity_type='contributor' and locked_flag = false returning id, last_updated_date
if the
id
has returned, then fetch the aggregate query of contributors withwhere last_updated_date >=? order by last_updated_date limit 100
clause. The value of 100 is the amount of records that could be processed in a minute, even if not, the lock will not allow it to run in the next minute.Load aggregates to OpenSeach
Run
update sub_resources_locks set locked_flag=false, last_updated_date=:timestamp where entity_type='contributor' and locked_flag = true
, where:timestamp
is the latest timestamp among the records that were fetched in step (b)
NB! In ECS mode the job only should run for the central tenant