Page Comparison

...

Option

Description

Pros & Cons

Decision

1

Query optimization

Creation of additional indexes and/or reorganization of the query

Small effort

Status

colour	Red
title	declined

due to low to none impact on the performance

2

Extracting upload into a separate thread

The extracting upload operation will allow reducing the amount of the same or similar aggregate queries on each insert/delete operation

Pros:

Medium effort

Cons:

Eventual consistency for sub-resources (subjects, contributors, classifications

Status

colour	Blue
title	in progress

3

Aggregation of counts on update/insert

To address a large amount of persisted records it is proposed to persist not relations like contributor-instances, but rather just counts related to each sub-resource. The approach can be applied to improve #2 as they do not contradict

Pros:

Less data to store
Less load on Postgres CPU

Cons:

Same as option 2

Status

title	draft

Extracting upload into a separate thread

...

NB! In ECS mode the job only should run for the central tenant

Query to check if there are any unprocessed contributors:

Code Block

language	sql

SELECT * FROM contributor

WHERE last_updated_date > (SELECT last_updated_date

FROM sub_resources_lock WHERE entity_type='contributor')

ORDER BY last_updated_date DESC

Aggregation of counts on update/insert

The approach consists of the following steps (using contributors as an example):

Create table contribututor_counts with the following structure:

Column	Type	PK
contributor_id	varchar(40)	yes
tenant_id	varchar(100)	yes
type_id	varchar(40)	yes
count	bigint	no

On the create/update/delete operations on an instance, the following actions should be applied:

Operations	Sub-resources	Insert query	Delete query
instance inserted	list to insert	`insert into contributor_counts... on conflict update count=count+1`	N/A
instance updated	list to insert + list to delete	`insert into contributor_counts... on conflict update count=count+1`	`update contributor_counts set count=count-1 where contributor_id =? and tenant_id = ? And type_id =? And count >0`
instance deleted	list to delete	N/A	`update contributor_counts set count=count-1 where contributor_id =? and tenant_id = ? And type_id =? And count >0`

The aggregation query:

Code Block

language	sql

SELECT
        c.id,
        c.name,
        c.name_type_id,
        c.authority_id,
        json_agg(
            json_build_object(
                'count', sub.instance_count,
                'typeId', sub.type_ids,
                'shared', false,
                'tenantId', sub.tenant_id
            )
        ) AS instances
    FROM
      (
        SELECT
            ins.contributor_id,
            ins.tenant_id,
            sum(ins.count) as instance_count,
            array_agg(DISTINCT ins.type_id) FILTER (WHERE ins.type_id <> '') as type_ids
        FROM  contributor_count ins
        WHERE
          ins.contributor_id in (?, ?)
        GROUP BY
            ins.contributor_id,
            ins.tenant_id
        ) sub
    JOIN
        contributor c ON c.id = sub.contributor_id
    WHERE
        c.id in (?, ?)
    GROUP BY
        c.id;

POC: https://github.com/folio-org/mod-search/compare/master...poc/runtime-index-kt

Versions Compared

Old Version 5

New Version Current

Key

Extracting upload into a separate thread

Aggregation of counts on update/insert