Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Option

Description

Pros & Cons

Decision

1

Query optimization

Creation of additional indexes and/or reorganization of the query

Small effort

Status
colourRed
titledeclined
due to low to none impact on the performance

2

Extracting upload into a separate thread

The extracting upload operation will allow reducing the amount of the same or similar aggregate queries on each insert/delete operation

Pros:

  • Medium effort

Cons:

  • Eventual consistency for sub-resources (subjects, contributors, classifications

Status
colourBlue
titlein progress

3

Aggregation of counts on update/insert

To address a large amount of persisted records it is proposed to persist not relations like contributor-instances, but rather just counts related to each sub-resource. The approach can be applied to improve #2 as they do not contradict

Pros:

  • Less data to store

  • Less load on Postgres CPU

Cons:

  • Same as option 2

Status
titledraft

Extracting upload into a separate thread

...

NB! In ECS mode the job only should run for the central tenant

Aggregation of counts on update/insert

The approach consists of the following steps (using contributors as an example):

  1. Create table contribututor_counts with the following structure:

Column

Type

PK

contributor_id

varchar(40)

yes

tenant_id

varchar(100)

yes

type_id

varchar(40)

yes

count

bigint

no

  1. On the create/update/delete operations on an instance, the following actions should be applied:

Operations

Sub-resources

Insert query

Delete query

instance inserted

list to insert

insert into contributor_counts... on conflict update count=count+1

N/A

instance updated

list to insert + list to delete

insert into contributor_counts... on conflict update count=count+1

update contributor_counts set count=count-1 where contributor_id =? and tenant_id = ? And type_id =? And count >0

instance deleted

list to delete

N/A

update contributor_counts set count=count-1 where contributor_id =? and tenant_id = ? And type_id =? And count >0

  1. The aggregation query:

Code Block
languagesql
SELECT
        c.id,
        c.name,
        c.name_type_id,
        c.authority_id,
        json_agg(
            json_build_object(
                'count', sub.instance_count,
                'typeId', sub.type_ids,
                'shared', false,
                'tenantId', sub.tenant_id
            )
        ) AS instances
    FROM
      (
        SELECT
            ins.contributor_id,
            ins.tenant_id,
            sum(ins.count) as instance_count,
            array_agg(DISTINCT ins.type_id) FILTER (WHERE ins.type_id <> '') as type_ids
        FROM  contributor_count ins
        WHERE
          ins.contributor_id in (?, ?)
        GROUP BY
            ins.contributor_id,
            ins.tenant_id
        ) sub
    JOIN
        contributor c ON c.id = sub.contributor_id
    WHERE
        c.id in (?, ?)
    GROUP BY
        c.id;