/
Item/instance database performance suggestions

Item/instance database performance suggestions

High level suggestions

Partitioning

A table can be partitioned along one or more columns with a variety of strategies. A good rule of thumb is to target approximately 1-2M rows per partition. Ideally there will be roughly an even distribution between tables. For evenly partitioning a table with randomly distributed keys, the ideal strategy is to use hash partitioning.

Hash partitioning is implemented in terms of a modulus and a remainder. A simple scalable strategy for managing partitions is to base the modulus on a power of 2 and create a partition for each value less than the modulus.
This allows for a reasonable scaling potentially up to billions of rows and simplifies the process of scaling the number of partitions up or down. Ideally, there would be an independent mechanism for configuring the exponent for each tenant and possibly each table.

For example, the trivial case is if the exponent of the modulus for an instance table is 0, it will have a modulus of 1, which means there will be a single partition with a remainder of 0

Table creation

The DDL for the `instance` table will be consistent regardless of partition count and might look like:

create table instance
(
    id uuid not null primary key,
    jsonb jsonb not null,
    creation_date timestamp,
    created_by text,
    instancestatusid uuid
        constraint instancestatusid_instance_status_fkey
        references instance_status,
    modeofissuanceid uuid
        constraint modeofissuanceid_mode_of_issuance_fkey
        references mode_of_issuance,
    instancetypeid uuid
        constraint instancetypeid_instance_type_fkey
        references instance_type
) partition by hash(id);

As a convention the name of the partition might be of the form

<parent_table_name>_<modulus>_<remainder.

For the trivial case of modulus 0, the partition would look like

create table instance_1_0 partition of instance for values with (modulus 1, remainder 0);


In the sample dataset, there are a number of tables that would benefit from partitioning. For example

  • item  has 23901856  rows
  • instance  has 22618027 rows
  • holdings_record has 23901856 rows

Based on a target of 1-2M rows per partition table and a power of 2 partitioning strategy, the ideal number of partitions for each table would be based on an exponent of 4, yielding 16 partitions. This would result in ~1.4M rows for each partition table for all three tables.

Presumably, these 3 tables are joined together frequently, so it makes sense to partition along the same value in order to allow the planner to pick the correct partitions on common joins.  Unfortunately, there is not a common value between these three tables.