Item/instance database performance suggestions
High level suggestions
Partitioning
A table can be partitioned along one or more columns with a variety of strategies. A good rule of thumb is to target approximately 1-2M rows per partition. Ideally there will be roughly an even distribution between tables. For evenly partitioning a table with randomly distributed keys, the ideal strategy is to use hash partitioning.
Hash partitioning is implemented in terms of a modulus and a remainder. A simple scalable strategy for managing partitions is to base the modulus on a power of 2 and create a partition for each value less than the modulus.
This allows for a reasonable scaling potentially up to billions of rows and simplifies the process of scaling the number of partitions up or down. Ideally, there would be an independent mechanism for configuring the exponent for each tenant and possibly each table.
For example, the trivial case is if the exponent of the modulus for an instance table is 0, it will have a modulus of 1, which means there will be a single partition with a remainder of 0
Table creation
The DDL for the `instance` table will be consistent regardless of partition count and might look like:
create table instance ( id uuid not null primary key, jsonb jsonb not null, creation_date timestamp, created_by text, instancestatusid uuid constraint instancestatusid_instance_status_fkey references instance_status, modeofissuanceid uuid constraint modeofissuanceid_mode_of_issuance_fkey references mode_of_issuance, instancetypeid uuid constraint instancetypeid_instance_type_fkey references instance_type ) partition by hash(id);
As a convention the name of the partition might be of the form
<parent_table_name>_<modulus>_<remainder.
For the trivial case of modulus 0, the partition would look like
create table instance_1_0 partition of instance for values with (modulus 1, remainder 0);
In the sample dataset, there are a number of tables that would benefit from partitioning. For example
- item has 23901856 rows
- instance has 22618027 rows
- holdings_record has 23901856 rows
Based on a target of 1-2M rows per partition table and a power of 2 partitioning strategy, the ideal number of partitions for each table would be based on an exponent of 4, yielding 16 partitions. This would result in ~1.4M rows for each partition table for all three tables.
Presumably, these 3 tables are joined together frequently, so it makes sense to partition along the same value in order to allow the planner to pick the correct partitions on common joins. Unfortunately, there is not a common value between these three tables.