Database backup/restore process

Unified PostgreSQL Database Backups

At present, we have two types of PostgreSQL database backups which can only be restored on the same type of instances it was taken from:

  • RDS Snapshot
  • PostgreSQL psql dump fetched with pg_dumpall

Our objective is to unify this process and store all dumps in a single format that can be restored to both AWS RDS and embedded PostgreSQL. This would simplify our backup and restore processes and make it easier to manage our PostgreSQL databases.

Problems

  1. pg_dumpall could take dumps not more than 100Gi and couldn't dump roles from AWS RDS

  2. If we dump only schemas with pg_dump then during restore we have an issue with schema permissions for RMB modules

  3. We restore the dump with all tenants even if we need only one

Solution

Dump schemas with pg_dump and roles (from rds without password) with pg_dumpall

This approach implies that we separately dump schemas only with pg_dump

pg_dump --host=localhost --port=5432 --username=postgres --verbose --format=directory --jobs=N --compress=K --schema=<tenant>-* --file=<schemas_dump_name>

And dump roles only with pg_dumpall

pg_dumpall --host=localhost --port=5432 --username=postgres --verbose --roles-only --no-role-passwords --file=<roles_dump_name>

Dump each tenant's schemas and role separately

Implementing a logic that enables the dumping of each tenant's schemas and roles is highly recommended. By doing so, the system will be equipped with a highly efficient mechanism that allows for the dumping of a single tenant as opposed to the entire database. Additionally, this functionality will also permit the restoration of not a full dump, but just one tenant, significantly reducing the usage of system resources.

To achieve this, the following logic should be implemented:

  1. Obtain a comprehensive list of tenants from the public.tenants table.

  2. Utilize a regular expression to dump only the schemas pertaining to each tenant. Specifically, schemas should be dumped using the regular expression <tenant_name>_*. This will ensure that only the necessary schemas are dumped while also maintaining data integrity.

  3. Dump shared schemas (public, mod_licenses__system, mod_oa__system, mod_serials_management__system, mod_service_interaction__system, pubsub_config, source_record_storage_config)
  4. Dump all roles associated with each tenant and split them according to the respective tenant's name. By doing so, the restoration process will be much more streamlined and efficient, allowing for greater system efficiency and reliability.

In summary, implementing the logic to dump each tenant's schemas and roles separately is a highly beneficial practice that will enhance the system's functionality and resource utilization. Therefore, it is strongly recommended that this approach be implemented to achieve optimal performance and reliability.

Full dump

Pic. 1 Full dump

Single tenant dump

Pic. 2 Single tenant dump

Store all backups in a single format in S3 Bucket

Using a single naming format for PostgreSQL dumps stored in AWS S3 can improve data management and organization. A consistent naming format can make it easier to find and retrieve specific dumps when needed. It can also facilitate the automation of processes such as backup and restore. Additionally, a uniform naming format can help avoid confusion and errors that may arise from using different naming conventions. It can also enhance compatibility and interoperability between different tools and systems used for working with PostgreSQL dumps. Finally, a single naming format can promote data security and compliance, as it ensures that all dumps are labeled in a consistent and clear manner.

Naming convention:

<backup-name> = $SNAPSHOT_KEY-mm-dd-yyyy where SNAPSHOT_KEY = <cluster-name>-<project-name> for rancher dumps and folio-bugfest-<release-name> for bugfest dumps

Example
folio-dev-nla-01-02-2023
folio-bugfest-nolana-01-02-2023

Folder structure:

<backup-name>-tenants.json

<backup-name>-<tenant1...tenantN>-install.json

<backup-name>-shared-schemas.psql

<backup-name>-<tenant>-schemas.psql

<backup-name>-<tenant>-roles.psql

Use separate flow for backup and restore

Create a shared library for interaction with PostgreSQL cli 

Pic. 3 Shared library structure