Database backup/restore process
Unified PostgreSQL Database Backups
At present, we have two types of PostgreSQL database backups which can only be restored on the same type of instances it was taken from:
- RDS Snapshot
- PostgreSQL psql dump fetched with pg_dumpall
Our objective is to unify this process and store all dumps in a single format that can be restored to both AWS RDS and embedded PostgreSQL. This would simplify our backup and restore processes and make it easier to manage our PostgreSQL databases.
Problems
pg_dumpall could take dumps not more than 100Gi and couldn't dump roles from AWS RDS
If we dump only schemas with pg_dump then during restore we have an issue with schema permissions for RMB modules
We restore the dump with all tenants even if we need only one
Solution
Dump schemas with pg_dump and roles (from rds without password) with pg_dumpall
This approach implies that we separately dump schemas only with pg_dump
pg_dump --host=localhost --port=5432 --username=postgres --verbose --format=directory --jobs=N --compress=K --schema=<tenant>-* --file=<schemas_dump_name>
And dump roles only with pg_dumpall
pg_dumpall --host=localhost --port=5432 --username=postgres --verbose --roles-only --no-role-passwords --file=<roles_dump_name>
Dump each tenant's schemas and role separately
Implementing a logic that enables the dumping of each tenant's schemas and roles is highly recommended. By doing so, the system will be equipped with a highly efficient mechanism that allows for the dumping of a single tenant as opposed to the entire database. Additionally, this functionality will also permit the restoration of not a full dump, but just one tenant, significantly reducing the usage of system resources.
To achieve this, the following logic should be implemented:
Obtain a comprehensive list of tenants from the public.tenants table.
Utilize a regular expression to dump only the schemas pertaining to each tenant. Specifically, schemas should be dumped using the regular expression <tenant_name>_*. This will ensure that only the necessary schemas are dumped while also maintaining data integrity.
- Dump shared schemas (public, mod_licenses__system, mod_oa__system, mod_serials_management__system, mod_service_interaction__system, pubsub_config, source_record_storage_config)
Dump all roles associated with each tenant and split them according to the respective tenant's name. By doing so, the restoration process will be much more streamlined and efficient, allowing for greater system efficiency and reliability.
In summary, implementing the logic to dump each tenant's schemas and roles separately is a highly beneficial practice that will enhance the system's functionality and resource utilization. Therefore, it is strongly recommended that this approach be implemented to achieve optimal performance and reliability.
Pic. 1 Full dump
Pic. 2 Single tenant dump
Store all backups in a single format in S3 Bucket
Using a single naming format for PostgreSQL dumps stored in AWS S3 can improve data management and organization. A consistent naming format can make it easier to find and retrieve specific dumps when needed. It can also facilitate the automation of processes such as backup and restore. Additionally, a uniform naming format can help avoid confusion and errors that may arise from using different naming conventions. It can also enhance compatibility and interoperability between different tools and systems used for working with PostgreSQL dumps. Finally, a single naming format can promote data security and compliance, as it ensures that all dumps are labeled in a consistent and clear manner.
Naming convention:
<backup-name> = $SNAPSHOT_KEY-mm-dd-yyyy where SNAPSHOT_KEY = <cluster-name>-<project-name> for rancher dumps and folio-bugfest-<release-name> for bugfest dumps
folio-dev-nla-01-02-2023 folio-bugfest-nolana-01-02-2023
Folder structure:
<backup-name>-tenants.json
<backup-name>-<tenant1...tenantN>-install.json
<backup-name>-shared-schemas.psql
<backup-name>-<tenant>-schemas.psql
<backup-name>-<tenant>-roles.psql
Use separate flow for backup and restore
Create a shared library for interaction with PostgreSQL cli
Pic. 3 Shared library structure