Use Data Migration Jenkins job
Recording of presentation:
This document provides instructions on how to use the Jenkins job for data migration. The job allows users to migrate data from one release version to another in the Rancher performance environment.
It supports various parameters and follows a series of stages to ensure a successful data migration process.
Introduction
The Data Migration Pipeline is designed to facilitate the migration of data from older versions of modules to the latest versions. It focuses on two main objectives: measuring the time it takes to migrate data for each module, and ensuring the consistency of the database schemas after migration.
The main purposes of the Data Migration Pipeline are:
Time Measurement: The pipeline aims to measure the time taken for data migration from older module versions to the latest versions. It provides insights into the duration of migration for individual modules as well as the overall migration process. This information helps in identifying any performance bottlenecks, optimizing migration procedures, and setting expectations for future migrations.
Schema Comparison: Another key purpose of the pipeline is to compare the schemas of the migrated tenant with the installed schemas. The goal is to identify any discrepancies or differences in the database schemas after the migration. If there are any variations found, the pipeline triggers the creation of a Jira ticket, notifying the team responsible for managing the schemas. This proactive approach ensures that any schema inconsistencies are promptly addressed, leading to a more stable and consistent data environment.
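The two purposes above can be illustrated with a minimal sketch. It is not the pipeline's actual implementation: the input shapes (a mapping of module name to seconds, and a mapping of schema name to table names) and all function names are assumptions made for illustration.

```python
from typing import Dict, List, Set

# Hypothetical input shapes; the real pipeline's data formats are not described here.
ModuleDurations = Dict[str, float]   # module name -> migration time in seconds
SchemaTables = Dict[str, Set[str]]   # schema name -> set of table names


def summarize_durations(durations: ModuleDurations) -> dict:
    """Return the overall migration time and the modules ranked slowest first."""
    ranked = sorted(durations.items(), key=lambda item: item[1], reverse=True)
    return {"total_seconds": sum(durations.values()), "slowest_first": ranked}


def diff_schemas(migrated: SchemaTables, clean: SchemaTables) -> Dict[str, Dict[str, List[str]]]:
    """List tables that exist on only one side, per schema. Empty result means no difference."""
    differences = {}
    for schema in sorted(set(migrated) | set(clean)):
        migrated_tables = migrated.get(schema, set())
        clean_tables = clean.get(schema, set())
        only_migrated = sorted(migrated_tables - clean_tables)
        only_clean = sorted(clean_tables - migrated_tables)
        if only_migrated or only_clean:
            differences[schema] = {
                "only_in_migrated": only_migrated,
                "only_in_clean": only_clean,
            }
    return differences


def needs_jira_ticket(differences: dict) -> bool:
    """A ticket is warranted only when at least one schema differs."""
    return bool(differences)
```

A real comparison would likely also look at columns, indexes, and constraints; table names alone are used here to keep the sketch short.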
Parameters
The following parameters can be configured when running the data migration job:
| Parameter name | Mandatory | Description |
|---|---|---|
| folio_repository | true | Specifies the repository from which to fetch the versions of the modules. |
| folio_branch_src | true | Specifies the branch of the source repository for the migration. |
| folio_branch_dst | true | Specifies the branch of the destination repository for the migration. |
| backup_name | false | Sets the name of the RDS snapshot for the migration. Provide the name of the DB backup stored in the folio-postgresql-backups AWS S3 bucket. |
| slackChannel | true | Defines the name of the Slack channel that receives the migration report (optional, without the '#' symbol). |
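As an illustration of how these parameters fit together, the sketch below validates a parameter set and encodes it as a query string. The function name and the example values are hypothetical, and the sketch follows the Mandatory column of the table (so slackChannel is treated as required); it is not taken from the job's own code.

```python
from urllib.parse import urlencode

# Parameter names come from the table above; the Mandatory column decides the grouping.
MANDATORY = ("folio_repository", "folio_branch_src", "folio_branch_dst", "slackChannel")
OPTIONAL = ("backup_name",)


def build_job_query(params: dict) -> str:
    """Validate the job parameters and return them as a URL query string."""
    unknown = sorted(set(params) - set(MANDATORY) - set(OPTIONAL))
    if unknown:
        raise ValueError(f"Unknown parameters: {', '.join(unknown)}")
    missing = [name for name in MANDATORY if not params.get(name)]
    if missing:
        raise ValueError(f"Missing mandatory parameters: {', '.join(missing)}")

    values = dict(params)
    # The channel name is expected without the leading '#'.
    values["slackChannel"] = values["slackChannel"].lstrip("#")
    # Empty optional values are left out, which corresponds to leaving the field blank.
    ordered = [(name, values[name]) for name in MANDATORY + OPTIONAL if values.get(name)]
    return urlencode(ordered)
```

The resulting string could be appended to a parameterized job's `buildWithParameters` URL in Jenkins; the exact job path and authentication depend on the Jenkins instance and are not covered in this document.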
Data Migration pipeline modes
Data Migration with Database Restoration (if backup_name is set): In this mode, the target database is restored from a backup before the data migration starts. This mode is useful for measuring the time needed for migration, because the restore gives the migration a clean, known starting state.
Data Migration without Database Restoration (if backup_name is NOT set): In this mode, the data migration is performed without restoring the target database from a backup. This mode is typically used for a quick check of schema differences.
Differences between runs in the two modes:
| | With Database Restoration | Without Database Restoration |
|---|---|---|
| Parameter backup_name | set to the name of a backup from the bucket | left empty |
| Costs | more expensive (RDS is deployed in AWS) | cheaper (all infrastructure runs in Rancher) |
| Speed of run | depends on the dataset and modules (takes more time than a run without a backup) | around 1 hour |
By providing two data migration modes, the system accommodates different scenarios and allows flexibility in selecting the most suitable approach for each migration task.
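The mode selection described above depends only on whether backup_name has a value. A minimal sketch of that rule, with hypothetical names and the assumption that a whitespace-only value counts as empty:

```python
from enum import Enum
from typing import Optional


class MigrationMode(Enum):
    WITH_RESTORATION = "with database restoration"
    WITHOUT_RESTORATION = "without database restoration"


def select_mode(backup_name: Optional[str]) -> MigrationMode:
    """Pick the pipeline mode from the backup_name parameter."""
    # Assumption: a missing, empty, or whitespace-only value means "no backup".
    if backup_name and backup_name.strip():
        return MigrationMode.WITH_RESTORATION
    return MigrationMode.WITHOUT_RESTORATION
```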
Data Migration with Database Restoration
Stages
The data migration job runs the following stages in sequence:
Init: Initializes the data migration process.
Destroy data-migration project: Destroys the existing data migration project, if any.
Restore data-migration project from backup: Restores the data migration project from the specified backup.
Update with dst release versions: Updates the project with the destination versions.
Generate Data Migration Time report: Generates a report on the data migration time.
Create clean tenant: Creates a clean tenant for the data migration with the destination release versions.
Get schemas difference: Retrieves the differences between the schemas of the updated (migrated) tenant and the clean tenant.
Publish HTML Reports: Publishes HTML reports related to the data migration process.
Create Jira tickets: Creates a Jira ticket for the development team if the Data Migration pipeline found any differences in the schemas.
Send Slack notification: Sends a notification to the specified Slack channel with the migration report.
Backup DB state: Makes a backup of the fs09000000 and clean tenants. (This stage is currently in development.)
Destroy data-migration project: Deletes the environment. If differences were found in the schemas, the environment is destroyed after 6 hours; otherwise it is destroyed immediately.
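The teardown rule in the last stage can be expressed as a small calculation. This is only a sketch of the rule as stated above; the function name and the use of a finish timestamp as the starting point of the 6-hour window are assumptions.

```python
from datetime import datetime, timedelta

# The environment is kept for 6 hours when schema differences were found.
RETENTION_ON_DIFFERENCE = timedelta(hours=6)


def teardown_time(finished_at: datetime, schema_differences_found: bool) -> datetime:
    """Return the moment at which the data-migration environment should be destroyed."""
    if schema_differences_found:
        return finished_at + RETENTION_ON_DIFFERENCE
    # No differences: nothing to inspect, so the environment can go right away.
    return finished_at
```

The delay presumably leaves time to inspect the environment when a difference is reported, although the document does not state the reason explicitly.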
Data Migration without Database Restoration
Stages
The data migration job runs the following stages in sequence:
Init: Initializes the data migration process.
Destroy data-migration project: Destroys the existing data migration project, if any.
Create data-migration project: Creates the data migration project from scratch with the source versions.
Update with dst release versions: Updates the project with the destination versions.
Generate Data Migration Time report: Generates a report on the data migration time.
Create clean tenant: Creates a clean tenant for the data migration with the destination release versions.
Get schemas difference: Retrieves the differences between the schemas of the updated (migrated) tenant and the clean tenant.
Publish HTML Reports: Publishes HTML reports related to the data migration process.
Create Jira tickets: Creates a Jira ticket for the development team if the Data Migration pipeline found any differences in the schemas.
Send Slack notification: Sends a notification to the specified Slack channel with the migration report.
Backup DB state: Makes a backup of the diku and clean tenants. (This stage is currently in development.)
Destroy data-migration project: Deletes the environment. If differences were found in the schemas, the environment is destroyed after 6 hours; otherwise it is destroyed immediately.
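Comparing the two stage lists shows that the modes differ only in how the project is prepared (restored from a backup or created from scratch). The sketch below encodes the stage names from this document as data; the function name is hypothetical and the real Jenkins pipeline may be organized differently.

```python
from typing import List, Optional

# Stages that both modes run after the project has been prepared.
SHARED_TAIL = [
    "Update with dst release versions",
    "Generate Data Migration Time report",
    "Create clean tenant",
    "Get schemas difference",
    "Publish HTML Reports",
    "Create Jira tickets",
    "Send Slack notification",
    "Backup DB state",
    "Destroy data-migration project",
]


def planned_stages(backup_name: Optional[str]) -> List[str]:
    """Return the ordered stage names for a run, depending on backup_name."""
    if backup_name:
        prepare = "Restore data-migration project from backup"
    else:
        prepare = "Create data-migration project"
    return ["Init", "Destroy data-migration project", prepare] + SHARED_TAIL
```

Note that the "Backup DB state" stage carries the same name in both modes but targets different tenants (fs09000000 with restoration, diku without).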