Use Data Migration Jenkins job

Recording of presentation:

This document provides instructions on how to use the Jenkins job for data migration. The job allows users to migrate data from one release version to another in the Rancher performance environment.

The job is configured through the parameters described below and runs a fixed sequence of stages, which differs slightly between its two modes.

Introduction

The Data Migration Pipeline migrates data from older module versions to the latest versions. It has two main objectives: measuring how long the migration takes for each module, and checking that the database schemas are consistent after the migration.

The main purposes of the Data Migration Pipeline are:

  • Time Measurement: The pipeline aims to measure the time taken for data migration from older module versions to the latest versions. It provides insights into the duration of migration for individual modules as well as the overall migration process. This information helps in identifying any performance bottlenecks, optimizing migration procedures, and setting expectations for future migrations.

  • Schema Comparison: The pipeline also compares the schemas of the migrated tenant with the schemas of a cleanly installed tenant. The goal is to identify any differences in the database schemas after the migration. If any differences are found, the pipeline creates a Jira ticket to notify the team responsible for the schemas, so that inconsistencies are addressed promptly and the data environment stays stable and consistent.
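The two purposes above can be sketched in a few lines of Python. This is only an illustration of the idea, not the pipeline's actual code: the function names, the module names, and the representation of a schema as a set of table names are all assumptions made for the example.

```python
from typing import Dict, List, Optional, Set


def summarize_migration_times(seconds_by_module: Dict[str, float]) -> Dict[str, Optional[object]]:
    """Return the overall migration time and the slowest module (hypothetical report shape)."""
    total = sum(seconds_by_module.values())
    slowest = max(seconds_by_module, key=seconds_by_module.get) if seconds_by_module else None
    return {"total_seconds": total, "slowest_module": slowest}


def diff_schemas(
    migrated: Dict[str, Set[str]], clean: Dict[str, Set[str]]
) -> Dict[str, Dict[str, List[str]]]:
    """Compare schema -> table names for the migrated and clean tenants.

    Only schemas that differ are returned; an empty result means the schemas match.
    """
    differences: Dict[str, Dict[str, List[str]]] = {}
    for schema in sorted(set(migrated) | set(clean)):
        migrated_tables = migrated.get(schema, set())
        clean_tables = clean.get(schema, set())
        if migrated_tables != clean_tables:
            differences[schema] = {
                "only_in_migrated": sorted(migrated_tables - clean_tables),
                "only_in_clean": sorted(clean_tables - migrated_tables),
            }
    return differences


if __name__ == "__main__":
    # Module and table names below are made up for the example.
    print(summarize_migration_times({"mod-a": 120.0, "mod-b": 30.5}))
    print(diff_schemas({"mod_a": {"t1", "t2"}}, {"mod_a": {"t1"}}))
```

In this sketch a non-empty result from diff_schemas corresponds to the case in which the pipeline would create a Jira ticket.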

Parameters

The following parameters can be configured when running the data migration job:

Parameter name | Mandatory | Description
folio_repository | true | Specifies the repository from which to fetch the versions of the modules.
folio_branch_src | true | Specifies the branch of the source repository for the migration.
folio_branch_dst | true | Specifies the branch of the destination repository for the migration.
backup_name | false | Sets the name of the RDS snapshot for the migration. Provide the name of the DB backup stored in the folio-postgresql-backups AWS S3 bucket.
slackChannel | true | Defines the name of the Slack channel that receives the migration report (optional, without the '#' symbol).
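As a rough illustration of how these parameters fit together, the following Python sketch validates a set of values before a run. It is not part of the Jenkins job: the validate_params helper and the example values are hypothetical. The sketch treats slackChannel as mandatory because it is marked that way above, even though its description also says "optional"; that choice is an assumption.

```python
from typing import Dict, Optional

# Parameters marked as mandatory in the list above.
MANDATORY_PARAMS = ("folio_repository", "folio_branch_src", "folio_branch_dst", "slackChannel")


def validate_params(params: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Check mandatory parameters and normalize the values (hypothetical helper)."""
    missing = [name for name in MANDATORY_PARAMS if not (params.get(name) or "").strip()]
    if missing:
        raise ValueError("Missing mandatory parameters: " + ", ".join(missing))

    cleaned = {name: value.strip() for name, value in params.items() if value is not None}
    # The channel name is expected without the leading '#'.
    cleaned["slackChannel"] = cleaned["slackChannel"].lstrip("#")
    # backup_name is optional; an empty value means "no database restoration".
    cleaned.setdefault("backup_name", "")
    return cleaned


if __name__ == "__main__":
    # All values here are placeholders, not real repository, branch, or channel names.
    print(validate_params({
        "folio_repository": "example-repository",
        "folio_branch_src": "release-old",
        "folio_branch_dst": "release-new",
        "slackChannel": "#example-channel",
    }))
```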

Data Migration pipeline modes

  1. Data Migration with Database Restoration (backup_name is set): The target database is restored from a backup before the migration starts. Use this mode to measure how long the migration takes; restoring first gives the migration a clean, known starting state.

  2. Data Migration without Database Restoration (backup_name is NOT set): The migration runs without restoring the target database from a backup. Use this mode for a quick check of schema differences.
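The mode is determined only by whether backup_name has a value. A minimal Python sketch of that rule follows; the MigrationMode enum and select_mode function are names invented for this example, and the backup name is a placeholder.

```python
from enum import Enum
from typing import Optional


class MigrationMode(Enum):
    WITH_RESTORATION = "with Database Restoration"
    WITHOUT_RESTORATION = "without Database Restoration"


def select_mode(backup_name: Optional[str]) -> MigrationMode:
    """Pick the mode from backup_name: any non-blank value means restoration."""
    if backup_name and backup_name.strip():
        return MigrationMode.WITH_RESTORATION
    return MigrationMode.WITHOUT_RESTORATION


if __name__ == "__main__":
    print(select_mode("example-backup").value)
    print(select_mode("").value)
```

Treating a whitespace-only value as empty is a design choice of this sketch; the job itself may handle that case differently.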

Differences between the two modes:

Mode | with Database Restoration | without Database Restoration
Parameter backup_name | set to the name of a backup from the bucket | left empty
Costs | more expensive (an RDS instance is deployed in AWS) | cheaper (all infrastructure runs in Rancher)
Speed of run | depends on the dataset and modules (takes more time than without a backup) | around 1 hour

By providing two data migration modes, the system accommodates different scenarios and allows flexibility in selecting the most suitable approach for each migration task. 

Data Migration with Database Restoration 

Stages

The data migration job runs the following stages in sequence:

  1. Init: Initializes the data migration process.

  2. Destroy data-migration project: Destroys the existing data migration project, if any.

  3. Restore data-migration project from backup: Restores the data migration project from the specified backup.

  4. Update with dst release versions: Updates the project with the destination versions.

  5. Generate Data Migration Time report: Generates a report on the data migration time.

  6. Create clean tenant: Creates a clean tenant for the data migration with the destination release versions.

  7. Get schemas difference: Retrieves the differences between the migrated (updated) schemas and the clean schemas.

  8. Publish HTML Reports: Publishes the HTML reports produced by the data migration process.

  9. Create Jira tickets: Creates a Jira ticket for the development team if the pipeline found any differences in the schemas.

  10. Send Slack notification: Sends a notification to the specified Slack channel with the migration report.

  11. Backup DB state: Backs up the fs09000000 and clean tenants. (This stage is still in development.)

  12. Destroy data-migration project: Deletes the environment. If schema differences were found, the environment is destroyed after 6 hours; otherwise it is destroyed immediately.
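The timing rule of the last stage can be written out as a small Python sketch. The destroy_at function is hypothetical and shows only the rule described above (6 hours if schema differences were found, otherwise no delay), not how the job schedules the teardown.

```python
from datetime import datetime, timedelta

# Retention period when schema differences were found, as described above.
KEEP_ON_DIFFERENCE = timedelta(hours=6)


def destroy_at(finished_at: datetime, schema_differences_found: bool) -> datetime:
    """Return the time at which the environment would be destroyed."""
    delay = KEEP_ON_DIFFERENCE if schema_differences_found else timedelta(0)
    return finished_at + delay


if __name__ == "__main__":
    finished = datetime(2024, 1, 1, 12, 0)  # arbitrary example timestamp
    print(destroy_at(finished, schema_differences_found=True))
    print(destroy_at(finished, schema_differences_found=False))
```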

Data Migration without Database Restoration

Stages

The data migration job runs the following stages in sequence:

  1. Init: Initializes the data migration process.

  2. Destroy data-migration project: Destroys the existing data migration project, if any.

  3. Create data-migration project: Creates the data migration project from scratch with the source versions.

  4. Update with dst release versions: Updates the project with the destination versions.

  5. Generate Data Migration Time report: Generates a report on the data migration time.

  6. Create clean tenant: Creates a clean tenant for the data migration with the destination release versions.

  7. Get schemas difference: Retrieves the differences between the migrated (updated) schemas and the clean schemas.

  8. Publish HTML Reports: Publishes the HTML reports produced by the data migration process.

  9. Create Jira tickets: Creates a Jira ticket for the development team if the pipeline found any differences in the schemas.

  10. Send Slack notification: Sends a notification to the specified Slack channel with the migration report.

  11. Backup DB state: Backs up the diku and clean tenants. (This stage is still in development.)

  12. Destroy data-migration project: Deletes the environment. If schema differences were found, the environment is destroyed after 6 hours; otherwise it is destroyed immediately.
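Comparing the two stage lists shows that they differ only in stage 3. The following Python sketch makes that explicit; it is an illustration built from the stage names in this document, and the stages_for function is not part of the job.

```python
from typing import List

STAGES_BEFORE = ["Init", "Destroy data-migration project"]

STAGES_AFTER = [
    "Update with dst release versions",
    "Generate Data Migration Time report",
    "Create clean tenant",
    "Get schemas difference",
    "Publish HTML Reports",
    "Create Jira tickets",
    "Send Slack notification",
    "Backup DB state",
    "Destroy data-migration project",
]


def stages_for(backup_name: str) -> List[str]:
    """Return the ordered stage names for a run; only stage 3 depends on backup_name."""
    if backup_name.strip():
        third = "Restore data-migration project from backup"
    else:
        third = "Create data-migration project"
    return STAGES_BEFORE + [third] + STAGES_AFTER


if __name__ == "__main__":
    # An empty backup_name selects the mode without database restoration.
    for number, name in enumerate(stages_for(""), start=1):
        print(f"{number}. {name}")
```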