[FOLIO-2451] SPIKE: figure out data retention plan for continuous build Created: 07/Feb/20  Updated: 03/Jun/20  Resolved: 28/Feb/20

Status: Closed
Project: FOLIO
Components: None
Affects versions: None
Fix versions: None

Type: Task Priority: P3
Reporter: Ian Hardy Assignee: Ian Hardy
Resolution: Done Votes: 0
Labels: devops, platform-backlog
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original estimate: Not Specified

Sprint: DevOps: sprint 82, DevOps: sprint 83
Development Team: FOLIO DevOps

 Description   

We have a build at snapshot-core-continuous.dev.folio.org that persists data through upgrades. One of the main reasons to build this was to allow data migrations to occur. There are some open questions on data persistence:

1. Integration tests. Running integration tests creates data that is never cleaned up. I've turned them off for now so testing data doesn't pile up, but we need to decide whether to run them at all. Running UI integration tests could help us detect whether something has gone wrong during a migration. On the other hand, it would require work from the UI team to add a cleanup step for each test that creates data. Also, because data persists, this environment is less likely to be "clean", which calls into question whether any integration tests performed here are valid at all.

2. Purging/reloading data. It would probably make sense to clean this environment at some interval by using the purge=true parameter to get a fresh install. This would give us a chance to pick up new reference data if it exists and clean out any cruft anyone may have added. We want to balance this against the need to see actual migrations happen.
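For reference, the purge flow above runs through Okapi's tenant install endpoint. A minimal sketch in Python, assuming an Okapi instance at localhost:9130 and the diku tenant (both placeholders, as is the module id below); disabling a module with purge=true drops its tenant data, so a subsequent enable starts from a fresh install:

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

OKAPI_URL = "http://localhost:9130"  # assumption: local Okapi instance


def build_purge_request(tenant: str, modules: list[str], purge: bool = True) -> Request:
    """Build an Okapi install request that disables the given modules with
    purge=true, dropping their tenant data so a later enable starts fresh."""
    body = [{"id": m, "action": "disable"} for m in modules]
    query = urlencode({"purge": str(purge).lower()})
    url = f"{OKAPI_URL}/_/proxy/tenants/{tenant}/install?{query}"
    return Request(url, data=json.dumps(body).encode(), method="POST",
                   headers={"Content-Type": "application/json"})


# Hypothetical module id for illustration; send with urllib.request.urlopen(req)
req = build_purge_request("diku", ["mod-inventory-storage-19.1.0"])
```

A scheduled job could issue this per module before re-enabling, which is one way the "fresh install at some interval" idea could be automated.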



 Comments   
Comment by Craig McNally [ 12/Feb/20 ]

One thing to keep in mind is that there currently aren't any rollback scripts for data migration. So if a migration script fails part way through the process, the data may be in a funky state, leading to all sorts of unexpected behavior.

Also note that reloading data presents some potential issues as well... consider the following scenario:
1. Start with a fresh Edelweiss environment
2. Several module upgrades occur and succeed - data for those modules is migrated
3. A module upgrade fails - data migration stops part-way through the process and data is essentially corrupt.
4. We purge and reload the standard Edelweiss data

Depending on how step 4 is done, we could get into trouble.

  • If this is done for ALL modules, we will be undoing any migrations that had previously succeeded
  • If we take a targeted approach and only purge/reload data for certain modules (e.g. those whose migration failed), we avoid that problem, but it opens the door to other problems...
  • There are many places in FOLIO where cross-module references occur (e.g. Orders references Inventory records). Purging individual module data might lead to referential integrity issues: if we reload inventory data, orders might reference a record that no longer exists, or the inventory record might be in a state inconsistent with what orders expects.
  • It's also possible that purging/reloading data will remove the particular records that caused the data migration failure. Repeating the upgrade/migration after the data reload might then succeed. On one hand this might be considered desirable, but the point of this environment is to exercise/test the upgrades/migrations. These migrations are largely dependent on the data in the system; if that data isn't varied enough to catch edge cases, we're only giving ourselves a false sense of assurance.

One option might be to take a snapshot/backup of the database (or individual schema of the module being upgraded) before the migration, and if the upgrade fails, restore the backup.
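That snapshot/restore option might be sketched as follows. A minimal sketch, assuming a PostgreSQL database named folio and a per-module schema per tenant (both assumptions, as is the /tmp backup path); the helpers only build the pg_dump/pg_restore command lines, and migrate_with_rollback restores the schema snapshot if the upgrade step raises:

```python
import subprocess

DB = "folio"  # assumption: the database backing this environment


def schema_backup_cmd(schema: str, outfile: str) -> list[str]:
    """pg_dump restricted to one module's schema (e.g. a hypothetical
    diku_mod_inventory_storage), in custom format so it can be pg_restore'd."""
    return ["pg_dump", "--dbname", DB, "--schema", schema,
            "--format", "custom", "--file", outfile]


def schema_restore_cmd(infile: str) -> list[str]:
    """pg_restore with --clean drops the (possibly corrupt) objects
    before recreating them from the snapshot."""
    return ["pg_restore", "--dbname", DB, "--clean", "--if-exists", infile]


def migrate_with_rollback(schema: str, run_upgrade) -> None:
    """Snapshot the schema, run the upgrade callable, restore on failure."""
    backup = f"/tmp/{schema}.dump"
    subprocess.run(schema_backup_cmd(schema, backup), check=True)
    try:
        run_upgrade()
    except Exception:
        subprocess.run(schema_restore_cmd(backup), check=True)
        raise
```

This keeps the blast radius to the one schema being upgraded, which sidesteps the cross-module referential integrity concerns above as long as only the failed module is rolled back.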

Comment by Marc Johnson [ 13/Feb/20 ]

It would probably make sense to clean this environment at some interval by using the purge=true parameter to get a fresh install. This would give us a chance to pick up new reference data if it exists and clean out any cruft anyone may have added. We want to balance this against the need to see actual migrations happen.

I think in order to best answer this question, we need to clarify which goals our use of this environment is intended to meet. What are the goals of this environment?

For example, if the goal is to better mimic a production environment, then I don't think purging makes sense, as it seems unlikely folks would do this in their production systems.

Comment by Ian Hardy [ 13/Feb/20 ]

Thanks Craig and Marc for weighing in. I'd stop short of saying we're trying to mimic a production environment (we're just using sample data, and using snapshots instead of releases at least for this build). I think a modest goal here is that if someone writes a migration for a core module, it will get executed in this environment.

The way this is built now, purging/reloading would build from the top of master, which, as Craig pointed out, would leave the problematic migration behind. Maybe, if a full rebuild is necessary, the way to do it would be to build from the platform-core commit before it broke. I realize we'd be dumping any additional data beyond the sample that may have been loaded, but I don't think we can keep track of whatever people put in there indefinitely. Maybe if we had a larger controlled sample we could load in regularly, that would make things more realistic.

Comment by Jakub Skoczen [ 18/Feb/20 ]

Ian Hardy Marc Johnson Craig McNally Maybe purging is something that is done at the point where we know the environment is broken – e.g. when data is corrupted due to broken migration scripts?

Craig McNally I don't think by "purging" we should assume reloading the original Edelweiss (or any later release) data. If we assume that the sample data is kept up to date with the module schema, we could essentially "purge and bootstrap" to the last known-working state. We could also do it on a module-by-module basis.

Comment by Marc Johnson [ 18/Feb/20 ]

Maybe purging is something that is done at the point where we know the environment is broken – e.g. when data is corrupted due to broken migration scripts?

If an upgrade breaks the existing data in the system, what are the expectations for that module? Is it assumed that the data is rolled back to a known good state, and a fixed upgrade will then migrate the data from that state to the new one? Or is it expected that a fixed upgrade would be able to repair the data in place?

Comment by Ian Hardy [ 20/Feb/20 ]

If purging is done when things break, as Jakub Skoczen suggests, maybe what's needed are optional parameters to purge data and build from a particular commit of platform-core. Since we're just working with sample data (acknowledging that it's far from a comprehensive test of a migration), the result would be rolling back to a known good state and then trying the next upgrade.

Comment by Ian Hardy [ 21/Feb/20 ]

Actually, trying to build from a particular commit will not work, since the build will just pick up the latest snapshots anyway. Maybe what's needed is to publish the install.json files as an artifact with each build. This would make it more transparent which modules were updated when, without digging through the console, and make it possible to roll back when things are broken.
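Archived install.json files would also make it easy to diff two builds and see exactly which module versions changed. A minimal sketch, assuming module ids of the form name-version as they appear in install.json (the sample ids below are hypothetical):

```python
import re


def split_id(module_id: str) -> tuple[str, str]:
    """Split an install.json module id like "mod-users-17.1.0-SNAPSHOT.123"
    into (name, version) at the last hyphen before the first digit."""
    name, version = re.match(r"(.+?)-(\d.*)", module_id).groups()
    return name, version


def diff_installs(old: list[dict], new: list[dict]) -> dict[str, tuple]:
    """Return {module name: (old version, new version)} for every module
    whose version differs between two install.json snapshots; a module
    missing on one side shows up as None there."""
    before = dict(split_id(e["id"]) for e in old)
    after = dict(split_id(e["id"]) for e in new)
    return {name: (before.get(name), after.get(name))
            for name in before.keys() | after.keys()
            if before.get(name) != after.get(name)}
```

Comparing the last known-good install.json against the current one would pinpoint which snapshot bump to suspect, and the old file doubles as the target state for a rollback.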

Comment by Ian Hardy [ 28/Feb/20 ]

Plan going forward:

Generated at Thu Feb 08 23:20:44 UTC 2024 using Jira 1001.0.0-SNAPSHOT#100246-sha1:7a5c50119eb0633d306e14180817ddef5e80c75d.