The Problem (as stated in https://discuss.folio.org/t/reference-data-and-upgrades/2858)
A FOLIO upgrade, as it is currently implemented, involves replacing the set of modules enabled for a tenant with a new set of modules. Each module is responsible for upgrading its storage for the tenant in place – for example, updating existing records with new required fields, or adding a database index to improve performance.
A new version of a module (or a new module that was not previously included in the tenant’s module set) might contain new or updated reference data. It seems reasonable that an operator might choose to specify loadReference=true in the call to the tenant install API to load the new reference data.
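For illustration, the following sketch builds the URL for such an install call. It assumes the Okapi install endpoint (`/_/proxy/tenants/{tenant}/install`) and its `tenantParameters` query parameter, neither of which is spelled out on this page; the helper name `build_install_url` and the host and tenant values are hypothetical.

```python
from urllib.parse import quote, urlencode


def build_install_url(okapi_url, tenant, load_reference=False, load_sample=False):
    """Build a tenant install URL (assumed Okapi endpoint layout)."""
    pairs = []
    if load_reference:
        pairs.append("loadReference=true")
    if load_sample:
        pairs.append("loadSample=true")
    url = f"{okapi_url.rstrip('/')}/_/proxy/tenants/{quote(tenant, safe='')}/install"
    if pairs:
        # Assumption: tenantParameters is a comma-separated list of key=value
        # pairs, URL-encoded as a single query parameter value.
        url += "?" + urlencode({"tenantParameters": ",".join(pairs)})
    return url


# Hypothetical host and tenant, for illustration only.
example_url = build_install_url("http://localhost:9130", "diku", load_reference=True)
```

The request body (the list of modules to enable) is omitted here; only the parameter that triggers reference data loading is shown.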
As currently implemented in most modules, this will cause the module to attempt to load all reference data (not just new data). New records will be created if needed, and existing records (matched by UUID) will be overlaid.
Due to this, issues arise if the tenant has altered or deleted any of the reference data loaded by the module when it was first enabled. Any changes will of course be overwritten with the system default, and deleted records will be re-created.
More subtle problems arise if the record type in question has data constraints (for example, the requirement that a particular property be unique), and the tenant has created a new record of that type which causes a conflict with incoming reference data. As currently implemented, this kind of conflict causes the module upgrade to fail, potentially leaving the tenant data in an inconsistent state.
These kinds of issues would very likely also arise if an operator specified loadSample=true in an upgrade, but that is currently untested, and seems like an unlikely use case, at least for production.
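The failure modes described above can be reproduced with a small in-memory model. This is a simplified sketch, not the code of any FOLIO module: the store is a plain dictionary keyed by UUID, the uniqueness constraint on `name` is checked by hand, and all record values are made up.

```python
class UniqueViolation(Exception):
    """Raised when an incoming record collides on a unique property."""


def load_reference_data(store, reference_records, unique_field="name"):
    """Upsert every reference record by id (all records, not just new ones)."""
    for rec in reference_records:
        for existing_id, existing in store.items():
            if existing_id != rec["id"] and existing[unique_field] == rec[unique_field]:
                raise UniqueViolation(f"{unique_field}={rec[unique_field]!r} already exists")
        store[rec["id"]] = dict(rec)  # create or overlay, matched by id


# Initial enable: the module loads its reference data.
v1 = [{"id": "a", "name": "Home"}, {"id": "b", "name": "Work"}]
store = {}
load_reference_data(store, v1)

# The tenant then customizes the data.
store["a"]["name"] = "Residence"                        # edited record
del store["b"]                                          # deleted record
store["local-1"] = {"id": "local-1", "name": "Order"}   # locally created record

# Upgrade: the new module version ships one additional record.
v2 = v1 + [{"id": "c", "name": "Order"}]
upgrade_failed = False
try:
    load_reference_data(store, v2)
except UniqueViolation:
    upgrade_failed = True
```

After the failed upgrade in this model, record `a` has been reset to the shipped value, record `b` has been re-created, and record `c` is missing, so the store is left partially upgraded.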
Upgrade Desired Behaviors
- The upgrade process should leave the system in a usable state unless a truly fatal error is encountered.
- Upgrade scripts should handle errors rigorously so that trivial errors do not derail the process.
- When fatal errors occur, the output should clearly indicate where the error occurred.
- The upgrade process should produce a log output of changes made.
- The upgrade process should be able to run in a simulated mode to facilitate planning.
- The upgrade process should account for the presence of Overlay Data and preserve it entirely.
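The behaviors listed above could be combined roughly as follows. This is a minimal sketch under stated assumptions: system and overlay data are modeled as separate dictionaries keyed by record id, and the function name `run_upgrade`, the `dry_run` flag, and the log format are all hypothetical.

```python
import copy


def run_upgrade(system, overlay, incoming, dry_run=False):
    """Apply incoming records to system data and return a log of changes.

    Overlay data is never written to. With dry_run=True the changes are
    applied to a copy, so the log shows what would happen.
    """
    target = copy.deepcopy(system) if dry_run else system
    log = []
    for index, rec in enumerate(incoming):
        rec_id = rec.get("id") if isinstance(rec, dict) else None
        if rec_id is None:
            # Trivial error: report where it occurred and carry on.
            log.append(f"SKIP record #{index}: no usable id")
            continue
        if rec_id not in target:
            action = "CREATE"
        elif target[rec_id] != rec:
            action = "UPDATE"
        else:
            action = "UNCHANGED"
        target[rec_id] = dict(rec)
        note = " (overlay preserved)" if rec_id in overlay else ""
        log.append(f"{action} {rec_id}{note}")
    return log


# Made-up data for illustration.
system = {"a": {"id": "a", "label": "Title"}}
overlay = {"a": {"label": "Book title"}}
incoming = [{"id": "a", "label": "Title (v2)"}, {"label": "broken"}, {"id": "b", "label": "Author"}]

plan = run_upgrade(system, overlay, incoming, dry_run=True)  # simulated run
```

Running the same call without `dry_run=True` would apply the changes to `system` and return the same kind of log.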
Proposal One
- Eliminate Reference Data as a category.
- Create a new category called "System Data" which holds all schema data and records currently loaded by specifying loadReference=true.
- Make this category immutable towards the system operator/users.
- Create a category called "Overlay Data" which is changeable by system operator/users and is used to modify values or schema in "System Data".
- This will allow the system to avoid overwriting user-specified values when performing system upgrades.
Proposed Data Types and Definitions for Proposal One
Name | Definition/notes | Overwritten on upgrade | Immutable (towards system operator) | Example of data stored |
---|---|---|---|---|
System Data | Data necessary for operation of the system. These should be values that are immutable toward the user or system operator. Example: sane defaults for field labels. Note: at present, default values of this sort are, for certain modules, only loaded when loadReference=true is specified on module initialization. This proposal moves them into an immutable category. | YES | YES | schema_book{ } |
Overlay Data | Data that may be used to supersede System Data when a user or system operator wants to change something immutable. Example: a default field label. Can be used to override any System Data, as well as to introduce new values. | NO | NO | schema_book{ } |
Sample Data/User Data | Data that can be used to demo the system or is useful for providing examples to users. Data entered by users specific to an institution. This data is not necessary for the operation of the system. Example: a user record for a fake (or real) patron. This layer also holds sample data, since it's essentially real data for a fictional tenant. | NO. To load example data (like diku), introduce another switch (LoadExampleData=true). | NO | book: uuid: asdf034; title: 1984; authors: George Orwell; publisher: Secker & Warburg; release date: June 8, 1949; blurb: Nineteen Eighty-Four: A Novel, often published as 1984, is a dystopian novel by English novelist George Orwell. It was published on 8 June 1949 by Secker & Warburg as Orwell's ninth and final book completed in his lifetime. |
--- Result --- | schema_book{ } book: uuid: asdf034; title: 1984; authors: George Orwell; publisher: Secker & Warburg; release date: June 8, 1949; blurb: Nineteen Eighty-Four: A Novel, often published as 1984, is a dystopian novel by English novelist George Orwell. It was published on 8 June 1949 by Secker & Warburg as Orwell's ninth and final book completed in his lifetime. |
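One way to read the "Result" row is as System Data with Overlay Data merged on top. The sketch below shows such a merge; it is an illustration only. The contents of `schema_book` are not given in the table, so the field names used here (`title`, `authors`, `shelf`) are hypothetical, as is the function name `effective_view`.

```python
def effective_view(system_data, overlay_data):
    """Return system data with overlay values taking precedence.

    Nested dictionaries are merged recursively; neither input is modified.
    """
    merged = dict(system_data)
    for key, value in overlay_data.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = effective_view(merged[key], value)
        else:
            merged[key] = value  # override an existing value or add a new one
    return merged


# Hypothetical schema contents, for illustration only.
system_data = {"schema_book": {"title": {"label": "Title"}, "authors": {"label": "Authors"}}}
overlay_data = {"schema_book": {"title": {"label": "Book title"}, "shelf": {"label": "Shelf"}}}

result = effective_view(system_data, overlay_data)
```

Because the overlay is stored separately and only combined at read time, an upgrade can replace `system_data` wholesale without touching the tenant's changes.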
Proposal Two
- Create a data layer on top of reference data that allows users to overlay system-provided values with local values. This will prevent overwriting user-specified values when performing system upgrades.
Proposed Data Types and Definitions for Proposal Two
Data type | Notes | Behavior on module upgrade | Examples |
---|---|---|---|
System data | Data that are necessary for operation of the system. These should be values that are immutable toward the user or system operator (but may be visible in the UI as a value list, for example). | Overwrite | Inventory item statuses |
Reference data | Data that are referred to by other records in the system, which may be optionally loaded on module initialization using the loadReference tenant parameter. | Overlay | User address types; inventory controlled vocabularies |
User/sample data | Data that are created by the user, or loaded using the loadSample tenant parameter. | Upgrade | Users; inventory instances |
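The three upgrade behaviors in the table could be dispatched per data type as sketched below. This is one possible reading of "Overwrite", "Overlay", and "Upgrade", not a specification: the names `UpgradeBehavior`, `BEHAVIOR_BY_TYPE`, and `upgrade_record` are hypothetical, and the records are made up.

```python
from enum import Enum


class UpgradeBehavior(Enum):
    OVERWRITE = "overwrite"  # shipped value replaces the stored one
    OVERLAY = "overlay"      # shipped value is updated, local values stay on top
    UPGRADE = "upgrade"      # stored record is migrated in place


BEHAVIOR_BY_TYPE = {
    "system": UpgradeBehavior.OVERWRITE,
    "reference": UpgradeBehavior.OVERLAY,
    "user": UpgradeBehavior.UPGRADE,
}


def upgrade_record(data_type, stored, shipped=None, local=None, migrate=None):
    """Return the record as it should look after a module upgrade."""
    behavior = BEHAVIOR_BY_TYPE[data_type]
    if behavior is UpgradeBehavior.OVERWRITE:
        return dict(shipped)
    if behavior is UpgradeBehavior.OVERLAY:
        return {**shipped, **(local or {})}
    # UPGRADE: keep the tenant's record and apply a schema migration to it.
    return migrate(dict(stored))


status = upgrade_record("system", stored={"name": "Avail"}, shipped={"name": "Available"})
address_type = upgrade_record(
    "reference",
    stored={"name": "Residence"},
    shipped={"name": "Home", "desc": "Home address"},
    local={"name": "Residence"},
)
user = upgrade_record("user", stored={"username": "jdoe"}, migrate=lambda r: {**r, "active": True})
```

In this reading, only the "reference" path needs the tenant's local values to be stored separately from the shipped ones.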