Date

...

Time | Item | Who | Notes
1 min | Scribe | All

Jeremy Huff is next, followed by Marc Johnson.

Reminder:  Please copy/paste the Zoom chat into the notes.  If you miss it, this is saved along with the meeting recording, but having it here has benefits. 

1 min | Reminders | All

Quick reminders to TC members...

  1. Please review Marc's draft changes to the Ramsons OST page.  See thread in #tc-internal for additional context/details. 
    1. Ideally we will have time to go over this (and feedback) together on .  Depending on how that goes we may or may not need a dedicated hour.
  2. Please review the PR for proposed changes to the TCR process:  https://github.com/folio-org/tech-council/pull/55
    1. This will be the topic of discussion on  
  3. Please review the adjustments to the OST page wrt timing of upkeep activities. 
    1. Like #1 I'm hoping we can get to this on  
    2. From Maccabee Levine in #tc-internal:

Please see the thread above about an error on the OST page for when to do one of the status transitions. I have added two new example tables to the OST page, in place of the examples that were listed previously:

  • one which shows all the dates & triggering events that would affect the Quesnelia OST page
  • and one that shows all the events during the Quesnelia release cycle that affect various OST pages (for Orchid, Quesnelia, Ramsons and Sunflower).

Wish I could do a PR on Confluence, but I just published the changes, here's the diff and we can always revert or adjust further.

* | Direct DB Migration Scripts | All

Context/Background:

From Ingolf Kuss in #tc-internal.  See full thread here.

Sys Ops SIG wants to reach out formally to the TC because of the topic of direct db upgrade scripts in Poppy migration.
In the Poppy release notes, there are a number of db scripts described which need to be run by the operator after migration of the tenant.
I find the following scripts in the action column of the Release Notes:
  - Script 3 of https://wiki.folio.org/display/FOLIOtips/Scripts+for+Inventory%2C+Source+Record+Storage%2C+and+Data+Import+Cleanup
  - https://wiki.folio.org/display/FOLIJET/Call-numbers+migration
  - https://wiki.folio.org/display/FOLIJET/Authorities+migration
  - https://wiki.folio.org/display/FOLIOtips/Migration+scripts+for+OAI-PMH
  - https://wiki.folio.org/display/FOLIJET/Scripts+to+populate+marc_indexers+version
  - https://wiki.folio.org/display/FOLIJET/Adding+a+new+member+tenant+to+consortium.+mod-entities-links+scope
So far, in earlier releases, those kinds of scripts have (in almost all cases) been part of the module and have been triggered automatically when the new module is first enabled for the tenant, while the old module is still enabled (the old module is then disabled and removed in the course of the upgrade).
SysOps SIG strongly feels that some of these scripts should be handled in that way: as part of the module upgrade triggered by enablement for the tenant. FOLIO operators at Index Data find it pretty burdensome to deal with the upgrade scripts with many tenants in multiple environments. Other members of Sys Ops agree and are confused about why those scripts are not part of the module's db migration.

If the migration is long-running (e.g. 4-5 hours), it appears reasonable to put it into a separate script. However, Sys Ops thinks it should be available via some post-upgrade API, as /inventory-storage/migrations/jobs is for inventory migration.

If a decision has to be made during upgrade, in an ideal world a tenant admin (not a sysadmin) should be notified (via UI) that he/she has to make a decision. Until the decision has been made, the module may stop working as usual.

We think the TC could document some standard expectation for the POs.

Also, SysOps should be involved in the release retrospective.

Notes:

  • ...


Ingolf Kuss mentioned that Wayne from Index Data had raised the number of database update scripts and asked whether this is an ideal situation. The unanimous response was "no", this isn't ideal.

Aleksey Petrenko agreed that it is not ideal and clarified that this is not specific to Poppy.

Taras Spashchenko explained why this decision was made: performance tests with plain SQL scripts were taking up to 14 hours. This was not acceptable, so the scripts were rewritten to split the data set into 16 separate chunks. Each chunk takes about an hour, but the chunks can be run in parallel. The decision was made to reduce the overall runtime of the migration. This approach also allows remediation per chunk if something goes wrong.
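
As a rough illustration of the chunked approach described above, the sketch below runs 16 chunks in parallel and records failures per chunk so that only the failed chunks need remediation. It is not the Poppy scripts: partitioning by the leading hex digit of a record id, the `migrate_chunk` stand-in (which only counts matching ids instead of running SQL), and the worker count are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical partitioning: one chunk per leading hex digit of the record id.
CHUNK_PREFIXES = list("0123456789abcdef")


def migrate_chunk(prefix, record_ids):
    """Stand-in for a per-chunk SQL update; returns how many ids fall in the chunk."""
    return sum(1 for record_id in record_ids if record_id.startswith(prefix))


def run_chunks(record_ids, migrate=migrate_chunk, max_workers=4):
    """Run every chunk in parallel; return per-chunk results and per-chunk failures."""
    done, failed = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            pool.submit(migrate, prefix, record_ids): prefix
            for prefix in CHUNK_PREFIXES
        }
        for future in as_completed(futures):
            prefix = futures[future]
            try:
                done[prefix] = future.result()
            except Exception as exc:
                # Only this chunk failed; the others are unaffected.
                failed[prefix] = exc
    return done, failed
```

A failed chunk can then be re-run on its own by calling the migration function again with just that prefix, which is the per-chunk remediation mentioned above.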

Craig McNally: Is this decision handled on a case-by-case basis, or are there guidelines?

Taras Spashchenko: It was made based on the specific circumstances.

Marc Johnson: What Ingolf wants to talk about is a general set of procedures for this process. Marc asked where the practice of splitting SQL into chunks originated.

Aleksey Petrenko said that this change is an improvement when there is a significant amount of data.

Marc Johnson: These changes might be improvements, but where do we want this conversation to go? He is hearing some say that this is what we have to do, while others say they are not happy with this approach.

Craig McNally: It is helpful to hear the background on how this decision was made. Since the improvement brought the runtime from 15 hours to 15 minutes, it might be possible to run it in-band. Can we do these optimizations beforehand, so that splitting the scripts out of band is not needed?

Aleksey Petrenko: It would be good to get feedback from EBSCO. It would be beneficial to involve team leads who have performed these migrations.

Taras Spashchenko: When new fields added to the instance table need to be populated with values based on the holdings and items data, the SQL update takes place on a single thread, so this is a candidate for parallelization. For the call-number updates, the update requires changing the JSON object. As a single update it does not take advantage of the DB resources or of parallelization.

Marc Johnson: If we are going to talk about specific examples, we should get the dev teams' observations. It would be good to have an EBSCO rep in the Sys Ops SIG. He appreciates the background information but is not sure it will help us move forward with the questions at hand.

Ingolf Kuss: He is hearing for the first time that it is necessary to run these updates in parallel. Maybe this should be stated in the release notes. Ingolf has invited EBSCO representatives to the Sys Ops SIG.

Craig McNally: It seems clear that keeping these as in-band scripts is not going to work. Running them out of band is also a pain point: doable, and better, but still inconvenient. Maybe it is sufficient for us to have a better understanding of the situation, and maybe we can document general guidelines for how to approach these decisions. It is currently being handled on a case-by-case basis. Can we parallelize in the in-band process?

Ingolf Kuss: There was not enough time to test this. This explanation helps him understand, and we can produce standardized documentation.

Craig McNally: Documenting what the process is will help. Looking at possible improvements would also be useful.

Florian Gleixner: Do out-of-band scripts need to be run on a FOLIO system that is pristine? Maybe in-band scripts are better, since the usage of modules during migration can be controlled. There are possible situations where the upgrade of one tenant may have a negative impact on other tenants. Even if this upgrade does not present these issues, future upgrades may.

Jeremy Huff: What are the blockers for parallelization during an in-band upgrade?

Taras Spashchenko: RMB cannot parallelize database interactions. Spring modules may be able to do this with some changes. Providing some sort of driver script for the out-of-band approach might make sense.

Maccabee Levine: What is really missing here is documentation of the out-of-band approach.

Craig McNally: Communication is always important. He is not sure whether this was mentioned in the Poppy release notes.

Florian Gleixner: Two questions: do you have to shut down the tenant for the upgrade (the answer is yes), and how big was the tenant that took 14 hours to upgrade?

Taras Spashchenko: It was 9 million records.

Florian Gleixner: Maybe we only do parallelism on large tenants? As for the idea of providing a shell script with the out-of-band scripts, this would be nice to have but is not necessary.

Craig McNally: If the script could be parameterized for the number of threads or data chunks, this could be a good idea.
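
One way such a parameterized driver could look is sketched below, purely as an illustration. The flag names (`--total`, `--chunks`, `--workers`), their defaults, and splitting by contiguous row ranges are hypothetical; no such script was described in the meeting. The sketch only computes the plan (which row ranges to run and with how many workers) and leaves the actual migration call out.

```python
import argparse


def split_range(total, chunks):
    """Split row offsets 0..total into `chunks` contiguous (start, end) ranges."""
    base, extra = divmod(total, chunks)
    ranges, start = [], 0
    for i in range(chunks):
        # The first `extra` chunks take one additional row each.
        end = start + base + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


def parse_args(argv=None):
    parser = argparse.ArgumentParser(
        description="Hypothetical out-of-band migration driver"
    )
    parser.add_argument("--total", type=int, required=True, help="rows to migrate")
    parser.add_argument("--chunks", type=int, default=16, help="number of data chunks")
    parser.add_argument("--workers", type=int, default=4, help="parallel workers")
    return parser.parse_args(argv)


def plan(argv):
    """Return the chunk ranges and the effective worker count for the given flags."""
    args = parse_args(argv)
    return {
        "workers": min(args.workers, args.chunks),  # never more workers than chunks
        "ranges": split_range(args.total, args.chunks),
    }
```

For a small tenant an operator could pass `--chunks 1 --workers 1` and get a single sequential run, which fits the suggestion to reserve parallelism for large tenants.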

Ingolf Kuss: He thinks a shell script could be helpful. What sort of deployment is documented? What sort of deployment should we document the process for? He feels it should only be for the single server. Sometimes you need to deactivate Kafka during the upgrade. He has heard that jroot does this. Was this a factor in this upgrade?

Craig McNally would prefer that this question be addressed in the Sys Ops group.

Aleksey Petrenko appreciates this feedback. He would be happy to see us in the development team; feel free to join.

Craig McNally: Maybe there could be tighter integration between development and Sys Ops. Better communication between these two groups might make sense. What are the concrete action items? We want to document the decision process for when migrations need to be split out into asynchronous migrations. Do we have a volunteer?

Marc Johnson: The TC has limited experience with this. It would be better for this documentation to be produced by the people who have the most experience with it.

Taras Spashchenko will draft the rationale that was used for Poppy, and we can derive general rules from that.

Craig McNally: We can use the Poppy release as a case study for creating general guidelines. We can also look at how these can be improved. We will have additional follow-up conversations about this topic.


NA | Zoom Chat


...