2020-02-14 - System Operations and Management SIG Agenda and Notes
Date
Feb 14, 2020
Attendees
@Ingolf Kuss
@Hkaplanian
@Former user (Deleted)
@Johannes Drexl
@Florian Ruckelshausen
@Anton Emelianov (Deactivated)
@Dale Arntson
@Catherine Smith
@Ian Walls
@jackie.gottlieb@duke.edu (Deactivated)
@jroot (Unlicensed)
@Robert Douglas
@Brandon Tharp
@spampell
@Christopher Creswell
Goals
Learn about database upgrade testing and other things
Discussion items
Time | Item | Who | Notes |
|---|---|---|---|
5 | Welcome | Ingolf |
|
| Database upgrade testing |
| TAMU team: TAMU in-place upgrade, round #2. Round 1, during the Q4 upgrade (in-place migration), failed. The code had not been tested. Jason figured out where it failed and logged defects. Round 2 went from Q3 to Q4. The scripts work for licenses and agreements, but not for other modules. Jason will add tickets, and he and John Melconian are getting back to the dev teams responsible for the upgrade scripts. The scripts don't leave the system in a good, stable state: it is not recoverable, there is no log of what went wrong, there is no exception handling, and it is unknown what completed and what didn't. There is no way to roll back. It fails badly, which is not acceptable. Rollback is a requirement. One can clone and test before placing in production, but we still need a graceful exit. It is not clear that a full rollback is reasonable at this point in the project. We need to know what the upgrade is going to do and to have a log of what it has done; if it fails, we need to know where and why, and be able to re-run starting at that point. We didn't know what had been done to the system: which steps had been completed and which had not. We need more verbose output; the system needs to tell us what is going on, where it failed, and why. Right now there is no visibility at all. We might be relying too much on Okapi, so we have no insight. The scripts interact with the tenant API through Okapi. There is no documentation. The scripts run automatically when you POST "enable" to Okapi. |
| System / database backup strategies | Ingolf | All data must be in the shared Postgres outside the container. In MARCcat, does the database live inside or outside the container? It supposedly uses a database outside of the module. Data import caches data internally for streaming, which might be an operational concern. It uses RMB streaming. This might be a problem for high availability because the cache is not shared. Postgres 12 and Java 8 are not compatible for embedded Postgres. The project will move to a newer version of Java at the right point in time. |
| Outstanding charges of the SIG | Ingolf | Create an architectural diagram → topic for another session |
| Missing documentation of the container images on hub.docker.com/u/folioci | Ingolf | See above. Ingolf will talk to Jakub. |
| Old software versions in the installation documentation, causing potential security vulnerabilities | Ingolf / hbz |
|
| Topics for next meetings
|
| Architectural diagram. Wayne & Jason are working on the one for the Kubernetes-Rancher installation. Do we need to create a more general diagram for those who do not plan to use Kubernetes/Rancher? |
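
The requirements raised under "Database upgrade testing" (log what each step does, report where and why a failure happened, and allow a re-run starting at the failed step) could be sketched roughly as follows. This is only an illustration of the idea, not how the FOLIO upgrade scripts or Okapi work: `run_steps`, the step names, and the JSON journal file are all hypothetical, and a real upgrade would also need each step to be safe to re-run.

```python
import json
import logging
from pathlib import Path
from typing import Callable, Sequence, Tuple

log = logging.getLogger("upgrade")

# A step is a (name, action) pair; both are hypothetical placeholders here.
Step = Tuple[str, Callable[[], None]]


def run_steps(steps: Sequence[Step], journal_path: Path) -> None:
    """Run named steps in order and record each completed step in a journal.

    On a re-run, steps already listed in the journal are skipped, so the
    upgrade resumes at the step that failed.
    """
    if journal_path.exists():
        done = json.loads(journal_path.read_text())
    else:
        done = []

    for name, action in steps:
        if name in done:
            log.info("skipping %s (already completed)", name)
            continue
        log.info("starting %s", name)
        try:
            action()
        except Exception:
            # Log where and why it failed, plus what had completed, then stop.
            log.exception("step %s failed; completed so far: %s", name, done)
            raise
        done.append(name)
        # Persist progress after every step so a crash loses at most one step.
        journal_path.write_text(json.dumps(done))
        log.info("finished %s", name)
```

With a journal like this, an operator can see which steps completed before a failure and re-run the same command after fixing the cause. It does not provide rollback; that would still need a database backup or a cloned environment, as discussed above.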