Infrastructure Issues
This page tracks infrastructure issues and the time spent resolving them.
Issue Description | Environment (carrier or FOLIO) | Troubleshoot Date | Time spent (hours) | Notes | Resolved? | By |
---|---|---|---|---|---|---|
Enabling the profiler on mod-data-export and mod-source-record-storage by building both modules from old scripts | FOLIO | 8/7, 8/11 | 3 | Saw strange errors for mod-source-record-storage; had to research and ask around. mod-data-export's profiler was enabled, but the tasks did not stabilize, even after hours of waiting and after shutting them all down and restarting them a few times. | Yes | Martin |
Data Export App crashing with Reactjs (white screen) | FOLIO | 8/7 | 1 | Troubleshot mod-data-export draining connections, caused by enabling/disabling the profiler. | Yes | Varun, Martin |
Rebuilding Okapi to enable the profiler results in Hazelcast errors | FOLIO | 8/12 | 3 | 2 hours for fcp1, 1 hour for fcp1. Hongwei pointed out what to do, and that quickly fixed the issues. (Could have taken 15-30 minutes with Hongwei's knowledge.) | Yes | Martin |
Created a new branch of FSE configurations to feed in the custom Docker entrypoint file that enables the profiler on mod-data-export and mod-source-record-storage | FOLIO | 8/11 | 2 | Used the workaround Wayne provided; it still does not work. | No | Martin |
InfluxDB and Telegraf data on gcp1 nodes not shipping to the carrier-io box | FOLIO | 8/7, 8/10 | 1 | Looked closely at the subnet differences between gcp1 and fcp1 (where there is no such issue). Hongwei added a rule to the carrier-io security group to let data flow from gcp1 to fcp1. | Yes | Martin |
Profiler on mod-data-export and mod-source-record-storage not profiling anything other than CPUandMemory | FOLIO? carrier-io | 8/12 | 2 | Compared the logs of modules where enabling the profiler worked against these two modules' logs and found a pattern. A follow-up with the carrier-io team is needed to better understand how the profiler works. | No | Martin, Roman |
Upgrade gcp1 to the latest q22020 release | FOLIO | 8/10 | 2 | Backend modules were failing to register with Okapi. Restarting all modules in ECS and retrying fixed the issue. | Yes | Varun |
After upgrading gcp1 to the latest q22020 release, Data Export App crashing with the same Reactjs (white screen) | FOLIO | 8/10 | 1 | Corrupt data in the mod-data-export schema. Deleting an incorrect entry from the table fixed the issue. | Yes | Varun, Martin |
Help Gulfstream test their fix in the perf environment by deploying an unreleased version of mod-oai-pmh from a git branch | FOLIO | 9/1 | 3.5 | The module was not getting deployed in gcp1, so spent 2 hours troubleshooting by digging into CloudWatch, RDS, and Jenkins logs, with no luck. Wayne then posted a message in Teams that the FSE Jenkins job was broken, and fixed it soon afterwards. Spent the remaining 1.5 hours understanding what went wrong and redoing all the steps. This time, the module was deployed successfully. | Yes | Varun |
While enabling mod-inventory-16.1.2 in bhs1, it throws 400: POST request for mod-authtoken-2.6.0 /_/tenantpermissions failed with Search | FOLIO | 11/13 | 3 | After troubleshooting for a few hours, found that all discovery records were pointing to the auto LB. Replacing auto with bhs1 fixed the issue. | Yes | Varun |
mod-inventory-storage and a few other modules (mod-template-engine, mod-patron-blocks, mod-circulation) had incorrect discovery entries in the deployments table. As a result, check-in and check-out were failing. | FOLIO | 11/24 | 2 | Replacing auto with bhs1 fixed the issue. Hongwei helped troubleshoot the issue. | Yes | Varun |
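
The last two rows share a fix: discovery entries that pointed to the auto load balancer were repointed to bhs1. This page does not record how the replacement was carried out, so the following is only a minimal sketch of the rewrite step, run against records already exported to Python dictionaries. The field names (`srvcId`, `instId`, `url`) are assumed to match Okapi deployment descriptors, and the hostnames and instance IDs are invented for illustration. The sketch assumes the environment name appears as a whole DNS label in the host.

```python
from urllib.parse import urlsplit, urlunsplit


def repoint_url(url, old_label, new_label):
    """Swap one DNS label in the URL's host; scheme, port, and path are kept."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Replace only labels that match exactly, so "auto" never matches "automation".
    labels = [new_label if label == old_label else label for label in host.split(".")]
    netloc = ".".join(labels)
    if parts.port is not None:
        netloc = f"{netloc}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


def repoint_records(records, old_label="auto", new_label="bhs1"):
    """Return (fixed_records, changed_instance_ids) without mutating the input."""
    fixed, changed = [], []
    for record in records:
        new_url = repoint_url(record["url"], old_label, new_label)
        if new_url != record["url"]:
            changed.append(record["instId"])
        fixed.append({**record, "url": new_url})
    return fixed, changed


# Hypothetical discovery records; hostnames and instance IDs are made up.
records = [
    {"srvcId": "mod-inventory-16.1.2", "instId": "inv-1",
     "url": "http://pvt.lb.auto.example.org:8081"},
    {"srvcId": "mod-authtoken-2.6.0", "instId": "auth-1",
     "url": "http://pvt.lb.bhs1.example.org:8082"},
]

fixed, changed = repoint_records(records)
print(changed)        # ['inv-1']
print(fixed[0]["url"])  # http://pvt.lb.bhs1.example.org:8081
```

Listing the changed instance IDs before writing anything back makes it easy to confirm that only the stale entries are touched.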