Infrastructure Issues

This page tracks infrastructure issues and the time spent resolving them.

| Issue Description | Environment (carrier or FOLIO) | Troubleshoot Date | Time spent (hours) | Notes | Resolved? | By |
|---|---|---|---|---|---|---|
| Enabling profiler on mod-data-export, mod-source-record-storage - building mod-data-export and mod-source-record-storage from old scripts | FOLIO | 8/7, 8/11 | 3 | Saw strange errors for mod-source-record-storage; had to research and ask around. mod-data-export's profiler was enabled, but the tasks did not stabilize, even after hours of waiting and after shutting them all down and restarting them a few times. | Yes | Martin |
| Data Export App crashing with ReactJS (white screen) | FOLIO | 8/7 | 1 | Troubleshot mod-data-export draining connections, caused by enabling/disabling the profiler. | Yes | Varun, Martin |
| Rebuilding Okapi to enable profiler results in Hazelcast errors | FOLIO | 8/12 | 3 | 2 hours for fcp1, 1 hour for fcp1. Hongwei pointed out what to do and it quickly fixed the issues. (Could have taken 15-30 mins with Hongwei's knowledge.) | Yes | Martin |
| Created a new branch of FSE configurations to feed in the custom docker entrypoint file to enable profiler on mod-data-export and mod-source-record-storage | FOLIO | 8/11 | 2 | Used the workaround Wayne provided; it still does not work. | No | Martin |
| InfluxDB and Telegraf data on gcp1 nodes not shipping to carrier-io box | FOLIO | 8/7, 8/10 | 1 | Looked closely at the subnet differences between gcp1 and fcp1 (where there is no such issue). Hongwei added a rule to the carrier-io security group to allow data to flow from gcp1 to fcp1. | Yes | Martin |
| Profiler on mod-data-export and mod-source-record-storage not profiling anything other than CPU and memory | FOLIO? carrier-io | 8/12 | 2 | Looked at the logs of modules where enabling worked and compared them against these two modules' logs. Found a pattern. Follow-up with the carrier-io team is needed to understand more about how the profiler works. | No | Martin, Roman |
| Upgrade gcp1 to latest Q2 2020 release | FOLIO | 8/10 | 2 | Backend modules were failing to register with Okapi. Restarting all modules in ECS and retrying fixed the issue. | Yes | Varun |
| After upgrading gcp1 to latest Q2 2020, Data Export App crashing with the same ReactJS (white screen) | FOLIO | 8/10 | 1 | Corrupt data in the mod-data-export schema. Deleting an incorrect entry from the table fixed the issue. | Yes | Varun, Martin |
| Help Gulfstream test their fix in perf env by deploying an unreleased version of mod-oai-pmh from the branch in git | FOLIO | 9/1 | 3.5 | The module was not getting deployed in gcp1, so spent 2 hours troubleshooting by digging into CloudWatch, RDS, and Jenkins logs, with no luck. Then Wayne posted a message in Teams that the FSE Jenkins job was broken. Wayne fixed the Jenkins job soon after. Spent the remaining 1.5 hours understanding what went wrong and redoing all the steps. This time, the module deployed successfully. | Yes | Varun |
| While enabling mod-inventory-16.1.2 in bhs1, it throws: 400 POST request for mod-authtoken-2.6.0 /_/tenantpermissions failed with Search | FOLIO | 11/13 | 3 | After troubleshooting for a few hours, found that all discovery records were pointing to the auto lb. Replacing auto with bhs1 fixed the issue. | Yes | Varun |
| mod-inventory-storage and a few other modules (mod-template-engine, mod-patron-blocks, mod-circulation) had an incorrect discovery entry in the deployments table. As a result, check-in and check-out were failing. | FOLIO | 11/24 | 2 | Replacing auto with bhs1 fixed the issue. Hongwei helped troubleshoot the issue. | Yes | Varun |
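
The last two issues had the same root cause: discovery entries pointing at the auto load balancer instead of bhs1. A check along the following lines might catch that earlier. This is only a rough sketch, not the procedure that was actually used. It assumes the discovery records have already been fetched as a list of dicts with `srvcId`, `instId`, and `url` fields (the shape Okapi's discovery listing is understood to return), and the hostnames and `instId` values in the sample data are hypothetical placeholders, since the real load balancer names are not recorded here.

```python
from urllib.parse import urlsplit, urlunsplit


def find_stale_records(records, stale="auto"):
    """Return the records whose URL host part contains the stale name."""
    return [r for r in records if stale in urlsplit(r["url"]).netloc]


def repoint(record, old="auto", new="bhs1"):
    """Return a copy of the record with the host part repointed.

    Only the first occurrence of `old` in the host part is replaced;
    the scheme, port, and path are left untouched.
    """
    parts = urlsplit(record["url"])
    if old not in parts.netloc:
        return record
    fixed = parts._replace(netloc=parts.netloc.replace(old, new, 1))
    return {**record, "url": urlunsplit(fixed)}


if __name__ == "__main__":
    # Hypothetical sample data: module IDs come from the table above,
    # but the instIds and hostnames are made up for illustration.
    sample = [
        {"srvcId": "mod-inventory-16.1.2", "instId": "inst-1",
         "url": "http://auto-lb.example.internal:8081"},
        {"srvcId": "mod-authtoken-2.6.0", "instId": "inst-2",
         "url": "http://bhs1-lb.example.internal:8082"},
    ]
    for rec in find_stale_records(sample):
        print(rec["srvcId"], "->", repoint(rec)["url"])
```

The functions only compute the corrected records. Writing them back (through Okapi or directly in the deployments table) is deliberately left out, because how that was done in bhs1 is not documented on this page.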