[FOLIO-1329] Missing modules on folio-snapshot-283 Created: 06/Jul/18  Updated: 12/Nov/18  Resolved: 11/Jul/18

Status: Closed
Project: FOLIO
Components: None
Affects versions: None
Fix versions: None

Type: Bug Priority: P3
Reporter: Heikki Levanto Assignee: Heikki Levanto
Resolution: Done Votes: 0
Labels: core, sprint41, sprint42
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original estimate: Not Specified

Sprint:

 Description   

Something is badly wrong with http://folio-snapshot-283.aws.indexdata.com:9130



 Comments   
Comment by Heikki Levanto [ 06/Jul/18 ]

To summarize our discussion on Slack:

  • Cate could not log in to snapshot-stable
  • Marc noticed that mod-users-bl is not enabled for diku
  • Re-enabling the module manually made login work again
  • We proceeded to investigate the situation with curl, for example: curl -w'\n' http://folio-snapshot-283.aws.indexdata.com:9130/_/proxy/modules
  • We found that there are 11 tenants: diku, supertenant, and 9 from various pull requests
  • They all have a different set of modules enabled.
  • Total 206 enabled modules, with db.maxPoolSize=5 gets us very close to PG's limit of 1000 connections.
  • diku should have something like 40 modules enabled, but only has 32. These seem to be missing: folio_organization, folio_search, folio_stripes-core, folio_stripes-smart-components, mod-finance, mod-graphql, mod-user-import, okapi, and the mod-users-bl that Marc added manually
  • Nobody seems to know how to get to the logs of that box. Waiting for Malc to come online.
  • Yesterday we had different problems with the same box, some modules were not running at all.
  • I could not find any dependency problems on the modules enabled for various tenants. mod-users-bl is required by stripes-core, but that is also not enabled for diku.
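
The connection-pool concern in the summary can be checked with a little arithmetic. This is only a rough sketch: it assumes every enabled module holds its own pool of db.maxPoolSize connections, which is an upper bound, since some of the enabled modules (presumably the folio_* UI modules) are unlikely to open database connections at all. The function name is made up for illustration.

```python
# Worst-case PostgreSQL connection demand, using the figures from the summary.
# Assumption: each enabled module fills its own pool completely (upper bound).

def worst_case_connections(enabled_modules: int, max_pool_size: int) -> int:
    """Return the connection count if every module fills its whole pool."""
    return enabled_modules * max_pool_size

ENABLED_MODULES = 206  # total enabled modules across all tenants
MAX_POOL_SIZE = 5      # db.maxPoolSize
PG_LIMIT = 1000        # PostgreSQL connection limit quoted in the summary

demand = worst_case_connections(ENABLED_MODULES, MAX_POOL_SIZE)
headroom = PG_LIMIT - demand  # negative means the bound exceeds the limit
print(f"worst case: {demand} connections, headroom: {headroom}")
```

Under that assumption the bound lands slightly above the limit rather than just below it, so the real margin depends on how many modules actually use the database.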
Comment by Heikki Levanto [ 06/Jul/18 ]

Loose theories about what might be behind this:

  • Out of memory, killed some processes yesterday, and the rest is follow-up problems
  • Out of database handles, may have killed some processes and/or messed with enabling the necessary modules for diku
  • A bug in some script that enables modules, and/or in Okapi itself. This is one of the few cases where we actually run multiple tenants on the same box

In any case, I would very much like to get ssh access to the box, or at least see Okapi's log, and /var/log/kern.log, and maybe more.

Comment by Heikki Levanto [ 06/Jul/18 ]

Ok, I got ssh access. First observations:

  • The machine has enough memory, has not swapped since last reboot
  • There seems to be enough disk space as well
  • Okapi logs seem to rotate very often, 10 log files, oldest from yesterday, 19 hours old.
  • No signs of processes being killed for lack of memory
Comment by Heikki Levanto [ 06/Jul/18 ]

Grepping in the logs shows that someone (or something) has indeed been deleting modules from diku:

2018-07-06 07:10:23,974 INFO  ProxyContext         758701/proxy REQ 222.29.81.2:49466 supertenant DELETE /_/proxy/tenants/diku/modules/mod-users-bl-3.0.0-SNAPSHOT.17 okapi-2.16.0
2018-07-06 07:11:06,289 INFO  ProxyContext         640025/proxy REQ 222.29.81.2:49586 supertenant DELETE /_/proxy/tenants/diku/modules/mod-users-15.1.0-SNAPSHOT.36 okapi-2.16.0
2018-07-06 07:11:25,945 INFO  ProxyContext         280882/proxy REQ 222.29.81.2:49624 supertenant DELETE /_/proxy/tenants/diku/modules/mod-users-15.1.0-SNAPSHOT.36 okapi-2.16.0
2018-07-06 07:12:09,270 INFO  ProxyContext         444754/proxy REQ 222.29.81.2:49711 supertenant DELETE /_/proxy/tenants/diku/modules/mod-circulation-10.7.0-SNAPSHOT.159 okapi-2.16.0
2018-07-06 07:12:50,382 INFO  ProxyContext         440252/proxy REQ 222.29.81.2:49780 supertenant DELETE /_/proxy/tenants/diku/modules/okapi-2.16.0 okapi-2.16.0
2018-07-06 07:13:00,709 INFO  ProxyContext         523384/proxy REQ 222.29.81.2:49808 supertenant DELETE /_/proxy/tenants/diku/modules/mod-vendors-1.0.1-SNAPSHOT.20 okapi-2.16.0
2018-07-06 07:16:16,820 INFO  ProxyContext         502273/proxy REQ 222.29.81.2:50203 supertenant DELETE /_/proxy/tenants/diku/modules/mod-users-bl-3.0.0-SNAPSHOT.17 okapi-2.16.0
2018-07-06 07:16:34,644 INFO  ProxyContext         212664/proxy REQ 222.29.81.2:50232 supertenant DELETE /_/proxy/tenants/diku/modules/mod-users-15.1.0-SNAPSHOT.36 okapi-2.16.0
2018-07-06 07:16:57,183 INFO  ProxyContext         721562/proxy REQ 222.29.81.2:50290 supertenant DELETE /_/proxy/tenants/diku/modules/folio_inventory-1.0.3000202 okapi-2.16.0
2018-07-06 07:17:27,530 INFO  ProxyContext         877069/proxy REQ 222.29.81.2:50339 supertenant DELETE /_/proxy/tenants/diku/modules/folio_organization-2.2.100085 okapi-2.16.0
2018-07-06 07:17:57,615 INFO  ProxyContext         476217/proxy REQ 222.29.81.2:50395 supertenant DELETE /_/proxy/tenants/diku/modules/mod-users-bl-3.0.0-SNAPSHOT.17 okapi-2.16.0
2018-07-06 07:18:16,390 INFO  ProxyContext         685537/proxy REQ 222.29.81.2:50428 supertenant DELETE /_/proxy/tenants/diku/modules/folio_stripes-core-2.10.2000301 okapi-2.16.0
2018-07-06 07:18:30,217 INFO  ProxyContext         352045/proxy REQ 222.29.81.2:50444 supertenant DELETE /_/proxy/tenants/diku/modules/folio_stripes-smart-components-1.4.18000201 okapi-2.16.0
2018-07-06 07:18:43,544 INFO  ProxyContext         765735/proxy REQ 222.29.81.2:50475 supertenant DELETE /_/proxy/tenants/diku/modules/mod-vendors-1.0.1-SNAPSHOT.20 okapi-2.16.0
2018-07-06 07:19:06,439 INFO  ProxyContext         058781/proxy REQ 222.29.81.2:50521 supertenant DELETE /_/proxy/tenants/diku/modules/mod-users-bl-3.0.0-SNAPSHOT.17 okapi-2.16.0
2018-07-06 09:50:03,677 INFO  ProxyContext         108587/proxy REQ 31.108.64.95:36353 supertenant POST /_/proxy/tenants/diku/modules okapi-2.16.0
2018-07-06 10:40:33,517 INFO  ProxyContext         621982/proxy REQ 5.57.53.20:48130 supertenant POST /_/proxy/tenants/diku/upgrade okapi-2.16.0
2018-07-06 10:42:12,559 INFO  ProxyContext         950347/proxy REQ 5.57.53.20:48152 supertenant POST /_/proxy/tenants/diku/upgrade okapi-2.16.0
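
For reference, the same filtering can be sketched in Python instead of grep. The regular expression below is an assumption derived only from the log lines quoted above, not from any documented Okapi log format, and the function name is hypothetical. It counts DELETE requests against a tenant's modules per client address.

```python
import re
from collections import Counter

# Pattern inferred from the excerpt above; not an official Okapi log format.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+\w+\s+ProxyContext\s+"
    r"\S+ REQ (?P<ip>[\d.]+):\d+ (?P<tenant>\S+) (?P<method>[A-Z]+) (?P<path>\S+)"
)

def deletes_by_client(lines, tenant="diku"):
    """Count module DELETE requests for one tenant, grouped by client IP."""
    prefix = f"/_/proxy/tenants/{tenant}/modules/"
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m and m["method"] == "DELETE" and m["path"].startswith(prefix):
            counts[m["ip"]] += 1
    return counts

# Three lines copied from the excerpt: two DELETEs and one POST.
SAMPLE = [
    "2018-07-06 07:10:23,974 INFO  ProxyContext         758701/proxy REQ 222.29.81.2:49466 supertenant DELETE /_/proxy/tenants/diku/modules/mod-users-bl-3.0.0-SNAPSHOT.17 okapi-2.16.0",
    "2018-07-06 07:12:50,382 INFO  ProxyContext         440252/proxy REQ 222.29.81.2:49780 supertenant DELETE /_/proxy/tenants/diku/modules/okapi-2.16.0 okapi-2.16.0",
    "2018-07-06 09:50:03,677 INFO  ProxyContext         108587/proxy REQ 31.108.64.95:36353 supertenant POST /_/proxy/tenants/diku/modules okapi-2.16.0",
]

print(deletes_by_client(SAMPLE))
```

Grouping by client address makes it easy to see whether the deletions come from a single source.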
Comment by Heikki Levanto [ 06/Jul/18 ]

whois says that 222.29.81.2 belongs to Peking University. The timing of the requests seems to indicate manual operations, rather than a script. I think this can explain a lot! I am still not sure if I believe in malicious intent, or simple stupidity...
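
The timing argument can be made concrete by computing the gaps between the consecutive DELETE requests in the log excerpt. This is just a sketch; the function name is made up, and the timestamps are copied from the log lines in the previous comment.

```python
from datetime import datetime

def gaps_seconds(timestamps):
    """Return the gaps, in seconds, between consecutive log timestamps."""
    times = [datetime.strptime(t, "%Y-%m-%d %H:%M:%S,%f") for t in timestamps]
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]

# Timestamps of the 15 DELETE requests from 222.29.81.2, in log order.
DELETE_TIMES = [
    "2018-07-06 07:10:23,974", "2018-07-06 07:11:06,289",
    "2018-07-06 07:11:25,945", "2018-07-06 07:12:09,270",
    "2018-07-06 07:12:50,382", "2018-07-06 07:13:00,709",
    "2018-07-06 07:16:16,820", "2018-07-06 07:16:34,644",
    "2018-07-06 07:16:57,183", "2018-07-06 07:17:27,530",
    "2018-07-06 07:17:57,615", "2018-07-06 07:18:16,390",
    "2018-07-06 07:18:30,217", "2018-07-06 07:18:43,544",
    "2018-07-06 07:19:06,439",
]

gaps = gaps_seconds(DELETE_TIMES)
print(f"min gap {min(gaps):.1f}s, max gap {max(gaps):.1f}s")
```

The gaps range from roughly 10 seconds to a little over 3 minutes and are quite irregular, which fits someone clicking or typing by hand better than a script running in a loop.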

Comment by Hongwei Ji [ 10/Jul/18 ]

Hi Heikki Levanto, I checked with the CALIS folks in Beijing. It turns out that an intern was experimenting with the folio-testing site and did not really know what he was doing. They feel very sorry about that and assure us it will not happen again.

Comment by Heikki Levanto [ 10/Jul/18 ]

I figured it was likely to be something like that. In a way, it was a good reminder that we have to use our permission system to secure our publicly visible installations.

Comment by Mike Taylor [ 10/Jul/18 ]

Yes, this has been a timely wake-up call that has come with relatively little damage.

Comment by Heikki Levanto [ 11/Jul/18 ]

I think we can close this now. The mystery is solved.

Generated at Thu Feb 08 23:12:32 UTC 2024 using Jira 1001.0.0-SNAPSHOT#100246-sha1:7a5c50119eb0633d306e14180817ddef5e80c75d.