[FOLIO-2741] login fails due to failure to get /service-points Created: 20/Aug/20  Updated: 01/Sep/20  Resolved: 01/Sep/20

Status: Closed
Project: FOLIO
Components: None
Affects versions: None
Fix versions: None

Type: Bug Priority: P2
Reporter: Ian Hardy Assignee: Ian Hardy
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original estimate: Not Specified

Attachments: File inventory-storage-sample.log    
Issue links:
Relates
relates to MODINVSTOR-559 Release mod-inventory-storage 19.3.2 Closed
relates to RMB-677 Close PostgreSQL connection after inv... Closed
relates to RMB-700 NPE when RestVerticle calls LogUtil.f... Closed
Sprint: DevOps: Sprint 95, DevOps: Sprint 96
Development Team: FOLIO DevOps
Release: Q3 2020

 Description   

Overview:
Every so often, login stops working on the reference environments. The user enters credentials and the submit hangs until it times out. Restarting mod-inventory-storage resolves this. The Okapi log shows a login request that gets no response from mod-inventory-storage until the module is restarted.
Steps to Reproduce:
It is not yet clear how to get a system into this state.

Expected Results:
The user can log in through the UI.
Actual Results:
Login hangs.

Follow the request ID (808230) through the Okapi logs below, and note the time difference between the early entries and the final 500.

ubuntu@ip-10-36-1-13:~$ grep -r 808230 /var/log/folio/okapi/
/var/log/folio/okapi/okapi-5.log:2020-08-20T11:22:37,786 INFO  ProxyContext         808230/service-points REQ 10.36.1.47:63632 diku GET /service-points  mod-authtoken-2.6.0-SNAPSHOT.73 mod-inventory-storage-19.4.0-SNAPSHOT.457
/var/log/folio/okapi/okapi-5.log:2020-08-20T11:22:37,894 INFO  ProxyContext         808230/service-points RES 202 107370us mod-authtoken-2.6.0-SNAPSHOT.73 http://10.36.1.13:9178/service-points?limit=10000&query=cql.allRecords%3D1%20sortby%20name
/var/log/folio/okapi/okapi-10.log:2020-08-20T12:36:53,777 WARN  ProxyContext         808230/service-points proxyRequestHttpClient failure: http://10.36.1.13:9168: Connection was closed
/var/log/folio/okapi/okapi-10.log:2020-08-20T12:36:53,777 INFO  ProxyContext         808230/service-points RES 500 4455883219us okapi proxyClient failure: mod-inventory-storage-19.4.0-SNAPSHOT.457 http://10.36.1.13:9168: Connection was closed
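As a sanity check on the timestamps above (my own verification, not part of the ticket), the microsecond duration Okapi reports in the final RES line can be converted to minutes and seconds; it should match the wall-clock gap between the REQ at 11:22:37 and the 500 at 12:36:53.

```shell
# Convert the 4455883219us duration from the RES 500 line to minutes/seconds.
awk 'BEGIN { s = int(4455883219 / 1000000); printf "%dm%02ds\n", s/60, s%60 }'
# -> 74m15s  (matches 11:22:37 -> 12:36:53, about 74 minutes)
```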

See the attached sample from mod-inventory-storage, which shows the restart and the pending request finally returning once the PostgreSQL client is closed.



 Comments   
Comment by Ian Hardy [ 20/Aug/20 ]

It happened again; I didn't notice much new. One thing that is a little unusual: 107 open connections like this (sample from netstat):

tcp        0      0 81efab5fb944:sunproxyadmin ip-10-36-1-13.ec2.internal:58270 ESTABLISHED 
tcp        0      0 81efab5fb944:sunproxyadmin ip-10-36-1-13.ec2.internal:58686 ESTABLISHED 
tcp        0      0 81efab5fb944:sunproxyadmin ip-10-36-1-13.ec2.internal:58258 ESTABLISHED 
tcp        0      0 81efab5fb944:sunproxyadmin ip-10-36-1-13.ec2.internal:58768 ESTABLISHED 
tcp        0      0 81efab5fb944:sunproxyadmin ip-10-36-1-13.ec2.internal:58782 ESTABLISHED 
tcp        0      0 81efab5fb944:sunproxyadmin ip-10-36-1-13.ec2.internal:58192 ESTABLISHED 
tcp        0      0 81efab5fb944:sunproxyadmin ip-10-36-1-13.ec2.internal:58918 ESTABLISHED 

After a restart those are cleared and netstat shows a more normal result:

Proto Recv-Q Send-Q Local Address           Foreign Address         State       
tcp        0      0 81efab5fb944:56306      ip-10-36-1-13.ec2.internal:postgresql ESTABLISHED 
tcp        0      0 81efab5fb944:56310      ip-10-36-1-13.ec2.internal:postgresql ESTABLISHED 
tcp        0      0 81efab5fb944:56302      ip-10-36-1-13.ec2.internal:postgresql ESTABLISHED 
tcp        0      0 81efab5fb944:56304      ip-10-36-1-13.ec2.internal:postgresql ESTABLISHED 
tcp        0      0 81efab5fb944:56308      ip-10-36-1-13.ec2.internal:postgresql ESTABLISHED 

Watching the container after a restart, I see those connections opening and getting cleaned up.
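A leak like the 107 connections in the first sample can be quantified with a quick tally of ESTABLISHED sockets per foreign host (a hypothetical helper, `count_established`, not part of FOLIO; in live use you would pipe `netstat -tan` into it). Here it is run against the sample lines from the comment above:

```shell
# Tally ESTABLISHED sockets per foreign host; a connection leak shows up as
# one peer with an outsized count. Field 5 is the foreign address; the final
# :port (or :service-name) segment is stripped before counting.
count_established() {
  awk '$6 == "ESTABLISHED" { sub(/:[^:]*$/, "", $5); n[$5]++ }
       END { for (h in n) print n[h], h }' | sort -rn
}

# Run against the seven sample netstat lines quoted above:
count_established <<'EOF'
tcp        0      0 81efab5fb944:sunproxyadmin ip-10-36-1-13.ec2.internal:58270 ESTABLISHED
tcp        0      0 81efab5fb944:sunproxyadmin ip-10-36-1-13.ec2.internal:58686 ESTABLISHED
tcp        0      0 81efab5fb944:sunproxyadmin ip-10-36-1-13.ec2.internal:58258 ESTABLISHED
tcp        0      0 81efab5fb944:sunproxyadmin ip-10-36-1-13.ec2.internal:58768 ESTABLISHED
tcp        0      0 81efab5fb944:sunproxyadmin ip-10-36-1-13.ec2.internal:58782 ESTABLISHED
tcp        0      0 81efab5fb944:sunproxyadmin ip-10-36-1-13.ec2.internal:58192 ESTABLISHED
tcp        0      0 81efab5fb944:sunproxyadmin ip-10-36-1-13.ec2.internal:58918 ESTABLISHED
EOF
# -> 7 ip-10-36-1-13.ec2.internal
```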

Comment by Jakub Skoczen [ 21/Aug/20 ]

Cate Boerema can we get some help from Core:functional devs to get this figured out?

Comment by Cate Boerema (Inactive) [ 21/Aug/20 ]

Sure, Jakub Skoczen. What's the priority? Is this must-fix for the hotfix? Q3?

Comment by Jakub Skoczen [ 24/Aug/20 ]

Cate Boerema It seems high priority, given how frequently this breaks.

Comment by Jakub Skoczen [ 24/Aug/20 ]

Julian Ladisch Do you have any ideas about this? There is clearly an NPE in the logs Ian attached.

Comment by Cate Boerema (Inactive) [ 25/Aug/20 ]

Jakub Skoczen, do you still want help from Core Functional on this? If so, please assign it to us. Thanks

Comment by Ian Hardy [ 25/Aug/20 ]

Cate Boerema yes please. mod-inventory-storage has been requiring at least one restart a day lately. Please let us know if we can gather any more information for this.

Comment by Jakub Skoczen [ 25/Aug/20 ]

Ian Hardy we think this might be fixed once we migrate MIS over to RMB 30.2.6 and 31.0.2. Marc Johnson can you merge Adam's PR to bump the RMB version on master?

Comment by Marc Johnson [ 25/Aug/20 ]

Jakub Skoczen

can you merge Adam's PR to bump the RMB version on master?

Sure, done.

The Core Platform team can review and merge these changes autonomously (using the agreed development process)

Comment by Ian Hardy [ 26/Aug/20 ]

Thanks, all. It looks like the update was merged. I'll assign this back to myself to watch and see whether that fixed it.

Looks promising so far: netstat shows a "normal"-looking number of connections (6 instead of 350), and no one has complained yet. If there are no more reports of mod-inventory-storage hanging, I'll close this on Friday.

Comment by Ian Hardy [ 01/Sep/20 ]

The update to RMB resolved this.

Generated at Thu Feb 08 23:22:54 UTC 2024 using Jira 1001.0.0-SNAPSHOT#100246-sha1:7a5c50119eb0633d306e14180817ddef5e80c75d.