[FOLIO-2891] AWS load balancer timeout issue on Rancher Created: 25/Nov/20  Updated: 04/Jan/21  Resolved: 04/Jan/21

Status: Closed
Project: FOLIO
Components: None
Affects versions: None
Fix versions: None

Type: Bug Priority: P3
Reporter: Natalia Zaitseva Assignee: Stanislav Miroshnichenko
Resolution: Done Votes: 0
Labels: KT-ID
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original estimate: Not Specified

Attachments: PNG File Screenshot 2020-11-25 at 12.02.35.png    
Sprint: DevOps: Sprint 104
Story Points: 1
Development Team: FOLIO DevOps

 Description   

The Spitfire team encountered an issue where a request takes more than 1 minute to process and the end user receives a 504 Gateway Time-out response.

Detailed information:

  • team - Spitfire
  • module - mod-kb-ebsco-java
  • application name on Rancher - mod-kb-ebsco-java-poc
  • tenant - exportpoc
  • request - ... /eholdings/packages/<package_id>/resources/costperuse/export?platform=<platform_type>&fiscalYear=<year>
    example
    /eholdings/packages/36-7191/resources/costperuse/export?platform=all&fiscalYear=2019
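For reference, the failing request can be reproduced with a plain curl call. This is a minimal sketch only: the Okapi host is a placeholder (not the real cluster URL), and the X-Okapi-Token value must come from a prior login.

```shell
# Hypothetical values -- OKAPI_HOST is a placeholder, not the real endpoint.
OKAPI_HOST="https://okapi.example.org"
PACKAGE_ID="36-7191"
URL="${OKAPI_HOST}/eholdings/packages/${PACKAGE_ID}/resources/costperuse/export?platform=all&fiscalYear=2019"
echo "$URL"

# -m 900 keeps the client waiting up to 15 minutes so curl itself does not
# give up before the server (or an intermediate load balancer) responds:
# curl -s -m 900 -H "X-Okapi-Tenant: exportpoc" -H "X-Okapi-Token: ${OKAPI_TOKEN}" "$URL"
```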
    

Actual result:
According to the logs, request processing takes over 2 minutes (09:50:46 to 09:53:05):

25 Nov 2020 09:50:46:723 INFO  ExportServiceImpl [] PACKAGE - 36-7191
...
25 Nov 2020 09:53:05:045 INFO  LogUtil [reqId=009076/eholdings] 192.168.179.174:33908 GET /eholdings/packages/36-7191/resources/costperuse/export platform=all&fiscalYear=2019 HTTP_1_1 200 4000110 80946 tid=exportpoc OK 

Postman response:

The response header contains

 Server: awselb/2.0 

which indicates the 504 response comes from the AWS load balancer, whose default idle timeout (60 seconds) is shorter than the request processing time.



 Comments   
Comment by John Malconian [ 15/Dec/20 ]

Hi Stanislav - There are two ingress endpoints that should have idle timeout increased. This is the flow: AWS ALB Ingress->Nginx Ingress->Okapi

The settings should be made in the alb ingress manifest and the nginx/okapi ingress manifest. In both cases you need to add an annotation. On the folio-eks-us-west-2 cluster:

$ kubectl -n kube-system get ingress     
NAME                                HOSTS               ADDRESS                                                                  PORTS   AGE
alb-ingress-folio-eks-2-us-west-2   *                   f2b6996c-kubesystem-albing-accc-1096161577.us-west-2.elb.amazonaws.com   80      181d

I've already added the necessary annotation, 'alb.ingress.kubernetes.io/load-balancer-attributes: idle_timeout.timeout_seconds=300', to the alb ingress on folio-eks-us-west-2. This sets the idle timeout to 5 minutes.

$ kubectl -n kube-system get ingress alb-ingress-folio-eks-2-us-west-2 -o yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    alb.ingress.kubernetes.io/actions.ssl-redirect: '{"Type": "redirect", "RedirectConfig":
      { "Protocol": "HTTPS", "Port": "443", "StatusCode": "HTTP_301"}}'
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-west-2:732722833398:certificate/b1e1ca4b-0f0a-41c8-baaa-8b64a1cd4e0a
    alb.ingress.kubernetes.io/healthcheck-path: /healthz
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP":80,"HTTPS": 443}]'
    alb.ingress.kubernetes.io/load-balancer-attributes: idle_timeout.timeout_seconds=300
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/security-groups: sg-003280ea7c76f431f
    alb.ingress.kubernetes.io/success-codes: 200-399
    alb.ingress.kubernetes.io/target-type: instance

For each of the okapi ingress manifests, add the following annotation:

nginx.ingress.kubernetes.io/proxy-read-timeout: "300"
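For illustration, the annotation belongs under metadata.annotations in each ingress manifest. A minimal sketch, with placeholder name, host, and service values rather than the real okapi manifest:

```yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: okapi                       # placeholder name
  annotations:
    # Tells the nginx ingress controller to wait up to 300 s for a
    # response from the backend before returning 504.
    nginx.ingress.kubernetes.io/proxy-read-timeout: "300"
spec:
  rules:
    - host: okapi.example.org       # placeholder host
      http:
        paths:
          - backend:
              serviceName: okapi    # placeholder service
              servicePort: 9130
```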
Comment by Stanislav Miroshnichenko [ 16/Dec/20 ]

Thank you John Malconian

Comment by Stanislav Miroshnichenko [ 18/Dec/20 ]

Added a 900-second timeout for:

  • okapi helm chart version >= 0.3.25
  • edge-oai-pmh helm chart version >= 0.1.27

Ingress 'alb-ingress-folio-eks-2-us-west-2' idle_timeout increased to 900 sec.

Annotations:

  • nginx.ingress.kubernetes.io/proxy-connect-timeout: "900"
  • nginx.ingress.kubernetes.io/proxy-read-timeout: "900"
  • nginx.ingress.kubernetes.io/proxy-send-timeout: "900"
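As a sketch of how those annotations might be supplied through the charts: assuming the okapi chart accepts a conventional ingress.annotations map in its values (the exact key is chart-specific and not confirmed here), an override could look like:

```yaml
# values.yaml override -- illustrative only; the real values key depends
# on the okapi helm chart's own schema (version >= 0.3.25).
ingress:
  annotations:
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "900"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "900"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "900"
```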
Generated at Thu Feb 08 23:24:01 UTC 2024 using Jira 1001.0.0-SNAPSHOT#100246-sha1:7a5c50119eb0633d306e14180817ddef5e80c75d.