Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Table of Contents
Overview

...

Top long query for failed job on third tenant during DI Create job with 10k- SELECT jsonb,id FROM fs07000002_mod_inventory_storage.instance_holdings_item_view. Average latency- 386455.99 ms/call

Test Runs 

Test #

Scenario

Load level
1 - Concurrent Create importsDI MARC Bib Create10K, 25K concurrently (with 5 min pause) on 2 and 3 tenants
2 - Concurrent Update importsDI MARC Bib Update10K, 25K concurrently (with 5 min pause) on 2 and 3 tenants
3 - Concurrent Create imports ("smoke test") of 50K DI MARC Bib Create50k concurrently on 3 tenants 

...

Update jobs

DB Connections

Create jobs DB connections for 2 tenants Create jobs tenants  - 710, for 3 tenants Create jobs - 870

Update jobs DB connections for 2 tenants Create jobs tenants  - 630, for 3 tenants Create jobs - 785

DB connections needed for every additional job processing concurrently on different tenant - 150.


DB Connections for Create jobs

DB Connections for Update jobs

DB load

Create jobs

...

Expand
titleLogs

pcp1/mod-search 

failure in bulk execution - 186 errors during all update jobs, >4000 errors during create jobs

10:59:59 [] [] [] [] WARN KafkaMessageListener Failed to index resource event [eventType: CREATE, tenantId: fs09000000, id: 5cc8ef78-cb05-49fa-8274-1cba1d660aad]

index [pcp1_instance_fs09000000], id [f7aea9b8-614e-4050-9dbd-e2f8a884c06b], message [OpenSearchException[OpenSearch exception [type=circuit_breaking_exception, reason=[parent] Data too large, data for [indices:data/write/bulk[s]] would be [16502737514/15.3gb], which is larger than the limit of [16320875724/15.1gb], real usage: [16499671264/15.3gb], new bytes reserved: [3066250/2.9mb], usages [request=0/0b, fielddata=0/0b, in_flight_requests=3103382/2.9mb]]]]

feign.FeignException$Unauthorized: [401 Unauthorized] during [GET] to [http://inventory-view/instances?query=id%3D%3D%28%221e9b752b-6cc3-433b-ae90-cbafdc307cb6%22%29&limit=1] [InventoryViewClient#getInstances(CqlQuery,int)]: [Invalid token]
org.folio.search.exception.SearchOperationException: Failed to perform elasticsearch request [index=pcp1_contributor_fs09000000, type=bulkApi, message: 30,000 milliseconds timeout on connection http-outgoing-265 [ACTIVE]]


Caused by: java.net.SocketTimeoutException: 30,000 milliseconds timeout on connection http-outgoing-294 [ACTIVE]
WARN essageBatchProcessor Failed to process batch, attempting to process resources one by one


feign.RetryableException: timeout executing GET http://inventory-view/instances?query=id%3D%3D%28%22c4ac5388-4b64-4de5-8930-d3be806c1b7f%22%20or%20%229228d6de-c8e3-4e62-85c7-f5b8b45fb649%22%20or%20%2288913517-17dc-4d90-be8b-9560c1f30a01%22%20or%20%220514f12a-680d-415d-b262-ba82f4dd3e76%22%20or%20%2205253541-bf8a-465a-b91d-b2cb4ff0944d%22%20or%20%222f7b4924-8781-4834-b25f-36fc430d8f5d%22%20or%20%22b2dad08a-05e9-41d2-8742-25c98daf7fbe%22%20or%20%22be30bb11-58da-4b06-beea-e204a0823438%22%20or%20%22930b4625-8eb3-492a-bae8-388480364e67%22%20or%20%2223925f8a-efb4-41a2-9002-48292f6419f3%22%20or%20%22765cb2e6-96a4-4b33-99fd-b38011be999f%22%20or%20%22a3666aca-4963-4e94-95ab-dd3d790ffdd3%22%20or%20%2226bb45e4-dfc0-4915-9ec8-0187d334651d%22%20or%20%220c536304-7330-46d2-a19e-9b69ca13591a%22%20or%20%22ee237647-875c-405e-8ce1-fd55f701d83b%22%20or%20%22869ba2aa-f465-42b4-b4ba-b47ccd29d6ac%22%20or%20%221aa2e06e-b647-4e7b-8fa8-9804c65e1dc1%22%20or%20%22a7096f26-0363-49b4-803c-31e5661b12de%22%20or%20%22042395fb-34a7-4412-a78a-d541ef948922%22%20or%20%2273342f96-f5b5-42d6-ab41-1ef121aef0d5%22%20or%20%22fa028d18-e45e-434d-85d2-e8c4e5db9519%22%20or%20%22c0d1a681-66ea-4cc8-bf68-5771cf8a93e9%22%20or%20%22d7ece421-5dae-4519-9d09-100f22b47007%22%20or%20%2225ca20d1-d0e1-414a-b228-b74ceaba2512%22%20or%20%2274fcad26-9ddf-4cf9-8206-f665374a37f3%22%20or%20%221fd756cd-e573-4287-9c3e-c29408ee8709%22%20or%20%22a322007c-f578-4c93-9239-e6558a393710%22%20or%20%22a2ed3841-a9d2-4364-a3fe-939e2bffbe24%22%20or%20%22b31f92f9-027e-40bf-a5b1-bb4e50241a46%22%20or%20%2204175b0b-9dea-403b-911c-82d5f6a2fbe2%22%20or%20%22f4ebc0f9-adb7-4458-a4ae-8d7996b3b4f8%22%20or%20%22e1e6f55c-7720-4a23-a73a-3202746c7c75%22%20or%20%2238a43c32-d2de-454f-b7d7-7725b5bab61e%22%20or%20%22b8f17eec-f61c-4b29-9769-3b8d91a6dae4%22%20or%20%22e93b2a0c-939b-4c31-98f9-c85cb52081eb%22%20or%20%22983fd8c1-cd4a-4087-a4b9-b1c3dd11c08e%22%20or%20%222b8a75db-2ec2-4190-8702-c4dca1067bf6%22%20or%20%22f73f7d21-deab-4af4-9163-7f374f1d56d2%22%20or%20%229819b074-3174-429e-8f7f-1d6312d9630f%22%20or%20%2256743d6d-3df9-4bbf-9495-cc9f2b95e60b%22%20or%20%229f146268-ad02-46b2-8f0d-a5f64fc8579b%22%20or%20%22fb26676f-2d20-4309-8fc5-aa1c09962618%22%20or%20%222b39c2de-6374-46af-b78e-1c83d669c991%22%20or%20%22a7fd3130-6752-40b4-b2a4-cc6f1fee0349%22%20or%20%22e7561b18-c9da-43f8-97d4-9125261de4b6%22%20or%20%221106c6ed-519b-4efa-a73c-deae1dc0570d%22%20or%20%220ea912aa-6107-490f-8855-350c7edc0060%22%20or%20%225ec701f4-af40-4464-b6a4-ba9f14ca7d28%22%20or%20%22e2a15de7-b793-4949-beed-9f56bd9cde9d%22%20or%20%22879210f9-cae8-4202-9dcd-89657a5f8113%22%29&limit=50



pcp1/mod-authtoken
org.folio.auth.authtokenmodule.tokens.TokenValidationException: Access token has expired

number of errors - 23400. The errors happen only during DI in fs07000002 tenant
filter @logStream like "pcp1/mod-authtoken"
filter @message like "ERROR FilterApi"
13:48:00 [595516/users] [fs07000002] [] [mod-authtoken] ERROR FilterApi Unable to retrieve permissions for system-user: User does not exist: 8cc96687-ea63-44cb-ab5f-a73bc6985324 request took 7 ms

...

Infrastructure

PTF -environment pcp1

  • 10 m6i.2xlarge EC2 instances located in US East (N. Virginia)us-east-1
  • 2 database  instances, writer/reader


    NameMemory GIBvCPUsmax_connections

    db.r6g.xlarge

    32 GiB4 vCPUs2731


  • Number of records in DB:
    •  fs09000000
      • instances - 25.129.941
      • items - 26.299.440
      • holdings - 25.392.570
    • fs07000001
      • nstances - 10.039.613
      • items - 1.423.844
      • holdings - 10.461.259
    • fs07000002
      • nstances - 1.114.273
      • items - 1.106.537
      • holdings - 1.106.539
  • MSK tenant
    • 4 m5.2xlarge brokers in 2 zones
    • Apache Kafka version 2.8.0

    • EBS storage volume per broker 300 GiB

    • auto.create.topics.enable=true
    • log.retention.minutes=480
    • default.replication.factor=3

...