Overview

...

Top long-running query for the failed job on the third tenant during the DI Create job with 10K records: SELECT jsonb,id FROM fs07000002_mod_inventory_storage.instance_holdings_item_view. Average latency: 386455.99 ms/call (about 6.4 minutes per call).

Test Runs

Test # | Scenario | Load level
1 - Concurrent Create imports | DI MARC Bib Create | 10K, 25K concurrently (with 5 min pause) on 2 and 3 tenants
2 - Concurrent Update imports | DI MARC Bib Update | 10K, 25K concurrently (with 5 min pause) on 2 and 3 tenants
3 - Concurrent Create imports ("smoke test") of 50K | DI MARC Bib Create | 50K concurrently on 3 tenants

...

Logs

pcp1/mod-search 

Failure in bulk execution: 186 errors across all Update jobs, more than 4,000 errors during Create jobs.

10:59:59 [] [] [] [] WARN KafkaMessageListener Failed to index resource event [eventType: CREATE, tenantId: fs09000000, id: 5cc8ef78-cb05-49fa-8274-1cba1d660aad]

index [pcp1_instance_fs09000000], id [f7aea9b8-614e-4050-9dbd-e2f8a884c06b], message [OpenSearchException[OpenSearch exception [type=circuit_breaking_exception, reason=[parent] Data too large, data for [indices:data/write/bulk[s]] would be [16502737514/15.3gb], which is larger than the limit of [16320875724/15.1gb], real usage: [16499671264/15.3gb], new bytes reserved: [3066250/2.9mb], usages [request=0/0b, fielddata=0/0b, in_flight_requests=3103382/2.9mb]]]]
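
The byte counts in this circuit_breaking_exception can be checked with a little arithmetic. The sketch below is only an illustration: the helper name `parse_breaker` and the regular expressions are assumptions made for this example, and the reason text is copied from the log entry above. It shows that real usage plus the newly reserved bytes equals the "would be" figure, and that real heap usage was already above the parent breaker limit before the 2.9 MB bulk request was added.

```python
import re

# Reason text copied from the mod-search log entry above (trailing usages omitted).
REASON = (
    "[parent] Data too large, data for [indices:data/write/bulk[s]] would be "
    "[16502737514/15.3gb], which is larger than the limit of "
    "[16320875724/15.1gb], real usage: [16499671264/15.3gb], "
    "new bytes reserved: [3066250/2.9mb]"
)


def parse_breaker(reason: str) -> dict:
    """Extract the raw byte counts from a circuit_breaking_exception reason.

    Hypothetical helper written for this report, not part of any client library.
    """
    patterns = {
        "would_be": r"would be \[(\d+)/",
        "limit": r"limit of \[(\d+)/",
        "real_usage": r"real usage: \[(\d+)/",
        "new_reserved": r"new bytes reserved: \[(\d+)/",
    }
    return {key: int(re.search(p, reason).group(1)) for key, p in patterns.items()}


stats = parse_breaker(REASON)
overshoot = stats["would_be"] - stats["limit"]

# real usage + new reservation is exactly the "would be" value
print(stats["real_usage"] + stats["new_reserved"] == stats["would_be"])  # True
# bytes over the limit, and the same value in MiB
print(overshoot, round(overshoot / 2**20, 1))  # 181861790 173.4
# the heap was over the limit even without the new request
print(stats["real_usage"] > stats["limit"])  # True
```

Since the request itself reserved only about 2.9 MB, the numbers suggest that overall heap pressure on the node, rather than the size of an individual bulk request, tripped the breaker.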

feign.FeignException$Unauthorized: [401 Unauthorized] during [GET] to [http://inventory-view/instances?query=id%3D%3D%28%221e9b752b-6cc3-433b-ae90-cbafdc307cb6%22%29&limit=1] [InventoryViewClient#getInstances(CqlQuery,int)]: [Invalid token]
org.folio.search.exception.SearchOperationException: Failed to perform elasticsearch request [index=pcp1_contributor_fs09000000, type=bulkApi, message: 30,000 milliseconds timeout on connection http-outgoing-265 [ACTIVE]]


Caused by: java.net.SocketTimeoutException: 30,000 milliseconds timeout on connection http-outgoing-294 [ACTIVE]
WARN essageBatchProcessor Failed to process batch, attempting to process resources one by one


feign.RetryableException: timeout executing GET http://inventory-view/instances?query=id%3D%3D%28%22c4ac5388-4b64-4de5-8930-d3be806c1b7f%22%20or%20%229228d6de-c8e3-4e62-85c7-f5b8b45fb649%22%20or%20%2288913517-17dc-4d90-be8b-9560c1f30a01%22%20or%20%220514f12a-680d-415d-b262-ba82f4dd3e76%22%20or%20%2205253541-bf8a-465a-b91d-b2cb4ff0944d%22%20or%20%222f7b4924-8781-4834-b25f-36fc430d8f5d%22%20or%20%22b2dad08a-05e9-41d2-8742-25c98daf7fbe%22%20or%20%22be30bb11-58da-4b06-beea-e204a0823438%22%20or%20%22930b4625-8eb3-492a-bae8-388480364e67%22%20or%20%2223925f8a-efb4-41a2-9002-48292f6419f3%22%20or%20%22765cb2e6-96a4-4b33-99fd-b38011be999f%22%20or%20%22a3666aca-4963-4e94-95ab-dd3d790ffdd3%22%20or%20%2226bb45e4-dfc0-4915-9ec8-0187d334651d%22%20or%20%220c536304-7330-46d2-a19e-9b69ca13591a%22%20or%20%22ee237647-875c-405e-8ce1-fd55f701d83b%22%20or%20%22869ba2aa-f465-42b4-b4ba-b47ccd29d6ac%22%20or%20%221aa2e06e-b647-4e7b-8fa8-9804c65e1dc1%22%20or%20%22a7096f26-0363-49b4-803c-31e5661b12de%22%20or%20%22042395fb-34a7-4412-a78a-d541ef948922%22%20or%20%2273342f96-f5b5-42d6-ab41-1ef121aef0d5%22%20or%20%22fa028d18-e45e-434d-85d2-e8c4e5db9519%22%20or%20%22c0d1a681-66ea-4cc8-bf68-5771cf8a93e9%22%20or%20%22d7ece421-5dae-4519-9d09-100f22b47007%22%20or%20%2225ca20d1-d0e1-414a-b228-b74ceaba2512%22%20or%20%2274fcad26-9ddf-4cf9-8206-f665374a37f3%22%20or%20%221fd756cd-e573-4287-9c3e-c29408ee8709%22%20or%20%22a322007c-f578-4c93-9239-e6558a393710%22%20or%20%22a2ed3841-a9d2-4364-a3fe-939e2bffbe24%22%20or%20%22b31f92f9-027e-40bf-a5b1-bb4e50241a46%22%20or%20%2204175b0b-9dea-403b-911c-82d5f6a2fbe2%22%20or%20%22f4ebc0f9-adb7-4458-a4ae-8d7996b3b4f8%22%20or%20%22e1e6f55c-7720-4a23-a73a-3202746c7c75%22%20or%20%2238a43c32-d2de-454f-b7d7-7725b5bab61e%22%20or%20%22b8f17eec-f61c-4b29-9769-3b8d91a6dae4%22%20or%20%22e93b2a0c-939b-4c31-98f9-c85cb52081eb%22%20or%20%22983fd8c1-cd4a-4087-a4b9-b1c3dd11c08e%22%20or%20%222b8a75db-2ec2-4190-8702-c4dca1067bf6%22%20or%20%22f73f7d21-deab-4af4-9163-7f374f1d56d2%22%20or%20%22
9819b074-3174-429e-8f7f-1d6312d9630f%22%20or%20%2256743d6d-3df9-4bbf-9495-cc9f2b95e60b%22%20or%20%229f146268-ad02-46b2-8f0d-a5f64fc8579b%22%20or%20%22fb26676f-2d20-4309-8fc5-aa1c09962618%22%20or%20%222b39c2de-6374-46af-b78e-1c83d669c991%22%20or%20%22a7fd3130-6752-40b4-b2a4-cc6f1fee0349%22%20or%20%22e7561b18-c9da-43f8-97d4-9125261de4b6%22%20or%20%221106c6ed-519b-4efa-a73c-deae1dc0570d%22%20or%20%220ea912aa-6107-490f-8855-350c7edc0060%22%20or%20%225ec701f4-af40-4464-b6a4-ba9f14ca7d28%22%20or%20%22e2a15de7-b793-4949-beed-9f56bd9cde9d%22%20or%20%22879210f9-cae8-4202-9dcd-89657a5f8113%22%29&limit=50
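
The percent-encoded inventory-view URLs above are hard to read in raw form. As a minimal sketch, they can be decoded with Python's standard `urllib.parse`. The function name `decode_inventory_view_url` is an assumption for this example, and the second sample URL is a shortened two-id version of the batch request above (which carries limit=50), not a line from the logs.

```python
import re
from urllib.parse import parse_qs, urlsplit

UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)


def decode_inventory_view_url(url: str):
    """Return the decoded CQL query, the limit, and the ids found in the query."""
    params = parse_qs(urlsplit(url).query)  # parse_qs also undoes the %XX encoding
    cql = params["query"][0]
    limit = int(params["limit"][0])
    return cql, limit, UUID_RE.findall(cql)


# URL copied from the 401 Unauthorized log entry above.
single = (
    "http://inventory-view/instances?query=id%3D%3D%28%221e9b752b-6cc3-433b-"
    "ae90-cbafdc307cb6%22%29&limit=1"
)
# Shortened illustration of the batch request: first two ids only.
batch = (
    "http://inventory-view/instances?query=id%3D%3D%28%22c4ac5388-4b64-4de5-"
    "8930-d3be806c1b7f%22%20or%20%229228d6de-c8e3-4e62-85c7-f5b8b45fb649%22%29"
    "&limit=50"
)

cql, limit, ids = decode_inventory_view_url(single)
print(cql)    # id==("1e9b752b-6cc3-433b-ae90-cbafdc307cb6")
print(limit)  # 1

batch_cql, batch_limit, batch_ids = decode_inventory_view_url(batch)
print(batch_cql)
print(len(batch_ids), batch_limit)  # 2 50
```

Comparing the number of ids in a decoded query with its `limit` value is a quick way to see how many instances a timed-out batch request was asking for.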



pcp1/mod-authtoken
org.folio.auth.authtokenmodule.tokens.TokenValidationException: Access token has expired

Number of errors: 23,400. The errors occur only during DI on tenant fs07000002.
filter @logStream like "pcp1/mod-authtoken"
filter @message like "ERROR FilterApi"
13:48:00 [595516/users] [fs07000002] [] [mod-authtoken] ERROR FilterApi Unable to retrieve permissions for system-user: User does not exist: 8cc96687-ea63-44cb-ab5f-a73bc6985324 request took 7 ms
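
To confirm that such errors are confined to one tenant, exported log lines can be counted per tenant. The sketch below assumes the bracketed fields are, in order, request id, tenant id, user id, and module id; that reading is inferred from the sample lines in this report and is not confirmed here. `LOG_RE` and `count_errors_by_tenant` are hypothetical names for this example.

```python
import re
from collections import Counter

# Assumed layout: time [request] [tenant] [user] [module] LEVEL Logger message
LOG_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}) "
    r"\[(?P<request>[^\]]*)\] \[(?P<tenant>[^\]]*)\] "
    r"\[(?P<user>[^\]]*)\] \[(?P<module>[^\]]*)\] "
    r"(?P<level>[A-Z]+)\s+(?P<logger>\S+) (?P<message>.*)$"
)


def count_errors_by_tenant(lines, logger="FilterApi"):
    """Count ERROR lines from the given logger, grouped by tenant id."""
    counts = Counter()
    for line in lines:
        match = LOG_RE.match(line)
        if match and match["level"] == "ERROR" and match["logger"] == logger:
            counts[match["tenant"]] += 1
    return counts


# Two lines copied from the logs in this report.
SAMPLE = [
    "13:48:00 [595516/users] [fs07000002] [] [mod-authtoken] ERROR FilterApi "
    "Unable to retrieve permissions for system-user: User does not exist: "
    "8cc96687-ea63-44cb-ab5f-a73bc6985324 request took 7 ms",
    "10:59:59 [] [] [] [] WARN KafkaMessageListener Failed to index resource "
    "event [eventType: CREATE, tenantId: fs09000000, "
    "id: 5cc8ef78-cb05-49fa-8274-1cba1d660aad]",
]

counts = count_errors_by_tenant(SAMPLE)
print(dict(counts))  # {'fs07000002': 1}
```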

...

Infrastructure

PTF environment: pcp1

  • 10 m6i.2xlarge EC2 instances located in US East (N. Virginia), us-east-1
  • 2 database instances (writer and reader)


    Name | Memory | vCPUs | max_connections
    db.r6g.xlarge | 32 GiB | 4 | 2731


  • Number of records in DB:
    • fs09000000
      • instances - 25,129,941
      • items - 26,299,440
      • holdings - 25,392,570
    • fs07000001
      • instances - 10,039,613
      • items - 1,423,844
      • holdings - 10,461,259
    • fs07000002
      • instances - 1,114,273
      • items - 1,106,537
      • holdings - 1,106,539
  • MSK tenant
    • 4 m5.2xlarge brokers in 2 zones
    • Apache Kafka version 2.8.0

    • EBS storage volume per broker 300 GiB

    • auto.create.topics.enable=true
    • log.retention.minutes=480
    • default.replication.factor=3

...

Module | Task Def. Revision | Module Version | Task Count | Mem Hard Limit | Mem Soft limit | CPU units | Xmx | MetaspaceSize | MaxMetaspaceSize
pcp1-pvt
mod-remote-storage | 10(11)* | 3.0.0 | 2 | 4920 | 4472 | 1024 | 3960 | 512 | 512
mod-data-import | 18(20)* | 3.0.7 | 1 | 2048 | 1844 | 256 | 1292 | 384 | 512
mod-authtoken | 13(16)* | 2.14.1 | 2 | 1440 | 1152 | 512 | 922 | 88 | 128
mod-configuration | 9(10)* | 5.9.2 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
mod-users-bl | 9(10)* | 7.6.0 | 2 | 1440 | 1152 | 512 | 922 | 88 | 128
mod-inventory-storage | 12(15)* | 27.0.3(27.0.4)* | 2 | 4096 | 3690 | 2048 | 3076 | 384 | 512
mod-circulation-storage | 12(14)* | 17.1.3(17.1.7)* | 2 | 2880 | 2592 | 1536 | 1814 | 384 | 512
mod-source-record-storage | 15(18)* | 5.7.3(5.7.5)* | 2 | 5600 | 5000 | 2048 | 3500 | 384 | 512
mod-inventory | 11(14)* | 20.1.3(20.1.7)* | 2 | 2880 | 2592 | 1024 | 1814 | 384 | 512
mod-di-converter-storage | 15(18)* | 2.1.2(2.1.5)* | 2 | 1024 | 896 | 128 | 768 | 88 | 128
mod-circulation | 12(14)* | 24.0.8(24.0.11)* | 2 | 2880 | 2592 | 1536 | 1814 | 384 | 512
mod-pubsub | 11(13)* | 2.11.2(2.11.3)* | 2 | 1536 | 1440 | 1024 | 922 | 384 | 512
mod-patron-blocks | 9(10)* | 1.9.0 | 2 | 1024 | 896 | 1024 | 768 | 88 | 128
mod-source-record-manager | 14(17)* | 3.7.4(3.7.8)* | 2 | 5600 | 5000 | 2048 | 3500 | 384 | 512
mod-quick-marc | 9(11)* | 5.0.0(5.0.1)* | 1 | 2288 | 2176 | 128 | 1664 | 384 | 512
nginx-okapi | 9 | 2023.06.14 | 2 | 1024 | 896 | 128 | 0 | 0 | 0
okapi-b | 11 | 5.1.2 | 3 | 1684 | 1440 | 1024 | 922 | 384 | 512
mod-feesfines | 10(11)* | 19.0.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
pub-okapi | 9 | 2023.06.14 | 2 | 1024 | 896 | 128 | 768 | 0 | 0
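
One way to read the memory columns is to compare each container's hard limit with the sum of its JVM heap (Xmx) and MaxMetaspaceSize settings. The sketch below does this for a few rows copied from the table above. It assumes all values are in the same unit (megabytes) and that heap plus metaspace is a useful lower bound on JVM memory; thread stacks, direct buffers, and other native memory also count against the limit and are not modelled. The `headroom` helper is a hypothetical name for this example.

```python
# (module, hard limit, Xmx, MaxMetaspaceSize), copied from the table above.
# Assumption: all four values share the same unit (MB).
ROWS = [
    ("mod-data-import", 2048, 1292, 512),
    ("mod-inventory-storage", 4096, 3076, 512),
    ("mod-source-record-storage", 5600, 3500, 512),
    ("mod-source-record-manager", 5600, 3500, 512),
    ("mod-inventory", 2880, 1814, 512),
]


def headroom(hard_limit: int, xmx: int, max_metaspace: int) -> int:
    """Memory left under the hard limit after max heap and max metaspace."""
    return hard_limit - (xmx + max_metaspace)


report = {module: headroom(hard, xmx, meta) for module, hard, xmx, meta in ROWS}
for module, left in report.items():
    print(f"{module}: {left} MB headroom")
```

A small or negative result would mean a container could hit its hard limit before the JVM reaches its own configured ceilings. For the rows shown, the headroom ranges from 244 to 1588.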


All modules

Module | Task Def. Revision | Module Version | Task Count | Mem Hard Limit | Mem Soft limit | CPU units | Xmx | MetaspaceSize | MaxMetaspaceSize
pcp1-pvt
Fri Mar 15 17:12:45 UTC 2024
mod-remote-storage | 11 | mod-remote-storage:3.0.0 | 2 | 4920 | 4472 | 1024 | 3960 | 512 | 512
mod-ncip | 10 | mod-ncip:1.14.4 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
mod-finance-storage | 10 | mod-finance-storage:8.5.0 | 2 | 1024 | 896 | 1024 | 700 | 88 | 128
mod-agreements | 10 | mod-agreements:6.0.2 | 2 | 1592 | 1488 | 128 | 0 | 0 | 0
mod-ebsconet | 10 | mod-ebsconet:2.1.1 | 2 | 1248 | 1024 | 128 | 700 | 128 | 256
edge-sip2 | 8 | edge-sip2:3.1.1 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
mod-organizations | 10 | mod-organizations:1.8.0 | 2 | 1024 | 896 | 128 | 700 | 88 | 128
mod-settings | 11 | mod-settings:1.0.2 | 2 | 1024 | 896 | 200 | 768 | 88 | 128
edge-dematic | 10 | edge-dematic:2.1.0 | 1 | 1024 | 896 | 128 | 768 | 88 | 128
mod-data-import | 20 | mod-data-import:3.0.7 | 1 | 2048 | 1844 | 256 | 1292 | 384 | 512
mod-search | 20 | mod-search:3.0.5 | 2 | 2592 | 2480 | 2048 | 1440 | 512 | 1024
mod-tags | 10 | mod-tags:2.1.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
mod-authtoken | 16 | mod-authtoken:2.14.1 | 2 | 1440 | 1152 | 512 | 922 | 88 | 128
edge-courses | 2 | edge-courses:1.3.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
mod-notify | 10 | mod-notify:3.1.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
mod-inventory-update | 10 | mod-inventory-update:3.2.1 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
mod-configuration | 10 | mod-configuration:5.9.2 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
mod-orders-storage | 10 | mod-orders-storage:13.6.0 | 2 | 1024 | 896 | 512 | 700 | 88 | 128
edge-caiasoft | 10 | edge-caiasoft:2.1.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
mod-login-saml | 18 | mod-login-saml:2.7.1 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
mod-erm-usage-harvester | 11 | mod-erm-usage-harvester:4.4.1 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
mod-password-validator | 10 | mod-password-validator:3.1.0 | 2 | 1440 | 1298 | 128 | 768 | 384 | 512
mod-licenses | 10 | mod-licenses:5.0.2 | 2 | 2480 | 2312 | 128 | 1792 | 384 | 512
mod-gobi | 10 | mod-gobi:2.7.1 | 2 | 1024 | 896 | 128 | 700 | 88 | 128
mod-fqm-manager | 9 | mod-fqm-manager:1.1.0-SNAPSHOT.1078 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
mod-bulk-operations | 9 | mod-bulk-operations:1.1.7 | 2 | 3072 | 2600 | 1024 | 1536 | 384 | 512
mod-graphql | 16 | mod-graphql:1.12.0 | 0 | 1024 | 896 | 128 | 768 | 88 | 128
mod-finance | 10 | mod-finance:4.8.0 | 2 | 1024 | 896 | 128 | 700 | 88 | 128
mod-erm-usage | 13 | mod-erm-usage:4.6.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
mod-copycat | 10 | mod-copycat:1.5.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
mod-lists | 8 | mod-lists:1.1.0-SNAPSHOT.1261 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
mod-entities-links | 15 | mod-entities-links:2.0.4 | 2 | 2592 | 2480 | 400 | 1440 | 0 | 1024
mod-permissions | 47 | mod-permissions:6.5.0-SNAPSHOT.369 | 2 | 1684 | 1544 | 512 | 1024 | 384 | 512
pub-edge | 9 | pub-edge:2023.06.14 | 2 | 1024 | 896 | 128 | 768 | 0 | 0
mod-orders | 10 | mod-orders:12.7.1 | 2 | 2048 | 1440 | 1024 | 1024 | 384 | 512
edge-patron | 10 | edge-patron:5.0.0 | 2 | 1024 | 896 | 256 | 768 | 88 | 128
edge-ncip | 11 | edge-ncip:1.9.2 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
mod-users-bl | 10 | mod-users-bl:7.6.0 | 2 | 1440 | 1152 | 512 | 922 | 88 | 128
mod-inventory-storage | 15 | mod-inventory-storage:27.0.4 | 2 | 4096 | 3690 | 2048 | 3076 | 384 | 512
mod-invoice | 10 | mod-invoice:5.7.2 | 2 | 1440 | 1152 | 512 | 922 | 88 | 128
mod-user-import | 10 | mod-user-import:3.8.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
mod-sender | 10 | mod-sender:1.11.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
edge-oai-pmh | 8 | edge-oai-pmh:2.7.1 | 2 | 1512 | 1360 | 1024 | 1440 | 384 | 512
mod-data-export-worker | 10 | mod-data-export-worker:3.1.2 | 2 | 3072 | 2800 | 1024 | 2048 | 384 | 512
mod-rtac | 10 | mod-rtac:3.5.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
mod-circulation-storage | 14 | mod-circulation-storage:17.1.7 | 2 | 2880 | 2592 | 1536 | 1814 | 384 | 512
mod-calendar | 10 | mod-calendar:2.5.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
mod-source-record-storage | 18 | mod-source-record-storage:5.7.5 | 2 | 5600 | 5000 | 2048 | 3500 | 384 | 512
mod-event-config | 10 | mod-event-config:2.6.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
mod-courses | 10 | mod-courses:1.4.8 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
mod-inventory | 15 | mod-inventory:20.1.8 | 2 | 2880 | 2592 | 1024 | 1814 | 384 | 512
mod-email | 10 | mod-email:1.16.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
mod-di-converter-storage | 18 | mod-di-converter-storage:2.1.5 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
mod-circulation | 14 | mod-circulation:24.0.11 | 2 | 2880 | 2592 | 1536 | 1814 | 384 | 512
mod-pubsub | 13 | mod-pubsub:2.11.3 | 2 | 1536 | 1440 | 1024 | 922 | 384 | 512
edge-orders | 10 | edge-orders:2.9.1 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
edge-rtac | 7 | edge-rtac:2.6.2 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
mod-template-engine | 10 | mod-template-engine:1.19.1 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
mod-users | 34 | mod-users:19.3.0-SNAPSHOT.677 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
mod-patron-blocks | 10 | mod-patron-blocks:1.9.0 | 2 | 1024 | 896 | 1024 | 768 | 88 | 128
edge-fqm | 21 | edge-fqm:1.0.1 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
mod-audit | 10 | mod-audit:2.8.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
mod-source-record-manager | 17 | mod-source-record-manager:3.7.8 | 2 | 5600 | 5000 | 2048 | 3500 | 384 | 512
nginx-edge | 9 | nginx-edge:2023.06.14 | 2 | 1024 | 896 | 128 | 0 | 0 | 0
mod-quick-marc | 11 | mod-quick-marc:5.0.1 | 1 | 2288 | 2176 | 128 | 1664 | 384 | 512
nginx-okapi | 9 | nginx-okapi:2023.06.14 | 2 | 1024 | 896 | 128 | 0 | 0 | 0
okapi-b | 11 | okapi:5.1.2 | 3 | 1684 | 1440 | 1024 | 922 | 384 | 512
mod-feesfines | 11 | mod-feesfines:19.0.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
mod-invoice-storage | 10 | mod-invoice-storage:5.7.0 | 2 | 1872 | 1536 | 1024 | 1024 | 384 | 512
mod-service-interaction | 10 | mod-service-interaction:3.0.2 | 2 | 2048 | 1844 | 256 | 1290 | 384 | 512
mod-data-export | 12 | mod-data-export:4.8.7 | 1 | 1024 | 896 | 1024 | 768 | 88 | 128
mod-patron | 10 | mod-patron:6.0.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
mod-oai-pmh | 5 | mod-oai-pmh:3.12.8 | 2 | 4096 | 3690 | 2048 | 3076 | 384 | 512
edge-connexion | 10 | edge-connexion:1.1.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
mod-kb-ebsco-java | 10 | mod-kb-ebsco-java:4.0.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
mod-notes | 10 | mod-notes:5.1.0 | 2 | 1024 | 896 | 128 | 952 | 384 | 512
mod-organizations-storage | 10 | mod-organizations-storage:4.6.0 | 2 | 1024 | 896 | 128 | 700 | 88 | 128
mod-data-export-spring | 12 | mod-data-export-spring:3.0.2 | 1 | 2048 | 1844 | 256 | 1536 | 384 | 512
mod-login | 10 | mod-login:7.10.1 | 2 | 1440 | 1298 | 1024 | 768 | 384 | 512
pub-okapi | 9 | pub-okapi:2023.06.14 | 2 | 1024 | 896 | 128 | 768 | 0 | 0
mod-eusage-reports | 13 | mod-eusage-reports:2.0.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128



Methodology/Approach

DI tests were started from the UI, with one job per tenant running concurrently. First, jobs were run on two tenants: fs09000000 was started first, then fs07000001, for a total of two jobs on two tenants. Then one job was run on each of three tenants concurrently, started several seconds apart in this order: fs09000000, fs07000001, fs07000002.
DI Create jobs were run first, with 10K and 25K records, followed by DI Update jobs.
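
The run sequence described above can be written out as a small plan. This is only a sketch: the tests were started by hand from the UI, and the nesting order inside each job type (file size first, then number of tenants) is an assumption, since the text only states that Create runs came before Update runs. `TENANT_ORDER` and `build_plan` are hypothetical names.

```python
from itertools import product

# Tenants in the order the jobs were started, per the methodology above.
TENANT_ORDER = ["fs09000000", "fs07000001", "fs07000002"]


def build_plan(job_types=("Create", "Update"), sizes=("10K", "25K"),
               tenant_counts=(2, 3)):
    """List the concurrent runs: job type, file size, and tenants involved.

    Assumed ordering: all Create runs first, then Update; within a job
    type, each file size is run on 2 tenants and then on 3 tenants.
    """
    plan = []
    for job_type, size, n_tenants in product(job_types, sizes, tenant_counts):
        plan.append({
            "job": f"DI MARC Bib {job_type}",
            "records": size,
            "tenants": TENANT_ORDER[:n_tenants],
        })
    return plan


plan = build_plan()
for run in plan:
    print(run["job"], run["records"], ", ".join(run["tenants"]))
```

Under these assumptions the plan contains eight runs, four for Create and four for Update.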

...