Data Import Creates + Updates multi tenant with file split enabled

Overview

This document contains the results of testing concurrent Data Import (DI) with the file-splitting feature enabled for MARC Bibliographic records in the Poppy release.

The purpose of this test is to determine how concurrent DI jobs affect the duration of DI jobs on the central tenant, and to check for possible issues in a smoke test with 50K DI Create jobs running concurrently on all 3 tenants.


Ticket: PERF-715

Summary

Data Import duration for 10K and 25K jobs grows with the number of concurrent jobs on different tenants: it roughly doubles (1.5x to 2.2x) when a second concurrent job is added and grows by a further 1.5x to 2x when a third is added. This trend is consistent across the main (first) tenant and the other tenants.

The smoke test with 50K records did not reveal any issues. The duration of three concurrent DI Create jobs was about 3x higher than that of a single DI job on the main tenant, which confirms the concurrency effect described above.

Maximum average CPU utilization differed between Create and Update jobs. The largest differences were in mod-inventory-b (123% during Create jobs, 182% during Update jobs) and mod-quick-marc-b (76% and 122%, respectively).

Memory consumption was almost the same for DI Create and Update jobs, though slightly higher for Update jobs: mod-inventory-b - 98%, mod-permissions-b - 80%, mod-source-record-storage-b - 73%.

RDS CPU utilization was 97% for all Create jobs and 94% for Update jobs.

DB connection counts were higher during DI Create jobs: 710 with 2 tenants and 870 with 3 tenants.

The longest-running query for the failed job on the third tenant during the 10K DI Create job was SELECT jsonb,id FROM fs07000002_mod_inventory_storage.instance_holdings_item_view, with an average latency of 386455.99 ms/call.

Test Runs 

Test # | Scenario | Load level
1 - Concurrent Create imports | DI MARC Bib Create | 10K, 25K concurrently (with 5 min pause) on 2 and 3 tenants
2 - Concurrent Update imports | DI MARC Bib Update | 10K, 25K concurrently (with 5 min pause) on 2 and 3 tenants
3 - Concurrent Create imports ("smoke test") of 50K | DI MARC Bib Create | 50K concurrently on 3 tenants

Test Results

As the number of concurrent Data Import jobs increases and file size grows, the duration of DI jobs grows proportionally. 

Smoke Test finished successfully for 3 concurrent DI Create jobs of 50K each.

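Job type | File size | # of test | Number of concurrent jobs | Main tenant (fs09000000) | Second tenant (fs07000001) | Third tenant (fs07000002)
DI Create | 10K | Baseline | 1 | 00:04:56 | - | -
DI Create | 10K | 1 | 2 | 00:10:43 | 00:10:37 | -
DI Create | 10K | 2 | 3 | 00:21:12 | 00:21:06 | 00:20:57 *
DI Create | 25K | Baseline | 1 | 00:11:24 | - | -
DI Create | 25K | 3 | 2 | 00:23:44 | 00:23:30 | -
DI Create | 25K | 4 | 3 | 00:37:11 | 00:37:05 | 00:36:58
DI Update | 10K | Baseline | 1 | 00:06:32 | - | -
DI Update | 10K | 5 | 2 | 00:09:47 | 00:11:26 | -
DI Update | 10K | 6 | 3 | 00:19:08 | 00:19:06 | 00:18:31
DI Update | 25K | Baseline | 1 | 00:15:13 | - | -
DI Update | 25K | 7 | 2 | 00:30:49 | 00:30:52 | -
DI Update | 25K | 8 | 3 | 00:47:47 | 00:48:17 | 00:47:54
DI Create (Smoke test) | 50K | 9 | 1 | 00:22:31 | - | -
DI Create (Smoke test) | 50K | 10 | 3 | 01:12:54 | 01:12:44 | 01:12:35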

* - Errors occurred only in the 10K DI Create job running on the third tenant during the test with 3 concurrent jobs. The errors were not reproduced in subsequent tests.

  • io.vertx.core.impl.NoStackTraceThrowable: [{"id":"cf64277b-9945-49a1-93c0-007643c46efe","error":"Timeout for DB_HOST:DB_PORT=db.pcp1.folio-eis.us-east-1:5432","holdingId":"bd17bc47-72eb-480b-8a83-e0a1bc16e0f4"}]
  • java.lang.NullPointerException: Cannot invoke "org.folio.processing.mapping.defaultmapper.processor.parameters.MappingParameters.getLinkingRules()" because "mappingParameters" is null
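
The concurrency effect can be checked with simple arithmetic on the durations reported in the results table. The following is a minimal sketch in Python, not part of the original test tooling: the names `to_seconds`, `slowdown` and `MAIN_TENANT` are illustrative, and only main-tenant (fs09000000) durations are included.

```python
from datetime import timedelta


def to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS duration string to whole seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return int(timedelta(hours=hours, minutes=minutes, seconds=seconds).total_seconds())


# Main-tenant durations copied from the results table:
# (job type, file size) -> {number of concurrent jobs: duration}
MAIN_TENANT = {
    ("Create", "10K"): {1: "00:04:56", 2: "00:10:43", 3: "00:21:12"},
    ("Create", "25K"): {1: "00:11:24", 2: "00:23:44", 3: "00:37:11"},
    ("Update", "10K"): {1: "00:06:32", 2: "00:09:47", 3: "00:19:08"},
    ("Update", "25K"): {1: "00:15:13", 2: "00:30:49", 3: "00:47:47"},
    ("Create", "50K"): {1: "00:22:31", 3: "01:12:54"},
}


def slowdown(durations: dict) -> dict:
    """Return duration / single-job baseline for each concurrency level above 1."""
    baseline = to_seconds(durations[1])
    return {
        jobs: round(to_seconds(duration) / baseline, 2)
        for jobs, duration in durations.items()
        if jobs != 1
    }


for key, durations in MAIN_TENANT.items():
    print(key, slowdown(durations))
```

For the 50K smoke test this gives a factor of 3.24 for three concurrent jobs relative to the single-job run, in line with the "about 3x" figure in the Summary.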

Service CPU Utilization

CPU utilization comparison

Service | CPU Create (%) | CPU Update (%)
mod-inventory-b | 122.87 | 181.72
mod-di-converter-storage-b | 78.94 | 75.21
mod-quick-marc-b | 75.7 | 122.16
nginx-okapi | 71.79 | 78.33
mod-source-record-storage-b | 47.36 | 42.14
okapi-b | 36.99 | 29.78
mod-source-record-manager-b | 30.41 | 36.98
mod-inventory-storage-b | 24.83 | 19.45
mod-users-b | 19.33 | 5.61
mod-configuration-b | 11.69 | 2.73
mod-permissions-b | 9.19 | 18.71
mod-pubsub-b | 6.97 | 6.85
mod-authtoken-b | 6.51 | 3.44
mod-password-validator-b | 3.27 | 2.75
mod-feesfines-b | 2.29 | 2.5
mod-data-import-b | 1.84 | 2.09
mod-circulation-storage-b | 1.27 | 1.65
mod-circulation-b | 0.33 | 0.34
pub-okapi | 0.23 | 0.24

[Graph: service CPU utilization during DI Create jobs]

[Graph: service CPU utilization during DI Update jobs]


Service Memory Utilization

Memory consumption comparison

Service | Memory Create (%) | Memory Update (%)
mod-inventory-b | 95.16 | 98.34
mod-permissions-b | 75.03 | 79.63
mod-source-record-storage-b | 62.29 | 72.77
mod-users-b | 61.23 | 59.93
mod-data-import-b | 61.02 | 68.28
mod-source-record-manager-b | 47.76 | 54.2
okapi-b | 41.84 | 42.55
mod-di-converter-storage-b | 34.62 | 35.22
mod-feesfines-b | 28.63 | 27.51
mod-quick-marc-b | 28.39 | 30.48
mod-configuration-b | 27.57 | 26.51
mod-pubsub-b | 24.72 | 24.86
mod-authtoken-b | 21.93 | 20.1
mod-inventory-storage-b | 17.21 | 18.03
mod-circulation-storage-b | 17.04 | 16.55
mod-circulation-b | 10.88 | 11.13
nginx-okapi | 4.69 | 4.69
pub-okapi | 4.63 | 4.46

[Graph: service memory utilization during DI Create jobs]

[Graph: service memory utilization during DI Update jobs]

DB CPU Utilization

RDS CPU utilization was 97% for all Create jobs and 94% for Update jobs.

[Graph: RDS CPU utilization during Create jobs]

[Graph: RDS CPU utilization during Update jobs]

DB Connections

Create jobs: 710 DB connections with 2 tenants, 870 with 3 tenants.

Update jobs: 630 DB connections with 2 tenants, 785 with 3 tenants.

Each additional job running concurrently on a different tenant required approximately 150 to 160 more DB connections.
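
The per-tenant increment can be derived directly from the counts above. The sketch below is illustrative only: the function names are hypothetical, and `estimate_connections` assumes that connections keep growing linearly with each extra tenant, which this test did not verify beyond 3 tenants. For context, max_connections on the db.r6g.xlarge instance is 2731 (see Infrastructure).

```python
# Observed DB connection counts from this report:
# job type -> {number of tenants running a job concurrently: connections}
OBSERVED = {
    "Create": {2: 710, 3: 870},
    "Update": {2: 630, 3: 785},
}


def increment_per_tenant(points: dict) -> int:
    """Extra connections observed when going from 2 to 3 concurrent tenants."""
    return points[3] - points[2]


def estimate_connections(points: dict, tenants: int) -> int:
    """Linear extrapolation from the 3-tenant measurement (an assumption, not a measurement)."""
    return points[3] + increment_per_tenant(points) * (tenants - 3)


for job_type, points in OBSERVED.items():
    print(job_type, increment_per_tenant(points), estimate_connections(points, 4))
# Create 160 1030
# Update 155 940
```

Any such estimate should be treated with caution: RDS CPU utilization was already at 94-97% with 3 tenants, so CPU is likely to become the limiting factor well before the connection limit.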


[Graph: DB connections during Create jobs]

[Graph: DB connections during Update jobs]

DB load

[Graph: DB load during Create jobs]

[Graph: DB load during Update jobs]

Appendix

Errors & Exceptions

The following exceptions were observed during tests that finished successfully:

pcp1/mod-search

"failure in bulk execution": 186 errors during all Update jobs, more than 4000 errors during Create jobs

10:59:59 [] [] [] [] WARN KafkaMessageListener Failed to index resource event [eventType: CREATE, tenantId: fs09000000, id: 5cc8ef78-cb05-49fa-8274-1cba1d660aad]

index [pcp1_instance_fs09000000], id [f7aea9b8-614e-4050-9dbd-e2f8a884c06b], message [OpenSearchException[OpenSearch exception [type=circuit_breaking_exception, reason=[parent] Data too large, data for [indices:data/write/bulk[s]] would be [16502737514/15.3gb], which is larger than the limit of [16320875724/15.1gb], real usage: [16499671264/15.3gb], new bytes reserved: [3066250/2.9mb], usages [request=0/0b, fielddata=0/0b, in_flight_requests=3103382/2.9mb]]]]

feign.FeignException$Unauthorized: [401 Unauthorized] during [GET] to [http://inventory-view/instances?query=id%3D%3D%28%221e9b752b-6cc3-433b-ae90-cbafdc307cb6%22%29&limit=1] [InventoryViewClient#getInstances(CqlQuery,int)]: [Invalid token]
org.folio.search.exception.SearchOperationException: Failed to perform elasticsearch request [index=pcp1_contributor_fs09000000, type=bulkApi, message: 30,000 milliseconds timeout on connection http-outgoing-265 [ACTIVE]]
Caused by: java.net.SocketTimeoutException: 30,000 milliseconds timeout on connection http-outgoing-294 [ACTIVE]
WARN essageBatchProcessor Failed to process batch, attempting to process resources one by one
feign.RetryableException: timeout executing GET http://inventory-view/instances?query=id%3D%3D%28%22c4ac5388-4b64-4de5-8930-d3be806c1b7f%22%20or%20%229228d6de-c8e3-4e62-85c7-f5b8b45fb649%22%20or%20%2288913517-17dc-4d90-be8b-9560c1f30a01%22%20or%20%220514f12a-680d-415d-b262-ba82f4dd3e76%22%20or%20%2205253541-bf8a-465a-b91d-b2cb4ff0944d%22%20or%20%222f7b4924-8781-4834-b25f-36fc430d8f5d%22%20or%20%22b2dad08a-05e9-41d2-8742-25c98daf7fbe%22%20or%20%22be30bb11-58da-4b06-beea-e204a0823438%22%20or%20%22930b4625-8eb3-492a-bae8-388480364e67%22%20or%20%2223925f8a-efb4-41a2-9002-48292f6419f3%22%20or%20%22765cb2e6-96a4-4b33-99fd-b38011be999f%22%20or%20%22a3666aca-4963-4e94-95ab-dd3d790ffdd3%22%20or%20%2226bb45e4-dfc0-4915-9ec8-0187d334651d%22%20or%20%220c536304-7330-46d2-a19e-9b69ca13591a%22%20or%20%22ee237647-875c-405e-8ce1-fd55f701d83b%22%20or%20%22869ba2aa-f465-42b4-b4ba-b47ccd29d6ac%22%20or%20%221aa2e06e-b647-4e7b-8fa8-9804c65e1dc1%22%20or%20%22a7096f26-0363-49b4-803c-31e5661b12de%22%20or%20%22042395fb-34a7-4412-a78a-d541ef948922%22%20or%20%2273342f96-f5b5-42d6-ab41-1ef121aef0d5%22%20or%20%22fa028d18-e45e-434d-85d2-e8c4e5db9519%22%20or%20%22c0d1a681-66ea-4cc8-bf68-5771cf8a93e9%22%20or%20%22d7ece421-5dae-4519-9d09-100f22b47007%22%20or%20%2225ca20d1-d0e1-414a-b228-b74ceaba2512%22%20or%20%2274fcad26-9ddf-4cf9-8206-f665374a37f3%22%20or%20%221fd756cd-e573-4287-9c3e-c29408ee8709%22%20or%20%22a322007c-f578-4c93-9239-e6558a393710%22%20or%20%22a2ed3841-a9d2-4364-a3fe-939e2bffbe24%22%20or%20%22b31f92f9-027e-40bf-a5b1-bb4e50241a46%22%20or%20%2204175b0b-9dea-403b-911c-82d5f6a2fbe2%22%20or%20%22f4ebc0f9-adb7-4458-a4ae-8d7996b3b4f8%22%20or%20%22e1e6f55c-7720-4a23-a73a-3202746c7c75%22%20or%20%2238a43c32-d2de-454f-b7d7-7725b5bab61e%22%20or%20%22b8f17eec-f61c-4b29-9769-3b8d91a6dae4%22%20or%20%22e93b2a0c-939b-4c31-98f9-c85cb52081eb%22%20or%20%22983fd8c1-cd4a-4087-a4b9-b1c3dd11c08e%22%20or%20%222b8a75db-2ec2-4190-8702-c4dca1067bf6%22%20or%20%22f73f7d21-deab-4af4-9163-7f374f1d56d2%22%20or%20%22
9819b074-3174-429e-8f7f-1d6312d9630f%22%20or%20%2256743d6d-3df9-4bbf-9495-cc9f2b95e60b%22%20or%20%229f146268-ad02-46b2-8f0d-a5f64fc8579b%22%20or%20%22fb26676f-2d20-4309-8fc5-aa1c09962618%22%20or%20%222b39c2de-6374-46af-b78e-1c83d669c991%22%20or%20%22a7fd3130-6752-40b4-b2a4-cc6f1fee0349%22%20or%20%22e7561b18-c9da-43f8-97d4-9125261de4b6%22%20or%20%221106c6ed-519b-4efa-a73c-deae1dc0570d%22%20or%20%220ea912aa-6107-490f-8855-350c7edc0060%22%20or%20%225ec701f4-af40-4464-b6a4-ba9f14ca7d28%22%20or%20%22e2a15de7-b793-4949-beed-9f56bd9cde9d%22%20or%20%22879210f9-cae8-4202-9dcd-89657a5f8113%22%29&limit=50
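
The circuit_breaking_exception quoted above carries the byte counts needed to see how far the bulk request exceeded the parent breaker limit. The following is a minimal sketch for extracting them; the regular expression is written against this one sample message, and the names `BREAKER_RE` and `breaker_stats` are hypothetical.

```python
import re

# Excerpt of the OpenSearch message from the mod-search log above.
MESSAGE = (
    "[parent] Data too large, data for [indices:data/write/bulk[s]] would be "
    "[16502737514/15.3gb], which is larger than the limit of [16320875724/15.1gb], "
    "real usage: [16499671264/15.3gb], new bytes reserved: [3066250/2.9mb]"
)

BREAKER_RE = re.compile(
    r"would be \[(?P<would_be>\d+)/[^\]]+\], "
    r"which is larger than the limit of \[(?P<limit>\d+)/[^\]]+\], "
    r"real usage: \[(?P<real_usage>\d+)/[^\]]+\], "
    r"new bytes reserved: \[(?P<reserved>\d+)/"
)


def breaker_stats(message: str) -> dict:
    """Extract byte counts from the message and compute the overshoot in bytes."""
    match = BREAKER_RE.search(message)
    if match is None:
        raise ValueError("not a parent circuit breaker message")
    stats = {name: int(value) for name, value in match.groupdict().items()}
    stats["over_limit"] = stats["would_be"] - stats["limit"]
    return stats


stats = breaker_stats(MESSAGE)
print(stats["over_limit"])  # 181861790 bytes above the limit
# The rejected total is the current real usage plus the newly reserved bytes.
print(stats["real_usage"] + stats["reserved"] == stats["would_be"])  # True
```

In this sample the real usage alone (16499671264 bytes) was already above the limit (16320875724 bytes), so the 2.9 MB bulk request was rejected regardless of its own size.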


pcp1/mod-authtoken
org.folio.auth.authtokenmodule.tokens.TokenValidationException: Access token has expired

Number of errors: 23400. The errors occurred only during DI on the fs07000002 tenant.
filter @logStream like "pcp1/mod-authtoken"
filter @message like "ERROR FilterApi"
13:48:00 [595516/users] [fs07000002] [] [mod-authtoken] ERROR FilterApi Unable to retrieve permissions for system-user: User does not exist: 8cc96687-ea63-44cb-ab5f-a73bc6985324 request took 7 ms
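
To confirm that such errors are confined to one tenant, exported log lines can be tallied by tenant. The sketch below is an illustration, not the query used for this report: it assumes the line layout seen in the samples in this appendix (time, four bracketed fields, level, logger, message), and the field names in the regular expression are guesses based on those samples.

```python
import re
from collections import Counter

# Assumed layout: HH:MM:SS [request] [tenant] [user] [module] LEVEL logger message
LINE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}) "
    r"\[(?P<request>[^\]]*)\] \[(?P<tenant>[^\]]*)\] "
    r"\[(?P<user>[^\]]*)\] \[(?P<module>[^\]]*)\] "
    r"(?P<level>[A-Z]+)\s+(?P<logger>\S+) (?P<message>.*)$"
)


def count_errors_by_tenant(lines, logger="FilterApi"):
    """Count ERROR lines from the given logger, grouped by tenant id."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and match["level"] == "ERROR" and match["logger"] == logger:
            counts[match["tenant"] or "(none)"] += 1
    return counts


# Two lines copied from this appendix: one ERROR from mod-authtoken, one WARN from mod-search.
sample = [
    "13:48:00 [595516/users] [fs07000002] [] [mod-authtoken] ERROR FilterApi "
    "Unable to retrieve permissions for system-user: User does not exist: "
    "8cc96687-ea63-44cb-ab5f-a73bc6985324 request took 7 ms",
    "10:59:59 [] [] [] [] WARN KafkaMessageListener Failed to index resource event "
    "[eventType: CREATE, tenantId: fs09000000, id: 5cc8ef78-cb05-49fa-8274-1cba1d660aad]",
]

print(count_errors_by_tenant(sample))  # Counter({'fs07000002': 1})
```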


Infrastructure

PTF environment: pcp1

  • 10 m6i.2xlarge EC2 instances located in US East (N. Virginia), us-east-1
  • 2 database instances, writer/reader

    Name | Memory GiB | vCPUs | max_connections
    db.r6g.xlarge | 32 GiB | 4 vCPUs | 2731
  • Number of records in DB:
    • fs09000000
      • instances - 25,129,941
      • items - 26,299,440
      • holdings - 25,392,570
    • fs07000001
      • instances - 10,039,613
      • items - 1,423,844
      • holdings - 10,461,259
    • fs07000002
      • instances - 1,114,273
      • items - 1,106,537
      • holdings - 1,106,539
  • MSK tenant
    • 4 m5.2xlarge brokers in 2 zones
    • Apache Kafka version 2.8.0

    • EBS storage volume per broker 300 GiB

    • auto.create.topics.enable=true
    • log.retention.minutes=480
    • default.replication.factor=3


pcp1-pvt

Module | Task Def. Revision | Module Version | Task Count | Mem Hard Limit | Mem Soft Limit | CPU Units | Xmx | MetaspaceSize | MaxMetaspaceSize
mod-remote-storage | 10(11)* | 3.0.0 | 2 | 4920 | 4472 | 1024 | 3960 | 512 | 512
mod-data-import | 18(20)* | 3.0.7 | 1 | 2048 | 1844 | 256 | 1292 | 384 | 512
mod-authtoken | 13(16)* | 2.14.1 | 2 | 1440 | 1152 | 512 | 922 | 88 | 128
mod-configuration | 9(10)* | 5.9.2 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
mod-users-bl | 9(10)* | 7.6.0 | 2 | 1440 | 1152 | 512 | 922 | 88 | 128
mod-inventory-storage | 12(15)* | 27.0.3(27.0.4)* | 2 | 4096 | 3690 | 2048 | 3076 | 384 | 512
mod-circulation-storage | 12(14)* | 17.1.3(17.1.7)* | 2 | 2880 | 2592 | 1536 | 1814 | 384 | 512
mod-source-record-storage | 15(18)* | 5.7.3(5.7.5)* | 2 | 5600 | 5000 | 2048 | 3500 | 384 | 512
mod-inventory | 11(14)* | 20.1.3(20.1.7)* | 2 | 2880 | 2592 | 1024 | 1814 | 384 | 512
mod-di-converter-storage | 15(18)* | 2.1.2(2.1.5)* | 2 | 1024 | 896 | 128 | 768 | 88 | 128
mod-circulation | 12(14)* | 24.0.8(24.0.11)* | 2 | 2880 | 2592 | 1536 | 1814 | 384 | 512
mod-pubsub | 11(13)* | 2.11.2(2.11.3)* | 2 | 1536 | 1440 | 1024 | 922 | 384 | 512
mod-patron-blocks | 9(10)* | 1.9.0 | 2 | 1024 | 896 | 1024 | 768 | 88 | 128
mod-source-record-manager | 14(17)* | 3.7.4(3.7.8)* | 2 | 5600 | 5000 | 2048 | 3500 | 384 | 512
mod-quick-marc | 9(11)* | 5.0.0(5.0.1)* | 1 | 2288 | 2176 | 128 | 1664 | 384 | 512
nginx-okapi | 9 | 2023.06.14 | 2 | 1024 | 896 | 128 | 0 | 0 | 0
okapi-b | 11 | 5.1.2 | 3 | 1684 | 1440 | 1024 | 922 | 384 | 512
mod-feesfines | 10(11)* | 19.0.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
pub-okapi | 9 | 2023.06.14 | 2 | 1024 | 896 | 128 | 768 | 0 | 0
All modules

pcp1-pvt

Fri Mar 15 17:12:45 UTC 2024

Module | Task Def. Revision | Module Version | Task Count | Mem Hard Limit | Mem Soft Limit | CPU Units | Xmx | MetaspaceSize | MaxMetaspaceSize
mod-remote-storage | 11 | mod-remote-storage:3.0.0 | 2 | 4920 | 4472 | 1024 | 3960 | 512 | 512
mod-ncip | 10 | mod-ncip:1.14.4 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
mod-finance-storage | 10 | mod-finance-storage:8.5.0 | 2 | 1024 | 896 | 1024 | 700 | 88 | 128
mod-agreements | 10 | mod-agreements:6.0.2 | 2 | 1592 | 1488 | 128 | 0 | 0 | 0
mod-ebsconet | 10 | mod-ebsconet:2.1.1 | 2 | 1248 | 1024 | 128 | 700 | 128 | 256
edge-sip2 | 8 | edge-sip2:3.1.1 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
mod-organizations | 10 | mod-organizations:1.8.0 | 2 | 1024 | 896 | 128 | 700 | 88 | 128
mod-settings | 11 | mod-settings:1.0.2 | 2 | 1024 | 896 | 200 | 768 | 88 | 128
edge-dematic | 10 | edge-dematic:2.1.0 | 1 | 1024 | 896 | 128 | 768 | 88 | 128
mod-data-import | 20 | mod-data-import:3.0.7 | 1 | 2048 | 1844 | 256 | 1292 | 384 | 512
mod-search | 20 | mod-search:3.0.5 | 2 | 2592 | 2480 | 2048 | 1440 | 512 | 1024
mod-tags | 10 | mod-tags:2.1.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
mod-authtoken | 16 | mod-authtoken:2.14.1 | 2 | 1440 | 1152 | 512 | 922 | 88 | 128
edge-courses | 2 | edge-courses:1.3.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
mod-notify | 10 | mod-notify:3.1.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
mod-inventory-update | 10 | mod-inventory-update:3.2.1 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
mod-configuration | 10 | mod-configuration:5.9.2 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
mod-orders-storage | 10 | mod-orders-storage:13.6.0 | 2 | 1024 | 896 | 512 | 700 | 88 | 128
edge-caiasoft | 10 | edge-caiasoft:2.1.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
mod-login-saml | 18 | mod-login-saml:2.7.1 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
mod-erm-usage-harvester | 11 | mod-erm-usage-harvester:4.4.1 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
mod-password-validator | 10 | mod-password-validator:3.1.0 | 2 | 1440 | 1298 | 128 | 768 | 384 | 512
mod-licenses | 10 | mod-licenses:5.0.2 | 2 | 2480 | 2312 | 128 | 1792 | 384 | 512
mod-gobi | 10 | mod-gobi:2.7.1 | 2 | 1024 | 896 | 128 | 700 | 88 | 128
mod-fqm-manager | 9 | mod-fqm-manager:1.1.0-SNAPSHOT.1078 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
mod-bulk-operations | 9 | mod-bulk-operations:1.1.7 | 2 | 3072 | 2600 | 1024 | 1536 | 384 | 512
mod-graphql | 16 | mod-graphql:1.12.0 | 0 | 1024 | 896 | 128 | 768 | 88 | 128
mod-finance | 10 | mod-finance:4.8.0 | 2 | 1024 | 896 | 128 | 700 | 88 | 128
mod-erm-usage | 13 | mod-erm-usage:4.6.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
mod-copycat | 10 | mod-copycat:1.5.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
mod-lists | 8 | mod-lists:1.1.0-SNAPSHOT.1261 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
mod-entities-links | 15 | mod-entities-links:2.0.4 | 2 | 2592 | 2480 | 400 | 1440 | 0 | 1024
mod-permissions | 47 | mod-permissions:6.5.0-SNAPSHOT.369 | 2 | 1684 | 1544 | 512 | 1024 | 384 | 512
pub-edge | 9 | pub-edge:2023.06.14 | 2 | 1024 | 896 | 128 | 768 | 0 | 0
mod-orders | 10 | mod-orders:12.7.1 | 2 | 2048 | 1440 | 1024 | 1024 | 384 | 512
edge-patron | 10 | edge-patron:5.0.0 | 2 | 1024 | 896 | 256 | 768 | 88 | 128
edge-ncip | 11 | edge-ncip:1.9.2 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
mod-users-bl | 10 | mod-users-bl:7.6.0 | 2 | 1440 | 1152 | 512 | 922 | 88 | 128
mod-inventory-storage | 15 | mod-inventory-storage:27.0.4 | 2 | 4096 | 3690 | 2048 | 3076 | 384 | 512
mod-invoice | 10 | mod-invoice:5.7.2 | 2 | 1440 | 1152 | 512 | 922 | 88 | 128
mod-user-import | 10 | mod-user-import:3.8.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
mod-sender | 10 | mod-sender:1.11.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
edge-oai-pmh | 8 | edge-oai-pmh:2.7.1 | 2 | 1512 | 1360 | 1024 | 1440 | 384 | 512
mod-data-export-worker | 10 | mod-data-export-worker:3.1.2 | 2 | 3072 | 2800 | 1024 | 2048 | 384 | 512
mod-rtac | 10 | mod-rtac:3.5.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
mod-circulation-storage | 14 | mod-circulation-storage:17.1.7 | 2 | 2880 | 2592 | 1536 | 1814 | 384 | 512
mod-calendar | 10 | mod-calendar:2.5.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
mod-source-record-storage | 18 | mod-source-record-storage:5.7.5 | 2 | 5600 | 5000 | 2048 | 3500 | 384 | 512
mod-event-config | 10 | mod-event-config:2.6.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
mod-courses | 10 | mod-courses:1.4.8 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
mod-inventory | 15 | mod-inventory:20.1.8 | 2 | 2880 | 2592 | 1024 | 1814 | 384 | 512
mod-email | 10 | mod-email:1.16.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
mod-di-converter-storage | 18 | mod-di-converter-storage:2.1.5 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
mod-circulation | 14 | mod-circulation:24.0.11 | 2 | 2880 | 2592 | 1536 | 1814 | 384 | 512
mod-pubsub | 13 | mod-pubsub:2.11.3 | 2 | 1536 | 1440 | 1024 | 922 | 384 | 512
edge-orders | 10 | edge-orders:2.9.1 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
edge-rtac | 7 | edge-rtac:2.6.2 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
mod-template-engine | 10 | mod-template-engine:1.19.1 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
mod-users | 34 | mod-users:19.3.0-SNAPSHOT.677 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
mod-patron-blocks | 10 | mod-patron-blocks:1.9.0 | 2 | 1024 | 896 | 1024 | 768 | 88 | 128
edge-fqm | 21 | edge-fqm:1.0.1 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
mod-audit | 10 | mod-audit:2.8.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
mod-source-record-manager | 17 | mod-source-record-manager:3.7.8 | 2 | 5600 | 5000 | 2048 | 3500 | 384 | 512
nginx-edge | 9 | nginx-edge:2023.06.14 | 2 | 1024 | 896 | 128 | 0 | 0 | 0
mod-quick-marc | 11 | mod-quick-marc:5.0.1 | 1 | 2288 | 2176 | 128 | 1664 | 384 | 512
nginx-okapi | 9 | nginx-okapi:2023.06.14 | 2 | 1024 | 896 | 128 | 0 | 0 | 0
okapi-b | 11 | okapi:5.1.2 | 3 | 1684 | 1440 | 1024 | 922 | 384 | 512
mod-feesfines | 11 | mod-feesfines:19.0.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
mod-invoice-storage | 10 | mod-invoice-storage:5.7.0 | 2 | 1872 | 1536 | 1024 | 1024 | 384 | 512
mod-service-interaction | 10 | mod-service-interaction:3.0.2 | 2 | 2048 | 1844 | 256 | 1290 | 384 | 512
mod-data-export | 12 | mod-data-export:4.8.7 | 1 | 1024 | 896 | 1024 | 768 | 88 | 128
mod-patron | 10 | mod-patron:6.0.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
mod-oai-pmh | 5 | mod-oai-pmh:3.12.8 | 2 | 4096 | 3690 | 2048 | 3076 | 384 | 512
edge-connexion | 10 | edge-connexion:1.1.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
mod-kb-ebsco-java | 10 | mod-kb-ebsco-java:4.0.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
mod-notes | 10 | mod-notes:5.1.0 | 2 | 1024 | 896 | 128 | 952 | 384 | 512
mod-organizations-storage | 10 | mod-organizations-storage:4.6.0 | 2 | 1024 | 896 | 128 | 700 | 88 | 128
mod-data-export-spring | 12 | mod-data-export-spring:3.0.2 | 1 | 2048 | 1844 | 256 | 1536 | 384 | 512
mod-login | 10 | mod-login:7.10.1 | 2 | 1440 | 1298 | 1024 | 768 | 384 | 512
pub-okapi | 9 | pub-okapi:2023.06.14 | 2 | 1024 | 896 | 128 | 768 | 0 | 0
mod-eusage-reports | 13 | mod-eusage-reports:2.0.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128


Methodology/Approach

DI tests were started from the UI, with one job per tenant. For the two-tenant tests, the job was started on fs09000000 first and then on fs07000001. For the three-tenant tests, jobs were started a few seconds apart in this order: fs09000000, fs07000001, fs07000002.
DI Create jobs with 10K and 25K records were run first, followed by DI Update jobs.