
Data Import Creates + Updates multi tenant with file split enabled

Overview

This document contains the results of testing concurrent Data Import (DI) with the file-splitting feature enabled for MARC Bibliographic records in the Poppy release.

The purpose of this test is to determine how concurrent DI jobs affect the duration of DI jobs on the central tenant, and to check for possible issues during a smoke test in which 50k DI Create jobs run concurrently on all 3 tenants.


Ticket: PERF-715

Summary

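Data import duration on the central tenant roughly doubled for 10k and 25k jobs (1.5x to 2.2x the single-job baseline) when a second job ran concurrently on another tenant, and reached about 2.9x to 4.3x the baseline with three concurrent jobs. This trend is consistent across the central tenant and other tenants.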

The 50k smoke test did not reveal any issues. The duration of 3 concurrent DI Create jobs was about three times that of a single DI job on the central tenant (01:12:54 vs 00:22:31), which confirms the concurrency effect described above.

Maximum average CPU utilization differed between Create and Update jobs. During DI Create jobs the busiest modules were mod-inventory-b (123%) and mod-di-converter-storage-b (79%), followed closely by mod-quick-marc-b (76%); during DI Update jobs the top two were mod-inventory-b (182%) and mod-quick-marc-b (122%).

The top 3 modules by memory consumption were the same for DI Create and Update jobs, with higher usage during Update jobs (Create / Update): mod-inventory-b 95% / 98%, mod-permissions-b 75% / 80%, mod-source-record-storage-b 62% / 73%.

RDS CPU utilization was 97% for all Create jobs and 94% for Update jobs.

DB connections were higher during DI Create jobs: 710 with Create jobs on 2 tenants and 870 with Create jobs on 3 tenants (compared with 630 and 785 for Update jobs).

The longest-running query for the failed 10k DI Create job on the third tenant was SELECT jsonb,id FROM fs07000002_mod_inventory_storage.instance_holdings_item_view, with an average latency of 386,455.99 ms/call (about 6.4 minutes).

Test Runs 

| Test # | Scenario | Load level | Comment |
|---|---|---|---|
| 1 | DI MARC Bib Create | 10K, 25K | Concurrently on 2 and 3 tenants, with 5-minute pauses between tests |
| 2 | DI MARC Bib Update | 10K, 25K | Concurrently on 2 and 3 tenants, with 5-minute pauses between tests |
| 3 | DI MARC Bib Create | 50K | Concurrently on 3 tenants (smoke test) |

Test Results

Data import

As the number of concurrent jobs and the file size grow, the duration of DI jobs grows roughly proportionally.

The smoke test with 3 concurrent 50k DI Create jobs finished successfully.

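Durations are in hh:mm:ss.

| Job type | File size | Test # | Concurrent jobs | Central tenant | Second tenant | Third tenant |
|---|---|---|---|---|---|---|
| DI Create | 10k | Baseline | 1 | 00:04:56 | | |
| DI Create | 10k | 1 | 2 | 00:10:43 | 00:10:37 | |
| DI Create | 10k | 2 | 3 | 00:21:12 | 00:21:06 | 00:20:57 * |
| DI Create | 25k | Baseline | 1 | 00:11:24 | | |
| DI Create | 25k | 3 | 2 | 00:23:44 | 00:23:30 | |
| DI Create | 25k | 4 | 3 | 00:37:11 | 00:37:05 | 00:36:58 |
| DI Update | 10k | Baseline | 1 | 00:06:32 | | |
| DI Update | 10k | 5 | 2 | 00:09:47 | 00:11:26 | |
| DI Update | 10k | 6 | 3 | 00:19:08 | 00:19:06 | 00:18:31 |
| DI Update | 25k | Baseline | 1 | 00:15:13 | | |
| DI Update | 25k | 7 | 2 | 00:30:49 | 00:30:52 | |
| DI Update | 25k | 8 | 3 | 00:47:47 | 00:48:17 | 00:47:54 |
| DI Create (smoke test) | 50k | 9 | 1 | 00:22:31 | | |
| DI Create (smoke test) | 50k | 10 | 3 | 01:12:54 | 01:12:44 | 01:12:35 |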
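As a rough check of the duration figures, the central-tenant times in the table can be converted to seconds and divided by the single-job baseline for the same job type and file size. The Python sketch below does only that arithmetic; the dictionary layout and the helper names `to_seconds` and `slowdown` are illustrative and not part of any FOLIO tooling.

```python
def to_seconds(hms: str) -> int:
    """Convert an 'hh:mm:ss' duration from the results table to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Central-tenant durations copied from the table: {concurrent jobs: duration}.
central = {
    ("Create", "10k"): {1: "00:04:56", 2: "00:10:43", 3: "00:21:12"},
    ("Create", "25k"): {1: "00:11:24", 2: "00:23:44", 3: "00:37:11"},
    ("Update", "10k"): {1: "00:06:32", 2: "00:09:47", 3: "00:19:08"},
    ("Update", "25k"): {1: "00:15:13", 2: "00:30:49", 3: "00:47:47"},
    ("Create", "50k"): {1: "00:22:31", 3: "01:12:54"},
}


def slowdown(runs: dict[int, str]) -> dict[int, float]:
    """Ratio of each concurrent run to the single-job baseline (key 1)."""
    baseline = to_seconds(runs[1])
    return {n: round(to_seconds(d) / baseline, 2) for n, d in runs.items() if n != 1}


for (job, size), runs in central.items():
    print(job, size, slowdown(runs))
```

For the central tenant this gives 1.5x to 2.17x with two concurrent jobs, 2.93x to 4.3x with three, and 3.24x for the 50k smoke test.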

* Errors occurred only in the 10k DI Create job running on the third tenant during the 3-concurrent-jobs test (test 2); they did not reproduce in subsequent tests. The errors were:

  • io.vertx.core.impl.NoStackTraceThrowable: [{"id":"cf64277b-9945-49a1-93c0-007643c46efe","error":"Timeout for DB_HOST:DB_PORT=db.pcp1.folio-eis.us-east-1:5432","holdingId":"bd17bc47-72eb-480b-8a83-e0a1bc16e0f4"}]
  • java.lang.NullPointerException: Cannot invoke "org.folio.processing.mapping.defaultmapper.processor.parameters.MappingParameters.getLinkingRules()" because "mappingParameters" is null

Service CPU Utilization

CPU utilization comparison

| Service | CPU, DI Create (%) | CPU, DI Update (%) |
|---|---|---|
| mod-inventory-b | 122.87 | 181.72 |
| mod-di-converter-storage-b | 78.94 | 75.21 |
| mod-quick-marc-b | 75.7 | 122.16 |
| nginx-okapi | 71.79 | 78.33 |
| mod-source-record-storage-b | 47.36 | 42.14 |
| okapi-b | 36.99 | 29.78 |
| mod-source-record-manager-b | 30.41 | 36.98 |
| mod-inventory-storage-b | 24.83 | 19.45 |
| mod-users-b | 19.33 | 5.61 |
| mod-configuration-b | 11.69 | 2.73 |
| mod-permissions-b | 9.19 | 18.71 |
| mod-pubsub-b | 6.97 | 6.85 |
| mod-authtoken-b | 6.51 | 3.44 |
| mod-password-validator-b | 3.27 | 2.75 |
| mod-feesfines-b | 2.29 | 2.5 |
| mod-data-import-b | 1.84 | 2.09 |
| mod-circulation-storage-b | 1.27 | 1.65 |
| mod-circulation-b | 0.33 | 0.34 |
| pub-okapi | 0.23 | 0.24 |
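The busiest services per job type can be read off the CPU comparison table by sorting on each column. The sketch below copies only the top rows of that table (lower-ranked services cannot affect the top three); `cpu`, `top`, `CREATE` and `UPDATE` are illustrative names.

```python
# Average CPU utilization (%) as (Create, Update), copied from the
# top rows of the comparison table; lower-ranked services are omitted.
cpu = {
    "mod-inventory-b": (122.87, 181.72),
    "mod-di-converter-storage-b": (78.94, 75.21),
    "mod-quick-marc-b": (75.7, 122.16),
    "nginx-okapi": (71.79, 78.33),
    "mod-source-record-storage-b": (47.36, 42.14),
    "okapi-b": (36.99, 29.78),
    "mod-source-record-manager-b": (30.41, 36.98),
}

CREATE, UPDATE = 0, 1  # tuple positions in the values above


def top(n: int, column: int) -> list[tuple[str, float]]:
    """Return the n services with the highest CPU for the given job type."""
    ranked = sorted(cpu, key=lambda service: cpu[service][column], reverse=True)
    return [(service, cpu[service][column]) for service in ranked[:n]]


print("Create:", top(3, CREATE))
print("Update:", top(3, UPDATE))
```

This ranks mod-inventory-b, mod-di-converter-storage-b and mod-quick-marc-b highest for Create jobs, and mod-inventory-b, mod-quick-marc-b and nginx-okapi for Update jobs.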

Graph: service CPU utilization, DI Create jobs

Graph: service CPU utilization, DI Update jobs


Service Memory Utilization

Memory consumption comparison

| Service | Memory, DI Create (%) | Memory, DI Update (%) |
|---|---|---|
| mod-inventory-b | 95.16 | 98.34 |
| mod-permissions-b | 75.03 | 79.63 |
| mod-source-record-storage-b | 62.29 | 72.77 |
| mod-users-b | 61.23 | 59.93 |
| mod-data-import-b | 61.02 | 68.28 |
| mod-source-record-manager-b | 47.76 | 54.2 |
| okapi-b | 41.84 | 42.55 |
| mod-di-converter-storage-b | 34.62 | 35.22 |
| mod-feesfines-b | 28.63 | 27.51 |
| mod-quick-marc-b | 28.39 | 30.48 |
| mod-configuration-b | 27.57 | 26.51 |
| mod-pubsub-b | 24.72 | 24.86 |
| mod-authtoken-b | 21.93 | 20.1 |
| mod-inventory-storage-b | 17.21 | 18.03 |
| mod-circulation-storage-b | 17.04 | 16.55 |
| mod-circulation-b | 10.88 | 11.13 |
| nginx-okapi | 4.69 | 4.69 |
| pub-okapi | 4.63 | 4.46 |

Graph: service memory utilization, DI Create jobs

Graph: service memory utilization, DI Update jobs

DB CPU Utilization

RDS CPU utilization was 97% for all Create jobs and 94% for Update jobs.

Graph: DB CPU utilization, DI Create jobs

Graph: DB CPU utilization, DI Update jobs

DB Connections

DB connections for Create jobs: 710 with 2 tenants, 870 with 3 tenants.

DB connections for Update jobs: 630 with 2 tenants, 785 with 3 tenants.

Each additional job running concurrently on a different tenant needs roughly 150-160 more DB connections (870 - 710 = 160 for Create, 785 - 630 = 155 for Update).
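The per-job increment can be derived from the connection counts reported above for Create (710, 870) and Update (630, 785) jobs. This sketch assumes the connection count grows linearly with the number of tenants running jobs concurrently; with only two data points per job type that is an approximation, and `per_extra_job` is a hypothetical helper.

```python
# DB connection counts reported above: {number of tenants running jobs: connections}.
connections = {
    "Create": {2: 710, 3: 870},
    "Update": {2: 630, 3: 785},
}


def per_extra_job(series: dict[int, int]) -> float:
    """Connections added per extra concurrent job, assuming linear growth."""
    (n_low, c_low), (n_high, c_high) = sorted(series.items())
    return (c_high - c_low) / (n_high - n_low)


# Prints "Create 160.0" and then "Update 155.0".
for job, series in connections.items():
    print(job, per_extra_job(series))
```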

Graph: DB connections, DI Create jobs

Graph: DB connections, DI Update jobs

DB load

Graph: DB load, DI Create jobs

Graph: DB load, DI Update jobs

Appendix

Errors & Exceptions

The following exceptions were observed during tests that finished successfully:

 Logs

pcp1/mod-search

10:59:59 [] [] [] [] WARN KafkaMessageListener Failed to index resource event [eventType: CREATE, tenantId: fs09000000, id: 5cc8ef78-cb05-49fa-8274-1cba1d660aad]

index [pcp1_instance_fs09000000], id [f7aea9b8-614e-4050-9dbd-e2f8a884c06b], message [OpenSearchException[OpenSearch exception [type=circuit_breaking_exception, reason=[parent] Data too large, data for [indices:data/write/bulk[s]] would be [16502737514/15.3gb], which is larger than the limit of [16320875724/15.1gb], real usage: [16499671264/15.3gb], new bytes reserved: [3066250/2.9mb], usages [request=0/0b, fielddata=0/0b, in_flight_requests=3103382/2.9mb]]]]

feign.FeignException$Unauthorized: [401 Unauthorized] during [GET] to [http://inventory-view/instances?query=id%3D%3D%28%221e9b752b-6cc3-433b-ae90-cbafdc307cb6%22%29&limit=1] [InventoryViewClient#getInstances(CqlQuery,int)]: [Invalid token]
org.folio.search.exception.SearchOperationException: Failed to perform elasticsearch request [index=pcp1_contributor_fs09000000, type=bulkApi, message: 30,000 milliseconds timeout on connection http-outgoing-265 [ACTIVE]]
Caused by: java.net.SocketTimeoutException: 30,000 milliseconds timeout on connection http-outgoing-294 [ACTIVE]
WARN essageBatchProcessor Failed to process batch, attempting to process resources one by one
feign.RetryableException: timeout executing GET http://inventory-view/instances?query=id%3D%3D%28%22c4ac5388-4b64-4de5-8930-d3be806c1b7f%22%20or%20%229228d6de-c8e3-4e62-85c7-f5b8b45fb649%22%20or%20%2288913517-17dc-4d90-be8b-9560c1f30a01%22%20or%20%220514f12a-680d-415d-b262-ba82f4dd3e76%22%20or%20%2205253541-bf8a-465a-b91d-b2cb4ff0944d%22%20or%20%222f7b4924-8781-4834-b25f-36fc430d8f5d%22%20or%20%22b2dad08a-05e9-41d2-8742-25c98daf7fbe%22%20or%20%22be30bb11-58da-4b06-beea-e204a0823438%22%20or%20%22930b4625-8eb3-492a-bae8-388480364e67%22%20or%20%2223925f8a-efb4-41a2-9002-48292f6419f3%22%20or%20%22765cb2e6-96a4-4b33-99fd-b38011be999f%22%20or%20%22a3666aca-4963-4e94-95ab-dd3d790ffdd3%22%20or%20%2226bb45e4-dfc0-4915-9ec8-0187d334651d%22%20or%20%220c536304-7330-46d2-a19e-9b69ca13591a%22%20or%20%22ee237647-875c-405e-8ce1-fd55f701d83b%22%20or%20%22869ba2aa-f465-42b4-b4ba-b47ccd29d6ac%22%20or%20%221aa2e06e-b647-4e7b-8fa8-9804c65e1dc1%22%20or%20%22a7096f26-0363-49b4-803c-31e5661b12de%22%20or%20%22042395fb-34a7-4412-a78a-d541ef948922%22%20or%20%2273342f96-f5b5-42d6-ab41-1ef121aef0d5%22%20or%20%22fa028d18-e45e-434d-85d2-e8c4e5db9519%22%20or%20%22c0d1a681-66ea-4cc8-bf68-5771cf8a93e9%22%20or%20%22d7ece421-5dae-4519-9d09-100f22b47007%22%20or%20%2225ca20d1-d0e1-414a-b228-b74ceaba2512%22%20or%20%2274fcad26-9ddf-4cf9-8206-f665374a37f3%22%20or%20%221fd756cd-e573-4287-9c3e-c29408ee8709%22%20or%20%22a322007c-f578-4c93-9239-e6558a393710%22%20or%20%22a2ed3841-a9d2-4364-a3fe-939e2bffbe24%22%20or%20%22b31f92f9-027e-40bf-a5b1-bb4e50241a46%22%20or%20%2204175b0b-9dea-403b-911c-82d5f6a2fbe2%22%20or%20%22f4ebc0f9-adb7-4458-a4ae-8d7996b3b4f8%22%20or%20%22e1e6f55c-7720-4a23-a73a-3202746c7c75%22%20or%20%2238a43c32-d2de-454f-b7d7-7725b5bab61e%22%20or%20%22b8f17eec-f61c-4b29-9769-3b8d91a6dae4%22%20or%20%22e93b2a0c-939b-4c31-98f9-c85cb52081eb%22%20or%20%22983fd8c1-cd4a-4087-a4b9-b1c3dd11c08e%22%20or%20%222b8a75db-2ec2-4190-8702-c4dca1067bf6%22%20or%20%22f73f7d21-deab-4af4-9163-7f374f1d56d2%22%20or%20%22
9819b074-3174-429e-8f7f-1d6312d9630f%22%20or%20%2256743d6d-3df9-4bbf-9495-cc9f2b95e60b%22%20or%20%229f146268-ad02-46b2-8f0d-a5f64fc8579b%22%20or%20%22fb26676f-2d20-4309-8fc5-aa1c09962618%22%20or%20%222b39c2de-6374-46af-b78e-1c83d669c991%22%20or%20%22a7fd3130-6752-40b4-b2a4-cc6f1fee0349%22%20or%20%22e7561b18-c9da-43f8-97d4-9125261de4b6%22%20or%20%221106c6ed-519b-4efa-a73c-deae1dc0570d%22%20or%20%220ea912aa-6107-490f-8855-350c7edc0060%22%20or%20%225ec701f4-af40-4464-b6a4-ba9f14ca7d28%22%20or%20%22e2a15de7-b793-4949-beed-9f56bd9cde9d%22%20or%20%22879210f9-cae8-4202-9dcd-89657a5f8113%22%29&limit=50


pcp1/mod-authtoken
org.folio.auth.authtokenmodule.tokens.TokenValidationException: Access token has expired
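The mod-search `circuit_breaking_exception` above follows from simple arithmetic on the byte counts in the message: OpenSearch's parent circuit breaker rejects a request when current real heap usage plus the newly reserved bytes would exceed the configured limit. The sketch below reproduces that check with the logged values. Its last comparison tests an assumption not stated in this report: that the limit corresponds to the default 95% parent-breaker setting (`indices.breaker.total.limit` with real-memory tracking) on a 16 GiB heap.

```python
# Byte counts copied from the circuit_breaking_exception in the mod-search log above.
real_usage = 16_499_671_264  # "real usage"
new_bytes = 3_066_250        # "new bytes reserved" by the bulk request
limit = 16_320_875_724       # parent breaker "limit"

GIB = 1024 ** 3
MIB = 1024 ** 2

# The parent breaker trips when current usage plus the new reservation exceeds the limit.
would_be = real_usage + new_bytes
trips = would_be > limit
overshoot_mib = round((would_be - limit) / MIB, 1)

# Assumption (not stated in the report): limit = default 95% of a 16 GiB heap.
assumed_limit = int(0.95 * 16 * GIB)

print(would_be, trips, overshoot_mib, assumed_limit == limit)
```

The computed total matches the "would be [16502737514]" value in the log, and the request exceeds the limit by about 173 MiB.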


Infrastructure

PTF environment: pcp1

  • 10 m6i.2xlarge EC2 instances located in US East (N. Virginia), us-east-1
  • 2 database instances (writer/reader), db.r6g.xlarge: 32 GiB memory, 4 vCPUs, max_connections 2731
  • MSK tenant
    • 4 m5.2xlarge brokers in 2 zones
    • Apache Kafka version 2.8.0

    • EBS storage volume per broker 300 GiB

    • auto.create.topics.enable=true
    • log.retention.minutes=480
    • default.replication.factor=3


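Cluster: pcp1-pvt

| Module | Task Def. Revision | Module Version | Task Count | Mem Hard Limit | Mem Soft Limit | CPU units | Xmx | MetaspaceSize | MaxMetaspaceSize | R/W split enabled |
|---|---|---|---|---|---|---|---|---|---|---|
| mod-remote-storage | 10(11)* | 3.0.0 | 2 | 4920 | 4472 | 1024 | 3960 | 512 | 512 | FALSE |
| mod-data-import | 18(20)* | 3.0.7 | 1 | 2048 | 1844 | 256 | 1292 | 384 | 512 | FALSE |
| mod-authtoken | 13(16)* | 2.14.1 | 2 | 1440 | 1152 | 512 | 922 | 88 | 128 | FALSE |
| mod-configuration | 9(10)* | 5.9.2 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | FALSE |
| mod-users-bl | 9(10)* | 7.6.0 | 2 | 1440 | 1152 | 512 | 922 | 88 | 128 | FALSE |
| mod-inventory-storage | 12(15)* | 27.0.3 (27.0.4)* | 2 | 4096 | 3690 | 2048 | 3076 | 384 | 512 | FALSE |
| mod-circulation-storage | 12(14)* | 17.1.3 (17.1.7)* | 2 | 2880 | 2592 | 1536 | 1814 | 384 | 512 | FALSE |
| mod-source-record-storage | 15(18)* | 5.7.3 (5.7.5)* | 2 | 5600 | 5000 | 2048 | 3500 | 384 | 512 | FALSE |
| mod-inventory | 11(14)* | 20.1.3 (20.1.7)* | 2 | 2880 | 2592 | 1024 | 1814 | 384 | 512 | FALSE |
| mod-di-converter-storage | 15(18)* | 2.1.2 (2.1.5)* | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | FALSE |
| mod-circulation | 12(14)* | 24.0.8 (24.0.11)* | 2 | 2880 | 2592 | 1536 | 1814 | 384 | 512 | FALSE |
| mod-pubsub | 11(13)* | 2.11.2 (2.11.3)* | 2 | 1536 | 1440 | 1024 | 922 | 384 | 512 | FALSE |
| mod-patron-blocks | 9(10)* | 1.9.0 | 2 | 1024 | 896 | 1024 | 768 | 88 | 128 | FALSE |
| mod-source-record-manager | 14(17)* | 3.7.4 (3.7.8)* | 2 | 5600 | 5000 | 2048 | 3500 | 384 | 512 | FALSE |
| mod-quick-marc | 9(11)* | 5.0.0 (5.0.1)* | 1 | 2288 | 2176 | 128 | 1664 | 384 | 512 | FALSE |
| nginx-okapi | 9 | 2023.06.14 | 2 | 1024 | 896 | 128 | 0 | 0 | 0 | FALSE |
| okapi-b | 11 | 5.1.2 | 3 | 1684 | 1440 | 1024 | 922 | 384 | 512 | FALSE |
| mod-feesfines | 10(11)* | 19.0.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | FALSE |
| pub-okapi | 9 | 2023.06.14 | 2 | 1024 | 896 | 128 | 768 | 0 | 0 | FALSE |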

* Where two values are given, the newer one (in parentheses) was used in this test, for comparison with the previous test.

Methodology/Approach

DI tests were started from the UI, with 5-minute pauses between tests.

