LCP performance testing

Related issue: PERF-232

Current software configuration


Summary

Several series of tests were performed. The latest versions of the modules were deployed, and the module configuration was changed following the Folijet team's recommendations. Details are in the tables below.

In the performance-optimized configuration, the number of Kafka partitions was also increased to 2. Import times were not significantly different, although CPU usage was reduced. Before the configuration changes, maximum CPU usage was about 900% for the SRM module and 700% for Inventory; after the changes, the maximum was 300% for SRM and 250% for Inventory.

However, it should be noted that a general decrease in performance (longer import times) was found in all tests, even when reverting to the state of the test on March 13.

Possible reasons for the performance degradation include a non-optimal configuration of Kafka or of the database.


Modules:


Module | Previous version | Current version on lcp1
Data Import Module | mod-data-import-2.3.0-SNAPSHOT.224 | mod-data-import-2.4.1
Source Record Manager Module | mod-source-record-manager-3.3.0-SNAPSHOT.556 | mod-source-record-manager-3.3.8
Source Record Storage Module | mod-source-record-storage-5.3.0-SNAPSHOT.394 | mod-source-record-storage-5.3.2
Inventory Module | mod-inventory-18.1.0-SNAPSHOT.493 | mod-inventory-18.1.4
Inventory Storage Module | mod-inventory-storage-23.0.0-SNAPSHOT.657 | mod-inventory-storage-23.0.2
Invoice business logic module | mod-invoice-5.3.0-SNAPSHOT.284 | mod-invoice-5.3.0

Performance-optimized configuration:

Module | CPU initial | CPU optimized | Memory initial (hard/soft) | Memory optimized (hard/soft) | Java options (optimized; only listed where changed) | Task instances #
Data Import Module | 128 | 128 | 2048/1024 | 2048/1024 | | 1
Source Record Manager Module | 512 | 512 | 2048/1844 | 1440/1296 | -XX:MetaspaceSize=512m -XX:MaxMetaspaceSize=512m -Xmx1292m -XX:+UseG1GC | 2
Source Record Storage Module | 128 | 512 | 1440/1296 | 1440/1296 | -XX:MetaspaceSize=384m -XX:MaxMetaspaceSize=384m -Xmx908m -XX:+UseG1GC | 2
Inventory Module | 512 | 512 | 2048/1844 | 2880/2592 | -XX:MetaspaceSize=182m -XX:MaxMetaspaceSize=182m -Xmx1814m -XX:+UseG1GC | 2
Inventory Storage Module | 512 | 512 | 864/778 | 846/778 | | 2
Invoice business logic module | 128 | 128 | 1024/800 | 1024/800 | | 2
mod-data-import-converter-storage | 128 | 512 | 512/460 | 512/460 | | 2
okapi | | | | | |
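For reference, the values actually deployed for a given module (CPU units, memory limits, JAVA_OPTIONS) can be checked against the registered task definition. A minimal sketch, assuming the AWS CLI and jq are available on the host:

 # inspect cpu/memory/JAVA_OPTIONS of the currently registered task definition
 aws ecs describe-task-definition --task-definition lcp1-mod-source-record-manager \
  | jq '.taskDefinition.containerDefinitions[] | {cpu, memory, memoryReservation, environment}'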





Kafka topics related to Data Import - number of partitions raised to 2:

lcp1.Default.fs09000000.DI_COMPLETED
lcp1.Default.fs09000000.DI_EDIFACT_RECORD_CREATED
lcp1.Default.fs09000000.DI_ERROR
lcp1.Default.fs09000000.DI_INVENTORY_AUTHORITY_CREATED_READY_FOR_POST_PROCESSING
lcp1.Default.fs09000000.DI_INVENTORY_HOLDING_CREATED
lcp1.Default.fs09000000.DI_INVENTORY_HOLDING_MATCHED
lcp1.Default.fs09000000.DI_INVENTORY_HOLDING_NOT_MATCHED
lcp1.Default.fs09000000.DI_INVENTORY_HOLDING_UPDATED
lcp1.Default.fs09000000.DI_INVENTORY_INSTANCE_CREATED
lcp1.Default.fs09000000.DI_INVENTORY_INSTANCE_CREATED_READY_FOR_POST_PROCESSING
lcp1.Default.fs09000000.DI_INVENTORY_ITEM_MATCHED
lcp1.Default.fs09000000.DI_LOG_SRS_MARC_AUTHORITY_RECORD_CREATED
lcp1.Default.fs09000000.DI_LOG_SRS_MARC_BIB_RECORD_CREATED
lcp1.Default.fs09000000.DI_MARC_FOR_UPDATE_RECEIVED
lcp1.Default.fs09000000.DI_PARSED_RECORDS_CHUNK_SAVED
lcp1.Default.fs09000000.DI_RAW_RECORDS_CHUNK_PARSED
lcp1.Default.fs09000000.DI_RAW_RECORDS_CHUNK_READ
lcp1.Default.fs09000000.DI_SRS_MARC_AUTHORITY_RECORD_CREATED
lcp1.Default.fs09000000.DI_SRS_MARC_BIB_INSTANCE_HRID_SET
lcp1.Default.fs09000000.DI_SRS_MARC_BIB_RECORD_CREATED
lcp1.Default.fs09000000.DI_SRS_MARC_BIB_RECORD_MATCHED
lcp1.Default.fs09000000.DI_SRS_MARC_BIB_RECORD_MATCHED_READY_FOR_POST_PROCESSING
lcp1.Default.fs09000000.DI_SRS_MARC_BIB_RECORD_MODIFIED
lcp1.Default.fs09000000.DI_SRS_MARC_BIB_RECORD_MODIFIED_READY_FOR_POST_PROCESSING
lcp1.Default.fs09000000.DI_SRS_MARC_BIB_RECORD_NOT_MATCHED
lcp1.Default.fs09000000.DI_SRS_MARC_HOLDING_RECORD_CREATED

Example: change the number of partitions if the topic already exists:

./kafka-topics.sh --alter --zookeeper \
 z-1.kafka280.xwt0mn.c17.kafka.us-east-1.amazonaws.com:2181,z-2.kafka280.xwt0mn.c17.kafka.us-east-1.amazonaws.com:2181,z-3.kafka280.xwt0mn.c17.kafka.us-east-1.amazonaws.com:2181 \
 --topic lcp1.Default.fs09000000.DI_SRS_MARC_HOLDING_RECORD_CREATED --partitions 2

If the topic does not exist yet:

./kafka-topics.sh --create --zookeeper \
 z-1.kafka280.xwt0mn.c17.kafka.us-east-1.amazonaws.com:2181,z-2.kafka280.xwt0mn.c17.kafka.us-east-1.amazonaws.com:2181,z-3.kafka280.xwt0mn.c17.kafka.us-east-1.amazonaws.com:2181 \
 --topic lcp1.Default.fs09000000.DI_EDIFACT_RECORD_CREATED --partitions 2 --replication-factor 1
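To confirm that a change took effect, the topic can be described with the same tool (a minimal sketch using the same ZooKeeper hosts as above):

 # show partition count and replication for the altered topic
 ./kafka-topics.sh --describe --zookeeper \
  z-1.kafka280.xwt0mn.c17.kafka.us-east-1.amazonaws.com:2181 \
  --topic lcp1.Default.fs09000000.DI_SRS_MARC_HOLDING_RECORD_CREATED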

Additional tests: (*) - CPU units for mod-data-import-converter-storage increased from 128 to 512

(**) - DI tables cleaned (records from 03 March 2022 deleted); see the cleanup sketch below
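A sketch of how such a cleanup could be performed directly in the database is shown below; the host, credentials, schema, table, and date predicate are assumptions and have to be adjusted to the real lcp1 environment:

 # hypothetical cleanup sketch - schema, table, and column names are assumptions
 psql -h <db-host> -U <db-user> -d <db-name> -c \
  "DELETE FROM fs09000000_mod_source_record_manager.journal_records WHERE created_date < '2022-03-03';"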


DI durations

File | Profile | Duration | Duration from Folijet - Lotus Snapshot Performance testing (for comparison) | Duration from Data Import Test report (Lotus) | Duration with reverted configuration (just before 2022-04-26) | Duration with reverted configuration (2022-03-17)
5K | PTF - Create 2 | 04 min 04 sec | ~8 min | 3 min 54 s | 05 min 44 sec | 04 min 44 sec
5K | Updates Success - 1 | 05 min 26 sec | ~13 min | 4 min 12 s | |
10K | PTF - Create 2 | 07 min 00 sec (8 mins); 00:09:55.36 | ~19 min | 6 min 45 s | 08 min 50 sec |
10K | Updates Success - 1 | 11 min 46 sec | ~25 min | 8 min 4 s | 12 min 03 sec |
25K | PTF - Create 2 | 25 min 48 sec (19:49.063 *; 17:33.521 **) | ~45 min | 16 min 8 s | 38 min 34 sec | 22 min 52 sec
25K | Updates Success - 1 | 29 min 08 sec | ~1 h 31 min | 19 min 50 s | |
50K | PTF - Create 2 | FAILED | | | |
50K | PTF - Create 2 | 40 min 00 sec (41 min) | ~1 h 34 min | 32 min 28 s | |

5K create charts

For mod-data-import-converter-storage we see excessive CPU usage.

5K update charts


50K DI charts

Maximum CPU usage after the CPU units for mod-data-import-converter-storage were increased to 512:


25K with one partition and the previous configuration of the DI modules


Error logs during 50K DI failures

lcp1/mod-source-record-storage

09:45:22.977 [vert.x-worker-thread-15] ERROR KafkaConsumerWrapper [1890205eqId] Error while processing a record - id: 6 subscriptionPattern: SubscriptionDefinition(eventType=DI_INVENTORY_INSTANCE_CREATED, subscriptionPattern=lcp1\.Default\.\w{1,}\.DI_INVENTORY_INSTANCE_CREATED) offset: 185757

java.util.concurrent.CompletionException: io.netty.channel.ConnectTimeoutException: connection timed out: pvt.lb.lcp1.folio-eis.us-east-1/10.23.35.12:9130

at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331) ~[?:?]

at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:346) ~[?:?]

at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1063) ~[?:?]

at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) ~[?:?]

at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088) ~[?:?]

at io.vertx.core.Future.lambda$toCompletionStage$2(Future.java:362) ~[ms.jar:?]

at io.vertx.core.impl.future.FutureImpl$3.onFailure(FutureImpl.java:153) ~[ms.jar:?]

at io.vertx.core.impl.future.FutureBase.emitFailure(FutureBase.java:75) ~[ms.jar:?]

at io.vertx.core.impl.future.FutureImpl.tryFail(FutureImpl.java:230) ~[ms.jar:?]

at io.vertx.core.impl.future.PromiseImpl.tryFail(PromiseImpl.java:23) ~[ms.jar:?]

at io.vertx.core.Promise.fail(Promise.java:89) ~[ms.jar:?]

at org.folio.dataimport.util.RestUtil.lambda$handleResponse$0(RestUtil.java:184) ~[ms.jar:?]

at io.vertx.ext.web.client.impl.HttpContext.handleFailure(HttpContext.java:396) ~[ms.jar:?]

at io.vertx.ext.web.client.impl.HttpContext.execute(HttpContext.java:390) ~[ms.jar:?]

at io.vertx.ext.web.client.impl.HttpContext.next(HttpContext.java:365) ~[ms.jar:?]

at io.vertx.ext.web.client.impl.HttpContext.fire(HttpContext.java:332) ~[ms.jar:?]

at io.vertx.ext.web.client.impl.HttpContext.fail(HttpContext.java:313) ~[ms.jar:?]

at io.vertx.ext.web.client.impl.HttpContext.lambda$handleCreateRequest$7(HttpContext.java:536) ~[ms.jar:?]

at io.vertx.core.impl.future.FutureImpl$3.onFailure(FutureImpl.java:153) ~[ms.jar:?]

at io.vertx.core.impl.future.FutureBase.lambda$emitFailure$1(FutureBase.java:69) ~[ms.jar:?]

at io.vertx.core.impl.WorkerContext.lambda$execute$2(WorkerContext.java:104) ~[ms.jar:?]

at io.vertx.core.impl.TaskQueue.run(TaskQueue.java:76) ~[ms.jar:?]

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]

at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [ms.jar:?]

at java.lang.Thread.run(Thread.java:829) [?:?]

Caused by: io.netty.channel.ConnectTimeoutException: connection timed out: pvt.lb.lcp1.folio-eis.us-east-1/10.23.35.12:9130

at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:261) ~[ms.jar:?]

at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98) ~[ms.jar:?]

at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170) ~[ms.jar:?]

at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) ~[ms.jar:?]

at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469) ~[ms.jar:?]

at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:503) ~[ms.jar:?]

at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) ~[ms.jar:?]

at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[ms.jar:?]

... 2 more

09:45:22.978 [vert.x-worker-thread-15] WARN KafkaConsumerWrapper [1890206eqId] Error handler has not been implemented for subscriptionPattern: SubscriptionDefinition(eventType=DI_INVENTORY_INSTANCE_CREATED, subscriptionPattern=lcp1\.Default\.\w{1,}\.DI_INVENTORY_INSTANCE_CREATED) failures

09:45:22.978 [vert.x-worker-thread-15] DEBUG KafkaConsumerWrapper [1890206eqId] Threshold is exceeded, preparing to resume, globalLoad: 0, currentLoad: 0, requestNo: -3689

Revert task definitions

Search for the versions deployed on March 17, 2022:

aws ecs describe-task-definition --task-definition  lcp1-mod-source-record-manager:11 | grep -E 'image|registeredAt'
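To perform the revert itself, a service can be pointed back at the older task definition revision (a sketch; the cluster name and the revision number are assumptions - take them from the tables below):

 # roll a service back to an older task definition revision
 aws ecs update-service --cluster lcp1 --service lcp1-mod-source-record-manager \
  --task-definition lcp1-mod-source-record-manager:4 --force-new-deployment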

If reverting to the state of 2022-03-14:

Module | Date | Version | Last version | Task definition | Revert version needed
lcp1-mod-data-import | 2022-04-01T19:29:54.136000+00:00 | 2.4.1 | 2.4.1 | 4 | no
lcp1-mod-source-record-manager | 2022-03-14T19:32:23.969000+00:00 | 3.3.0 | 3.3.8 | 4 | yes
lcp1-mod-source-record-storage | 2022-03-14T19:24:36.768000+00:00 | 5.3.0 | 5.3.2 | 4 | yes
lcp1-mod-inventory | 2022-03-14T19:24:58.524000+00:00 | 18.1.0 | 18.1.4 | 4 | yes
lcp1-mod-inventory-storage | 2022-03-14T19:22:38.645000+00:00 | 23.0.0 | 23.0.2 | 4 | yes
lcp1-mod-invoice | 2022-03-14T19:34:23.683000+00:00 | 5.3.0 | 5.3.0 | 3 | yes
lcp1-mod-data-import-converter-storage | 2022-03-14T19:32:04.647000+00:00 | 1.13.0 | 1.13.2 | 3 | yes
lcp1-okapi-b | 2022-03-14T19:13:09.338000+00:00 | 4.13.0 | 4.13.2 | 6 | yes

If reverting to the state just before 2022-04-26:

Module | Date | Version | Last version | Task definition | Revert version needed
lcp1-mod-data-import | 2022-04-01T19:29:54.136000+00:00 | 2.4.1 | 2.4.1 | 4 | no
lcp1-mod-source-record-manager | 2022-04-08T15:56:54.536000+00:00 | 3.3.8 | 3.3.8 | 9 | no
lcp1-mod-source-record-storage | 2022-04-06T14:21:41.597000+00:00 | 5.3.2 | 5.3.2 | 6 | no
lcp1-mod-inventory | 2022-04-25T22:11:30.088000+00:00 | 18.1.4 | 18.1.4 | 8 | no
lcp1-mod-inventory-storage | 2022-04-01T19:29:16.220000+00:00 | 23.0.2 | 23.0.2 | 5 | no
lcp1-mod-invoice | 2022-03-14T19:34:23.683000+00:00 | 5.3.0 | 5.3.0 | 3 | no
lcp1-mod-data-import-converter-storage | 2022-04-01T19:30:26.221000+00:00 | 1.13.2 | 1.13.2 | 4 | no
lcp1-okapi-b | 2022-03-30T13:40:17.077000+00:00 | 4.13.0 | 4.13.2 | 7 | yes