- PERF-232Getting issue details... STATUS
currents software configuration
required
Modules:
Module | previous version | current version on lcp1 |
---|---|---|
Data Import Module | mod-data-import-2.3.0-SNAPSHOT.224 | mod-data-import-2.4.1 |
Source Record Manager Module | mod-source-record-manager-3.3.0-SNAPSHOT.556 | mod-source-record-manager-3.3.8 |
Source Record Storage Module | mod-source-record-storage-5.3.0-SNAPSHOT.394 | mod-source-record-storage-5.3.2 |
Inventory Module | mod-inventory-18.1.0-SNAPSHOT.493 | mod-inventory-18.1.4 |
Inventory Storage Module | mod-inventory-storage-23.0.0-SNAPSHOT.657 | mod-inventory-storage-23.0.2 |
Invoice business logic module | mod-invoice-5.3.0-SNAPSHOT.284 | mod-invoice-5.3.0 |
Performance-optimized configuration:
Module | CPU Initial | CPU Optimized | memory Initial | MEMORY optimized | java options (optimized, only in case other changes) | task instances # |
---|---|---|---|---|---|---|
Data Import Module | 128 | 128 | 2048/1024 | 2048/1024 | 1 | |
Source Record Manager Module | 512 | 512 | 2048/1844 | 1440/1296 | -XX:MetaspaceSize=512m -XX:MaxMetaspaceSize=512m -Xmx1292m -XX:+UseG1GC | 2 |
Source Record Storage Module | 128 | 512 | 1440/1296 | 1440/1296 | -XX:MetaspaceSize=384m -XX:MaxMetaspaceSize=384m -Xmx908m -XX:+UseG1GC | 2 |
Inventory Module | 512 | 512 | 2048/1844 | 2880/2592 | -XX:MetaspaceSize=182m -XX:MaxMetaspaceSize=182m -Xmx1814m -XX:+UseG1GC | 2 |
Inventory Storage Module | 512 | 512 | 864/778 | 846/778 | 2 | |
Invoice business logic module | 128 | 128 | 1024/800 | 1024/800 | 2 | |
mod-data-import-converter-storage | 128 | 512 | 512/460 | 512/460 | 2 | |
okapi |
kafka topics related to di - number of partitions raised to 2
lcp1.Default.fs09000000.DI_COMPLETED
lcp1.Default.fs09000000.DI_EDIFACT_RECORD_CREATED
lcp1.Default.fs09000000.DI_ERROR
lcp1.Default.fs09000000.DI_INVENTORY_AUTHORITY_CREATED_READY_FOR_POST_PROCESSING
lcp1.Default.fs09000000.DI_INVENTORY_HOLDING_CREATED
lcp1.Default.fs09000000.DI_INVENTORY_HOLDING_MATCHED
lcp1.Default.fs09000000.DI_INVENTORY_HOLDING_NOT_MATCHED
lcp1.Default.fs09000000.DI_INVENTORY_HOLDING_UPDATED
lcp1.Default.fs09000000.DI_INVENTORY_INSTANCE_CREATED
lcp1.Default.fs09000000.DI_INVENTORY_INSTANCE_CREATED_READY_FOR_POST_PROCESSING
lcp1.Default.fs09000000.DI_INVENTORY_ITEM_MATCHED
lcp1.Default.fs09000000.DI_LOG_SRS_MARC_AUTHORITY_RECORD_CREATED
lcp1.Default.fs09000000.DI_LOG_SRS_MARC_BIB_RECORD_CREATED
lcp1.Default.fs09000000.DI_MARC_FOR_UPDATE_RECEIVED
lcp1.Default.fs09000000.DI_PARSED_RECORDS_CHUNK_SAVED
lcp1.Default.fs09000000.DI_RAW_RECORDS_CHUNK_PARSED
lcp1.Default.fs09000000.DI_RAW_RECORDS_CHUNK_READ
lcp1.Default.fs09000000.DI_SRS_MARC_AUTHORITY_RECORD_CREATED
lcp1.Default.fs09000000.DI_SRS_MARC_BIB_INSTANCE_HRID_SET
lcp1.Default.fs09000000.DI_SRS_MARC_BIB_RECORD_CREATED
lcp1.Default.fs09000000.DI_SRS_MARC_BIB_RECORD_MATCHED
lcp1.Default.fs09000000.DI_SRS_MARC_BIB_RECORD_MATCHED_READY_FOR_POST_PROCESSING
lcp1.Default.fs09000000.DI_SRS_MARC_BIB_RECORD_MODIFIED
lcp1.Default.fs09000000.DI_SRS_MARC_BIB_RECORD_MODIFIED_READY_FOR_POST_PROCESSING
lcp1.Default.fs09000000.DI_SRS_MARC_BIB_RECORD_NOT_MATCHED
lcp1.Default.fs09000000.DI_SRS_MARC_HOLDING_RECORD_CREATED
example change partition number if topic already exist
./kafka-topics.sh --alter --zookeeper\
z-1.kafka280.xwt0mn.c17.kafka.us-east-1.amazonaws.com:2181,z-2.kafka280.xwt0mn.c17.kafka.us-east-1.amazonaws.com:2181,z-3.kafka280.xwt0mn.c17.kafka.us-east-1.amazonaws.com:2181\
--topic lcp1.Default.fs09000000.DI_SRS_MARC_HOLDING_RECORD_CREATED --partitions 2
if topic not exist yet
./kafka-topics.sh --create --zookeeper\
z-1.kafka280.xwt0mn.c17.kafka.us-east-1.amazonaws.com:2181,z-2.kafka280.xwt0mn.c17.kafka.us-east-1.amazonaws.com:2181,z-3.kafka280.xwt0mn.c17.kafka.us-east-1.amazonaws.com:2181\
--topic lcp1.Default.fs09000000.DI_EDIFACT_RECORD_CREATED --partitions 2 --replication-factor 1
additional tests (*) - added CPU UNITS for module data-import-CS from 128 to 512
(**) - cleaned tables DI (deleted records from 03 marth 2022)
file | profile | duration | duration from Folijet - Lotus Snapshot Performance testing (for comparison) | Data Import Test report (Lotus) |
---|---|---|---|---|
5K | PTF - Create 2 2 | 04 min 04 sec | ~8 min | 3 min 54 s |
5K | Updates Success - 1 | 05 min 26 sec | ~13 min | 4 min 12 s |
10K | PTF - Create 2 | 07 min 00 sec (8 mins) 00:09:55.36 | ~ 19 min | 6 min 45 s |
10K | Updates Success - 1 | 11 min 46 sec | ~ 25 min | 8 min 4 s |
25K | PTF - Create 2 | 25 min 48 sec (19:49.063*) (17:33.521**) | ~ 45 min | 16 min 8s |
25K | Updates Success - 1 | 29 min 08 sec | ~ 1h 31min | 19 min 50s |
50K | PTF - Create 2 | FAILED | ||
50K | PTF - Create 2 | 40 min 00 (41 min) | ~ 1h 34min | 32 min 28 s |
5K create charts
mod-data-import-converter-storage we see too much CPU usage
5K update charts
50K DI charts
After cpu units after it was increased to 512 for mod-data-import-converter-storage maximum usage CPU
25 k with one partition and previous configuration of DI modules
Error logs during 50K DI fails
lcp1/mod-source-record-storage
09:45:22.977 [vert.x-worker-thread-15] ERROR KafkaConsumerWrapper [1890205eqId] Error while processing a record - id: 6 subscriptionPattern: SubscriptionDefinition(eventType=DI_INVENTORY_INSTANCE_CREATED, subscriptionPattern=lcp1\.Default\.\w{1,}\.DI_INVENTORY_INSTANCE_CREATED) offset: 185757 |
java.util.concurrent.CompletionException: io.netty.channel.ConnectTimeoutException: connection timed out: pvt.lb.lcp1.folio-eis.us-east-1/10.23.35.12:9130 |
at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331) ~[?:?] |
at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:346) ~[?:?] |
at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1063) ~[?:?] |
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) ~[?:?] |
at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088) ~[?:?] |
at io.vertx.core.Future.lambda$toCompletionStage$2(Future.java:362) ~[ms.jar:?] |
at io.vertx.core.impl.future.FutureImpl$3.onFailure(FutureImpl.java:153) ~[ms.jar:?] |
at io.vertx.core.impl.future.FutureBase.emitFailure(FutureBase.java:75) ~[ms.jar:?] |
at io.vertx.core.impl.future.FutureImpl.tryFail(FutureImpl.java:230) ~[ms.jar:?] |
at io.vertx.core.impl.future.PromiseImpl.tryFail(PromiseImpl.java:23) ~[ms.jar:?] |
at io.vertx.core.Promise.fail(Promise.java:89) ~[ms.jar:?] |
at org.folio.dataimport.util.RestUtil.lambda$handleResponse$0(RestUtil.java:184) ~[ms.jar:?] |
at io.vertx.ext.web.client.impl.HttpContext.handleFailure(HttpContext.java:396) ~[ms.jar:?] |
at io.vertx.ext.web.client.impl.HttpContext.execute(HttpContext.java:390) ~[ms.jar:?] |
at io.vertx.ext.web.client.impl.HttpContext.next(HttpContext.java:365) ~[ms.jar:?] |
at io.vertx.ext.web.client.impl.HttpContext.fire(HttpContext.java:332) ~[ms.jar:?] |
at io.vertx.ext.web.client.impl.HttpContext.fail(HttpContext.java:313) ~[ms.jar:?] |
at io.vertx.ext.web.client.impl.HttpContext.lambda$handleCreateRequest$7(HttpContext.java:536) ~[ms.jar:?] |
at io.vertx.core.impl.future.FutureImpl$3.onFailure(FutureImpl.java:153) ~[ms.jar:?] |
at io.vertx.core.impl.future.FutureBase.lambda$emitFailure$1(FutureBase.java:69) ~[ms.jar:?] |
at io.vertx.core.impl.WorkerContext.lambda$execute$2(WorkerContext.java:104) ~[ms.jar:?] |
at io.vertx.core.impl.TaskQueue.run(TaskQueue.java:76) ~[ms.jar:?] |
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?] |
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?] |
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [ms.jar:?] |
at java.lang.Thread.run(Thread.java:829) [?:?] |
Caused by: io.netty.channel.ConnectTimeoutException: connection timed out: pvt.lb.lcp1.folio-eis.us-east-1/10.23.35.12:9130 |
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:261) ~[ms.jar:?] |
at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98) ~[ms.jar:?] |
at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170) ~[ms.jar:?] |
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) ~[ms.jar:?] |
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469) ~[ms.jar:?] |
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:503) ~[ms.jar:?] |
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) ~[ms.jar:?] |
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[ms.jar:?] |
... 2 more |
09:45:22.978 [vert.x-worker-thread-15] WARN KafkaConsumerWrapper [1890206eqId] Error handler has not been implemented for subscriptionPattern: SubscriptionDefinition(eventType=DI_INVENTORY_INSTANCE_CREATED, subscriptionPattern=lcp1\.Default\.\w{1,}\.DI_INVENTORY_INSTANCE_CREATED) failures |
09:45:22.978 [vert.x-worker-thread-15] DEBUG KafkaConsumerWrapper [1890206eqId] Threshold is exceeded, preparing to resume, globalLoad: 0, currentLoad: 0, requestNo: -3689 |
revert task definitions
aws ecs describe-task-definition --task-definition lcp1-mod-source-record-manager:11 | grep -E 'image|registeredAt'
module | date | vesion | last version | task definition |
---|---|---|---|---|
lcp1-mod-source-record-manager | 2022-03-14T19:32:23.969000+00:00 | 3.3.0 | 3.3.8 | 4 |
lcp1-mod-source-record-storage:4 | 2022-03-14T19:24:36.768000+00:00 | 5.3.0 | 5.3.2 | 4 |
lcp1-mod-inventory | 2022-03-14T19:24:58.524000+00:00 | 18.1.0 | 18.1.4 | 4 |
lcp1-mod-inventory-storage | 022-03-14T19:22:38.645000+00:00 | 23.0.0 | 23.0.2 | 4 |