- PERF-232

Current software configuration
Summary
Several series of tests were performed. The latest versions of the modules were deployed, and the module configuration was changed per the Folijet team's recommendations. Details are in the tables below.
In the performance-optimized configuration, the number of partitions was also increased to 2. Import times were not significantly different, although CPU usage was reduced: before the configuration changes the maximum CPU usage was about 900% for the SRM module and 700% for Inventory; after the changes the maximum was about 300% for SRM and 250% for Inventory.
However, it should be noted that a general decrease in performance (longer import times) was found in all tests, even when reverting to the state of the March 13 test.
Modules:
Module | previous version | current version on lcp1 |
---|---|---|
Data Import Module | mod-data-import-2.3.0-SNAPSHOT.224 | mod-data-import-2.4.1 |
Source Record Manager Module | mod-source-record-manager-3.3.0-SNAPSHOT.556 | mod-source-record-manager-3.3.8 |
Source Record Storage Module | mod-source-record-storage-5.3.0-SNAPSHOT.394 | mod-source-record-storage-5.3.2 |
Inventory Module | mod-inventory-18.1.0-SNAPSHOT.493 | mod-inventory-18.1.4 |
Inventory Storage Module | mod-inventory-storage-23.0.0-SNAPSHOT.657 | mod-inventory-storage-23.0.2 |
Invoice business logic module | mod-invoice-5.3.0-SNAPSHOT.284 | mod-invoice-5.3.0 |
Performance-optimized configuration:
Module | CPU initial | CPU optimized | Memory initial | Memory optimized | Java options (optimized; only where changed) | Task instances |
---|---|---|---|---|---|---|
Data Import Module | 128 | 128 | 2048/1024 | 2048/1024 | | 1 |
Source Record Manager Module | 512 | 512 | 2048/1844 | 1440/1296 | -XX:MetaspaceSize=512m -XX:MaxMetaspaceSize=512m -Xmx1292m -XX:+UseG1GC | 2 |
Source Record Storage Module | 128 | 512 | 1440/1296 | 1440/1296 | -XX:MetaspaceSize=384m -XX:MaxMetaspaceSize=384m -Xmx908m -XX:+UseG1GC | 2 |
Inventory Module | 512 | 512 | 2048/1844 | 2880/2592 | -XX:MetaspaceSize=182m -XX:MaxMetaspaceSize=182m -Xmx1814m -XX:+UseG1GC | 2 |
Inventory Storage Module | 512 | 512 | 864/778 | 846/778 | | 2 |
Invoice business logic module | 128 | 128 | 1024/800 | 1024/800 | | 2 |
mod-data-import-converter-storage | 128 | 512 | 512/460 | 512/460 | | 2 |
okapi | | | | | | |
Kafka topics related to DI (number of partitions raised to 2):
lcp1.Default.fs09000000.DI_COMPLETED
lcp1.Default.fs09000000.DI_EDIFACT_RECORD_CREATED
lcp1.Default.fs09000000.DI_ERROR
lcp1.Default.fs09000000.DI_INVENTORY_AUTHORITY_CREATED_READY_FOR_POST_PROCESSING
lcp1.Default.fs09000000.DI_INVENTORY_HOLDING_CREATED
lcp1.Default.fs09000000.DI_INVENTORY_HOLDING_MATCHED
lcp1.Default.fs09000000.DI_INVENTORY_HOLDING_NOT_MATCHED
lcp1.Default.fs09000000.DI_INVENTORY_HOLDING_UPDATED
lcp1.Default.fs09000000.DI_INVENTORY_INSTANCE_CREATED
lcp1.Default.fs09000000.DI_INVENTORY_INSTANCE_CREATED_READY_FOR_POST_PROCESSING
lcp1.Default.fs09000000.DI_INVENTORY_ITEM_MATCHED
lcp1.Default.fs09000000.DI_LOG_SRS_MARC_AUTHORITY_RECORD_CREATED
lcp1.Default.fs09000000.DI_LOG_SRS_MARC_BIB_RECORD_CREATED
lcp1.Default.fs09000000.DI_MARC_FOR_UPDATE_RECEIVED
lcp1.Default.fs09000000.DI_PARSED_RECORDS_CHUNK_SAVED
lcp1.Default.fs09000000.DI_RAW_RECORDS_CHUNK_PARSED
lcp1.Default.fs09000000.DI_RAW_RECORDS_CHUNK_READ
lcp1.Default.fs09000000.DI_SRS_MARC_AUTHORITY_RECORD_CREATED
lcp1.Default.fs09000000.DI_SRS_MARC_BIB_INSTANCE_HRID_SET
lcp1.Default.fs09000000.DI_SRS_MARC_BIB_RECORD_CREATED
lcp1.Default.fs09000000.DI_SRS_MARC_BIB_RECORD_MATCHED
lcp1.Default.fs09000000.DI_SRS_MARC_BIB_RECORD_MATCHED_READY_FOR_POST_PROCESSING
lcp1.Default.fs09000000.DI_SRS_MARC_BIB_RECORD_MODIFIED
lcp1.Default.fs09000000.DI_SRS_MARC_BIB_RECORD_MODIFIED_READY_FOR_POST_PROCESSING
lcp1.Default.fs09000000.DI_SRS_MARC_BIB_RECORD_NOT_MATCHED
lcp1.Default.fs09000000.DI_SRS_MARC_HOLDING_RECORD_CREATED
Example: change the partition number if the topic already exists:

```shell
./kafka-topics.sh --alter --zookeeper \
z-1.kafka280.xwt0mn.c17.kafka.us-east-1.amazonaws.com:2181,z-2.kafka280.xwt0mn.c17.kafka.us-east-1.amazonaws.com:2181,z-3.kafka280.xwt0mn.c17.kafka.us-east-1.amazonaws.com:2181 \
--topic lcp1.Default.fs09000000.DI_SRS_MARC_HOLDING_RECORD_CREATED --partitions 2
```
If the topic does not exist yet:

```shell
./kafka-topics.sh --create --zookeeper \
z-1.kafka280.xwt0mn.c17.kafka.us-east-1.amazonaws.com:2181,z-2.kafka280.xwt0mn.c17.kafka.us-east-1.amazonaws.com:2181,z-3.kafka280.xwt0mn.c17.kafka.us-east-1.amazonaws.com:2181 \
--topic lcp1.Default.fs09000000.DI_EDIFACT_RECORD_CREATED --partitions 2 --replication-factor 1
```
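Since every DI topic listed above needs the same change, the two commands can be wrapped in a small loop. A minimal sketch (the Zookeeper string is shortened to one node and the topic list is abbreviated for illustration; substitute the full values from above):

```shell
#!/bin/sh
# Bump the partition count for a set of DI topics in one pass.
# ZK and PREFIX are taken from this environment; the topic list is shortened.
ZK="z-1.kafka280.xwt0mn.c17.kafka.us-east-1.amazonaws.com:2181"
PREFIX="lcp1.Default.fs09000000"

alter_topic() {
  # Builds the alter command for one topic; prints it in dry-run mode
  # (DRY_RUN=1, the default here), executes it otherwise.
  cmd="./kafka-topics.sh --alter --zookeeper $ZK --topic $PREFIX.$1 --partitions 2"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$cmd"
  else
    $cmd
  fi
}

# Abbreviated list; extend with the remaining DI_* topics above.
for t in DI_COMPLETED DI_ERROR DI_RAW_RECORDS_CHUNK_READ; do
  alter_topic "$t"
done
# Afterwards, the new partition count can be checked per topic with:
#   ./kafka-topics.sh --describe --zookeeper $ZK --topic $PREFIX.<topic>
```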
Additional tests:
(*) CPU units for the mod-data-import-converter-storage module increased from 128 to 512.
(**) DI tables cleaned (records from 03 March 2022 deleted).
DI durations
file | profile | duration | duration from Folijet - Lotus Snapshot Performance testing (for comparison) | Data Import Test report (Lotus) | duration with reverted configuration (just before 2022-04-26) | duration with reverted configuration (2022-03-17) |
---|---|---|---|---|---|---|
5K | PTF - Create 2 | 04 min 04 sec | ~8 min | 3 min 54 s | 05 min 44 sec | 04 min 44 sec |
5K | Updates Success - 1 | 05 min 26 sec | ~13 min | 4 min 12 s | | |
10K | PTF - Create 2 | 07 min 00 sec (8 min) 00:09:55.36 | ~19 min | 6 min 45 s | 08 min 50 sec | |
10K | Updates Success - 1 | 11 min 46 sec | ~25 min | 8 min 4 s | 12 min 03 sec | |
25K | PTF - Create 2 | 25 min 48 sec (19:49.063*) (17:33.521**) | ~45 min | 16 min 8 s | 38 min 34 sec | 22 min 52 sec |
25K | Updates Success - 1 | 29 min 08 sec | ~1 h 31 min | 19 min 50 s | | |
50K | PTF - Create 2 | FAILED | | | | |
50K | PTF - Create 2 | 40 min 00 sec (41 min) | ~1 h 34 min | 32 min 28 s | | |
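To put a rough number on the general slowdown noted in the summary, the current durations can be compared against the earlier "Data Import Test report (Lotus)" values from the table above. A quick back-of-the-envelope calculation:

```python
# Durations taken from the table above: (current optimized run, earlier Lotus report),
# both converted to seconds for comparison.
def secs(minutes, seconds):
    return minutes * 60 + seconds

runs = {
    "5K create":  (secs(4, 4),   secs(3, 54)),
    "25K create": (secs(25, 48), secs(16, 8)),
}

for name, (current, lotus) in runs.items():
    slowdown = (current - lotus) / lotus * 100
    print(f"{name}: {slowdown:.0f}% slower than the Lotus report")
# → 5K create: 4% slower than the Lotus report
# → 25K create: 60% slower than the Lotus report
```

The 25K figure in particular suggests the regression grows with file size, consistent with the decision below to revert task definitions and re-test.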
5K create charts
For mod-data-import-converter-storage we see excessive CPU usage.
5K update charts
50K DI charts
Maximum CPU usage after the CPU units for mod-data-import-converter-storage were increased to 512:
25K with one partition and the previous configuration of the DI modules:
Error logs during the 50K DI failure:
lcp1/mod-source-record-storage:

```
09:45:22.977 [vert.x-worker-thread-15] ERROR KafkaConsumerWrapper [1890205eqId] Error while processing a record - id: 6 subscriptionPattern: SubscriptionDefinition(eventType=DI_INVENTORY_INSTANCE_CREATED, subscriptionPattern=lcp1\.Default\.\w{1,}\.DI_INVENTORY_INSTANCE_CREATED) offset: 185757
java.util.concurrent.CompletionException: io.netty.channel.ConnectTimeoutException: connection timed out: pvt.lb.lcp1.folio-eis.us-east-1/10.23.35.12:9130
	at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331) ~[?:?]
	at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:346) ~[?:?]
	at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1063) ~[?:?]
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) ~[?:?]
	at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088) ~[?:?]
	at io.vertx.core.Future.lambda$toCompletionStage$2(Future.java:362) ~[ms.jar:?]
	at io.vertx.core.impl.future.FutureImpl$3.onFailure(FutureImpl.java:153) ~[ms.jar:?]
	at io.vertx.core.impl.future.FutureBase.emitFailure(FutureBase.java:75) ~[ms.jar:?]
	at io.vertx.core.impl.future.FutureImpl.tryFail(FutureImpl.java:230) ~[ms.jar:?]
	at io.vertx.core.impl.future.PromiseImpl.tryFail(PromiseImpl.java:23) ~[ms.jar:?]
	at io.vertx.core.Promise.fail(Promise.java:89) ~[ms.jar:?]
	at org.folio.dataimport.util.RestUtil.lambda$handleResponse$0(RestUtil.java:184) ~[ms.jar:?]
	at io.vertx.ext.web.client.impl.HttpContext.handleFailure(HttpContext.java:396) ~[ms.jar:?]
	at io.vertx.ext.web.client.impl.HttpContext.execute(HttpContext.java:390) ~[ms.jar:?]
	at io.vertx.ext.web.client.impl.HttpContext.next(HttpContext.java:365) ~[ms.jar:?]
	at io.vertx.ext.web.client.impl.HttpContext.fire(HttpContext.java:332) ~[ms.jar:?]
	at io.vertx.ext.web.client.impl.HttpContext.fail(HttpContext.java:313) ~[ms.jar:?]
	at io.vertx.ext.web.client.impl.HttpContext.lambda$handleCreateRequest$7(HttpContext.java:536) ~[ms.jar:?]
	at io.vertx.core.impl.future.FutureImpl$3.onFailure(FutureImpl.java:153) ~[ms.jar:?]
	at io.vertx.core.impl.future.FutureBase.lambda$emitFailure$1(FutureBase.java:69) ~[ms.jar:?]
	at io.vertx.core.impl.WorkerContext.lambda$execute$2(WorkerContext.java:104) ~[ms.jar:?]
	at io.vertx.core.impl.TaskQueue.run(TaskQueue.java:76) ~[ms.jar:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [ms.jar:?]
	at java.lang.Thread.run(Thread.java:829) [?:?]
Caused by: io.netty.channel.ConnectTimeoutException: connection timed out: pvt.lb.lcp1.folio-eis.us-east-1/10.23.35.12:9130
	at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:261) ~[ms.jar:?]
	at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98) ~[ms.jar:?]
	at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170) ~[ms.jar:?]
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) ~[ms.jar:?]
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469) ~[ms.jar:?]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:503) ~[ms.jar:?]
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) ~[ms.jar:?]
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[ms.jar:?]
	... 2 more
09:45:22.978 [vert.x-worker-thread-15] WARN KafkaConsumerWrapper [1890206eqId] Error handler has not been implemented for subscriptionPattern: SubscriptionDefinition(eventType=DI_INVENTORY_INSTANCE_CREATED, subscriptionPattern=lcp1\.Default\.\w{1,}\.DI_INVENTORY_INSTANCE_CREATED) failures
09:45:22.978 [vert.x-worker-thread-15] DEBUG KafkaConsumerWrapper [1890206eqId] Threshold is exceeded, preparing to resume, globalLoad: 0, currentLoad: 0, requestNo: -3689
```
Revert task definitions
Search for versions as of March 17, 2022:
```shell
aws ecs describe-task-definition --task-definition lcp1-mod-source-record-manager:11 | grep -E 'image|registeredAt'
```
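Once the target revision is known, the rollback itself is a matter of pointing each ECS service back at the older task definition revision. A sketch, assuming the cluster is named `lcp1` and services follow the `lcp1-<module>` naming seen in the tables (both are assumptions; the `echo` keeps this as a dry run, remove it to actually redeploy):

```shell
#!/bin/sh
# Hypothetical revert helper: roll an ECS service back to an older
# task definition revision. Cluster/service names are assumptions
# based on this environment; echo makes this a dry run.
revert_service() {
  # $1 = service/task-definition family, $2 = revision to roll back to
  echo aws ecs update-service --cluster lcp1 --service "$1" \
    --task-definition "$1:$2" --force-new-deployment
}

revert_service lcp1-mod-source-record-manager 4
```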
If reverting to the state on 2022-03-14:
module | date | version | last version | task definition | revert version needed |
---|---|---|---|---|---|
lcp1-mod-data-import | 2022-04-01T19:29:54.136000+00:00 | 2.4.1 | 2.4.1 | 4 | no |
lcp1-mod-source-record-manager | 2022-03-14T19:32:23.969000+00:00 | 3.3.0 | 3.3.8 | 4 | yes |
lcp1-mod-source-record-storage | 2022-03-14T19:24:36.768000+00:00 | 5.3.0 | 5.3.2 | 4 | yes |
lcp1-mod-inventory | 2022-03-14T19:24:58.524000+00:00 | 18.1.0 | 18.1.4 | 4 | yes |
lcp1-mod-inventory-storage | 2022-03-14T19:22:38.645000+00:00 | 23.0.0 | 23.0.2 | 4 | yes |
lcp1-mod-invoice | 2022-03-14T19:34:23.683000+00:00 | 5.3.0 | 5.3.0 | 3 | yes |
lcp1-mod-data-import-converter-storage | 2022-03-14T19:32:04.647000+00:00 | 1.13.0 | 1.13.2 | 3 | yes |
lcp1-okapi-b | 2022-03-14T19:13:09.338000+00:00 | 4.13.0 | 4.13.2 | 6 | yes |
If reverting to the state just before 2022-04-26:
module | date | version | last version | task definition | revert version needed |
---|---|---|---|---|---|
lcp1-mod-data-import | 2022-04-01T19:29:54.136000+00:00 | 2.4.1 | 2.4.1 | 4 | no |
lcp1-mod-source-record-manager | 2022-04-08T15:56:54.536000+00:00 | 3.3.8 | 3.3.8 | 9 | no |
lcp1-mod-source-record-storage | 2022-04-06T14:21:41.597000+00:00 | 5.3.2 | 5.3.2 | 6 | no |
lcp1-mod-inventory | 2022-04-25T22:11:30.088000+00:00 | 18.1.4 | 18.1.4 | 8 | no |
lcp1-mod-inventory-storage | 2022-04-01T19:29:16.220000+00:00 | 23.0.2 | 23.0.2 | 5 | no |
lcp1-mod-invoice | 2022-03-14T19:34:23.683000+00:00 | 5.3.0 | 5.3.0 | 3 | no |
lcp1-mod-data-import-converter-storage | 2022-04-01T19:30:26.221000+00:00 | 1.13.2 | 1.13.2 | 4 | no |
lcp1-okapi-b | 2022-03-30T13:40:17.077000+00:00 | 4.13.0 | 4.13.2 | 7 | yes |