Data Import Test report (Lotus)
Note: Testing revealed that the actual durations of the imports were about two times longer than reported here. The PTF environment was missing a DB trigger; once the trigger was restored, import durations doubled.
Overview
This document contains the results of testing Data Import in Lotus to detect performance trends.
Infrastructure
- 6 m5.xlarge EC2 instances
- 2 db.r6.xlarge database instances (one reader, one writer)
- MSK
  - 4 m5.2xlarge brokers in 2 availability zones
  - auto.create.topics.enable = true
  - log.retention.minutes = 480
- mod-inventory
  - 256 CPU units, 1814 MB memory
  - inventory.kafka.DataImportConsumerVerticle.instancesNumber=10
  - inventory.kafka.MarcBibInstanceHridSetConsumerVerticle.instancesNumber=10
  - kafka.consumer.max.poll.records=10
- mod-inventory-storage
  - 128 CPU units, 544 MB memory
- mod-source-record-storage
  - 128 CPU units, 908 MB memory
- mod-source-record-manager
  - 128 CPU units, 1292 MB memory
- mod-data-import
  - 128 CPU units, 1024 MB memory
Software versions
- mod-data-import v2.4.0
- mod-source-record-manager v3.3.0
- mod-source-record-storage v5.3.0
- mod-inventory v18.1.0
- mod-inventory-storage v23.0.0
Results
Summary
The Lotus release is more stable and faster than Kiwi. We were able to run imports of up to 100K records, including the "PTF - Updates Success - 1" job profile.
All jobs in the table below were run one after another, without container restarts and without errors.
Test | Job profile | Duration Lotus | Duration Lotus (rerun) | Duration Kiwi
---|---|---|---|---
1K MARC Create | PTF - Create 2 | 1 min 9 s | | DNR
1K MARC Update | PTF - Updates Success - 1 | 1 min 30 s | | DNR
2K MARC Create | PTF - Create 2 | 1 min 34 s | | DNR
2K MARC Update | PTF - Updates Success - 1 | 1 min 54 s | | DNR
5K MARC Create | PTF - Create 2 | 3 min 54 s | | 5 min, 8 min
5K MARC Update | PTF - Updates Success - 1 | 4 min 12 s | | 11 min, 13 min
10K MARC Create | PTF - Create 2 | 6 min 45 s | 8 min 12 s | 11 min, 14 min
10K MARC Update | PTF - Updates Success - 1 | 8 min 4 s | | 22 min, 24 min
25K MARC Create | PTF - Create 2 | 16 min 8 s | 20 min 3 s | 23 min, 25 min, 26 min
25K MARC Update | PTF - Updates Success - 1 | 19 min 50 s | | 1 hr 20 min (completed with errors) *, 56 min
50K MARC Create | PTF - Create 2 | 32 min 28 s | 40 min 40 s | completed with errors, 1 hr 40 min
50K MARC Update | PTF - Updates Success - 1 | 39 min 5 s | | 2 hr 32 min (job stuck at 76% completion)
100K MARC Create | PTF - Create 2 | 1 hr 11 min | 1 hr 23 min | DNR
100K MARC Update | PTF - Updates Success - 1 | 1 hr 19 min | | DNR
500K MARC Create | PTF - Create 2 | 7 hr 4 min (completed with errors) | | DNR
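As a rough check of the "faster than Kiwi" claim, the duration strings in the table can be parsed and compared. A minimal Python sketch (the parsing helper is illustrative, not part of the test tooling; the sample values come from the 10K MARC Create row, using the faster of the two Kiwi runs):

```python
import re

def to_seconds(duration: str) -> int:
    """Parse strings like '1 hr 11 min' or '6 min 45 s' into seconds."""
    units = {"hr": 3600, "hour": 3600, "mins": 60, "min": 60, "s": 1}
    total = 0
    for value, unit in re.findall(r"(\d+)\s*(hr|hour|mins|min|s)", duration):
        total += int(value) * units[unit]
    return total

# 10K MARC Create: Lotus vs the faster Kiwi run from the table
lotus = to_seconds("6 min 45 s")   # 405 seconds
kiwi = to_seconds("11 min")        # 660 seconds
speedup = kiwi / lotus             # roughly 1.6x faster in Lotus
```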
Resource usage
CPU usage shows the expected trend without spikes; in each test, CPU behaved as expected.
NOTE: This time there are no spikes on the events cache topic. All imports are more stable and faster in the Lotus release.
500K Create analysis
The job finished with the status "Completed with Errors"; it did not get stuck.
Records created:
SRS | Instances | Holdings | Items |
---|---|---|---|
500000 | 499992 | 499992 | 499992 |
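The gap between SRS records and inventory records gives the failure count for this run. A quick arithmetic check (attributing all of these records to the optimistic-locking errors in the errors section below is an assumption, not confirmed by the logs):

```python
srs_records = 500_000
inventory_records = 499_992  # instances, holdings, and items each

# Records that reached SRS but never made it into inventory
failed_records = srs_records - inventory_records
print(failed_records)  # 8
```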
CPU usage shows a stable trend, as expected, without sudden spikes or anomalies.
Errors
mod-inventory-storage
23:55:36 978559/instance-storage fs09000000 9eb67301-6f6e-468f-9b1a-6134dc39a684 mod_inventory_storage ERROR Conn ERROR: Cannot update record af8bcc5f-2aa1-49f3-9287-3a01bb118e75 because it has been changed (optimistic locking): Stored _version is 3, _version of request is 2 (23F09)
io.vertx.pgclient.PgException: ERROR: Cannot update record af8bcc5f-2aa1-49f3-9287-3a01bb118e75 because it has been changed (optimistic locking): Stored _version is 3, _version of request is 2 (23F09)
at io.vertx.pgclient.impl.codec.ErrorResponse.toException(ErrorResponse.java:31) ~?
at io.vertx.pgclient.impl.codec.QueryCommandBaseCodec.handleErrorResponse(QueryCommandBaseCodec.java:57) ~?
at io.vertx.pgclient.impl.codec.ExtendedQueryCommandCodec.handleErrorResponse(ExtendedQueryCommandCodec.java:90) ~?
at io.vertx.pgclient.impl.codec.PgDecoder.decodeError(PgDecoder.java:246) ~?
at io.vertx.pgclient.impl.codec.PgDecoder.decodeMessage(PgDecoder.java:132) ?
at io.vertx.pgclient.impl.codec.PgDecoder.channelRead(PgDecoder.java:112) ?
at io.netty.channel.CombinedChannelDuplexHandler.channelRead(CombinedChannelDuplexHandler.java:251) ?
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ?
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ?
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) ?
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) ?
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ?
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ?
at io.netty.channel.DefaultChannelPipeline.fire
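The failure above is the standard optimistic-locking pattern: the record's `_version` changed between the read and the write, so the stale update is rejected. A minimal Python sketch of the pattern, assuming a plain in-memory store and a re-read-and-retry helper (both illustrative, not mod-inventory-storage code):

```python
class OptimisticLockError(Exception):
    """Raised when a record changed between read and write."""

def update(store, rec_id, new_fields, expected_version):
    """Apply new_fields only if the stored _version still matches."""
    current = store[rec_id]
    if current["_version"] != expected_version:
        raise OptimisticLockError(
            f"Stored _version is {current['_version']}, "
            f"_version of request is {expected_version}"
        )
    current.update(new_fields)
    current["_version"] += 1  # bump the version on every successful write

def update_with_retry(store, rec_id, new_fields, attempts=3):
    """Re-read the current version before each attempt, then retry on conflict."""
    for _ in range(attempts):
        expected = store[rec_id]["_version"]
        try:
            update(store, rec_id, new_fields, expected)
            return store[rec_id]["_version"]
        except OptimisticLockError:
            continue
    raise OptimisticLockError("gave up after retries")
```

Whether the data-import flow should retry such conflicts or surface them as job errors (as it did here) is a design decision of the modules involved; the sketch only shows why the error message compares two `_version` values.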