Dependencies between mod-pubsub Kafka partitions and CICO performance (Orchid)
Overview
According to PERF-534, DI performance improved greatly when the number of partitions of the DI Kafka topics was increased to 2.
In this testing effort, we'd like to see whether increasing the partitions of mod-pubsub's Kafka topics from 1 to 2 has the same positive impact on the Check In Check Out (CICO) workflow, since many of mod-pubsub's topics are related to circulation. We also test CICO with R/W split both enabled and disabled.
Summary
Changing the mod-pubsub partitions to 2 does not appear to help much.
Response times were faster for some tests and slower for others (roughly -20 ms to +50 ms).
There are no big benefits from enabling Read/Write split on CICO in these standalone tests. However, benefits may appear in real-life usage and/or when running several workflows with high DB load (such as Data Import), because R/W split distributes load between DB nodes.
Recommendations
The only notable observation is that mod-pubsub Kafka topic names include the mod-pubsub version, for example:
ncp5.pub-sub.fs09000000.FEE_FINE_BALANCE_CHANGED.mod-pubsub-2.7.0
ncp5.pub-sub.fs09000000.FEE_FINE_BALANCE_CHANGED.mod-pubsub-2.9.1
ncp5.pub-sub.fs09000000.FEE_FINE_BALANCE_CHANGED.mod-pubsub-2.10.0-SNAPSHOT
*These are the same topic for different mod-pubsub versions.
If mod-pubsub is updated frequently, topics from old versions may linger and accumulate unnecessarily. It may therefore be a good idea to exclude the version number from the topic naming pattern.
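To illustrate how versioned topic names accumulate, the sketch below groups topic names by their version-less prefix. It assumes the version always appears as a trailing `.mod-pubsub-<version>` suffix, as in the three examples above. The helper names `strip_version` and `group_by_base_topic` are hypothetical and are not part of mod-pubsub.

```python
import re
from collections import defaultdict

# Assumption: the module version is always a trailing ".mod-pubsub-<version>" suffix.
VERSION_SUFFIX = re.compile(r"\.mod-pubsub-\d+\.\d+\.\d+(?:-SNAPSHOT)?$")


def strip_version(topic: str) -> str:
    """Return the topic name without the trailing mod-pubsub version suffix."""
    return VERSION_SUFFIX.sub("", topic)


def group_by_base_topic(topics):
    """Map each version-less topic name to the versioned topics that share it."""
    groups = defaultdict(list)
    for topic in topics:
        groups[strip_version(topic)].append(topic)
    return dict(groups)


topics = [
    "ncp5.pub-sub.fs09000000.FEE_FINE_BALANCE_CHANGED.mod-pubsub-2.7.0",
    "ncp5.pub-sub.fs09000000.FEE_FINE_BALANCE_CHANGED.mod-pubsub-2.9.1",
    "ncp5.pub-sub.fs09000000.FEE_FINE_BALANCE_CHANGED.mod-pubsub-2.10.0-SNAPSHOT",
]
groups = group_by_base_topic(topics)
for base, versions in groups.items():
    print(f"{base}: {len(versions)} versioned topics")
```

Any base name that maps to more than one versioned topic is a candidate for cleanup, or for a naming pattern without the version.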
Test Sets
Test # | Test Conditions | Duration | Load generator size | Load generator Memory (GiB) | Notes |
|---|---|---|---|---|---|
1. | 8, 20, 30, 75 users CI/CO | 30 mins each | t3.large | 3 | 2 pub-sub partitions, R/W split enabled |
2. | 8, 20, 30, 75 users CI/CO | 30 mins each | t3.large | 3 | 1 pub-sub partition, R/W split enabled |
3. | 8, 20, 30, 75 users CI/CO | 30 mins each | t3.large | 3 | 1 pub-sub partition, R/W split disabled |
4. | 8, 20, 30, 75 users CI/CO | 30 mins each | t3.large | 3 | 2 pub-sub partitions, R/W split disabled |
Results
Listed below are the response times (average (avg.), 75th percentile, and 95th percentile) for the tests (8, 20, 30, 75 users) with 1 and 2 mod-pubsub Kafka topic partitions. Response times are in seconds.
A comparison between 2 and 1 partitions is also provided. A number +n means that the response time is n ms slower than the corresponding number from the 1-partition test; -n means it is n ms faster.
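As a minimal sketch of how such a delta can be derived, the snippet below subtracts the 1-partition response time from the 2-partition one and converts the result to whole milliseconds. The function name `delta_ms` is hypothetical; the sample values are the CO 75th percentile for 8 users with R/W split enabled.

```python
def delta_ms(two_partitions_s: float, one_partition_s: float) -> int:
    """Difference in whole milliseconds; positive means 2 partitions were slower."""
    return round((two_partitions_s - one_partition_s) * 1000)


# CO, 8 users, 75th percentile, R/W split enabled: 0.784 s vs 0.746 s.
print(f"{delta_ms(0.784, 0.746):+d} ms")  # prints "+38 ms"
```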
With Read/Write split enabled
R/W split enabled | CI, 2 partitions, avg. | CI, 2 partitions, 75% | CI, 2 partitions, 95% | CI, 1 partition, avg. | CI, 1 partition, 75% | CI, 1 partition, 95% | CO, 2 partitions, avg. | CO, 2 partitions, 75% | CO, 2 partitions, 95% | CO, 1 partition, avg. | CO, 1 partition, 75% | CO, 1 partition, 95% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
8 users | 0.476 | 0.496 (+2) | 0.556 (-4) | 0.476 | 0.494 | 0.560 | 0.763 (-7) | 0.784 (+38) | 0.890 (-27) | 0.770 | 0.746 | 0.917 |
20 users | 0.459 (-4) | 0.477 (-7) | 0.527 (-12) | 0.463 | 0.484 | 0.539 | 0.740 (-10) | 0.763 (-10) | 0.845 (-13) | 0.750 | 0.773 | 0.858 |
30 users | 0.456 (+16) | 0.482 (+18) | 0.530 (+24) | 0.440 | 0.464 | 0.506 | 0.747 (+8) | 0.770 (+9) | 0.848 (+6) | 0.739 | 0.761 | 0.842 |
75 users | 0.526 (-3) | 0.574 (-6) | 0.707 (+12) | 0.529 | 0.580 | 0.695 | 0.951 (-4) | 1.020 (-11) | 1.187 (+2) | 0.955 | 1.031 | 1.185 |
*There is no significant difference in response times between 1 and 2 partitions of the mod-pubsub Kafka topics when R/W split is enabled. Some cases are better and some are worse, so there is no pattern, and we conclude that having 2 partitions brings no benefit in response times.
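One way to sanity-check the "no pattern" conclusion is to count how many of the deltas in the table above are negative (faster) versus positive (slower). The sketch below does this for the R/W split enabled results. The deltas are copied from the table row by row, and the one cell without a delta (CI avg. for 8 users) is assumed to be 0 because both values are 0.476.

```python
# Deltas in ms (2 partitions minus 1 partition), R/W split enabled,
# keyed by number of users. Order per row: CI avg., 75%, 95%, CO avg., 75%, 95%.
# Assumption: the blank CI avg. delta for 8 users is 0 (0.476 vs 0.476).
deltas_ms = {
    8: [0, 2, -4, -7, 38, -27],
    20: [-4, -7, -12, -10, -10, -13],
    30: [16, 18, 24, 8, 9, 6],
    75: [-3, -6, 12, -4, -11, 2],
}

all_deltas = [d for row in deltas_ms.values() for d in row]
faster = sum(d < 0 for d in all_deltas)
slower = sum(d > 0 for d in all_deltas)
unchanged = sum(d == 0 for d in all_deltas)
mean_delta = sum(all_deltas) / len(all_deltas)

print(f"faster: {faster}, slower: {slower}, unchanged: {unchanged}")
print(f"mean delta: {mean_delta:+.1f} ms")
```

For this table the script reports 13 faster, 10 slower, and 1 unchanged measurement, with a mean delta below 1 ms, which is consistent with the absence of a clear trend.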
With Read/Write split disabled
R/W split disabled | CI, 2 partitions, avg. | CI, 2 partitions, 75% | CI, 2 partitions, 95% | CI, 1 partition, avg. | CI, 1 partition, 75% | CI, 1 partition, 95% | CO, 2 partitions, avg. | CO, 2 partitions, 75% | CO, 2 partitions, 95% | CO, 1 partition, avg. | CO, 1 partition, 75% | CO, 1 partition, 95% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
8 users | 0.484 (+17) | 0.497 (+20) | 0.561 (+7) | 0.467 | 0.477 | 0.554 | 0.759 (+24) | 0.769 (+26) | 0.926 (+70) | 0.735 | 0.743 | 0.856 |
20 users | 0.471 (+11) | 0.468 (-11) | 0.540 (+19) | 0.460 | 0.479 | 0.521 | 0.748 (+14) | 0.771 (+15) | 0.848 (+16) | 0.734 | 0.756 | 0.832 |
30 users | 0.446 (-7) | 0.472 (-6) | 0.520 (-2) | 0.453 | 0.479 | 0.522 | 0.727 (-10) | 0.749 (-9) | 0.824 (-5) | 0.737 | 0.758 | 0.829 |
75 users | 0.552 | 0.604 (+91) | 0.736 (+37) | 0.522 | 0.513 | 0.669 | 0.977 (+49) | 1.046 (+54) | 1.220 | 0.928 | 0.992 | 1.120 |
*There is no significant difference in response times between 1 and 2 partitions of the mod-pubsub Kafka topics when R/W split is disabled. Some cases are better and some are worse, so there is no pattern, and we conclude that having 2 partitions brings no benefit in response times.
Comparisons
Comparison between R/W split enabled/disabled with 1 and 2 partitions
The table below shows how many milliseconds we save or lose by enabling Read/Write split on the DB. (Response times with R/W split disabled are the baseline for the comparison.)
Notable observations:
The CPU usage pattern differs considerably with and without R/W split. For now it doesn't look like R/W split helps much, and in most cases it makes performance worse. However, there may be performance benefits in real-life usage and/or when running several workflows with high DB load (such as Data Import), because it distributes load between DB nodes.
For now there is no visible pattern to conclude whether R/W split works better with 1 mod-pubsub partition or with 2.
R/W split disabled (baseline) | CI, 2 partitions, avg. | CI, 2 partitions, 75% | CI, 2 partitions, 95% | CI, 1 partition, avg. | CI, 1 partition, 75% | CI, 1 partition, 95% | CO, 2 partitions, avg. | CO, 2 partitions, 75% | CO, 2 partitions, 95% | CO, 1 partition, avg. | CO, 1 partition, 75% | CO, 1 partition, 95% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
8 users | 0.484 (-8) | 0.497 (-1) | 0.561 (-5) | 0.467 (+9) | 0.477 (+17) | 0.554 (+6) | 0.759 (+4) | 0.769 (+15) | 0.926 (-36) | 0.735 (+35) | 0.743 (+3) | 0.856 (+61) |
20 users | 0.471 (-17) | 0.468 (+9) | 0.540 (-13) | 0.460 (+3) | 0.479 (+5) | 0.521 (+18) | 0.748 (-8) | 0.771 (-8) | 0.848 (-3) | 0.734 (+16) | 0.756 (+17) | 0.832 (+26) |
30 users | 0.446 (+10) | 0.472 (+10) | 0.520 (+10) | 0.453 (-13) | 0.479 (-15) | 0.522 (-16) | 0.727 (+20) | 0.749 (+21) | 0.824 (+20) | 0.737 (+2) | 0.758 (+3) | 0.829 (+13) |
75 users | 0.552 (-26) | 0.604 (-30) | 0.736 (+29) | 0.522 (+7) | 0.513 (+67) | 0.669 (+26) | 0.977 (-26) | 1.046 (-26) | 1.220 (-13) | 0.928 (+27) | 0.992 (+39) | 1.120 (+65) |
Comparison between current result vs initial results
The initial results were obtained by measuring CICO performance on snapshot versions of the modules.
As the base for the current results we use the CICO results with Read/Write split disabled and one mod-pubsub Kafka partition, since these were also the input conditions for the initial testing.
Current vs. initial test | CI avg. | CI 95% | CO avg. | CO 95% |
|---|---|---|---|---|
8 users | 0.467 s vs. 534 ms (-14%) | 0.554 s vs. 1041 ms (-87%) | 0.735 s vs. 909 ms (-23%) | 0.856 s vs. 1462 ms (-70%) |
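The percentages in the table appear to be the difference between the current and the initial response time, relative to the current one and truncated toward zero. This formula is inferred from the four values in the row and is not stated in the source; the function name `percent_vs_initial` is hypothetical.

```python
def percent_vs_initial(current_s: float, initial_ms: float) -> int:
    """Signed percentage difference relative to the current result.

    Negative means the initial run was slower than the current one.
    Assumption: the result is truncated toward zero, which reproduces the table.
    """
    current_ms = current_s * 1000
    return int((current_ms - initial_ms) / current_ms * 100)


# 8 users: (current in s, initial in ms) for CI avg., CI 95%, CO avg., CO 95%.
pairs = [(0.467, 534), (0.554, 1041), (0.735, 909), (0.856, 1462)]
percentages = [percent_vs_initial(current, initial) for current, initial in pairs]
print(percentages)  # prints [-14, -87, -23, -70]
```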