Folijet - Morning Glory Snapshot Performance testing
The following resources are used:
- 6 m4.large EC2 spot instances for the Kubernetes cluster;
- 1 db.r5.xlarge instance for the RDS service (writer);
- one m5.large per 2 zones for Kafka on MSK.
Previous Lotus testing performance results:
Lotus Snapshot Performance testing
Modules:
Data Import Module (mod-data-import-2.5.0-SNAPSHOT.231)
Source Record Manager Module (mod-source-record-manager-3.4.0-SNAPSHOT.621)
Source Record Storage Module (mod-source-record-storage-5.4.0-SNAPSHOT.426)
Inventory Module (mod-inventory-18.2.0-SNAPSHOT.537) - mod-inventory-18.0.0
Inventory Storage Module (mod-inventory-storage-23.1.0-SNAPSHOT.692)
Data Import Converter Storage (mod-data-import-converter-storage-1.14.0-SNAPSHOT.202)
Invoice business logic module (mod-invoice-5.4.0-SNAPSHOT.306)
Data Export Module (mod-data-export-4.5.0-SNAPSHOT.319)
Performance-optimized configuration:
Folio
MAX_REQUEST_SIZE = 4000000 (for all modules)
Kafka
2 tasks for all DI modules (except mod-data-import)
2 partitions for all DI Kafka topics
Note: the environment should be configured so that every Kafka topic has as many partitions as there are instances of the module connected to that topic.
Examples:
./kafka-topics.sh --bootstrap-server=<kafka-ip>:9092 --delete --topic perf-eks-folijet.Default.fs09000000.DI_ERROR
./kafka-topics.sh --bootstrap-server=<kafka-ip>:9092 --create --topic perf-eks-folijet.Default.fs09000000.DI_ERROR --partitions 2 --replication-factor 1
Due to limitations in metric names, topics with a period ('.') or underscore ('_') could collide. To avoid issues it is best to use either, but not both.
Created topic perf-eks-folijet.Default.fs09000000.DI_ERROR.
./kafka-topics.sh --bootstrap-server=<kafka-ip>:9092 --describe --topic perf-eks-folijet.Default.fs09000000.DI_ERROR
Topic: perf-eks-folijet.Default.fs09000000.DI_ERROR PartitionCount: 2 ReplicationFactor: 1 Configs: min.insync.replicas=1,message.format.version=2.6-IV0,unclean.leader.election.enable=true
Topic: perf-eks-folijet.Default.fs09000000.DI_ERROR Partition: 0 Leader: 1 Replicas: 1 Isr: 1
Topic: perf-eks-folijet.Default.fs09000000.DI_ERROR Partition: 1 Leader: 2 Replicas: 2 Isr: 2
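The partition rule above can be checked mechanically before a test run. The following is a minimal sketch, not part of the FOLIO tooling: the helper names (create_topic_args, mismatched_topics) and the second topic name are hypothetical, and the command-line flags are only those already shown in the example commands above.

```python
from typing import Dict, List


def create_topic_args(bootstrap: str, topic: str, partitions: int,
                      replication_factor: int = 1) -> List[str]:
    """Build the argument list for the kafka-topics.sh --create call shown above."""
    return [
        "./kafka-topics.sh",
        f"--bootstrap-server={bootstrap}",
        "--create",
        "--topic", topic,
        "--partitions", str(partitions),
        "--replication-factor", str(replication_factor),
    ]


def mismatched_topics(partitions_by_topic: Dict[str, int], instances: int) -> List[str]:
    """Return the topics whose partition count differs from the module instance count."""
    return sorted(t for t, p in partitions_by_topic.items() if p != instances)


if __name__ == "__main__":
    topics = {
        # Topic taken from the example above.
        "perf-eks-folijet.Default.fs09000000.DI_ERROR": 2,
        # Hypothetical topic, included only to show a mismatch.
        "perf-eks-folijet.Default.fs09000000.DI_HYPOTHETICAL": 1,
    }
    for topic in mismatched_topics(topics, instances=2):
        # Print the command instead of running it; this is a dry run.
        print(" ".join(create_topic_args("<kafka-ip>:9092", topic, partitions=2)))
```

The sketch only prints the command. An existing topic would first have to be deleted, as in the first example command above.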
JVM
mod-data-import: -XX:MaxRAMPercentage=85.0 -XX:+UseG1GC / cpu: 128m/192m | memory: 1Gi/1Gi
mod-source-record-manager: -XX:MaxRAMPercentage=65 -XX:MetaspaceSize=120M -XX:+UseG1GC / DB_MAXPOOLSIZE = 15 / DB_RECONNECTATTEMPTS = 3 / DB_RECONNECTINTERVAL = 1000 / cpu: 512m/1024m | memory: 1844Mi / 2Gi
mod-source-record-storage: -XX:MaxRAMPercentage=65 -XX:MetaspaceSize=120M -XX:+UseG1GC / DB_MAXPOOLSIZE = 15 / cpu: 512m/1024m | memory: 1296Mi/1440Mi
mod-inventory: -XX:MaxRAMPercentage=80 -XX:MetaspaceSize=120M -XX:+UseG1GC -Dorg.folio.metadata.inventory.storage.type=okapi / DB_MAXPOOLSIZE = 15 / cpu: 512m/1024m | memory: 2592Mi/2880Mi
mod-inventory-storage: -XX:MaxRAMPercentage=80 -XX:MetaspaceSize=120M -XX:+UseG1GC / DB_MAXPOOLSIZE = 15 / cpu: 512m/1024m | memory: 1024Mi/1200Mi
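-XX:MaxRAMPercentage caps the JVM heap at a percentage of the memory available to the JVM, which inside a container is normally the container memory limit. The sketch below estimates the resulting heap caps. It assumes that the second value in each "memory: a/b" pair above is the container limit (the page does not state this explicitly), and the function name is illustrative.

```python
from typing import Dict, Tuple


def max_heap_mib(limit_mib: float, max_ram_percentage: float) -> float:
    """Approximate heap cap implied by -XX:MaxRAMPercentage for a given memory limit."""
    return limit_mib * max_ram_percentage / 100.0


# (assumed memory limit in MiB, MaxRAMPercentage) taken from the settings above.
# 1Gi = 1024 MiB and 2Gi = 2048 MiB.
MODULES: Dict[str, Tuple[float, float]] = {
    "mod-data-import": (1024, 85.0),
    "mod-source-record-manager": (2048, 65.0),
    "mod-source-record-storage": (1440, 65.0),
    "mod-inventory": (2880, 80.0),
    "mod-inventory-storage": (1200, 80.0),
}

if __name__ == "__main__":
    for module, (limit, pct) in MODULES.items():
        print(f"{module}: heap cap of about {max_heap_mib(limit, pct):.1f} MiB out of {limit} MiB")
```

Under that assumption the heap caps come out at roughly 870 MiB, 1331 MiB, 936 MiB, 2304 MiB and 960 MiB respectively; the rest of each limit is left for metaspace, thread stacks and other non-heap memory.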
Tests:
env | profile | records number | time in Morning Glory | time in Lotus | Kafka partition number | module instance number | CPU | description |
---|---|---|---|---|---|---|---|---|
MG Perf Rancher | PTF Create - 2 | 5000 | 7 min | 8 min | 2 | 2 | 512/1024 | mod-source-record-manager-3.4.0-SNAPSHOT.659+ 2022-06-10T07:44:27.576+00:00 2022-06-10T07:51:11.140+00:00 |
MG Perf Rancher | PTF Create - 2 | 5000 | 7 min | 8 min | 2 | 2 | 512/1024 | mod-source-record-manager-3.4.0-SNAPSHOT.659+ -Ddi.flow.control.enable=false 2022-06-14T10:24:44.093+00:00 2022-06-14T10:31:54.725+00:00 |
MG Perf Rancher | PTF Update - 1 | 5000 | 11 min | 13 min | 2 | 2 | 512/1024 | mod-source-record-manager-3.4.0-SNAPSHOT.659+ 2022-06-20T06:18:46.748+00:00 2022-06-20T06:30:02.991+00:00 |
MG Perf Rancher | PTF Create - 2 | 10`000 | 16 min | 19 min | 2 | 2 | 512/1024 | mod-source-record-manager-3.4.0-SNAPSHOT.659+ 2022-06-10T07:54:23.720+00:00 2022-06-10T08:08:48.484+00:00 |
MG Perf Rancher | PTF Create - 2 | 10`000 | 16 min | 19 min | 2 | 2 | 512/1024 | mod-source-record-manager-3.4.0-SNAPSHOT.659+ 2022-06-14T10:36:41.482+00:00 2022-06-14T10:53:03.556+00:00 |
MG Perf Rancher | PTF Update - 1 | 10`000 | 22 min | 25 min | 2 | 2 | 512/1024 | mod-source-record-manager-3.4.0-SNAPSHOT.659+ 2022-06-20T07:07:00.594+00:00 2022-06-20T07:28:54.905+00:00 |
MG Perf Rancher | PTF Create - 2 | 50`000 | 59 min | 1h 25min | 2 | 2 | 512/1024 | mod-source-record-manager-3.4.0-SNAPSHOT.659+ 2022-06-10T08:12:29.178+00:00 2022-06-10T09:11:34.642+00:00 |
MG Perf Rancher | PTF Update - 1 | 50`000 | 1h 42 min | 2h 17min | 2 | 2 | 512/1024 | mod-source-record-manager-3.4.0-SNAPSHOT.659+ 2022-06-20T09:11:41.701+00:00 2022-06-20T10:54:29.378+00:00 |
MG Perf Rancher | PTF Create - 2 | 100`000 | 2h 20min | 2h 24min (22 errors) | 2 | 2 | 512/1024 | mod-source-record-manager-3.4.0-SNAPSHOT.659+ 2022-06-13T09:30:35.574+00:00 2022-06-13T12:26:52.484+00:00 |
MG Perf Rancher | PTF Update - 1 | 100`000 | 2h 49min | 4h 40min (tested with 1 module instance and 1 partition) | 2 | 2 | 512/1024 | mod-source-record-manager-3.4.0-SNAPSHOT.659+ 2022-06-21T11:46:43.175+00:00 2022-06-21T14:36:05.532+00:00 57 errors. Inventory/Inventory-storage errors: io.netty.channel.StacklessClosedChannelException; io.vertx.core.impl.NoStackTraceThrowable: Connection is not active now, current status: CLOSED; io.vertx.core.impl.NoStackTraceThrowable: Timeout |
MG Perf Rancher | PTF Create - 2 | 500`000 | 14h 46min (60 errors) | 15h 37min (31 errors) | 2 | 2 | 512/1024 | mod-source-record-manager-3.4.0-SNAPSHOT.659+ 60 errors 2022-06-13T14:27:40.568+00:00 2022-06-14T05:14:27.458+00:00 |
MG Bugfest | Default Marc Bib Create | 5000 | 4 min | - | 2 | 2 | 512/1024 | mod-source-record-manager: Xmx = 2G; startedDate 2022-08-15T14:57:13.753+00:00, completedDate 2022-08-15T15:01:29.365+00:00 |
MG Bugfest | Default Marc Bib Create | 10000 | 10 min | - | 2 | 2 | 512/1024 | mod-source-record-manager: Xmx = 2G; startedDate 2022-08-15T15:03:07.364+00:00, completedDate 2022-08-15T15:13:28.827+00:00 |
MG Bugfest | Create SRS MARC Authority | 5000 | 5 min | - | 2 | 2 | 512/1024 | mod-source-record-manager: Xmx = 2G; startedDate 2022-08-16T00:15:55.396+00:00, completedDate 2022-08-16T00:20:19.240+00:00 |
MG Bugfest | Create SRS MARC Authority | 10000 | 8 min | - | 2 | 2 | 512/1024 | mod-source-record-manager: Xmx = 2G; startedDate 2022-08-16T14:52:53.191+00:00, completedDate 2022-08-16T15:00:12.723+00:00 |
MG Bugfest | Create SRS MARC Authority | 50000 | 34 min | - | 2 | 2 | 512/1024 | mod-source-record-manager: Xmx = 2G; startedDate 2022-08-16T15:01:15.360+00:00, completedDate 2022-08-16T15:35:37.028+00:00 |
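The description column holds the start and completion timestamps of each job, so the reported times can be recomputed and converted into throughput. A small sketch using the first row of the table above; the function names are illustrative.

```python
from datetime import datetime


def duration_seconds(started: str, completed: str) -> float:
    """Elapsed time between two ISO 8601 timestamps as they appear in the table."""
    delta = datetime.fromisoformat(completed) - datetime.fromisoformat(started)
    return delta.total_seconds()


def records_per_second(records: int, started: str, completed: str) -> float:
    """Average import throughput for one job."""
    return records / duration_seconds(started, completed)


if __name__ == "__main__":
    # First row of the table: PTF Create - 2, 5000 records.
    started = "2022-06-10T07:44:27.576+00:00"
    completed = "2022-06-10T07:51:11.140+00:00"
    seconds = duration_seconds(started, completed)
    print(f"duration: {seconds:.3f} s ({seconds / 60:.1f} min)")
    print(f"throughput: {records_per_second(5000, started, completed):.1f} records/s")
```

For this row the duration is about 403.6 s (just under 7 min), which corresponds to roughly 12.4 records per second.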
Results before flow control fix: MODSOURMAN-811
env | profile | records number | time | time in Lotus | Kafka partition number | module instance number | CPU | description |
---|---|---|---|---|---|---|---|---|
MG Perf Rancher | PTF Create - 2 | 5000 | 7 min | 8 min | 2 | 2 | 512/1024 | mod-source-record-manager-3.4.0-SNAPSHOT.621 2022-05-27T12:58:30.331+00:00 2022-05-27T13:05:08.683+00:00 |
MG Perf Rancher | PTF Update - 1 | 5000 | 10 min | 13 min | 2 | 2 | 512/1024 | 2022-05-27T13:22:35.123+00:00 2022-05-27T13:32:35.344+00:00 |
MG Perf Rancher | PTF Create - 2 | 10`000 | 21 min; 27 min | 19 min | 2 | 2 | 512/1024 | -Ddi.flow.control.enable=false, two runs: 2022-05-30T09:51:13.876+00:00 2022-05-30T10:12:33.982+00:00; 2022-05-31T18:13:05.977+00:00 2022-05-31T18:40:58.928+00:00 |
MG Perf Rancher | PTF Update - 1 | 10`000 | 30 min | 25 min | 2 | 2 | 512/1024 | -Ddi.flow.control.enable=false 2022-05-31T19:19:46.296+00:00 2022-05-31T19:49:59.651+00:00 |
MG Perf Rancher | PTF Create - 2 | 10`000 | 21 min | 19 min | 2 | 2 | 512/1024 | -Ddi.flow.control.enable=true 2022-05-31T20:02:06.368+00:00 2022-05-31T20:23:19.490+00:00 |
MG Perf Rancher | PTF Update - 1 | 10`000 | 31 min | 25 min | 2 | 2 | 512/1024 | -Ddi.flow.control.enable=true 2022-06-01T19:08:11.563+00:00 2022-06-01T19:39:58.803+00:00 |
MG Perf Rancher | PTF Create - 2 | 10`000 | 17 min | 19 min | 2 | 2 | 512/1024 | -Ddi.flow.control.enable=true 2022-06-03T09:20:07.654+00:00 2022-06-03T09:37:51.631+00:00 |
MG Perf Rancher | PTF Create - 2 | 30`000 | 1h 6 min | 45 min | 2 | 2 | 512/1024 | 2022-05-27T13:37:12.980+00:00 2022-05-27T14:31:52.595+00:00 |
MG Perf Rancher | PTF Update - 1 | 30`000 | 1h 26min | - | 2 | 2 | 512/1024 | 2022-05-27T15:37:33.580+00:00 2022-05-27T17:03:15.702+00:00 |
MG Perf Rancher | PTF Create - 2 | 50`000 | 2h 37 min | 1h 25min | 2 | 2 | 512/1024 | 3 errors: 2022-06-01T19:48:33.977+00:00 2022-06-01T22:25:59.700+00:00 |
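To compare a Morning Glory time with the Lotus baseline, the duration strings used in these tables can be converted to minutes. The sketch below uses the 50000-record PTF Create - 2 figures reported on this page (2h 37 min before the flow control fix, 59 min after it, 1h 25min in Lotus). The parsing helper is an assumption about the "Nh M min" notation used here, not an existing tool.

```python
import re


def to_minutes(text: str) -> int:
    """Parse durations such as '59 min', '1h 25min' or '2h 37 min' into minutes."""
    match = re.fullmatch(r"\s*(?:(\d+)\s*h)?\s*(?:(\d+)\s*min)?\s*", text)
    if not match or not (match.group(1) or match.group(2)):
        raise ValueError(f"unrecognized duration: {text!r}")
    return int(match.group(1) or 0) * 60 + int(match.group(2) or 0)


def change_vs_baseline(measured: str, baseline: str) -> float:
    """Relative change in percent; negative means less time than the baseline."""
    base = to_minutes(baseline)
    return (to_minutes(measured) - base) / base * 100.0


if __name__ == "__main__":
    lotus = "1h 25min"
    for label, measured in (("before fix", "2h 37 min"), ("after fix", "59 min")):
        print(f"{label}: {change_vs_baseline(measured, lotus):+.1f}% vs Lotus")
```

With these figures the import took about 85% more time than Lotus before the fix and about 31% less time after it.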
60 errors (500K records, PTF Create - 2):
Almost all of these errors come from mod-inventory-storage and are related to its instances not having enough memory (memory: 778Mi/846Mi). The mod-inventory-storage instances were restarted 2 times.
io.vertx.core.impl.NoStackTraceThrowable: {"errors":[{"message":"must not be null","type":"1","code":"javax.validation.constraints.NotNull.message","parameters":[{"key":"contributors[0].name","value":"null"}]}]}
io.vertx.core.impl.NoStackTraceThrowable: {"errors":[{"message":"must not be null","type":"1","code":"javax.validation.constraints.NotNull.message","parameters":[{"key":"contributors[2].name","value":"null"}]}]}
io.vertx.core.impl.NoStackTraceThrowable: proxyClient failure: mod-inventory-storage-23.1.0-SNAPSHOT.692 http://mod-inventory-storage: Connection was closed: POST /holdings-storage/holdings
io.vertx.core.impl.NoStackTraceThrowable: proxyClient failure: mod-inventory-storage-23.1.0-SNAPSHOT.692 http://mod-inventory-storage: Connection was closed: POST /instance-storage/instances
io.vertx.core.impl.NoStackTraceThrowable: proxyClient failure: mod-inventory-storage-23.1.0-SNAPSHOT.692 http://mod-inventory-storage: finishConnect(..) failed: Connection refused: mod-inventory-storage.folijet.svc.cluster.local/172.20.250.48:80: POST /holdings-storage/holdings
io.vertx.core.impl.NoStackTraceThrowable: proxyClient failure: mod-inventory-storage-23.1.0-SNAPSHOT.692 http://mod-inventory-storage: finishConnect(..) failed: Connection refused: mod-inventory-storage.folijet.svc.cluster.local/172.20.250.48:80: POST /instance-storage/instances
io.vertx.core.impl.NoStackTraceThrowable: proxyClient failure: mod-inventory-storage-23.1.0-SNAPSHOT.692 http://mod-inventory-storage: readAddress(..) failed: Connection reset by peer: POST /holdings-storage/holdings
io.vertx.core.impl.NoStackTraceThrowable: proxyClient failure: mod-inventory-storage-23.1.0-SNAPSHOT.692 http://mod-inventory-storage: readAddress(..) failed: Connection reset by peer: POST /instance-storage/instances
io.vertx.core.impl.NoStackTraceThrowable: proxyClient failure: mod-inventory-storage-23.1.0-SNAPSHOT.692 http://mod-inventory-storage: readAddress(..) failed: Connection reset by peer: POST /item-storage/items
Performance testing of the item update scenario after removal of the index (MODDATAIMP-697):
As part of FOLIO-3388, the "item_status_name_idx_gin" index on the Item status field was removed from mod-inventory-storage. However, this index could potentially be used for matching by Item status during Item update.
Job profile structure for updating an item with matching by the status field:
- Job profile
  - Match profile (902$a to Item HRID)
    - For matches: sub match profile (Static match of "Available" to Item Loan and Availability Status)
      - For matches: action profile (action = update; Folio record type = Item)
        - Mapping profile (Folio record type = Item)
Test results:
records number | time | description | |
---|---|---|---|
Testing with index | 5000 | 6 | 40 | 6 | 6 | |
Testing without index | 5000 | 6 | 11 | 6 | 6 |
Analysis of the query for matching item by status field:
Example of the CQL query built while processing the sub-match profile for matching by status:
status.name == "Available" AND id == "4ae2603d-1f71-457f-b69a-3eed820d6cfb"
This CQL query is translated by mod-inventory-storage to the following SQL:
SELECT id, jsonb, creation_date, created_by, holdingsrecordid, permanentloantypeid, temporaryloantypeid, materialtypeid, permanentlocationid, temporarylocationid, effectivelocationid
FROM fs09000000_mod_inventory_storage.item
WHERE (CASE WHEN length(lower(f_unaccent('Available'))) <= 600
  THEN left(lower(f_unaccent(item.jsonb->'status'->>'name')),600) LIKE lower(f_unaccent('Available'))
  ELSE left(lower(f_unaccent(item.jsonb->'status'->>'name')),600) LIKE left(lower(f_unaccent('Available')),600) AND lower(f_unaccent(item.jsonb->'status'->>'name')) LIKE lower(f_unaccent('Available'))
  END)
  AND (id='4ae2603d-1f71-457f-b69a-3eed820d6cfb')
LIMIT 2 OFFSET 0
In the particular case where matching by item status is used as a sub-match profile, no index on the Item status field is used. Instead, the lookup is performed more efficiently through the index on the id field.
During testing of the item update scenario, deleting the "item_status_name_idx_gin" index had no observable impact on the performance of matching Items by status. According to the analysis above, this index is not used for matching Items by the status field during data import.
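The planner behavior described above can be reproduced in miniature. The sketch below deliberately swaps PostgreSQL for SQLite (Python's built-in sqlite3 module) so that it runs without a database server. The table, column and index names are simplified stand-ins, and the plan text is SQLite's, not PostgreSQL's. It only illustrates the general point that an equality condition on a unique id lets the planner ignore an index on the status column.

```python
import sqlite3


def plan_for_status_and_id(with_status_index: bool) -> str:
    """Return SQLite's query plan for a lookup by status AND id."""
    conn = sqlite3.connect(":memory:")
    try:
        # Simplified stand-in for the item table; id is unique, as in the real schema.
        conn.execute("CREATE TABLE item (id TEXT PRIMARY KEY, status_name TEXT)")
        if with_status_index:
            # Stand-in for the removed status index.
            conn.execute("CREATE INDEX item_status_name_idx ON item (status_name)")
        rows = conn.execute(
            "EXPLAIN QUERY PLAN "
            "SELECT id FROM item WHERE status_name = ? AND id = ? LIMIT 2",
            ("Available", "4ae2603d-1f71-457f-b69a-3eed820d6cfb"),
        ).fetchall()
        # The last column of each plan row holds the human-readable detail.
        return " | ".join(str(row[-1]) for row in rows)
    finally:
        conn.close()


if __name__ == "__main__":
    print("with status index:   ", plan_for_status_and_id(True))
    print("without status index:", plan_for_status_and_id(False))
```

In both cases SQLite searches through the automatic unique index on id, so dropping the status index does not change the plan. For the real system the equivalent check would be an EXPLAIN of the generated SQL in PostgreSQL.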