Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

In Progress(in review) + retesting results will be add to these report in scope of the
Jira Legacy
serverSystem Jira
serverId01505d01-b853-3c2e-90f1-ee9b165564fc
keyPERF-681

...

  1. One record on one tenant could be discarded with error: io.netty.channel.StacklessClosedChannelException.
    Jira Legacy
    serverSystem Jira
    serverId01505d01-b853-3c2e-90f1-ee9b165564fc
    keyMODDATAIMP-748
    Reproduces in both cases with and without splitting feature enabled in at least 30% of test runs with 500k record files and multitenant testing.
  2. During the new Data Import splitting feature testing, items for update were discarded with the error: io.vertx.core.impl.NoStackTraceThrowable: Cannot get actual Item by id: org.folio.inventory.exceptions.InternalServerErrorException: Access for user 'data-import-system-user' (f3486d35-f7f7-4a69-bcd0-d8e5a35cb292) requires permission: inventory-storage.items.item.get. Less than 1% of records could be discarded due to missing permission for  'data-import-system-user'. Permission was not added automatically during the service deployment. I added permission manually to the database and the error does not occur anymore.
    Jira Legacy
    serverSystem Jira
    serverId01505d01-b853-3c2e-90f1-ee9b165564fc
    keyMODDATAIMP-930
  3. UI issue, when canceled or completed with error Job progress bar cannot be deleted from the screen.
    Jira Legacy
    serverSystem Jira
    serverId01505d01-b853-3c2e-90f1-ee9b165564fc
    keyMODDATAIMP-929
  4. Usage:
    • Should not use less than 1000 for RECORDS_PER_SPLIT_FILE. The system is stable enough to ingest 1000 records consistently and smaller amounts will incur more overheads, resulting in longer jobs' durations.  CPU utilization for mod-di-converter-storage for 500 RECORDS_PER_SPLIT_FILE(RPSF) = 160%, for 1000RPSF =180%, for 5K RPSF =380% and for 10K RPSF =433%, so in the case of selecting configurations 5K or 10K we recommend to add more CPU to mod-di-converter-storage service.
    • When toggling the file-splitting feature, mod-source-record-storage, mod-source-record-manager's tasks need to be restarted.
    • Keep in mind about the Kafka broker's disk size (as bigger jobs - up to 500K - can be run now), consecutive jobs may use up the disk quickly because the messages' retention time currently is set at 8 hours. For example with 300GB disk size, consecutive jobs of 250K, 500K, 500K sizes will exhaust the disk. 
  5. More CPU could be allocated to mod-inventory and mod-di-converter-storage

...

 ** -  up to 10 items were discarded with the error: io.vertx.core.impl.NoStackTraceThrowable: Cannot get actual Item by id: org.folio.inventory.exceptions.InternalServerErrorException: Access for user 'data-import-system-user' (f3486d35-f7f7-4a69-bcd0-d8e5a35cb292) requires permission: inventory-storage.items.item.get. Less than 1% of records could be discarded due to missing permission for  'data-import-system-user'. Permission was not added automatically during the service deployment. I added permission manually to the database and the error does not occur anymore.

Jira Legacy
serverSystem Jira
serverId01505d01-b853-3c2e-90f1-ee9b165564fc
keyMODDATAIMP-930

...

With CI/CO 20 users and DI 25k records on each of the 3 tenants Splitting Feature Disabled

ocp3-mod-data-import:12

Image Modified

Data Import Robustness Enhancement

...

Memory utilization rich maximal value for mod-source-record-storage-b 88%  and for mod-source-record-manager-b 85%.

Test 2. Test with 1, 2, and 3 tenants' concurrent jobs with configuration RECORDS_PER_SPLIT_FILE = 10K, 2 runs for each test.

...

Test 2. Test with 1, 2, and 3 tenants' concurrent jobs with configuration RECORDS_PER_SPLIT_FILE = 10K, 2 runs for each test.

CPU utilization of  mod-di-converter-storage-b

 

RDS CPU Utilization 

Test 1. Test with 1, 2, and 3 tenants' concurrent jobs with configuration RECORDS_PER_SPLIT_FILE = 500, 2 runs for each test. Maximal  CPU Utilization = 95%

...

The average number of DB connections during the test changed from 400 to 1200.

CICO responce time graph

Retesting DI file-splitting feature on Poppy release with Refresh Token Rotation (RTR) and file-splitting feature

The goal of the tests was to investigate how the file-splitting feature caused Data-import on Poppy release and the impact of Refresh Token Rotation (RTR). The tests were performed on ocp3(Poppy), pcp1(Poppy) and ncp5(Orchid)  environments.

Jira Legacy
serverSystem Jira
serverId01505d01-b853-3c2e-90f1-ee9b165564fc
keyPERF-723
Refresh Token Rotation (RTR)

Brief comparison summary


Test results

DI tests/ Configurationncp5
Orchid
ocp3
FSF true  without RTR token
ocp3
FSF false without RTR token
*ocp3
FSF deleted without token

ocp3

FSF false

AT =RT=

300; 

ocp3

 FSF false

AT =RT= 1000000000

pcp1

FSF false

AT =RT= 10000000

pcp1

FSF false without token

retest*

250k_bib_Create_1.mrcnot testednot testedfailedfailedfailedfailedfailedfailed
100k_bib_Create.mrc00:41:4100:54:3200:54:3600:53:5900:48:5600:54:42.0500:47:17"01:01:39"
50k_bib_Create.mrc00:19:4300:30:4000:30:2000:22:1700:27:0500:30:0900:21:4500:20:46
25k_bib_Create.mrc00:10:1100:13:5300:17:2000:10:3300:12:4200:13:2500:11:5400:10:53
10k_bib_Create.mrc00:04:1900:07:2200:05:3500:04:38not tested00:05:33.00:04:4200:04:36
5k_bib_Create.mrc00:02:3500:04:3100:02:4300:02:55not tested00:03:0700:02:5500:02:30
1k_bib_Create.mrcnot testednot testednot testednot testednot testednot tested00:00:54not tested
DI-25K-Update.mrcnot testednot testedfinished
successfully
failedfailedfinished
successfully
failedfinished
successfully


Resource utilization during testing

Service CPU utilization during the Data-import process

The next data import jobs were carried out
1) 5k_bib_Create 2) 10k_bib_Create  3) 25k_bib_Create 4) 50k_bib_Create 5) 50k_bib_Create 6) 100k_bib_Create 7) 50k_bib_Create 8) 25k_bib_Create 9) 25k_bib_Update 10) 50k_bib_Update(stopped)
CPU utilization was stable during all jobs, but some spikes of data-import jobs were at the beginning of all tests.
 

Image Added

Memory Utilization

Most of the modules were stable during the test, and no memory leak is suspected for DI modules, except mod-inventory-b which consumed about 92% of memory during all DI processes. 
Image Added

RDS CPU Utilization 


Maximal  CPU Utilization = 95%

Image Added

RDS Database Connections

The maximal number of DB connections during the tests was about 580.

Image Added

Database load

Image Added

Top SQL queries

Image Added

Appendix

Infrastructure ocp3  with the "Bugfest" Dataset

...

  • tenant0_mod_source_record_storage.marc_records_lb = 9674629
  • tenant2_mod_source_record_storage.marc_records_lb = 0
  • tenant3_mod_source_record_storage.marc_records_lb = 0
  • tenant0_mod_source_record_storage.raw_records_lb = 9604805
  • tenant2_mod_source_record_storage.raw_records_lb = 0
  • tenant3_mod_source_record_storage.raw_records_lb = 0
  • tenant0_mod_source_record_storage.records_lb = 9674677
  • tenant2_mod_source_record_storage.records_lb = 0
  • tenant3_mod_source_record_storage.records_lb = 0
  • tenant0_mod_source_record_storage.marc_indexers =  620042011
  • tenant2_mod_source_record_storage.marc_indexers =  0
  • tenant3_mod_source_record_storage.marc_indexers =  0
  • tenant0_mod_source_record_storage.marc_indexers with field_no 010 = 3285833
  • tenant2_mod_source_record_storage.marc_indexers with field_no 010 = 0
  • tenant3_mod_source_record_storage.marc_indexers with field_no 010 = 0
  • tenant0_mod_source_record_storage.marc_indexers with field_no 035 = 19241844
  • tenant2_mod_source_record_storage.marc_indexers with field_no 035 = 0
  • tenant3_mod_source_record_storage.marc_indexers with field_no 035 = 0
  • tenant0_mod_inventory_storage.authority = 4
  • tenant2_mod_inventory_storage.authority = 0
  • tenant3_mod_inventory_storage.authority = 0
  • tenant0_mod_inventory_storage.holdings_record = 9592559
  • tenant2_mod_inventory_storage.holdings_record = 16
  • tenant3_mod_inventory_storage.holdings_record = 16
  • tenant0_mod_inventory_storage.instance = 9976519
  • tenant2_mod_inventory_storage.instance = 32
  • tenant3_mod_inventory_storage.instance = 32 
  • tenant0_mod_inventory_storage.item = 10787893
  • tenant2_mod_inventory_storage.item = 19
  • tenant3_mod_inventory_storage.item = 19

PTF -environment ocp3 

  • 10 m6i.2xlarge EC2 instances located in US East (N. Virginia)us-east-1
  • 2 database  instances, one reader, and one writer

    NameAPI NameMemory GIBvCPUsmax_connections
    R6G Extra Largedb.r6g.xlarge32 GiB4 vCPUs2731


  • MSK ptf-kakfa-3
    • 4 m5.2xlarge brokers in 2 zones
    • Apache Kafka version 2.8.0

    • EBS storage volume per broker 300 GiB

    • auto.create.topics.enable=true
    • log.retention.minutes=480
    • default.replication.factor=3
  • Kafka topics partitioning: - 2 partitions for DI topics

...