In progress (in review). Retesting results will be added to this report in the scope of PERF-681.

...

  1. One record on one tenant could be discarded with the error: io.netty.channel.StacklessClosedChannelException (MODDATAIMP-748).
    The issue reproduces both with and without the splitting feature enabled, in at least 30% of test runs with 500K-record files and multitenant testing.
  2. During testing of the new Data Import splitting feature, items for update were discarded with the error: io.vertx.core.impl.NoStackTraceThrowable: Cannot get actual Item by id: org.folio.inventory.exceptions.InternalServerErrorException: Access for user 'data-import-system-user' (f3486d35-f7f7-4a69-bcd0-d8e5a35cb292) requires permission: inventory-storage.items.item.get. Less than 1% of records were discarded because of the missing permission for 'data-import-system-user'. The permission was not added automatically during the service deployment. After the permission was added manually in the database, the error no longer occurred (MODDATAIMP-930).
  3. UI issue: when a job is canceled or completes with an error, its progress bar cannot be removed from the screen (MODDATAIMP-929).
  4. Usage:
    • Do not use a value below 1000 for RECORDS_PER_SPLIT_FILE. The system is stable enough to ingest 1000 records consistently, and smaller values incur more overhead, resulting in longer job durations. CPU utilization for mod-di-converter-storage was 160% at 500 RECORDS_PER_SPLIT_FILE (RPSF), 180% at 1000 RPSF, 380% at 5K RPSF, and 433% at 10K RPSF, so when selecting 5K or 10K we recommend adding more CPU to the mod-di-converter-storage service.
    • When toggling the file-splitting feature, the tasks of mod-source-record-storage and mod-source-record-manager need to be restarted.
    • Keep the Kafka brokers' disk size in mind: bigger jobs (up to 500K) can now be run, and consecutive jobs may use up the disk quickly because message retention is currently set to 8 hours. For example, with a 300 GB disk, consecutive jobs of 250K, 500K, and 500K records will exhaust the disk.
  5. More CPU could be allocated to mod-inventory and mod-di-converter-storage
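
The RECORDS_PER_SPLIT_FILE guidance above can be sketched as a small lookup. This is only an illustration: the function name and the thresholds as constants are hypothetical, and treating every value of 5K or more like the measured 5K and 10K cases is an assumption, since only four values were measured.

```python
# CPU utilization (%) of mod-di-converter-storage per RECORDS_PER_SPLIT_FILE
# value, copied from the usage note above.
MEASURED_CPU_PERCENT = {500: 160, 1000: 180, 5000: 380, 10000: 433}

MIN_RECOMMENDED_RPSF = 1000   # the report advises against smaller values
EXTRA_CPU_FROM_RPSF = 5000    # assumption: values >= 5K behave like 5K and 10K


def rpsf_advice(rpsf: int) -> str:
    """Return the report's guidance for a RECORDS_PER_SPLIT_FILE value."""
    if rpsf < MIN_RECOMMENDED_RPSF:
        return "not recommended: smaller split files add overhead"
    if rpsf >= EXTRA_CPU_FROM_RPSF:
        return "add CPU to mod-di-converter-storage"
    return "no extra CPU suggested by the measurements"


for value in (500, 1000, 2000, 5000, 10000):
    measured = MEASURED_CPU_PERCENT.get(value)  # None if not measured
    print(value, measured, rpsf_advice(value))
```

Values between the measured points (for example 2K) have no CPU figure in this list, so the sketch prints None for them rather than interpolating.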

...

 ** - Up to 10 items were discarded with the error: io.vertx.core.impl.NoStackTraceThrowable: Cannot get actual Item by id: org.folio.inventory.exceptions.InternalServerErrorException: Access for user 'data-import-system-user' (f3486d35-f7f7-4a69-bcd0-d8e5a35cb292) requires permission: inventory-storage.items.item.get. Less than 1% of records were discarded because of the missing permission for 'data-import-system-user'. The permission was not added automatically during the service deployment. After the permission was added manually in the database, the error no longer occurred (MODDATAIMP-930).

...

With CI/CO 20 users and DI 25k records on each of the 3 tenants Splitting Feature Disabled

ocp3-mod-data-import:12

Data Import Robustness Enhancement

...

Memory utilization reached a maximum of 88% for mod-source-record-storage-b and 85% for mod-source-record-manager-b.

Test 2. Test with 1, 2, and 3 tenants' concurrent jobs with configuration RECORDS_PER_SPLIT_FILE = 10K, 2 runs for each test.

...

Test 2. Test with 1, 2, and 3 tenants' concurrent jobs with configuration RECORDS_PER_SPLIT_FILE = 10K, 2 runs for each test.

CPU utilization of  mod-di-converter-storage-b

 

RDS CPU Utilization 

Test 1. Test with 1, 2, and 3 tenants' concurrent jobs with configuration RECORDS_PER_SPLIT_FILE = 500, 2 runs for each test. Maximal  CPU Utilization = 95%

...

Test 2. Test with 1, 2, and 3 tenants' concurrent jobs with configuration RECORDS_PER_SPLIT_FILE = 10K, 2 runs for each test. Maximal  CPU Utilization = 94%

Retesting

...

DI file-splitting feature on Poppy release

Retest the DI feature to be sure that the new changes have not affected performance negatively.  Retest the DI file-splitting feature for the following scenarios:

PERF-681

Test 1.  Single tenant: create and update 250K file 

...

Previous results  (Orchid )

Duration

...

Brief comparison summary


The data import duration has increased. The differences (Poppy processing time minus Orchid processing time) are:

  • 250K MARC BIB Create, PTF - Create 2 → 44 minutes
  • 250K MARC BIB Update, PTF - Updates Success - 1 → 45 minutes
  • Multitenant MARC Create (100k, 50k, and 1 record)

...

  • PTF - Create 2

...

Test 1.4 With CI/CO 20 users and DI 25k records on each of the 3 tenants 

...

Release: Orchid

Response time without DI (Average) 

Release: Orchid
Response time with DI
(Average)

...

Release: Poppy
Response time without DI (Average) 

...

Release: Poppy
Response time with DI (Average) 

...

Release: Orchid

DI Duration with CI/CO 

...

Release: Poppy

DI Duration with CI/CO 

...

Resource utilization during testing

...

  • → 1 hour 35 minutes
    • Check-Out without DI ~ 200ms
    • Check-In without DI ~ 65ms
    • Check-Out with DI ~ 770ms
    • Check-in with DI ~ 330ms

Resource utilization:

  • Service CPU utilization on Poppy is about the same as on Orchid;
  • Memory utilization on Poppy is about the same as on Orchid;
  • RDS CPU utilization during all tests and on both releases was about 96%;
  • The number of DB connections on both releases was about the same, ranging from 550 (Test 1.1) to 1200 (Test 1.4).


Test 1.  Single tenant: create and update 250K file 

Test # | Test parameters | Profile | Duration (Poppy) | Status | Previous results (Orchid) Duration
1.1 | 250K MARC BIB Create | PTF - Create 2 | 2 hours 16 min | Completed | 1 hour 32 min
1.2 | 250K MARC BIB Update | PTF - Updates Success - 1 | 3 hours 1 min | Completed | 2 hours 16 min
1.3 | Multitenant MARC Create (100k, 50k, and 1 record) | PTF - Create 2 | 4 hours 14 min | Completed | 2 hours 40 min
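
The Poppy versus Orchid differences can be recomputed from the durations in the table above. The sketch below is illustrative only: the helper names are hypothetical, and the durations are copied as written in the table, so sub-minute rounding in the original measurements is not accounted for.

```python
import re


def to_minutes(text: str) -> int:
    """Convert a string such as '2 hours 16 min' or '4 hours 14min' to minutes."""
    hours = re.search(r"(\d+)\s*hours?", text)
    minutes = re.search(r"(\d+)\s*min", text)
    total = int(hours.group(1)) * 60 if hours else 0
    total += int(minutes.group(1)) if minutes else 0
    return total


# (Poppy duration, Orchid duration) per test, copied from the table above.
RESULTS = {
    "1.1": ("2 hours 16 min", "1 hour 32 min"),
    "1.2": ("3 hours 1 min", "2 hours 16 min"),
    "1.3": ("4 hours 14min", "2 hours 40 min"),
}


def slowdown(poppy: str, orchid: str) -> tuple[int, int]:
    """Return (extra minutes on Poppy, increase relative to Orchid in %)."""
    delta = to_minutes(poppy) - to_minutes(orchid)
    percent = round(100 * delta / to_minutes(orchid))
    return delta, percent


for test, (poppy, orchid) in RESULTS.items():
    delta, percent = slowdown(poppy, orchid)
    print(f"Test {test}: +{delta} min ({percent}% longer than Orchid)")
```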

Test 1.4 With CI/CO 20 users and DI 25k records on each of the 3 tenants 

Splitting Feature enabled

Request | Release: Orchid, response time without DI (Average) | Release: Orchid, response time with DI (Average) | Release: Poppy, response time without DI (Average) | Release: Poppy, response time with DI (Average)
Check-Out | 0.804s | 1.48s | 1.03s | 2.26s
Check-In | 0.505s | 1.067s | 0.570s | 1.4s
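
As a rough cross-check, the release-to-release change in average response time can be derived from the four values reported for each request type. The function and variable names below are hypothetical; the numbers are copied from the Check-Out and Check-In rows above.

```python
# Average response times in seconds, in the column order of the table:
# (Orchid without DI, Orchid with DI, Poppy without DI, Poppy with DI)
RESPONSE_TIMES = {
    "Check-Out": (0.804, 1.48, 1.03, 2.26),
    "Check-In": (0.505, 1.067, 0.570, 1.4),
}


def release_delta_ms(times: tuple[float, float, float, float]) -> tuple[int, int]:
    """Return (Poppy - Orchid) in milliseconds, without DI and with DI."""
    orchid_no_di, orchid_di, poppy_no_di, poppy_di = times
    without_di = round((poppy_no_di - orchid_no_di) * 1000)
    with_di = round((poppy_di - orchid_di) * 1000)
    return without_di, with_di


for request, times in RESPONSE_TIMES.items():
    without_di, with_di = release_delta_ms(times)
    print(f"{request}: +{without_di} ms without DI, +{with_di} ms with DI")
```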



Tenant | Release: Orchid, DI Duration with CI/CO | Release: Poppy, DI Duration with CI/CO
Tenant _1 | 16 min 53 sec | 34 min 55 sec
Tenant _2 | 20 min 39 sec | 27 min 39 sec
Tenant _3 | 17 min 54 sec | 25 min 17 sec

Resource utilization during testing

Test 1.1. Data-import of 250K records file with "PTF - Create 2" job profile

Service CPU Utilization 

There was a sharp CPU spike at the beginning of the test; we see similar behavior in all of the DI tests. After that, CPU consumption was uniform during the test.

Memory Utilization

Memory consumption was largely unaffected. The mod-source-record-manager service increased its memory usage from 45% to 60% during the test, but after the test the memory started to return to the pre-test value.


RDS CPU Utilization  

Consumption of the database CPU was 97% throughout the test

RDS Database Connections

The average number of DB connections during the test was about 550.



Test 1.2. Data-import of 250K records file with "PTF - Update" job profile

...

CPU consumption was stable during the test, except for the mod-inventory service, whose CPU usage rose from about 140% at the beginning of the test to about 200% at the end.

Memory Utilization

The memory was stable and without memory leaks.

RDS CPU Utilization 

Consumption of the database CPU was 97% throughout the test

RDS Database Connections

The average number of DB connections during the test was about 550.

Test 1.3. Multitenant MARC Create (100k, 50k, and 1 record)

Service CPU Utilization 

CPU consumption was stable during the test. However, in the last hour of the test, the mod-inventory and mod-quick-mark services increased their CPU utilization by 75%.

Memory Utilization

The memory was stable and without memory leaks.

RDS CPU Utilization

Consumption of the database CPU was 96% throughout the test

RDS Database Connections

The average number of DB connections during the test was about 800.


Test 1.4. With CI/CO 20 users and DI 25k records on each of the 3 tenants

Service CPU Utilization 

Memory Utilization 

The memory was stable and without memory leaks.

RDS CPU Utilization 

Consumption of the database CPU was 96% throughout the test

RDS Database Connections

The average number of DB connections during the test changed from 400 to 1200.

CICO response time graph

Appendix

...

  • tenant0_mod_source_record_storage.marc_records_lb = 9674629
  • tenant2_mod_source_record_storage.marc_records_lb = 0
  • tenant3_mod_source_record_storage.marc_records_lb = 0
  • tenant0_mod_source_record_storage.raw_records_lb = 9604805
  • tenant2_mod_source_record_storage.raw_records_lb = 0
  • tenant3_mod_source_record_storage.raw_records_lb = 0
  • tenant0_mod_source_record_storage.records_lb = 9674677
  • tenant2_mod_source_record_storage.records_lb = 0
  • tenant3_mod_source_record_storage.records_lb = 0
  • tenant0_mod_source_record_storage.marc_indexers =  620042011
  • tenant2_mod_source_record_storage.marc_indexers =  0
  • tenant3_mod_source_record_storage.marc_indexers =  0
  • tenant0_mod_source_record_storage.marc_indexers with field_no 010 = 3285833
  • tenant2_mod_source_record_storage.marc_indexers with field_no 010 = 0
  • tenant3_mod_source_record_storage.marc_indexers with field_no 010 = 0
  • tenant0_mod_source_record_storage.marc_indexers with field_no 035 = 19241844
  • tenant2_mod_source_record_storage.marc_indexers with field_no 035 = 0
  • tenant3_mod_source_record_storage.marc_indexers with field_no 035 = 0
  • tenant0_mod_inventory_storage.authority = 4
  • tenant2_mod_inventory_storage.authority = 0
  • tenant3_mod_inventory_storage.authority = 0
  • tenant0_mod_inventory_storage.holdings_record = 9592559
  • tenant2_mod_inventory_storage.holdings_record = 16
  • tenant3_mod_inventory_storage.holdings_record = 16
  • tenant0_mod_inventory_storage.instance = 9976519
  • tenant2_mod_inventory_storage.instance = 32
  • tenant3_mod_inventory_storage.instance = 32 
  • tenant0_mod_inventory_storage.item = 10787893
  • tenant2_mod_inventory_storage.item = 19
  • tenant3_mod_inventory_storage.item = 19

PTF environment: ocp3

  • 10 m6i.2xlarge EC2 instances located in US East (N. Virginia), us-east-1
  • 2 database instances: one reader and one writer

    Name | API Name | Memory GiB | vCPUs | max_connections
    R6G Extra Large | db.r6g.xlarge | 32 GiB | 4 vCPUs | 2731


  • MSK ptf-kakfa-3
    • 4 m5.2xlarge brokers in 2 zones
    • Apache Kafka version 2.8.0

    • EBS storage volume per broker 300 GiB

    • auto.create.topics.enable=true
    • log.retention.minutes=480
    • default.replication.factor=3
  • Kafka topics partitioning: 2 partitions for DI topics
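
To reason about how consecutive jobs interact with the 8-hour retention and the broker disk size listed above, a coarse upper-bound estimate can be sketched as follows. This is not a measurement: `bytes_per_record` is a hypothetical parameter (the report gives no per-record Kafka footprint), the job timings in the usage lines are invented for illustration, and the model assumes that all of a job's messages stay on disk until the retention period has passed after the job's end.

```python
from dataclasses import dataclass

RETENTION_MINUTES = 480      # log.retention.minutes
BROKERS = 4                  # m5.2xlarge brokers
DISK_GIB_PER_BROKER = 300    # EBS storage volume per broker
REPLICATION_FACTOR = 3       # default.replication.factor


@dataclass
class Job:
    start_minute: int  # job start, minutes from an arbitrary origin
    end_minute: int    # job end, same time scale
    records: int       # number of records in the imported file


def retained_records(jobs: list[Job], now_minute: int) -> int:
    """Upper bound on records whose messages may still be on disk."""
    return sum(
        job.records
        for job in jobs
        if job.start_minute <= now_minute < job.end_minute + RETENTION_MINUTES
    )


def estimated_disk_fraction(jobs: list[Job], now_minute: int,
                            bytes_per_record: int) -> float:
    """Fraction of total broker disk used for a HYPOTHETICAL record footprint."""
    used = retained_records(jobs, now_minute) * bytes_per_record * REPLICATION_FACTOR
    capacity = BROKERS * DISK_GIB_PER_BROKER * 1024 ** 3
    return used / capacity


# Illustrative timings only: three consecutive jobs of 250K, 500K, 500K records.
jobs = [Job(0, 90, 250_000), Job(100, 300, 500_000), Job(310, 520, 500_000)]
print(retained_records(jobs, 500))   # all three jobs fall inside the window
```

Plugging in a measured per-record footprint across all DI topics would turn this into a usable capacity check; without that number the function only shows how the retention window stacks up consecutive jobs.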

...

Test 3: Run CICO on one tenant and DI jobs on 3 tenants, including the one that runs CICO. Start the second job after the first one reaches 30% completion, and start another job on a third tenant after the first job reaches 60% completion. CICO: 20 users, DI file size: 25k

Test 4. To determine the optimal value for RECORDS_PER_SPLIT_FILE (500, 1K, 2K, 5K), data-import jobs with the PTF-Create-2 profile were run with 25K-record files on 1 tenant, then simultaneously on 2 tenants and on 3 tenants.