Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

In Progress

...

  1. One record on one tenant could be discarded with error: io.netty.channel.StacklessClosedChannelException.
    Jira Legacy
    serverSystem Jira
    serverId01505d01-b853-3c2e-90f1-ee9b165564fc
    keyMODDATAIMP-748
    Reproduces in both cases with and without splitting feature enabled in at least 30% of test runs with 500k record files and multitenant testing.
  2. During the new Data Import splitting feature testing, items for update were discarded with the error: io.vertx.core.impl.NoStackTraceThrowable: Cannot get actual Item by id: org.folio.inventory.exceptions.InternalServerErrorException: Access for user 'data-import-system-user' (f3486d35-f7f7-4a69-bcd0-d8e5a35cb292) requires permission: inventory-storage.items.item.get. Less than 1% of records could be discarded due to missing permission for  'data-import-system-user'. Permission was not added automatically during the service deployment. I added permission manually to the database and the error does not occur anymore.
    Jira Legacy
    serverSystem Jira
    serverId01505d01-b853-3c2e-90f1-ee9b165564fc
    keyMODDATAIMP-930
  3. UI issue, when canceled or completed with error Job progress bar cannot be deleted from the screen.
    Jira Legacy
    serverSystem Jira
    serverId01505d01-b853-3c2e-90f1-ee9b165564fc
    keyMODDATAIMP-929
  4. Usage:
    • Should not use less than 1000 for RECORDS_PER_SPLIT_FILE. The system is stable enough to ingest 1000 records consistently and smaller amounts will incur more overheads, resulting in longer jobs' durations.
    • When toggling the file-splitting feature, mod-source-record-storage, mod-source-record-manager's tasks need to be restarted.
    • Keep in mind about the Kafka broker's disk size (as bigger jobs - up to 500K - can be run now), consecutive jobs may use up the disk quickly because the messages' retention time currently is set at 8 hours. For example with 300GB disk size, consecutive jobs of 250K, 500K, 500K sizes will exhaust the disk. 
  5. More CPU could be allocated to mod-inventory and mod-di-converter-storage

Results

Test #

Profile

Splitting Feature EnabledResults

Splitting Feature Disabled

ResultsBefore Splitting Feature
released
DeployedResults
1

100K MARC Create

PTF - Create 237 min -39 minCompleted40 minCompleted32-33 minutesCompleted
1

250K MARC Create 

PTF - Create 21 hour 32 minCompleted1 hour 41 minCompleted1 hour 33 min - 1 hour 57 minCompleted
1500K MARC CreatePTF - Create 23 hours 29 minCompleted*3 hours 55 minCompleted3 hours 33 minCompleted
2Multitenant MARC Create (100k, 50k, and 1 record)PTF - Create 22 hours 40 minCompleted*

3 hours 1 minCompleted
3CI/CO + DI MARC Create (20 users CI/CO, 25k records DI on 3 tenants)PTF - Create 224 min 18 secCompleted

24 minCompleted *
4

100K MARC Update (Create new file)

PTF - Updates Success - 1

58 min 25 sec

57 min 19 sec

Completed1 hour 3 minCompleted--
4

250K MARC Update

PTF - Updates Success - 1

2 hours 2 min **


2 hours 12 min

Completed with errors **

Completed

1 hour 53 minCompleted--
4500K MARC UpdatePTF - Updates Success - 1

4 hours 43 min

4 hours 38 minutes

Completed

Completed

5 hour 59 minCompleted--

 * - One record on one tenant could be discarded with error: io.netty.channel.StacklessClosedChannelException.

Jira Legacy
serverSystem Jira
serverId01505d01-b853-3c2e-90f1-ee9b165564fc
keyMODDATAIMP-748
 Reproduces in both cases with and without splitting features in at least 30% of test runs with 500k record files and multitenant testing.

...

 ** -  up to 10 items were discarded with the error: io.vertx.core.impl.NoStackTraceThrowable: Cannot get actual Item by id: org.folio.inventory.exceptions.InternalServerErrorException: Access for user 'data-import-system-user' (f3486d35-f7f7-4a69-bcd0-d8e5a35cb292) requires permission: inventory-storage.items.item.get. Less than 1% of records could be discarded due to missing permission for  'data-import-system-user'. Permission was not added automatically during the service deployment. I added permission manually to the database and the error does not occur anymore.

Jira Legacy
serverSystem Jira
serverId01505d01-b853-3c2e-90f1-ee9b165564fc
keyMODDATAIMP-930

...

Data Import Robustness Enhancement 
Jira Legacy
serverSystem Jira
serverId01505d01-b853-3c2e-90f1-ee9b165564fc
keyPERF-646

25K records RECORDS_PER_SPLIT_FILE
Number of concurrent tenantsJob profile 500Status1KStatus5KStatus10KStatus
1 Tenant test#1PTF - Create 212 minutes 55 secondsCompleted11 minutes 48 secondsCompleted09 minutes 21 secondsCompleted9 minutes 2 secCompleted
1 Tenant test#210 minutes 31 secondsCompleted09 minutes 32 secondsCompleted9 minutes 6 secCompleted9 minutes 14 secCompleted
2 Tenants test#1PTF - Create 219 minutes 29 secondsCompleted15 minutes 47 secondsCompleted16 minutes 15 secondsCompleted16 minutes 3 secondsCompleted
2 Tenants test#218 minutes 19 secondsCompleted15 minutes 47 secondsCompleted16 minutes 11 secCompleted16 min 41 secCompleted
3 Tenants test#1PTF - Create 2

24 minutes 15 seconds

Completed

25 minutes 47 seconds

Completed23 minutes Completed23 minutes 27 secondsCompleted
3 Tenants test#224 minutes 38 secondsCompleted

23 minutes 28 seconds

Completed23 minutes 2 secCompleted23 minutes 26 secondsCompleted

Instance CPU Utilization 

...

Memory utilization rich maximal value for mod-source-record-storage-b 88%  and for mod-source-record-manager-b 85%.

Image Removed

Test 285%.

Image Added

Test 2. Test with 1, 2, and 3 tenants' concurrent jobs with configuration RECORDS_PER_SPLIT_FILE = 10K, 2 runs for each test.


Service CPU Utilization 

Test 1. Test with 1, 2, and 3 tenants' concurrent jobs with configuration RECORDS_PER_SPLIT_FILE = 10K500, 2 runs for each test.

Image Added

...


RDS CPU Utilization 

Test 1. Test with 1, 2, and 3 tenants' concurrent jobs with configuration RECORDS_PER_SPLIT_FILE = 500, 2 runs for each test.
Image RemovedImage Added

RDS

...

MARC BIB CREATE

Approximately DB CPU usage is up to 95%

Test#1  500k records DI

Database Connections

Test 1. Test with 1, 2, and 3 tenants' concurrent jobs with configuration RECORDS_PER_SPLIT_FILE = 500, 2 runs for each test.

Image Added




Test #3 With CI/CO 20 users and DI 25k records on each of the 3 tenants

...

Maximal DB CPU usage is about 95%


Test#3 With CI/CO


RDS Database Connections

MARC BIB CREATE
 For DI  job Create- Maximum 535 connections count.

...

Test#3 With CI/CO


Appendix

Infrastructure

...

ocp3  with the "Bugfest" Dataset

Records count :

  • tenant0_mod_source_record_storage.marc_records_lb = 9674629
  • tenant2_mod_source_record_storage.marc_records_lb = 0
  • tenant3_mod_source_record_storage.marc_records_lb = 0
  • tenant0_mod_source_record_storage.raw_records_lb = 9604805
  • tenant2_mod_source_record_storage.raw_records_lb = 0
  • tenant3_mod_source_record_storage.raw_records_lb = 0
  • tenant0_mod_source_record_storage.records_lb = 9674677
  • tenant2_mod_source_record_storage.records_lb = 0
  • tenant3_mod_source_record_storage.records_lb = 0
  • tenant0_mod_source_record_storage.marc_indexers =  620042011
  • tenant2_mod_source_record_storage.marc_indexers =  0
  • tenant3_mod_source_record_storage.marc_indexers =  0
  • tenant0_mod_source_record_storage.marc_indexers with field_no 010 = 3285833
  • tenant2_mod_source_record_storage.marc_indexers with field_no 010 = 0
  • tenant3_mod_source_record_storage.marc_indexers with field_no 010 = 0
  • tenant0_mod_source_record_storage.marc_indexers with field_no 035 = 19241844
  • tenant2_mod_source_record_storage.marc_indexers with field_no 035 = 0
  • tenant3_mod_source_record_storage.marc_indexers with field_no 035 = 0
  • tenant0_mod_inventory_storage.authority = 4
  • tenant2_mod_inventory_storage.authority = 0
  • tenant3_mod_inventory_storage.authority = 0
  • tenant0_mod_inventory_storage.holdings_record = 9592559
  • tenant2_mod_inventory_storage.holdings_record = 16
  • tenant3_mod_inventory_storage.holdings_record = 16
  • tenant0_mod_inventory_storage.instance = 9976519
  • tenant2_mod_inventory_storage.instance = 32
  • tenant3_mod_inventory_storage.instance = 32 
  • tenant0_mod_inventory_storage.item = 10787893
  • tenant2_mod_inventory_storage.item = 19
  • tenant3_mod_inventory_storage.item = 19

PTF -environment ocp3 

  • 10 m6i.2xlarge EC2 instances located in US East (N. Virginia)us-east-1
  • 2 database  instances, one reader, and one writer

    NameAPI NameMemory GIBvCPUsmax_connections
    R6G Extra Largedb.r6g.xlarge32 GiB4 vCPUs2731


  • MSK ptf-kakfa-3
    • 4 m5.2xlarge brokers in 2 zones
    • Apache Kafka version 2.8.0

    • EBS storage volume per broker 300 GiB

    • auto.create.topics.enable=true
    • log.retention.minutes=480
    • default.replication.factor=3
  • Kafka topics partitioning: - 2 partitions for DI topics

...