PTF - Data Import Create/Update multi tenant (Quesnelia) [ECS]
Overview
- This document contains the results of testing Data Import for MARC Bibliographic records with Create and Update jobs on the Quesnelia [ECS] release in the qcon environment.
- Ticket: PERF-859
Summary
- Data Import tests finished successfully on the qcon environment using the PTF - Create 2 profile with 10k, 25k and 50k record files.
- The Data Import test using the PTF - Updates Success - 2 profile with a 25k record file on 3 tenants concurrently finished with errors for two records.
- Comparison with previous test results (Poppy vs. Quesnelia releases):
- Data Import processed DI MARC Bib Create jobs, including the tests on 2 and 3 tenants concurrently, without errors on the Quesnelia release.
- Data Import processed the DI MARC Bib Update job with a 25k file on 3 tenants concurrently with two errors.
- Create job durations improved by around 50% on the Quesnelia release.
- Update job durations stayed, on average, in the same time range on the Quesnelia release.
- Comparing the duration of the 'main' job on the first tenant shows how the second and third jobs slow it down: with a 10k file, going from two to three tenants makes the main job about 35% slower for Create (0:04:52 vs. 0:06:35) and about 74% slower for Update (0:11:53 vs. 0:20:39).
- With a 25k file the increase is much smaller: about 6% for Create (0:11:04 vs. 0:11:47) and about 7% for Update (0:38:41 vs. 0:41:32).
Test Runs and Results
This table contains durations for Data Import.
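Profile | Test # | Tenant | MARC File | DI Duration by Tenant Quesnelia (hh:mm:ss) | Results |
---|---|---|---|---|---|
DI MARC Bib Create (PTF - Create 2) | 1 | College (cs00000int_0001) | 10K.mrc | 0:04:52 | Completed |
DI MARC Bib Create (PTF - Create 2) | 1 | Professional (cs00000int_0002) | 10K.mrc | 0:05:24 | Completed |
DI MARC Bib Create (PTF - Create 2) | 2 | College (cs00000int_0001) | 10K.mrc | 0:06:35 | Completed |
DI MARC Bib Create (PTF - Create 2) | 2 | Professional (cs00000int_0002) | 10K.mrc | 0:06:18 | Completed |
DI MARC Bib Create (PTF - Create 2) | 2 | School (cs00000int_0003) | 10K.mrc | 0:08:13 | Completed |
DI MARC Bib Create (PTF - Create 2) | 3 | College (cs00000int_0001) | 25K.mrc | 0:11:04 | Completed |
DI MARC Bib Create (PTF - Create 2) | 3 | Professional (cs00000int_0002) | 25K.mrc | 0:16:16 | Completed |
DI MARC Bib Create (PTF - Create 2) | 4 | College (cs00000int_0001) | 25K.mrc | 0:11:47 | Completed |
DI MARC Bib Create (PTF - Create 2) | 4 | Professional (cs00000int_0002) | 25K.mrc | 0:19:12 | Completed |
DI MARC Bib Create (PTF - Create 2) | 4 | School (cs00000int_0003) | 25K.mrc | 0:23:41 | Completed |
DI MARC Bib Create (PTF - Create 2) | 5 | College (cs00000int_0001) | 50K.mrc | 0:36:02 | Completed |
DI MARC Bib Create (PTF - Create 2) | 5 | Professional (cs00000int_0002) | 50K.mrc | 0:42:50 | Completed |
DI MARC Bib Create (PTF - Create 2) | 5 | School (cs00000int_0003) | 50K.mrc | 0:47:13 | Completed |
DI MARC Bib Update (PTF - Updates Success - 2) | 6 | College (cs00000int_0001) | 10K.mrc | 0:11:53 | Completed |
DI MARC Bib Update (PTF - Updates Success - 2) | 6 | Professional (cs00000int_0002) | 10K.mrc | 0:14:49 | Completed |
DI MARC Bib Update (PTF - Updates Success - 2) | 7 | College (cs00000int_0001) | 10K.mrc | 0:20:39 | Completed |
DI MARC Bib Update (PTF - Updates Success - 2) | 7 | Professional (cs00000int_0002) | 10K.mrc | 0:20:07 | Completed |
DI MARC Bib Update (PTF - Updates Success - 2) | 7 | School (cs00000int_0003) | 10K.mrc | 0:21:05 | Completed |
DI MARC Bib Update (PTF - Updates Success - 2) | 8 | College (cs00000int_0001) | 25K.mrc | 0:38:41 | Completed |
DI MARC Bib Update (PTF - Updates Success - 2) | 8 | Professional (cs00000int_0002) | 25K.mrc | 0:39:09 | Completed |
DI MARC Bib Update (PTF - Updates Success - 2) | 9 | College (cs00000int_0001) | 25K.mrc | 0:41:32 | Completed with errors* |
DI MARC Bib Update (PTF - Updates Success - 2) | 9 | Professional (cs00000int_0002) | 25K.mrc | 0:42:34 | Completed |
DI MARC Bib Update (PTF - Updates Success - 2) | 9 | School (cs00000int_0003) | 25K.mrc | 0:16:27 | Completed |
DI MARC Bib Update (PTF - Updates Success - 2) | 10 | College (cs00000int_0001) | 50K.mrc | 1:43:17 | Completed |
DI MARC Bib Update (PTF - Updates Success - 2) | 10 | Professional (cs00000int_0002) | 50K.mrc | 1:50:34 | Completed |
DI MARC Bib Update (PTF - Updates Success - 2) | 10 | School (cs00000int_0003) | 50K.mrc | 1:59:29 | Completed |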
This table contains Data Import durations for each test.
Profile | Test # | Tenants | MARC File | DI Duration by Test Quesnelia (hh:mm:ss) |
---|---|---|---|---|
DI MARC Bib Create (PTF - Create 2) | 1 | College (cs00000int_0001), Professional (cs00000int_0002) | 10K.mrc | 0:09:17 |
DI MARC Bib Create (PTF - Create 2) | 2 | College (cs00000int_0001), Professional (cs00000int_0002), School (cs00000int_0003) | 10K.mrc | 0:12:01 |
DI MARC Bib Create (PTF - Create 2) | 3 | College (cs00000int_0001), Professional (cs00000int_0002) | 25K.mrc | 0:20:09 |
DI MARC Bib Create (PTF - Create 2) | 4 | College (cs00000int_0001), Professional (cs00000int_0002), School (cs00000int_0003) | 25K.mrc | 0:30:54 |
DI MARC Bib Create (PTF - Create 2) | 5 | College (cs00000int_0001), Professional (cs00000int_0002), School (cs00000int_0003) | 50K.mrc | 1:00:44 |
DI MARC Bib Update (PTF - Updates Success - 2) | 6 | College (cs00000int_0001), Professional (cs00000int_0002) | 10K.mrc | 00:20:07 |
DI MARC Bib Update (PTF - Updates Success - 2) | 7 | College (cs00000int_0001), Professional (cs00000int_0002), School (cs00000int_0003) | 10K.mrc | 00:27:56 |
DI MARC Bib Update (PTF - Updates Success - 2) | 8 | College (cs00000int_0001), Professional (cs00000int_0002) | 25K.mrc | 00:47:08 |
DI MARC Bib Update (PTF - Updates Success - 2) | 9 | College (cs00000int_0001), Professional (cs00000int_0002), School (cs00000int_0003) | 25K.mrc | 00:26:46 |
DI MARC Bib Update (PTF - Updates Success - 2) | 10 | College (cs00000int_0001), Professional (cs00000int_0002), School (cs00000int_0003) | 50K.mrc | 02:22:13 |
Comparison
This table compares the DI durations of the 'main' job on the first tenant and how the second and third jobs affect and slow down the main job during Data Import.
Profile | MARC File | DI Duration "main" job for 1 tenant | DI Duration "main" job for 2 tenants | DI Duration "main" job for 3 tenants |
---|---|---|---|---|
DI MARC Bib Create (PTF - Create 2) | 10K.mrc | 0:05:35 *test run on a different day | 0:04:52 | 0:06:35 |
DI MARC Bib Create (PTF - Create 2) | 25K.mrc | 0:15:27 *test run on a different day | 0:11:04 | 0:11:47 |
DI MARC Bib Update (PTF - Updates Success - 2) | 10K.mrc | 0:07:55 | 0:11:53 | 0:20:39 |
DI MARC Bib Update (PTF - Updates Success - 2) | 25K.mrc | 0:22:39 | 0:38:41 | 0:41:32 |
This table compares the durations of the second and third jobs with the main job for Data Import.
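Profile | Test # | Tenant | MARC File | DI Duration (with % difference vs. main job) |
---|---|---|---|---|
DI MARC Bib Create (PTF - Create 2) | 1 | College (cs00000int_0001) | 10K.mrc | 0:04:52 |
DI MARC Bib Create (PTF - Create 2) | 1 | Professional (cs00000int_0002) | 10K.mrc | 0:05:24 +11% |
DI MARC Bib Create (PTF - Create 2) | 2 | College (cs00000int_0001) | 10K.mrc | 0:06:35 |
DI MARC Bib Create (PTF - Create 2) | 2 | Professional (cs00000int_0002) | 10K.mrc | 0:06:18 -4% |
DI MARC Bib Create (PTF - Create 2) | 2 | School (cs00000int_0003) | 10K.mrc | 0:08:13 +25% |
DI MARC Bib Create (PTF - Create 2) | 3 | College (cs00000int_0001) | 25K.mrc | 0:11:04 |
DI MARC Bib Create (PTF - Create 2) | 3 | Professional (cs00000int_0002) | 25K.mrc | 0:16:16 +47% |
DI MARC Bib Create (PTF - Create 2) | 4 | College (cs00000int_0001) | 25K.mrc | 0:11:47 |
DI MARC Bib Create (PTF - Create 2) | 4 | Professional (cs00000int_0002) | 25K.mrc | 0:19:12 +63% |
DI MARC Bib Create (PTF - Create 2) | 4 | School (cs00000int_0003) | 25K.mrc | 0:23:41 +101% |
DI MARC Bib Create (PTF - Create 2) | 5 | College (cs00000int_0001) | 50K.mrc | 0:36:02 |
DI MARC Bib Create (PTF - Create 2) | 5 | Professional (cs00000int_0002) | 50K.mrc | 0:42:50 +19% |
DI MARC Bib Create (PTF - Create 2) | 5 | School (cs00000int_0003) | 50K.mrc | 0:47:13 +31% |
DI MARC Bib Update (PTF - Updates Success - 2) | 6 | College (cs00000int_0001) | 10K.mrc | 0:11:53 |
DI MARC Bib Update (PTF - Updates Success - 2) | 6 | Professional (cs00000int_0002) | 10K.mrc | 0:14:49 +24% |
DI MARC Bib Update (PTF - Updates Success - 2) | 7 | College (cs00000int_0001) | 10K.mrc | 0:20:39 |
DI MARC Bib Update (PTF - Updates Success - 2) | 7 | Professional (cs00000int_0002) | 10K.mrc | 0:20:07 -3% |
DI MARC Bib Update (PTF - Updates Success - 2) | 7 | School (cs00000int_0003) | 10K.mrc | 0:21:05 +2% |
DI MARC Bib Update (PTF - Updates Success - 2) | 8 | College (cs00000int_0001) | 25K.mrc | 0:38:41 |
DI MARC Bib Update (PTF - Updates Success - 2) | 8 | Professional (cs00000int_0002) | 25K.mrc | 0:39:09 +1% |
DI MARC Bib Update (PTF - Updates Success - 2) | 9 | College (cs00000int_0001) | 25K.mrc | 0:41:32 |
DI MARC Bib Update (PTF - Updates Success - 2) | 9 | Professional (cs00000int_0002) | 25K.mrc | 0:42:34 +2.5% |
DI MARC Bib Update (PTF - Updates Success - 2) | 9 | School (cs00000int_0003) | 25K.mrc | 0:16:27 -60% |
DI MARC Bib Update (PTF - Updates Success - 2) | 10 | College (cs00000int_0001) | 50K.mrc | 1:43:17 |
DI MARC Bib Update (PTF - Updates Success - 2) | 10 | Professional (cs00000int_0002) | 50K.mrc | 1:50:34 +7% |
DI MARC Bib Update (PTF - Updates Success - 2) | 10 | School (cs00000int_0003) | 50K.mrc | 1:59:29 +16% |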
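The percentages in the table above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic, assuming each percentage is the second or third job's duration relative to the main (College) job in the same test; the helper names `to_seconds` and `percent_vs_main` are hypothetical and not part of any FOLIO tooling.

```python
def to_seconds(hms: str) -> int:
    """Convert an 'h:mm:ss' duration string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def percent_vs_main(main: str, other: str) -> float:
    """Relative difference of `other` against the main job, in percent."""
    main_s = to_seconds(main)
    return (to_seconds(other) - main_s) / main_s * 100


# Test 4 (25K Create, three tenants): College is the main job.
print(round(percent_vs_main("0:11:47", "0:19:12")))  # Professional: 63
print(round(percent_vs_main("0:11:47", "0:23:41")))  # School: 101
```

A positive value means the concurrent job took longer than the main job; a negative value means it finished faster.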
This table compares durations between the Poppy and Quesnelia releases.
Profile | Test # | Tenant | MARC File | DI Duration Poppy (hh:mm:ss) | DI Duration Quesnelia (hh:mm:ss) | DI Delta (hh:mm:ss and %) |
---|---|---|---|---|---|---|
DI MARC Bib Create | 1 | College (cs00000int_0001) | 10K.mrc | 00:10:43 | 0:04:52 | -0:05:51 -54% |
DI MARC Bib Create | 1 | Professional (cs00000int_0002) | 10K.mrc | 00:10:37 | 0:05:24 | -0:05:13 -49% |
DI MARC Bib Create | 2 | College (cs00000int_0001) | 10K.mrc | 00:21:12 | 0:06:35 | -0:14:37 -69% |
DI MARC Bib Create | 2 | Professional (cs00000int_0002) | 10K.mrc | 00:21:06 | 0:06:18 | -0:14:48 -70% |
DI MARC Bib Create | 2 | School (cs00000int_0003) | 10K.mrc | 00:20:57 | 0:08:13 | -0:12:44 -61% |
DI MARC Bib Create | 3 | College (cs00000int_0001) | 25K.mrc | 00:23:44 | 0:11:04 | -0:12:40 -53% |
DI MARC Bib Create | 3 | Professional (cs00000int_0002) | 25K.mrc | 00:23:30 | 0:16:16 | -0:07:14 -31% |
DI MARC Bib Create | 4 | College (cs00000int_0001) | 25K.mrc | 00:37:11 | 0:11:47 | -0:25:24 -68% |
DI MARC Bib Create | 4 | Professional (cs00000int_0002) | 25K.mrc | 00:37:05 | 0:19:12 | -0:17:53 -48% |
DI MARC Bib Create | 4 | School (cs00000int_0003) | 25K.mrc | 00:36:58 | 0:23:41 | -0:13:17 -36% |
DI MARC Bib Create | 5 | College (cs00000int_0001) | 50K.mrc | 01:12:54 | 0:36:02 | -0:36:52 -50% |
DI MARC Bib Create | 5 | Professional (cs00000int_0002) | 50K.mrc | 01:12:44 | 0:42:50 | -0:29:54 -41% |
DI MARC Bib Create | 5 | School (cs00000int_0003) | 50K.mrc | 01:12:35 | 0:47:13 | -0:25:22 -35% |
DI MARC Bib Update | 6 | College (cs00000int_0001) | 10K.mrc | 00:09:47 | 0:11:53 | +0:02:06 +21% |
DI MARC Bib Update | 6 | Professional (cs00000int_0002) | 10K.mrc | 00:11:26 | 0:14:49 | +0:03:23 +29% |
DI MARC Bib Update | 7 | College (cs00000int_0001) | 10K.mrc | 00:19:08 | 0:20:39 | +0:01:31 +8% |
DI MARC Bib Update | 7 | Professional (cs00000int_0002) | 10K.mrc | 00:19:06 | 0:20:07 | +0:01:01 +5% |
DI MARC Bib Update | 7 | School (cs00000int_0003) | 10K.mrc | 00:18:31 | 0:21:05 | +0:02:34 +14% |
DI MARC Bib Update | 8 | College (cs00000int_0001) | 25K.mrc | 00:30:49 | 0:38:41 | +0:07:52 +25% |
DI MARC Bib Update | 8 | Professional (cs00000int_0002) | 25K.mrc | 00:30:52 | 0:39:09 | +0:08:17 +27% |
DI MARC Bib Update | 9 | College (cs00000int_0001) | 25K.mrc | 00:47:47 | 0:41:32 | -0:06:15 -13% |
DI MARC Bib Update | 9 | Professional (cs00000int_0002) | 25K.mrc | 00:48:17 | 0:42:34 | -0:05:43 -11% |
DI MARC Bib Update | 9 | School (cs00000int_0003) | 25K.mrc | 00:47:54 | 0:16:27 | -0:31:27 -65% |
DI MARC Bib Update | 10 | College (cs00000int_0001) | 50K.mrc | not tested | 1:43:17 | |
DI MARC Bib Update | 10 | Professional (cs00000int_0002) | 50K.mrc | not tested | 1:50:34 | |
DI MARC Bib Update | 10 | School (cs00000int_0003) | 50K.mrc | not tested | 1:59:29 | |
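As a cross-check of the signed deltas in this table, the sketch below subtracts the Poppy duration from the Quesnelia duration and formats the result as `h:mm:ss`. It assumes a negative delta means Quesnelia was faster; `parse_duration` and `format_delta` are hypothetical helper names used only for illustration.

```python
from datetime import timedelta


def parse_duration(hms: str) -> timedelta:
    """Parse 'h:mm:ss' or 'hh:mm:ss' into a timedelta."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def format_delta(poppy: str, quesnelia: str) -> str:
    """Signed Quesnelia-minus-Poppy delta; negative means Quesnelia is faster."""
    diff = int((parse_duration(quesnelia) - parse_duration(poppy)).total_seconds())
    sign = "-" if diff < 0 else "+"
    hours, rest = divmod(abs(diff), 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{sign}{hours}:{minutes:02d}:{seconds:02d}"


# Test 4, College tenant, 25K Create file.
print(format_delta("00:37:11", "0:11:47"))  # -0:25:24
# Test 6, College tenant, 10K Update file.
print(format_delta("00:09:47", "0:11:53"))  # +0:02:06
```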
Resource utilization for Test Set №1
Service CPU Utilization
Here we can see that mod-data-import spiked to 105% CPU and mod-inventory spiked to 95% CPU.
Service Memory Utilization
Here we can see that all modules show a stable trend.
DB CPU Utilization
DB CPU was 96%.
DB Connections
Max number of DB connections was 1650.
DB load
Top SQL-queries
Resource utilization for Test Set №2
Service CPU Utilization
Here we can see that nginx-okapi spiked to 195% CPU, mod-inventory to 150% CPU and mod-data-import to 140% CPU.
Service Memory Utilization
Here we can see that all modules show a stable trend.
DB CPU Utilization
DB CPU was 96%.
DB Connections
Max number of DB connections was 1650.
DB load
Top SQL-queries
Errors
The Data Import test using the PTF - Updates Success - 2 profile with a 25k record file on 3 tenants concurrently finished with errors for two records.
Failed records:
Record ID | Record Name | Start Time | End Time | Duration |
---|---|---|---|---|
987db8ab-91e4-43ab-9c4f-8456da2466c5 | 1718545962860-25k-333_14.mrc | 2024-06-16 13:53:01.597+00 | 2024-06-16 14:07:58.232+00 | 0:14:57 |
4b55ca82-9fb1-4d42-b116-c3d3df75b312 | 1718545962860-25k-333_13.mrc | 2024-06-16 13:53:01.596+00 | 2024-06-16 14:07:13.958+00 | 0:14:12 |
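The Duration column follows directly from the start and end timestamps. The sketch below shows one way to recompute it in Python; it assumes the timestamps always carry a two-digit whole-hour offset such as `+00` (as in the rows above), and the helper names `parse_pg_timestamp` and `rounded_duration` are hypothetical.

```python
from datetime import datetime, timedelta


def parse_pg_timestamp(value: str) -> datetime:
    """Parse a timestamp such as '2024-06-16 13:53:01.597+00'."""
    # Assumption: the offset is a two-digit whole hour ("+00"), so appending
    # "00" turns it into the "+0000" form that strptime's %z accepts.
    return datetime.strptime(value + "00", "%Y-%m-%d %H:%M:%S.%f%z")


def rounded_duration(start: str, end: str) -> timedelta:
    """Elapsed time between two timestamps, rounded to whole seconds."""
    elapsed = parse_pg_timestamp(end) - parse_pg_timestamp(start)
    return timedelta(seconds=round(elapsed.total_seconds()))


print(rounded_duration("2024-06-16 13:53:01.597+00", "2024-06-16 14:07:58.232+00"))  # 0:14:57
print(rounded_duration("2024-06-16 13:53:01.596+00", "2024-06-16 14:07:13.958+00"))  # 0:14:12
```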
Error Logs:
# | Data Import Log for Records |
---|---|
1 | io.vertx.core.impl.NoStackTraceThrowable: Failed to update MARC record in SRS, instanceId: '830273a3-459b-4a81-945e-139719dd1b71', jobExecutionId: '4b55ca82-9fb1-4d42-b116-c3d3df75b312', status code: 504, |
2 | io.vertx.core.impl.NoStackTraceThrowable: Failed to update MARC record in SRS, instanceId: '8d74d733-7c8e-40bf-974c-a2e19a01a389', jobExecutionId: '987db8ab-91e4-43ab-9c4f-8456da2466c5', status code: 422, |
Appendix
Infrastructure
PTF - environment Quesnelia (qcon)
11 m6i.2xlarge EC2 instances located in US East (N. Virginia), us-east-1
1 db.r6.xlarge database instance: writer instance
- Number of records in DB:
  - cs00000int_0001
    - instances - 6828236
    - items - 7859770
    - holdings - 7328737
  - cs00000int_0002
    - instances - 1163315
    - items - 1754121
    - holdings - 1332559
  - cs00000int_0003
    - instances - 1135806
    - items - 1735291
    - holdings - 1309387
OpenSearch
domain: fse
- Data nodes Instance type: r6g.xlarge.search
Number of nodes: 9
Version: OpenSearch_2_7_R20240502
MSK - tenant
4 kafka.m5.2xlarge brokers in 2 zones
Apache Kafka version 2.8.0
EBS storage volume per broker 300 GiB
auto.create.topics.enable=true
log.retention.minutes=480
default.replication.factor=3
Kafka consolidated topics enabled
Methodology/Approach
DI test scenarios (DI MARC Bib Create/Update) were started from the UI.
Test set №1:
- Test 1: Manually tested 10k records files DI MARC Bib Create started on College tenant(cs00000int_0001) and Professional tenant(cs00000int_0002) concurrently, step 30%.
- Test 2: Manually tested 10k records files DI MARC Bib Create started on College tenant(cs00000int_0001), Professional tenant(cs00000int_0002) and School tenant(cs00000int_0003) concurrently, step 30%.
- Test 3: Manually tested 25k records files DI MARC Bib Create started on College tenant(cs00000int_0001) and Professional tenant(cs00000int_0002) concurrently, step 30%.
- Test 4: Manually tested 25k records files DI MARC Bib Create started on College tenant(cs00000int_0001), Professional tenant(cs00000int_0002) and School tenant(cs00000int_0003) concurrently, step 30%.
- Test 5: Manually tested 50k records files DI MARC Bib Create started on College tenant(cs00000int_0001), Professional tenant(cs00000int_0002) and School tenant(cs00000int_0003) concurrently, step 30%.
Test set №2:
- Test 6: Manually tested 10k records files DI MARC Bib Update started on College tenant(cs00000int_0001) and Professional tenant(cs00000int_0002) concurrently, step 30%.
- Test 7: Manually tested 10k records files DI MARC Bib Update started on College tenant(cs00000int_0001), Professional tenant(cs00000int_0002) and School tenant(cs00000int_0003) concurrently, step 30%.
- Test 8: Manually tested 25k records files DI MARC Bib Update started on College tenant(cs00000int_0001) and Professional tenant(cs00000int_0002) concurrently, step 30%.
- Test 9: Manually tested 25k records files DI MARC Bib Update started on College tenant(cs00000int_0001), Professional tenant(cs00000int_0002) and School tenant(cs00000int_0003) concurrently, step 30%.
- Test 10: Manually tested 50k records files DI MARC Bib Update started on College tenant(cs00000int_0001), Professional tenant(cs00000int_0002) and School tenant(cs00000int_0003) concurrently, step 30%.
The following queries were used to get the status and time range of the import jobs, one per tenant:

select file_name, started_date, completed_date, completed_date - started_date as duration, status
from cs00000int_0001_mod_source_record_manager.job_execution
order by started_date desc limit 1000;

select file_name, started_date, completed_date, completed_date - started_date as duration, status
from cs00000int_0002_mod_source_record_manager.job_execution
order by started_date desc limit 1000;

select file_name, started_date, completed_date, completed_date - started_date as duration, status
from cs00000int_0003_mod_source_record_manager.job_execution
order by started_date desc limit 1000;
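Because the three statements differ only in the tenant schema prefix, they could be generated rather than copied. The sketch below is one possible way to do that in Python; the function name `job_execution_query` and the tenant ID check are assumptions added for illustration, and the code only builds the SQL text without connecting to a database.

```python
import re

TENANTS = ["cs00000int_0001", "cs00000int_0002", "cs00000int_0003"]


def job_execution_query(tenant: str, limit: int = 1000) -> str:
    """Build the job_execution status query for one tenant schema."""
    # Assumption: tenant IDs contain only lowercase letters, digits and
    # underscores; anything else is rejected because the value is
    # interpolated into the schema name.
    if not re.fullmatch(r"[a-z0-9_]+", tenant):
        raise ValueError(f"unexpected tenant id: {tenant!r}")
    return (
        "select file_name, started_date, completed_date, "
        "completed_date - started_date as duration, status "
        f"from {tenant}_mod_source_record_manager.job_execution "
        f"order by started_date desc limit {int(limit)};"
    )


for tenant in TENANTS:
    print(job_execution_query(tenant))
```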