[Quesnelia] [non-ECS] [Data import] Create MARC Authority Records


Overview

  • This document contains the results of testing Data Import for MARC Authority record creation in the Quesnelia release on the qcp1 (non-ECS) environment, with consolidated Kafka topics and the file-splitting feature enabled.

Related Jira issue: PERF-832

Summary

Test Results and Comparison

Test #1

DI tests with 1k, 5k, 10k, 25k, and 50k record files, started on one tenant only.

| Test # | # of records | % creates | File | DI duration (Morning Glory) | DI duration (Nolana) | DI duration (Orchid) | DI duration (Poppy) | DI duration (Quesnelia) |
|---|---|---|---|---|---|---|---|---|
| 1 | 1,000 | 100 | 1k_marc_authority.mrc | 24 sec | 27 sec | 41 sec | 29 sec | 22 sec (-24%) |
| 2 | 5,000 | 100 | LC_SUBJ_msplit00000000.mrc | 1 min 21 sec | 1 min 15 sec | 1 min 21 sec | 1 min 38 sec | 1 min 19 sec (-19%) |
| 3 | 10,000 | 100 | msplit00000000.mrc | 2 min 32 sec | 2 min 31 sec | 2 min 53 sec | 2 min 53 sec | 2 min 36 sec (-9.8%) |
| 4 | 22,778 (Poppy test) / 25,000 (Quesnelia test) | 100 | msplit00000013.mrc | 11 min 14 sec | 7 min 7 sec | 5 min 42 sec | 6 min 24 sec | 6 min 19 sec (-1.3%) |
| 5 | 50,000 | 100 | 50000_authorityrecords.mrc | 22 min | 11 min 24 sec | 11 min 11 sec | 13 min 48 sec | 11 min 59 sec (-13%) |
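The percentage shown next to each Quesnelia duration appears to be the relative change versus the corresponding Poppy run; a quick check against the 1,000-record row:

\Delta = \frac{t_{\mathrm{Quesnelia}} - t_{\mathrm{Poppy}}}{t_{\mathrm{Poppy}}} \times 100\% = \frac{22\ \mathrm{s} - 29\ \mathrm{s}}{29\ \mathrm{s}} \times 100\% \approx -24\%

The deltas shown in the Test #2 table below appear to follow the same convention.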

Test #2

Tests with Check-In/Check-Out (CICO) at 5 concurrent users, plus DI of 1K, 5K, 10K, 25K, and 50K record files, started on one tenant only.

| # of records (Poppy) | DI duration with CICO (Poppy) | CI time Avg (Poppy) | CI time 95th pct (Poppy) | CO time Avg (Poppy) | CO time 95th pct (Poppy) | # of records (Quesnelia) | DI duration with CICO (Quesnelia) | CI time Avg (Quesnelia) | CI time 95th pct (Quesnelia) | CO time Avg (Quesnelia) | CO time 95th pct (Quesnelia) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1,000 | 35 sec | 0.525 | 0.576 | 1.078 | 1.326 | 1,000 | 20 sec (-42.8%) | 0.560 (+6%) | 0.754 (+30%) | 1.164 (+8%) | 1.313 (-1%) |
| 5,000 | 1 min 41 sec | 0.513 | 0.612 | 0.9 | 1.019 | 5,000 | 1 min 19 sec (-21.7%) | 0.701 (+36%) | 1.171 (+91%) | 1.141 (+26%) | 1.790 (+75%) |
| 10,000 | 3 min 4 sec | 0.581 | 0.685 | 1.016 | 1.321 | 10,000 | 2 min 35 sec (-15.7%) | 0.723 (+24%) | 1.024 (+49%) | 1.179 (+16%) | 1.494 (+13%) |
| 22,778 | 6 min 32 sec | 0.598 | 1.542 | 1.244 | 1.729 | 25,000 | 6 min 26 sec (-1.5%) | 0.722 (+20%) | 1.024 (-33%) | 1.180 (-5%) | 1.494 (-13%) |
| 50,000 | 13 min 48 sec | 0.671 | 1.953 | 1.51 | 2.09 | 50,000 | 12 min 16 sec (-11%) | 0.777 (+15%) | 1.045 (-46%) | 1.265 (-16%) | 1.550 (-25%) |

Test #3

Multitenant testing

Files with 50K, 25K, 10K, 5K, and 1K records were loaded sequentially and launched on Tenant 1, Tenant 2, and Tenant 3 for Poppy.

Files with 50K, 25K, 10K, 5K, and 1K records were loaded sequentially and launched on Tenant 3, Tenant 2, and Tenant 1 for Quesnelia.

| Num of records | Tenant 1 pcp1-00 duration (Poppy) | Tenant 2 pcp1-01 duration (Poppy) | Tenant 3 pcp1-02 duration (Poppy) | Num of records | Tenant 3 qcp1-02 duration (Quesnelia) | Tenant 2 qcp1-01 duration (Quesnelia) | Tenant 1 qcp1-00 duration (Quesnelia) |
|---|---|---|---|---|---|---|---|
| 1,000 | 66 min | 75 min | 74 min | 1,000 | 15 min 15 sec | 15 min 46 sec | 16 min |
| 5,000 | 67 min | 76 min | 75 min | 5,000 | 22 min 9 sec | 22 min 10 sec | 21 min 33 sec |
| 10,000 | 67 min | 76 min | 74 min | 10,000 | 27 min 38 sec | 27 min 51 sec | 29 min 42 sec |
| 22,778 | 67 min* | 73 min* | 71 min* | 25,000 | 46 min 42 sec | 47 min | 47 min 14 sec |
| 50,000 | 62 min | 56 min | 54 min | 50,000 | 67 min 29 sec | 67 min 27 sec | 67 min 22 sec |

Test #4

Files were loaded with a pause between files, in the order 50k, 25k, 10k, 5k, and 1k; tenant order: Tenant 3 (qcp1-02), Tenant 1 (qcp1-00), and Tenant 2 (qcp1-01).

| Job Profile | Num of records | Tenant 1 qcp1-00 duration (mm:ss) | Tenant 2 qcp1-01 duration (mm:ss) | Tenant 3 qcp1-02 duration (mm:ss) |
|---|---|---|---|---|
| KG - Create SRS MARC Authority on nonmatches to 010 $a DUBLICATE for Q | 1,000 | 00:28 | 00:32 | 00:18 |
| | 5,000 | 03:38 | 03:31 | 03:29 |
| | 10,000 | 06:58 | 07:02 | 06:47 |
| | 25,000 | 17:54 | 17:35 | 17:53 |
| | 50,000 | 34:09 | 35:08 | 29:49 |

Test #5

Files were loaded without a pause between files, in the order 1k, 5k, 10k, 25k, and 50k; tenant order: Tenant 3 (qcp1-02), Tenant 2 (qcp1-01), and Tenant 1 (qcp1-00).

| Job Profile | Num of records | Tenant 1 qcp1-00 duration (h:mm:ss) | Tenant 2 qcp1-01 duration (h:mm:ss) | Tenant 3 qcp1-02 duration (h:mm:ss) |
|---|---|---|---|---|
| KG - Create SRS MARC Authority on nonmatches to 010 $a DUBLICATE for Q | 1,000 | 0:09:37 | 0:02:43 | 0:00:53 |
| | 5,000 | 0:10:54 | 0:28:43 | 0:02:17 |
| | 10,000 | 0:19:13 | 0:31:45 | 0:05:53 |
| | 25,000 | 0:35:54 | 0:44:13 | 0:22:44 |
| | 50,000 | 1:03:55 | 1:10:16 | 0:52:38 |



Resource utilization for Test #1

 Resource utilization table
| Module | CPU | Module | RAM |
|---|---|---|---|
| mod-inventory-b | 49.90% | mod-data-import-b | 64.80% |
| mod-source-record-storage-b | 44.70% | mod-permissions-b | 63.50% |
| nginx-okapi | 38.10% | mod-source-record-manager-b | 49.90% |
| mod-di-converter-storage-b | 28.40% | mod-inventory-b | 41.80% |
| mod-source-record-manager-b | 24.80% | mod-di-converter-storage-b | 41.30% |
| okapi-b | 16.40% | okapi-b | 36.20% |
| mod-data-import-b | 16.10% | mod-source-record-storage-b | 27.70% |
| mod-permissions-b | 5.60% | mod-inventory-storage-b | 16.20% |
| mod-inventory-storage-b | 0.50% | nginx-okapi | 4.90% |
| pub-okapi | 0.20% | pub-okapi | 4.70% |

Service CPU Utilization

Here we can see that the mod-inventory-b module used 171% CPU, mod-di-converter-storage-b used 83% CPU, and nginx-okapi used 66% CPU.

Service Memory Utilization

Here we can see that all modules show a stable trend.

DB CPU Utilization

DB CPU utilization averaged 88%.

DB Connections

The number of DB connections was 1139.
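The connection counts reported in this document come from the environment's monitoring. As a side note, a minimal sketch of how such a figure could be sampled directly from PostgreSQL is shown below; the DSN, host, and credentials are placeholders, not the actual qcp1 configuration.

import psycopg2

def count_db_connections(dsn: str) -> int:
    """Return the current number of connections reported by pg_stat_activity."""
    with psycopg2.connect(dsn) as conn:
        with conn.cursor() as cur:
            cur.execute("SELECT count(*) FROM pg_stat_activity;")
            return cur.fetchone()[0]

if __name__ == "__main__":
    # Placeholder DSN -- replace with the real writer endpoint and credentials.
    print(count_db_connections("host=qcp1-db-writer dbname=folio user=folio_admin password=***"))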

DB load


Top SQL-queries


Top 5 SQL statements:

1. INSERT INTO fs09000000_mod_source_record_manager.events_processed (handler_id, event_id) VALUES ($1, $2)
2. INSERT INTO fs09000000_mod_source_record_manager.journal_records (id, job_execution_id, source_id, source_record_order, entity_type, entity_id, entity_hrid, action_type, action_status, error, action_date, title, instance_id, holdings_id, order_id, permanent_location_id, tenant_id) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17)
3. insert into "marc_records_lb" ("id", "content") values (cast($1 as uuid), cast($2 as jsonb)) on conflict ("id") do update set "content" = cast($3 as jsonb)
4. WITH input_rows(record_id, authority_id) AS (
      VALUES ($1::uuid,$2::uuid)
   )
   , ins AS (
      INSERT INTO fs09000000_mod_inventory.records_authorities(record_id, authority_id)
      SELECT * FROM input_rows
      ON CONFLICT (record_id) DO UPDATE SET record_id=EXCLUDED.record_id
      RETURNING record_id::uuid, authority_id::uuid
      )
   SELECT record_id, authority_id
   FROM   ins
   UNION  ALL
   SELECT c.record_id, c.authority_id
   FROM   input_rows
   JOIN   fs09000000_mod_inventory.records_authorities c USING (record_id);
5. UPDATE fs09000000_mod_source_record_manager.job_execution_progress SET succeeded_records_count = succeeded_records_count + $2, error_records_count = error_records_count + $3 WHERE job_execution_id = $1 Returning *


Resource utilization for Test #2

 Resource utilization table
| Module | CPU | Module | RAM |
|---|---|---|---|
| nginx-okapi | 49% | mod-inventory-b | 75% |
| mod-inventory-b | 47% | mod-permissions-b | 70% |
| mod-source-record-storage-b | 42% | mod-data-import-b | 61% |
| mod-di-converter-storage-b | 29% | mod-source-record-manager-b | 48% |
| mod-source-record-manager-b | 26% | okapi-b | 36% |
| okapi-b | 22% | mod-inventory-storage-b | 35% |
| mod-permissions-b | 7% | mod-di-converter-storage-b | 34% |
| pub-okapi | 2% | mod-source-record-storage-b | 28% |
| mod-inventory-storage-b | 2% | nginx-okapi | 5% |
| mod-data-import-b | 2% | pub-okapi | 4% |

Service CPU Utilization

Here we can see that the mod-inventory-b module used 140% CPU on average, mod-inventory-b used 117% CPU, and nginx-okapi used 109% CPU.

Service Memory Utilization

Here we can see that all modules show a stable trend.

DB CPU Utilization

DB CPU was 93%.

DB Connections

The number of DB connections was 1273.

DB load


Top SQL-queries

Top 5 SQL statements:

1. INSERT INTO fs09000000_mod_source_record_manager.events_processed (handler_id, event_id) VALUES ($1, $2)
2. INSERT INTO fs09000000_mod_source_record_manager.journal_records (id, job_execution_id, source_id, source_record_order, entity_type, entity_id, entity_hrid, action_type, action_status, error, action_date, title, instance_id, holdings_id, order_id, permanent_location_id, tenant_id) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17)
3. UPDATE fs09000000_mod_source_record_manager.job_execution_progress SET succeeded_records_count = succeeded_records_count + $2, error_records_count = error_records_count + $3 WHERE job_execution_id = $1 Returning *
4. insert into "marc_records_lb" ("id", "content") values (cast($1 as uuid), cast($2 as jsonb)) on conflict ("id") do update set "content" = cast($3 as jsonb)
5. WITH input_rows(record_id, authority_id) AS (
      VALUES ($1::uuid,$2::uuid)
   )
   , ins AS (
      INSERT INTO fs09000000_mod_inventory.records_authorities(record_id, authority_id)
      SELECT * FROM input_rows
      ON CONFLICT (record_id) DO UPDATE SET record_id=EXCLUDED.record_id
      RETURNING record_id::uuid, authority_id::uuid
      )
   SELECT record_id, authority_id
   FROM   ins
   UNION  ALL
   SELECT c.record_id, c.authority_id
   FROM   input_rows
   JOIN   fs09000000_mod_inventory.records_authorities c USING (record_id);

Resource utilization for Test #3

 Resource utilization table
| Module | CPU | Module | RAM |
|---|---|---|---|
| mod-permissions-b | 146% | mod-permissions-b | 79% |
| mod-inventory-b | 56% | mod-inventory-b | 72% |
| nginx-okapi | 42% | mod-data-import-b | 61% |
| mod-source-record-storage-b | 41% | mod-source-record-manager-b | 44% |
| mod-di-converter-storage-b | 37% | okapi-b | 37% |
| mod-source-record-manager-b | 27% | mod-source-record-storage-b | 36% |
| okapi-b | 20% | mod-inventory-storage-b | 36% |
| mod-data-import-b | 2% | mod-di-converter-storage-b | 35% |
| mod-inventory-storage-b | 1% | nginx-okapi | 5% |
| pub-okapi | 1% | pub-okapi | 4% |

Service CPU Utilization

Here we can see that the mod-inventory-b module used 140% CPU on average, mod-inventory-b used 117% CPU, and nginx-okapi used 109% CPU.

Service Memory Utilization

Here we can see that all modules show a stable trend.

DB CPU Utilization

DB CPU was 94%.

DB Connections

The number of DB connections was 1823.

DB load


Top SQL-queries

Top 5 SQL statements:

1. INSERT INTO fs07000002_mod_source_record_manager.events_processed (handler_id, event_id) VALUES ($1, $2)
2. INSERT INTO fs09000000_mod_source_record_manager.journal_records (id, job_execution_id, source_id, source_record_order, entity_type, entity_id, entity_hrid, action_type, action_status, error, action_date, title, instance_id, holdings_id, order_id, permanent_location_id, tenant_id) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17)
3. INSERT INTO fs07000001_mod_source_record_manager.journal_records (id, job_execution_id, source_id, source_record_order, entity_type, entity_id, entity_hrid, action_type, action_status, error, action_date, title, instance_id, holdings_id, order_id, permanent_location_id, tenant_id) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17)
4. INSERT INTO fs09000000_mod_source_record_manager.events_processed (handler_id, event_id) VALUES ($1, $2)
5. INSERT INTO fs07000001_mod_source_record_manager.events_processed (handler_id, event_id) VALUES ($1, $2)


Resource utilization for Test #4

 Resource utilization table
| Module | CPU | Module | RAM |
|---|---|---|---|
| mod-inventory-b | 91.30% | mod-permissions-b | 80.00% |
| nginx-okapi | 43.60% | mod-data-import-b | 65.10% |
| mod-source-record-storage-b | 39.90% | mod-inventory-b | 60.40% |
| mod-di-converter-storage-b | 26.60% | mod-source-record-storage-b | 42.20% |
| okapi-b | 25.00% | mod-di-converter-storage-b | 42.00% |
| mod-source-record-manager-b | 18.90% | mod-source-record-manager-b | 41.90% |
| mod-data-import-b | 2.40% | okapi-b | 37.60% |
| mod-permissions-b | 1.60% | mod-inventory-storage-b | 16.30% |
| mod-inventory-storage-b | 0.30% | nginx-okapi | 5.60% |
| pub-okapi | 0.20% | pub-okapi | 4.80% |

Service CPU Utilization

Here we can see that the mod-inventory-b module used 140% CPU on average, mod-inventory-b used 117% CPU, and nginx-okapi used 109% CPU.

Service Memory Utilization

Here we can see that all modules show a stable trend.

DB CPU Utilization

DB CPU was 93%.

DB Connections

The number of DB connections was 1273.

DB load


Top SQL-queries

Top 5 SQL statements:

1. INSERT INTO fs09000000_mod_source_record_manager.journal_records (id, job_execution_id, source_id, source_record_order, entity_type, entity_id, entity_hrid, action_type, action_status, error, action_date, title, instance_id, holdings_id, order_id, permanent_location_id, tenant_id) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17)
2. INSERT INTO fs07000002_mod_source_record_manager.events_processed (handler_id, event_id) VALUES ($1, $2)
3. INSERT INTO fs09000000_mod_source_record_manager.events_processed (handler_id, event_id) VALUES ($1, $2)
4. INSERT INTO fs07000002_mod_source_record_manager.journal_records (id, job_execution_id, source_id, source_record_order, entity_type, entity_id, entity_hrid, action_type, action_status, error, action_date, title, instance_id, holdings_id, order_id, permanent_location_id, tenant_id) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17)
5. INSERT INTO fs07000001_mod_source_record_manager.journal_records (id, job_execution_id, source_id, source_record_order, entity_type, entity_id, entity_hrid, action_type, action_status, error, action_date, title, instance_id, holdings_id, order_id, permanent_location_id, tenant_id) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17)

Resource utilization for Test #5

 Resource utilization table
| Module | CPU | Module | RAM |
|---|---|---|---|
| mod-permissions-b | 77% | mod-permissions-b | 77% |
| mod-inventory-b | 63% | mod-inventory-b | 75% |
| mod-di-converter-storage-b | 35% | mod-data-import-b | 59% |
| nginx-okapi | 35% | mod-source-record-manager-b | 53% |
| mod-source-record-storage-b | 35% | mod-source-record-storage-b | 44% |
| mod-source-record-manager-b | 23% | mod-di-converter-storage-b | 34% |
| mod-data-import-b | 20% | mod-inventory-storage-b | 34% |
| okapi-b | 19% | okapi-b | 33% |
| mod-inventory-storage-b | 1% | nginx-okapi | 4% |
| pub-okapi | 1% | pub-okapi | 4% |

Service CPU Utilization

Here we can see that the mod-inventory-b module used 140% CPU on average, mod-inventory-b used 117% CPU, and nginx-okapi used 109% CPU.

Service Memory Utilization

Here we can see that all modules show a stable trend.

DB CPU Utilization

DB CPU was 98%.

DB Connections

The number of DB connections was 1757.

DB load


Top SQL-queries

Top 5 SQL statements:

1. INSERT INTO fs09000000_mod_source_record_manager.events_processed (handler_id, event_id) VALUES ($1, $2)
2. INSERT INTO fs09000000_mod_source_record_manager.journal_records (id, job_execution_id, source_id, source_record_order, entity_type, entity_id, entity_hrid, action_type, action_status, error, action_date, title, instance_id, holdings_id, order_id, permanent_location_id, tenant_id) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17)
3. INSERT INTO fs07000002_mod_source_record_manager.events_processed (handler_id, event_id) VALUES ($1, $2)
4. INSERT INTO fs07000001_mod_source_record_manager.journal_records (id, job_execution_id, source_id, source_record_order, entity_type, entity_id, entity_hrid, action_type, action_status, error, action_date, title, instance_id, holdings_id, order_id, permanent_location_id, tenant_id) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17)
5. INSERT INTO fs07000002_mod_source_record_manager.journal_records (id, job_execution_id, source_id, source_record_order, entity_type, entity_id, entity_hrid, action_type, action_status, error, action_date, title, instance_id, holdings_id, order_id, permanent_location_id, tenant_id) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17)


Appendix

Infrastructure

PTF - environment Quesnelia (qcp1)

  • 10 db.r6g.xlarge EC2 instances located in US East (N. Virginia), us-east-1
  • 1 database instance (writer):

    | Name | Memory GiB | vCPUs |
    |---|---|---|
    | db.r6g.xlarge | 32 GiB | 4 |

  • MSK ptf-mobius-testing2
    • 4 m5.2xlarge brokers in 2 zones
    • Apache Kafka version 2.8.0
    • EBS storage volume per broker: 300 GiB
    • auto.create.topics.enable=true
    • log.retention.minutes=480
    • default.replication.factor=2


 Quesnelia modules memory and CPU parameters


Methodology/Approach

DI test scenarios using a data import job profile that creates new MARC authority records for non-matches (Job Profile: KG - Create SRS MARC Authority on nonmatches to 010 $a DUBLICATE for Q) were started from the UI on the Quesnelia (qcp1) non-ECS environment with the file-splitting feature enabled.

  • Action for non-matches:  Create MARC authority record 

  • The above files are all stored here: MARC Resources.
    - The 22k file provided in MARC Resources does not work, so the 50k file was split into a 25k-record file and used instead of the 22k file (see the splitting sketch below).
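A minimal sketch of how the 50k file could be split by record count, assuming standard binary MARC 21 records terminated by the 0x1D record terminator; the output file name and the use of Python are assumptions for illustration, not the exact tooling used for the tests.

# Minimal sketch: copy the first 25,000 records out of a 50k MARC file.
# Assumes standard binary MARC 21 records, each ending with the 0x1D
# record terminator; the output file name below is a placeholder.
RECORD_TERMINATOR = b"\x1d"

def split_marc(source_path: str, target_path: str, max_records: int) -> int:
    """Copy up to max_records MARC records from source_path into target_path."""
    written = 0
    with open(source_path, "rb") as src, open(target_path, "wb") as dst:
        buffer = b""
        while written < max_records:
            chunk = src.read(1024 * 1024)
            if not chunk:
                break
            buffer += chunk
            while RECORD_TERMINATOR in buffer and written < max_records:
                record, buffer = buffer.split(RECORD_TERMINATOR, 1)
                dst.write(record + RECORD_TERMINATOR)
                written += 1
    return written

if __name__ == "__main__":
    count = split_marc("50000_authorityrecords.mrc", "25000_authorityrecords.mrc", 25000)
    print(f"Wrote {count} records")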

Test set

  • Test 1: Manually tested DI of 1k, 5k, 10k, 25k, and 50k record files, started on one tenant only.
  • Test 2: Manually tested DI of 1k, 5k, 10k, 25k, and 50k record files, started on one tenant only, plus Check-In/Check-Out (CICO) with 5 concurrent users.
  • Test 3: Manually tested DI of 1k, 5k, 10k, 25k, and 50k record files, started on 3 tenants concurrently. Files were loaded without a pause between files, in the order 50k, 25k, 10k, 5k, and 1k; tenant order: Tenant 3 (qcp1-02), Tenant 2 (qcp1-01), and Tenant 1 (qcp1-00).
  • Test 4: Manually tested DI of 1k, 5k, 10k, 25k, and 50k record files, started on 3 tenants concurrently. Files were loaded with a pause between files, in the order 50k, 25k, 10k, 5k, and 1k; tenant order: Tenant 3 (qcp1-02), Tenant 1 (qcp1-00), and Tenant 2 (qcp1-01).
  • Test 5: Manually tested DI of 1k, 5k, 10k, 25k, and 50k record files, started on 3 tenants concurrently. Files were loaded without a pause between files, in the order 1k, 5k, 10k, 25k, and 50k; tenant order: Tenant 3 (qcp1-02), Tenant 2 (qcp1-01), and Tenant 1 (qcp1-00).


