[Sunflower] [non-ECS] Multi-tenants DI
Overview
This document contains the results of testing Data Import for MARC Bibliographic records with an update job in the Sunflower release on secp1 environments with Kafka consolidated topics and file splitting features enabled on a non-ecs environment
https://folio-org.atlassian.net/browse/PERF-1113
During the first run, tests #2 and #3 failed due to authentication issues for tenants fs07000001 and fs07000002. These authentication blockers have since been resolved. Detailed steps and actions taken to fix the issue can be found here documentations
Summary
Three DI update job tests with 25,000 records were performed on the secp1 cluster.
We compared the Sunflower and Quesnelia releases and observed both performance gains and losses in the DI update job durations:
Single tenant: On average, jobs ran 9 minutes and 40 seconds slower in Sunflower.
Two tenants: Jobs were 10 minutes slower in Sunflower.
Three tenants: Jobs ran 45 minutes faster in Sunflower compared to Quesnelia.
Resource Utilization (Sunflower):
mod-di-converter-storage CPU usage: 203%
mod-inventory CPU usage: 128%
Database CPU usage: 99%
Average DB connections: 1,542
Test Results
Profile | Test # | Tenant | MARC File | DI Duration Sunflower (hh:mm:ss) | Results |
---|---|---|---|---|---|
DI MARC Bib Update (PTF - Updates Success - 6) | 1 | fs09000000 | 25K.mrc | 00:25:35 | Completed |
2 | fs09000000 | 25K.mrc | 00:30:15 | Completed | |
fs07000001 | 25K.mrc | 00:25:09 | Completed | ||
3 | fs09000000 | 25K.mrc | 00:26:35 | Completed | |
fs07000001 | 25K.mrc | 00:25:36 | Completed | ||
fs07000002 | 25K.mrc | 00:15:15 | Completed |
Comparison
Test # | Sunflower | Quesnelia | DI Delta | DI Delta | ||||||
Profile | Tenant | DI Duration Sunflower per tenant (hh:mm:ss) | DI Duration Sunflower per test (hh:mm:ss) | Profile | Tenant | DI Duration Quesnelia (hh:mm:ss) | DI Duration Quesnelia per test (hh:mm:ss) | |||
---|---|---|---|---|---|---|---|---|---|---|
№1 25K.mrc | DI MARC Bib Update (PTF - Updates Success - 6) | fs09000000 | 00:25:35 | 00:25:35 | DI MARC Bib Update (PTF - Updates Success - 1) | fso9000000 | 00:15:55 | 00:15:55 | +00:09:40 | +00:09:40 |
№2 25K.mrc | fs09000000 | 00:30:15 | 00:55:24 | fso9000000 | 00:17:06 | 00:45:23 | +00:13:09 | +0:10:01 | ||
fs07000001 | 00:25:09 | fso7000001 | 00:28:17 | -00:03:08 | ||||||
№3 25K.mrc | fs09000000 | 00:26:35 | 01:06:59 | fso9000000 | 00:33:57 | 1:51:59 | -00:07:22 | -00:45:00 | ||
fs07000001 | 00:25:36 | fso7000001 | 00:36:07 | -00:10:31 | ||||||
fs07000002 | 00:15:15 | fso7000002 | 00:41:55 | -00:26:40 |
Resource utilization for Test #1 and Test #2
Service CPU Utilization
Here we can see that mod-di-converter-storage-b module used 145% CPU, mod-source-record-storage-b used 88% CPU and mod-inventory-b used 81% CPU
Service Memory Utilization
Here we can see that all modules show a stable trend.
DB CPU Utilization
DB CPU in the average was 99%.
DB Connections
DB connections was 1496.
DB load
Top SQL-queries
# | TOP 5 SQL statements |
---|---|
1 | insert into "marc_records_lb" ("id", "content") values (cast($1 as uuid), cast($2 as jsonb)) on conflict ("id") do update set "content" = cast($3 as jsonb) |
2 | INSERT INTO fs09000000_mod_source_record_manager.journal_records (id, job_execution_id, source_id, source_record_order, entity_type, entity_id, entity_hrid, action_type, action_status, error, action_date, title, instance_id, holdings_id, order_id, permanent_location_id, tenant_id) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17) ON CONFLICT DO NOTHING |
3 | insert into "marc_records_lb" ("id", "content") values (cast($1 as uuid), cast($2 as jsonb)) on conflict ("id") do update set "content" = cast($3 as jsonb) |
4 | WITH cte AS (SELECT id, name, name_type_id, authority_id, last_updated_date FROM fs09000000_mod_search.contributor WHERE last_updated_date > $1 ORDER BY last_updated_date ) SELECT c.id, c.name, c.name_type_id, c.authority_id, c.last_updated_date, json_agg( CASE WHEN sub.instance_count |
5 | UPDATE fs09000000_mod_inventory_storage.holdings_record SET jsonb = $1::jsonb WHERE id=? |
Resource utilization for Test #3
Service CPU Utilization
Here we can see that mod-di-converter-storage-b module used 203% CPU in average, mod-inventory-b used 129% CPU and mod-source-record-storage-b used 112% CPU
Service Memory Utilization
Here we can see that all modules show a stable trend.
DB CPU Utilization
DB CPU was 97%.
DB Connections
DB connections was 1588.
DB load
Top SQL-queries
# | TOP 5 SQL statements |
---|---|
1 | insert into "marc_records_lb" ("id", "content") values (cast($1 as uuid), cast($2 as jsonb)) on conflict ("id") do update set "content" = cast($3 as jsonb) |
2 | INSERT INTO fs07000002_mod_search.contributor (id, name, name_type_id, authority_id) VALUES ($1, $2, $3, $4) ON CONFLICT (id) DO UPDATE SET last_updated_date = CURRENT_TIMESTAMP |
3 | insert into "marc_records_lb" ("id", "content") values (cast($1 as uuid), cast($2 as jsonb)) on conflict ("id") do update set "content" = cast($3 as jsonb) |
4 | INSERT INTO fs07000002_mod_source_record_manager.journal_records (id, job_execution_id, source_id, source_record_order, entity_type, entity_id, entity_hrid, action_type, action_status, error, action_date, title, instance_id, holdings_id, order_id, permanent_location_id, tenant_id) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17) ON CONFLICT DO NOTHING |
5 | INSERT INTO fs09000000_mod_search.item (id, tenant_id, instance_id, holding_id, json) VALUES ($1::uuid, $2, $3::uuid, $4::uuid, $5::jsonb) ON CONFLICT (id, tenant_id) DO UPDATE SET instance_id = EXCLUDED.instance_id, holding_id = EXCLUDED.holding_id, tenant_id = EXCLUDED.tenant_id, json = EXCLUDED.json |
Appendix
Infrastructure
PTF - environment Sunflower (secp1)
10 db.r7g.xlarge EC2 instances located in US East (N. Virginia)us-east-1
1 database instances, writer
| Memory GIB | vCPUs |
---|---|---|
| 32 GiB4 | 4 vCPUs |
MSK fse-test
4 kafka.m7g.xlarge brokers in 2 zones
Apache Kafka version 3.7.x
EBS storage volume per broker 300 GiB
auto.create.topics.enable=true
log.retention.minutes=480
default.replication.factor=3