OAI-PMH data harvesting [Incremental + Full] (Poppy consortia)

Overview

  • The purpose of the OAI-PMH full harvesting and incremental harvesting tests is to measure performance of the Poppy release and to find possible issues and bottlenecks per PERF-706 on the MCPT environment.
  • Two JMeter scripts (one per scenario) were prepared to execute OAI-PMH harvests concurrently by initiating harvesting from the member tenant level. To simulate the behaviour of the EBSCO Harvester, a 150-300 ms delay after each request was used in the script.

  • The first script triggers two full harvests on each of 62 tenants, creating 124 concurrent harvests. The second scenario triggers 1 full and 1 incremental harvest, where the incremental harvests cover 10k, 100k, and 500k records respectively; in this scenario the incremental harvests start with 10k, then 100k, then 500k.
  • In total: 124 harvests in the first scenario and 143 harvests in the second (62 full + 57 incremental with 10k + 23 with 100k + 1 with 500k).
  • A baseline test of 1 full + 1 incremental 500k harvest on a single tenant should be run against the tenant with the highest number of instances.
 Overview PCON
  • The purpose of the OAI-PMH full harvesting and incremental harvesting tests is to measure performance of the Poppy release and to find possible issues and bottlenecks per PERF-706 on the PCON environment.
  • JMeter scripts were prepared to concurrently execute two OAI-PMH harvests on each tenant by initiating harvesting from the member tenant level. To simulate the behavior of the EBSCO Harvester, a 200-400 ms delay after each request was used in the script. 2 scripts were triggered concurrently from carrier-io to carry out 1 full and 1 incremental harvest on each member tenant: one script for full harvesting and one for incremental. In total, 10 harvests were executed simultaneously during the test.

Summary

  • In MCPT, OAI-PMH can concurrently perform 124 harvests (2 full harvests on each of 62 member tenants) with a constant average throughput of 10 requests, or 1000 records, per second during the test.
  • Average response times (RTs) for the /oai/records?verb=ListRecords&apikey=[APIKey]&resumptionToken=[resumptionToken] request: 0.580 sec in the one-tenant test, 18.3 sec in the 62-tenant test. Harvesting works faster when RTs are low, which happens as the number of concurrent harvests decreases. Response time for 2 full harvests on 1 tenant only - 0.540 sec.
  • As the number of harvests grows from 1 to 14, total throughput grows up to 10 requests per second. Above 14 concurrent harvests the throughput does not change; only response times grow.
  • Duration for 2 full harvests on 62 tenants in scenario 1 - 4 hours 30 minutes.
  • Durations in scenario 2: 10k incremental harvest - 25 min, 100k - 1 hour 52 minutes, 500k - 3 hours.
  • Running 1 full and 1 incremental 500k harvest on 1 tenant: full harvest duration - 01:26:19, incremental harvest duration - 01:09:46, average response time - 0.601 sec.
  • Duration of a full harvest triggered from the central tenant level for all 62 tenants running sequentially - 13 hours 22 minutes.
  • CPU utilization in mod-oai-pmh didn't exceed 44% across all tests.
  • Memory utilization in mod-oai-pmh was not higher than 49%. mod-inventory reached 61% in scenario 2 and 85% in scenario 1 (it grew to this level because two 10k DI create jobs were run alongside OAI-PMH). The additional DI jobs did not affect OAI-PMH response times. No memory-leak trends were found. More info in the resource utilization table.
  • RDS CPU utilization was 25% in the first scenario and not higher than 30% at the beginning of the test in the second scenario.
  • DB connections: 700 in the first scenario with the SRS record source, 1400 with SRS + Inventory, and 1350 in the second scenario with SRS.
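The saturation pattern above is consistent with Little's law (concurrency ≈ throughput × response time): once throughput caps at ~10 requests/sec, response time grows roughly in proportion to the number of concurrent harvests. A minimal sketch of that relationship using the numbers from this report; the linear model itself is an assumption and only gives a lower bound (measured RTs at 124 harvests were 18-22 sec):

```python
# Little's law sketch: N = X * R (concurrency = throughput * response time).
# Numbers below are from this report; treating the system as a simple
# saturated queue is an assumption, not a measured model.

SATURATION_THROUGHPUT = 10.0  # requests/sec: observed throughput cap
SATURATION_POINT = 14         # harvests at which throughput stops growing

def estimated_response_time(concurrent_harvests: int, unloaded_rt: float = 0.58) -> float:
    """Rough per-request response time (sec) for N concurrent harvests."""
    if concurrent_harvests < SATURATION_POINT:
        # Below saturation, RT stays near the single-tenant value (0.580 sec).
        return unloaded_rt
    # Above saturation, RT grows linearly: R = N / X.
    return concurrent_harvests / SATURATION_THROUGHPUT

print(estimated_response_time(14))   # 1.4 sec (report measured 1.25 sec at 14 harvests)
print(estimated_response_time(124))  # 12.4 sec (a lower bound; 18-22 sec was measured)
```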
 Summary PCON
  • OAI-PMH can concurrently operate on a minimum of 5 member tenants with negligible performance decline. The time taken is directly proportional to the number of requests made.
  • Average response time for the /oai/records?verb=ListRecords&apikey=[APIKey]&resumptionToken=[resumptionToken] request was 0.618 sec.

Recommendations & Jiras

Test Runs & Results

Two JMeter scripts (one per scenario) were prepared to execute OAI-PMH harvests concurrently by initiating harvesting from the member tenant level. To simulate the behaviour of the EBSCO Harvester, a 150-300 ms delay after each request was used in the script.

In total, 124 harvests were executed concurrently during the test in the first scenario and 143 harvests in the second (62 full + 57 incremental with 10k + 23 with 100k + 1 with 500k).

The table below contains harvest durations for tenants with approximately 25k, 50k, 100k, and 634k records in these scenarios, as well as the results of incremental harvests of 10k, 100k, and 500k records.

RESULTS for tests #1 - #5

Record sources: SRS, Inventory, SRS+Inventory

Scenario #1 - 2 Full Harvests
Type | Tenant | SRS Instances | SRS Duration | Inventory | SRS+Inventory Instances | SRS+Inventory Duration
Full | cs00000001_0037 | 24226 | 00:56:11 | No data | 24226 | 00:52:27
Full | cs00000001_0048 | 53346 | 01:39:24 | No data | 53346 | 01:40:50
Full | cs00000001_0030 | 97588 | 02:36:17 | No data | 97588 | 02:30:48
Full | cs00000001_0042 | 634673 | 04:28:34 | No data | 634673 | 04:21:41

Scenario #2 - 1 Full Harvest + 1 Incremental
Type | Tenant | SRS Instances | SRS Duration | Inventory | SRS+Inventory
Full | cs00000001_0037 | 24226 | 00:49:52 | No data | -
Full | cs00000001_0048 | 53346 | 01:28:33 | No data | -
Full | cs00000001_0030 | 97588 | 02:15:58 | No data | -
Full | cs00000001_0042 | 634673 | 03:41:44 | No data | -
Incremental | cs00000001_0037 | 10k | 00:25:02 | No data | -
Incremental | cs00000001_0046 | 100k | 01:52:01 | No data | -
Incremental | cs00000001_0042 | 500k | 02:59:13 | No data | -

Test #1

2 full harvests on 62 member tenants were carried out concurrently with the SRS record source. During the test, 22 tenants were harvested fully without problems; the largest number of records among these tenants was 53345 (schema cs00000001_0048). The remaining 40 member tenants stopped harvesting at 54700 records on average and did not finish successfully. Records were harvested evenly across the tenants. For an unknown reason the load generator stopped executing the test. It was retested in test #2.

Average response time (RT) for the 100-record request /oai/records?verb=ListRecords&apikey=[APIKey]&resumptionToken=[resumptionToken] with 124 concurrent harvests was 22 seconds; with 80 harvests it decreased to 15 seconds.

Duration for 24225 records - 00:59:40, for 53345 records - 01:48:40.

Test #2

2 Full harvests on 62 member tenants with SRS record source. Harvesting finished successfully.

Average response times (RTs) for the request /oai/records?verb=ListRecords&apikey=[APIKey]&resumptionToken=[resumptionToken]: 580 ms for harvesting on one tenant, 18.3 sec for 62 tenants. Harvesting works faster when RTs are low, which happens as the number of concurrent harvests decreases.

Duration for 24225 records - 00:56:11, for 53345 records - 01:42:20, for 97588 records - 02:36:17.

Test #3

2 full harvests on 62 member tenants with the Inventory record source. Finished successfully without any load; there was no data to harvest.

Test #4

2 Full harvests on 62 member tenants with SRS + Inventory record source. 

RT for the 100-record request with 124 concurrent harvests (at the beginning of the test) - 14.5 sec.

Duration for 24225 records - 00:52:29, for 53345 records - 01:40:50, for 97588 records - 02:30:48.

Test #5

1 full harvest on 62 member tenants and 1 incremental harvest with 10k records on 57 tenants, then after a 2-minute pause 1 incremental harvest with 100k on 23 tenants, then after another pause 1 incremental harvest with 500k records; all with the SRS record source.

Once the 10k harvests ended (in 25 minutes), response times improved for the remaining harvests from 14.4 seconds on average to 7.4 seconds for full and 6.7 seconds for incremental. Closer to the end of the test the number of harvests decreased from 19 to 8, with response times of 1.32 seconds for full and 1.11 seconds for incremental. In the range of 8 to 2 harvests, response times were 0.704 seconds for full and 0.622 seconds for incremental.

Test #6

1 Full harvest + 1 incremental 500k concurrently on cs00000001_0042 tenant with 634k instances.

Full harvest duration - 01:26:19. 

Incremental harvest duration - 01:09:46

Test #7

1 full harvest triggered from the central tenant level runs full harvests for all tenants sequentially. It was carried out from the EBSCO Harvester AWS Windows machine.

Duration for 62 tenants: 13 hours 22 minutes. Number of harvested records: 6495617. Additional info about test in Full Harvests Duration Table.

 CPU and Memory utilization for tests #1 - #5

CPU Utilization, scenario 1 | % | CPU Utilization, scenario 2 | % | Memory Consumption, scenario 1 | % | Memory Consumption, scenario 2 | %
mod-oai-pmh-b | 43.87 | mod-oai-pmh-b | 42.95 | mod-inventory-b | 84.99 | mod-inventory-b | 61.23
edge-oai-pmh-b | 29.36 | edge-oai-pmh-b | 28.92 | mod-quick-marc-b | 63.63 | okapi-b | 58.17
mod-inventory-b | 25.79 | mod-inventory-b | 23.63 | okapi-b | 62.01 | mod-source-record-manager-b | 53.74
nginx-okapi | 18.68 | nginx-okapi | 19.66 | mod-source-record-manager-b | 57.09 | mod-oai-pmh-b | 50.54
mod-source-record-storage-b | 9.58 | mod-quick-marc-b | 9.05 | mod-oai-pmh-b | 48.57 | mod-quick-marc-b | 49.45
mod-quick-marc-b | 8.21 | okapi-b | 8.32 | mod-source-record-storage-b | 35.72 | mod-source-record-storage-b | 47.91
pub-edge | 8.15 | mod-source-record-storage-b | 7.81 | edge-oai-pmh-b | 33.32 | edge-oai-pmh-b | 35.10
okapi-b | 6.79 | pub-edge | 7.64 | mod-circulation-storage-b | 26.79 | mod-circulation-storage-b | 33.32
mod-source-record-manager-b | 3.31 | mod-source-record-manager-b | 3.14 | mod-inventory-storage-b | 17.26 | mod-inventory-storage-b | 13.35
mod-inventory-storage-b | 2.71 | mod-inventory-storage-b | 2.40 | pub-edge | 4.80 | nginx-okapi | 4.80
mod-circulation-storage-b | 0.55 | mod-circulation-storage-b | 1.23 | nginx-okapi | 4.80 | pub-edge | 4.74
pub-okapi | 0.13 | pub-okapi | 0.14 | pub-okapi | 4.58 | pub-okapi | 4.35

 RESULTS PCON

A JMeter script was prepared to concurrently execute two OAI-PMH harvests by initiating harvesting from the member tenant level. To simulate the behaviour of the EBSCO Harvester, a 200-400 ms delay after each request was used in the script.

2 scripts were triggered concurrently from carrier-io to carry out 1 full and 1 incremental harvest on each member tenant: one script for full harvesting and one for incremental. In total, 10 harvests were executed simultaneously during the test.

ALL SCENARIO TEST RESULTS

Cells are "harvested number / duration"; "-" means the number was not reported.

Scenario #1 - 2 Full Harvests (Full)
Tenant | SRS | Inventory | SRS+Inventory
cs00000int_0001 | 313465 / 01:00:25 | 39711 / 00:07:54 | 353176 / 01:07:17
cs00000int_0002 | 170001 / 00:35:13 | 30000 / 00:06:05 | 200001 / 00:41:47
cs00000int_0003 | 183867 / 00:40:37 | 18378 / 00:04:12 | 202245 / 00:43:56
cs00000int_0004 | 174793 / 00:38:41 | 58232 / 00:09:41 | 233025 / 00:47:40
cs00000int_0005 | 362891 / 01:07:21 | 46634 / 00:09:00 | 409525 / 01:21:33

Scenario #2 - 1 Full Harvest + 1 Incremental (Full)
Tenant | SRS | Inventory | SRS+Inventory
cs00000int_0001 | 313465 / 01:07:05 | 39711 / 00:06:28 | 353176 / 01:06:05
cs00000int_0002 | 170001 / 00:35:22 | 30000 / 00:05:00 | 200001 / 00:38:42
cs00000int_0003 | 183867 / 00:42:50 | 18378 / 00:03:20 | 202245 / 00:40:59
cs00000int_0004 | 174793 / 00:40:13 | 58232 / 00:08:46 | 233025 / 00:44:32
cs00000int_0005 | 362891 / 01:14:55 | 46634 / 00:07:49 | 383996 / 01:10:23

Scenario #2 - 1 Full Harvest + 1 Incremental (Incremental, 10k)
Tenant | SRS | Inventory | SRS+Inventory
cs00000int_0001 | 10k / 00:02:06 | 10k / 00:02:00 | 10k / 00:02:11
cs00000int_0002 | - / 00:01:57 | - / 00:01:57 | - / 00:01:58
cs00000int_0003 | - / 00:02:10 | - / 00:01:58 | - / 00:02:09
cs00000int_0004 | - / 00:02:05 | - / 00:01:44 | - / 00:02:00
cs00000int_0005 | - / 00:02:08 | - / 00:02:00 | - / 00:02:06

Scenario #2 - 1 Full Harvest + 1 Incremental (Incremental, 50k)
Tenant | SRS | Inventory | SRS+Inventory
cs00000int_0001 | 50k / 00:10:36 | 39711 / 00:06:07 | 50k / 00:10:32
cs00000int_0002 | - / 00:09:50 | 30000 / 00:04:42 | - / 00:10:08
cs00000int_0003 | - / 00:10:38 | 18378 / 00:03:12 | - / 00:10:27
cs00000int_0004 | - / 00:10:20 | 50100 / 00:06:17 | - / 00:10:08
cs00000int_0005 | - / 00:10:43 | 46634 / 00:07:00 | - / 00:10:36

Scenario #2 - 1 Full Harvest + 1 Incremental (Incremental, 500k)
Tenant | SRS | Inventory | SRS+Inventory
cs00000int_0001 | 313465 / 00:56:32 | unavailable | 276298 / 00:49:17
cs00000int_0002 | 170001 / 00:31:36 | unavailable | 200001 / 00:36:19
cs00000int_0003 | 183867 / 00:37:00 | unavailable | 202245 / 00:38:31
cs00000int_0004 | 174793 / 00:34:53 | unavailable | 233025 / 00:41:38
cs00000int_0005 | 362891 / 01:03:14 | unavailable | 270297 / 00:48:40
1 FULL HARVEST SEQUENTIALLY TEST #1

EBSCO Harvester, record source SRS:

Tenant | Duration, hh:mm:ss | returned_instances_counter | failed_instances_counter
cs00000int_0001 | 00:34:44 | 313465 | No logs for failed instances
cs00000int_0002 | 00:17:22 | 170001 |
cs00000int_0003 | 00:21:25 | 183867 |
cs00000int_0004 | 00:19:20 | 174887 |
cs00000int_0005 | 00:41:58 | 363361 |

Duration (for all jobs): 02:14:49

Total (full harvest): 1205581

2 full harvests resources utilization

Test #1.  Record source = Source record storage

Service CPU Utilization


Service Memory Utilization



RDS CPU Utilization



RDS Database Connections


Response Times Table

Test #2.  Record source = Source record storage

Service CPU Utilization


Service Memory Utilization



RDS CPU Utilization

DB CPU utilization during harvesting didn't exceed 25% on average. The spikes on the screenshot (up to 60%) were connected to DI create jobs with 10k records.


RDS Database Connections

The connection count for 124 concurrent harvests did not exceed 705.

Response Times Table

In this table we observe that response times for 124 concurrent harvests start from 20 seconds.

Throughput Table

In this table, active users equal the number of harvests. Throughput starts to decrease from 10 requests per second once the number of harvests drops below 14; at that point the response time is 1.25 sec.

Test #3  Record source = Inventory

Harvests completed. No records were harvested because of the lack of data in this data set. The following set was used for the harvesting: all.

Data to be harvested should have 'source' = 'CONSORTIUM-FOLIO' for shared instances or 'source' = 'FOLIO' for non-shared ones in mod_inventory_storage.instance.

Test #4.  Record source = Source record storage and inventory

Service CPU Utilization


Service Memory Utilization



RDS CPU Utilization



RDS Database Connections


Response Times Table

The results are almost the same as in the SRS record source test.

Throughput Table

1 full + 1 incremental harvesting resources utilization

Test #5.  Record source = Source record storage

Service CPU Utilization


Service Memory Utilization



RDS CPU Utilization



RDS Database Connections


Response Times Table

Throughput Table

Database load

Top SQL

 Top SQL comparison

Here we compare two queries: one with high average latency and one with low. The only difference is the schema: cs00000001_0042 has 630k instances and cs00000001_0003 has 136k.

Harvesting for 0003 started with full + incremental 10k; after the 10k finished, a new incremental 100k started.
Harvesting for 0042 started with full + incremental 10k; after the 10k finished, new incremental 100k and 500k harvests started.

                                             --- Calls/sec - 0.09, Rows/sec - 9.56, Avr. Latency - 237.63 ---

SELECT * FROM cs00000001_0003_mod_oai_pmh.get_instances_with_marc_records inst
 WHERE inst.instance_id > 'e74b436b-ac91-5fd1-9790-bc825329ec84'::uuid
    AND ( inst.source = 'MARC'
 OR inst.source = 'MARC_SHARED' OR inst.source = 'CONSORTIUM-MARC') ORDER BY instance_id
LIMIT 101;

                                             --- Calls/sec - 0.44, Rows/sec - 44.64, Avr. Latency - 55.90 ---
SELECT * FROM cs00000001_0042_mod_oai_pmh.get_instances_with_marc_records inst
 WHERE inst.instance_id > 'c8ed4322-7ec9-5f81-b01a-18075b06c12e'::uuid
    AND ( inst.source = 'MARC'
 OR inst.source = 'MARC_SHARED' OR inst.source = 'CONSORTIUM-MARC') ORDER BY instance_id
LIMIT 101;

Test #6.  Record source = Source record storage

 Memory and CPU utilization, %

Module (Memory) | % | Module (CPU) | %
mod-inventory-b | 58.99 | mod-inventory-b | 24.12
mod-quick-marc-b | 58.51 | mod-oai-pmh-b | 13.07
okapi-b | 55.28 | mod-pubsub-b | 10.13
mod-source-record-manager-b | 46.05 | edge-oai-pmh-b | 9.11
mod-source-record-storage-b | 38.19 | mod-quick-marc-b | 6.99
mod-circulation-storage-b | 33.91 | okapi-b | 5.85
mod-oai-pmh-b | 26.95 | mod-source-record-storage-b | 3.87
mod-pubsub-b | 26.88 | mod-source-record-manager-b | 3.23
edge-oai-pmh-b | 24.90 | nginx-okapi | 1.78
mod-inventory-storage-b | 15.04 | mod-circulation-storage-b | 1.22
nginx-okapi | 4.91 | mod-inventory-storage-b | 0.93
pub-okapi | 4.46 | pub-okapi | 0.13


Service CPU Utilization

mod-oai-pmh - 13% with two harvests running, 7% with one after the first finished.

Service Memory Utilization

mod-oai-pmh - 26%


RDS CPU Utilization

2 harvests - 6%, 1 harvest - 4%


RDS Database Connections

During the test - 1000 connections on average.

Response Times Table

Throughput Table

Database load

Top SQL


 Top SQL
SELECT * FROM cs00000001_0042_mod_oai_pmh.get_instances_with_marc_records inst
 WHERE inst.instance_id > '4fd0d955-b5dc-543e-be5b-bed10a46ce35'::uuid
    AND ( inst.source = 'MARC'
 OR inst.source = 'MARC_SHARED' OR inst.source = 'CONSORTIUM-MARC') ORDER BY instance_id
LIMIT 101;


SELECT * FROM cs00000001_0042_mod_oai_pmh.get_instances_with_marc_records inst
 WHERE inst.instance_id > 'aee9b375-f51a-588a-8afe-b89fefe87da0'::uuid
    AND ( inst.source = 'MARC'
 OR inst.source = 'MARC_SHARED' OR inst.source = 'CONSORTIUM-MARC')     AND inst.instance_updated_date >= cs00000001_0042_mod_inventory_storage.dateOrMin(timestamptz '2022-12-21T00:00:00Z')
    AND inst.instance_updated_date <= cs00000001_0042_mod_inventory_storage.dateOrMax(timestamptz '2024-01-07T00:00:00Z')
ORDER BY instance_id
LIMIT 101;


Test #7.  Record source = Source record storage

 Memory and CPU utilization, %

Module (Memory) | % | Module (CPU) | %
mod-quick-marc-b | 60.89 | mod-pubsub-b | 10.31
okapi-b | 57.48 | mod-oai-pmh-b | 9.37
mod-inventory-b | 54.56 | mod-inventory-b | 9.06
mod-source-record-manager-b | 48.82 | mod-quick-marc-b | 7.46
mod-source-record-storage-b | 40.16 | edge-oai-pmh-b | 6.23
mod-pubsub-b | 35.55 | okapi-b | 4.68
mod-circulation-storage-b | 35.41 | mod-source-record-storage-b | 2.41
edge-oai-pmh-b | 34.19 | mod-di-converter-storage-b | 2.21
mod-di-converter-storage-b | 29.24 | mod-source-record-manager-b | 1.67
mod-oai-pmh-b | 28.28 | nginx-okapi | 1.32
mod-inventory-storage-b | 17.06 | mod-inventory-storage-b | 0.67
nginx-okapi | 5.02 | pub-okapi | 0.10
pub-okapi | 4.80 | |

Service CPU Utilization

mod-oai-pmh - 10%

Service Memory Utilization

mod-oai-pmh - 28%


RDS CPU Utilization

RDS CPU was in the range of 4 to 8% on average. The highest spike was 15%.


RDS Database Connections

During the test - 1000 connections on average, with regular spikes to 1400 connections every 30 minutes.

Full Harvests Duration Table

 All harvests duration, instances per second

Filtered by returned_instances_counter (sort largest to smallest)

tenant | started_date | last_updated_date | returned_instances_counter | duration | Duration, in seconds | Instances per second
cs00000001_0042 | 2024-01-30 07:58:05.270091+00 | 2024-01-30 09:24:25.143512+00 | 634673 | 01:26:19 | 5179 | 122.55
cs00000001_0024 | 2024-01-30 23:32:06.328008+00 | 2024-01-31 00:12:09.513664+00 | 385121 | 00:40:03 | 2403 | 160.27
cs00000001_0051 | 2024-01-31 05:19:16.762325+00 | 2024-01-31 05:55:05.370195+00 | 351323 | 00:35:48 | 2148 | 163.56
cs00000001_0006 | 2024-01-30 19:47:56.994035+00 | 2024-01-30 20:18:04.047742+00 | 293579 | 00:30:07 | 1807 | 162.47
cs00000001_0007 | 2024-01-30 20:18:04.928054+00 | 2024-01-30 20:45:31.841288+00 | 279491 | 00:27:26 | 1646 | 169.80
cs00000001_0012 | 2024-01-30 21:44:15.496785+00 | 2024-01-30 22:07:21.604411+00 | 227487 | 00:23:06 | 1386 | 164.13
cs00000001_0038 | 2024-01-31 02:09:28.951308+00 | 2024-01-31 02:30:32.110773+00 | 225562 | 00:21:03 | 1263 | 178.59
cs00000001_0050 | 2024-01-31 05:00:49.438666+00 | 2024-01-31 05:19:16.116316+00 | 190734 | 00:18:26 | 1106 | 172.45
cs00000001_0055 | 2024-01-31 06:25:03.701594+00 | 2024-01-31 06:43:58.086662+00 | 176007 | 00:18:54 | 1134 | 155.21
cs00000001_0056 | 2024-01-31 06:43:58.675942+00 | 2024-01-31 06:59:40.582524+00 | 160975 | 00:15:41 | 941 | 171.07
cs00000001_0008 | 2024-01-30 20:45:32.795785+00 | 2024-01-30 21:05:08.613622+00 | 160023 | 00:19:35 | 1175 | 136.19
cs00000001_0026 | 2024-01-31 00:23:26.324646+00 | 2024-01-31 00:39:18.350149+00 | 153911 | 00:15:52 | 952 | 161.67
cs00000001_0016 | 2024-01-30 22:21:16.557905+00 | 2024-01-30 22:38:40.512729+00 | 153332 | 00:17:23 | 1043 | 147.01
cs00000001_0052 | 2024-01-31 05:55:06.081995+00 | 2024-01-31 06:11:33.012233+00 | 137938 | 00:16:26 | 986 | 139.90
cs00000001_0003 | 2024-01-30 19:02:32.304572+00 | 2024-01-30 19:22:49.097716+00 | 136290 | 00:20:16 | 1216 | 112.08
cs00000001_0035 | 2024-01-31 01:36:48.717188+00 | 2024-01-31 01:47:48.176875+00 | 135703 | 00:10:59 | 659 | 205.92
cs00000001_0039 | 2024-01-31 02:30:32.811619+00 | 2024-01-31 02:44:00.517573+00 | 128520 | 00:13:27 | 807 | 159.26
cs00000001_0058 | 2024-01-31 07:05:18.862821+00 | 2024-01-31 07:22:08.347929+00 | 123816 | 00:16:49 | 1009 | 122.71
cs00000001_0036 | 2024-01-31 01:47:48.70498+00 | 2024-01-31 02:02:30.044027+00 | 114168 | 00:14:41 | 881 | 129.59
cs00000001_0040 | 2024-01-31 02:44:01.342285+00 | 2024-01-31 02:57:34.101665+00 | 111991 | 00:13:32 | 812 | 137.92
cs00000001_0009 | 2024-01-30 21:05:09.266434+00 | 2024-01-30 21:20:52.835695+00 | 110917 | 00:15:43 | 943 | 117.62
cs00000001_0005 | 2024-01-30 19:33:59.876671+00 | 2024-01-30 19:47:55.908675+00 | 110091 | 00:13:56 | 836 | 131.69
cs00000001_0046 | 2024-01-31 04:33:45.182805+00 | 2024-01-31 04:44:16.75419+00 | 109676 | 00:10:31 | 631 | 173.81
cs00000001_0030 | 2024-01-31 00:56:23.730637+00 | 2024-01-31 01:08:34.146005+00 | 97588 | 00:12:10 | 730 | 133.68
cs00000001_0011 | 2024-01-30 21:32:30.564789+00 | 2024-01-30 21:44:14.603914+00 | 94169 | 00:11:44 | 704 | 133.76
cs00000001_0023 | 2024-01-30 23:20:35.961953+00 | 2024-01-30 23:32:05.687917+00 | 91516 | 00:11:29 | 689 | 132.82
cs00000001_0061 | 2024-01-31 07:36:35.789584+00 | 2024-01-31 07:48:44.358093+00 | 87115 | 00:12:08 | 728 | 119.66
cs00000001_0022 | 2024-01-30 23:08:15.819582+00 | 2024-01-30 23:20:34.978638+00 | 86899 | 00:12:19 | 739 | 117.59
cs00000001_0025 | 2024-01-31 00:12:10.143554+00 | 2024-01-31 00:23:25.769649+00 | 84214 | 00:11:15 | 675 | 124.76
cs00000001_0021 | 2024-01-30 22:56:27.762303+00 | 2024-01-30 23:08:14.66468+00 | 83985 | 00:11:46 | 706 | 118.96
cs00000001_0010 | 2024-01-30 21:20:53.410569+00 | 2024-01-30 21:32:29.870923+00 | 83253 | 00:11:36 | 696 | 119.62
cs00000001_0032 | 2024-01-31 01:14:54.996634+00 | 2024-01-31 01:24:50.266775+00 | 80073 | 00:09:55 | 595 | 134.58
cs00000001_0045 | 2024-01-31 04:21:11.747423+00 | 2024-01-31 04:33:44.488302+00 | 78318 | 00:12:32 | 752 | 104.15
cs00000001_0004 | 2024-01-30 19:22:50.065381+00 | 2024-01-30 19:33:58.807122+00 | 77784 | 00:11:08 | 668 | 116.44
cs00000001_0031 | 2024-01-31 01:08:35.025465+00 | 2024-01-31 01:14:54.197098+00 | 71894 | 00:06:19 | 379 | 189.69
cs00000001_0044 | 2024-01-31 04:11:28.963726+00 | 2024-01-31 04:21:11.011287+00 | 70870 | 00:09:42 | 582 | 121.77
cs00000001_0001 | 2024-01-30 18:47:33.401981+00 | 2024-01-30 18:57:25.506314+00 | 65358 | 00:09:52 | 592 | 110.40
cs00000001_0028 | 2024-01-31 00:46:46.489521+00 | 2024-01-31 00:55:33.616289+00 | 64554 | 00:08:47 | 527 | 122.49
cs00000001_0053 | 2024-01-31 06:11:33.683951+00 | 2024-01-31 06:21:27.297008+00 | 61833 | 00:09:53 | 593 | 104.27
cs00000001_0041 | 2024-01-31 02:57:34.910847+00 | 2024-01-31 03:06:51.97583+00 | 61089 | 00:09:17 | 557 | 109.68
cs00000001_0048 | 2024-01-31 04:51:58.861674+00 | 2024-01-31 04:59:49.819602+00 | 53345 | 00:07:50 | 470 | 113.50
cs00000001_0047 | 2024-01-31 04:44:17.709472+00 | 2024-01-31 04:51:58.309158+00 | 48530 | 00:07:40 | 460 | 105.50
cs00000001_0027 | 2024-01-31 00:39:18.818234+00 | 2024-01-31 00:46:45.719918+00 | 48175 | 00:07:26 | 446 | 108.02
cs00000001_0033 | 2024-01-31 01:24:51.028727+00 | 2024-01-31 01:33:02.341647+00 | 45884 | 00:08:11 | 491 | 93.45
cs00000001_0020 | 2024-01-30 22:49:10.153732+00 | 2024-01-30 22:56:27.024515+00 | 44257 | 00:07:16 | 436 | 101.51
cs00000001_0060 | 2024-01-31 07:28:54.094613+00 | 2024-01-31 07:36:34.714826+00 | 38394 | 00:07:40 | 460 | 83.47
cs00000001_0059 | 2024-01-31 07:22:09.021211+00 | 2024-01-31 07:28:53.445342+00 | 37713 | 00:06:44 | 404 | 93.35
cs00000001_0015 | 2024-01-30 22:15:32.594419+00 | 2024-01-30 22:21:15.705491+00 | 26898 | 00:05:43 | 343 | 78.42
cs00000001_0037 | 2024-01-31 02:02:30.987839+00 | 2024-01-31 02:09:28.411798+00 | 24225 | 00:06:57 | 417 | 58.09
cs00000001_0057 | 2024-01-31 06:59:41.571592+00 | 2024-01-31 07:05:17.933701+00 | 22595 | 00:05:36 | 336 | 67.25
cs00000001_0002 | 2024-01-30 18:57:26.363148+00 | 2024-01-30 19:02:31.4724+00 | 22369 | 00:05:05 | 305 | 73.34
cs00000001_0018 | 2024-01-30 22:42:19.32858+00 | 2024-01-30 22:48:26.683217+00 | 22262 | 00:06:07 | 367 | 60.66
cs00000001_0014 | 2024-01-30 22:10:54.509386+00 | 2024-01-30 22:15:31.769242+00 | 20744 | 00:04:37 | 277 | 74.89
cs00000001_0013 | 2024-01-30 22:07:22.488948+00 | 2024-01-30 22:10:53.870081+00 | 19130 | 00:03:31 | 211 | 90.66
cs00000001_0034 | 2024-01-31 01:33:03.311241+00 | 2024-01-31 01:36:48.087995+00 | 18239 | 00:03:44 | 224 | 81.42
cs00000001_0017 | 2024-01-30 22:38:41.130227+00 | 2024-01-30 22:42:18.265948+00 | 16490 | 00:03:37 | 217 | 75.99
cs00000001_0054 | 2024-01-31 06:21:27.853456+00 | 2024-01-31 06:25:02.81652+00 | 16296 | 00:03:34 | 214 | 76.15
cs00000001_0049 | 2024-01-31 04:59:50.428684+00 | 2024-01-31 05:00:48.785603+00 | 6951 | 00:00:58 | 58 | 119.84
cs00000001_0019 | 2024-01-30 22:48:27.522764+00 | 2024-01-30 22:49:09.628092+00 | 5311 | 00:00:42 | 42 | 126.45
cs00000001_0029 | 2024-01-30 15:58:19.508812+00 | 2024-01-30 15:59:25.424545+00 | 5139 | 00:01:05 | 65 | 79.06
cs00000001_0043 | 2024-01-31 04:11:22.798694+00 | 2024-01-31 04:11:28.361622+00 | 830 | 00:00:05 | 5 | 166.00
cs00000001_0062 | 2024-01-31 07:48:45.009151+00 | 2024-01-31 07:48:45.009151+00 | 0 | | |
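The derived columns of the table above ("Duration, in seconds" and "Instances per second") follow directly from the hh:mm:ss duration and the returned instances counter; a small sketch:

```python
# Derive the last two table columns from duration and returned_instances_counter.

def duration_seconds(hhmmss: str) -> int:
    """Convert an hh:mm:ss duration string to total seconds."""
    h, m, s = (int(part) for part in hhmmss.split(":"))
    return h * 3600 + m * 60 + s

def instances_per_second(count: int, hhmmss: str) -> float:
    """Harvest rate rounded to 2 decimals, as in the table."""
    return round(count / duration_seconds(hhmmss), 2)

print(duration_seconds("01:26:19"))              # 5179
print(instances_per_second(634673, "01:26:19"))  # 122.55 (cs00000001_0042 row)
```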


Database load

Top SQL


Appendix

 Methodology/Approach MCPT, PCON

Methodology/Approach

OAI-PMH (incremental harvesting) was carried out by a JMeter script from the carrier with 2 main requests:

  • /oai/records?verb=ListRecords&metadataPrefix=marc21_withholdings&apikey=[APIKey]
  • /oai/records?verb=ListRecords&apikey=[APIKey]&resumptionToken=[resumptionToken]

To extract the required number of records, a loop counter was used with the following configuration:

  • 98 loop counts for 10K records;
  • 498 loop counts for 50K records;
  • 4999 loop counts for 500K records;
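The JMeter logic above can be sketched in Python as follows. Assumptions: 100 records per ListRecords page (as in the 100-record request discussed in the results sections), and a fetch() callable standing in for the actual HTTP request to edge-oai-pmh:

```python
# Sketch of the JMeter harvest loop: one initial ListRecords request
# (metadataPrefix=marc21_withholdings), then `loop_count` resumption-token
# requests, with a randomized pause after each to mimic the EBSCO Harvester.
import random
import time

PAGE_SIZE = 100  # records per ListRecords response (assumed)

def harvest(fetch, api_key, loop_count, min_delay=0.15, max_delay=0.30):
    """Run one harvest and return the number of records received."""
    records = 0
    batch, token = fetch(api_key, None)  # initial ListRecords request
    records += len(batch)
    for _ in range(loop_count):
        if token is None:  # server signalled no more records
            break
        time.sleep(random.uniform(min_delay, max_delay))
        batch, token = fetch(api_key, token)  # resumptionToken request
        records += len(batch)
    return records

def make_stub(total):
    """Stub server for illustration: `total` records in pages of PAGE_SIZE."""
    state = {"sent": 0}
    def fetch(api_key, token):
        page = min(PAGE_SIZE, total - state["sent"])
        state["sent"] += page
        next_token = "t%d" % state["sent"] if state["sent"] < total else None
        return ["rec"] * page, next_token
    return fetch

# 98 loop counts: 1 initial + 98 resumption requests of 100 records each.
print(harvest(make_stub(10_000), "apikey", loop_count=98, min_delay=0, max_delay=0))  # 9900
```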

To run the incremental harvesting test, the following time ranges were defined experimentally. The time range for Test 2* was extended due to the impossibility of harvesting the defined number of records, but the subsequent tests were run after adding 800K instances to the database.


Test | Start date | Until date
Test 1 | 2022-12-21 | 2023-12-10


OAI-PMH (full harvesting)

Before running OAI-PMH with full harvest, the following database commands were executed to optimize the tables (from https://folio-org.atlassian.net/wiki/display/FOLIOtips/OAI-PMH+Best+Practices#OAIPMHBestPractices-SlowPerformance):

REINDEX index <tenant>_mod_inventory_storage.audit_item_pmh_createddate_idx ;
REINDEX index <tenant>_mod_inventory_storage.audit_holdings_record_pmh_createddate_idx;
REINDEX index <tenant>_mod_inventory_storage.holdings_record_pmh_metadata_updateddate_idx;
REINDEX index <tenant>_mod_inventory_storage.item_pmh_metadata_updateddate_idx;
REINDEX index <tenant>_mod_inventory_storage.instance_pmh_metadata_updateddate_idx;
analyze verbose <tenant>_mod_inventory_storage.instance;
analyze verbose <tenant>_mod_inventory_storage.item;
analyze verbose <tenant>_mod_inventory_storage.holdings_record;


Execute the following query in the related database to remove existing 'instances' created by a previous harvesting request, along with the request itself:

TRUNCATE TABLE fs09000000_mod_oai_pmh.request_metadata_lb cascade

The full harvesting test with consortia enabled, triggered from the central tenant level, was run from the ptf-windows machine using the EBSCO Harvester. The following cmd command (run from the same directory as the EBSCO Harvester) starts it:

OAIPMHHarvester.exe -HarvestMode=full -DefinitionId=pcon-marc21-with-holdings -HarvesterWebClientTimeout_Seconds=0s=0

To start full harvesting sequentially for all member tenants, use the APIKEY of the central tenant, with the following definition:

 Harvest definition

<?xml version="1.0" encoding="UTF-8"?>
<HarvestDefinition xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="HarvestDefinition.xsd">
    <id>PCON</id>
    <Description>PCON</Description>
    <Urls>
      <!-- include as many as necessary/provided -->
    <Url>https://edge-ptf-consortium.int.aws.folio.org/oai/eyJzIjoiYm5PUWN4dVZabyIsInQiOiJjczAwMDAwaW50IiwidSI6IkVCU0NPRWRnZSJ9</Url>
    

    </Urls>

    <!-- 0 if no throttle required -->
    <ThrottledInMiliseconds>0</ThrottledInMiliseconds>
    
    <!-- metadata to harvest -->
    <MetadataFormat>marc21_withholdings</MetadataFormat>
    
    <!-- Harvested, None, or Custom. If Custom, specific as many child setSpecs as necessary. These will act as filters when harvesting -->
    <!-- Enables the user to segment harvesting requests by setting the StartDate and WindowSizeInDays parameters -->
    
    <Sets use="Harvested">
     </Sets>
     <!--<Sets use="Custom"> -->
    <!--<setSpec>NameOfSetSpecGoesHere</setSpec> -->
    <!--<setSpec>NameOfSetSpecGoesHere</setSpec>
    </Sets> -->
</HarvestDefinition>

To start full or incremental harvesting in parallel, triggered from the member tenant level, use the APIKEY of the member tenant (the host name should stay the same as for the central tenant). This was implemented by a JMeter script.

When preparing an APIKEY for a member tenant, the "s" parameter should be the same as in the central tenant's key.
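For illustration: the edge API key is a URL-safe base64 encoding of a small JSON object with "s" (salt), "t" (tenant), and "u" (user) fields; the salt below is the one visible in the PCON harvester URL above, while the member tenant and user values are illustrative. A sketch of decoding the central key and building a member-tenant key that reuses the same "s":

```python
# Decode/build edge API keys: URL-safe base64 of {"s": salt, "t": tenant, "u": user}.
import base64
import json

def decode_apikey(key: str) -> dict:
    padded = key + "=" * (-len(key) % 4)  # restore any stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(padded))

def build_apikey(salt: str, tenant: str, user: str) -> str:
    payload = json.dumps({"s": salt, "t": tenant, "u": user}, separators=(",", ":"))
    return base64.urlsafe_b64encode(payload.encode()).decode().rstrip("=")

# The key from the PCON harvester URL above:
central = decode_apikey("eyJzIjoiYm5PUWN4dVZabyIsInQiOiJjczAwMDAwaW50IiwidSI6IkVCU0NPRWRnZSJ9")
print(central)  # {'s': 'bnOQcxuVZo', 't': 'cs00000int', 'u': 'EBSCOEdge'}

# Member-tenant key reusing the central tenant's "s" (tenant name is illustrative):
member_key = build_apikey(central["s"], "cs00000int_0001", "EBSCOEdge")
```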

Infrastructure

 Infrastructure MCPT

Infrastructure

Environment: MCPT
Release: Poppy (2023 R2)

  • 10 m6i.2xlarge EC2 instances located in US East (N. Virginia)
  • 2 db.r6.4xlarge database instances, one writer
  • MSK cluster
    • 4 brokers
    • Apache Kafka version 2.8.0
    • EBS storage volume per broker 300 GiB
    • auto.create.topics.enable=true
    • log.retention.minutes=480
    • default.replication.factor=3

Modules

mcpt-pvt
Tue Jan 30 09:03:55 UTC 2024

Module | Task Def. Revision | Module Version | Task Count | Mem Hard Limit | Mem Soft Limit | CPU units | Xmx | MetaspaceSize | MaxMetaspaceSize | R/W split enabled
mod-remote-storage | 10 | mod-remote-storage:3.0.1 | 2 | 4920 | 4472 | 1024 | 3960 | 512 | 512 | FALSE
mod-inventory-update | 6 | mod-inventory-update:3.2.1 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | FALSE
pub-edge | 1 | pub-edge:2023.06.14 | 2 | 1024 | 896 | 128 | 768 | 0 | 0 | FALSE
mod-inventory-storage | 10 | mod-inventory-storage:27.0.4 | 2 | 4096 | 3690 | 2048 | 3076 | 384 | 512 | FALSE
edge-oai-pmh | 9 | edge-oai-pmh:2.7.2 | 2 | 1512 | 1360 | 1024 | 1440 | 384 | 512 | FALSE
mod-circulation-storage | 10 | mod-circulation-storage:17.1.7 | 2 | 2880 | 2592 | 1536 | 1814 | 384 | 512 | FALSE
mod-source-record-storage | 10 | mod-source-record-storage:5.7.5 | 2 | 5600 | 5000 | 2048 | 3500 | 384 | 512 | FALSE
mod-source-record-manager | 10 | mod-source-record-manager:3.7.7 | 2 | 5600 | 5000 | 2048 | 3500 | 384 | 512 | FALSE
mod-quick-marc | 10 | mod-quick-marc:5.0.1 | 1 | 2288 | 2176 | 128 | 1664 | 384 | 512 | FALSE
nginx-okapi | 2 | nginx-okapi:2023.06.14 | 2 | 1024 | 896 | 512 | 0 | 0 | 0 | FALSE
okapi-b | 1 | okapi:5.1.2 | 3 | 1684 | 1440 | 1024 | 922 | 384 | 512 | FALSE
mod-oai-pmh | 9 | mod-oai-pmh:3.12.8 | 2 | 4096 | 3690 | 2048 | 3076 | 384 | 512 | FALSE
pub-okapi | 1 | pub-okapi:2023.06.14 | 2 | 1024 | 896 | 128 | 768 | 0 | 0 | FALSE
 Infrastructure PCON

Infrastructure

Environment: PCON
Release: Poppy (2023 R2)

  • 10 m6i.2xlarge EC2 instances located in US East (N. Virginia)
  • 2 db.r6.xlarge database instances, one reader and one writer
  • MSK cluster
    • 4 brokers
    • Apache Kafka version 2.8.0
    • EBS storage volume per broker 300 GiB
    • auto.create.topics.enable=true
    • log.retention.minutes=480
    • default.replication.factor=3

Modules

pcon-pvt

Module | Task Def. Revision | Module Version | Task Count | Mem Hard Limit | Mem Soft Limit | CPU units | Xmx | MetaspaceSize | MaxMetaspaceSize | R/W split enabled
pub-edge | 2 | pub-edge:2023.06.14 | 2 | 1024 | 896 | 128 | 768 | 0 | 0 | FALSE
mod-inventory-storage | 1 | mod-inventory-storage:27.0.0 | 2 | 4096 | 3690 | 2048 | 3076 | 384 | 512 | FALSE
edge-oai-pmh | 1 | edge-oai-pmh:2.7.0 | 2 | 1512 | 1360 | 1024 | 1440 | 384 | 512 | FALSE
mod-circulation-storage | 1 | mod-circulation-storage:17.1.0 | 2 | 2880 | 2592 | 1536 | 1814 | 384 | 512 | FALSE
mod-source-record-storage | 1 | mod-source-record-storage:5.7.0 | 2 | 5600 | 5000 | 2048 | 3500 | 384 | 512 | FALSE
mod-inventory | 1 | mod-inventory:20.1.0 | 2 | 2880 | 2592 | 1024 | 1814 | 384 | 512 | FALSE
mod-source-record-manager | 1 | mod-source-record-manager:3.7.0 | 2 | 5600 | 5000 | 2048 | 3500 | 384 | 512 | FALSE
mod-quick-marc | 1 | mod-quick-marc:5.0.0 | 1 | 2288 | 2176 | 128 | 1664 | 384 | 512 | FALSE
nginx-okapi | 2 | nginx-okapi:2023.06.14 | 2 | 1024 | 896 | 128 | 0 | 0 | 0 | FALSE
okapi-b | 2 | okapi:5.1.1 | 3 | 1684 | 1440 | 1024 | 922 | 384 | 512 | FALSE
mod-oai-pmh | 1 | mod-oai-pmh:3.12.0 | 2 | 4096 | 3690 | 2048 | 3076 | 384 | 512 | FALSE
pub-okapi | 2 | pub-okapi:2023.06.14 | 2 | 1024 | 896 | 128 | 768 | 0 | 0 | FALSE