OAI-PMH data harvesting [Incremental + Full] (Poppy consortia)

Overview

  • The purpose of the OAI-PMH Full Harvesting and Incremental Harvesting tests is to measure the performance of the Poppy release and to find possible issues and bottlenecks, per PERF-706, on the MCPT environment.
  • Two jMeter scripts for 2 scenarios were prepared to concurrently execute OAI-PMH by initiating harvesting from the member tenant level. To simulate the behaviour of EBSCO Harvester a delay after each request was used (150-300 ms) in the script.

  • The first script should trigger two full harvests on each of the 62 tenants, creating 124 concurrent harvests. The second scenario should trigger 1 full and 1 incremental harvest, where the incremental harvests cover 10k, 100k and 500k records respectively. In this second scenario the incremental harvests should start with 10k, then 100k, then 500k.
  • In total, 124 harvests in the first scenario and 143 harvests in the second scenario (62 full + 57 incremental with 10k + 23 with 100k + 1 with 500k).
  • The baseline test for 1 full + 1 incremental with 500k on 1 tenant should be run against the tenant with the highest number of instances.
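
The request pattern that the scripts simulate can be sketched in Python as follows. This is only an illustration, not the jMeter script used in the tests: the function names, the `metadataPrefix` value `marc21` and the injected `fetcher`/`sleep` parameters are assumptions, while the `/oai/records` path, the `verb`, `apikey` and `resumptionToken` parameters and the 150-300 ms delay come from this report.

```python
import random
import time
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

# XML namespace defined by the OAI-PMH 2.0 protocol.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def fetch(url):
    """Download one response body (default fetcher, hypothetical helper)."""
    with urlopen(url) as response:
        return response.read()


def harvest(base_url, api_key, metadata_prefix="marc21",
            delay_range=(0.150, 0.300), fetcher=fetch, sleep=time.sleep):
    """Run one ListRecords harvest; return (request count, record count)."""
    # Assumption: the first request selects the format with metadataPrefix.
    params = {"verb": "ListRecords", "apikey": api_key,
              "metadataPrefix": metadata_prefix}
    requests = records = 0
    while True:
        url = f"{base_url}/oai/records?{urlencode(params)}"
        root = ET.fromstring(fetcher(url))
        requests += 1
        records += len(root.findall(f".//{OAI_NS}record"))
        token = root.find(f".//{OAI_NS}resumptionToken")
        # A missing or empty token marks the last page of the harvest.
        if token is None or not (token.text or "").strip():
            return requests, records
        # Follow-up requests carry the token, as in the URL quoted in the report.
        params = {"verb": "ListRecords", "apikey": api_key,
                  "resumptionToken": token.text.strip()}
        # Pause between requests to imitate the harvester delay (150-300 ms).
        sleep(random.uniform(*delay_range))
```

Running several such loops in parallel, one or two per member tenant, approximates the concurrent harvests described above.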
 Overview PCON
  • The purpose of the OAI-PMH Full Harvesting and Incremental Harvesting tests is to measure the performance of the Poppy release and to find possible issues and bottlenecks, per PERF-706, on the PCON environment.
  • jMeter scripts were prepared to concurrently execute two OAI-PMH harvests on each tenant by initiating harvesting from the member tenant level. To simulate the behavior of EBSCO Harvester a delay after each request was used (200-400 ms) in the script. 2 scripts were triggered from carrier-io concurrently to carry out 1 Full and 1 Incremental harvesting on each member tenant. One was for Full harvesting and second one was for Incremental. In total, 10 harvests were executed simultaneously during the test.

Summary

  • On MCPT, OAI-PMH can concurrently perform 124 harvests (2 full harvests on each of the 62 member tenants) with a constant average throughput of 10 requests, or 1000 records, per second during the test.
  • Average response times (RTs) for the /oai/records?verb=ListRecords&apikey=[APIKey]&resumptionToken=[resumptionToken] request: 0.580 sec in the one-tenant test and 18.3 sec in the 62-tenant test. Harvesting runs faster when RTs are low, which happens as the number of concurrent harvests decreases. The response time for 2 full harvests on 1 tenant only was 0.540 sec.
  • When the number of harvests increases from 1 to 14, total throughput grows up to 10 requests per second. Above 14 harvests the throughput does not change; only response times grow.
  • Duration for 2 full harvests on 62 tenants - 4 hours 30 minutes in scenario 1
  • Duration for 10k incremental harvest - 25 min, 100k - 1 hour 52 minutes, 500k - 3 hours - in scenario 2
  • Running 1 full and 1 incremental 500k harvest on 1 tenant gives a full harvest duration of 01:26:19 and an incremental harvest duration of 01:09:46, with an average response time of 0.601 seconds.
  • Duration of full harvest triggered from central tenant level for all 62 tenants running sequentially - 13 hours 22 minutes.
  • CPU utilization in mod-oai-pmh did not exceed 44% across all tests.
  • Memory utilization in mod-oai-pmh did not exceed 49%. For mod-inventory it was 61% in scenario 2 and 85% in scenario 1 (it grew to this level because two 10k DI create jobs were run alongside the OAI-PMH harvests). The additional DI load did not affect OAI-PMH response times. No memory leak trends were found. More information is in the resource utilization table.
  • RDS CPU utilization was 25% in the first scenario and no higher than 30% at the beginning of the test in the second scenario.
  • DB connections - 700 in first scenario with SRS record source, 1400 with SRS + Inventory and 1350 in second scenario with SRS.
 Summary PCON
  • OAI-PMH can concurrently operate for a minimum of 5 member tenants with negligible performance decline. The time taken is directly proportional to the quantity of requests made.
  • The average response time for /oai/records?verb=ListRecords&apikey=[APIKey]&resumptionToken=[resumptionToken] was 0.618 sec.

Recommendations & Jiras

Test Runs & Results

Two jMeter scripts for 2 scenarios were prepared to concurrently execute OAI-PMH by initiating harvesting from the member tenant level. To simulate the behaviour of EBSCO Harvester a delay after each request was used (150-300 ms) in the script.

In total, 124 harvests were executed concurrently during the test for the first scenario and 143 harvests for the second scenario (62 full + 57 incremental with 10k + 23 with 100k + 1 with 500k).
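
As a quick check of these totals, the harvest counts per scenario can be added up in a few lines. The variable names are illustrative; the numbers are the ones quoted above.

```python
TENANTS = 62

# Scenario 1: two full harvests on every member tenant.
scenario_1 = {"full": 2 * TENANTS}

# Scenario 2: one full harvest per tenant plus the staged incremental harvests.
scenario_2 = {
    "full": TENANTS,
    "incremental_10k": 57,
    "incremental_100k": 23,
    "incremental_500k": 1,
}

total_1 = sum(scenario_1.values())
total_2 = sum(scenario_2.values())
print(total_1, total_2)  # 124 143
```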

The table below contains harvest durations for tenants with approximately 25k, 50k and 100k records in these scenarios, as well as the results of incremental harvests of 10k, 100k and 500k records.

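RESULTS for tests #1 - #5

| Scenario | Full / Incremental | Tenant | SRS: Instances | SRS: Duration | Inventory | SRS+Inventory: Instances | SRS+Inventory: Duration |
|---|---|---|---|---|---|---|---|
| Scenario #1: 2 Full Harvests | Full | cs00000001_0037 | 24226 | 00:56:11 | No data | 24226 | 00:52:27 |
| | Full | cs00000001_0048 | 53346 | 01:39:24 | No data | 53346 | 01:40:50 |
| | Full | cs00000001_0030 | 97588 | 02:36:17 | No data | 97588 | 02:30:48 |
| | Full | cs00000001_0042 | 634673 | 04:28:34 | No data | 634673 | 04:21:41 |
| Scenario #2: 1 Full Harvest + 1 Incremental | Full | cs00000001_0037 | 24226 | 00:49:52 | No data | | |
| | Full | cs00000001_0048 | 53346 | 01:28:33 | No data | | |
| | Full | cs00000001_0030 | 97588 | 02:15:58 | No data | | |
| | Full | cs00000001_0042 | 634673 | 03:41:44 | No data | | |
| | Incremental | cs00000001_0037 | 10k | 00:25:02 | No data | | |
| | Incremental | cs00000001_0046 | 100k | 01:52:01 | No data | | |
| | Incremental | cs00000001_0042 | 500k | 02:59:13 | No data | | |
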

Test #1

2 full harvests on 62 member tenants were carried out concurrently with the SRS record source. During the test, 22 tenants were harvested fully without problems. The largest number of records among these tenants was 53345 (schema cs00000001_0048). The remaining 40 member tenants stopped harvesting at 54700 records on average and did not finish successfully. Records were harvested evenly across the tenants. For an unknown reason the load generator stopped executing the test. It was retested in test #2.

The average response time (RT) for the 100-record request /oai/records?verb=ListRecords&apikey=[APIKey]&resumptionToken=[resumptionToken] was 22 seconds with 124 concurrent harvests and decreased to 15 seconds with 80 harvests.

Duration for 24225 records - 00:59:40,  53345 records - 01:48:40

Test #2

2 Full harvests on 62 member tenants with SRS record source. Harvesting finished successfully.

Average response times (RTs) for the request /oai/records?verb=ListRecords&apikey=[APIKey]&resumptionToken=[resumptionToken]: 580 ms when harvesting on one tenant and 18.3 sec for 62 tenants. Harvesting runs faster when RTs are low, which happens as the number of concurrent harvests decreases.

Duration for 24225 records - 00:56:11,  53345 records - 01:42:20, 97588 records - 02:36:17
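
To compare these durations across tenants, it can help to convert them to a per-harvest record rate. The sketch below uses hypothetical helper names and only the figures listed above.

```python
def to_seconds(hhmmss):
    """Convert an HH:MM:SS duration string to seconds."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def records_per_second(records, duration):
    """Average record rate of one harvest."""
    return records / to_seconds(duration)


# (records, duration) pairs reported for test #2.
test_2 = [(24225, "00:56:11"), (53345, "01:42:20"), (97588, "02:36:17")]

for records, duration in test_2:
    rate = records_per_second(records, duration)
    print(f"{records:>6} records in {duration}: {rate:.1f} records/s")
```

The rate per harvest rises for the larger tenants, which is consistent with the observation that harvests speed up once smaller tenants finish and fewer harvests remain active.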

Test #3

2 Full harvests on 62 member tenants with Inventory record source. Finished successfully without any load. No data to harvest.

Test #4

2 Full harvests on 62 member tenants with SRS + Inventory record source. 

RTs for the 100-record request with 124 concurrent harvests (at the beginning of the test): 14.5 sec.

Duration for 24225 records - 00:52:29,  53345 records - 01:40:50, 97588 records - 02:30:48

Test #5

1 Full harvest on 62 member tenants and 1 incremental harvest with 10k on 57 tenants, pause 2 minutes and then 1 incremental with 100k on 23 tenants, pause and 1 incremental with 500k records with SRS record source.

Once the 10k harvests ended (after 25 minutes), response times for the remaining harvests improved from 14.4 seconds on average to 7.4 seconds for full and 6.7 seconds for incremental harvests. Closer to the end of the test, the number of harvests decreased from 19 to 8, with response times of 1.32 seconds for full and 1.11 seconds for incremental. In the range from 8 to 2 harvests, response times were 0.704 seconds for full and 0.622 seconds for incremental.

Test #6

1 Full harvest + 1 incremental 500k concurrently on cs00000001_0042 tenant with 634k instances.

Full harvest duration - 01:26:19. 

Incremental harvest duration - 01:09:46

Test #7

1 full harvest triggered from the central tenant level, which runs full harvests for all tenants sequentially. It was carried out from the EBSCO harvester AWS Windows machine.

Duration for 62 tenants: 13 hours 22 minutes. Number of harvested records: 6495617. Additional info about test in Full Harvests Duration Table.
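
For a rough comparison with the concurrent run, the effective record throughput of both approaches can be estimated as below. This is a back-of-the-envelope sketch: it assumes that each of the two concurrent full harvests in scenario 1 (4 hours 30 minutes, per the Summary) returned the same 6495617 records as this sequential test, which the report does not state explicitly.

```python
RECORDS_ALL_TENANTS = 6_495_617  # records harvested in test #7


def throughput(records, hours, minutes):
    """Average records per second over the given duration."""
    return records / (hours * 3600 + minutes * 60)


# Sequential full harvest from the central tenant: 13 hours 22 minutes.
sequential = throughput(RECORDS_ALL_TENANTS, 13, 22)

# Assumption: both concurrent harvests per tenant cover the same record set.
concurrent = throughput(2 * RECORDS_ALL_TENANTS, 4, 30)

print(f"sequential: {sequential:.0f} records/s")
print(f"concurrent: {concurrent:.0f} records/s")
print(f"ratio: {concurrent / sequential:.1f}x")
```

Under that assumption the concurrent run averages roughly 800 records per second, below the 1000 records per second quoted for the steady phase, which would fit a tail period where only a few harvests remain active.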

 CPU and Memory utilization for tests #1 - #5