Status: IN PROGRESS

Jira: PERF-198
Table of Contents

Overview 

The purpose of this set of tests is to measure the performance of the Kiwi release and to find possible issues and bottlenecks.


Environment 

Software versions (Test 1-2)


Original PTF dataset containing 1,212,039 underlying MARC records for 8.7M instances

Software versions (Test 3 with Bugfest Dataset)

  • mod-oai-pmh:3.6.1
  • edge-oai-pmh:2.4.0
  • mod-source-record-manager:3.2.6
  • mod-source-record-storage:5.2.5
  • mod-inventory-storage:22.0.3
  • okapi:4.9.0

Bugfest dataset containing 8,034,444 underlying MARC records for 8.3M instances

Summary

  • The Kiwi release was able to harvest 7,808,200 records in 19 hr 8 min (roughly 1M records per 2 hours and 15 min).
  • Average response time per request: 0.874 s.
  • A couple of issues were found:
    • OutOfMemory exception: MODOAIPMH-374
    • Thread block issue: MODOAIPMH-374
    • When instances didn't have underlying MARC records, multiple repeated calls from mod-edge-oai-pmh to mod-oai-pmh occurred, resulting in the end client receiving a timeout; see MODOAIPMH-383.
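The summary figures above can be cross-checked with a few lines of arithmetic. This is only a sanity check using numbers already reported in this document; the variable names are illustrative:

```python
# Sanity check on the summary figures; all numbers come from the test results above.
records_harvested = 78082 * 100        # 78,082 calls x 100 records each
avg_response_s = 0.874                 # average response time per request, in seconds

# Total time implied by the per-request average: 78,082 requests x 0.874 s each.
implied_hours = 78082 * avg_response_s / 3600

print(records_harvested)               # 7808200
print(round(implied_hours, 1))         # ~19.0, consistent with the measured 19 hr 8 min
```

The close agreement between the implied total (~19 hours) and the measured duration (19 hr 8 min) suggests the harvest time was dominated by request latency rather than client-side overhead.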

Test flow

The test consists of a few calls:

Initial call (performed only once):

Code Block
/oai/records?verb=ListRecords&metadataPrefix=marc21_withholdings&apikey=[APIKey]



Subsequent harvesting calls:

Code Block
/oai/records?verb=ListRecords&apikey=[APIKey]&resumptionToken=[resumptionToken]



These calls were performed repeatedly, harvesting 100 records each time, until there was no more data to harvest in the [tenant]_mod_oai_pmh.instances table.

The batch size was set to 100 records. A resumptionToken was returned in the initial call response and in each subsequent harvesting call response until there were no more records to harvest; once all data had been harvested, no resumptionToken was returned with the response.
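The flow above can be sketched as a simple loop. This is a minimal illustration, not the actual test script: the host name and API key are placeholders, and the token is pulled out with a regex for brevity (a real client should parse the OAI-PMH XML response properly):

```python
import re
import urllib.request

BASE = "https://folio-edge.example.org"  # hypothetical edge-oai-pmh host
APIKEY = "[APIKey]"                      # placeholder, as in the calls above

def extract_token(xml_text):
    """Return the resumptionToken from a ListRecords response,
    or None when the token is absent or empty (harvest complete)."""
    m = re.search(r"<resumptionToken[^>]*>([^<]*)</resumptionToken>", xml_text)
    return m.group(1) if m and m.group(1) else None

def harvest():
    """Repeat ListRecords calls until no resumptionToken is returned."""
    url = (f"{BASE}/oai/records?verb=ListRecords"
           f"&metadataPrefix=marc21_withholdings&apikey={APIKEY}")
    batches = 0
    while url:
        with urllib.request.urlopen(url) as resp:
            body = resp.read().decode("utf-8")
        batches += 1                     # each response carries up to 100 records
        token = extract_token(body)
        url = (f"{BASE}/oai/records?verb=ListRecords"
               f"&apikey={APIKEY}&resumptionToken={token}") if token else None
    return batches
```

Note that only the first call carries metadataPrefix; subsequent calls carry only the resumptionToken, matching the two request shapes shown above.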


Issues detected during testing


1) OutOfMemory exception; fixed in scope of MODOAIPMH-374

2) Thread block issue; fixed in scope of MODOAIPMH-374

3) DB timeout: MODOAIPMH-383. A new issue that appears when the DB transfer and the harvesting process are started at the same time; this leads to a high load on the DB, which then responds with a timeout:

2021-12-01T10:02:42,566 ERROR [vert.x-eventloop-thread-0] MarcWithHoldingsRequestHelper Save instance Ids failed: Timeout.

io.vertx.core.impl.NoStackTraceThrowable: Timeout

Fixed by changing the dataset to a "bugfest"-like one.

Test results

Test 1


  • Total Underlying SRS records: 1,212,039
  • Duration: 4 hr 57 min
  • Records transferred: 4,770,043 (should be 8,415,303)
  • Records harvested: 20,618 × 100 = 2,061,800


An unstable part of the test is visible here: spikes on the chart show sharply increased response times, which lead to throughput gaps. At this point we are still not sure why this happens; we have checked the following:


  • While the data transfer process was running in the background, DB CPU usage reached 70%-75%.
  • The data transfer process failed after 10 minutes, having transferred only 4,770,043 of ~8M records.
  • Harvesting itself consumes 15% of DB CPU.



Test 2

  • Total Underlying SRS records: 1,212,039
  • Duration: 4 hr 25 min
  • Records transferred: 3,815,867 (should be 8,415,303)
  • Records harvested: 22,305 × 100 = 2,230,500











Test 3 (with Bugfest Dataset)

  • Underlying MARC records: 8,034,444
  • Records transferred: 8,213,392
  • Records harvested: 78,082 × 100 = 7,808,200
  • Time spent: 19 hr 8 min


Average response time for a call with a resumption token: 0.874 s


Notable observations:

  • The unstable parts of the first couple of tests were caused by the dataset:
    • Instances didn't have underlying records, which caused multiple repeated calls from mod-edge-oai-pmh to mod-oai-pmh.
    • This forced the end client to wait until oai-pmh found records with underlying records, and the client often failed with a 504 gateway timeout (the load balancer timeout is 400 seconds).
  • Timeouts in the DB were fixed by changing the dataset to a "bugfest"-like one.
  • A Jira ticket was created to handle client waits: MODOAIPMH-383
