Skip to end of banner
Go to start of banner

OAI-PMH data harvesting releases comparison

Skip to end of metadata
Go to start of metadata

You are viewing an old version of this page. View the current version.

Compare with Current View Page History

« Previous Version 15 Next »

Error rendering macro 'jira' : Unable to locate Jira server for this macro. It may be due to Application Link configuration.

Overview

As was mentioned in Jira - there is huge performance degradation between releases iris - juniper. According to Jira - previously harvesting process of 350K records takes ±2 hour and now on Juniper release it takes 4 hours. 

Test flow

Test consist of few calls:

Initial call:

  • /oai/records?verb=ListRecords&metadataPrefix=marc21_withholdings&apikey=[APIKey] - performing only once

Harvesting call:

  • /oai/records?verb=ListRecords&apikey=[APIKey]&resumptionToken=[resumptionToken] - performing repeatedly until there is no more data in [tenant]_mod_oai_pmh.instances table to harvest.

[resumptionToken] returning in initial call response and in each harvesting call until there is no more records to harvest. When all data has being harvested - resumptionToken will not return with hte response.


Environments to test on:

  • ICP1 - Iris release;
  • JCP1 - Juniper release;
  • IMTC1 - Latest Juniper release (and more powerful environment).


Software versions

IMTC1 (Juniper)JCP1 (Juniper)ICP1 (Iris)
mod-oai-pmh:3.5.0mod-oai-pmh:3.5.0mod-oai-pmh:3.4.2
edge-oai-pmh:2.3.1edge-oai-pmh:2.3.1edge-oai-pmh:2.3.1

mod-source-record-storage:5.1.5

mod-source-record-storage:5.1.6mod-source-record-storage:5.0.6
mod-source-record-manager:3.1.3mod-source-record-manager:3.1.3mod-source-record-manager:3.0.9
mod-inventory-storage:21.0.3mod-inventory-storage:21.0.3mod-inventory-storage:20.2.2
okapi:4.8.2okapi:4.8.2okapi:4.7.3


Test results 

Harvesting of 350K records.

IMTC1 (Juniper)JCP1 (Juniper)ICP1 (Iris)
1 hr 14 min45 min1 hr 47 min
1 hr 10 min42 min1 hr 26 min


envIMTC1 (Juniper)JCP1 (Juniper)ICP1 (Iris)
instances75641772521278072048
SRS records5972237103711186613


Conclusions:

  • As you can see from table above - we can say that Juniper release at least not slower than Iris release, or even faster.
  • We can see slowness on IMTC1 env (it should be more powerful than others). However it's still better or the same as ICP1 (Iris).
  • The only difference between IMTC1 and JCP1 is state of DB (different data set may affect the performance).
  • We'll need additional tests on same data set but on different releases.


Resource usage


IMTC



JCP1







ICP1







  • No labels