OAI-PMH - Expected behavior in ECS environment

Member library harvests

To harvest a member library's inventory successfully, both full and incremental harvests will include a combination of shared and local instance and SRS records. The OAI-PMH functions currently supported in a single-tenant configuration should remain operational.

Harvesting shared instances

  1. Instance source set to CONSORTIUM-MARC:
    1. marc21 - will harvest instances from the central tenant that are shared with the member library.
    2. marc21_withholdings - will harvest instances from the central tenant that are shared with the member library, enriched with the local tenant's holdings and items.
  2. Instance source set to CONSORTIUM-FOLIO:
    1. marc21 - will harvest all shadow instances of the instances from the central tenant that are shared with the member library.
    2. marc21_withholdings - will harvest all shadow instances of the instances from the central tenant that are shared with the member library, enriched with the local tenant's holdings and items.
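The shared-instance rules above can be summarised as a small lookup. This is a hypothetical model for illustration only. `shared_instance_plan` and its return keys are not part of mod-oai-pmh.

```python
def shared_instance_plan(source: str, metadata_prefix: str) -> dict:
    """Hypothetical summary of where a shared instance comes from and what enriches it."""
    origins = {
        # CONSORTIUM-MARC: the instance is read from the central tenant.
        "CONSORTIUM-MARC": "central tenant instance",
        # CONSORTIUM-FOLIO: the member tenant's shadow copy of the instance is used.
        "CONSORTIUM-FOLIO": "member shadow instance",
    }
    if source not in origins:
        raise ValueError(f"not a shared-instance source: {source}")
    if metadata_prefix not in ("marc21", "marc21_withholdings"):
        raise ValueError(f"unsupported metadataPrefix: {metadata_prefix}")
    return {
        "origin": origins[source],
        # Only marc21_withholdings adds the local tenant's holdings and items.
        "enrich_with_local_holdings_and_items": metadata_prefix == "marc21_withholdings",
    }
```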

If an instance is marked as shared but none of the member libraries has a holdings record associated with it, the instance will not be harvested. To make sure the record is included in the harvest, a member library needs to add a holdings record to it.
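Below is a minimal sketch of this holdings rule. It assumes a hypothetical `SharedInstance` type that records how many holdings each member tenant has attached to a shared instance. The real module queries storage rather than an in-memory structure.

```python
from dataclasses import dataclass, field


@dataclass
class SharedInstance:
    instance_id: str
    # Hypothetical: member tenant id -> number of holdings records on this instance.
    holdings_by_member: dict[str, int] = field(default_factory=dict)


def harvestable(instances: list[SharedInstance]) -> list[str]:
    """Return ids of shared instances that have holdings in at least one member."""
    return [
        inst.instance_id
        for inst in instances
        if any(count > 0 for count in inst.holdings_by_member.values())
    ]
```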

Harvesting local records

Local records should be harvested in the same way as is currently supported for records with source set to MARC or FOLIO.

Central tenant harvests

Requests sent to the central tenant are treated as cross-tenant harvests and will harvest records from each member library, one after another. Harvesting the central tenant alone is not supported. This is because the central tenant should not contain any holdings or item records, and all of its instance records are shared with member libraries.

The cross-tenant harvest will:

  1. Retrieve the list of member libraries.
  2. For each member library, the harvest proceeds as described in the Member library harvests section. Each library's OAI-PMH settings are honored.
  3. The harvest completes once all members have been harvested.
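The three steps above amount to a sequential loop over member tenants. In this sketch, `list_members` and `harvest_member` are hypothetical callables. They stand in for the consortium membership lookup and for a single member-library harvest, which applies that member's own OAI-PMH settings.

```python
from typing import Callable, Iterator


def cross_tenant_harvest(
    list_members: Callable[[], list[str]],
    harvest_member: Callable[[str], Iterator[str]],
) -> Iterator[tuple[str, str]]:
    """Yield (tenant_id, record) pairs, one member library after another."""
    # Step 1: retrieve the list of member libraries.
    for tenant_id in list_members():
        # Step 2: run a normal member-library harvest for this tenant;
        # honoring its OAI-PMH settings is harvest_member's job.
        for record in harvest_member(tenant_id):
            yield tenant_id, record
    # Step 3: the generator is exhausted once every member has been harvested.
```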

All functionality currently supported by OAI-PMH for a single tenant should continue to work, with one exception. During an incremental (update) harvest, one of the member libraries may have no records that match the provided parameters. In that case, the error for that member is omitted and the harvest moves to the next tenant. Instead of a response with <error code="noRecordsMatch">, the harvester receives a response that contains no records and a resumption token pointing to the next tenant.
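For a harvester, the practical consequence is that a ListRecords page with no records does not end a cross-tenant harvest. Only a missing or empty resumptionToken does. The sketch below assumes a hypothetical `fetch(token)` callable that returns the raw XML of one response. The namespace URI is the standard OAI-PMH 2.0 one.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, Optional

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def iterate_records(fetch: Callable[[Optional[str]], str]) -> Iterator[ET.Element]:
    """Yield <record> elements, following resumption tokens across tenants."""
    token: Optional[str] = None
    while True:
        root = ET.fromstring(fetch(token))
        # A page may legitimately hold zero records (a member with no matches).
        yield from root.iterfind(".//oai:record", OAI_NS)
        token_el = root.find(".//oai:resumptionToken", OAI_NS)
        # A missing or empty token marks the last page of the whole harvest.
        if token_el is None or not (token_el.text or "").strip():
            return
        token = token_el.text.strip()
```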


When harvesting through the member libraries one after another, the records of each member are distinguished by their identifiers. This requires the correct Base URL to be set in the OAI-PMH settings of each member library.
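After a cross-tenant harvest, one way to check that records from different members are distinguishable is to group the harvested identifiers by prefix. This sketch makes an assumption based on the identifier shown in the deleted-record example on this page: it treats an identifier as ending in "/<record UUID>", with the part before it identifying the tenant. The actual identifier format should be confirmed against the module. `group_by_prefix` is a hypothetical helper.

```python
def group_by_prefix(identifiers: list[str]) -> dict[str, list[str]]:
    """Group OAI identifiers by the part before the last '/' (assumed tenant-specific)."""
    groups: dict[str, list[str]] = {}
    for identifier in identifiers:
        # rpartition splits on the final '/', leaving the record UUID on the right.
        prefix, _, record_id = identifier.rpartition("/")
        groups.setdefault(prefix, []).append(record_id)
    return groups
```

If records from two different members end up under the same prefix, the Base URL settings of those members are worth checking.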

Handling deleted records

A library's OAI-PMH settings are honored in the same way for cross-tenant harvests and for member library harvests. For example, if the library's OAI-PMH settings require persistent handling of deleted records, deleted records will be returned with a deleted flag in both cross-tenant and member library harvests.

Deleted record support set to "Persistent"

All SRS records with LDR 05 set to "d" and all Inventory instances deleted through API calls will be harvested with the "deleted" flag. The same behavior applies to shared (SRS records), shadow (FOLIO shared instances), and local records:


<record>
<header status="deleted">
<identifier>oai:folio.org/oai:fs09000000/ce064ce6-3d9c-4765-a3cf-564289f59b58</identifier>
<datestamp>2021-10-22T18:50:22Z</datestamp>
<setSpec>all</setSpec>
</header>
</record>
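A harvester can recognise such entries by the status attribute on the header. Here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The `{*}` wildcard lets it accept both the un-namespaced fragment above and a namespaced OAI-PMH response. `is_deleted` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def is_deleted(record_xml: str) -> bool:
    """Return True if an OAI-PMH <record> has a header with status="deleted"."""
    record = ET.fromstring(record_xml)
    # "{*}header" matches <header> in any namespace or in no namespace (Python 3.8+).
    header = record.find("{*}header")
    return header is not None and header.get("status") == "deleted"
```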


Deleted record support set to "No"

All SRS records with LDR 05 set to "d" and all Inventory instances deleted through API calls will be omitted from the harvest. The same behavior applies to shared (SRS records), shadow (FOLIO shared instances), and local records.
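A record can become deleted in two ways: LDR 05 set to "d", or deletion through the API. The two settings combine with these in one decision function. `harvest_action` and its return labels are hypothetical, and only the Persistent and No settings described on this page are modelled.

```python
from typing import Optional


def harvest_action(leader: Optional[str], deleted_via_api: bool, deleted_support: str) -> str:
    """Decide how a record appears in a harvest: "record", "deleted-header" or "omit"."""
    # MARC leader position 05 (record status) "d" means the record is deleted.
    marked_deleted = leader is not None and len(leader) > 5 and leader[5] == "d"
    if not (marked_deleted or deleted_via_api):
        return "record"
    if deleted_support == "Persistent":
        return "deleted-header"  # header with status="deleted", no metadata
    if deleted_support == "No":
        return "omit"  # deleted records are left out of the response
    # Other settings (such as Transient) are not described on this page.
    raise ValueError(f"setting not covered here: {deleted_support}")
```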

Handling suppressed records

If a shared instance is suppressed, it will be handled as suppressed in each member library, and the suppression flag will be applied to the local holdings and items associated with that instance. Suppression of local instances, holdings, and items should work as implemented for a single-tenant library and described here.
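Here is a minimal sketch of the suppression rule for a shared instance. It assumes a hypothetical mapping from local holdings and item ids to their own suppression flags. Each local record keeps its own flag unless the shared instance is suppressed, in which case the flag is forced on.

```python
def effective_suppression(instance_suppressed: bool, local_flags: dict[str, bool]) -> dict[str, bool]:
    """Return the suppression flag to emit for each local holdings/item record."""
    # A suppressed shared instance suppresses every local holding and item under it;
    # otherwise each local record keeps its own flag.
    return {record_id: instance_suppressed or own for record_id, own in local_flags.items()}
```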

Records included in incremental harvests

Instance record  

The harvest should include all instance records that were added, deleted (or had LDR 05 set to "d"), or modified within the time range specified in the request (the from and until arguments).
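OAI-PMH expresses the time range with the from and until arguments, and the protocol treats both bounds as inclusive. Here is a minimal check, assuming timezone-aware UTC datestamps. `in_time_bracket` is a hypothetical helper name.

```python
from datetime import datetime
from typing import Optional


def in_time_bracket(changed_at: datetime, from_: Optional[datetime], until: Optional[datetime]) -> bool:
    """True if a change falls within [from, until]; a missing bound is open-ended."""
    if from_ is not None and changed_at < from_:
        return False
    if until is not None and changed_at > until:
        return False
    return True
```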

Holdings record 

Applicable only to harvests with metadataPrefix set to marc21_withholdings. The harvest should include:

  • Added holdings
  • Removed existing holdings
  • Updated:
    • Location
    • Call number
    • Electronic access
    • ILL policy (after MODOAIPMH-523 is implemented)


Item record 

Applicable only to harvests with metadataPrefix set to marc21_withholdings. The harvest should include:

  • Added item
  • Removed existing item
  • Updated:
    • Location
    • Call number
    • Barcode
    • Material type
    • Volume
    • Enumeration
    • Chronology
    • Copy number
    • Loan type
    • Electronic access
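The selection rules for holdings and items can be pictured as a field-level comparison between the old and new versions of a record. The field keys below are simplified, hypothetical names that mirror the lists above. They are not the actual FOLIO storage schema.

```python
from typing import Optional

HOLDINGS_FIELDS = ("location", "callNumber", "electronicAccess", "illPolicy")
ITEM_FIELDS = ("location", "callNumber", "barcode", "materialType", "volume",
               "enumeration", "chronology", "copyNumber", "loanType", "electronicAccess")


def needs_reharvest(before: Optional[dict], after: Optional[dict], fields: tuple) -> bool:
    """True if a holdings/item record was added, removed, or changed in a tracked field."""
    if before is None or after is None:
        # Added (no "before") or removed (no "after"); both are harvested.
        return before is not after
    # Changes to untracked fields do not trigger inclusion.
    return any(before.get(f) != after.get(f) for f in fields)
```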

Concurrent harvests support

The system should support:

  1. Multiple full harvests:
    1. multiple tenant-level harvests
    2. multiple cross-tenant harvests
    3. concurrent cross-tenant and local harvests
  2. Multiple incremental harvests:
    1. multiple tenant-level harvests
    2. multiple cross-tenant harvests
    3. concurrent cross-tenant and local harvests

Multiple harvest testing observations

Testing of multiple full harvests (tenant level) showed that each additional harvest started in parallel increased the response time of every running harvest by 1-2 seconds. The Bugfest dataset contains approx. 8 million records. With 300 records per response at 2 seconds per response, a harvest returns 300 × (60 seconds / 2 seconds per response) = 9,000 records per minute, or 9,000 × 60 = 540,000 records per hour. A full harvest therefore takes approximately 8,000,000 / 540,000 ≈ 15 hours. With one additional full harvest running in parallel (3 seconds per response), the rate drops to 300 × (60 seconds / 3 seconds per response) = 6,000 records per minute, or 6,000 × 60 = 360,000 records per hour. The harvest then takes approximately 8,000,000 / 360,000 ≈ 22 hours.
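The arithmetic above generalises to a one-line estimate. This sketch assumes a constant response time for the whole harvest, which real harvests will only approximate. `harvest_hours` is a hypothetical helper name.

```python
def harvest_hours(total_records: int, records_per_response: int, seconds_per_response: float) -> float:
    """Estimated wall-clock hours for a full harvest at a fixed response time."""
    records_per_minute = records_per_response * (60 / seconds_per_response)
    records_per_hour = records_per_minute * 60
    return total_records / records_per_hour
```

`harvest_hours(8_000_000, 300, 2)` gives about 14.8 hours and `harvest_hours(8_000_000, 300, 3)` about 22.2 hours, matching the figures above.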

One possible way to improve performance during parallel harvests is to increase the number of mod-oai-pmh module instances running on the server. However, this approach needs further testing.

Logs

Central Tenant

The central tenant does not provide harvest logs; those logs are available in the member library settings.

Member Library

Shows logs only for the library's completed harvests, both cross-tenant and local.

Testing Priorities:

  1. Single-tenant harvest - member library:
    1. Full harvest:
      1. marc21_withholdings - ListRecords
      2. marc21 - ListRecords
    2. Incremental harvest:
      1. marc21_withholdings - ListRecords
      2. marc21 - ListRecords
    3. marc21_withholdings - GetRecord
    4. marc21 - GetRecord
    5. All other combinations of supported verbs and metadataPrefix values - time permitting
  2. Cross-tenant harvest - started from the central tenant:
    1. Full harvest:
      1. marc21_withholdings - ListRecords
      2. marc21 - ListRecords
    2. Incremental harvest:
      1. marc21_withholdings - ListRecords
      2. marc21 - ListRecords
    3. The GetRecord verb is not supported for requests sent from the central tenant.
    4. All other combinations of supported verbs and metadataPrefix values - time permitting
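The ListRecords and GetRecord priorities above translate into a small set of request URLs. The sketch uses the standard OAI-PMH arguments (verb, metadataPrefix, from, identifier). The base URL and identifier passed in are placeholders, not real FOLIO endpoints, and both helper functions are hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode

PREFIXES = ("marc21_withholdings", "marc21")


def oai_request(base_url: str, verb: str, metadata_prefix: str, extra: Optional[dict] = None) -> str:
    """Build an OAI-PMH request URL; extra holds e.g. from/until or identifier."""
    params = {"verb": verb, "metadataPrefix": metadata_prefix}
    params.update(extra or {})
    return f"{base_url}?{urlencode(params)}"


def priority_requests(base_url: str, from_date: str, identifier: Optional[str] = None) -> list[str]:
    """Full, then incremental ListRecords for both prefixes; GetRecord only if identifier is set."""
    urls = []
    for extra in (None, {"from": from_date}):  # full harvest, then incremental
        for prefix in PREFIXES:
            urls.append(oai_request(base_url, "ListRecords", prefix, extra))
    # GetRecord is for member-library tests only; pass identifier=None for the central tenant.
    if identifier is not None:
        for prefix in PREFIXES:
            urls.append(oai_request(base_url, "GetRecord", prefix, {"identifier": identifier}))
    return urls
```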

Relevant documentation