OAI-PMH Support (UXPROD-993)

[MODOAIPMH-12] OAI-PMH: Implement verb "ListRecords" Created: 28/Sep/18  Updated: 14/Nov/18  Resolved: 09/Nov/18

Status: Closed
Project: mod-oai-pmh
Components: None
Affects versions: None
Fix versions: 1.0.0
Parent: OAI-PMH Support

Type: Story Priority: P3
Reporter: Hkaplanian Assignee: Hkaplanian
Resolution: Done Votes: 0
Labels: epam-thunderjet
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original estimate: Not Specified

Attachments: PNG File MarcEditor_Harvester_Editor.png     PNG File MarcEditor_Harvester_Results.png     PNG File MarcEditor_Harvester_Settings.png    
Issue links:
Blocks
is blocked by MODOAIPMH-17 OAI-PMH: Implement verb "GetRecord" Closed
Relates
relates to UXPROD-350 OAI-PMH Support Closed
Sprint: oai-pmh - sprint 50
Story Points: 5
Development Team: Thunderjet
Epic Link: OAI-PMH Support

 Description   

Official specification: https://www.openarchives.org/OAI/openarchivesprotocol.html#ListRecords

1. Implementation
2. Error conditions
3. Date ranges (from/until)
4. Sets ("all")

This story does not include resumptionTokens. That will be handled in a separate story.

Some of the ListIdentifiers code can probably be reused.

For now, multiple calls to inventory may be required.

Use a default limit of 100 records for now.



 Comments   
Comment by Piotr Kalashuk [ 01/Nov/18 ]

The default implementation uses the Instance Storage API:

  1. First, the logic calls the /instance-storage/instances endpoint to get the instance UUIDs.
    • If instances are returned successfully, step 2 is triggered.
    • If no instances are returned, the OAI-PMH response is built with a noRecordsMatch error and sent to the client.
    • If instance-storage returns a failure status code, further processing stops and mod-oai-pmh returns a response with a 500 status code.
  2. Once the successful response arrives, the logic builds OAI records with headers and asynchronously calls the /instance-storage/instances/{instanceId}/source-record/marc-json endpoint to get the MARC data for each record. The resulting mod-oai-pmh response depends on the response from the instance-storage service:
    • If marc-json returns a success status code with MARC data, the OAI record is updated with the metadata.
    • If marc-json returns a 404 status code, the record is skipped, i.e. it is not added to the ListRecords response.
    • If marc-json returns any other failure code, processing stops and mod-oai-pmh returns a response with a 500 status code.
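
The decision flow above can be sketched roughly as follows. This is an illustrative Python outline, not the actual mod-oai-pmh code: the names ListRecordsResult, list_records, fetch_instance_ids and fetch_marc_json are hypothetical, the two fetch callables stand in for the instance-storage calls and are assumed to return a (status, payload) pair, any 2xx status is assumed to count as success, and the per-record calls are made sequentially here rather than asynchronously.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class ListRecordsResult:
    """Hypothetical outcome holder for one ListRecords request."""
    server_error: bool = False        # True -> respond with a 500 status code
    oai_error: Optional[str] = None   # e.g. "noRecordsMatch"
    records: List[dict] = field(default_factory=list)


def is_success(status: int) -> bool:
    # Assumption: any 2xx status counts as a success status code.
    return 200 <= status < 300


def list_records(
    fetch_instance_ids: Callable[[], Tuple[int, List[str]]],
    fetch_marc_json: Callable[[str], Tuple[int, Optional[dict]]],
) -> ListRecordsResult:
    # Step 1: get the instance UUIDs.
    status, instance_ids = fetch_instance_ids()
    if not is_success(status):
        return ListRecordsResult(server_error=True)
    if not instance_ids:
        return ListRecordsResult(oai_error="noRecordsMatch")

    # Step 2: build a record per instance and attach its MARC data.
    records = []
    for instance_id in instance_ids:
        status, marc = fetch_marc_json(instance_id)
        if status == 404:
            continue  # no source record: skip this instance
        if not is_success(status):
            return ListRecordsResult(server_error=True)
        records.append({"header": {"identifier": instance_id}, "metadata": marc})

    # What happens when every record is skipped is not stated in the
    # comment above; this sketch simply returns an empty record list.
    return ListRecordsResult(records=records)
```

For example, calling list_records with a stub that returns two UUIDs and a marc-json stub that answers 404 for one of them yields a result containing a single record.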

Regards,
Piotr

Comment by Piotr Kalashuk [ 01/Nov/18 ]

Also, we've noticed that /instance-storage/instances/{instanceId}/source-record/marc-json returns ind1 and ind2 with two backslash characters as the value where a space is intended, e.g.

...
  "035": {
    "ind1": "\\",
    "ind2": "\\",
    "subfields": [{
      "a": "(DE-599)GBV727867881"
    }]
  }
...

According to the MARC21slim.xsd schema, the value is expected in the following format (i.e. exactly one character, which is a digit, a lowercase letter, or a space):

<xsd:simpleType name="indicatorDataType" id="ind.st">
  <xsd:restriction base="xsd:string">
    <xsd:whiteSpace value="preserve"/>
    <xsd:pattern value="[\da-z ]{1}"/>
  </xsd:restriction>
</xsd:simpleType>

So once marc-json is converted by marc4j to MARCXML and the logic adds the metadata record to the OAI-PMH response, JAXB complains that the content is invalid according to the schema. Even if JAXB validation is disabled, the response is not valid and harvesters won't be able to handle it (we've checked using the MarcEdit tool).
To resolve this issue, we've applied a change that replaces the backslash character with a space.
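
A minimal Python sketch of that kind of normalization is shown below. It is only an illustration, not the change that was merged: the function names are hypothetical, and it assumes that the parsed JSON value of each indicator is a single backslash (the "\\" in the raw JSON above being the JSON escape for one backslash). The regular expression mirrors the indicatorDataType pattern from the schema.

```python
import re

# Mirrors the XSD pattern "[\da-z ]{1}": exactly one digit, lowercase letter or space.
INDICATOR_PATTERN = re.compile(r"[\da-z ]")


def is_valid_indicator(value: str) -> bool:
    # XSD patterns match the whole value, hence fullmatch.
    return INDICATOR_PATTERN.fullmatch(value) is not None


def normalize_indicator(value: str) -> str:
    # Replace the backslash placeholder with the space it stands for.
    return value.replace("\\", " ")


def normalize_field(marc_field: dict) -> dict:
    """Return a copy of a marc-json data field with both indicators normalized."""
    fixed = dict(marc_field)
    for key in ("ind1", "ind2"):
        if key in fixed:
            fixed[key] = normalize_indicator(fixed[key])
    return fixed
```

Applied to the 035 field above, normalize_field leaves the subfields untouched and turns both indicators into a single space, which is_valid_indicator then accepts.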

Regards,
Piotr

Comment by Piotr Kalashuk [ 01/Nov/18 ]

Sample of records harvested via MarcEdit:

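Configs: MarcEditor_Harvester_Settings.png
Results: MarcEditor_Harvester_Results.png
Editor: MarcEditor_Harvester_Editor.png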

Note: the edge API requires an apikey parameter. The only way to add it in MarcEdit is to specify it in the Set Name field, e.g. all&apikey=Z2luMHVGdjNMZl10kWt1X2Rpa3U=

Regards,
Piotr

Comment by Piotr Kalashuk [ 05/Nov/18 ]

The changes have been merged to master.

Regards,
Piotr

Generated at Fri Feb 09 00:13:39 UTC 2024 using Jira 1001.0.0-SNAPSHOT#100246-sha1:7a5c50119eb0633d306e14180817ddef5e80c75d.