Spike: MODKBEKBJ-360 - Investigate method for bulk retrieval of resources

MODKBEKBJ-360 - Status: IN REVIEW

Problem: mod-agreements communicates with mod-kb-ebsco-java to retrieve information about resources. This interaction happens through an endpoint that accepts a single id parameter and returns a single entity. To load n resources, mod-agreements must therefore send n requests to mod-kb-ebsco-java, which may delay the display of information.

This spike aims to provide a solution that reduces the number of requests to mod-kb-ebsco-java by introducing an endpoint that accepts a list of resources.

Findings

  • Option 1 - Introduce a new GET endpoint with a body

GET /resources/bulk

with body

{ "resources": ["0-1-12345", "0-123-12345", "01-23-34522", "0-12-234567"] }

Advantages:

The specification does not forbid a payload in a GET request:

A payload within a GET request message has no defined semantics;
sending a payload body on a GET request might cause some existing
implementations to reject the request.



Disadvantages

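  1. It is not a commonly used approach and, as the quotation above notes, some existing implementations may reject a GET request that carries a body.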
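To illustrate why Option 1 is unusual, here is a minimal sketch of a client building such a request with Python's standard library. The base URL is a hypothetical placeholder, and the request is only constructed, not sent. Note that `urllib` switches to POST as soon as a body is attached, so GET has to be forced explicitly.

```python
import json
import urllib.request

# Hypothetical base URL; the real host and port depend on the deployment.
BASE_URL = "http://localhost:8081"


def build_get_with_body(resource_ids):
    """Build (but do not send) a GET request that carries a JSON body."""
    body = json.dumps({"resources": resource_ids}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + "/resources/bulk",
        data=body,
        headers={"Content-Type": "application/json"},
        # urllib defaults to POST once data is set, so GET must be forced.
        method="GET",
    )


request = build_get_with_body(["0-1-12345", "0-123-12345"])
print(request.get_method(), request.full_url)
```

Whether proxies, gateways, or the server framework would accept this request is exactly the open risk described in the disadvantage above.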
  • Option 2 - Use a GET endpoint with an additional 'id' parameter

GET /resources?id=resource_id_1,...,resource_id_n

or

GET /resources/bulk?id=resource_id_1&...&id=resource_id_n

Advantages

  1. No need to create a new endpoint

Disadvantages

  1. The HTTP specification does not define a limit on URI length, but servers and browsers do. The general recommendation is to keep a URI no longer than 2000 characters.
   HTTP does not place a predefined limit on the length of a
   request-line, as described in Section 2.5.  A server that receives a
   method longer than any that it implements SHOULD respond with a 501
   (Not Implemented) status code.  A server that receives a
   request-target longer than any URI it wishes to parse MUST respond
   with a 414 (URI Too Long) status code (see Section 6.5.12 of
   [RFC7231]).
   Various ad hoc limitations on request-line length are found in
   practice.  It is RECOMMENDED that all HTTP senders and recipients
   support, at a minimum, request-line lengths of 8000 octets.
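The effect of the URI length limit on Option 2 can be estimated with a short sketch. It builds both URI variants shown above and counts how many ids fit under the 2000-character recommendation. The function names are illustrative, and the count covers only the path and query string, so the scheme and host would reduce it further.

```python
from urllib.parse import urlencode

# Limit taken from the general recommendation quoted above.
MAX_URI_LENGTH = 2000


def comma_separated_uri(resource_ids):
    # GET /resources?id=id_1,...,id_n (safe="," keeps the commas unescaped)
    return "/resources?" + urlencode({"id": ",".join(resource_ids)}, safe=",")


def repeated_param_uri(resource_ids):
    # GET /resources/bulk?id=id_1&...&id=id_n
    return "/resources/bulk?" + urlencode([("id", rid) for rid in resource_ids])


def max_ids_within_limit(build_uri, sample_id, limit=MAX_URI_LENGTH):
    """Count how many copies of sample_id fit before the URI exceeds limit."""
    count = 0
    while len(build_uri([sample_id] * (count + 1))) <= limit:
        count += 1
    return count


print(max_ids_within_limit(comma_separated_uri, "0-12-234567"))
print(max_ids_within_limit(repeated_param_uri, "0-12-234567"))
```

For an 11-character id such as 0-12-234567, this gives 165 ids for the comma-separated form and 132 for the repeated-parameter form, so larger batches would still have to be split across several requests.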
  • Option 3 - Introduce a new POST endpoint

POST /resources/bulk

with body 

{
	"resources": ["0-1-12345", "0-123-12345", "01-23-34522", "0-12-234567"]
}

Advantages

  1. The HTTP specification does not define a limit on the size of a POST request body, although servers and browsers may impose their own.

Disadvantages

  1. POST semantics are intended for creating entries, not for fetching them.

Selected solution:

During the presentation of the spike, it was decided that the best way to solve the current issue is to use the POST method to load resources. The sequence diagram is given below.
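As a rough illustration of the selected approach from the caller's side, the sketch below builds POST /resources/bulk requests with the body format shown in Option 3. The base URL and the per-request cap on ids are assumptions made for the example and are not defined by this spike. The requests are only constructed, not sent.

```python
import json
import urllib.request

# Hypothetical values: neither the base URL nor the cap comes from the spike.
BASE_URL = "http://localhost:8081"
MAX_IDS_PER_REQUEST = 20


def chunked(items, size):
    """Yield consecutive slices of items with at most size elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_bulk_requests(resource_ids, chunk_size=MAX_IDS_PER_REQUEST):
    """Build one POST /resources/bulk request per chunk of resource ids."""
    requests = []
    for chunk in chunked(resource_ids, chunk_size):
        body = json.dumps({"resources": chunk}).encode("utf-8")
        requests.append(urllib.request.Request(
            BASE_URL + "/resources/bulk",
            data=body,
            headers={"Content-Type": "application/json"},
            method="POST",
        ))
    return requests
```

With a cap of 20 ids, loading 45 resources would take 3 requests instead of 45. If the endpoint accepts an unbounded list, a single request is enough.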

According to the mod-agreements file, mod-agreements is interested in the following attributes:

  • data.type
  • data.attributes.publicationType
  • data.attributes.name
  • data.attributes.providerName
  • data.attributes.titleCount
  • data.attributes.customCoverages
  • data.attributes.managedCoverages

Some of them relate to a package and some to a resource:

Package attributes:

  • data.type
  • data.attributes.name
  • data.attributes.providerName
  • data.attributes.titleCount
  • data.attributes.customCoverages

Resource attributes:

  • data.type
  • data.attributes.publicationType
  • data.attributes.name
  • data.attributes.providerName
  • data.attributes.customCoverages
  • data.attributes.managedCoverages
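The attribute paths listed above can be read from a response document with a small helper. This is a sketch only: the helper names are illustrative, and the sample document is made up for the example. In particular, the value of `data.type` and the shape of the coverage lists are assumptions.

```python
# Attribute paths copied from the package and resource lists above.
PACKAGE_PATHS = [
    "data.type",
    "data.attributes.name",
    "data.attributes.providerName",
    "data.attributes.titleCount",
    "data.attributes.customCoverages",
]
RESOURCE_PATHS = [
    "data.type",
    "data.attributes.publicationType",
    "data.attributes.name",
    "data.attributes.providerName",
    "data.attributes.customCoverages",
    "data.attributes.managedCoverages",
]


def get_path(document, dotted_path):
    """Follow a dotted path through nested dicts; return None if it is absent."""
    current = document
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def pick_attributes(document, paths):
    """Return a flat mapping of each requested path to its value (or None)."""
    return {path: get_path(document, path) for path in paths}


# Made-up sample document; the field values are assumptions for illustration.
sample_resource = {
    "data": {
        "type": "resources",
        "attributes": {
            "name": "Interpretation: A Journal of Bible and Theology",
            "publicationType": "Journal",
            "providerName": "Proquest Info & Learning Co",
            "customCoverages": [],
            "managedCoverages": [],
        },
    }
}

print(pick_attributes(sample_resource, RESOURCE_PATHS))
```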

We also assumed that using the holdings table could simplify loading, but some of the required properties are absent from it.

Here is an example of loaded holdings:

publicationTitle=Interpretation: A Journal of Bible and Theology, 
printIdentifier=0020-9643, 
onlineIdentifier=2159-340X, 
dateFirstIssueOnline=1994-01-01, 
numFirstVolOnline=, 
numFirstIssueOnline=, 
dateLastIssueOnline=2014-10-01, 
numLastVolOnline=, 
numLastIssueOnline=, 
titleUrl=https://search.proquest.com/publication/41487, 
titleId=968683, 
embargoInfo=, 
coverageDepth=, 
notes=, 
publisherName=SAGE Publications, 
publicationType=serial, 
dateMonographPublishedPrint=, 
dateMonographPublishedOnline=, 
monographVolume=, 
monographEdition=, 
firstEditor=, 
parentPublicationTitleId=, 
precedingPublicationTitleId=, 
accessType=P, 
packageName=Research Library, 
packageId=4643, 
vendorName=Proquest Info & Learning Co, 
vendorId=22, 
resourceType=Journal

publicationTitle=Advances in Computer Science, Intelligent System and Environment, 
printIdentifier=978-3-642-23776-8, 
onlineIdentifier=978-3-642-23777-5, 
dateFirstIssueOnline=, 
numFirstVolOnline=, 
numFirstIssueOnline=, 
dateLastIssueOnline=, 
numLastVolOnline=, 
numLastIssueOnline=, 
titleUrl=https://link.springer.com/10.1007/978-3-642-23777-5, 
titleId=968675, 
embargoInfo=, 
coverageDepth=, 
notes=, 
publisherName=Springer Berlin Heidelberg, 
publicationType=monograph, 
dateMonographPublishedPrint=2011, 
dateMonographPublishedOnline=2011, 
monographVolume=, 
monographEdition=, 
firstEditor=, 
parentPublicationTitleId=, 
precedingPublicationTitleId=, 
accessType=P, 
packageName=Springer eBooks (Engineering 2011), 
packageId=4769, 
vendorName=Springer Nature, 
vendorId=36, 
resourceType=Book

It seems that two different parameters are used for the managedCoverages dates, dateMonographPublishedPrint and dateFirstIssueOnline, and it is not clear which property is used for customCoverages, if it is present at all. The titleCount property is also missing. So the holdings table most likely cannot be used as the source of truth.
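The gap can be checked mechanically with a sketch like the following. It parses a shortened version of the first holding above and reports which attributes have no usable holdings field. The candidate mapping is an assumption: only the managedCoverages, customCoverages, and titleCount entries follow from the observations above, and the remaining pairs are guesses based on field names.

```python
import re

# Assumed mapping from mod-agreements attributes to candidate holdings fields.
# The name, providerName and publicationType pairs are guesses by field name.
CANDIDATE_FIELDS = {
    "publicationType": ["publicationType"],
    "name": ["publicationTitle"],
    "providerName": ["vendorName"],
    "titleCount": [],
    "customCoverages": [],
    "managedCoverages": ["dateFirstIssueOnline", "dateMonographPublishedPrint"],
}

# Shortened copy of the first holding shown above.
HOLDING_TEXT = (
    "publicationTitle=Interpretation: A Journal of Bible and Theology, "
    "printIdentifier=0020-9643, dateFirstIssueOnline=1994-01-01, "
    "numFirstVolOnline=, dateLastIssueOnline=2014-10-01, "
    "publicationType=serial, dateMonographPublishedPrint=, "
    "packageName=Research Library, vendorName=Proquest Info & Learning Co, "
    "resourceType=Journal"
)


def parse_holding(text):
    """Parse 'key=value, key=value' text; empty values become None."""
    holding = {}
    # Split only on commas that are followed by the next 'key=' token,
    # so commas inside a value (for example in a title) are preserved.
    for piece in re.split(r",\s*(?=\w+=)", text.strip()):
        key, _, value = piece.partition("=")
        holding[key.strip()] = value.strip().rstrip(",").strip() or None
    return holding


def unresolved_attributes(holding):
    """List attributes for which no candidate holdings field has a value."""
    return [
        attribute
        for attribute, fields in CANDIDATE_FIELDS.items()
        if not any(holding.get(field) for field in fields)
    ]


print(unresolved_attributes(parse_holding(HOLDING_TEXT)))
```

For this holding the unresolved attributes are titleCount and customCoverages, which matches the conclusion above.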

Jira issues created: MODKBEKBJ-385, MODKBEKBJ-386

 Sequence Diagram


Related links: