OAI-PMH Support (UXPROD-993)

[EDGOAIPMH-9] OAI-PMH: Flow Control/Throttling Created: 28/Sep/18  Updated: 14/Nov/18  Resolved: 08/Nov/18

Status: Closed
Project: edge-oai-pmh
Components: None
Affects versions: None
Fix versions: None
Parent: OAI-PMH Support

Type: Story Priority: P3
Reporter: Hkaplanian Assignee: Unassigned
Resolution: Won't Do Votes: 0
Labels: epam-thunderjet
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original estimate: Not Specified

Issue links:
Relates
relates to UXPROD-350 OAI-PMH Support Closed
Sprint: oai-pmh - sprint 51
Development Team: Thunderjet
Epic Link: OAI-PMH Support

 Description   

Official specification: http://www.openarchives.org/OAI/2.0/guidelines-repository.htm#FlowControl

The OAI-PMH repository implementation guide (link above) describes a method of throttling harvester requests: the repository responds with HTTP status 503 and a Retry-After header telling the harvester how long to wait. It's pretty low-tech, but we should implement it (at the edge) in case some harvesters expect or rely on this sort of feedback.

  • Make the "backoff" time configurable
  • Make the threshold at which we start returning 503s configurable as well.
  • Both of these values should eventually be configurable per tenant (e.g. via mod-configuration), falling back to system properties (as non-tenant-specific defaults), and finally to reasonable hard-coded values if neither is available.
  • There must be a way to disable this feature via configuration.
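
A minimal sketch of the lookup order described above (tenant configuration, then system properties, then hard-coded values), including the disable switch. Everything named here is an assumption for illustration: the property keys, the default values, and the use of an in-flight request count as the threshold metric are hypothetical, since the ticket does not define them. The tenant map stands in for whatever mod-configuration would return.

```java
import java.util.Map;
import java.util.OptionalInt;

final class FlowControlSettings {
    // Hypothetical property keys; the ticket does not name any.
    static final String ENABLED_KEY = "flowControl.enabled";
    static final String THRESHOLD_KEY = "flowControl.threshold";
    static final String BACKOFF_KEY = "flowControl.backoffSeconds";

    // Hypothetical hard-coded fallbacks, used when neither the tenant
    // configuration nor a system property supplies a value.
    static final String DEFAULT_ENABLED = "true";
    static final String DEFAULT_THRESHOLD = "100";
    static final String DEFAULT_BACKOFF_SECONDS = "30";

    private FlowControlSettings() {}

    // Resolution order: tenant value, then system property, then hard-coded default.
    static String resolve(String key, Map<String, String> tenantConfig, String hardCodedDefault) {
        String tenantValue = tenantConfig.get(key);
        if (tenantValue != null) {
            return tenantValue;
        }
        return System.getProperty(key, hardCodedDefault);
    }

    // Returns the Retry-After value (in seconds) to send with a 503,
    // or empty when the request should be processed normally.
    // "inFlight" is an assumed load metric, not something the ticket specifies.
    static OptionalInt retryAfterSeconds(int inFlight, Map<String, String> tenantConfig) {
        boolean enabled = Boolean.parseBoolean(resolve(ENABLED_KEY, tenantConfig, DEFAULT_ENABLED));
        if (!enabled) {
            return OptionalInt.empty();
        }
        int threshold = Integer.parseInt(resolve(THRESHOLD_KEY, tenantConfig, DEFAULT_THRESHOLD));
        if (inFlight < threshold) {
            return OptionalInt.empty();
        }
        int backoff = Integer.parseInt(resolve(BACKOFF_KEY, tenantConfig, DEFAULT_BACKOFF_SECONDS));
        return OptionalInt.of(backoff);
    }
}
```

When retryAfterSeconds returns a value, the edge handler would answer with status 503 and put that value in the Retry-After header. Setting the enabled key to false at either the tenant or the system-property level switches the feature off.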


 Comments   
Comment by Pavel Korolenok [ 08/Nov/18 ]

Hi Craig McNally,

I think we need to work out the details of when the edge service should return 503, specifically what would indicate that the service is overloaded. A couple of options to begin the discussion:
1. When the request to mod-oai-pmh times out
Currently we return a 408 status code in this case, but we could return 503 on the assumption (because we don't really know why) that the time-out happened due to high load
2. Use a circuit breaker to count the unexpected errors returned by mod-oai-pmh and return 503 once a certain threshold is reached
This is a questionable option, because unexpected errors are not an indicator that the service is overloaded

Let's discuss it at the grooming meeting.
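
Option 1 could be sketched roughly as follows, assuming the call to mod-oai-pmh is represented by a CompletableFuture. The class, the Response type, and the parameter names are hypothetical and not taken from edge-oai-pmh; the sketch only shows a time-out being translated into 503 with a Retry-After value instead of 408.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class TimeoutToUnavailable {
    // Hypothetical response holder: status code, optional Retry-After value, optional body.
    static final class Response {
        final int status;
        final String retryAfter;
        final String body;

        Response(int status, String retryAfter, String body) {
            this.status = status;
            this.retryAfter = retryAfter;
            this.body = body;
        }
    }

    private TimeoutToUnavailable() {}

    // Waits for the upstream call; a time-out is reported as 503 plus Retry-After,
    // on the (unverified) assumption that it was caused by high load.
    static Response call(CompletableFuture<String> upstream, long timeoutMillis, int backoffSeconds) {
        try {
            String body = upstream.orTimeout(timeoutMillis, TimeUnit.MILLISECONDS).join();
            return new Response(200, null, body);
        } catch (CompletionException e) {
            if (e.getCause() instanceof TimeoutException) {
                return new Response(503, String.valueOf(backoffSeconds), null);
            }
            // Other failures are not treated as a sign of overload.
            return new Response(500, null, null);
        }
    }
}
```

This keeps the distinction raised in option 2: only time-outs are mapped to 503, while other unexpected errors are reported separately.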

Comment by Craig McNally [ 08/Nov/18 ]

After discussing with Hkaplanian we decided to scrap this.

  • If the community feels that request quotas/throttling is something that's important enough to build into FOLIO (as opposed to leaving it up to the party hosting the FOLIO instance), it should probably be handled in a common/central place, i.e. OKAPI.
  • The solution described in the implementation guide, while I'm sure it was more than reasonable at one time, seems like a kludge in this day and age.
  • API management software (e.g. Red Hat's 3scale) is more likely to be used in enterprise installations of FOLIO, making this rudimentary throttling obsolete in many cases.

So, I'm closing this as "Won't Do".
