[FOLIO-1747] include edge-oai-pmh in folio-snapshot/testing Created: 28/Jan/19  Updated: 03/Jun/20  Resolved: 01/Apr/19

Status: Closed
Project: FOLIO
Components: None
Affects versions: None
Fix versions: None

Type: Task Priority: P2
Reporter: Jakub Skoczen Assignee: Ian Hardy
Resolution: Done Votes: 0
Labels: platform-backlog
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original estimate: Not Specified

Issue links:
Blocks
blocks FOLIO-1630 include "edge" modules in daily snaps... Closed
is blocked by EDGCOMMON-19 ApiKeyUtils jar missing dependencies Closed
is blocked by EDGOAIPMH-28 Upgrade to edge-common v2.0.0 Closed
is blocked by FOLIO-1758 Add edge module mod descriptors to re... Closed
is blocked by FOLIO-1759 Create folio-ansible roles for deploy... Closed
Cloners
is cloned by FOLIO-1748 include edge-rtac in folio-snapshot/t... Closed
Sprint: Core: Platform - Sprint 60
Story Points: 3
Development Team: Core: Platform

 Description   

Deploy edge-oai-pmh in folio-snapshot and folio-testing.

The module includes a ModuleDescriptor and a LaunchDescriptor but is non-standard in several ways:

  • the module does not register any paths (interface) with Okapi
  • the module may require a fixed port (not assigned by Okapi), and that port should be exposed on the firewall
  • OR the firewall should be configured to route traffic based on a fixed path


 Comments   
Comment by Jakub Skoczen [ 29/Jan/19 ]

Hongwei Ji can you please provide information about how the modules are deployed at EBSCO right now?

Comment by Jakub Skoczen [ 29/Jan/19 ]

John Malconian can you please fix up the description according to what we discussed during stand up?

Comment by Hongwei Ji [ 29/Jan/19 ]

Sure, Jakub Skoczen and John Malconian. We deploy all edge modules behind one dedicated load balancer (different from the one used by Okapi) for separation and scalability. To route traffic to the different edge modules, we use path-based routing. For now we have the following rules defined:

If request path is /prod/rtac/*, then forward to edge-rtac module;
If request path is /patron*, then forward to edge-patron module;
If request path is /orders*, then forward to edge-orders module;
If request path is /oai*, then forward to edge-oai-pmh module;
If request path is /resolve/*, then forward to edge-resolver module;
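
The rules above amount to a first-match lookup from a path pattern to a module. A minimal sketch in Python, assuming shell-style glob semantics for `*` (the actual load balancer's matching rules may differ); `ROUTES` and `route` are hypothetical names used only for illustration:

```python
from fnmatch import fnmatchcase
from typing import Optional

# Path patterns and target modules, copied from the rules listed above.
ROUTES = [
    ("/prod/rtac/*", "edge-rtac"),
    ("/patron*", "edge-patron"),
    ("/orders*", "edge-orders"),
    ("/oai*", "edge-oai-pmh"),
    ("/resolve/*", "edge-resolver"),
]


def route(path: str) -> Optional[str]:
    """Return the module for the first matching pattern, or None if no rule matches."""
    for pattern, module in ROUTES:
        if fnmatchcase(path, pattern):
            return module
    return None
```

Under these assumptions, `route("/oai")` selects edge-oai-pmh, while a path that matches no rule returns `None` and would not be forwarded.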

Currently, edge modules need to convert the API key to a FOLIO institutional user name and password, so you have to store a copy of the username/password somewhere outside FOLIO. This is not ideal. Maybe we can consider enhancing FOLIO/Okapi to support API key authentication directly. Thanks.

Comment by John Malconian [ 29/Jan/19 ]

The load balancer configuration that Hongwei describes above is also what I was thinking - path-based routing utilizing the endpoints provided by the edge modules. I was thinking of deploying an nginx container configured with the path-based routes.

Additional information about credentials and other configuration options the edge modules require can be found at:

https://github.com/folio-org/edge-common

For folio-snapshot/testing, perhaps the EphemeralStore option (external config file) is the easiest.

Comment by Jakub Skoczen [ 29/Jan/19 ]

Hongwei Ji If we did the apiKey authentication in Okapi, we would need to put the module behind it (as with a regular module); otherwise the authentication could not be enforced. Another solution would be to let the edge module create the institutional user at the point when Okapi informs it about a new tenant (see my last comment in FOLIO-1713).

Comment by Hongwei Ji [ 29/Jan/19 ]

Hi Jakub Skoczen, when you say auth could not be enforced, do you mean by Okapi? When a client makes a request with an API key to an edge module, the edge module can check with Okapi to see whether the key is valid. If not, the request can be rejected. Does that count as enforced?

Comment by Jakub Skoczen [ 30/Jan/19 ]

Hongwei Ji My understanding is that the "edge" modules in question are structured like this:

  • proxy module running in front of Okapi (edge-rtac, edge-oai-pmh)
  • regular module that implements the external protocol (mod-rtac, mod-oai-pmh)

What would be the purpose of the edge- part if we added apiKey authentication (and key provisioning) to Okapi?

Comment by Hongwei Ji [ 30/Jan/19 ]

I do not want to get into the debate about edge module purpose/design. What I am trying to say is that API key authentication seems to be a generic feature, and Okapi/FOLIO should consider supporting it. Also, current edge modules have to store the FOLIO password somewhere, which is kind of redundant, don't you think, Jakub Skoczen? And if someone changes the password, the client of the edge module will get an error even though the API key has not changed.

Comment by Wayne Schneider [ 30/Jan/19 ]

For the AWS environment builds, should we expose a separate port for edge modules? Another design would be to run a single nginx that serves the stripes bundle, proxies Okapi, and proxies the edge modules, all on a single port.

Comment by Wayne Schneider [ 30/Jan/19 ]

It looks like the modules' endpoints are documented in the RAML for the module (which is sensible)... are those endpoints part of the external API definition? That is, for example, is it required that the OAI-PMH service be exposed at the path /oai, or could we use a common prefix for all the paths, like /edge/oai, /edge/rtac, etc.?

Comment by Hongwei Ji [ 30/Jan/19 ]

Yes, they are, Wayne Schneider. That's why putting everything behind one endpoint is difficult: there are potential path conflicts.

Comment by Wayne Schneider [ 30/Jan/19 ]

OK, so here is one way to get this done for the environment builds (thanks David Crossley for a great conversation to work this through):

  • Update the security settings for our environment builds to expose edge modules on an external port (any opinions about what port?)
  • Create a role in folio-ansible to set up nginx as a load balancer in front of the edge modules (could be container or host-based...given the configuration flexibility required, I would suggest that host-based would be more convenient)
  • Create a role in folio-ansible to deploy and configure edge modules. This role would need to:
      • Query Okapi to determine the dependencies of the edge module, then deploy and enable those dependencies for the tenant
      • Pull the Docker image for the edge module, configure, and deploy (outside Okapi), assigning a port
      • Configure nginx to proxy for the edge module
      • Enable the edge module for the tenant and assign permissionSets to admin user (assuming there may be permissionSets that need to be loaded)

How do edge modules work in a multi-tenant environment? Does there need to be a different instance of each edge module for each tenant?

Comment by Hongwei Ji [ 30/Jan/19 ]

Wayne Schneider, all tenants will share the same edge module(s). Why would we need a different instance for each tenant?

Comment by Wayne Schneider [ 30/Jan/19 ]

Hongwei Ji – How does the edge module know which tenant header to pass for any given request it receives?

Comment by Hongwei Ji [ 30/Jan/19 ]

Wayne Schneider, the tenant info is in the API key. When receiving a request with API key, edge modules decrypt the key to know tenant and user, and then do a look up to get password from vault.

Comment by Craig McNally [ 04/Mar/19 ]

See https://github.com/folio-org/edge-common/releases/tag/v2.0.1

$ java -jar target/edge-common-api-key-utils.jar -g -t diku -u diku_admin
eyJzIjoiYlNXMkZhRWpMaSIsInQiOiJkaWt1IiwidSI6ImRpa3VfYWRtaW4ifQ==

$ java -jar target/edge-common-api-key-utils.jar -p eyJzIjoiYlNXMkZhRWpMaSIsInQiOiJkaWt1IiwidSI6ImRpa3VfYWRtaW4ifQ==
Salt: bSW2FaEjLi
Tenant ID: diku
Username: diku_admin

$ echo eyJzIjoiYlNXMkZhRWpMaSIsInQiOiJkaWt1IiwidSI6ImRpa3VfYWRtaW4ifQ== | base64 --decode
{"s":"bSW2FaEjLi","t":"diku","u":"diku_admin"}
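
The last command shows that a key in this format is plain base64 over a small JSON object with the fields `s`, `t`, and `u`. A minimal Python sketch of the same parsing, assuming the key layout shown in the output above (other edge-common versions may use a different layout); `parse_api_key` is a hypothetical helper and not part of edge-common:

```python
import base64
import json


def parse_api_key(api_key: str) -> dict:
    """Decode an API key of the form shown above into salt, tenant, and username."""
    # The key is base64-encoded JSON such as {"s": ..., "t": ..., "u": ...}.
    payload = json.loads(base64.b64decode(api_key))
    return {
        "salt": payload["s"],
        "tenant": payload["t"],
        "username": payload["u"],
    }
```

Applied to the key generated above, this yields the same salt, tenant ID, and username that the `-p` option prints.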

Comment by Jakub Skoczen [ 11/Mar/19 ]

Craig McNally this issue remains blocked on EDGOAIPMH-28, but it looks like the work there is almost complete. Is there an ETA? I would like to understand if we can put it into the sprint.

Comment by Craig McNally [ 11/Mar/19 ]

Should be done now. See https://github.com/folio-org/edge-oai-pmh/releases/tag/v2.0.0

Comment by Ian Hardy [ 27/Mar/19 ]

This is queued up for tonight's builds. edge-oai-pmh will be available at:

http://folio-snapshot.aws.indexdata.com:8000/oai
http://folio-testing-backend01.aws.indexdata.com:8000/oai

example:

http://folio-snapshot.aws.indexdata.com:8000/oai?apikey=eyJzIjoiNXNlNGdnbXk1TiIsInQiOiJkaWt1IiwidSI6ImRpa3UifQ==&verb=Identify
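
For scripted checks, a request URL like the example above can be assembled from its parts. A minimal Python sketch, assuming the snapshot base URL given in this comment; `oai_url` is a hypothetical helper. Note that `urlencode` percent-encodes the `=` padding of the key as `%3D`, which a server that decodes query strings in the standard way should treat as equivalent to the literal form:

```python
from urllib.parse import urlencode

# Base URL taken from the comment above.
BASE_URL = "http://folio-snapshot.aws.indexdata.com:8000/oai"


def oai_url(api_key: str, verb: str, **params: str) -> str:
    """Build an OAI-PMH request URL with the API key passed as a query parameter."""
    query = {"apikey": api_key, "verb": verb, **params}
    return f"{BASE_URL}?{urlencode(query)}"
```

Extra OAI-PMH arguments such as `identifier` or `metadataPrefix` can be passed as keyword arguments.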

Comment by Craig McNally [ 27/Mar/19 ]

I'm not sure how traffic is routed/load balanced/proxied through these deployments, but I just ran into something yesterday which might also impact this...

mod-oai-pmh's GET /oai/records/{id} endpoint requires that the ID portion be URLencoded, so you'll need to be careful that proxied requests aren't unencoding that before it gets to the module.

In my case I had to make adjustments to my NGINX config to make this work. This should only be a problem if the reverse proxy is between edge-oai-pmh and mod-oai-pmh.

Let's see how it goes once this is up tomorrow and if you need help with this I can share some notes with you.
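
To illustrate the encoding requirement described above: the record identifier has to be percent-encoded as a single path segment, including `:` and `/`. A minimal Python sketch using an illustrative arXiv-style identifier; `record_path` is a hypothetical helper, not code from either module:

```python
from urllib.parse import quote, unquote


def record_path(identifier: str) -> str:
    """Build the /oai/records/{id} path with the ID fully percent-encoded."""
    # safe="" forces ':' and '/' to be encoded too, so the ID stays one path segment.
    return "/oai/records/" + quote(identifier, safe="")


encoded = record_path("oai:arXiv.org:cs/0112017")
# A proxy that decodes the path would turn %2F back into '/', splitting the ID
# across two path segments before the request reaches the module.
decoded = unquote(encoded)
```

Comparing `encoded` and `decoded` shows what a decoding proxy would change in the request path.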

Comment by Ian Hardy [ 28/Mar/19 ]

I think I see the issue you're describing in the logs for the mod-oai-pmh container

28 Mar 2019 13:00:06:939 INFO LogUtil [34173415eqId] 10.36.1.153:40750 GET /oai/records/oai%3AarXiv.org%3Acs%2F0112017 identifier=oai:arXiv.org:cs/0112017&metadataPrefix=oai_dc HTTP_1_1 400 734 21 tid=diku Bad Request

Nginx is unencoding the request before it gets to edge-oai-pmh (but isn't between the two modules). I'll look at the config, and I won't say no to seeing your notes if they're handy.

Comment by Craig McNally [ 28/Mar/19 ]

See https://stackoverflow.com/questions/28684300/nginx-pass-proxy-subdirectory-without-url-decoding/37584637#37584637

That's probably more helpful than me regurgitating the same information

Comment by Ian Hardy [ 29/Mar/19 ]

Looking at this a bit more carefully

mod-oai-pmh's GET /oai/records/{id} endpoint requires that the ID portion be URLencoded, so you'll need to be careful that proxied requests aren't unencoding that before it gets to the module.

I think the IDs are making it to mod-oai-pmh intact (see the id in the response here) http://folio-snapshot.aws.indexdata.com:8000/oai/?verb=GetRecord&identifier=oai:folio.org:diku/fb857902-3ab2-4c34-9772-14ad7acdfe76&metadataPrefix=oai_dc&apikey=eyJzIjoiNXNlNGdnbXk1TiIsInQiOiJkaWt1IiwidSI6ImRpa3UifQ==

This is from the mod-oai-pmh log. In the get, the ID is URLencoded

29 Mar 2019 18:34:33:077 INFO  LogUtil [61852548eqId] 10.36.1.135:34994 GET /oai/records/oai%3Afolio.org%3Adiku%2Ffb857902-3ab2-4c34-9772-14ad7acdfe76 identifier=oai:folio.org:diku/fb857902-3ab2-4c34-9772-14ad7acdfe76&metadataPrefix=oai_dc HTTP_1_1 404 772 39 tid=diku Not Found