[FOLIO-2563] SPIKE: propose prevention of DoS via CQL query Created: 14/Apr/20 Updated: 13/Jul/21 Resolved: 04/Jun/20 |
|
| Status: | Closed |
| Project: | FOLIO |
| Components: | None |
| Affects versions: | None |
| Fix versions: | None |
| Type: | Task | Priority: | P2 |
| Reporter: | Jakub Skoczen | Assignee: | Mikhail Fokanov |
| Resolution: | Done | Votes: | 0 |
| Labels: | platform-backlog, security | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original estimate: | Not Specified | ||
| Issue links: |
|
||||||||||||||||||||||||||||||||||||||||
| Sprint: | CP: sprint 87, CP: sprint 90, CP: sprint 88, CP: sprint 89 | ||||||||||||||||||||||||||||||||||||||||
| Story Points: | 5 | ||||||||||||||||||||||||||||||||||||||||
| Development Team: | Core: Platform | ||||||||||||||||||||||||||||||||||||||||
| Description |
|
Description Several application endpoints accessible by normal users perform CQL (Common Query Language) queries leveraging CQL-Java. CQL is a formal language for representing queries to information retrieval systems such as web indexes, bibliographic catalogs and museum collection information.1 The application can be forced to perform an OR expression over objects that match a certain patterns. As a result, the CQL engine is forced to do very expensive queries to search and parse the result of the operations. Although it appears that the full result of the operation is filtered downstream when returned to the user, It was noticed that each request to the target endpoint would take around 29 seconds to respond back. An attacker can exploit this by performing a denial of service attack by requesting several GET requests simultaneously or in short intervals, resulting in the exhaustion of the server’s memory resources. Steps to reproduce Perform the following request with the thread count set to 20 and throttle time set to 1 milliseconds. Substitute the value of X-Okapi-Token header with a token belonging to a user that has inventory.instances.collection.get permission. GET /inventory/instances?limit=1000&query=%28keyword%20all%20%22ncc%22%29%20or%201=1 Observe that all inventory related endpoints will become unresponsive Acceptance criteria Propose an approach that will prevent heavy queries to cause a module to become unresponsive. Provide a PoC implementation for review. Potential solutions: Query validation based:
Query execution based:
|
| Comments |
| Comment by Craig McNally [ 16/Apr/20 ] |
|
Which ever approach we take, we need to be very careful not to break existing queries throughout the system. For instance, if you start limiting query length or the number of OR'd terms, queries that ask for a bunch of specific records via UUID might break |
| Comment by Julian Ladisch [ 16/Apr/20 ] |
|
We may use pg_cancel_backend to cancel a long-running query. (Don't use pg_terminate_backend as it may result in an inconsistent state.) |
| Comment by Mikhail Fokanov [ 07/May/20 ] |
|
queryTimeout parameter can be specified (https://vertx.io/docs/vertx-mysql-postgresql-client/java/) for postgres connection in getPostgreSQLClientConfig. In such case we can have 2 pools of connections: without execution time restriction and without it. |
| Comment by Marc Johnson [ 07/May/20 ] |
What would the two different pools be used for? I'm guessing the time-limited pool would be used for queries. What would the non-time-limited pool be used for? |
| Comment by Mikhail Fokanov [ 07/May/20 ] |
|
There could be still some requests from another modules, that should take more than 5 sec by design. For them it could be used. I'm not aware is it the case for now. |
| Comment by Julian Ladisch [ 07/May/20 ] |
|
The vertx-mysql-postgresql-client has the queryTimeout connection parameter: https://vertx.io/docs/vertx-mysql-postgresql-client/java/#_configuration A vertx-pg-client connection can cancel the query using the cancelRequest method ( https://vertx.io/docs/vertx-pg-client/java/#_cancelling_request ):
connection
.query("SELECT pg_sleep(20)")
.execute(ar -> { ... });
connection
.cancelRequest(ar -> { ... });
|
| Comment by Mikhail Fokanov [ 11/May/20 ] |
|
Julian Ladisch, I see. In such case, would it be efficient to implement thread in PostgresClient, which will once per 5 seconds verify whether there are requests that live for more than 5 seconds and kill them? In such case any request will be terminated within range of [5,10] seconds. Long running requests (by design) would be added to that list. There also could be separate url prefix for internal REST calls, which is not visible outside of okapi. Requests for such urls can have long-running queries. Also commands, that use not standard RMB CQL query processing would be not added to the list. |
| Comment by Julian Ladisch [ 11/May/20 ] |
|
https://github.com/vert-x3/vertx-mysql-postgresql-client says: "This client is deprecated - use instead https://github.com/eclipse-vertx/vertx-sql-client", vertx-sql-client contains https://github.com/eclipse-vertx/vertx-sql-client/tree/master/vertx-pg-client |
| Comment by Marc Johnson [ 03/Jul/20 ] |
|
This issue is closed, could someone please summarise what decision was made. |
| Comment by Julian Ladisch [ 11/Aug/20 ] |
|
Solution:
This CQL based denial of service attack has low severity: It requires a valid staff login. |