[FOLIO-1794] SPIKE: db connection provision and pooling Created: 08/Feb/19  Updated: 15/Nov/22

Status: Open
Project: FOLIO
Components: None
Affects versions: None
Fix versions: None

Type: Task Priority: P3
Reporter: Jakub Skoczen Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: platform-backlog
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original estimate: Not Specified

Issue links:
Relates
relates to RMB-664 PoC: implement one connection pool fo... Closed
relates to FOLIO-1935 Service creating ROLE and SCHEMA on t... Draft
relates to RMB-680 DB connection pool utilization monito... Open
relates to FOLIO-2031 Remove tenant name and mod_ prefix fr... Open
relates to UXPROD-3209 Implement one connection pool per RMB... Open
relates to RMB-663 patch idle connection release in the ... Closed
relates to RMB-739 patch idle connection release in the ... Closed
relates to UXPROD-1819 FOLIO database architecture improvements Draft
Sprint: CP: ready for planning, CP: Roadmap backlog
Story Points: 5
Development Team: Core: Platform

 Description   

When a module is deployed it gets DB connection params (user/pass/db) in environment variables. In practice, this means a given module instance is able to connect to a single DB instance only. RMB based modules use this database user/role to create a separate schema for each tenant and a dedicated user/role to access this schema..

The main benefit of this approach is that it allows for a single connection pool (Postgres connections cost ~ 10mb) to be maintained for the module instance, across all tenants.

The disadvantage, is that it does not provide access isolation between tenants – e.g a connection retrieved from the pool by tenant A has rights to access schema owned by tenant B. It's not currently possible, for a selected tenant to use different connection params and, in effect, connect to a physically separate database instance or constrain access to a particular schema only.

We need to consider how to improve this to allow for better access isolation.

Factors to account for:

  • scalability (e.g ability to share connections, ability to shard, ability to load balance with tools like PGBouncer)
  • security (tenant isolation)
  • deployment flexibility (e.g due to data privacy requirements)

Notes: FOLIO uses schema-level tenant isolation. Postgres allows for multiple schemas per database.

For each combination of tenant and module RMB creates a separate schema and a dedicated user/role for that schema. Currently RMB has a pool for each combination of tenant and module, and a database connection is NOT reused for a different tenant. Changing the user requires re-establishing a new connection.

However, RMB might implement reusing connections across tenants. PostgreSQL schemas can be authorized to specific roles (users) but to avoid resetting connections, the connecting user needs all roles to be granted and needs to escalate/de-escalte its role when executing statements

or

(similar to sudo). This obviously does not protect against malicious modules wanting to perform cross-tenant modifications but is a good protection from unintended DB modifications.

Managing connections in Postgres: https://brandur.org/postgres-connections
Switching roles after the connection has been established: https://stackoverflow.com/questions/2998597/switch-role-after-connecting-to-database
Connection pooling in Postgres: https://www.citusdata.com/blog/2017/05/10/scaling-connections-in-postgres/
Tuning max_connections in PostgreSQL: https://www.cybertec-postgresql.com/en/tuning-max_connections-in-postgresql/



 Comments   
Comment by Julian Ladisch [ 11/Feb/19 ]

If an attacker can switch the role that attack continues to be successful when we reuse connections only for the same tenant and not for other tenants. We don't get more security but require more connections.

Comment by Julian Ladisch [ 15/Feb/19 ]

This is the unit test that checks that the schema-level tenant isolation works: https://github.com/folio-org/raml-module-builder/blob/v23.5.0/domain-models-runtime/src/test/java/org/folio/rest/persist/PostgresClientIT.java#L373-L387

Comment by Marc Johnson [ 14/Dec/20 ]

The standard way DB connections are set up in FOLIO at this point (when using RMB) is that a single module instance shares the DB connection params (user/pass/db) across all tenants.

I thought that RAML Module Builder generated a user for each tenant and used that to connect to the database based upon the tenant provided in the API request.

Have I misunderstood that or has that changed?

Comment by Julian Ladisch [ 11/Aug/21 ]

No, RMB creates a user for each tenant and uses that to connect to the database.

Currently the database connection is used by a single database user only.

This Jira suggest to allow a single database connection to be re-used by different database users.

Comment by Marc Johnson [ 11/Aug/21 ]

Julian Ladisch Thank you for responding to my question.

Let me try to check my understanding of the current behaviour:

  • RMB creates a new database user for each tenant (using the provided admin database user)
  • Those tenant specific database users are used when making database requests for a specific tenant
  • Each database connection only uses one of these tenant specific users, meaning that connections are not shared between tenants (and connection pools are per tenant)

What parts of that have I misunderstood?

Comment by Julian Ladisch [ 11/Aug/21 ]

None.

You've asked:

Have I misunderstood that or has that changed?

I've answered "No" to both parts of your questions.

 

Comment by Marc Johnson [ 11/Aug/21 ]

Julian Ladisch

None.

Ok. Thank you for clarifying my understanding. Apologies for the inconvenience and the confusion caused by the original comment.

Comment by Julian Ladisch [ 11/Aug/21 ]

I've reworded the first paragraph of this issue's description to avoid some misunderstanding it may have caused.

Comment by Marc Johnson [ 11/Aug/21 ]

Julian Ladisch

I've reworded the first paragraph of this issue's description to avoid some misunderstanding it may have caused.

Thank you, I appreciate that, it reads to me much better and more specifically now.

Comment by Julian Ladisch [ 15/Nov/22 ]

The "SET ROLE" solution has been implemented by RMB-664 Closed = UXPROD-3209 Open and can be enabled by setting DB_MAXSHAREDSPOOLSIZE env variable.

Generated at Thu Feb 08 23:15:57 UTC 2024 using Jira 1001.0.0-SNAPSHOT#100246-sha1:7a5c50119eb0633d306e14180817ddef5e80c75d.