Splitting Database Read/Write Traffic

Overview

Typical FOLIO environments are deployed with one database node that hosts all of the storage modules' schemas and handles both reads and writes. While the modules' services can be scaled out horizontally, the database cannot (or at least not easily). Even when the modules are scaled out, all traffic eventually reaches the database, creating a bottleneck. When this happens, a workload like Data Import that consumes nearly all of the database's CPU overwhelms other operations, such as circulation, and their performance suffers severely: check-out and check-in response times can more than double under these circumstances.

The AWS-hosted FOLIO environments that PTF uses have a read node attached to the write database node for failover purposes. 99% of the time, the read node sits idle. To use the DB nodes more efficiently, and more importantly to address the competing-resources problem, PTF together with FSE explored various proxy options such as PgBouncer and Pgpool to split read and write traffic automatically, but none of them worked out. In 2022 PTF took a stab at implementing read/write traffic splitting in RMB itself, per PERF-348.

Design

By default, the current RMB implementation maintains one pool of connections for each storage module; these connections read from and write to the default (write) node. The R/W split implementation merely adds a second connection pool that points to the read node. For a read operation, a connection is obtained from the read pool; for a write operation, or an operation that both reads and writes, a connection is obtained from the write pool (see the sketch below). To enable the R/W split, the following environment variables must be set:

  • DB_HOST_READER
  • DB_PORT_READER

If either of these variables is not set, only the write connection pool is created.
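
To make the design concrete, here is a minimal sketch of the pool-selection logic, written against the Vert.x PostgreSQL client that RMB builds on. The class and method names are illustrative, not RMB's actual API; the reader variables are the ones documented above, and the remaining DB_* variables are assumed to follow RMB's usual naming.

```java
import io.vertx.core.Future;
import io.vertx.core.Vertx;
import io.vertx.pgclient.PgConnectOptions;
import io.vertx.pgclient.PgPool;
import io.vertx.sqlclient.PoolOptions;
import io.vertx.sqlclient.Row;
import io.vertx.sqlclient.RowSet;

/** Illustrative sketch only; not RMB's actual implementation. */
public class SplitPgClient {
  private final PgPool writePool;
  private final PgPool readPool; // null when the R/W split is disabled

  SplitPgClient(Vertx vertx) {
    writePool = createPool(vertx, System.getenv("DB_HOST"), System.getenv("DB_PORT"));
    String readerHost = System.getenv("DB_HOST_READER");
    String readerPort = System.getenv("DB_PORT_READER");
    // Only create the read pool if both reader variables are set;
    // otherwise all traffic keeps going to the write node, as before.
    readPool = (readerHost != null && readerPort != null)
        ? createPool(vertx, readerHost, readerPort)
        : null;
  }

  private static PgPool createPool(Vertx vertx, String host, String port) {
    PgConnectOptions connect = new PgConnectOptions()
        .setHost(host)
        .setPort(Integer.parseInt(port))
        .setDatabase(System.getenv("DB_DATABASE"))
        .setUser(System.getenv("DB_USERNAME"))
        .setPassword(System.getenv("DB_PASSWORD"));
    return PgPool.pool(vertx, connect, new PoolOptions().setMaxSize(4));
  }

  /** Read-only statements may go to the read node when a read pool exists. */
  public Future<RowSet<Row>> select(String sql) {
    PgPool pool = (readPool != null) ? readPool : writePool;
    return pool.query(sql).execute();
  }

  /** Writes, and mixed read/write work, always use the write node. */
  public Future<RowSet<Row>> execute(String sql) {
    return writePool.query(sql).execute();
  }
}
```

The key point is that every statement passes through a single routing decision: read-only work may use the read pool when it exists, and everything else goes to the write pool.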

NB 1: The implementation's scope is limited to retrieving a connection from the appropriate pool. The physical wiring of the read node to the write node, and the syncing of data between the nodes, are left to the database technology.

NB 2: The connections and pools inside the modules (maintained by RMB) are in-memory objects that open a channel to a database node; the actual connections are held on the database nodes. The read node maintains its read-only DB connections (to the modules) and the write node maintains its write DB connections, so doubling the number of connections adds no significant memory footprint to the existing database nodes. The modules themselves, however, need twice the number of in-memory representations of the DB connections.

Implications

Broadly speaking, going from one logical server (read/write) to two logical servers (read-only and read/write) introduces more moving parts into the system, which makes it naturally harder to understand.

Development Complexity

Assuming we want to check that a module works with both configurations, we need to be able to exercise this capability in the module's automated tests. That means running two Postgres instances in the tests and having a way to synchronise them (see https://folio-org.atlassian.net/browse/RMB-938); a sketch of such a setup follows. Developers also need to be aware that requests could be going to different servers, which adds cognitive load and makes it harder to figure out why something has gone wrong.
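
As a rough illustration of what such a test setup could look like, the sketch below starts two Postgres containers with Testcontainers, one per role. How the two containers are synchronised is deliberately left out, since that is exactly the open question tracked in RMB-938; the class name and wiring are hypothetical.

```java
import org.junit.jupiter.api.AfterAll;
import org.junit.jupiter.api.BeforeAll;
import org.junit.jupiter.api.Test;
import org.testcontainers.containers.PostgreSQLContainer;

/** Hypothetical integration test with separate write and read nodes. */
class ReadWriteSplitIT {
  static PostgreSQLContainer<?> writer = new PostgreSQLContainer<>("postgres:16");
  static PostgreSQLContainer<?> reader = new PostgreSQLContainer<>("postgres:16");

  @BeforeAll
  static void start() {
    writer.start();
    reader.start();
    // Point the module at both nodes via the documented variables. In a
    // real test these would be injected into the module's configuration
    // rather than set as JVM system properties.
    System.setProperty("DB_HOST", writer.getHost());
    System.setProperty("DB_PORT", writer.getMappedPort(5432).toString());
    System.setProperty("DB_HOST_READER", reader.getHost());
    System.setProperty("DB_PORT_READER", reader.getMappedPort(5432).toString());
  }

  @AfterAll
  static void stop() {
    reader.stop();
    writer.stop();
  }

  @Test
  void readsGoToReaderNode() {
    // Write a row through the module, synchronise the two nodes, then
    // assert that a subsequent read (served by the read pool) sees the row.
  }
}
```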

There is also the possibility of stale data: a read served by the read node may not yet reflect a write that has been committed on the write node. Developers (and product folks) need to consider this when implementing features and think about how the system could behave under these conditions.
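
One common mitigation, shown in the hypothetical sketch below (reusing the SplitPgClient class from the Design section's sketch), is to route reads that must observe a preceding write to the write pool, trading load distribution for read-your-writes consistency.

```java
import io.vertx.core.Future;
import io.vertx.sqlclient.Row;
import io.vertx.sqlclient.RowSet;

/** Hypothetical read-your-writes guard; not part of RMB. */
class ReadYourWrites {
  static Future<RowSet<Row>> insertThenRead(SplitPgClient client,
                                            String insertSql, String selectSql) {
    return client.execute(insertSql)
        // The follow-up read also uses the write pool: the read node may
        // not have replayed the insert yet, so reading from it could
        // return stale data (the new row would appear to be missing).
        .compose(inserted -> client.execute(selectSql));
  }
}
```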

The configuration possibilities of the system have doubled, which may increase the amount of manual testing needed. There are also implementation-specific factors, e.g. needing to know how to access and manage multiple connection pools, or whether a given feature of the tooling uses the read or the write server (in RMB, an example would be the methods that are ambiguous in this respect, and what to do about them).

Operational Complexity

Operators need to be aware of many of the development aspects too, e.g. that there will be more module -> database server connections overall, which might impact system resources such as ports, firewall rules, etc. They also need to understand the trade-offs and impacts of the two approaches in order to make a choice for their environment. If they choose separate read and write servers, they must provision multiple servers and choose and run a mechanism for synchronising the data. The chosen mechanism may affect resource usage, the potential for staleness, and so on, and they might end up with an option that the system isn't routinely tested with.

Operators need the ability to monitor the synchronisation process and react when it fails (or, possibly more insidiously, slows down); a sketch of one such probe follows. They might also need to explain some of this to users of the system, e.g. when odd things happen because of stale data.
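
As an illustration, an operator of a PostgreSQL streaming-replication setup could probe lag on the read node with a query like the one in this sketch; the connection details and alert threshold are placeholders.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

/** Sketch of a replication-lag probe run against the read node. */
class ReplicationLagProbe {
  public static void main(String[] args) throws Exception {
    String url = "jdbc:postgresql://reader-host:5432/folio"; // placeholder
    try (Connection conn = DriverManager.getConnection(url, "folio_admin", "secret");
         Statement st = conn.createStatement();
         // On a standby, pg_last_xact_replay_timestamp() is the commit time
         // of the last transaction replayed from the primary. (It returns
         // NULL on a primary, and the difference overstates lag when the
         // primary has been idle.)
         ResultSet rs = st.executeQuery(
             "SELECT EXTRACT(EPOCH FROM now() - pg_last_xact_replay_timestamp())")) {
      rs.next();
      double lagSeconds = rs.getDouble(1);
      // Alert when the replica falls too far behind; the threshold here
      // is arbitrary and would be tuned per environment.
      if (lagSeconds > 30) {
        System.err.println("Replication lag is " + lagSeconds + "s - investigate");
      }
    }
  }
}
```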

Performance

Performance tests were carried out on the following workflows, and the results show overwhelmingly that the R/W split solution significantly improves performance when the database is under high stress.

Links