SPIKE: Multiple tenant DI testing - import jobs are hanging (MODSOURCE-581)
Multitenant configuration (2 tenants, 1 instance for every DI module, 1 partition for every DI topic):
Module | Configuration |
---|---|
mod-data-import | mod-data-import-2.8.0-SNAPSHOT.260; DB_MAXPOOLSIZE: '5'; JAVA_OPTIONS: -XX:MaxRAMPercentage=66.0 -Djava.util.logging.config.file=vertx-default-jul-logging.properties |
mod-source-record-manager | mod-source-record-manager-3.7.0-SNAPSHOT.762; DB_MAXPOOLSIZE: '15'; JAVA_OPTIONS: -XX:MaxRAMPercentage=66.0 -Djava.util.logging.config.file=vertx-default-jul-logging.properties; DB_RECONNECTINTERVAL: '1000'; DB_RECONNECTATTEMPTS: '3' |
mod-source-record-storage | mod-source-record-storage-5.6.3-SNAPSHOT.608594f; DB_MAXPOOLSIZE: '15'; JAVA_OPTIONS: -XX:MaxRAMPercentage=66.0 -Djava.util.logging.config.file=vertx-default-jul-logging.properties |
mod-inventory | mod-inventory-20.1.0-SNAPSHOT.607; DB_MAXPOOLSIZE: '5'; JAVA_OPTIONS: -XX:MaxRAMPercentage=85.0 -Dorg.folio.metadata.inventory.storage.type=okapi |
mod-inventory-storage | mod-inventory-storage-26.0.0; DB_MAXPOOLSIZE: '5'; JAVA_OPTIONS: -XX:MaxRAMPercentage=66.0 |
mod-di-converter-storage | mod-di-converter-storage-2.1.0-SNAPSHOT.9; DB_MAXPOOLSIZE: '5'; JAVA_OPTIONS: -XX:MaxRAMPercentage=66.0 -Djava.util.logging.config.file=vertx-default-jul-logging.properties |
With the default configuration, when importing 10k records in parallel, I sometimes saw imports terminate with timeout exceptions:
> SRM: 2023-04-13 12:00:00.944 [vert.x-worker-thread-11] ERROR PostgresClient Opening SQLConnection failed: Timeout
During the investigation on Folijet-PerfRancher and on other environments such as Bugfest and PTF, I noticed that imports usually get stuck when DI does not have enough resources.
In single-tenant mode all imports are handled sequentially (OCLC imports can be slotted into the processing of a large file, but the imports are still sequential).
In multi-tenant mode, imports from different tenants run in parallel, which increases the need for resources.
Imports stop getting stuck and the timeout exceptions disappear when the number of connections to the database is increased.
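The effect can be illustrated with a toy model. This is a sketch under stated assumptions, not how RMB or Vert.x manage connections: it assumes one shared fixed-size pool, all requests arriving at the same moment and served first-in first-out, and a fixed time that each request holds its connection. The function name and all numbers are hypothetical.

```python
import heapq


def count_acquire_timeouts(requests, pool_size, hold_seconds, timeout_seconds):
    """Count requests that give up waiting for a connection.

    Toy model (assumption): every request arrives at t=0, requests are
    served FIFO, and each one holds its connection for hold_seconds.
    """
    # free_at[i] is the time at which connection i is next available.
    free_at = [0.0] * pool_size
    heapq.heapify(free_at)
    timeouts = 0
    for _ in range(requests):
        wait = free_at[0]  # earliest moment any connection becomes free
        if wait > timeout_seconds:
            timeouts += 1  # the caller gives up; no connection is consumed
        else:
            # Take the earliest-free connection and hold it for hold_seconds.
            heapq.heapreplace(free_at, wait + hold_seconds)
    return timeouts


# Hypothetical load: 20 concurrent requests per tenant, 1 s per request,
# 3 s acquisition timeout.
one_tenant = count_acquire_timeouts(20, pool_size=5, hold_seconds=1.0, timeout_seconds=3.0)
two_tenants = count_acquire_timeouts(40, pool_size=5, hold_seconds=1.0, timeout_seconds=3.0)
bigger_pool = count_acquire_timeouts(40, pool_size=15, hold_seconds=1.0, timeout_seconds=3.0)
print(one_tenant, two_tenants, bigger_pool)
```

In this model the load from one tenant fits within the timeout, doubling the load makes half of the requests time out, and a larger pool removes the timeouts again.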
One of the major bottlenecks for parallel imports is the database, because the number of DB connections is multiplied. This means that multitenant systems need more resources.
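A rough upper bound on the connection demand can be computed from the DB_MAXPOOLSIZE values in the configuration table. This is only a sketch: it assumes that each module instance may open up to DB_MAXPOOLSIZE connections per tenant, which was not verified in this spike, and the helper name `max_db_connections` is hypothetical.

```python
# DB_MAXPOOLSIZE values copied from the configuration table above.
DB_MAXPOOLSIZE = {
    "mod-data-import": 5,
    "mod-source-record-manager": 15,
    "mod-source-record-storage": 15,
    "mod-inventory": 5,
    "mod-inventory-storage": 5,
    "mod-di-converter-storage": 5,
}


def max_db_connections(pool_sizes, tenants, instances_per_module=1):
    """Upper bound on open connections.

    ASSUMPTION: one pool of up to DB_MAXPOOLSIZE connections per tenant
    per module instance. Real usage may be lower.
    """
    return sum(pool_sizes.values()) * tenants * instances_per_module


single = max_db_connections(DB_MAXPOOLSIZE, tenants=1)
multi = max_db_connections(DB_MAXPOOLSIZE, tenants=2)
print(single, multi)
```

Under this assumption the bound grows linearly with the number of tenants, so the database connection limit has to be sized for all tenants together.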
I found that RMB and Vert.x provide metrics on how the modules operate, such as the number of connections, the number of queries, and the number of requests.
This is described in RMB-655 and RANCHER-621 (how these metrics can be used through JMX and through Prometheus in Grafana).
I created a task to investigate in depth how DI works, its performance, and its resource usage: MODSOURMAN-980.