...
Kafka Reconnection Problem (Resolved)
Before applying configuration changes, Kafka reconnected approximately every 30 seconds. After implementing the optimizations, reconnections now occur every 5 minutes.
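To put the two intervals in perspective, the arithmetic can be sketched as below. This is only a back-of-the-envelope estimate: the class and method names are hypothetical, and it assumes that every reconnect costs one full TLS handshake per connection.

```java
import java.time.Duration;

/** Rough estimate of how many reconnects happen per hour for a given interval. */
final class ReconnectRate {
    /** Number of reconnects in one hour when a reconnect happens once per interval. */
    static long perHour(Duration interval) {
        if (interval.isZero() || interval.isNegative()) {
            throw new IllegalArgumentException("interval must be positive");
        }
        return Duration.ofHours(1).toMillis() / interval.toMillis();
    }
}
```

With the figures above, `ReconnectRate.perHour(Duration.ofSeconds(30))` gives 120 reconnects per hour and `ReconnectRate.perHour(Duration.ofMinutes(5))` gives 12, so roughly a tenfold reduction per connection.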
...
Increase the health check interval to reduce TLS handshake overhead.
Exclude the health check page from SSL, allowing the load balancer to use a non-TLS endpoint.
...
Module-to-Module Communication Issue (Resolved)
Log analysis showed that during some workflows, module-to-module calls repeatedly establish new TLS connections instead of reusing existing sessions. The mechanism that avoids this, TLS session resumption, was not taking effect, so each call paid for a full TLS handshake, which can significantly hurt performance.
...
Session IDs (Stateful Resumption):
Server stores session state (e.g., encryption keys) and assigns a unique Session ID to the client.
Client sends the Session ID in subsequent requests to resume the session.
Stateful: The server must maintain a cache of session IDs and their associated keys.
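Obsolete in TLS 1.3, which replaces both session IDs and session tickets with PSK-based resumption; the server delivers the PSK identity to the client in a NewSessionTicket message.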
Session Tickets (Stateless Resumption):
Server encrypts session state into a Session Ticket and sends it to the client.
Client stores the ticket and includes it in subsequent requests to resume the session.
Stateless: The server does not need to store session state, which makes this approach well suited to horizontal scaling.
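The difference between the two mechanisms can be illustrated with a small conceptual model. It is not the TLS wire format and not how Bouncy Castle implements resumption; all class and method names are hypothetical, and the "session secret" is just an opaque byte array standing in for the negotiated keys.

```java
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

/** Conceptual model of stateful vs. stateless resumption (illustration only). */
final class ResumptionModel {
    private static final SecureRandom RANDOM = new SecureRandom();
    private static final int IV_LEN = 12;   // bytes, usual GCM nonce size
    private static final int TAG_BITS = 128;

    /** Stateful: the server keeps a session ID to secret cache. */
    static final class SessionIdServer {
        private final Map<String, byte[]> cache = new ConcurrentHashMap<>();

        /** Stores the secret and returns the session ID handed to the client. */
        String store(byte[] sessionSecret) {
            String id = UUID.randomUUID().toString();
            cache.put(id, sessionSecret.clone());
            return id;
        }

        /** Returns the cached secret, or null if a full handshake is needed. */
        byte[] resume(String sessionId) {
            byte[] secret = cache.get(sessionId);
            return secret == null ? null : secret.clone();
        }

        int cachedSessions() {
            return cache.size();
        }
    }

    /** Stateless: the server holds one ticket key; session state travels in the ticket. */
    static final class TicketServer {
        private final SecretKey ticketKey;

        TicketServer() {
            try {
                KeyGenerator generator = KeyGenerator.getInstance("AES");
                generator.init(128);
                ticketKey = generator.generateKey();
            } catch (GeneralSecurityException e) {
                throw new IllegalStateException(e);
            }
        }

        /** Encrypts the secret into a ticket (IV followed by ciphertext and tag). */
        byte[] issueTicket(byte[] sessionSecret) {
            try {
                byte[] iv = new byte[IV_LEN];
                RANDOM.nextBytes(iv);
                Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
                cipher.init(Cipher.ENCRYPT_MODE, ticketKey, new GCMParameterSpec(TAG_BITS, iv));
                byte[] sealed = cipher.doFinal(sessionSecret);
                return ByteBuffer.allocate(iv.length + sealed.length).put(iv).put(sealed).array();
            } catch (GeneralSecurityException e) {
                throw new IllegalStateException(e);
            }
        }

        /** Recovers the secret from a ticket, or null if the ticket is invalid. */
        byte[] resume(byte[] ticket) {
            if (ticket.length <= IV_LEN) {
                return null;
            }
            try {
                Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
                cipher.init(Cipher.DECRYPT_MODE, ticketKey,
                        new GCMParameterSpec(TAG_BITS, ticket, 0, IV_LEN));
                return cipher.doFinal(ticket, IV_LEN, ticket.length - IV_LEN);
            } catch (GeneralSecurityException e) {
                return null; // tampered or foreign ticket: fall back to a full handshake
            }
        }
    }
}
```

The stateful server grows its cache with every session and every node must see the same cache, while the ticket server only needs the ticket key to be shared, which is why tickets scale more easily behind a load balancer.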
...
...
TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
Use FIPS-Compliant Cipher Suites
Only cipher suites approved for FIPS are allowed. For example:
...
Investigate how to enable TLS session resumption within Bouncy Castle.
Delivery
After investigating Bouncy Castle and reviewing its documentation, it was identified that TLS 1.3 session resumption requires support for a specific Pre-Shared Key (PSK) standard.
What is PSK?
PSK (Pre-Shared Key) is a shared secret used in cryptographic systems, particularly in symmetric key algorithms, where both parties have exchanged the secret through a secure channel beforehand.
Key Aspects of PSK:
Usage: PSKs are used in various security protocols, including Wi-Fi encryption (WPA-PSK), Extensible Authentication Protocol (EAP-PSK), and TLS 1.3 session resumption.
Security: The security of PSKs depends on their secrecy and randomness. If compromised, all communications using the key could be exposed.
Key Derivation: PSKs are often used with key derivation functions to generate session keys for encrypting data.
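The key derivation point can be sketched with plain HKDF (RFC 5869) over HMAC-SHA256. This is a minimal illustration, not the TLS 1.3 key schedule (which wraps HKDF in its own label format) and not Bouncy Castle code; the class name, method name, and label parameter are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Derives session keys from a pre-shared key with HKDF-SHA256 (illustration only). */
final class PskKeyDerivation {
    private static final String HMAC = "HmacSHA256";
    private static final int HASH_LEN = 32; // SHA-256 output size in bytes

    private static byte[] hmac(byte[] key, byte[]... parts) {
        try {
            Mac mac = Mac.getInstance(HMAC);
            mac.init(new SecretKeySpec(key, HMAC));
            for (byte[] part : parts) {
                mac.update(part);
            }
            return mac.doFinal();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /**
     * HKDF extract-then-expand: the PSK is the input keying material, the salt
     * is per-session randomness, and the label separates keys by purpose.
     */
    static byte[] deriveSessionKey(byte[] psk, byte[] salt, String label, int length) {
        if (length <= 0 || length > 255 * HASH_LEN) {
            throw new IllegalArgumentException("length out of range for HKDF-SHA256");
        }
        // Extract: PRK = HMAC(salt, PSK); an empty salt is replaced by HASH_LEN zero bytes.
        byte[] effectiveSalt = salt.length == 0 ? new byte[HASH_LEN] : salt;
        byte[] prk = hmac(effectiveSalt, psk);

        // Expand: T(i) = HMAC(PRK, T(i-1) | info | i), concatenated until `length` bytes.
        byte[] info = label.getBytes(StandardCharsets.UTF_8);
        byte[] output = new byte[length];
        byte[] block = new byte[0];
        int written = 0;
        for (int counter = 1; written < length; counter++) {
            block = hmac(prk, block, info, new byte[] {(byte) counter});
            int n = Math.min(block.length, length - written);
            System.arraycopy(block, 0, output, written, n);
            written += n;
        }
        return output;
    }
}
```

Both sides holding the same PSK and the same per-session salt arrive at the same session key, while a different salt or label yields an unrelated key, so the PSK itself never has to be used directly for encryption.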
Bouncy Castle & PSK Support
Bouncy Castle supports the PSK standard, but it is not included in the BouncyCastleJsseProvider, which is the provider we use as a standard Java JSSE Provider to override the default one and maintain a unified SSL context across all connections.
Issue Reference:
Details about this limitation can be found in GitHub Issue #1604.
Example Code from the Library:
```java
JsseSessionParameters jsseSessionParameters = new JsseSessionParameters(
    sslParameters.getEndpointIdentificationAlgorithm(), matchedSNIServerName);
// TODO[tls13] Resumption/PSK
boolean addToCache = provServerEnableSessionResumption && !TlsUtils.isTLSv13(context);
this.sslSession = sslSessionContext.reportSession(peerHost, peerPort, connectionTlsSession,
    jsseSessionParameters, addToCache);
```
Conclusion
Currently, session resumption through the Bouncy Castle JSSE provider works only with TLS 1.2. TLS 1.3 sessions are never added to the session cache, so they cannot be resumed.
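Since resumption depends on the client-side session cache, it may help to size that cache explicitly. The sketch below uses only the standard JSSE API; the class name and the cache values are hypothetical, and the provider that actually serves the context is whichever JSSE provider is registered first (the JDK default here, BCJSSE in our deployment).

```java
import java.security.GeneralSecurityException;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSessionContext;

/** Builds a TLS 1.2 context with an explicitly sized client session cache. */
final class ClientSessionCache {
    /**
     * @param maxSessions    maximum number of cached sessions
     * @param timeoutSeconds how long a cached session stays resumable
     */
    static SSLContext tls12ContextWithCache(int maxSessions, int timeoutSeconds) {
        try {
            SSLContext context = SSLContext.getInstance("TLSv1.2");
            // Null arguments select the default key managers, trust managers, and randomness.
            context.init(null, null, null);
            SSLSessionContext sessions = context.getClientSessionContext();
            sessions.setSessionCacheSize(maxSessions);
            sessions.setSessionTimeout(timeoutSeconds);
            return context;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Sessions are cached per `SSLContext`, so connections can only resume each other's sessions when they are created from the same context instance.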
Supported TLS Ciphers in Bouncy Castle FIPS Mode:
TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
What Was Changed?
Added an environment variable:
```yaml
QUARKUS_HTTP_SSL_PROTOCOLS: "TLSv1.2"
```
Applied changes from the relevant branch.
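For code that configures TLS programmatically rather than through the environment variable, the same restriction can be expressed with the standard JSSE API. This is a sketch under assumptions: the class name is hypothetical, it uses the JVM's default `SSLContext`, and it is not the code from the branch mentioned above.

```java
import java.security.NoSuchAlgorithmException;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;
import javax.net.ssl.SSLParameters;

/** Restricts a TLS engine to TLS 1.2 and the two FIPS cipher suites listed above. */
final class Tls12FipsParameters {
    static final String[] PROTOCOLS = {"TLSv1.2"};
    static final String[] CIPHER_SUITES = {
        "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
        "TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384"
    };

    /** Parameters that can be applied to an SSLEngine, SSLSocket, or HTTP client. */
    static SSLParameters parameters() {
        return new SSLParameters(CIPHER_SUITES.clone(), PROTOCOLS.clone());
    }

    /** Creates an engine from the default context with the restrictions applied. */
    static SSLEngine restrictedEngine() {
        try {
            SSLEngine engine = SSLContext.getDefault().createSSLEngine();
            engine.setSSLParameters(parameters());
            return engine;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

If the active provider does not support one of the suites, `setSSLParameters` throws an `IllegalArgumentException`, which makes a misconfiguration visible at startup rather than at handshake time.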
Keycloak Communication Issue (Resolved)
The same approach as in the previous topic (pinning TLS 1.2 so that sessions can be resumed) was applied to all web and HTTP clients across the sidecars.
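As an illustration of what this looks like for an HTTP client, the sketch below uses the JDK's `java.net.http.HttpClient` as a stand-in. The clients actually used in the sidecars may differ, and the class name and timeout value are hypothetical.

```java
import java.net.http.HttpClient;
import java.time.Duration;
import javax.net.ssl.SSLParameters;

/** Provides one shared HTTP client that only negotiates TLS 1.2. */
final class Tls12HttpClientFactory {
    // A single shared instance keeps one connection pool and one TLS session cache.
    private static final HttpClient SHARED = HttpClient.newBuilder()
            .sslParameters(tls12Only())
            .connectTimeout(Duration.ofSeconds(10))
            .build();

    /** TLS parameters that allow TLS 1.2 only; cipher suites keep the provider defaults. */
    static SSLParameters tls12Only() {
        SSLParameters parameters = new SSLParameters();
        parameters.setProtocols(new String[] {"TLSv1.2"});
        return parameters;
    }

    static HttpClient shared() {
        return SHARED;
    }
}
```

Creating a new client per request would discard both the pooled connections and the cached sessions, so sharing one instance matters as much as the protocol setting.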
Communication with Secret Storage (In Progress)
Secrets such as system user passwords are currently retrieved from the AWS SSM service by a separate library that lives outside the sidecar implementation. The logs below show that this library still negotiates TLSv1.3, for which the Bouncy Castle JSSE provider does not support session resumption. Resolving this requires applying the same changes to that library as well.
```
2025-03-24T13:12:18.427Z 2025-03-24 13:12:18,427 INFO [org.bou.jss.pro.ProvTlsClient] (executor-thread-4) [client #16 @4d2c76f5] opening connection to ssm.us-east-1.amazonaws.com:443
2025-03-24T13:12:18.428Z 2025-03-24 13:12:18,427 INFO [org.bou.jss.pro.ProvTlsClient] (executor-thread-2) [client #13 @4359ff27] established connection with ssm.us-east-1.amazonaws.com:443
2025-03-24T13:12:18.431Z 2025-03-24 13:12:18,431 INFO [org.bou.jss.pro.ProvTlsClient] (executor-thread-3) [client #14 @4f5a5ce6] established connection with ssm.us-east-1.amazonaws.com:443
2025-03-24T13:12:18.444Z 2025-03-24 13:12:18,440 FINE [org.bou.jss.pro.ProvTlsClient] (executor-thread-4) [client #16 @4d2c76f5] notified of selected protocol version: TLSv1.3
2025-03-24T13:12:18.444Z 2025-03-24 13:12:18,440 FINE [org.bou.jss.pro.ProvTlsClient] (executor-thread-4) [client #16 @4d2c76f5]: Server did not specify a session ID
2025-03-24T13:12:18.444Z 2025-03-24 13:12:18,440 FINE [org.bou.jss.pro.ProvTlsClient] (executor-thread-4) [client #16 @4d2c76f5] notified of selected cipher suite: TLS_AES_128_GCM_SHA256
```
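To check larger log extracts for clients that still negotiate TLSv1.3, the relevant lines can be summarized with a small parser. This is a sketch: the class name is hypothetical, and the pattern is derived only from the log lines shown above, so other Bouncy Castle versions or log formats may need a different expression.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Extracts the negotiated TLS version per client from ProvTlsClient log lines. */
final class TlsLogSummary {
    // Matches e.g. "[client #16 @4d2c76f5] notified of selected protocol version: TLSv1.3"
    private static final Pattern SELECTED_VERSION = Pattern.compile(
            "\\[client #(\\d+) @[0-9a-f]+\\] notified of selected protocol version: (\\S+)");

    /** Maps client number to negotiated protocol version, in order of first appearance. */
    static Map<Integer, String> negotiatedVersions(List<String> lines) {
        Map<Integer, String> versions = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher matcher = SELECTED_VERSION.matcher(line);
            if (matcher.find()) {
                versions.put(Integer.parseInt(matcher.group(1)), matcher.group(2));
            }
        }
        return versions;
    }
}
```

Applied to the extract above, only client #16 reports a protocol version, and it is TLSv1.3, which matches the observation that this library has not yet been switched to TLS 1.2.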
Additionally, I noticed that we are not using the FIPS endpoint for communication with the service. According to the documentation, the endpoint should be changed from ssm.us-east-1.amazonaws.com to ssm-fips.us-east-1.amazonaws.com.
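The endpoint change can be sketched as a small helper that builds both host names from the region. The class and method names are hypothetical, and the naming pattern is taken only from the two us-east-1 host names above; whether a FIPS endpoint exists for another region has to be checked against the AWS documentation.

```java
import java.net.URI;

/** Builds SSM endpoint URIs for a region (illustration only). */
final class SsmEndpoints {
    /** Standard endpoint, e.g. https://ssm.us-east-1.amazonaws.com */
    static URI standard(String region) {
        return URI.create("https://ssm." + requireRegion(region) + ".amazonaws.com");
    }

    /** FIPS endpoint, e.g. https://ssm-fips.us-east-1.amazonaws.com */
    static URI fips(String region) {
        return URI.create("https://ssm-fips." + requireRegion(region) + ".amazonaws.com");
    }

    // Accepts only lowercase letters, digits, and hyphens so the host name stays well formed.
    private static String requireRegion(String region) {
        if (region == null || !region.matches("[a-z0-9-]+")) {
            throw new IllegalArgumentException("invalid region: " + region);
        }
        return region;
    }
}
```

If the secrets library is built on the AWS SDK for Java v2 (an assumption, not confirmed here), a URI like `SsmEndpoints.fips("us-east-1")` could be passed to the client builder's `endpointOverride`.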
Summary
TLS handshake cost, frequent reconnections, and inefficient session management all add unnecessary system overhead. By implementing session reuse, optimizing connection settings, and refining module interactions, we aim to improve system stability and efficiency.
...