...

Kafka Reconnection Problem (Resolved)

Before applying configuration changes, Kafka reconnected approximately every 30 seconds. After implementing the optimizations, reconnections now occur every 5 minutes.

...

  • Increase the health check interval to reduce TLS handshake overhead.

  • Exclude the health check page from SSL, allowing the load balancer to use a non-TLS endpoint.

...

Module-to-Module Communication Issue (Resolved)

Log analysis showed that during some workflows, module-to-module calls repeatedly establish new TLS connections instead of reusing existing sessions. The mechanism that avoids this, TLS session resumption, was not taking effect, so every call paid for a full TLS handshake, which can significantly degrade performance.

...

  1. Session IDs (Stateful Resumption):

    • Server stores session state (e.g., encryption keys) and assigns a unique Session ID to the client.

    • Client sends the Session ID in subsequent requests to resume the session.

    • Stateful: The server must maintain a cache of session IDs and their associated keys.

    • Superseded in TLS 1.3 by PSK-based resumption, which the server delivers to the client through session tickets.

  2. Session Tickets (Stateless Resumption):

    • Server encrypts session state into a Session Ticket and sends it to the client.

    • Client stores the ticket and includes it in subsequent requests to resume the session.

    • Stateless: The server does not need to store session state, which makes this approach better suited to horizontally scaled deployments.
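
Both mechanisms depend on the client keeping earlier sessions available for reuse. As a minimal sketch, assuming the standard JSSE API and the JDK default SSLContext rather than the project's actual client setup, the client-side session cache can be tuned as follows. The class name and the cache size and timeout values are hypothetical and chosen only for illustration.

```java
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSessionContext;
import java.security.NoSuchAlgorithmException;

class SessionCacheSketch {
    // Hypothetical values for illustration; tune them for the real workload.
    static final int CACHE_SIZE = 1000;       // maximum number of cached sessions
    static final int TIMEOUT_SECONDS = 3600;  // how long a cached session stays resumable

    /** Applies the cache settings to the client-side session cache of the given context. */
    static SSLSessionContext configureClientCache(SSLContext context) {
        SSLSessionContext cache = context.getClientSessionContext();
        cache.setSessionCacheSize(CACHE_SIZE);
        cache.setSessionTimeout(TIMEOUT_SECONDS);
        return cache;
    }

    /** Convenience wrapper around the JDK default context (an assumption of this sketch). */
    static SSLSessionContext configureDefaultClientCache() {
        try {
            return configureClientCache(SSLContext.getDefault());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("No default SSLContext available", e);
        }
    }

    public static void main(String[] args) {
        SSLSessionContext cache = configureDefaultClientCache();
        System.out.println("cache size = " + cache.getSessionCacheSize()
                + ", timeout = " + cache.getSessionTimeout() + "s");
    }
}
```

Resumption only happens when connections are created from the same SSLContext, so a client that builds a new context per request never benefits from the cache. With TLS 1.2, one way to check for reuse is to compare `SSLSession.getId()` across two connections to the same host.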

...

[Diagram: Untitled Diagram-1742473072871.drawio]

...

  • TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256

  • TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384

Use FIPS-Compliant Cipher Suites

Only cipher suites approved for FIPS are allowed. For example:

...

  • Investigate how to enable TLS session resumption within Bouncy Castle.

Delivery

After investigating Bouncy Castle and reviewing its documentation, we found that TLS 1.3 session resumption is built on the Pre-Shared Key (PSK) mechanism, so the TLS provider must support PSK for resumption to work.

What is PSK?

PSK (Pre-Shared Key) is a shared secret used in cryptographic systems, particularly in symmetric key algorithms, where both parties have exchanged the secret through a secure channel beforehand.

Key Aspects of PSK:

  • Usage: PSKs are used in various security protocols, including Wi-Fi encryption (WPA-PSK), Extensible Authentication Protocol (EAP-PSK), and TLS 1.3 session resumption.

  • Security: The security of PSKs depends on their secrecy and randomness. If compromised, all communications using the key could be exposed.

  • Key Derivation: PSKs are often used with key derivation functions to generate session keys for encrypting data.
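
To illustrate the key derivation point, the sketch below derives session keys from a PSK with HKDF (RFC 5869) built on `HmacSHA256` from the JDK. It is a generic HKDF example, not the TLS 1.3 key schedule and not Bouncy Castle code. The class name, method names, and labels are hypothetical.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;

class HkdfSketch {
    private static final String HMAC = "HmacSHA256";
    private static final int HASH_LEN = 32; // SHA-256 output size in bytes

    private static byte[] hmac(byte[] key, byte[]... parts) {
        try {
            Mac mac = Mac.getInstance(HMAC);
            mac.init(new SecretKeySpec(key, HMAC));
            for (byte[] part : parts) {
                mac.update(part);
            }
            return mac.doFinal();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HMAC computation failed", e);
        }
    }

    /** HKDF-Extract: condenses the input secret into a fixed-length pseudorandom key. */
    static byte[] extract(byte[] salt, byte[] inputKey) {
        // RFC 5869: a missing salt is replaced by HASH_LEN zero bytes.
        byte[] effectiveSalt = (salt == null || salt.length == 0) ? new byte[HASH_LEN] : salt;
        return hmac(effectiveSalt, inputKey);
    }

    /** HKDF-Expand: stretches the pseudorandom key into `length` bytes bound to `info`. */
    static byte[] expand(byte[] prk, byte[] info, int length) {
        if (length <= 0 || length > 255 * HASH_LEN) {
            throw new IllegalArgumentException("length out of range: " + length);
        }
        byte[] output = new byte[length];
        byte[] previous = new byte[0];
        int offset = 0;
        for (int counter = 1; offset < length; counter++) {
            // T(n) = HMAC(PRK, T(n-1) | info | n)
            previous = hmac(prk, previous, info, new byte[] {(byte) counter});
            int n = Math.min(previous.length, length - offset);
            System.arraycopy(previous, 0, output, offset, n);
            offset += n;
        }
        return output;
    }

    /** Derives a key of `length` bytes from the PSK for the given (hypothetical) label. */
    static byte[] deriveSessionKey(byte[] psk, String label, int length) {
        return expand(extract(null, psk), label.getBytes(StandardCharsets.UTF_8), length);
    }
}
```

Using a distinct label per purpose, for example one for each traffic direction, yields independent keys from the same PSK, so the PSK itself never encrypts data directly.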

Bouncy Castle & PSK Support

Bouncy Castle supports PSK in its TLS implementation, but that support is not available through BouncyCastleJsseProvider. This is the provider we register as the standard Java JSSE provider to override the default one and keep a unified SSL context across all connections.

Issue Reference:
Details about this limitation can be found in the following issue:
🔗 GitHub Issue #1604

Example Code from the Library:

Code Block
java

JsseSessionParameters jsseSessionParameters = new JsseSessionParameters(
    sslParameters.getEndpointIdentificationAlgorithm(), matchedSNIServerName);
// TODO[tls13] Resumption/PSK
boolean addToCache = provServerEnableSessionResumption && !TlsUtils.isTLSv13(context);
this.sslSession = sslSessionContext.reportSession(peerHost, peerPort, connectionTlsSession,
    jsseSessionParameters, addToCache);

Conclusion

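Currently, session resumption with the Bouncy Castle JSSE provider works only with TLS 1.2. As the code above shows, TLS 1.3 sessions are never added to the session cache, so every TLS 1.3 connection performs a full handshake.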

Supported TLS Ciphers in Bouncy Castle FIPS Mode:

  • TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256

  • TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
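
As a minimal sketch of what this restriction looks like at the JSSE level, the code below creates an SSLEngine limited to TLS 1.2 and the two cipher suites listed above. It assumes the JDK default provider for simplicity; in the sidecar the context would come from the Bouncy Castle provider instead. The class and method names are hypothetical.

```java
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;
import javax.net.ssl.SSLParameters;
import java.security.GeneralSecurityException;

class Tls12OnlySketch {
    static final String[] PROTOCOLS = {"TLSv1.2"};
    static final String[] CIPHER_SUITES = {
        "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
        "TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384"
    };

    /** Builds a client-mode engine that offers only TLS 1.2 with the suites above. */
    static SSLEngine newTls12Engine() {
        try {
            SSLContext context = SSLContext.getInstance("TLSv1.2");
            // null arguments select the default key managers, trust managers, and RNG.
            context.init(null, null, null);

            SSLEngine engine = context.createSSLEngine();
            engine.setUseClientMode(true);

            SSLParameters params = engine.getSSLParameters();
            params.setProtocols(PROTOCOLS);
            params.setCipherSuites(CIPHER_SUITES);
            engine.setSSLParameters(params);
            return engine;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Could not create TLS 1.2 engine", e);
        }
    }

    public static void main(String[] args) {
        SSLEngine engine = newTls12Engine();
        System.out.println(String.join(", ", engine.getEnabledProtocols()));
        System.out.println(String.join(", ", engine.getEnabledCipherSuites()));
    }
}
```

Restricting the protocol on the client side is enough to keep the handshake on TLS 1.2 as long as the server still accepts that version.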

What Was Changed?

  • Added an environment variable:

    Code Block
    yaml

    QUARKUS_HTTP_SSL_PROTOCOLS: "TLSv1.2"

  • Applied changes from the relevant branch.

Communication with Keycloak Issue (Resolved)

The approach from the previous topic, restricting connections to TLS 1.2 so that session resumption works, was reused and applied to all web and HTTP clients across sidecars.

Communication with Secret Storage (In progress)

Secrets such as system user passwords are currently retrieved from the AWS SSM service by a separate library that lives outside the sidecar implementation. The logs show that this library still negotiates TLSv1.3, for which Bouncy Castle does not support session resumption. To resolve this issue, the same change needs to be applied to the library as well.

Code Block
2025-03-24T13:12:18.427Z 2025-03-24 13:12:18,427 INFO  [org.bou.jss.pro.ProvTlsClient] (executor-thread-4) [client #16 @4d2c76f5] opening connection to ssm.us-east-1.amazonaws.com:443
2025-03-24T13:12:18.428Z 2025-03-24 13:12:18,427 INFO  [org.bou.jss.pro.ProvTlsClient] (executor-thread-2) [client #13 @4359ff27] established connection with ssm.us-east-1.amazonaws.com:443
2025-03-24T13:12:18.431Z 2025-03-24 13:12:18,431 INFO  [org.bou.jss.pro.ProvTlsClient] (executor-thread-3) [client #14 @4f5a5ce6] established connection with ssm.us-east-1.amazonaws.com:443
2025-03-24T13:12:18.444Z 2025-03-24 13:12:18,440 FINE  [org.bou.jss.pro.ProvTlsClient] (executor-thread-4) [client #16 @4d2c76f5] notified of selected protocol version: TLSv1.3
2025-03-24T13:12:18.444Z 2025-03-24 13:12:18,440 FINE  [org.bou.jss.pro.ProvTlsClient] (executor-thread-4) [client #16 @4d2c76f5]: Server did not specify a session ID
2025-03-24T13:12:18.444Z 2025-03-24 13:12:18,440 FINE  [org.bou.jss.pro.ProvTlsClient] (executor-thread-4) [client #16 @4d2c76f5] notified of selected cipher suite: TLS_AES_128_GCM_SHA256
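
To check larger log files for clients that still negotiate TLSv1.3, a small scanner can help. The sketch below assumes the line format shown in the excerpt above and extracts the negotiated protocol version per client number. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TlsLogScanSketch {
    // Matches e.g. "[client #16 @4d2c76f5] notified of selected protocol version: TLSv1.3"
    private static final Pattern SELECTED_PROTOCOL = Pattern.compile(
            "\\[client #(\\d+) @[0-9a-f]+\\] notified of selected protocol version: (\\S+)");

    /** Returns the negotiated protocol version keyed by client number, in log order. */
    static Map<Integer, String> protocolsByClient(List<String> lines) {
        Map<Integer, String> result = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher matcher = SELECTED_PROTOCOL.matcher(line);
            if (matcher.find()) {
                result.put(Integer.valueOf(matcher.group(1)), matcher.group(2));
            }
        }
        return result;
    }
}
```

Any client reported with TLSv1.3 in the result is a connection that cannot resume its session under the current Bouncy Castle provider.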

Additionally, I noticed that we are not using the correct FIPS endpoints for communication with the service. According to the documentation, the endpoint should be updated from ssm.us-east-1.amazonaws.com to ssm-fips.us-east-1.amazonaws.com.
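
The naming pattern of the two host names above can be captured in a small helper. This is only a sketch of the pattern; the class and method names are hypothetical, and it does not verify that a FIPS endpoint actually exists for the given region.

```java
import java.net.URI;

class SsmFipsEndpointSketch {
    /** Builds the SSM FIPS endpoint URI for a region, following the ssm-fips.<region> pattern. */
    static URI fipsEndpoint(String region) {
        if (region == null || !region.matches("[a-z0-9-]+")) {
            throw new IllegalArgumentException("Invalid region: " + region);
        }
        return URI.create("https://ssm-fips." + region + ".amazonaws.com");
    }

    public static void main(String[] args) {
        System.out.println(fipsEndpoint("us-east-1"));
    }
}
```

The resulting URI would then be passed to the SSM client as an endpoint override in the library that retrieves the secrets.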

Summary

TLS handshake cost, frequent reconnections, and inefficient session management all add unnecessary system overhead. By implementing session reuse, optimizing connection settings, and refining module interactions, we aim to improve system stability and efficiency.

...