@synhershko
Collaborator

@synhershko synhershko commented Sep 6, 2025

Fix the TLS certificate hot-reloading feature, which doesn't work because certificates are mounted using subPath.

Fixes #1021

Problem:

  • The OpenSearch operator uses Kubernetes subPath volume mounts for TLS certificates
  • A subPath mount does not receive updates when the underlying Secret changes
  • Certificate renewals from cert-manager therefore require manual pod restarts
  • Affects users with externally managed certificates (cert-manager, etc.)

Solution:

  • Add new enableHotReload configuration option to TlsCertificateConfig
  • When enabled, mount certificates as directories instead of using subPath
  • Maintains full backward compatibility with existing configurations
  • Allows selective enabling per TLS interface (transport/HTTP)

Changes:

  • api/v1/opensearch_types.go: Add EnableHotReload field to TlsCertificateConfig
  • pkg/reconcilers/tls.go: Add mountWithHotReload() function with conditional subPath usage
  • pkg/reconcilers/tls.go: Update transport & HTTP certificate mounting logic
  • pkg/reconcilers/tls.go: Update OpenSearch config paths for hot reload mode
  • config/crd/bases/: Regenerate CRD with new enableHotReload field
  • pkg/reconcilers/tls_hotreload_test.go: Add comprehensive unit tests
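
The conditional mounting listed under Changes might look roughly like the sketch below. It is not the PR's actual mountWithHotReload() code: the VolumeMount struct is a simplified stand-in for corev1.VolumeMount, and the certMounts helper, the volume name, and the paths are all hypothetical, chosen only to show the difference between per-file subPath mounts and a single directory mount.

```go
package main

import (
	"fmt"
	"path"
)

// VolumeMount is a simplified stand-in for corev1.VolumeMount from k8s.io/api,
// reduced to the fields this sketch needs (an assumption for illustration).
type VolumeMount struct {
	Name      string
	MountPath string
	SubPath   string
	ReadOnly  bool
}

// certMounts is a hypothetical helper, not the operator's mountWithHotReload().
// With hotReload disabled, each Secret key is mounted as a single file via
// subPath (the existing behaviour). With hotReload enabled, the whole Secret is
// mounted as one directory, so the files can be refreshed when the Secret changes.
func certMounts(volume, configDir, certDir string, keys []string, hotReload bool) []VolumeMount {
	if hotReload {
		return []VolumeMount{{
			Name:      volume,
			MountPath: path.Join(configDir, certDir),
			ReadOnly:  true,
		}}
	}
	mounts := make([]VolumeMount, 0, len(keys))
	for _, key := range keys {
		mounts = append(mounts, VolumeMount{
			Name:      volume,
			MountPath: path.Join(configDir, certDir, key),
			SubPath:   key,
			ReadOnly:  true,
		})
	}
	return mounts
}

func main() {
	// Illustrative values only; the real operator chooses its own names and paths.
	keys := []string{"ca.crt", "tls.crt", "tls.key"}
	configDir := "/usr/share/opensearch/config"
	for _, hot := range []bool{false, true} {
		fmt.Printf("hotReload=%v\n", hot)
		for _, m := range certMounts("transport-cert", configDir, "tls-transport", keys, hot) {
			fmt.Printf("  mountPath=%s subPath=%q\n", m.MountPath, m.SubPath)
		}
	}
}
```

In both modes the certificate files end up at the same paths inside the container in this sketch; only the mount granularity changes.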

Benefits:

  • Zero downtime certificate renewals
  • Automatic cert-manager integration
  • Backward compatible (default: enableHotReload=false)
  • Per-interface configuration granularity

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@prudhvigodithi
Member

@synhershko can you please check the failing CI checks?
Adding @rootxrishabh to take a look.

@synhershko synhershko force-pushed the feat/tls-hot-reload-certificates branch from 92e1a30 to 8123c9d Compare September 9, 2025 18:45
@synhershko
Collaborator Author

@prudhvigodithi done

@gysel
Contributor

gysel commented Sep 18, 2025

Would it make sense to automatically add plugins.security.ssl.certificates_hot_reload.enabled: true when enableHotReload=true is used?

https://docs.opensearch.org/latest/security/configuration/tls/#hot-reloading-tls-certificates

@josedev-union
Contributor

> Would it make sense to automatically add plugins.security.ssl.certificates_hot_reload.enabled: true when enableHotReload=true is used?
>
> https://docs.opensearch.org/latest/security/configuration/tls/#hot-reloading-tls-certificates

I agree we should set that config. A few things to take care of, though.
Hot reload only works since 2.19.1:
opensearch-project/security@d3d7f74
So we need to check cr.Spec.General.Version and set the config conditionally.

Also, I'd prefer using the directory mount instead of subPath in both cases; we can then switch hot reloading on and off with the config param plugins.security.ssl.certificates_hot_reload.enabled.
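
A minimal sketch of the version gate suggested above, assuming the cluster version arrives as a plain "MAJOR.MINOR.PATCH" string (as cr.Spec.General.Version presumably does). The helper names parseVersion, supportsHotReload and hotReloadConfig are hypothetical and not part of the operator; only the setting name and the 2.19.1 threshold come from this thread.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// hotReloadSetting is the OpenSearch security setting quoted in this thread.
const hotReloadSetting = "plugins.security.ssl.certificates_hot_reload.enabled"

// parseVersion splits "MAJOR.MINOR.PATCH" into integers. Missing parts count
// as 0; a leading "v" and any suffix after '-' are ignored.
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	core := strings.SplitN(strings.TrimPrefix(v, "v"), "-", 2)[0]
	parts := strings.Split(core, ".")
	if len(parts) > 3 {
		return out, fmt.Errorf("invalid version %q: too many components", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("invalid version %q: %w", v, err)
		}
		out[i] = n
	}
	return out, nil
}

// supportsHotReload reports whether version is at least 2.19.1, the first
// release in which hot reload works according to the comment above.
func supportsHotReload(version string) (bool, error) {
	got, err := parseVersion(version)
	if err != nil {
		return false, err
	}
	minimum := [3]int{2, 19, 1}
	for i := range minimum {
		if got[i] != minimum[i] {
			return got[i] > minimum[i], nil
		}
	}
	return true, nil
}

// hotReloadConfig returns the extra opensearch.yml settings to add. The map is
// empty when hot reload is not requested or the version is too old.
func hotReloadConfig(version string, enableHotReload bool) (map[string]string, error) {
	cfg := map[string]string{}
	if !enableHotReload {
		return cfg, nil
	}
	ok, err := supportsHotReload(version)
	if err != nil {
		return nil, err
	}
	if ok {
		cfg[hotReloadSetting] = "true"
	}
	return cfg, nil
}

func main() {
	for _, v := range []string{"2.18.0", "2.19.1", "3.0.0"} {
		cfg, err := hotReloadConfig(v, true)
		fmt.Println(v, cfg, err)
	}
}
```

Whether an unsupported version should be silently skipped, as here, or surfaced as a validation error is a design choice this sketch leaves open.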

@synhershko
Collaborator Author

From Slack (https://opensearch.slack.com/archives/C06QRV1RLD7/p1760456223930269?thread_ts=1760443641.817159&cid=C06QRV1RLD7):

For internal communication between nodes, we use self-signed certificates generated by cert-manager. We have configured a root Certificate Authority (CA) with a 10-year validity period and have disabled private key rotation for this CA. This precaution is crucial: if the CA were renewed, new nodes with certificates issued by the rotated CA would be unable to join the existing cluster. The leaf certificates issued for each node are valid for one year, and their private keys can be rotated.
For external access, we use Let's Encrypt certificates for our Ingress resources. This is achieved by adding the cert-manager.io/cluster-issuer annotation to the Ingress definition:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  # ...
  annotations:
    # ...
    cert-manager.io/cluster-issuer: letsencrypt

Forcing a rolling restart
As you said, the OpenSearch Kubernetes operator does not currently support hot reloading of certificates. Therefore, to apply a new certificate, a rolling restart of the cluster nodes is required.
We do this by changing a dummy annotation in the nodePools:

apiVersion: opensearch.opster.io/v1
kind: OpenSearchCluster
metadata:
  # ...
spec:
  nodePools:
    - component: nodepool1
      annotations:
        opensearch.last-restart: "<epoch>"
      # ...
    - component: nodepool2
      annotations:
        opensearch.last-restart: "<epoch>"
      # ...

We will be happy to hear if there is a better way :)
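
For anyone scripting the workaround above, the annotation value could be produced along these lines. This is only a sketch: the withRestartStamp helper is hypothetical, and it only builds the annotation map; applying it to the OpenSearchCluster resource is left to whatever tooling is already in use.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// restartAnnotation is the dummy annotation key from the example above.
const restartAnnotation = "opensearch.last-restart"

// withRestartStamp (hypothetical helper) returns a copy of annotations with the
// dummy key set to epochSeconds. A new value changes the node pool annotations,
// which is what triggers the rolling restart in the workaround described above.
func withRestartStamp(annotations map[string]string, epochSeconds int64) map[string]string {
	out := make(map[string]string, len(annotations)+1)
	for k, v := range annotations {
		out[k] = v
	}
	out[restartAnnotation] = strconv.FormatInt(epochSeconds, 10)
	return out
}

func main() {
	existing := map[string]string{"team": "search"} // illustrative annotation
	fmt.Println(withRestartStamp(existing, time.Now().Unix()))
}
```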

@synhershko
Collaborator Author

Just thought it'd make sense to note the above, especially with regard to rolling restarts and self-signed certs.

@prudhvigodithi prudhvigodithi self-assigned this Oct 17, 2025
synhershko and others added 2 commits October 21, 2025 16:59
@josedev-union josedev-union force-pushed the feat/tls-hot-reload-certificates branch from 4a0e868 to 0cfd3bf Compare October 21, 2025 14:59
Signed-off-by: josedev-union <[email protected]>


Development

Successfully merging this pull request may close these issues.

[BUG] Hot reloading TLS certificates feature doesn't work

4 participants