Skip to content

block: add volume_encryption flag for dm-crypt support #23

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: v2.6
Choose a base branch
from

Conversation

vermavis
Copy link

@vermavis vermavis commented Apr 16, 2025

This PR introduces host-side encryption support for block volumes using the dm-crypt module. Host encryption is disabled by default and can be enabled by setting the host_encryption parameter in the StorageClass.

When enabled, the NVMe device discovered via NVMe-TCP is encrypted on the client side using dm-crypt.

To use this feature, users must provide a Kubernetes secret referenced in the StorageClass containing a passphrase field.

@vermavis vermavis force-pushed the vermavis/test-luks branch 5 times, most recently from b976167 to cba368d Compare April 17, 2025 04:49
Adds a `volume_encryption` flag to the storage class configuration.
When enabled, this flag ensures that the NVMe device discovered via NVMe-TCP
is encrypted using the `dm-crypt` module on the client side.

Users must also provide a volume-specific Kubernetes Secret containing the
encryption passphrase. This secret must be referenced in the StorageClass
using the following parameters:

csi.storage.k8s.io/volume-secret-name: <secret-name>
csi.storage.k8s.io/volume-secret-namespace: <namespace>

Example:
kubectl create secret generic volume-secret --from-literal=passphrase='your-passphrase'
@vermavis vermavis force-pushed the vermavis/test-luks branch from cba368d to 4b800a3 Compare April 17, 2025 05:09
@vermavis
Copy link
Author

@koreno we will look forward to your feedback on this. Please review 🙏

@koreno koreno requested a review from vlboiko April 17, 2025 10:05
@@ -26,6 +26,7 @@

{{- $tenant_name := pluck "tenantName" $options $.Values.storageClassDefaults | first | quote -}}
{{- $volume_group := pluck "volumeGroup" $options $.Values.storageClassDefaults | first | quote -}}
{{- $volume_encryption := pluck "volumeEncryption" $options $.Values.storageClassDefaults | first | quote -}}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to name this host_encryption, to clarify this is host-side encryption and not storage side encryption

vip_pool_name: vip1
transport_type: TCP
volume_encryption: "true"
csi.storage.k8s.io/volume-secret-name: volume-secret

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

host-encryption-secret-name

@@ -30,6 +30,43 @@ def is_native_multipath_enabled():
except Exception:
return False

def is_luks_device(device_path: str) -> bool:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about moving this to luks_utils.py ?

@@ -30,6 +30,43 @@ def is_native_multipath_enabled():
except Exception:
return False

def is_luks_device(device_path: str) -> bool:
"""
Check if the given device is a LUKS-encrypted volume.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets remove the word volume from here

bool: True if the device is using LUKS encryption (FSTYPE is crypto_LUKS), False otherwise.
"""
try:
result = subprocess.run(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd use hostcmd here, there is a hidden assumption on the existence of utils in the image.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't it better to use cryptsetup status <device> ?

@@ -68,6 +68,7 @@ def wrapper(self, request, context):
params = {fld.name: value for fld, value in request.ListFields()}
# secrets are not logged and not the part of function signature.
secrets = params.pop("secrets", {})
volume_secret = params.pop("volume_secret", {})

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/volume/host/

@@ -313,6 +315,7 @@ def ControllerPublishVolume(
)
vip_pool_name = volume_context.get("vip_pool_name")
vip_pool_fqdn = volume_context.get("vip_pool_fqdn")
volume_encryption = volume_context.get("volume_encryption")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/volume/host - same for the rest

@@ -479,6 +485,37 @@ def NodeStageVolume(
device_path = device.DevicePath
change_io_policy(device_name=device.Name, io_policy="round-robin")

if volume_encryption:
luks_device_name = f"crypt-{volume_id}"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets name the volume something more meaningful like vast-csi-crypt-{volume_id}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets move it to a helper as other call sites may need to use it

raise Abort(INVALID_ARGUMENT, "Volume encryption detected, but 'volume_secret' not provided. Please provide a passphrase for the encrypted volume.")

if not os.path.exists(luks_device_path):
is_luks = is_luks_device() and is_luks_crypto()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not passing device_path ?

if not is_luks:
logger.info(f"Formatting device {device_path} with LUKS")
try:
hostcmd.run(["cryptsetup", "luksFormat", "--batch-mode", device_path], input=passphrase.encode())

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--type=luks2 ?
--hash?
--cipher?
--key-size?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a known issue with luks memory usage:
https://gitlab.com/cryptsetup/cryptsetup/-/issues/372

I'd also limit the memory usage with --pbkdf-memory= as advised in the issue. Maybe something like 32/64 MB


logger.info(f"Opening encrypted device {device_path} as {luks_device_name}")
try:
hostcmd.run(["cryptsetup", "open", device_path, luks_device_name], input=passphrase.encode())

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NodeUnstageVolume will need to close the luks device as well

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some parameters that may affect performance here.
--perf-same_cpu_crypt
--perf-submit_from_crypt_cpus
--perf-no_read_workqueue
--perf-no_write_workqueue

worth evaluating these

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes we could add performance tuning options as a follow-up to this

@@ -479,6 +485,37 @@ def NodeStageVolume(
device_path = device.DevicePath
change_io_policy(device_name=device.Name, io_policy="round-robin")

if volume_encryption:
luks_device_name = f"crypt-{volume_id}"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets move it to a helper as other call sites may need to use it

bool: True if the device is using LUKS encryption (FSTYPE is crypto_LUKS), False otherwise.
"""
try:
result = subprocess.run(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't it better to use cryptsetup status <device> ?

@vermavis vermavis force-pushed the vermavis/test-luks branch 5 times, most recently from c55882f to e5b6e5b Compare April 24, 2025 22:51
subsystem: {{ $subsystem }}
{{- range $key, $value := dict "vip_pool_name" $vip_pool_name "vip_pool_fqdn" $vip_pool_fqdn "volume_group" $volume_group "transport_type" $transport_type "fsType" $fstype "tenant_name" $tenant_name }}
{{- if and $value (ne $value ( quote "" )) }}
{{ $key }}: {{ if (kindIs "int" $value) }}{{ $value | quote }}{{ else }}{{ $value }}{{ end }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm not mistaken this if (kindIs "int" $value)... prevents strings to be double quoted

{{- end }}
{{- $host_encryption := get $options "hostEncryption" }}
{{- if or (not $host_encryption) (eq $host_encryption "false") }}
host_encryption: "false"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

host_encryption here is simple boo val represented as string. Although in values.yaml it is object with many keys. If I understand correctly it is temporary solution for testing right?
Later it can be json string via toJson func

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed- a json with encryption related params as keys would be nice. For now, just a way to turn the knob and test out the flow.

@@ -56,6 +57,7 @@ def from_parameters(
vip_pool_fqdn = parameters.get("vip_pool_fqdn")
vip_pool_name = parameters.get("vip_pool_name")
volume_group = parameters.get("volume_group", "")
host_encryption = "true" if parameters.get("host_encryption", "false").lower() == "true" else "false"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be json right?

@@ -68,6 +68,7 @@ def wrapper(self, request, context):
params = {fld.name: value for fld, value in request.ListFields()}
# secrets are not logged and not the part of function signature.
secrets = params.pop("secrets", {})
host_encryption_secret = params.pop("host_encryption_secret", {})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we discussed in slack not possible to provide multiple secrets.
You might want to add passphrase to "main" secrets and take it from "secrets"

passprase = secrets.pop("passprase") ...

if is_ephemeral:
payload["vms_session"] = vms_session.serialize(salt=volume_id)

if is_host_encryption:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if ephemeral or is_host_entryption: ....

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't get why "vms_session" should be serialized in is_host_encryption is True?

It is limitation of ephemeral volumes that secrets are not passed via context. So we need to serialize session for deletion of such volumes.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah OK, thanks for catching! Will fix.

@@ -321,6 +329,8 @@ def volume_context(self) -> dict:
context["vip_pool_name"] = self.vip_pool_name
elif self.vip_pool_fqdn:
context["vip_pool_fqdn"] = self.vip_pool_fqdn_with_prefix
if self.host_encryption:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, logically if self.host_encryption is always true because
host_encryption can be "true" or "false". Both for python is True strings.

There is helper to convert "true"/"false" to bool
See: https://github.com/vast-data/vast-csi/blob/v2.6/vast_csi/plugins/csi.py#L330

@vermavis vermavis force-pushed the vermavis/test-luks branch from 315d0c6 to f9fe8cf Compare April 28, 2025 21:15
@@ -58,6 +58,13 @@ parameters:
{{ $key }}: {{ if (kindIs "int" $value) }}{{ $value | quote }}{{ else }}{{ $value }}{{ end }}
{{- end }}
{{- end }}
{{- if $host_encryption }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe some generic template for parsing nested volumes might be helpful. Eg in
charts/vastblock/templates/shared/_helpers.tpl add:


{{- define "vastcsi.dictToYaml" -}}
{{- $input := index . 0 -}}        {{/* The map to render */}}
{{- $prefix := index . 1 | default "" -}}  {{/* Optional prefix for keys */}}
{{- if not (kindIs "map" $input) }}
  {{- $errorMsg := printf "Invalid format. Expected a dictionary but got:\n%s" (toYaml $input) }}
  {{- fail $errorMsg }}
{{- else }}
  {{- range $k, $v := $input }}
    {{- if and $v (ne $v "") }}
{{ printf "%s%s: %s" $prefix $k ($v | quote) }}
    {{- end }}
  {{- end }}
{{- end }}
{{- end }}

and in storage-class.yaml replace block below with:

{{- include "vastcsi.dictToYaml" (list $host_encryption "host_encryption.") | indent 2 }}

What do you think?

subsystem: myblock
vip_pool_name: vippool-1
transport_type: TCP
hostEncryption:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Internally it is converted to host_encryption

@@ -479,6 +486,21 @@ def NodeStageVolume(
device_path = device.DevicePath
change_io_policy(device_name=device.Name, io_policy="round-robin")

host_encryption_present = any(key.startswith("host_encryption.") for key in volume_context)
if host_encryption_present:
host_encryption_secret = getattr(exit_stack, "host_encryption_secret", None)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not create variable passphrase right here and not re-assign it twice?

host_encryption_secret = getattr(exit_stack, "host_encryption_secret", None)

if host_encryption_secret:
passphrase = host_encryption_secret
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused about this if/else

passphrase = host_encryption_secret so first "if host_encryption_secret" should do the trick. Why check again "if not passphrase:" <- Seems like this condition never happen

@@ -505,6 +527,7 @@ def NodeUnstageVolume(self, volume_id, staging_target_path):
staging_target_path = local.path(staging_target_path)
device_bind_path = get_device_bind_path(staging_target_path)
staging_mount, target_mounts = MountInfo.get_mounts_by_source(src=device_bind_path)
host_encryption = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any usage of this variable?

@vermavis vermavis force-pushed the vermavis/test-luks branch 6 times, most recently from 4d8869c to aa12fce Compare April 29, 2025 23:49
@@ -494,7 +511,10 @@ def NodeStageVolume(
with temporary_mount(
src=device_path, tgt_dir=staging_target_path, fs_type=fs_type
) as temp_mount:
resize_device(device=device_path, target_mount=temp_mount, fs_type=fs_type)
if host_encryption_present:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vermavis just initialize passphrase to None in the start - and then you don't need any condition here, just call resize_device with passphrase=passphrase that may or may not be None

@vermavis vermavis force-pushed the vermavis/test-luks branch 2 times, most recently from 4987220 to 7142d1f Compare April 30, 2025 15:34
"""Perform resize of the filesystem."""
if need_resize(device, target_mount, fs_type):
if passphrase:
luks_manager = LuksManager(logger, device_path=device)
luks_manager.luks_resize_device(passphrase)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

@vermavis vermavis force-pushed the vermavis/test-luks branch 2 times, most recently from 757bba1 to d47fb7d Compare May 2, 2025 23:17

# Check if already encrypted
is_luks = self._is_luks_device()
if is_luks and not self._is_luks_active():
Copy link

@sagigrimberg sagigrimberg May 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets arrange this differently:

if not is_luks:
    // format + open
elif not self._is_luks_active():
    // open
else:
    // error

if key.startswith(prefix)
}

def process_host_encryption(self, passphrase: str) -> None:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets rename this init_host_encryption

except ProcessExecutionError:
return False

def luks_remove_device(self) -> None:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets rename this _luks_close_device

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets wrap that with a public function named fini_host_encryption and have it call private _luks_close_device

@vermavis vermavis force-pushed the vermavis/test-luks branch 5 times, most recently from 59e802d to 618e70c Compare May 8, 2025 17:02
@@ -631,6 +649,13 @@ def NodeUnpublishVolume(self, volume_id, target_path, vms_session=None):
volume_id=volume_id,
vms_session=vms_session,
)
# Check for host_encryption in meta and handle accordingly
luks_manager = LuksManager(logger, volume_id)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vermavis why is this in NodeUnpublishVolume? We create the Luks device in NodeStageVolume, so we need to close the Luks device in NodeUnstageVolume no?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would do this instead: in NodeUnstageVolume:

diff --git a/vast_csi/plugins/block.py b/vast_csi/plugins/block.py
index c85f4c33a25f..9f28a12f17bf 100644
--- a/vast_csi/plugins/block.py
+++ b/vast_csi/plugins/block.py
@@ -25,7 +25,7 @@ from easypy.caching import cached_property
 from vast_csi.logging import logger
 from vast_csi.proto import csi_pb2_grpc as csi_grpc
 from vast_csi import csi_types as types
-from vast_csi.luks_utils import LuksManager
+from vast_csi.luks_utils import LuksManager, isLuksPath
 from vast_csi.csi_types import (
     INVALID_ARGUMENT,
     ALREADY_EXISTS,
@@ -535,6 +535,10 @@ class BlockNode(NodeBase, Instrumented):
         else:
             logger.info(f"Device not found at {device_bind_path}")
         remove_path_if_not_mounted(device_bind_path)
+        # If device is encrypted teardown accordingly
+        if isHostCryptPath(device_bind_path, volume_id)
+            with LuksManager(logger, volume_id) as lm:
+                lm.fini_host_encryption()
         return types.UnstageResp()
 
     def NodePublishVolume(

Where:

diff --git a/vast_csi/luks_utils.py b/vast_csi/luks_utils.py
index 967f99b99ce5..6e30e5c87b56 100644
--- a/vast_csi/luks_utils.py
+++ b/vast_csi/luks_utils.py
@@ -9,14 +9,21 @@ from vast_csi.logging import logger
 from vast_csi.exceptions import Abort
 from vast_csi.csi_types import NOT_FOUND
 
+def luksDevicePath(volume_name):
+    return f"/dev/mapper/{volume_name}"
+def luksDeviceName(volume_id):
+    return f"vast-csi-crypt-{volume_id}"
+def isHostCryptPath(device, volume_id):
+    return device == luksDevicePath(luksDeviceName(volume_id))
+
 class LuksManager:
     def __init__(self, logger, vol_id=None, device_path=None, vol_context=None):
         self.vol_id = vol_id
         self.encryption_config = self._parse_encryption_config(vol_context or {})
         self.logger = logger
         self.device_path = device_path
-        self.luks_device_name = f"vast-csi-crypt-{vol_id}"
-        self.luks_device_path = f"/dev/mapper/{self.luks_device_name}"
+        self.luks_device_name = luksDeviceName(vol_id)
+        self.luks_device_path = luksDevicePath(luks_device_name)


hostcmd.test("-e", resolved_device)

device_type = hostcmd.lsblk("-no", "TYPE", resolved_device).strip()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just test with _is_luks_device ?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is not sufficient, I would go with:
blkid <device> -s TYPE -o value

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but I think self._is_luks_device will work just fine @vermavis

@vermavis vermavis force-pushed the vermavis/test-luks branch from 618e70c to 4fb7a27 Compare May 9, 2025 16:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants