Skip to content

sigstore/model-validation-operator

Model Validation Controller

This project is a proof of concept based on the sigstore/model-transperency-cli. It offers a Kubernetes/OpenShift operator designed to validate AI models before they are picked up by actual workload. This project provides a webhook that adds an initcontainer to perform model validation. The operator uses a custom resource to define how the models should be validated, such as utilizing Sigstore or public keys.

Features

  • Model Validation: Ensures AI models are validated before they are used by workloads.
  • Webhook Integration: A webhook automatically injects an initcontainer into pods to perform the validation step.
  • Custom Resource: Configurable ModelValidation custom resource to specify how models should be validated.
    • Supports methods like Sigstore, pki or public key validation.

Prerequisites

  • Kubernetes 1.29+ or OpenShift 4.16+
  • Proper configuration for model validation (e.g., Sigstore, public keys)
  • A signed model (e.g. check the testdata or examples folder)

Installation

The operator can be installed via kustomize using different deployment overlays.

Production Deployment

For production environments with cert-manager integration:

kubectl apply -k https://raw.githubusercontent.com/sigstore/model-validation-operator/main/config/overlays/production
# or local
kubectl apply -k config/overlays/production

Testing Deployment

For testing environments with manual certificate management:

kubectl apply -k https://raw.githubusercontent.com/sigstore/model-validation-operator/main/config/overlays/testing
# or local
kubectl apply -k config/overlays/testing

Development Deployment

For development environments, deploying the operator without the webhook integration:

kubectl apply -k https://raw.githubusercontent.com/sigstore/model-validation-operator/main/config/overlays/development
# or local
kubectl apply -k config/overlays/development

OLM Deployment

For OpenShift/OLM environments:

kubectl apply -k https://raw.githubusercontent.com/sigstore/model-validation-operator/main/config/overlays/olm
# or local
kubectl apply -k config/overlays/olm

Uninstall

To uninstall the operator, use the same overlay you used for installation:

kubectl delete -k config/overlays/production

Configuration Structure

The operator uses a kustomize based, overlay configuration structure, aiming to separate generated content from environment specific content:

config/
├── crd/                      # Custom Resource Definitions
├── rbac/                     # RBAC permissions
├── webhook/                  # Webhook configuration
├── manager/                  # Controller manager deployment
├── manifests/                # OLM manifests
├── components/               # Reusable components
│   ├── webhook/              # Webhook service component
│   ├── certmanager/          # Certificate manager component
│   ├── manual-tls/           # Manual TLS configuration
│   ├── metrics-port/         # Metrics configuration
│   └── webhook-replacements/ # Webhook configuration replacements
└── overlays/                 # Environment-specific overlays
    ├── production/           # Production (cert-manager)
    ├── development/          # Development (operator only, no webhooks)
    ├── testing/              # Testing (manual, self-signed certs)
    └── olm/                  # OpenShift/OLM

Certificate Management

The operator supports different certificate management approaches:

  1. Production: Uses cert-manager for automatic certificate management
    • ⚠️ Important: The default cert-manager configuration uses self-signed certificates
    • For production environments, you should configure cert-manager with a proper CA issuer
  2. Development: Does not use certificates, there are no webhook configurations in this overlay
  3. Testing: Uses manual, self-signed certificate management for testing scenarios
  4. OLM: Uses OLM's built-in certificate management for OpenShift deployments

Running the Webhook Server Locally

The webhook server requires TLS certificates. When you run the operator locally, certificates will be generated automatically:

make run

This command will start the webhook server on https://localhost:9443, using the generated certs.

Known limitations

The project is at an early stage and therefore has some limitations.

  • There is no validation or defaulting for the custom resource.
  • The validation is namespace scoped and cannot be used across multiple namespaces.
  • No more than one validation resource can be used per namespace.
  • There are no status fields for the custom resource.
  • The model and signature path must be specified, there is no auto discovery.
  • TLS certificates used by the webhook are self generated.

Usage

First, a ModelValidation CR must be created as follows:

apiVersion: ml.sigstore.dev/v1alpha1
kind: ModelValidation
metadata:
  name: demo
spec:
  config:
    sigstoreConfig:
      certificateIdentity: "https://github.com/sigstore/model-validation-operator/.github/workflows/sign-model.yaml@refs/tags/v0.0.2"
      certificateOidcIssuer: "https://token.actions.githubusercontent.com"
  model:
    path: /data/tensorflow_saved_model
    signaturePath: /data/tensorflow_saved_model/model.sig

All pods in the namespace where the custom resource exists that have this label validation.ml.sigstore.dev/ml: "true" will be validated. It should be noted that this does not apply to subsequently labeled pods.

apiVersion: v1
kind: Pod
metadata:
  name: whatever-workload
+  labels:
+    validation.ml.sigstore.dev/ml: "true"
spec:
  restartPolicy: Never
  containers:
  - name: whatever-workload
    image: nginx
    ports:
    - containerPort: 80
    volumeMounts:
    - name: model-storage
      mountPath: /data
  volumes:
  - name: model-storage
    persistentVolumeClaim:
      claimName: models

Examples

The example folder contains example files for testing the operator.

Prerequisites for Examples

Before running the examples, create a namespace for testing (separate from the operator namespace):

kubectl create namespace testing

Important: Do not deploy examples in the operator namespace (e.g., model-validation-operator-system). The operator namespace has the label validation.ml.sigstore.dev/ignore: "true" which prevents the webhook from processing pods in that namespace.

Example Files

  • prepare.yaml: Contains a persistent volume claim and a job that downloads a signed test model.
kubectl apply -f https://raw.githubusercontent.com/sigstore/model-validation-operator/main/examples/prepare.yaml -n testing
# or local
kubectl apply -f examples/prepare.yaml -n testing
  • verify.yaml: Contains a model validation manifest for the validation of this model and a demo pod, which is provided with the appropriate label for validation.
kubectl apply -f https://raw.githubusercontent.com/sigstore/model-validation-operator/main/examples/verify.yaml -n testing
# or local
kubectl apply -f examples/verify.yaml -n testing
  • unsigned.yaml: Contains an example of a pod that would fail validation (for testing purposes).
kubectl apply -f https://raw.githubusercontent.com/sigstore/model-validation-operator/main/examples/unsigned.yaml -n testing
# or local
kubectl apply -f examples/unsigned.yaml -n testing

After the example installation, the logs of the generated job should show a successful download:

$ kubectl logs -n testing job/download-extract-model 
Connecting to github.com (140.82.121.3:443)
Connecting to objects.githubusercontent.com (185.199.108.133:443)
saving to '/data/tensorflow_saved_model.tar.gz'
tensorflow_saved_mod  44% |**************                  | 3983k  0:00:01 ETA
tensorflow_saved_mod 100% |********************************| 8952k  0:00:00 ETA
'/data/tensorflow_saved_model.tar.gz' saved
./
./model.sig
./variables/
./variables/variables.data-00000-of-00001
./variables/variables.index
./saved_model.pb
./fingerprint.pb

The operator logs should show that a pod has been modified:

$ kubectl logs -n model-validation-operator-system deploy/model-validation-controller-manager
time=2025-01-20T22:13:05.051Z level=INFO msg="Starting webhook server on :9443"
time=2025-01-20T22:13:47.556Z level=INFO msg="new request, path: /mutate-v1-pod"
time=2025-01-20T22:13:47.557Z level=INFO msg="Execute webhook"
time=2025-01-20T22:13:47.560Z level=INFO msg="Search associated Model Validation CR" pod=whatever-workload namespace=testing
time=2025-01-20T22:13:47.591Z level=INFO msg="construct args"
time=2025-01-20T22:13:47.591Z level=INFO msg="found sigstore config"

Finally, the test pod should be running and the injected initcontainer should have been successfully validated.

$ kubectl logs -n testing whatever-workload model-validation
INFO:__main__:Creating verifier for sigstore
INFO:tuf.api._payload:No signature for keyid f5312f542c21273d9485a49394386c4575804770667f2ddb59b3bf0669fddd2f
INFO:tuf.api._payload:No signature for keyid ff51e17fcf253119b7033f6f57512631da4a0969442afcf9fc8b141c7f2be99c
INFO:tuf.api._payload:No signature for keyid ff51e17fcf253119b7033f6f57512631da4a0969442afcf9fc8b141c7f2be99c
INFO:tuf.api._payload:No signature for keyid ff51e17fcf253119b7033f6f57512631da4a0969442afcf9fc8b141c7f2be99c
INFO:tuf.api._payload:No signature for keyid ff51e17fcf253119b7033f6f57512631da4a0969442afcf9fc8b141c7f2be99c
INFO:__main__:Verifying model signature from /data/model.sig
INFO:__main__:all checks passed

In case the workload is modified, is not executed:

ERROR:__main__:verification failed: the manifests do not match

About

Kubernetes controller to validate AI models

Resources

License

Code of conduct

Security policy

Stars

Watchers

Forks

Packages

No packages published

Contributors 5