This project is a proof of concept based on the sigstore/model-transparency-cli. It offers a Kubernetes/OpenShift operator designed to validate AI models before they are picked up by actual workloads. The project provides a webhook that adds an init container to perform model validation. The operator uses a custom resource to define how the models should be validated, for example with Sigstore or public keys.
- Model Validation: Ensures AI models are validated before they are used by workloads.
- Webhook Integration: A webhook automatically injects an init container into pods to perform the validation step.
- Custom Resource: Configurable ModelValidation custom resource to specify how models should be validated.
- Supports methods like Sigstore, PKI, or public key validation.
- Kubernetes 1.29+ or OpenShift 4.16+
- Proper configuration for model validation (e.g., Sigstore, public keys)
- A signed model (e.g., check the testdata or examples folder)
The operator can be installed via kustomize using different deployment overlays.
For production environments with cert-manager integration:
kubectl apply -k https://raw.githubusercontent.com/sigstore/model-validation-operator/main/config/overlays/production
# or local
kubectl apply -k config/overlays/production
For testing environments with manual certificate management:
kubectl apply -k https://raw.githubusercontent.com/sigstore/model-validation-operator/main/config/overlays/testing
# or local
kubectl apply -k config/overlays/testing
For development environments, deploying the operator without the webhook integration:
kubectl apply -k https://raw.githubusercontent.com/sigstore/model-validation-operator/main/config/overlays/development
# or local
kubectl apply -k config/overlays/development
For OpenShift/OLM environments:
kubectl apply -k https://raw.githubusercontent.com/sigstore/model-validation-operator/main/config/overlays/olm
# or local
kubectl apply -k config/overlays/olm
To uninstall the operator, use the same overlay you used for installation:
kubectl delete -k config/overlays/production
The operator uses a kustomize-based overlay configuration structure that aims to separate generated content from environment-specific content:
config/
├── crd/ # Custom Resource Definitions
├── rbac/ # RBAC permissions
├── webhook/ # Webhook configuration
├── manager/ # Controller manager deployment
├── manifests/ # OLM manifests
├── components/ # Reusable components
│ ├── webhook/ # Webhook service component
│ ├── certmanager/ # Certificate manager component
│ ├── manual-tls/ # Manual TLS configuration
│ ├── metrics-port/ # Metrics configuration
│ └── webhook-replacements/ # Webhook configuration replacements
└── overlays/ # Environment-specific overlays
    ├── production/ # Production (cert-manager)
    ├── development/ # Development (operator only, no webhooks)
    ├── testing/ # Testing (manual, self-signed certs)
    └── olm/ # OpenShift/OLM
The operator supports different certificate management approaches:
- Production: Uses cert-manager for automatic certificate management
  - ⚠️ Important: The default cert-manager configuration uses self-signed certificates.
  - For production environments, you should configure cert-manager with a proper CA issuer.
- Development: Does not use certificates, because this overlay contains no webhook configurations
- Testing: Uses manual, self-signed certificate management for testing scenarios
- OLM: Uses OLM's built-in certificate management for OpenShift deployments
The webhook server requires TLS certificates. When you run the operator locally, certificates will be generated automatically:
make run
This command will start the webhook server on https://localhost:9443, using the generated certs.
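Because the locally generated certificate is not issued by a CA that clients trust by default, a quick connectivity check has to skip certificate verification. The following is a minimal Python sketch, assuming the server started by make run is listening on localhost:9443. The function names are illustrative and not part of the project, and disabling verification is only appropriate for local development.

```python
import socket
import ssl
from typing import Optional


def make_dev_context() -> ssl.SSLContext:
    """Build a TLS context that accepts a self-generated development certificate."""
    ctx = ssl.create_default_context()
    # check_hostname must be disabled before verify_mode can be set to CERT_NONE.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def probe_webhook(host: str = "localhost", port: int = 9443,
                  timeout: float = 3.0) -> Optional[str]:
    """Complete a TLS handshake and return the negotiated protocol version."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with make_dev_context().wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()


# Usage (requires the local webhook server to be running):
#   print(probe_webhook())
```

A successful handshake only shows that the server is reachable over TLS; it says nothing about whether the certificate would be trusted in a cluster.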
The project is at an early stage and therefore has some limitations.
- There is no validation or defaulting for the custom resource.
- The validation is namespace scoped and cannot be used across multiple namespaces.
- No more than one validation resource can be used per namespace.
- There are no status fields for the custom resource.
- The model and signature paths must be specified; there is no auto-discovery.
- TLS certificates used by the webhook are self-generated.
First, a ModelValidation CR must be created as follows:
apiVersion: ml.sigstore.dev/v1alpha1
kind: ModelValidation
metadata:
  name: demo
spec:
  config:
    sigstoreConfig:
      certificateIdentity: "https://github.com/sigstore/model-validation-operator/.github/workflows/sign-model.yaml@refs/tags/v0.0.2"
      certificateOidcIssuer: "https://token.actions.githubusercontent.com"
  model:
    path: /data/tensorflow_saved_model
    signaturePath: /data/tensorflow_saved_model/model.sig
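Since the operator currently performs no validation or defaulting of the custom resource, it can help to check a manifest on the client side before applying it. The sketch below is one possible pre-flight check in Python. It operates on a manifest that has already been loaded into a dict, and the set of fields it treats as required is an assumption derived from the example above, not a schema published by the project.

```python
from typing import List


def check_model_validation(manifest: dict) -> List[str]:
    """Return a list of problems found in a ModelValidation manifest (empty if none)."""
    problems = []
    if manifest.get("kind") != "ModelValidation":
        problems.append("kind must be ModelValidation")
    if not (manifest.get("metadata") or {}).get("name"):
        problems.append("metadata.name is required")

    spec = manifest.get("spec") or {}
    model = spec.get("model") or {}
    # There is no auto-discovery, so both paths have to be given explicitly.
    for field in ("path", "signaturePath"):
        if not model.get(field):
            problems.append(f"spec.model.{field} is required")

    config = spec.get("config") or {}
    if not config:
        problems.append("spec.config must name a validation method")
    sigstore = config.get("sigstoreConfig")
    if sigstore is not None:
        # Assumption: both Sigstore fields from the example are mandatory.
        for field in ("certificateIdentity", "certificateOidcIssuer"):
            if not sigstore.get(field):
                problems.append(f"spec.config.sigstoreConfig.{field} is required")
    return problems


demo = {
    "apiVersion": "ml.sigstore.dev/v1alpha1",
    "kind": "ModelValidation",
    "metadata": {"name": "demo"},
    "spec": {
        "config": {"sigstoreConfig": {
            "certificateIdentity": "https://github.com/sigstore/model-validation-operator/.github/workflows/sign-model.yaml@refs/tags/v0.0.2",
            "certificateOidcIssuer": "https://token.actions.githubusercontent.com",
        }},
        "model": {
            "path": "/data/tensorflow_saved_model",
            "signaturePath": "/data/tensorflow_saved_model/model.sig",
        },
    },
}
print(check_model_validation(demo))
```

The function only reports missing fields; it does not check that the paths exist on the mounted volume.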
All pods in the namespace where the custom resource exists that carry the label validation.ml.sigstore.dev/ml: "true" will be validated.
Note that this does not apply to pods that are labeled after they have been created.
apiVersion: v1
kind: Pod
metadata:
  name: whatever-workload
+  labels:
+    validation.ml.sigstore.dev/ml: "true"
spec:
  restartPolicy: Never
  containers:
  - name: whatever-workload
    image: nginx
    ports:
    - containerPort: 80
    volumeMounts:
    - name: model-storage
      mountPath: /data
  volumes:
  - name: model-storage
    persistentVolumeClaim:
      claimName: models
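To illustrate what the webhook does with such a pod, here is a simplified Python sketch of the mutation step. It is not the operator's actual code. The label and the init container name model-validation come from this document; the image name, the argument layout, and the reuse of the workload's volume mounts are assumptions made for the example.

```python
import copy

VALIDATION_LABEL = "validation.ml.sigstore.dev/ml"
INIT_CONTAINER_NAME = "model-validation"
# Placeholder only; the real validator image is not named in this document.
PLACEHOLDER_IMAGE = "example.invalid/model-validation:placeholder"


def inject_validation(pod: dict, model_validation: dict) -> dict:
    """Return the pod with a validation init container added, if it is labeled."""
    labels = (pod.get("metadata") or {}).get("labels") or {}
    if labels.get(VALIDATION_LABEL) != "true":
        return pod  # unlabeled pods are left untouched

    mutated = copy.deepcopy(pod)
    spec = mutated.setdefault("spec", {})
    init_containers = spec.setdefault("initContainers", [])
    if any(c.get("name") == INIT_CONTAINER_NAME for c in init_containers):
        return mutated  # already injected

    # Assumption: the init container needs the same mounts as the workload
    # so that it can read the model from the shared volume.
    mounts = []
    for container in spec.get("containers", []):
        for mount in container.get("volumeMounts", []):
            if mount not in mounts:
                mounts.append(mount)

    model = model_validation["spec"]["model"]
    init_containers.append({
        "name": INIT_CONTAINER_NAME,
        "image": PLACEHOLDER_IMAGE,
        # Hypothetical argument layout: model path, then signature path.
        "args": [model["path"], model["signaturePath"]],
        "volumeMounts": copy.deepcopy(mounts),
    })
    return mutated
```

Because the decision is made on the pod as it is submitted, a label added to an existing pod never reaches this code path, which matches the note above about subsequently labeled pods.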
The examples folder contains example files for testing the operator.
Before running the examples, create a namespace for testing (separate from the operator namespace):
kubectl create namespace testing
Important: Do not deploy examples in the operator namespace (e.g., model-validation-operator-system). The operator namespace has the label validation.ml.sigstore.dev/ignore: "true", which prevents the webhook from processing pods in that namespace.
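If pods in a namespace are unexpectedly not validated, checking for the ignore label is a reasonable first step. The following Python sketch takes a namespace object in the shape returned by kubectl get namespace <name> -o json and reports whether the label is set. The function name is illustrative.

```python
IGNORE_LABEL = "validation.ml.sigstore.dev/ignore"


def webhook_ignores_namespace(namespace: dict) -> bool:
    """Return True if the namespace carries the ignore label with value "true"."""
    labels = (namespace.get("metadata") or {}).get("labels") or {}
    return labels.get(IGNORE_LABEL) == "true"


operator_ns = {
    "metadata": {
        "name": "model-validation-operator-system",
        "labels": {IGNORE_LABEL: "true"},
    }
}
testing_ns = {"metadata": {"name": "testing"}}

print(webhook_ignores_namespace(operator_ns))  # True
print(webhook_ignores_namespace(testing_ns))   # False
```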
- prepare.yaml: Contains a persistent volume claim and a job that downloads a signed test model.
kubectl apply -f https://raw.githubusercontent.com/sigstore/model-validation-operator/main/examples/prepare.yaml -n testing
# or local
kubectl apply -f examples/prepare.yaml -n testing
- verify.yaml: Contains a model validation manifest for the validation of this model and a demo pod, which is provided with the appropriate label for validation.
kubectl apply -f https://raw.githubusercontent.com/sigstore/model-validation-operator/main/examples/verify.yaml -n testing
# or local
kubectl apply -f examples/verify.yaml -n testing
- unsigned.yaml: Contains an example of a pod that would fail validation (for testing purposes).
kubectl apply -f https://raw.githubusercontent.com/sigstore/model-validation-operator/main/examples/unsigned.yaml -n testing
# or local
kubectl apply -f examples/unsigned.yaml -n testing
After the example installation, the logs of the generated job should show a successful download:
$ kubectl logs -n testing job/download-extract-model
Connecting to github.com (140.82.121.3:443)
Connecting to objects.githubusercontent.com (185.199.108.133:443)
saving to '/data/tensorflow_saved_model.tar.gz'
tensorflow_saved_mod 44% |************** | 3983k 0:00:01 ETA
tensorflow_saved_mod 100% |********************************| 8952k 0:00:00 ETA
'/data/tensorflow_saved_model.tar.gz' saved
./
./model.sig
./variables/
./variables/variables.data-00000-of-00001
./variables/variables.index
./saved_model.pb
./fingerprint.pb
The operator logs should show that a pod has been modified:
$ kubectl logs -n model-validation-operator-system deploy/model-validation-controller-manager
time=2025-01-20T22:13:05.051Z level=INFO msg="Starting webhook server on :9443"
time=2025-01-20T22:13:47.556Z level=INFO msg="new request, path: /mutate-v1-pod"
time=2025-01-20T22:13:47.557Z level=INFO msg="Execute webhook"
time=2025-01-20T22:13:47.560Z level=INFO msg="Search associated Model Validation CR" pod=whatever-workload namespace=testing
time=2025-01-20T22:13:47.591Z level=INFO msg="construct args"
time=2025-01-20T22:13:47.591Z level=INFO msg="found sigstore config"
Finally, the test pod should be running, and the logs of the injected init container should show that the model was validated successfully.
$ kubectl logs -n testing whatever-workload model-validation
INFO:__main__:Creating verifier for sigstore
INFO:tuf.api._payload:No signature for keyid f5312f542c21273d9485a49394386c4575804770667f2ddb59b3bf0669fddd2f
INFO:tuf.api._payload:No signature for keyid ff51e17fcf253119b7033f6f57512631da4a0969442afcf9fc8b141c7f2be99c
INFO:tuf.api._payload:No signature for keyid ff51e17fcf253119b7033f6f57512631da4a0969442afcf9fc8b141c7f2be99c
INFO:tuf.api._payload:No signature for keyid ff51e17fcf253119b7033f6f57512631da4a0969442afcf9fc8b141c7f2be99c
INFO:tuf.api._payload:No signature for keyid ff51e17fcf253119b7033f6f57512631da4a0969442afcf9fc8b141c7f2be99c
INFO:__main__:Verifying model signature from /data/model.sig
INFO:__main__:all checks passed
If the model has been modified, validation fails and the workload is not executed:
ERROR:__main__:verification failed: the manifests do not match
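For scripted checks, the two outcomes can be told apart from the init container logs. The sketch below is a minimal Python example that classifies log text based only on the two messages shown in this document. Other log formats or messages are not covered, and the function name is illustrative.

```python
def validation_outcome(log_text: str) -> str:
    """Classify init container logs as "failed", "passed", or "unknown"."""
    lines = log_text.splitlines()
    # A failure message takes precedence over anything else in the log.
    if any(l.startswith("ERROR:") and "verification failed" in l for l in lines):
        return "failed"
    if any("all checks passed" in l for l in lines):
        return "passed"
    return "unknown"


ok_log = (
    "INFO:__main__:Creating verifier for sigstore\n"
    "INFO:__main__:Verifying model signature from /data/model.sig\n"
    "INFO:__main__:all checks passed\n"
)
bad_log = "ERROR:__main__:verification failed: the manifests do not match\n"

print(validation_outcome(ok_log))   # passed
print(validation_outcome(bad_log))  # failed
```

The input could come from kubectl logs -n testing whatever-workload model-validation, as used above.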