This demonstration showcases the complete machine learning workflow in Red Hat OpenShift AI, taking you from initial experimentation to production deployment. Using Stable Diffusion for text-to-image generation, you'll learn how to experiment with models, fine-tune them with custom data, create automated pipelines, and deploy models as scalable services.
- Data Science Projects: Creating and managing ML workspaces in OpenShift AI
 - GPU-Accelerated Workbenches: Leveraging NVIDIA GPUs for model training and inference
 - Model Experimentation: Working with pre-trained models from Hugging Face
 - Fine-Tuning: Customizing models with your own data using Dreambooth
 - Pipeline Automation: Building repeatable ML workflows with Data Science Pipelines
 - Custom Runtime Development: Building KServe runtimes
 - Model Serving: Deploying models as REST APIs using KServe with multiple deployment options
 - Production Integration: Connecting served models to applications and MCP servers
 - Multi-Modal AI: Combining text and image generation in unified applications
 
- Red Hat OpenShift cluster (4.12+)
 - Red Hat OpenShift AI installed (2.9+)
- For managed service: Available as add-on for OpenShift Dedicated or ROSA
 - For self-managed: Install from OperatorHub
 
 - GPU node with at least 45GB memory (NVIDIA L40S recommended, A10G minimum for smaller models)
 
- S3-compatible object storage (MinIO, AWS S3, or Ceph)
 - Two buckets configured:
  - `pipeline-artifacts`: For pipeline execution artifacts
  - `models`: For storing trained models
 
- OpenShift AI Dashboard access
 - Ability to create Data Science Projects
 - (Optional) Hugging Face account with API token for model downloads
 
- Access OpenShift AI Dashboard
  - Navigate to your OpenShift console
  - Click the application launcher (9-dot grid)
  - Select "Red Hat OpenShift AI"
 
- Create a Data Science Project
  - Click "Data Science Projects"
  - Create a new project named `image-generation`
- Set Up Storage
  - Import `setup/setup-s3.yaml` to create local S3 storage (for demos)
  - Or configure your own S3-compatible storage connections
 
- Create a Workbench
  - Select PyTorch notebook image
  - Allocate GPU resources
  - Add environment variables (including `HF_TOKEN` if available)
  - Attach data connections
 
- Clone This Repository

  ```bash
  git clone https://github.com/cfchase/text-to-image-demo.git
  cd text-to-image-demo
  ```

- Follow the Notebooks
  - `1_experimentation.ipynb`: Initial model testing
  - `2_fine_tuning.ipynb`: Training with custom data
  - `3_remote_inference.ipynb`: Testing deployed models
 
- Workbenches: Jupyter notebook environments for development
 - Pipelines: Automated ML workflows using Kubeflow
 - Custom Runtime: Diffusers runtime for image generation
 - Model Serving: Deploy models as REST APIs with multiple storage options
 - Storage: S3-compatible object storage, PVC, or HuggingFace Hub integration
 - External Integration: MCP server support for modern AI application development
 
```bash
oc apply -f setup/setup-s3.yaml
```

This creates:
- MinIO deployment for S3-compatible storage
 - Two PVCs for buckets
 - Data connections for workbench and pipeline access
 
Create data connections with your S3 credentials:
- Connection 1: "My Storage" - for workbench access
 - Connection 2: "Pipeline Artifacts" - for pipeline server
 
When creating your workbench:
Notebook Image: Choose based on your needs
- Standard Data Science: Basic Python environment
 - PyTorch: Includes PyTorch, CUDA support (recommended for this demo)
 - TensorFlow: For TensorFlow-based workflows
 - Custom: Use your own image with specific dependencies
 
Resources:
- Small: 2 CPUs, 8Gi memory
 - Medium: 7 CPUs, 24Gi memory
 - Large: 14 CPUs, 56Gi memory
 - GPU: Add 1-2 NVIDIA GPUs (required for this demo)
 
Environment Variables:
```bash
HF_TOKEN=<your-huggingface-token>   # For model downloads
AWS_S3_ENDPOINT=<s3-endpoint-url>   # Auto-configured if using data connections
AWS_ACCESS_KEY_ID=<access-key>      # Auto-configured if using data connections
AWS_SECRET_ACCESS_KEY=<secret-key>  # Auto-configured if using data connections
AWS_S3_BUCKET=<bucket-name>         # Auto-configured if using data connections
```
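Inside the workbench, these variables can be read straight from the environment. A minimal sketch (assuming the `boto3` and `huggingface_hub` packages are available in the notebook image) for verifying the S3 data connection and logging in to Hugging Face:

```python
import os
import boto3

# Credentials and endpoint come from the data connection (or the env vars above)
s3 = boto3.client(
    "s3",
    endpoint_url=os.environ["AWS_S3_ENDPOINT"],
    aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
    aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
)
bucket = os.environ["AWS_S3_BUCKET"]
print(s3.list_objects_v2(Bucket=bucket).get("KeyCount", 0), "objects in", bucket)

# Optional: authenticate to Hugging Face for gated model downloads
if os.environ.get("HF_TOKEN"):
    from huggingface_hub import login
    login(token=os.environ["HF_TOKEN"])
```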
- In your Data Science Project, go to "Pipelines" → "Create pipeline server"
 - Select the "Pipeline Artifacts" data connection
 - Wait for the server to be ready (2-3 minutes)
 
After training your model:
- 
Deploy the custom Diffusers runtime:

  ```bash
  cd diffusers-runtime
  make build
  make push
  ```

- 
Choose your deployment template based on model storage:

  ```bash
  # For S3 storage-based models
  oc apply -f templates/redhat-dog.yaml

  # For HuggingFace Hub models (recommended)
  oc apply -f templates/redhat-dog-hf.yaml

  # For PVC-based storage
  oc apply -f templates/redhat-dog-pvc.yaml

  # For testing with lightweight models
  oc apply -f templates/tiny-sd-gpu.yaml
  ```

- 
The runtime includes advanced optimizations (see the sketch after this list):
- Automatic hardware detection (CUDA/MPS/CPU)
 - Intelligent dtype selection with fallback chains
 - Configurable memory optimizations
 - Universal model loading support
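The actual logic lives in `device_manager.py` and `dtype_selector.py`; the snippet below is only an illustrative sketch of the fallback idea, not the module code itself:

```python
import torch

def pick_device() -> str:
    # Prefer CUDA, then Apple MPS, then CPU
    if torch.cuda.is_available():
        return "cuda"
    if torch.backends.mps.is_available():
        return "mps"
    return "cpu"

def pick_dtype(device: str, requested: str = "auto") -> torch.dtype:
    # An explicit DTYPE request wins; otherwise pick a sensible default per device
    explicit = {"bfloat16": torch.bfloat16, "float16": torch.float16, "float32": torch.float32}
    if requested in explicit:
        return explicit[requested]
    if device == "cuda" and torch.cuda.is_bf16_supported():
        return torch.bfloat16
    if device in ("cuda", "mps"):
        return torch.float16
    return torch.float32
```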
 
 
```
text-to-image-demo/
├── README.md                    # This file
├── ARCHITECTURE.md              # Technical architecture details
├── PIPELINES.md                 # Pipeline automation guide
├── SERVING.md                   # Model serving guide
├── DEMO_SCRIPT.md              # Step-by-step demo script
│
├── 1_experimentation.ipynb      # Initial model testing
├── 2_fine_tuning.ipynb         # Custom training workflow
├── 3_remote_inference.ipynb    # Testing served models
│
├── requirements-base.txt        # Base Python dependencies
├── requirements-gpu.txt         # GPU-specific packages
│
├── finetuning_pipeline/        # Kubeflow pipeline components
│   ├── Dreambooth.pipeline     # Pipeline definition
│   ├── get_data.ipynb         # Data preparation step
│   ├── train.ipynb            # Training execution step
│   └── upload.ipynb           # Model upload step
│
├── diffusers-runtime/          # Custom KServe runtime
│   ├── Dockerfile             # Runtime container definition
│   ├── model.py              # Main KServe predictor (refactored)
│   ├── device_manager.py      # Hardware detection and management
│   ├── dtype_selector.py      # Intelligent dtype selection
│   ├── optimization_manager.py # Memory optimization controls
│   ├── pipeline_loader.py     # Universal model loading
│   ├── Makefile              # Build and deployment automation
│   └── templates/            # Kubernetes deployment manifests
│       ├── redhat-dog.yaml        # S3 storage deployment
│       ├── redhat-dog-hf.yaml     # HuggingFace Hub deployment
│       ├── redhat-dog-pvc.yaml    # PVC storage deployment
│       └── tiny-sd-gpu.yaml       # Lightweight test deployment
│
└── setup/                     # Deployment configurations
    └── setup-s3.yaml         # Demo S3 storage setup
```
- Load pre-trained Stable Diffusion model
 - Test basic text-to-image generation
 - Identify limitations with generic models
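In code, this step boils down to a few lines of the Hugging Face `diffusers` API. A minimal sketch (the exact checkpoint and parameters used in `1_experimentation.ipynb` may differ):

```python
import torch
from diffusers import StableDiffusionPipeline

# Example checkpoint; the notebook may pin a different one
pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1",
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    "a photo of a teddy bear in Times Square",
    num_inference_steps=30,
    guidance_scale=7.5,
).images[0]
image.save("sample.png")
```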
 
- Prepare custom training data (images of "Teddy")
 - Fine-tune model using Dreambooth technique
 - Save trained weights to S3 storage
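The upload step (`finetuning_pipeline/upload.ipynb`) pushes the trained weights to the `models` bucket. A rough sketch of that idea using boto3 (the local output directory and object key prefix are illustrative):

```python
import os
from pathlib import Path
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url=os.environ["AWS_S3_ENDPOINT"],
    aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
    aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
)

output_dir = Path("dreambooth-output")  # illustrative local path for trained weights
for path in output_dir.rglob("*"):
    if path.is_file():
        key = f"teddy/{path.relative_to(output_dir)}"  # illustrative key prefix
        s3.upload_file(str(path), "models", key)
```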
 
- Convert notebooks to pipeline steps
 - Create repeatable training workflow
 - Enable parameter tuning and experimentation
 
- Deploy custom KServe runtime
 - Create inference service
 - Expose REST API endpoint
 
- Test model via REST API
 - Integrate with applications
 - Monitor performance
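A sketch of calling the deployed endpoint, in the spirit of `3_remote_inference.ipynb`. The URL, model name, and response schema below are assumptions (a KServe v1-style predict call returning a base64-encoded image); check your InferenceService and the runtime documentation for the exact payload format:

```python
import base64
import requests

# Illustrative values; replace with your InferenceService URL and model name
url = "https://redhat-dog-image-generation.apps.example.com/v1/models/model:predict"
payload = {"instances": [{"prompt": "a photo of Teddy the dog in Times Square"}]}

resp = requests.post(url, json=payload, timeout=300)
resp.raise_for_status()

# Assumes a base64-encoded image in the response; adjust to the actual schema
b64 = resp.json()["predictions"][0]["image"]["b64"]
with open("output.png", "wb") as f:
    f.write(base64.b64decode(b64))
```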
 
- No GPU detected: Ensure your node has GPU support and correct drivers
 - Out of memory: Reduce batch size or use gradient checkpointing
 - CUDA errors: Verify PyTorch and CUDA versions match
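A quick sanity check you can run in a workbench cell to confirm the GPU and driver stack are visible to PyTorch:

```python
import torch

print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("CUDA build:", torch.version.cuda)
    print("Device:", torch.cuda.get_device_name(0))
    free, total = torch.cuda.mem_get_info()
    print(f"GPU memory: {free / 1e9:.1f} GB free of {total / 1e9:.1f} GB")
```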
 
- S3 connection failed: Check credentials and endpoint URL
 - Permission denied: Verify bucket policies and access keys
 - Upload timeouts: Check network connectivity and proxy settings
 
- Pipeline server not starting: Check data connection configuration
 - Pipeline runs failing: Review logs in pipeline run details
 - Missing artifacts: Verify S3 bucket permissions
 
- Model not loading: Check model path (S3/PVC/HuggingFace) and format
 - Inference errors: Review KServe pod logs, check dtype compatibility
 - Timeout errors: Increase resource limits or timeout values
 - Memory issues: Enable optimizations via environment variables:
  ```yaml
  env:
    - name: DTYPE
      value: "auto"  # or bfloat16, float16, float32
    - name: ENABLE_ATTENTION_SLICING
      value: "true"
    - name: ENABLE_VAE_SLICING
      value: "true"
    - name: ENABLE_CPU_OFFLOAD
      value: "true"
  ```
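These toggles correspond to standard `diffusers` pipeline optimizations. A sketch of how a runtime might apply them (illustrative only, not the actual `model.py`):

```python
import os

def apply_optimizations(pipe):
    # Each env toggle maps onto a standard diffusers optimization
    if os.environ.get("ENABLE_ATTENTION_SLICING", "false").lower() == "true":
        pipe.enable_attention_slicing()      # lower attention memory at some speed cost
    if os.environ.get("ENABLE_VAE_SLICING", "false").lower() == "true":
        pipe.enable_vae_slicing()            # decode images in slices to save VRAM
    if os.environ.get("ENABLE_CPU_OFFLOAD", "false").lower() == "true":
        pipe.enable_model_cpu_offload()      # keep idle submodules on the CPU
    return pipe
```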
 
- Red Hat OpenShift AI Documentation
 - OpenShift AI Learning Resources
 - KServe Documentation
 - Hugging Face Diffusers
 
Contributions are welcome! Please feel free to submit issues or pull requests to improve this demo.
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.