A comprehensive, enterprise-grade toolkit for managing OpenShift operators with enhanced stability features, automatic dependency resolution, and Red Hat OpenShift AI (RHOAI) integration.
- Features
- Enhanced Stability Features
- Supported Operators
- Installation
- Usage
- Advanced Usage
- Dependency Management
- RHOAI Integration
- Configuration
- Troubleshooting
- Contributing
- 7 Enterprise Operators: Complete operator stack for modern OpenShift deployments
- Enhanced Stability System: 3-tier stability levels with comprehensive error handling
- Automatic Dependency Resolution: Smart installation order with dependency detection
- Pre-flight Validation: Cluster readiness and permission verification
- Health Monitoring: Real-time operator status tracking and reporting
- Auto-recovery: Intelligent error classification and automatic retry logic
- Comprehensive Error Handling: 59+ exception handlers throughout codebase
- Webhook Certificate Resilience: Automatic timing issue resolution for RHOAI
- Resource Conflict Detection: Prevention of operator namespace conflicts
- Smart Retry Logic: Exponential backoff with contextual error recovery
- Parallel Installation: Optimized performance for multiple operators
- RHOAI DSC/DSCI Management: Complete DataScienceCluster lifecycle control
- Kueue Management States: Dynamic DSC integration with Managed/Unmanaged modes
- KedaController Automation: Automatic KEDA controller creation and validation
- Configurable Timeouts: Flexible timing control for enterprise environments
RHOShift includes a comprehensive stability system designed for enterprise deployments:
- 🟢 Enhanced (Default): Pre-flight checks + health monitoring + auto-recovery
- 🔵 Comprehensive: Maximum resilience with advanced error classification
- ⚪ Basic: Standard installation with basic error handling

Pre-flight checks:
- Cluster connectivity and authentication
- Required permissions verification
- Resource quota validation
- Operator catalog accessibility
- Namespace conflict detection
- DSCI compatibility validation for RHOAI installations
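
As an illustration of what such checks can look like, here is a minimal sketch that wraps three `oc` commands this README already uses under Troubleshooting. The function names (`run_command`, `preflight`) and the check labels are hypothetical and are not taken from the RHOShift codebase; the real pre-flight logic covers more (quotas, namespace conflicts, DSCI compatibility).

```python
import subprocess
from typing import Callable, List, Tuple

# Hypothetical type: a runner takes a command and returns (exit_code, output).
Runner = Callable[[List[str]], Tuple[int, str]]


def run_command(cmd: List[str]) -> Tuple[int, str]:
    """Run a command and report failure as a non-zero code instead of raising."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=60)
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Covers a missing oc binary as well as a hung command.
        return 1, str(exc)
    return proc.returncode, (proc.stdout + proc.stderr).strip()


def preflight(runner: Runner = run_command, oc: str = "oc") -> List[str]:
    """Return the names of failed checks; an empty list means all passed."""
    checks = {
        "authentication": [oc, "whoami"],
        "subscription permissions": [
            oc, "auth", "can-i", "create", "subscriptions",
            "-n", "openshift-operators",
        ],
        "operator catalogs": [
            oc, "get", "catalogsource", "-n", "openshift-marketplace",
        ],
    }
    failed = []
    for name, cmd in checks.items():
        code, _output = runner(cmd)
        if code != 0:
            failed.append(name)
    return failed
```

Passing the runner as a parameter keeps the sketch testable without a cluster: a fake runner can simulate a denied permission or an unreachable API server.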

Health monitoring:
- Real-time operator status tracking
- Multi-resource health validation
- Installation progress reporting
- Performance metrics and timing

Auto-recovery:
- Intelligent retry mechanisms
- Error classification (transient vs. permanent)
- Exponential backoff strategies
- Automatic resource cleanup and recreation
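
The retry behaviour described above can be sketched roughly as follows. This is not the code in `resilience.py`; the marker strings, the function names, and the delay cap are assumptions chosen for illustration, and the defaults mirror the documented `--retries` (3) and `--retry-delay` (10s) values.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")

# Assumed, illustrative substrings that mark an error as worth retrying.
TRANSIENT_MARKERS: Tuple[str, ...] = (
    "timeout",
    "connection refused",
    "webhook",
    "temporarily unavailable",
)


def is_transient(error: Exception) -> bool:
    """Classify an error as transient if its message contains a known marker."""
    message = str(error).lower()
    return any(marker in message for marker in TRANSIENT_MARKERS)


def retry_with_backoff(
    action: Callable[[], T],
    retries: int = 3,
    base_delay: float = 10.0,
    max_delay: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call action, retrying transient failures with doubling delays.

    Permanent errors are re-raised immediately; transient errors are
    re-raised once `retries` retries have been used up.
    """
    attempt = 0
    while True:
        try:
            return action()
        except Exception as error:
            attempt += 1
            if not is_transient(error) or attempt > retries:
                raise
            # Delays grow as base_delay * 1, 2, 4, ... up to max_delay.
            sleep(min(base_delay * 2 ** (attempt - 1), max_delay))
```

With the defaults, a persistently transient failure waits 10s, 20s, and 40s before the error is finally raised, while a permission error fails on the first attempt.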
| Operator | Package | Namespace | Channel | Dependencies |
|---|---|---|---|---|
| OpenShift Serverless | serverless-operator | openshift-serverless | stable | None |
| Service Mesh | servicemeshoperator | openshift-operators | stable | None |
| Authorino | authorino-operator | openshift-operators | stable | None |
| cert-manager | openshift-cert-manager-operator | cert-manager-operator | stable-v1 | None |
| Kueue | kueue-operator | openshift-kueue-operator | stable-v1.0 | cert-manager |
| KEDA | openshift-custom-metrics-autoscaler-operator | openshift-keda | stable | None |
| RHOAI/ODH | opendatahub-operator | openshift-operators | stable | None |
git clone https://github.com/mwaykole/O.git
cd O
pip install -e .

rhoshift --help
rhoshift --summary

# Install single operator with enhanced stability
rhoshift --serverless
# Install multiple operators with batch optimization
rhoshift --serverless --servicemesh --authorino
# Install with dependency resolution (Kueue + cert-manager)
rhoshift --kueue
# Install all operators (includes DSCI validation for RHOAI)
rhoshift --all
# Install all with RHOAI channel preference
rhoshift --all --rhoai-channel=odh-nightlies
# Show detailed operator summary
rhoshift --summary
# Clean up all operators
rhoshift --cleanup

# Install RHOAI with complete setup
rhoshift --rhoai \
--rhoai-channel=odh-nightlies \
--rhoai-image=brew.registry.redhat.io/rh-osbs/iib:1049242 \
--deploy-rhoai-resources
# Install RHOAI with Kueue integration
rhoshift --rhoai --kueue Managed \
--rhoai-channel=stable \
--rhoai-image=quay.io/rhoai/rhoai-fbc-fragment:rhoai-2.25-nightly \
--deploy-rhoai-resources

# Install Kueue as Managed (RHOAI controls it)
rhoshift --kueue Managed
# Install Kueue as Unmanaged (independent) - Default
rhoshift --kueue Unmanaged
rhoshift --kueue # Same as above
# Switch management states (updates existing DSC)
rhoshift --kueue Managed # Switch to Managed
rhoshift --kueue Unmanaged # Switch to Unmanaged

# Complete ML/AI stack with queue management
rhoshift --all --kueue Managed \
--rhoai-channel=stable \
--rhoai-image=brew.registry.redhat.io/rh-osbs/iib:1049242 \
--deploy-rhoai-resources \
--timeout=900
# High-availability setup with service mesh
rhoshift --serverless --servicemesh --keda --authorino
# Development environment setup
rhoshift --cert-manager --kueue Unmanaged --keda

# Custom timeouts and retries for enterprise clusters
rhoshift --all \
--timeout=1200 \
--retries=5 \
--retry-delay=15
# Custom oc binary path
rhoshift --serverless --oc-binary=/usr/local/bin/oc
# Verbose output for debugging
rhoshift --kueue Managed --verbose

RHOShift automatically handles operator dependencies:
- Kueue → cert-manager: Installing Kueue automatically includes cert-manager
- Installation Order: Dependencies installed first, primary operators second
- Conflict Detection: Prevents namespace and resource conflicts
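
The ordering rule above (dependencies first, requested operators second) amounts to a depth-first walk over a dependency map. The sketch below assumes a map built from the Dependencies column of the operator table; the names `DEPENDENCIES` and `resolve_install_order` are hypothetical and do not come from the RHOShift source.

```python
from typing import Dict, List, Sequence, Set

# Assumed map, derived from the operator table: only Kueue has a dependency.
DEPENDENCIES: Dict[str, List[str]] = {
    "kueue": ["cert-manager"],
}


def resolve_install_order(
    requested: Sequence[str],
    dependencies: Dict[str, List[str]] = DEPENDENCIES,
) -> List[str]:
    """Return operators in install order, each dependency before its dependent."""
    order: List[str] = []
    visiting: Set[str] = set()

    def visit(name: str) -> None:
        if name in order:
            return  # already scheduled
        if name in visiting:
            raise ValueError(f"dependency cycle detected at {name!r}")
        visiting.add(name)
        for dep in dependencies.get(name, []):
            visit(dep)
        visiting.discard(name)
        order.append(name)

    for name in requested:
        visit(name)
    return order
```

Under these assumptions, requesting only `kueue` yields `cert-manager` followed by `kueue`, and an operator requested twice (directly and as a dependency) is scheduled once.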
# This command installs BOTH cert-manager AND Kueue in correct order:
rhoshift --kueue
# Output:
# Pre-flight checks passed. Cluster is ready for installation.
# ⚠️ Missing dependency: kueue-operator requires openshift-cert-manager-operator
# Installing 2 operators with enhanced stability...
# ✅ cert-manager installed successfully
# ✅ kueue installed successfully

RHOShift provides complete DSC/DSCI lifecycle management:
# Create RHOAI with DSC/DSCI
rhoshift --rhoai --deploy-rhoai-resources
# RHOAI with Kueue integration
rhoshift --rhoai --kueue Managed --deploy-rhoai-resources

- Existing DSC: Automatically updates Kueue managementState
- No DSC: State applied when DSC is created via --deploy-rhoai-resources
- Webhook Resilience: Automatic handling of certificate timing issues
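
For readers who want to see what updating the Kueue managementState on an existing DSC might involve, the sketch below builds an `oc patch` command with a JSON merge patch. It is an assumption-laden illustration rather than RHOShift's own method: the field path `spec.components.kueue.managementState`, the DSC name `default-dsc`, and the function name `kueue_patch_command` are all assumed, so check them against the DataScienceCluster resource on your cluster.

```python
import json
from typing import List

# The two states accepted by --kueue in this README.
VALID_STATES = ("Managed", "Unmanaged")


def kueue_patch_command(
    state: str,
    dsc_name: str = "default-dsc",  # assumed name; list yours with `oc get dsc`
    oc: str = "oc",
) -> List[str]:
    """Build an `oc patch` command that sets the Kueue managementState."""
    if state not in VALID_STATES:
        raise ValueError(f"state must be one of {VALID_STATES}, got {state!r}")
    # Assumed field path inside the DataScienceCluster spec.
    patch = {"spec": {"components": {"kueue": {"managementState": state}}}}
    return [oc, "patch", "dsc", dsc_name, "--type=merge", "-p", json.dumps(patch)]
```

The function only returns the argument list, so it can be inspected or logged before anything is executed against the cluster.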
# When DSC exists and gets updated:
Updating DSC with Kueue managementState: Managed
✅ Successfully updated DSC with Kueue managementState: Managed
# When no DSC exists:
ℹ️ No existing DSC found. Kueue managementState will be applied when DSC is created.

Operator Selection:
--serverless Install OpenShift Serverless Operator
--servicemesh Install Service Mesh Operator
--authorino Install Authorino Operator
--cert-manager Install cert-manager Operator
--rhoai Install RHOAI Operator
--kueue [{Managed,Unmanaged}] Install Kueue with DSC integration
--keda Install KEDA (Custom Metrics Autoscaler)
--all Install all operators
--cleanup Clean up all operators
--summary Show operator summary
Configuration:
--oc-binary OC_BINARY Path to oc CLI (default: oc)
--retries RETRIES Max retry attempts (default: 3)
--retry-delay RETRY_DELAY Delay between retries (default: 10s)
--timeout TIMEOUT Command timeout (default: 300s)
RHOAI Options:
--rhoai-channel CHANNEL RHOAI channel (stable/odh-nightlies)
--rhoai-image IMAGE RHOAI container image
--raw RAW Enable raw serving (True/False)
--deploy-rhoai-resources Create DSC and DSCI

export LOG_FILE_LEVEL=DEBUG # File logging level
export LOG_CONSOLE_LEVEL=INFO # Console logging level

- Location: /tmp/rhoshift.log
- Rotation: 10MB max size, 5 backup files
- Levels: DEBUG (file) / INFO (console)
- Colors: Supported in compatible terminals
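
A logging setup with these properties can be approximated with the Python standard library alone. The sketch below is not the contents of the `logger/` package: the function name `build_logger`, the logger name, and the message format are assumptions, while the environment variables, file location, rotation size, and backup count are the ones listed above. Colored console output is left out.

```python
import logging
import os
from logging.handlers import RotatingFileHandler


def build_logger(
    name: str = "rhoshift-example",       # hypothetical logger name
    log_path: str = "/tmp/rhoshift.log",
) -> logging.Logger:
    """Create a logger with a rotating file handler and a console handler."""
    file_level = os.environ.get("LOG_FILE_LEVEL", "DEBUG").upper()
    console_level = os.environ.get("LOG_CONSOLE_LEVEL", "INFO").upper()

    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)  # let the handlers do the filtering
    logger.handlers.clear()         # avoid duplicate handlers on repeat calls

    # Rotate at 10MB and keep 5 backup files.
    file_handler = RotatingFileHandler(
        log_path, maxBytes=10 * 1024 * 1024, backupCount=5
    )
    file_handler.setLevel(file_level)

    console_handler = logging.StreamHandler()
    console_handler.setLevel(console_level)

    formatter = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    for handler in (file_handler, console_handler):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```

Because each handler carries its own level, DEBUG messages reach the file while the console stays at INFO unless the environment variables say otherwise.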
# Verify cluster access
oc whoami
oc auth can-i create subscriptions -n openshift-operators

# Check logs
tail -f /tmp/rhoshift.log
# Verify operator catalogs
oc get catalogsource -n openshift-marketplace
# Check with enhanced timeouts
rhoshift --kueue --timeout=900 --retries=5

# Verify dependencies are resolved
rhoshift --summary
# Manual dependency installation
rhoshift --cert-manager
rhoshift --kueue

# Check DSC status
oc get dsc,dsci -A
# Verify webhook certificates
oc get pods -n opendatahub-operators
# Manual DSC creation
rhoshift --rhoai --deploy-rhoai-resources --timeout=900

# Error: MonitoringNamespace is immutable
# This happens when the existing DSCI has a different monitoring namespace
# Check existing DSCI configuration
oc get dsci default-dsci -o yaml
# Solution 1: Force recreate DSCI (recommended)
rhoshift --rhoai --deploy-rhoai-resources
# Solution 2: Use existing DSCI configuration
# RHOShift will automatically detect and adapt to existing DSCI

# Enable verbose output
rhoshift --all --verbose
# Check stability report
rhoshift --summary

- Python 3.8+
- OpenShift CLI (oc)
- OpenShift cluster access
- cluster-admin privileges
rhoshift/
├── rhoshift/
│   ├── cli/                         # Command-line interface
│   ├── logger/                      # Logging system
│   ├── utils/
│   │   ├── operator/                # Operator management
│   │   ├── resilience.py            # Error handling & recovery
│   │   ├── health_monitor.py        # Health monitoring
│   │   ├── stability_coordinator.py # Stability management
│   │   └── constants.py             # Operator configurations
│   └── main.py                      # Entry point
├── scripts/
│   ├── cleanup/                     # Cleanup utilities
│   └── run_upgrade_matrix.sh        # Upgrade testing
└── tests/                           # Test suite
pytest tests/

- Fork the repository
- Create a feature branch: git checkout -b feature-name
- Commit changes: git commit -am 'Add feature'
- Push to branch: git push origin feature-name
- Create Pull Request
- Follow Python PEP 8 standards
- Add tests for new features
- Update documentation
- Ensure backward compatibility
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
- Issues: GitHub Issues
- Documentation: This README and --help output
- Logs: /tmp/rhoshift.log for detailed debugging
RHOShift - Enterprise-grade OpenShift operator management with enhanced stability and reliability features.