RHOShift - OpenShift Operator Installation Toolkit

A comprehensive, enterprise-grade toolkit for managing OpenShift operators with enhanced stability features, automatic dependency resolution, and Red Hat OpenShift AI (RHOAI) integration.

✨ Features

🚀 Core Functionality

  • 7 Enterprise Operators: Complete operator stack for modern OpenShift deployments
  • Enhanced Stability System: 3-tier stability levels with comprehensive error handling
  • Automatic Dependency Resolution: Smart installation order with dependency detection
  • Pre-flight Validation: Cluster readiness and permission verification
  • Health Monitoring: Real-time operator status tracking and reporting
  • Auto-recovery: Intelligent error classification and automatic retry logic

🛡️ Enterprise-Grade Reliability

  • Comprehensive Error Handling: 59+ exception handlers throughout codebase
  • Webhook Certificate Resilience: Automatic timing issue resolution for RHOAI
  • Resource Conflict Detection: Prevention of operator namespace conflicts
  • Smart Retry Logic: Exponential backoff with contextual error recovery
  • Parallel Installation: Optimized performance for multiple operators

🔧 Advanced Integration

  • RHOAI DSC/DSCI Management: Complete DataScienceCluster lifecycle control
  • Kueue Management States: Dynamic DSC integration with Managed/Unmanaged modes
  • KedaController Automation: Automatic KEDA controller creation and validation
  • Configurable Timeouts: Flexible timing control for enterprise environments

🛡️ Enhanced Stability Features

RHOShift includes a comprehensive stability system designed for enterprise deployments:

Stability Levels

  • 🟒 Enhanced (Default): Pre-flight checks + health monitoring + auto-recovery
  • πŸ”΅ Comprehensive: Maximum resilience with advanced error classification
  • βšͺ Basic: Standard installation with basic error handling

Pre-flight Validation

  • ✅ Cluster connectivity and authentication
  • ✅ Required permissions verification
  • ✅ Resource quota validation
  • ✅ Operator catalog accessibility
  • ✅ Namespace conflict detection
  • ✅ DSCI compatibility validation for RHOAI installations
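
A minimal sketch of how a few of these checks could be scripted, assuming they reduce to the `oc` commands listed under Troubleshooting. The `CHECKS` table, `run_oc`, and `preflight` are hypothetical names for illustration and are not taken from the RHOShift source; the sketch also assumes a non-zero exit status means the check failed.

```python
import subprocess
from typing import Callable, List, Tuple

# Hypothetical check table: a label plus the oc arguments to run.
# The commands themselves come from the Troubleshooting section of this README.
CHECKS: List[Tuple[str, List[str]]] = [
    ("authentication", ["whoami"]),
    ("subscription permissions",
     ["auth", "can-i", "create", "subscriptions", "-n", "openshift-operators"]),
    ("catalog access", ["get", "catalogsource", "-n", "openshift-marketplace"]),
]


def run_oc(args: List[str], oc_binary: str = "oc", timeout: int = 30) -> bool:
    """Return True when the oc command exits with status 0."""
    try:
        result = subprocess.run(
            [oc_binary, *args], capture_output=True, text=True, timeout=timeout
        )
    except (OSError, subprocess.TimeoutExpired):
        # Missing binary or a hung cluster connection counts as a failed check.
        return False
    return result.returncode == 0


def preflight(runner: Callable[[List[str]], bool] = run_oc) -> List[str]:
    """Run every check and return the labels of the ones that failed."""
    return [label for label, args in CHECKS if not runner(args)]
```

An empty list from `preflight()` means every sketched check passed. Passing a custom `runner` makes the logic testable without a cluster.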

Health Monitoring

  • 📊 Real-time operator status tracking
  • 🔍 Multi-resource health validation
  • 📈 Installation progress reporting
  • ⚡ Performance metrics and timing

Auto-recovery Features

  • 🔄 Intelligent retry mechanisms
  • 🧠 Error classification (transient vs. permanent)
  • ⏰ Exponential backoff strategies
  • 🛠️ Automatic resource cleanup and recreation
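
The combination of error classification and exponential backoff can be sketched as follows. This is an illustration, not the code in `resilience.py`: the two exception classes and `retry_with_backoff` are hypothetical, the defaults mirror `--retries` and `--retry-delay`, and the doubling factor is an assumption.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker: worth retrying (timeout, webhook not ready yet)."""


class PermanentError(Exception):
    """Hypothetical marker: retrying cannot help (missing permissions)."""


def retry_with_backoff(
    operation: Callable[[], T],
    retries: int = 3,          # mirrors the --retries default
    base_delay: float = 10.0,  # mirrors the --retry-delay default, in seconds
    factor: float = 2.0,       # assumed growth factor, not stated in this README
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry transient failures with growing delays; let anything else propagate."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    for attempt in range(1, retries + 1):
        try:
            return operation()
        except TransientError:
            if attempt == retries:
                raise  # out of attempts, surface the last transient failure
            sleep(base_delay * factor ** (attempt - 1))
    raise AssertionError("unreachable")
```

With these defaults, three attempts wait 10 s and then 20 s between tries, while a `PermanentError` is raised immediately without any delay.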

📦 Supported Operators

| Operator             | Package                                      | Namespace                | Channel     | Dependencies |
|----------------------|----------------------------------------------|--------------------------|-------------|--------------|
| OpenShift Serverless | serverless-operator                          | openshift-serverless     | stable      | None         |
| Service Mesh         | servicemeshoperator                          | openshift-operators      | stable      | None         |
| Authorino            | authorino-operator                           | openshift-operators      | stable      | None         |
| cert-manager         | openshift-cert-manager-operator              | cert-manager-operator    | stable-v1   | None         |
| Kueue                | kueue-operator                               | openshift-kueue-operator | stable-v1.0 | cert-manager |
| KEDA                 | openshift-custom-metrics-autoscaler-operator | openshift-keda           | stable      | None         |
| RHOAI/ODH            | opendatahub-operator                         | openshift-operators      | stable      | None         |

🚀 Installation

Quick Install

git clone https://github.com/mwaykole/O.git
cd O
pip install -e .

Verify Installation

rhoshift --help
rhoshift --summary

💻 Usage

Basic Commands

# Install single operator with enhanced stability
rhoshift --serverless

# Install multiple operators with batch optimization
rhoshift --serverless --servicemesh --authorino

# Install with dependency resolution (Kueue + cert-manager)
rhoshift --kueue

# Install all operators (includes DSCI validation for RHOAI)
rhoshift --all

# Install all with RHOAI channel preference
rhoshift --all --rhoai-channel=odh-nightlies

# Show detailed operator summary
rhoshift --summary

# Clean up all operators
rhoshift --cleanup

RHOAI with DSC/DSCI

# Install RHOAI with complete setup
rhoshift --rhoai \
  --rhoai-channel=odh-nightlies \
  --rhoai-image=brew.registry.redhat.io/rh-osbs/iib:1049242 \
  --deploy-rhoai-resources

# Install RHOAI with Kueue integration
rhoshift --rhoai --kueue Managed \
  --rhoai-channel=stable \
  --rhoai-image=quay.io/rhoai/rhoai-fbc-fragment:rhoai-2.25-nightly \
  --deploy-rhoai-resources

Kueue Management States

# Install Kueue as Managed (RHOAI controls it)
rhoshift --kueue Managed

# Install Kueue as Unmanaged (independent) - Default
rhoshift --kueue Unmanaged
rhoshift --kueue  # Same as above

# Switch management states (updates existing DSC)
rhoshift --kueue Managed    # Switch to Managed
rhoshift --kueue Unmanaged  # Switch to Unmanaged

🔧 Advanced Usage

Enterprise Deployment

# Complete ML/AI stack with queue management
rhoshift --all --kueue Managed \
  --rhoai-channel=stable \
  --rhoai-image=brew.registry.redhat.io/rh-osbs/iib:1049242 \
  --deploy-rhoai-resources \
  --timeout=900

# High-availability setup with service mesh
rhoshift --serverless --servicemesh --keda --authorino

# Development environment setup
rhoshift --cert-manager --kueue Unmanaged --keda

Custom Configuration

# Custom timeouts and retries for enterprise clusters
rhoshift --all \
  --timeout=1200 \
  --retries=5 \
  --retry-delay=15

# Custom oc binary path
rhoshift --serverless --oc-binary=/usr/local/bin/oc

# Verbose output for debugging
rhoshift --kueue Managed --verbose

🔗 Dependency Management

RHOShift automatically handles operator dependencies:

Automatic Resolution

  • Kueue → cert-manager: Installing Kueue automatically includes cert-manager
  • Installation Order: Dependencies installed first, primary operators second
  • Conflict Detection: Prevents namespace and resource conflicts

Smart Validation

# This command installs BOTH cert-manager AND Kueue in correct order:
rhoshift --kueue
# Output:
# 🔍 Pre-flight checks passed. Cluster is ready for installation.
# ⚠️  Missing dependency: kueue-operator requires openshift-cert-manager-operator
# 🚀 Installing 2 operators with enhanced stability...
# ✅ cert-manager installed successfully
# ✅ kueue installed successfully
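
The ordering shown in that output (dependencies first, requested operators second) can be reproduced with a small depth-first walk over a dependency map. This is a sketch only: the short operator names and the `DEPENDENCIES` dictionary are hypothetical stand-ins built from the Supported Operators table, not the contents of `constants.py`.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical dependency map derived from the Supported Operators table;
# only Kueue declares a dependency.
DEPENDENCIES: Dict[str, List[str]] = {
    "serverless": [],
    "servicemesh": [],
    "authorino": [],
    "cert-manager": [],
    "kueue": ["cert-manager"],
    "keda": [],
    "rhoai": [],
}


def install_order(
    selected: Iterable[str], deps: Dict[str, List[str]] = DEPENDENCIES
) -> List[str]:
    """Return the selected operators plus their dependencies, dependencies first."""
    order: List[str] = []
    visiting: Set[str] = set()

    def visit(name: str) -> None:
        if name in order:
            return  # already scheduled
        if name in visiting:
            raise ValueError(f"dependency cycle detected at {name!r}")
        visiting.add(name)
        for dep in deps.get(name, []):
            visit(dep)
        visiting.discard(name)
        order.append(name)

    for name in selected:
        visit(name)
    return order
```

Under these assumptions, `install_order(["kueue"])` yields cert-manager followed by kueue, matching the two-operator install in the output above.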

🤖 RHOAI Integration

DataScienceCluster Management

RHOShift provides complete DSC/DSCI lifecycle management:

# Create RHOAI with DSC/DSCI
rhoshift --rhoai --deploy-rhoai-resources

# RHOAI with Kueue integration
rhoshift --rhoai --kueue Managed --deploy-rhoai-resources

DSC Behavior

  • Existing DSC: Automatically updates Kueue managementState
  • No DSC: State applied when DSC is created via --deploy-rhoai-resources
  • Webhook Resilience: Automatic handling of certificate timing issues
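
Updating an existing DSC amounts to changing `spec.components.kueue.managementState`. The sketch below builds a JSON merge patch and the matching `oc patch` argument list. The helper names are hypothetical, the DSC name `default-dsc` is an assumption (check `oc get dsc` on your cluster), and RHOShift may apply the change differently.

```python
import json
from typing import Dict, List

# The two states accepted by the --kueue option in this README.
VALID_STATES = ("Managed", "Unmanaged")


def kueue_patch(state: str) -> Dict:
    """Build a merge patch that sets the Kueue component's managementState."""
    if state not in VALID_STATES:
        raise ValueError(f"state must be one of {VALID_STATES}, got {state!r}")
    return {"spec": {"components": {"kueue": {"managementState": state}}}}


def patch_command(
    state: str,
    dsc_name: str = "default-dsc",  # assumed name, not confirmed by this README
    oc_binary: str = "oc",
) -> List[str]:
    """Return the argument list for an `oc patch` call applying the merge patch."""
    return [
        oc_binary, "patch", "dsc", dsc_name,
        "--type=merge", "-p", json.dumps(kueue_patch(state)),
    ]
```

The returned list can be handed to `subprocess.run` once the DSC name has been confirmed.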

Output Examples

# When DSC exists and gets updated:
🔄 Updating DSC with Kueue managementState: Managed
✅ Successfully updated DSC with Kueue managementState: Managed

# When no DSC exists:
ℹ️  No existing DSC found. Kueue managementState will be applied when DSC is created.

⚙️ Configuration

CLI Options

Operator Selection:
  --serverless          Install OpenShift Serverless Operator
  --servicemesh         Install Service Mesh Operator
  --authorino           Install Authorino Operator
  --cert-manager        Install cert-manager Operator
  --rhoai               Install RHOAI Operator
  --kueue [{Managed,Unmanaged}]  Install Kueue with DSC integration
  --keda                Install KEDA (Custom Metrics Autoscaler)
  --all                 Install all operators
  --cleanup             Clean up all operators
  --summary             Show operator summary

Configuration:
  --oc-binary OC_BINARY     Path to oc CLI (default: oc)
  --retries RETRIES         Max retry attempts (default: 3)
  --retry-delay RETRY_DELAY Delay between retries (default: 10s)
  --timeout TIMEOUT         Command timeout (default: 300s)

RHOAI Options:
  --rhoai-channel CHANNEL   RHOAI channel (stable/odh-nightlies)
  --rhoai-image IMAGE       RHOAI container image
  --raw RAW                 Enable raw serving (True/False)
  --deploy-rhoai-resources  Create DSC and DSCI
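
The `--kueue [{Managed,Unmanaged}]` usage string is what `argparse` prints for an option with an optional value. The sketch below shows how such an option could be declared so that a bare `--kueue` falls back to Unmanaged. It is an illustration under that assumption, covers only a few of the options listed above, and is not the parser shipped in `rhoshift/cli`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser covering a subset of the documented options."""
    parser = argparse.ArgumentParser(prog="rhoshift")
    parser.add_argument(
        "--kueue",
        nargs="?",               # the value after --kueue is optional
        const="Unmanaged",       # used when --kueue is given without a value
        default=None,            # None means Kueue was not requested
        choices=["Managed", "Unmanaged"],
        help="Install Kueue with DSC integration",
    )
    parser.add_argument("--oc-binary", default="oc", help="Path to oc CLI")
    parser.add_argument("--retries", type=int, default=3, help="Max retry attempts")
    parser.add_argument("--retry-delay", type=int, default=10,
                        help="Delay between retries, in seconds")
    parser.add_argument("--timeout", type=int, default=300,
                        help="Command timeout, in seconds")
    return parser
```

With this declaration, `build_parser().parse_args(["--kueue"]).kueue` is `"Unmanaged"`, which matches the documented default.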

Environment Variables

export LOG_FILE_LEVEL=DEBUG      # File logging level
export LOG_CONSOLE_LEVEL=INFO    # Console logging level

Logging

  • Location: /tmp/rhoshift.log
  • Rotation: 10MB max size, 5 backup files
  • Levels: DEBUG (file) / INFO (console)
  • Colors: Supported in compatible terminals
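
The logging behaviour described above maps naturally onto Python's standard `logging` module. The following is a sketch under stated assumptions, not the contents of `rhoshift/logger`: `build_logger` is a hypothetical name, 10MB is taken to mean 10 * 1024 * 1024 bytes, and colour output is omitted.

```python
import logging
import os
from logging.handlers import RotatingFileHandler


def build_logger(
    name: str = "rhoshift", log_path: str = "/tmp/rhoshift.log"
) -> logging.Logger:
    """Create a logger with a rotating file handler and a console handler."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)  # let the handlers do the filtering
    logger.handlers.clear()         # avoid duplicate handlers on repeated calls

    # Rotate at an assumed 10 MiB and keep 5 backups, as listed above.
    file_handler = RotatingFileHandler(
        log_path, maxBytes=10 * 1024 * 1024, backupCount=5
    )
    file_handler.setLevel(os.environ.get("LOG_FILE_LEVEL", "DEBUG").upper())

    console_handler = logging.StreamHandler()
    console_handler.setLevel(os.environ.get("LOG_CONSOLE_LEVEL", "INFO").upper())

    formatter = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    for handler in (file_handler, console_handler):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```

Setting `LOG_CONSOLE_LEVEL=DEBUG` before the call would make the console as verbose as the file.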

🔍 Troubleshooting

Common Issues

Permission Errors

# Verify cluster access
oc whoami
oc auth can-i create subscriptions -n openshift-operators

Installation Failures

# Check logs
tail -f /tmp/rhoshift.log

# Verify operator catalogs
oc get catalogsource -n openshift-marketplace

# Check with enhanced timeouts
rhoshift --kueue --timeout=900 --retries=5

Dependency Issues

# Verify dependencies are resolved
rhoshift --summary

# Manual dependency installation
rhoshift --cert-manager
rhoshift --kueue

RHOAI/DSC Issues

# Check DSC status
oc get dsc,dsci -A

# Verify webhook certificates
oc get pods -n opendatahub-operators

# Manual DSC creation
rhoshift --rhoai --deploy-rhoai-resources --timeout=900

DSCI Immutable Field Conflicts

# Error: MonitoringNamespace is immutable
# This happens when the existing DSCI was created with a different monitoring namespace

# Check existing DSCI configuration
oc get dsci default-dsci -o yaml

# Solution 1: Force recreate DSCI (recommended)
rhoshift --rhoai --deploy-rhoai-resources

# Solution 2: Use existing DSCI configuration
# RHOShift will automatically detect and adapt to existing DSCI

Debug Mode

# Enable verbose output
rhoshift --all --verbose

# Check stability report
rhoshift --summary

🛠️ Development

Prerequisites

  • Python 3.8+
  • OpenShift CLI (oc)
  • OpenShift cluster access
  • cluster-admin privileges

Project Structure

rhoshift/
├── rhoshift/
│   ├── cli/              # Command-line interface
│   ├── logger/           # Logging system
│   ├── utils/
│   │   ├── operator/     # Operator management
│   │   ├── resilience.py # Error handling & recovery
│   │   ├── health_monitor.py # Health monitoring
│   │   ├── stability_coordinator.py # Stability management
│   │   └── constants.py  # Operator configurations
│   └── main.py          # Entry point
├── scripts/
│   ├── cleanup/         # Cleanup utilities
│   └── run_upgrade_matrix.sh # Upgrade testing
└── tests/               # Test suite

Running Tests

pytest tests/

🀝 Contributing

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature-name
  3. Commit changes: git commit -am 'Add feature'
  4. Push to branch: git push origin feature-name
  5. Create Pull Request

Development Guidelines

  • Follow Python PEP 8 standards
  • Add tests for new features
  • Update documentation
  • Ensure backward compatibility

📄 License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

🆘 Support

  • Issues: GitHub Issues
  • Documentation: This README and --help output
  • Logs: /tmp/rhoshift.log for detailed debugging

RHOShift - Enterprise-grade OpenShift operator management with enhanced stability and reliability features.
