"Where complex systems find their orbit"
Orbital Dynamics is a mid-sized space technology company that processes astronomical data and manages satellite constellation networks for research institutions, government agencies, and private space companies. They've recently landed a major NASA contract for real-time processing of deep space telescope data, which is pushing their infrastructure to new limits.
This series of hands-on labs follows the journey of the Orbital Dynamics team as they tackle real-world challenges using the VAST Management System and VAST Database. Through these labs, you'll learn how to use vastpy and vastdb to solve complex data management and processing problems in a space technology environment.
This repository contains educational code and examples only. All code, scripts, and configurations are provided for learning and demonstration purposes. They are NOT intended for production use and should NOT be deployed in production environments without proper review, testing, and modification.
- Educational Purpose Only - These labs are designed to teach VAST SDK concepts
- Review Before Use - Always review and test any code before using it in your environment
Meet the characters who will guide you through these challenges:
- Dr. Alex Sterling (CTO) - Former JPL engineer who understands the domain and drives the company's technical vision
- Maya Chen (Lead SysAdmin) - Gaming industry veteran adapting to space-scale challenges
- Jordan Blake (Senior Developer) - Brilliant but sometimes gets lost in technical details
- Sam Rodriguez (DevOps Engineer) - The alignment specialist who bridges dev and ops
- Mac Thompson (Junior Admin) - Eager learner prone to educational mistakes
Challenge: Monitor existing storage infrastructure and automatically expand quotas when needed
- Use vastpy to monitor storage utilization across multiple views
- Build automated quota expansion with comprehensive safety checks
- Create basic predictive scaling systems that prevent storage crises
- Implement real-time monitoring and alerting for storage health
- Focus: Monitoring existing infrastructure, not creating new views
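The quota expansion decision described above can be sketched without a cluster. This is a minimal illustration, not the lab's actual implementation: the function name `plan_quota_expansion`, the 80% threshold, the 25% growth step, and the 100 TiB ceiling are all hypothetical values chosen for the example. In a real script the usage and limit figures would come from vastpy, and the proposed limit would only be applied after the remaining safety checks pass.

```python
from typing import Optional


def plan_quota_expansion(
    used_bytes: int,
    limit_bytes: int,
    threshold: float = 0.80,               # hypothetical: act once 80% of the quota is used
    growth_factor: float = 1.25,           # hypothetical: grow the quota by 25% per step
    max_limit_bytes: int = 100 * 1024**4,  # hypothetical ceiling (100 TiB)
) -> Optional[int]:
    """Return a proposed new quota limit, or None if no change should be made."""
    if limit_bytes <= 0 or used_bytes < 0:
        raise ValueError("limit must be positive and usage must be non-negative")
    if used_bytes / limit_bytes < threshold:
        return None  # utilization is below the threshold
    proposed = min(int(limit_bytes * growth_factor), max_limit_bytes)
    if proposed <= limit_bytes:
        return None  # already at the ceiling; never shrink a quota
    return proposed
```

Returning a proposal instead of applying it keeps the decision easy to test and lets a dry run print the change without touching the cluster.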
Challenge: Build a comprehensive metadata database system for efficient data discovery and management
- Use vastdb to create and manage VAST databases
- Build automated metadata extraction workflows for various file formats (FITS, JSON, etc.)
- Create powerful search interfaces with wildcard and date range support
- Focus: Metadata management and search capabilities
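To make the wildcard and date range search concrete, here is a small in-memory sketch. It does not use vastdb; it only shows the filtering logic that such a search interface would need. The record fields `filename` and `observed_on` and the function name `search_metadata` are assumptions made for this example.

```python
from datetime import date
from fnmatch import fnmatchcase
from typing import Dict, Iterable, List, Optional


def search_metadata(
    records: Iterable[Dict],
    name_pattern: str = "*",
    start: Optional[date] = None,
    end: Optional[date] = None,
) -> List[Dict]:
    """Return records whose filename matches the wildcard and whose date is in range."""
    matches = []
    for record in records:
        if not fnmatchcase(record["filename"], name_pattern):
            continue  # wildcard did not match
        observed = record["observed_on"]
        if start is not None and observed < start:
            continue  # before the requested range
        if end is not None and observed > end:
            continue  # after the requested range
        matches.append(record)
    return matches


# Hypothetical catalog entries, used only to demonstrate the filter.
catalog = [
    {"filename": "m31_2024_a.fits", "observed_on": date(2024, 1, 15)},
    {"filename": "m31_2024_b.fits", "observed_on": date(2024, 3, 2)},
    {"filename": "telemetry.json", "observed_on": date(2024, 2, 1)},
]
recent_fits = search_metadata(catalog, "*.fits", start=date(2024, 2, 1))
```

With a database backend, the same pattern and date bounds would typically be pushed down into the query rather than applied in Python.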
Challenge: Build a complete weather data pipeline with advanced analytics and health impact assessment
- Download weather and air quality data from Open-Meteo API for global cities
- Store and manage large-scale weather datasets in VAST Database using vastdb
- Perform advanced correlation analysis between weather patterns and air quality metrics
- Detect dangerous pollution episodes and health risk situations using WHO guidelines
- Analyze long-term trends and seasonal patterns across multiple cities
- Focus: Real-time data ingestion, scalable storage, and advanced analytics
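As an illustration of pollution episode detection, the sketch below flags runs of consecutive days whose PM2.5 daily mean exceeds a limit. The default of 15 µg/m³ is the 24-hour PM2.5 level in the WHO 2021 air quality guidelines; the function name, the `min_days` parameter, and the choice of PM2.5 alone are assumptions for the example, and the lab itself may use other pollutants or thresholds.

```python
from typing import List, Sequence, Tuple

WHO_PM25_24H_LIMIT = 15.0  # µg/m³, WHO 2021 guideline level for the 24-hour mean


def find_pollution_episodes(
    daily_pm25: Sequence[float],
    limit: float = WHO_PM25_24H_LIMIT,
    min_days: int = 2,  # hypothetical: a single bad day is not treated as an episode
) -> List[Tuple[int, int]]:
    """Return (first_index, last_index) for each run of at least min_days above limit."""
    episodes = []
    run_start = None
    for i, value in enumerate(daily_pm25):
        if value > limit:
            if run_start is None:
                run_start = i  # a new run begins
        else:
            if run_start is not None and i - run_start >= min_days:
                episodes.append((run_start, i - 1))
            run_start = None
    # Close a run that lasts until the end of the series.
    if run_start is not None and len(daily_pm25) - run_start >= min_days:
        episodes.append((run_start, len(daily_pm25) - 1))
    return episodes
```

The indices refer to positions in the input series, so they can be mapped back to dates from the ingested dataset.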
Challenge: Implement systematic version control for research datasets using VAST protection policies
- Use vastpy to create VAST protection policies with automated schedules and retention
- Build named snapshot workflows for key research milestones and calibration events
- Create user-friendly tools for browsing, searching, and restoring from snapshots
- Establish systematic version tracking and change management for research reproducibility
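A naming convention and a "latest version before a point in time" lookup are two small pieces of such a snapshot workflow. The sketch below shows one possible shape for them; the name format, the `taken_at` field, and both function names are hypothetical, and creating or restoring the snapshots themselves is left to vastpy.

```python
from datetime import datetime
from typing import Dict, List, Optional


def snapshot_name(dataset: str, milestone: str, taken_at: datetime) -> str:
    """Build a sortable, human-readable snapshot name (hypothetical convention)."""
    return f"{dataset}-{milestone}-{taken_at:%Y%m%dT%H%M%S}"


def latest_snapshot_before(snapshots: List[Dict], cutoff: datetime) -> Optional[Dict]:
    """Return the newest snapshot taken at or before cutoff, or None if there is none."""
    candidates = [s for s in snapshots if s["taken_at"] <= cutoff]
    return max(candidates, key=lambda s: s["taken_at"], default=None)


# Hypothetical snapshot records for demonstration.
snapshots = [
    {"name": "m31-calibration-20240501T120000", "taken_at": datetime(2024, 5, 1, 12)},
    {"name": "m31-publication-20240610T090000", "taken_at": datetime(2024, 6, 10, 9)},
]
restore_point = latest_snapshot_before(snapshots, datetime(2024, 6, 1))
```

Embedding the timestamp in the name keeps snapshots sortable in plain listings, which helps when browsing them by hand.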
Challenge: Detect astronomical events in real-time and alert appropriate teams
- Use vastdb for real-time data ingestion and analysis
- Build automated detection for specific astronomical events
- Create multi-level alerting based on event type and urgency
- Develop APIs for external system integration
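Multi-level alerting usually comes down to mapping an event to an urgency level and the urgency level to notification channels. The following sketch assumes three levels, a made-up set of event types and channels, and a hypothetical rule that low-confidence detections are downgraded by one level. None of these values come from the lab itself.

```python
from enum import IntEnum
from typing import List, Tuple


class Urgency(IntEnum):
    INFO = 1
    WARNING = 2
    CRITICAL = 3


# Hypothetical mappings, for illustration only.
EVENT_URGENCY = {
    "supernova": Urgency.CRITICAL,
    "gamma_ray_burst": Urgency.CRITICAL,
    "variable_star": Urgency.WARNING,
}
CHANNELS = {
    Urgency.INFO: ["log"],
    Urgency.WARNING: ["log", "email"],
    Urgency.CRITICAL: ["log", "email", "pager"],
}


def route_alert(event_type: str, confidence: float) -> Tuple[Urgency, List[str]]:
    """Pick an urgency level and the channels to notify for a detected event."""
    urgency = EVENT_URGENCY.get(event_type, Urgency.INFO)  # unknown events are informational
    if confidence < 0.5 and urgency > Urgency.INFO:
        urgency = Urgency(urgency - 1)  # downgrade uncertain detections by one level
    return urgency, CHANNELS[urgency]
```

Keeping the mappings in plain dictionaries makes it straightforward to move them into configuration later.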
Challenge: Integrate storage validation and management with data processing pipelines
- Build pre-processing storage availability checks
- Create interactive storage expansion for processing workflows
- Integrate storage validation with processing scripts
- Provide seamless storage management for data processing jobs
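A pre-processing availability check can be as simple as a gate that refuses to start a job when there is not enough room for its output plus some headroom. The sketch below is one way to express that; the exception name, the function name, and the 10% headroom are assumptions, and the available capacity would in practice be read from the cluster rather than passed in by hand.

```python
class InsufficientStorageError(RuntimeError):
    """Raised when a job should not start because storage is too tight."""


def require_capacity(
    required_bytes: int,
    available_bytes: int,
    headroom_percent: int = 10,  # hypothetical safety margin on top of the estimate
) -> int:
    """Return the bytes left after the job, or raise if the job does not fit."""
    needed = required_bytes + required_bytes * headroom_percent // 100
    if available_bytes < needed:
        raise InsufficientStorageError(
            f"need {needed} bytes including headroom, only {available_bytes} available"
        )
    return available_bytes - needed
```

A processing script would call this before starting work and either abort or offer storage expansion when the error is raised.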
- Clone this repository
    git clone https://github.com/vast-data/cosmos-labs
    cd cosmos-labs
- Create and activate a virtual environment (Recommended)
    # Create virtual environment inside project directory
    python3 -m venv venv
    # Activate the virtual environment
    # On Linux/macOS:
    source venv/bin/activate
    # On Windows:
    # venv\Scripts\activate
    # Verify you're in the virtual environment
    which python  # Should show path to your virtual environment
  Note: The venv/ directory is automatically ignored by git (see .gitignore), so it won't be committed to version control.
- Install Python dependencies
    # Install all dependencies (recommended)
    pip install -r requirements.txt
    # Or use the helper script for guided installation
    python3 install_dependencies.py
- Verify installation
    # Test individual package imports
    python -c "import yaml; print('pyyaml installed successfully')"
    python -c "import vastpy; print('vastpy installed successfully')"
    python -c "import vastdb; print('vastdb installed successfully')"
- Python 3.7+ with basic programming experience
- VAST Management System cluster access
- Git for version control
- vastpy and vastdb SDKs (installed via requirements.txt)
This repository uses a centralized configuration approach with strict validation:
- config.yaml.example - Example configuration template for all labs (non-sensitive settings)
- secrets.yaml.example - Example secrets template for all labs (sensitive information)
- config_loader.py - Centralized configuration loader with lab-specific extensions
- config_validator.py - Strict validation system that prevents dangerous default values
Each lab has its own config_loader.py that inherits from the centralized loader and provides lab-specific configuration methods.
After installing the dependencies (see Installation section above), you need to create your configuration files:
# Copy the example configuration files
cp config.yaml.example config.yaml
cp secrets.yaml.example secrets.yaml
# Edit the files with your specific settings
# config.yaml - Update with your environment settings
# secrets.yaml - Update with your actual credentials
Note: Never commit your actual config.yaml or secrets.yaml files to version control. Only the .example files are tracked.
No Default Values Allowed - This system prevents accidental use of potentially dangerous default values that could overwrite production data. All configuration values must be explicitly defined in the YAML files.
Fail-Fast Approach - If any required configuration is missing, the application will fail to start with clear error messages, preventing silent failures that could lead to data corruption.
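The no-defaults, fail-fast idea can be illustrated with a small validator over an already-loaded configuration dictionary. This is a sketch of the approach, not the repository's config_validator.py: the dotted key names, the placeholder strings, and the function names are all hypothetical.

```python
from typing import Any, Dict, Iterable

# Hypothetical placeholder values that should never reach a real cluster.
PLACEHOLDERS = {"", "changeme", "your-vms-address"}


class ConfigError(ValueError):
    """Raised when configuration is missing or still holds placeholder values."""


def get_nested(config: Dict[str, Any], dotted_key: str) -> Any:
    """Look up 'a.b.c' in nested dicts, raising instead of falling back to a default."""
    node: Any = config
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            raise ConfigError(f"missing required setting: {dotted_key}")
        node = node[part]
    return node


def validate_config(config: Dict[str, Any], required_keys: Iterable[str]) -> None:
    """Collect every problem, then fail once with a message listing all of them."""
    errors = []
    for key in required_keys:
        try:
            value = get_nested(config, key)
        except ConfigError as exc:
            errors.append(str(exc))
            continue
        if value is None or (isinstance(value, str) and value.strip().lower() in PLACEHOLDERS):
            errors.append(f"placeholder or empty value for: {key}")
    if errors:
        raise ConfigError("; ".join(errors))
```

Reporting all problems in one error saves a round trip for each missing setting, while still stopping the application before it touches any data.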
All labs include comprehensive dual-mode safety systems to prevent accidental changes to production VAST systems:
- 🛡️ Dry Run Mode (Default) - Preview operations without making changes
- 🚀 Production Mode - Requires explicit confirmation (--pushtoprod flag)
- 🔍 Comprehensive Safety Checks - Multiple validation layers before any changes
See individual lab READMEs for detailed safety information and command-line usage.
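The dual-mode pattern can be sketched with argparse: without `--pushtoprod` the script only reports what it would do. The typed "YES" prompt and the function names below are assumptions added for the example; each lab's README describes the confirmation it actually uses.

```python
import argparse
from typing import Callable


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Dry-run-by-default example")
    parser.add_argument(
        "--pushtoprod",
        action="store_true",  # absent means dry run
        help="apply changes; without this flag the script only previews them",
    )
    return parser


def apply_change(
    description: str,
    pushtoprod: bool,
    confirm: Callable[[str], str] = input,  # injectable so the prompt can be tested
) -> str:
    """Preview the change in dry-run mode; otherwise ask before applying it."""
    if not pushtoprod:
        return f"[DRY RUN] would {description}"
    answer = confirm(f"About to {description}. Type YES to continue: ")
    if answer.strip() != "YES":
        return "aborted"
    # The real change (for example, a vastpy call) would go here.
    return f"applied: {description}"
```

Making dry run the default means that forgetting a flag can only ever result in a preview, never in an unintended change.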
- test_config_validation.py - Test script to verify the configuration validation system
Each lab follows a consistent structure:
- Problem Statement - The business challenge and strategic planning needs
- Technical Challenge Overview - Detailed objectives and requirements
- Implementation Guide - Step-by-step technical instructions
- Success Criteria - Measurable outcomes for each lab
- Business Impact - Real-world benefits and outcomes
By completing these labs, you will learn to:
- Monitor Infrastructure Proactively - Use vastpy to monitor storage utilization and prevent crises
- Automate Quota Management - Build intelligent quota expansion with comprehensive safety checks
- Build Metadata Catalogs - Use vastdb to create searchable metadata systems for data discovery
- Implement Advanced Search - Create powerful search interfaces with wildcards and date ranges
- Build Data Ingestion Pipelines - Create robust systems for downloading and storing large-scale datasets
- Perform Advanced Analytics - Use vastdb for correlation analysis and pattern detection in time-series data
- Analyze Long-Term Trends - Process and analyze multi-year datasets for historical pattern recognition
- Orchestrate Data Pipelines - Combine both SDKs for unified data processing workflows
- Implement Version Control - Use snapshots for systematic data versioning and recovery
- Create Real-Time Systems - Build event detection and alerting systems for time-critical operations
- Integrate Pipeline Storage - Build storage validation and management for data processing workflows
These labs are designed to mirror real-world challenges faced by organizations dealing with:
- Proactive Infrastructure Monitoring - Monitoring existing systems to prevent crises rather than reacting to them
- Intelligent Quota Management - Automatically expanding storage before it becomes critical
- Metadata-Driven Data Discovery - Building searchable catalogs for efficient data management
- High-Volume Data Processing - Managing petabytes of incoming data
- Multi-Source Data Integration - Combining data from different sources and formats
- Real-Time Requirements - Processing and alerting on time-sensitive events
- Compliance and Auditing - Meeting regulatory and contractual requirements
- Operational Efficiency - Automating manual processes to scale operations
- Choose Your Lab - Start with Lab 1 if you're new to VAST SDKs, or jump to any lab that interests you
- Set Up Your Environment - Ensure you have access to a VAST cluster and the required SDKs
- Follow the Story - Read through the character interactions to understand the business context
- Implement the Solution - Work through the technical challenges step by step
- Validate Results - Use the success criteria to verify your implementation
- VAST Documentation - https://support.vastdata.com/s/
- vastpy GitHub - https://github.com/vast-data/vastpy
- vastdb GitHub - https://github.com/vast-data/vastdb_sdk
- Community Support - Join VAST's Cosmos community for additional help and examples
These labs are designed to be educational and practical. If you find issues or have suggestions for improvements, please contribute by:
- Reporting bugs or unclear instructions
- Suggesting additional scenarios or challenges
- Improving code examples or explanations
- Adding new labs or expanding existing ones
"In space exploration, preparation is everything. We build our data organization systems to handle the scale we expect, not the scale we hope to manage." - Dr. Alex Sterling