This project demonstrates a complete MLOps pipeline for predicting Titanic passenger survival, built with Astronomer Airflow for orchestration, Redis as a feature store, Alibi Detect for drift detection, and Prometheus and Grafana for monitoring.
This project was built to learn and implement various MLOps tools and practices:
- Astronomer Airflow: Orchestrating data pipelines
- Redis: Feature store for caching processed features
- Alibi Detect: Data drift detection
- Prometheus & Grafana: Monitoring and visualization
- PostgreSQL: Data storage
- Google Cloud Platform: Data source
- Flask: Web application interface
- Python 3.11.13
- Astronomer CLI 1.31.0
- Apache Airflow
- Redis
- PostgreSQL
- Prometheus
- Grafana
- Docker & Docker Compose
- Google Cloud Platform (GCP)
- Flask
- Scikit-learn
- Alibi Detect
- Windows 11
- Docker Desktop (running)
- Conda environment
- Google Cloud Platform account (setup instructions below)
- Astronomer CLI 1.31.0
Important: Complete this setup before starting Astro.
- Go to Google Cloud Console → Cloud Storage → Buckets
- Click "Create Bucket"
- Give your bucket a name (e.g., `bucket_titanic_1`)
- In "Choose how to control access to objects":
- UNTICK `Enforce public access prevention on this bucket`
- UNTICK
- Continue and create the bucket
- Upload your dataset file (e.g., `Titanic-Dataset.csv`) to the bucket
- Go to IAM & Admin → Service Accounts
- Click "Create Service Account"
- Give it a name (e.g., `airflow-service-account`)
- Click "Continue"
- Grant this service account access to the project by selecting these roles:
  - Owner
  - Storage Object Admin
  - Storage Object Viewer
- Click "Continue" → "Done"
- Find your newly created service account in the list
- Click the three dots (⋮) → "Manage Keys"
- Click "Add Key" → "Create New Key"
- Select "JSON" format
- Click "Create" and download the JSON key file
- Important: Save this file securely - you'll need it for Astro Airflow
- Go back to Cloud Storage → Buckets
- Find your created bucket and click the three dots (⋮)
- Click "Edit Access"
- Add your service account with these principals/roles:
  - Owner
  - Storage Object Admin
  - Storage Object Viewer
- Click "Save"
Note: Make sure the service account JSON key file is downloaded as it will be required for Astro setup.
Create and activate a conda environment:
conda create -n titanic-mlops python=3.11
conda activate titanic-mlops
Install Astronomer CLI version 1.31.0:
# Install using winget (Windows)
winget install -e --id Astronomer.Astro -v 1.31.0
# Verify installation
astro version
# Initialize Astro project
astro dev init
# Add Google Cloud Provider to Dockerfile
# Add this line to your Dockerfile:
# RUN pip install apache-airflow-providers-google
# **Note**: Don't add `psycopg2` to `requirements.txt` initially as it conflicts with Astro setup.
Create/modify `.astro/config.yaml`:
deployments:
- name: dev
executor: celery
image:
name: quay.io/astronomer/astro-runtime:7.3.0
env: dev
volumes:
- ./include:/usr/local/airflow/include
- Take the service account JSON key file downloaded from GCP (Step 1.3)
- Place it in the `include/` folder of your Astro project
- Rename it to `gcp-key.json`
Important: Make sure Docker Desktop is running before executing this command.
astro dev start
Once the containers are running, go to http://localhost:8080 (Airflow Dashboard):
- Go to Admin → Connections → Create
- Connection ID: `google_cloud_default`
- Connection Type: `Google Cloud`
- Keyfile Path: `/usr/local/airflow/include/gcp-key.json`
- Scopes: `https://www.googleapis.com/auth/cloud-platform`
- Connection ID: `postgres_default`
- Connection Type: `Postgres`
- Host: the Postgres container name (check the running Docker containers)
- Database: `postgres`
- Login: `postgres`
- Password: `postgres`
- Port: `5432`
Create `dags/extract_data_from_gcp.py` with the DAG to extract data from the GCP bucket (a sketch follows the notes below).
Important Notes:
- Make sure `dag_id="example_astronauts"` matches the DAG ID shown in the Airflow dashboard
- Ensure your GCP bucket `bucket_titanic_1` contains `Titanic-Dataset.csv`
- The DAG ID might change, so use the one that appears in the dashboard after creating the file
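A minimal sketch of what `dags/extract_data_from_gcp.py` could look like, assuming the bucket `bucket_titanic_1`, the object `Titanic-Dataset.csv`, and the two Airflow connections configured above; the DAG ID, local file path, and `titanic` table name here are illustrative assumptions, and `pandas` must be available in the Airflow image:

```python
# Sketch only: bucket/object names come from the setup above, while the
# DAG ID, local file path, and "titanic" table name are assumptions.
from datetime import datetime

import pandas as pd
from airflow import DAG
from airflow.decorators import task
from airflow.providers.google.cloud.transfers.gcs_to_local import (
    GCSToLocalFilesystemOperator,
)
from airflow.providers.postgres.hooks.postgres import PostgresHook

with DAG(
    dag_id="extract_titanic_data",  # use the ID shown in your Airflow dashboard
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
) as dag:

    # Download the CSV from the GCP bucket via the google_cloud_default connection
    download_csv = GCSToLocalFilesystemOperator(
        task_id="download_from_gcs",
        gcp_conn_id="google_cloud_default",
        bucket="bucket_titanic_1",
        object_name="Titanic-Dataset.csv",
        filename="/usr/local/airflow/include/Titanic-Dataset.csv",
    )

    @task
    def load_to_postgres():
        # Write the downloaded CSV into PostgreSQL via the postgres_default connection
        df = pd.read_csv("/usr/local/airflow/include/Titanic-Dataset.csv")
        hook = PostgresHook(postgres_conn_id="postgres_default")
        df.to_sql("titanic", hook.get_sqlalchemy_engine(), if_exists="replace", index=False)

    download_csv >> load_to_postgres()
```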
# Pull Redis image
docker pull redis
# Run Redis container
docker run -d --name redis-container -p 6379:6379 redis
Important: Keep the Redis container running throughout the project.
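To confirm the container is reachable from your conda environment, a quick check with the `redis` Python client (assuming the default `localhost:6379` mapping above and that `pip install redis` has been run):

```python
import redis

# Connect to the Redis container published on localhost:6379
client = redis.Redis(host="localhost", port=6379, db=0)
print(client.ping())  # True if the feature store is reachable
```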
To ensure proper imports work throughout the project, install the package in development mode:
# Activate your conda environment first
conda activate titanic-mlops
# Install the project in editable mode
pip install -e .
Note: This allows you to import modules from the `src/`, `config/`, and other project directories without path issues.
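`pip install -e .` expects packaging metadata at the project root; if you don't have it yet, a minimal `setup.py` along these lines is enough (the project name and version below are placeholders):

```python
# Minimal packaging metadata so `pip install -e .` works.
from setuptools import find_packages, setup

setup(
    name="titanic-mlops",
    version="0.1.0",
    packages=find_packages(),  # picks up src/, config/, etc. if they contain __init__.py
)
```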
- Go to the Airflow dashboard (http://localhost:8080)
- Navigate to DAGs
- Find your DAG (with the correct DAG ID)
- Click the play/start button to execute the pipeline
- This will fetch data from GCP and save it to PostgreSQL
Important: After the data pipeline completes, you need to train the model and set up the feature store.
# Make sure both Astro and Redis containers are running
# Run the training pipeline to:
# - Read data from PostgreSQL
# - Preprocess the data
# - Train the ML model
# - Store features in Redis feature store
# - Save the trained model
python pipeline/training_pipeline.py
# You can also run data_ingestion.py, data_processing.py, feature_store.py, and model_training.py individually first to verify that each step works.
Note: This step is crucial as it:
- Creates the trained model (`artifacts/models/random_forest_model.pkl`)
- Populates the Redis feature store with processed features
- Sets up the reference data for drift detection (see the sketch below)
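A condensed sketch of what `pipeline/training_pipeline.py` does. The `titanic` table name, Redis key prefix, selected columns, and the `localhost:5432` Postgres mapping exposed by `astro dev start` are all assumptions; reading Postgres from the conda environment also needs a local driver such as `psycopg2-binary` (this is separate from the Astro `requirements.txt` caveat):

```python
# Condensed training-pipeline sketch; adapt table, columns, and paths to your setup.
import joblib
import pandas as pd
import redis
from sklearn.ensemble import RandomForestClassifier
from sqlalchemy import create_engine

# 1. Read the raw data loaded by the Airflow DAG from PostgreSQL
engine = create_engine("postgresql://postgres:postgres@localhost:5432/postgres")
df = pd.read_sql("SELECT * FROM titanic", engine)

# 2. Preprocess a few illustrative columns
df["Age"] = df["Age"].fillna(df["Age"].median())
df["Sex"] = df["Sex"].map({"male": 0, "female": 1})
features = df[["Pclass", "Sex", "Age", "Fare"]]
target = df["Survived"]

# 3. Cache the processed features in the Redis feature store
store = redis.Redis(host="localhost", port=6379, db=0)
for idx, row in features.iterrows():
    store.hset(f"feature:{idx}", mapping={k: float(v) for k, v in row.items()})

# 4. Train the model and save it where the Flask app expects it
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(features, target)
joblib.dump(model, "artifacts/models/random_forest_model.pkl")
```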
Ensure you have `docker-compose.yml` and `prometheus.yml` files in your project folder (a minimal `prometheus.yml` is sketched below), then run:
docker-compose up -d
This will start:
- Prometheus (accessible at http://localhost:9090)
- Grafana (accessible at http://localhost:3000)
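For reference, a minimal `prometheus.yml` that scrapes the Flask app's metrics endpoint (the target you verify in the next step); the job name is an assumption:

```yaml
# Minimal scrape configuration; host.docker.internal lets the Prometheus
# container reach the Flask app running on the host machine.
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: "titanic-flask-app"   # job name is an assumption
    metrics_path: /metrics
    static_configs:
      - targets: ["host.docker.internal:5000"]
```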
- Start your Flask application (`app.py`)
- Go to http://localhost:9090
- Navigate to Status → Targets
- Verify that the http://host.docker.internal:5000/metrics endpoint state is UP
- Go to http://localhost:3000 (Grafana dashboard)
- Navigate to Connections → Data Sources
- Select Prometheus → Add Connection
- Set the Prometheus server URL: http://prometheus:9090
- Click "Save & Test" (should succeed)
Create a new dashboard with these visualizations (an instrumentation sketch follows the list):
- Drift Detection Visualization:
  - Metric: `drift_count_total`
  - This tracks data drift occurrences
- Prediction Count Visualization:
  - Metric: `prediction_count_total`
  - This tracks the number of predictions made
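These counters come from the Flask app's instrumentation. A minimal sketch with `prometheus_client`: the route names, form fields, and feature handling are assumptions, and only the counter names match the dashboard above (prometheus_client appends the `_total` suffix on export):

```python
# Sketch of the metrics side of app.py; routes and inputs are assumptions.
import joblib
from flask import Flask, request
from prometheus_client import CONTENT_TYPE_LATEST, Counter, generate_latest

app = Flask(__name__)
model = joblib.load("artifacts/models/random_forest_model.pkl")

prediction_count = Counter("prediction_count", "Number of predictions served")
drift_count = Counter("drift_count", "Number of drift events detected")

@app.route("/metrics")
def metrics():
    # Endpoint scraped by Prometheus (see prometheus.yml)
    return generate_latest(), 200, {"Content-Type": CONTENT_TYPE_LATEST}

@app.route("/predict", methods=["POST"])
def predict():
    # Hypothetical form fields; adapt to the input names in your template
    features = [[
        float(request.form["Pclass"]),
        float(request.form["Sex"]),
        float(request.form["Age"]),
        float(request.form["Fare"]),
    ]]
    prediction_count.inc()
    # drift_count.inc() would be called here when Alibi Detect flags drift
    return {"survived": int(model.predict(features)[0])}

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```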
titanic_survival_prediction/
├── artifacts/
│ ├── models/ # Trained models
│ └── raw/ # Raw data
├── config/ # Configuration files
├── dags/ # Airflow DAGs
│ └── extract_data_from_gcp.py
├── include/ # GCP service account key
│ └── gcp-key.json
├── src/ # Source code
├── static/ # Static files for web UI
├── templates/ # HTML templates
├── app.py # Flask application
├── docker-compose.yml # Docker services configuration
├── prometheus.yml # Prometheus configuration
├── requirements.txt # Python dependencies
└── README.md
- Data Extraction: Airflow DAG fetches data from GCP bucket
- Data Storage: Data is stored in PostgreSQL
- Data Processing: Features are processed and stored in Redis feature store
- Model Training: ML model is trained using the pipeline
- Model Serving: Flask app serves predictions with drift detection
- Monitoring: Prometheus collects metrics, Grafana visualizes them
- Ensure all containers are running:
  - Astro Airflow containers
  - Redis container
  - Prometheus & Grafana containers
- Execute the Airflow DAG to load data
- Run the Flask application: `python app.py`
- Access the web interface at http://localhost:5000
- Prometheus: http://localhost:9090
- Grafana: http://localhost:3000
- Airflow: http://localhost:8080
- Application: http://localhost:5000
- Data Drift Detection: Using Alibi Detect to monitor input data drift (see the sketch after this list)
- Feature Store: Redis-based feature caching
- Real-time Monitoring: Prometheus metrics collection
- Visual Dashboard: Grafana dashboards for monitoring
- Automated Pipeline: Airflow orchestration
- Web Interface: User-friendly prediction interface
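A sketch of how the drift check could be wired up with Alibi Detect's `TabularDrift` detector, using reference features prepared by the training pipeline; the reference file path and feature layout are assumptions:

```python
# Sketch of drift detection with Alibi Detect's TabularDrift.
import numpy as np
from alibi_detect.cd import TabularDrift

# Reference data: the processed training features (e.g. Pclass, Sex, Age, Fare);
# the file path below is hypothetical.
x_ref = np.load("artifacts/processed/reference_features.npy")

detector = TabularDrift(x_ref, p_val=0.05)

def check_drift(incoming_features: np.ndarray) -> bool:
    """Return True when drift is detected, so the app can increment drift_count."""
    result = detector.predict(incoming_features)
    return bool(result["data"]["is_drift"])
```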
- Keep Docker Desktop running throughout the setup
- Ensure both Astro and Redis containers are running simultaneously
- The DAG ID might change when creating the DAG file - use the one shown in Airflow dashboard
- Don't include `psycopg2` in `requirements.txt` initially to avoid conflicts
- Make sure your GCP bucket contains the required dataset
- The Prometheus server URL in Grafana should use the container name (`prometheus:9090`), not localhost
- If containers fail to start, ensure Docker Desktop is running
- If connections fail, verify container names and ports
- If DAG doesn't appear, check the DAG ID and file syntax
- If metrics don't appear in Grafana, ensure the Flask app is running and accessible
This project demonstrates:
- Setting up a complete MLOps pipeline
- Using Astronomer for Airflow orchestration
- Implementing feature stores with Redis
- Data drift detection with Alibi Detect
- Monitoring with Prometheus and Grafana
- Containerization with Docker
- Cloud integration with GCP
Note: This project is designed for learning MLOps concepts and tools. The setup involves multiple components working together to create a comprehensive machine learning operations pipeline.