
Titanic Survival Prediction - MLOps Project

This project demonstrates a complete MLOps pipeline for predicting Titanic passenger survival. It combines Astronomer Airflow for orchestration, Redis as a feature store, Alibi Detect for drift detection, and Prometheus and Grafana for monitoring.

🎯 Project Overview

This project was built to learn and implement various MLOps tools and practices:

  • Astronomer Airflow: Orchestrating data pipelines
  • Redis: Feature store for caching processed features
  • Alibi Detect: Data drift detection
  • Prometheus & Grafana: Monitoring and visualization
  • PostgreSQL: Data storage
  • Google Cloud Platform: Data source
  • Flask: Web application interface

🛠️ Technologies Used

  • Python 3.11.13
  • Astronomer CLI 1.31.0
  • Apache Airflow
  • Redis
  • PostgreSQL
  • Prometheus
  • Grafana
  • Docker & Docker Compose
  • Google Cloud Platform (GCP)
  • Flask
  • Scikit-learn
  • Alibi Detect

📋 Prerequisites

  • Windows 11
  • Docker Desktop (running)
  • Conda environment
  • Google Cloud Platform account (setup instructions below)
  • Astronomer CLI 1.31.0

🚀 Setup Instructions

1. Google Cloud Platform Setup

Important: Complete this setup before starting Astro.

Step 1.1: Create GCS Bucket

  1. Go to Google Cloud Console → Cloud Storage → Buckets
  2. Click "Create Bucket"
  3. Give your bucket a name (e.g., bucket_titanic_1)
  4. In "Choose how to control access to objects":
    • UNTICK Enforce public access prevention on this bucket
  5. Continue and create the bucket
  6. Upload your dataset file (e.g., Titanic-Dataset.csv) to the bucket

Step 1.2: Create Service Account

  1. Go to IAM & Admin → Service Accounts
  2. Click "Create Service Account"
  3. Give it a name (e.g., airflow-service-account)
  4. Click "Continue"
  5. Grant this service account access to the project by selecting these roles:
    • Owner
    • Storage Object Admin
    • Storage Object Viewer
  6. Click "Continue" → "Done"

Step 1.3: Generate Service Account Key

  1. Find your newly created service account in the list
  2. Click the three dots (⋮) → "Manage Keys"
  3. Click "Add Key" → "Create New Key"
  4. Select "JSON" format
  5. Click "Create" and download the JSON key file
  6. Important: Save this file securely - you'll need it for Astro Airflow

Step 1.4: Configure Bucket Permissions

  1. Go back to Cloud Storage → Buckets
  2. Find your created bucket and click the three dots (⋮)
  3. Click "Edit Access"
  4. Add your service account as a principal with these roles:
    • Owner
    • Storage Object Admin
    • Storage Object Viewer
  5. Click "Save"

Note: Make sure the service account JSON key file is downloaded as it will be required for Astro setup.

2. Environment Setup

Create and activate a conda environment:

conda create -n titanic-mlops python=3.11
conda activate titanic-mlops

3. Install Astronomer CLI

Install Astronomer CLI version 1.31.0:

# Install using winget (Windows)
winget install -e --id Astronomer.Astro -v 1.31.0

# Verify installation
astro version

4. Initialize Astro Airflow Project

# Initialize Astro project
astro dev init

# Add the Google Cloud provider by adding this line to your Dockerfile:
# RUN pip install apache-airflow-providers-google
# Note: don't add psycopg2 to requirements.txt initially, as it conflicts with the Astro setup.

5. Configure Astro Settings

Create/modify .astro/config.yaml:

deployments:
  - name: dev
    executor: celery
    image:
      name: quay.io/astronomer/astro-runtime:7.3.0
    env: dev
    volumes:
      - ./include:/usr/local/airflow/include

6. Service Account Setup

  1. Take the service account JSON key file downloaded from GCP (Step 1.3)
  2. Place it in the include/ folder of your Astro project
  3. Rename it to gcp-key.json

7. Start Astro Development Environment

Important: Make sure Docker Desktop is running before executing this command.

astro dev start

8. Configure Airflow Connections

Once the containers are running, go to http://localhost:8080 (Airflow Dashboard):

Google Cloud Connection

  • Go to Admin → Connections → Create
  • Connection ID: google_cloud_default
  • Connection Type: Google Cloud
  • Keyfile Path: /usr/local/airflow/include/gcp-key.json
  • Scopes: https://www.googleapis.com/auth/cloud-platform

PostgreSQL Connection

  • Connection ID: postgres_default
  • Connection Type: Postgres
  • Host: the Postgres container name (find it with docker ps)
  • Database: postgres
  • Login: postgres
  • Password: postgres
  • Port: 5432

9. Create Data Pipeline DAG

Create dags/extract_data_from_gcp.py with a DAG that extracts the dataset from the GCP bucket and loads it into PostgreSQL (a sketch follows the notes below).

Important Notes:

  • Make sure the dag_id defined in the file (e.g., dag_id="example_astronauts") matches the DAG ID shown in the Airflow dashboard
  • Ensure your GCP bucket bucket_titanic_1 contains Titanic-Dataset.csv
  • The DAG ID might change, so use the one that appears in the dashboard after creating the file
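
To make the step concrete, here is a minimal sketch of what dags/extract_data_from_gcp.py might look like. It is an illustration, not the repository's exact DAG: the task names, the titanic table name, and the TaskFlow structure are assumptions, while the connection IDs, bucket, and file name come from the earlier steps.

# dags/extract_data_from_gcp.py -- illustrative sketch; adapt names to your actual DAG
from datetime import datetime

import pandas as pd
from airflow.decorators import dag, task
from airflow.providers.google.cloud.hooks.gcs import GCSHook
from airflow.providers.postgres.hooks.postgres import PostgresHook

@dag(schedule=None, start_date=datetime(2024, 1, 1), catchup=False)
def extract_titanic_data():
    @task
    def download_from_gcs() -> str:
        # Uses the google_cloud_default connection configured in step 8.
        hook = GCSHook(gcp_conn_id="google_cloud_default")
        local_path = "/tmp/Titanic-Dataset.csv"
        hook.download(bucket_name="bucket_titanic_1",
                      object_name="Titanic-Dataset.csv",
                      filename=local_path)
        return local_path

    @task
    def load_into_postgres(local_path: str) -> None:
        # Uses the postgres_default connection; "titanic" is a hypothetical table name.
        df = pd.read_csv(local_path)
        engine = PostgresHook(postgres_conn_id="postgres_default").get_sqlalchemy_engine()
        df.to_sql("titanic", engine, if_exists="replace", index=False)

    load_into_postgres(download_from_gcs())

extract_titanic_data()

Whichever dag_id your file produces is the one to trigger in the dashboard.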

10. Install Redis

# Pull Redis image
docker pull redis

# Run Redis container
docker run -d --name redis-container -p 6379:6379 redis

Important: Keep the Redis container running throughout the project.
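
For reference, the feature-store interaction boils down to writing and reading Redis hashes. The snippet below is a minimal sketch using redis-py; the entity:<PassengerId> key pattern and the field names are assumptions for illustration, not the repository's actual schema.

# Minimal feature-store sketch with redis-py; key pattern and fields are illustrative.
import redis

r = redis.Redis(host="localhost", port=6379, db=0, decode_responses=True)

# Cache processed features for one passenger under a hash key (hypothetical schema).
r.hset("entity:892", mapping={"Pclass": 3, "Sex": 1, "Age": 34.5, "Fare": 7.83})

# Read the features back at serving time.
features = r.hgetall("entity:892")
print(features)  # {'Pclass': '3', 'Sex': '1', 'Age': '34.5', 'Fare': '7.83'}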

11. Install Project in Development Mode

To ensure proper imports work throughout the project, install the package in development mode:

# Activate your conda environment first
conda activate titanic-mlops

# Install the project in editable mode
pip install -e .

Note: This allows you to import modules from the src/, config/, and other project directories without path issues.
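
pip install -e . requires a packaging file at the project root. If the repository does not already ship one, a minimal setup.py along these lines would do; the distribution name and package list here are assumptions that mirror the project structure.

# setup.py -- minimal sketch so that pip install -e . makes the project packages importable.
from setuptools import find_packages, setup

setup(
    name="titanic-mlops",  # hypothetical distribution name
    version="0.1.0",
    # find_packages only picks up directories that contain an __init__.py file.
    packages=find_packages(include=["src*", "config*", "pipeline*"]),
)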

12. Run Data Pipeline

  1. Go to Airflow dashboard (http://localhost:8080)
  2. Navigate to DAGs
  3. Find your DAG (with correct DAG ID)
  4. Click the play/start button to execute the pipeline
  5. This will fetch data from GCP and save it to PostgreSQL

13. Run Training Pipeline

Important: After the data pipeline completes, you need to train the model and set up the feature store.

# Make sure both Astro and Redis containers are running
# Run the training pipeline to:
# - Read data from PostgreSQL
# - Preprocess the data
# - Train the ML model
# - Store features in Redis feature store
# - Save the trained model

python pipeline/training_pipeline.py

# You can also run data_ingestion.py, data_processing.py, feature_store.py and model_training.py individually at first to verify each stage works.

Note: This step is crucial as it:

  • Creates the trained model (artifacts/models/random_forest_model.pkl)
  • Populates the Redis feature store with processed features
  • Sets up the reference data for drift detection
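
For orientation, the training flow can be pictured roughly as the simplified sketch below. The table name, feature columns, Redis key schema, and the Postgres connection string are assumptions; the real pipeline in src/ and pipeline/training_pipeline.py is more involved.

# Simplified sketch of the training flow; table name, columns, and connection details are assumptions.
import os

import joblib
import pandas as pd
import redis
from sklearn.ensemble import RandomForestClassifier
from sqlalchemy import create_engine

# 1. Read data from PostgreSQL (host/port depend on how the Astro Postgres container is exposed).
engine = create_engine("postgresql://postgres:postgres@localhost:5432/postgres")
df = pd.read_sql("SELECT * FROM titanic", engine)

# 2. Preprocess: a toy subset of the real feature engineering.
df["Sex"] = df["Sex"].map({"male": 0, "female": 1})
df["Age"] = df["Age"].fillna(df["Age"].median())
feature_cols = ["Pclass", "Sex", "Age", "Fare"]
X, y = df[feature_cols], df["Survived"]

# 3. Train the model.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X, y)

# 4. Store processed features in the Redis feature store (hypothetical key schema).
r = redis.Redis(host="localhost", port=6379, db=0)
for _, row in df.iterrows():
    r.hset(f"entity:{int(row['PassengerId'])}",
           mapping={col: float(val) for col, val in row[feature_cols].items()})

# 5. Save the trained model.
os.makedirs("artifacts/models", exist_ok=True)
joblib.dump(model, "artifacts/models/random_forest_model.pkl")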

14. Setup Monitoring with Prometheus & Grafana

Ensure you have docker-compose.yml and prometheus.yml in your project folder (a sketch of prometheus.yml follows the list below), then run:

docker-compose up -d

This will start:

  • Prometheus (accessible at http://localhost:9090)
  • Grafana (accessible at http://localhost:3000)
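
If you need a starting point for prometheus.yml, a minimal configuration along the lines below (an assumption, not necessarily the repository's exact file) scrapes the Flask app on the host through host.docker.internal, which is what the target check in the next step expects:

# prometheus.yml -- minimal sketch; the scrape interval and job name are illustrative.
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: "flask-app"
    metrics_path: /metrics
    static_configs:
      - targets: ["host.docker.internal:5000"]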

15. Configure Prometheus Monitoring

  1. Start your Flask application (app.py)
  2. Go to http://localhost:9090
  3. Navigate to Status → Targets
  4. Verify that the http://host.docker.internal:5000/metrics target shows state UP (a sketch of how app.py exposes this endpoint follows below)
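
As a rough sketch of that endpoint, app.py can expose Prometheus metrics with prometheus_client as below. The counter names match the metrics used in the Grafana step; the prediction route and everything else about the app are simplified assumptions.

# Sketch of how app.py might expose Prometheus metrics; the prediction logic is heavily simplified.
from flask import Flask, Response
from prometheus_client import CONTENT_TYPE_LATEST, Counter, generate_latest

app = Flask(__name__)

# A Counter named "prediction_count" is exported as prediction_count_total; likewise for drift_count.
prediction_count = Counter("prediction_count", "Number of predictions served")
drift_count = Counter("drift_count", "Number of drift events detected")

@app.route("/metrics")
def metrics():
    # Prometheus scrapes this endpoint via host.docker.internal:5000/metrics.
    return Response(generate_latest(), mimetype=CONTENT_TYPE_LATEST)

@app.route("/predict", methods=["POST"])
def predict():
    prediction_count.inc()
    # ... run the drift check and model inference here; call drift_count.inc() when drift is flagged ...
    return {"survived": 0}  # placeholder response

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)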

16. Configure Grafana Dashboard

  1. Go to http://localhost:3000 (Grafana dashboard)
  2. Navigate to Connections → Data Sources
  3. Select Prometheus → Add Connection
  4. Set Prometheus server URL: http://prometheus:9090
  5. Click "Save & Test" (should succeed)

Create Visualizations

Create a new dashboard with these visualizations:

  1. Drift Detection Visualization:
    • Metric: drift_count_total
    • This tracks data drift occurrences
  2. Prediction Count Visualization:
    • Metric: prediction_count_total
    • This tracks the number of predictions made

🗂️ Project Structure

titanic_survival_prediction/
├── artifacts/
│   ├── models/          # Trained models
│   └── raw/            # Raw data
├── config/             # Configuration files
├── dags/               # Airflow DAGs
│   └── extract_data_from_gcp.py
├── include/            # GCP service account key
│   └── gcp-key.json
├── src/                # Source code
├── static/             # Static files for web UI
├── templates/          # HTML templates
├── app.py              # Flask application
├── docker-compose.yml  # Docker services configuration
├── prometheus.yml      # Prometheus configuration
├── requirements.txt    # Python dependencies
└── README.md

🔄 Workflow

  1. Data Extraction: Airflow DAG fetches data from GCP bucket
  2. Data Storage: Data is stored in PostgreSQL
  3. Data Processing: Features are processed and stored in Redis feature store
  4. Model Training: ML model is trained using the pipeline
  5. Model Serving: Flask app serves predictions with drift detection
  6. Monitoring: Prometheus collects metrics, Grafana visualizes them

🏃‍♂️ Running the Application

  1. Ensure all containers are running:

    • Astro Airflow containers
    • Redis container
    • Prometheus & Grafana containers
  2. Execute the Airflow DAG to load data

  3. Run the Flask application:

    python app.py
  4. Access the web interface at http://localhost:5000

📊 Monitoring

  • Prometheus: http://localhost:9090
  • Grafana: http://localhost:3000
  • Airflow: http://localhost:8080
  • Application: http://localhost:5000

🔍 Features

  • Data Drift Detection: Using Alibi Detect to monitor input data drift (see the sketch after this list)
  • Feature Store: Redis-based feature caching
  • Real-time Monitoring: Prometheus metrics collection
  • Visual Dashboard: Grafana dashboards for monitoring
  • Automated Pipeline: Airflow orchestration
  • Web Interface: User-friendly prediction interface
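
As a hedged example of the drift-detection piece, Alibi Detect's KSDrift can compare incoming feature rows against the reference data captured during training; the array shapes, the random stand-in data, and the p-value threshold below are illustrative only.

# Illustrative drift check with Alibi Detect's KSDrift; reference data and threshold are assumptions.
import numpy as np
from alibi_detect.cd import KSDrift

# Reference features captured at training time (random data stands in for them here).
X_ref = np.random.default_rng(0).normal(size=(500, 4))
detector = KSDrift(X_ref, p_val=0.05)

# Incoming batch of feature rows at serving time, deliberately shifted to trigger drift.
X_new = np.random.default_rng(1).normal(loc=1.0, size=(50, 4))
result = detector.predict(X_new)
print(result["data"]["is_drift"])  # 1 if drift is detected, else 0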

🚨 Important Notes

  1. Keep Docker Desktop running throughout the setup
  2. Ensure both Astro and Redis containers are running simultaneously
  3. The DAG ID might change when creating the DAG file - use the one shown in Airflow dashboard
  4. Don't include psycopg2 in requirements.txt initially to avoid conflicts
  5. Make sure your GCP bucket contains the required dataset
  6. Prometheus server URL in Grafana should use container name (prometheus:9090) not localhost

🔧 Troubleshooting

  • If containers fail to start, ensure Docker Desktop is running
  • If connections fail, verify container names and ports
  • If DAG doesn't appear, check the DAG ID and file syntax
  • If metrics don't appear in Grafana, ensure the Flask app is running and accessible

🎯 Learning Outcomes

This project demonstrates:

  • Setting up a complete MLOps pipeline
  • Using Astronomer for Airflow orchestration
  • Implementing feature stores with Redis
  • Data drift detection with Alibi Detect
  • Monitoring with Prometheus and Grafana
  • Containerization with Docker
  • Cloud integration with GCP

Note: This project is designed for learning MLOps concepts and tools. The setup involves multiple components working together to create a comprehensive machine learning operations pipeline.
