Covtype Classification with CapyMOA

This repository contains a classification model developed to analyze and classify the Covtype dataset using the CapyMOA framework (version 0.7.0). The primary objective is to perform classification on forest cover types and address concept drift within streaming data.

Project Overview

This project uses multiple classifiers to perform real-time, streaming classification on the Covtype dataset. The CapyMOA framework supports these streaming processes, allowing for continuous adaptation to evolving data patterns. To handle concept drift and data imbalance, the project uses specialized metrics and dynamic drift detection.

Authors

Ayan Ahmed
Rayyan Ahmed

Lecturer

Dr. Mariam Barry

Date

November 4, 2024

Dataset Selection and Task
- Covtype Dataset: Multi-class classification of forest cover types, with features related to elevation, distance metrics, and soil types.
- Goal: To predict cover type based on the features and analyze model performance under different conditions, including concept drift.
Data Exploration
- Dataset Statistics: 54 features, 7 classes, and approximately 51,000 instances used for analysis.
- Target Class Distribution: Examines imbalance across classes and variability in distribution over time.
- Attribute Analysis: Visualization of both numerical (e.g., elevation) and categorical (e.g., soil type) attributes to understand spatial variability and trends.
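
One way to examine how the target distribution varies over time is to compute per-window class frequencies. The sketch below is only an illustration: the function name `windowed_class_distribution`, the window size, and the toy labels are assumptions, not code or data from the notebook.

```python
from collections import Counter


def windowed_class_distribution(labels, window_size):
    """Return one {class: relative frequency} dict per consecutive window."""
    distributions = []
    for start in range(0, len(labels), window_size):
        window = labels[start:start + window_size]
        counts = Counter(window)
        distributions.append({c: n / len(window) for c, n in counts.items()})
    return distributions


# Toy labels (not Covtype data): class 1 dominates early, class 2 later.
toy_labels = [1] * 8 + [2] * 2 + [1] * 3 + [2] * 7
dists = windowed_class_distribution(toy_labels, window_size=10)
for index, dist in enumerate(dists):
    print(f"window {index}: {dist}")
```

A large shift in these frequencies between windows suggests that the class prior is changing along the stream, which is one of the situations in which drift-aware learners are useful.
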
Experimental Setup
- Algorithms Used:
  - Hoeffding Tree
  - Adaptive Random Forest
  - Online Bagging
  - Majority Class Baseline
- Hyperparameter Optimization: Performed using random search.
- Evaluation Metrics:
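  - κm (Kappa M) and κt (Temporal Kappa) were chosen because plain accuracy can be misleading on this stream. κm measures improvement over a majority-class baseline, which accounts for class imbalance; κt measures improvement over a no-change baseline that repeats the previous label, which accounts for temporal dependence and shows whether a model adapts to changes over time.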
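
To make the two metrics concrete, here is a minimal sketch that computes them from a sequence of true and predicted labels. It assumes the common definitions κ = (p0 − p_base) / (1 − p_base), with the majority-class baseline evaluated on the labels seen so far and the no-change baseline repeating the previous true label. The function name `kappa_m_and_t` is hypothetical, and CapyMOA's own evaluators may differ in details such as windowing and scaling.

```python
from collections import Counter


def kappa_m_and_t(y_true, y_pred):
    """Return (kappa_m, kappa_t) as fractions; multiply by 100 for percentages."""
    n = len(y_true)
    correct = majority_correct = no_change_correct = 0
    seen = Counter()          # label counts observed so far
    previous = object()       # sentinel: no previous label yet
    for truth, pred in zip(y_true, y_pred):
        correct += pred == truth
        # Majority-class baseline: most frequent label seen so far.
        if seen and seen.most_common(1)[0][0] == truth:
            majority_correct += 1
        # No-change baseline: repeat the previous true label.
        if previous == truth:
            no_change_correct += 1
        seen[truth] += 1
        previous = truth
    p0 = correct / n
    pm = majority_correct / n
    pt = no_change_correct / n
    kappa_m = (p0 - pm) / (1 - pm) if pm < 1 else float("nan")
    kappa_t = (p0 - pt) / (1 - pt) if pt < 1 else float("nan")
    return kappa_m, kappa_t


# Toy labels (not Covtype data).
toy_truth = [1, 1, 2, 2, 1, 1, 2, 2]
perfect = kappa_m_and_t(toy_truth, toy_truth)
# A model that only repeats the previous label gains nothing over no-change.
persistent = kappa_m_and_t(toy_truth, [None] + toy_truth[:-1])
```

A κt near zero means the model is no better than repeating the last label, even if its raw accuracy looks high.
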
Results and Analysis
- Performance Evaluation: Comprehensive analysis based on κm, κt, and wall-clock time.
- Observations: Adaptive Random Forest exhibited the highest κt and κm scores, performing best with concept drift. Hoeffding Tree balanced accuracy and computational efficiency, while Majority Class performed poorly in imbalanced scenarios.
Concept Drift Detection
- Drift Detection: ADWIN (Adaptive Windowing) was used for automatic concept drift detection. It identified multiple drift points, particularly in regions of the stream where the data distribution changes substantially.
- Adaptive Models: Adaptive Random Forest and Hoeffding Tree demonstrated resilience to concept drift, while Online Bagging was less consistent.
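
The idea behind ADWIN can be sketched in a few lines: keep a window of recent values, and whenever the means of an older and a newer sub-window differ by more than a Hoeffding-style threshold, drop the older part and report a change. The class below is a deliberately simplified, unoptimized illustration of that idea, not CapyMOA's ADWIN implementation. The name `SimpleAdwin`, the `max_window` cap, and the default `delta` are assumptions made for this example.

```python
import math


class SimpleAdwin:
    """Simplified ADWIN-style detector for values in [0, 1] (illustrative only)."""

    def __init__(self, delta=0.002, max_window=1000):
        self.delta = delta
        self.max_window = max_window  # hypothetical cap to bound the cost
        self.window = []

    def add_element(self, value):
        """Add a value; return True if a change was detected."""
        self.window.append(value)
        if len(self.window) > self.max_window:
            self.window.pop(0)
        n = len(self.window)
        total = sum(self.window)
        log_term = math.log(4 * n / self.delta)
        head_sum = 0.0
        # Try every split of the window into an older and a newer part.
        for split in range(1, n):
            head_sum += self.window[split - 1]
            n0, n1 = split, n - split
            mean0 = head_sum / n0
            mean1 = (total - head_sum) / n1
            m = 1.0 / (1.0 / n0 + 1.0 / n1)
            eps_cut = math.sqrt(log_term / (2.0 * m))
            if abs(mean0 - mean1) > eps_cut:
                # Means differ significantly: forget the older part.
                self.window = self.window[split:]
                return True
        return False


# Toy stream (not Covtype data): the mean jumps from 0 to 1 at index 200.
toy_stream = [0.0] * 200 + [1.0] * 200
detector = SimpleAdwin()
detections = [i for i, x in enumerate(toy_stream) if detector.add_element(x)]

# A stationary stream should not trigger any detection.
stationary = SimpleAdwin()
false_alarms = [i for i, x in enumerate([0.5] * 300) if stationary.add_element(x)]
```

In a classification setting, the values fed to the detector are typically the per-instance errors of the model (0 for a correct prediction, 1 for a mistake), so a detection marks a sustained change in error rate.
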
Neural Network-Based Analysis
- Architecture: A custom neural network implemented with PyTorch, leveraging CapyMOA’s instance-based streaming for further classification experiments.
- Training Strategy: Real-time training with a streaming loop and prequential evaluation for continuous assessment of model performance.
- Results: Demonstrated the potential of neural networks for handling non-stationary data, with κm values peaking above 90%, indicating robust performance over time.
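
The test-then-train (prequential) strategy mentioned above can be sketched independently of any particular model. In the example below a hypothetical `MajorityClassLearner` stands in for the neural network so that the loop runs without PyTorch or CapyMOA. The `predict`/`train` interface and the toy stream are assumptions for illustration, not the notebook's code.

```python
from collections import Counter


class MajorityClassLearner:
    """Stand-in model: predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else None

    def train(self, x, y):
        self.counts[y] += 1


def prequential_accuracy(stream, learner):
    """Score each instance before the learner sees its label, then train on it."""
    correct = 0
    seen = 0
    history = []
    for x, y in stream:
        correct += learner.predict(x) == y  # test first
        learner.train(x, y)                 # then train
        seen += 1
        history.append(correct / seen)      # running accuracy
    return history


# Toy stream of (features, label) pairs (not Covtype data).
toy_stream = [((0.1,), "a"), ((0.2,), "a"), ((0.9,), "b"), ((0.3,), "a")]
history = prequential_accuracy(toy_stream, MajorityClassLearner())
```

Because every instance is used for testing before training, the running score reflects performance on unseen data at each point of the stream, which is why this scheme suits continuous assessment.
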

Installation and Setup

Requirements:
- Python >= 3.8
- CapyMOA (version 0.7.0)
- PyTorch
Usage:
- To run the classification pipeline with CapyMOA: python main.py
- To visualize concept drift and other metrics: python visualize_metrics.py

Acknowledgments

This project is based on the Covtype dataset and leverages the CapyMOA framework for streaming data analysis.
