Modular and explainable stacked machine learning framework for environmental and structured data analysis. Includes preprocessing, evaluation, SHAP explainability, and tutorial notebooks.

TyMill/EcoStackML

🌿 EcoStackML

Stacked Machine Learning Framework for Environmental and Tabular Data

EcoStackML is a modular, production-ready Python framework that uses stacked machine learning to build robust, explainable models for classification and regression tasks. It is designed for environmental researchers, data scientists, and ML engineers.


🚀 Features

  • ✅ Supports multiple base models (Random Forest, XGBoost, SVM, etc.)
  • 🧠 Meta-learner for model stacking (e.g., Logistic Regression, Gradient Boosting)
  • 📊 Built-in evaluation: ROC-AUC, PR Curve, SHAP plots, confusion matrix
  • 🧽 Preprocessing pipeline with anomaly removal, scaling, imputation
  • 📅 Automatic datetime feature extraction
  • 💾 Save and load models, predictions, and metrics
  • 📓 Includes Jupyter notebooks (01–07) with step-by-step tutorials
  • 🔧 YAML-based configuration & logging setup
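
To make the datetime feature extraction item concrete, here is a minimal pandas sketch of the general technique. It does not show EcoStackML's own implementation: the helper `add_datetime_features` and the column name `sampled_at` are hypothetical names chosen for illustration.

```python
import pandas as pd


def add_datetime_features(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Return a copy of df with calendar features derived from `column`.

    Hypothetical helper for illustration; not part of the EcoStackML API.
    """
    out = df.copy()
    ts = pd.to_datetime(out[column])
    out[f"{column}_year"] = ts.dt.year
    out[f"{column}_month"] = ts.dt.month
    out[f"{column}_dayofweek"] = ts.dt.dayofweek  # Monday is 0
    out[f"{column}_hour"] = ts.dt.hour
    # Drop the raw timestamp so only numeric features remain
    return out.drop(columns=[column])


samples = pd.DataFrame({
    "sampled_at": ["2024-03-15 08:30:00", "2024-07-01 14:00:00"],
    "ph": [7.1, 6.8],
})
features = add_datetime_features(samples, "sampled_at")
print(features)
```

Splitting a timestamp into numeric parts lets tree-based and linear models pick up seasonal or daily patterns that a raw datetime column would hide.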

🛠 Installation

pip install .

For development (the quotes keep shells such as zsh from expanding the brackets):

pip install ".[dev]"

⚙️ Configuration (config.yaml)

data:
  path: "data/raw/sample.csv"
  target_column: "target"

preprocessing:
  missing_strategy: "median"
  scaling: "standard"
  anomaly_method: "iqr"
  datetime_cols: []

model:
  base_models:
    - name: "random_forest"
    - name: "xgboost"
  meta_model: "logistic"
  model_type: "classification"

split:
  test_size: 0.2
  stratify: true
  random_state: 42

output:
  model_dir: "models/"
  results_dir: "results/"
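
As a rough guide to what the `preprocessing` options usually mean, the sketch below applies median imputation, IQR-based anomaly removal, and standard scaling with plain pandas. It is an assumption about the general techniques behind those option names, not EcoStackML's actual code; the function `preprocess`, its `iqr_factor` parameter, and the sample columns are hypothetical.

```python
import numpy as np
import pandas as pd


def preprocess(df: pd.DataFrame, iqr_factor: float = 1.5) -> pd.DataFrame:
    """Median-impute, drop IQR outliers, then standardize numeric columns.

    Hypothetical helper for illustration; not part of the EcoStackML API.
    """
    out = df.copy()
    # missing_strategy: "median" -> fill each column's gaps with its median
    out = out.fillna(out.median(numeric_only=True))
    # anomaly_method: "iqr" -> keep rows inside [Q1 - k*IQR, Q3 + k*IQR] for every column
    q1, q3 = out.quantile(0.25), out.quantile(0.75)
    iqr = q3 - q1
    inside = ((out >= q1 - iqr_factor * iqr) & (out <= q3 + iqr_factor * iqr)).all(axis=1)
    out = out[inside]
    # scaling: "standard" -> zero mean and unit variance per column
    return (out - out.mean()) / out.std(ddof=0)


raw = pd.DataFrame({
    "temperature": [12.5, 13.5, np.nan, 12.8, 13.1, 95.0],
    "ph": [7.0, 7.2, 6.8, 7.1, np.nan, 6.9],
})
clean = preprocess(raw)
print(clean)
```

In this toy data the row with a temperature of 95.0 falls outside the IQR fence and is dropped before scaling, so it cannot distort the mean and variance used to standardize the remaining rows.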

🧪 Quickstart

from ecostackml.data.loader import DataLoader
from ecostackml.preprocessing.cleaner import Cleaner
from ecostackml.data.splitter import split_data
from ecostackml.models.stacker import ModelStacker

# Load the sample dataset
df = DataLoader.from_csv("data/raw/sample.csv")
# Demo only: attach a toy target column; the list length must equal len(df)
df["target"] = [0, 0, 1, 0, 1]

# Impute missing values, scale features, and remove anomalies
cleaner = Cleaner(strategy="median", scaling="standard", anomaly_method="iqr")
df_clean = cleaner.fit_transform(df)

# The two values discarded with `_` are not used in this quickstart
X_train, X_test, _, y_train, y_test, _ = split_data(df_clean, target_column="target")

# Two base models feed a logistic-regression meta-model
stacker = ModelStacker(
    base_models_config=[{"name": "random_forest"}, {"name": "xgboost"}],
    meta_model_name="logistic",
    model_type="classification"
)

stacker.fit(X_train, y_train)
y_pred = stacker.predict(X_test)
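
For readers new to stacking, the idea behind the quickstart can be sketched with scikit-learn's `StackingClassifier`. This is a generic illustration on synthetic data, not a view into how `ModelStacker` is implemented. It swaps XGBoost for an SVM to avoid an extra dependency, and the split settings mirror the `split` section of `config.yaml`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for a cleaned tabular dataset
X, y = make_classification(n_samples=300, n_features=8, n_informative=4, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

stack = StackingClassifier(
    estimators=[
        ("random_forest", RandomForestClassifier(n_estimators=50, random_state=42)),
        ("svm", SVC(probability=True, random_state=42)),
    ],
    # The meta-learner is trained on out-of-fold predictions of the base models
    final_estimator=LogisticRegression(),
    cv=5,
)
stack.fit(X_train, y_train)
accuracy = stack.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.3f}")
```

Training the meta-learner on cross-validated predictions, rather than on predictions for data the base models have already seen, keeps it from simply trusting whichever base model overfits the most.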

📈 Evaluation & SHAP

from ecostackml.models.evaluator import evaluate_classification
metrics = evaluate_classification(y_test, y_pred, plot=True)

stacker.explain_base_models(X_test)
stacker.explain_meta_model(X_test)
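
The metrics listed under Features can also be computed directly with scikit-learn, which helps when checking results by hand. The sketch below uses made-up labels and scores. Note that ROC-AUC and the PR curve need continuous scores or probabilities, while the confusion matrix needs hard labels; this README does not state which of the two `evaluate_classification` expects.

```python
from sklearn.metrics import average_precision_score, confusion_matrix, roc_auc_score

# Made-up ground truth and predicted probabilities for the positive class
y_true = [0, 0, 1, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
# Hard labels from a 0.5 threshold, used only for the confusion matrix
y_label = [int(s >= 0.5) for s in y_score]

cm = confusion_matrix(y_true, y_label)        # rows: true class, columns: predicted class
auc = roc_auc_score(y_true, y_score)          # ranking quality across all thresholds
ap = average_precision_score(y_true, y_score) # summary of the precision-recall curve

print(cm)
print(f"ROC-AUC: {auc:.3f}, average precision: {ap:.3f}")
```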

💾 Save / Load

from ecostackml.utils.save_load import save_model, load_model, save_stacker, load_stacker

save_model(stacker.meta_model.model, "models/meta_model.pkl")
save_stacker(stacker, "models/full_stacker.pkl")

restored = load_stacker("models/full_stacker.pkl")
restored.predict(X_test)
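
The `.pkl` extension suggests pickle-style serialization, although this README does not say which library `save_model` and `save_stacker` use. Under that assumption, a round trip with the standard-library `pickle` module looks like the sketch below, which uses a small scikit-learn model as a stand-in for the stacker.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.linear_model import LogisticRegression

# Tiny stand-in model; any picklable estimator behaves the same way
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
model = LogisticRegression().fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "model.pkl"
    with path.open("wb") as fh:
        pickle.dump(model, fh)
    with path.open("rb") as fh:
        restored = pickle.load(fh)

# A restored model should reproduce the original predictions exactly
same = np.array_equal(model.predict(X), restored.predict(X))
print(same)
```

Pickle files can execute arbitrary code when loaded, so only load model files from sources you trust, and load them with the same library versions used to save them.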

📁 Project Structure

EcoStackML/
├── src/ecostackml/
│   ├── data/
│   ├── preprocessing/
│   ├── models/
│   └── utils/
├── notebooks/
├── main.py
├── config.yaml
├── pyproject.toml
└── README.md

📚 Notebooks

  • 01_data_loading.ipynb – loading CSV, JSON, Parquet, Hive
  • 02_cleaning_and_preprocessing.ipynb – full preprocessing
  • 03_model_training.ipynb – base + stacking models
  • 04_model_evaluation.ipynb – metrics & visualization
  • 05_shap_explainer.ipynb – explainability
  • 06_full_pipeline.ipynb – complete pipeline
  • 07_save_and_load.ipynb – serialization demo

🤝 Contributing

Feel free to fork, contribute, and suggest improvements!


📜 License

MIT © 2025 Tymoteusz Miller
