
Modular ML Experiments Framework

A systematic framework for machine learning experiments with modular workflow patterns. Built from lessons learned through iterative experimentation to enable reproducible research.


Project Structure

experiments-framework/
├── notebooks/                                  # all jupyter notebooks
│   ├── scripts/                               # notebook utility scripts
│   ├── templates/                             # clean starting templates
│   │   ├── 01_preprocessing.working.ipynb     # data prep template
│   │   ├── 02_annotation.working.ipynb        # labeling template
│   │   ├── 03_training.working.ipynb          # model training template
│   │   └── systems.working.ipynb              # infrastructure template
│   ├── machine_learning/                      # ml development
│   │   ├── 01_preprocessing.dev.ipynb         # active preprocessing
│   │   ├── 02_annotation.dev.ipynb            # active annotation
│   │   ├── 03_training.dev.ipynb              # active training
│   │   ├── preprocessing/                     # preprocessing experiments
│   │   ├── annotation/                        # annotation experiments
│   │   └── training/                          # training experiments
│   └── systems/                               # infrastructure notebooks
│       └── systems.dev.ipynb                  # systems development
├── data/                                      # data organization
│   ├── raw/                                   # original recordings
│   ├── clips/                                 # extracted video clips
│   ├── frames/                                # extracted images
│   └── annotations/                           # labels and metadata
├── configs/                                   # workflow configurations
├── models/                                    # trained ml models
├── scripts/                                   # project setup scripts
│   ├── setup_experiments_structure.sh         # creates dirs/notebooks
│   ├── setup_provenance.sh                    # tracks repo evolution
│   └── setup_orcid.sh                         # citation setup
├── lib/                                       # reusable code modules
│   └── notebook_tools/                        # notebook utilities
├── references/                                # citations and refs
├── environment.yml                            # conda environment
└── PROVENANCE.md                              # repo history tracking
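
The layout above maps directly onto a directory scaffold. As a minimal Python sketch of what `setup_experiments_structure.sh` builds (a subset of directories only; the real script also generates notebooks, hooks, and an interactive menu):

```python
from pathlib import Path

# Directories from the project layout above (illustrative subset)
LAYOUT = [
    "notebooks/templates",
    "notebooks/machine_learning/preprocessing",
    "notebooks/machine_learning/annotation",
    "notebooks/machine_learning/training",
    "notebooks/systems",
    "data/raw",
    "data/clips",
    "data/frames",
    "data/annotations",
    "configs",
    "models",
    "scripts",
    "lib/notebook_tools",
    "references",
]

def scaffold(root: Path) -> None:
    """Create the experiment directory tree under root."""
    for rel in LAYOUT:
        (root / rel).mkdir(parents=True, exist_ok=True)

# Usage: scaffold(Path("experiments-framework"))
```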

Scope

  • Modular workflow patterns for preprocessing, annotation, and training
  • Clear separation of concerns across directories
  • Reproducible structure for researchers and students
  • Minimal dependencies with documented environment setup

Previous Work

  • Primary (active) development repo → traffic-vision-v0.4
  • Prior (deprecated) experimental repo → experiments-test
  • Achieved successful vehicle counting on 21 of 30 GDOT traffic camera feeds
  • The framework ultimately failed due to monolithic notebooks and environment conflicts
  • Individual notebook execution became a bottleneck

This repo rebuilds the workflow framework to be modular and scalable.

Quick Start

  1. Clone and set up:

    git clone https://github.com/iTrauco/experiments-framework.git
    cd experiments-framework
    chmod +x scripts/*.sh
    ./scripts/setup_experiments_structure.sh
  2. Track your work:

    ./scripts/setup_provenance.sh
  3. Start developing in the .dev notebooks or copy templates to begin new experiments.


Notebook Tools Installation

cd /path/to/notebook_tools
pip install -e .

This installs the library in "editable" mode: any changes you make to the code take effect immediately without reinstalling.
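An editable install requires packaging metadata alongside the code. A hedged sketch of what a minimal `pyproject.toml` for `lib/notebook_tools` might look like (the package name and version here are illustrative, not taken from the repo):

```toml
[build-system]
requires = ["setuptools>=61"]
build-backend = "setuptools.build_meta"

[project]
name = "notebook-tools"   # illustrative name
version = "0.1.0"         # illustrative version
```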

⚠️ Development Status: All modules in lib/ are early-stage development prototypes. Functionality is still being worked out: some modules may be dead code, others are spaghetti. I am creating modular packages as I identify what's killing my bandwidth.


Reproducibility Framework

Environment Setup

This project uses a Conda environment to manage dependencies for reproducible analysis. Follow these steps to set up the environment:

Prerequisites

  • Anaconda or Miniconda installed on your system
  • Git for cloning the repository

Setup Instructions

  1. Clone the repository:

    git clone https://github.com/iTrauco/experiments-framework.git
    cd experiments-framework
  2. Create the Conda environment:

    conda create -n traffic-vision-env python=3.11 -y
  3. Activate the environment:

    conda activate traffic-vision-env
  4. Install baseline packages:

    conda install -c conda-forge jupyter numpy pandas matplotlib seaborn scikit-learn opencv -y
  5. Install deep learning and computer vision packages:

    pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
    pip install ultralytics supervision
  6. Launch Jupyter Notebook:

    jupyter notebook
  7. Access the notebook in your browser via the URL displayed in the terminal.
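
The steps above can also be captured declaratively in the repo's environment.yml. A hedged sketch (channel layout and the pip index option are illustrative; pins would come from an actual `conda env export`):

```yaml
name: traffic-vision-env
channels:
  - conda-forge
dependencies:
  - python=3.11
  - jupyter
  - numpy
  - pandas
  - matplotlib
  - seaborn
  - scikit-learn
  - opencv
  - pip
  - pip:
      - --index-url https://download.pytorch.org/whl/cu118
      - torch
      - torchvision
      - torchaudio
      - ultralytics
      - supervision
```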


Environment Details

The environment includes essential data science and computer vision packages: Jupyter, NumPy, pandas, Matplotlib, seaborn, scikit-learn, and OpenCV from conda-forge, plus PyTorch (with torchvision and torchaudio), Ultralytics, and Supervision via pip.


Scripts

setup_experiments_structure.sh

Creates the complete project directory structure with an interactive menu. Features:

  • Creates in the parent directory (../) by default
  • Remembers the last used location
  • Rollback option to undo changes
  • Installs GitHub Actions workflows and pre-commit hooks

setup_provenance.sh

Documents your experimental evolution across repository rebuilds:

  • Links to previous repos and branches
  • Records what you tested and learned
  • Builds a timeline in PROVENANCE.md
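
A hypothetical PROVENANCE.md entry, to illustrate the kind of timeline the script builds (the date and entry format here are invented for illustration; the facts are drawn from the Previous Work section above):

```markdown
## Rebuild from experiments-test

- Previous repo: experiments-test (deprecated)
- Tested: vehicle counting on GDOT traffic camera feeds (21 of 30 succeeded)
- Learned: monolithic notebooks and environment conflicts made the
  framework unmaintainable; notebook execution became a bottleneck
- Carried forward: modular workflow patterns, per-stage templates
```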

setup_orcid.sh

Adds ORCID identifier and citation infrastructure to your project.


Workflows

Development Flow

  1. Copy templates from notebooks/templates/ to start new work
  2. Develop in .dev notebooks at the machine_learning level
  3. Create experimental variations in subdirectories
  4. Track successful patterns in LESSONS_LEARNED.md files

GitHub Actions

Automatically converts notebooks to markdown on push for better documentation and diffs.
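
A hedged sketch of what such a workflow might look like, assuming nbconvert is used for the conversion (the workflow name, trigger paths, and action versions are illustrative, not taken from the repo):

```yaml
name: notebooks-to-markdown
on:
  push:
    paths:
      - "notebooks/**/*.ipynb"
jobs:
  convert:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install nbconvert
      # Convert every notebook in place to markdown for readable diffs
      - run: find notebooks -name '*.ipynb' -exec jupyter nbconvert --to markdown {} +
```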

Pre-commit Hooks

Validates notebook metadata and cleans outputs before commits.
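
One common way to clean notebook outputs before commits is the nbstripout pre-commit hook; a sketch of a `.pre-commit-config.yaml` under that assumption (the pinned `rev` is illustrative):

```yaml
repos:
  - repo: https://github.com/kynan/nbstripout
    rev: 0.7.1  # illustrative pin
    hooks:
      - id: nbstripout
```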


Environment Management

For collaborators who enhance the environment with additional packages:

# Export the updated environment
conda activate traffic-vision-env
conda env export > environment.yml

This preserves all dependencies and exact versions; collaborators can then recreate the environment on another system with `conda env create -f environment.yml`.


Author: Christopher Trauco | ORCID: 0009-0005-8113-6528
