Jupyter Agent 🤓


Jupyter Agent is an open-source data science agent that lives inside your Jupyter notebook. It can:

  • Read notebook + dataset context
  • Execute Python code (pandas, numpy, matplotlib, …)
  • Produce step-by-step reasoning traces with intermediate computations

👉 Think of it as Cursor, but built natively for data analysis workflows.

📖 Learn more in our blog post or try the live demo.

🚀 What’s Included

We release:

  • The Jupyter Agent Dataset (~2B tokens of dataset-grounded QA pairs with reasoning and execution traces)
  • A fine-tuned Qwen3-4B model trained on that dataset
  • The full data-processing and fine-tuning pipeline (see the data/ and finetuning/ folders)

🎯 Why This Matters

  • Jupyter notebooks are the de facto environment for scientists and analysts.
  • We built a dataset + training pipeline that helps small models become strong data agents.
  • On the DABStep benchmark, our tuned 4B model reaches SOTA performance for its size on realistic data science tasks.

🏗️ Pipeline Overview

Our pipeline processes the Meta Kaggle Notebooks dataset (2TB) into training-ready data:

  1. Deduplicate notebooks (roughly 90% of the corpus is duplicated; a toy sketch of this stage follows the list)
  2. Fetch linked datasets for executability
  3. Score notebooks for educational quality
  4. Filter irrelevant content
  5. Generate dataset-grounded QA pairs
  6. Produce reasoning + execution traces
  7. Curate final dataset (~2B tokens)

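To give a flavor of the first stage, here is a minimal, illustrative deduplication sketch. The notebook structure and field names below are assumptions made for the example; the actual pipeline in the data/ folder removes near-duplicates, not just exact copies:

import hashlib

def dedupe_exact(notebooks):
    """Drop notebooks whose concatenated code cells are byte-identical.
    Illustrative only: the real pipeline uses near-duplicate detection,
    which is what removes ~90% of the raw corpus."""
    seen, unique = set(), []
    for nb in notebooks:
        digest = hashlib.sha256("\n".join(nb["code_cells"]).encode()).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(nb)
    return unique

# Example: the second notebook is an exact copy of the first and gets dropped.
notebooks = [
    {"id": "a", "code_cells": ["import pandas as pd", "df = pd.read_csv('train.csv')"]},
    {"id": "b", "code_cells": ["import pandas as pd", "df = pd.read_csv('train.csv')"]},
    {"id": "c", "code_cells": ["print('hello')"]},
]
print([nb["id"] for nb in dedupe_exact(notebooks)])  # ['a', 'c']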

🔧 Quick Start

Clone the repo:

git clone https://github.com/huggingface/jupyter-agent.git
cd jupyter-agent

Run the Code

  • To generate the dataset, check the data/ folder.
  • To fine-tune the model, check the finetuning/ folder.

Load the Dataset

from datasets import load_dataset
ds = load_dataset("data-agents/jupyter-agent-dataset", split="non-thinking")
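To sanity-check the download, you can inspect the split size and the fields of a single example (the exact field names depend on the dataset schema):

print(len(ds))       # number of examples in the split
print(ds[0].keys())  # field names of one record (schema-dependent)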

Run a Fine-Tuned Model

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "data-agents/jupyter-agent-qwen3-4b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
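A minimal generation sketch, assuming the model ships a chat template (the prompt below is just an example):

# Ask the model a data-analysis question and decode only the new tokens.
messages = [{"role": "user", "content": "Load train.csv with pandas and show the first five rows."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))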

📊 Results

All numbers are accuracy on the DABStep easy split:

  • Base Qwen3-4B-Instruct: 38.7%
  • With scaffolding: 52.8%
  • After fine-tuning on our dataset: 75%


Our fine-tuned model is the current SOTA small-model agent on DABStep.

📚 Resources

📜 Citation

@misc{jupyteragentdataset,
  title={Jupyter Agent Dataset},
  author={Colle, Baptiste and Yukhymenko, Hanna and von Werra, Leandro},
  year={2025}
}
