Nested-GroundedSAM

Hierarchical Segmentation of Nested Objects in Complex Outdoor Scenes

📖 Overview

This repository contains the implementation of Nested-GroundedSAM, a hybrid modular pipeline for open-set object detection and segmentation.
The project extends GroundedSAM by introducing a hierarchical strategy:

Large, enclosing objects (e.g., building facades) are detected first.
Sub-objects (e.g., windows, doors) are then identified within these regions.

The goal is to evaluate whether such a hierarchical prompt strategy improves detection and segmentation performance in complex outdoor scenes, compared to the flat reference approach.

🔍 Motivation

Standard zero-shot segmentation models like GroundedSAM struggle with nested and overlapping objects, e.g., windows in buildings or wheels on vehicles.
By focusing detection on context-specific subregions, Nested-GroundedSAM aims to:

Reduce background noise.
Improve recall for small or occluded objects.
Enable semantic evaluation across multiple hierarchy levels.

🏗️ Methodology

The pipeline consists of four main components:

Distributor
- Iterates over ADE20K dataset instances.
- Manages vocabulary for flat and nested prompts.
- Supports manual prompt definition and automated prompt reasoning with an LLM.
Reference Approach (Flat)
- Implements a restructured GroundedSAM pipeline.
- Uses a single global prompt for direct object detection.
Nested-GroundedSAM
- Introduces hierarchical detection.
- First detects enclosing objects (e.g., building).
- Recursively searches for sub-objects (e.g., window).
- Includes coordinate transformations to align Grounding DINO and SAM outputs.
Evaluation System
- Computes metrics (Precision, Recall, F1, mIoU).
- Aggregates results with a Composite Indicator (Distance-P2 method).
- Records runtime and outputs structured JSON logs.

📊 Evaluation

Dataset: ADE20K (300 instances, class: building_facade).
Metrics: Precision, Recall, F1, mIoU, Composite Indicator.
Statistical tests: Wilcoxon Signed-Rank, Pearson correlation.

Results (summary):

Nested-GroundedSAM does not show significant overall improvements in classical metrics.
Trade-off observed:
- Slightly higher recall (more true positives).
- Slightly lower precision (more false positives).
Runtime:
- Flat: ~614 ms per instance.
- Nested: ~636 ms per instance (+3.6%).
- Nested (cropped input): ~559 ms (-9%).

⚙️ Installation & Usage

Prerequisites

Python 3.10+
PyTorch ≥ 2.0
CUDA-compatible GPU (recommended)
Jupyter Notebook (for prototype development)

Setup

git clone https://github.com/MikaDinnus/nested-grounded-sam.git
cd nested-groundedsam

Run

Use the main.py to define the vocab and run the pipeline. Be aware, that a Microsoft Azure Computer Vision API and Gemini API Key is needed to run the reasoning part. It is possible to disable this part by defining the vocab by hand in main.py

All evaluations will be stored in structured JSON logs for reproducibility.

🚀 Future Work

Improve upsampling methods, as rescaling introduces artifacts.
Optimize runtime by integrating EfficientSAM backbones (distilled ViT encoders).
Extend experiments to multiple ADE20K classes beyond facades.
Explore tighter coupling with LLM reasoning for context-aware prompt generation.

📚 Citation

If you use this project, please cite:

@thesis{dinnus2025nested,
  title     = {Hierarchical Segmentation of Nested Objects in Complex Outdoor Scenes with GroundedSAM},
  author    = {Mika Frederic Dinnus},
  school    = {University of Münster},
  year      = {2025},
  type      = {Bachelor's Thesis}
}

🙏 Acknowledgements

Grounding DINO
Segment Anything Model (SAM)
ADE20K dataset by MIT CSAIL.
Supervision: Prof. Dr. Benjamin Risse, University of Münster.

Name		Name	Last commit message	Last commit date
Latest commit History 26 Commits
building_facade		building_facade
images		images
upsampling_experiment		upsampling_experiment
.gitignore		.gitignore
README.md		README.md
composite_indicator.py		composite_indicator.py
evaluation.py		evaluation.py
evaluation_values.xlsx		evaluation_values.xlsx
flat.py		flat.py
main.py		main.py
nested.py		nested.py
reasoning.py		reasoning.py
write_excel.py		write_excel.py
write_json.py		write_json.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Nested-GroundedSAM

📖 Overview

🔍 Motivation

🏗️ Methodology

📊 Evaluation

⚙️ Installation & Usage

Prerequisites

Setup

Run

🚀 Future Work

📚 Citation

🙏 Acknowledgements

About

Uh oh!

Releases

Packages

Languages

MikaDinnus/nested-grounded-sam

Folders and files

Latest commit

History

Repository files navigation

Nested-GroundedSAM

📖 Overview

🔍 Motivation

🏗️ Methodology

📊 Evaluation

⚙️ Installation & Usage

Prerequisites

Setup

Run

🚀 Future Work

📚 Citation

🙏 Acknowledgements

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages