A Python simulation that combines PyBullet robotic arm control with zero-shot image-text matching capabilities using the CLIP vision-language model. This project was created as part of my exploration into applying vision-language models in robotics.
This project demonstrates:
- Physics-based robotic arm simulation using PyBullet
- Vision-guided object selection using CLIP vision-language model
- Intelligent pick and place operations based on text prompts
The robot captures images from its camera, analyzes them using CLIP, selects objects based on text descriptions, and performs pick-and-place operations, including color sorting and interactive placement.
- VLM interaction: `vlm.mp4`
- Automatic Sorting: `Sorting.mp4`
├── panda_vision_simulation.py # Vision-guided robot simulation class
├── color_sorting_vlm.py # Color sorting and interactive demo with VLM
├── simple_pick_place_demo.py # Simple pick and place demo
├── requirements.txt # Python dependencies
└── README.md # This documentation
- Camera image capture from the PyBullet simulation
- Object detection using 3D-to-2D projection and segmentation masks (see the sketch after this list)
- Vision-language matching with CLIP for object selection
- Text-prompted pick and place ("pick up the red cube")
- Multiple objects (red cube, red sphere, blue cube, green sphere, yellow cylinder)
- Color sorting zones with physical borders to prevent objects from falling
- Interactive zone selection and throw action for object placement
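The camera capture and detection features above rely on PyBullet's synthetic camera and its per-pixel segmentation buffer. Here is a minimal, self-contained sketch of that idea; the camera pose, resolution, and loaded object are illustrative and not the values used in `panda_vision_simulation.py`:

```python
import numpy as np
import pybullet as p
import pybullet_data

p.connect(p.DIRECT)  # headless physics server is enough for a render test
p.setAdditionalSearchPath(pybullet_data.getDataPath())
p.loadURDF("plane.urdf")
cube_id = p.loadURDF("cube_small.urdf", basePosition=[0.5, 0.0, 0.05])

# Synthetic overhead camera (pose and intrinsics are illustrative).
view = p.computeViewMatrix(cameraEyePosition=[0.5, 0.0, 1.0],
                           cameraTargetPosition=[0.5, 0.0, 0.0],
                           cameraUpVector=[0.0, 1.0, 0.0])
proj = p.computeProjectionMatrixFOV(fov=60, aspect=1.0, nearVal=0.01, farVal=2.0)

width, height, rgb, depth, seg = p.getCameraImage(224, 224, view, proj)

# The segmentation buffer stores a body id per pixel, so each object's crop
# is just the bounding box of its mask in the RGB image.
rgb = np.reshape(rgb, (height, width, 4))[:, :, :3]
seg = np.reshape(seg, (height, width))
ys, xs = np.where(seg == cube_id)
crop = rgb[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
print("cube crop shape:", crop.shape)
```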
- Python 3.7+
- PyBullet
- NumPy
- For the CLIP-based demos, all of the basic requirements plus:
- PyTorch
- Transformers (Hugging Face)
- Pillow (PIL)
- Matplotlib
- Requests
- Clone or download this project:
  git clone https://github.com/Nabil-Miri/vlm-robot-color-sorting.git
- (Recommended) Create and activate a Python virtual environment:
  python3 -m venv .venv
  source .venv/bin/activate
- Install the required dependencies:
  pip install -r requirements.txt
To run the color sorting and interactive demo:
python3 color_sorting_vlm.py

When you start the demo, you will be prompted to choose a mode:
- Automatic Color Sorting: The robot will automatically sort all objects into their matching colored zones.
- Interactive Text Prompt: You can enter text prompts to select which object the robot should pick up, and choose where to place it (including a "throw away" option).
Follow the on-screen instructions to interact with the robot and sorting zones.
The robot uses CLIP to match images of objects to your text prompt:
# For each object crop, CLIP returns a similarity score with the text prompt
similarity_scores = model.compute_object_similarity(crops, text_prompt)
selected_object, best_score = model.select_best_object(similarity_scores)
# Robot picks and places the selected object

CLIP compares each cropped object image to your prompt (e.g., "red cube") and returns a similarity score for each. The robot picks the object with the highest score and moves it to the location you choose.
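For reference, here is a minimal sketch of this zero-shot scoring step using the Hugging Face `transformers` CLIP API. The `score_crops` helper is hypothetical (the project exposes its own `compute_object_similarity`) and assumes the object crops are already available as PIL images:

```python
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def score_crops(crops, prompt):
    """Return one image-text similarity score per object crop (crops are PIL images)."""
    inputs = processor(text=[prompt], images=crops, return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # logits_per_image has shape (num_crops, num_prompts); drop the prompt axis.
    return outputs.logits_per_image.squeeze(1).tolist()

# Usage: the crop with the highest score is the object the robot should grasp.
# scores = score_crops(crops, "a red cube")
# best_index = max(range(len(scores)), key=scores.__getitem__)
```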
The robot uses PyBullet's built-in inverse kinematics solver to calculate joint angles needed to reach target positions:
joint_positions = p.calculateInverseKinematics(
self.robot_id,
endEffectorLinkIndex=11, # Panda end-effector link
targetPosition=target_position,
targetOrientation=target_orientation
)

Joint control uses position control mode (a short sketch follows the list below):
- Position Control: Joints move to target positions
- Gripper Control: Two-finger gripper with synchronized motion
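As an illustration, here is a minimal position-control sketch built on PyBullet's bundled `franka_panda/panda.urdf`; the joint indices, forces, and target values are assumptions for demonstration, not the exact settings used in this project:

```python
import pybullet as p
import pybullet_data

p.connect(p.DIRECT)
p.setAdditionalSearchPath(pybullet_data.getDataPath())
robot_id = p.loadURDF("franka_panda/panda.urdf", useFixedBase=True)

ARM_JOINTS = list(range(7))   # the Panda's seven revolute arm joints
FINGER_JOINTS = [9, 10]       # the two prismatic gripper fingers

def move_arm(joint_positions):
    """Drive each arm joint toward its target position (e.g., an IK solution)."""
    for joint_index, target in zip(ARM_JOINTS, joint_positions):
        p.setJointMotorControl2(robot_id, joint_index,
                                controlMode=p.POSITION_CONTROL,
                                targetPosition=target,
                                force=200)

def set_gripper(opening):
    """Command both fingers to the same width so they move in sync."""
    for joint_index in FINGER_JOINTS:
        p.setJointMotorControl2(robot_id, joint_index,
                                controlMode=p.POSITION_CONTROL,
                                targetPosition=opening,
                                force=40)

move_arm([0.0, -0.5, 0.0, -1.8, 0.0, 1.5, 0.8])
set_gripper(0.04)             # ~4 cm per finger = fully open
for _ in range(240):          # step so the motors can track their targets
    p.stepSimulation()
```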
Possible extensions to this project:
- Integrate advanced object segmentation for more accurate identification.
- Detect and localize area zones visually, not just by preset coordinates.
- Enable multi-object reasoning for commands involving relationships (e.g., “Stack all blue cubes, then put the red sphere on top”).
Feel free to fork, modify, and submit PRs! Suggestions and improvements are welcome.
