Agent S: Use Computer Like a Human

🌐 [S2 blog] 📄 [S2 Paper (COLM 2025)] 🎥 [S2 Video]

🌐 [S1 blog] 📄 [S1 Paper (ICLR 2025)] 🎥 [S1 Video]

Skip the setup? Try Agent S in Simular Cloud

🥳 Updates

2025/08/01: Agent S2.5 is released (gui-agents v0.2.5): simpler, better, and faster! New SOTA on OSWorld-Verified!
2025/07/07: The Agent S2 paper is accepted to COLM 2025! See you in Montreal!
2025/04/27: The Agent S paper won the Best Paper Award 🏆 at ICLR 2025 Agentic AI for Science Workshop!
2025/04/01: Released the Agent S2 paper with new SOTA results on OSWorld, WindowsAgentArena, and AndroidWorld!
2025/03/12: Released Agent S2 along with v0.2.0 of gui-agents, the new state-of-the-art for computer use agents (CUA), outperforming OpenAI's CUA/Operator and Anthropic's Claude 3.7 Sonnet Computer-Use!
2025/01/22: The Agent S paper is accepted to ICLR 2025!
2025/01/21: Released v0.1.2 of gui-agents library, with support for Linux and Windows!
2024/12/05: Released v0.1.0 of gui-agents library, allowing you to use Agent-S for Mac, OSWorld, and WindowsAgentArena with ease!
2024/10/10: Released the Agent S paper and codebase!

💡 Introduction

Welcome to Agent S, an open-source framework designed to enable autonomous interaction with computers through an Agent-Computer Interface (ACI). Our mission is to build intelligent GUI agents that can learn from past experiences and perform complex tasks autonomously on your computer.

Whether you're interested in AI, automation, or contributing to cutting-edge agent-based systems, we're excited to have you here!

🎯 Current Results

Benchmark	Agent S2.5	Previous SOTA
OSWorld Verified (100 step)	56.0%	53.1%
OSWorld Verified (50 step)	54.2%	50.6%

🛠️ Installation & Setup

Prerequisites

Single Monitor: The agent is designed for single-monitor setups
Security: The agent executes Python code to control your computer, so use it with care
Supported Platforms: Linux, Mac, and Windows

Installation

pip install gui-agents

API Configuration

Option 1: Environment Variables

Add to your .bashrc (Linux) or .zshrc (macOS):

export OPENAI_API_KEY=<YOUR_API_KEY>
export ANTHROPIC_API_KEY=<YOUR_ANTHROPIC_API_KEY>
export HF_TOKEN=<YOUR_HF_TOKEN>

Option 2: Python Script

import os
os.environ["OPENAI_API_KEY"] = "<YOUR_API_KEY>"
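
Whichever option you choose, it can help to confirm that the expected keys are visible to the Python process before launching the agent. The sketch below is a hypothetical helper: `missing_keys` is not part of gui-agents, and which variables you actually need depends on the providers you use.

```python
import os


def missing_keys(required, env=None):
    """Return the names in `required` that are unset or empty in `env`.

    `env` defaults to os.environ; passing a plain dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]
```

For example, `missing_keys(["OPENAI_API_KEY", "HF_TOKEN"])` returns an empty list when both variables are set, and the names of the unset ones otherwise.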

Supported Models

We support Azure OpenAI, Anthropic, Gemini, Open Router, and vLLM inference. See models.md for details.

Grounding Models (Required)

For optimal performance, we recommend UI-TARS-1.5-7B hosted on Hugging Face Inference Endpoints or another provider. See Hugging Face Inference Endpoints for setup instructions.

🚀 Usage

⚡️ Recommended Setup:
For the best configuration, we recommend using OpenAI o3-2025-04-16 as the main model, paired with UI-TARS-1.5-7B for grounding.

CLI

Run Agent S2.5 with the required parameters:

agent_s \
    --provider openai \
    --model o3-2025-04-16 \
    --ground_provider huggingface \
    --ground_url http://localhost:8080 \
    --ground_model ui-tars-1.5-7b \
    --grounding_width 1920 \
    --grounding_height 1080

Required Parameters

--provider: Main generation model provider (e.g., openai, anthropic, etc.) - Default: "openai"
--model: Main generation model name (e.g., o3-2025-04-16) - Default: "o3-2025-04-16"
--ground_provider: The provider for the grounding model - Required
--ground_url: The URL of the grounding model - Required
--ground_model: The model name for the grounding model - Required
--grounding_width: Width of the output coordinate resolution from the grounding model - Required
--grounding_height: Height of the output coordinate resolution from the grounding model - Required

Grounding Model Dimensions

The grounding width and height should match the output coordinate resolution of your grounding model:

UI-TARS-1.5-7B: Use --grounding_width 1920 --grounding_height 1080
UI-TARS-72B: Use --grounding_width 1000 --grounding_height 1000
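
To see why these values matter, consider how a point predicted in the grounding model's coordinate space might be mapped onto the actual screen. The sketch below assumes a simple linear rescale. `scale_to_screen` is a hypothetical helper for illustration, not the library's API, and the real implementation may differ.

```python
def scale_to_screen(x, y, grounding_width, grounding_height,
                    screen_width, screen_height):
    """Linearly map a point from grounding-model coordinates to screen pixels."""
    sx = round(x * screen_width / grounding_width)
    sy = round(y * screen_height / grounding_height)
    return sx, sy


# The center of a 1000x1000 grounding space mapped onto a 1920x1080 screen.
center = scale_to_screen(500, 500, 1000, 1000, 1920, 1080)
```

Under this assumption, passing dimensions that do not match the grounding model's output resolution shifts every predicted point, so clicks would land away from their intended targets.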

Optional Parameters

--model_url: Custom API URL for main generation model - Default: ""
--model_api_key: API key for main generation model - Default: ""
--ground_api_key: API key for grounding model endpoint - Default: ""
--max_trajectory_length: Maximum number of image turns to keep in trajectory - Default: 8
--enable_reflection: Enable reflection agent to assist the worker agent - Default: True
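
As an illustration of what --max_trajectory_length controls, the sketch below keeps the image only for the most recent turns in a list of turns. This is a simplified, hypothetical model: the turn dictionaries, the choice to keep the text of older turns, and the `trim_image_turns` function are assumptions for illustration, not the library's actual data structures.

```python
def trim_image_turns(turns, max_trajectory_length=8):
    """Return a copy of `turns` in which only the newest N image turns keep their image.

    Each turn is assumed to be a dict with an "image" entry (bytes or None).
    The input list and its dicts are left unmodified.
    """
    image_indices = [i for i, t in enumerate(turns) if t.get("image") is not None]
    # Guard against N <= 0: a slice like [-0:] would keep everything.
    keep = set(image_indices[-max_trajectory_length:]) if max_trajectory_length > 0 else set()
    trimmed = []
    for i, turn in enumerate(turns):
        new_turn = dict(turn)
        if i not in keep:
            new_turn["image"] = None  # drop the screenshot, keep the text
        trimmed.append(new_turn)
    return trimmed
```

A smaller limit reduces how many screenshots are sent to the main model on each step, at the cost of less visual history.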

`gui_agents` SDK

First, we import the necessary modules. AgentS2_5 is the main agent class for Agent S2.5. OSWorldACI is our grounding agent, which translates agent actions into executable Python code.

import pyautogui
import io
from gui_agents.s2_5.agents.agent_s import AgentS2_5
from gui_agents.s2_5.agents.grounding import OSWorldACI

# Load in your API keys.
from dotenv import load_dotenv
load_dotenv()

current_platform = "linux"  # "darwin", "windows"

Next, we define our engine parameters. engine_params is used for the main agent, and engine_params_for_grounding is used for grounding. For engine_params_for_grounding, we support custom endpoints such as Hugging Face TGI, vLLM, and Open Router.

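# Main generation model settings (values follow the recommended setup and CLI defaults).
provider = "openai"
model = "o3-2025-04-16"
model_url = ""      # Optional: custom API URL for the main model
model_api_key = ""  # Optional: API key for the main model

engine_params = {
  "engine_type": provider,
  "model": model,
  "base_url": model_url,     # Optional
  "api_key": model_api_key,  # Optional
}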

# Load the grounding engine from a custom endpoint
ground_provider = "<your_ground_provider>"
ground_url = "<your_ground_url>"
ground_model = "<your_ground_model>"
ground_api_key = "<your_ground_api_key>"

# Set grounding dimensions based on your model's output coordinate resolution
# UI-TARS-1.5-7B: grounding_width=1920, grounding_height=1080
# UI-TARS-72B: grounding_width=1000, grounding_height=1000
grounding_width = 1920  # Width of output coordinate resolution
grounding_height = 1080  # Height of output coordinate resolution

engine_params_for_grounding = {
  "engine_type": ground_provider,
  "model": ground_model,
  "base_url": ground_url,
  "api_key": ground_api_key,  # Optional
  "grounding_width": grounding_width,
  "grounding_height": grounding_height,
}

Then, we define our grounding agent and Agent S2.5.

grounding_agent = OSWorldACI(
    platform=current_platform,
    engine_params_for_generation=engine_params,
    engine_params_for_grounding=engine_params_for_grounding,
    width=1920,  # Optional: screen width
    height=1080  # Optional: screen height
)

agent = AgentS2_5(
    engine_params,
    grounding_agent,
    platform=current_platform,
    max_trajectory_length=8,  # Optional: maximum image turns to keep
    enable_reflection=True     # Optional: enable reflection agent
)

Finally, let's query the agent!

# Get screenshot.
screenshot = pyautogui.screenshot()
buffered = io.BytesIO() 
screenshot.save(buffered, format="PNG")
screenshot_bytes = buffered.getvalue()

obs = {
  "screenshot": screenshot_bytes,
}

instruction = "Close VS Code"
info, action = agent.predict(instruction=instruction, observation=obs)

exec(action[0])

Refer to gui_agents/s2_5/cli_app.py for more details on how the inference loop works.
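
A multi-step run typically repeats the capture, predict, and execute steps above until the task ends. The sketch below shows that shape with a stub in place of the real agent so that it runs on its own. The `run_episode` function, the `StubAgent` class, the `"DONE"` stop signal, and the step limit are all assumptions for illustration; consult cli_app.py for the actual loop and its termination handling.

```python
def run_episode(agent, instruction, get_observation, execute, max_steps=15):
    """Repeat observe -> predict -> execute until a stop signal or the step limit."""
    executed = []
    for _ in range(max_steps):
        obs = get_observation()
        info, action = agent.predict(instruction=instruction, observation=obs)
        code = action[0]
        if code.strip().upper() == "DONE":  # assumed stop signal, not confirmed by the source
            break
        execute(code)
        executed.append(code)
    return executed


class StubAgent:
    """Stand-in for a real agent: emits two fake actions, then the stop signal."""

    def __init__(self):
        self._actions = iter(["step_one()", "step_two()", "DONE"])

    def predict(self, instruction, observation):
        return {}, [next(self._actions)]


log = []
steps = run_episode(
    StubAgent(),
    "Close VS Code",
    get_observation=lambda: {"screenshot": b""},
    execute=log.append,  # record actions instead of calling exec(), so nothing is run
)
```

With the real agent, `get_observation` would capture a screenshot as in the snippet above and `execute` would call `exec`. Because the agent runs Python code on your machine, passing an `execute` callback that logs or asks for confirmation first is a reasonable precaution.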

OSWorld

To deploy Agent S2.5 in OSWorld, follow the OSWorld Deployment instructions.

💬 Citations

If you find this codebase useful, please cite:

@misc{Agent-S2,
      title={Agent S2: A Compositional Generalist-Specialist Framework for Computer Use Agents}, 
      author={Saaket Agashe and Kyle Wong and Vincent Tu and Jiachen Yang and Ang Li and Xin Eric Wang},
      year={2025},
      eprint={2504.00906},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2504.00906}, 
}

@inproceedings{Agent-S,
    title={{Agent S: An Open Agentic Framework that Uses Computers Like a Human}},
    author={Saaket Agashe and Jiuzhou Han and Shuyu Gan and Jiachen Yang and Ang Li and Xin Eric Wang},
    booktitle={International Conference on Learning Representations (ICLR)},
    year={2025},
    url={https://arxiv.org/abs/2410.08164}
}
