Commit 1b6e3ff

Merge pull request #1 from facebookexternal/skeleton
Initial skeleton
2 parents fd3f040 + 54824d2 commit 1b6e3ff

17 files changed (+1273 -18 lines changed)

.flake8

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+[flake8]
+max-line-length = 79
+extend-ignore = E203,W503
+exclude =
+    .git,
+    __pycache__,
+    .venv,
+    venv,
+    build,
+    dist,
+    *.egg-info
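
The settings in this file can be sanity-checked with a short script. The sketch below is only an illustration: it parses a copy of the file with the standard-library `configparser`, which is not necessarily how flake8 loads it, and it assumes the `exclude` entries are indented continuation lines. `E203` (whitespace before `:`) and `W503` (line break before a binary operator) are the pycodestyle checks commonly disabled for Black-style formatting.

```python
import configparser

# Copy of the .flake8 file added in this commit. The indentation of the
# continuation lines under "exclude" is an assumption.
FLAKE8_INI = """\
[flake8]
max-line-length = 79
extend-ignore = E203,W503
exclude =
    .git,
    __pycache__,
    .venv,
    venv,
    build,
    dist,
    *.egg-info
"""

parser = configparser.ConfigParser()
parser.read_string(FLAKE8_INI)
section = parser["flake8"]

# Integer option.
max_line_length = section.getint("max-line-length")

# Comma-separated option on a single line.
ignored = [c.strip() for c in section["extend-ignore"].split(",") if c.strip()]

# Multi-line option: treat newlines as additional separators.
excluded = [
    p.strip()
    for p in section["exclude"].replace("\n", ",").split(",")
    if p.strip()
]

print(max_line_length, ignored, excluded)
```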

.gitignore

Lines changed: 25 additions & 14 deletions
@@ -52,7 +52,6 @@ coverage.xml
 # Virtual environments
 .env
 .venv
-env/
 venv/
 ENV/
 env.bak/
@@ -66,21 +65,33 @@ dmypy.json
 # Pyre type checker
 .pyre/
 
-# IDE
-.vscode/
+# pytype static type analyzer
+.pytype/
+
+# Cython debug symbols
+cython_debug/
+
+# PyCharm
+# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
+# be added to the global gitignore or merged into this project gitignore. For a PyCharm
+# project, it is recommended to ignore the whole idea folder.
 .idea/
-*.swp
-*.swo
-*~
 
-# OS
+# VS Code
+.vscode/
+
+# macOS
 .DS_Store
-.DS_Store?
-._*
-.Spotlight-V100
-.Trashes
-ehthumbs.db
+
+# Windows
 Thumbs.db
+ehthumbs.db
+Desktop.ini
+
+# Docker
+.dockerignore
 
-# Logs
-*.log
+# Exclude anything containing "claude" (case-insensitive)
+*claude*
+*Claude*
+*CLAUDE*
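
The comment on the last three patterns says "case-insensitive", but they list only three spellings. Where Git matches ignore patterns case-sensitively (typical on Linux), a mixed-case name such as `ClAuDe.md` would not be covered. The sketch below approximates this with `fnmatch.fnmatchcase`; it does not reproduce gitignore semantics exactly, and the file names are made up for illustration. A character-class pattern is shown as one possible way to cover every casing.

```python
from fnmatch import fnmatchcase

# The three patterns added to .gitignore in this commit.
COMMITTED_PATTERNS = ["*claude*", "*Claude*", "*CLAUDE*"]

# Hypothetical alternative: one pattern that spells out both cases per letter.
ANY_CASE_PATTERN = "*[Cc][Ll][Aa][Uu][Dd][Ee]*"


def matches_committed(name: str) -> bool:
    """Case-sensitive glob match against the committed patterns."""
    return any(fnmatchcase(name, pattern) for pattern in COMMITTED_PATTERNS)


def matches_any_case(name: str) -> bool:
    """Case-sensitive glob match against the character-class pattern."""
    return fnmatchcase(name, ANY_CASE_PATTERN)


# Illustrative file names, not taken from the repository.
names = ["claude.md", "Claude_notes.txt", "CLAUDE.md", "ClAuDe.md", "README.md"]
for name in names:
    print(f"{name}: committed={matches_committed(name)} "
          f"any_case={matches_any_case(name)}")
```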

README.md

Lines changed: 195 additions & 2 deletions
@@ -1,2 +1,195 @@
-# envtorch
-An environment library for RL and beyond
+# EnvTorch: Agentic Execution Environments
+
+A unified framework for CodeAct environments that supports both agent execution and RL training, built on Gym/Gymnasium APIs with PyTorch/HuggingFace integration patterns.
+
+## Overview
+
+EnvTorch provides a standard for agentic execution environments following the CodeAct paradigm, where actions are arbitrary Python code that can chain multiple tool calls. The framework bridges traditional RL environments with modern agent capabilities.
+
+### Key Features
+
+- **CodeAct Execution**: Actions are Python code strings executed in persistent contexts
+- **State Persistence**: Variables and functions persist across steps within episodes
+- **Tool Integration**: MCP (Model Context Protocol) support for external capabilities
+- **RL Compatibility**: Transform system for reward computation and training
+- **Error Handling**: Exceptions become observations for agent learning
+- **Clean APIs**: Minimal, opinionated design following KISS principles
+
+## Quick Start
+
+```python
+from src import create_codeact_env, CodeAction
+
+# Create environment
+env = create_codeact_env()
+obs = env.reset()
+
+# Execute Python code
+action = CodeAction(code="""
+x = 10
+y = 20
+result = x * y
+print(f"Result: {result}")
+result  # Return value
+""")
+
+obs = env.step(action)
+print(f"Output: {obs.execution_result.stdout}")
+print(f"Return: {obs.execution_result.return_value}")
+```
+
+## Core Components
+
+### Actions and Observations
+
+```python
+# Actions contain arbitrary Python code
+action = CodeAction(code="import math; math.sqrt(16)")
+
+# Observations include execution results
+obs = env.step(action)
+print(obs.execution_result.return_value)  # 4.0
+print(obs.execution_result.success)       # True
+print(obs.execution_result.stdout)        # Any print output
+```
+
+### Tool Integration
+
+```python
+from src import create_mcp_environment
+
+# Environment with MCP tools
+env = create_mcp_environment()
+obs = env.reset()
+
+# Tools available as Python objects
+action = CodeAction(code="""
+content = "Hello, world!"
+file_write("/tmp/hello.txt", content)
+result = file_read("/tmp/hello.txt")
+print(f"File contents: {result}")
+""")
+
+obs = env.step(action)
+```
+
+### RL Training with Transforms
+
+```python
+from src import create_math_env_transform
+
+# Environment that rewards correct math solutions
+transform = create_math_env_transform(expected_answer=42)
+env = create_codeact_env()
+env.transform = transform
+
+# Agent gets rewarded for correct answers
+action = CodeAction(code="21 * 2")  # Correct answer
+obs = env.step(action)
+print(obs.reward)  # 1.0 (success) + quality bonuses
+```
+
+## Architecture
+
+### Type System
+- `Action` / `CodeAction`: Base and concrete action types
+- `Observation` / `CodeObservation`: Base and concrete observation types
+- `State` / `CodeState`: Environment state with execution context
+- `ExecutionResult`: Detailed code execution results
+
+### Core Classes
+- `Environment`: Base class following Gym API
+- `CodeActEnvironment`: Main environment for code execution
+- `Transform`: Base class for observation modification
+- `ToolRegistry`: Manages available tools and functions
+
+### Transform Examples
+- `CodeSafetyTransform`: Penalizes unsafe code patterns
+- `MathProblemTransform`: Rewards correct numerical answers
+- `CodeQualityTransform`: Evaluates code quality metrics
+- `CompositeTransform`: Combines multiple transforms
+
+## File Structure
+
+```
+src/
+├── types.py          # Core type definitions
+├── interfaces.py     # Abstract base classes
+├── environment.py    # Main CodeAct environment
+├── transforms.py     # Transform implementations
+├── mcp.py            # MCP integration
+└── __init__.py       # Clean exports
+```
+
+## Usage Patterns
+
+### Agent Exploration
+```python
+env = create_codeact_env()
+obs = env.reset()
+
+# Multi-step problem solving
+action1 = CodeAction(code="data = [1, 2, 3, 4, 5]")
+obs = env.step(action1)
+
+action2 = CodeAction(code="mean = sum(data) / len(data); mean")
+obs = env.step(action2)  # Uses persistent data from step 1
+```
+
+### RL Training Loop
+```python
+# Create environment with reward function
+transform = create_safe_env_transform()
+env = create_codeact_env()
+env.transform = transform
+
+for episode in range(100):
+    obs = env.reset()
+    action = generate_action()  # From your policy
+    obs = env.step(action)
+
+    reward = obs.reward  # Computed by transforms
+    # Update policy based on reward
+```
+
+### Hybrid Agent + RL
+```python
+# Phase 1: Agent exploration
+env = create_codeact_env()
+# Agent explores different solution approaches
+
+# Phase 2: RL optimization
+env.transform = optimization_transform
+# Train to optimize based on exploration insights
+```
+
+## Design Principles
+
+- **KISS Approach**: Minimal, opinionated design
+- **Single Way**: One clear way to accomplish tasks
+- **Pythonic**: Follows PyTorch/HuggingFace patterns
+- **No Inline Comments**: Code should be self-explanatory
+- **Functional Composition**: Private functions explain complex logic
+
+## Testing
+
+Run the test suite:
+```bash
+python test_unified.py
+```
+
+Run examples:
+```bash
+python example.py
+```
+
+## Requirements
+
+See `requirements.txt` for dependencies. Core requirements:
+- Python 3.9+
+- PyTorch 2.0+
+- HuggingFace datasets
+
+## License
+
+BSD 3-Clause License (see LICENSE file)
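
The README describes actions as code strings run in a persistent context, with the trailing expression returned and exceptions surfaced as observations. The following is a minimal sketch of that idea under stated assumptions. `SketchExecutor` and `SketchResult` are hypothetical names invented here, not classes from this repository, and plain `exec` offers no sandboxing.

```python
import ast
import contextlib
import io
import traceback
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class SketchResult:
    """Hypothetical stand-in for an execution result."""
    stdout: str
    return_value: Any
    success: bool
    error: Optional[str] = None


class SketchExecutor:
    """Runs code strings in one namespace that persists between calls."""

    def __init__(self) -> None:
        self.namespace: dict = {}

    def reset(self) -> None:
        # Start a new episode with an empty namespace.
        self.namespace = {}

    def run(self, code: str) -> SketchResult:
        buffer = io.StringIO()
        value = None
        try:
            tree = ast.parse(code)
            last_expr = None
            # If the code ends with a bare expression, evaluate it
            # separately so its value can be returned.
            if tree.body and isinstance(tree.body[-1], ast.Expr):
                last_expr = ast.Expression(tree.body.pop().value)
            with contextlib.redirect_stdout(buffer):
                exec(compile(tree, "<action>", "exec"), self.namespace)
                if last_expr is not None:
                    value = eval(
                        compile(last_expr, "<action>", "eval"), self.namespace
                    )
        except Exception:
            # Errors become data instead of propagating to the caller.
            return SketchResult(
                buffer.getvalue(), None, False, traceback.format_exc()
            )
        return SketchResult(buffer.getvalue(), value, True)


executor = SketchExecutor()
first = executor.run("data = [1, 2, 3, 4, 5]")
second = executor.run("mean = sum(data) / len(data); mean")
failed = executor.run("1 / 0")
print(second.return_value, failed.success)
```

The second call sees `data` from the first because both share `executor.namespace`; calling `reset()` discards it.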

example.py

Lines changed: 131 additions & 0 deletions
@@ -0,0 +1,131 @@
+#!/usr/bin/env python3
+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+
+"""
+Simple example demonstrating EnvTorch environment usage.
+
+This shows the minimal steps to get started with code execution environments.
+"""
+
+from src import CodeAction, CodeExecutionEnvironment, CodingEnv, Transform
+
+
+def basic_code_execution_example():
+    """Basic example using CodeExecutionEnvironment."""
+    print("=== Basic Code Execution Example ===")
+
+    # Create basic code execution environment
+    env = CodeExecutionEnvironment()
+
+    print("Note: This example shows the interface but requires Docker to actually run")
+    print("Environment created successfully!")
+
+    # Create an action to calculate compound interest
+    action = CodeAction(
+        code="""
+# Calculate compound interest
+principal = 1000
+rate = 0.05
+time = 3
+
+final_amount = principal * (1 + rate) ** time
+interest_earned = final_amount - principal
+
+print(f"Principal: ${principal}")
+print(f"Rate: {rate*100}%")
+print(f"Time: {time} years")
+print(f"Final amount: ${final_amount:.2f}")
+print(f"Interest earned: ${interest_earned:.2f}")
+
+final_amount
+"""
+    )
+
+    print(f"Created action with code length: {len(action.code)} characters")
+    print()
+
+
+def coding_environment_example():
+    """Example using CodingEnv with safety and quality transforms."""
+    print("=== Coding Environment Example ===")
+
+    # Create coding environment with built-in transforms
+    env = CodingEnv()
+
+    print("CodingEnv created with safety and quality transforms!")
+    print("This environment includes:")
+    print("• Code safety checks")
+    print("• Code quality analysis")
+    print("• Composite transform system")
+
+    # Example of safe code
+    safe_action = CodeAction(
+        code="""
+# Safe mathematical calculation
+import math
+
+def calculate_fibonacci(n):
+    if n <= 1:
+        return n
+    return calculate_fibonacci(n-1) + calculate_fibonacci(n-2)
+
+# Calculate first 10 Fibonacci numbers
+fib_sequence = [calculate_fibonacci(i) for i in range(10)]
+print(f"First 10 Fibonacci numbers: {fib_sequence}")
+fib_sequence
+"""
+    )
+
+    print(f"Created safe action with code length: {len(safe_action.code)} characters")
+    print()
+
+
+def transform_system_example():
+    """Example showing how to create custom transforms."""
+    print("=== Transform System Example ===")
+
+    # Example custom transform
+    class RewardTransform(Transform):
+        """Transform that adds rewards based on code execution results."""
+
+        def __call__(self, observation):
+            # This is just an example - actual implementation would need
+            # a proper observation object with execution results
+            print("Custom transform would analyze execution results here")
+            print("and add rewards based on success criteria")
+            return observation
+
+    transform = RewardTransform()
+    print("Created custom RewardTransform")
+
+    print("Transform system allows:")
+    print("• Chaining multiple transforms")
+    print("• Adding rewards for RL training")
+    print("• Custom observation processing")
+    print("• Safety and quality checks")
+    print()
+
+
+if __name__ == "__main__":
+    print("EnvTorch Environment Examples")
+    print("=" * 40)
+    print()
+
+    basic_code_execution_example()
+    coding_environment_example()
+    transform_system_example()
+
+    print("=" * 40)
+    print("Examples complete! 🎉")
+    print()
+    print("Key takeaways:")
+    print("• CodeAction(code='...') for arbitrary Python execution")
+    print("• CodeExecutionEnvironment provides base functionality")
+    print("• CodingEnv adds safety and quality transforms")
+    print("• Transform system enables customization and RL training")
+    print("• Docker integration provides sandboxed execution")
+    print("=" * 40)
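
The `RewardTransform` in `example.py` is a placeholder that returns the observation unchanged. The sketch below shows one way a transform might assign a reward and how several transforms could be chained. Every name in it (`SketchObservation`, `ExpectedAnswerReward`, `FailurePenalty`, `Chain`) is hypothetical and chosen for illustration; none of it is taken from the repository's `Transform` or `CompositeTransform` classes.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence


@dataclass
class SketchObservation:
    """Hypothetical observation carrying an execution outcome and a reward."""
    return_value: Any = None
    success: bool = True
    reward: float = 0.0


class ExpectedAnswerReward:
    """Adds a bonus when execution succeeded and returned the expected value."""

    def __init__(self, expected: Any, bonus: float = 1.0) -> None:
        self.expected = expected
        self.bonus = bonus

    def __call__(self, obs: SketchObservation) -> SketchObservation:
        if obs.success and obs.return_value == self.expected:
            obs.reward += self.bonus
        return obs


class FailurePenalty:
    """Subtracts a penalty when execution failed."""

    def __init__(self, penalty: float = 0.5) -> None:
        self.penalty = penalty

    def __call__(self, obs: SketchObservation) -> SketchObservation:
        if not obs.success:
            obs.reward -= self.penalty
        return obs


class Chain:
    """Applies transforms in order, passing each result to the next one."""

    def __init__(
        self,
        transforms: Sequence[Callable[[SketchObservation], SketchObservation]],
    ) -> None:
        self.transforms = list(transforms)

    def __call__(self, obs: SketchObservation) -> SketchObservation:
        for transform in self.transforms:
            obs = transform(obs)
        return obs


pipeline = Chain([ExpectedAnswerReward(expected=42), FailurePenalty()])
correct = pipeline(SketchObservation(return_value=42))
wrong = pipeline(SketchObservation(return_value=41))
crashed = pipeline(SketchObservation(success=False))
print(correct.reward, wrong.reward, crashed.reward)
```

Keeping each transform a plain callable that takes and returns an observation means chaining is just a loop, which matches the "chaining multiple transforms" idea printed by the example script.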
