# EnvTorch: Agentic Execution Environments

A unified framework for CodeAct environments that supports both agent execution and RL training, built on Gym/Gymnasium APIs with PyTorch/HuggingFace integration patterns.

## Overview

EnvTorch provides a standard for agentic execution environments following the CodeAct paradigm, where actions are arbitrary Python code that can chain multiple tool calls. The framework bridges traditional RL environments with modern agent capabilities.

### Key Features

- **CodeAct Execution**: Actions are Python code strings executed in persistent contexts
- **State Persistence**: Variables and functions persist across steps within episodes
- **Tool Integration**: MCP (Model Context Protocol) support for external capabilities
- **RL Compatibility**: Transform system for reward computation and training
- **Error Handling**: Exceptions become observations for agent learning (see the sketch after this list)
- **Clean APIs**: Minimal, opinionated design following KISS principles
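
The error-handling behavior described above can be sketched briefly. The `success`, `stdout`, and `return_value` fields appear in the examples later in this README; the `error` field holding the captured traceback is an assumption about `ExecutionResult`:

```python
from src import create_codeact_env, CodeAction

env = create_codeact_env()
obs = env.reset()

# Faulty code: the exception becomes part of the observation instead of raising
obs = env.step(CodeAction(code="1 / 0"))
print(obs.execution_result.success)  # False
print(obs.execution_result.error)    # assumed field: captured traceback text

# The persistent context survives, so the agent can retry within the same episode
obs = env.step(CodeAction(code="42"))
print(obs.execution_result.return_value)  # 42
```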
| 17 | + |
| 18 | +## Quick Start |
| 19 | + |
| 20 | +```python |
| 21 | +from src import create_codeact_env, CodeAction |
| 22 | + |
| 23 | +# Create environment |
| 24 | +env = create_codeact_env() |
| 25 | +obs = env.reset() |
| 26 | + |
| 27 | +# Execute Python code |
| 28 | +action = CodeAction(code=""" |
| 29 | +x = 10 |
| 30 | +y = 20 |
| 31 | +result = x * y |
| 32 | +print(f"Result: {result}") |
| 33 | +result # Return value |
| 34 | +""") |
| 35 | + |
| 36 | +obs = env.step(action) |
| 37 | +print(f"Output: {obs.execution_result.stdout}") |
| 38 | +print(f"Return: {obs.execution_result.return_value}") |
| 39 | +``` |
| 40 | + |
| 41 | +## Core Components |
| 42 | + |
| 43 | +### Actions and Observations |
| 44 | + |
| 45 | +```python |
| 46 | +# Actions contain arbitrary Python code |
| 47 | +action = CodeAction(code="math.sqrt(16)") |
| 48 | + |
| 49 | +# Observations include execution results |
| 50 | +obs = env.step(action) |
| 51 | +print(obs.execution_result.return_value) # 4.0 |
| 52 | +print(obs.execution_result.success) # True |
| 53 | +print(obs.execution_result.stdout) # Any print output |
| 54 | +``` |
| 55 | + |
| 56 | +### Tool Integration |
| 57 | + |
| 58 | +```python |
| 59 | +from src import create_mcp_environment |
| 60 | + |
| 61 | +# Environment with MCP tools |
| 62 | +env = create_mcp_environment() |
| 63 | +obs = env.reset() |
| 64 | + |
| 65 | +# Tools available as Python objects |
| 66 | +action = CodeAction(code=""" |
| 67 | +content = "Hello, world!" |
| 68 | +file_write("/tmp/hello.txt", content) |
| 69 | +result = file_read("/tmp/hello.txt") |
| 70 | +print(f"File contents: {result}") |
| 71 | +""") |
| 72 | + |
| 73 | +obs = env.step(action) |
| 74 | +``` |
| 75 | + |
| 76 | +### RL Training with Transforms |
| 77 | + |
| 78 | +```python |
| 79 | +from src import create_math_env_transform |
| 80 | + |
| 81 | +# Environment that rewards correct math solutions |
| 82 | +transform = create_math_env_transform(expected_answer=42) |
| 83 | +env = create_codeact_env() |
| 84 | +env.transform = transform |
| 85 | + |
| 86 | +# Agent gets rewarded for correct answers |
| 87 | +action = CodeAction(code="21 * 2") # Correct answer |
| 88 | +obs = env.step(action) |
| 89 | +print(obs.reward) # 1.0 (success) + quality bonuses |
| 90 | +``` |
| 91 | + |
| 92 | +## Architecture |
| 93 | + |
| 94 | +### Type System |
| 95 | +- `Action` / `CodeAction`: Base and concrete action types |
| 96 | +- `Observation` / `CodeObservation`: Base and concrete observation types |
| 97 | +- `State` / `CodeState`: Environment state with execution context |
| 98 | +- `ExecutionResult`: Detailed code execution results |
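
As a rough sketch of how these types relate (assuming the base types are exported from `src` alongside the factory functions):

```python
from src import create_codeact_env, Action, CodeAction

env = create_codeact_env()
env.reset()

action = CodeAction(code="2 ** 8")
obs = env.step(action)

# Concrete types specialize the base types; observations carry an ExecutionResult
assert isinstance(action, Action)
result = obs.execution_result
print(result.success)       # True
print(result.return_value)  # 256
```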

### Core Classes
- `Environment`: Base class following Gym API
- `CodeActEnvironment`: Main environment for code execution
- `Transform`: Base class for observation modification
- `ToolRegistry`: Manages available tools and functions
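
Custom rewards come from subclassing `Transform`. The hook name is not documented here, so the sketch below assumes a callable transform that receives the observation and returns it with a reward attached:

```python
from src import create_codeact_env, CodeAction, Transform

class SuccessRewardTransform(Transform):
    """Hypothetical transform: 1.0 for successful execution, 0.0 otherwise."""

    def __call__(self, observation):
        observation.reward = 1.0 if observation.execution_result.success else 0.0
        return observation

env = create_codeact_env()
env.transform = SuccessRewardTransform()
env.reset()

obs = env.step(CodeAction(code="print('ok')"))
print(obs.reward)  # 1.0
```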

### Transform Examples
- `CodeSafetyTransform`: Penalizes unsafe code patterns
- `MathProblemTransform`: Rewards correct numerical answers
- `CodeQualityTransform`: Evaluates code quality metrics
- `CompositeTransform`: Combines multiple transforms
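
These can be stacked. A sketch of combining them, assuming `CompositeTransform` accepts a list of transforms and that the classes are exported from `src` with the constructor arguments shown (check `src/transforms.py` for the real signatures):

```python
from src import (
    create_codeact_env,
    CodeAction,
    CodeSafetyTransform,
    MathProblemTransform,
    CompositeTransform,
)

transform = CompositeTransform([
    CodeSafetyTransform(),
    MathProblemTransform(expected_answer=42),
])

env = create_codeact_env()
env.transform = transform
env.reset()

obs = env.step(CodeAction(code="6 * 7"))
print(obs.reward)  # combined safety and correctness reward
```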

## File Structure

```
src/
├── types.py          # Core type definitions
├── interfaces.py     # Abstract base classes
├── environment.py    # Main CodeAct environment
├── transforms.py     # Transform implementations
├── mcp.py            # MCP integration
└── __init__.py       # Clean exports
```

## Usage Patterns

### Agent Exploration
```python
env = create_codeact_env()
obs = env.reset()

# Multi-step problem solving
action1 = CodeAction(code="data = [1, 2, 3, 4, 5]")
obs = env.step(action1)

action2 = CodeAction(code="mean = sum(data) / len(data); mean")
obs = env.step(action2)  # Uses persistent data from step 1
```

### RL Training Loop
```python
from src import create_codeact_env, create_safe_env_transform

# Create environment with reward function
transform = create_safe_env_transform()
env = create_codeact_env()
env.transform = transform

for episode in range(100):
    obs = env.reset()
    action = generate_action()  # From your policy
    obs = env.step(action)

    reward = obs.reward  # Computed by transforms
    # Update policy based on reward
```

### Hybrid Agent + RL
```python
# Phase 1: Agent exploration
env = create_codeact_env()
# Agent explores different solution approaches

# Phase 2: RL optimization
env.transform = optimization_transform
# Train to optimize based on exploration insights
```

## Design Principles

- **KISS Approach**: Minimal, opinionated design
- **Single Way**: One clear way to accomplish tasks
- **Pythonic**: Follows PyTorch/HuggingFace patterns
- **No Inline Comments**: Code should be self-explanatory
- **Functional Composition**: Private functions explain complex logic

## Testing

Run the test suite:
```bash
python test_unified.py
```

Run examples:
```bash
python example.py
```

## Requirements

See `requirements.txt` for dependencies. Core requirements:
- Python 3.9+
- PyTorch 2.0+
- HuggingFace datasets

## License

BSD 3-Clause License (see LICENSE file)