Inspired by the paper "Alita: Generalist Agent Enabling Scalable Agentic Reasoning with Minimal Predefinition and Maximal Self-Evolution".
Alita is an intelligent meta-agent system that automatically invents Python scripts as tools to solve complex tasks through an iterative CodeReAct (Code Reasoning and Acting) loop. The system can analyze natural language requirements, detect capability gaps, search for external resources, generate and register executable code, and manage isolated execution environments between tasks.
- Intelligent Task Analysis: Uses LLM-powered brainstorming to analyze tasks and detect capability gaps
- Dynamic Code Generation: Automatically generates self-contained Python scripts based on task specifications
- Isolated Execution Environment: Creates and manages Conda environments for safe script execution
- External Resource Integration: Searches and incorporates web resources when needed
- Iterative Refinement: Learns from execution failures and refines solutions automatically
- MCP Registry: Stores and reuses successful Model Context Protocols (MCPs)
- Comprehensive Benchmarking: Supports evaluation on GAIA, MathVista, and PathVQA datasets
Alita consists of several core components:
- ManagerAgent: Central coordinator that orchestrates the entire pipeline
- MCPBrainstorm: Analyzes tasks and generates tool specifications using LLM
- ResearchAgent: Performs intelligent information retrieval using LangGraph and MCP tools
- ScriptGenerator: Generates executable Python scripts from specifications
- CodeRunner: Executes scripts in isolated Conda environments
- EnvironmentManager: Manages Conda environment creation and dependency installation
- MCPRegistry: Persistent storage for successful Model Context Protocols
- Benchmark: Evaluation framework for multiple datasets
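The EnvironmentManager and CodeRunner components above both rely on Conda. As a rough illustration of what that involves, the sketch below only builds the command lines; it does not run them. The helper names, the environment naming scheme, and the Python version are assumptions made for this example, not the project's actual API. The `alita_env_` prefix mirrors the `env_prefix` setting in config.yaml.

```python
from typing import List

ENV_PREFIX = "alita_env_"  # mirrors env_prefix in config.yaml


def env_name(task_id: str, prefix: str = ENV_PREFIX) -> str:
    """Build an environment name from a task id (hypothetical naming scheme)."""
    safe = "".join(ch if ch.isalnum() else "_" for ch in task_id)
    return f"{prefix}{safe}"


def create_env_command(name: str, python_version: str = "3.10") -> List[str]:
    """Command that creates a fresh Conda environment (Python version is assumed)."""
    return ["conda", "create", "--yes", "--name", name, f"python={python_version}"]


def install_command(name: str, packages: List[str]) -> List[str]:
    """Command that installs packages with pip inside the named environment."""
    return ["conda", "run", "--name", name, "python", "-m", "pip", "install", *packages]


def run_script_command(name: str, script_path: str) -> List[str]:
    """Command that runs a generated script inside the named environment."""
    return ["conda", "run", "--name", name, "python", script_path]


if __name__ == "__main__":
    name = env_name("fib-demo")
    commands = (
        create_env_command(name),
        install_command(name, ["requests"]),
        run_script_command(name, "tool.py"),
    )
    for cmd in commands:
        print(" ".join(cmd))
```

Each list could be handed to `subprocess.run(cmd, timeout=...)`, with the timeout taken from `dependency_timeout` in config.yaml.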
flowchart TD
A["Input Task"] --> B["ManagerAgent.orchestrate()"]
B --> C["MCPBrainstorm.brainstorm()"]
C --> D{"Capability Gap Detected?"}
D -->|Yes| E["ResearchAgent.search()"]
D -->|No| F["ScriptGenerator.generate_script()"]
E --> G["ResearchAgent.retrieve()"]
G --> H["Collect External Resources"]
H --> F
F --> I["EnvironmentManager.create_environment()"]
I --> J["EnvironmentManager.install_dependencies()"]
J --> K["CodeRunner.run_script()"]
K --> L{"Execution Successful?"}
L -->|Yes| M["MCPRegistry.register_mcp()"]
L -->|No| N{"Max Iterations Reached?"}
N -->|No| O["Update Context with Error"]
O --> C
N -->|Yes| P["Return Failure"]
M --> Q["Return Success Result"]
style A fill:#e1f5fe
style B fill:#f3e5f5
style C fill:#fff3e0
style E fill:#e8f5e8
style F fill:#fff8e1
style K fill:#fce4ec
style M fill:#e0f2f1
style Q fill:#e8f5e8
style P fill:#ffebee
- Task Analysis: Analyze input task and detect capability gaps
- Resource Gathering: Search external resources if gaps are detected
- Script Generation: Generate self-contained Python script
- Environment Setup: Create isolated Conda environment with dependencies
- Execution: Run script and capture output
- Registration: Store successful scripts as reusable MCPs
- Iteration: Refine based on feedback if execution fails
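The steps above form a retry loop: generate a script, run it, and feed any error back into the next attempt until an iteration limit is reached. The sketch below shows only that control flow. `run_task`, `LoopResult`, and the two callables are hypothetical stand-ins for the real components, and the default of three iterations is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class LoopResult:
    success: bool
    output: str
    iterations: int
    errors: List[str] = field(default_factory=list)


def run_task(
    task: str,
    generate: Callable[[str, List[str]], str],
    execute: Callable[[str], Tuple[bool, str]],
    max_iterations: int = 3,
) -> LoopResult:
    """Generate and execute a script, retrying with error feedback on failure."""
    errors: List[str] = []
    for attempt in range(1, max_iterations + 1):
        # Earlier errors are passed back so the generator can refine its output.
        script = generate(task, errors)
        ok, output = execute(script)
        if ok:
            return LoopResult(True, output, attempt, errors)
        errors.append(output)
    return LoopResult(False, errors[-1] if errors else "", max_iterations, errors)


if __name__ == "__main__":
    # Stub generator: fails on the first try, succeeds once it has seen an error.
    def fake_generate(task: str, errors: List[str]) -> str:
        return "good" if errors else "bad"

    # Stub executor: only the "good" script runs successfully.
    def fake_execute(script: str) -> Tuple[bool, str]:
        return (script == "good", "done" if script == "good" else "NameError")

    print(run_task("demo", fake_generate, fake_execute))
```

In the real pipeline the success branch would also register the script as an MCP; that step is left out here to keep the loop itself visible.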
- Python 3.8+
- Conda package manager
- OpenAI API access
- Required Python packages (see installation section)
- Clone the repository:
git clone <repository-url>
cd Alita_repo
- Install dependencies:
pip install -r requirements.txt
Required packages:
- openai
- exa-py
- requests
- pyyaml
- conda (system package)
- Set up configuration: copy config.yaml.example to config.yaml and update the API keys:
api:
  openai_api_key: "your-actual-openai-api-key-here"
  anthropic_api_key: "your-actual-anthropic-api-key-here" # If using Anthropic models
exa:
  exa_api_key: "your-actual-exa-api-key-here"
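After copying the example file it is easy to leave a placeholder key in place. The sketch below is one possible way to catch that before a run. It walks an already-loaded configuration dictionary and reports values that still look like placeholders. `find_placeholder_keys` and the marker list are hypothetical and not part of the project.

```python
from typing import Any, Dict, List

# Prefixes used by the placeholder values shown in this README.
PLACEHOLDER_MARKERS = ("<YOUR_", "your-actual-")


def find_placeholder_keys(config: Dict[str, Any], path: str = "") -> List[str]:
    """Return dotted paths of string values that still look like placeholders."""
    found: List[str] = []
    for key, value in config.items():
        dotted = f"{path}.{key}" if path else key
        if isinstance(value, dict):
            # Recurse into nested sections such as "api" or "exa".
            found.extend(find_placeholder_keys(value, dotted))
        elif isinstance(value, str) and value.startswith(PLACEHOLDER_MARKERS):
            found.append(dotted)
    return found


if __name__ == "__main__":
    example = {
        "api": {
            "openai_api_key": "sk-example",
            "anthropic_api_key": "<YOUR_ANTHROPIC_API_KEY_HERE>",
        },
        "exa": {"exa_api_key": "your-actual-exa-api-key-here"},
    }
    print(find_placeholder_keys(example))
```

In practice the dictionary would come from loading config.yaml, for example with PyYAML's `yaml.safe_load`.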
The system is configured through config.yaml. Key configuration sections:
agent:
  primary_llm: "gpt-4o" # Primary LLM model
  secondary_llm: "gpt-4o-mini" # Secondary model
  mcp_prompt_template: "templates/mcp_prompt.txt"
  script_gen_prompt_template: "templates/script_template.txt"
environment:
  conda_base_env: "base" # Base Conda environment
  env_prefix: "alita_env_" # Environment name prefix
  dependency_timeout: 300 # Installation timeout (seconds)
api:
  openai_api_key: "<YOUR_OPENAI_API_KEY_HERE>" # OpenAI API key
  openai_api_url: "https://api.openai.com/v1" # OpenAI API endpoint
  anthropic_api_key: "<YOUR_ANTHROPIC_API_KEY_HERE>" # Anthropic API key
  anthropic_base_url: "https://api.anthropic.com" # Anthropic API endpoint (optional)
exa:
  exa_api_key: "<YOUR_EXA_API_KEY_HERE>" # Exa API key for semantic search
benchmark:
  gaia:
    dataset_path: "data/gaia.json"
  mathvista:
    sample_size: 100
    dataset_path: "data/mathvista.json"
  pathvqa:
    sample_size: 100
    dataset_path: "data/pathvqa.json"
Run Alita on a single natural language task:
python app.py
Before running, set the experiment mode in config.yaml:
misc:
  experiment_mode: "single_task"
Then enter your task when prompted:
Enter a natural language query/task: Calculate the fibonacci sequence up to 100
Run evaluation on benchmark datasets:
python app.py
Before running, set the experiment mode in config.yaml:
misc:
  experiment_mode: "benchmark"
This evaluates the system on the GAIA, MathVista, and PathVQA datasets and reports metrics including pass@1 and pass@3 scores.
from manager_agent import ManagerAgent
from utils import get_global_config
# Load configuration
config = get_global_config("config.yaml")
# Initialize the agent
manager = ManagerAgent(config)
# Process a task
result = manager.orchestrate("Create a function to sort a list of numbers")
print(result)
Alita_repo/
├── app.py # Main entry point
├── config.yaml # Configuration file
├── manager_agent.py # Central coordinator
├── mcp_brainstorm.py # Task analysis module
├── web_agent.py # Web search and navigation
├── script_generator.py # Code generation module
├── code_runner.py # Script execution module
├── env_manager.py # Environment management
├── mcp_registry.py # MCP storage and retrieval
├── benchmark.py # Evaluation framework
├── utils.py # Shared utilities
├── templates/ # Prompt templates (create this directory)
│   ├── mcp_prompt.txt
│   └── script_template.txt
├── data/ # Dataset files (create this directory)
│   ├── gaia.json
│   ├── mathvista.json
│   └── pathvqa.json
└── logs/ # Log files (auto-created)
    └── alita.log
Create the following template files in the templates/ directory:
templates/mcp_prompt.txt:
Analyze the following task and determine if there are any capability gaps:
Task: {task}
Context: {context}
Respond with a JSON object containing:
- "capability_gap": boolean indicating if new tools are needed
- "mcp_spec": detailed specification if gap exists
- "dependencies": list of required Python packages
- "search_query": query for external resource search
templates/script_template.txt:
Generate a complete, self-contained Python script for the following task:
Task: {task_description}
Specification: {tool_spec}
External Resources: {external_context}
The script should:
1. Include all necessary imports
2. Be executable without external files
3. Handle errors gracefully
4. Print clear output
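The placeholders in these templates (`{task}`, `{context}`, and so on) match Python's `str.format` syntax, and the first template asks the model for a JSON object with four keys. The sketch below shows one way the two halves could fit together: filling a template, then checking a reply. It assumes the templates are rendered with `str.format`; `render_prompt` and `parse_brainstorm_reply` are hypothetical helpers, and the project may render and parse differently.

```python
import json
from typing import Any, Dict

# Shortened copy of the capability-gap prompt, for illustration only.
MCP_PROMPT = (
    "Analyze the following task and determine if there are any capability gaps:\n"
    "Task: {task}\n"
    "Context: {context}\n"
)

# Keys the prompt asks the model to return.
REQUIRED_KEYS = ("capability_gap", "mcp_spec", "dependencies", "search_query")


def render_prompt(template: str, **fields: str) -> str:
    """Fill the {placeholders} in a template with the given values."""
    return template.format(**fields)


def parse_brainstorm_reply(reply: str) -> Dict[str, Any]:
    """Parse the model's JSON reply and check that the expected keys are present."""
    data = json.loads(reply)
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"reply is missing keys: {missing}")
    if not isinstance(data["capability_gap"], bool):
        raise ValueError("capability_gap must be a boolean")
    return data


if __name__ == "__main__":
    print(render_prompt(MCP_PROMPT, task="Sort a list of numbers", context="none"))
    reply = (
        '{"capability_gap": false, "mcp_spec": "", '
        '"dependencies": [], "search_query": ""}'
    )
    print(parse_brainstorm_reply(reply))
```

If a template ever needs literal braces, for instance to show a JSON example to the model, they must be doubled (`{{` and `}}`) for `str.format` to leave them alone.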
The system supports comprehensive evaluation with the following metrics:
- Pass@1: Success rate on first attempt
- Pass@3: Success rate within 3 attempts
- Dataset-specific metrics:
- GAIA: Breakdown by difficulty levels (Level 1, 2, 3)
- MathVista: Mathematical reasoning accuracy
- PathVQA: Medical image question answering accuracy
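Read literally, the Pass@1 and Pass@3 definitions above count a task as solved if any of its first k attempts succeeded. The sketch below computes exactly that from per-task attempt outcomes. It is a minimal illustration with made-up data, not the project's benchmark code, and `pass_at_k` is a hypothetical helper name.

```python
from typing import Dict, List


def pass_at_k(attempts: Dict[str, List[bool]], k: int) -> float:
    """Fraction of tasks with at least one success among their first k attempts."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if not attempts:
        return 0.0
    solved = sum(1 for outcomes in attempts.values() if any(outcomes[:k]))
    return solved / len(attempts)


if __name__ == "__main__":
    # Illustrative outcomes: one list of attempt results per task, in order.
    results = {
        "task_1": [True],
        "task_2": [False, True],
        "task_3": [False, False, False],
        "task_4": [False, False, True],
    }
    print(pass_at_k(results, 1))  # 0.25
    print(pass_at_k(results, 3))  # 0.75
```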
Logs are automatically generated in logs/alita.log. Configure logging level in config.yaml:
logging:
  level: "INFO" # DEBUG, INFO, WARNING, ERROR
  log_file: "logs/alita.log"
This project is inspired by the Alita project by CharlesQ9 and the concepts presented in the research paper "Alita: Generalist Agent Enabling Scalable Agentic Reasoning with Minimal Predefinition and Maximal Self-Evolution".
- Original Alita Project: CharlesQ9/Alita on GitHub
- Research Paper: Alita: Generalist Agent Enabling Scalable Agentic Reasoning with Minimal Predefinition and Maximal Self-Evolution (arXiv:2505.20286)
Full credits to the authors and contributors of these works for the foundational architecture and ideas.
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- OpenAI for providing the LLM API
- The research community for benchmark datasets (GAIA, MathVista, PathVQA)
- Contributors and maintainers of the open-source libraries used
For questions, issues, or contributions, please:
- Open an issue on GitHub
- Check the logs in logs/alita.log for debugging
- Ensure your OpenAI API key is properly configured
- Verify Conda is installed and accessible