[!] EXPERIMENTAL SOFTWARE - USE ONLY IN AUTHORIZED, SAFE, SANDBOXED ENVIRONMENTS [!]
Cyber-AutoAgent is a proactive security assessment tool that autonomously conducts intelligent penetration testing with natural language reasoning, dynamic tool selection, and evidence collection, using AWS Bedrock, LiteLLM, or local Ollama models on the core Strands framework.
- Important Disclaimer
- Quick Start
- Documentation
- Features
- Architecture
- Model Providers
- Observability
- Installation & Deployment
- Configuration
- Development & Testing
- Troubleshooting
- Contributing
- License
THIS TOOL IS FOR EDUCATIONAL AND AUTHORIZED SECURITY TESTING PURPOSES ONLY.
| Requirement | Description | 
|---|---|
| Authorization | Use only on systems you own or have explicit written permission to test | 
| Legal Compliance | Ensure compliance with all applicable laws and regulations | 
| Sandboxed Environment | Deploy in safe, sandboxed environments isolated from production systems | 
| Responsibility | Never use on unauthorized systems or networks - users are fully responsible for legal and ethical use | 
The React-based terminal interface is now the default UI, providing interactive configuration, real-time operation monitoring, and guided setup in all deployment modes.
# Clone and setup
git clone https://github.com/westonbrown/Cyber-AutoAgent.git
cd Cyber-AutoAgent
# Build React terminal interface
cd src/modules/interfaces/react
npm install
npm run build
# Run interactive terminal (guided setup on first launch)
npm start
# Or run directly with parameters
node dist/index.js \
  --target "http://testphp.vulnweb.com" \
  --objective "Security assessment" \
  --auto-run
The React terminal will automatically spawn the Python agent as a subprocess and guide you through configuration on first launch.
Complete User Guide - Detailed setup, configuration, operation modules, troubleshooting, and examples
# Interactive mode with React terminal
docker run -it --rm \
  -e AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID} \
  -e AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY} \
  -e AWS_REGION=${AWS_REGION:-us-east-1} \
  -v $(pwd)/outputs:/app/outputs \
  cyber-autoagent
# Or start directly with parameters
docker run -it --rm \
  -e AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID} \
  -e AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY} \
  -e AWS_REGION=${AWS_REGION:-us-east-1} \
  -v $(pwd)/outputs:/app/outputs \
  cyber-autoagent \
  --target "http://testphp.vulnweb.com" \
  --objective "Identify SQL injection vulnerabilities" \
  --auto-run
Setup: Create a .env file in the project root with your configuration:
# Copy example and configure
cp .env.example .env
# Edit .env with your provider settings
Example .env for LiteLLM:
CYBER_AGENT_PROVIDER=litellm
CYBER_AGENT_LLM_MODEL=gemini/gemini-2.5-flash
GEMINI_API_KEY=your_api_key_here
Run assessments with the full observability stack:
# Run with React terminal UI and full observability
docker compose -f docker/docker-compose.yml run --rm cyber-autoagent
# With root access for dynamic tool installation
docker compose -f docker/docker-compose.yml run --user root --rm cyber-autoagent
Note: The .env file in the project root is automatically loaded by docker-compose.
The compose stack automatically provides:
- Langfuse observability at http://localhost:3000 (login: [email protected] / changeme)
- Persistent databases (PostgreSQL, ClickHouse, Redis, MinIO)
- Network access to challenge containers
- React terminal UI as the default interface
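To sanity-check the stack before launching an assessment, you can list the compose services and probe the Langfuse endpoint (a quick sketch; assumes the default port mappings):
# Show stack services and their status
docker compose -f docker/docker-compose.yml ps
# Confirm the Langfuse UI answers on the published port (expect 200)
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:3000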
- User Guide - Complete usage, configuration, and examples
- Agent Architecture - Strands framework, tools, and metacognitive design
- Memory System - Mem0 backends, storage, and evidence management
- Observability & Evaluation - Langfuse tracing, Ragas metrics, and performance monitoring
- Deployment Guide - Docker, Kubernetes, and production setup
- Terminal Frontend - React interface architecture and event protocol
- Prompt Management - Module system and prompt loading
- Autonomous Operation: Conducts security assessments with minimal human intervention
- Intelligent Tool Selection: Automatically chooses appropriate security tools (nmap, sqlmap, nikto, etc.)
- Natural Language Reasoning: Uses Strands framework with metacognitive architecture
- Evidence Collection: Automatically stores findings with Mem0 memory (category="finding")
- Meta-Tool Creation: Dynamically creates custom exploitation tools when needed
- Adaptive Execution: Metacognitive assessment guides strategy based on confidence levels
- Assessment Reporting: Generates comprehensive reports with findings and remediation
- Swarm Intelligence: Deploy parallel agents with shared memory for complex tasks
- Real-Time Monitoring: React interface displays live agent reasoning and tool execution
- Observability: Built-in Langfuse tracing and Ragas evaluation metrics
Full Architecture Guide - Complete technical deep dive into Strands framework, tools, and metacognitive design
graph LR
    A[User Input<br/>Target & Objective] --> B[Cyber-AutoAgent]
    B --> C[AI Models<br/>Remote/Local]
    B --> D[Agent Tools<br/>shell, swarm, editor, etc.]
    B --> E[Evidence Storage<br/>Memory System]
    B --> O[Observability<br/>Langfuse + Ragas]
    C --> B
    D --> E
    E --> F[Final Report<br/>AI Generated]
    O --> G[Performance<br/>Metrics]
    style A fill:#e3f2fd
    style F fill:#e8f5e8
    style B fill:#f3e5f5
    style C fill:#fff3e0
    style O fill:#e1f5fe
    style G fill:#f1f8e9
Key Components:
- User Interface: React-based terminal interface or command-line with target and objective specification
- Agent Core: Strands framework orchestration with metacognitive reasoning and tool selection
- AI Models: GenAI tool-use capable models, remote (AWS Bedrock) or local (Ollama)
- Security Tools: Pentesting tools (nmap, sqlmap, nikto, metasploit, custom tools, etc.)
- Evidence Storage: Persistent memory with FAISS, OpenSearch, or Mem0 Platform backends
- Observability: Real-time tracing with Langfuse and automated evaluation with Ragas metrics
sequenceDiagram
    participant U as User
    participant A as Agent
    participant M as AI Model
    participant T as Tools
    participant E as Evidence
    participant L as Observability
    participant R as Evaluator
    U->>A: Start Assessment
    A->>L: Initialize Trace
    A->>E: Initialize Storage
    loop Assessment Steps
        A->>M: Analyze Situation
        M-->>A: Next Action
        A->>L: Log Decision
        A->>T: Execute Tool
        T-->>A: Results
        A->>L: Log Tool Execution
        A->>E: Store Findings
        A->>L: Log Evidence Storage
        alt Critical Discovery
            A->>T: Exploit Immediately
            T-->>A: Access Gained
            A->>E: Store Evidence
            A->>L: Log Exploitation
        end
        A->>A: Check Progress
        alt Success
            break Complete
                A->>U: Report Success
            end
        end
    end
    A->>M: Generate Report
    M-->>A: Final Analysis
    A->>L: Complete Trace
    A->>R: Trigger Evaluation
    R-->>L: Upload Scores
    A->>U: Deliver Report
Enhanced Execution Pattern:
- Real-time Monitoring: Every action traced for complete visibility
- Intelligent Analysis: Agent continuously analyzes situation using metacognitive reasoning
- Dynamic Tool Selection: Chooses appropriate tools based on confidence and findings
- Evidence Collection: All discoveries stored in persistent memory with categorization
- Immediate Exploitation: Critical vulnerabilities trigger immediate exploitation attempts
- Automated Evaluation: System scores tool selection, evidence quality, and methodology
- Report Generation: Final analysis combines findings with performance metrics
flowchart TD
    A[Think: Analyze Current State] --> B{Select Tool Type}
    B --> |Basic Task| C[Shell Commands]
    B --> |Security Task| D[Cyber Tools via Shell]
    B --> |Complex Task| E[Create Meta-Tool]
    B --> |Parallel Task| P[Swarm Orchestration]
    C --> F[Reflect: Evaluate Results]
    D --> F
    E --> F
    P --> F
    F --> G{Findings?}
    G --> |Critical| H[Exploit Immediately]
    G --> |Informational| I[Store & Continue]
    G --> |None| J[Try Different Approach]
    H --> K[Document Evidence]
    I --> L{Objective Met?}
    J --> A
    K --> L
    L --> |Yes| M[Complete Assessment]
    L --> |No| A
    style A fill:#e3f2fd
    style C fill:#e8f5e8
    style D fill:#fff3e0
    style E fill:#f3e5f5
    style P fill:#fce4ec
    style H fill:#ffcdd2
Metacognitive Process:
Design Philosophy: Meta-Everything Architecture
At the core of Cyber-AutoAgent is a "meta-everything" design philosophy that enables dynamic adaptation and scaling:
- Meta-Agent: The swarm capability deploys dynamic agents as tools, each tailored for specific subtasks with their own reasoning loops
- Meta-Tooling: Through the editor and load_tool capabilities, the agent can create, modify, and deploy new tools at runtime to address novel challenges
- Meta-Learning: Continuous memory storage and retrieval enables cross-session learning, building expertise over time
- Meta-Cognition: Self-reflection and confidence assessment drives strategic decisions about tool selection and approach
This meta-architecture allows the system to transcend static tool limitations and evolve its capabilities during execution.
Process Flow:
- Assess Confidence: Evaluate current knowledge and confidence level (High >80%, Medium 50-80%, Low <50%)
- Adaptive Strategy:
- High confidence: Use specialized tools directly
- Medium confidence: Deploy swarm for parallel exploration
- Low confidence: Gather more information, try alternatives
 
- Execute: Tool hierarchy based on confidence:
- Specialized security tools for known vulnerabilities (sqlmap, nikto, nmap)
- Swarm deployment when multiple approaches needed (with memory access)
- Parallel shell for rapid reconnaissance (up to 7 commands)
- Meta-tool creation only when no existing tool suffices
 
- Learn & Store: Store findings with category="finding" for memory persistence
Tool Selection Hierarchy (Confidence-Based):
- Specialized cyber tools (sqlmap, nikto, metasploit) - when vulnerability type is known
- Swarm deployment - when confidence <70% or need multiple perspectives (includes memory)
- Parallel shell execution - for rapid multi-command reconnaissance
- Meta-tool creation - only for novel exploits when existing tools fail
Cyber-AutoAgent supports multiple model providers for maximum flexibility:
- Best for: Production use, high-quality results, no local GPU requirements
- Requirements: AWS account with Bedrock access
- Default Model: Claude Sonnet 4.5 (claude-sonnet-4-5-20250929-v1:0)
- Benefits: Latest models, reliable performance, managed infrastructure
- Best for: Privacy, offline use, cost control, local development
- Requirements: Local Ollama installation
- Default Models: qwen3-coder:30b-a3b-q4_K_M (LLM), mxbai-embed-large (embeddings)
- Benefits: No cloud dependencies, complete privacy, no API costs
- Best for: Multi-provider flexibility, unified interface
- Requirements: API keys for desired providers
- Supported: 100+ models from OpenAI, Anthropic, Cohere, Google, Azure, etc.
- Benefits: Switch providers easily, fallback support, unified API
| Feature | Bedrock | Ollama | LiteLLM | 
|---|---|---|---|
| Cost | Pay per call | Free | Varies by provider | 
| Performance | High | Hardware dependent | Provider dependent | 
| Offline Use | No | Yes | No | 
| Setup | Easy | More involved | Medium | 
| Model Selection | 100+ models | Limited | 100+ models | 
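Switching providers is a per-run flag. A sketch of the same assessment against each provider (model IDs are the documented defaults and examples; adjust to your setup):
# AWS Bedrock (default provider and model)
python src/cyberautoagent.py \
  --target "http://testphp.vulnweb.com" \
  --objective "Security assessment" \
  --provider bedrock
# Local Ollama
python src/cyberautoagent.py \
  --target "http://testphp.vulnweb.com" \
  --objective "Security assessment" \
  --provider ollama \
  --model "qwen3-coder:30b-a3b-q4_K_M"
# LiteLLM gateway (here: OpenAI)
python src/cyberautoagent.py \
  --target "http://testphp.vulnweb.com" \
  --objective "Security assessment" \
  --provider litellm \
  --model "openai/gpt-4o"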
Complete Observability & Evaluation Guide - Langfuse tracing, Ragas metrics, and automated performance evaluation
Cyber-AutoAgent includes built-in observability and evaluation using self-hosted Langfuse for tracing and Ragas for automated performance metrics. This provides complete visibility into agent operations and continuous assessment of cybersecurity effectiveness.
Observability (Langfuse):
- Complete operation traces: Every penetration test operation is traced end-to-end
- Tool execution timeline: Visual timeline of nmap, sqlmap, nikto usage
- Token usage metrics: Track LLM token consumption and costs
- Memory operations: Monitor agent memory storage and retrieval patterns
- Error tracking: Failed tool executions and error analysis
Evaluation (Ragas):
- Tool Selection Accuracy: How well the agent chooses appropriate cybersecurity tools
- Evidence Quality: Assessment of collected security findings and documentation
- Answer Relevancy: Alignment of agent actions with stated objectives
- Context Precision: Effective use of memory and tool outputs
- Automated scoring: Every operation receives performance metrics automatically
When running with Docker Compose, observability and evaluation are enabled by default:
# Start with observability and evaluation
cd docker
docker-compose up -d
# Access Langfuse UI at http://localhost:3000
# Login: [email protected] / changeme
# Enable evaluation for your operations
export ENABLE_AUTO_EVALUATION=true
Essential Environment Variables:
| Variable | Default | Description | 
|---|---|---|
| ENABLE_OBSERVABILITY | true | Enable/disable Langfuse tracing | 
| ENABLE_AUTO_EVALUATION | false | Enable automatic Ragas evaluation | 
| LANGFUSE_HOST | http://langfuse-web:3000 | Langfuse server URL | 
| RAGAS_EVALUATOR_MODEL | us.anthropic.claude-3-5-sonnet-20241022-v2:0 | Model for evaluation | 
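For example, to add Ragas scoring to a run executed outside the compose network (the localhost URL assumes the default published port):
# Keep tracing on and enable automatic evaluation
export ENABLE_OBSERVABILITY=true
export ENABLE_AUTO_EVALUATION=true
# Point at the Langfuse instance published on the host (assumption)
export LANGFUSE_HOST=http://localhost:3000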
The system automatically evaluates four key metrics after each operation:
- Tool Selection Accuracy (0.0-1.0): Strategic tool choice and sequencing
- Evidence Quality (0.0-1.0): Comprehensive vulnerability documentation
- Answer Relevancy (0.0-1.0): Alignment with security objectives
- Context Precision (0.0-1.0): Effective use of previous findings
For production deployments, update security keys:
LANGFUSE_ENCRYPTION_KEY=$(openssl rand -hex 32)
LANGFUSE_SALT=$(openssl rand -hex 16)
LANGFUSE_ADMIN_PASSWORD=strong-password-here
Complete Deployment Guide - Docker, Kubernetes, production setup, and troubleshooting
System Requirements
- Node.js: Version 20+ required for React CLI interface
- Python: Version 3.10+ for local installation
- Docker: For containerized deployments
- macOS Users: Xcode Command Line Tools required
React CLI Setup (Required for Interactive Mode)
# Ensure Node.js 20+ is installed
node --version  # Should show v20.x.x or higher
# macOS users: Update Xcode tools if needed
sudo softwareupdate --install -a
sudo xcode-select --reset
# Install React CLI dependencies
cd src/modules/interfaces/react
npm install
# If installation fails, clear caches:
npm cache clean --force
rm -rf ~/.node-gyp
npm install
Bedrock Provider
# Option 1: Configure AWS credentials
aws configure
# Or set environment variables:
export AWS_ACCESS_KEY_ID=your_key
export AWS_SECRET_ACCESS_KEY=your_secret
export AWS_REGION=your_region
# Option 2: Use AWS Bedrock API key (bearer token)
export AWS_BEARER_TOKEN_BEDROCK=your_bearer_token
export AWS_REGION=your_region
Note: Bearer token authentication is only supported with the native Bedrock provider. LiteLLM does not currently support AWS bearer tokens - use standard AWS credentials instead.
LiteLLM Provider (Universal Gateway)
LiteLLM supports 100+ model providers. Set the appropriate environment variables for your chosen provider:
# For OpenAI models (GPT-4, GPT-3.5, etc.)
export OPENAI_API_KEY=your_openai_key
# Usage: --model "openai/gpt-4"
# For Google models (Gemini)
export GEMINI_API_KEY=your_gemini_key
# Usage: --model "gemini/gemini-pro"
# For Azure OpenAI
export AZURE_API_KEY=your_azure_key
export AZURE_API_BASE=https://your-resource.openai.azure.com
export AZURE_API_VERSION=2024-02-15-preview
# Usage: --model "azure/your-deployment-name"
# For Hugging Face
export HUGGINGFACE_API_KEY=your_hf_key
# Usage: --model "huggingface/meta-llama/Llama-2-7b-chat-hf"Important: When using LiteLLM with Bedrock models, AWS bearer tokens (AWS_BEARER_TOKEN_BEDROCK) are NOT supported. Use standard AWS credentials only.
Ollama Provider
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh
# Start service and pull models
ollama serve
ollama pull qwen3-coder:30b-a3b-q4_K_M
ollama pull mxbai-embed-large
# Clone repository
git clone https://github.com/cyber-autoagent/cyber-autoagent.git
cd cyber-autoagent
# Build image
docker build -f docker/Dockerfile -t cyber-autoagent .
# Using environment variables
docker run --rm \
  -e AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID} \
  -e AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY} \
  -e AWS_BEARER_TOKEN_BEDROCK=${AWS_BEARER_TOKEN_BEDROCK} \
  -e AWS_REGION=${AWS_REGION:-us-east-1} \
  -v $(pwd)/outputs:/app/outputs \
  -v $(pwd)/tools:/app/tools \
  cyber-autoagent \
  --target "x.x.x.x" \
  --objective "Identify vulnerabilities" \
  --iterations 50
# Clone repository
git clone https://github.com/cyber-autoagent/cyber-autoagent.git
cd cyber-autoagent
# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
# Install Python dependencies
pip install -e .
# Install React CLI interface
cd src/modules/interfaces/react
npm install
cd ../../../..
# Optional: Install security tools (non-exhaustive list)
sudo apt install nmap nikto sqlmap gobuster  # Debian/Ubuntu
brew install nmap nikto sqlmap gobuster      # macOS
# Run with React CLI
cd src/modules/interfaces/react
npm start
# Or run Python directly
python src/cyberautoagent.py \
  --target "http://testphp.vulnweb.com" \
  --objective "Comprehensive security assessment"Unified Output Structure (default, enabled by CYBER_AGENT_ENABLE_UNIFIED_OUTPUT=true):
| Data Type | Location | 
|---|---|
| Evidence | ./outputs/<target>/OP_<id>/ | 
| Logs | ./outputs/<target>/OP_<id>/logs/ | 
| Reports | ./outputs/<target>/OP_<id>/ | 
| Tools | ./tools/ | 
| Utils | ./outputs/<target>/OP_<id>/utils/ | 
| Memory | ./outputs/<target>/memory/ | 
The unified structure organizes all artifacts under operation-specific directories with unique IDs (OP_YYYYMMDD_HHMMSS), making it easy to track and manage results from multiple assessment runs. All directories are created automatically.
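As a quick sketch of navigating these results (the target directory and operation ID below are illustrative):
# Operations recorded for a target
ls outputs/testphp.vulnweb.com/
# Evidence, logs, and report for one operation
ls -R outputs/testphp.vulnweb.com/OP_20250101_120000/
# Cross-operation memory persists alongside the operations
ls outputs/testphp.vulnweb.com/memory/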
Required Arguments:
- --objective: Security assessment objective
- --target: Target system/network to assess (ensure you have permission!)
Optional Arguments:
- --provider: Model provider: bedrock (AWS), ollama (local), or litellm (universal); default: bedrock
- --module: Security module: general (web apps) or ctf (challenges); default: general
- --iterations: Maximum tool executions before stopping, default: 100
- --model: Model ID to use (default: remote=claude-sonnet-4-5, local=qwen3-coder:30b-a3b-q4_K_M)
- --region: AWS region for Bedrock, default: us-east-1
- --verbose: Enable verbose output with detailed debug logging
- --confirmations: Enable tool confirmation prompts (default: disabled)
- --memory-path: Path to existing memory store to load past memories
- --memory-mode: Memory initialization mode: auto (loads existing) or fresh (starts new); default: auto
- --keep-memory: Keep memory data after operation completes (default: true)
- --output-dir: Custom output directory (default: ./outputs)
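For instance, combining the memory flags above, a follow-up run that reuses a prior operation's memory might look like this (the memory path follows the unified output layout; values are illustrative):
python src/cyberautoagent.py \
  --target "http://testphp.vulnweb.com" \
  --objective "Re-validate previously stored findings" \
  --memory-path ./outputs/testphp.vulnweb.com/memory \
  --memory-mode auto \
  --iterations 50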
# Basic Python Usage (Bedrock Provider)
python src/cyberautoagent.py \
  --target "http://testphp.vulnweb.com" \
  --objective "Find SQL injection vulnerabilities" \
  --provider bedrock \
  --iterations 50
# Using LiteLLM with OpenAI
export OPENAI_API_KEY=your_key
python src/cyberautoagent.py \
  --target "http://testphp.vulnweb.com" \
  --objective "Security assessment" \
  --provider litellm \
  --model "openai/gpt-4o"
# Docker with full observability, evaluation and root access (for package installation)
docker run --rm \
  --user root \
  --network cyber-autoagent_default \
  -e AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID} \
  -e AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY} \
  -e AWS_REGION=${AWS_REGION:-us-east-1} \
  -e LANGFUSE_HOST=http://langfuse-web:3000 \
  -e LANGFUSE_PUBLIC_KEY=cyber-public \
  -e LANGFUSE_SECRET_KEY=cyber-secret \
  -e ENABLE_AUTO_EVALUATION=true \
  -v $(pwd)/outputs:/app/outputs \
  -v $(pwd)/tools:/app/tools \
  cyber-autoagent:dev \
  --target "http://testphp.vulnweb.com" \
  --objective "Comprehensive SQL injection and XSS assessment" \
  --iterations 25
By default, the agent runs as a non-root user (cyberagent) for security. This limits the agent's ability to install additional tools on the fly during execution. If you need the agent to install packages dynamically, you can override this at container start:
# Small example, full command above
docker run --user root cyber-autoagent
Note: Running as root reduces security isolation but enables full system access for tool installation.
The agent uses a centralized configuration system defined in src/modules/config/. All settings can be customized through environment variables, with sensible defaults provided.
Recent Improvements:
- Modular architecture with organized agents/, config/, tools/, prompts/, and evaluation/ directories
- Langfuse prompt management for dynamic prompt loading and versioning
- Unified output structure for better organization (enabled by default)
- Standardized paths for logs, reports, and evidence collection
- Enhanced memory management with cross-operation persistence
- Dedicated report agent for improved report generation
Copy the example environment file and customize it for your needs:
cp .env.example .env
The .env.example file contains detailed configuration options with inline comments for all supported features, including model providers, memory systems, and observability settings. Key environment variables include:
- AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_BEARER_TOKEN_BEDROCK, AWS_REGION for remote mode (AWS Bedrock)
- OLLAMA_HOST for local mode (Ollama)
- CYBER_AGENT_OUTPUT_DIR, CYBER_AGENT_ENABLE_UNIFIED_OUTPUT for output management
- LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY for observability
- MEM0_API_KEY or OPENSEARCH_HOST for memory backends
See .env.example for complete configuration options and usage examples.
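A minimal sketch of a Bedrock-oriented .env built from the variables above (all values are placeholders; .env.example remains the authoritative template):
# Remote mode (AWS Bedrock)
AWS_ACCESS_KEY_ID=your_key
AWS_SECRET_ACCESS_KEY=your_secret
AWS_REGION=us-east-1
# Output management
CYBER_AGENT_OUTPUT_DIR=./outputs
CYBER_AGENT_ENABLE_UNIFIED_OUTPUT=true
# Observability (keys from your Langfuse project)
LANGFUSE_PUBLIC_KEY=cyber-public
LANGFUSE_SECRET_KEY=cyber-secret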
This project uses uv for dependency management and testing:
# Run all tests
uv run pytest
# Run specific test file
uv run pytest tests/test_agent.py
# Run tests with verbose output
uv run pytest -v
# Run tests with coverage
uv run pytest --cov=src
cyber-autoagent/
├── src/                       # Source code
│   ├── cyberautoagent.py      # Main entry point and CLI
│   └── modules/               # Core modules (modular architecture)
│       ├── agents/            # Agent implementations
│       │   ├── cyber_autoagent.py  # Main Strands agent creation
│       │   └── report_agent.py     # Dedicated report generation
│       ├── config/            # Configuration management
│       │   ├── manager.py     # Centralized configuration system
│       │   └── environment.py # Environment setup and validation
│       ├── tools/             # Tool implementations
│       │   └── memory.py      # Mem0 memory management tool
│       ├── prompts/           # Prompt management
│       │   ├── system.py      # AI prompts and configurations
│       │   └── manager.py     # Langfuse prompt management
│       ├── evaluation/        # Evaluation system
│       │   └── evaluation.py  # Ragas evaluation metrics
│       ├── handlers/          # Callback handling and UI utilities
│       │   ├── base.py        # Base classes and constants
│       │   ├── callback.py    # Main ReasoningHandler class
│       │   ├── display.py     # Result display formatting
│       │   ├── tools.py       # Tool execution handling
│       │   ├── reporting.py   # Report generation utilities
│       │   └── utils.py       # UI utilities and analysis
│       ├── interfaces/
│       │   └── react/         # React terminal interface
│       └── operation_plugins/ # Security modules (general, ctf)
├── docs/                      # Documentation
│   ├── architecture.md       # Agent architecture and tools
│   ├── memory.md             # Memory system (Mem0 backends)
│   ├── observability.md      # Langfuse monitoring setup
│   └── deployment.md         # Docker and production deployment
├── docker/                   # Docker deployment files
│   ├── docker-compose.yml    # Full stack (agent + Langfuse)
│   └── Dockerfile            # Agent container build
├── pyproject.toml            # Dependencies and project config
├── uv.lock                   # Dependency lockfile
├── .env.example              # Environment configuration template
├── outputs/                  # Unified output directory (auto-created)
│   └── <target>/            # Target-specific organization
│       ├── OP_<id>/        # Operation-specific files
│       │   ├── report.md   # Security findings (when generated)
│       │   ├── cyber_operations.log  # Operation log
│       │   ├── artifacts/  # Ad-hoc files
│       │   └── tools/      # Custom tools created by agent
│       └── memory/         # Cross-operation memory
│           ├── mem0.faiss
│           └── mem0.pkl
└── README.md                 # This file
| File | Purpose | 
|---|---|
| src/cyberautoagent.py | CLI entry point, observability setup | 
| src/modules/agents/cyber_autoagent.py | Strands agent creation, model configuration | 
| src/modules/agents/report_agent.py | Report generation agent | 
| src/modules/config/manager.py | Centralized configuration system | 
| src/modules/tools/memory.py | Unified Mem0 tool (FAISS/OpenSearch/Platform) | 
| src/modules/evaluation/evaluation.py | Ragas evaluation system | 
| src/modules/prompts/system.py | AI prompts and configurations | 
| src/modules/prompts/manager.py | Langfuse prompt management | 
| .env.example | Environment configuration template | 
| docker/docker-compose.yml | Complete observability stack | 
| docker/Dockerfile | Agent container build | 
| docs/architecture.md | Technical architecture deep dive | 
Node.js Version
# Check Node.js version (must be 20+)
node --version
# Install Node.js 20+ via nvm
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.0/install.sh | bash
nvm install 20
nvm use 20
macOS Build Errors
# Update Xcode Command Line Tools
sudo softwareupdate --install -a
sudo xcode-select --reset
# Clear npm and node-gyp caches
npm cache clean --force
rm -rf ~/.node-gyp
# Reinstall
cd src/modules/interfaces/react
npm install
Linux Build Errors
# Install build essentials
sudo apt-get update
sudo apt-get install build-essential python3-dev
# Clear caches and reinstall
npm cache clean --force
cd src/modules/interfaces/react
npm install
# Option 1: Configure AWS CLI
aws configure
# Option 2: Set traditional environment variables
export AWS_ACCESS_KEY_ID=your_key
export AWS_SECRET_ACCESS_KEY=your_secret
export AWS_REGION=us-east-1
# Option 3: Use AWS Bedrock API key (bearer token)
export AWS_BEARER_TOKEN_BEDROCK=your_bearer_token
export AWS_REGION=us-east-1
# Request model access in AWS Console
# Navigate to: Amazon Bedrock > Model access > Request model access
See Memory System Guide for complete backend configuration and troubleshooting
# For local FAISS backend (default)
pip install faiss-cpu  # or faiss-gpu for CUDA
# For Mem0 Platform
export MEM0_API_KEY=your_api_key
# For OpenSearch backend
export OPENSEARCH_HOST=your_host
export AWS_REGION=your_region
# Check memory storage location
ls -la ./mem0_faiss_OP_*/
# Install missing security tools
sudo apt install nmap nikto sqlmap gobuster  # Debian/Ubuntu
brew install nmap nikto sqlmap gobuster      # macOS
Ollama Server Not Running
# Start Ollama service
ollama serve
# Check if running
curl http://localhost:11434/api/version
Required Models Missing
# Pull required models
ollama pull qwen3-coder:30b-a3b-q4_K_M
ollama pull mxbai-embed-large
# List available models
ollama list
Connection Errors
# Check Ollama is accessible
curl -X POST http://localhost:11434/api/generate \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen3-coder:30b-a3b-q4_K_M", "prompt": "test", "stream": false}'Docker Networking (Local Mode) Cyber-AutoAgent automatically detects the correct Ollama host for your environment:
# Ensure Ollama is running on your host
ollama serve
# Test connection from host
curl http://localhost:11434/api/version
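If auto-detection does not pick the right host, you can point the containerized agent at Ollama explicitly via OLLAMA_HOST (a sketch; host.docker.internal is a Docker Desktop convention, Linux hosts may need the bridge IP instead):
# Run the agent against Ollama on the host machine
docker run -it --rm \
  -e OLLAMA_HOST=http://host.docker.internal:11434 \
  -v $(pwd)/outputs:/app/outputs \
  cyber-autoagent \
  --provider ollama \
  --target "http://testphp.vulnweb.com" \
  --objective "Security assessment"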
Performance Issues
# Monitor resource usage
htop  # Check CPU/Memory during execution
# For better performance, consider:
# - Using smaller models (e.g., llama3.1:8b instead of 70b)
# - Allocating more RAM to Ollama
# - Using GPU acceleration if available
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
This tool is provided for educational and authorized security testing purposes only. Users are solely responsible for ensuring they have proper authorization before testing any systems. The authors assume no liability for misuse or any damages that may result from using this software.
- Strands Framework - Agent orchestration & swarm intelligence
- AWS Bedrock - Foundation model access
- Ollama - Local model inference
- Mem0 - Advanced memory management with FAISS/OpenSearch/Platform backends
Remember: With great power comes great responsibility. Use this tool ethically and legally.

