Experimental multi-agent CPD activities scraper using Google Agent Development Kit (ADK) with Claude AI and Playwright web automation

bayshanntech/experiment_activities_scraping_googleadk

CPD Activity Scraper - Multi-Agent System

🏴‍☠️ A sophisticated multi-agent system built with Google Agent Development Kit (ADK) that scrapes CPD (Continuing Professional Development) activities from websites using coordinated AI agents.

Architecture

The system employs a multi-agent architecture with specialized agents:

  • Lead Agent: Orchestrates the entire workflow and coordinates between agents
  • Browser Agent: Uses Playwright to navigate websites and extract content
  • Investigator Agent: Analyzes content to identify and extract CPD activities
  • Formatter Agent: Structures extracted data into JSON format
  • File Writer Agent: Saves results to activities.json

Features

  • 🤖 Multi-Agent Coordination: Non-sequential workflow where agents decide the next steps based on state
  • 🌐 Web Scraping: Robust web scraping with Playwright (headless Chrome)
  • 🔍 Intelligent Content Analysis: AI-powered content analysis to identify CPD activities
  • 📊 Structured Output: Clean JSON output with all required CPD activity fields
  • Error Handling: Comprehensive error handling with timeouts and fallbacks
  • 🧪 Integration Testing: Full end-to-end testing with real websites

Requirements

  • JDK 17 or higher (tested with JDK 22)
  • Gradle 8.10+
  • ANTHROPIC_API_KEY environment variable

Quick Start

  1. Set up your API key:

    export ANTHROPIC_API_KEY="your-claude-api-key-here"
  2. Run the scraper:

    make run "https://www.rcpi.ie/Calendar/calendar?Type=CPDEvents"

    Or using Gradle directly:

    ./gradlew runCpd -Purl="https://www.rcpi.ie/Calendar/calendar?Type=CPDEvents"
  3. Check results:

    cat activities.json

Data Structure

The system extracts CPD activities with the following structure:

{
  "activities": [
    {
      "title": "Activity Title",
      "dateTime": "26 Sept 2025",
      "location": "Kildare St, Dublin",
      "cpdCredits": "6",
      "summary": "Brief description of the activity",
      "format": "conference|webinar|workshop|course",
      "speakers": ["Dr. Speaker Name", "Prof. Another Speaker"]
    }
  ]
}
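
The JSON shape above could be mirrored in Kotlin roughly as follows. This is a sketch rather than the project's actual model: the names `CpdActivity`, `ActivityList`, `KNOWN_FORMATS`, and `hasKnownFormat` are assumptions made for illustration, as is the choice of which fields are nullable. In a real build these classes would likely carry the Kotlinx Serialization `@Serializable` annotation, which is left out here so the snippet compiles with the standard library alone.

```kotlin
// Hypothetical model mirroring the JSON structure above; all names are illustrative.
data class CpdActivity(
    val title: String,
    val dateTime: String? = null,     // kept as free text, e.g. "26 Sept 2025"
    val location: String? = null,
    val cpdCredits: String? = null,   // a string in the sample output, not a number
    val summary: String? = null,
    val format: String? = null,       // expected to be one of KNOWN_FORMATS when present
    val speakers: List<String> = emptyList(),
)

data class ActivityList(val activities: List<CpdActivity>)

// The four format values listed in the sample output.
val KNOWN_FORMATS = setOf("conference", "webinar", "workshop", "course")

// True when the format is absent or matches one of the documented values.
fun CpdActivity.hasKnownFormat(): Boolean {
    val value = format ?: return true
    return value.lowercase() in KNOWN_FORMATS
}
```

Keeping `cpdCredits` and `dateTime` as strings follows the sample output; parsing them into numbers or dates would be a separate design decision.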

Tested Websites

The system has been tested and validated with:

  • RCPI (Royal College of Physicians of Ireland): https://www.rcpi.ie/Calendar/calendar?Type=CPDEvents
  • Irish Psychiatry: https://irishpsychiatry.ie/all-events/

Testing

Run the comprehensive integration test:

./gradlew test

The integration test validates:

  • Valid JSON output structure
  • Expected number of activities (8 for RCPI)
  • Specific activity details including dates, credits, location, and speakers

Build and Development

# Build the project
make build

# Run tests
make test

# Clean build artifacts
make clean

Technical Details

  • Framework: Google Agent Development Kit (ADK) 0.2.0
  • Language: Kotlin 2.1.0
  • Web Automation: Microsoft Playwright 1.40.0
  • AI Model: Claude 3.5 Sonnet via Anthropic API
  • JSON Processing: Kotlinx Serialization

Agent Communication

Agents communicate using structured JSON messages:

{
  "from": "agent_name",
  "to": "agent_name", 
  "action": "action_type",
  "data": {...},
  "state": "current_state"
}

States: starting → browsing → investigating → formatting → writing → complete
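
One way to give the message envelope and the states a typed form is sketched below. The type and function names (`WorkflowState`, `AgentMessage`, `wireName`, `parseState`) are hypothetical, and `data` is simplified to a string map because the README does not specify its shape.

```kotlin
// Hypothetical types for the message envelope and workflow states described above.
enum class WorkflowState { STARTING, BROWSING, INVESTIGATING, FORMATTING, WRITING, COMPLETE }

data class AgentMessage(
    val from: String,
    val to: String,
    val action: String,
    val data: Map<String, String> = emptyMap(),  // simplified payload (assumption)
    val state: WorkflowState,
)

// Renders a state the way it appears in the JSON messages, e.g. "browsing".
fun WorkflowState.wireName(): String = name.lowercase()

// Parses a wire name back into a state; returns null for unknown values.
fun parseState(value: String): WorkflowState? {
    val wanted = value.trim().lowercase()
    return WorkflowState.values().firstOrNull { it.wireName() == wanted }
}
```

Returning null for an unrecognised state, instead of throwing, leaves it to the coordinating agent to decide how to recover.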

Error Handling

  • Browser Timeouts: 30-second page load timeout with 5-second content wait
  • Missing Data: Uses "N/A" or null for missing fields rather than failing
  • Dynamic Content: Waits for content to load before extraction
  • Network Issues: Graceful degradation with meaningful error messages
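
The missing-data rule could be implemented along these lines. This is a minimal sketch: the field names come from the Data Structure section, while `TEXT_FIELDS`, `withFallbacks`, and the decision to treat blank strings as missing are assumptions.

```kotlin
// Text fields expected on every activity (taken from the Data Structure section).
val TEXT_FIELDS = listOf("title", "dateTime", "location", "cpdCredits", "summary", "format")

// Hypothetical helper: fills absent or blank text fields with "N/A" instead of failing.
fun withFallbacks(raw: Map<String, String?>): Map<String, String> =
    TEXT_FIELDS.associateWith { field ->
        raw[field]?.trim()?.takeIf { it.isNotEmpty() } ?: "N/A"
    }
```

For example, `withFallbacks(mapOf("title" to "Grand Rounds"))` keeps the title and sets the other five fields to "N/A".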

Configuration

The system uses environment variables for configuration:

  • ANTHROPIC_API_KEY: Required - Your Claude API key
  • Additional configuration can be added to src/main/resources/application.conf
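
A startup check for the required variable might look like the sketch below. The function name `requireApiKey` and the error message are hypothetical; the environment is passed in as a map so the check can be exercised without touching the real process environment.

```kotlin
// Hypothetical startup check: fail fast with a readable message if the key is missing.
fun requireApiKey(env: Map<String, String> = System.getenv()): String {
    val key = env["ANTHROPIC_API_KEY"]?.trim().orEmpty()
    require(key.isNotEmpty()) {
        "ANTHROPIC_API_KEY is not set; export it before running the scraper."
    }
    return key
}
```

Failing before any browser or model call is made keeps a missing key from surfacing later as a less obvious API error.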

Known Issues - Google ADK Bug

⚠️ Critical Bug Identified: Google ADK 0.2.0 + Anthropic Java SDK Compatibility Issue

Bug Description

We have identified a critical compatibility issue between Google ADK version 0.2.0 and the Anthropic Java SDK (tested with versions 1.0.0, 2.1.0, and 2.5.1):

Error: NullPointerException: Parameter specified as non-null is null: method com.anthropic.models.messages.MessageCreateParams$Builder.toolChoice, parameter toolChoice

Location: com.google.adk.models.Claude.generateContent(Claude.java:127)

Root Cause Analysis

  • Simple agents without tools: Work perfectly with Claude
  • Agents with tools/functions: Trigger the toolChoice NullPointerException
  • The Google ADK Claude wrapper attempts to set a toolChoice parameter but passes null instead of a proper tool choice object
  • This affects any LlmAgent that has .tools() configured

Workaround Implemented

This system has been redesigned to avoid tools entirely:

  1. Direct browser operations (bypassing FunctionTool)
  2. Simple Claude agents without tool definitions
  3. Manual coordination instead of tool-based agent delegation
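
The three points above can be pictured as a coordinator that calls each agent as an ordinary function and chooses the next step from the current state. The sketch below is an illustration only, not the repository's code: `Pipeline` and `runPipeline` are hypothetical names, the lambdas stand in for the real Playwright and Claude calls, and no ADK API is used.

```kotlin
// Hypothetical stand-ins: each "agent" is a plain function, so no ADK tools are involved.
class Pipeline(
    val browse: (url: String) -> String,        // e.g. Playwright called directly
    val investigate: (html: String) -> String,  // e.g. a Claude agent without tools
    val format: (findings: String) -> String,   // e.g. a second agent without tools
    val write: (json: String) -> Unit,          // e.g. save to activities.json
)

// The coordinator picks the next step from the current state instead of
// letting the model choose a tool. Returns the list of states visited.
fun runPipeline(url: String, p: Pipeline): List<String> {
    val visited = mutableListOf("starting")
    var state = "browsing"
    var payload = url
    while (state != "complete") {
        visited += state
        when (state) {
            "browsing" -> { payload = p.browse(payload); state = "investigating" }
            "investigating" -> { payload = p.investigate(payload); state = "formatting" }
            "formatting" -> { payload = p.format(payload); state = "writing" }
            "writing" -> { p.write(payload); state = "complete" }
            else -> error("Unknown state: $state")
        }
    }
    visited += "complete"
    return visited
}
```

Because the agents are injected as functions, the coordination logic can be tested with stub lambdas and without network access or an API key.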

Bug Verification

The issue is reproducible and documented in our test suite at /experiment_adk_agentcore_jvm/src/test/kotlin/com/example/agent/ClaudeIntegrationTest.kt (lines 166-168).

Recommendation

  • Use Google ADK with Gemini models (which work with tools)
  • OR use Claude models only for simple agents without tools
  • OR wait for a compatibility fix in Google ADK or the Anthropic Java SDK

Development Notes

This implementation demonstrates:

  • Multi-agent coordination without tool dependencies
  • ADK framework capabilities with Claude models
  • Workarounds for ADK/Anthropic compatibility issues
  • State-based decision making without function tools

The system serves as both a working CPD scraper and a case study in handling ADK framework limitations.


Built with ⚓ by Captain Blackbeard McCode and the ADK crew
