CPD Activity Scraper - Multi-Agent System

🏴‍☠️ A sophisticated multi-agent system built with Google Agent Development Kit (ADK) that scrapes CPD (Continuing Professional Development) activities from websites using coordinated AI agents.

Architecture

The system employs a multi-agent architecture with specialized agents:

Lead Agent: Orchestrates the entire workflow and coordinates between agents
Browser Agent: Uses Playwright to navigate websites and extract content
Investigator Agent: Analyzes content to identify and extract CPD activities
Formatter Agent: Structures extracted data into JSON format
File Writer Agent: Saves results to activities.json
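
The division of labour above can be sketched roughly as follows. This is an illustrative sketch, not the project's actual code: the `Agent` interface, the `ScrapeContext` class, and the stub agent bodies are hypothetical, and the real agents would call Playwright and Claude instead of returning canned strings.

```kotlin
// Hypothetical shared context handed from agent to agent (not the project's real types).
data class ScrapeContext(
    val url: String,
    val pageText: String? = null,
    val rawActivities: List<String> = emptyList(),
    val json: String? = null,
    val savedTo: String? = null,
)

// Each specialized agent transforms the context and hands it back to the lead.
fun interface Agent {
    fun handle(context: ScrapeContext): ScrapeContext
}

// Stubs standing in for the Browser, Investigator, Formatter and File Writer agents.
val browserAgent = Agent { it.copy(pageText = "Event A\nEvent B") }
val investigatorAgent = Agent { ctx ->
    ctx.copy(rawActivities = ctx.pageText.orEmpty().lines().filter { it.isNotBlank() })
}
val formatterAgent = Agent { ctx ->
    val items = ctx.rawActivities.joinToString(",") { "{\"title\":\"$it\"}" }
    ctx.copy(json = "{\"activities\":[$items]}")
}
// Records the target file name only; this stub does not write to disk.
val fileWriterAgent = Agent { it.copy(savedTo = "activities.json") }

// The lead agent picks the next agent based on what the context is still missing.
// maxSteps guards against looping forever when an agent makes no progress.
fun runLeadAgent(url: String, maxSteps: Int = 10): ScrapeContext {
    var ctx = ScrapeContext(url)
    var steps = 0
    while (ctx.savedTo == null && steps < maxSteps) {
        val next = when {
            ctx.pageText == null -> browserAgent
            ctx.rawActivities.isEmpty() -> investigatorAgent
            ctx.json == null -> formatterAgent
            else -> fileWriterAgent
        }
        ctx = next.handle(ctx)
        steps++
    }
    return ctx
}
```

Calling `runLeadAgent("https://example.org/events")` walks the four stubs in order and returns a context whose `json` and `savedTo` fields are filled in.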

Features

🤖 Multi-Agent Coordination: Non-sequential workflow where agents decide the next steps based on state
🌐 Web Scraping: Robust web scraping with Playwright (headless Chrome)
🔍 Intelligent Content Analysis: AI-powered content analysis to identify CPD activities
📊 Structured Output: Clean JSON output with all required CPD activity fields
⚡ Error Handling: Comprehensive error handling with timeouts and fallbacks
🧪 Integration Testing: Full end-to-end testing with real websites

Requirements

JDK 17 or higher (tested with JDK 22)
Gradle 8.10+
ANTHROPIC_API_KEY environment variable

Quick Start

Set up your API key:

```
export ANTHROPIC_API_KEY="your-claude-api-key-here"
```

Run the scraper:

```
make run "https://www.rcpi.ie/Calendar/calendar?Type=CPDEvents"
```

Or using Gradle directly:

```
./gradlew runCpd -Purl="https://www.rcpi.ie/Calendar/calendar?Type=CPDEvents"
```

Check results:
```
cat activities.json
```

Data Structure

The system extracts CPD activities with the following structure:

```
{
  "activities": [
    {
      "title": "Activity Title",
      "dateTime": "26 Sept 2025",
      "location": "Kildare St, Dublin",
      "cpdCredits": "6",
      "summary": "Brief description of the activity",
      "format": "conference|webinar|workshop|course",
      "speakers": ["Dr. Speaker Name", "Prof. Another Speaker"]
    }
  ]
}
```
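
For readers working with this output in Kotlin, the structure above could be mirrored by plain data classes along these lines. The type and function names (`CpdActivity`, `ActivitiesFile`, `creditsOrNull`, `hasKnownFormat`) are assumptions made for illustration, not the project's own. The sketch stays dependency-free; a project using Kotlinx Serialization would typically also annotate such classes with `@Serializable`.

```kotlin
// Hypothetical model types mirroring the JSON fields shown above.
data class CpdActivity(
    val title: String,
    val dateTime: String,
    val location: String,
    val cpdCredits: String,
    val summary: String,
    val format: String,
    val speakers: List<String> = emptyList(),
)

data class ActivitiesFile(val activities: List<CpdActivity>)

// The four format values listed in the example above.
val knownFormats = setOf("conference", "webinar", "workshop", "course")

// Credits are stored as a string ("6"); convert to a number when one is present.
fun CpdActivity.creditsOrNull(): Double? = cpdCredits.trim().toDoubleOrNull()

// True when the format field holds one of the known values.
fun CpdActivity.hasKnownFormat(): Boolean = format.lowercase() in knownFormats
```

Keeping `cpdCredits` as a string matches the JSON above and leaves room for placeholder values such as "N/A", for which `creditsOrNull()` returns null.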

Tested Websites

The system has been tested and validated with:

✅ RCPI (Royal College of Physicians of Ireland): https://www.rcpi.ie/Calendar/calendar?Type=CPDEvents
✅ Irish Psychiatry: https://irishpsychiatry.ie/all-events/

Testing

Run the comprehensive integration test:

```
./gradlew test
```

The integration test validates:

Valid JSON output structure
Expected number of activities (8 for RCPI)
Specific activity details including dates, credits, location, and speakers

Build and Development

```
# Build the project
make build

# Run tests
make test

# Clean build artifacts
make clean
```

Technical Details

Framework: Google Agent Development Kit (ADK) 0.2.0
Language: Kotlin 2.1.0
Web Automation: Microsoft Playwright 1.40.0
AI Model: Claude 3.5 Sonnet via Anthropic API
JSON Processing: Kotlinx Serialization

Agent Communication

Agents communicate using structured JSON messages:

```
{
  "from": "agent_name",
  "to": "agent_name",
  "action": "action_type",
  "data": {...},
  "state": "current_state"
}
```

States: starting → browsing → investigating → formatting → writing → complete
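
The message envelope and state progression above might be modelled in Kotlin roughly as below. This is a sketch under assumptions: `WorkflowState`, `AgentMessage`, and `reply` are hypothetical names, `data` is simplified to a string map, and the agent and action names in the usage note are invented for illustration.

```kotlin
// Workflow states in the order given above.
enum class WorkflowState {
    STARTING, BROWSING, INVESTIGATING, FORMATTING, WRITING, COMPLETE;

    // The state that follows this one; COMPLETE is terminal.
    fun next(): WorkflowState =
        if (this == COMPLETE) COMPLETE else values()[ordinal + 1]
}

// Hypothetical Kotlin mirror of the message envelope; `data` is kept as a plain map here.
data class AgentMessage(
    val from: String,
    val to: String,
    val action: String,
    val data: Map<String, String>,
    val state: WorkflowState,
)

// Builds a reply from the receiver back to the sender and advances the workflow state.
fun AgentMessage.reply(action: String, data: Map<String, String>): AgentMessage =
    AgentMessage(from = to, to = from, action = action, data = data, state = state.next())
```

For example, a message from a hypothetical `lead_agent` to `browser_agent` in state `BROWSING` produces a reply addressed back to `lead_agent` in state `INVESTIGATING`.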

Error Handling

Browser Timeouts: 30-second page load timeout with 5-second content wait
Missing Data: Uses "N/A" or null for missing fields rather than failing
Dynamic Content: Waits for content to load before extraction
Network Issues: Graceful degradation with meaningful error messages
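
Two of the behaviours above, bounded waits and placeholder values for missing fields, can be illustrated with standard-library code only. This is a generic sketch, not the project's Playwright-based implementation; `runWithTimeout` and `orNotAvailable` are hypothetical helper names.

```kotlin
import java.util.concurrent.Callable
import java.util.concurrent.ExecutionException
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit
import java.util.concurrent.TimeoutException

// Runs `block` on a worker thread; returns null if it throws or exceeds the timeout.
fun <T> runWithTimeout(timeoutMs: Long, block: () -> T): T? {
    val executor = Executors.newSingleThreadExecutor()
    try {
        val future = executor.submit(Callable<T> { block() })
        return try {
            future.get(timeoutMs, TimeUnit.MILLISECONDS)
        } catch (e: TimeoutException) {
            future.cancel(true) // interrupt the worker so it does not linger
            null
        } catch (e: ExecutionException) {
            null // the block itself failed; treat it as "no result"
        }
    } finally {
        executor.shutdownNow()
    }
}

// Replaces a missing or blank field with "N/A" instead of failing.
fun String?.orNotAvailable(): String = this?.takeIf { it.isNotBlank() } ?: "N/A"
```

With the figures quoted above, a page load could be wrapped as `runWithTimeout(30_000) { ... }`, and an optional field such as location could be passed through `orNotAvailable()` before formatting.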

Configuration

The system uses environment variables for configuration:

ANTHROPIC_API_KEY: Required - Your Claude API key
Additional configuration can be added to src/main/resources/application.conf
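
A start-up check for the required variable could look something like the sketch below. The function name `requireApiKey` is an assumption, and the environment is passed in as a map so the check can be exercised without touching the real process environment.

```kotlin
// Reads the API key from the given environment map (defaults to the process environment).
// Throws IllegalArgumentException with a readable message when the key is missing or blank.
fun requireApiKey(env: Map<String, String> = System.getenv()): String {
    val key = env["ANTHROPIC_API_KEY"]?.trim().orEmpty()
    require(key.isNotEmpty()) {
        "ANTHROPIC_API_KEY is not set. Export it before running the scraper."
    }
    return key
}
```

Failing early with a clear message is usually easier to diagnose than an authentication error raised later by the API client.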

Known Issues - Google ADK Bug

⚠️ Critical Bug Identified: Google ADK 0.2.0 + Anthropic Java SDK Compatibility Issue

Bug Description

We have identified a critical compatibility issue between Google ADK version 0.2.0 and the Anthropic Java SDK (tested with versions 1.0.0, 2.1.0, and 2.5.1):

Error: NullPointerException: Parameter specified as non-null is null: method com.anthropic.models.messages.MessageCreateParams$Builder.toolChoice, parameter toolChoice

Location: com.google.adk.models.Claude.generateContent(Claude.java:127)

Root Cause Analysis

Simple agents without tools: Work perfectly with Claude
Agents with tools/functions: Trigger the toolChoice NullPointerException
The Google ADK Claude wrapper attempts to set a toolChoice parameter but passes null instead of a proper tool choice object
This affects any LlmAgent that has .tools() configured

Workaround Implemented

This system has been redesigned to avoid tools entirely:

Direct browser operations (bypassing FunctionTool)
Simple Claude agents without tool definitions
Manual coordination instead of tool-based agent delegation

Bug Verification

The issue is reproducible and documented in our test suite at /experiment_adk_agentcore_jvm/src/test/kotlin/com/example/agent/ClaudeIntegrationTest.kt (lines 166-168).

Recommendation

Use Google ADK with Gemini models (which work with tools)
OR use Claude models only for simple agents without tools
OR wait for a fix to the Google ADK / Anthropic SDK compatibility issue

Development Notes

This implementation demonstrates:

Multi-agent coordination without tool dependencies
ADK framework capabilities with Claude models
Workarounds for ADK/Anthropic compatibility issues
State-based decision making without function tools

The system serves as both a working CPD scraper and a case study in handling ADK framework limitations.

Built with ⚓ by Captain Blackbeard McCode and the ADK crew
