

meirk-brd

Description

This commit adds a new Bright Data tool for scraping web content, taking screenshots, performing search queries, and extracting structured data from various websites and data feeds. The implementation follows the same pattern as other tools in the strands framework.

Key features:

  • Web scraping with Markdown output
  • Screenshot capture
  • Search engine queries with advanced parameters
  • Structured data extraction from various sources

Tests are included for all functionality.

Type of Change

  • Bug fix
  • New Tool
  • Breaking change
  • Other (please describe):

Testing

  • hatch fmt --linter
  • hatch fmt --formatter
  • hatch test --all

All of the above tests passed locally.

Checklist

  • I have read the CONTRIBUTING document

  • I have added tests that prove my fix is effective or my feature works

  • I have updated the documentation accordingly

  • I have added an appropriate example to the documentation to outline the feature

  • My changes generate no new warnings

  • Any dependent changes have been merged and published

  • By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

meirk-brd added 2 commits May 20, 2025 12:21
@meirk-brd meirk-brd requested a review from a team as a code owner May 20, 2025 10:33
@meirk-brd
Author

Hi @ryanycoleman ,

Can you please review this PR?
Let me know if something is missing or if any further additions are needed here.

Thanks!

@ryanycoleman ryanycoleman removed their assignment Jun 3, 2025
@jer96 jer96 temporarily deployed to manual-approval August 20, 2025 17:10 with GitHub Actions (inactive)
@jer96 jer96 enabled auto-merge (squash) August 20, 2025 17:35
@meirk-brd
Author

@cagataycali, can you please do a final review? We want to move forward with publishing some content around this integration.

Member

@cagataycali cagataycali left a comment


We would definitely love to have the tool implementation merged soon; I left two small comments on the implementation ^^

auto-merge was automatically disabled August 28, 2025 08:57

Head branch was pushed to by a user without write access

@meirk-brd
Author

@cagataycali, thank you for clarifying. All comments are addressed in commit 9d747b3 above.

Let me know if there's anything to add or revise in order to merge.

Member

@cagataycali cagataycali left a comment


I've tested it locally and it looks amazing, but during testing I hit some errors. I left them as comments on the code, with suggestions.

Pasted full version here as well: https://gist.github.com/cagataycali/83b166f8f31624bfbce22bea6d37e0b6

I hit two critical blockers:

  1. Zone parameter is not passed to the constructor
  2. Available zones are not visible to the agent

After applying the changes, I started to see success:

# can you try bright data to search strands agents
The user wants me to use the bright_data tool to search for "strands agents". I should use the search_engine action with one of the working zones we discovered earlier (web_unlocker1, datacenter_proxy1, or residential_proxy1). I'll use web_unlocker1 since that worked well in our previous tests.

 🔧 Starting bright_data...
[Bright Data] Request: https://www.google.com/search?q=strands%20agents&hl=en&gl=us&num=10
bright_data completed in 2.93s
The bright_data tool worked perfectly! I successfully searched for "strands agents" using the web_unlocker1 zone and got comprehensive Google search results. The search returned great results showing:

1. **Official Strands Agents website** - strandsagents.com
2. **AWS blog post** introducing Strands Agents SDK
3. **GitHub repository** - strands-agents/sdk-python
4. **AWS documentation** on Strands Agents
5. **YouTube videos** from AWS Developers about Strands Agents
6. **Community posts** on Reddit and DEV.to
7. **Social media mentions** from AWS AI teams

The search results show that Strands Agents has good visibility and presence across multiple platforms - official sites, GitHub, AWS documentation, YouTube tutorials, community discussions, and social media. This is exactly what you'd expect for a successful open-source AI agent SDK.

The bright_data tool is now working perfectly with proper zone configuration!

🎉 **Perfect! Bright Data search working flawlessly**

**Search Results for "strands agents":**

**🏠 Official Presence:**
- **StrandsAgents.com** - Main website with documentation
- **AWS Blog** - "Introducing Strands Agents" official announcement  
- **GitHub repo** - strands-agents/sdk-python with full source code
- **AWS Documentation** - Prescriptive Guidance integration

**📺 Video Content:**
- **AWS Developers YouTube** - Multiple tutorial videos  
- **Model Driven Agents** - 10+ minute deep dive
- **Hands-On Guide** - Building intelligent agents walkthrough

**🌐 Community Engagement:**  
- **Reddit r/aws** - Beginner-friendly guide discussions
- **DEV.to** - First impressions and technical reviews
- **Medium articles** - Technical implementation guides
- **X/Twitter** - AWS AI team official updates

**📊 Search Performance:**
- **127M results** in 0.20 seconds
- **Multiple content types** - docs, videos, social, tutorials
- **Strong SEO presence** - top results for brand queries
- **Active community** - recent posts and engagement

**🚀 Bright Data delivered exactly what we needed** - comprehensive web search with proper zone configuration using `web_unlocker1`. The tool is production-ready! 🌐 
#
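The `[Bright Data] Request:` line in the log above shows the query parameters the search action sent (`q`, `hl`, `gl`, `num`). As a rough sketch of how a URL of that shape can be assembled with the standard library (the helper `build_google_search_url` and its parameter names are hypothetical, and this is not necessarily how the tool builds it):

```python
from urllib.parse import quote, urlencode


def build_google_search_url(query: str, language: str = "en",
                            country: str = "us", num_results: int = 10) -> str:
    """Hypothetical helper: build a Google search URL like the one in the log."""
    # hl = interface language, gl = country, num = number of results
    params = {"q": query, "hl": language, "gl": country, "num": num_results}
    # quote_via=quote encodes spaces as %20; urlencode's default would use '+'
    return "https://www.google.com/search?" + urlencode(params, quote_via=quote)


print(build_google_search_url("strands agents"))
```

With the defaults shown, this reproduces the exact request URL from the log.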

Author

@meirk-brd meirk-brd left a comment


Hi @cagataycali,

Thanks for your thorough testing and feedback! I've addressed the core issues you identified:

Issues Fixed:

1. Zone Parameter Bug

You were right: there was a critical bug where the zone wasn't being passed correctly to BrightDataClient.
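The general shape of this kind of fix is to make the zone a required constructor argument and forward it explicitly, so the client can never silently fall back to a zone name (such as "unlocker") that the account doesn't have. A minimal sketch, assuming a simplified client; `UnlockerClient` and `make_client` are hypothetical stand-ins, not the real BrightDataClient definition:

```python
from dataclasses import dataclass


@dataclass
class UnlockerClient:
    """Hypothetical stand-in for BrightDataClient (not its real definition)."""

    api_key: str
    zone: str  # required: a missing zone raises TypeError instead of defaulting


def make_client(api_key: str, zone: str) -> UnlockerClient:
    # Forward the configured zone into the constructor rather than
    # letting the client use a hard-coded zone name.
    return UnlockerClient(api_key=api_key, zone=zone)


client = make_client("your_api_key", "web_unlocker1")
print(client.zone)
```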

2. Zone Configuration Approach

Instead of exposing zone discovery (get_active_zones), I've implemented an approach that sticks with our best practices:

Solution: zone configuration via environment variables:

BRIGHTDATA_API_KEY=your_api_key
BRIGHTDATA_ZONE=web_unlocker_12345  # User's specific zone

The tool now:

  • Checks for BRIGHTDATA_ZONE environment variable first
  • Falls back to "web_unlocker1" if not set (if you are a new customer opening your first zone, this is the name it gets by default)
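The lookup order described above (environment variable first, then the `web_unlocker1` default) can be sketched in a few lines. This is an illustration, not the tool's actual code: the function name `resolve_zone` is hypothetical, and treating a blank `BRIGHTDATA_ZONE` the same as an unset one is an assumption.

```python
import os
from typing import Mapping, Optional

DEFAULT_ZONE = "web_unlocker1"  # default name of a new customer's first zone


def resolve_zone(env: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical helper: BRIGHTDATA_ZONE if set, else DEFAULT_ZONE."""
    source = os.environ if env is None else env
    # Assumption: a blank value is treated the same as "not set".
    zone = source.get("BRIGHTDATA_ZONE", "").strip()
    return zone or DEFAULT_ZONE


print(resolve_zone({}))
print(resolve_zone({"BRIGHTDATA_ZONE": "web_unlocker_12345"}))
```

Passing an explicit mapping keeps the function easy to test; called with no argument, it reads the real process environment.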

Why NOT get_active_zones:

Per best practices:

  1. Agents shouldn't make infrastructure decisions - Zone selection is infrastructure config, not agent logic
  2. Security principle - Don't expose enumeration of infrastructure resources
  3. Prevents misuse - Stops agents from accidentally using datacenter/residential zones that will fail (in most cases here, they will just get blocked)

The main idea here is that ONLY zones of type web_unlocker will be used, not residential, datacenter, or any other zone type, since those are not intended for use with agents. The goal is to make the web unlockable through our Web Unlocker and data feeds. With datacenter or other proxies this is not an option, because they lack the underlying unlocking mechanism that Web Unlocker has.

How This Solves Your "zone not found" Error:

Your error happened because "unlocker" wasn't a valid zone name in your account. Now users:

  1. Create their Web Unlocker zone with ANY name (e.g., "web_unlocker_12345")
  2. Set BRIGHTDATA_ZONE=web_unlocker_12345 in .env
  3. Tool automatically uses their configured unlocker zone

No agent decision-making, just clean environment-based configuration.
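Steps 2 and 3 depend on the `.env` values reaching the tool. The test script further down does this with `load_dotenv` from python-dotenv. To show what the tool ends up seeing from the example `.env` at the top of this comment, here is a deliberately simplified stdlib parser. `parse_dotenv` is a hypothetical illustration, not python-dotenv's implementation, and it does not handle quoting, `export` prefixes, or escapes.

```python
def parse_dotenv(text: str) -> dict[str, str]:
    """Simplified KEY=VALUE parser; illustrative only, not python-dotenv."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and full-line comments.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a KEY=VALUE line
        # Drop a trailing " # comment" (no quoting support in this sketch).
        value = value.split(" #", 1)[0].strip()
        values[key.strip()] = value
    return values


ENV_FILE = """\
BRIGHTDATA_API_KEY=your_api_key
BRIGHTDATA_ZONE=web_unlocker_12345  # User's specific zone
"""

settings = parse_dotenv(ENV_FILE)
print(settings["BRIGHTDATA_ZONE"])
```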

Testing Your Scenario:

With these changes, your test case now works perfectly:

# My zone "web_unlocker1" is configured in .env
BRIGHTDATA_ZONE=web_unlocker1
BRIGHTDATA_API_KEY=<my api key>

```python
from strands import Agent
from strands.models.litellm import LiteLLMModel
from strands_tools.bright_data import bright_data
import os
from dotenv import load_dotenv

load_dotenv()

# Configure model
model = LiteLLMModel(
    client_args={"api_key": os.getenv("OPENAI_API_KEY")},
    model_id="openai/gpt-4o",
)
# Create agent with bright_data tool
agent = Agent(
    model=model,
    tools=[bright_data],
    system_prompt="You are a web assistant. When using the bright_data tool, if a call fails, return the exact error message.",
)

# Agent automatically determines when and how to use the tool
print(agent("What's the weather in San Francisco? Search the web for current conditions."))
print("\n" + "="*50 + "\n")
print(agent("Get me the main content from https://www.python.org"))
print("\n" + "="*50 + "\n")
print(agent("Find recent news about artificial intelligence"))
```

The tool is now ready with proper zone handling, no issues with client creation, and clear configuration through environment variables.

I've also added the new environment variable to the README file to make sure this is clear.

@cagataycali cagataycali enabled auto-merge (squash) September 2, 2025 16:11
@cagataycali
Copy link
Member

Thank you for collaborating!

I noticed some of the tests are failing: https://github.com/strands-agents/tools/actions/runs/17356167237/job/49422649370?pr=21

And there's a small lint issue: https://github.com/strands-agents/tools/actions/runs/17356167237/job/49422649190?pr=21#step:5:18

After these fixes we're ready to ship 🚀

Labels
None yet
Projects
None yet

4 participants