Quota Exceeded with Gemini 2.5 Flash and Browser MCP in Web Automation #1057

Open
@nikogamulin

I am encountering a 429 RESOURCE_EXHAUSTED error when running a web automation agent using google.adk.agents.LlmAgent with the gemini-2.5-flash-preview-05-20 model and MCPToolset. Despite having a Tier 1 subscription (which I understand has a 1M TPM limit for this model), the quota appears to be exhausted very quickly after only a few browser interactions.

Steps to Reproduce:

  1. Environment Setup:
  • Python environment with google-adk and python-dotenv installed.
  • .env file configured with necessary API keys.
  • WEB_PROMPT is defined (content not provided; assume a general instruction for web browsing).

  2. Code Used:
import os

from dotenv import load_dotenv
load_dotenv()

from google.adk.agents import LlmAgent
from google.adk.tools.mcp_tool.mcp_toolset import MCPToolset, StdioServerParameters

from remote_mcp_agent.prompt import WEB_PROMPT

# ---- MCP Library ----
# https://github.com/modelcontextprotocol/servers
# https://smithery.ai/

root_agent = LlmAgent(
    model='gemini-2.5-flash-preview-05-20',
    name='filesystem_assistant_agent',
    instruction=WEB_PROMPT,
    tools=[
        MCPToolset(
            connection_params=StdioServerParameters(
                command='npx',
                args=[
                    "-y",  # Argument for npx to auto-confirm install
                    "@browsermcp/mcp@latest"
                ],
            ),
        )
    ],
)
  3. Execution:
    The agent was instructed to find available apartments on Airbnb in the Dalmatia region, specifically near Zadar, Brela, or other kid-friendly locations near the beach, using the Browser MCP tool.

Observed Behavior:

After initiating the agent, which involved several browser_click and browser_type operations (as seen in the attached log), the execution quickly halted with the following error:

{"error": "429 RESOURCE_EXHAUSTED. {'error': {'code': 429, 'message': 'You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits.', 'status': 'RESOURCE_EXHAUSTED', 'details': [{'@type': 'type.googleapis.com/google.rpc.QuotaFailure', 'violations': [{'quotaMetric': 'generativelanguage.googleapis.com/generate_content_paid_tier_input_token_count', 'quotaId': 'GenerateContentPaidTierInputTokensPerModelPerMinute', 'quotaDimensions': {'location': 'global', 'model': 'gemini-2.5-flash'}, 'quotaValue': '1000000'}]}, {'@type': 'type.googleapis.com/google.rpc.Help', 'links': [{'description': 'Learn more about Gemini API quotas', 'url': 'https://ai.google.dev/gemini-api/docs/rate-limits'}]}, {'@type': 'type.googleapis.com/google.rpc.RetryInfo', 'retryDelay': '40s'}]}}"}
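For triage, the useful fields in this payload are the quota ID, the quota value, and the suggested retry delay. The following is a minimal sketch of pulling them out, assuming the error always arrives in the shape shown above: a JSON object whose "error" string holds a status prefix followed by a Python-style dict literal. The function name parse_quota_error and the abbreviated sample are illustrative only and are not part of ADK or the Gemini SDK.

```python
import ast
import json


def parse_quota_error(raw: str) -> dict:
    """Extract quota details from an error string shaped like the one above."""
    message = json.loads(raw)["error"]
    # Everything from the first '{' onward is a Python-style dict literal
    # (single-quoted), so ast.literal_eval is used instead of json.loads.
    payload = ast.literal_eval(message[message.index("{"):])
    error = payload["error"]
    info = {
        "code": error["code"],
        "status": error["status"],
        "quota_id": None,
        "quota_value": None,
        "retry_delay_s": None,
    }
    for detail in error.get("details", []):
        kind = detail.get("@type", "")
        if kind.endswith("QuotaFailure"):
            violation = detail["violations"][0]
            info["quota_id"] = violation.get("quotaId")
            info["quota_value"] = int(violation["quotaValue"])
        elif kind.endswith("RetryInfo"):
            # retryDelay is given as a string such as '40s'
            info["retry_delay_s"] = float(detail["retryDelay"].rstrip("s"))
    return info


# Abbreviated stand-in for the logged error (hypothetical, same structure).
_inner = {
    "error": {
        "code": 429,
        "status": "RESOURCE_EXHAUSTED",
        "details": [
            {
                "@type": "type.googleapis.com/google.rpc.QuotaFailure",
                "violations": [
                    {
                        "quotaId": "GenerateContentPaidTierInputTokensPerModelPerMinute",
                        "quotaValue": "1000000",
                    }
                ],
            },
            {"@type": "type.googleapis.com/google.rpc.RetryInfo", "retryDelay": "40s"},
        ],
    }
}
sample_raw = json.dumps({"error": "429 RESOURCE_EXHAUSTED. " + repr(_inner)})
parsed = parse_quota_error(sample_raw)
print(parsed["quota_id"], parsed["quota_value"], parsed["retry_delay_s"])
```

In this payload the violated quota is the per-model, per-minute input token count, so the limit being hit is input tokens rather than request count.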

The attached screenshots and log show that the agent performed actions such as:

  • Navigating to Airbnb.
  • Adjusting guest numbers (adults, children, infants).
  • Performing a search.
  • Clicking on "Filters" and then "Beachfront".

The input token count graph shows a sharp spike exceeding 1.5M tokens, while the request count remained low (around 20 requests). This indicates that token consumption per request is much higher than anticipated for a simple browsing session.

Expected Behavior:

Given a Tier 1 subscription with a 1,000,000 TPM (Tokens Per Minute) quota for gemini-2.5-flash, the agent should be able to perform basic web browsing and interaction without hitting a quota exhaustion error so quickly. The observed number of clicks and page loads does not seem to warrant such high token consumption.

Questions/Concerns:

  1. Token Consumption Analysis: What aspects of web browsing automation with MCPToolset consume such a large number of input tokens? Is the entire page snapshot sent as input for every interaction, even for minor clicks?
  2. Strategies for Limiting Token Usage: Are there recommended strategies or configurations within ADK or MCPToolset to reduce token consumption during web browsing automation? For example, can the prompt or tool output be made more concise, or can the frequency of full page snapshots be reduced?
  3. Discrepancy with Quota: How is it possible to exceed 1M TPM (1.5M observed in the graph) with only a few interactions when the stated limit is 1M? Is there a misunderstanding of how tokens are calculated for web browsing tools?
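To make question 2 concrete, here is a minimal sketch of two generic mitigations: capping the size of a single tool output, and replacing all but the most recent page snapshot in the history with a short stub. The helper names and the message shape (dicts with "kind" and "text" keys) are hypothetical and are not ADK or MCPToolset types. ADK's LlmAgent does expose callback hooks such as after_tool_callback, but whether Browser MCP results reach such a hook in a form these helpers can process is an assumption that has not been verified here.

```python
def truncate_tool_text(text: str, max_chars: int = 4000) -> str:
    """Cap a tool output at max_chars, noting how much was dropped."""
    if len(text) <= max_chars:
        return text
    omitted = len(text) - max_chars
    return text[:max_chars] + f"\n[... truncated {omitted} characters ...]"


def prune_old_snapshots(history: list, keep_last: int = 1,
                        stub: str = "[older page snapshot omitted]") -> list:
    """Return a copy of history with all but the last keep_last tool results stubbed.

    Each message is a dict with a 'kind' key ('user', 'model', or
    'tool_result') and a 'text' key. This shape is illustrative only.
    """
    tool_indexes = [i for i, msg in enumerate(history) if msg["kind"] == "tool_result"]
    if keep_last > 0:
        to_stub = set(tool_indexes[:-keep_last])
    else:
        to_stub = set(tool_indexes)
    return [dict(msg, text=stub) if i in to_stub else msg
            for i, msg in enumerate(history)]


history = [
    {"kind": "user", "text": "Find apartments near Zadar"},
    {"kind": "tool_result", "text": "snapshot of page 1 " * 100},
    {"kind": "model", "text": "Clicking Filters"},
    {"kind": "tool_result", "text": "snapshot of page 2 " * 100},
]
pruned = prune_old_snapshots(history, keep_last=1)
print([len(msg["text"]) for msg in pruned])
```

Stubbing old snapshots trades away the model's ability to look back at earlier pages, so it suits tasks where only the current page state matters for the next action.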

Image
Image
