AI Overview Extractor - Browser Extension

🔍 Extract AI Overview content from Google Search to Markdown format

The extension automatically detects AI Overview on Google results pages and exports its content, along with the sources, to readable Markdown.

Extension demo

🚀 Features

  • ✅ Automatic detection of AI Overview on Google Search (#m-x-content container)
  • 📋 Content extraction to Markdown format using TurndownService library
  • 🧹 Advanced cleaning - removes MSC elements, CSS, JavaScript and hidden elements
  • 🔍 Automatic extraction of search keyword from query
  • 🔗 Source extraction with cleaned Google URLs
  • 💾 Copy to clipboard with one click
  • 📥 Download as .md file with timestamp
  • 🚀 Webhooks - automatic sending of data to external APIs
  • ⚙️ Webhook configuration - easy URL setup and connection testing
  • 🎨 Clean interface with preview and notifications
  • 🔄 DOM Observer - automatic button addition for new results
  • 🤖 Intelligent State Machine - Predictable automation flow control
  • 🛡️ Circuit Breaker Protection - Automatic failure detection and recovery
  • 🔄 Unified Detection System - Robust container detection with multiple fallback strategies
  • ⚙️ Advanced Debugging - Real-time system status monitoring and reset capabilities

📦 Installation

Method 1: Chrome/Chromium - Developer Mode

  1. Download files - copy all files to ai-overview-extractor/ folder
  2. Open Chrome and navigate to chrome://extensions/
  3. Enable "Developer mode" (toggle in top right corner)
  4. Click "Load unpacked"
  5. Select ai-overview-extractor/ folder
  6. Done! Extension will be loaded

Method 2: Firefox - Developer Mode

  1. Download files - copy all files to ai-overview-extractor/ folder
  2. Open Firefox and navigate to about:debugging
  3. Click "This Firefox" in left menu
  4. Click "Load Temporary Add-on..."
  5. Select manifest.json file from extension folder
  6. Done! Extension will be loaded

Method 3: Firefox - Permanent Installation

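  1. Go to about:config in Firefox
  2. Find xpinstall.signatures.required and set it to false (Firefox honors this preference only in Developer Edition, Nightly, and ESR builds)
  3. Pack the contents of the extension folder into a .zip file
  4. Rename the file extension from .zip to .xpi
  5. Drag the .xpi file into Firefox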

🎯 Usage

Basic extraction

  1. Search for something on Google (e.g. "diabetes")
  2. Wait for AI Overview to appear
  3. Click "Show more" button
  4. Click "Show all"
  5. Click "📋 Extract to Markdown" button
  6. Copy content or download as file

Webhook configuration

  1. Click "📋 Extract to Markdown" button
  2. In "🔗 Webhook Configuration" section enter your API URL
  3. Test connection with "🧪 Test" button
  4. Save configuration with "💾 Save" button
  5. Send data with "🚀 Send webhook" button

Example webhook URLs

https://your-api.com/ai-overview-webhook
https://example.com/webhook-endpoint
https://api.your-domain.com/receive-ai-data
http://localhost:5678/webhook/ai-overview-extractor  # n8n locally
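
The changelog mentions HTTPS validation for webhooks, while the examples above include a plain-HTTP localhost URL for n8n. A minimal sketch of a check that would accept both might look like this. The function name isAcceptableWebhookUrl and the localhost exception are assumptions for illustration, not the extension's actual validation logic.

```javascript
// Hypothetical helper (not taken from the extension source):
// accept HTTPS everywhere, and plain HTTP only for local development hosts.
function isAcceptableWebhookUrl(value) {
  let url;
  try {
    url = new URL(value);
  } catch {
    return false; // not a parseable absolute URL
  }
  if (url.protocol === "https:") return true;
  const localHosts = ["localhost", "127.0.0.1"];
  return url.protocol === "http:" && localHosts.includes(url.hostname);
}
```

With this sketch, all four example URLs above pass, while http://example.com/hook would be rejected.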

🔗 n8n Integration

The extension is fully compatible with n8n and includes a ready-made template workflow for comprehensive automation!

🚀 Ready n8n Template

The workflows_templates/ folder contains a ready-made workflow, AI_Overviews_Extractor_Plugin.json, which provides:

📋 Workflow features:

  1. Webhook endpoint - automatic data reception from extension
  2. HTML→Markdown processing - content conversion
  3. Google Sheets saving - automatic results storage
  4. AI Guidelines Generator - LLM generates SEO guidelines based on AI Overview
  5. Automation - scheduler every 15 minutes + manual trigger
  6. Page analysis - fetching and analyzing content from URLs

🛠️ Template installation:

  1. In n8n go to: Templates → Import from JSON
  2. Load file: workflows_templates/AI_Overviews_Extractor_Plugin.json
  3. Configure nodes:
    • Google Sheets (OAuth connection)
    • OpenRouter Chat Model (API key)
    • Set Google Sheets URL in nodes
  4. Activate workflow
  5. Copy webhook URL (from Webhook node)

⚙️ Extension configuration:

  1. Webhook URL: http://localhost:5678/webhook/ai-overview-extractor
  2. Test connection - should return status 200
  3. Save configuration

📊 What the workflow does:

  • Receives data from extension (keyword, markdown, HTML, sources)
  • Saves all AI Overview data to the sheet
  • Analyzes pages from Google Sheets (myURL column)
  • Generates SEO guidelines using AI (compares page content with AI Overview)
  • Updates sheet with generated guidelines
  • Automatic execution every 15 minutes for new tasks

🎯 Benefits:

  • Full automation - from extraction to analysis
  • Knowledge base - all AI Overviews in one place
  • SEO insights - AI guidelines on what to add to page
  • Scalability - batch processing of multiple URLs
  • Monitoring - tracking changes in AI Overview

🔧 n8n Requirements:

  • n8n v1.95.3+ (locally or in cloud)
  • Google Sheets API (for data saving)
  • OpenRouter API (for AI guidelines) or other LLM provider
  • Webhook endpoint active on port 5678

📁 File Structure

ai-overview-extractor/
├── manifest.json      # Extension configuration (Manifest V3)
├── styles.css         # User interface styles
├── README.md          # This documentation
├── LICENCE            # MIT License
├── .gitignore         # Files ignored by Git
├── AI_SUMMARY.md      # 🤖 CRITICAL: Technical documentation for AI/LLM systems - main project overview
├── src/              # Source files
│   ├── content.js                    # Main orchestrator with AIOverviewExtractor class
│   ├── automation-state-machine.js  # State machine for automation flow control
│   ├── automation-circuit-breaker.js # Circuit breaker protection against failures
│   ├── container-detection-manager.js # Unified container detection system
│   ├── settings-manager.js           # Extension settings management
│   ├── auto-expander-overviews.js    # Automatic AI overview expansion
│   ├── auto-expander-sources.js      # Automatic source list expansion
│   ├── auto-webhook.js               # Automatic webhook dispatch
│   ├── extraction-orchestrator.js    # Manual extraction coordination
│   ├── content-extractor.js          # Content and source extraction
│   ├── markdown-generator.js         # Markdown conversion
│   ├── ui-manager.js                 # In-page UI management
│   ├── popup.js                      # Extension popup management
│   ├── popup.html                    # Popup interface structure
│   ├── popup.css                     # Popup interface styling
│   ├── webhook-manager.js            # Webhook management and POST requests
│   ├── turndown.js                   # HTML→Markdown conversion library
│   └── README.md                     # Source code documentation
├── icons/            # Extension icons
│   ├── icon-16.png
│   ├── icon-32.png
│   ├── icon-48.png
│   ├── icon-96.png
│   └── icon-128.png
├── images/           # Documentation images
│   ├── ai-overviews-extractor.gif
│   ├── ai-overview-extractor-001.jpg
│   └── ai_overviews_extractor_logo.png
├── workflows_templates/  # Ready n8n workflow template
│   ├── AI_Overviews_Extractor_Plugin.json  # Comprehensive n8n workflow
│   └── README.md                            # Workflow documentation
├── n8n-template-submission/  # n8n Template Store submission files
│   ├── AI_Overviews_Extractor_Plugin.json  # Template workflow file
│   ├── README.md                            # Template description
│   ├── setup-instructions.md               # Installation instructions
│   ├── template-description.md             # Detailed template description
│   └── template-name.txt                   # Template name
└── docs/             # Publication and legal documentation
    ├── chrome-web-store-description.md         # Chrome Web Store description
    ├── chrome-web-store-privacy-justifications.md # Chrome privacy justifications
    ├── chrome-web-store-permission-justifications.md # Chrome permission justifications
    ├── chrome-web-store-appeal-response.md     # Chrome Store rejection response
    ├── firefox-release-notes.md                # Firefox Add-ons release notes
    ├── firefox-reviewer-notes.md               # Firefox reviewer notes
    └── privacy-policy.md                       # Privacy policy

⚙️ Requirements

  • Chrome/Chromium (latest version) or Firefox 58+ (Firefox Quantum)
  • Manifest V3 - modern extension standard
  • Page: google.com/search
  • Language: Works with Google interface in any language
  • Permissions: storage, host_permissions: *://www.google.com/*

🔧 Configuration

Extension works automatically on:

  • *://www.google.com/search*

To add other Google domains, edit the content_scripts.matches section in manifest.json:

"content_scripts": [
  {
    "matches": [
      "*://www.google.com/search*",
      "*://www.google.pl/search*",
      "*://www.google.de/search*"
    ],
    "js": ["src/turndown.js", "src/content.js"],
    "css": ["styles.css"],
    "run_at": "document_end"
  }
]

🔍 How it works

AI Overview Detection

  • Looks for #m-x-content container on page
  • Uses MutationObserver to monitor DOM changes
  • Automatically adds button when container is found
  • State Machine Architecture - Predictable automation flow: IDLE → EXPANDING_OVERVIEW → EXPANDING_SOURCES → SENDING_WEBHOOK → COMPLETE
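
The linear flow listed above can be sketched as a small transition table. This is only an illustration: the state names come from this README, while the class name AutomationFlowSketch, the strict one-step transitions, and the reset behavior are assumptions rather than the contents of automation-state-machine.js.

```javascript
// Allowed next state for each state; COMPLETE is terminal.
// State names are from the README, the table itself is an assumption.
const NEXT_STATE = {
  IDLE: "EXPANDING_OVERVIEW",
  EXPANDING_OVERVIEW: "EXPANDING_SOURCES",
  EXPANDING_SOURCES: "SENDING_WEBHOOK",
  SENDING_WEBHOOK: "COMPLETE",
  COMPLETE: null,
};

// Hypothetical class, for illustration only.
class AutomationFlowSketch {
  constructor() {
    this.state = "IDLE";
  }

  // Move to `next` only if it is the single allowed successor.
  transition(next) {
    if (next == null || NEXT_STATE[this.state] !== next) {
      throw new Error(`Invalid transition: ${this.state} -> ${next}`);
    }
    this.state = next;
    return this.state;
  }

  // Return to the initial state, e.g. after a failure.
  reset() {
    this.state = "IDLE";
    return this.state;
  }
}
```

Rejecting out-of-order transitions is one way to get the "predictable flow" property: a step that fires twice or too early fails loudly instead of silently repeating a click.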

Content Extraction

  • Removes elements with data-subtree="msc" (MSC elements)
  • Removes elements with style="display:none" (hidden elements)
  • Removes sources container before conversion
  • Converts HTML to Markdown using TurndownService
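
As a rough sketch of the cleaning steps above, the function below removes matching elements from a copy of the container before conversion. The selector list mirrors this README (MSC elements, hidden elements, plus script and style tags from the feature list). The function name cleanOverviewClone and the decision to work on a clone are assumptions, and the extension's real selectors may differ.

```javascript
// Selectors based on the README's description; the exact list used by the
// extension is not shown here, so treat this as an assumption.
const REMOVE_SELECTORS = [
  '[data-subtree="msc"]',
  '[style*="display:none"]',
  "script",
  "style",
];

// Hypothetical helper: returns a cleaned deep copy of the container.
function cleanOverviewClone(container) {
  const clone = container.cloneNode(true); // leave the live page untouched
  for (const selector of REMOVE_SELECTORS) {
    clone.querySelectorAll(selector).forEach((el) => el.remove());
  }
  return clone;
}

// In the browser, with turndown.js loaded, conversion could then be:
// const markdown = new TurndownService().turndown(cleanOverviewClone(el).innerHTML);
```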

Source Extraction

  • Finds sources container div[style="height: 100%;"]
  • Extracts links from visible list ul[class]
  • Improved button detection - Fixed jsaction selector for reliable expansion
  • Cleans Google URLs (removes /url? wrappers)
  • Filters duplicates and invalid links
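
The URL cleaning and de-duplication steps above could look roughly like the following. It assumes Google wraps outbound links as /url?q=<target> (or url=<target>). The helper names cleanGoogleUrl and uniqueSources are hypothetical, and the filtering rules are a simplification of whatever the extension actually applies.

```javascript
// Unwrap a Google redirect link (/url?q=...) to its target.
// Returns null when the link cannot be parsed or has no target.
function cleanGoogleUrl(href, base = "https://www.google.com") {
  let url;
  try {
    url = new URL(href, base); // resolves relative links like "/url?q=..."
  } catch {
    return null;
  }
  const isGoogle =
    url.hostname === "google.com" || url.hostname.endsWith(".google.com");
  if (isGoogle && url.pathname === "/url") {
    const target = url.searchParams.get("q") || url.searchParams.get("url");
    if (!target) return null;
    try {
      return new URL(target).href;
    } catch {
      return null;
    }
  }
  return url.href;
}

// Keep only http(s) links, cleaned and without duplicates (first one wins).
function uniqueSources(links) {
  const seen = new Set();
  const sources = [];
  for (const { title, href } of links) {
    const url = cleanGoogleUrl(href);
    if (!url || !/^https?:/.test(url) || seen.has(url)) continue;
    seen.add(url);
    sources.push({ title, url });
  }
  return sources;
}
```

The output shape ({ title, url }) matches the sources array in the webhook data format.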

Automation System

  • Circuit Breaker Protection - Prevents infinite loops and system failures
  • Unified Container Detection - Multiple fallback strategies with debouncing
  • Enhanced Debugging - getAutomationStatus() and resetAutomation() methods
  • Robust Error Handling - Automatic fallback to manual mode
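
A circuit breaker of the kind described above can be sketched in a few lines. The thresholds, the cooldown, and the class name CircuitBreakerSketch are assumptions for illustration; automation-circuit-breaker.js may work differently.

```javascript
// Hypothetical circuit breaker: after `maxFailures` consecutive failures it
// "opens" and skips further attempts until `cooldownMs` has elapsed.
class CircuitBreakerSketch {
  constructor({ maxFailures = 3, cooldownMs = 30000, now = () => Date.now() } = {}) {
    this.maxFailures = maxFailures;
    this.cooldownMs = cooldownMs;
    this.now = now; // injectable clock, useful for testing
    this.failures = 0;
    this.openedAt = null;
  }

  isOpen() {
    if (this.openedAt === null) return false;
    if (this.now() - this.openedAt >= this.cooldownMs) {
      // Cooldown elapsed: close the breaker and allow attempts again.
      this.failures = 0;
      this.openedAt = null;
      return false;
    }
    return true;
  }

  // Run `action` unless the breaker is open; count consecutive failures.
  run(action) {
    if (this.isOpen()) return { skipped: true };
    try {
      const value = action();
      this.failures = 0;
      return { skipped: false, ok: true, value };
    } catch (error) {
      this.failures += 1;
      if (this.failures >= this.maxFailures) this.openedAt = this.now();
      return { skipped: false, ok: false, error };
    }
  }
}
```

A caller that receives { skipped: true } can fall back to manual mode, which is the behavior the list above describes.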

Webhooks

  • Automatic sending of data to external APIs via POST method
  • UI configuration - easy webhook URL setup
  • Connection testing - check if webhook works
  • Secure storage - URL saved in chrome.storage
  • Complete payload - keyword, markdown, HTML and sources
  • Error handling - 5s timeout and informative messages

Webhook data format:

{
  "timestamp": "2025-01-06T12:30:00Z",
  "searchQuery": "search keyword",
  "aiOverview": {
    "content": "markdown content",
    "htmlContent": "cleaned HTML"
  },
  "sources": [
    {"title": "Title", "url": "https://url.com"}
  ],
  "metadata": {
    "googleSearchUrl": "https://google.com/search?q=...",
    "extractedAt": "2025-01-06T12:30:00Z",
    "userAgent": "Mozilla/5.0...",
    "extensionVersion": "1.0.8"
  }
}
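
For reference, a payload in the format above could be assembled and sent roughly as follows. The field names follow the JSON shown here and the 5s timeout follows the Webhooks list. The function names buildWebhookPayload and sendWebhook, the Content-Type header, and the return shape are assumptions, not the code in webhook-manager.js.

```javascript
// Hypothetical builder: the search keyword is read from the `q` query parameter.
function buildWebhookPayload({ searchUrl, markdown, html, sources, userAgent, version, now = new Date() }) {
  const timestamp = now.toISOString();
  return {
    timestamp,
    searchQuery: new URL(searchUrl).searchParams.get("q") || "",
    aiOverview: { content: markdown, htmlContent: html },
    sources,
    metadata: {
      googleSearchUrl: searchUrl,
      extractedAt: timestamp,
      userAgent,
      extensionVersion: version,
    },
  };
}

// Hypothetical sender: POST as JSON, aborting the request after `timeoutMs`.
async function sendWebhook(webhookUrl, payload, timeoutMs = 5000) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const response = await fetch(webhookUrl, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(payload),
      signal: controller.signal,
    });
    return { ok: response.ok, status: response.status };
  } catch (error) {
    const reason = error.name === "AbortError" ? "timeout" : String(error);
    return { ok: false, status: 0, error: reason };
  } finally {
    clearTimeout(timer);
  }
}
```

Returning a result object instead of throwing keeps the caller simple: the UI can show an informative message for either a non-2xx status or a timeout.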

🐛 Troubleshooting

Button doesn't appear

  • Check if AI Overview is actually on the page
  • Open console (F12) and look for [AI Overview Extractor] logs
  • Check if #m-x-content element exists
  • Refresh page and wait for full loading

Sources not expanding automatically

  • Check console for [AutoExpanderSources] logs
  • Verify auto-expand sources is enabled in popup settings
  • Try refreshing page if button detection fails
  • Check for jsaction attribute changes in new Google updates

No content in markdown

  • AI Overview may not be fully loaded
  • Try again after a few seconds
  • Check console logs - they should show extraction process
  • Check for JavaScript errors

Copy error

  • Check if browser has clipboard permissions
  • Try downloading file instead of copying
  • Check if page is served over HTTPS

Source issues

  • Check console logs about found links
  • Some sources may be filtered (Google, support etc.)
  • URLs are automatically cleaned from Google wrappers

Automation issues

  • Use getAutomationStatus() in console to check system state
  • Use resetAutomation() to reset state machine if stuck
  • Check circuit breaker status in console logs
  • Verify settings are properly saved in popup

🔄 Updates

To update the extension:

Chrome/Chromium:

  1. Download new files
  2. Replace old files in extension folder
  3. Go to chrome://extensions/
  4. Click "Reload" on the extension

Firefox:

  1. Download new files
  2. Replace old files in extension folder
  3. Go to about:debugging
  4. Click "Reload" on the extension

📝 Changelog

v1.0.8 (current)

  • 🐛 Fixed auto-expand sources functionality - Fixed critical issue where sources were not expanding automatically
  • 🔧 Button selector bug - Fixed jsaction selector from ^="trigger" to *="trigger" to properly match jsaction="trigger.Wyhgxe"
  • ⏱️ Timing improvements - Increased delays for better DOM loading stability (1500ms, 1000ms, 3000ms)
  • 🔍 Expansion detection logic - Made areSourcesExpanded() more conservative to prevent false positives
  • 🛡️ Button clicking prevention - Added buttonClicked flag to prevent multiple clicking attempts
  • 🔧 Reliable Source Expansion - Fixed button detection with improved jsaction selectors

v1.0.7

  • 🏗️ Major Architecture Refactor - Replaced callback-based system with state machine for predictable automation flow
  • 🔧 State Machine Implementation - Clear automation states: IDLE → EXPANDING_OVERVIEW → EXPANDING_SOURCES → SENDING_WEBHOOK → COMPLETE
  • 🛡️ Circuit Breaker Protection - Added circuit breakers to prevent infinite loops and system failures
  • 🎯 Unified Container Detection - Single manager for all container detection with debouncing and multiple strategies
  • 🐛 Fixed Automation Issues - Resolved multiple button clicks, infinite loops, and unpredictable behavior
  • 📊 Enhanced Debugging - Added getAutomationStatus() and resetAutomation() methods for system monitoring
  • 🔍 Improved Reliability - Robust error handling with automatic fallback to manual mode
  • ⚙️ Modular Design - Clean separation of concerns with well-defined module interfaces

v1.0.6

  • ✨ Auto-expand AI overviews - Automatically clicks "Show more" button on collapsed AI overviews
  • 🔗 Auto-expand sources - Automatically expands collapsed source sections
  • 🚀 Auto-webhook functionality - Automatically sends extracted data to configured endpoints
  • ⚙️ Settings management - Persistent storage with Chrome storage API
  • 🎨 Extension popup interface - Modern settings panel accessible from browser toolbar
  • 🔄 Real-time synchronization - Changes applied immediately across all tabs

v1.0.5

  • 🔧 Chrome Web Store compliance - removed unnecessary activeTab permission
  • 📝 Documentation update - updated all permission justifications and descriptions
  • ✅ Verification - confirmed extension works with minimal permissions only
  • 🏪 Store ready - prepared for Chrome Web Store resubmission

v1.0.4

  • 🌍 English translation - complete interface and documentation translation
  • 🎨 UI improvements - updated button text and user messages
  • 📝 Documentation - fully translated README and user guides
  • 🧹 Code cleanup - improved console messages and comments

v1.0.3

  • 🚀 NEW: Webhooks - automatic data sending to external APIs
  • ⚙️ Webhook configuration - UI for URL setup and testing
  • 🧹 Improved cleaning - removal of CSS, JavaScript and inline styles
  • 💾 Chrome Storage - secure configuration storage
  • 🔒 HTTPS validation - webhook security
  • ⏱️ Timeout handling - error handling and timeouts (5s)

v1.0.2

  • 🔧 Stability and compatibility fixes
  • 📱 Manifest V3 support
  • 🌐 Chrome and Firefox compatibility

v1.0.1

  • 🐛 Source extraction bug fixes
  • ⚡ Performance optimizations
  • 🔍 Improved AI Overview detection

v1.0.0

  • ✨ First version
  • 📋 AI Overview extraction to Markdown with TurndownService
  • 🔗 Source extraction with Google URL cleaning
  • 🧹 Advanced content filtering (MSC, hidden elements)
  • 💾 Copy and download with timestamp
  • 🎨 User interface with notifications
  • 🔄 DOM observer for dynamic changes

🤝 Contributing

The project is open source! You can:

  • 🐛 Report bugs via Issues on GitHub
  • 💡 Suggest features
  • 🔧 Send Pull Requests
  • ⭐ Star if you like the project

GitHub: https://github.com/romek-rozen/ai-overview-extractor

👨‍💻 Author

Roman Rozenberger

📄 License

MIT License - you can use, modify and distribute for free.


Useful? Leave a ⭐ and share with others!

Created with ❤️ for SEO and digital marketing community.
