Yandex Search scraper offering a free Python tool for small-scale use and a powerful API for high-volume, real-time SERP data extraction.

luminati-io/yandex-api

Yandex Search Scraper

This repository offers two reliable solutions for extracting data from Yandex Search Engine Results Pages (SERPs):

  • Free Yandex Scraper: A basic tool for scraping Yandex Search Results at small scale
  • Enterprise-grade Yandex SERP API: A scalable, production-ready solution for high-volume, real-time data extraction (part of Bright Data's SERP Scraper API)

Free Yandex SERP Scraper

The free scraper provides a straightforward way to collect Yandex SERP data at a small scale. It's perfect for developers needing limited data for personal projects, research, or testing purposes.

free-yandex-serp-scraper

Setup Requirements

  • Python 3.9+
  • Required packages:
    • playwright for browser automation
    • BeautifulSoup for HTML parsing
pip install playwright beautifulsoup4
playwright install

New to web scraping? Explore our Beginner's Guide to Web Scraping with Python

Quick Start Guide

  1. Open yandex-search-results-scraper.py
  2. Customize the search terms and page count variables:
PAGES_PER_TERM = {
    "ergonomic office chair": 2,
}
  3. Run the script
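
As a rough illustration of what the PAGES_PER_TERM mapping drives, the sketch below expands each term into one Yandex search URL per page. It is not the repository's script: the helper name build_search_urls and the BASE_URL constant are assumptions made for this example, and the text and p query parameters are the ones described in the query parameter section of this README.

```python
from urllib.parse import urlencode

# Assumed base URL, copied from the API examples in this README.
BASE_URL = "https://www.yandex.com/search/"

PAGES_PER_TERM = {
    "ergonomic office chair": 2,
}


def build_search_urls(pages_per_term):
    """Return one search URL per (term, page) pair. Hypothetical helper."""
    urls = []
    for term, page_count in pages_per_term.items():
        for page_index in range(page_count):  # p is zero-based
            query = urlencode({"text": term, "p": page_index})
            urls.append(f"{BASE_URL}?{query}")
    return urls


for url in build_search_urls(PAGES_PER_TERM):
    print(url)
```

With the mapping above, the loop yields two URLs, one for p=0 and one for p=1.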

Sample Output

yandex-scraper-output

Limitations

One of the biggest challenges when scraping Yandex is its aggressive CAPTCHA protection:

yandex-captcha-challenge

Yandex uses a strict and constantly evolving anti-bot system to prevent automated data extraction. Frequent CAPTCHA triggers can quickly lead to IP blocks, making it tough to maintain stable, long-running scrapers.

While the free scraper handles basic tasks, it has several important limitations:

  • High risk of IP blocking
  • Limited request volume
  • Constant CAPTCHA interruptions
  • Not suitable for production environments
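
To make these limitations concrete, here is a minimal sketch of how a small scraper might notice a CAPTCHA redirect and slow down before retrying. Everything in it is an assumption for illustration: the "showcaptcha" path marker is a heuristic rather than documented Yandex behavior, and the helper names and delay values are hypothetical. Backing off reduces request pressure but does not solve CAPTCHAs or prevent IP blocks.

```python
from urllib.parse import urlparse


def looks_like_captcha(final_url):
    """Heuristic: assume challenge pages live under a 'showcaptcha' path."""
    return "showcaptcha" in urlparse(final_url).path.lower()


def backoff_delay(attempt, base=2.0, cap=60.0):
    """Exponential backoff in seconds: base * 2**attempt, limited to cap."""
    return min(cap, base * (2 ** attempt))


# Example: decide how long to wait after the third consecutive challenge.
if looks_like_captcha("https://yandex.com/showcaptcha?retpath=example"):
    print(f"CAPTCHA suspected, waiting {backoff_delay(2)} seconds")
```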

For a scalable and stable solution, consider Bright Data's dedicated API detailed below. 👇

Yandex SERP Scraper API

The Yandex Search API is part of Bright Data’s SERP Scraping API suite. It leverages our industry-leading proxy infrastructure to deliver real-time Yandex search results with a single API call.

Key Benefits

  • Global Accuracy: Get tailored results for specific locations worldwide
  • Pay-Per-Success: Only pay for successful requests
  • Real-Time Data: Access up-to-date search results in seconds
  • Unlimited Scalability: Handle high-volume scraping effortlessly
  • Cost-Efficient: Eliminates the need for costly infrastructure
  • Reliable Performance: Built-in anti-blocking technology
  • 24/7 Expert Support: Access to technical assistance whenever needed

📌 Try Before You Buy: Test it for free in our SERP API Live Demo

bright-data-serp-api-playground

Getting Started

  1. Create a Bright Data account (new users receive a $5 credit)
  2. Generate your API key
  3. Follow our step-by-step guide to configure the SERP API

Implementation Methods

Direct API Access

The simplest way to use the API is by making a direct request to Bright Data's API endpoint.

cURL Example:

curl https://api.brightdata.com/request \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer API_TOKEN" \
  -d '{
        "zone": "ZONE_NAME",
        "url": "https://www.yandex.com/search/?text=apple+watch+series+10+review&lr=95&lang=en",
        "format": "raw"
      }'

Python Example:

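import requests

# Bright Data's direct API endpoint
url = "https://api.brightdata.com/request"

# Replace API_TOKEN with your Bright Data API key
headers = {"Content-Type": "application/json", "Authorization": "Bearer API_TOKEN"}

# Replace ZONE_NAME with the name of your SERP API zone
payload = {
    "zone": "ZONE_NAME",
    "url": "https://www.yandex.com/search/?text=apple+watch+series+10+review&lr=95&lang=en",
    "format": "raw",
}

response = requests.post(url, headers=headers, json=payload)
response.raise_for_status()  # Fail on an HTTP error instead of saving an error page

# Save the returned page for inspection
with open("yandex-scraper-api-result.html", "w", encoding="utf-8") as file:
    file.write(response.text)

print("Response saved!")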
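
If you build the search URL from user input rather than hard-coding it, the query text needs to be URL-encoded before it goes into the payload. The sketch below shows one way to assemble the same request body with the standard library. The helper name build_request_body is hypothetical, and the body fields are simply the three shown in the examples above (zone, url, format).

```python
import json
from urllib.parse import urlencode


def build_request_body(zone, search_text, **yandex_params):
    """Assemble the JSON body used in the examples above. Hypothetical helper."""
    # urlencode turns spaces into '+' and escapes reserved characters.
    query = urlencode({"text": search_text, **yandex_params})
    return {
        "zone": zone,
        "url": f"https://www.yandex.com/search/?{query}",
        "format": "raw",
    }


body = build_request_body(
    "ZONE_NAME", "apple watch series 10 review", lr=95, lang="en"
)
print(json.dumps(body, indent=2))
```

The resulting dictionary can be passed as the json argument of requests.post in place of the hard-coded payload.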

Native Proxy-Based Access

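This alternative method sends the request to the Yandex URL itself and routes it through Bright Data's proxy endpoint, with your zone credentials supplied as proxy authentication.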

cURL Example:

curl -i \
  --proxy brd.superproxy.io:33335 \
  --proxy-user brd-customer-<CUSTOMER_ID>-zone-<ZONE_NAME>:<ZONE_PASSWORD> \
  -k \
  "https://www.yandex.com/search/?text=apple+watch+series+10+review&lr=95&lang=en"

Python Example:

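import requests
import urllib3

# verify=False below skips certificate checks, so silence the resulting warning
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

# Bright Data proxy endpoint and zone credentials (replace the placeholders)
host = "brd.superproxy.io"
port = 33335
username = "brd-customer-<customer_id>-zone-<zone_name>"
password = "<zone_password>"
proxy_url = f"http://{username}:{password}@{host}:{port}"

# Route both HTTP and HTTPS traffic through the proxy
proxies = {"http": proxy_url, "https": proxy_url}

url = "https://www.yandex.com/search/?text=apple+watch+series+10+review&lr=95&lang=en"
response = requests.get(url, proxies=proxies, verify=False)
response.raise_for_status()  # Fail on an HTTP error instead of saving an error page

# Save the returned page for inspection
with open("yandex-scraper-api-result.html", "w", encoding="utf-8") as file:
    file.write(response.text)

print("Response saved!")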

Note: When using the native proxy approach, it's recommended to install Bright Data's SSL certificate for production use. Learn more in the SSL Certificate Guide.

👉 See the full HTML output

The query parameters like lr and lang are explained in the next section.

Yandex Search Query Parameters

Localization

Region (lr)

This parameter defines which geographic region or country to target for search results.

Region            Code
Moscow            1
Saint-Petersburg  2
USA               84
Canada            95
China             134

Example - Check how "best wireless earbuds" ranks in the USA:

curl --proxy brd.superproxy.io:33335 \
     --proxy-user brd-customer-<id>-zone-<zone>:<password> \
     "https://www.yandex.com/search/?text=best+wireless+earbuds&lr=84"

Language (lang)

Sets the language preference using two-letter language codes:

  • lang=en - English
  • lang=es - Spanish
  • lang=fr - French

Example - Get sports news in Spanish:

https://www.yandex.com/search/?text=local+sports+news&lang=es

Pagination

Page Number (p)

Controls which page of results to display:

  • p=0 - First page (default)
  • p=1 - Second page
  • p=4 - Fifth page

Each Yandex SERP page typically returns 10 results.

Example - Scrape page 3 (results 21-30) for "nike running shoes":

https://www.yandex.com/search/?text=nike+running+shoes&p=2
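
Because p is zero-based, it is easy to be off by one. The arithmetic can be sketched as follows, assuming exactly 10 results per page (the README only says "typically", so treat the ranges as approximate). The function names are hypothetical.

```python
RESULTS_PER_PAGE = 10  # Assumption: "typically 10" per the text above


def page_to_p(page_number):
    """Convert a 1-based page number to the zero-based p parameter."""
    if page_number < 1:
        raise ValueError("page_number must be 1 or greater")
    return page_number - 1


def result_range(p):
    """Return the (first, last) result positions expected on page index p."""
    first = p * RESULTS_PER_PAGE + 1
    last = (p + 1) * RESULTS_PER_PAGE
    return first, last


p = page_to_p(3)
print(p, result_range(p))  # prints: 2 (21, 30)
```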

Time Range

Time Period (within)

Limits results to a specific time period:

  • within=77 - Results from the past 24 hours
  • within=1 - Results from the past 2 weeks
  • within=[%pm] - Results from the past month

Example - Get "iPhone 15 review" results from the past 24 hours:

https://www.yandex.com/search/?text=iphone+15+review&within=77

Device Targeting

Device Type (brd_mobile)

Specifies which device type to simulate:

  • brd_mobile=0 or omitted - Random desktop user-agent
  • brd_mobile=1 - Random mobile user-agent
  • brd_mobile=ios or brd_mobile=iphone - iPhone user-agent
  • brd_mobile=ipad or brd_mobile=ios_tablet - iPad user-agent
  • brd_mobile=android - Android phone user-agent
  • brd_mobile=android_tablet - Android tablet user-agent

Example - Simulate an iPhone searching for responsive website testing:

https://www.yandex.com/search/?text=responsive+website+testing&brd_mobile=ios

Browser Type (brd_browser)

Defines which browser to simulate:

  • Default (omitted) - Random browser
  • brd_browser=chrome - Google Chrome
  • brd_browser=safari - Safari
  • brd_browser=firefox - Mozilla Firefox

Example - Simulate Safari browser searching for Python tutorials:

https://www.yandex.com/search/?text=how+to+learn+python&brd_browser=safari

Note: Don't combine brd_browser=firefox with brd_mobile=1 as they're incompatible.
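
A small validation step can catch the incompatible combination before a request is sent. The sketch below is an assumption-laden illustration: the helper name device_params is hypothetical, the accepted values are only those listed in this section, and it rejects only the one combination the note above names (brd_browser=firefox with brd_mobile=1).

```python
MOBILE_VALUES = {
    "0", "1", "ios", "iphone", "ipad", "ios_tablet", "android", "android_tablet",
}
BROWSER_VALUES = {"chrome", "safari", "firefox"}


def device_params(mobile=None, browser=None):
    """Return validated brd_mobile / brd_browser parameters. Hypothetical helper."""
    params = {}
    if mobile is not None:
        mobile = str(mobile)  # Accept 0 / 1 given as integers
        if mobile not in MOBILE_VALUES:
            raise ValueError(f"unsupported brd_mobile value: {mobile}")
        params["brd_mobile"] = mobile
    if browser is not None:
        if browser not in BROWSER_VALUES:
            raise ValueError(f"unsupported brd_browser value: {browser}")
        params["brd_browser"] = browser
    # The only combination this README documents as incompatible.
    if params.get("brd_mobile") == "1" and params.get("brd_browser") == "firefox":
        raise ValueError("brd_browser=firefox cannot be combined with brd_mobile=1")
    return params


print(device_params(mobile="ios", browser="safari"))
```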

Practical Example

For comprehensive targeting, you can combine multiple parameters:

https://www.yandex.com/search/?text=organic+skincare+products
&lr=95
&lang=en
&p=2
&within=1
&brd_mobile=ios
&brd_browser=safari

This search:

  • Targets Canadian users (lr=95)
  • Shows English results (lang=en)
  • Displays the third page (p=2, since numbering starts at p=0)
  • Limits to the past 2 weeks (within=1)
  • Simulates an iPhone user (brd_mobile=ios)
  • Uses Safari browser (brd_browser=safari)

This combination suits, for example, a skincare company researching recent organic product trends in the Canadian market as seen by iOS mobile users.
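
To double-check what a combined URL actually requests, you can parse it back into its parameters. The sketch below does this with the standard library for the URL above, joined onto one line. The variable names are illustrative only, and the page calculation relies on p being zero-based as described in the Pagination section.

```python
from urllib.parse import parse_qsl, urlsplit

url = (
    "https://www.yandex.com/search/?text=organic+skincare+products"
    "&lr=95&lang=en&p=2&within=1&brd_mobile=ios&brd_browser=safari"
)

# parse_qsl decodes '+' back into spaces and returns (name, value) pairs.
params = dict(parse_qsl(urlsplit(url).query))

# p is zero-based, so add 1 to get the human-readable page number.
page_number = int(params["p"]) + 1

for name, value in params.items():
    print(f"{name} = {value}")
print(f"page shown: {page_number}")
```

Running this confirms that p=2 corresponds to page 3 of the results.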

Support & Resources
