GitHub - laurden/cleaner-adblock: detect dead and redirected domains from adblock lists

This project is currently under development and not ready for production use.

Several features and edge cases still require further refinement and testing.

This repository combines the ryanbr/cleaner-adblock project with a separate project developed independently by the author of this fork. It is not intended to compete with or replace the original project.

The goal of this effort is to collaboratively enhance, extend and learn from the existing work, encouraging community-driven improvement and maintaining the open-source spirit of shared development and knowledge.

Important

All content below this line is the README file of the upstream repository. It has been preserved for reference and attribution to the original author, @ryanbr, without modification. This will change in future commits to this repository.

Minimal Domain Scanner

A Node.js tool that scans adblock filter lists to identify dead domains and redirecting domains, helping maintain clean and efficient filter lists.

Overview

This tool parses adblock filter lists, checks the status of domains found in various rule types, and categorizes them into:

Dead domains - Domains that don't resolve or return errors (should be removed)
Redirecting domains - Domains that redirect to different domains (should be reviewed)

Features

Multiple Rule Format Support: Handles uBlock Origin, AdGuard, and network rules
Concurrent Processing: Checks multiple domains simultaneously for speed
Smart Domain Variants: Optionally checks both domain.com and www.domain.com
Similar Domain Filtering: Can ignore redirects to subdomains of the same base domain
Comprehensive Error Handling: Detects DNS failures, timeouts, and HTTP errors
Debug Modes: Various debug levels for troubleshooting
Test Mode: Quick testing on a subset of domains

Installation

Prerequisites

Node.js (v14 or higher recommended)
npm (comes with Node.js)

Setup

# Clone or download the repository
git clone <your-repo-url>
cd <repo-directory>

# Install dependencies
npm install puppeteer

Usage

Basic Usage

node cleaner-adblock.js

This will scan the default file (easylist_specific_hide.txt) and generate two output files: dead_domains.txt and redirect_domains.txt.

Command-Line Options

node cleaner-adblock.js [options]

Input Options

--input=<file> - Specify input file to scan (default: easylist_specific_hide.txt)

Domain Checking Options

--add-www - Check both domain.com and www.domain.com for bare domains
--ignore-similar - Ignore redirects to subdomains of the same base domain

Debug Options

--debug - Enable basic debug output
--debug-verbose - Enable verbose debug output
--debug-network - Log network requests/responses
--debug-browser - Log browser events
--debug-all - Enable all debug options

Testing Options

--test-mode - Only test first 5 domains (quick testing)
--test-count=N - Only test first N domains

Help

--help or -h - Show help message

Examples

# Scan custom filter list
node cleaner-adblock.js --input=my_rules.txt

# Check both domain.com and www.domain.com variants
node cleaner-adblock.js --add-www

# Ignore subdomain redirects (reduces noise)
node cleaner-adblock.js --ignore-similar

# Combine options
node cleaner-adblock.js --input=my_rules.txt --add-www --ignore-similar

# Debug mode for troubleshooting
node cleaner-adblock.js --debug --test-mode

# Test first 10 domains with full debugging
node cleaner-adblock.js --debug-all --test-count=10
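
The options above could be read from the command line roughly as follows. This is a minimal sketch, not the tool's actual parser: the function name `parseArgs` and the shape of the returned object are assumptions made for illustration, and only a subset of the documented flags is handled.

```javascript
// Hypothetical option parser for a subset of the documented flags.
// Defaults mirror the README: default input file, --test-mode = first 5 domains.
function parseArgs(argv) {
  const options = {
    input: 'easylist_specific_hide.txt',
    addWww: false,
    ignoreSimilar: false,
    testCount: null, // null means "check every domain"
    debug: false,
  };

  for (const arg of argv) {
    if (arg.startsWith('--input=')) {
      options.input = arg.slice('--input='.length);
    } else if (arg === '--add-www') {
      options.addWww = true;
    } else if (arg === '--ignore-similar') {
      options.ignoreSimilar = true;
    } else if (arg === '--test-mode') {
      options.testCount = 5;
    } else if (arg.startsWith('--test-count=')) {
      const n = Number.parseInt(arg.slice('--test-count='.length), 10);
      if (Number.isInteger(n) && n > 0) options.testCount = n;
    } else if (arg === '--debug') {
      options.debug = true;
    }
    // The other --debug-* flags and --help are omitted for brevity.
  }
  return options;
}
```

In a script this would typically be called as `parseArgs(process.argv.slice(2))`.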

Supported Rule Types

uBlock Origin / Cosmetic Rules

domain.com##.selector           # Element hiding
domain.com##+js(scriptlet)      # Scriptlet injection
domain.com#@#.selector          # Exception rule

AdGuard Rules

domain.com##selector            # Element hiding
domain.com#@#selector           # Exception
domain.com#$#selector           # CSS injection
domain.com#%#//scriptlet(...)   # Scriptlet
domain.com#?#selector           # Extended CSS
domain.com#@$?#selector         # Extended CSS exception
domain1.com,domain2.com##selector  # Multiple domains
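
For the cosmetic rule forms listed above, the domains are the comma-separated part before the separator. The sketch below illustrates one way to pull them out; `extractCosmeticDomains` and `COSMETIC_SEPARATORS` are hypothetical names, and the real parser may treat edge cases differently.

```javascript
// Separators taken from the rule examples above.
const COSMETIC_SEPARATORS = ['##', '#@#', '#$#', '#%#', '#?#', '#@$?#'];

// Hypothetical helper: return the domains that precede a cosmetic separator.
function extractCosmeticDomains(rule) {
  const line = rule.trim();
  if (line === '' || line.startsWith('!')) return []; // blank line or comment

  // Find the earliest separator; everything before it is the domain list.
  let cut = -1;
  for (const separator of COSMETIC_SEPARATORS) {
    const index = line.indexOf(separator);
    if (index > 0 && (cut === -1 || index < cut)) cut = index;
  }
  if (cut === -1) return []; // generic rule (no domains) or not a cosmetic rule

  return line
    .slice(0, cut)
    .split(',')
    // A leading "~" marks an excluded domain in filter syntax. Stripping it so
    // the domain can still be checked is an assumption of this sketch.
    .map((entry) => entry.trim().replace(/^~/, '').toLowerCase())
    .filter((entry) => entry !== '');
}
```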

Network Rules

/path$script,domain=example.com
||domain.com^$script,domain=site1.com|site2.com

Extracts domains from the domain= parameter.
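
As an illustration of the domain= extraction, the following sketch splits the options after the last `$` and reads the pipe-separated list. The name `extractNetworkDomains` is hypothetical, and rules whose pattern or option values themselves contain `$` or `,` are not handled here.

```javascript
// Hypothetical helper: collect domains from the domain= option of a network rule.
function extractNetworkDomains(rule) {
  const optionsStart = rule.lastIndexOf('$');
  if (optionsStart === -1) return []; // rule has no options

  const domains = [];
  for (const option of rule.slice(optionsStart + 1).split(',')) {
    if (!option.startsWith('domain=')) continue;
    for (const entry of option.slice('domain='.length).split('|')) {
      // "~" marks an excluded domain; stripping it is an assumption of this sketch.
      const domain = entry.trim().replace(/^~/, '').toLowerCase();
      if (domain !== '') domains.push(domain);
    }
  }
  return domains;
}
```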

Output Files

`dead_domains.txt`

Contains domains that should be removed from filter lists because a check ended in one of the following:

HTTP 404, 410, 5xx errors
DNS resolution failures
Connection timeouts
Network errors

Format:

# Dead/Non-Existent Domains
# Generated: 2025-11-08T10:30:00.000Z
# Total found: 15

example-dead.com # ERR_NAME_NOT_RESOLVED
old-site.net # 404 Not Found
timeout-site.org # Navigation timeout
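
A report in this format could be assembled along these lines. This is a sketch under assumptions: `formatDeadReport` and the `{ domain, reason }` entry shape are illustrative names, not taken from the tool.

```javascript
// Hypothetical formatter that reproduces the dead_domains.txt layout shown above.
function formatDeadReport(entries, generatedAt = new Date()) {
  const header = [
    '# Dead/Non-Existent Domains',
    `# Generated: ${generatedAt.toISOString()}`,
    `# Total found: ${entries.length}`,
    '', // blank line between header and entries
  ];
  const body = entries.map(({ domain, reason }) => `${domain} # ${reason}`);
  return [...header, ...body].join('\n') + '\n';
}
```

The returned string can then be written to disk, for example with `fs.writeFileSync('dead_domains.txt', text)`.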

`redirect_domains.txt`

Contains domains that redirect to a different domain. Review these for potential rule updates.

Format:

# Redirecting Domains
# Generated: 2025-11-08T10:30:00.000Z
# Total found: 8

old-domain.com → new-domain.com # https://new-domain.com/
example.org → example.com # https://example.com/
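
When post-processing this file, for instance to rewrite rules for the new domain, each entry can be split back into its parts. The parser below is a sketch based only on the sample lines above; `parseRedirectLine` and its return shape are hypothetical.

```javascript
// Hypothetical parser for one line of redirect_domains.txt.
// Returns null for header comments, blank lines, and lines that do not match.
function parseRedirectLine(line) {
  const text = line.trim();
  if (text === '' || text.startsWith('#')) return null;

  // <original domain> → <final domain> # <final URL>
  const match = text.match(/^(\S+)\s*→\s*(\S+)\s*#\s*(\S+)$/);
  if (!match) return null;

  const [, from, to, finalUrl] = match;
  return { from, to, finalUrl };
}
```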

How It Works

Parse Input File: Extracts unique domains from various filter rule formats
Validate Domains: Filters out .onion domains, IP addresses, and localhost
Expand Variants: Optionally creates domain variants with/without www
Browser-Based Checking: Uses Puppeteer to:
- Navigate to each domain
- Follow redirects
- Detect DNS failures
- Handle HTTP errors
- Capture timeouts
Categorize Results: Separates dead domains from redirecting domains
Generate Reports: Creates organized output files with explanations
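
The validation step can be pictured with a small filter like the one below. It is a sketch, assuming that "IP addresses" covers IPv4 literals and anything containing a colon; `isCheckableDomain` and `uniqueCheckableDomains` are hypothetical names.

```javascript
// Matches dotted-quad IPv4 literals such as 127.0.0.1 (no range check on octets).
const IPV4_PATTERN = /^\d{1,3}(\.\d{1,3}){3}$/;

// Hypothetical predicate: true when a host is worth sending to the browser check.
function isCheckableDomain(domain) {
  const host = domain.trim().toLowerCase();
  if (host === '' || host === 'localhost') return false;
  if (host.endsWith('.onion')) return false; // not reachable without Tor
  if (IPV4_PATTERN.test(host)) return false; // IPv4 literal
  if (host.includes(':')) return false; // IPv6 literal or host:port
  return host.includes('.'); // require at least one dot
}

// De-duplicate (case-insensitively) and drop anything that fails validation.
function uniqueCheckableDomains(domains) {
  const normalized = domains.map((d) => d.trim().toLowerCase());
  return [...new Set(normalized)].filter(isCheckableDomain);
}
```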

Configuration

Default settings (can be modified in the code):

const TIMEOUT = 25000;              // Page load timeout (25 seconds)
const FORCE_CLOSE_TIMEOUT = 60000;  // Force-close timeout (60 seconds)
const CONCURRENCY = 12;              // Concurrent domain checks
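
To show what a CONCURRENCY limit means in practice, here is one common way to cap the number of checks in flight. It is a sketch rather than the tool's implementation: `runWithConcurrency` and the `worker` callback are hypothetical, and the worker stands in for whatever performs the per-domain check.

```javascript
const CONCURRENCY = 12; // same default as above

// Run worker(item) for every item, with at most `limit` calls in flight.
// Results are returned in the same order as the input items.
async function runWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // index of the next unclaimed item

  // Each lane repeatedly claims the next item until none remain.
  async function lane() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index]);
    }
  }

  const laneCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}
```

A call such as `runWithConcurrency(domains, CONCURRENCY, checkDomain)` would then process the list, where `checkDomain` is a placeholder for the actual check. Lowering the limit trades speed for lower memory use, since fewer pages are open at once.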

Special Features

`--add-www` Behavior

domain.com → checks both domain.com AND www.domain.com
If either works, domain is marked as active
sub.domain.com → only checks sub.domain.com (no www added)
www.domain.com → only checks www.domain.com (already has www)
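
The three cases above can be expressed as a small function. This sketch assumes that a "bare" domain is one with exactly two labels, which misclassifies names under multi-part suffixes such as example.co.uk; `expandWwwVariants`, `isAnyVariantAlive`, and the `alive` field are hypothetical names.

```javascript
// Hypothetical helper: list the hostnames to check for one domain under --add-www.
function expandWwwVariants(domain) {
  const labels = domain.split('.');
  const isBare = labels.length === 2; // assumption: bare = exactly two labels
  if (domain.startsWith('www.') || !isBare) return [domain];
  return [domain, `www.${domain}`];
}

// "If either works, domain is marked as active":
// variantResults is assumed to be an array of { alive: boolean } objects.
function isAnyVariantAlive(variantResults) {
  return variantResults.some((result) => result.alive);
}
```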

`--ignore-similar` Behavior

Reduces noise from internal subdomain redirects:

example.com → sub.example.com (ignored - same base domain)
example.com → different.com (flagged - different domain)

Useful for sites that redirect to CDN or regional subdomains.
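
A simple version of this comparison is sketched below. It approximates the base domain as the last two labels of the hostname, which is an assumption; a robust implementation would consult the Public Suffix List so that hosts under suffixes like co.uk are compared correctly. `baseDomain` and `isSimilarRedirect` are hypothetical names.

```javascript
// Naive base-domain guess: the last two labels of the hostname.
function baseDomain(hostname) {
  return hostname.toLowerCase().split('.').slice(-2).join('.');
}

// True when a redirect stays within the same base domain and can be ignored.
function isSimilarRedirect(fromHost, toHost) {
  return baseDomain(fromHost) === baseDomain(toHost);
}
```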

Error Handling

The tool handles various error scenarios:

DNS failures (ERR_NAME_NOT_RESOLVED)
Connection errors (ERR_CONNECTION_REFUSED, ERR_CONNECTION_TIMED_OUT)
HTTP status codes (404, 410, 5xx)
SSL/Certificate errors (automatically ignored)
Page load timeouts
Navigation errors
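
The scenarios above could be mapped to a dead/alive decision as in the following sketch. The function `classifyResult` and its input shape are hypothetical. It assumes error messages carry Chromium-style codes such as `net::ERR_NAME_NOT_RESOLVED`, and it reads "automatically ignored" as meaning that certificate and SSL errors do not mark a domain as dead.

```javascript
// Hypothetical classifier: decide whether a check result means the domain is dead.
// status: HTTP status code or null; errorMessage: navigation error text or ''.
function classifyResult({ status = null, errorMessage = '' }) {
  const networkError = errorMessage.match(/ERR_[A-Z0-9_]+/);
  if (networkError) {
    const code = networkError[0];
    // Assumption: certificate/SSL problems are ignored rather than treated as dead.
    if (code.startsWith('ERR_CERT_') || code.startsWith('ERR_SSL_')) {
      return { dead: false, reason: null };
    }
    return { dead: true, reason: code };
  }

  if (/timeout/i.test(errorMessage)) {
    return { dead: true, reason: 'Navigation timeout' };
  }

  const isServerError = status >= 500 && status <= 599;
  if (status === 404 || status === 410 || isServerError) {
    return { dead: true, reason: `HTTP ${status}` };
  }

  return { dead: false, reason: null };
}
```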

Troubleshooting

Issue: "Cannot find module 'puppeteer'"

npm install puppeteer

Issue: Browser fails to launch

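Try passing extra Chromium flags through the args option of puppeteer.launch() in the code, for example:

const browser = await puppeteer.launch({
  args: [
    '--no-sandbox',             // run Chromium without its sandbox (often needed in containers)
    '--disable-setuid-sandbox', // skip the setuid sandbox helper
    '--disable-dev-shm-usage'   // use /tmp instead of /dev/shm for shared memory
  ]
});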

Issue: Too many timeouts

Increase the timeout value:

const TIMEOUT = 35000; // 35 seconds

Issue: Running out of memory

Reduce concurrency:

const CONCURRENCY = 6; // Lower concurrency

Performance Tips

Use --test-mode first to verify everything works
Adjust CONCURRENCY based on your system resources
Use --ignore-similar to reduce false positives
Monitor system resources during large scans
Consider splitting very large filter lists

Use Cases

Filter List Maintenance: Identify outdated domains in adblock lists
List Optimization: Remove dead domains to reduce list size
Rule Updates: Find domains that need rule updates due to redirects
Quality Assurance: Validate filter lists before distribution
Domain Research: Analyze domain status across multiple filter lists

License

[Specify your license here]

Contributing

Contributions are welcome! Please:

Fork the repository
Create a feature branch
Make your changes
Submit a pull request

Acknowledgments

Built with Puppeteer for reliable browser automation and domain checking.

Support

For issues, questions, or suggestions, please open an issue on GitHub.
