Several features and edge cases still require further refinement and testing.
This repository combines the ryanbr/cleaner-adblock project with a separate project independently developed by the author of this fork. It is not intended to compete with or replace the original project.
The goal of this effort is to collaboratively enhance, extend, and learn from the existing work, encouraging community-driven improvement and maintaining the open-source spirit of shared development and knowledge.
> **Important**
>
> All content below this line is the README file of the upstream repository.
> It has been preserved for reference and attribution to the original author, @ryanbr, without modification.
> This will change in future commits to this repository.
A Node.js tool that scans adblock filter lists to identify dead domains and redirecting domains, helping maintain clean and efficient filter lists.
This tool parses adblock filter lists, checks the status of domains found in various rule types, and categorizes them into:
- Dead domains - Domains that don't resolve or return errors (should be removed)
- Redirecting domains - Domains that redirect to different domains (should be reviewed)
- Multiple Rule Format Support: Handles uBlock Origin, Adguard, and network rules
- Concurrent Processing: Checks multiple domains simultaneously for speed
- Smart Domain Variants: Optionally checks both `domain.com` and `www.domain.com`
- Similar Domain Filtering: Can ignore redirects to subdomains of the same base domain
- Comprehensive Error Handling: Detects DNS failures, timeouts, HTTP errors
- Debug Modes: Various debug levels for troubleshooting
- Test Mode: Quick testing on a subset of domains
- Node.js (v14 or higher recommended)
- npm (comes with Node.js)
```bash
# Clone or download the repository
git clone <your-repo-url>
cd <repo-directory>

# Install dependencies
npm install puppeteer
```

```bash
node cleaner-adblock.js
```

This will scan the default file (`easylist_specific_hide.txt`) and generate two output files.
```bash
node cleaner-adblock.js [options]
```

- `--input=<file>` - Specify input file to scan (default: `easylist_specific_hide.txt`)
- `--add-www` - Check both `domain.com` and `www.domain.com` for bare domains
- `--ignore-similar` - Ignore redirects to subdomains of same base domain
- `--debug` - Enable basic debug output
- `--debug-verbose` - Enable verbose debug output
- `--debug-network` - Log network requests/responses
- `--debug-browser` - Log browser events
- `--debug-all` - Enable all debug options
- `--test-mode` - Only test first 5 domains (quick testing)
- `--test-count=N` - Only test first N domains
- `--help` or `-h` - Show help message
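A minimal sketch of how the flags above could be parsed from `process.argv`; the real script's argument handling may differ, and the option names are taken directly from the list above:

```javascript
// Illustrative flag parser for the options documented above.
// Defaults mirror the documented defaults; everything else is an assumption.
function parseArgs(argv) {
  const opts = {
    input: 'easylist_specific_hide.txt',
    addWww: false,
    ignoreSimilar: false,
    testCount: null, // null = test all domains
  };
  for (const arg of argv) {
    if (arg.startsWith('--input=')) opts.input = arg.slice('--input='.length);
    else if (arg === '--add-www') opts.addWww = true;
    else if (arg === '--ignore-similar') opts.ignoreSimilar = true;
    else if (arg === '--test-mode') opts.testCount = 5;
    else if (arg.startsWith('--test-count=')) opts.testCount = Number(arg.slice('--test-count='.length));
  }
  return opts;
}
```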
```bash
# Scan custom filter list
node cleaner-adblock.js --input=my_rules.txt

# Check both domain.com and www.domain.com variants
node cleaner-adblock.js --add-www

# Ignore subdomain redirects (reduces noise)
node cleaner-adblock.js --ignore-similar

# Combine options
node cleaner-adblock.js --input=my_rules.txt --add-www --ignore-similar

# Debug mode for troubleshooting
node cleaner-adblock.js --debug --test-mode

# Test first 10 domains with full debugging
node cleaner-adblock.js --debug-all --test-count=10
```

```
domain.com##.selector              # Element hiding
domain.com##+js(scriptlet)         # Scriptlet injection
domain.com#@#.selector             # Exception rule
domain.com##selector               # Element hiding
domain.com#@#selector              # Exception
domain.com#$#selector              # CSS injection
domain.com#%#//scriptlet(...)      # Scriptlet
domain.com#?#selector              # Extended CSS
domain.com#@$?#selector            # Extended CSS exception
domain1.com,domain2.com##selector  # Multiple domains
```

```
/path$script,domain=example.com
||domain.com^$script,domain=site1.com|site2.com
```
Extracts domains from the domain= parameter.
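As a hedged sketch of extracting domains from the rule shapes shown above (the regexes are illustrative; the tool's own parser may be more thorough):

```javascript
// Illustrative domain extraction for cosmetic and network rules.
// Handles the comma-separated domain prefix of cosmetic rules and the
// pipe-separated domain= option of network rules.
function extractDomains(line) {
  const domains = new Set();

  // Cosmetic rules: "d1.com,d2.com" before separators like ##, #@#, #$#, #%#, #?#, #@$?#
  const cosmetic = line.match(/^([a-z0-9.,-]+)#[@$%?]*#/i);
  if (cosmetic) {
    for (const d of cosmetic[1].split(',')) domains.add(d.trim());
  }

  // Network rules: domains in the domain= option, separated by |
  const network = line.match(/domain=([^,\s]+)/);
  if (network) {
    for (const d of network[1].split('|')) domains.add(d.replace(/^~/, ''));
  }

  return [...domains];
}
```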
Contains domains that should be removed from filter lists:
- HTTP 404, 410, 5xx errors
- DNS resolution failures
- Connection timeouts
- Network errors
Format:
```
# Dead/Non-Existent Domains
# Generated: 2025-11-08T10:30:00.000Z
# Total found: 15

example-dead.com   # ERR_NAME_NOT_RESOLVED
old-site.net       # 404 Not Found
timeout-site.org   # Navigation timeout
```
Contains domains that redirect to different domains (review for potential rule updates):
Format:
```
# Redirecting Domains
# Generated: 2025-11-08T10:30:00.000Z
# Total found: 8

old-domain.com → new-domain.com  # https://new-domain.com/
example.org → example.com        # https://example.com/
```
- Parse Input File: Extracts unique domains from various filter rule formats
- Validate Domains: Filters out .onion domains, IP addresses, and localhost
- Expand Variants: Optionally creates domain variants with/without www
- Browser-Based Checking: Uses Puppeteer to:
  - Navigate to each domain
  - Follow redirects
  - Detect DNS failures
  - Handle HTTP errors
  - Capture timeouts
- Categorize Results: Separates dead domains from redirecting domains
- Generate Reports: Creates organized output files with explanations
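The concurrent checking step above can be sketched as a small worker pool. This is a generic concurrency limiter, not the tool's actual code; `worker` stands in for the Puppeteer-driven per-domain check:

```javascript
// Run `worker(item)` over all items with at most `limit` in flight.
// Results are returned in input order. Because JavaScript is
// single-threaded, the shared `next` counter needs no locking.
async function runWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;
  async function lane() {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i]);
    }
  }
  const lanes = Array.from({ length: Math.min(limit, items.length) }, lane);
  await Promise.all(lanes);
  return results;
}
```

With `CONCURRENCY = 12`, the tool would effectively run twelve such lanes at once, each pulling the next unchecked domain.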
Default settings (can be modified in the code):
```javascript
const TIMEOUT = 25000;             // Page load timeout (25 seconds)
const FORCE_CLOSE_TIMEOUT = 60000; // Force-close timeout (60 seconds)
const CONCURRENCY = 12;            // Concurrent domain checks
```

With `--add-www`:

- `domain.com` → checks both `domain.com` AND `www.domain.com` - if either works, the domain is marked as active
- `sub.domain.com` → only checks `sub.domain.com` (no www added)
- `www.domain.com` → only checks `www.domain.com` (already has www)
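The `--add-www` expansion rules can be sketched as follows. Note the simplification: counting labels misclassifies multi-part public suffixes such as `.co.uk`, which a real implementation would handle with a public-suffix list:

```javascript
// Illustrative --add-www expansion: only bare two-label domains get a
// www. variant; subdomains and www-prefixed hosts are left alone.
function expandVariants(domain) {
  const labels = domain.split('.');
  if (domain.startsWith('www.') || labels.length > 2) {
    return [domain]; // already has www, or is a subdomain
  }
  return [domain, `www.${domain}`];
}
```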
Reduces noise from internal subdomain redirects:
- `example.com` → `sub.example.com` (ignored - same base domain)
- `example.com` → `different.com` (flagged - different domain)
Useful for sites that redirect to CDN or regional subdomains.
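The base-domain comparison behind `--ignore-similar` can be sketched like this. Taking the last two labels is a deliberate simplification; hosts under multi-part public suffixes (e.g. `.co.uk`) would need a proper suffix list:

```javascript
// Illustrative base-domain extraction: last two DNS labels.
function baseDomain(host) {
  return host.split('.').slice(-2).join('.');
}

// A redirect is "similar" when source and target share a base domain,
// e.g. example.com → cdn.example.com.
function isSimilarRedirect(from, to) {
  return baseDomain(from) === baseDomain(to);
}
```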
The tool handles various error scenarios:
- DNS failures (ERR_NAME_NOT_RESOLVED)
- Connection errors (ERR_CONNECTION_REFUSED, ERR_CONNECTION_TIMED_OUT)
- HTTP status codes (404, 410, 5xx)
- SSL/Certificate errors (automatically ignored)
- Page load timeouts
- Navigation errors
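A hedged sketch of mapping those failure modes to categories. The `net::ERR_*` strings mirror Chromium's network error codes, which Puppeteer surfaces in navigation errors; the function signature and category names here are assumptions:

```javascript
// Illustrative failure classifier: map an error message or HTTP status
// to one of the dead-domain reasons listed above. Returns null when the
// domain appears reachable.
function classifyFailure({ errorMessage, status }) {
  if (errorMessage) {
    if (errorMessage.includes('ERR_NAME_NOT_RESOLVED')) return 'dns-failure';
    if (errorMessage.includes('ERR_CONNECTION')) return 'connection-error';
    if (errorMessage.includes('timeout')) return 'timeout';
    return 'navigation-error';
  }
  if (status === 404 || status === 410 || status >= 500) return 'http-error';
  return null;
}
```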
```bash
npm install puppeteer
```

Try adding more Puppeteer args in the code:

```javascript
args: [
  '--no-sandbox',
  '--disable-setuid-sandbox',
  '--disable-dev-shm-usage'
]
```

Increase the timeout value:

```javascript
const TIMEOUT = 35000; // 35 seconds
```

Reduce concurrency:

```javascript
const CONCURRENCY = 6; // Lower concurrency
```

- Use `--test-mode` first to verify everything works
- Adjust `CONCURRENCY` based on your system resources
- Use `--ignore-similar` to reduce false positives
- Monitor system resources during large scans
- Consider splitting very large filter lists
- Filter List Maintenance: Identify outdated domains in adblock lists
- List Optimization: Remove dead domains to reduce list size
- Rule Updates: Find domains that need rule updates due to redirects
- Quality Assurance: Validate filter lists before distribution
- Domain Research: Analyze domain status across multiple filter lists
[Specify your license here]
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Make your changes
- Submit a pull request
Built with Puppeteer for reliable browser automation and domain checking.
For issues, questions, or suggestions, please open an issue on GitHub.