This project provides the code examples for the article on anonymous web scraping with Node.js. It covers two tiers of scraping techniques:

- Tier 1: Scraping static websites with Cheerio and user-agent rotation.
- Tier 2: Scraping dynamic websites using the Incogniton API alongside Puppeteer, including pagination handling.

It also includes Incogniton fingerprint trustworthiness tests using tools like IPHey, FingerprintPro, and SannySoft, as well as a non-headless scraping mode so you can watch the automation process in action.
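As a rough illustration of the user-agent rotation mentioned under Tier 1, the sketch below cycles through a small pool of user-agent strings in round-robin order and builds request headers from it. It is not taken from the repository's scraper files: the `USER_AGENTS` pool, `createRotator`, and `buildHeaders` are hypothetical names, and the strings themselves are only sample values.

```javascript
// Hypothetical pool of user-agent strings (sample values, not from the repo).
const USER_AGENTS = [
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Safari/605.1.15",
  "Mozilla/5.0 (X11; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0",
];

// Returns a function that yields the next user agent on each call,
// wrapping back to the start of the pool (round-robin).
function createRotator(agents) {
  if (!Array.isArray(agents) || agents.length === 0) {
    throw new Error("At least one user agent is required");
  }
  let index = 0;
  return function next() {
    const agent = agents[index];
    index = (index + 1) % agents.length;
    return agent;
  };
}

// Builds the headers for a single request using the next user agent.
function buildHeaders(rotate) {
  return {
    "User-Agent": rotate(),
    "Accept-Language": "en-US,en;q=0.9",
  };
}

// Example: each call to buildHeaders(rotate) uses a different user agent.
const rotate = createRotator(USER_AGENTS);
const firstHeaders = buildHeaders(rotate);
const secondHeaders = buildHeaders(rotate);
```

In a static scraper, headers like these would typically be passed to the HTTP client on each request (for example `fetch(url, { headers })` on Node.js 18+), and the returned HTML handed to `cheerio.load(html)` for parsing. Random selection is an equally reasonable alternative to round-robin.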
To get started with this project, follow these steps:

- Clone the repository: `git clone https://github.com/HAYVENO/anonymous-scraper.git`
- Navigate to the project directory: `cd anonymous-scraper`
- Install the necessary dependencies. Ensure you have Node.js installed, then run `npm install`.
- Run the scraper files. Each scraper file can be executed using Node.js. For example, to run `anon-scraper2.js`, use `node anon-scraper2.js`.

For files located inside a folder, such as the tests, navigate to the `tests` directory and run the desired test file with Node.js. For example, to run `test-file.js`, use `cd tests` followed by `node test-file.js`. Alternatively, run the test file directly from the root folder with `node tests/test-file.js`. Replace `test-file.js` or `anon-scraper2.js` with the name of the file you wish to execute.

Note: Before running the scripts, review the code and any associated configuration files to ensure they are set up correctly for your target websites and comply with their terms of service.
| Test | Description |
|---|---|
| IPHey | Analyzes your browser's digital identity to determine its trustworthiness. |
| FingerprintPro | Provides a browser fingerprinting demo that identifies users even when they are using Incognito mode. |
| SannySoft | Evaluates whether a browser is being controlled by automation tools like Puppeteer or Selenium. |
For more information and guidance, refer to the original article associated with this repository.