First, ensure that the config file exists at the following location (per your OS):
| OS | Path |
|---|---|
| Linux | $HOME/.config/Rustle/config.toml |
| macOS | $HOME/Library/Application Support/Rustle/config.toml |

Then run:

    rustle

Example `config.toml` file:

    origin_url = "https://example.com"
    depth = 6
    database_name = "crawler"

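To make the config layout concrete, here is a minimal sketch of how the path from the table and the three keys from the example could be handled. It is not Rustle's actual code: the names `Config`, `config_path`, and `parse_config` are hypothetical, and the parser is a hand-rolled `key = value` reader that uses only the standard library. A real implementation would more likely rely on a dedicated TOML crate.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// Settings from `config.toml`. The field names mirror the example file;
/// the struct itself is a hypothetical illustration.
#[derive(Debug, PartialEq)]
pub struct Config {
    pub origin_url: String,
    pub depth: u32,
    pub database_name: String,
}

/// Builds the expected config path for an OS name and home directory.
/// `os` uses the same values as `std::env::consts::OS` ("linux", "macos").
pub fn config_path(os: &str, home: &str) -> Option<PathBuf> {
    let base = match os {
        "linux" => PathBuf::from(home).join(".config"),
        "macos" => PathBuf::from(home).join("Library").join("Application Support"),
        _ => return None, // platforms not listed in the table
    };
    Some(base.join("Rustle").join("config.toml"))
}

/// Parses flat `key = value` lines. This is NOT a full TOML parser;
/// it only covers the shape of the example file.
pub fn parse_config(text: &str) -> Result<Config, String> {
    let mut values: HashMap<&str, &str> = HashMap::new();
    for line in text.lines() {
        let line = line.trim();
        if line.is_empty() || line.starts_with('#') {
            continue; // skip blank lines and comments
        }
        let (key, value) = line
            .split_once('=')
            .ok_or_else(|| format!("expected `key = value`, got: {line}"))?;
        // Strip surrounding whitespace and double quotes from the value.
        values.insert(key.trim(), value.trim().trim_matches('"'));
    }
    let get = |key: &str| {
        values
            .get(key)
            .copied()
            .ok_or_else(|| format!("missing key: {key}"))
    };
    Ok(Config {
        origin_url: get("origin_url")?.to_string(),
        depth: get("depth")?
            .parse::<u32>()
            .map_err(|e| format!("invalid depth: {e}"))?,
        database_name: get("database_name")?.to_string(),
    })
}
```

A caller could combine the two, for example by passing `std::env::consts::OS` and the value of `$HOME` to `config_path`, reading the file with `std::fs::read_to_string`, and handing the text to `parse_config`.
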
To configure logging, this program uses the `RUST_LOG` environment variable, with options:

- `error`
- `warn`
- `info`
- `debug`
- `trace`

Example:

    RUST_LOG=info rustle

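The five options are ordered by verbosity: setting a level also enables every less verbose level, so `info` shows `error`, `warn`, and `info` messages. The sketch below illustrates only that ordering. The names `Level`, `parse_level`, and `enabled` are hypothetical, and it does not reproduce the full filter syntax of whichever logging crate Rustle uses.

```rust
/// Log levels accepted by `RUST_LOG`, declared from least to most verbose
/// so that the derived ordering gives Error < Warn < Info < Debug < Trace.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Level {
    Error,
    Warn,
    Info,
    Debug,
    Trace,
}

/// Parses a level name case-insensitively; returns `None` for unknown names.
pub fn parse_level(name: &str) -> Option<Level> {
    match name.trim().to_ascii_lowercase().as_str() {
        "error" => Some(Level::Error),
        "warn" => Some(Level::Warn),
        "info" => Some(Level::Info),
        "debug" => Some(Level::Debug),
        "trace" => Some(Level::Trace),
        _ => None,
    }
}

/// Returns true when a message at level `message` should be shown
/// given the configured maximum level `max`.
pub fn enabled(message: Level, max: Level) -> bool {
    message <= max
}
```
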
- Abstract code & functionality into structs & other files
- Use SQLite to store information about websites, instead of downloading HTML
- Fix recursion and allow specifying a crawl depth
- Config file parsing to specify origin URL & depth
- Parallel / distributed crawling
- Obey `robots.txt`
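
As an illustration of the depth item in the list above, here is a minimal sketch of a depth-limited crawl. It assumes that `depth` counts link hops from the origin URL, which this README does not spell out. The `crawl` function and the in-memory `links` map are hypothetical stand-ins for fetching a page and extracting its links, and a queue replaces recursion so that the limit and the set of visited URLs stay easy to track.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Visits pages breadth-first, following links at most `max_depth` hops
/// from `origin`, and returns the URLs in the order they were visited.
/// `links` maps a URL to the URLs found on that page (a stand-in for
/// real fetching and HTML parsing).
pub fn crawl(origin: &str, max_depth: u32, links: &HashMap<&str, Vec<&str>>) -> Vec<String> {
    let mut seen: HashSet<String> = HashSet::new();
    let mut queue: VecDeque<(String, u32)> = VecDeque::new();
    let mut order = Vec::new();

    seen.insert(origin.to_string());
    queue.push_back((origin.to_string(), 0));

    while let Some((url, depth)) = queue.pop_front() {
        order.push(url.clone());
        if depth == max_depth {
            continue; // do not follow links beyond the limit
        }
        for next in links.get(url.as_str()).into_iter().flatten() {
            // `insert` returns false for URLs that were already queued,
            // which prevents loops between pages that link to each other.
            if seen.insert(next.to_string()) {
                queue.push_back((next.to_string(), depth + 1));
            }
        }
    }
    order
}
```

With a depth of 0 only the origin is visited; each increment allows one more hop away from it.
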