Web Crawler & Image Extractor

A web crawling tool that extracts images and filters content by keyword. Built with Streamlit and the crawl4ai library, the application crawls websites, extracts the content and images relevant to user-specified keywords, and presents the results in an interactive web interface.
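Keyword-based image selection could look roughly like the sketch below. It is a minimal, hypothetical illustration using only the Python standard library, not the project's actual code. It assumes that an image counts as relevant when a keyword appears (as a case-insensitive substring) in its URL path or alt text, and that "categorizing" images means grouping them by file extension. The names extract_keyword_images and categorize are illustrative only.

```python
from __future__ import annotations

from html.parser import HTMLParser
from pathlib import PurePosixPath
from urllib.parse import urljoin, urlparse


class _ImageCollector(HTMLParser):
    """Collect (src, alt) pairs from every <img> tag in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.images: list[tuple[str, str]] = []

    def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None:
        if tag == "img":
            attr_map = dict(attrs)
            src = attr_map.get("src")
            if src:  # skip <img> tags without a usable src
                self.images.append((src, attr_map.get("alt") or ""))


def categorize(url: str) -> str:
    """Group an image by file extension, e.g. 'png'; 'unknown' if it has none."""
    suffix = PurePosixPath(urlparse(url).path).suffix
    return suffix.lower().lstrip(".") or "unknown"


def extract_keyword_images(html: str, base_url: str, keywords: list[str]) -> dict[str, list[str]]:
    """Return absolute image URLs whose path or alt text mentions a keyword, grouped by extension."""
    collector = _ImageCollector()
    collector.feed(html)
    wanted = [k.lower() for k in keywords if k]
    grouped: dict[str, list[str]] = {}
    for src, alt in collector.images:
        absolute = urljoin(base_url, src)  # resolve relative src against the page URL
        # Case-insensitive substring match on the URL path (not the domain) plus alt text.
        haystack = f"{urlparse(absolute).path} {alt}".lower()
        if any(k in haystack for k in wanted):
            grouped.setdefault(categorize(absolute), []).append(absolute)
    return grouped


if __name__ == "__main__":
    page = '<img src="/img/cat.png" alt="A cat"><img src="logo.svg">'
    print(extract_keyword_images(page, "https://example.com/pets/", ["cat"]))
```

Matching on the path rather than the full URL keeps a keyword that happens to appear in the domain name from selecting every image on the site.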

Features

Interactive Web UI: User-friendly Streamlit interface for configuring and running crawls
Keyword-Based Filtering: Extract only content relevant to specified keywords
Real-Time Progress Tracking: Live updates during the crawling process
Image Extraction: Automatically download and categorize images from websites
Content Analysis: Analyze keyword frequency and relevance across pages
Highlighted Content: View extracted content with keyword highlights
Result Organization: Structured storage of crawl results for easy access
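The keyword filtering, frequency analysis, and highlighting features could be approximated as follows. This is a hypothetical, standard-library-only sketch rather than the application's implementation. It assumes whole-word, case-insensitive matching, paragraphs separated by blank lines, and Markdown bold (**word**) for highlights, which Streamlit's st.markdown renders. The helper names are illustrative.

```python
from __future__ import annotations

import re
from collections import Counter


def _keyword_pattern(keywords: list[str]) -> re.Pattern:
    """Build one case-insensitive, whole-word regex for all keywords.

    Assumes each keyword starts and ends with a word character, since \\b is used.
    """
    cleaned = [k for k in keywords if k]
    if not cleaned:
        raise ValueError("at least one non-empty keyword is required")
    # Longest keywords first so a phrase like "web crawler" wins over "web".
    ordered = sorted(cleaned, key=len, reverse=True)
    return re.compile(r"\b(" + "|".join(re.escape(k) for k in ordered) + r")\b", re.IGNORECASE)


def keyword_counts(text: str, keywords: list[str]) -> Counter:
    """Count whole-word occurrences of each keyword, ignoring case."""
    pattern = _keyword_pattern(keywords)
    canonical = {k.lower(): k for k in keywords if k}
    return Counter(canonical[m.group(1).lower()] for m in pattern.finditer(text))


def relevant_paragraphs(text: str, keywords: list[str]) -> list[str]:
    """Keep only blank-line-separated paragraphs that mention at least one keyword."""
    pattern = _keyword_pattern(keywords)
    return [p.strip() for p in text.split("\n\n") if pattern.search(p)]


def highlight(text: str, keywords: list[str]) -> str:
    """Wrap each keyword match in Markdown bold, preserving its original case."""
    return _keyword_pattern(keywords).sub(lambda m: f"**{m.group(0)}**", text)


if __name__ == "__main__":
    page = "Streamlit makes UIs easy.\n\nThe crawler saves images.\n\nNothing here."
    keywords = ["crawler", "images"]
    for paragraph in relevant_paragraphs(page, keywords):
        print(highlight(paragraph, keywords))
    print(keyword_counts(page, keywords))
```

Compiling all keywords into a single alternation means each page is scanned once per operation, regardless of how many keywords the user enters.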

Installation

Prerequisites

Python 3.8+
pip package manager
Chromium browser

Setup without Docker

Clone the repository:

git clone https://github.com/lahiruramesh/web-snapper.git
cd web-snapper

Create and activate a virtual environment:

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install the required packages:

pip install -r requirements.txt

Start the application:

streamlit run app.py

Setup with Docker

Build the image:

docker build -t web-snapper .

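Run the container, mounting the local crawler_results directory so that crawl output persists on the host:

docker run -p 8501:8501 -v $(pwd)/crawler_results:/app/crawler_results web-snapper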

With either setup, the application is then available in your web browser at http://localhost:8501.
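The docker run command mounts crawler_results into the container, so results written there persist on the host. One way such results might be organized on disk is sketched below. The crawler_results/<domain>/<UTC timestamp>.json layout and the names save_crawl_result and load_crawl_results are assumptions for illustration, not the layout the application actually uses.

```python
from __future__ import annotations

import json
import re
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from urllib.parse import urlparse


def save_crawl_result(result: dict, url: str, root: Path = Path("crawler_results")) -> Path:
    """Write one crawl result to <root>/<domain>/<UTC timestamp>.json (hypothetical layout)."""
    domain = urlparse(url).netloc or "unknown"
    # Replace characters such as ':' (from a port) that are awkward in directory names.
    safe_domain = re.sub(r"[^A-Za-z0-9.-]", "_", domain)
    target_dir = root / safe_domain
    target_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    path = target_dir / f"{stamp}.json"
    payload = {"url": url, "saved_at": stamp, **result}
    path.write_text(json.dumps(payload, indent=2, ensure_ascii=False), encoding="utf-8")
    return path


def load_crawl_results(root: Path = Path("crawler_results")) -> list[dict]:
    """Load every saved result, ordered by domain and then by timestamp."""
    return [json.loads(p.read_text(encoding="utf-8")) for p in sorted(root.glob("*/*.json"))]


if __name__ == "__main__":
    # Demo against a throwaway directory instead of the real crawler_results folder.
    with tempfile.TemporaryDirectory() as tmp:
        saved = save_crawl_result({"keywords": ["cat"], "images": []}, "https://example.com/pets", Path(tmp))
        print(saved.relative_to(tmp))
        print(load_crawl_results(Path(tmp)))
```

Using a timestamped file per crawl keeps earlier results intact, and grouping by domain makes results for one site easy to list.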
