This is a Python-based AI web crawler that crawls websites and collects data. It navigates web pages and extracts useful information, which can be used for purposes such as data mining, SEO analysis, or gathering data for machine learning models.
1. Clone the repository:

   ```bash
   git clone https://github.com/Siddharth11sehgal/AIPyWebCrawler.git
   ```

2. Navigate to the project folder:

   ```bash
   cd AIPyWebCrawler
   ```

3. Set up a virtual environment (optional but recommended):

   ```bash
   python3 -m venv venv
   ```

4. Activate the virtual environment:

   - On macOS/Linux:

     ```bash
     source venv/bin/activate
     ```

   - On Windows:

     ```powershell
     .\venv\Scripts\activate
     ```

5. Install the required dependencies:

   ```bash
   pip install -r requirements.txt
   ```
Once you've installed the dependencies, you can run the web crawler script:

```bash
python app.py
```

This starts the API server (the example request below assumes it is listening at http://127.0.0.1:8000).
Customize the script with your desired crawling settings, such as the URLs to start from, the depth of the crawl, and the type of data to scrape (a sketch follows below). This customization step is not necessary; make changes only if you want to.
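As a rough sketch of what such settings might look like, here is a hypothetical configuration block; the variable names below are illustrative and may not match the actual names used in app.py:

```python
# Hypothetical crawl settings -- illustrative only; adjust to match
# the variables actually defined in app.py.
START_URLS = [
    "https://example.com",       # URLs the crawl begins from
]
MAX_DEPTH = 2                    # how many links deep to follow
ALLOWED_CONTENT = ["text/html"]  # only scrape HTML pages
REQUEST_TIMEOUT = 10             # seconds to wait per page request
```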
Send a POST request with your URL like this:

```bash
curl -X POST http://127.0.0.1:8000/summarize \
  -H "Content-Type: application/json" \
  -d '{"url": "https://your-url-here.com"}'
```

Example:
```console
siddsehgal@111 ~ % curl -X POST http://127.0.0.1:8000/summarize \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'
{
  "summary": "The content is about the use of the domain \"example.com\" for illustrative purposes in documents without needing prior permission. It states that the domain can be freely utilized in literature as an example without coordination or approval. Additional information is available for reference."
}
```
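You can also call the endpoint from Python. Here is a minimal sketch using the requests library (an assumption; any HTTP client works), pointed at the same local server and endpoint shown above:

```python
import requests

# Minimal client for the local /summarize endpoint; assumes the
# server started by app.py is running at http://127.0.0.1:8000.
response = requests.post(
    "http://127.0.0.1:8000/summarize",
    json={"url": "https://example.com"},  # page you want summarized
    timeout=30,
)
response.raise_for_status()
print(response.json()["summary"])
```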
If you'd like to contribute to this project, please fork the repository and create a pull request. Here’s how you can contribute:

- Fork the repository
- Create your feature branch (`git checkout -b feature-name`)
- Commit your changes (`git commit -m 'Add new feature'`)
- Push to the branch (`git push origin feature-name`)
- Create a new Pull Request