
Website agent using agentic RAG and scraping

Overview

This project implements a website agent inspired by Cole Medin's YouTube channel. It consists of three main scripts:

  • website_agent.py: This script contains the core logic of the agent. It uses agentic RAG (Retrieval-Augmented Generation) to process user queries, retrieve relevant information from a vector database, and generate intelligent responses.
  • streamlit_ui.py: This script creates a Streamlit-based web interface for interacting with the agent. It allows users to input queries and view the agent's responses in a user-friendly manner.
  • scrape_pages.py: This script crawls the configured documentation website, extracts the content of each page, and stores that content in a vector database for later retrieval by the agent (see the sketch below).
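
As a rough illustration, the crawling step can look like the following. This is a minimal sketch assuming crawl4ai's AsyncWebCrawler API, not the actual contents of scrape_pages.py; the URL is only an example and would normally come from the SCRAP_TARGET_* variables described under Configuration.

    import asyncio
    from crawl4ai import AsyncWebCrawler

    # Fetch a single page and print its Markdown content.
    # scrape_pages.py would instead iterate over the URLs listed in the
    # target's sitemap and write each page into the vector database.
    async def main():
        async with AsyncWebCrawler() as crawler:
            result = await crawler.arun(url="https://tourista.co")
            print(result.markdown)

    asyncio.run(main())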

The agent works as follows:

  1. The scrape_pages.py script crawls specified documentation websites and extracts their content.
  2. The extracted content is stored in a vector database.
  3. When a user enters a query through the Streamlit interface (streamlit_ui.py), the website_agent.py script retrieves relevant information from the vector database.
  4. The agent uses the retrieved information to generate a response to the user's query.
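
Steps 3 and 4 can be pictured roughly as follows. This is a minimal sketch rather than the project's actual code: it assumes an OpenAI embedding model and a hypothetical match_knowledge_base Postgres function exposed through Supabase RPC for the similarity search.

    import os
    from openai import OpenAI
    from supabase import create_client

    openai_client = OpenAI(api_key=os.environ["LLM_API_KEY"])
    supabase = create_client(
        os.environ["SUPABASE_PROJECT_URL"],
        os.environ["SUPABASE_PROJECT_SERVICE_ROLE_SECRET"],
    )

    def retrieve(query: str, match_count: int = 5):
        # Embed the user query with the same model used when the pages were ingested.
        embedding = openai_client.embeddings.create(
            model="text-embedding-3-small",  # assumption: replace with your embedding model
            input=query,
        ).data[0].embedding

        # "match_knowledge_base" is a hypothetical Postgres function that performs
        # a vector similarity search over the knowledge_base table.
        response = supabase.rpc(
            "match_knowledge_base",
            {"query_embedding": embedding, "match_count": match_count},
        ).execute()
        return response.data

The retrieved chunks are then handed to the configured LLM (gpt-4o-mini in the example configuration) to produce the answer shown in the Streamlit interface, which is launched with the standard streamlit run streamlit_ui.py command.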

Prerequisites

  • Python 3.11+
  • Supabase: Account and database
  • OpenAI: API key
  • Streamlit: For the web interface
  • crawl4ai: For web scraping
  • Pydantic AI: For defining the agent and validating its data

Configuration

To configure the application, you need to create a .env file in the root directory. You can use the .env.example file as a template.

The following environment variables need to be configured:

  • LLM_API_PROVIDER: The LLM API provider to use. Example: OpenAI
  • LLM_API_MODEL: The LLM model to use. Example: gpt-4o-mini
  • LLM_API_KEY: The API key for the LLM provider. Only needed if you are not using local models.
  • SUPABASE_PROJECT_URL: The URL of your Supabase project.
  • SUPABASE_PROJECT_SERVICE_ROLE_SECRET: The service role secret of your Supabase project.
  • SCRAP_TARGET_NAME: The name of the target to scrape. Example: tourista
  • SCRAP_TARGET_SITEMAP_URL: The URL of the sitemap to scrape. Example: https://tourista.co/sitemap.xml
  • SCRAP_TARGET_BASE_URL: The base URL of the website to scrape. Example: https://tourista.co
  • KNOWLEDGE_BASE_NAME: The name of the Supabase table that stores the scraped content. Example: knowledge_base
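
A filled-in .env might look like this; the values shown are the examples from the list above, and the secrets are placeholders you replace with your own:

    LLM_API_PROVIDER=OpenAI
    LLM_API_MODEL=gpt-4o-mini
    LLM_API_KEY=sk-...
    SUPABASE_PROJECT_URL=https://your-project.supabase.co
    SUPABASE_PROJECT_SERVICE_ROLE_SECRET=your-service-role-secret
    SCRAP_TARGET_NAME=tourista
    SCRAP_TARGET_SITEMAP_URL=https://tourista.co/sitemap.xml
    SCRAP_TARGET_BASE_URL=https://tourista.co
    KNOWLEDGE_BASE_NAME=knowledge_base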

Conda Environment Setup

Create a new conda environment

conda create -n website_agent python=3.11

Activate the environment

conda activate website_agent

Export configuration to a file

conda env export > requirements.yml

Create a new conda environment from the config file

conda env create -f requirements.yml

Pip Environment Setup

Create a new virtual environment

python -m venv website_agent

Activate the environment

  • On Windows:
    website_agent\Scripts\activate
  • On macOS/Linux:
    source website_agent/bin/activate

Install dependencies from a requirements file

  1. Create a requirements.txt file with your dependencies (a sample is shown after this list).
  2. Run:
    pip install -r requirements.txt
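
As a starting point, a requirements.txt covering the prerequisites above might look like the following; the names are the standard PyPI package names, and you may want to pin versions:

    crawl4ai
    openai
    pydantic-ai
    python-dotenv  # assumed here for loading the .env file
    streamlit
    supabase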

Export the environment configuration to a file

pip freeze > requirements.txt

For pip packages installed inside a conda environment:

pip list --format=freeze > requirements.txt

Create a new virtual environment and install dependencies from the requirements.txt file

  1. Create a new virtual environment as shown above.
  2. Activate the environment.
  3. Run:
    pip install -r requirements.txt

Acknowledgments

This project was inspired by the agentic RAG tutorials on Cole Medin's YouTube channel.
