This project implements a website agent inspired by Cole Medin's YouTube channel. It consists of three main scripts:
- `website_agent.py`: Contains the core logic of the agent. It uses agentic RAG (Retrieval-Augmented Generation) to process user queries, retrieve relevant information from a vector database, and generate intelligent responses.
- `streamlit_ui.py`: Provides a Streamlit-based web interface for interacting with the agent, letting users enter queries and view the agent's responses.
- `scrape_pages.py`: Crawls the specified documentation websites, extracts their content, and stores it in a vector database for later retrieval by the agent.
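As a rough illustration (the exact invocation may differ from what the scripts expect), a typical session would look like this; `streamlit run` is the standard way to launch a Streamlit app:

```
python scrape_pages.py          # crawl the configured site and fill the vector database
streamlit run streamlit_ui.py   # start the web UI for querying the agent
```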
The agent works as follows:
- The `scrape_pages.py` script crawls the specified documentation websites and extracts their content.
- The extracted content is stored in a vector database.
- When a user enters a query through the Streamlit interface (`streamlit_ui.py`), the `website_agent.py` script retrieves relevant information from the vector database (see the sketch below).
- The agent uses the retrieved information to generate a response to the user's query.
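The agent itself is built with Pydantic AI, but the core retrieval-and-answer step can be sketched roughly as follows. This is an illustrative outline only: the RPC name `match_knowledge_base`, the embedding model, and the column names are assumptions, not taken from `website_agent.py`.

```python
# Illustrative sketch of the retrieval step (function, model, and column names are assumptions).
import os

from openai import OpenAI
from supabase import create_client

openai_client = OpenAI(api_key=os.environ["LLM_API_KEY"])
supabase = create_client(
    os.environ["SUPABASE_PROJECT_URL"],
    os.environ["SUPABASE_PROJECT_SERVICE_ROLE_SECRET"],
)

def answer(question: str) -> str:
    # 1. Embed the user's query.
    embedding = openai_client.embeddings.create(
        model="text-embedding-3-small", input=question
    ).data[0].embedding

    # 2. Retrieve the most similar chunks from the knowledge base.
    #    Assumes a pgvector similarity function (here called "match_knowledge_base")
    #    has been created in Supabase.
    matches = supabase.rpc(
        "match_knowledge_base",
        {"query_embedding": embedding, "match_count": 5},
    ).execute()
    context = "\n\n".join(row["content"] for row in matches.data)

    # 3. Ask the LLM to answer from the retrieved context.
    completion = openai_client.chat.completions.create(
        model=os.environ.get("LLM_API_MODEL", "gpt-4o-mini"),
        messages=[
            {"role": "system", "content": "Answer using the provided context.\n\n" + context},
            {"role": "user", "content": question},
        ],
    )
    return completion.choices[0].message.content
```

In the actual agent, a lookup like this would typically be registered as a tool so that the model can decide when to query the knowledge base, which is what makes the RAG "agentic".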
Running the project requires:
- Python 3.11+
- Supabase: Account and database
- OpenAI: API key
- Streamlit: For the web interface
- crawl4ai: For web scraping
- Pydantic AI: For data validation and AI models
To configure the application, you need to create a `.env` file in the root directory. You can use the `.env.example` file as a template.
The following environment variables need to be configured:
- `LLM_API_PROVIDER`: The LLM API provider to use. Example: OpenAI
- `LLM_API_MODEL`: The LLM model to use. Example: gpt-4o-mini
- `LLM_API_KEY`: The API key for the LLM provider. Only needed if you are not using local models.
- `SUPABASE_PROJECT_URL`: The URL of your Supabase project.
- `SUPABASE_PROJECT_SERVICE_ROLE_SECRET`: The service role secret of your Supabase project.
- `SCRAP_TARGET_NAME`: The name of the target to scrape. Example: tourista
- `SCRAP_TARGET_SITEMAP_URL`: The URL of the sitemap to scrape. Example: https://tourista.co/sitemap.xml
- `SCRAP_TARGET_BASE_URL`: The base URL of the website to scrape. Example: https://tourista.co
- `KNOWLEDGE_BASE_NAME`: The name of the table. Example: knowledge_base
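A filled-in `.env` might look like the following; the key and Supabase values are placeholders, and the scrape target simply reuses the examples above:

```
LLM_API_PROVIDER=OpenAI
LLM_API_MODEL=gpt-4o-mini
LLM_API_KEY=your-api-key
SUPABASE_PROJECT_URL=https://your-project.supabase.co
SUPABASE_PROJECT_SERVICE_ROLE_SECRET=your-service-role-secret
SCRAP_TARGET_NAME=tourista
SCRAP_TARGET_SITEMAP_URL=https://tourista.co/sitemap.xml
SCRAP_TARGET_BASE_URL=https://tourista.co
KNOWLEDGE_BASE_NAME=knowledge_base
```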
To set up the project with conda, create and activate a new environment:

```
conda create -n website_agent
conda activate website_agent
```

To export the environment's dependencies to a file, or to recreate the environment from such a file:

```
conda env export > requirements.yml
conda env create -f requirements.yml
```
Alternatively, create a virtual environment with venv:

```
python -m venv website_agent
```

Then activate it:
- On Windows: `website_agent\Scripts\activate`
- On macOS/Linux: `source website_agent/bin/activate`
To install the dependencies:
- Create a `requirements.txt` file with your dependencies (a sample is shown below).
- Run: `pip install -r requirements.txt`
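For reference, a minimal `requirements.txt` for this project could simply list the libraries named in the prerequisites; exact package names and versions are assumptions and should be checked against the imports in the scripts (python-dotenv is assumed here for loading the `.env` file):

```
streamlit
crawl4ai
pydantic-ai
supabase
openai
python-dotenv
```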
To export the installed dependencies:

```
pip freeze > requirements.txt
```

or, for pip packages installed inside a conda environment:

```
pip list --format=freeze > requirements.txt
```
To recreate the environment on another machine:
- Create a new virtual environment as shown above.
- Activate the environment.
- Run: `pip install -r requirements.txt`