The project in this repository consists of a fullstack LLM chat system with retrieval-augmented generation (RAG) features. After entering an arbitrary username on the login page, you are redirected to a simple chat interface, where you can create a new conversation with an LLM model in the context of a website of your choice. The LLM model will answer your questions based on the content of that website.
After the website has been indexed, you can ask questions and keep a conversation that evolves based on your interactions and follow-up questions. You can create multiple conversations about the same website without mixing the conversation contexts. You can also log in with different usernames to get separate single-user chat sessions.
- LLM provider: OpenAI
  - Default model: gpt-4o-mini
  - Vector database: ChromaDB 0.5.23
- Backend: Python 3.12.5
  - Web server: Flask 3.0.3
  - RAG: Langchain 0.2.13
- Frontend: Node 20.16.0
  - Framework: Nuxt.js 3.12.4
  - UI components: Vuetify 3.6.14
The communication with the LLM model, as well as the history, chat, and RAG features, was implemented using the Langchain library for Python. In summary, we index the website contents in ChromaDB and then use a retrieval interface to fetch the relevant document context. Chat sessions and history are stored in an SQLite file.
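As a rough sketch of that flow (assuming the Langchain 0.2.x package layout; the function below is illustrative, not the project's actual code):

```python
import hashlib

from langchain_chroma import Chroma
from langchain_community.document_loaders import WebBaseLoader
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter


def index_url(url: str) -> Chroma:
    """Load a web page, split it into chunks, and store them in ChromaDB."""
    # One collection per website, keyed by the SHA256 hash of its URL
    # (truncated here because Chroma caps collection names at 63 characters).
    url_hash = hashlib.sha256(url.encode("utf-8")).hexdigest()[:63]

    docs = WebBaseLoader(url).load()
    chunks = RecursiveCharacterTextSplitter(
        chunk_size=1000, chunk_overlap=200  # illustrative values
    ).split_documents(docs)

    return Chroma.from_documents(
        documents=chunks,
        embedding=OpenAIEmbeddings(),
        collection_name=url_hash,
    )


# A retriever over the collection supplies the document context to the LLM.
retriever = index_url("https://example.com").as_retriever(search_kwargs={"k": 4})
```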
To have unique ChromaDB collections for each website, I used SHA256 hashes of their URLs (referred to as `url_hash`es) as indexes. Also, to identify unique chat sessions, I combined the `url_hash`, the `user_id`, and the session's database primary key to create `session_id`s for indexes.
This implementation provides two major endpoints, `index_url` and `ask`.
- The `POST /api/index_url` endpoint expects a URL in the request body key `url`. This URL will be indexed by the model and stored in ChromaDB.
- The `POST /api/ask` endpoint also expects a URL in the request body key `url`, along with a `message` and a `session_id`.
- Other than these, the `GET /api/sessions`, `POST /api/sessions`, `GET /api/login`, and `GET /api/messages` endpoints were implemented to cover the project's remaining needs (a usage sketch follows below).
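For example, a full round trip against these endpoints could look like the following (the host, port, and response field names are assumptions based on the descriptions above):

```python
import requests

BASE = "http://localhost:5000/api"  # Flask's default port; adjust to your setup
url = "https://example.com"

# 1. Index the website before asking questions about it.
requests.post(f"{BASE}/index_url", json={"url": url}).raise_for_status()

# 2. Create a chat session for this website (request/response shape assumed).
session = requests.post(f"{BASE}/sessions", json={"url": url}).json()

# 3. Ask a question within that session's context.
answer = requests.post(
    f"{BASE}/ask",
    json={
        "url": url,
        "message": "What is this website about?",
        "session_id": session["session_id"],
    },
).json()
print(answer)
```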
The project was divided into three containers: `backend`, `frontend`, and `chromadb`. After setting up the `.env` file with your environment variables, following the `example.env` file, you can run everything with the `docker compose up` command.
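For instance (the authoritative variable names are in `example.env`; `OPENAI_API_KEY` below is an assumption based on the OpenAI provider):

```sh
cp example.env .env        # then fill in your values, e.g. OPENAI_API_KEY
docker compose up --build  # starts the backend, frontend, and chromadb containers
```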