Retrieval Augmented Generation (RAG) is a powerful method that enhances Large Language Models (LLMs) by combining their generative capabilities with relevant information retrieved from external databases. RAG enables chatbots and similar applications to produce contextually accurate and up-to-date responses by fetching pertinent information from a knowledge base or documents at runtime.
In RAG, documents are embedded into vector representations and stored in a Vector Database (VectorDB). When a user poses a query, the system retrieves the most relevant document embeddings and provides them as context to the LLM, resulting in more informed and precise answers.
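As a concrete illustration of this embed-store-retrieve loop, here is a minimal sketch using FAISS and a sentence-transformers embedding model; the model name, documents, and query are illustrative placeholders, not part of this tutorial's dataset.

# Minimal embed-store-retrieve sketch; the embedding model, documents,
# and query below are illustrative placeholders.
import faiss
from sentence_transformers import SentenceTransformer

documents = [
    "FAISS is a library for efficient similarity search over dense vectors.",
    "The Raspberry Pi 5 is built around Arm Cortex-A76 CPU cores.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
doc_vectors = embedder.encode(documents).astype("float32")

# Store the document vectors in a flat L2 index (the VectorDB).
index = faiss.IndexFlatL2(doc_vectors.shape[1])
index.add(doc_vectors)

# At query time, embed the question and fetch the closest document as context.
query_vector = embedder.encode(["Which CPU cores does the Raspberry Pi 5 use?"]).astype("float32")
_, nearest = index.search(query_vector, 1)
context = documents[nearest[0][0]]
print(context)  # this text is passed to the LLM alongside the user's question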
This tutorial demonstrates how to build a RAG-enabled chatbot optimized for the Arm architecture using open-source technologies such as llama-cpp-python and FAISS. Designed for a Raspberry Pi 5 (8GB RAM, at least 32GB of disk), the chatbot uses FAISS for document retrieval and the Llama-3.1-8B model for response generation, leveraging llama-cpp-python's optimized backend for high-performance inference on Arm CPUs.
First, clone this repository to your Raspberry Pi:
cd ~
git clone https://github.com/jc2409/RAG_Raspberry_Pi5.git
cd RAG_Raspberry_Pi5
Run the following commands to install necessary packages:
sudo apt update
sudo apt install python3-pip python3-venv cmake -y
Set up the virtual environment:
python3 -m venv rag-env
source rag-env/bin/activate
pip install -r requirements.txt
Install llama-cpp-python optimized for Arm CPUs in the RAG_Raspberry_Pi5 folder:
pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu
Create a models directory in the RAG_Raspberry_Pi5 folder and download the model:
mkdir models
cd models
wget https://huggingface.co/chatpdflocal/llama3.1-8b-gguf/resolve/main/ggml-model-Q4_K_M.gguf
Clone and build llama.cpp:
cd ~/RAG_Raspberry_Pi5
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
mkdir build && cd build
cmake .. -DCMAKE_CXX_FLAGS="-mcpu=native" -DCMAKE_C_FLAGS="-mcpu=native" -DLLAMA_CURL=OFF
cmake --build . -v --config Release -j $(nproc)
Quantize the model:
cd bin
./llama-quantize --allow-requantize ~/RAG_Raspberry_Pi5/models/ggml-model-Q4_K_M.gguf ~/RAG_Raspberry_Pi5/models/llama3.1-8b-instruct.Q4_0_arm.gguf Q4_0
With your virtual environment active, run llm.py to verify basic inference:
cd ~/RAG_Raspberry_Pi5
source rag-env/bin/activate
python llm.py
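For reference, a basic llama-cpp-python inference call looks like the sketch below; the actual llm.py in the repository may differ, and the prompt and generation parameters here are illustrative.

# Minimal llama-cpp-python inference sketch; llm.py in the repository may differ.
from llama_cpp import Llama

# Load the Q4_0 model requantized for Arm in the previous step.
llm = Llama(
    model_path="models/llama3.1-8b-instruct.Q4_0_arm.gguf",
    n_ctx=2048,    # context window; adjust to fit the Pi's 8GB of RAM
    n_threads=4,   # the Raspberry Pi 5 has 4 Cortex-A76 cores
)

output = llm(
    "Q: What is Retrieval Augmented Generation? A:",
    max_tokens=128,
    stop=["Q:"],
)
print(output["choices"][0]["text"].strip())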
To set up RAG with your dataset:
- Import Sample Data:
Ensure you have a Kaggle API token configured (see the Kaggle API documentation), then import the data:
python import_data.py
- Embed and Store Data in VectorDB:
Run the embedding script:
python vector_embedding.py
⚠️ Note: This step can take several hours, depending on the size of your dataset and your hardware.
💡 Tip: You can reduce the time by:
- Using a smaller dataset
- Reducing the number of documents to embed
- Lowering the embedding model size (e.g., switching to a smaller transformer model)
- Run the RAG Application:
Test the chatbot with retrieval capabilities (a minimal sketch of this retrieve-then-generate flow follows these steps):
python rag.py
Your chatbot is now configured to generate informed responses using a combination of embedded documents and the LLM's generative strengths.
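To make the retrieve-then-generate flow concrete, here is a minimal, self-contained sketch of what a script like rag.py does; the embedding model, documents, prompt template, and generation parameters are assumptions for illustration and may differ from the repository's actual code.

# Minimal retrieve-then-generate sketch; the embedding model, documents,
# prompt template, and parameters are illustrative and may differ from rag.py.
import faiss
from llama_cpp import Llama
from sentence_transformers import SentenceTransformer

documents = [
    "Abraham Lincoln's formal schooling was brief and intermittent.",
    "The Raspberry Pi 5 is built around Arm Cortex-A76 CPU cores.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
index = faiss.IndexFlatL2(embedder.get_sentence_embedding_dimension())
index.add(embedder.encode(documents).astype("float32"))

llm = Llama(
    model_path="models/llama3.1-8b-instruct.Q4_0_arm.gguf",  # quantized earlier
    n_ctx=4096,
    n_threads=4,
)

def answer(question, k=1):
    # Retrieve the k closest documents and ground the prompt in them.
    query_vector = embedder.encode([question]).astype("float32")
    _, ids = index.search(query_vector, k)
    context = "\n".join(documents[i] for i in ids[0])
    prompt = (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    out = llm(prompt, max_tokens=256, stop=["Question:"])
    return out["choices"][0]["text"].strip()

print(answer("How long was Lincoln's formal education?"))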
We evaluated the performance of the RAG-enabled chatbot by comparing responses from two versions of the LLM—one without context (basic LLM) and one utilizing context (RAG-enabled LLM).
When the user asked the question "How long was Lincoln's formal education?", the basic LLM gave an incorrect answer of 12 years because it lacked accurate contextual information.
In contrast, the RAG-enabled LLM successfully retrieved relevant information from the VectorDB and provided an accurate response based on the retrieved context.
The data stored in the vector database contained the information about Lincoln's education that the model retrieved to ground its answer.
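To reproduce this kind of side-by-side comparison, you can ask the same question with and without retrieval; the snippet below continues the retrieve-then-generate sketch above and reuses its llm object and answer() helper, which are illustrative rather than part of the repository.

# Side-by-side comparison; continues the retrieve-then-generate sketch above.
question = "How long was Lincoln's formal education?"

# Basic LLM: the model answers from its weights alone, with no retrieved context.
plain = llm(f"Question: {question}\nAnswer:", max_tokens=64, stop=["Question:"])
print("Basic LLM:", plain["choices"][0]["text"].strip())

# RAG-enabled LLM: the prompt is grounded in documents retrieved from the FAISS index.
print("RAG-enabled LLM:", answer(question))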