
Building and Deploying a RAG-enabled Chatbot on Raspberry Pi

Introduction to Retrieval Augmented Generation (RAG)

[Image: RAG Raspberry Pi]

Retrieval Augmented Generation (RAG) is a powerful method that enhances Large Language Models (LLMs) by combining their generative capabilities with relevant information retrieved from external databases. RAG enables chatbots and similar applications to produce contextually accurate and up-to-date responses by fetching pertinent information from a knowledge base or documents at runtime.

In RAG, documents are embedded into vector representations and stored in a Vector Database (VectorDB). When a user poses a query, the system retrieves the most relevant document embeddings and provides them as context to the LLM, resulting in more informed and precise answers.
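
To make this flow concrete, here is a minimal sketch in Python using sentence-transformers and FAISS. It is illustrative only: the embedding model, sample documents, and every identifier below are assumptions, not code from this repository.

# Toy end-to-end RAG flow (illustrative, not this repository's code).
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

documents = [
    "Abraham Lincoln had less than a year of formal schooling.",
    "The Raspberry Pi 5 ships with up to 8GB of RAM.",
]

# Embed the documents and store the vectors in a FAISS index (the VectorDB).
embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vectors = np.asarray(embedder.encode(documents), dtype=np.float32)
index = faiss.IndexFlatL2(doc_vectors.shape[1])
index.add(doc_vectors)

# At query time, embed the question, retrieve the closest document,
# and hand it to the LLM as context inside the prompt.
query = "How long was Lincoln's formal education?"
query_vector = np.asarray(embedder.encode([query]), dtype=np.float32)
distances, ids = index.search(query_vector, 1)
context = documents[ids[0][0]]
prompt = f"Context: {context}\n\nQuestion: {query}\nAnswer:"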

Overview

This tutorial demonstrates how to build a RAG-enabled chatbot optimized for the Arm architecture using open-source technologies such as llama-cpp-python and FAISS. Designed for the Raspberry Pi 5 (8GB RAM, at least 32GB of disk), the chatbot uses FAISS for document retrieval and the Llama-3.1-8B model for response generation, leveraging llama-cpp-python's optimized backend for high-performance CPU inference.

Getting Started

First, clone this repository to your Raspberry Pi:

cd ~
git clone https://github.com/jc2409/RAG_Raspberry_Pi5.git
cd RAG_Raspberry_Pi5

Installation

System Dependencies

Run the following commands to install necessary packages:

sudo apt update
sudo apt install python3-pip python3-venv cmake -y

Python Environment

Set up the virtual environment:

python3 -m venv rag-env
source rag-env/bin/activate
pip install -r requirements.txt

llama-cpp-python Installation

Install llama-cpp-python, optimized for Arm CPUs, from the prebuilt CPU wheel index (run this in the RAG_Raspberry_Pi5 folder with the virtual environment still active):

pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu
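
To confirm the package installed correctly, an optional quick check:

python -c "import llama_cpp; print(llama_cpp.__version__)"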

Model Setup

Download LLM Model

Create a models directory in the RAG_Raspberry_Pi5 folder and download the model:

mkdir models
cd models
wget https://huggingface.co/chatpdflocal/llama3.1-8b-gguf/resolve/main/ggml-model-Q4_K_M.gguf

Build and Quantize the Model

Clone and build llama.cpp (the -mcpu=native flags tell the compiler to target the Pi 5's own Cortex-A76 cores):

cd ~/RAG_Raspberry_Pi5
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
mkdir build && cd build
cmake .. -DCMAKE_CXX_FLAGS="-mcpu=native" -DCMAKE_C_FLAGS="-mcpu=native" -DLLAMA_CURL=OFF
cmake --build . -v --config Release -j $(nproc)

Requantize the model to Q4_0, a format llama.cpp can accelerate on Arm CPUs:

cd bin
./llama-quantize --allow-requantize ~/RAG_Raspberry_Pi5/models/ggml-model-Q4_K_M.gguf ~/RAG_Raspberry_Pi5/models/llama3.1-8b-instruct.Q4_0_arm.gguf Q4_0
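
Optionally, you can smoke-test the quantized model directly with the llama-cli binary built in the previous step (the prompt and token count below are arbitrary):

./llama-cli -m ~/RAG_Raspberry_Pi5/models/llama3.1-8b-instruct.Q4_0_arm.gguf -p "Hello" -n 32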

Testing and Deployment

Test Basic LLM Inference

With your virtual environment active, run llm.py to verify basic inference:

cd ~/RAG_Raspberry_Pi5
source rag-env/bin/activate
python llm.py
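
If you want to adapt the script, a minimal inference program with llama-cpp-python looks roughly like the following. This is a hedged sketch, not necessarily what llm.py contains; the prompt and parameters are illustrative.

# Minimal llama-cpp-python inference sketch (illustrative).
from llama_cpp import Llama

# Load the Arm-requantized model produced in the previous step.
llm = Llama(
    model_path="models/llama3.1-8b-instruct.Q4_0_arm.gguf",
    n_ctx=2048,    # context window; illustrative value
    n_threads=4,   # the Raspberry Pi 5 has four cores
)

output = llm("Q: What is a Raspberry Pi? A:", max_tokens=64, stop=["Q:"])
print(output["choices"][0]["text"])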

RAG Setup

To set up RAG with your dataset:

  1. Import Sample Data:

Ensure you have a Kaggle API token (see the Kaggle API documentation at https://www.kaggle.com/docs/api), then import the data:

python import_data.py
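
The Kaggle API looks for your token at ~/.kaggle/kaggle.json; if you have just downloaded the token, something like the following puts it in place (the download path is illustrative):

mkdir -p ~/.kaggle
cp ~/Downloads/kaggle.json ~/.kaggle/
chmod 600 ~/.kaggle/kaggle.json
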
  2. Embed and Store Data in VectorDB:

Run the embedding script (a sketch of what this step typically involves appears after the tips below):

python vector_embedding.py

⚠️ Note: This step can take several hours, depending on the size of your dataset and hardware.

💡 Tip: You can reduce the time by:

  • Using a smaller dataset
  • Reducing the number of documents to embed
  • Lowering the embedding model size (e.g., switching to a smaller transformer model)
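
For reference, the core of an embedding step like this one usually looks something like the following. This is a hedged sketch: the file paths and embedding model are assumptions, not necessarily what vector_embedding.py does.

# Sketch of embedding documents into a persisted FAISS index (illustrative).
import json

import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

# Load the documents imported in the previous step (path is an assumption).
with open("data/documents.json") as f:
    documents = json.load(f)

# Embed in batches; a small model keeps this tractable on a Pi.
embedder = SentenceTransformer("all-MiniLM-L6-v2")
vectors = np.asarray(
    embedder.encode(documents, batch_size=32, show_progress_bar=True),
    dtype=np.float32,
)

# Build a flat L2 index and persist it to disk for the RAG step.
index = faiss.IndexFlatL2(vectors.shape[1])
index.add(vectors)
faiss.write_index(index, "vector.index")
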
  3. Run the RAG Application:

Test the chatbot with retrieval capabilities (see the sketch below for the overall flow):

python rag.py

Your chatbot is now configured to generate informed responses using a combination of embedded documents and the LLM's generative strengths.
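
For orientation, the retrieval-plus-generation loop in a script like rag.py can be sketched as follows. Paths, the prompt format, and parameters are illustrative assumptions; the repository's actual script may differ.

# Sketch of a RAG query loop (illustrative).
import json

import faiss
import numpy as np
from llama_cpp import Llama
from sentence_transformers import SentenceTransformer

# Load the persisted index, the documents, and the quantized LLM.
index = faiss.read_index("vector.index")
with open("data/documents.json") as f:
    documents = json.load(f)
embedder = SentenceTransformer("all-MiniLM-L6-v2")
llm = Llama(model_path="models/llama3.1-8b-instruct.Q4_0_arm.gguf", n_ctx=2048)

query = "How long was Lincoln's formal education?"

# Retrieve the three most similar documents as context.
query_vec = np.asarray(embedder.encode([query]), dtype=np.float32)
distances, ids = index.search(query_vec, 3)
context = "\n".join(documents[i] for i in ids[0])

# Ask the LLM to answer from the retrieved context.
prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
)
output = llm(prompt, max_tokens=128)
print(output["choices"][0]["text"].strip())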

Test Results

We evaluated the performance of the RAG-enabled chatbot by comparing responses from the same LLM with and without retrieved context (the basic LLM versus the RAG-enabled LLM).

When the user asked "How long was Lincoln's formal education?", the basic LLM incorrectly answered 12 years because it lacked accurate contextual information.

[Screenshot: basic LLM prompt]

[Screenshot: basic LLM response]

In contrast, the RAG-enabled LLM successfully retrieved relevant information from the VectorDB and provided an accurate response based on the retrieved context.

[Screenshot: RAG LLM prompt]

[Screenshot: the vector database entries containing information about Lincoln's education]

[Screenshot: RAG LLM response]
