
hoanglong8/Chatbot-RAG


RAG All-in-one

Hello there! 👋

I'm Long Le, a Machine Learning Engineer passionate about building AI systems. This repository is my collection of RAG (Retrieval-Augmented Generation) resources to help you build powerful AI applications.

Feel free to connect with me on social media to discuss AI, machine learning, or this project:

LinkedIn | GitHub

Introduction

RAG All-in-one is a guide to building Retrieval-Augmented Generation (RAG) applications. It offers a collection of tools, libraries, and frameworks for RAG systems, with explanations of key components and recommendations for effective implementation.

RAG Architecture Diagram

[Figure: RAG architecture]

RAG Components ✅

| Component | Description | Key items |
|---|---|---|
| 📄 Document Ingestor | Tools for ingesting and processing raw documents | Document loaders, parsers, and preprocessing tools |
| 🤖 Agent Framework | End-to-end frameworks for building RAG applications | Unified solutions for RAG implementation |
| 📀 Vector Database | Databases optimized for storing and searching vector embeddings | Vector storage, similarity search, and indexing |
| 💻 LLM | Large Language Models for generating responses | LLM providers and frameworks |
| Embedding | Models and services for creating text embeddings | Embedding models and APIs |
| 🖥️ LLM Observability | Tools for monitoring and analyzing LLM performance | Logging, tracing, and analytics |
| 📕 Prompt Techniques | Methods for effective prompt engineering | Prompt templates and frameworks |
| 🤔 Evaluation | Tools for assessing RAG system performance | Metrics and evaluation frameworks |
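To make the data flow between these components concrete, here is a minimal Python sketch of how they fit together at query time. This repository itself contains no code, so everything below is an assumption for illustration: `Retriever`, `RAGPipeline`, `DOCS`, and `keyword_retriever` are hypothetical names, not part of any library listed here. In a real system the retriever would wrap an embedding model plus a vector database, and `generate` would call an LLM provider.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


class Retriever(Protocol):
    """Returns the k most relevant text chunks for a query.

    In a real system this wraps an embedding model plus a vector database.
    """

    def __call__(self, query: str, k: int) -> list[str]: ...


@dataclass
class RAGPipeline:
    retrieve: Retriever             # Embedding + Vector Database
    generate: Callable[[str], str]  # LLM: prompt in, completion out
    template: str = (               # Prompt Techniques
        "Answer the question using only the context below.\n"
        "Context:\n{context}\n\n"
        "Question: {question}\n"
        "Answer:"
    )

    def answer(self, question: str, k: int = 3) -> str:
        # Retrieve, fill the prompt template, then hand the prompt to the LLM.
        chunks = self.retrieve(question, k)
        context = "\n".join(f"- {chunk}" for chunk in chunks)
        prompt = self.template.format(context=context, question=question)
        return self.generate(prompt)


# Hypothetical toy stand-ins so the sketch runs without any external service.
DOCS = [
    "RAG retrieves relevant documents before generating an answer.",
    "Bananas are a good source of potassium.",
]


def keyword_retriever(query: str, k: int) -> list[str]:
    """Rank documents by the number of words they share with the query."""
    terms = set(query.lower().split())
    return sorted(DOCS, key=lambda d: -len(terms & set(d.lower().split())))[:k]


# The "LLM" here simply echoes the prompt it receives.
pipeline = RAGPipeline(retrieve=keyword_retriever, generate=lambda prompt: prompt)
```

Because `generate` only echoes its input, `pipeline.answer("How does RAG work", k=1)` returns the assembled prompt, which is a convenient way to inspect what the LLM would actually see. Keeping each component behind a small interface is what lets a vector database client or LLM SDK from the tables below be swapped in without touching the rest of the pipeline.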

Document Ingestor

Tools and libraries for ingesting various document formats, extracting text, and preparing data for further processing.

| Library | Description | Link | GitHub Stars 🌟 |
|---|---|---|---|
| LangChain Document Loaders | Comprehensive set of document loaders for various file types | GitHub | GitHub stars |
| LlamaIndex Parser | Flexible document parsing and chunking for various file formats | GitHub | GitHub stars |
| Docling | Document processing tool that parses diverse formats, with advanced PDF understanding and AI integrations | GitHub | GitHub stars |
| Unstructured | Library for preprocessing and extracting content from raw documents | GitHub | GitHub stars |
| PyPDF | Library for reading and manipulating PDF files | GitHub | GitHub stars |
| PyMuPDF | Python bindings for MuPDF, offering fast PDF processing | GitHub | GitHub stars |
| MegaParse | Versatile parser for text, PDFs, PowerPoint, and Word documents with lossless information extraction | GitHub | GitHub stars |
| Adobe PDF Extract | Adobe service for extracting content from PDF documents | Link | |
| Azure AI Document Intelligence | Azure service for extracting text, tables, and images from PDF documents | Link | |
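After a loader or parser has extracted raw text, most pipelines split it into overlapping chunks before embedding it. Below is a minimal sketch of word-based chunking with overlap, assuming simple whitespace tokenization. `chunk_words` is a hypothetical helper, not the API of any tool above; LangChain and LlamaIndex ship their own splitters, which are often token- or structure-aware.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into chunks of at most `chunk_size` words.

    Consecutive chunks share `overlap` words, so a sentence cut at a chunk
    boundary still appears with some of its surrounding context.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()  # whitespace tokenization (an assumption)
    step = chunk_size - overlap
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks
```

For example, `chunk_words("0 1 2 3 4 5 6 7 8 9", chunk_size=4, overlap=1)` returns `["0 1 2 3", "3 4 5 6", "6 7 8 9"]`. Larger overlaps preserve more context across boundaries at the cost of storing and embedding more duplicated text.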

Agent Framework

End-to-end frameworks that provide integrated solutions for building RAG applications.

| Library | Description | Link | GitHub Stars 🌟 |
|---|---|---|---|
| LangChain | Framework for building applications with LLMs and integrating them with various data sources | GitHub | GitHub stars |
| LlamaIndex | Data framework for connecting LLMs to your own data and building RAG systems | GitHub | GitHub stars |
| Haystack | End-to-end framework for building LLM and NLP pipelines, including RAG | GitHub | GitHub stars |
| SmolAgents | Hugging Face's barebones library for building agents | GitHub | GitHub stars |
| txtai | Open-source embeddings database for semantic search and LLM workflows | GitHub | GitHub stars |
| Pydantic AI | Agent framework for using Pydantic with LLMs | GitHub | GitHub stars |
| OpenAI Agents SDK | Lightweight, powerful framework for multi-agent workflows | GitHub | GitHub stars |

Vector Database

Databases optimized for storing and efficiently searching vector embeddings and text documents.

| Database | Description | Link | GitHub Stars 🌟 |
|---|---|---|---|
| FAISS | Library for efficient similarity search and clustering of dense vectors, from Facebook AI Research | GitHub | GitHub stars |
| Milvus | Open-source, cloud-native vector database built for scalable similarity search | GitHub | GitHub stars |
| Qdrant | Open-source vector similarity search engine | GitHub | GitHub stars |
| Chroma | Open-source embedding database designed for RAG applications | GitHub | GitHub stars |
| Weaviate | Open-source vector search engine | GitHub | GitHub stars |
| LanceDB | Developer-friendly, embedded retrieval engine for multimodal AI | GitHub | GitHub stars |
| Pinecone | Managed vector database for semantic search | Link | |
| MongoDB | General-purpose document database, with vector search via Atlas Vector Search | Link | |
| Elasticsearch | Search and analytics engine that stores documents and supports dense-vector (kNN) search | Link | |
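Conceptually, a vector database stores (id, vector) pairs and answers one question: which stored vectors are most similar to this query vector? The sketch below answers it by exact, brute-force cosine similarity, which is what a flat index computes (for example FAISS's `IndexFlatIP` applied to normalized vectors). The databases above typically add approximate indexes such as HNSW or IVF so that search stays fast at millions of vectors. `InMemoryVectorStore` is a hypothetical illustration, not the client API of any listed product.

```python
import heapq
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; defined as 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryVectorStore:
    """Exact (brute-force) cosine-similarity search: O(n * d) work per query."""

    def __init__(self) -> None:
        self._ids: list[str] = []
        self._vectors: list[list[float]] = []

    def add(self, doc_id: str, vector: list[float]) -> None:
        self._ids.append(doc_id)
        self._vectors.append(vector)

    def search(self, query: list[float], k: int = 3) -> list[tuple[str, float]]:
        # Score every stored vector, then keep the k highest-scoring ones.
        scored = ((cosine(query, v), doc_id) for doc_id, v in zip(self._ids, self._vectors))
        best = heapq.nlargest(k, scored)
        return [(doc_id, score) for score, doc_id in best]
```

For example, after adding `"a"` as `[1.0, 0.0]`, `"b"` as `[0.0, 1.0]`, and `"c"` as `[0.7, 0.7]`, `search([1.0, 0.1], k=2)` ranks `"a"` first and `"c"` second.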

LLM

Large Language Models and platforms for generating responses based on retrieved context.

| LLM | Description | Link |
|---|---|---|
| OpenAI | API access to GPT models | Link |
| Claude | Anthropic's Claude family of LLMs | Link |
| Hugging Face Models | Hub hosting open-source LLMs and other NLP models | Link |
| LLaMA | Meta's family of open-weight large language models | Link |
| Mistral | Mistral AI's open-weight and commercial models | Link |
| Cohere | API access to generative and embedding models | Link |
| DeepSeek | DeepSeek's open-weight LLMs (e.g., DeepSeek-V3 and DeepSeek-R1), also available via API | Link |
| Qwen | Alibaba Cloud's family of LLMs, available as open weights and via API | Link |
| Ollama | Run open-source LLMs locally | Link |

Embedding

Models and services for creating vector representations of text.

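| Embedding Solution | Description | Link |
|---|---|---|
| OpenAI Embeddings | API for text-embedding-ada-002 and newer models such as text-embedding-3-small and text-embedding-3-large | Link |
| Sentence Transformers | Python framework for state-of-the-art sentence embeddings | Link |
| Cohere Embed | API for Cohere's embedding models | Link |
| Hugging Face Embeddings | Wide range of open-source embedding models hosted on the Hugging Face Hub | Link |
| E5 Embeddings | Microsoft's E5 family of text embedding models | Link |
| BGE Embeddings | BAAI General Embedding (BGE) models | Link |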
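An embedding model maps text to a fixed-length vector so that similar texts end up close together. Running a real model requires downloading weights or calling an API, so the runnable sketch below uses feature hashing as a purely lexical stand-in (an assumption, not how the models above work). Each word is hashed into one of `dim` buckets and the vector is L2-normalized, so the dot product of two non-empty outputs equals their cosine similarity. `hashed_embedding` is a hypothetical helper; unlike Sentence Transformers, E5, or BGE, it captures no meaning beyond shared words.

```python
import math
import re
import zlib


def hashed_embedding(text: str, dim: int = 256) -> list[float]:
    """Feature-hashing bag-of-words vector, L2-normalized.

    A lexical stand-in for a learned embedding model: texts that share words
    get similar vectors, but synonyms ("car" vs "automobile") do not.
    """
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # crc32 is deterministic across runs, unlike Python's salted hash()
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec
```

Swapping in a real model would only change this function: downstream code keeps consuming a list of floats of fixed length.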

LLM Observability

Tools for monitoring, analyzing, and improving LLM applications.

| Library | Description | Link | GitHub Stars 🌟 |
|---|---|---|---|
| Langfuse | Open-source LLM engineering platform | GitHub | GitHub stars |
| Opik/Comet | Debug, evaluate, and monitor LLM applications with tracing, evaluations, and dashboards | GitHub | GitHub stars |
| Phoenix/Arize | Open-source observability for LLM applications | GitHub | GitHub stars |
| Helicone | Open-source LLM observability platform; one line of code to monitor, evaluate, and experiment | GitHub | GitHub stars |
| OpenLIT | Open-source platform for AI engineering: OpenTelemetry-native LLM observability, GPU monitoring, guardrails, evaluations, prompt management, vault, and playground | GitHub | GitHub stars |
| Lunary | Production toolkit for LLMs: observability, prompt management, and evaluations | GitHub | GitHub stars |
| Langtrace | OpenTelemetry-based observability tool for LLM applications with real-time tracing and metrics | GitHub | GitHub stars |

Prompt Techniques

Methods and frameworks for effective prompt engineering in RAG systems.

Open Source Prompt Engineering Tools

| Library | Description | Link | GitHub Stars 🌟 |
|---|---|---|---|
| Prompt Engineering Guide | Comprehensive guide to prompt engineering | GitHub | GitHub stars |
| DSPy | Framework for programming, rather than prompting, language models | GitHub | GitHub stars |
| Guidance | Language for controlling LLMs | GitHub | GitHub stars |
| LLMLingua | Prompt compression library for faster LLM inference | GitHub | GitHub stars |
| Promptify | NLP task prompt generator for GPT, PaLM, and other models | GitHub | GitHub stars |
| PromptSource | Toolkit for creating and sharing natural language prompts | GitHub | GitHub stars |
| Promptimizer | Library for optimizing prompts | GitHub | GitHub stars |
| Selective Context | Context compression tool that lets LLMs process about twice as much content | GitHub | GitHub stars |
| betterprompt | Testing suite for LLM prompts before production | GitHub | GitHub stars |

Documentation & Services

| Resource | Description | Link |
|---|---|---|
| OpenAI Prompt Engineering | Official guide to prompt engineering from OpenAI | Link |
| LangChain Prompts | Templates and composition tools for prompts | Link |
| PromptPerfect | Tool for optimizing prompts | Link |
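In a RAG system, the prompt template is where retrieved context meets the user's question. Below is a minimal sketch of one common pattern: number each retrieved chunk so the model can cite it, instruct the model to answer only from those sources, and stop adding chunks once a rough size budget is reached. `build_rag_prompt` and its character budget (a crude proxy for a token limit) are assumptions for illustration; LangChain Prompts and similar tools provide their own templating APIs.

```python
def build_rag_prompt(question: str, chunks: list[str], max_context_chars: int = 4000) -> str:
    """Build a citation-friendly RAG prompt.

    Chunks are assumed to be sorted by relevance; they are added in order until
    the next one would push the total entry length past `max_context_chars`.
    """
    sources: list[str] = []
    used = 0
    for i, chunk in enumerate(chunks, start=1):
        entry = f"[{i}] {chunk.strip()}"
        if used + len(entry) > max_context_chars:
            break  # budget reached; drop this and all less relevant chunks
        sources.append(entry)
        used += len(entry)

    context = "\n".join(sources) if sources else "(no relevant context found)"
    return (
        "You are a helpful assistant. Answer the question using only the sources below.\n"
        "Cite sources by their number, e.g. [1]. If the sources do not contain the answer, "
        "say you don't know.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
```

Numbering the sources makes answers auditable (a citation can be checked against the chunk it points to), and the explicit "say you don't know" instruction is a common way to discourage answers that are not grounded in the retrieved context.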

Evaluation

Tools and frameworks for assessing and improving RAG system performance.

| Library | Description | Link | GitHub Stars 🌟 |
|---|---|---|---|
| FastChat | Open platform for training, serving, and evaluating LLM-based chatbots | GitHub | GitHub stars |
| OpenAI Evals | Framework for evaluating LLMs and LLM systems | GitHub | GitHub stars |
| RAGAS | Toolkit for evaluating and optimizing RAG systems | GitHub | GitHub stars |
| Promptfoo | Open-source tool for testing and evaluating prompts | GitHub | GitHub stars |
| DeepEval | Comprehensive evaluation library for LLM applications | GitHub | GitHub stars |
| Giskard | Open-source evaluation and testing for ML and LLM systems | GitHub | GitHub stars |
| PromptBench | Unified evaluation framework for large language models | GitHub | GitHub stars |
| TruLens | Evaluation and tracking for LLM experiments with RAG-specific metrics | GitHub | GitHub stars |
| EvalPlus | Rigorous evaluation framework for code-generation LLMs (LLM4Code) | GitHub | GitHub stars |
| LightEval | All-in-one toolkit for evaluating LLMs | GitHub | GitHub stars |
| LangTest | Test suite for comparing LLMs on accuracy, bias, fairness, and robustness | GitHub | GitHub stars |
| AgentEvals | Evaluators and utilities for measuring agent performance | GitHub | GitHub stars |
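Frameworks such as RAGAS, DeepEval, and TruLens add LLM-judged metrics (for example, faithfulness of an answer to the retrieved context), but retrieval quality is often checked first with classic information-retrieval metrics computed against a labeled set of relevant documents. Below is a minimal sketch of recall@k, hit rate@k, and MRR@k. The function names and the `(retrieved_ids, relevant_ids)` input format are assumptions, not the API of any library listed above.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(runs: list[tuple[list[str], set[str]]], k: int = 5) -> dict[str, float]:
    """Average recall@k, hit rate@k, and MRR@k over (retrieved_ids, relevant_ids) pairs."""
    n = len(runs)
    if n == 0:
        raise ValueError("need at least one query")
    recall = sum(recall_at_k(r, rel, k) for r, rel in runs) / n
    hit_rate = sum(1.0 for r, rel in runs if set(r[:k]) & rel) / n
    mrr = sum(reciprocal_rank(r[:k], rel) for r, rel in runs) / n
    return {f"recall@{k}": recall, f"hit_rate@{k}": hit_rate, f"mrr@{k}": mrr}
```

For example, with two queries where the first retrieves `["d1", "d2", "d3"]` against relevant set `{"d2"}` and the second retrieves `["d4", "d5", "d6"]` against `{"d9"}`, `evaluate(runs, k=2)` gives recall@2 = 0.5, hit_rate@2 = 0.5, and mrr@2 = 0.25.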

About

🧠 Guide to Building RAG (Retrieval-Augmented Generation) Applications
