I'm Long Le, a Machine Learning Engineer passionate about building AI systems. This repository is my collection of RAG (Retrieval-Augmented Generation) resources to help you build powerful AI applications.
Feel free to connect with me on social media to discuss AI, machine learning, or this project:
RAG All-in-one is a guide to building Retrieval-Augmented Generation (RAG) applications. It offers a comprehensive collection of tools, libraries, and frameworks for RAG systems, organized by key components of the RAG architecture. This resource serves as a centralized directory to help you discover the most relevant technologies for each part of your RAG pipeline.
| Component | Description |
|---|---|
| Courses and Learning Materials | Comprehensive courses and learning resources for mastering RAG systems |
| Document Ingestor | Tools for ingesting and processing raw documents. Document loaders, parsers, and preprocessing tools |
| Chunking Techniques | Methods and tools for breaking down documents into manageable pieces for processing and retrieval |
| Retrieval | Advanced techniques and methods for retrieving relevant information in RAG systems using LlamaIndex |
| Query Transform | Advanced techniques for improving query quality and retrieval effectiveness in RAG systems |
| Agent Framework | End-to-end frameworks for building RAG applications. Unified solutions for RAG implementation |
| Database | Databases optimized for storing and searching vector embeddings. Vector storage, similarity search, and indexing |
| LLM | Large Language Models for generating responses. LLM providers and frameworks |
| Embedding | Models and services for creating text embeddings. Embedding models and APIs |
| Fine-tuning | Tools and techniques for customizing LLMs to specific domains or tasks |
| LLM Observability | Tools for monitoring and analyzing LLM performance. Logging, tracing, and analytics |
| Prompt Techniques | Methods for effective prompt engineering. Prompt templates and frameworks |
| Evaluation | Tools for assessing RAG system performance. Metrics and evaluation frameworks |
| User Interface | Tools for building interactive AI interfaces. UI frameworks for RAG applications |
| Complete RAG Applications | Ready-to-use, comprehensive RAG systems that integrate various components of the RAG stack |
Comprehensive courses and learning resources for mastering RAG systems.
| Course Name | Description | Link | Level | Platform |
|---|---|---|---|---|
| Building and Evaluating Advanced RAG Applications | Advanced retrieval methods, sentence-window retrieval, auto-merging retrieval, and evaluation metrics | Link | Beginner | |
| Learn RAG with LLMWare | Fundamentals of RAG, parsing, embeddings, prompting, and semantic querying | Link | Beginner to Intermediate | |
| Learn AI Skills: RAG Course | Basics to end-to-end RAG system creation with code examples | Link | Beginner to Advanced | |
| Building Multimodal Search and RAG | Multimodal RAG systems using contrastive learning for images, audio, video alongside text data | Link | Intermediate | |
| AI Enhancement with Knowledge Graphs - Mastering RAG Systems | Integrating Knowledge Graphs with RAG systems for improved contextual understanding | Link | Intermediate | |
| Introduction to RAG | Building an end-to-end RAG system with Pandas, SentenceTransformers, Qdrant, and LLMs | Link | Intermediate | |
| Retrieval Augmented Generation (RAG) with LangChain | Using LangChain for integrating external data into LLMs, text splitting, embeddings | Link | Intermediate | |
| Retrieval Augmented Generation (RAG) for Developers | Modular RAGs, optimizing retrieval techniques, relevance ranking | Link | Intermediate | |
| Retrieval Augmented Generation LlamaIndex & LangChain Course | Production-oriented RAG systems using LlamaIndex and Deep Lake | Link | Intermediate to Advanced | |
| The RAG Bootcamp | Retrieval systems, multimodal RAG applications, and hands-on projects | Link | Intermediate to Advanced | |
| AI: Advanced RAG | Enterprise-grade RAG techniques, embedding strategies, document processing | Link | Advanced | |
| Advanced Retrieval-Augmented Generation (RAG) for Large Language Models | Advanced embedding strategies, hybrid search systems, sparse indexing | Link | Advanced |
| Book | Description | Link |
|---|---|---|
| Building LLMs in Production | Covers building reliable and scalable LLM applications with a focus on RAG systems. Includes techniques for fine-tuning, evaluation, and deployment of production-ready LLM applications. | Link |
| Hands-on RAG Development | Focuses on using LangChain and Python to create RAG applications. Covers foundational concepts, practical tutorials, and real-world applications. Includes topics like packaging, deployment, and the future of RAG systems. | Link |
| Enterprise RAG | Aimed at building scalable, production-ready RAG systems for enterprise use. Discusses advanced techniques like agent-based retrieval, query rewriting, and handling hallucinations. Includes practical advice for deploying and maintaining RAG systems. | Link |
| Knowledge Graph-Enhanced RAG | Focuses on integrating knowledge graphs into RAG systems for improved accuracy and traceability. Offers hands-on techniques for building tools like vector similarity search and knowledge graphs. | Link |
| A Simple Guide to Retrieval Augmented Generation | Designed for beginners, this book simplifies RAG concepts and provides step-by-step guidance. Covers modular approaches, LangChain integration, and reducing hallucinations in LLMs. | Link |
| RAG-Driven Generative AI | Comprehensive guide to building RAG systems with practical implementations and real-world use cases. | Link |
Tools and libraries for ingesting various document formats, extracting text, and preparing data for further processing.
| Library | Description | Link | GitHub Stars |
|---|---|---|---|
| LangChain Document Loaders | Comprehensive set of document loaders for various file types | GitHub | |
| LlamaIndex Reader | Flexible document parsing and chunking capabilities for various file formats | GitHub | |
| Docling | Document processing tool that parses diverse formats with advanced PDF understanding and AI integrations | GitHub | |
| Unstructured | Library for pre-processing and extracting content from raw documents | GitHub | |
| PyPDF | Library for reading and manipulating PDF files | GitHub | |
| PyMuPDF | A Python binding for MuPDF, offering fast PDF processing capabilities | GitHub | |
| MegaParse | Versatile parser for text, PDFs, PowerPoint, and Word documents with lossless information extraction | GitHub | |
| Adobe PDF Extract | A service provided by Adobe for extracting content from PDF documents | Link | |
| Azure AI Document Intelligence | A service provided by Azure for extracting content including text, tables, images from PDF documents | Link |
Methods and tools for breaking down documents into manageable pieces for processing and retrieval.
| Technique | Description | Link | Code Example |
|---|---|---|---|
| Fixed Size Chunking | Splits text into chunks of specified character length. Simple and computationally efficient. Key concepts: chunk size, overlap, separator. | Link | Code Example |
| Recursive Chunking | Hierarchically divides text using multiple separators in sequence. Respects text structure by recursively applying different separators. | Link | Code Example |
| Document Based Chunking | Splits content according to document's inherent structure (headers, code blocks, tables, etc.). Format-aware chunking for Markdown, Python, JS, etc. | Link | Code Example |
| Semantic Chunking | Creates chunks based on semantic similarity rather than size. Keeps related content together by analyzing embedding similarity at potential breakpoints. | Link | Code Example |
| Agentic Chunking | Uses LLM-based agents to intelligently determine chunk boundaries based on context and content. Can identify standalone propositions for optimal chunking. | Link | Code Example |
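
The first two techniques in the table can be sketched in plain Python. This is a minimal illustration, not the implementation used by any library listed here; the function names `fixed_size_chunks` and `recursive_chunks`, the default sizes, and the separator order are assumptions chosen for the example.

```python
def fixed_size_chunks(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into character windows; consecutive windows share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks


def recursive_chunks(
    text: str,
    chunk_size: int = 200,
    separators: tuple[str, ...] = ("\n\n", "\n", " "),
) -> list[str]:
    """Split on the coarsest separator first, then recurse with finer ones on oversized pieces."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separators left: fall back to hard character cuts.
        return fixed_size_chunks(text, chunk_size, overlap=0)
    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging small pieces into one chunk
            continue
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            chunks.extend(recursive_chunks(piece, chunk_size, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks


print(fixed_size_chunks("abcdefghij", chunk_size=4, overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
print(recursive_chunks("aaa bbb ccc\n\nddd eee", chunk_size=7))
# ['aaa bbb', 'ccc', 'ddd eee']
```

Fixed-size chunking is cheap but can cut sentences in half; the recursive variant only falls back to finer separators when a piece is still too long, so paragraph and line boundaries survive where possible.
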
Advanced techniques and methods for retrieving relevant information in RAG systems using LlamaIndex.
| Technique | Description | Implementation | Link |
|---|---|---|---|
| Fusion Retrieval | Combines different retrieval methods for more comprehensive results | Hybrid retriever combining keyword and vector search; customizable weighting between retrieval methods | Link |
| Intelligent Reranking | Advanced scoring mechanisms to improve relevance ranking | LLM-based reranking with custom prompts; metadata-aware reranking; cross-encoder reranking | Link |
| Multi-faceted Filtering | Various filtering techniques to refine results | Metadata filtering with custom filters; similarity threshold filtering; content-based filtering | |
| Hierarchical Indices | Multi-tiered system for efficient information navigation | Summary index for high-level overview; document index for detailed retrieval; recursive retrieval across indices | Link |
| Ensemble Retrieval | Combines multiple retrieval models for robust results | Multiple embedding models; customizable ensemble strategies; weighted voting mechanisms | Link |
| Dartboard Retrieval | Optimizes for both relevance and diversity | Combined scoring function; direct optimization for information gain | Link |
| Multi-modal Retrieval | Handles diverse data types for richer responses | Image-to-text retrieval; multi-modal embeddings; cross-modal similarity search | Link |
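
As a rough illustration of the fusion retrieval row, the sketch below combines keyword and vector scores with a tunable weight. It is not LlamaIndex code: the function names, the `alpha` parameter, the min-max normalization step, and the example scores are all assumptions made for this example.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1] so keyword and vector scores become comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_fuse(
    keyword_scores: dict[str, float],
    vector_scores: dict[str, float],
    alpha: float = 0.5,
    top_k: int = 3,
) -> list[tuple[str, float]]:
    """Weighted sum of normalized scores; alpha weights the vector side.

    A document missing from one result list contributes 0 for that side.
    """
    kw, vec = min_max(keyword_scores), min_max(vector_scores)
    fused = {
        doc: alpha * vec.get(doc, 0.0) + (1 - alpha) * kw.get(doc, 0.0)
        for doc in kw.keys() | vec.keys()
    }
    # Sort by score descending, then by id so ties are deterministic.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))[:top_k]


# Hypothetical scores from a keyword retriever (e.g. BM25) and a vector retriever.
keyword = {"a": 10.0, "b": 5.0, "c": 0.0}
vector = {"b": 0.9, "c": 0.8, "d": 0.1}
print([doc for doc, _ in hybrid_fuse(keyword, vector, alpha=0.5)])
# ['b', 'a', 'c']
```

Setting `alpha` closer to 1 favors semantic matches, while values closer to 0 favor exact keyword matches.
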
Advanced techniques for improving query quality and retrieval effectiveness in RAG systems.
| Technique | Description | Implementation | Link |
|---|---|---|---|
| Query Transformation | Transforms user queries to improve retrieval effectiveness by generating multiple variations or reformulations of the original query | Multi-query generation; query rewriting; RAG-Fusion with reciprocal rank fusion | Link |
| Hypothetical Document Embeddings (HyDE) | Generates a hypothetical document that answers the query, then embeds it and uses that embedding for similarity search to improve retrieval | LLM-based hypothetical document generation; embedding-based similarity search; combined retrieval with original query | Link |
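
The reciprocal rank fusion step mentioned for RAG-Fusion can be sketched as follows. This assumes the query variations have already been generated and each has produced a ranked list of document ids; the ids and rankings below are made up for illustration. The constant `k = 60` is the value commonly used for RRF, not something this guide prescribes.

```python
def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Merge several ranked lists: each document scores sum(1 / (k + rank))."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical results for three reformulations of the same user query.
rankings = [
    ["d1", "d2", "d3"],
    ["d2", "d3", "d1"],
    ["d2", "d1", "d4"],
]
print([doc for doc, _ in reciprocal_rank_fusion(rankings)])
# ['d2', 'd1', 'd3', 'd4']
```

Because RRF uses only ranks, it needs no score normalization across retrievers, which makes it a convenient default when the underlying scores are not comparable.
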
End-to-end frameworks that provide integrated solutions for building RAG applications.
| Library | Description | Link | GitHub Stars |
|---|---|---|---|
| LangChain | Framework for building applications with LLMs and integrating with various data sources | GitHub | |
| LlamaIndex | Data framework for building RAG systems with structured data | GitHub | |
| Haystack | End-to-end framework for building NLP pipelines | GitHub | |
| SmolAgents | A barebones library for agents | GitHub | |
| txtai | Open-source embeddings database for semantic search and LLM workflows | GitHub | |
| Pydantic AI | Agent Framework / shim to use Pydantic with LLMs | GitHub | |
| OpenAI Agents SDK | A lightweight, powerful framework for multi-agent workflows | GitHub |
Databases optimized for storing and efficiently searching vector embeddings/text documents.
| Database | Description | Link | GitHub Stars |
|---|---|---|---|
| FAISS | Efficient similarity search library from Facebook AI Research | GitHub | |
| Milvus | Open-source vector database | GitHub | |
| Qdrant | Vector similarity search engine | GitHub | |
| Chroma | Open-source embedding database designed for RAG applications | GitHub | |
| pgvector | Open-source vector similarity search for Postgres | GitHub | |
| Weaviate | Open-source vector search engine | GitHub | |
| LanceDB | Developer-friendly, embedded retrieval engine for multimodal AI | GitHub | |
| Pinecone | Managed vector database for semantic search | Link | |
| MongoDB | General-purpose document database | Link | |
| Elasticsearch | Search and analytics engine that can store documents | Link |
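
To make "similarity search" concrete, here is a toy in-memory store that ranks vectors by cosine similarity with a linear scan. It is only a sketch: the class name `InMemoryVectorStore` and the three-dimensional vectors are invented for the example, and the databases above typically replace the linear scan with approximate nearest-neighbour indexes to scale.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class InMemoryVectorStore:
    """Hypothetical minimal store: keeps vectors in a dict and scans all of them."""

    def __init__(self) -> None:
        self._items: dict[str, list[float]] = {}

    def add(self, doc_id: str, vector: list[float]) -> None:
        self._items[doc_id] = list(vector)

    def search(self, query: list[float], top_k: int = 2) -> list[tuple[str, float]]:
        scored = [
            (doc_id, cosine_similarity(query, vec))
            for doc_id, vec in self._items.items()
        ]
        scored.sort(key=lambda item: (-item[1], item[0]))
        return scored[:top_k]


store = InMemoryVectorStore()
store.add("cats", [1.0, 0.0, 0.0])    # made-up embeddings for illustration
store.add("dogs", [0.9, 0.1, 0.0])
store.add("stocks", [0.0, 0.0, 1.0])
print([doc for doc, _ in store.search([1.0, 0.0, 0.0], top_k=2)])
# ['cats', 'dogs']
```
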
Large Language Models and platforms for generating responses based on retrieved context.
| LLM | Description | Link |
|---|---|---|
| OpenAI API | Access to GPT models through API | Link |
| Claude | Anthropic's Claude series of LLMs | Link |
| Hugging Face LLM Models | Platform for open-source NLP models | Link |
| LLaMA | Meta's open-source large language model | Link |
| Mistral | Open-source and commercial models | Link |
| Cohere | API access to generative and embedding models | Link |
| DeepSeek | Advanced large language models for various applications | Link |
| Qwen | Alibaba Cloud's large language model accessible via API | Link |
| Ollama | Run open-source LLMs locally | Link |
Models and services for creating vector representations of text.
| Embedding Solution | Description | Link |
|---|---|---|
| OpenAI Embeddings | API for text-embedding-ada-002 and newer models | Link |
| Sentence Transformers | Python framework for state-of-the-art sentence embeddings | Link |
| Cohere Embed | Specialized embedding models API | Link |
| Hugging Face Embeddings | Various embedding models | Link |
| E5 Embeddings | Microsoft's text embeddings | Link |
| BGE Embeddings | BAAI general embeddings | Link |
Tools and techniques for customizing LLMs to specific domains or tasks.
| Library | Description | Link | GitHub Stars |
|---|---|---|---|
| OpenAI Fine-tuning | API for fine-tuning OpenAI models on custom datasets | Link | |
| LLaMA Factory | Unified fine-tuning framework for LLMs with various methods and datasets | GitHub | |
| Unsloth | Efficient fine-tuning for LLMs with 2-5x speedup and reduced memory usage | GitHub | |
| PEFT | Parameter-Efficient Fine-Tuning methods like LoRA, QLoRA, and Adapters | GitHub | |
| TRL | Transformer Reinforcement Learning library for RLHF, DPO, and PPO fine-tuning | GitHub | |
| LitGPT | Lightweight implementation for fine-tuning LLMs with PyTorch Lightning | GitHub | |
| Axolotl | User-friendly tool for fine-tuning LLMs with support for multiple architectures | GitHub | |
| Mergekit | Tools for merging multiple fine-tuned LLMs into a single model | GitHub |
Tools for monitoring, analyzing, and improving LLM applications.
| Library | Description | Link | GitHub Stars |
|---|---|---|---|
| MLflow | Platform for managing the ML lifecycle, including tracking experiments, packaging code, and model deployment with LLM tracking capabilities | GitHub | |
| Langfuse | Open source LLM engineering platform | GitHub | |
| Opik/Comet | Debug, evaluate, and monitor LLM applications with tracing, evaluations, and dashboards | GitHub | |
| Phoenix/Arize | Open-source observability for LLM applications | GitHub | |
| Helicone | Open source LLM observability platform. One line of code to monitor, evaluate, and experiment | GitHub | |
| Openlit | Open source platform for AI Engineering: OpenTelemetry-native LLM Observability, GPU Monitoring, Guardrails, Evaluations, Prompt Management, Vault, Playground | GitHub | |
| Lunary | The production toolkit for LLMs. Observability, prompt management and evaluations. | GitHub | |
| Langtrace | OpenTelemetry-based observability tool for LLM applications with real-time tracing and metrics | GitHub |
Methods and frameworks for effective prompt engineering in RAG systems.
| Library | Description | Link | GitHub Stars |
|---|---|---|---|
| Prompt Engineering Guide | Comprehensive guide to prompt engineering | GitHub | |
| DSPy | Framework for programming language models instead of prompting | GitHub | |
| Guidance | Language for controlling LLMs | GitHub | |
| LLMLingua | Prompt compression library for faster LLM inference | GitHub | |
| Promptify | NLP task prompt generator for GPT, PaLM and other models | GitHub | |
| PromptSource | Toolkit for creating and sharing natural language prompts | GitHub | |
| Promptimizer | Library for optimizing prompts | GitHub | |
| Selective Context | Context compression tool that lets LLMs process roughly twice as much content | GitHub | |
| betterprompt | Testing suite for LLM prompts before production | GitHub |
| Resource | Description | Link |
|---|---|---|
| OpenAI Prompt Engineering | Official guide to prompt engineering from OpenAI | Link |
| LangChain Prompts | Templates and composition tools for prompts | Link |
| PromptPerfect | Tool for optimizing prompts | Link |
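
For readers new to prompt templates in a RAG setting, a minimal sketch using only the Python standard library might look like this. The template wording, the `build_prompt` helper, and the numbered-chunk format are assumptions for illustration, not a recommended or library-provided prompt.

```python
from string import Template

# Hypothetical template: instructs the model to stay within the retrieved context.
RAG_PROMPT = Template(
    "Answer the question using only the context below. "
    "If the context is insufficient, say you do not know.\n\n"
    "Context:\n$context\n\n"
    "Question: $question\n"
    "Answer:"
)


def build_prompt(question: str, chunks: list[str]) -> str:
    """Number the retrieved chunks and fill them into the template."""
    context = "\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return RAG_PROMPT.substitute(context=context, question=question)


print(build_prompt(
    "What is the capital of France?",
    ["Paris is the capital of France.", "France is in Western Europe."],
))
```

Numbering the chunks makes it easy to ask the model to cite which passage supports its answer.
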
Tools and frameworks for assessing and improving RAG system performance.
| Library | Description | Link | GitHub Stars |
|---|---|---|---|
| FastChat | Open platform for training, serving, and evaluating LLM-based chatbots | GitHub | |
| OpenAI Evals | Framework for evaluating LLMs and LLM systems | GitHub | |
| RAGAS | Toolkit for evaluating and optimizing RAG systems | GitHub | |
| Promptfoo | Open-source tool for testing and evaluating prompts | GitHub | |
| DeepEval | Comprehensive evaluation library for LLM applications | GitHub | |
| Giskard | Open-source evaluation and testing for ML & LLM systems | GitHub | |
| PromptBench | Unified evaluation framework for large language models | GitHub | |
| TruLens | Evaluation and tracking for LLM experiments with RAG-specific metrics | GitHub | |
| EvalPlus | Rigorous evaluation framework for code-generating LLMs (LLM4Code) | GitHub | |
| LightEval | All-in-one toolkit for evaluating LLMs | GitHub | |
| LangTest | Test suite for comparing LLM models on accuracy, bias, fairness and robustness | GitHub | |
| AgentEvals | Evaluators and utilities for measuring agent performance | GitHub |
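
The libraries above cover far more than retrieval quality, but two common retrieval metrics are simple enough to sketch by hand. The functions below are a plain-Python illustration, not the API of any listed tool; the document ids and relevance labels are invented.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant document, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: list[tuple[list[str], set[str]]]) -> float:
    """Average reciprocal rank over (retrieved, relevant) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(ret, rel) for ret, rel in runs) / len(runs)


print(recall_at_k(["d3", "d1", "d7"], {"d1", "d2"}, k=2))
# 0.5
print(mean_reciprocal_rank([(["d3", "d1"], {"d1"}), (["d2"], {"d2"})]))
# 0.75
```
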
Tools and frameworks for building interactive user interfaces for RAG applications.
| Library | Description | Link | GitHub Stars |
|---|---|---|---|
| Streamlit | Turn data scripts into shareable web apps in minutes | GitHub | |
| Gradio | Build and share user interfaces for machine learning models | GitHub | |
| Chainlit | Build Python LLM apps with minimal effort | GitHub | |
| SimpleAIChat | Lightweight Python package for creating AI chat interfaces | GitHub |
Ready-to-use, comprehensive RAG applications that integrate various components of the RAG stack.
| Application | Description | Link | GitHub Stars |
|---|---|---|---|
| RAGFlow | Open-source RAG engine based on deep document understanding for truthful question-answering with citations | GitHub | |
| AnythingLLM | All-in-one Desktop & Docker AI application with built-in RAG, AI agents, and a no-code agent builder | GitHub | |
| Kotaemon | Clean & customizable RAG UI for chatting with documents, built for both end users and developers | GitHub | |
| Verba | Fully-customizable personal assistant utilizing RAG for querying and interacting with your data, powered by Weaviate | GitHub |
