A RAG-based chatbot built on GraphRAG (graph-based Retrieval-Augmented Generation) with a Streamlit UI. It follows Microsoft Research's GraphRAG approach, which aims to improve context awareness and multi-document reasoning over traditional RAG systems.
Traditional RAG systems, while useful, face several limitations when dealing with complex information retrieval and reasoning tasks:
- Limited Connection Synthesis: Basic RAG struggles to "connect the dots" between related pieces of information that aren't explicitly linked.
- Poor Holistic Understanding: Traditional approaches have difficulty comprehending and summarizing broad semantic concepts across large document collections.
- Context Loss: Simple vector similarity search can miss important contextual relationships.
GraphRAG addresses these limitations through a structured, hierarchical approach:
- Knowledge Graph Structure:
  - Creates an LLM-generated knowledge graph from your input corpus.
  - Captures complex relationships between entities.
  - Enables traversal of related concepts through shared attributes.
- Hierarchical Understanding:
  - Uses the Leiden algorithm for hierarchical community detection.
  - Generates community-level summaries.
  - Provides both granular and holistic views of information.
- Advanced Query Processing:
  - Global Search: For reasoning about holistic questions across the entire corpus.
  - Local Search: For a detailed exploration of specific entities and their relationships.
- Enhanced Context Window:
  - Uses community summaries for better context.
  - Maintains relationship awareness during queries.
  - Improves synthesis of information from multiple sources.
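To make the graph and community ideas above concrete, here is a minimal toy sketch in plain Python. It is not GraphRAG's implementation: the triples are hypothetical stand-ins for what an LLM extraction step might produce, and connected components are used as a deliberately simple substitute for Leiden community detection.

```python
from collections import defaultdict, deque

# Hypothetical (entity, relation, entity) triples; none come from this repository.
TRIPLES = [
    ("Alice", "works_at", "Acme"),
    ("Bob", "works_at", "Acme"),
    ("Acme", "based_in", "Berlin"),
    ("Carol", "founded", "Initech"),
    ("Initech", "based_in", "Austin"),
]


def build_graph(triples):
    """Build an undirected adjacency map from (source, relation, target) triples."""
    graph = defaultdict(set)
    for source, _relation, target in triples:
        graph[source].add(target)
        graph[target].add(source)
    return graph


def communities(graph):
    """Group entities by connected component (a simple stand-in for Leiden)."""
    seen, groups = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        queue, members = deque([start]), set()
        seen.add(start)
        while queue:
            node = queue.popleft()
            members.add(node)
            for neighbour in graph.get(node, ()):
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        groups.append(sorted(members))
    return groups


def neighbours(graph, entity, hops=1):
    """Return the entities reachable from `entity` within `hops` edges."""
    frontier, reached = {entity}, {entity}
    for _ in range(hops):
        frontier = {n for node in frontier for n in graph.get(node, ())} - reached
        reached |= frontier
    return sorted(reached - {entity})


graph = build_graph(TRIPLES)
print(communities(graph))                  # coarse, corpus-level grouping
print(neighbours(graph, "Alice", hops=2))  # entity-centred exploration
```

Roughly speaking, the community grouping corresponds to the material a global search would summarize, while the neighbourhood lookup resembles the entity-centred context a local search gathers.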
- Python 3.8 or higher
- Git
- OpenAI API key
- Clone the repository
git clone git@github.com:Saifullah3711/graph_rag.git
cd graph_rag
- Install required dependencies
pip install -r requirements.txt
- Set up secrets.toml
  - Create a .streamlit folder in the root directory.
  - Inside .streamlit, create a secrets.toml file.
  - Add your OpenAI API key to the secrets.toml file:
[general]
GRAPHRAG_API_KEY = "your_openai_api_key"
This setup is especially useful for deploying the app to Streamlit Cloud.
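As an illustration of how the key configured above could be read at runtime, the sketch below resolves GRAPHRAG_API_KEY from a secrets mapping shaped like the [general] table. The helper name resolve_api_key and the environment-variable fallback are assumptions made for this example; they are not taken from st_chatbot.py.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(secrets: Mapping, env: Optional[Mapping] = None) -> str:
    """Return GRAPHRAG_API_KEY from the [general] section, else from the environment."""
    env = os.environ if env is None else env
    general = secrets["general"] if "general" in secrets else {}
    if "GRAPHRAG_API_KEY" in general:
        key = general["GRAPHRAG_API_KEY"]
    else:
        # Assumed fallback: an environment variable with the same name.
        key = env.get("GRAPHRAG_API_KEY", "")
    if not key or key == "your_openai_api_key":
        raise RuntimeError("GRAPHRAG_API_KEY is missing or still set to the placeholder value.")
    return key


# A plain dict stands in for the parsed secrets.toml here.
print(resolve_api_key({"general": {"GRAPHRAG_API_KEY": "sk-example"}}, env={}))
```

Inside a Streamlit app, the same helper could be called as resolve_api_key(st.secrets), since st.secrets exposes the sections of secrets.toml as a mapping.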
To deploy the app on Streamlit Cloud for demo purposes:
- Ensure your secrets.toml file is correctly set up as described above.
- Push your code to a GitHub repository.
- Link your repository to Streamlit Cloud and deploy the app.
To run the chatbot locally:
streamlit run st_chatbot.py
This will launch the Streamlit interface in your default web browser.
To use your own data with the chatbot, follow these steps:
- Prepare Your Data:
  - Create an input folder in the root directory.
  - Place your text documents (.txt files) in this folder.
- Initialize GraphRAG:
python -m graphrag.index --init --root .
- Index Your Documents:
python -m graphrag.index --root .
- Add More Documents (Optional):
  - Add new .txt files to the input folder.
  - Re-run the indexing process:
python -m graphrag.index --root .
- Launch the Chatbot:
streamlit run st_chatbot.py
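The steps above can also be scripted. The following is a hedged sketch of a pre-flight check: it lists the .txt files in the input folder and builds the same indexing command as an argument list. The helper names list_input_documents and index_command are hypothetical and not part of this repository or of GraphRAG; only the command-line flags are taken from the steps above.

```python
import sys
from pathlib import Path
from typing import List


def list_input_documents(root: str = ".") -> List[Path]:
    """Return the .txt files in <root>/input, sorted by name."""
    input_dir = Path(root) / "input"
    if not input_dir.is_dir():
        return []
    return sorted(
        p for p in input_dir.iterdir()
        if p.is_file() and p.suffix.lower() == ".txt"
    )


def index_command(root: str = ".", init: bool = False) -> List[str]:
    """Build the indexing command from the steps above as an argument list."""
    command = [sys.executable, "-m", "graphrag.index"]
    if init:
        command.append("--init")
    return command + ["--root", root]


if __name__ == "__main__":
    documents = list_input_documents(".")
    if not documents:
        print("No .txt files found in ./input; add documents before indexing.")
    else:
        print(f"{len(documents)} document(s) ready for indexing.")
        print("Run:", " ".join(index_command(".")))
        # To execute rather than print, one option is:
        # subprocess.run(index_command("."), check=True)
```

Building the command as a list rather than a single string avoids shell-quoting problems if the root path contains spaces.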
GraphRAG works in two phases:
- Indexing Phase:
  - Document Slicing: Breaks down documents into manageable TextUnits.
  - Entity Extraction: Identifies key entities, relationships, and claims.
  - Clustering: Groups related information using the Leiden algorithm.
  - Summary Generation: Creates hierarchical summaries of communities.
- Query Phase:
  - Global Search: For corpus-wide understanding.
  - Local Search: For entity-specific exploration.
  - Context Enhancement: Uses community structures for better responses.
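As a rough illustration of the document-slicing step in the indexing phase, the sketch below splits text into overlapping chunks. It is a simplification under stated assumptions: it counts whitespace-separated words, and the function name and default sizes are invented for this example. GraphRAG's own chunking and its configured sizes may differ.

```python
from typing import List


def slice_into_text_units(text: str, size: int = 50, overlap: int = 10) -> List[str]:
    """Split text into overlapping word-based chunks (illustrative only)."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    units = []
    for start in range(0, len(words), step):
        units.append(" ".join(words[start:start + size]))
        # Stop once a chunk has reached the end of the text.
        if start + size >= len(words):
            break
    return units


sample = "a b c d e f g h i j"
for unit in slice_into_text_units(sample, size=4, overlap=1):
    print(unit)
```

The overlap keeps a little shared context between neighbouring chunks, so an entity mentioned at a chunk boundary is less likely to be cut off from the sentence that describes it.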
- Interactive Streamlit UI.
- Graph-based retrieval.
- Hierarchical information processing.
- Community-aware responses.
- Source tracking.
- Context-rich answers.
- Entity relationship mapping.
- Currently supports only .txt files.
- Requires an OpenAI API key; API usage can be costly for large documents.
- Initial indexing can take a long time for large documents.
- Resource intensive for very large datasets.
Contributions are welcome! Please feel free to submit a Pull Request.
If you encounter any issues or have questions, please open an issue in the GitHub repository.