This repository contains a GraphRAG (Graph-enhanced Retrieval-Augmented Generation) implementation for a healthcare company's product catalog using the official Microsoft GraphRAG package. The implementation automatically extracts entities and relationships from unstructured text documents to build a knowledge graph, which is then used to enhance retrieval and answer questions.
GraphRAG is an approach that combines the strengths of knowledge graphs with retrieval-augmented generation. It addresses limitations of traditional RAG systems by:
- Automatically extracting entities and relationships from documents
- Building a knowledge graph to represent structured information
- Using the graph structure to enhance retrieval beyond simple vector similarity
- Integrating graph-based and vector-based retrieval for more comprehensive answers
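To make the last two points concrete, here is a toy sketch of graph-enhanced retrieval. It is not how the Microsoft GraphRAG package works internally. It is only an illustration, assuming made-up product names, hand-written 3-dimensional embeddings, and a single hypothetical relationship. Vector search finds the best-matching chunk, and the graph then pulls in chunks about directly related entities.

```python
import math

# Hypothetical toy data: text chunks with made-up embeddings and entities.
CHUNKS = {
    "c1": {"vec": [0.9, 0.1, 0.0], "entities": {"GlucoCheck"}},
    "c2": {"vec": [0.1, 0.9, 0.0], "entities": {"TestStrip X", "GlucoCheck"}},
    "c3": {"vec": [0.0, 0.1, 0.9], "entities": {"CardioBand"}},
}

# Hypothetical extracted relationships, treated as undirected edges.
EDGES = [("GlucoCheck", "TestStrip X")]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vector_search(query_vec, k=1):
    """Plain vector retrieval: the k chunks most similar to the query."""
    ranked = sorted(
        CHUNKS,
        key=lambda cid: cosine(query_vec, CHUNKS[cid]["vec"]),
        reverse=True,
    )
    return ranked[:k]


def neighbors(entity):
    """Entities directly connected to the given entity in the toy graph."""
    found = set()
    for a, b in EDGES:
        if a == entity:
            found.add(b)
        elif b == entity:
            found.add(a)
    return found


def hybrid_search(query_vec, k=1):
    """Vector search first, then add chunks that mention related entities."""
    seeds = vector_search(query_vec, k)
    seed_entities = set().union(*(CHUNKS[c]["entities"] for c in seeds))
    related = set(seed_entities)
    for entity in seed_entities:
        related |= neighbors(entity)
    expanded = [
        cid
        for cid in CHUNKS
        if cid not in seeds and CHUNKS[cid]["entities"] & related
    ]
    return seeds + expanded
```

With a query vector close to chunk `c1`, plain vector search returns only `c1`, while the hybrid search also returns `c2` because it mentions an entity linked to the one found in `c1`.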
This implementation uses the official Microsoft GraphRAG CLI to:
- Index documents: The system processes text files in the `input` directory, extracting entities and relationships to build a knowledge graph.
- Query the graph: The system supports both global and local search methods to answer questions about the healthcare products.
- Create statistics: The system generates statistics about the knowledge graph, such as the number of entities and relationships.
- Visualize the knowledge graph: The system generates a visual representation of the entities and relationships in the knowledge graph.
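As a rough sketch of the indexing and querying steps, the GraphRAG CLI can be driven from Python with `subprocess`. The helper names below are hypothetical and are not taken from `pipeline.py`. The `index` and `query` subcommands and the `--root`, `--method`, and `--query` flags follow the GraphRAG getting-started documentation, but they may differ between package versions, so check `graphrag --help` for your installation.

```python
import subprocess


def build_index_command(root="./"):
    """Command line for indexing the documents under <root>/input."""
    return ["graphrag", "index", "--root", root]


def build_query_command(question, method="global", root="./"):
    """Command line for a global or local search over the indexed graph."""
    if method not in ("global", "local"):
        raise ValueError(f"unsupported search method: {method!r}")
    return [
        "graphrag", "query",
        "--root", root,
        "--method", method,
        "--query", question,
    ]


def run(command):
    """Run a command and return its standard output as text.

    Raises subprocess.CalledProcessError if the CLI exits with an error.
    """
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout


# Example usage (requires an initialized project and a valid API key):
#   run(build_index_command())
#   print(run(build_query_command("Which products monitor blood glucose?")))
```

Passing the command as a list rather than a single string avoids shell quoting problems when the question contains quotes or other special characters.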
- `pipeline.py`: The main implementation file that contains the GraphRAGPipeline class for indexing and querying
- `analyzer.py`: Contains the GraphRAGAnalyzer class for analyzing and visualizing the knowledge graph
- `main.py`: Example script demonstrating the GraphRAG functionality
- `input/`: Directory containing the input text files
- `output/`: Directory where GraphRAG stores its output files (entities, relationships, etc.; created by GraphRAG)
- `logs/`: Directory for log files (created by GraphRAG)
- `cache/`: Directory for cached data (created by GraphRAG)
- Python 3.10 or higher
- GraphRAG CLI installed and configured
- Dependencies listed in the Pipfile
- Install pipenv if you don't have it
pip install pipenv
- Install dependencies from Pipfile
pipenv install
- Initialize GraphRAG project
pipenv run graphrag init --root ./
- Add your API key (an OpenAI key in this example)
GRAPHRAG_API_KEY=<API_KEY>
- Approve the environment settings so that direnv loads them
direnv allow
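Before running anything, it can help to confirm that the key actually reached the environment. The check below is a minimal sketch and not part of this repository. It assumes the key is exposed as the `GRAPHRAG_API_KEY` environment variable and treats the literal `<API_KEY>` placeholder as "not configured". The function name is hypothetical.

```python
import os

# Placeholder value that must be replaced with a real key.
PLACEHOLDER = "<API_KEY>"


def api_key_is_configured(env=None):
    """Return True if GRAPHRAG_API_KEY is set to something other than the placeholder."""
    env = os.environ if env is None else env
    key = env.get("GRAPHRAG_API_KEY", "").strip()
    return bool(key) and key != PLACEHOLDER
```

Calling `api_key_is_configured()` with no arguments inspects the current process environment. Passing a dictionary makes the check easy to test.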
Run the example script:
pipenv run python main.py
The script will:
- Run the indexing process
- Execute example search queries
- Generate statistics and visualization of the knowledge graph
- CLI-based interaction: Uses the GraphRAG CLI for indexing and querying
- Graph analysis: Provides statistics and insights about the knowledge graph
- Knowledge graph visualization: Creates visual representations of entities and relationships
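The kind of statistics mentioned above can be illustrated with a small standard-library sketch. This is not the GraphRAGAnalyzer implementation. It assumes the entities and relationships have already been loaded into plain Python lists, and the function name, field names, and product names are all hypothetical.

```python
from collections import Counter


def graph_statistics(entities, relationships):
    """Summarize a graph given entity names and (source, target) pairs."""
    # Count how many relationships touch each entity (its degree).
    degree = Counter()
    for source, target in relationships:
        degree[source] += 1
        degree[target] += 1

    # Entities that appear in no relationship at all.
    isolated = sorted(e for e in entities if degree[e] == 0)

    return {
        "num_entities": len(entities),
        "num_relationships": len(relationships),
        "most_connected": degree.most_common(1)[0][0] if degree else None,
        "isolated_entities": isolated,
    }


# Made-up example data for illustration only.
example_entities = ["GlucoCheck", "TestStrip X", "Lancet Kit", "CardioBand"]
example_relationships = [
    ("GlucoCheck", "TestStrip X"),
    ("GlucoCheck", "Lancet Kit"),
]
example_stats = graph_statistics(example_entities, example_relationships)
```

Here `example_stats` reports four entities and two relationships, names `GlucoCheck` as the most connected entity, and lists `CardioBand` as isolated.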