GraphRAG Implementation for Healthcare Product Catalog

This repository contains a GraphRAG (Graph-enhanced Retrieval-Augmented Generation) implementation for a healthcare company's product catalog, built on the official Microsoft GraphRAG package. It automatically extracts entities and relationships from unstructured text documents, builds a knowledge graph from them, and uses that graph to improve retrieval when answering questions.

What is GraphRAG?

GraphRAG is an approach that combines the strengths of knowledge graphs with retrieval-augmented generation. It addresses limitations of traditional RAG systems by:

Automatically extracting entities and relationships from documents
Building a knowledge graph to represent structured information
Using the graph structure to enhance retrieval beyond simple vector similarity
Integrating graph-based and vector-based retrieval for more comprehensive answers
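A toy example in plain Python can illustrate how graph-based and vector-based retrieval combine. This is a simplified sketch, not how Microsoft GraphRAG works internally. The triples and text chunks are hand-written and hypothetical (GraphRAG extracts them with an LLM), and bag-of-words cosine similarity stands in for real embeddings. The vector step picks the most similar chunk, and the graph step then adds chunks about neighboring entities.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical (entity, relation, entity) triples; GraphRAG would extract these with an LLM.
TRIPLES = [
    ("InsulinPen X", "TREATS", "Diabetes"),
    ("GlucoMeter Pro", "MONITORS", "Diabetes"),
    ("GlucoMeter Pro", "USES", "Test Strip S"),
]

# Hypothetical text chunks: chunk id -> (text, entities mentioned in the text).
CHUNKS = {
    "c1": ("InsulinPen X is a reusable pen for insulin delivery.", {"InsulinPen X"}),
    "c2": ("GlucoMeter Pro measures blood glucose in five seconds.", {"GlucoMeter Pro"}),
    "c3": ("Test Strip S is compatible with selected meters.", {"Test Strip S"}),
}


def build_graph(triples):
    """Undirected adjacency sets: entity -> neighboring entities."""
    graph = defaultdict(set)
    for head, _relation, tail in triples:
        graph[head].add(tail)
        graph[tail].add(head)
    return graph


def bow(text):
    """Bag-of-words term counts, a stand-in for an embedding vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def hybrid_retrieve(query, top_k=1):
    """Vector step: rank chunks by similarity. Graph step: add chunks about neighbor entities."""
    q = bow(query)
    ranked = sorted(CHUNKS, key=lambda cid: cosine(q, bow(CHUNKS[cid][0])), reverse=True)
    seeds = ranked[:top_k]
    graph = build_graph(TRIPLES)
    seed_entities = set().union(*(CHUNKS[cid][1] for cid in seeds))
    neighbors = set().union(*(graph[e] for e in seed_entities))
    expanded = [cid for cid in CHUNKS if cid not in seeds and CHUNKS[cid][1] & neighbors]
    return seeds + expanded
```

Here `hybrid_retrieve("glucose measurement device")` returns `['c2', 'c3']`. Chunk c3 shares no words with the query, but it is pulled in because Test Strip S is linked to GlucoMeter Pro in the graph. Pure vector similarity would have missed it.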

How It Works

This implementation uses the official Microsoft GraphRAG CLI for indexing and querying and adds its own analysis on top. It can:

Index documents: The system processes the text files in the input/ directory, extracting entities and relationships to build a knowledge graph.
Query the graph: The system answers questions about the healthcare products using either global or local search.
Generate statistics: The system reports statistics about the knowledge graph, such as the number of entities and relationships.
Visualize the knowledge graph: The system generates a visual representation of the entities and relationships in the knowledge graph.
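In Python, the indexing and query steps come down to calling the GraphRAG CLI. The sketch below shows one way to assemble those calls. The helper names are hypothetical and not taken from pipeline.py. As far as we know, the `graphrag index` and `graphrag query` subcommands and the `--root`, `--method`, and `--query` options match the GraphRAG 1.x CLI, but check `graphrag --help` for the version pinned in the Pipfile.

```python
import subprocess

# Restricted to the two search methods this project uses; your GraphRAG version may offer others.
VALID_METHODS = ("global", "local")


def index_command(root="./"):
    """argv for building the knowledge graph from the files in input/ (hypothetical helper)."""
    return ["graphrag", "index", "--root", root]


def query_command(question, method="global", root="./"):
    """argv for answering a question with global or local search (hypothetical helper)."""
    if method not in VALID_METHODS:
        raise ValueError(f"method must be one of {VALID_METHODS}, got {method!r}")
    return ["graphrag", "query", "--root", root, "--method", method, "--query", question]


def run(argv):
    """Run a command and return its stdout; raises CalledProcessError on a non-zero exit."""
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `run(query_command("Which products monitor diabetes?", method="local"))` would run a local search from the project root. Passing an argument list instead of a shell string avoids quoting problems when the question contains spaces or quotes.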

Project Structure

pipeline.py: The main implementation file that contains the GraphRAGPipeline class for indexing and querying
analyzer.py: Contains the GraphRAGAnalyzer class for analyzing and visualizing the knowledge graph
main.py: Example script demonstrating the GraphRAG functionality
input/: Directory containing the input text files
output/: Directory where GraphRAG writes its results, such as the extracted entities and relationships (created by GraphRAG)
logs/: Directory for log files (created by GraphRAG)
cache/: Directory for cached data (created by GraphRAG)
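analyzer.py reports statistics about the knowledge graph. To give a rough idea of what such statistics involve, here is a stdlib-only sketch that counts entities and relationships and finds the most connected entities. It assumes the relationships have already been loaded from output/ as (source, target) pairs. GraphRAG writes its tables as Parquet files, which this sketch does not read. The function and sample data are hypothetical.

```python
from collections import Counter


def graph_statistics(relationships, top_n=3):
    """Summarize a graph given as a list of (source, target) entity-name pairs.

    Entities that appear in no relationship are not counted.
    """
    degree = Counter()
    for source, target in relationships:
        degree[source] += 1
        degree[target] += 1
    return {
        "entities": len(degree),
        "relationships": len(relationships),
        # Ties keep the order in which entities were first seen.
        "top_entities": degree.most_common(top_n),
    }
```

With the sample relationships `[("GlucoMeter Pro", "Diabetes"), ("InsulinPen X", "Diabetes"), ("GlucoMeter Pro", "Test Strip S")]`, `graph_statistics(rels, top_n=2)` reports 4 entities, 3 relationships, and `[("GlucoMeter Pro", 2), ("Diabetes", 2)]` as the top entities.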

Requirements

Python 3.10 or higher
GraphRAG CLI installed and configured
Dependencies listed in the Pipfile
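A small preflight check can confirm the first two requirements before anything runs. This is a hypothetical helper, not part of the repository. It only checks the Python version and whether a `graphrag` executable is on the PATH (for example inside `pipenv shell`). It does not check whether GraphRAG is configured.

```python
import shutil
import sys


def check_requirements(min_python=(3, 10)):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < min_python:
        found = ".".join(map(str, sys.version_info[:3]))
        wanted = ".".join(map(str, min_python))
        problems.append(f"Python {wanted}+ required, found {found}")
    # Looks for the CLI on PATH only; configuration is not verified here.
    if shutil.which("graphrag") is None:
        problems.append("graphrag executable not found on PATH")
    return problems
```

Printing each entry of `check_requirements()` gives a short list of what to fix. An empty list means both checks passed.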

Installation

Install pipenv if you don't already have it:

pip install pipenv

Install the dependencies from the Pipfile:

pipenv install

Project Setup

Initialize the GraphRAG project:

pipenv run graphrag init --root ./
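To confirm that initialization worked, you can check for the files it generates. The sketch assumes that `graphrag init` creates `settings.yaml`, `.env`, and a `prompts` directory in the root, which matches GraphRAG 1.x to our knowledge. Adjust the list for your installed version. The helper is hypothetical.

```python
from pathlib import Path

# Files and folders `graphrag init` creates, to our knowledge (GraphRAG 1.x); adjust as needed.
EXPECTED = ("settings.yaml", ".env", "prompts")


def missing_init_files(root="./"):
    """Return the expected init artifacts that do not exist under root."""
    root_path = Path(root)
    return [name for name in EXPECTED if not (root_path / name).exists()]
```

An empty list from `missing_init_files()` means the project root looks initialized.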

Add your API key (here, an OpenAI key):

GRAPHRAG_API_KEY=<API_KEY>

Approve the environment settings with direnv:

direnv allow
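Before indexing, it can help to confirm that the key is visible to Python and no longer holds the `<API_KEY>` placeholder. The following is a minimal, hypothetical check. It assumes the key is exported as the `GRAPHRAG_API_KEY` environment variable (for example by direnv). If the key exists only in a file that GraphRAG reads on its own, this check will report it as missing.

```python
import os

PLACEHOLDER = "<API_KEY>"


def api_key_status(env=None, name="GRAPHRAG_API_KEY"):
    """Classify the key as 'missing', 'placeholder', or 'set' without ever printing it."""
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        return "missing"
    if value == PLACEHOLDER:
        return "placeholder"
    return "set"
```

The function accepts any mapping, which makes it easy to test. It returns only a status, so the secret never ends up in logs.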

Usage

Run the example script:

pipenv run python main.py

The script will:

Run the indexing process
Execute example search queries
Generate statistics and visualization of the knowledge graph

Features

CLI-based interaction: Uses the GraphRAG CLI for indexing and querying
Graph analysis: Provides statistics and insights about the knowledge graph
Knowledge graph visualization: Creates visual representations of entities and relationships
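The README does not say which library analyzer.py uses for visualization. One dependency-free option is to write the entities and labeled relationships in Graphviz DOT format. Graphviz can then render the file, for example with `dot -Tpng graph.dot -o graph.png`. The function name and sample data below are hypothetical.

```python
def to_dot(triples, name="knowledge_graph"):
    """Render (source, relation, target) triples as a Graphviz DOT digraph.

    `name` must be a plain identifier (letters, digits, underscores).
    """

    def quote(text):
        # DOT string literals need embedded double quotes escaped.
        return '"' + text.replace('"', '\\"') + '"'

    lines = [f"digraph {name} {{"]
    for source, relation, target in triples:
        lines.append(f"  {quote(source)} -> {quote(target)} [label={quote(relation)}];")
    lines.append("}")
    return "\n".join(lines)
```

For instance, `Path("graph.dot").write_text(to_dot([("GlucoMeter Pro", "USES", "Test Strip S")]))` writes a one-edge graph whose edge is labeled USES.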
