RAG (Retrieval-Augmented Generation) Playground

This repository contains experiments and code for working with Retrieval-Augmented Generation (RAG), vector embeddings, and document ingestion/parsing using Python and popular libraries such as LangChain, HuggingFace, FAISS, and ChromaDB.

Features

Data Ingestion & Parsing:
Notebooks and scripts for parsing and ingesting documents (PDF, DOCX, etc.) for downstream NLP tasks.
Vector Embeddings & Databases:
Examples of generating vector embeddings for text using HuggingFace models and storing/querying them with vector databases like FAISS and ChromaDB.
Jupyter Notebooks:
Interactive notebooks for step-by-step experimentation and visualization.
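
The ingestion and embedding features above follow a common pattern: split a document into chunks, turn each chunk into a vector, then rank chunks by similarity to a query. The sketch below illustrates that flow using only the Python standard library. It is not the code from the notebooks: the bag-of-words `embed` function is a stand-in for a HuggingFace embedding model, the brute-force `search` is a stand-in for FAISS or ChromaDB, and every function name and the sample text are assumptions made for illustration.

```python
import math
import re


def chunk_words(text: str, size: int = 8, overlap: int = 2) -> list[str]:
    """Split text into windows of `size` words that share `overlap` words."""
    words = text.split()
    if not words:
        return []
    step = size - overlap
    stop = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, stop, step)]


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep alphanumeric runs only."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocab(chunks: list[str]) -> dict[str, int]:
    """Assign each distinct token a vector position (toy replacement for a model)."""
    vocab: dict[str, int] = {}
    for chunk in chunks:
        for tok in tokenize(chunk):
            vocab.setdefault(tok, len(vocab))
    return vocab


def embed(text: str, vocab: dict[str, int]) -> list[float]:
    """Bag-of-words count vector, L2-normalised so dot product equals cosine similarity."""
    vec = [0.0] * len(vocab)
    for tok in tokenize(text):
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def search(query: str, chunks: list[str], vectors: list[list[float]],
           vocab: dict[str, int], k: int = 1) -> list[str]:
    """Brute-force nearest-neighbour search (what a vector database does at scale)."""
    q = embed(query, vocab)
    scores = [sum(a * b for a, b in zip(q, v)) for v in vectors]
    ranked = sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)
    return [chunks[i] for i in ranked[:k]]


# Hypothetical sample text standing in for a parsed PDF or DOCX file.
text = ("Attention lets a model weigh every token against every other token. "
        "FAISS and ChromaDB store embedding vectors for similarity search.")
chunks = chunk_words(text, size=8, overlap=2)
vocab = build_vocab(chunks)
vectors = [embed(c, vocab) for c in chunks]
results = search("similarity search over vectors", chunks, vectors, vocab)
print(results[0])
```

With a real embedding model and vector store, only `embed` and `search` would change; the chunk, embed, index, query sequence stays the same.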

Project Structure

.
├── main.py
├── requirements.txt
├── pyproject.toml
├── .env
├── .gitignore
├── data/
│   └── attention .pdf
├── DataIngestionParsing/
│   └── dataingestion.ipynb
└── VectorEmbeddingAndDatabases/
    └── embedding.ipynb

main.py: Entry point for running the project.
requirements.txt / pyproject.toml: Python dependencies.
data/: Example data files.
DataIngestionParsing/: Notebooks for document ingestion and parsing.
VectorEmbeddingAndDatabases/: Notebooks for embedding and vector database experiments.

Setup

Python Version:
Requires Python 3.13 (see .python-version).
Install Dependencies:
You can use pip or your preferred environment manager:
```
pip install -r requirements.txt
```
Environment Variables:
Create a .env file for any required API keys or configuration.
Run Notebooks:
Open the notebooks in VS Code or Jupyter Lab and run the cells interactively.
Run Main Script:
```
python main.py
```
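
For the environment-variables step above, a script or notebook needs some way to read the .env file. The following is a minimal sketch using only the standard library; the repository may well use a package such as python-dotenv instead, and the `load_env` helper and the `EXAMPLE_API_KEY` name are hypothetical, chosen only for illustration.

```python
import os
from pathlib import Path


def load_env(path: str = ".env") -> dict[str, str]:
    """Parse KEY=VALUE lines from a .env file and export them to os.environ."""
    loaded: dict[str, str] = {}
    env_file = Path(path)
    if not env_file.exists():
        return loaded  # nothing to load; callers fall back to the real environment
    for raw in env_file.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip("'\"")  # drop surrounding quotes, if any
        os.environ.setdefault(key, value)  # variables already set take precedence
        loaded[key] = value
    return loaded


settings = load_env()  # returns {} when no .env file is present
api_key = os.environ.get("EXAMPLE_API_KEY")  # hypothetical variable name
```

Using `setdefault` means a value exported in the shell wins over the one in the file, which keeps the file as a set of defaults. Since .gitignore is part of the project, keeping .env out of version control is the usual practice.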

Notebooks

Data Ingestion:
See DataIngestionParsing/dataingestion.ipynb
Vector Embedding & Visualization:
See VectorEmbeddingAndDatabases/embedding.ipynb

Dependencies

Key libraries used: LangChain, HuggingFace, FAISS, and ChromaDB.

See requirements.txt and pyproject.toml for the full list.

License

MIT. The repository does not yet include a LICENSE file; add one to state the terms explicitly.

This repo is a playground for learning and experimenting with RAG, embeddings, and vector databases. Contributions and suggestions are welcome!

Hamzakhan001/RAG-Production-Pipelines