Developed with Python, Streamlit, HuggingFace, Pinecone, and Docker.
Developers: Sabrina Zaki Hansen and Amos Blanton
This project implements a Retrieval-Augmented Generation (RAG) chatbot built on a large language model. It helps users ask questions about, and explore, meeting transcripts, summaries, and project data.
The code was developed in collaboration with, and for, the EER project. The chatbot integrates:
- Document Retrieval: Retrieves data from meeting transcripts, summaries, and related documents.
- Conversation History: Draws on past chatbot interactions for context.
- LLM-Powered Summaries: Uses a Large Language Model (LLM) to generate summaries of transcripts.
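The retrieval step of a RAG pipeline can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation. A toy bag-of-words vector and cosine similarity stand in for the HuggingFace embeddings and Pinecone index the project actually uses. The names `embed`, `retrieve`, and `build_prompt`, and the prompt wording, are all hypothetical. The sketch ranks transcript chunks against a question and then assembles the retrieved excerpts and prior conversation into a prompt for the LLM.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in (assumption) for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: list[str], history: list[str]) -> str:
    """Combine retrieved excerpts and past conversation into one LLM prompt."""
    ctx = "\n".join(f"- {c}" for c in context)
    hist = "\n".join(history) if history else "(none)"
    return (
        "Answer using only the meeting excerpts below.\n"
        f"Excerpts:\n{ctx}\n"
        f"Previous conversation:\n{hist}\n"
        f"Question: {query}\nAnswer:"
    )


# Usage with made-up transcript snippets (illustrative data only).
meeting_chunks = [
    "Alice: we agreed to publish the survey results in March.",
    "Bob: the budget review is postponed to next week.",
    "Carol: the website redesign needs more user testing.",
]
top = retrieve("When will the survey results be published?", meeting_chunks, k=1)
print(top)  # ['Alice: we agreed to publish the survey results in March.']
```

In the real pipeline, the similarity search is delegated to the vector store. The prompt is then sent to the LLM rather than printed.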
For a full code walkthrough, see src/RAG_tutorial.ipynb.
Project structure:
.
├── .venv # Virtual environment directory
├── data # Directory for storing input data (transcripts PDFs)
├── src # Source code directory
│ ├── preprocessing
│ │ ├── reformatting_data.py # Transcript reformatting scripts
│ │ └── data_chunking.py # Data processing and chunking logic
│ ├── streamlit_rag_chatbot # Chatbot pipeline and Streamlit app
│ │ ├── main.py # Core chatbot pipeline
│ │ └── streamlit_app.py # Streamlit app for the chatbot
│ └── upserting_transcripts # Scripts for upserting transcripts to the database
│ ├── a2t.py # Incomplete script for the pipeline; currently focuses on adding transcripts to Pinecone
│ └── streamlit_a2t.py # Streamlit app interface for managing the pipeline
├── .env # Environment variables (API keys for HuggingFace and Pinecone)
├── .gitignore # Excluded files and directories
├── Dockerfile # Docker configuration for deploying the app
├── LICENSE.txt # License for the project
├── README.md # Readme file
└── requirements.txt # Dependencies for the project
| File | Summary |
|---|---|
| src/streamlit_rag_chatbot/main.py | Sets up the chatbot pipeline, integrating document retrieval from Pinecone with HuggingFace embeddings for querying and summarization. |
| src/streamlit_rag_chatbot/streamlit_app.py | Implements the Streamlit user interface, letting users chat with the bot, view meeting summaries, and inspect the data referenced in answers. |
| src/preprocessing/reformatting_data.py | Cleans raw transcript files and reformats them into a structured format (CSV) suitable for further processing. |
| src/preprocessing/data_chunking.py | Splits transcripts into manageable chunks and enriches them with metadata before storing them in the vector store. |
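The preprocessing steps described in the table above can be sketched as follows. This is a hedged illustration, not the code in `reformatting_data.py` or `data_chunking.py`. It assumes raw transcripts use a `Speaker: utterance` line format and that meeting dates are ISO strings. The helper names (`transcript_to_rows`, `rows_to_csv`, `chunk_rows`), the character budget, and the metadata fields are hypothetical. Chunks are built from whole utterances, and the last utterance is carried over as overlap so that context is not lost at chunk boundaries.

```python
import csv
import io
import re

# Assumed transcript format: one "Speaker: utterance" per line.
LINE_RE = re.compile(r"^\s*([^:]+):\s*(.+)$")


def transcript_to_rows(raw: str) -> list[dict]:
    """Parse raw transcript text into speaker/text rows, skipping non-matching lines."""
    rows = []
    for line in raw.splitlines():
        m = LINE_RE.match(line)
        if m:
            rows.append({"speaker": m.group(1).strip(), "text": m.group(2).strip()})
    return rows


def rows_to_csv(rows: list[dict]) -> str:
    """Serialize parsed rows to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["speaker", "text"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def _make_chunk(rows: list[dict], meeting_date: str, index: int) -> dict:
    """Join utterances into one chunk and attach metadata for the vector store."""
    return {
        "text": "\n".join(f"{r['speaker']}: {r['text']}" for r in rows),
        "metadata": {
            "date": meeting_date,
            "speakers": sorted({r["speaker"] for r in rows}),
            "chunk_index": index,
        },
    }


def chunk_rows(rows: list[dict], meeting_date: str, max_chars: int = 200, overlap: int = 1) -> list[dict]:
    """Group whole utterances into chunks of roughly max_chars, with `overlap` utterances repeated."""
    chunks, current = [], []
    for row in rows:
        size = sum(len(r["text"]) for r in current) + len(row["text"])
        if current and size > max_chars:
            chunks.append(_make_chunk(current, meeting_date, len(chunks)))
            current = current[-overlap:] if overlap > 0 else []
        current.append(row)
    if current:
        chunks.append(_make_chunk(current, meeting_date, len(chunks)))
    return chunks
```

Chunking on utterance boundaries, rather than at fixed character offsets, keeps each speaker's statement intact. This makes the `speakers` metadata reliable when it is used for filtering.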
Prerequisites:
- Python 3.11.9
- API keys for:
  - HuggingFace
  - Pinecone
Installation:
- Clone the repository:
  git clone https://github.com/sabszh/EER-chatbot-UI/
- Navigate to the project directory:
  cd EER-chatbot-UI
- Set up a virtual environment (optional but recommended):
  python -m venv .venv
  source .venv/bin/activate  # On Windows: .venv\Scripts\activate
- Install dependencies:
  pip install -r requirements.txt
- Configure environment variables: create a .env file in the root directory and add your API keys:
  HUGGINGFACE_API_KEY=your_huggingface_api_key
  PINECONE_API_KEY=your_pinecone_api_key
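Before launching the app, it can help to confirm that both keys are actually set. The script below is a minimal, hypothetical pre-flight check and is not part of the repository. It parses simple `KEY=VALUE` lines itself, so it needs only the standard library. It does not override variables already set in the environment. A library such as python-dotenv (`load_dotenv`) is a common alternative, but whether this project uses it is not stated here.

```python
import os
from pathlib import Path

# The two keys named in the installation steps above.
REQUIRED_KEYS = ("HUGGINGFACE_API_KEY", "PINECONE_API_KEY")


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and '#' comments are skipped."""
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def check_api_keys(path: str = ".env") -> list[str]:
    """Export .env values without overriding the real environment; return missing keys."""
    for key, value in load_env_file(path).items():
        os.environ.setdefault(key, value)
    return [k for k in REQUIRED_KEYS if not os.environ.get(k)]
```

Calling `check_api_keys()` from the project root returns an empty list when both keys are available. Otherwise it returns the names of the missing keys.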
To launch the Streamlit app:
streamlit run src/streamlit_rag_chatbot/streamlit_app.py
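To run the app in Docker instead:
- Build the Docker image:
  docker build -t eer-chatbot .
- Run the Docker container, publishing Streamlit's default port 8501 to the host:
  docker run -p 8501:8501 eer-chatbot
  Then open http://localhost:8501 in your browser.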
Features:
- Fetch concise summaries of past meetings, filtered by specific dates. Summaries highlight discussion points, action items, and speaker lists.
- View the data sources the chatbot references in its answers, including meeting transcripts and related documents.
- Explore connections between your queries and those of other users, using past conversations for context-aware insights.
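Date-filtered summaries can be sketched as a range filter over stored summary records. This is a minimal, hypothetical illustration: the `MeetingSummary` record, its fields, and the output layout are assumptions chosen to mirror the elements the summaries are described as containing (discussion points, action items, speakers). They are not the app's actual data model. Dates are accepted as ISO `YYYY-MM-DD` strings, and the range is inclusive at both ends.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class MeetingSummary:
    """Hypothetical stored summary record for one meeting."""
    meeting_date: date
    summary: str
    action_items: list[str] = field(default_factory=list)
    speakers: list[str] = field(default_factory=list)


def summaries_between(summaries: list[MeetingSummary], start: str, end: str) -> list[MeetingSummary]:
    """Return summaries whose date falls in [start, end], oldest first."""
    lo, hi = date.fromisoformat(start), date.fromisoformat(end)
    if lo > hi:
        raise ValueError("start date must not be after end date")
    selected = [s for s in summaries if lo <= s.meeting_date <= hi]
    return sorted(selected, key=lambda s: s.meeting_date)


def format_summary(s: MeetingSummary) -> str:
    """Render one summary with its action items and speaker list."""
    lines = [f"Meeting on {s.meeting_date.isoformat()}", s.summary]
    if s.action_items:
        lines.append("Action items: " + "; ".join(s.action_items))
    if s.speakers:
        lines.append("Speakers: " + ", ".join(sorted(s.speakers)))
    return "\n".join(lines)
```

Sorting the filtered results by date keeps the output chronological, regardless of the order in which summaries were stored.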
This project is licensed under the GNU General Public License v3 (GPLv3).
TL;DR
- Anyone can copy, modify and distribute this software.
- You have to include the license and copyright notice with each and every distribution.
- You can use this software privately.
- You can use this software for commercial purposes.
- If you dare build your business solely from this code, you risk open-sourcing the whole code base.
- If you modify it, you have to indicate changes made to the code.
- Any modifications of this code base MUST be distributed with the same license, GPLv3.
- This software is provided without warranty.
- The software author or license cannot be held liable for any damages inflicted by the software.
- Access the full license text in the LICENSE.txt file.
For more details on the terms of this license, please visit GNU Licenses.