This repository contains the complete code and tutorials for implementing a multimodal retrieval-augmented generation (RAG) system capable of processing, storing, and retrieving video content. The system uses BridgeTower for multimodal embeddings, LanceDB as the vector store, and Pixtral as the conversation LLM.
To install the necessary dependencies, run the following command:

`pip install -r requirements.txt`
- `mm_rag.ipynb`: Complete end-to-end implementation of the multimodal RAG system
- `embedding_creation.ipynb`: Deep dive into generating multimodal embeddings using BridgeTower
- `vector_store.ipynb`: Detailed guide on setting up and populating LanceDB for vector storage (a short end-to-end ingestion sketch follows this list)
- `preprocessing_video.ipynb`: Comprehensive coverage of video preprocessing techniques, including:
  - Frame extraction
  - Transcript processing
  - Handling videos without transcripts
  - Transcript optimization strategies
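As a quick orientation before opening those notebooks, the sketch below walks one segment through the full ingestion path: extract a frame with OpenCV, embed the frame/transcript pair with BridgeTower via Hugging Face `transformers`, and write the result to LanceDB. This is a hedged illustration, not the notebooks' exact code: the checkpoint, the use of the joint `cross_embeds` output, the table schema, and all file names, captions, and timestamps are assumptions.

```python
import cv2
import lancedb
import torch
from PIL import Image
from transformers import BridgeTowerProcessor, BridgeTowerForContrastiveLearning

# --- 1. Frame extraction (see preprocessing_video.ipynb) ---------------------
# Grab one frame near the midpoint of a transcript segment; the file name and
# timestamp below are hypothetical.
cap = cv2.VideoCapture("space_expedition.mp4")
cap.set(cv2.CAP_PROP_POS_MSEC, 12_000)          # seek to 12 s into the video
ok, frame_bgr = cap.read()
cap.release()
frame = Image.fromarray(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
frame.save("frame_0001.jpg")

# --- 2. Multimodal embedding (see embedding_creation.ipynb) ------------------
# Checkpoint choice and the use of the joint cross-modal embedding are assumptions.
ckpt = "BridgeTower/bridgetower-large-itm-mlm-itc"
processor = BridgeTowerProcessor.from_pretrained(ckpt)
model = BridgeTowerForContrastiveLearning.from_pretrained(ckpt)

caption = "The crew prepares for the spacewalk."  # transcript chunk overlapping the frame
inputs = processor(images=frame, text=caption, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
vector = outputs.cross_embeds[0].numpy().tolist()

# --- 3. Vector storage (see vector_store.ipynb) ------------------------------
# Store the vector alongside the metadata needed at answer time.
db = lancedb.connect("lancedb_store")
table = db.create_table(
    "video_segments",
    data=[{
        "vector": vector,
        "frame_path": "frame_0001.jpg",
        "transcript": caption,
        "start_time": 12.0,   # seconds into the video
    }],
)
print(table.count_rows())
```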
You'll need to set up the following API key:
- `MISTRAL_API_KEY` for Pixtral model access
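For reference, the sketch below shows roughly how the key and the `mistralai` Python client could be used to send one retrieved frame plus its transcript chunk to Pixtral. The model identifier, file path, and prompt layout are assumptions; the notebooks may structure this differently.

```python
import base64
import os

from mistralai import Mistral

# The key is read from the environment, e.g. after `export MISTRAL_API_KEY=...`
client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

# Encode one retrieved frame (hypothetical path) for the vision prompt.
with open("frame_0001.jpg", "rb") as f:
    frame_b64 = base64.b64encode(f.read()).decode()

response = client.chat.complete(
    model="pixtral-12b-2409",  # assumed Pixtral model identifier
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Transcript: The crew prepares for the spacewalk.\n"
                     "Question: What is happening in this scene?"},
            {"type": "image_url",
             "image_url": f"data:image/jpeg;base64,{frame_b64}"},
        ],
    }],
)
print(response.choices[0].message.content)
```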
The tutorial uses a sample video about a space expedition. You can replace it with any video of your choice, but make sure to do one of the following:
- Include a transcript file (`.vtt` format)
- Generate transcripts using Whisper (see the sketch after this list)
- Use vision language models for caption generation
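For the Whisper route, a minimal sketch of producing a `.vtt` transcript could look like the following (the `openai-whisper` package, the file names, and the hand-rolled WebVTT formatting are assumptions; `preprocessing_video.ipynb` covers this step in detail):

```python
import whisper

def to_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT HH:MM:SS.mmm timestamp."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{int(hours):02d}:{int(minutes):02d}:{secs:06.3f}"

# Transcribe the video (Whisper extracts the audio track via ffmpeg).
model = whisper.load_model("base")
result = model.transcribe("space_expedition.mp4")   # hypothetical file name

# Write the timed segments out in WebVTT format.
with open("space_expedition.vtt", "w") as f:
    f.write("WEBVTT\n\n")
    for segment in result["segments"]:
        f.write(f"{to_timestamp(segment['start'])} --> {to_timestamp(segment['end'])}\n")
        f.write(segment["text"].strip() + "\n\n")
```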
Contributions are welcome! Some areas for improvement include:
- Adding chat history support
- Prompt engineering refinements
- Alternative retrieval strategies
- Testing different VLMs and embedding models
To contribute:
- Fork the repository.
- Create a new branch (`git checkout -b feature-branch`).
- Commit your changes (`git commit -am 'Add new feature'`).
- Push to the branch (`git push origin feature-branch`).
- Create a new Pull Request.
This project is licensed under the MIT License. See the LICENSE file for details.