This project consists of two main components: a server (the LLaVA bridge) running on Gaudi 2, and a client running on a laptop with a webcam.
The LLaVA bridge is the server component. It runs on Gaudi 2 hardware and processes requests from the client, using the LLaVA (Large Language and Vision Assistant) model to analyze images and generate responses.
Key features:
- Runs on Gaudi 2 hardware for high-performance computing
- Processes image analysis requests
- Utilizes the LLaVA and BridgeTower models for advanced image understanding and response generation
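A minimal sketch of what the bridge's request handler could look like, assuming a simple HTTP/JSON interface. The `/analyze` route, the JSON field names, and the `run_llava` helper are illustrative assumptions, not the repository's actual API:

```python
# Illustrative sketch only: the route, JSON fields, and run_llava() are assumptions.
import base64
import io

from flask import Flask, jsonify, request
from PIL import Image

app = Flask(__name__)

def run_llava(image: Image.Image) -> str:
    """Placeholder for the actual LLaVA / BridgeTower inference on Gaudi 2."""
    raise NotImplementedError("Wire this up to the real model pipeline.")

@app.route("/analyze", methods=["POST"])
def analyze():
    payload = request.get_json()
    # Decode the base64-encoded JPEG sent by the client.
    image_bytes = base64.b64decode(payload["image"])
    image = Image.open(io.BytesIO(image_bytes))
    # Run the vision-language model and return its answer as JSON.
    answer = run_llava(image)
    return jsonify({"response": answer})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)
```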
The client component runs on a laptop equipped with a webcam. It captures images from the webcam and sends them to the server for processing.
Key features:
- Runs on a laptop with a webcam
- Captures frames from the webcam
- Encodes and sends image data to the server
- Displays responses received from the server
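A rough sketch of the capture-encode-send loop, assuming OpenCV for the webcam and the JSON interface sketched above. The server URL and field names are assumptions:

```python
# Illustrative sketch only: SERVER_URL and the JSON field names are assumptions.
import base64

import cv2
import requests

SERVER_URL = "http://<gaudi2-host>:8000/analyze"  # replace with the bridge's address

def capture_and_send() -> str:
    cap = cv2.VideoCapture(0)            # open the default webcam
    ok, frame = cap.read()               # grab a single frame
    cap.release()
    if not ok:
        raise RuntimeError("Could not read a frame from the webcam.")
    ok, jpeg = cv2.imencode(".jpg", frame)  # compress the frame to JPEG
    if not ok:
        raise RuntimeError("JPEG encoding failed.")
    encoded = base64.b64encode(jpeg.tobytes()).decode("utf-8")
    resp = requests.post(SERVER_URL, json={"image": encoded}, timeout=60)
    resp.raise_for_status()
    return resp.json()["response"]       # the model's answer, shown to the user

if __name__ == "__main__":
    print(capture_and_send())
```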
The end-to-end flow works as follows:
- The client captures a frame from the webcam.
- The image is encoded and sent to the LLaVA bridge server.
- The server processes the image using the LLaVA model.
- The server sends back a response based on the image analysis.
- The client displays the response to the user.
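Inside the Streamlit app started below, this flow might look roughly like the following; the actual client.py may differ, and the server URL and response format are assumptions carried over from the sketches above:

```python
# Illustrative Streamlit sketch: the real client.py may differ; SERVER_URL is an assumption.
import base64

import requests
import streamlit as st

SERVER_URL = "http://<gaudi2-host>:8000/analyze"

st.title("LLaVA webcam client")
frame = st.camera_input("Capture a frame")   # webcam capture in the browser
if frame is not None:
    # Encode the captured JPEG and send it to the LLaVA bridge.
    payload = {"image": base64.b64encode(frame.getvalue()).decode("utf-8")}
    resp = requests.post(SERVER_URL, json=payload, timeout=60)
    resp.raise_for_status()
    # Show the frame and the model's response to the user.
    st.image(frame)
    st.write(resp.json().get("response", ""))
```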
Follow these steps to set up the project and run the Streamlit app.
Create a virtual environment named venv:
python3 -m venv venv
Activate the virtual environment. Use the following command for Unix-based systems (Linux and macOS):
source venv/bin/activate
Install the required dependencies listed in requirements_client.txt:
pip install -r requirements_client.txt
Run the Streamlit app using the following command:
streamlit run client.py