ConsumerBench is a comprehensive benchmarking framework that evaluates the runtime performance of user-defined GenAI applications under realistic conditions on end-user devices.
```bash
# Clone the repository
git clone https://github.com/your-org/ConsumerBench.git
cd ConsumerBench

# Set up the environment
conda create -n consumerbench python=3.10
conda activate consumerbench
pip install -r requirements.txt
```

Follow the instructions in `applications/` to set up the individual applications.
Add your own YAML workflow in `configs/`, then run the benchmark:

```bash
python src/scripts/run_consumerbench.py --config <path-to-config>
```
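A workflow config is a plain YAML file; the authoritative schemas are the shipped examples in `configs/`, so copy one of those rather than writing a file from scratch. As a rough sketch, a minimal workflow might look like the following (every key except `device`, which the CPU-only instructions later in this README reference, is an illustrative assumption):

```yaml
# Hypothetical workflow sketch -- mirror a real file from configs/ instead
applications:
  - name: chatbot        # assumed key: which application to benchmark
    device: "gpu"        # "gpu" or "cpu" (see the CPU-only experiments)
```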
The benchmark has been tested on the following hardware:

- Setup 1:
  - CPU: Intel(R) Xeon(R) Gold 6126 CPU @ 2.60GHz
  - GPU: NVIDIA RTX 6000
  - System memory: 32 GB
  - CPU cores: 12
- Setup 2:
  - MacBook Pro M1
  - Unified memory: 32 GB
```
ConsumerBench/
├── src/                   # Source code
├── inference_backends/    # Inference backends
├── models/                # GenAI models
├── applications/          # Applications
├── configs/               # Example user configurations & workflows
└── scripts/               # Result processing and plotting scripts
```
Text-to-text generation for chat and Q&A:
- Local backend mimicking the OpenAI API
- Powered by llama.cpp for efficient CPU-GPU co-execution
- Located in `applications/Chatbot`
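Since the Chatbot backend mimics the OpenAI API, it can be queried like any OpenAI-compatible server. A minimal sketch using only the standard library, assuming the llama.cpp server listens on `localhost:8080` (the host, port, and model name are assumptions; check your backend configuration):

```python
# Build an OpenAI-style chat completion request against the local backend.
import json
import urllib.request

def build_chat_request(prompt, base_url="http://localhost:8080/v1"):
    """Construct (but do not send) an OpenAI-compatible chat request."""
    payload = {
        "model": "local",  # llama.cpp servers generally accept any model name
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    return req, payload

req, payload = build_chat_request("What is ConsumerBench?")
# Sending the request requires the backend to be running:
# body = json.load(urllib.request.urlopen(req))
# print(body["choices"][0]["message"]["content"])
```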
Agent-based reasoning for complex fact gathering:
- Built on the open-deep-research framework
- Served via LiteLLM
- Located in `applications/DeepResearch`
Text-to-image generation optimized for edge devices:
- Uses stable-diffusion-webui in API mode
- Located in `applications/ImageGen`
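In API mode, stable-diffusion-webui exposes a `txt2img` endpoint. A sketch of a request against it, assuming the webui's default port 7860 (the port and payload defaults are assumptions; consult the webui API docs for the full parameter set):

```python
# Construct a txt2img request for stable-diffusion-webui's API mode.
import json
import urllib.request

payload = {
    "prompt": "a photo of a mountain lake at sunrise",
    "steps": 20,      # diffusion steps: fewer is faster, lower quality
    "width": 512,
    "height": 512,
}
req = urllib.request.Request(
    "http://localhost:7860/sdapi/v1/txt2img",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# Requires the webui running with its API enabled:
# body = json.load(urllib.request.urlopen(req))
# Generated images come back base64-encoded under body["images"].
```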
Audio-to-text transcription for real-time and offline use:
- Whisper-based backend served over HTTP
- Located in `applications/LiveCaptions`
Run the script:

```bash
./scripts/run_benchmark.sh configs/workflow_imagegen.yml 0
```

This script collects:
- GPU metrics: compute/memory bandwidth, via DCGM
- CPU utilization: via the `stat` utility
- CPU memory bandwidth: via the `pcm-memory` utility
- GPU power: via NVML
- CPU power: via RAPL
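On Linux, CPU utilization statistics of the kind collected above come from the kernel counters in `/proc/stat`. A minimal sketch of deriving utilization from two samples of the aggregate `cpu` line (ConsumerBench's actual collector may compute this differently; the field layout follows the `proc(5)` man page):

```python
def cpu_utilization(sample_before, sample_after):
    """Aggregate CPU utilization between two /proc/stat 'cpu' lines.

    Each sample is the first line of /proc/stat, e.g.
    'cpu  user nice system idle iowait irq softirq steal guest guest_nice'.
    Utilization = 1 - (delta idle time / delta total time).
    """
    def parse(line):
        fields = [int(x) for x in line.split()[1:]]
        idle = fields[3] + fields[4]  # idle + iowait jiffies
        return idle, sum(fields)

    idle0, total0 = parse(sample_before)
    idle1, total1 = parse(sample_after)
    return 1.0 - (idle1 - idle0) / (total1 - total0)

before = "cpu  100 0 100 700 100 0 0 0 0 0"
after  = "cpu  200 0 200 800 100 0 0 0 0 0"
print(round(cpu_utilization(before, after), 2))  # -> 0.67
```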
Results are saved in the `results` directory with timestamps, and PDF plots are generated automatically.
To modify Service Level Objectives (SLOs), edit the corresponding parsing script:
- Chatbot: `scripts/result_processing/parse-results-chatbot-log.py`
- DeepResearch: `scripts/result_processing/parse-results-deepresearch-log.py`
- ImageGen: `scripts/result_processing/parse-results-imagegen-log.py`
- LiveCaptions: `scripts/result_processing/parse-results-whisper-log.py`
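The core calculation behind an SLO check is an attainment ratio: the fraction of requests whose measured latency meets the threshold. A sketch, assuming millisecond latencies (the thresholds and metric names in ConsumerBench's parsing scripts may differ):

```python
def slo_attainment(latencies_ms, slo_ms):
    """Fraction of requests whose latency is within the SLO threshold."""
    if not latencies_ms:
        return 0.0
    return sum(1 for lat in latencies_ms if lat <= slo_ms) / len(latencies_ms)

lat = [120.0, 450.0, 90.0, 600.0]
print(slo_attainment(lat, slo_ms=500.0))  # 3 of 4 requests meet a 500 ms SLO
```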
| Application | Config |
|---|---|
| Chatbot | `configs/workflow_chatbot.yml` |
| LiveCaptions | `configs/workflow_live_captions.yml` |
| ImageGen | `configs/workflow_imagegen.yml` |
CPU-only: change `device` from `"gpu"` to `"cpu"` in the configs.
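Since the exact nesting of the config schema varies per workflow, one robust way to flip an experiment to CPU-only programmatically is to rewrite every `device` key after parsing the YAML into Python dicts and lists (a sketch; edit the shipped configs by hand if you prefer):

```python
def set_device(node, device):
    """Recursively set every 'device' key in a parsed config to `device`."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "device":
                node[key] = device
            else:
                set_device(value, device)
    elif isinstance(node, list):
        for item in node:
            set_device(item, device)
    return node

# Hypothetical parsed config; the real schema comes from configs/*.yml.
config = {"applications": [{"name": "chatbot", "device": "gpu"}]}
set_device(config, "cpu")
print(config["applications"][0]["device"])  # -> "cpu"
```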
- Greedy allocation: `configs/workflow_chatbot_imagegen_live_captions.yml`
- GPU partitioning: `configs/workflow_chatbot_imagegen_live_captions_mps.yml`
- Config: `configs/workflow_chatbot_deep_research.yml`
- Edit `example_workflow/llamacpp_server.sh` to add `-c 128000 -nkvo` for Chatbot-KVCache-CPU
- Greedy allocation: `configs/workflow_content_creation.yml`
- GPU partitioning: `configs/workflow_content_creation_mps.yml`