Multimodal Analytics

A framework for evaluating analytical queries over multimodal data. This repository contains the experiments for the paper "Analytical queries over multimodal data".

Overview

This repository provides a modular evaluation framework for comparing different reasoning systems on multimodal question-answering tasks. The framework supports various Large Language Models (LLMs), Retrieval-Augmented Generation (RAG) systems, and evaluation metrics to analyze performance on analytical queries across different data modalities.

Installation

Using Conda (Recommended)

# Create conda environment from environment.yml
conda env create -f environment.yml
conda activate multimodal-analytics

Using pip

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

Environment Setup

Create a .env file in the root directory with your API keys:

OPENAI_API_KEY=your_openai_api_key_here
TOGETHER_API_KEY=your_together_api_key_here
# Add other API keys as needed
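
To confirm that the keys are picked up before running an evaluation, here is a minimal sketch assuming the environment is loaded with python-dotenv (an assumption; the framework may load keys differently):

# Minimal sketch: check that the API keys from .env are visible to Python.
# Assumes the python-dotenv package is installed.
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory

for key in ("OPENAI_API_KEY", "TOGETHER_API_KEY"):
    # Print only whether each key is set, never the value itself.
    print(f"{key}: {'set' if os.getenv(key) else 'missing'}")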

Reproducibility

To reproduce the results from the paper, use the pre-computed results in the output/results/ directory; this lets you verify the evaluation metrics without rerunning the entire evaluation. The plots directory contains a notebook, rag_cardinality.ipynb, that generates the plots from the pre-computed results; edit the notebook to point to the correct results directory if needed.
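
To inspect a single pre-computed result file outside the notebook, a minimal sketch is shown below; the exact JSON schema is not assumed, so it only prints the top-level structure:

# Minimal sketch: load one pre-computed result file and show its structure.
import json
from pathlib import Path

results_dir = Path("output/results")
result_file = next(results_dir.rglob("*.json"), None)  # first result file found
if result_file is None:
    raise SystemExit("No pre-computed results found under output/results/")

with result_file.open() as f:
    data = json.load(f)

print(f"Loaded {result_file}")
# Inspect the keys to see what each file stores.
print(list(data.keys()) if isinstance(data, dict) else f"{len(data)} records")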

Quick Start

The main entry point is evaluate.py. Here are some example commands:

# Run vanilla RAG evaluation on stirpot-300 dataset
python evaluate.py --method vanilla_rag --usecase stirpot-300 --evaluator f1 --topk 50

# Evaluate GPT-4 with verbose output
python evaluate.py --method gpt4 --usecase stirpot-300 --evaluator llm --verbose

# Enable the system cache to reuse previously computed system outputs and rerun only the evaluation
python evaluate.py --method llama3_1_8b --usecase stirpot-300 --system-cache --verbose

# Limit queries for quick testing
python evaluate.py --method mixtral --usecase stirpot-300 --query-limit 10

Available Systems

System                    Description
vanilla_rag               Basic Retrieval-Augmented Generation
canvas_retrieve_context   Contextual RAG system
gpt3                      GPT-3.5 Turbo
gpt4                      GPT-4
llama3_1_8b               Llama 3.1 8B parameter model
llama3_1_70b              Llama 3.1 70B parameter model
llama3_3                  Llama 3.3 model
gemma3                    Google Gemma 3
mixtral                   Mixtral model

Evaluators

Evaluator    Description
f1           F1 Score evaluation
precision    Precision metric
recall       Recall metric
breakdown    Detailed breakdown analysis
llm          LLM-based evaluation

Supported Datasets

  • stirpot-300: A dataset with 300 questions for multimodal analytics

Usage Examples

Command Line Options

  • --method: Choose the reasoning system to evaluate
  • --usecase: Select the dataset/workload
  • --evaluator: Choose evaluation metric (default: llm)
  • --topk: Number of top-k documents for RAG retrieval (default: 50)
  • --query-limit: Limit number of queries to process
  • --system-cache: Enable system-level caching
  • --result-cache: Use cached evaluation results if available
  • --verbose: Enable detailed output
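
To sweep several systems over the same workload with these flags, a small driver script like the sketch below can wrap the documented CLI (this is only an illustrative wrapper, not part of the framework):

# Illustrative sketch: run evaluate.py for several systems in sequence.
import subprocess
import sys

for method in ["vanilla_rag", "gpt4", "llama3_1_8b"]:
    cmd = [sys.executable, "evaluate.py",
           "--method", method,
           "--usecase", "stirpot-300",
           "--evaluator", "f1",
           "--query-limit", "10"]
    subprocess.run(cmd, check=True)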

Results

Evaluation results are automatically saved in the output/results/{system}/ directory. Each result file contains:

  • Question-answer pairs with predictions
  • Evaluation metrics and scores
  • Timestamp and configuration details

Results are saved in JSON format with filenames like:

{usecase}_{system}_{timestamp}.json

For RAG systems, the top-k parameter is included:

{usecase}_{system}_topk{k}_{timestamp}.json
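
Following this naming scheme, a small helper can collect all result files for a given system and use case (a sketch; the timestamp portion is matched with a wildcard rather than assumed):

# Sketch: list result files that follow the documented naming scheme.
from glob import glob
from typing import Optional

def result_files(usecase: str, system: str, topk: Optional[int] = None) -> list:
    # RAG systems add a topk{k} segment between the system name and the timestamp.
    suffix = f"_topk{topk}" if topk is not None else ""
    pattern = f"output/results/{system}/{usecase}_{system}{suffix}_*.json"
    return sorted(glob(pattern))

print(result_files("stirpot-300", "vanilla_rag", topk=50))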

Architecture

The framework is organized into several key components:

Systems (systems/)

  • Reasoners: Core reasoning systems (LLMs, RAG)
  • Evaluators: Evaluation metrics and methods
  • Retrievers: Document retrieval systems
  • Chunkers: Text chunking strategies
  • Featurizers: Feature extraction methods
  • Storage: Vector storage backends (FAISS, etc.)
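
As a rough illustration of how such components fit together, the sketch below wires toy stand-ins into a minimal retrieve-then-answer flow; none of these classes are the repository's actual implementations under systems/.

# Conceptual composition only -- every class here is a toy stand-in.
from dataclasses import dataclass, field

@dataclass
class ToyChunker:
    size: int = 40
    def chunk(self, text: str) -> list:
        # Fixed-size character chunks; the real chunkers are format-aware.
        return [text[i:i + self.size] for i in range(0, len(text), self.size)]

@dataclass
class ToyStore:
    chunks: list = field(default_factory=list)
    def add(self, new_chunks):
        self.chunks.extend(new_chunks)
    def search(self, query: str, topk: int) -> list:
        # Toy "retrieval": rank chunks by word overlap with the query.
        words = set(query.lower().split())
        return sorted(self.chunks,
                      key=lambda c: len(words & set(c.lower().split())),
                      reverse=True)[:topk]

def answer(query, documents, chunker, store, topk=2):
    for doc in documents:
        store.add(chunker.chunk(doc))
    context = store.search(query, topk)
    # A real reasoner would prompt an LLM with the retrieved context here.
    return context

print(answer("revenue in 2023",
             ["Revenue in 2023 was 5M USD. Costs were 2M USD."],
             ToyChunker(), ToyStore()))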

Data Processing

The framework supports various data formats:

  • PDF documents (via pdf_chunker.py)
  • CSV files (via csv_chunker.py)
  • Markdown documents (via markdown_chunker.py)
  • Contextual chunking to improve retrieval
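
As a conceptual illustration of contextual chunking (prepending document-level context to each chunk so it remains self-describing at retrieval time), here is a minimal sketch; it is not the repository's chunker implementation:

# Sketch: fixed-size chunks, each prefixed with document-level context.
def contextual_chunks(title: str, text: str, size: int = 300) -> list:
    chunks = [text[i:i + size] for i in range(0, len(text), size)]
    # Prefixing the title keeps each chunk interpretable on its own at retrieval time.
    return [f"[{title}] {chunk}" for chunk in chunks]

print(contextual_chunks("Q3 earnings report", "Revenue grew 12% year over year. " * 20))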

Caching

  • System Cache: Caches intermediate results from reasoning systems
  • Result Cache: Stores final evaluation results for reuse
  • Automatic cache management with timestamp-based organization
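
As a conceptual example of timestamp-based reuse (not the framework's actual cache code), picking the newest cached result for a given prefix could look like this:

# Sketch: return the most recently modified cached result matching a filename prefix.
from pathlib import Path
from typing import Optional

def newest_cached(results_dir: str, prefix: str) -> Optional[Path]:
    candidates = sorted(Path(results_dir).glob(f"{prefix}*.json"),
                        key=lambda p: p.stat().st_mtime)
    return candidates[-1] if candidates else None

print(newest_cached("output/results/vanilla_rag", "stirpot-300_vanilla_rag"))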

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If you use this framework in your research, please cite:

@article{multimodal_analytics,
  title={Analytical queries over multimodal data},
  author={MIT DB Group},
  year={2025}
}

Contact

For questions or issues, please open a GitHub issue or contact [email protected]
