# Analytical Queries over Multimodal Data

A comprehensive framework for evaluating analytical queries over multimodal data, containing the experiments for the paper "Analytical queries over multimodal data".
This repository provides a modular evaluation framework for comparing different reasoning systems on multimodal question-answering tasks. The framework supports various Large Language Models (LLMs), Retrieval-Augmented Generation (RAG) systems, and evaluation metrics to analyze performance on analytical queries across different data modalities.
## Installation

### Option 1: Conda

```bash
# Create conda environment from environment.yml
conda env create -f environment.yml
conda activate multimodal-analytics
```

### Option 2: pip

```bash
# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```
## Configuration

Create a `.env` file in the root directory with your API keys:

```
OPENAI_API_KEY=your_openai_api_key_here
TOGETHER_API_KEY=your_together_api_key_here
# Add other API keys as needed
```
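If you want to verify that the keys load before running an evaluation, a quick check with `python-dotenv` (a common library for reading `.env` files; whether this repository loads them the same way is an assumption) looks like this:

```python
# check_env.py: sanity-check that API keys are readable from .env.
# Assumes the python-dotenv package (pip install python-dotenv);
# how this repository actually loads .env may differ.
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory

for key in ("OPENAI_API_KEY", "TOGETHER_API_KEY"):
    print(f"{key}: {'set' if os.getenv(key) else 'MISSING'}")
```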
## Reproducing Paper Results

To reproduce the results from the paper, you can use the pre-computed results in the `output/results/` directory. This allows you to verify the evaluation metrics without rerunning the entire evaluation process. The `plots` directory contains a notebook, `rag_cardinality.ipynb`, that generates the plots from the pre-computed results; edit the notebook to point to the correct results directory if needed.
## Usage

The main entry point is `evaluate.py`. Here are some example commands:
```bash
# Run vanilla RAG evaluation on the stirpot-300 dataset
python evaluate.py --method vanilla_rag --usecase stirpot-300 --evaluator f1 --topk 50

# Evaluate GPT-4 with verbose output
python evaluate.py --method gpt4 --usecase stirpot-300 --evaluator llm --verbose

# Reuse cached system outputs so that only the evaluation step reruns
python evaluate.py --method llama3_1_8b --usecase stirpot-300 --system-cache --verbose

# Limit the number of queries for quick testing
python evaluate.py --method mixtral --usecase stirpot-300 --query-limit 10
```
## Supported Systems

| System | Description |
|---|---|
| `vanilla_rag` | Basic Retrieval-Augmented Generation |
| `canvas_retrieve_context` | Contextual RAG system |
| `gpt3` | GPT-3.5 Turbo |
| `gpt4` | GPT-4 |
| `llama3_1_8b` | Llama 3.1 8B parameter model |
| `llama3_1_70b` | Llama 3.1 70B parameter model |
| `llama3_3` | Llama 3.3 model |
| `gemma3` | Google Gemma 3 |
| `mixtral` | Mixtral model |
## Evaluators

| Evaluator | Description |
|---|---|
| `f1` | F1 score evaluation |
| `precision` | Precision metric |
| `recall` | Recall metric |
| `breakdown` | Detailed breakdown analysis |
| `llm` | LLM-based evaluation |
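For reference, `precision`, `recall`, and `f1` relate in the usual way (F1 is the harmonic mean of the other two). A minimal token-overlap version is sketched below; the framework's evaluators may tokenize and aggregate differently:

```python
# Token-level precision/recall/F1 for a predicted vs. gold answer.
# Illustrative only: the repository's metric implementations may differ.
from collections import Counter

def token_f1(prediction: str, gold: str) -> float:
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(token_f1("the answer is 42", "42"))  # 0.4
```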
## Datasets

- `stirpot-300`: A dataset with 300 questions for multimodal analytics
## Command-Line Options

- `--method`: Choose the reasoning system to evaluate
- `--usecase`: Select the dataset/workload
- `--evaluator`: Choose the evaluation metric (default: `llm`)
- `--topk`: Number of top-k documents for RAG retrieval (default: 50)
- `--query-limit`: Limit the number of queries to process
- `--system-cache`: Enable system-level caching
- `--result-cache`: Use cached evaluation results if available
- `--verbose`: Enable detailed output
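Because all configuration goes through these flags, parameter sweeps are easy to script. A minimal sketch that reruns the RAG evaluation across several top-k values, using only the flags documented above:

```python
# sweep_topk.py: rerun the vanilla RAG evaluation for several top-k values.
import subprocess

for k in (10, 25, 50, 100):
    subprocess.run(
        ["python", "evaluate.py",
         "--method", "vanilla_rag",
         "--usecase", "stirpot-300",
         "--evaluator", "f1",
         "--topk", str(k)],
        check=True,  # stop the sweep if any run fails
    )
```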
## Output

Evaluation results are automatically saved in the `output/results/{system}/` directory. Each result file contains:

- Question-answer pairs with predictions
- Evaluation metrics and scores
- Timestamp and configuration details

Results are saved in JSON format with filenames like:

```
{usecase}_{system}_{timestamp}.json
```

For RAG systems, the top-k parameter is included in the filename:

```
{usecase}_{system}_topk{k}_{timestamp}.json
```
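Since results are plain JSON, they are easy to post-process. A sketch that loads the newest result file for a given system and use case (the filename pattern is documented above; the fields inside the file are not specified here, so inspect them yourself):

```python
# load_latest.py: load the most recent result file for a system.
import json
from pathlib import Path

def load_latest(system: str, usecase: str) -> dict:
    results_dir = Path("output/results") / system
    # Lexicographic sort; assumes the timestamps sort chronologically.
    candidates = sorted(results_dir.glob(f"{usecase}_{system}_*.json"))
    if not candidates:
        raise FileNotFoundError(f"no results for {system} on {usecase}")
    return json.loads(candidates[-1].read_text())

result = load_latest("gpt4", "stirpot-300")
print(list(result.keys()))  # inspect the stored fields
```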
## Architecture

The framework is organized into several key components:
- Reasoners: Core reasoning systems (LLMs, RAG)
- Evaluators: Evaluation metrics and methods
- Retrievers: Document retrieval systems
- Chunkers: Text chunking strategies
- Featurizers: Feature extraction methods
- Storage: Vector storage backends (FAISS, etc.)
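These components compose roughly in pipeline order: chunkers split documents, featurizers embed the chunks, storage indexes the features, retrievers pull relevant chunks for the reasoners, and evaluators score the answers. The interfaces below are hypothetical and only sketch how the pieces fit together; the repository's actual class names and signatures may differ:

```python
# Hypothetical component interfaces; the repository's actual classes,
# methods, and signatures are likely different.
from abc import ABC, abstractmethod

class Chunker(ABC):
    @abstractmethod
    def chunk(self, document: str) -> list[str]: ...

class Retriever(ABC):
    @abstractmethod
    def retrieve(self, query: str, topk: int) -> list[str]: ...

class Reasoner(ABC):
    @abstractmethod
    def answer(self, query: str, context: list[str]) -> str: ...

class Evaluator(ABC):
    @abstractmethod
    def score(self, prediction: str, gold: str) -> float: ...
```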
## Supported Data Formats

The framework supports various data formats:

- PDF documents (via `pdf_chunker.py`)
- CSV files (via `csv_chunker.py`)
- Markdown documents (via `markdown_chunker.py`)
- Contextual chunking to improve retrieval (sketched below)
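Contextual chunking generally means prepending a short document-level description to each chunk before it is embedded, so that retrieval can match chunks whose local text lacks the surrounding context. A minimal sketch of the general technique, not this repository's implementation:

```python
# Minimal illustration of contextual chunking: prefix each chunk with a
# short document-level context string before embedding/indexing.
# Sketches the general technique, not this repository's implementation.
def contextual_chunks(text: str, doc_context: str, size: int = 500) -> list[str]:
    chunks = [text[i:i + size] for i in range(0, len(text), size)]
    return [f"[Context: {doc_context}]\n{chunk}" for chunk in chunks]

chunks = contextual_chunks(
    "Revenue grew 12% year over year...",
    doc_context="ACME Corp 2024 annual report, financial results",  # hypothetical
)
print(chunks[0])
```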
## Caching

- System Cache: Caches intermediate results from reasoning systems
- Result Cache: Stores final evaluation results for reuse
- Automatic cache management with timestamp-based organization
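Conceptually, the result cache amounts to checking for an existing file under the documented naming scheme before rerunning an evaluation. A hedged sketch of that idea; the framework's actual cache management is not shown here:

```python
# Sketch of a result-cache lookup based on the documented filename
# pattern {usecase}_{system}_{timestamp}.json; the framework's real
# cache logic may work differently.
from pathlib import Path

def cached_result(system: str, usecase: str) -> Path | None:
    hits = sorted((Path("output/results") / system).glob(f"{usecase}_{system}_*.json"))
    return hits[-1] if hits else None  # newest cached result, if any
```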
## License

This project is licensed under the MIT License - see the LICENSE file for details.
## Citation

If you use this framework in your research, please cite:

```bibtex
@article{multimodal_analytics,
  title={Analytical queries over multimodal data},
  author={MIT DB Group},
  year={2025}
}
```
## Contact

For questions or issues, please open a GitHub issue or contact [email protected].