CLEAR: Error Analysis via LLM-as-a-Judge Made Easy

CLEAR (Comprehensive LLM Error Analysis and Reporting) is an interactive, open-source package for LLM-based error analysis. It helps surface meaningful, recurring issues in model outputs by combining automated evaluation with powerful visualization tools.

The workflow consists of two main phases:

  1. Analysis
    Generates textual feedback for each instance, identifies system-level error categories from these critiques, and quantifies their frequencies.

  2. Interactive Dashboard
    An intuitive dashboard provides a comprehensive view of model behavior. Users can:

    • Explore aggregate visualizations of identified issues
    • Apply dynamic filters to focus on specific error types or score ranges
    • Drill down into individual examples that illustrate specific failure patterns

CLEAR makes it easier to diagnose model shortcomings and prioritize targeted improvements.

You can run CLEAR as a full pipeline, or reuse specific stages (generation, evaluation, or just the UI).

🚀 Quickstart

Requires Python 3.10+ and the necessary credentials for a supported provider.

1. Installation

Option 1 (Recommended for development): Clone the repo and set up a virtual environment:

git clone https://github.com/IBM/CLEAR.git
cd CLEAR
python3 -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -e .

📦 Option 2: Install via pip (Latest Release)

pip install clear-eval

2. Set provider type and credentials

CLEAR requires a supported LLM provider and credentials to run analysis. See supported providers and credentials below.

⚠️ Using a private proxy or OpenAI deployment? You must configure your model names explicitly (see below). Otherwise, default model names are used automatically for supported providers.

  3. Run on sample data:

The sample dataset is a small subset of the GSM8K math problems. To run on the sample data with the default configuration, simply set your provider and run:

run-clear-eval-analysis --provider=openai # or rits, watsonx

This will:

  • Run the full CLEAR pipeline
  • Save results under: results/gsm8k/sample_output/
  4. View results in the interactive dashboard:

run-clear-eval-dashboard

Or set the port with

run-clear-eval-dashboard --port <port>

Then:

  • Upload the generated ZIP file from results/gsm8k/sample_output/
  • Explore issues, scores, filters, and drill into examples
  5. To explore the dashboard without running any analysis:

Run the dashboard:

run-clear-eval-dashboard

Then load the pre-generated sample output by manually uploading the sample .zip file located at:

<your-env>/site-packages/clear_eval/sample_data/gsm8k/analysis_results_gsm8k_default.zip

πŸ“ Or just download it directly from the GitHub repo.


📂 Analyzing your own data

📄 Input Data Format

CLEAR takes a CSV file as input, with each row representing a single instance to be evaluated.

Required Columns

| Column | Used When | Description |
|---|---|---|
| id | Always | Unique identifier for the instance |
| model_input | Always | Prompt provided to the generation model |
| response | Using pre-generated responses | Pre-generated model response (ignored if generation is enabled) |
| ground_truth | Performing reference-based analysis | Ground-truth answer for evaluation (optional) |
| others | --input_columns is used | Additional input columns to show in the dashboard (e.g. question) |
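
For reference, here is a minimal sketch of how such a file could be produced with pandas. The file name, id, and the extra question column are purely illustrative; response and ground_truth are only needed in the scenarios noted in the table above.

import pandas as pd

# Illustrative row only: `id` and `model_input` are always required.
# `response` is used only when generation is disabled (perform_generation=False),
# `ground_truth` only for reference-based evaluation, and `question` is an example
# of an extra column that can be surfaced in the dashboard via input_columns.
rows = [
    {
        "id": "ex-1",
        "model_input": "Answer the following math problem: Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?",
        "question": "How many clips did Natalia sell altogether in April and May?",
        "response": "She sold 48 + 24 = 72 clips.",
        "ground_truth": "72",
    },
]
pd.DataFrame(rows).to_csv("my_data.csv", index=False)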

🚀 Running the analysis

CLEAR can be run via the CLI or Python API.

Option 1: CLI commands

Each stage has its own entry point:

run-clear-eval-analysis --config_path path/to/config.yaml    # run the full pipeline
run-clear-eval-generation --config_path path/to/config.yaml  # run generation only
run-clear-eval-evaluation --config_path path/to/config.yaml  # run evaluation only (assumes responses are already provided)
  • If --config_path is specified, all parameters are taken from the config unless explicitly overridden
  • CLI flags passed directly override corresponding config values

Option 2: Python API

from clear_eval.analysis_runner import run_clear_eval_analysis, run_clear_eval_generation, run_clear_eval_evaluation

run_clear_eval_analysis(
    config_path="configs/sample_run_config.yaml"
)

You may also pass overrides instead of using a config file:

from clear_eval.analysis_runner import run_clear_eval_analysis

run_clear_eval_analysis(
    run_name="my_data",
    provider="openai",
    data_path="my_data.csv",
    gen_model_name="gpt-3.5-turbo",
    eval_model_name="gpt-4",
    output_dir="results/gsm8k/",
    perform_generation=False,
    input_columns=["question"]
)
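
The per-stage entry points can also be called on their own, for example to generate responses once and evaluate them later. A minimal sketch, assuming the stage functions accept the same config_path (or keyword overrides) as run_clear_eval_analysis:

from clear_eval.analysis_runner import run_clear_eval_generation, run_clear_eval_evaluation

# Stage 1: generate model responses only.
run_clear_eval_generation(config_path="configs/sample_run_config.yaml")

# Stage 2: evaluate the previously generated responses, reusing the same config.
run_clear_eval_evaluation(config_path="configs/sample_run_config.yaml")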

📊 Launching the Dashboard

run-clear-eval-dashboard

Upload the ZIP file generated in your --output-dir when prompted.

🎛 Supported CLI Arguments

Arguments can be provided via:

  • A YAML config file (--config_path)
  • CLI flags
  • Python function parameters (when using the API)

⚠️ Boolean arguments (perform_generation, is_reference_based, resume_enabled)
These must be set explicitly to true or false in YAML, CLI, or Python.
On the CLI, use --flag True or --flag False (case-insensitive).

⚠️ Naming Convention
Parameter names use snake_case in YAML and Python, but use --kebab-case in CLI.
For example:

  • YAML: perform_generation: true
  • Python: perform_generation=True
  • CLI: --perform-generation True

| Argument | Description | Default |
|---|---|---|
| --config_path | Path to a YAML config file (all values loaded unless overridden by CLI args) | |
| --run_name | Unique run name (used in result file names) | |
| --data_path | Path to the input CSV file | |
| --output_dir | Output directory to write results | |
| --provider | Model provider: openai, watsonx, rits | |
| --eval_model_name | Name of the judge model (e.g. gpt-4o) | |
| --gen_model_name | Name of the generator model to evaluate. If not running generation, the generator name to display. | |
| --perform_generation | Whether to generate responses or use the existing response column | True |
| --is_reference_based | Use reference-based evaluation (requires a ground_truth column in the input) | False |
| --resume_enabled | Whether to reuse intermediate outputs from previous runs stored in output_dir | True |
| --evaluation_criteria | Custom criteria dictionary for scoring individual records: {"criteria_name1": "criteria_desc1", ...}. Supported in YAML config and Python only. | None |
| --input_columns | Comma-separated list of additional input fields (other than model_input) to appear in the results and dashboard (e.g. question) | None |
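
As noted above, evaluation_criteria is only supported via a YAML config or the Python API. A minimal sketch of passing it through the Python API; the criteria names, descriptions, and paths below are purely illustrative:

from clear_eval.analysis_runner import run_clear_eval_analysis

run_clear_eval_analysis(
    run_name="my_data",
    provider="openai",
    data_path="my_data.csv",
    output_dir="results/my_data/",
    perform_generation=False,
    # Hypothetical criteria: each key names a criterion, each value describes it.
    evaluation_criteria={
        "correctness": "Is the final answer factually and numerically correct?",
        "reasoning_quality": "Are the intermediate steps coherent and free of logical errors?",
    },
)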

🔑 Supported providers and credentials

Depending on your selected --provider:

| Provider | Required Environment Variables |
|---|---|
| openai | OPENAI_API_KEY, OPENAI_API_BASE (if using a proxy) |
| watsonx | WATSONX_APIKEY, WATSONX_URL, WATSONX_SPACE_ID or PROJECT_ID |
| rits | RITS_API_KEY |
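
If you prefer to configure credentials from Python rather than in your shell, the variables in the table above can be set before invoking the API. A minimal sketch for the openai provider, with placeholder values, assuming CLEAR reads them from the environment at run time:

import os

# Placeholder values; in practice load secrets from a vault or .env file.
os.environ["OPENAI_API_KEY"] = "sk-..."  # required for --provider openai
os.environ["OPENAI_API_BASE"] = "https://my-proxy.example.com/v1"  # only if routing through a proxy

from clear_eval.analysis_runner import run_clear_eval_analysis
run_clear_eval_analysis(config_path="configs/sample_run_config.yaml")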
