LegalEvalHub is a simple, leaderboard-centric website for tracking and sharing LLM performance on legal tasks. The platform is intended to be open: contributions of both new tasks and new evaluation runs are welcome. You can access the website here.
To contribute an evaluation run, a new task, or a new leaderboard, please refer to the CONTRIBUTING.md file.
```
.
├── tasks/                          # Community-defined task metadata
│   └── <task_id>.json
├── eval_runs/                      # Community-submitted eval run metadata
│   └── <task_id>/                  # One folder per task
│       └── <submission_id>.json
├── utils/                          # Validation utilities (coming soon)
│   ├── validate_task.py
│   └── validate_eval_run.py
├── web/                            # Flask web interface
│   ├── app.py                      # Main Flask application
│   ├── templates/                  # HTML templates
│   │   ├── base.html               # Base template with navigation
│   │   ├── index.html              # Home page with project overview
│   │   ├── home.html               # Tasks listing page
│   │   ├── task_detail.html        # Individual task page
│   │   ├── benchmarks.html         # Aggregate leaderboards overview
│   │   ├── preset_leaderboard.html # Individual aggregate leaderboard
│   │   ├── faq.html                # Frequently asked questions
│   │   └── resources.html          # Resources and documentation
│   ├── static/
│   │   └── css/
│   │       └── style.css           # Wikipedia-style minimal CSS
│   └── task_presets.json           # Aggregate leaderboard configurations
├── requirements.txt                # Python dependencies
├── README.md
└── CONTRIBUTING.md
```
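Task metadata lives in `tasks/<task_id>.json` and eval run metadata in `eval_runs/<task_id>/<submission_id>.json`; the authoritative schema is described in CONTRIBUTING.md. The sketch below only illustrates the kind of check the not-yet-released `utils/validate_task.py` could perform. The required field names (`task_id`, `name`, `description`, `metric`) are placeholders chosen for illustration, not the official schema.

```python
# Hypothetical sketch of a task-metadata validator. The real utils/validate_task.py
# is marked "coming soon"; the field names below are illustrative placeholders,
# not the official schema from CONTRIBUTING.md.
import json
import sys
from pathlib import Path

REQUIRED_FIELDS = {"task_id", "name", "description", "metric"}  # placeholder schema


def validate_task(path: Path) -> list[str]:
    """Return a list of problems found in one tasks/<task_id>.json file."""
    try:
        data = json.loads(path.read_text())
    except json.JSONDecodeError as exc:
        return [f"{path}: invalid JSON ({exc})"]
    if not isinstance(data, dict):
        return [f"{path}: top-level value must be a JSON object"]

    errors = []
    missing = REQUIRED_FIELDS - data.keys()
    if missing:
        errors.append(f"{path}: missing fields {sorted(missing)}")
    if data.get("task_id") != path.stem:
        errors.append(f"{path}: filename should match the task_id field")
    return errors


if __name__ == "__main__":
    problems = [e for p in Path("tasks").glob("*.json") for e in validate_task(p)]
    print("\n".join(problems) or "All task files look valid.")
    sys.exit(1 if problems else 0)
```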
- Clone the repository:
  ```bash
  git clone https://github.com/yourusername/LegalEvalHub.git
  cd LegalEvalHub
  ```
- Install dependencies:
  ```bash
  pip install -r requirements.txt
  ```
- Run the Flask application:
  ```bash
  cd web
  python app.py
  ```
- Open your browser: navigate to `http://localhost:5000`.
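For orientation, the sketch below shows the general shape of a Flask app in this layout: it reads task metadata from `tasks/` and renders the templates listed in the directory structure above. It is an assumption-laden illustration of the architecture, not the actual `web/app.py`; the route names and template variables are made up.

```python
# Illustrative sketch only -- the real web/app.py may use different routes,
# variable names, and data loading. Shown to convey the overall architecture:
# JSON metadata on disk, rendered through the templates listed above.
import json
from pathlib import Path

from flask import Flask, abort, render_template

app = Flask(__name__)
TASKS_DIR = Path(__file__).resolve().parent.parent / "tasks"


def load_tasks() -> dict:
    """Read every tasks/<task_id>.json into a dict keyed by task id."""
    return {p.stem: json.loads(p.read_text()) for p in TASKS_DIR.glob("*.json")}


@app.route("/")
def index():
    # home.html is described above as the tasks listing page
    return render_template("home.html", tasks=load_tasks())


@app.route("/tasks/<task_id>")
def task_detail(task_id):
    task = load_tasks().get(task_id)
    if task is None:
        abort(404)
    return render_template("task_detail.html", task=task)


if __name__ == "__main__":
    app.run(debug=True)  # Flask serves on http://localhost:5000 by default
```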