LegalEvalHub is a simple, leaderboard-centric website for tracking and sharing LLM performance on legal tasks. The platform is intended to be open: contributions of both new tasks and new evaluation runs are welcome. You can access the website here.
To contribute an evaluation run, a new task, or a new leaderboard, please refer to the CONTRIBUTING.md file.
```
.
├── tasks/                          # Community-defined task metadata
│   └── <task_id>.json
├── eval_runs/                      # Community-submitted eval run metadata
│   └── <task_id>/                  # One folder per task
│       └── <submission_id>.json
├── utils/                          # Validation utilities (coming soon)
│   ├── validate_task.py
│   └── validate_eval_run.py
├── web/                            # Flask web interface
│   ├── app.py                      # Main Flask application
│   ├── templates/                  # HTML templates
│   │   ├── base.html               # Base template with navigation
│   │   ├── index.html              # Home page with project overview
│   │   ├── home.html               # Tasks listing page
│   │   ├── task_detail.html        # Individual task page
│   │   ├── benchmarks.html         # Aggregate leaderboards overview
│   │   ├── preset_leaderboard.html # Individual aggregate leaderboard
│   │   ├── faq.html                # Frequently asked questions
│   │   └── resources.html          # Resources and documentation
│   ├── static/
│   │   └── css/
│   │       └── style.css           # Wikipedia-style minimal CSS
│   └── task_presets.json           # Aggregate leaderboard configurations
├── requirements.txt                # Python dependencies
├── README.md
└── CONTRIBUTING.md
```
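Task metadata lives in `tasks/<task_id>.json` and eval run metadata in `eval_runs/<task_id>/<submission_id>.json`; the authoritative schema is described in CONTRIBUTING.md. The sketch below only illustrates the kind of check the not-yet-released `utils/validate_task.py` could perform. The required field names (`task_id`, `name`, `description`, `metric`) are placeholders chosen for illustration, not the official schema.

```python
# Hypothetical sketch of a task-metadata validator. The real utils/validate_task.py
# is marked "coming soon"; the field names below are illustrative placeholders,
# not the official schema from CONTRIBUTING.md.
import json
import sys
from pathlib import Path

REQUIRED_FIELDS = {"task_id", "name", "description", "metric"}  # placeholder schema


def validate_task(path: Path) -> list[str]:
    """Return a list of problems found in one tasks/<task_id>.json file."""
    try:
        data = json.loads(path.read_text())
    except json.JSONDecodeError as exc:
        return [f"{path}: invalid JSON ({exc})"]
    if not isinstance(data, dict):
        return [f"{path}: top-level value must be a JSON object"]

    errors = []
    missing = REQUIRED_FIELDS - data.keys()
    if missing:
        errors.append(f"{path}: missing fields {sorted(missing)}")
    if data.get("task_id") != path.stem:
        errors.append(f"{path}: filename should match the task_id field")
    return errors


if __name__ == "__main__":
    problems = [e for p in Path("tasks").glob("*.json") for e in validate_task(p)]
    print("\n".join(problems) or "All task files look valid.")
    sys.exit(1 if problems else 0)
```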
- Clone the repository:
  ```bash
  git clone https://github.com/yourusername/LegalEvalHub.git
  cd LegalEvalHub
  ```
- Install dependencies:
  ```bash
  pip install -r requirements.txt
  ```
- Run the Flask application:
  ```bash
  cd web
  python app.py
  ```
- Open your browser: navigate to `http://localhost:5000`.
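For orientation, the sketch below shows the general shape of a Flask app in this layout: it reads task metadata from `tasks/` and renders the templates listed in the directory structure above. It is an assumption-laden illustration of the architecture, not the actual `web/app.py`; the route names and template variables are made up.

```python
# Illustrative sketch only -- the real web/app.py may use different routes,
# variable names, and data loading. Shown to convey the overall architecture:
# JSON metadata on disk, rendered through the templates listed above.
import json
from pathlib import Path

from flask import Flask, abort, render_template

app = Flask(__name__)
TASKS_DIR = Path(__file__).resolve().parent.parent / "tasks"


def load_tasks() -> dict:
    """Read every tasks/<task_id>.json into a dict keyed by task id."""
    return {p.stem: json.loads(p.read_text()) for p in TASKS_DIR.glob("*.json")}


@app.route("/")
def index():
    # home.html is described above as the tasks listing page
    return render_template("home.html", tasks=load_tasks())


@app.route("/tasks/<task_id>")
def task_detail(task_id):
    task = load_tasks().get(task_id)
    if task is None:
        abort(404)
    return render_template("task_detail.html", task=task)


if __name__ == "__main__":
    app.run(debug=True)  # Flask serves on http://localhost:5000 by default
```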