EleutherAI / lm-evaluation-harness Public

Pull requests: EleutherAI/lm-evaluation-harness

Labels 10 Milestones 1

154 Open 1,643 Closed

Pull requests list

Fix PIL image hashing to use actual bytes instead of object repr

#3331 opened Oct 7, 2025 by tboerstad

Add support for Titulm Bangla MMLU dataset

#3317 opened Sep 29, 2025 by Ismail-Hossain-1

feat: Add support for accelerate-wrapped models in simple_evaluate()

#3313 opened Sep 26, 2025 by DhruvaKashyap

Add MATH500

#3311 opened Sep 26, 2025 by jannalulu

Support empty response for Completions and ChatCompletions API

#3309 opened Sep 22, 2025 by tboerstad

Adding New Task SLR-Bench : Scalable Logical Reasoning Benchmark

#3305 opened Sep 20, 2025 by Ahmad21Omar

Support torchrun vllm DP

#3304 opened Sep 19, 2025 by luccafong

Gemini evaluation support

#3300 opened Sep 15, 2025 by IsraelAbebe

Fix lambada_multilingual_stablelm

#3294 opened Sep 11, 2025 by jmichaelov

[feature] add support for Moore Threads GPU family

#3290 opened Sep 10, 2025 by houchen-li

Adding SPaRC to lm eval harness

#3262 opened Aug 25, 2025 by lkaesberg

Add long-context evaluation benchmarks (LongBench v2, Babilong, InfiniteBench, Phonebook)

#3256 opened Aug 21, 2025 by Mariani-code

fix gsm8k normalization

#3254 opened Aug 20, 2025 by huaanrui

Main

#3250 opened Aug 20, 2025 by seongtaehong

Adding 3LM to lm eval harness

#3241 opened Aug 14, 2025 by GeorgeSherif

Trim thinking content from model output in IFEval

#3240 opened Aug 14, 2025 by davideguidobene

Adding support for evaluating with Mistral and Pixtral models

#3235 opened Aug 13, 2025 by LearnerSXH

Adding support for Structured Generation with XGrammar

#3232 opened Aug 12, 2025 by ceferisbarov

5 tasks

Pass dataset_kwargs for Unitxt tasks

#3230 opened Aug 11, 2025 by mprahl

Fix the Unitxt init method to set the task name

#3225 opened Aug 8, 2025 by mprahl

Fix: respect target_delimiter when using a gen_prefix on multiple-choice tasks

#3220 opened Aug 7, 2025 by karanikolopoulos

Update openai_completions.py

#3215 opened Aug 7, 2025 by phseidl

Support for DDP+MP with native torch and no accelerate

#3205 opened Aug 3, 2025 by xgal

feat: COT trace response handling in evaluator and model classes

#3204 opened Aug 3, 2025 by hhh2210

Add new task: kmmlu_pro, kmmlu_redux

#3198 opened Aug 1, 2025 by jeonghodot
