Revert extraction setting for `IndicesExtractionConfig` #998

cmpatino · 2025-09-29T19:24:51Z

Revert `try_extract_without_anchor` to `True` in `IndicesExtractionConfig` to recover the previous behavior in the `gpqa:diamond` eval.

The setting had been changed to `try_extract_without_anchor = False` to avoid false positives in GPQA. However, we ran an experiment and noticed that answers in the format `\boxed{X}` were being graded as incorrect even when the model gave the correct answer.

GPQA evaluates the model's knowledge of science and math, so answers in the `\boxed{X}` format should also be valid.
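To illustrate the failure mode, here is a minimal sketch of anchored versus unanchored extraction. It is not lighteval's actual implementation: the regexes, the `extract_choice` helper, and the assumption that the unanchored fallback is what picks up `\boxed{X}` are all hypothetical and only meant to show why a strict anchor requirement can drop correct answers.

```python
import re
from typing import Optional

# Hypothetical patterns for illustration only; lighteval's real regexes may differ.
# Anchored: the letter must follow an explicit phrase such as "answer is" or "Answer:".
ANCHORED = re.compile(r"(?i:answer)\s*(?i:is|:)\s*\(?([A-D])\b")
# Unanchored fallback (assumed): a choice letter wrapped in \boxed{...}.
BOXED = re.compile(r"\\boxed\{\s*\(?([A-D])\)?\s*\}")


def extract_choice(text: str, try_extract_without_anchor: bool) -> Optional[str]:
    """Return the last extracted choice letter, or None if nothing matches."""
    anchored = ANCHORED.findall(text)
    if anchored:
        return anchored[-1]
    if try_extract_without_anchor:
        # Only consulted when no anchored answer was found.
        boxed = BOXED.findall(text)
        if boxed:
            return boxed[-1]
    return None


response = r"Combining both constraints, the result is \boxed{B}"
print(extract_choice(response, try_extract_without_anchor=True))
print(extract_choice(response, try_extract_without_anchor=False))
```

In this sketch the boxed answer is recovered only when the fallback is enabled, and with the fallback disabled the response yields no extraction and would be graded as incorrect.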

Below is a summary of the effect of the change on Qwen and SmolLM3 models.

| Model | Mode | `try_extract_without_anchor: True` | `try_extract_without_anchor: False` |
| --- | --- | --- | --- |
| Qwen_Qwen3-0.6B_main | /no_think | 26.26 | 15.97 |
| Qwen_Qwen3-1.7B_main | /no_think | 31.76 | 24.31 |
| Qwen_Qwen3-4B_main | /no_think | 44.38 | 46.59 |
| Qwen_Qwen3-0.6B_main | /think | 28.16 | 22.54 |
| Qwen_Qwen3-1.7B_main | /think | 39.90 | 38.26 |
| Qwen_Qwen3-4B_main | /think | 55.30 | 53.66 |
| HuggingFaceTB_SmolLM3-3B_main | /think | 41.70 | 28.70 |
| HuggingFaceTB_SmolLM3-3B_main | /no_think | 35.70 | 22.35 |

Revert `try_extract_without_anchor` to True in `IndicesExtractionConfig` to avoid issues in `gpqa:diamond` eval

HuggingFaceDocBuilderDev · 2025-09-29T19:27:26Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

lewtun

LGTM but let's wait for approval from a core maintainer before merging

clefourrier · 2025-09-30T09:05:29Z

Can you change it in GPQA specifically? We observed a number of false positives in other metrics with this setting.
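One way to read this request, as a hypothetical sketch rather than the actual change in this PR: keep the strict default everywhere and opt in to unanchored extraction only for GPQA. The `ExtractionSettings` class, the `TASK_OVERRIDES` mapping, and the `settings_for` helper are illustrative names and not part of lighteval.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ExtractionSettings:
    """Hypothetical stand-in for an extraction config; not lighteval's class."""

    try_extract_without_anchor: bool = False


# Strict default shared by every task that does not opt out.
DEFAULT_SETTINGS = ExtractionSettings()

# Only tasks listed here deviate from the default.
TASK_OVERRIDES = {
    "gpqa:diamond": replace(DEFAULT_SETTINGS, try_extract_without_anchor=True),
}


def settings_for(task_name: str) -> ExtractionSettings:
    """Return the per-task settings, falling back to the strict default."""
    return TASK_OVERRIDES.get(task_name, DEFAULT_SETTINGS)
```

Scoping the override to one task keeps the other metrics on the stricter behavior while GPQA accepts unanchored answers.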

clefourrier

LGTM, you can merge once tests pass

clefourrier · 2025-09-30T09:38:56Z

(you will have to update the tests for GPQA since you're changing the metric)

Revert extraction setting for IndicesExtractionConfig

d722e78

Revert `try_extract_without_anchor` to True in `IndicesExtractionConfig` to avoid issues in `gpqa:diamond` eval

lewtun approved these changes Sep 29, 2025


Change try_extract_without_anchor only for GPQA

6c5af42

clefourrier approved these changes Sep 30, 2025


cmpatino added 2 commits September 30, 2025 11:44

Fix style

5b213cb

Update GPQA test to reflect the extract setting

1b63544

cmpatino merged commit b1d45e3 into huggingface:main Sep 30, 2025
4 checks passed

cmpatino deleted the indices-extraction-setting branch September 30, 2025 10:59
