Add hipho physics dataset #1318

MarkovChain-why · 2025-11-19T04:57:28Z

Update some requirements and refactor the code.

- Remove custom LogBuffer class and thread-safe logging
- Replace safe_print with standard print statements
- Remove threading and datetime imports
- Simplify build_prompt function by removing verbose debug output
- Update dataset URL from haiyuanwan/HiPhO to HY-Wan/HiPhO
- Reduce code from 899 to 803 lines (10.7% reduction)
- Maintain all core functionality: evaluation logic, prompt building, hipho_verifier integration

- Remove complex parallel evaluation using track_progress_rich
- Simplify to sequential evaluation for better stability and debugging
- Remove multiprocessing and parallel task management dependencies
- Rename functions to remove '_with_buffer' suffix and log_buffer parameters
- Remove nproc parameter handling and temporary file management
- Reduce code from 803 to 774 lines (additional 3.6% reduction)
- Maintain all core evaluation logic: fine/coarse-grained scoring, hipho_verifier integration
- Sequential evaluation is sufficient for physics olympiad problem counts
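For readers who have not seen the diff, the move from parallel to sequential evaluation can be pictured roughly as below. This is a minimal sketch, not the code in this PR: `evaluate_sequentially`, `toy_judge`, and the record fields are hypothetical names chosen for illustration.

```python
from typing import Any, Callable, Dict, List


def evaluate_sequentially(
    items: List[Dict[str, Any]],
    judge: Callable[[Dict[str, Any]], float],
) -> List[Dict[str, Any]]:
    """Score items one at a time, in order.

    There is no worker pool and no temporary result file: an exception
    raised by the judge stops the loop right at the failing item.
    """
    results = []
    for index, item in enumerate(items):
        results.append({"index": index, "score": judge(item)})
    return results


def toy_judge(item: Dict[str, Any]) -> float:
    # Hypothetical judge: exact string match after trimming whitespace.
    return 1.0 if item["prediction"].strip() == item["answer"].strip() else 0.0


records = [
    {"prediction": " 9.8 m/s^2", "answer": "9.8 m/s^2"},
    {"prediction": "3 N", "answer": "5 N"},
]
results = evaluate_sequentially(records, toy_judge)
```

A plain loop like this trades throughput for simpler debugging, which matches the rationale given in the list above.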

Major improvements:

- Remove 6 unnecessary try-except blocks that were hiding errors
- Standardize judge model initialization to follow VLMEvalKit conventions
- Move all prompt templates to utils/prompt_inference.py for better organization
- Remove redundant count statistics (fine_grained_count, coarse_grained_count, total_count)
- Remove unused fallback functions (_simple_answer_matching, _extract_prediction_for_display)
- Fix multi-image base64 processing bug
- Correct dataset name display in summary output
- Remove verbose debugging output and unnecessary comments

Code reduction: 899 → 604 lines (32.8% reduction)

Eliminated potential bugs and improved maintainability while preserving all core functionality
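The description does not say what the multi-image base64 bug was. As a hedged illustration only, one common way to handle an image field that may hold either a single base64 string or a list of them is to normalize it to a list before decoding. The helper name `decode_images` and the field shape are assumptions, not taken from this PR.

```python
import base64
from typing import List, Union


def decode_images(field: Union[str, List[str]]) -> List[bytes]:
    """Decode one or many base64-encoded images into raw bytes.

    A bare string is treated as a single image so that callers can
    always iterate over a list.
    """
    if isinstance(field, str):
        field = [field]
    return [base64.b64decode(item) for item in field]


# Stand-in payloads; real entries would be encoded image files.
single = base64.b64encode(b"img-a").decode("ascii")
several = [single, base64.b64encode(b"img-b").decode("ascii")]

decoded_single = decode_images(single)
decoded_several = decode_images(several)
```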

…iguration

- Translate all Chinese comments to English in hipho.py, hipho_verifier.py, and prompt_inference.py
- Simplify comments while maintaining technical accuracy
- Replace hardcoded verifier model configuration with environment variables
- Use VLMEvalKit standard environment variable approach for better flexibility
- Add support for HIPHO_VERIFIER_* environment variables for model configuration
- Improve code maintainability and international accessibility
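The exact variable names behind `HIPHO_VERIFIER_*` are not listed here. A minimal sketch of reading such a prefixed group, assuming hypothetical suffixes `MODEL` and `API_BASE` and a hypothetical helper `load_verifier_config`, might look like this:

```python
import os
from typing import Dict, Mapping, Optional

PREFIX = "HIPHO_VERIFIER_"


def load_verifier_config(
    environ: Optional[Mapping[str, str]] = None,
    prefix: str = PREFIX,
) -> Dict[str, str]:
    """Collect every variable that starts with the prefix.

    Keys are returned with the prefix stripped and lowercased, e.g.
    HIPHO_VERIFIER_MODEL becomes "model".
    """
    environ = os.environ if environ is None else environ
    return {
        key[len(prefix):].lower(): value
        for key, value in environ.items()
        if key.startswith(prefix) and len(key) > len(prefix)
    }


# Hypothetical values; the suffixes are illustrative, not the PR's names.
example_env = {
    "HIPHO_VERIFIER_MODEL": "example-judge",
    "HIPHO_VERIFIER_API_BASE": "https://example.invalid/v1",
    "PATH": "/usr/bin",
}
config = load_verifier_config(example_env)
```

Passing the mapping explicitly keeps the helper easy to test; calling it with no arguments reads the real process environment.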

- Add datasets: for HuggingFace dataset loading
- Add scikit-learn: for machine learning utilities
- Add pylatexenc==2.10: for LaTeX text processing
- Add math-verify: for mathematical answer verification

These dependencies are required for the HiPhO physics olympiad dataset evaluation and verification functionality.
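To check locally whether the new dependencies are available before running the evaluation, something like the following sketch could be used. The import names are assumptions based on each package's usual module name (`sklearn` for scikit-learn, `math_verify` for math-verify), and `missing_packages` is a hypothetical helper.

```python
from importlib.util import find_spec
from typing import Dict, List

# Maps the pip package name to its assumed top-level import name.
REQUIRED: Dict[str, str] = {
    "datasets": "datasets",
    "scikit-learn": "sklearn",
    "pylatexenc": "pylatexenc",
    "math-verify": "math_verify",
}


def missing_packages(required: Dict[str, str] = REQUIRED) -> List[str]:
    """Return the pip names whose import module cannot be found."""
    return [pkg for pkg, module in required.items() if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All HiPhO dependencies found.")
```

Note that this only checks that each module can be located; it does not verify the pinned pylatexenc version.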

FangXinyu-0913 · 2025-11-19T08:49:58Z

Hi @MarkovChain-why, it seems this pull request duplicates #1293. Can I close the previous one?

vlmeval/dataset/utils/hipho_prompt_inference.py

MarkovChain-why and others added 9 commits October 28, 2025 19:41

Initial commit for HiPhO

bec4d65

update

c10e637

update

20f332e

Merge branch 'main' into main

e46701f

mzr1996 reviewed Nov 20, 2025

vlmeval/dataset/utils/hipho_prompt_inference.py (resolved)

mzr1996 added 3 commits November 21, 2025 18:15

Merge branch 'main' into add-hipho-physics-dataset

685d540

Add hipho_prompt_inference.py utility file

72fba10

Update import statement for prompt inference module

a27e71f

mzr1996 approved these changes Nov 21, 2025

4 participants
