@MarkovChain-why

update some requirements and refactor the code

MarkovChain-why and others added 9 commits October 28, 2025 19:41
- Remove custom LogBuffer class and thread-safe logging
- Replace safe_print with standard print statements
- Remove threading and datetime imports
- Simplify build_prompt function by removing verbose debug output
- Update dataset URL from haiyuanwan/HiPhO to HY-Wan/HiPhO
- Reduce code from 899 to 803 lines (10.7% reduction)
- Maintain all core functionality: evaluation logic, prompt building, hipho_verifier integration
- Remove complex parallel evaluation using track_progress_rich
- Simplify to sequential evaluation for better stability and debugging
- Remove multiprocessing and parallel task management dependencies
- Rename functions to remove '_with_buffer' suffix and log_buffer parameters
- Remove nproc parameter handling and temporary file management
- Reduce code from 803 to 774 lines (additional 3.6% reduction)
- Maintain all core evaluation logic: fine/coarse-grained scoring, hipho_verifier integration
- Sequential evaluation is sufficient for physics olympiad problem counts
Major improvements:
- Remove 6 unnecessary try-except blocks that were hiding errors
- Standardize judge model initialization to follow VLMEvalKit conventions
- Move all prompt templates to utils/prompt_inference.py for better organization
- Remove redundant count statistics (fine_grained_count, coarse_grained_count, total_count)
- Remove unused fallback functions (_simple_answer_matching, _extract_prediction_for_display)
- Fix multi-image base64 processing bug
- Correct dataset name display in summary output
- Remove verbose debugging output and unnecessary comments

Code reduction: 899 → 604 lines (32.8% reduction)
Eliminated potential bugs and improved maintainability while preserving all core functionality
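
The line counts quoted in these commit messages can be cross-checked with a little arithmetic. The sketch below is illustrative only: the helper name `reduction_pct` is made up here and is not part of the PR.

```python
def reduction_pct(before: int, after: int) -> float:
    """Percentage reduction from `before` lines to `after` lines, to one decimal."""
    return round((before - after) / before * 100, 1)


# Line counts taken from the commit messages above.
print(reduction_pct(899, 803))  # 10.7 (first refactor)
print(reduction_pct(803, 774))  # 3.6  (second refactor, relative to 803)
print(reduction_pct(899, 604))  # 32.8 (cumulative, relative to the original 899)
```

Note that the 3.6% figure is measured against the intermediate 803-line version, while the 32.8% figure is measured against the original 899 lines, so the percentages are not meant to add up.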
…iguration

- Translate all Chinese comments to English in hipho.py, hipho_verifier.py, and prompt_inference.py
- Simplify comments while maintaining technical accuracy
- Replace hardcoded verifier model configuration with environment variables
- Use VLMEvalKit standard environment variable approach for better flexibility
- Add support for HIPHO_VERIFIER_* environment variables for model configuration
- Improve code maintainability and international accessibility
- Add datasets: Hugging Face dataset loading
- Add scikit-learn: machine learning utilities
- Add pylatexenc==2.10: LaTeX text processing
- Add math-verify: mathematical answer verification

These dependencies are required for the HiPhO physics olympiad dataset
evaluation and verification functionality.
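
For the switch from hardcoded verifier settings to HIPHO_VERIFIER_* environment variables described above, a minimal sketch of the general pattern might look like the following. Only the `HIPHO_VERIFIER_` prefix comes from the commit message. The suffixes (`MODEL`, `API_BASE`, `API_KEY`), the `VerifierConfig` class, the `load_verifier_config` helper, and the placeholder default are all assumptions for illustration, not the actual code in hipho_verifier.py.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class VerifierConfig:
    """Hypothetical container for verifier model settings."""
    model: str
    api_base: Optional[str]
    api_key: Optional[str]


def load_verifier_config(env: Mapping[str, str] = os.environ) -> VerifierConfig:
    """Read verifier settings from the environment instead of hardcoding them.

    The variable suffixes below are assumed names; only the
    HIPHO_VERIFIER_ prefix is taken from the commit message.
    """
    return VerifierConfig(
        model=env.get("HIPHO_VERIFIER_MODEL", "placeholder-model"),
        api_base=env.get("HIPHO_VERIFIER_API_BASE"),
        api_key=env.get("HIPHO_VERIFIER_API_KEY"),
    )


# Passing a plain dict makes the function easy to exercise without
# touching the real process environment.
cfg = load_verifier_config({"HIPHO_VERIFIER_MODEL": "my-judge"})
print(cfg.model)
```

Accepting the mapping as a parameter keeps the lookup testable, while the default of `os.environ` preserves the usual behaviour of reading from the process environment.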
@FangXinyu-0913
Collaborator

Hi @MarkovChain-why, this pull request seems to be a duplicate of #1293. Can I close the previous one?

4 participants