Create sample dataset using LLM #157
base: main
Conversation
…arator instances
- Add _MetricDefinition class to separate metric type (definition) from metric value (measurement)
- Implement comparator caching via _get_shared_comparator() to ensure only one comparator object exists per unique (method, target, epsilon) combination
- All solutions using the same comparison method now share the same comparator instance, reducing memory usage
- Maintain full backward compatibility: no changes to the Metric class API
- Remove TODO comment, as the refactoring addresses the concern about mixing metric and metric value concerns
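A minimal sketch of the comparator caching this commit describes, assuming a `Comparator` class keyed by `(method, target, epsilon)`; only `_get_shared_comparator()` comes from the commit message, the class shape and field names are assumptions:

```python
from functools import lru_cache
from typing import Optional


class Comparator:
    """Stand-in comparator; its fields mirror the cache key."""

    def __init__(self, method: str, target: Optional[float], epsilon: float):
        self.method = method
        self.target = target
        self.epsilon = epsilon


@lru_cache(maxsize=None)
def _get_shared_comparator(method: str, target: Optional[float], epsilon: float) -> Comparator:
    # lru_cache keys on the (method, target, epsilon) tuple, so every metric
    # with the same settings receives the exact same Comparator instance
    return Comparator(method, target, epsilon)


# Two metrics with identical settings share one comparator object:
assert _get_shared_comparator("higher_is_better", None, 1e-9) is _get_shared_comparator(
    "higher_is_better", None, 1e-9
)
```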
- Add epsilon to __eq__ method to ensure metrics with different epsilon values are correctly differentiated
- Add epsilon to __hash__ method to maintain the hash contract (must include all fields used in __eq__)
- Fix redundant condition in Metric.__gt__ method

This fixes critical bugs identified in code review that could cause incorrect metric comparisons.
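A sketch of the eq/hash contract this commit restores; the exact field set is an assumption based on the cache key above:

```python
class _MetricDefinition:
    """Illustrative shape; the real class may carry more fields."""

    def __init__(self, name: str, method: str, epsilon: float):
        self.name = name
        self.method = method
        self.epsilon = epsilon

    def __eq__(self, other):
        if not isinstance(other, _MetricDefinition):
            return NotImplemented
        # epsilon participates in equality so that metrics comparing values
        # with different tolerances are not conflated
        return (self.name, self.method, self.epsilon) == (other.name, other.method, other.epsilon)

    def __hash__(self):
        # the hash covers every field used in __eq__, keeping the contract
        # that equal objects must hash equally
        return hash((self.name, self.method, self.epsilon))
```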
- Reformat code to comply with black formatting standards
- Fixes CI formatting check failure
Greptile Summary

This PR enhances sample data generation to use LLM-based realistic values instead of simple synthetic patterns. The main functional change is in plexe/tools/datasets.py.

The implementation is solid with proper error handling, though the hardcoded provider string could be made configurable to align with patterns used elsewhere in the codebase.

Confidence Score: 4/5
4 files reviewed, 1 comment
```python
SampleModel = type("SampleModel", (BaseModel,), schema_fields)

# Use LLM to generate sensible sample values based on field names and types
provider = Provider("openai/gpt-4o-mini")
```
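How the described flow might fit together, as a hedged sketch rather than the PR's exact code; `provider.query()` and `generate_synthetic_row()` are hypothetical stand-ins for plexe's actual APIs:

```python
from pydantic import create_model


def generate_synthetic_row(schema_fields: dict[str, type]) -> dict:
    # stub for the original synthetic-pattern logic kept as a fallback
    defaults = {str: "sample", int: 0, float: 0.0, bool: False}
    return {name: defaults.get(typ) for name, typ in schema_fields.items()}


def create_sample_row(schema_fields: dict[str, type], provider) -> dict:
    # build a pydantic model dynamically from the schema, as in the diff above
    SampleModel = create_model("SampleModel", **{k: (t, ...) for k, t in schema_fields.items()})
    try:
        # LLM path: field names and types steer the model toward realistic values
        raw = provider.query(
            "Return one realistic sample record as JSON with fields: "
            + ", ".join(f"{name}: {typ.__name__}" for name, typ in schema_fields.items())
        )
        return SampleModel.model_validate_json(raw).model_dump()
    except Exception:
        # fallback: the original manual/synthetic generation logic
        return generate_synthetic_row(schema_fields)
```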
**style:** hardcoded `openai/gpt-4o-mini` provider prevents users from configuring model choice

Other tools in this module receive an `llm_to_use` parameter (see `get_training_code_generation_tool`, `get_select_target_metric`); consider adding a provider parameter or getting it from config.

```suggestion
# Use LLM to generate sensible sample values based on field names and types
# TODO: make provider configurable via parameter
provider = Provider("openai/gpt-4o-mini")
```
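One way to apply the suggestion, mirroring the `llm_to_use` parameter that `get_training_code_generation_tool` and `get_select_target_metric` already take; the factory name, signature, and `Provider` stand-in are assumptions:

```python
class Provider:  # stand-in for plexe's Provider class
    def __init__(self, model: str):
        self.model = model


def get_dataset_generation_tool(llm_to_use: str = "openai/gpt-4o-mini"):
    """Factory-style tool, matching the pattern used elsewhere in this module."""

    def create_sample_dataset(schema_fields: dict) -> list[dict]:
        # model choice now flows in from the caller instead of being hardcoded
        provider = Provider(llm_to_use)
        ...  # generation logic as in the diff above
        return []

    return create_sample_dataset
```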
- Fix black formatting issues for CI compliance
- Format both files to match project style standards
Completed a TODO where the LLM is called for dataset generation, keeping the original manual logic as a fallback.