
Conversation

@mairin mairin (Collaborator) commented Oct 10, 2025

This is a user experience improvement: timestamp-based eval output is difficult to navigate. This change names eval output runs from three sources, in priority order: a command-line flag that overrides the other two, a field in the eval data YAML if provided, and the eval data filename as a fallback. This should make it easier to find results and annotate specific runs.
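For illustration, the naming priority resolves roughly like this (a sketch, not the PR's code; helper and parameter names here are hypothetical):

```python
from pathlib import Path

def resolve_run_name(cli_run_name: str | None, yaml_run_name: str | None, yaml_path: str) -> str:
    """Sketch of the priority described above: CLI flag > YAML field > filename."""
    if cli_run_name:
        return cli_run_name
    if yaml_run_name:
        return yaml_run_name
    # Fall back to the eval data filename stem
    return Path(yaml_path).stem

# e.g. resolve_run_name(None, None, "rh124_filesystem_basics.yaml")
# -> "rh124_filesystem_basics"
```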

Summary by CodeRabbit

  • New Features
    • Add optional run name for evaluation runs, set via CLI flag (--run-name) or auto-derived when omitted.
    • Enforce a maximum run name length with robust sanitization to ensure safe filenames.
    • Prefix generated output filenames with the run name when provided.
    • Include the run name in both JSON and text summaries.
  • Tests
    • Add comprehensive unit tests for run name sanitization, covering whitespace handling, unsafe characters, length limits, control characters, and diverse Unicode inputs.

mairin and others added 7 commits October 10, 2025 15:33
- Add MAX_RUN_NAME_LENGTH constant (100 chars) to constants.py
- Add optional run_name field to EvaluationData model with max length validation
- Prepares foundation for customizable evaluation run naming

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Máirín Duffy <[email protected]>
- Create new utils package with sanitize_run_name function
- Sanitizes filesystem-unsafe characters (/, \, :, *, ?, ", ', `, <, >, |, control chars)
- Collapses multiple spaces/underscores into single underscore
- Enforces MAX_RUN_NAME_LENGTH (100 chars) limit
- Handles edge cases (empty strings, truncation)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Máirín Duffy <[email protected]>
- Add logic to automatically set run_name from evaluation data YAML filename
- Only applies when run_name is not explicitly provided in YAML
- Uses sanitize_run_name to ensure filesystem safety
- Example: rh124_filesystem_basics.yaml → run_name: "rh124_filesystem_basics"

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Máirín Duffy <[email protected]>
- Add --run-name CLI argument to evaluation runner
- Implement priority: CLI override > YAML value > filename default
- Pass run_name through runner to DataValidator
- Update DataValidator to accept run_name_override parameter
- All run names sanitized for filesystem safety

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Máirín Duffy <[email protected]>
- Extract run_name from evaluation data in runner
- Pass run_name to OutputHandler constructor
- Prepend run_name to output filenames when provided
- Format: {run_name}_{base_filename}_{timestamp}
- Backwards compatible when run_name is not provided

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Máirín Duffy <[email protected]>
- Include run_name in JSON summary output
- Add run_name to text summary report (conditionally)
- Improves traceability of evaluation runs in output files

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Máirín Duffy <[email protected]>
- Test basic alphanumeric strings and edge cases
- Test filesystem-unsafe character replacement
- Test whitespace/underscore collapsing
- Test max length enforcement with trailing underscore removal
- Test Unicode support (emojis, Japanese kanji, Chinese characters)
- Test Unicode + unsafe character combinations
- All 15 tests passing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Máirín Duffy <[email protected]>
@coderabbitai coderabbitai bot commented Oct 10, 2025

Walkthrough

Introduces a run name concept across evaluation: adds MAX_RUN_NAME_LENGTH, a sanitize_run_name utility, a run_name field on EvaluationData, optional run_name plumbing through validator, runner, and output generator, adjusts filename/summary generation, and extends the CLI with --run-name.

Changes

Cohort / File(s): Summary of edits

Constants (src/lightspeed_evaluation/core/constants.py)
  Added public constant MAX_RUN_NAME_LENGTH = 100.

Models: EvaluationData (src/lightspeed_evaluation/core/models/data.py)
  Added optional field run_name with max_length=MAX_RUN_NAME_LENGTH; imported the constant.

Output handling (src/lightspeed_evaluation/core/output/generator.py)
  OutputHandler accepts an optional run_name; prefixes filenames with run_name when provided; includes run_name in JSON and text summaries.

Validation flow (src/lightspeed_evaluation/core/system/validator.py)
  load_evaluation_data now accepts run_name_override; sets data_dict["run_name"] with priority: override (sanitized) > YAML value > filename (sanitized).

Utils: sanitize export (src/lightspeed_evaluation/core/utils/__init__.py)
  Re-exported sanitize_run_name via __all__.

Utils: sanitizer (src/lightspeed_evaluation/core/utils/sanitize.py)
  Added sanitize_run_name implementing trimming, unsafe-character replacement, underscore collapsing/stripping, control-character handling, and MAX_RUN_NAME_LENGTH enforcement.

Runner and CLI (src/lightspeed_evaluation/runner/evaluation.py)
  run_evaluation accepts an optional run_name; passes the override to DataValidator; propagates run_name to OutputHandler; the CLI adds --run-name with help text mentioning MAX_RUN_NAME_LENGTH.

Tests: sanitizer (tests/unit/core/test_sanitize.py)
  New tests covering sanitize_run_name across ASCII, whitespace, unsafe characters, control characters, Unicode, and max-length truncation.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User as CLI User
  participant CLI as CLI Parser
  participant Runner as run_evaluation
  participant Validator as DataValidator
  participant Utils as sanitize_run_name
  participant Model as EvaluationData
  participant Output as OutputHandler

  User->>CLI: lightspeed-eval ... --run-name "<name>"
  CLI->>Runner: run_evaluation(..., run_name)
  Runner->>Validator: load_evaluation_data(path, run_name_override=run_name)
  alt override provided
    Validator->>Utils: sanitize_run_name(override)
    Utils-->>Validator: sanitized_name
    Validator->>Model: construct EvaluationData(run_name=sanitized_name)
  else YAML has run_name
    Validator->>Model: construct EvaluationData(run_name=yaml_value)
  else derive from filename
    Validator->>Utils: sanitize_run_name(filename_stem)
    Utils-->>Validator: sanitized_stem
    Validator->>Model: construct EvaluationData(run_name=sanitized_stem)
  end
  Validator-->>Runner: list[EvaluationData]
  Runner->>Output: OutputHandler(..., run_name=first_item.run_name)
  Output->>Output: generate_reports(prefix with run_name, include in summaries)
  Output-->>Runner: artifacts
  Runner-->>CLI: summary/status
  CLI-->>User: results

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

I typed a name, neat as a pin,
Sanitized, trimmed—now safe to begin.
Filenames hum with ordered grace,
Reports wear titles in every place.
Thump goes my paw—tests all green;
Hop, hop! The cleanest run you’ve seen. 🐇✨

Pre-merge checks

✅ Passed checks (3 passed)
Description Check ✅ Passed
  Check skipped because CodeRabbit’s high-level summary is enabled.

Title Check ✅ Passed
  The title concisely describes the primary feature of adding support for naming evaluation runs, directly reflecting the main change in the pull request. It follows conventional commit style with a clear “feat:” prefix and avoids unnecessary detail or jargon, making it easy for team members to understand the intent at a glance.

Docstring Coverage ✅ Passed
  Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.

@mairin mairin changed the title feature: support for adding names to eval runs feat: support for adding names to eval runs Oct 10, 2025
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 928901e and ade1fe7.

📒 Files selected for processing (8)
  • src/lightspeed_evaluation/core/constants.py (1 hunks)
  • src/lightspeed_evaluation/core/models/data.py (2 hunks)
  • src/lightspeed_evaluation/core/output/generator.py (4 hunks)
  • src/lightspeed_evaluation/core/system/validator.py (3 hunks)
  • src/lightspeed_evaluation/core/utils/__init__.py (1 hunks)
  • src/lightspeed_evaluation/core/utils/sanitize.py (1 hunks)
  • src/lightspeed_evaluation/runner/evaluation.py (4 hunks)
  • tests/unit/core/test_sanitize.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
src/lightspeed_evaluation/core/system/validator.py (2)
src/lightspeed_evaluation/core/utils/sanitize.py (1)
  • sanitize_run_name (8-49)
src/lightspeed_evaluation/core/models/data.py (1)
  • EvaluationData (141-193)
src/lightspeed_evaluation/core/utils/__init__.py (1)
src/lightspeed_evaluation/core/utils/sanitize.py (1)
  • sanitize_run_name (8-49)
tests/unit/core/test_sanitize.py (1)
src/lightspeed_evaluation/core/utils/sanitize.py (1)
  • sanitize_run_name (8-49)
src/lightspeed_evaluation/runner/evaluation.py (2)
src/lightspeed_evaluation/core/system/validator.py (1)
  • load_evaluation_data (86-154)
src/lightspeed_evaluation/core/output/generator.py (1)
  • OutputHandler (23-310)
🔇 Additional comments (16)
src/lightspeed_evaluation/core/constants.py (1)

22-22: LGTM!

The MAX_RUN_NAME_LENGTH = 100 constant is well-placed and provides a reasonable constraint for filesystem-safe run names. This value balances usability with filesystem path length limits.

src/lightspeed_evaluation/core/utils/__init__.py (1)

1-5: LGTM!

The module correctly exports sanitize_run_name with proper __all__ declaration, following Python best practices for package-level exports.

src/lightspeed_evaluation/core/models/data.py (1)

154-158: LGTM!

The run_name field is well-integrated into the EvaluationData model with proper Pydantic validation using max_length=MAX_RUN_NAME_LENGTH. The dynamic description string provides clear guidance to users.

src/lightspeed_evaluation/core/utils/sanitize.py (1)

8-49: LGTM!

The sanitize_run_name function is well-implemented with comprehensive sanitization logic:

  • Handles empty input and whitespace correctly
  • Replaces filesystem-unsafe characters (including control characters) with underscores
  • Collapses multiple spaces/underscores
  • Enforces maximum length with trailing underscore cleanup
  • Preserves Unicode characters appropriately

The implementation aligns with the extensive test coverage provided.
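For reference, a minimal sketch consistent with the behaviors listed above (the regexes and exact character set are assumptions, not the PR's implementation):

```python
import re

MAX_RUN_NAME_LENGTH = 100  # mirrors the constant in core/constants.py

def sanitize_run_name(run_name: str) -> str:
    """Sketch: trim, replace unsafe/control characters, collapse separators, truncate."""
    name = run_name.strip()
    # Replace filesystem-unsafe characters and control characters with underscores
    name = re.sub(r'[/\\:*?"\'`<>|\x00-\x1f]', "_", name)
    # Collapse runs of spaces/underscores into a single underscore
    name = re.sub(r"[ _]+", "_", name)
    # Enforce the length limit, then drop any trailing underscore truncation leaves
    return name[:MAX_RUN_NAME_LENGTH].rstrip("_")
```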

tests/unit/core/test_sanitize.py (1)

9-114: LGTM!

The test suite for sanitize_run_name is comprehensive and well-structured, covering:

  • Basic alphanumeric passthrough
  • Edge cases (empty strings, whitespace)
  • Filesystem-unsafe character replacement
  • Space/underscore collapsing
  • Length enforcement with trailing underscore cleanup
  • Control character handling
  • Unicode preservation and mixed Unicode/ASCII scenarios
  • Real-world YAML filename patterns

Excellent test coverage for the sanitization utility.
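A couple of representative cases in the spirit of this suite (the exact expected strings are assumptions about the sanitizer's behavior, not copied from the tests):

```python
from lightspeed_evaluation.core.utils import sanitize_run_name

def test_unsafe_characters_replaced() -> None:
    # ':', '/', '?', and spaces should all collapse into single underscores
    assert sanitize_run_name("rh124: filesystem/basics?") == "rh124_filesystem_basics"

def test_truncation_strips_trailing_underscore() -> None:
    result = sanitize_run_name("a" * 99 + "_" + "b" * 20)
    assert len(result) <= 100
    assert not result.endswith("_")
```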

src/lightspeed_evaluation/core/output/generator.py (4)

31-44: LGTM!

The OutputHandler correctly integrates run_name support with proper initialization and storage.


53-56: LGTM!

The filename construction logic correctly prepends run_name when provided, maintaining backward compatibility when it's not present.


195-195: LGTM!

Including run_name in the JSON summary provides useful metadata for tracking evaluation runs.


242-243: LGTM!

The conditional inclusion of run_name in the text summary is clean and user-friendly.
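Taken together, the filename logic these comments describe might look like the following sketch (the function name and timestamp format are assumptions; only the {run_name}_{base_filename}_{timestamp} pattern comes from the PR):

```python
from datetime import datetime

def build_output_filename(base_filename: str, run_name: str | None = None) -> str:
    """Sketch: prepend run_name when provided; keep the legacy name otherwise."""
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")  # format assumed
    if run_name:
        return f"{run_name}_{base_filename}_{timestamp}"
    return f"{base_filename}_{timestamp}"  # backwards-compatible path

# e.g. build_output_filename("summary", run_name="rh124_filesystem_basics")
# -> "rh124_filesystem_basics_summary_20251010_153300"
```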

src/lightspeed_evaluation/runner/evaluation.py (7)

9-9: LGTM!

Importing MAX_RUN_NAME_LENGTH for the CLI help text provides clear guidance to users about the constraint.


14-25: LGTM!

The function signature is properly updated to support run_name with clear documentation.


57-59: LGTM!

Correctly passing run_name as run_name_override to the data validator maintains the priority chain: CLI → YAML → filename.


84-84: LGTM!

Correctly passing run_name to the OutputHandler for consistent labeling across all generated reports.


133-136: LGTM!

The CLI argument is well-documented with the length constraint clearly communicated to users.


140-142: LGTM!

Correctly plumbing the CLI argument through to the evaluation function.


76-79: Verify run_name presence and consistency in evaluation YAML. I couldn’t locate any run_name fields in config/evaluation_data.yaml; please confirm whether run_name entries exist and, if there are multiple, enforce a single value per file or document that only the first is used.

Comment on lines +119 to +127
                # Set run_name with priority: CLI override > YAML value > filename
                if run_name_override is not None:
                    # CLI override takes highest priority
                    data_dict["run_name"] = sanitize_run_name(run_name_override)
                elif "run_name" not in data_dict or data_dict["run_name"] is None:
                    # Default to YAML filename if not provided
                    yaml_filename = Path(data_path).stem
                    data_dict["run_name"] = sanitize_run_name(yaml_filename)


⚠️ Potential issue | 🟠 Major

Sanitize YAML-provided run_name values.

The current logic sanitizes the CLI override and filename fallback but doesn't sanitize run_name values provided in the YAML file. This could allow filesystem-unsafe characters to pass through validation.

Apply this diff to ensure all run_name values are sanitized:

                # Set run_name with priority: CLI override > YAML value > filename
                if run_name_override is not None:
                    # CLI override takes highest priority
                    data_dict["run_name"] = sanitize_run_name(run_name_override)
                elif "run_name" not in data_dict or data_dict["run_name"] is None:
                    # Default to YAML filename if not provided
                    yaml_filename = Path(data_path).stem
                    data_dict["run_name"] = sanitize_run_name(yaml_filename)
+               else:
+                   # Sanitize YAML-provided run_name
+                   data_dict["run_name"] = sanitize_run_name(data_dict["run_name"])

@VladimirKadlec VladimirKadlec left a comment


Great feature, thanks! I'd prefer an existing filename-sanitization solution; other than that it looks very good.

from lightspeed_evaluation.core.constants import MAX_RUN_NAME_LENGTH


def sanitize_run_name(run_name: str) -> str:

Consider pathvalidate; the advantage is that we wouldn't need to maintain and test this function ourselves.
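For example, the helper could delegate to pathvalidate (a sketch assuming pathvalidate's sanitize_filename API; the length cap from this PR would still be enforced locally):

```python
from pathvalidate import sanitize_filename

MAX_RUN_NAME_LENGTH = 100  # would be imported from core/constants.py

def sanitize_run_name(run_name: str) -> str:
    """Sketch: let pathvalidate strip unsafe characters, then apply the length cap."""
    cleaned = sanitize_filename(run_name.strip(), replacement_text="_")
    return cleaned[:MAX_RUN_NAME_LENGTH].rstrip("_")
```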

@VladimirKadlec

Please fix the checks; they should run automatically on your new commits/PRs from now on.
I know our Contribution Guide is outdated; you can run all the checks locally:

uv run black --check .  (remove --check to reformat)
uv run pydocstyle -v .
make pyright
make pylint
make ruff
make check-types
uv run python -m pytest tests --cov=src --cov-report term-missing

