feat: support for adding names to eval runs #80
base: main
Conversation
- Add MAX_RUN_NAME_LENGTH constant (100 chars) to constants.py
- Add optional run_name field to EvaluationData model with max length validation
- Prepares foundation for customizable evaluation run naming

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
Signed-off-by: Máirín Duffy <[email protected]>
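A minimal sketch of the two pieces this commit describes; the constant and field name come from the commit message, while the surrounding model shape (and the other EvaluationData fields, omitted here) is assumed:

```python
from pydantic import BaseModel, Field

# constants.py (sketch)
MAX_RUN_NAME_LENGTH = 100


# models/data.py (sketch) -- other EvaluationData fields omitted
class EvaluationData(BaseModel):
    run_name: str | None = Field(
        default=None,
        max_length=MAX_RUN_NAME_LENGTH,
        description=f"Optional run name (max {MAX_RUN_NAME_LENGTH} chars)",
    )
```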
- Create new utils package with sanitize_run_name function
- Sanitizes filesystem-unsafe characters (/, \, :, *, ?, ", ', `, <, >, |, control chars)
- Collapses multiple spaces/underscores into single underscore
- Enforces MAX_RUN_NAME_LENGTH (100 chars) limit
- Handles edge cases (empty strings, truncation)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
Signed-off-by: Máirín Duffy <[email protected]>
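A sketch of the behavior listed above, assuming the rules in the commit message; the real implementation in src/lightspeed_evaluation/core/utils/sanitize.py may differ in detail:

```python
import re

MAX_RUN_NAME_LENGTH = 100  # mirrors core/constants.py


def sanitize_run_name(run_name: str) -> str:
    """Replace filesystem-unsafe characters and enforce the length limit."""
    # Strip surrounding whitespace, then replace unsafe and control
    # characters with underscores
    name = re.sub(r'[/\\:*?"\'`<>|\x00-\x1f]', "_", run_name.strip())
    # Collapse runs of spaces and underscores into a single underscore
    name = re.sub(r"[ _]+", "_", name)
    # Enforce the max length and drop any trailing underscore left behind
    return name[:MAX_RUN_NAME_LENGTH].rstrip("_")
```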
- Add logic to automatically set run_name from evaluation data YAML filename
- Only applies when run_name is not explicitly provided in YAML
- Uses sanitize_run_name to ensure filesystem safety
- Example: rh124_filesystem_basics.yaml → run_name: "rh124_filesystem_basics"

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
Signed-off-by: Máirín Duffy <[email protected]>
- Add --run-name CLI argument to evaluation runner
- Implement priority: CLI override > YAML value > filename default
- Pass run_name through runner to DataValidator
- Update DataValidator to accept run_name_override parameter
- All run names sanitized for filesystem safety

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
Signed-off-by: Máirín Duffy <[email protected]>
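A rough sketch of the CLI wiring this commit describes; the --run-name flag and the override semantics come from the commit message, while the parser setup and help text are assumed:

```python
import argparse

MAX_RUN_NAME_LENGTH = 100  # mirrors core/constants.py

parser = argparse.ArgumentParser(prog="lightspeed-eval")
parser.add_argument(
    "--run-name",
    default=None,
    help=f"Optional name for this evaluation run "
    f"(max {MAX_RUN_NAME_LENGTH} chars; overrides YAML and filename)",
)

# Example invocation; the runner would pass args.run_name through as
# run_name_override, where it is sanitized before use
args = parser.parse_args(["--run-name", "smoke test: v1"])
print(args.run_name)  # -> "smoke test: v1" (sanitized downstream)
```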
- Extract run_name from evaluation data in runner
- Pass run_name to OutputHandler constructor
- Prepend run_name to output filenames when provided
- Format: {run_name}_{base_filename}_{timestamp}
- Backwards compatible when run_name is not provided
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
Signed-off-by: Máirín Duffy <[email protected]>
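An illustrative sketch of the {run_name}_{base_filename}_{timestamp} format described above; the function name and structure are hypothetical, not the actual OutputHandler code:

```python
from datetime import datetime


def build_output_filename(base_filename: str, run_name: str | None = None) -> str:
    """Prepend run_name when provided; otherwise keep the legacy format."""
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    if run_name:
        return f"{run_name}_{base_filename}_{timestamp}"
    # Backwards-compatible path when no run_name is set
    return f"{base_filename}_{timestamp}"
```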
- Include run_name in JSON summary output
- Add run_name to text summary report (conditionally)
- Improves traceability of evaluation runs in output files

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
Signed-off-by: Máirín Duffy <[email protected]>
- Test basic alphanumeric strings and edge cases
- Test filesystem-unsafe character replacement
- Test whitespace/underscore collapsing
- Test max length enforcement with trailing underscore removal
- Test Unicode support (emojis, Japanese kanji, Chinese characters)
- Test Unicode + unsafe character combinations
- All 15 tests passing

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
Signed-off-by: Máirín Duffy <[email protected]>
Walkthrough

Introduces a run name concept across evaluation: adds MAX_RUN_NAME_LENGTH, a sanitize_run_name utility, a run_name field on EvaluationData, optional run_name plumbing through validator, runner, and output generator, adjusts filename/summary generation, and extends the CLI with --run-name.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
autonumber
actor User as CLI User
participant CLI as CLI Parser
participant Runner as run_evaluation
participant Validator as DataValidator
participant Utils as sanitize_run_name
participant Model as EvaluationData
participant Output as OutputHandler
User->>CLI: lightspeed-eval ... --run-name "<name>"
CLI->>Runner: run_evaluation(..., run_name)
Runner->>Validator: load_evaluation_data(path, run_name_override=run_name)
alt override provided
Validator->>Utils: sanitize_run_name(override)
Utils-->>Validator: sanitized_name
Validator->>Model: construct EvaluationData(run_name=sanitized_name)
else YAML has run_name
Validator->>Model: construct EvaluationData(run_name=yaml_value)
else derive from filename
Validator->>Utils: sanitize_run_name(filename_stem)
Utils-->>Validator: sanitized_stem
Validator->>Model: construct EvaluationData(run_name=sanitized_stem)
end
Validator-->>Runner: list[EvaluationData]
Runner->>Output: OutputHandler(..., run_name=first_item.run_name)
Output->>Output: generate_reports(prefix with run_name, include in summaries)
Output-->>Runner: artifacts
Runner-->>CLI: summary/status
CLI-->>User: results
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)

- src/lightspeed_evaluation/core/constants.py (1 hunks)
- src/lightspeed_evaluation/core/models/data.py (2 hunks)
- src/lightspeed_evaluation/core/output/generator.py (4 hunks)
- src/lightspeed_evaluation/core/system/validator.py (3 hunks)
- src/lightspeed_evaluation/core/utils/__init__.py (1 hunks)
- src/lightspeed_evaluation/core/utils/sanitize.py (1 hunks)
- src/lightspeed_evaluation/runner/evaluation.py (4 hunks)
- tests/unit/core/test_sanitize.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
src/lightspeed_evaluation/core/system/validator.py (2)
- src/lightspeed_evaluation/core/utils/sanitize.py (1): sanitize_run_name (8-49)
- src/lightspeed_evaluation/core/models/data.py (1): EvaluationData (141-193)

src/lightspeed_evaluation/core/utils/__init__.py (1)
- src/lightspeed_evaluation/core/utils/sanitize.py (1): sanitize_run_name (8-49)

tests/unit/core/test_sanitize.py (1)
- src/lightspeed_evaluation/core/utils/sanitize.py (1): sanitize_run_name (8-49)

src/lightspeed_evaluation/runner/evaluation.py (2)
- src/lightspeed_evaluation/core/system/validator.py (1): load_evaluation_data (86-154)
- src/lightspeed_evaluation/core/output/generator.py (1): OutputHandler (23-310)
🔇 Additional comments (16)
src/lightspeed_evaluation/core/constants.py (1)
22-22: LGTM! The `MAX_RUN_NAME_LENGTH = 100` constant is well-placed and provides a reasonable constraint for filesystem-safe run names. This value balances usability with filesystem path length limits.

src/lightspeed_evaluation/core/utils/__init__.py (1)
1-5: LGTM! The module correctly exports `sanitize_run_name` with a proper `__all__` declaration, following Python best practices for package-level exports.

src/lightspeed_evaluation/core/models/data.py (1)
154-158: LGTM! The `run_name` field is well-integrated into the `EvaluationData` model with proper Pydantic validation using `max_length=MAX_RUN_NAME_LENGTH`. The dynamic description string provides clear guidance to users.

src/lightspeed_evaluation/core/utils/sanitize.py (1)
8-49: LGTM! The `sanitize_run_name` function is well-implemented with comprehensive sanitization logic:
- Handles empty input and whitespace correctly
- Replaces filesystem-unsafe characters (including control characters) with underscores
- Collapses multiple spaces/underscores
- Enforces maximum length with trailing underscore cleanup
- Preserves Unicode characters appropriately
The implementation aligns with the extensive test coverage provided.
tests/unit/core/test_sanitize.py (1)
9-114: LGTM! The test suite for `sanitize_run_name` is comprehensive and well-structured, covering:
- Basic alphanumeric passthrough
- Edge cases (empty strings, whitespace)
- Filesystem-unsafe character replacement
- Space/underscore collapsing
- Length enforcement with trailing underscore cleanup
- Control character handling
- Unicode preservation and mixed Unicode/ASCII scenarios
- Real-world YAML filename patterns
Excellent test coverage for the sanitization utility.
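A few representative cases in the style the review describes; these are illustrative sketches assuming the sanitization rules summarized above, not excerpts from the actual test file:

```python
from lightspeed_evaluation.core.utils import sanitize_run_name


def test_unsafe_characters_replaced():
    # Each filesystem-unsafe character becomes an underscore
    assert sanitize_run_name('a/b\\c:d*e?f"g') == "a_b_c_d_e_f_g"


def test_collapses_spaces_and_underscores():
    # Runs of spaces/underscores collapse to a single underscore
    assert sanitize_run_name("rh124   filesystem__basics") == "rh124_filesystem_basics"


def test_max_length_strips_trailing_underscore():
    # Truncation never leaves a dangling underscore at the end
    result = sanitize_run_name("x" * 99 + "_" + "y" * 10)
    assert len(result) <= 100 and not result.endswith("_")
```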
src/lightspeed_evaluation/core/output/generator.py (4)
31-44: LGTM! The `OutputHandler` correctly integrates `run_name` support with proper initialization and storage.
53-56: LGTM! The filename construction logic correctly prepends `run_name` when provided, maintaining backward compatibility when it's not present.
195-195: LGTM! Including `run_name` in the JSON summary provides useful metadata for tracking evaluation runs.
242-243: LGTM! The conditional inclusion of `run_name` in the text summary is clean and user-friendly.
src/lightspeed_evaluation/runner/evaluation.py (7)
9-9: LGTM! Importing `MAX_RUN_NAME_LENGTH` for the CLI help text provides clear guidance to users about the constraint.
14-25: LGTM! The function signature is properly updated to support `run_name` with clear documentation.
57-59: LGTM! Correctly passing `run_name` as `run_name_override` to the data validator maintains the priority chain: CLI → YAML → filename.
84-84: LGTM! Correctly passing `run_name` to the `OutputHandler` for consistent labeling across all generated reports.
133-136: LGTM! The CLI argument is well-documented with the length constraint clearly communicated to users.
140-142: LGTM! Correctly plumbing the CLI argument through to the evaluation function.
76-79: Verify run_name presence and consistency in evaluation YAML. I couldn’t locate any run_name fields in config/evaluation_data.yaml; please confirm whether run_name entries exist and, if there are multiple, enforce a single value per file or document that only the first is used.
```python
# Set run_name with priority: CLI override > YAML value > filename
if run_name_override is not None:
    # CLI override takes highest priority
    data_dict["run_name"] = sanitize_run_name(run_name_override)
elif "run_name" not in data_dict or data_dict["run_name"] is None:
    # Default to YAML filename if not provided
    yaml_filename = Path(data_path).stem
    data_dict["run_name"] = sanitize_run_name(yaml_filename)
```
Sanitize YAML-provided run_name values.
The current logic sanitizes the CLI override and filename fallback but doesn't sanitize run_name values provided in the YAML file. This could allow filesystem-unsafe characters to pass through validation.
Apply this diff to ensure all run_name values are sanitized:
```diff
 # Set run_name with priority: CLI override > YAML value > filename
 if run_name_override is not None:
     # CLI override takes highest priority
     data_dict["run_name"] = sanitize_run_name(run_name_override)
 elif "run_name" not in data_dict or data_dict["run_name"] is None:
     # Default to YAML filename if not provided
     yaml_filename = Path(data_path).stem
     data_dict["run_name"] = sanitize_run_name(yaml_filename)
+else:
+    # Sanitize YAML-provided run_name
+    data_dict["run_name"] = sanitize_run_name(data_dict["run_name"])
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
# Set run_name with priority: CLI override > YAML value > filename
if run_name_override is not None:
    # CLI override takes highest priority
    data_dict["run_name"] = sanitize_run_name(run_name_override)
elif "run_name" not in data_dict or data_dict["run_name"] is None:
    # Default to YAML filename if not provided
    yaml_filename = Path(data_path).stem
    data_dict["run_name"] = sanitize_run_name(yaml_filename)
else:
    # Sanitize YAML-provided run_name
    data_dict["run_name"] = sanitize_run_name(data_dict["run_name"])
```
🤖 Prompt for AI Agents
In src/lightspeed_evaluation/core/system/validator.py around lines 119 to 127,
the code sanitizes the CLI override and filename fallback but leaves a
YAML-provided data_dict["run_name"] unsanitized; update the logic so that after
handling CLI override and filename default, any existing YAML-provided run_name
is passed through sanitize_run_name before assignment (i.e., when
run_name_override is None and "run_name" exists and is not None, replace
data_dict["run_name"] with sanitize_run_name(data_dict["run_name"]) so all
run_name sources are sanitized).
VladimirKadlec left a comment
Great feature, thanks! I'd prefer an existing filename-sanitization solution; other than that it looks very good.
```python
from lightspeed_evaluation.core.constants import MAX_RUN_NAME_LENGTH


def sanitize_run_name(run_name: str) -> str:
```
Consider pathvalidate; the advantage would be that we don't need to maintain and test this function ourselves.
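For comparison, a minimal sketch of what a pathvalidate-based version might look like, assuming pathvalidate were added as a dependency; the length cap would still need to be applied separately:

```python
from pathvalidate import sanitize_filename

MAX_RUN_NAME_LENGTH = 100  # mirrors core/constants.py


def sanitize_run_name(run_name: str) -> str:
    # pathvalidate handles platform-specific unsafe characters;
    # replacement_text controls what they are replaced with
    name = sanitize_filename(run_name.strip(), replacement_text="_")
    return name[:MAX_RUN_NAME_LENGTH].rstrip("_")
```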
Please fix the checks; they should run automatically on your new commits/PRs from now on.
This is a user-experience improvement: timestamp-based eval output is difficult to navigate. This change names eval output runs based on the eval data filename as a fallback, on a run_name field in the eval data YAML if provided, and adds a command-line flag that overrides both. This should make it easier to find results or annotate specific runs.