Merged

Changes from all commits (35 commits)
2faa752
bulk update: eval-factory -> nemo-evaluator (PR #264) + using_contain…
marta-sd Oct 10, 2025
0bab9d4
bulk update: set container version to 25.09.1 (this is what expect to…
marta-sd Oct 10, 2025
6f9e3a5
add docs on trt-llm deployment (PR #278)
marta-sd Oct 10, 2025
d37cce1
unify nvidia api key name (PR #276)
marta-sd Oct 10, 2025
f62f22a
describe extra_docker_args (PR #286)
marta-sd Oct 10, 2025
0af27a1
update exporters (PR #282)
marta-sd Oct 10, 2025
01ec6f0
bulk update: nv-eval -> nemo-evaluator-launcher
marta-sd Oct 10, 2025
9bc7275
fix fdf docs
marta-sd Oct 10, 2025
1fc5cc4
move ToolTalk and BFCL to the right category
marta-sd Oct 10, 2025
d3064aa
fix extending FDF
marta-sd Oct 10, 2025
539cbc5
add trt-llm and generic to list
marta-sd Oct 10, 2025
dc258bb
interceptors: caching
marta-sd Oct 10, 2025
894c1a4
interceptors: request logging
marta-sd Oct 10, 2025
fa29b57
fix tab in exporters/mlflow
marta-sd Oct 10, 2025
d5582e4
remove redundant info from overview
marta-sd Oct 10, 2025
0b960ec
fix caching
marta-sd Oct 10, 2025
c5a4ad2
minor fixes for request logging
marta-sd Oct 10, 2025
4f73b17
add response logging
marta-sd Oct 10, 2025
b8a3d83
endpoint interceptor
marta-sd Oct 10, 2025
546a854
changes for remaining interceptors
marta-sd Oct 10, 2025
d512b5f
add docs for raising errors and response stats interceptors
marta-sd Oct 12, 2025
c93145d
add cli for client error interceptor
marta-sd Oct 13, 2025
52d67c4
fix typos
marta-sd Oct 13, 2025
90f4732
add cli for response stats
marta-sd Oct 13, 2025
a9f8d43
interceptor docs fixes
marta-sd Oct 13, 2025
471bde2
'container' workflow is really a CLI workflow
marta-sd Oct 13, 2025
9921117
minor fixes for configuration
marta-sd Oct 13, 2025
08c1bba
launcher cli
marta-sd Oct 13, 2025
6742677
fixes for evaluator
marta-sd Oct 13, 2025
8b0b8a4
fixes for cli
marta-sd Oct 14, 2025
33fdb4a
list all pypi packages
marta-sd Oct 14, 2025
fb9e156
add quickstart for nemo fw
marta-sd Oct 14, 2025
ec14435
fix icons, silence pydantic autodoc issue
lbliii Oct 14, 2025
40e8515
tabs
lbliii Oct 14, 2025
c6efd7d
minor edits
lbliii Oct 14, 2025
4 changes: 2 additions & 2 deletions docs/about/concepts/architecture.md
@@ -100,7 +100,7 @@ graph LR
Use the launcher to handle both model deployment and evaluation:

```bash
nv-eval run \
nemo-evaluator-launcher run \
--config-dir examples \
--config-name local_llama_3_1_8b_instruct \
-o deployment.checkpoint_path=/path/to/model \
@@ -112,7 +112,7 @@ nv-eval run \
Point the launcher to an existing API endpoint:

```bash
nv-eval run \
nemo-evaluator-launcher run \
--config-dir examples \
--config-name local_llama_3_1_8b_instruct \
-o target.api_endpoint.url=http://localhost:8080/v1/completions \
4 changes: 2 additions & 2 deletions docs/about/concepts/evaluation-model.md
@@ -14,7 +14,7 @@ NeMo Evaluator supports several evaluation approaches through containerized harnesses
- **Function Calling**: Models generate structured outputs for tool use and API interaction scenarios.
- **Safety & Security**: Evaluation against adversarial prompts and safety benchmarks to test model alignment and robustness.

One or more evaluation harnesses implement each approach. To discover available tasks for each approach, use `nv-eval ls tasks`.
One or more evaluation harnesses implement each approach. To discover available tasks for each approach, use `nemo-evaluator-launcher ls tasks`.
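
For instance, the listing can be combined with ordinary shell filtering to scan for a particular benchmark family — a minimal sketch, assuming only the `ls tasks` subcommand shown above (output columns may differ between releases):

```bash
# List every task exposed by the installed evaluation harnesses
nemo-evaluator-launcher ls tasks

# Narrow the listing to a benchmark family of interest (plain shell filtering)
nemo-evaluator-launcher ls tasks | grep -i mmlu
```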

## Endpoint Compatibility

@@ -25,7 +25,7 @@ NeMo Evaluator targets OpenAI-compatible API endpoints. The platform supports th
- **`vlm`**: Vision-language model endpoints supporting image inputs.
- **`embedding`**: Embedding generation endpoints for retrieval evaluation.

Each evaluation task specifies which endpoint types it supports. Verify compatibility using `nv-eval ls tasks`.
Each evaluation task specifies which endpoint types it supports. Verify compatibility using `nemo-evaluator-launcher ls tasks`.

## Metrics

2 changes: 1 addition & 1 deletion docs/about/concepts/framework-definition-file.md
@@ -3,7 +3,7 @@
# Framework Definition Files

::::{note}
**Who needs this?** This documentation is for framework developers and organizations creating custom evaluation frameworks. If you're running existing evaluation tasks using {ref}`nv-eval <lib-launcher>` (NeMo Evaluator Launcher CLI) or {ref}`eval-factory <nemo-evaluator-cli>` (NeMo Evaluator CLI), you don't need to create FDFs—they're already provided by framework packages.
**Who needs this?** This documentation is for framework developers and organizations creating custom evaluation frameworks. If you're running existing evaluation tasks using {ref}`nemo-evaluator-launcher <lib-launcher>` (NeMo Evaluator Launcher CLI) or {ref}`nemo-evaluator <nemo-evaluator-cli>` (NeMo Evaluator CLI), you don't need to create FDFs—they're already provided by framework packages.
::::

A Framework Definition File (FDF) is a YAML configuration file that serves as the single source of truth for integrating evaluation frameworks into the NeMo Evaluator ecosystem. FDFs define how evaluation frameworks are configured, executed, and integrated with the Eval Factory system.
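
As a purely illustrative sketch — every field name below is hypothetical, and the authoritative schema ships with the framework packages themselves — an FDF typically pairs framework metadata with the command and defaults used to run its tasks:

```yaml
# Hypothetical FDF outline; consult an existing framework package for the real schema.
framework:
  name: my_custom_framework          # illustrative framework name
  description: Example harness integration for demonstration only
defaults:
  command: python -m my_harness.run  # placeholder invocation, not a real module
evaluations:
  - name: example_task               # illustrative task entry
    description: Shows where per-task configuration would live
```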
30 changes: 15 additions & 15 deletions docs/about/key-features.md
@@ -17,24 +17,24 @@ Run evaluations anywhere with unified configuration and monitoring:

```bash
# Single command, multiple backends
nv-eval run --config-dir examples --config-name local_llama_3_1_8b_instruct
nv-eval run --config-dir examples --config-name slurm_llama_3_1_8b_instruct
nv-eval run --config-dir examples --config-name lepton_vllm_llama_3_1_8b_instruct
nemo-evaluator-launcher run --config-dir examples --config-name local_llama_3_1_8b_instruct
nemo-evaluator-launcher run --config-dir examples --config-name slurm_llama_3_1_8b_instruct
nemo-evaluator-launcher run --config-dir examples --config-name lepton_vllm_llama_3_1_8b_instruct
```

### 100+ Benchmarks Across 17 Harnesses
Access a comprehensive benchmark suite with a single CLI:

```bash
# Discover available benchmarks
nv-eval ls tasks
nemo-evaluator-launcher ls tasks

# Run academic benchmarks
nv-eval run --config-dir examples --config-name local_llama_3_1_8b_instruct \
nemo-evaluator-launcher run --config-dir examples --config-name local_llama_3_1_8b_instruct \
-o 'evaluation.tasks=["mmlu_pro", "gsm8k", "arc_challenge"]'

# Run safety evaluation
nv-eval run --config-dir examples --config-name local_llama_3_1_8b_instruct \
nemo-evaluator-launcher run --config-dir examples --config-name local_llama_3_1_8b_instruct \
-o 'evaluation.tasks=["aegis_v2", "garak"]'
```

@@ -43,13 +43,13 @@ First-class integration with MLOps platforms:

```bash
# Export to MLflow
nv-eval export <invocation_id> --dest mlflow
nemo-evaluator-launcher export <invocation_id> --dest mlflow

# Export to Weights & Biases
nv-eval export <invocation_id> --dest wandb
nemo-evaluator-launcher export <invocation_id> --dest wandb

# Export to Google Sheets
nv-eval export <invocation_id> --dest gsheets
nemo-evaluator-launcher export <invocation_id> --dest gsheets
```

## **Core Evaluation Engine (NeMo Evaluator Core)**
@@ -313,7 +313,7 @@ Built-in safety assessment through specialized containers:

```bash
# Run safety evaluation suite
nv-eval run \
nemo-evaluator-launcher run \
--config-dir examples \
--config-name local_llama_3_1_8b_instruct \
-o 'evaluation.tasks=["aegis_v2", "garak"]'
@@ -331,22 +331,22 @@ Monitor evaluation progress across all backends:

```bash
# Check evaluation status
nv-eval status <invocation_id>
nemo-evaluator-launcher status <invocation_id>

# Kill running evaluations
nv-eval kill <invocation_id>
nemo-evaluator-launcher kill <invocation_id>
```

### Result Export and Analysis
Export evaluation results to MLOps platforms for downstream analysis:

```bash
# Export to MLflow for experiment tracking
nv-eval export <invocation_id> --dest mlflow
nemo-evaluator-launcher export <invocation_id> --dest mlflow

# Export to Weights & Biases for visualization
nv-eval export <invocation_id> --dest wandb
nemo-evaluator-launcher export <invocation_id> --dest wandb

# Export to Google Sheets for sharing
nv-eval export <invocation_id> --dest gsheets
nemo-evaluator-launcher export <invocation_id> --dest gsheets
```
116 changes: 64 additions & 52 deletions docs/conf.py
@@ -24,7 +24,7 @@
import sys

# Add custom extensions directory to Python path
sys.path.insert(0, os.path.abspath('_extensions'))
sys.path.insert(0, os.path.abspath("_extensions"))

project = "NeMo Evaluator SDK"
copyright = "2025, NVIDIA Corporation"
@@ -43,7 +43,7 @@
"sphinx_copybutton", # For copy button in code blocks,
"sphinx_design", # For grid layout
"sphinx.ext.ifconfig", # For conditional content
"content_gating", # Unified content gating extension
"content_gating", # Unified content gating extension
"myst_codeblock_substitutions", # Our custom MyST substitutions in code blocks
"json_output", # Generate JSON output for each page
"search_assets", # Enhanced search assets extension
@@ -54,26 +54,41 @@

templates_path = ["_templates"]
exclude_patterns = [
"_build",
"Thumbs.db",
"_build",
"Thumbs.db",
".DS_Store",
"_extensions/*/README.md", # Exclude README files in extension directories
"_extensions/README.md", # Exclude main extensions README
"_extensions/*/__pycache__", # Exclude Python cache directories
"_extensions/*/*/__pycache__", # Exclude nested Python cache directories
"_extensions/*/README.md", # Exclude README files in extension directories
"_extensions/README.md", # Exclude main extensions README
"_extensions/*/__pycache__", # Exclude Python cache directories
"_extensions/*/*/__pycache__", # Exclude nested Python cache directories
]

# -- Options for Intersphinx -------------------------------------------------
# Cross-references to external NVIDIA documentation
intersphinx_mapping = {
"ctk": ("https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest", None),
"gpu-op": ("https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest", None),
"ctk": (
"https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest",
None,
),
"gpu-op": (
"https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest",
None,
),
"ngr-tk": ("https://docs.nvidia.com/nemo/guardrails/latest", None),
"nim-cs": ("https://docs.nvidia.com/nim/llama-3-1-nemoguard-8b-contentsafety/latest/", None),
"nim-tc": ("https://docs.nvidia.com/nim/llama-3-1-nemoguard-8b-topiccontrol/latest/", None),
"nim-cs": (
"https://docs.nvidia.com/nim/llama-3-1-nemoguard-8b-contentsafety/latest/",
None,
),
"nim-tc": (
"https://docs.nvidia.com/nim/llama-3-1-nemoguard-8b-topiccontrol/latest/",
None,
),
"nim-jd": ("https://docs.nvidia.com/nim/nemoguard-jailbreakdetect/latest/", None),
"nim-llm": ("https://docs.nvidia.com/nim/large-language-models/latest/", None),
"driver-linux": ("https://docs.nvidia.com/datacenter/tesla/driver-installation-guide", None),
"driver-linux": (
"https://docs.nvidia.com/datacenter/tesla/driver-installation-guide",
None,
),
"nim-op": ("https://docs.nvidia.com/nim-operator/latest", None),
}

@@ -83,7 +98,7 @@
# -- Options for JSON Output -------------------------------------------------
# Configure the JSON output extension for comprehensive search indexes
json_output_settings = {
'enabled': True,
"enabled": True,
}

# -- Options for AI Assistant -------------------------------------------------
@@ -103,8 +118,8 @@
"deflist", # Supports definition lists with term: definition format
"fieldlist", # Enables field lists for metadata like :author: Name
"tasklist", # Adds support for GitHub-style task lists with [ ] and [x]
"attrs_inline", # Enables inline attributes for markdown
"substitution", # Enables substitution for markdown
"attrs_inline", # Enables inline attributes for markdown
"substitution", # Enables substitution for markdown
]

myst_heading_anchors = 5 # Generates anchor links for headings up to level 5
@@ -121,18 +136,14 @@
"support_email": "update-me",
"min_python_version": "3.8",
"recommended_cuda": "12.0+",
"docker_compose_latest": "25.08.1"
"docker_compose_latest": "25.09.1",
}

# Enable figure numbering
numfig = True

# Optional: customize numbering format
numfig_format = {
'figure': 'Figure %s',
'table': 'Table %s',
'code-block': 'Listing %s'
}
numfig_format = {"figure": "Figure %s", "table": "Table %s", "code-block": "Listing %s"}

# Optional: number within sections
numfig_secnum_depth = 1 # Gives you "Figure 1.1, 1.2, 2.1, etc."
@@ -141,9 +152,10 @@
# Suppress expected warnings for conditional content builds
suppress_warnings = [
"toc.not_included", # Expected when video docs are excluded from GA builds
"toc.no_title", # Expected for helm docs that include external README files
"docutils", # Expected for autodoc2-generated content with regex patterns and complex syntax
"ref.python", # Expected for ambiguous autodoc2 cross-references (e.g., multiple 'Params' classes)
"toc.no_title", # Expected for helm docs that include external README files
"docutils", # Expected for autodoc2-generated content with regex patterns and complex syntax
"ref.python", # Expected for ambiguous autodoc2 cross-references (e.g., multiple 'Params' classes)
"myst.xref_missing", # Expected for Pydantic BaseModel docstrings that reference Pydantic's own documentation
]

# -- Options for Autodoc2 ---------------------------------------------------
Expand All @@ -153,9 +165,7 @@
# Conditional autodoc2 configuration - only enable if packages exist
# Note: We point to the parent package rather than individual subpackages because
# the subpackages have relative imports between them (e.g., api imports from core)
autodoc2_packages_list = [
"../packages/nemo-evaluator/src/nemo_evaluator/"
]
autodoc2_packages_list = ["../packages/nemo-evaluator/src/nemo_evaluator/"]

# Check if any of the packages actually exist before enabling autodoc2
autodoc2_packages = []
@@ -168,37 +178,37 @@
if autodoc2_packages:
if "autodoc2" not in extensions:
extensions.append("autodoc2")

autodoc2_render_plugin = "myst" # Use MyST for rendering docstrings
autodoc2_output_dir = "apidocs" # Output directory for autodoc2 (relative to docs/)

# ==================== GOOD DEFAULTS FOR CLEANER DOCS ====================

# Hide implementation details - good defaults for cleaner docs
autodoc2_hidden_objects = [
"dunder", # Hide __methods__ like __init__, __str__, etc.
"private", # Hide _private methods and variables
"dunder", # Hide __methods__ like __init__, __str__, etc.
"private", # Hide _private methods and variables
"inherited", # Hide inherited methods to reduce clutter
]

# Enable module summaries for better organization
autodoc2_module_summary = True

# Sort by name for consistent organization
autodoc2_sort_names = True

# Enhanced docstring processing for better formatting
autodoc2_docstrings = "all" # Include all docstrings for comprehensive docs

# Include class inheritance information - useful for users
autodoc2_class_inheritance = True

# Handle class docstrings properly (merge __init__ with class doc)
autodoc2_class_docstring = "merge"

# Better type annotation handling
autodoc2_type_guard_imports = True

# Replace common type annotations for better readability
autodoc2_replace_annotations = [
("typing.Union", "Union"),
@@ -207,23 +217,25 @@
("typing.Dict", "Dict"),
("typing.Any", "Any"),
]

# Don't require __all__ to be defined - document all public members
autodoc2_module_all_regexes = [] # Empty list means don't require __all__

# Skip common test and internal modules - customize for your project
autodoc2_skip_module_regexes = [
r".*\.tests?.*", # Skip test modules
r".*\.test_.*", # Skip test files
r".*\._.*", # Skip private modules
r".*\.conftest", # Skip conftest files
r".*\.tests?.*", # Skip test modules
r".*\.test_.*", # Skip test files
r".*\._.*", # Skip private modules
r".*\.conftest", # Skip conftest files
]

# Load index template from external file for better maintainability
template_path = os.path.join(os.path.dirname(__file__), "_templates", "autodoc2_index.rst")
template_path = os.path.join(
os.path.dirname(__file__), "_templates", "autodoc2_index.rst"
)
with open(template_path) as f:
autodoc2_index_template = f.read()

# This is a workaround that uses the parser located in autodoc2_docstrings_parser.py to allow autodoc2 to
# render google style docstrings.
# Related Issue: https://github.com/sphinx-extensions2/sphinx-autodoc2/issues/33
Expand Down Expand Up @@ -272,10 +284,10 @@
},
}

# Add our static files directory
# Add our static files directory
# html_static_path = ["_static"]

html_extra_path = ["project.json", "versions1.json"]

# Note: JSON output configuration has been moved to the consolidated
# json_output_settings dictionary above for better organization and new features!
# Note: JSON output configuration has been moved to the consolidated
# json_output_settings dictionary above for better organization and new features!