feat: Add Gecko image and video evaluation notebooks #1956


Open: wants to merge 5 commits into main

Conversation

gregbreard (Member)

Description

Contributing notebooks that demonstrate text-to-image and text-to-video evaluations with Gecko using the Vertex Evaluation SDK.

Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Follow the CONTRIBUTING Guide.
  • You are listed as the author in your notebook or README file.
    • Your account is listed in CODEOWNERS for the file(s).
  • Make sure your Pull Request title follows the Conventional Commits specification (https://www.conventionalcommits.org/).
  • Ensure the tests and linter pass (Run nox -s format from the repository root to format).
  • Appropriate docs were updated (if necessary)

No issue fix for this PR.

@gregbreard gregbreard requested a review from a team as a code owner April 9, 2025 20:28
@gemini-code-assist gemini-code-assist bot left a comment

Hello @gregbreard, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

This pull request introduces two new Colab notebooks for evaluating image and video generation using the Gecko evaluation method within the Vertex AI environment. The notebooks provide a step-by-step guide on setting up the environment, defining helper functions and prompt templates, preparing the dataset, and running the evaluation. The core of the evaluation revolves around generating rubrics (QA pairs) based on user prompts and then validating the generated content against these rubrics to compute an overall score.
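
The scoring flow described above (validate the generated content against rubric QA pairs, then aggregate into an overall score) can be sketched roughly as follows. This is an illustrative assumption, not code from the notebooks; the RubricResult fields and the pass-rate aggregation are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class RubricResult:
    """One validated rubric question (hypothetical shape, not the notebooks' schema)."""
    question: str
    gt_answer: str      # expected (ground-truth) answer from rubric generation
    model_answer: str   # answer produced when validating the generated content


def overall_score(results: list[RubricResult]) -> float:
    """Return the fraction of rubric questions the generated content satisfied."""
    if not results:
        return 0.0
    passed = sum(
        1
        for r in results
        if r.model_answer.strip().lower() == r.gt_answer.strip().lower()
    )
    return passed / len(results)


results = [
    RubricResult("Is there a cat?", "yes", "yes"),
    RubricResult("Is the cat wearing a hat?", "yes", "no"),
]
print(overall_score(results))  # 0.5
```

A simple pass rate like this is one common way to collapse per-rubric verdicts into a single number; the notebooks may weight or group questions differently.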

Highlights

  • Image Evaluation Notebook: The evaluate_images_with_gecko.ipynb notebook demonstrates how to evaluate images generated from text prompts using the Vertex AI evaluation service and the Gecko metric. It includes custom parsing logic to handle the sophisticated outputs of Gecko, along with helper functions for computing scores and presenting the output in a human-readable format.
  • Video Evaluation Notebook: The evaluate_videos_with_gecko.ipynb notebook provides a similar framework for evaluating video generation. It adapts the Gecko evaluation method to the video modality, including specific prompt templates and dataset examples tailored for video content. The notebook also includes helper functions and custom parsing logic.
  • Gecko Evaluation Method: Both notebooks leverage the Gecko evaluation method, which involves generating question-answer pairs (rubrics) based on the prompt and then validating the generated image or video against these rubrics. This approach allows for a more fine-grained and customizable metric compared to standard rubric-based evaluations.
  • Custom Parsing Logic: The pull request introduces custom parsing functions (parse_json_to_qa_records and parse_rubric_results) to handle the specific output format of the Gecko evaluation service. These functions convert the text output of rubric generation into structured QARecord objects and extract answers from the validation step.
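
As a rough illustration of the parsing step named above, a minimal parse_json_to_qa_records might look like the sketch below. The QARecord field names (question, question_type, gt_answer), the JSON schema, and the empty-list fallback on malformed output are assumptions for illustration, not the notebooks' exact definitions:

```python
import json
from dataclasses import dataclass


@dataclass
class QARecord:
    """One rubric question-answer pair (assumed fields, not the notebooks' schema)."""
    question: str
    question_type: str
    gt_answer: str


def parse_json_to_qa_records(raw_text: str) -> list[QARecord]:
    """Convert the rubric generator's JSON text output into QARecord objects."""
    try:
        payload = json.loads(raw_text)
    except json.JSONDecodeError:
        return []  # fall back to no rubrics if the model emitted malformed JSON
    return [
        QARecord(
            question=item.get("question", ""),
            question_type=item.get("question_type", ""),
            gt_answer=item.get("gt_answer", ""),
        )
        for item in payload
    ]


records = parse_json_to_qa_records(
    '[{"question": "Is there a dog?", "question_type": "object", "gt_answer": "yes"}]'
)
print(records[0].question)  # Is there a dog?
```

Guarding the json.loads call matters here because the rubric text comes from a model and is not guaranteed to be valid JSON.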

Changelog

  • gemini/evaluation/evaluate_images_with_gecko.ipynb
    • Adds a new Colab notebook demonstrating text-to-image evaluations with Gecko using the Vertex Evaluation SDK.
    • Includes sections for setting up the environment, defining helper functions, prompt templates, and metrics.
    • Provides code for preparing the dataset, generating rubrics, and evaluating images with rubrics.
    • Implements custom parsing logic to handle Gecko outputs.
    • Includes helper functions for computing scores and pretty printing results.
  • gemini/evaluation/evaluate_videos_with_gecko.ipynb
    • Adds a new Colab notebook demonstrating text-to-video evaluations with Gecko using the Vertex Evaluation SDK.
    • Includes sections for setting up the environment, defining helper functions, prompt templates, and metrics.
    • Provides code for preparing the dataset, generating rubrics, and evaluating videos with rubrics.
    • Implements custom parsing logic to handle Gecko outputs.
    • Includes helper functions for computing scores and pretty printing results.
    • Uses multiple-choice questions for video evaluation.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e., a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature              | Command             | Description
Code Review          | /gemini review      | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary     | Provides a summary of the current pull request in its current state.
Comment              | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments.
Help                 | /gemini help        | Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder at the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.


A model's creation,
Evaluated with Gecko's precision,
Rubrics arise,
'Neath judging eyes,
Quality's truth, a clear decision.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

The pull request introduces two new notebooks for evaluating image and video generation using the Gecko evaluation method. The notebooks are well structured and explain the evaluation process clearly. However, several inconsistencies between the two notebooks should be addressed to keep them consistent and maintainable.

Summary of Findings

  • Copyright Year: The copyright year in the notebooks is set to 2025. Please confirm that this is the correct year.
  • Missing Question Type in QARecord: The QARecord dataclass in evaluate_videos_with_gecko.ipynb is missing the question_type attribute, which is present in evaluate_images_with_gecko.ipynb. This inconsistency should be addressed.
  • Inconsistent Error Handling in parse_json_to_qa_records: The parse_json_to_qa_records function in evaluate_videos_with_gecko.ipynb returns empty lists for qa_records in case of errors, while evaluate_images_with_gecko.ipynb returns the json_response itself. This inconsistency should be addressed.
  • Inconsistent Keys in _qa_records_to_html_table: The _qa_records_to_html_table function in evaluate_videos_with_gecko.ipynb uses different keys (question, answer_choices, gt_answer) compared to evaluate_images_with_gecko.ipynb (question, gt_answer). This inconsistency should be addressed.
  • Inconsistent Prompt Template Description: The markdown description for prompt templates in evaluate_videos_with_gecko.ipynb mentions image, but it should refer to video.
  • Typo in RUBRIC_VALIDATOR_PROMPT: There is a typo in RUBRIC_VALIDATOR_PROMPT in evaluate_videos_with_gecko.ipynb where 'Vidoe' should be 'Video'.
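
One way the QARecord inconsistency flagged above could be resolved is a single dataclass shared by both notebooks, with the image-only and video-only fields made optional. This is a hypothetical sketch under assumed field names, not a proposed patch:

```python
from dataclasses import dataclass, field


@dataclass
class QARecord:
    """Rubric QA pair usable by both notebooks (field names are assumptions)."""
    question: str
    gt_answer: str
    # Present in the image notebook; defaulted so video records stay valid.
    question_type: str = ""
    # Used by the video notebook's multiple-choice questions.
    answer_choices: list[str] = field(default_factory=list)


video_record = QARecord(
    question="Does the ball bounce?",
    gt_answer="(a) yes",
    answer_choices=["(a) yes", "(b) no"],
)
image_record = QARecord(
    question="Is there a cat?",
    gt_answer="yes",
    question_type="object",
)
```

Defaulting the extra fields keeps one parsing path (and one error-handling convention) for both modalities instead of two diverging copies.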

Merge Readiness

The pull request is not ready to merge until the identified inconsistencies are addressed. The issues related to the copyright year, QARecord attributes, error handling, table keys, and prompt template descriptions should be resolved to ensure the quality and consistency of the code. I am unable to approve this pull request and recommend that others review and approve it before merging.

@gregbreard gregbreard self-assigned this Apr 9, 2025
@inardini inardini self-requested a review April 16, 2025 17:20