feat: Add Gecko image and video evaluation notebooks #1956


Open: wants to merge 5 commits into main

Conversation

gregbreard (Member)

Description

Contributing notebooks that demonstrate text-to-image and text-to-video evaluations with Gecko using the Vertex Evaluation SDK.

Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Follow the CONTRIBUTING Guide.
  • You are listed as the author in your notebook or README file.
    • Your account is listed in CODEOWNERS for the file(s).
  • Make sure your Pull Request title follows the Conventional Commits specification (https://www.conventionalcommits.org/).
  • Ensure the tests and linter pass (Run nox -s format from the repository root to format).
  • Appropriate docs were updated (if necessary)

No issue fix for this PR.

@gregbreard gregbreard requested a review from a team as a code owner April 9, 2025 20:28
@gemini-code-assist gemini-code-assist bot left a comment

Hello @gregbreard, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

This pull request introduces two new Colab notebooks for evaluating image and video generation using the Gecko evaluation method within the Vertex AI environment. The notebooks provide a step-by-step guide on setting up the environment, defining helper functions and prompt templates, preparing the dataset, and running the evaluation. The core of the evaluation revolves around generating rubrics (QA pairs) based on user prompts and then validating the generated content against these rubrics to compute an overall score.
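
The scoring flow described above (validate the generated content against rubric QA pairs, then aggregate into an overall score) can be sketched roughly as follows. This is an illustrative assumption, not code from the notebooks; the RubricResult fields and the pass-rate aggregation are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class RubricResult:
    """One validated rubric question (hypothetical shape, not the notebooks' schema)."""
    question: str
    gt_answer: str      # expected (ground-truth) answer from rubric generation
    model_answer: str   # answer produced when validating the generated content


def overall_score(results: list[RubricResult]) -> float:
    """Return the fraction of rubric questions the generated content satisfied."""
    if not results:
        return 0.0
    passed = sum(
        1
        for r in results
        if r.model_answer.strip().lower() == r.gt_answer.strip().lower()
    )
    return passed / len(results)


results = [
    RubricResult("Is there a cat?", "yes", "yes"),
    RubricResult("Is the cat wearing a hat?", "yes", "no"),
]
print(overall_score(results))  # 0.5
```

A simple pass rate like this is one common way to collapse per-rubric verdicts into a single number; the notebooks may weight or group questions differently.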

Highlights

  • Image Evaluation Notebook: The evaluate_images_with_gecko.ipynb notebook demonstrates how to evaluate images generated from text prompts using the Vertex AI evaluation service and the Gecko metric. It includes custom parsing logic to handle the sophisticated outputs of Gecko, along with helper functions for computing scores and presenting the output in a human-readable format.
  • Video Evaluation Notebook: The evaluate_videos_with_gecko.ipynb notebook provides a similar framework for evaluating video generation. It adapts the Gecko evaluation method to the video modality, including specific prompt templates and dataset examples tailored for video content. The notebook also includes helper functions and custom parsing logic.
  • Gecko Evaluation Method: Both notebooks leverage the Gecko evaluation method, which involves generating question-answer pairs (rubrics) based on the prompt and then validating the generated image or video against these rubrics. This approach allows for a more fine-grained and customizable metric compared to standard rubric-based evaluations.
  • Custom Parsing Logic: The pull request introduces custom parsing functions (parse_json_to_qa_records and parse_rubric_results) to handle the specific output format of the Gecko evaluation service. These functions convert the text output of rubric generation into structured QARecord objects and extract answers from the validation step.
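
As a rough illustration of the parsing step named above, a minimal parse_json_to_qa_records might look like the sketch below. The QARecord field names (question, question_type, gt_answer), the JSON schema, and the empty-list fallback on malformed output are assumptions for illustration, not the notebooks' exact definitions:

```python
import json
from dataclasses import dataclass


@dataclass
class QARecord:
    """One rubric question-answer pair (assumed fields, not the notebooks' schema)."""
    question: str
    question_type: str
    gt_answer: str


def parse_json_to_qa_records(raw_text: str) -> list[QARecord]:
    """Convert the rubric generator's JSON text output into QARecord objects."""
    try:
        payload = json.loads(raw_text)
    except json.JSONDecodeError:
        return []  # fall back to no rubrics if the model emitted malformed JSON
    return [
        QARecord(
            question=item.get("question", ""),
            question_type=item.get("question_type", ""),
            gt_answer=item.get("gt_answer", ""),
        )
        for item in payload
    ]


records = parse_json_to_qa_records(
    '[{"question": "Is there a dog?", "question_type": "object", "gt_answer": "yes"}]'
)
print(records[0].question)  # Is there a dog?
```

Guarding the json.loads call matters here because the rubric text comes from a model and is not guaranteed to be valid JSON.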

Changelog

  • gemini/evaluation/evaluate_images_with_gecko.ipynb
    • Adds a new Colab notebook demonstrating text-to-image evaluations with Gecko using the Vertex Evaluation SDK.
    • Includes sections for setting up the environment, defining helper functions, prompt templates, and metrics.
    • Provides code for preparing the dataset, generating rubrics, and evaluating images with rubrics.
    • Implements custom parsing logic to handle Gecko outputs.
    • Includes helper functions for computing scores and pretty printing results.
  • gemini/evaluation/evaluate_videos_with_gecko.ipynb
    • Adds a new Colab notebook demonstrating text-to-video evaluations with Gecko using the Vertex Evaluation SDK.
    • Includes sections for setting up the environment, defining helper functions, prompt templates, and metrics.
    • Provides code for preparing the dataset, generating rubrics, and evaluating videos with rubrics.
    • Implements custom parsing logic to handle Gecko outputs.
    • Includes helper functions for computing scores and pretty printing results.
    • Uses multiple-choice questions for video evaluation.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e., a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature              | Command             | Description
Code Review          | /gemini review      | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary     | Provides a summary of the current pull request in its current state.
Comment              | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments.
Help                 | /gemini help        | Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder at the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.


A model's creation,
Evaluated with Gecko's precision,
Rubrics arise,
'Neath judging eyes,
Quality's truth, a clear decision.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

The pull request introduces two new notebooks for evaluating image and video generation using the Gecko evaluation method. The notebooks are well structured and explain the evaluation process clearly. However, several inconsistencies between the two notebooks should be addressed to keep them consistent and maintainable.

Summary of Findings

  • Copyright Year: The copyright year in the notebooks is set to 2025. Please confirm that this is the correct year.
  • Missing Question Type in QARecord: The QARecord dataclass in evaluate_videos_with_gecko.ipynb is missing the question_type attribute, which is present in evaluate_images_with_gecko.ipynb. This inconsistency should be addressed.
  • Inconsistent Error Handling in parse_json_to_qa_records: The parse_json_to_qa_records function in evaluate_videos_with_gecko.ipynb returns empty lists for qa_records in case of errors, while evaluate_images_with_gecko.ipynb returns the json_response itself. This inconsistency should be addressed.
  • Inconsistent Keys in _qa_records_to_html_table: The _qa_records_to_html_table function in evaluate_videos_with_gecko.ipynb uses different keys (question, answer_choices, gt_answer) compared to evaluate_images_with_gecko.ipynb (question, gt_answer). This inconsistency should be addressed.
  • Inconsistent Prompt Template Description: The markdown description for prompt templates in evaluate_videos_with_gecko.ipynb mentions image, but it should refer to video.
  • Typo in RUBRIC_VALIDATOR_PROMPT: There is a typo in RUBRIC_VALIDATOR_PROMPT in evaluate_videos_with_gecko.ipynb where 'Vidoe' should be 'Video'.
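
One way the QARecord inconsistency flagged above could be resolved is a single dataclass shared by both notebooks, with the image-only and video-only fields made optional. This is a hypothetical sketch under assumed field names, not a proposed patch:

```python
from dataclasses import dataclass, field


@dataclass
class QARecord:
    """Rubric QA pair usable by both notebooks (field names are assumptions)."""
    question: str
    gt_answer: str
    # Present in the image notebook; defaulted so video records stay valid.
    question_type: str = ""
    # Used by the video notebook's multiple-choice questions.
    answer_choices: list[str] = field(default_factory=list)


video_record = QARecord(
    question="Does the ball bounce?",
    gt_answer="(a) yes",
    answer_choices=["(a) yes", "(b) no"],
)
image_record = QARecord(
    question="Is there a cat?",
    gt_answer="yes",
    question_type="object",
)
```

Defaulting the extra fields keeps one parsing path (and one error-handling convention) for both modalities instead of two diverging copies.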

Merge Readiness

The pull request is not ready to merge until the identified inconsistencies are addressed. The issues related to the copyright year, QARecord attributes, error handling, table keys, and prompt template descriptions should be resolved to ensure the quality and consistency of the code. I am unable to approve this pull request and recommend that others review and approve it before merging.

@gregbreard gregbreard self-assigned this Apr 9, 2025
@inardini inardini self-requested a review April 16, 2025 17:20