feat: Add Gecko image and video evaluation notebooks #1956
base: main
Conversation
Hello @gregbreard, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
Summary of Changes
This pull request introduces two new Colab notebooks for evaluating image and video generation using the Gecko evaluation method within the Vertex AI environment. The notebooks provide a step-by-step guide on setting up the environment, defining helper functions and prompt templates, preparing the dataset, and running the evaluation. The core of the evaluation revolves around generating rubrics (QA pairs) based on user prompts and then validating the generated content against these rubrics to compute an overall score.
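In rough terms, the scoring step described above amounts to averaging pass/fail rubric checks. A minimal sketch (the function name and signature are illustrative, not the notebooks' actual API):

```python
def gecko_score(validation_results: list[bool]) -> float:
    """Overall score: the fraction of rubric (QA-pair) checks that the
    generated image or video passed during validation."""
    if not validation_results:
        return 0.0
    return sum(validation_results) / len(validation_results)


print(gecko_score([True, True, False, True]))  # 3 of 4 checks passed -> 0.75
```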
Highlights
- Image Evaluation Notebook: The `evaluate_images_with_gecko.ipynb` notebook demonstrates how to evaluate images generated from text prompts using the Vertex AI evaluation service and the Gecko metric. It includes custom parsing logic to handle the sophisticated outputs of Gecko, along with helper functions for computing scores and presenting the output in a human-readable format.
- Video Evaluation Notebook: The `evaluate_videos_with_gecko.ipynb` notebook provides a similar framework for evaluating video generation. It adapts the Gecko evaluation method to the video modality, including specific prompt templates and dataset examples tailored for video content. The notebook also includes helper functions and custom parsing logic.
- Gecko Evaluation Method: Both notebooks leverage the Gecko evaluation method, which involves generating question-answer pairs (rubrics) based on the prompt and then validating the generated image or video against these rubrics. This approach allows for a more fine-grained and customizable metric compared to standard rubric-based evaluations.
- Custom Parsing Logic: The pull request introduces custom parsing functions (`parse_json_to_qa_records` and `parse_rubric_results`) to handle the specific output format of the Gecko evaluation service. These functions convert the text output of rubric generation into structured `QARecord` objects and extract answers from the validation step.
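As a rough illustration of the kind of parsing involved (the JSON shape, field names, and error handling here are assumptions for the sketch, not the notebooks' exact implementation):

```python
import json
from dataclasses import dataclass


@dataclass
class QARecord:
    """A single rubric entry: a question about the prompt and its ground-truth answer."""
    question: str
    gt_answer: str


def parse_json_to_qa_records(text: str) -> list[QARecord]:
    """Convert the rubric-generation model's text output (assumed here to be a
    JSON array of {"question": ..., "answer": ...} objects) into QARecord objects."""
    try:
        items = json.loads(text)
    except json.JSONDecodeError:
        return []  # malformed model output: return no records
    return [QARecord(question=i["question"], gt_answer=i["answer"]) for i in items]
```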
Changelog
- `gemini/evaluation/evaluate_images_with_gecko.ipynb`
  - Adds a new Colab notebook demonstrating text-to-image evaluations with Gecko using the Vertex Evaluation SDK.
  - Includes sections for setting up the environment, defining helper functions, prompt templates, and metrics.
  - Provides code for preparing the dataset, generating rubrics, and evaluating images with rubrics.
  - Implements custom parsing logic to handle Gecko outputs.
  - Includes helper functions for computing scores and pretty printing results.
- `gemini/evaluation/evaluate_videos_with_gecko.ipynb`
  - Adds a new Colab notebook demonstrating text-to-video evaluations with Gecko using the Vertex Evaluation SDK.
  - Includes sections for setting up the environment, defining helper functions, prompt templates, and metrics.
  - Provides code for preparing the dataset, generating rubrics, and evaluating videos with rubrics.
  - Implements custom parsing logic to handle Gecko outputs.
  - Includes helper functions for computing scores and pretty printing results.
  - Uses multiple-choice questions for video evaluation.
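The multiple-choice variant in the video notebook can be sketched as a simple answer check (illustrative only; the actual notebook delegates the choice to a validator model and then parses its response):

```python
def validate_mc_answer(model_answer: str, answer_choices: list[str], gt_answer: str) -> bool:
    """Return True if the validator model's chosen option matches the ground-truth answer.

    An answer outside the allowed choices counts as a failure.
    """
    answer = model_answer.strip().lower()
    if answer not in {c.strip().lower() for c in answer_choices}:
        return False  # validator picked something outside the allowed options
    return answer == gt_answer.strip().lower()
```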
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e., a comment on the pull request page) using either `/gemini <command>` or `@gemini-code-assist <command>`. Below is a summary of the supported commands.
| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a `.gemini/` folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
A model's creation,
Evaluated with Gecko's precision,
Rubrics arise,
'Neath judging eyes,
Quality's truth, a clear decision.
Footnotes
¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check it and use code with caution.
Code Review
The pull request introduces two new notebooks for evaluating image and video generation using the Gecko evaluation method. The notebooks are well-structured and provide a clear explanation of the evaluation process. However, there are some inconsistencies between the two notebooks that should be addressed for maintainability.
Summary of Findings
- Copyright Year: The copyright year in the notebooks is set to 2025. Please confirm that this is the correct year.
- Missing Question Type in QARecord: The `QARecord` dataclass in `evaluate_videos_with_gecko.ipynb` is missing the `question_type` attribute, which is present in `evaluate_images_with_gecko.ipynb`. This inconsistency should be addressed.
- Inconsistent Error Handling in parse_json_to_qa_records: The `parse_json_to_qa_records` function in `evaluate_videos_with_gecko.ipynb` returns empty lists for `qa_records` in case of errors, while `evaluate_images_with_gecko.ipynb` returns the `json_response` itself. This inconsistency should be addressed.
- Inconsistent Keys in _qa_records_to_html_table: The `_qa_records_to_html_table` function in `evaluate_videos_with_gecko.ipynb` uses different keys (`question`, `answer_choices`, `gt_answer`) compared to `evaluate_images_with_gecko.ipynb` (`question`, `gt_answer`). This inconsistency should be addressed.
- Inconsistent Prompt Template Description: The markdown description for the prompt templates in `evaluate_videos_with_gecko.ipynb` mentions image, but it should refer to video.
- Typo in RUBRIC_VALIDATOR_PROMPT: There is a typo in `RUBRIC_VALIDATOR_PROMPT` in `evaluate_videos_with_gecko.ipynb` where 'Vidoe' should be 'Video'.
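One way to resolve the `QARecord` and table-key inconsistencies above would be a single shared dataclass with the video-only fields made optional. A sketch based on the field names mentioned in this review (not code from the PR itself):

```python
from dataclasses import dataclass, field


@dataclass
class QARecord:
    """Shared rubric record usable by both the image and video notebooks."""
    question: str
    gt_answer: str
    question_type: str = ""  # present in the image notebook; added for video too
    # Only populated for the video notebook's multiple-choice questions.
    answer_choices: list[str] = field(default_factory=list)
```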
Merge Readiness
The pull request is not ready to be merged until the identified inconsistencies are addressed. The issues related to the copyright year, `QARecord` attributes, error handling, table keys, and prompt template descriptions should be resolved to ensure the quality and consistency of the code. I am unable to approve this pull request, and recommend that others review and approve this code before merging.
Description
Contributing notebooks that demonstrate text-to-image and text-to-video evaluations with Gecko using the Vertex Evaluation SDK.
Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
- Follow the CONTRIBUTING Guide.
- Your account is listed in CODEOWNERS for the file(s).
- Run `nox -s format` from the repository root to format.

No issue fix for this PR.