feat: Update agent eval notebooks with fixes & ADK #1975

inardini · 2025-04-15T21:04:54Z

Description

This PR fixes some bugs in the current agent eval notebooks (CrewAI and LangGraph) and introduces a new notebook to evaluate an ADK agent.

Thank you for opening a Pull Request!
Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

- Follow the CONTRIBUTING Guide.
- You are listed as the author in your notebook or README file.
- Your account is listed in CODEOWNERS for the file(s).
- Make your Pull Request title in the https://www.conventionalcommits.org/ specification.
- Ensure the tests and linter pass (Run nox -s format from the repository root to format).
- Appropriate docs were updated (if necessary)

Fixes #<issue_number_goes_here> 🦕

gemini-code-assist

Hello @inardini, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

This pull request focuses on updating the evaluating_crewai_agent.ipynb notebook. The primary goal is to ensure the notebook functions correctly with the latest versions of crewai and related libraries, and to improve the BYOD evaluation process. The changes include updating package versions, adjusting code for compatibility, and enhancing the visualization of evaluation results.

Highlights

Dependency Updates: The pull request updates the crewai and crewai-tools packages to specific versions (0.95.0 and 0.25.8, respectively) and adjusts the installation commands.
Code Compatibility: The pull request modifies the code to ensure compatibility with the updated libraries, including changes to how trajectories are handled and displayed.
BYOD Evaluation Enhancement: The pull request enhances the Bring Your Own Dataset (BYOD) evaluation process by ensuring that predicted and reference trajectories are properly serialized and handled as JSON strings.
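To show what "serialized and handled as JSON strings" means for a BYOD dataset, here is a minimal sketch. It assumes each trajectory is a list of tool-call dicts with `tool_name` and `tool_input` keys (the shape assumed here for Vertex AI Gen AI evaluation trajectories). The tool names, prompts, and the `to_json_str` helper are hypothetical illustrations, not code taken from the notebook.

```python
import json

# Hypothetical trajectories; the "tool_name"/"tool_input" keys are an assumed shape.
predicted = [
    [{"tool_name": "get_product_details", "tool_input": {"product_name": "usb charger"}}],
    [{"tool_name": "get_product_price", "tool_input": {"product_name": "shoes"}}],
]
reference = [
    [{"tool_name": "get_product_details", "tool_input": {"product_name": "usb charger"}}],
    [
        {"tool_name": "get_product_details", "tool_input": {"product_name": "shoes"}},
        {"tool_name": "get_product_price", "tool_input": {"product_name": "shoes"}},
    ],
]


def to_json_str(trajectory):
    """Serialize a trajectory to a JSON string; leave values that are already strings untouched."""
    if isinstance(trajectory, str):
        return trajectory
    return json.dumps(trajectory)


# Column-oriented dataset; this dict can be passed to pandas.DataFrame(...) unchanged.
byod_eval_sample_dataset = {
    "prompt": [
        "Get product details for usb charger",
        "Get product details and price for shoes",
    ],
    "predicted_trajectory": [to_json_str(t) for t in predicted],
    "reference_trajectory": [to_json_str(t) for t in reference],
}
```

Making the helper a no-op for strings means a column can be re-serialized safely without double-encoding, which addresses part of the robustness concern raised in the review.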

Changelog

gemini/evaluation/evaluating_crewai_agent.ipynb
- Updates crewai and crewai-tools package versions in the installation command.
- Removes unnecessary package dependencies like cloudpickle, pydantic, and requests from the initial installation.
- Reorders imports and adds import json for trajectory handling.
- Removes unused imports related to crewai.flow.flow.
- Adds JSON serialization for predicted trajectories before returning the final output.
- Adds JSON deserialization for predicted trajectories before displaying function calls.
- Updates the BYOD sample dataset to serialize predicted_trajectory and reference_trajectory as JSON strings.
- Updates the title of the BYOD section to reflect that it is evaluating a CrewAI agent, not a LangGraph agent.
- Removes the 'Define router using Flow' section.
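The serialize-then-deserialize round trip described in the changelog (JSON-encode the predicted trajectory when returning the final output, decode it again before displaying function calls) can be sketched as follows. Both function names and the `(tool_name, tool_input)` pair format are hypothetical; the notebook's actual helpers may differ.

```python
import json


def parse_agent_output(final_text, tool_calls):
    """Hypothetical parser: bundle the agent's response with a JSON-encoded trajectory.

    `tool_calls` is assumed to be a list of (tool_name, tool_input) pairs collected
    while the agent runs.
    """
    trajectory = [{"tool_name": name, "tool_input": args} for name, args in tool_calls]
    return {"response": final_text, "predicted_trajectory": json.dumps(trajectory)}


def format_function_calls(parsed_output):
    """Decode the JSON trajectory and render one line per function call."""
    trajectory = json.loads(parsed_output["predicted_trajectory"])
    return [f"{call['tool_name']}({call['tool_input']})" for call in trajectory]


output = parse_agent_output(
    "The price is $10.", [("get_product_price", {"product_name": "shoes"})]
)
for line in format_function_calls(output):
    print(line)  # prints: get_product_price({'product_name': 'shoes'})
```

Keeping the trajectory as a string in the parsed output matches the BYOD column format, so the same value can go straight into an evaluation dataset.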

Using Gemini Code Assist

The full guide for Gemini Code Assist is on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by posting an issue comment (i.e., a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in issue comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions are available in the Gemini Code Assist documentation.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

A notebook's code,
Needs updates to stay afloat,
New versions arrive.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, and Terms of Service, and learn how to configure Gemini Code Assist in GitHub. Gemini can make mistakes, so double-check it and use code with caution.

gemini-code-assist

Code Review

The pull request updates the CrewAI agent eval notebook, focusing on dependency management, code clarity, and BYOD evaluation. The changes seem reasonable and improve the notebook's usability and maintainability. However, some areas could benefit from further refinement.

Summary of Findings

Dependency Management: The update streamlines dependency installation, but it's crucial to ensure that the specified versions are compatible and tested thoroughly to avoid potential runtime issues.
Code Clarity: The removal of unused imports and code blocks enhances readability. However, ensure that these removals don't inadvertently affect other parts of the notebook.
BYOD Evaluation: The BYOD evaluation section is enhanced with trajectory serialization, but the hardcoded data and string conversion could be more flexible and robust.
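One lightweight way to act on the dependency finding is to check, at the top of the notebook, that the installed packages match the pins quoted in this PR's summary (crewai 0.95.0, crewai-tools 0.25.8). This is a sketch using the standard-library `importlib.metadata`; the `find_version_mismatches` helper is hypothetical and not part of the notebook.

```python
from importlib import metadata

# Pins as stated in this PR's summary.
PINNED = {"crewai": "0.95.0", "crewai-tools": "0.25.8"}


def find_version_mismatches(pins, get_version=metadata.version):
    """Return {package: installed_version_or_None} for each package that differs from its pin."""
    mismatches = {}
    for package, expected in pins.items():
        try:
            installed = get_version(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[package] = installed
    return mismatches


problems = find_version_mismatches(PINNED)
print(problems or "All pinned packages match.")
```

The `get_version` parameter exists only so the check can be exercised without the real packages installed.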

Merge Readiness

The pull request is almost ready for merging. Addressing the high and medium severity issues would significantly improve the code quality and reliability. I am unable to directly approve the pull request, and users should have others review and approve this code before merging. I recommend addressing the identified issues before merging.

gemini/evaluation/evaluating_crewai_agent.ipynb

gemini/evaluation/evaluating_adk_agent.ipynb

inardini added 3 commits April 15, 2025 20:58

- fix issues (290cb65)
- linter passed (5183afd)
- linter passed (da9bb83)

inardini requested a review from a team as a code owner April 15, 2025 21:04

gemini-code-assist bot reviewed Apr 15, 2025

gemini-code-assist bot suggested changes Apr 15, 2025

Three review threads on gemini/evaluation/evaluating_crewai_agent.ipynb were resolved.

inardini changed the title from "fix: Update CrewAI agent eval notebook" to "fix: Update agent eval notebooks" on Apr 15, 2025

inardini added 5 commits April 15, 2025 21:14

- fix issues (3c7fcd3)
- linter passed (f7206e4)
- add output uri for UI (1505581)
- adding adk agent eval (d2c735f)
- linter passed (2c18d71)

inardini changed the title from "fix: Update agent eval notebooks" to "feat: Update agent eval notebooks with fixes & ADK" on Apr 16, 2025

holtskinner requested changes Apr 16, 2025

gemini/evaluation/evaluating_adk_agent.ipynb (outdated, resolved)

- formatting (9b11d77)

holtskinner assigned inardini Apr 16, 2025

inardini and others added 2 commits April 16, 2025 14:49

- holt review (0e671d0)
- Formatting (181c316)

holtskinner approved these changes Apr 16, 2025

holtskinner merged commit dcac4de into GoogleCloudPlatform:main Apr 16, 2025
4 checks passed
