feat: Update agent eval notebooks with fixes & ADK #1975

Merged
holtskinner merged 11 commits into GoogleCloudPlatform:main on Apr 16, 2025

@inardini (Contributor) commented Apr 15, 2025

Description

This PR fixes some bugs in the current agent eval notebooks (CrewAI and LangGraph) and introduces a new notebook to evaluate an ADK agent.

Thank you for opening a Pull Request!
Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Follow the CONTRIBUTING Guide.
  • You are listed as the author in your notebook or README file.
    • Your account is listed in CODEOWNERS for the file(s).
  • Make your Pull Request title follow the https://www.conventionalcommits.org/ specification.
  • Ensure the tests and linter pass (Run nox -s format from the repository root to format).
  • Appropriate docs were updated (if necessary)

Fixes #<issue_number_goes_here> 🦕

@inardini requested a review from a team as a code owner on April 15, 2025, 21:04
@gemini-code-assist (bot) left a comment

Hello @inardini, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

This pull request focuses on updating the evaluating_crewai_agent.ipynb notebook. The primary goal is to ensure the notebook functions correctly with the latest versions of crewai and related libraries, and to improve the BYOD evaluation process. The changes include updating package versions, adjusting code for compatibility, and enhancing the visualization of evaluation results.

Highlights

  • Dependency Updates: The pull request updates the crewai and crewai-tools packages to specific versions (0.95.0 and 0.25.8, respectively) and adjusts the installation commands.
  • Code Compatibility: The pull request modifies the code to ensure compatibility with the updated libraries, including changes to how trajectories are handled and displayed.
  • BYOD Evaluation Enhancement: The pull request enhances the Bring Your Own Dataset (BYOD) evaluation process by ensuring that predicted and reference trajectories are properly serialized and handled as JSON strings.
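
As a rough illustration of the dependency pinning described in the highlights, the sketch below renders the two versions quoted in the summary as pip requirement strings and reports any installed package that does not match. The `PINNED` mapping and both helper names are hypothetical and are not taken from the notebook.

```python
from importlib import metadata

# Versions quoted in the PR summary; the mapping itself is illustrative.
PINNED = {"crewai": "0.95.0", "crewai-tools": "0.25.8"}


def pip_requirements(pins):
    """Render pins as 'name==version' strings suitable for pip install."""
    return [f"{name}=={version}" for name, version in pins.items()]


def find_mismatches(pins, get_version=metadata.version):
    """Return {package: (expected, installed_or_None)} for pins that differ."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    print("pip install --upgrade " + " ".join(pip_requirements(PINNED)))
    print(find_mismatches(PINNED))
```

Running a check like this at the top of a notebook would surface version drift before any evaluation cell runs, which is the kind of breakage this PR addresses.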

Changelog

  • gemini/evaluation/evaluating_crewai_agent.ipynb
    • Updates crewai and crewai-tools package versions in the installation command.
    • Removes unnecessary package dependencies like cloudpickle, pydantic, and requests from the initial installation.
    • Reorders imports and adds import json for trajectory handling.
    • Removes unused imports related to crewai.flow.flow.
    • Adds JSON serialization for predicted trajectories before returning the final output.
    • Adds JSON deserialization for predicted trajectories before displaying function calls.
    • Updates the BYOD sample dataset to serialize predicted_trajectory and reference_trajectory as JSON strings.
    • Updates the title of the BYOD section to reflect that it is evaluating a CrewAI agent, not a LangGraph agent.
    • Removes the 'Define router using Flow' section.
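
The trajectory changes in the changelog can be sketched as follows: trajectories are stored as JSON strings in the BYOD dataset and decoded again before the function calls are displayed. This is a minimal sketch, not the notebook's code; the `tool_name`/`tool_input` keys, the sample tool names, and the `function_calls` helper are assumptions made for illustration.

```python
import json

# Hypothetical tool calls; the key names are an assumed trajectory schema.
predicted = [
    {"tool_name": "get_product_details", "tool_input": {"product_name": "smartphone"}},
    {"tool_name": "get_product_price", "tool_input": {"product_name": "smartphone"}},
]
reference = [
    {"tool_name": "get_product_price", "tool_input": {"product_name": "smartphone"}},
]

# Each trajectory is serialized to a JSON string before it enters the dataset.
byod_dataset = {
    "prompt": ["Get price for smartphone"],
    "predicted_trajectory": [json.dumps(predicted)],
    "reference_trajectory": [json.dumps(reference)],
}


def function_calls(serialized_trajectory):
    """Decode a JSON-string trajectory into (tool_name, tool_input) pairs."""
    return [
        (step["tool_name"], step["tool_input"])
        for step in json.loads(serialized_trajectory)
    ]


if __name__ == "__main__":
    for name, args in function_calls(byod_dataset["predicted_trajectory"][0]):
        print(name, args)
```

Storing strings keeps every cell in the dataset a scalar value, at the cost of an explicit `json.loads` wherever the steps need to be inspected.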

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.


A notebook's code,
Needs updates to stay afloat,
New versions arrive.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist (bot) left a comment

Code Review

The pull request updates the CrewAI agent eval notebook, focusing on dependency management, code clarity, and BYOD evaluation. The changes seem reasonable and improve the notebook's usability and maintainability. However, some areas could benefit from further refinement.

Summary of Findings

  • Dependency Management: The update streamlines dependency installation, but it's crucial to ensure that the specified versions are compatible and tested thoroughly to avoid potential runtime issues.
  • Code Clarity: The removal of unused imports and code blocks enhances readability. However, ensure that these removals don't inadvertently affect other parts of the notebook.
  • BYOD Evaluation: The BYOD evaluation section is enhanced with trajectory serialization, but the hardcoded data and string conversion could be more flexible and robust.
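
One possible way to make the string conversion mentioned in the BYOD finding less brittle is a small normalizer that accepts either a list of tool-call dictionaries or an already serialized JSON string and rejects anything else. This is a sketch under assumptions: `to_trajectory_json` is a hypothetical helper, not part of the notebook or of any evaluation SDK.

```python
import json


def to_trajectory_json(value):
    """Return a JSON string for a trajectory given as a list or a JSON string.

    Raises ValueError if the input is not a list of tool-call objects.
    json.JSONDecodeError is a ValueError subclass, so malformed strings
    are reported through the same exception type.
    """
    steps = json.loads(value) if isinstance(value, str) else value
    is_sequence = isinstance(steps, (list, tuple))
    if not is_sequence or not all(isinstance(step, dict) for step in steps):
        raise ValueError("trajectory must be a list of tool-call objects")
    # sort_keys gives a stable string regardless of key insertion order.
    return json.dumps(list(steps), sort_keys=True)


if __name__ == "__main__":
    steps = [{"tool_name": "lookup", "tool_input": {}}]
    encoded = to_trajectory_json(steps)
    # Passing the encoded string back in yields the same string.
    print(encoded == to_trajectory_json(encoded))
```

Because the function is idempotent on its own output, it can be applied to every row of a dataset without first checking whether a given row was already serialized.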

Merge Readiness

The pull request is almost ready for merging. Addressing the high and medium severity issues would significantly improve the code quality and reliability. I am unable to directly approve the pull request, and users should have others review and approve this code before merging. I recommend addressing the identified issues before merging.

@inardini changed the title from "fix: Update CrewAI agent eval notebook" to "fix: Update agent eval notebooks" on Apr 15, 2025
@inardini changed the title from "fix: Update agent eval notebooks" to "feat: Update agent eval notebooks with fixes & ADK" on Apr 16, 2025
@holtskinner merged commit dcac4de into GoogleCloudPlatform:main on Apr 16, 2025
4 checks passed

2 participants