
feat: add scene transition detection notebook using Gemini #1891


Open · wants to merge 16 commits into base: main

Conversation

@VJlaxmi (Contributor) commented Mar 28, 2025

Summary

This PR adds a new notebook demonstrating scene transition detection using Gemini 1.5 Pro on Vertex AI. The model is prompted with schema-guided structured output and one-shot/few-shot examples, and takes video and subtitle (VTT) input to identify meaningful scene changes in narrative, characters, and location.


What’s Included

  • Notebook: vision/use-cases/identify-scene-transition-using-gemini/scenetransition.ipynb
  • Structured schema for response formatting
  • One-shot prompting with Gemini and Part objects

Checklist

  • Signed CLA
  • Notebook follows official template
  • Passed nox -s format
  • Output cells are clean and minimal
  • All imports are used and no unused code is present

@VJlaxmi requested a review from a team as a code owner on March 28, 2025 at 07:13

@gemini-code-assist bot left a comment


Hello @VJlaxmi, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

This pull request introduces a new notebook, vision/use-cases/identify-scene-transition-using-gemini/scenetransition.ipynb, that demonstrates how to use Gemini 1.5 Pro and Vertex AI for scene transition detection in videos. The notebook uses a structured schema for response formatting and employs one-shot prompting techniques with Gemini and Part objects. It analyzes video and subtitle (VTT) input to identify scene changes based on narrative, characters, and location.

Highlights

  • New Notebook: Adds a new notebook vision/use-cases/identify-scene-transition-using-gemini/scenetransition.ipynb for scene transition detection using Gemini 1.5 Pro.
  • Structured Schema: Implements a structured schema for formatting the response from the Gemini model, ensuring consistent and parsable output (a minimal sketch of such a schema follows this list).
  • One-Shot Prompting: Utilizes one-shot prompting with Gemini and Part objects to guide the model in identifying scene transitions.
  • Multimodal Input: Leverages both video and subtitle (VTT) files as input to provide a comprehensive understanding of the video content.
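
For readers new to schema-guided generation, here is a minimal sketch of how a response schema for scene transitions could be wired into the model configuration. The field names (start_time, end_time, description), schema shape, and generation parameters are assumptions for illustration, not the notebook's exact definitions.

    import vertexai
    from vertexai.generative_models import GenerationConfig, GenerativeModel

    vertexai.init(project="your-project-id", location="us-central1")  # placeholders

    # Hypothetical OpenAPI-style schema: one object per detected scene transition.
    scene_transition_schema = {
        "type": "ARRAY",
        "items": {
            "type": "OBJECT",
            "properties": {
                "start_time": {"type": "STRING"},
                "end_time": {"type": "STRING"},
                "description": {"type": "STRING"},
            },
            "required": ["start_time", "end_time", "description"],
        },
    }

    model = GenerativeModel(
        "gemini-1.5-pro",
        generation_config=GenerationConfig(
            temperature=0.0,
            max_output_tokens=8192,
            response_mime_type="application/json",
            response_schema=scene_transition_schema,
        ),
    )

With response_mime_type set to application/json, the model's reply can be parsed with json.loads rather than eval, which also relates to a review comment below.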

Changelog

  • vision/use-cases/identify-scene-transition-using-gemini/scenetransition.ipynb
    • Initial commit of the scenetransition.ipynb notebook.
    • Includes markdown cells for introduction, overview, and setup instructions.
    • Adds code cells for installing required packages, authenticating the environment, and setting up Google Cloud project information.
    • Imports necessary libraries such as google.generativeai, vertexai, and pandas.
    • Defines the output schema for scene transition extraction, specifying the structure of the model's response.
    • Configures the Gemini 1.5 Pro model with specified parameters like temperature, max output tokens, and response schema.
    • Sets up file paths for input video and VTT files, including one-shot examples.
    • Defines the base instructions for the Gemini model, outlining the criteria for identifying scene transitions.
    • Constructs the prompt using Part objects, combining video, VTT, and instructions.
    • Generates content using the Gemini model and processes the output to extract scene transition information.
    • Includes code to handle multiple candidates from the model and consolidate the results.
    • Implements a function candidate_count_handler_with_tolerance to consolidate scene transition timestamps from multiple candidates, handling slight variations with a tolerance of 1 second (a rough sketch of this consolidation approach follows the changelog).
    • Prints the final model output, showing the identified scene transitions.
  • vision/use-cases/identify-scene-transition-using-gemini/video3.vtt
    • Initial commit of the video3.vtt file.
    • Contains WebVTT formatted subtitles for a sample video used in the one-shot example.
    • Subtitles include dialogue and audio cues to aid in scene transition detection.
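
To make the consolidation step above concrete, here is a rough sketch of merging near-duplicate scene transition timestamps across candidates with a one-second tolerance. It is not the notebook's candidate_count_handler_with_tolerance implementation; the HH:MM:SS timecode format and dictionary keys are assumptions.

    from datetime import datetime

    def _to_seconds(timecode: str) -> int:
        """Convert an HH:MM:SS timecode string to seconds."""
        t = datetime.strptime(timecode, "%H:%M:%S")
        return t.hour * 3600 + t.minute * 60 + t.second

    def consolidate_transitions(candidates: list[list[dict]], tolerance: int = 1) -> list[dict]:
        """Merge scene transitions from multiple model candidates.

        Two transitions count as the same scene change when their start times
        differ by at most `tolerance` seconds; the first occurrence is kept.
        """
        merged: list[dict] = []
        for candidate in candidates:
            for scene in candidate:
                start = _to_seconds(scene["start_time"])
                if not any(abs(start - _to_seconds(kept["start_time"])) <= tolerance for kept in merged):
                    merged.append(scene)
        return sorted(merged, key=lambda s: _to_seconds(s["start_time"]))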
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.


In realms of vision, where scenes unfold,
Gemini's wisdom, a story told.
With video and text, a fusion bright,
Transitions detected, in AI's light.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@gemini-code-assist bot left a comment


Code Review

This PR introduces a new notebook demonstrating scene transition detection using Gemini 1.5 Pro and Vertex AI. The notebook uses video and subtitle input to identify meaningful scene changes. Overall, the notebook is well-structured and provides a clear example of how to use Gemini for this task. However, there are a few areas that could be improved for clarity and robustness.

Summary of Findings

  • API Key Security: The notebook includes a placeholder for the API key. It's crucial to emphasize the importance of not committing API keys directly into the notebook and instead using environment variables or secure configuration management (a short sketch follows this list).
  • Error Handling: The code uses eval(text) to parse the model's output. This can be risky if the model generates unexpected output. Consider using json.loads with appropriate error handling to ensure the notebook doesn't crash.
  • Path Handling: The notebook uses hardcoded paths for the one-shot video and VTT files. It would be beneficial to make these paths configurable or provide a mechanism for users to easily upload their own files.
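
As a concrete illustration of the API key point, the key could be read from the environment rather than written into a cell; the variable name GOOGLE_API_KEY below is only an assumption.

    import os

    # Assumed variable name; use whatever your environment or secret manager defines.
    api_key = os.environ.get("GOOGLE_API_KEY")
    if not api_key:
        raise RuntimeError("GOOGLE_API_KEY is not set; export it before running this cell.")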

Merge Readiness

The notebook provides a valuable demonstration of scene transition detection using Gemini. However, before merging, it's important to address the security concerns related to the API key and the potential risks associated with using eval(text). Additionally, improving the flexibility of path handling would enhance the user experience. I am unable to approve this pull request, and recommend that it not be merged until the critical and high severity issues are addressed.

" for part in content.parts:\n",
" text = part.text \n",
" if text:\n",
" parsed_content = eval(text) \n",


Severity: high

Using eval(text) can be risky, especially when dealing with model-generated output. If the model returns something unexpected, eval could execute arbitrary code or cause the notebook to crash. It's safer to use json.loads with proper error handling. Consider adding a try-except block to catch potential errors during parsing.

            try:
                parsed_content = json.loads(text)
            except json.JSONDecodeError as e:
                print(f"Error decoding JSON: {e}")
                continue  # Or handle the error as appropriate

"outputs": [],
"source": [
"# Base path\n",
"base_path = \"gs://<yourbucket>/<yourvideofolder>/\"\n",


Severity: medium

Consider allowing users to specify the base path or individual file paths through input parameters or a configuration file. This would make the notebook more flexible and easier to use with different video and VTT files. Add a comment explaining that users should replace these placeholders with their actual bucket and folder names.

base_path = input("Enter the base path (gs://<yourbucket>/<yourvideofolder>/): ") or "gs://<yourbucket>/<yourvideofolder>/" # Replace with your bucket and folder

Comment on lines 371 to 381
"oneshot_parts = [Part.from_uri(input_video_path, \"video/mp4\"),\n",
" Part.from_uri(input_vtt_path, \"text/vtt\"),\n",
" Part.from_text(base_instructions),\n",
" Part.from_text(\"\"\"The following example illustrate how to apply the scene transition instructions above. Pay attention to how the scene transitions are identified and described based on the provided video and VTT data.\"\"\"),\n",
" Part.from_uri(one_shot_video_path, \"video/mp4\"),\n",
" Part.from_uri(one_shot_vtt_path, \"text/vtt\"),\n",
" Part.from_text(\"\"\"Scene transition of the earlier shared video and VTT file:\n",
" Scene1: Timecode: start_time 00:00:06 – end_time 00:01:00 - description Carroll Shelby and Henry Ford II discuss the car’s top speed in a garage\n",
" Scene2: Timecode: start_time 00:01:00 – end_time 00:01:54 - description Henry Ford II is helped into the car on the tarmac.\n",
" Scene3: Timecode: start_time 00:01:55 – end_time 00:03:45 - description The dialogue transitions from technical aspects to the thrill of the driving experience, and the setting changes from static indoors to dynamic action on the tarmac.\n",
" so on..\"\"\"), ]"


Severity: medium

It might be helpful to include the base instructions in the prompt itself, rather than as a separate part. This could improve the model's understanding of the task. Consider concatenating the base instructions with the one-shot example to create a more comprehensive prompt.

oneshot_parts = [
    Part.from_uri(input_video_path, "video/mp4"),
    Part.from_uri(input_vtt_path, "text/vtt"),
    Part.from_text(base_instructions + "\nThe following example illustrate how to apply the scene transition instructions above. Pay attention to how the scene transitions are identified and described based on the provided video and VTT data."),
    Part.from_uri(one_shot_video_path, "video/mp4"),
    Part.from_uri(one_shot_vtt_path, "text/vtt"),
    Part.from_text("""Scene transition of the earlier shared video and VTT file:\n        Scene1: Timecode: start_time 00:00:06 – end_time 00:01:00 - description Carroll Shelby and Henry Ford II discuss the car’s top speed in a garage\n        Scene2: Timecode: start_time 00:01:00 – end_time 00:01:54 - description  Henry Ford II is helped into the car on the tarmac.\n        Scene3: Timecode: start_time 00:01:55 – end_time 00:03:45 - description The dialogue transitions from technical aspects to the thrill of the driving experience, and the setting changes from static indoors to dynamic action on the tarmac.\n        so on.."""),
]

@VJlaxmi changed the title from "Added notebook for scene transition detection using Gemini" to "feat: add scene transition detection notebook using Gemini" on Mar 28, 2025
"# Use the environment variable if the user doesn't provide Project ID.\n",
"import os\n",
"\n",
"GCP_PROJECT = \"[your-project-id]\" # @param {type: \"string\", placeholder: \"[your-project-id]\", isTemplate: true}\n",
Collaborator

Change these constants to be PROJECT and LOCATION

Contributor Author

done

@VJlaxmi (Contributor, Author) commented Apr 2, 2025

@holtskinner can we merge this now?

"id": "84f0f73a0f76"
},
"source": [
"| Author(s) |\n",
Collaborator

Suggested change
- | Author(s) |
+ | Author |

Contributor Author

done

@holtskinner (Collaborator)

@VJlaxmi Looks like your updates haven't been added to this PR

@VJlaxmi (Contributor, Author) commented Apr 7, 2025

is this good to go?

@holtskinner (Collaborator)

> is this good to go?

No, this Notebook still isn't using the Google Gen AI SDK. Not sure if the commits are getting missed. See this Notebook for examples of how to use the new SDK: https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/getting-started/intro_genai_sdk.ipynb
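
For reference, a rough sketch of what an equivalent call could look like with the Google Gen AI SDK (the google-genai package) is shown below. The model name, GCS paths, and config values are placeholders, and the notebook's actual migration may differ.

    from google import genai
    from google.genai import types

    # Vertex AI backend; project, location, and file URIs are placeholders.
    client = genai.Client(vertexai=True, project="your-project-id", location="us-central1")

    response = client.models.generate_content(
        model="gemini-1.5-pro",
        contents=[
            types.Part.from_uri(file_uri="gs://your-bucket/video.mp4", mime_type="video/mp4"),
            types.Part.from_uri(file_uri="gs://your-bucket/video.vtt", mime_type="text/vtt"),
            "Identify the scene transitions in this video, using the subtitle file as additional context.",
        ],
        config=types.GenerateContentConfig(
            temperature=0.0,
            response_mime_type="application/json",
        ),
    )
    print(response.text)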


It appears the author, @VJlaxmi, is being asked by @holtskinner to update the notebook to use the Google Gen AI SDK, referencing a specific notebook as an example. I don't have enough information to add to this conversation, so I will refrain from commenting further.
