
feat: add scene transition detection notebook using Gemini #1891


Open · wants to merge 16 commits into base: main

Conversation

@VJlaxmi (Contributor) commented Mar 28, 2025

Summary

This PR adds a new notebook demonstrating scene transition detection using Gemini 1.5 Pro on Vertex AI. The model is prompted with schema-guided structured output and one-shot/few-shot examples, and takes video and subtitle (VTT) input to identify meaningful scene changes in narrative, characters, and location.


What’s Included

  • Notebook: vision/use-cases/identify-scene-transition-using-gemini/scenetransition.ipynb
  • Structured schema for response formatting
  • One-shot prompting with Gemini and Part objects

Checklist

  • Signed CLA
  • Notebook follows official template
  • Passed nox -s format
  • Output cells are clean and minimal
  • All imports are used and no unused code is present

@VJlaxmi requested a review from a team as a code owner on March 28, 2025 at 07:13

@gemini-code-assist bot left a comment


Hello @VJlaxmi, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

This pull request introduces a new notebook, vision/use-cases/identify-scene-transition-using-gemini/scenetransition.ipynb, that demonstrates how to use Gemini 1.5 Pro and Vertex AI for scene transition detection in videos. The notebook uses a structured schema for response formatting and employs one-shot prompting techniques with Gemini and Part objects. It analyzes video and subtitle (VTT) input to identify scene changes based on narrative, characters, and location.

Highlights

  • New Notebook: Adds a new notebook vision/use-cases/identify-scene-transition-using-gemini/scenetransition.ipynb for scene transition detection using Gemini 1.5 Pro.
  • Structured Schema: Implements a structured schema for formatting the response from the Gemini model, ensuring consistent and parsable output (a minimal sketch of such a schema follows this list).
  • One-Shot Prompting: Utilizes one-shot prompting with Gemini and Part objects to guide the model in identifying scene transitions.
  • Multimodal Input: Leverages both video and subtitle (VTT) files as input to provide a comprehensive understanding of the video content.
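
For readers new to schema-guided generation, here is a minimal sketch of how a response schema for scene transitions could be wired into the model configuration. The field names (start_time, end_time, description), schema shape, and generation parameters are assumptions for illustration, not the notebook's exact definitions.

    import vertexai
    from vertexai.generative_models import GenerationConfig, GenerativeModel

    vertexai.init(project="your-project-id", location="us-central1")  # placeholders

    # Hypothetical OpenAPI-style schema: one object per detected scene transition.
    scene_transition_schema = {
        "type": "ARRAY",
        "items": {
            "type": "OBJECT",
            "properties": {
                "start_time": {"type": "STRING"},
                "end_time": {"type": "STRING"},
                "description": {"type": "STRING"},
            },
            "required": ["start_time", "end_time", "description"],
        },
    }

    model = GenerativeModel(
        "gemini-1.5-pro",
        generation_config=GenerationConfig(
            temperature=0.0,
            max_output_tokens=8192,
            response_mime_type="application/json",
            response_schema=scene_transition_schema,
        ),
    )

With response_mime_type set to application/json, the model's reply can be parsed with json.loads rather than eval, which also relates to a review comment below.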

Changelog

  • vision/use-cases/identify-scene-transition-using-gemini/scenetransition.ipynb
    • Initial commit of the scenetransition.ipynb notebook.
    • Includes markdown cells for introduction, overview, and setup instructions.
    • Adds code cells for installing required packages, authenticating the environment, and setting up Google Cloud project information.
    • Imports necessary libraries such as google.generativeai, vertexai, and pandas.
    • Defines the output schema for scene transition extraction, specifying the structure of the model's response.
    • Configures the Gemini 1.5 Pro model with specified parameters like temperature, max output tokens, and response schema.
    • Sets up file paths for input video and VTT files, including one-shot examples.
    • Defines the base instructions for the Gemini model, outlining the criteria for identifying scene transitions.
    • Constructs the prompt using Part objects, combining video, VTT, and instructions.
    • Generates content using the Gemini model and processes the output to extract scene transition information.
    • Includes code to handle multiple candidates from the model and consolidate the results.
    • Implements a function candidate_count_handler_with_tolerance to consolidate scene transition timestamps from multiple candidates, handling slight variations with a tolerance of 1 second (a rough sketch of this consolidation approach follows the changelog).
    • Prints the final model output, showing the identified scene transitions.
  • vision/use-cases/identify-scene-transition-using-gemini/video3.vtt
    • Initial commit of the video3.vtt file.
    • Contains WebVTT formatted subtitles for a sample video used in the one-shot example.
    • Subtitles include dialogue and audio cues to aid in scene transition detection.
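
To make the consolidation step above concrete, here is a rough sketch of merging near-duplicate scene transition timestamps across candidates with a one-second tolerance. It is not the notebook's candidate_count_handler_with_tolerance implementation; the HH:MM:SS timecode format and dictionary keys are assumptions.

    from datetime import datetime

    def _to_seconds(timecode: str) -> int:
        """Convert an HH:MM:SS timecode string to seconds."""
        t = datetime.strptime(timecode, "%H:%M:%S")
        return t.hour * 3600 + t.minute * 60 + t.second

    def consolidate_transitions(candidates: list[list[dict]], tolerance: int = 1) -> list[dict]:
        """Merge scene transitions from multiple model candidates.

        Two transitions count as the same scene change when their start times
        differ by at most `tolerance` seconds; the first occurrence is kept.
        """
        merged: list[dict] = []
        for candidate in candidates:
            for scene in candidate:
                start = _to_seconds(scene["start_time"])
                if not any(abs(start - _to_seconds(kept["start_time"])) <= tolerance for kept in merged):
                    merged.append(scene)
        return sorted(merged, key=lambda s: _to_seconds(s["start_time"]))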
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.


In realms of vision, where scenes unfold,
Gemini's wisdom, a story told.
With video and text, a fusion bright,
Transitions detected, in AI's light.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@gemini-code-assist bot left a comment


Code Review

This PR introduces a new notebook demonstrating scene transition detection using Gemini 1.5 Pro and Vertex AI. The notebook uses video and subtitle input to identify meaningful scene changes. Overall, the notebook is well-structured and provides a clear example of how to use Gemini for this task. However, there are a few areas that could be improved for clarity and robustness.

Summary of Findings

  • API Key Security: The notebook includes a placeholder for the API key. It's crucial to emphasize the importance of not committing API keys directly into the notebook and instead using environment variables or secure configuration management (a short sketch follows this list).
  • Error Handling: The code uses eval(text) to parse the model's output. This can be risky if the model generates unexpected output. Consider using json.loads with appropriate error handling to ensure the notebook doesn't crash.
  • Path Handling: The notebook uses hardcoded paths for the one-shot video and VTT files. It would be beneficial to make these paths configurable or provide a mechanism for users to easily upload their own files.
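
As a concrete illustration of the API key point, the key could be read from the environment rather than written into a cell; the variable name GOOGLE_API_KEY below is only an assumption.

    import os

    # Assumed variable name; use whatever your environment or secret manager defines.
    api_key = os.environ.get("GOOGLE_API_KEY")
    if not api_key:
        raise RuntimeError("GOOGLE_API_KEY is not set; export it before running this cell.")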

Merge Readiness

The notebook provides a valuable demonstration of scene transition detection using Gemini. However, before merging, it's important to address the security concerns related to the API key and the potential risks associated with using eval(text). Additionally, improving the flexibility of path handling would enhance the user experience. I am unable to approve this pull request, and recommend that it not be merged until the critical and high severity issues are addressed.

" for part in content.parts:\n",
" text = part.text \n",
" if text:\n",
" parsed_content = eval(text) \n",


Severity: high

Using eval(text) can be risky, especially when dealing with model-generated output. If the model returns something unexpected, eval could execute arbitrary code or cause the notebook to crash. It's safer to use json.loads with proper error handling. Consider adding a try-except block to catch potential errors during parsing.

            try:
                parsed_content = json.loads(text)
            except json.JSONDecodeError as e:
                print(f"Error decoding JSON: {e}")
                continue  # Or handle the error as appropriate

"outputs": [],
"source": [
"# Base path\n",
"base_path = \"gs://<yourbucket>/<yourvideofolder>/\"\n",


Severity: medium

Consider allowing users to specify the base path or individual file paths through input parameters or a configuration file. This would make the notebook more flexible and easier to use with different video and VTT files. Add a comment explaining that users should replace these placeholders with their actual bucket and folder names.

base_path = input("Enter the base path (gs://<yourbucket>/<yourvideofolder>/): ") or "gs://<yourbucket>/<yourvideofolder>/" # Replace with your bucket and folder

Comment on lines 371 to 381
"oneshot_parts = [Part.from_uri(input_video_path, \"video/mp4\"),\n",
" Part.from_uri(input_vtt_path, \"text/vtt\"),\n",
" Part.from_text(base_instructions),\n",
" Part.from_text(\"\"\"The following example illustrate how to apply the scene transition instructions above. Pay attention to how the scene transitions are identified and described based on the provided video and VTT data.\"\"\"),\n",
" Part.from_uri(one_shot_video_path, \"video/mp4\"),\n",
" Part.from_uri(one_shot_vtt_path, \"text/vtt\"),\n",
" Part.from_text(\"\"\"Scene transition of the earlier shared video and VTT file:\n",
" Scene1: Timecode: start_time 00:00:06 – end_time 00:01:00 - description Carroll Shelby and Henry Ford II discuss the car’s top speed in a garage\n",
" Scene2: Timecode: start_time 00:01:00 – end_time 00:01:54 - description Henry Ford II is helped into the car on the tarmac.\n",
" Scene3: Timecode: start_time 00:01:55 – end_time 00:03:45 - description The dialogue transitions from technical aspects to the thrill of the driving experience, and the setting changes from static indoors to dynamic action on the tarmac.\n",
" so on..\"\"\"), ]"


Severity: medium

It might be helpful to include the base instructions in the prompt itself, rather than as a separate part. This could improve the model's understanding of the task. Consider concatenating the base instructions with the one-shot example to create a more comprehensive prompt.

oneshot_parts = [
    Part.from_uri(input_video_path, "video/mp4"),
    Part.from_uri(input_vtt_path, "text/vtt"),
    Part.from_text(base_instructions + "\nThe following example illustrate how to apply the scene transition instructions above. Pay attention to how the scene transitions are identified and described based on the provided video and VTT data."),
    Part.from_uri(one_shot_video_path, "video/mp4"),
    Part.from_uri(one_shot_vtt_path, "text/vtt"),
    Part.from_text("""Scene transition of the earlier shared video and VTT file:\n        Scene1: Timecode: start_time 00:00:06 – end_time 00:01:00 - description Carroll Shelby and Henry Ford II discuss the car’s top speed in a garage\n        Scene2: Timecode: start_time 00:01:00 – end_time 00:01:54 - description  Henry Ford II is helped into the car on the tarmac.\n        Scene3: Timecode: start_time 00:01:55 – end_time 00:03:45 - description The dialogue transitions from technical aspects to the thrill of the driving experience, and the setting changes from static indoors to dynamic action on the tarmac.\n        so on.."""),
]

@VJlaxmi changed the title from "Added notebook for scene transition detection using Gemini" to "feat: add scene transition detection notebook using Gemini" on Mar 28, 2025
"# Use the environment variable if the user doesn't provide Project ID.\n",
"import os\n",
"\n",
"GCP_PROJECT = \"[your-project-id]\" # @param {type: \"string\", placeholder: \"[your-project-id]\", isTemplate: true}\n",
Collaborator

Change these constants to be PROJECT and LOCATION

Contributor Author

done

@VJlaxmi (Contributor, Author) commented Apr 2, 2025

@holtskinner can we merge this now?

"id": "84f0f73a0f76"
},
"source": [
"| Author(s) |\n",
Collaborator

Suggested change
- | Author(s) |
+ | Author |

Contributor Author

done

@holtskinner (Collaborator)

@VJlaxmi Looks like your updates haven't been added to this PR

@VJlaxmi (Contributor, Author) commented Apr 7, 2025

is this good to go?

@holtskinner (Collaborator)

> is this good to go?

No, this Notebook still isn't using the Google Gen AI SDK. Not sure if the commits are getting missed. See this Notebook for examples of how to use the new SDK: https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/getting-started/intro_genai_sdk.ipynb
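
For reference, a rough sketch of what an equivalent call could look like with the Google Gen AI SDK (the google-genai package) is shown below. The model name, GCS paths, and config values are placeholders, and the notebook's actual migration may differ.

    from google import genai
    from google.genai import types

    # Vertex AI backend; project, location, and file URIs are placeholders.
    client = genai.Client(vertexai=True, project="your-project-id", location="us-central1")

    response = client.models.generate_content(
        model="gemini-1.5-pro",
        contents=[
            types.Part.from_uri(file_uri="gs://your-bucket/video.mp4", mime_type="video/mp4"),
            types.Part.from_uri(file_uri="gs://your-bucket/video.vtt", mime_type="text/vtt"),
            "Identify the scene transitions in this video, using the subtitle file as additional context.",
        ],
        config=types.GenerateContentConfig(
            temperature=0.0,
            response_mime_type="application/json",
        ),
    )
    print(response.text)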


It appears the author, @VJlaxmi, is being asked by @holtskinner to update the notebook to use the Google Gen AI SDK, referencing a specific notebook as an example. I don't have enough information to add to this conversation, so I will refrain from commenting further.
