feat: add scene transition detection notebook using Gemini #1891
Hello @VJlaxmi, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
Summary of Changes
This pull request introduces a new notebook, `vision/use-cases/identify-scene-transition-using-gemini/scenetransition.ipynb`, that demonstrates how to use Gemini 1.5 Pro and Vertex AI for scene transition detection in videos. The notebook uses a structured schema for response formatting and employs one-shot prompting with Gemini and `Part` objects. It analyzes video and subtitle (VTT) input to identify scene changes based on narrative, characters, and location.
Highlights
- New Notebook: Adds a new notebook, `vision/use-cases/identify-scene-transition-using-gemini/scenetransition.ipynb`, for scene transition detection using Gemini 1.5 Pro.
- Structured Schema: Implements a structured schema for formatting the response from the Gemini model, ensuring consistent and parsable output.
- One-Shot Prompting: Utilizes one-shot prompting with Gemini and `Part` objects to guide the model in identifying scene transitions.
- Multimodal Input: Leverages both video and subtitle (VTT) files as input to provide a comprehensive understanding of the video content.
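To make the structured-schema highlight concrete, here is a minimal sketch of what such a response schema could look like, written as a plain dictionary in the OpenAPI-style form used for Gemini controlled generation. The notebook's actual schema is not shown in this conversation, so the field names (`scene_number`, `start_time`, `end_time`, `description`) and the helper `missing_fields` are assumptions for illustration only.

```python
import json

# Hypothetical schema: a list of scenes, each with timecodes and a description.
# Field names are illustrative, not taken from the notebook.
SCENE_SCHEMA = {
    "type": "ARRAY",
    "items": {
        "type": "OBJECT",
        "properties": {
            "scene_number": {"type": "INTEGER"},
            "start_time": {"type": "STRING"},
            "end_time": {"type": "STRING"},
            "description": {"type": "STRING"},
        },
        "required": ["scene_number", "start_time", "end_time", "description"],
    },
}


def missing_fields(scene: dict) -> list[str]:
    """Return the required keys that a parsed scene object lacks."""
    required = SCENE_SCHEMA["items"]["required"]
    return [key for key in required if key not in scene]


# A response that follows the schema parses cleanly and has no missing fields.
sample = json.loads(
    '[{"scene_number": 1, "start_time": "00:00:06", '
    '"end_time": "00:01:00", "description": "Garage conversation"}]'
)
print([missing_fields(scene) for scene in sample])
```

A check like `missing_fields` is optional when the model enforces the schema, but it gives a cheap guard if the schema and the downstream code drift apart.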
Changelog
- vision/use-cases/identify-scene-transition-using-gemini/scenetransition.ipynb
  - Initial commit of the scenetransition.ipynb notebook.
  - Includes markdown cells for introduction, overview, and setup instructions.
  - Adds code cells for installing required packages, authenticating the environment, and setting up Google Cloud project information.
  - Imports necessary libraries such as google.generativeai, vertexai, and pandas.
  - Defines the output schema for scene transition extraction, specifying the structure of the model's response.
  - Configures the Gemini 1.5 Pro model with specified parameters like temperature, max output tokens, and response schema.
  - Sets up file paths for input video and VTT files, including one-shot examples.
  - Defines the base instructions for the Gemini model, outlining the criteria for identifying scene transitions.
  - Constructs the prompt using `Part` objects, combining video, VTT, and instructions.
  - Generates content using the Gemini model and processes the output to extract scene transition information.
  - Includes code to handle multiple candidates from the model and consolidate the results.
  - Implements a function, `candidate_count_handler_with_tolerance`, to consolidate scene transition timestamps from multiple candidates, handling slight variations with a tolerance of 1 second.
  - Prints the final model output, showing the identified scene transitions.
- vision/use-cases/identify-scene-transition-using-gemini/video3.vtt
  - Initial commit of the video3.vtt file.
  - Contains WebVTT formatted subtitles for a sample video used in the one-shot example.
  - Subtitles include dialogue and audio cues to aid in scene transition detection.
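The changelog mentions consolidating timestamps from multiple candidates with a 1-second tolerance, but the body of `candidate_count_handler_with_tolerance` is not shown here. The following is one possible sketch of that idea, not the notebook's implementation: it clusters timecodes that fall within the tolerance of a cluster's earliest value and keeps clusters reported by enough candidates. The function names, the `HH:MM:SS` input format, and the `min_votes` parameter are all assumptions.

```python
def to_seconds(timecode: str) -> int:
    """Convert an HH:MM:SS timecode to whole seconds."""
    hours, minutes, seconds = (int(part) for part in timecode.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_timecode(total: int) -> str:
    """Convert whole seconds back to an HH:MM:SS timecode."""
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"


def consolidate_transitions(candidates, tolerance=1, min_votes=2):
    """Merge near-identical transition times reported by several candidates.

    `candidates` is a list of lists of timecodes, one inner list per candidate.
    A cluster is anchored at its earliest timestamp; later timestamps join it
    if they are within `tolerance` seconds of that anchor.
    """
    tagged = sorted(
        (to_seconds(timecode), index)
        for index, timecodes in enumerate(candidates)
        for timecode in timecodes
    )
    clusters = []  # each entry: [anchor_seconds, set_of_candidate_indices]
    for seconds, index in tagged:
        if clusters and seconds - clusters[-1][0] <= tolerance:
            clusters[-1][1].add(index)
        else:
            clusters.append([seconds, {index}])
    # Keep only transitions supported by at least `min_votes` candidates.
    return [to_timecode(anchor) for anchor, voters in clusters if len(voters) >= min_votes]


candidates = [
    ["00:01:00", "00:01:55"],
    ["00:01:01", "00:01:54", "00:03:10"],
    ["00:01:00", "00:01:55"],
]
print(consolidate_transitions(candidates))  # ['00:01:00', '00:01:54']
```

Anchoring each cluster at its earliest timestamp keeps the merge deterministic; averaging the cluster members would be an equally reasonable choice.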
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e., a comment on the pull request page) using either `/gemini <command>` or `@gemini-code-assist <command>`. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a `.gemini/` folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
In realms of vision, where scenes unfold,
Gemini's wisdom, a story told.
With video and text, a fusion bright,
Transitions detected, in AI's light.
Footnotes
[1] Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.
Code Review
This PR introduces a new notebook demonstrating scene transition detection using Gemini 1.5 Pro and Vertex AI. The notebook uses video and subtitle input to identify meaningful scene changes. Overall, the notebook is well-structured and provides a clear example of how to use Gemini for this task. However, there are a few areas that could be improved for clarity and robustness.
Summary of Findings
- API Key Security: The notebook includes a placeholder for the API key. It's crucial to emphasize the importance of not committing API keys directly into the notebook and instead using environment variables or secure configuration management.
- Error Handling: The code uses `eval(text)` to parse the model's output. This can be risky if the model generates unexpected output. Consider using `json.loads` with appropriate error handling to ensure the notebook doesn't crash.
- Path Handling: The notebook uses hardcoded paths for the one-shot video and VTT files. It would be beneficial to make these paths configurable or provide a mechanism for users to easily upload their own files.
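As a minimal sketch of the first finding, the key can be read from the environment at run time instead of being written into a notebook cell. The helper name `load_api_key` and the variable name `GOOGLE_API_KEY` are assumptions for illustration; the notebook may use a different variable or rely on Application Default Credentials instead.

```python
import os


def load_api_key(env_var: str = "GOOGLE_API_KEY") -> str:
    """Read the API key from the environment and fail loudly if it is absent.

    The variable name is an assumption; use whatever name your setup defines.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before running the notebook."
        )
    return key
```

Failing early with a clear message avoids a confusing authentication error later in the notebook, and the key never appears in the committed file.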
Merge Readiness
The notebook provides a valuable demonstration of scene transition detection using Gemini. However, before merging, it's important to address the security concerns related to the API key and the potential risks associated with using `eval(text)`. Additionally, improving the flexibility of path handling would enhance the user experience. I am unable to approve this pull request, and recommend that it not be merged until the critical and high severity issues are addressed.
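To illustrate the `eval(text)` concern, the sketch below parses a list of text parts with `json.loads` and skips anything that is not valid JSON, so a malformed or hostile string is never executed. The function name `parse_parts` and the sample inputs are hypothetical; the notebook iterates over `content.parts` rather than a plain list of strings.

```python
import json


def parse_parts(texts):
    """Parse each non-empty text part as JSON, skipping parts that fail to decode."""
    parsed = []
    for text in texts:
        if not text:
            continue
        try:
            parsed.append(json.loads(text))
        except json.JSONDecodeError as err:
            # With eval(), this string would have been executed as Python code.
            print(f"Skipping unparseable part: {err}")
    return parsed


parts = ['[{"start_time": "00:00:06"}]', "__import__('os').getcwd()", ""]
print(parse_parts(parts))
```

The second sample string is harmless here, but it shows the kind of input that `eval` would run and `json.loads` simply rejects.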
vision/use-cases/identify-scene-transition-using-gemini/scenetransition.ipynb
    " for part in content.parts:\n",
    " text = part.text \n",
    " if text:\n",
    " parsed_content = eval(text) \n",
Using `eval(text)` can be risky, especially when dealing with model-generated output. If the model returns something unexpected, `eval` could execute arbitrary code or cause the notebook to crash. It's safer to use `json.loads` with proper error handling. Consider adding a try-except block to catch potential errors during parsing.
    try:
        parsed_content = json.loads(text)
    except json.JSONDecodeError as e:
        print(f"Error decoding JSON: {e}")
        continue  # Or handle the error as appropriate

    "outputs": [],
    "source": [
    "# Base path\n",
    "base_path = \"gs://<yourbucket>/<yourvideofolder>/\"\n",
Consider allowing users to specify the base path or individual file paths through input parameters or a configuration file. This would make the notebook more flexible and easier to use with different video and VTT files. Add a comment explaining that users should replace these placeholders with their actual bucket and folder names.
    # Replace with your bucket and folder
    base_path = input("Enter the base path (gs://<yourbucket>/<yourvideofolder>/): ") or "gs://<yourbucket>/<yourvideofolder>/"

    "oneshot_parts = [Part.from_uri(input_video_path, \"video/mp4\"),\n",
    " Part.from_uri(input_vtt_path, \"text/vtt\"),\n",
    " Part.from_text(base_instructions),\n",
    " Part.from_text(\"\"\"The following example illustrate how to apply the scene transition instructions above. Pay attention to how the scene transitions are identified and described based on the provided video and VTT data.\"\"\"),\n",
    " Part.from_uri(one_shot_video_path, \"video/mp4\"),\n",
    " Part.from_uri(one_shot_vtt_path, \"text/vtt\"),\n",
    " Part.from_text(\"\"\"Scene transition of the earlier shared video and VTT file:\n",
    " Scene1: Timecode: start_time 00:00:06 – end_time 00:01:00 - description Carroll Shelby and Henry Ford II discuss the car’s top speed in a garage\n",
    " Scene2: Timecode: start_time 00:01:00 – end_time 00:01:54 - description Henry Ford II is helped into the car on the tarmac.\n",
    " Scene3: Timecode: start_time 00:01:55 – end_time 00:03:45 - description The dialogue transitions from technical aspects to the thrill of the driving experience, and the setting changes from static indoors to dynamic action on the tarmac.\n",
    " so on..\"\"\"), ]"
It might be helpful to include the base instructions in the prompt itself, rather than as a separate part. This could improve the model's understanding of the task. Consider concatenating the base instructions with the one-shot example to create a more comprehensive prompt.
    oneshot_parts = [
        Part.from_uri(input_video_path, "video/mp4"),
        Part.from_uri(input_vtt_path, "text/vtt"),
        Part.from_text(base_instructions + "\nThe following example illustrates how to apply the scene transition instructions above. Pay attention to how the scene transitions are identified and described based on the provided video and VTT data."),
        Part.from_uri(one_shot_video_path, "video/mp4"),
        Part.from_uri(one_shot_vtt_path, "text/vtt"),
        Part.from_text("""Scene transition of the earlier shared video and VTT file:\n Scene1: Timecode: start_time 00:00:06 – end_time 00:01:00 - description Carroll Shelby and Henry Ford II discuss the car’s top speed in a garage\n Scene2: Timecode: start_time 00:01:00 – end_time 00:01:54 - description Henry Ford II is helped into the car on the tarmac.\n Scene3: Timecode: start_time 00:01:55 – end_time 00:03:45 - description The dialogue transitions from technical aspects to the thrill of the driving experience, and the setting changes from static indoors to dynamic action on the tarmac.\n so on.."""),
    ]
f4c7361 to 8b6c93a
9ad12f7 to b999174
vision/use-cases/identify-scene-transition-using-gemini/scenetransition.ipynb
    "# Use the environment variable if the user doesn't provide Project ID.\n",
    "import os\n",
    "\n",
    "GCP_PROJECT = \"[your-project-id]\" # @param {type: \"string\", placeholder: \"[your-project-id]\", isTemplate: true}\n",
Change these constants to be `PROJECT` and `LOCATION`.
done
@holtskinner can we merge this now?
vision/use-cases/identify-scene-transition-using-gemini/scenetransition.ipynb
    "id": "84f0f73a0f76"
    },
    "source": [
    "| Author(s) |\n",
Suggested change:
    - "| Author(s) |\n",
    + "| Author |\n",
done
@VJlaxmi Looks like your updates haven't been added to this PR
is this good to go?
No, this Notebook still isn't using the Google Gen AI SDK. Not sure if the commits are getting missed. See this Notebook for examples of how to use the new SDK https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/getting-started/intro_genai_sdk.ipynb
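For orientation, a migration to the Google Gen AI SDK could look roughly like the sketch below. It is an illustration under assumptions, not the notebook's code: the helper names `build_parts_spec` and `detect_scenes`, the temperature, and the JSON response type are invented here, and the SDK import is done lazily so the pure helper can be used without the `google-genai` package installed. Calling `detect_scenes` requires valid Google Cloud credentials and a model name chosen by the caller.

```python
def build_parts_spec(video_uri: str, vtt_uri: str, instructions: str) -> list[dict]:
    """Describe the prompt parts as plain data (hypothetical helper)."""
    return [
        {"file_uri": video_uri, "mime_type": "video/mp4"},
        {"file_uri": vtt_uri, "mime_type": "text/vtt"},
        {"text": instructions},
    ]


def detect_scenes(project: str, location: str, model: str, parts_spec: list[dict]) -> str:
    """Send the parts to Gemini on Vertex AI through the Google Gen AI SDK."""
    # Lazy import: requires `pip install google-genai` and configured credentials.
    from google import genai
    from google.genai import types

    client = genai.Client(vertexai=True, project=project, location=location)
    parts = [
        types.Part.from_uri(file_uri=spec["file_uri"], mime_type=spec["mime_type"])
        if "file_uri" in spec
        else types.Part.from_text(text=spec["text"])
        for spec in parts_spec
    ]
    response = client.models.generate_content(
        model=model,
        contents=[types.Content(role="user", parts=parts)],
        # Illustrative settings; the notebook's actual configuration may differ.
        config=types.GenerateContentConfig(
            temperature=0.0,
            response_mime_type="application/json",
        ),
    )
    return response.text


spec = build_parts_spec(
    "gs://<yourbucket>/<yourvideofolder>/video.mp4",
    "gs://<yourbucket>/<yourvideofolder>/video.vtt",
    "Identify scene transitions.",
)
print(len(spec))
```

Keeping the prompt description as plain data separates what is sent from how it is sent, which makes the SDK swap a change to one function.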
It appears the author, @VJlaxmi, is being asked by @holtskinner to update the notebook to use the Google Gen AI SDK, referencing a specific notebook as an example. I don't have enough information to add to this conversation, so I will refrain from commenting further.
Summary
This PR adds a new notebook demonstrating scene transition detection using Gemini 1.5 Pro and Vertex AI. The model is prompted using structured schema-guided generation and one-shot/few-shot techniques. It uses video and subtitle (VTT) input to identify meaningful scene changes in narrative, characters, and location.
What’s Included
- vision/use-cases/identify-scene-transition-using-gemini/scenetransition.ipynb
- `Part` objects
Checklist
- nox -s format