

irfanariyaz
Contributor

Description

Added image generation functionality to the slide generator tool. This enhancement allows dynamic, context-aware image generation for presentation slides using Google's Imagen model and Firebase for image storage. The implementation supports:

  • Generating images based on slide content and context
  • Uploading generated images to Firebase
  • Supporting multiple image styles and templates
  • Handling image generation for different slide types
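Here is a minimal sketch of how slide content might be turned into an Imagen prompt. It assumes the slide fields seen in the generated output further down (`title`, `template`, `content`) and that output's three templates. The function names, the `style` parameter, and the prompt wording are hypothetical, and the PR's imagen.py may build its prompts differently.

```python
from typing import Any, Dict, List, Union

# Slide content shapes seen in the generated output: a string (titleAndBody),
# a list of bullets (titleAndBullets), or a dict of columns (twoColumn).
SlideContent = Union[str, List[str], Dict[str, str]]


def content_to_text(template: str, content: SlideContent) -> str:
    """Flatten slide content to plain text according to its template (hypothetical helper)."""
    if template == "titleAndBullets" and isinstance(content, list):
        return "; ".join(content)
    if template == "twoColumn" and isinstance(content, dict):
        columns = (content.get("leftColumn", ""), content.get("rightColumn", ""))
        return " ".join(c for c in columns if c)
    return str(content)


def build_image_prompt(
    slide: Dict[str, Any],
    topic: str,
    style: str = "A clean, flat illustration",
    max_chars: int = 400,
) -> str:
    """Compose a text-to-image prompt from a slide's title and content (hypothetical helper)."""
    body = content_to_text(slide.get("template", ""), slide.get("content", ""))
    prompt = (
        f"{style} for a presentation slide about {topic}. "
        f"Slide title: {slide['title']}. Key points: {body}"
    )
    # Truncate so long slide bodies do not dominate the prompt.
    return prompt[:max_chars]


if __name__ == "__main__":
    slide = {
        "title": "Core Components of LangChain",
        "template": "titleAndBullets",
        "content": ["Chains", "Agents", "Memory"],
    }
    print(build_image_prompt(slide, "LangChain"))
```

The `style` argument is one simple way to support the multiple image styles listed above without changing the prompt logic per template.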

Related Issue

This is a feature enhancement for the presentation generator.

Type of Change

Please select the type(s) of change that apply and delete those that do not.

  • Bug fix: A non-breaking change that fixes an issue.
  • New feature: A non-breaking change that adds functionality.
  • Breaking change: A change that causes existing functionality to not work as expected.
  • Documentation update: Changes or updates to documentation.
  • Code style update: Changes that do not affect the meaning of the code (e.g., formatting).
  • Refactoring: A code change that neither fixes a bug nor adds a feature.
  • Performance improvement: A change that improves performance.
  • Test enhancement: Adding or updating tests; no production code change.
  • Chore: Changes to the build process or auxiliary tools; no production code change.
  • Other: (please describe)

Proposed Solution

Implemented image generation in the slide generator with the following key components:

  • ImageGenerator class using Vertex AI's Imagen model
  • Firebase integration for image storage and URL generation
  • Dynamic image prompt generation based on slide content
  • Configurable image generation parameters (width, height, aspect ratio)
  • Parallel image generation for multiple slides
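Parallel generation for multiple slides could look like the following sketch, which uses `asyncio.gather` with a semaphore to cap how many image requests are in flight at once. The `generate_one` coroutine is a hypothetical stand-in for whatever call imagen.py actually makes. The low default cap is an assumption, motivated by Imagen requests being subject to a per-model quota.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional, Sequence

# Hypothetical signature: (slide index, prompt) -> image URL, or None on failure.
GenerateFn = Callable[[int, str], Awaitable[Optional[str]]]


async def generate_all(
    prompts: Sequence[str],
    generate_one: GenerateFn,
    max_concurrency: int = 2,
) -> List[Optional[str]]:
    """Generate one image per prompt, with at most max_concurrency requests in flight.

    asyncio.gather returns results in input order, so result[i] belongs to prompts[i].
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(index: int, prompt: str) -> Optional[str]:
        async with semaphore:
            return await generate_one(index, prompt)

    return list(await asyncio.gather(*(worker(i, p) for i, p in enumerate(prompts))))


async def fake_generate(index: int, prompt: str) -> Optional[str]:
    """Demo stand-in for an Imagen call: later slides finish first."""
    await asyncio.sleep(0.01 * (3 - index % 3))
    return f"slide_{index}.png"


if __name__ == "__main__":
    print(asyncio.run(generate_all(["intro", "components", "agents"], fake_generate)))
```

Even though the demo coroutine finishes the later slides first, the returned list stays in slide order, which keeps each URL matched to its slide.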

Key modifications:

  • Added imagen.py for image generation logic
  • Updated tools.py to include image generation in slide creation workflow
  • Enhanced generate_slides() method to handle image generation
  • Added error handling and logging for image generation process
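One possible shape for this error handling and logging is sketched below. A failure on one slide is logged and recorded as `image_url: null` instead of aborting the whole deck. This is an assumption, not the PR's code. The generated output does contain `null` image URLs, but the PR does not say whether those come from failures or from slides that were never meant to have images. `make_image_url` is a hypothetical callback.

```python
import logging
from typing import Any, Callable, Dict, List, Optional

logger = logging.getLogger("slide_images")

# Hypothetical callback: (slide index, slide dict) -> public image URL.
MakeImageUrl = Callable[[int, Dict[str, Any]], Optional[str]]


def attach_images(slides: List[Dict[str, Any]], make_image_url: MakeImageUrl) -> List[Dict[str, Any]]:
    """Return copies of the slides with an "image_url" key added.

    Any exception from make_image_url is logged with its traceback and the
    slide gets image_url=None, so one failed image does not fail the deck.
    """
    result = []
    for index, slide in enumerate(slides):
        try:
            url = make_image_url(index, slide)
        except Exception:  # deliberately broad: any failure degrades to "no image"
            logger.exception("Image generation failed for slide %d (%r)", index, slide.get("title"))
            url = None
        result.append({**slide, "image_url": url})
    return result
```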

How to Test


Unit Tests


  1. test_executor()
  2. test_generate_slide_image()
  3. test_executor_missing_inputs()
  4. test_validate_slides_content()
  5. test_validate_slides_content_with_garbage()
  6. test_validate_slides_content_empty_slides()
  7. test_slide_generator_compile_context()
  8. test_slide_model()
  9. test_slide_presentation_model()
  10. test_imagen_generate_image()

Documentation Updates


  • [] Yes
  • No


Checklist

  • I have performed a self-review of my code.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have made corresponding changes to the documentation.
  • My changes generate no new warnings.
  • I have added tests that prove my fix is effective or that my feature works.
  • New and existing unit tests pass locally with my changes.
  • Any dependent changes have been merged and published in downstream modules.

Additional Information

Input for the outline generator:
{
  "user": {
    "id": "string",
    "fullName": "string",
    "email": "string"
  },
  "type": "tool",
  "tool_data": {
    "tool_id": "outline-generator",
    "inputs": [
      {
        "name": "topic",
        "value": "Lang chain"
      },
      {
        "name": "instructional_level",
        "value": "graduate"
      },
      {
        "name": "n_slides",
        "value": 6
      },
      {
        "name": "file_type",
        "value": ""
      },
      {
        "name": "file_url",
        "value": ""
      },
      {
        "name": "lang",
        "value": "en"
      }
    ]
  }
}
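The `inputs` array is a list of name/value pairs, so a handler typically flattens it into a dictionary before using it. The following is a minimal sketch built around a hypothetical `tool_inputs` helper. The set of required names is also an assumption. The missing-input check is the kind of behavior a test like test_executor_missing_inputs() would exercise.

```python
from typing import Any, Dict, Iterable


def tool_inputs(
    request: Dict[str, Any],
    required: Iterable[str] = ("topic", "instructional_level", "lang"),
) -> Dict[str, Any]:
    """Flatten tool_data.inputs ([{"name": ..., "value": ...}, ...]) into {name: value}.

    Raises ValueError on duplicate names or when a required name is absent.
    The default `required` names are an assumption for illustration.
    """
    pairs = request["tool_data"]["inputs"]
    values = {item["name"]: item["value"] for item in pairs}
    if len(values) != len(pairs):
        raise ValueError("duplicate names in tool_data.inputs")
    missing = [name for name in required if name not in values]
    if missing:
        raise ValueError(f"missing required inputs: {', '.join(missing)}")
    return values


if __name__ == "__main__":
    request = {
        "type": "tool",
        "tool_data": {
            "tool_id": "outline-generator",
            "inputs": [
                {"name": "topic", "value": "Lang chain"},
                {"name": "instructional_level", "value": "graduate"},
                {"name": "n_slides", "value": 6},
                {"name": "lang", "value": "en"},
            ],
        },
    }
    print(tool_inputs(request))
```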

Input for the slide generator:
{
  "user": {
    "id": "string",
    "fullName": "string",
    "email": "string"
  },
  "type": "chat",
  "tool_data": {
    "tool_id": "slide-generator",
    "inputs": [
      {
        "name": "slides_titles",
        "value": [
          "Introduction to LangChain: A Conceptual Overview",
          "LangChain Architecture: Modules & Core Components",
          "Advanced LangChain Capabilities: Agents & Memory",
          "Real-World LangChain Applications: Case Studies",
          "Building Your First LangChain Application: A Practical Example",
          "Future Directions and Challenges in LangChain"
        ]
      },
      {
        "name": "topic",
        "value": "Lang chain"
      },
      {
        "name": "instructional_level",
        "value": "graduate"
      },
      {
        "name": "file_type",
        "value": ""
      },
      {
        "name": "file_url",
        "value": ""
      },
      {
        "name": "lang",
        "value": "en"
      }
    ]
  }
}
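The slide generator takes a list of slide titles (presumably taken from the outline generator's result) plus the same topic, level, file, and language inputs. The two calls can therefore be chained by building the second payload from the first. The helper below is hypothetical. It reproduces the field names and order of the example above, including `"type": "chat"` exactly as shown.

```python
from typing import Any, Dict, List


def slide_generator_request(
    user: Dict[str, str],
    slides_titles: List[str],
    topic: str,
    instructional_level: str,
    lang: str = "en",
    file_type: str = "",
    file_url: str = "",
) -> Dict[str, Any]:
    """Build a slide-generator request shaped like the example payload (hypothetical helper)."""
    inputs = {
        "slides_titles": slides_titles,
        "topic": topic,
        "instructional_level": instructional_level,
        "file_type": file_type,
        "file_url": file_url,
        "lang": lang,
    }
    return {
        "user": user,
        "type": "chat",  # copied from the example payload
        "tool_data": {
            "tool_id": "slide-generator",
            # dicts keep insertion order, so inputs appear in the order above
            "inputs": [{"name": name, "value": value} for name, value in inputs.items()],
        },
    }
```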

Loom video: https://www.loom.com/share/5c82751cd6224410a88d7046b88d8d29?sid=12145d25-99d2-406e-b7fb-6e23bf36893b

@irfanariyaz
Contributor Author

Generated output

{
  "data": [
    {
      "title": "Introduction to LangChain: A Comprehensive Overview",
      "template": "titleAndBody",
      "content": "Welcome! This presentation explores LangChain, a powerful framework for developing applications powered by large language models (LLMs). We'll cover its core components, practical applications, and advanced techniques, equipping you with the knowledge to build robust LLM-driven solutions. LangChain simplifies the complexities of LLM integration, enabling efficient development and deployment of sophisticated applications.",
      "image_url": null
    },
    {
      "title": "Core Components of LangChain: Modules & Architectures",
      "template": "titleAndBullets",
      "content": [
        "LLMs: Integration with various LLMs (OpenAI, Hugging Face, etc.)",
        "Prompts: Techniques for crafting effective prompts for LLMs.",
        "Indexes: Structuring and accessing external data for LLMs.",
        "Chains: Combining multiple components to create complex workflows.",
        "Agents: Enabling LLMs to interact with external tools and APIs.",
        "Memory: Maintaining context across multiple interactions with an LLM."
      ],
      "image_url": "https://storage.googleapis.com/marvel-ai-firebase.firebasestorage.app/slides/Lang_chain/slide_1.png"
    },
    {
      "title": "LangChain in Action: Practical Use Cases and Examples",
      "template": "titleAndBullets",
      "content": [
        "Chatbots: Building conversational AI agents.",
        "Question Answering Systems: Creating systems that answer questions from various data sources.",
        "Summarization: Generating concise summaries of lengthy documents.",
        "Data Analysis: Using LLMs for insightful data interpretation.",
        "Creative Writing Assistants: Aiding writers with idea generation and text refinement."
      ],
      "image_url": "https://storage.googleapis.com/marvel-ai-firebase.firebasestorage.app/slides/Lang_chain/slide_2.png"
    },
    {
      "title": "Advanced LangChain Techniques: Memory & Agents",
      "template": "twoColumn",
      "content": {
        "leftColumn": "Memory: Explore different memory types (ConversationBufferMemory, ConversationSummaryMemory) and their impact on maintaining context in long conversations. Discuss challenges and best practices for managing context across multiple interactions.",
        "rightColumn": "Agents: Examine different agent types (ZeroShotAgent, ToolAgent) and their capabilities. Illustrate how agents can enhance LLM applications by enabling interaction with external tools and APIs. Discuss real-world examples."
      },
      "image_url": null
    },
    {
      "title": "Building Robust LLM Applications with LangChain",
      "template": "titleAndBody",
      "content": "This section focuses on best practices for building robust and scalable LLM applications using LangChain. We will discuss topics such as error handling, efficient data management, prompt engineering strategies, and deployment considerations. Real-world examples of successful deployments will be presented.",
      "image_url": null
    },
    {
      "title": "Future Trends and Challenges in LangChain",
      "template": "titleAndBullets",
      "content": [
        "Improved Agent Capabilities: More sophisticated agents with enhanced reasoning and decision-making abilities.",
        "Enhanced Memory Management: More efficient and scalable memory solutions for complex applications.",
        "Integration with other Frameworks: Seamless integration with other AI and machine learning tools.",
        "Addressing Ethical Concerns: Developing responsible and ethical LLM applications.",
        "Standardization and Interoperability: Establishing standards to ensure compatibility and interoperability."
      ],
      "image_url": null
    }
  ]
}
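The `image_url` values in this output follow the pattern `https://storage.googleapis.com/<bucket>/slides/<topic>/slide_<index>.png`. Spaces in the topic become underscores (`Lang chain` becomes `Lang_chain`), and the slide index appears to be zero-based (`slide_1.png` belongs to the second slide). The sketch below reproduces that pattern. It is inferred from this output rather than taken from the PR's code. Note that a URL of this form only resolves if the object is publicly readable.

```python
from urllib.parse import quote


def slide_image_path(topic: str, index: int, ext: str = "png") -> str:
    """Object path inferred from the output, e.g. slides/Lang_chain/slide_1.png."""
    folder = "_".join(topic.split())  # collapse whitespace runs into single underscores
    return f"slides/{folder}/slide_{index}.{ext}"


def public_url(bucket: str, object_path: str) -> str:
    """Public URL of a Cloud Storage object; quote() escapes unsafe chars but keeps '/'."""
    return f"https://storage.googleapis.com/{bucket}/{quote(object_path)}"


if __name__ == "__main__":
    bucket = "marvel-ai-firebase.firebasestorage.app"
    print(public_url(bucket, slide_image_path("Lang chain", 1)))
```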

buriihenry self-assigned this on Mar 27, 2025
buriihenry self-requested a review on March 27, 2025 at 08:32
Contributor

buriihenry left a comment


Since there is a quota limit when generating the images and storing them in Firebase, is it possible to store them in a Google Cloud Storage bucket instead?

@irfanariyaz
Contributor Author

I will try storing them in a Google Cloud Storage bucket and update you.

@irfanariyaz
Contributor Author

I hit the same issue with GCS. The error says: 429 Quota exceeded for aiplatform.googleapis.com/online_prediction_requests_per_base_model with base model: imagen-3.0-generate
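The quota named in this error, aiplatform.googleapis.com/online_prediction_requests_per_base_model, limits Imagen prediction requests on Vertex AI. Moving storage from Firebase to GCS would therefore not be expected to avoid it. Common mitigations are sending fewer concurrent requests, retrying with exponential backoff, or requesting a quota increase. A minimal backoff sketch follows. `QuotaExceeded` is a local stand-in for the client library's error (in Google's Python clients an HTTP 429 usually surfaces as `google.api_core.exceptions.ResourceExhausted`), and the delay values are assumptions.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class QuotaExceeded(Exception):
    """Local stand-in for a 429 error such as google.api_core.exceptions.ResourceExhausted."""


def with_backoff(
    call: Callable[[], T],
    retries: int = 4,
    base_delay: float = 2.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call`, retrying on QuotaExceeded with exponential backoff plus jitter.

    Makes at most retries + 1 attempts; the last QuotaExceeded is re-raised.
    """
    if retries < 0:
        raise ValueError("retries must be >= 0")
    for attempt in range(retries + 1):
        try:
            return call()
        except QuotaExceeded:
            if attempt == retries:
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))  # jitter spreads out retries
    raise AssertionError("unreachable")


def flaky(failures: int, result: T) -> Callable[[], T]:
    """Demo helper: raises QuotaExceeded `failures` times, then returns `result`."""
    remaining = [failures]

    def call() -> T:
        if remaining[0] > 0:
            remaining[0] -= 1
            raise QuotaExceeded("429 Quota exceeded")
        return result

    return call
```

Backoff only smooths over short bursts. If the per-minute quota is simply lower than the number of slides requested, fewer image requests per deck or a higher quota is still needed.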

Contributor


Is it possible to avoid editing this file? Instead, can we have a schemas.py file inside tools/presentation_generator?
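As a sketch of what such a module might hold, the models below mirror the slide fields in the generated output shown earlier. The class names echo the test names test_slide_model() and test_slide_presentation_model() but are otherwise hypothetical. Standard-library dataclasses are used only to keep the sketch dependency-free, and the project's real schemas may use a different library.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional, Union

SlideContent = Union[str, List[str], Dict[str, str]]

# Expected content type per template, as seen in the generated output.
CONTENT_TYPES = {"titleAndBody": str, "titleAndBullets": list, "twoColumn": dict}


@dataclass
class Slide:
    title: str
    template: str
    content: SlideContent
    image_url: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject unknown templates and content that does not match the template.
        expected = CONTENT_TYPES.get(self.template)
        if expected is None:
            raise ValueError(f"unknown template: {self.template!r}")
        if not isinstance(self.content, expected):
            raise ValueError(f"{self.template} expects {expected.__name__} content")


@dataclass
class SlidePresentation:
    data: List[Slide] = field(default_factory=list)


if __name__ == "__main__":
    deck = SlidePresentation([Slide("Intro", "titleAndBody", "Welcome!")])
    print(asdict(deck))
```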

Contributor Author


Yes, that is possible. I can refactor that.

Dockerfile (outdated)
Contributor


There's no need to commit this Dockerfile. It should remain local, since it contains configurations and images that shouldn't be pushed to the remote branch.

Contributor Author


OK, I'll remove this.

@irfanariyaz
Contributor Author

I've made the changes mentioned above.

@stevenrayhinojosa-gmail-com
Contributor

I've thoroughly tested your PR for the presentation generator feature and have identified several issues that need to be addressed before it can be merged.

Current Status

The PR appears to be a work in progress with several incomplete components:

  • Missing prompt files: The slide generator is looking for a prompt file that doesn't exist, slide_generator_prompt_batch.txt. This causes most of the tests to fail with FileNotFoundError.
  • Import errors: The outline generator tests are failing because they're trying to import OutlineGeneratorInput from app.services.schemas, but this class doesn't exist in that module.
  • Test failures: The original presentation generator tests are failing with document loading errors. The updated slide generator tests have assertion errors due to mismatched parameters.
  • Incomplete implementation: The directory structure for the updated presentation generator is in place, but some implementation files appear to be incomplete.
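A missing prompt file like this can be caught before any test runs with a small startup check. The sketch below is hypothetical. The helper names are invented, and only the file name comes from this review. In a real module the directory would usually be resolved relative to the source file, for example `Path(__file__).parent / "prompt"`.

```python
from pathlib import Path
from typing import Iterable, List

# File name taken from the review; the list of required prompts is an assumption.
REQUIRED_PROMPTS = ("slide_generator_prompt_batch.txt",)


def missing_prompts(prompt_dir: Path, names: Iterable[str] = REQUIRED_PROMPTS) -> List[str]:
    """Return the required prompt file names that do not exist in prompt_dir."""
    return [name for name in names if not (prompt_dir / name).is_file()]


def load_prompt(prompt_dir: Path, name: str) -> str:
    """Read a prompt file, failing with a message that lists what is available."""
    path = prompt_dir / name
    if not path.is_file():
        available = sorted(p.name for p in prompt_dir.glob("*.txt")) if prompt_dir.is_dir() else []
        raise FileNotFoundError(f"prompt {name!r} not found in {prompt_dir}; available: {available}")
    return path.read_text(encoding="utf-8")
```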
Recommendations

To make this PR ready for merging, I recommend the following steps:

  • Fix missing files: Create the missing prompt file app/tools/presentation_generator_updated/slide_generator/prompt/slide_generator_prompt_batch.txt, and ensure all required prompt files are included in the repository.
  • Fix import issues: Update the import paths in the tests to use the correct modules, and ensure all required schema classes are defined in the appropriate modules.
  • Fix test assertions: Update the test assertions to match the actual behavior of the code. In particular, fix the test_imagen_generate_image test to match the actual parameters being used.
  • Complete implementation: Finish implementing any incomplete components, and ensure all required functionality is properly implemented and tested.
  • Documentation: Add documentation explaining the new two-component approach, and include examples of how to use the outline generator and slide generator together.

I appreciate the work you've done so far on this feature. The architectural approach of separating the outline generation and slide generation is sound, but the implementation needs to be completed before the PR can be merged.
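For the test_imagen_generate_image() mismatch, one way to keep the assertion tied to the parameters the code actually sends is to mock the model object and inspect `call_args.kwargs` for the values that matter, rather than hard-coding a full argument list. The `ImageGenerator` below is a hypothetical stand-in, not the PR's class, and the keyword names it passes are assumptions.

```python
from unittest import mock


class ImageGenerator:
    """Hypothetical stand-in for the PR's ImageGenerator; `model` wraps the Imagen client."""

    def __init__(self, model, aspect_ratio: str = "16:9"):
        self.model = model
        self.aspect_ratio = aspect_ratio

    def generate(self, prompt: str):
        # Keyword names are assumptions chosen for illustration.
        return self.model.generate_images(
            prompt=prompt, number_of_images=1, aspect_ratio=self.aspect_ratio
        )


def test_imagen_generate_image():
    model = mock.Mock()
    model.generate_images.return_value = ["fake-image"]

    result = ImageGenerator(model).generate("Diagram of LangChain modules")

    assert result == ["fake-image"]
    model.generate_images.assert_called_once()
    # Assert only the parameters this test is about, read from the actual call.
    kwargs = model.generate_images.call_args.kwargs
    assert kwargs["prompt"] == "Diagram of LangChain modules"
    assert kwargs["aspect_ratio"] == "16:9"
```

Checking individual keyword arguments keeps the test from breaking when an unrelated parameter, such as `number_of_images`, changes.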
