

ArtimisJ26

Description

This project implements an AI-powered image generation system for presentation slides. It uses Gemini 1.5 Pro to write detailed image prompts for each slide and Black Forest Labs' FLUX.1-schnell model, served through Together.ai, to generate the images.

Key Features:

  • ImagePromptGenerator class creates context-aware image descriptions using Gemini.
  • ImageGenerator class handles image creation via Together.ai.
  • Comprehensive error handling, logging, and validation mechanisms.
  • Supports presentation-optimized 1024x768 resolution images with customizable parameters.
  • Google Cloud Storage integration for persistence and accessibility.

Motivation: Enhance presentation generation by automatically creating relevant, high-quality images for slides, improving visual engagement while maintaining contextual relevance.

Type of Change


  • New feature: A non-breaking change that adds functionality.

Proposed Solution

Implementation Steps:

1. Initial Processing (executor function)

  • Validates presentation content.
  • Extracts slides list from input dictionary.
  • Creates ImageGeneratorInput with the correct format.
  • Can be called in two ways:
    1. As a standalone tool via /submit-tool endpoint using tool ID "image-generator".
    2. Integrated within SlideGenerator through image_executor call in generate_slides method.
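A minimal sketch of the validation and extraction in step 1, assuming the input dictionary carries a `slides` list whose entries each have a `title`. `ImageGeneratorInput` here is a hypothetical dataclass standing in for the real schema in services/schemas.py, and the real executor signature may differ.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ImageGeneratorInput:
    # Hypothetical stand-in for the ImageGeneratorInput schema; real field names may differ.
    slides: list[dict[str, Any]] = field(default_factory=list)


def executor(presentation_content: dict[str, Any]) -> ImageGeneratorInput:
    """Validate presentation content and wrap its slides for the image generator."""
    if not isinstance(presentation_content, dict):
        raise TypeError("presentation_content must be a dict")
    slides = presentation_content.get("slides")
    if not isinstance(slides, list) or not slides:
        raise ValueError("presentation_content must contain a non-empty 'slides' list")
    # Every slide needs a title because titles later key the prompt and URL dictionaries.
    for index, slide in enumerate(slides):
        if not isinstance(slide, dict) or not slide.get("title"):
            raise ValueError(f"slide {index} is missing a 'title'")
    return ImageGeneratorInput(slides=slides)
```

Failing fast here keeps malformed requests from reaching the paid Gemini and Together.ai calls further down the pipeline.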

2. Prompt Generation (ImagePromptGenerator class)

  • Uses generate_image_prompt method to create prompts for each slide.
  • Compiles prompt template using the compile method.
  • Ensures generated prompts are not empty.
  • Returns a dictionary mapping slide titles to prompts.
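The prompt step can be sketched as a loop that fills a template per slide, asks the model for a prompt, and rejects empty answers. This is an illustration only: `str.format` stands in for the PR's compile method, and `llm` is a hypothetical callable in place of the Gemini client.

```python
from typing import Callable


def generate_image_prompts(
    slides: list[dict[str, str]],
    template: str,
    llm: Callable[[str], str],
) -> dict[str, str]:
    """Return a mapping of slide title -> image prompt, rejecting empty prompts."""
    prompts: dict[str, str] = {}
    for slide in slides:
        # Fill the template with slide context (stand-in for the compile method).
        request = template.format(title=slide["title"], content=slide.get("content", ""))
        prompt = llm(request).strip()
        if not prompt:
            raise ValueError(f"empty image prompt generated for slide {slide['title']!r}")
        prompts[slide["title"]] = prompt
    return prompts
```

Because the result is keyed by slide title, later stages can attach image URLs to the same keys.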

3. Image Generation Chain Creation

  • Uses create_image_generation_chain to set up parallel processing.
  • Implements RunnableParallel for concurrent image generation.
  • Each image task runs independently.
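The PR uses LangChain's RunnableParallel for this fan-out. To show the same idea without a LangChain dependency, the sketch below swaps in `concurrent.futures.ThreadPoolExecutor`: one task per slide, with each task catching its own exception so a single failed image does not abort the rest. `generate_one` is a hypothetical callable standing in for the image generator.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def generate_images_in_parallel(
    prompts: dict[str, str],
    generate_one: Callable[[str], str],
    max_workers: int = 4,
) -> dict[str, dict[str, str]]:
    """Run one independent task per slide; a failure in one task does not affect the others."""

    def run(prompt: str) -> dict[str, str]:
        try:
            return {"status": "success", "result": generate_one(prompt)}
        except Exception as exc:  # isolate per-slide failures
            return {"status": "failed", "error": str(exc)}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {title: pool.submit(run, prompt) for title, prompt in prompts.items()}
        return {title: future.result() for title, future in futures.items()}
```

Threads suit this workload because each task spends most of its time waiting on network I/O rather than computing.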

4. Image Generation (ImageGenerator class)

  • Generates images using Together.ai API.
  • Configuration:
    • Model: "black-forest-labs/FLUX.1-schnell"
    • Resolution: 1024x768
    • Inference steps: 4
    • Output format: b64_json
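The configuration above maps onto a single Together.ai request. This is a sketch that assumes `client` behaves like `together.Together()`, whose `images.generate(...)` returns an object exposing `data[0].b64_json`. The client is passed in so it can be swapped for a mock in tests.

```python
import base64
from typing import Any

# Values taken from the configuration listed above.
FLUX_SETTINGS = {
    "model": "black-forest-labs/FLUX.1-schnell",
    "width": 1024,
    "height": 768,
    "steps": 4,
    "n": 1,
    "response_format": "b64_json",
}


def generate_single_image(client: Any, prompt: str) -> bytes:
    """Request one image and return its decoded bytes."""
    response = client.images.generate(prompt=prompt, **FLUX_SETTINGS)
    encoded = response.data[0].b64_json
    if not encoded:
        raise RuntimeError("Together.ai returned no image data")
    return base64.b64decode(encoded)
```

In real use the client would be created with `from together import Together` and `client = Together()`, which reads `TOGETHER_API_KEY` from the environment.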

5. Storage and Response Handling

  • Saves generated images to Google Cloud Storage.
  • Creates public URLs for each image.
  • Returns a dictionary with:
    • Status ("success" or "failed").
    • Image URLs mapped to slide titles/content titles.
    • Original prompts and any error messages.
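Saving to GCS and returning a public URL might look like the sketch below. It assumes `bucket` behaves like `google.cloud.storage.Bucket` (for example `storage.Client().bucket(os.environ["GCS_BUCKET_NAME"])`). The `slide_images/` prefix, the naming scheme, and the PNG default are hypothetical choices. Note that `make_public()` sets an object ACL, which fails on buckets with uniform bucket-level access enabled; such buckets need an IAM grant instead.

```python
import uuid
from typing import Any


def save_image_to_gcs(
    bucket: Any,
    image_bytes: bytes,
    slide_title: str,
    content_type: str = "image/png",  # assumption: the actual image format may differ
) -> str:
    """Upload image bytes and return the blob's public URL."""
    # Hypothetical naming scheme: sanitised title plus a random suffix to avoid collisions.
    safe_title = "".join(c if c.isalnum() else "_" for c in slide_title.lower())
    extension = content_type.split("/")[-1]
    blob = bucket.blob(f"slide_images/{safe_title}_{uuid.uuid4().hex}.{extension}")
    blob.upload_from_string(image_bytes, content_type=content_type)
    blob.make_public()
    return blob.public_url
```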

The entire process is orchestrated by image_generation_handler, which runs these stages in order and handles errors raised along the way.
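A simplified sketch of how a handler could assemble the response described in step 5. `generate` and `save` are hypothetical callables for the Together.ai and GCS stages. The loop is sequential for brevity, and treating any per-slide error as an overall "failed" status is an assumption; the real handler may report partial success differently.

```python
from typing import Callable


def image_generation_handler(
    prompts: dict[str, str],
    generate: Callable[[str], bytes],
    save: Callable[[bytes, str], str],
) -> dict[str, object]:
    """Generate and store one image per slide, collecting per-slide errors."""
    image_urls: dict[str, str] = {}
    errors: dict[str, str] = {}
    for title, prompt in prompts.items():
        try:
            image_urls[title] = save(generate(prompt), title)
        except Exception as exc:
            errors[title] = str(exc)
    return {
        "status": "success" if image_urls and not errors else "failed",
        "image_urls": image_urls,
        "prompts": prompts,
        "errors": errors,
    }
```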

How to Test

Run the test suite in:

marvel-ai-backend\app\tools\presentation_generator_updated\image_generator\tests\

pytest test_tools.py test_core.py

Key Test Cases:

  • Basic image generation with valid inputs.
  • Error handling for invalid inputs.
  • GCS storage integration testing.
  • Parallel processing validation.
  • End-to-end integration test.

The test suite uses mock clients for Together.ai and Google Cloud Storage.
Environment variables for testing are configured in conftest.py.
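The mocking pattern can be sketched with only the standard library. `generate_single_image` below is a hypothetical function under test, not the PR's actual method. `mock.patch.dict` supplies fake environment variables and restores `os.environ` afterwards, and a `MagicMock` stands in for the Together.ai client so no network call is made.

```python
import base64
import os
from unittest import mock


def generate_single_image(client, prompt: str) -> bytes:
    # Hypothetical function under test; the real ImageGenerator method may differ.
    if not os.environ.get("TOGETHER_API_KEY"):
        raise RuntimeError("TOGETHER_API_KEY is not set")
    response = client.images.generate(prompt=prompt, model="black-forest-labs/FLUX.1-schnell")
    return base64.b64decode(response.data[0].b64_json)


def test_generate_single_image_success() -> None:
    # Fake credentials; patch.dict restores os.environ when the block exits.
    with mock.patch.dict(os.environ, {"TOGETHER_API_KEY": "test-key"}):
        client = mock.MagicMock()  # stands in for together.Together()
        client.images.generate.return_value.data = [
            mock.MagicMock(b64_json=base64.b64encode(b"fake-image").decode())
        ]
        assert generate_single_image(client, "a bar chart") == b"fake-image"
        client.images.generate.assert_called_once_with(
            prompt="a bar chart", model="black-forest-labs/FLUX.1-schnell"
        )


def test_generate_single_image_without_key() -> None:
    # clear=True empties os.environ for the duration of the block.
    with mock.patch.dict(os.environ, {}, clear=True):
        try:
            generate_single_image(mock.MagicMock(), "a bar chart")
        except RuntimeError:
            return
        raise AssertionError("expected RuntimeError when TOGETHER_API_KEY is missing")
```

Under pytest, the same environment setup could instead live in a conftest.py fixture using `monkeypatch.setenv`.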

Expected Outcomes:

  • All tests should pass without errors.
  • Generated images should be stored in GCS with public URLs.
  • Error cases should be properly handled and logged.

Unit Tests

Added Unit Tests (marvel-ai-backend\app\tools\presentation_generator_updated\image_generator\tests\):

test_tools.py

  • test_init: Verifies correct ImageGenerator initialization and configuration.
  • test_generate_single_image_success: Tests successful image generation pipeline.
  • test_generate_single_image_failure: Validates error handling for failed generations.
  • test_save_image_to_gcs: Tests Google Cloud Storage integration.

test_core.py

  • test_executor: Validates main executor function with presentation content.
  • test_executor_with_invalid_input: Tests error handling for invalid inputs.
  • test_parallel_processing: Verifies concurrent image generation functionality.

Note: All tests use mock objects for Together.ai client and Google Cloud Storage to ensure reliable testing without external dependencies.

Documentation Updates


  • Yes

A new setup guide needs to be added to the documentation. Here are the specific updates needed:

Image Generator Setup Guide

1. Prerequisites and API Keys

Users need to obtain:

  • Together.ai API key for accessing the FLUX.1-schnell model
  • Google AI Studio API key for the Gemini 1.5 Pro model
  • Google Cloud Project credentials for storage functionality

2. Environment Configuration

Create or update .env file with:

TOGETHER_API_KEY=your_together_api_key
GOOGLE_API_KEY=your_google_ai_studio_key
GOOGLE_CLOUD_PROJECT=your-project-id
GCS_BUCKET_NAME=your-bucket-name
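A .env file is not read into `os.environ` automatically; it is typically loaded first, for example with python-dotenv's `load_dotenv()`. Once loaded, a small settings loader can fail fast when a variable is missing. The `ImageGeneratorSettings` class and `load_settings` function below are hypothetical names for illustration.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass

REQUIRED_VARS = ("TOGETHER_API_KEY", "GOOGLE_API_KEY", "GOOGLE_CLOUD_PROJECT", "GCS_BUCKET_NAME")


@dataclass(frozen=True)
class ImageGeneratorSettings:
    together_api_key: str
    google_api_key: str
    google_cloud_project: str
    gcs_bucket_name: str


def load_settings(environ: Mapping[str, str] = os.environ) -> ImageGeneratorSettings:
    """Read the required variables, reporting every missing one at once."""
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    # Field order matches REQUIRED_VARS.
    return ImageGeneratorSettings(*(environ[name] for name in REQUIRED_VARS))
```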

3. Google Cloud Platform Setup

  1. Create a new Google Cloud Project (or use an existing one)
  2. Enable Cloud Storage API in the project
  3. Create a service account with Storage Admin role
  4. Download the service account key JSON
  5. Set the credentials environment variable:
export GOOGLE_APPLICATION_CREDENTIALS="path/to/service-account-key.json"

4. Project Files Structure

Ensure prompt templates exist at:

marvel-ai-backend/app/tools/presentation_generator_updated/image_generator/prompt/
├── visual_prompt.txt
└── theme_prompt.txt

5. Dependencies Installation

pip install together langchain-google-genai google-cloud-storage

6. Setup Verification Checklist

  • Together.ai API key configured
  • Google AI Studio API key set up
  • GCS bucket created and configured
  • Service account credentials set up
  • Prompt template files in place
  • Environment variables set
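Most of the checklist above can be checked locally with a short script. This sketch, with a hypothetical `verify_setup` function, checks the environment variables, the service account key path, and the two prompt template files. It takes the prompt directory as a parameter and does not contact Together.ai, Gemini, or GCS, so it cannot confirm that the bucket exists or that the keys are valid.

```python
import os
from collections.abc import Mapping
from pathlib import Path


def verify_setup(prompt_dir: Path, environ: Mapping[str, str] = os.environ) -> list[str]:
    """Return a list of problems; an empty list means the local checks pass."""
    problems = []
    for name in ("TOGETHER_API_KEY", "GOOGLE_API_KEY", "GOOGLE_CLOUD_PROJECT", "GCS_BUCKET_NAME"):
        if not environ.get(name):
            problems.append(f"environment variable {name} is not set")
    credentials = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not credentials or not Path(credentials).is_file():
        problems.append("GOOGLE_APPLICATION_CREDENTIALS does not point to a key file")
    for template in ("visual_prompt.txt", "theme_prompt.txt"):
        if not (prompt_dir / template).is_file():
            problems.append(f"missing prompt template {prompt_dir / template}")
    return problems
```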

7. Common Issues and Solutions

  • Image generation failures: Verify Together.ai API key and quota
  • Storage errors: Check GCS permissions and bucket configuration
  • Prompt generation issues: Validate Google AI Studio API key and quota

Links to documentation:

Checklist

  • I have performed a self-review of my code.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have made corresponding changes to the documentation.
  • My changes generate no new warnings.
  • I have added tests that prove my fix is effective or that my feature works.
  • New and existing unit tests pass locally with my changes.
  • Any dependent changes have been merged and published in downstream modules.

Additional Information

Files Added

New files created in marvel-ai-backend/app/tools/presentation_generator_updated/

marvel-ai-backend
├── app/tools/presentation_generator_updated/image_generator/
│   ├── prompts/
│   │   ├── theme_prompt.txt    # Prompt templates for image generation
│   │   └── visual_prompt.txt   # Prompt templates for image generation     
│   ├── tests/
│   │   ├── conftest.py         # Test configurations
│   │   ├── test_core.py        # Core functionality tests
│   │   └── test_tools.py       # Generator classes tests
│   ├── core.py                 # Main executor and pipeline logic
│   ├── metadata.json           # Metadata for the tool
│   └── tools.py                # ImageGenerator and ImagePromptGenerator classes
│
└── docs/
    ├── api/
    │   └── image-generator.md    # API integration guide
    └── setup/
        └── image-generator.md    # Full setup guide

Files Changed

Modified Existing Files:

  1. slide_generator/core.py

    • Added image generation integration.
    • Updated slide generation pipeline.
  2. slide_generator/tools.py

    • Added ImageGenerator client initialization.
    • Integrated image generation with slide creation.
  3. utils/tools_config.json

    • Added image-generator tool configuration.
    • Updated path mappings.
  4. services/schemas.py

    • Added ImageGeneratorInput schema.

@buriihenry buriihenry self-assigned this Apr 3, 2025
Contributor

@buriihenry buriihenry left a comment

Before approving this PR, please remove the saved images. There is no need to commit them.

Contributor

There is no need to commit the image.

@ArtimisJ26
Author

I have removed those old generated image files.

Contributor

@buriihenry buriihenry left a comment

Did you manage to implement it?

@stevenrayhinojosa-gmail-com
Contributor

Thank you for working on the presentation generator update (mission-2). I've reviewed the current state of your PR branch and have some feedback:

Current Status

The PR appears to be in an early stage of development. I can see the directory structure has been set up for the new two-component approach (outline_generator and slide_generator), but the implementation files (core.py, tools.py) are missing from these directories.

Positive Aspects

  • The directory structure is well-organized, separating the outline generation and slide generation concerns.
  • The metadata.json files are in place with appropriate input definitions.
  • The tools_config.json has been updated to include the new components.
  • The schema definitions in app/services/schemas.py are well-designed for both components.

Areas for Completion

Before this PR can be fully tested and reviewed, the following items need to be addressed:

  • Implementation Files: Add the core implementation files (core.py, tools.py) to both the outline_generator and slide_generator directories.
  • Test Files: Add test_core.py files to the tests directories to ensure proper test coverage.
  • Documentation: Consider adding README.md files to explain the purpose and usage of each component.
  • API Integration: Consider adding dedicated API endpoints for these components.

Suggestions

  • Consider implementing a workflow that allows the output of the outline_generator to be used directly as input for the slide_generator.
  • Add examples of how to use both components together.
  • Include sample outputs in the documentation.

I look forward to reviewing the completed implementation. The two-component approach is a promising architectural decision that should lead to more flexible and powerful presentation generation capabilities.

Contributor

You don't need to commit this file.

3 participants