Image generator #206

theoteske · 2025-04-22T05:00:13Z

Description

This PR introduces a new Image Generator tool that creates high-quality educational images from text prompts. The tool leverages Black Forest Labs' Flux 1.1 Pro API for image generation and includes Google Cloud Storage integration for persistent image storage. It features educational context enhancement, content type detection and subsequent prompt enhancement, and safety filtering to ensure generated images are high quality and appropriate for educational use.

I have included a Loom video walkthrough.

Related Issue

N/A

Type of Change

Please select the type(s) of change that apply and delete those that do not.

New feature: A non-breaking change that adds functionality.

Proposed Solution

The Image Generator tool is designed to create educational images with the following key components:

Core Image Generation: Use Black Forest Labs' Flux 1.1 Pro API to generate high-quality images from text prompts. Calling the Black Forest Labs API directly, rather than going through another provider (such as together.ai), lets us request a higher rate limit if necessary. As stated in their documentation, we can reach out to [email protected] if we want to perform higher volumes of API calls. We poll for the image at most 30 times, and if an image is returned, it is stored as a base64-encoded string.
Educational Context Enhancement: Use Gemini to detect the educational subject and grade level if the user does not provide them, then enhance the prompt with that educational context so that the generated image is suitable for the students' age and relevant to the desired subject area.
Content Type Detection: Automatically identify the type of educational content being requested (diagrams, concepts, processes, etc.) from keywords in the prompt, or use Gemini to infer the content type if none of the keywords appear. Once the content type is known, a specialized prompt template for that content type, with more detailed instructions for the image generator, is appended to the base prompt. For example, if the prompt is found to be asking for a diagram, detailed instructions regarding proper labeling, inclusion of a key, etc., are appended to the base prompt. Empirically, this step greatly improved the quality of the generated images.
Safety Filtering: As with content type detection, if certain unsafe keywords appear, the prompt is immediately marked as unsafe and a verbose error is raised. If none of the keywords appear, Gemini infers whether or not the prompt is safe. If the safety check itself fails for some reason, the current default is to treat the prompt as safe, but this could be changed depending on which behavior is more desirable.
Google Cloud Storage Integration: Automatically store generated images in a GCP bucket for persistent access via public URLs. Storage is optional, so the implementation degrades gracefully in environments without GCP configuration. Path handling is included to support running in Docker containers with mounted credential files for the service account.
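
As a rough illustration of two of the components above, bounded polling and keyword-first content type detection might be sketched as follows. This is not the code in this PR: the function names, the "Ready"/"result" status fields, the keyword lists, and the template strings are all assumptions made for the example. Only the limit of 30 polls and the "keywords first, then fall back to an LLM" order come from the description.

```python
import time
from typing import Callable, Optional

MAX_POLL_ATTEMPTS = 30  # limit stated in the PR description


def poll_for_image(fetch_status: Callable[[], dict],
                   max_attempts: int = MAX_POLL_ATTEMPTS,
                   delay_seconds: float = 1.0) -> Optional[str]:
    """Call fetch_status until it reports a finished image or attempts run out.

    fetch_status is a hypothetical callable wrapping the provider's
    "get result" request; the "status" and "result" keys are assumed.
    """
    for _ in range(max_attempts):
        status = fetch_status()
        if status.get("status") == "Ready":
            return status.get("result")
        time.sleep(delay_seconds)
    return None  # caller decides how to report the timeout


# Hypothetical keyword map and templates, for illustration only.
CONTENT_KEYWORDS = {
    "diagram": ("diagram", "chart", "labeled"),
    "process": ("process", "cycle", "steps"),
}
CONTENT_TEMPLATES = {
    "diagram": "clearly labeled parts, include a key",
    "process": "show each stage in order with arrows",
}


def detect_content_type(prompt: str, llm_fallback: Callable[[str], str]) -> str:
    """Return a content type from keywords, else ask the fallback classifier."""
    lowered = prompt.lower()
    for content_type, keywords in CONTENT_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return content_type
    return llm_fallback(prompt)  # e.g. a Gemini call in the real tool


def apply_template(prompt: str, content_type: str) -> str:
    """Append the template for content_type, if one is defined."""
    template = CONTENT_TEMPLATES.get(content_type)
    return f"{prompt}. {template}" if template else prompt
```

Passing the status fetcher and the LLM fallback in as callables keeps both functions testable without network access.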

How to Test

Basic Image Generation

Test Configuration:
Use the FastAPI Swagger UI at http://localhost:8000/docs to test the /submit-tool endpoint with the following JSON:

{
  "user": {
    "id": "test-user",
    "fullName": "Test User",
    "email": "[email protected]"
  },
  "type": "tool",
  "tool_data": {
    "tool_id": "image-generator",
    "inputs": [
      {
        "name": "prompt",
        "value": "A diagram of the solar system"
      },
      {
        "name": "subject",
        "value": "astronomy"
      },
      {
        "name": "grade_level",
        "value": "middle school"
      },
      {
        "name": "lang",
        "value": "en"
      }
    ]
  }
}

Expected Outcome:
The response should include:

An image_b64 field containing a base64-encoded image
A prompt_used field showing the enhanced prompt
An educational_context field with "astronomy for middle school level"
A safety_applied field set to true
Example response:

{
  "image_b64": "iVBORw0KGgoAAAANSUhEUgAA...",
  "prompt_used": "A diagram of the solar system, educational context: astronomy for middle school level",
  "educational_context": "astronomy for middle school level",
  "safety_applied": true
}

If GCP storage is configured, the response will also include a gcp_url field with a public URL to the stored image.
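
For scripted testing outside the Swagger UI, the request body above can be assembled and the image_b64 field decoded with a small helper. This is a sketch: build_payload and decode_image are hypothetical names, not functions from this PR, and the user dictionary is supplied by the caller.

```python
import base64
from typing import Optional


def build_payload(prompt: str, user: dict, subject: Optional[str] = None,
                  grade_level: Optional[str] = None, lang: str = "en") -> dict:
    """Build a /submit-tool request body in the shape shown above."""
    inputs = [{"name": "prompt", "value": prompt}]
    if subject is not None:
        inputs.append({"name": "subject", "value": subject})
    if grade_level is not None:
        inputs.append({"name": "grade_level", "value": grade_level})
    inputs.append({"name": "lang", "value": lang})
    return {
        "user": user,
        "type": "tool",
        "tool_data": {"tool_id": "image-generator", "inputs": inputs},
    }


def decode_image(response: dict) -> bytes:
    """Decode the base64 image from a successful response into raw bytes."""
    return base64.b64decode(response["image_b64"])
```

The resulting dictionary can be posted as JSON with any HTTP client, and the decoded bytes written to a file to inspect the image.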

Educational Context Enhancement

Test Configuration:

{
  "user": {
    "id": "test-user",
    "fullName": "Test User",
    "email": "[email protected]"
  },
  "type": "tool",
  "tool_data": {
    "tool_id": "image-generator",
    "inputs": [
      {
        "name": "prompt",
        "value": "The process of photosynthesis"
      },
      {
        "name": "subject",
        "value": "biology"
      },
      {
        "name": "grade_level",
        "value": "high school"
      },
      {
        "name": "lang",
        "value": "en"
      }
    ]
  }
}

Expected Outcome:
The response should show that the prompt was enhanced with the appropriate educational context:

The prompt_used field should include "educational context: biology for high school level"
The image should be appropriate for high school biology students
The content should be scientifically accurate and detailed enough for high school level

Try another example with a different subject and grade level:

{
  "user": {
    "id": "test-user",
    "fullName": "Test User",
    "email": "[email protected]"
  },
  "type": "tool",
  "tool_data": {
    "tool_id": "image-generator",
    "inputs": [
      {
        "name": "prompt",
        "value": "Addition and subtraction with fractions"
      },
      {
        "name": "subject",
        "value": "mathematics"
      },
      {
        "name": "grade_level",
        "value": "elementary"
      },
      {
        "name": "lang",
        "value": "en"
      }
    ]
  }
}

The response should show a simpler, more elementary-appropriate enhancement in the prompt_used field.
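
The string format of the enhancement can be read off the example response in the first test. A minimal sketch that reproduces it, assuming the subject and grade level are already known (the function names are hypothetical, and the real tool may add more than this, for example the content-type template):

```python
def build_educational_context(subject: str, grade_level: str) -> str:
    """Format the context string, e.g. 'astronomy for middle school level'."""
    return f"{subject} for {grade_level} level"


def enhance_prompt(prompt: str, subject: str, grade_level: str) -> str:
    """Append the educational context to the base prompt."""
    context = build_educational_context(subject, grade_level)
    return f"{prompt}, educational context: {context}"
```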

Safety Filtering
Test Configuration:

{
  "user": {
    "id": "test-user",
    "fullName": "Test User",
    "email": "[email protected]"
  },
  "type": "tool",
  "tool_data": {
    "tool_id": "image-generator",
    "inputs": [
      {
        "name": "prompt",
        "value": "Violent battle scene with gore and blood"
      },
      {
        "name": "subject",
        "value": "history"
      },
      {
        "name": "grade_level",
        "value": "elementary"
      },
      {
        "name": "lang",
        "value": "en"
      }
    ]
  }
}

Expected Outcome:
The system should reject the unsafe content with an error response:

{
  "status": 400,
  "message": "The prompt contains inappropriate content for educational use"
}

The logs should show:

ERROR - HTTPException: 400: The prompt contains inappropriate content for educational use
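
The two-stage check behind this behavior (keywords first, then an LLM, with a fail-open default when the check itself errors) could look roughly like the sketch below. The keyword list, the UnsafePromptError class, and the llm_check callable are assumptions for illustration; the actual tool raises an HTTPException with status 400, as the log line above shows.

```python
from typing import Callable

# Hypothetical keyword list; the real list lives in the tool's source.
UNSAFE_KEYWORDS = ("gore", "blood", "violent")


class UnsafePromptError(ValueError):
    """Stand-in for the HTTP 400 error raised by the real tool."""


def check_prompt_safety(prompt: str, llm_check: Callable[[str], bool],
                        fail_open: bool = True) -> bool:
    """Return True if the prompt is considered safe."""
    lowered = prompt.lower()
    if any(keyword in lowered for keyword in UNSAFE_KEYWORDS):
        return False  # keyword hit: reject without calling the LLM
    try:
        return llm_check(prompt)  # e.g. a Gemini call in the real tool
    except Exception:
        # The check itself failed; the PR's current default is to allow.
        return fail_open


def ensure_safe(prompt: str, llm_check: Callable[[str], bool]) -> None:
    """Raise UnsafePromptError if the prompt does not pass the check."""
    if not check_prompt_safety(prompt, llm_check):
        raise UnsafePromptError(
            "The prompt contains inappropriate content for educational use"
        )
```

Exposing fail_open as a parameter makes the open question in the description (allow or reject when the check fails) a one-line configuration change. Note that plain substring matching will also flag harmless words that contain a keyword, such as "bloodstream".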

Test with Missing Required Inputs

Test Configuration:

{
  "user": {
    "id": "test-user",
    "fullName": "Test User",
    "email": "[email protected]"
  },
  "type": "tool",
  "tool_data": {
    "tool_id": "image-generator",
    "inputs": []
  }
}

Expected Outcome:
The system should return an error indicating that the prompt is required.

{
  "status": 400,
  "message": "Missing input: `prompt`"
}
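
One way to produce this error is to flatten the inputs list into a dictionary and check for required names before doing any other work. The sketch below is an assumption about the approach, with hypothetical names; only the message format is taken from the expected response above.

```python
from typing import Iterable


class MissingInputError(ValueError):
    """Stand-in for the HTTP 400 error returned by the real endpoint."""


def inputs_to_dict(inputs: Iterable[dict]) -> dict:
    """Convert [{'name': ..., 'value': ...}, ...] into {name: value}."""
    return {item["name"]: item["value"] for item in inputs}


def require_inputs(inputs: Iterable[dict], required=("prompt",)) -> dict:
    """Return the inputs as a dict, raising if a required one is absent or empty."""
    values = inputs_to_dict(inputs)
    for name in required:
        if not values.get(name):
            raise MissingInputError(f"Missing input: `{name}`")
    return values
```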

Unit Tests

The implementation includes comprehensive test coverage:
Core Functionality Tests:

test_executor: Tests the main executor function
test_executor_without_gcp: Tests behavior without GCP configuration
test_executor_missing_inputs: Tests error handling for missing inputs

Image Generation Tests:

test_generate_image_with_api_key: Tests basic image generation
test_generate_image_with_gcp_storage: Tests GCP storage integration
test_generate_image_development_mode: Tests fallback behavior without API key

Educational Enhancement Tests:

test_generate_educational_image: Tests the full educational pipeline
test_enhance_prompt_with_educational_context: Tests context enhancement
test_detect_content_type: Tests content type detection

Safety Tests:

test_check_prompt_safety: Tests safety filtering
test_generate_educational_image_unsafe: Tests rejection of unsafe content

GCP Integration Tests:

test_upload_to_gcp_bucket: Tests GCP upload functionality
test_generate_educational_image_without_gcp: Tests behavior without GCP

Documentation Updates

Indicate whether documentation needs to be updated due to this PR.

Yes
No

The README.md written in the PR includes comprehensive documentation covering:

Tool overview and features
Setup instructions
API usage with examples
GCP integration setup
Troubleshooting tips

Checklist

I have performed a self-review of my code.
I have commented my code, particularly in hard-to-understand areas.
I have made corresponding changes to the documentation.
My changes generate no new warnings.
I have added tests that prove my fix is effective or that my feature works.
New and existing unit tests pass locally with my changes.
Any dependent changes have been merged and published in downstream modules.

Additional Information

This reverts commit 1c75762.

This reverts commit 5d2d4ef.

stevenrayhinojosa-gmail-com · 2025-04-29T21:04:21Z

Dude you were killing me for a second, I had to do some remote add stuff but I finally was able to run a Pytest.

app/tools/image_generator/tests/test_tools.py::test_image_generator_args_model PASSED [100%]

==================================================== 44 passed in 0.44s =====================================================

buriihenry

Looking good. Share a loom video to show the process flow and the logic

theoteske added 7 commits April 16, 2025 16:27

initial image generator

7039900

implemented prompt routing and created unit tests

1c75762

Revert "implemented prompt routing and created unit tests"

5d2d4ef

This reverts commit 1c75762.

Revert "Revert "implemented prompt routing and created unit tests""

e040cb7

This reverts commit 5d2d4ef.

small fixes

d28f69e

added GCP bucket for image storage

a07158f

edited readme file

13662e7

buriihenry approved these changes May 19, 2025
