theoteske
Description

This PR introduces a new Image Generator tool that creates high-quality educational images from text prompts. The tool leverages Black Forest Labs' Flux 1.1 Pro API for image generation and includes Google Cloud Storage integration for persistent image storage. It features educational context enhancement, content type detection and subsequent prompt enhancement, and safety filtering to ensure generated images are high quality and appropriate for educational use.

I have included a Loom video walkthrough.

Related Issue

N/A

Type of Change


  • New feature: A non-breaking change that adds functionality.

Proposed Solution

The Image Generator tool is designed to create educational images with the following key components:

  1. Core Image Generation: Use Black Forest Labs' Flux 1.1 Pro API to generate high-quality images from text prompts. Calling Black Forest Labs' API directly, rather than going through another provider such as together.ai, lets us raise the rate limit if necessary: according to their documentation, we can contact [email protected] to request higher API call volumes. We poll for the result at most 30 times, and if an image is returned, it is stored as a base64-encoded string.
  2. Educational Context Enhancement: Use Gemini to detect the educational subject and grade level when the user does not provide them, then enhance the prompt with the appropriate educational context so that the generated image suits the students' age and is relevant to the desired subject area.
  3. Content Type Detection: Identify the type of educational content being requested (diagrams, concepts, processes, etc.) by matching certain keywords, or use Gemini to infer the content type if none of the keywords appear. Once the content type has been determined, a specialized prompt template for that type, containing more detailed instructions for the image generator, is appended to the base prompt. For example, if Gemini infers that the prompt is asking for a diagram, instructions about proper labeling, including a key, etc., are appended to the base prompt. Empirically, this step greatly improved the quality of the generated images.
  4. Safety Filtering: As with content type detection, if any of a set of unsafe keywords appears, the prompt is immediately marked as unsafe and a descriptive error is raised. If none of the keywords appear, Gemini infers whether the prompt is safe. If the Gemini safety check itself fails (for example, the call errors out), the current default is to treat the prompt as safe; this fail-open default can be changed if a stricter behavior is preferred.
  5. Google Cloud Storage Integration: Automatically store generated images in a GCP bucket for persistent access via public URLs. The implementation handles environments without GCP configuration gracefully by making storage optional. Path handling was added to support running in Docker containers with mounted service-account credential files.
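The keyword-first, Gemini-fallback flow for content type detection (item 3) can be sketched as follows. This is a minimal illustration, not the PR's code: the keyword lists, the template wording, and the `infer_with_llm` callable (a stand-in for the Gemini call) are all hypothetical.

```python
from __future__ import annotations

from typing import Callable, Optional

# Hypothetical keyword table; the PR's actual keywords are not shown.
CONTENT_TYPE_KEYWORDS: dict[str, tuple[str, ...]] = {
    "diagram": ("diagram", "label", "chart"),
    "process": ("process", "cycle", "steps"),
    "concept": ("concept", "idea", "theory"),
}

# Hypothetical per-type instructions appended to the base prompt.
CONTENT_TYPE_TEMPLATES: dict[str, str] = {
    "diagram": "Clearly label every component and include a key.",
    "process": "Show each stage in order with arrows indicating flow.",
    "concept": "Use a simple visual metaphor with minimal text.",
}


def detect_content_type(
    prompt: str, infer_with_llm: Callable[[str], Optional[str]]
) -> Optional[str]:
    """Return a content type via keyword match, falling back to a model call."""
    lowered = prompt.lower()
    for content_type, keywords in CONTENT_TYPE_KEYWORDS.items():
        if any(word in lowered for word in keywords):
            return content_type
    # No keyword matched: defer to the model (Gemini in the real tool).
    guess = infer_with_llm(prompt)
    # Ignore answers that do not correspond to a known template.
    return guess if guess in CONTENT_TYPE_TEMPLATES else None


def apply_content_template(prompt: str, content_type: Optional[str]) -> str:
    """Append the type-specific instructions, if any, to the base prompt."""
    template = CONTENT_TYPE_TEMPLATES.get(content_type or "")
    return f"{prompt}. {template}" if template else prompt
```

Checking keywords first avoids a model round trip for common requests such as "A diagram of the solar system"; the model is only consulted when no keyword matches.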

How to Test

  1. Basic Image Generation

Test Configuration:
Use the FastAPI Swagger UI at http://localhost:8000/docs to test the /submit-tool endpoint with the following JSON:

{
  "user": {
    "id": "test-user",
    "fullName": "Test User",
    "email": "[email protected]"
  },
  "type": "tool",
  "tool_data": {
    "tool_id": "image-generator",
    "inputs": [
      {
        "name": "prompt",
        "value": "A diagram of the solar system"
      },
      {
        "name": "subject",
        "value": "astronomy"
      },
      {
        "name": "grade_level",
        "value": "middle school"
      },
      {
        "name": "lang",
        "value": "en"
      }
    ]
  }
}

Expected Outcome:
The response should include:

  • An image_b64 field containing a base64-encoded image
  • A prompt_used field showing the enhanced prompt
  • An educational_context field with "astronomy for middle school level"
  • A safety_applied field set to true

Example response:
{
  "image_b64": "iVBORw0KGgoAAAANSUhEUgAA...",
  "prompt_used": "A diagram of the solar system, educational context: astronomy for middle school level",
  "educational_context": "astronomy for middle school level",
  "safety_applied": true
}

If GCP storage is configured, the response will also include a gcp_url field with a public URL to the stored image.
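The polling step (at most 30 attempts, then base64 encoding) and the optional gcp_url field can be sketched as below. The sketch assumes a hypothetical `fetch_result` callable that wraps one status request to the Flux API and returns `None` until the image bytes are ready. The real endpoint, polling interval, and response handling are not shown in this PR description.

```python
import base64
import time
from typing import Callable, Dict, Optional

MAX_POLL_ATTEMPTS = 30  # the PR polls at most 30 times


def poll_for_image(
    fetch_result: Callable[[], Optional[bytes]],
    max_attempts: int = MAX_POLL_ATTEMPTS,
    delay_seconds: float = 1.0,  # hypothetical interval
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until image bytes are ready, then return them base64-encoded."""
    for attempt in range(max_attempts):
        image_bytes = fetch_result()
        if image_bytes is not None:
            return base64.b64encode(image_bytes).decode("ascii")
        if attempt < max_attempts - 1:
            sleep(delay_seconds)  # wait before the next status check
    raise TimeoutError(f"Image not ready after {max_attempts} polling attempts")


def build_response(
    image_b64: str, prompt_used: str, context: str, gcp_url: Optional[str]
) -> Dict[str, object]:
    """Assemble the response; gcp_url appears only when storage succeeded."""
    response: Dict[str, object] = {
        "image_b64": image_b64,
        "prompt_used": prompt_used,
        "educational_context": context,
        "safety_applied": True,
    }
    if gcp_url:
        response["gcp_url"] = gcp_url
    return response
```

Injecting `sleep` keeps the loop testable without real delays.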

  2. Educational Context Enhancement

Test Configuration:

{
  "user": {
    "id": "test-user",
    "fullName": "Test User",
    "email": "[email protected]"
  },
  "type": "tool",
  "tool_data": {
    "tool_id": "image-generator",
    "inputs": [
      {
        "name": "prompt",
        "value": "The process of photosynthesis"
      },
      {
        "name": "subject",
        "value": "biology"
      },
      {
        "name": "grade_level",
        "value": "high school"
      },
      {
        "name": "lang",
        "value": "en"
      }
    ]
  }
}

Expected Outcome:
The response should show that the prompt was enhanced with the appropriate educational context:

  • The prompt_used field should include "educational context: biology for high school level"
  • The image should be appropriate for high school biology students
  • The content should be scientifically accurate and detailed enough for high school level
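The sketch below shows one way `prompt_used` and `educational_context` could be built to match the format in the example responses. `infer_context` is a hypothetical stand-in for the Gemini call that supplies a missing subject or grade level. In the real tool the final prompt may also carry content-type instructions, which this sketch omits.

```python
from typing import Callable, Optional, Tuple


def build_educational_context(subject: str, grade_level: str) -> str:
    """Format the context string as shown in the example responses."""
    return f"{subject} for {grade_level} level"


def enhance_prompt(
    prompt: str,
    subject: Optional[str] = None,
    grade_level: Optional[str] = None,
    infer_context: Optional[Callable[[str], Tuple[str, str]]] = None,
) -> Tuple[str, str]:
    """Return (prompt_used, educational_context) for the given inputs."""
    if not subject or not grade_level:
        if infer_context is None:
            raise ValueError("subject and grade_level are required without inference")
        inferred_subject, inferred_grade = infer_context(prompt)
        # Only fill in what the user left out.
        subject = subject or inferred_subject
        grade_level = grade_level or inferred_grade
    context = build_educational_context(subject, grade_level)
    return f"{prompt}, educational context: {context}", context
```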

Try another example with a different subject and grade level:

{
  "user": {
    "id": "test-user",
    "fullName": "Test User",
    "email": "[email protected]"
  },
  "type": "tool",
  "tool_data": {
    "tool_id": "image-generator",
    "inputs": [
      {
        "name": "prompt",
        "value": "Addition and subtraction with fractions"
      },
      {
        "name": "subject",
        "value": "mathematics"
      },
      {
        "name": "grade_level",
        "value": "elementary"
      },
      {
        "name": "lang",
        "value": "en"
      }
    ]
  }
}

The response should show a simpler, more elementary-appropriate enhancement in the prompt_used field.

  3. Safety Filtering

Test Configuration:
{
  "user": {
    "id": "test-user",
    "fullName": "Test User",
    "email": "[email protected]"
  },
  "type": "tool",
  "tool_data": {
    "tool_id": "image-generator",
    "inputs": [
      {
        "name": "prompt",
        "value": "Violent battle scene with gore and blood"
      },
      {
        "name": "subject",
        "value": "history"
      },
      {
        "name": "grade_level",
        "value": "elementary"
      },
      {
        "name": "lang",
        "value": "en"
      }
    ]
  }
}

Expected Outcome:
The system should reject the unsafe content with an error response:

{
  "status": 400,
  "message": "The prompt contains inappropriate content for educational use"
}
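The safety check described in the Proposed Solution can be sketched as follows. It first applies a keyword blocklist, then falls back to a model check, and treats the prompt as safe (fail-open) if that check errors. The keywords and the `llm_is_safe` callable (a stand-in for the Gemini check) are hypothetical. The error is returned as a plain dict mirroring the response above rather than raised as an HTTPException.

```python
from typing import Callable, Dict, Optional, Union

# Hypothetical blocklist; the PR's real keyword list is not shown.
UNSAFE_KEYWORDS = ("gore", "blood", "violent", "weapon")
REJECTION_MESSAGE = "The prompt contains inappropriate content for educational use"


def is_prompt_safe(prompt: str, llm_is_safe: Callable[[str], bool]) -> bool:
    """Keyword blocklist first, then the model check; fail open on errors."""
    lowered = prompt.lower()
    if any(word in lowered for word in UNSAFE_KEYWORDS):
        return False
    try:
        return llm_is_safe(prompt)
    except Exception:
        # The model call itself failed: treat the prompt as safe,
        # matching the PR's stated (changeable) default.
        return True


def safety_error(
    prompt: str, llm_is_safe: Callable[[str], bool]
) -> Optional[Dict[str, Union[int, str]]]:
    """Return the 400 error body for unsafe prompts, or None if it passes."""
    if is_prompt_safe(prompt, llm_is_safe):
        return None
    return {"status": 400, "message": REJECTION_MESSAGE}
```

Plain substring matching also flags harmless words such as "bloodstream". A word-boundary regex (`re.search(r"\bblood\b", text)`) is a stricter alternative.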

The logs should show:

ERROR - HTTPException: 400: The prompt contains inappropriate content for educational use

  4. Test with Missing Required Inputs

Test Configuration:

{
  "user": {
    "id": "test-user",
    "fullName": "Test User",
    "email": "[email protected]"
  },
  "type": "tool",
  "tool_data": {
    "tool_id": "image-generator",
    "inputs": []
  }
}

Expected Outcome:
The system should return an error indicating that the prompt is required.

{
  "status": 400,
  "message": "Missing input: `prompt`"
}
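One way to implement this check is to convert the `inputs` list into a name-to-value mapping and reject it when `prompt` is absent or empty. `MissingInputError`, `validate_inputs`, and `error_body` are hypothetical names. The message format is taken from the expected response above.

```python
from typing import Any, Dict, List

REQUIRED_INPUTS = ("prompt",)


class MissingInputError(ValueError):
    """Hypothetical error raised when a required tool input is absent."""


def validate_inputs(inputs: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Map tool_data.inputs ({name, value} pairs) to a dict and check required keys."""
    values = {item["name"]: item.get("value") for item in inputs}
    for name in REQUIRED_INPUTS:
        if not values.get(name):  # missing or empty counts as absent
            raise MissingInputError(f"Missing input: `{name}`")
    return values


def error_body(exc: MissingInputError) -> Dict[str, Any]:
    """Shape the error like the expected 400 response."""
    return {"status": 400, "message": str(exc)}
```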

Unit Tests

The implementation includes comprehensive test coverage:

Core Functionality Tests:

  • test_executor: Tests the main executor function
  • test_executor_without_gcp: Tests behavior without GCP configuration
  • test_executor_missing_inputs: Tests error handling for missing inputs

Image Generation Tests:

  • test_generate_image_with_api_key: Tests basic image generation
  • test_generate_image_with_gcp_storage: Tests GCP storage integration
  • test_generate_image_development_mode: Tests fallback behavior without API key

Educational Enhancement Tests:

  • test_generate_educational_image: Tests the full educational pipeline
  • test_enhance_prompt_with_educational_context: Tests context enhancement
  • test_detect_content_type: Tests content type detection

Safety Tests:

  • test_check_prompt_safety: Tests safety filtering
  • test_generate_educational_image_unsafe: Tests rejection of unsafe content

GCP Integration Tests:

  • test_upload_to_gcp_bucket: Tests GCP upload functionality
  • test_generate_educational_image_without_gcp: Tests behavior without GCP
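The GCP tests above exercise the optional-storage behavior from the Proposed Solution. One plausible shape for that behavior is sketched below. The `GCP_BUCKET_NAME` variable, the `/app` mount root used to resolve relative credential paths, and the helper names are assumptions. `storage.Client.from_service_account_json`, `Client.bucket`, `Bucket.blob`, `Blob.upload_from_string`, and `Blob.public_url` are google-cloud-storage APIs.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_credentials_path(raw_path: Optional[str], base_dir: str = "/app") -> Optional[str]:
    """Resolve a credentials file path; relative paths are joined to base_dir.

    base_dir="/app" is a hypothetical Docker mount root.
    """
    if not raw_path:
        return None
    path = Path(raw_path)
    if not path.is_absolute():
        path = Path(base_dir) / path
    return str(path) if path.is_file() else None


def upload_to_gcp_bucket(image_bytes: bytes, blob_name: str) -> Optional[str]:
    """Upload the image and return its public URL, or None if GCP is not configured."""
    bucket_name = os.environ.get("GCP_BUCKET_NAME")  # hypothetical variable name
    creds = resolve_credentials_path(os.environ.get("GOOGLE_APPLICATION_CREDENTIALS"))
    if not bucket_name or not creds:
        return None  # storage is optional: callers keep only image_b64
    # Imported lazily so the tool still runs without the GCP library installed.
    from google.cloud import storage

    client = storage.Client.from_service_account_json(creds)
    blob = client.bucket(bucket_name).blob(blob_name)
    blob.upload_from_string(image_bytes, content_type="image/png")
    return blob.public_url
```

Note that `public_url` is only reachable if the bucket or object grants public read access.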

Documentation Updates

Documentation needs to be updated due to this PR:

  • Yes

The README.md added in this PR includes comprehensive documentation covering:

  • Tool overview and features
  • Setup instructions
  • API usage with examples
  • GCP integration setup
  • Troubleshooting tips

Checklist

  • I have performed a self-review of my code.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have made corresponding changes to the documentation.
  • My changes generate no new warnings.
  • I have added tests that prove my fix is effective or that my feature works.
  • New and existing unit tests pass locally with my changes.
  • Any dependent changes have been merged and published in downstream modules.

Additional Information


stevenrayhinojosa-gmail-com commented Apr 29, 2025

Dude, you were killing me for a second. I had to do some remote add stuff, but I was finally able to run pytest.

app/tools/image_generator/tests/test_tools.py::test_image_generator_args_model PASSED [100%]

==================================================== 44 passed in 0.44s =====================================================


buriihenry left a comment


Looking good. Share a Loom video to show the process flow and the logic.
