Description
Overview
The current `GenerativeAIMultimodalExample` sample project only supports image analysis, with a single input method (using `PhotosPicker` to select images), and only processes image (`UIImage`) formats.
However, the Swift Firebase AI API can also analyze other media types, including video, audio, and PDF documents.
Firebase AI API Analysis
According to the official documentation (https://firebase.google.com/docs/ai-logic), all media types use the same `generateContent` API, which makes the extension straightforward:
```swift
// Image analysis (implemented)
let response = try await model.generateContent(image, prompt)

// Video analysis (to be extended)
let video = InlineDataPart(data: videoData, mimeType: "video/mp4")
let response = try await model.generateContent(video, prompt)

// Audio analysis (to be extended)
let audio = InlineDataPart(data: audioData, mimeType: "audio/mpeg")
let response = try await model.generateContent(audio, prompt)

// PDF document analysis (to be extended)
let pdf = InlineDataPart(data: pdfData, mimeType: "application/pdf")
let response = try await model.generateContent(pdf, prompt)
```
Proposed Design Plan
1. UI Enhancement
Tab Navigation Design
Add four media-type selection tabs at the top of the interface:
- 📷 Image
- 🎥 Video
- 🎵 Audio
- 📄 Document
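The tab bar above could be sketched as a segmented `Picker` backed by a small enum. This is a minimal sketch; the `MediaType` and `MediaTypeTabs` names are hypothetical, not part of the current sample:

```swift
import SwiftUI

// Hypothetical enum backing the four media-type tabs.
enum MediaType: String, CaseIterable, Identifiable {
  case image = "📷 Image"
  case video = "🎥 Video"
  case audio = "🎵 Audio"
  case document = "📄 Document"

  var id: Self { self }
}

// Segmented control that lets the user switch between media types.
struct MediaTypeTabs: View {
  @Binding var selection: MediaType

  var body: some View {
    Picker("Media Type", selection: $selection) {
      ForEach(MediaType.allCases) { type in
        Text(type.rawValue).tag(type)
      }
    }
    .pickerStyle(.segmented)
  }
}
```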
Input Component Upgrade
Expand the `MultimodalInputField` component:
- Dynamic Button: Show only the file picker that corresponds to the selected tab, rather than presenting pickers for every tab
- Type Indicator: Clearly display the currently selected media type
- Preview Optimization: Provide an appropriate preview for each media type
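The "dynamic button" idea could look something like the sketch below: a single attach button whose `fileImporter` accepts only the content types for the current tab, passed in by the parent view (the `AttachButton` name and its parameters are illustrative assumptions):

```swift
import SwiftUI
import UniformTypeIdentifiers

// Sketch: one attach button whose file importer only accepts the
// content types for the currently selected tab.
struct AttachButton: View {
  let title: String           // e.g. "Attach Video"
  let allowedTypes: [UTType]  // e.g. [.movie] for the Video tab
  let onPick: (URL) -> Void   // hand the picked file URL back to the caller

  @State private var showImporter = false

  var body: some View {
    Button(title) { showImporter = true }
      .fileImporter(isPresented: $showImporter,
                    allowedContentTypes: allowedTypes) { result in
        if case .success(let url) = result { onPick(url) }
      }
  }
}
```

The Image tab could keep using `PhotosPicker` as today, with `fileImporter` covering the video, audio, and document tabs.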
2. Data Processing Extension
File Handler
Implement file processing logic for each media type, support DocumentPicker selection, convert to InlineDataPart
.
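A minimal sketch of that conversion, assuming files arrive as security-scoped URLs from a file importer (the `mimeType(for:)` helper is a hypothetical mapping, not SDK API, and the extension list is not exhaustive):

```swift
import Foundation
import FirebaseAI  // recent SDKs; older samples import FirebaseVertexAI

// Hypothetical helper: derive a MIME type from the file extension.
func mimeType(for url: URL) -> String {
  switch url.pathExtension.lowercased() {
  case "mp4": return "video/mp4"
  case "mov": return "video/quicktime"
  case "mp3": return "audio/mpeg"
  case "wav": return "audio/wav"
  case "pdf": return "application/pdf"
  default: return "application/octet-stream"
  }
}

// Load a picked file and wrap it as an InlineDataPart for generateContent.
func inlineDataPart(for url: URL) throws -> InlineDataPart {
  // Files picked via fileImporter need security-scoped access.
  let accessing = url.startAccessingSecurityScopedResource()
  defer { if accessing { url.stopAccessingSecurityScopedResource() } }
  let data = try Data(contentsOf: url)
  return InlineDataPart(data: data, mimeType: mimeType(for: url))
}
```

Note that inline data counts against the request size limit, so very large videos may need the Cloud Storage path instead.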
ViewModel Refactor
Extend `PhotoReasoningViewModel` to support state management for multiple media types.
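One possible shape for the refactor, sketched under the assumption that the attachment is stored generically as an `InlineDataPart` rather than as image-only state (the class and model names below are illustrative, not the sample's actual code):

```swift
import SwiftUI
import FirebaseAI  // recent SDKs; older samples import FirebaseVertexAI

// Sketch: a view model that reasons over any inline media attachment.
@MainActor
final class MultimodalReasoningViewModel: ObservableObject {
  @Published var attachment: InlineDataPart?  // image, video, audio, or PDF
  @Published var userInput: String = ""
  @Published var outputText: String = ""
  @Published var inProgress = false

  // Model name is an assumption; use whichever model the sample configures.
  private let model = FirebaseAI.firebaseAI()
    .generativeModel(modelName: "gemini-2.0-flash")

  func reason() async {
    guard let attachment else { return }
    inProgress = true
    defer { inProgress = false }
    do {
      let response = try await model.generateContent(attachment, userInput)
      outputText = response.text ?? ""
    } catch {
      outputText = "Error: \(error.localizedDescription)"
    }
  }
}
```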
3. User Experience Optimization
Smart Prompts: Supply a default prompt to the `MultimodalInputField` based on the selected media type, for example:
- Image: "Describe the content of this image"
- Video: "Summarize the main content of this video"
- Audio: "Transcribe and analyze this audio"
- Document: "Extract key information from the document"
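The mapping above could be a simple function on the media-type enum (the `MediaType` enum here is hypothetical; the prompt strings are taken from the list above):

```swift
// Sketch: default prompt per media type.
enum MediaType { case image, video, audio, document }

func defaultPrompt(for type: MediaType) -> String {
  switch type {
  case .image: return "Describe the content of this image"
  case .video: return "Summarize the main content of this video"
  case .audio: return "Transcribe and analyze this audio"
  case .document: return "Extract key information from the document"
  }
}
```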
Conclusion
This feature extension will showcase the full multimedia analysis capabilities of Firebase AI and give iOS developers a more comprehensive learning and reference resource. @peterfriese