mtmd : support home-cooked Mistral Small Omni #14928

Open
wants to merge 1 commit into
base: master
from

Conversation

ngxson
Collaborator

@ngxson ngxson commented Jul 28, 2025

Support a home-cooked version of Mistral Small that can take both audio and images as input.

Link to GGUF: https://huggingface.co/ngxson/Home-Cook-Mistral-Small-Omni-24B-2507-GGUF

Try it

# audio
llama-mtmd-cli \
  -m ../models/mistral_omni_gguf/Home-Cook-Mistral-Small-Omni-2507-Q4_K_M.gguf \
  --mmproj ../models/mistral_omni_gguf/mmproj-model-f16.gguf \
  --audio ../models/i-have-a-dream-30s.mp3  \
  -p "Transcribe this"

# Here is the transcription of the provided text:
#
# "I have a dream that one day every valley shall be exalted, every hill and mountain shall be made low, the rough places will
# be made plain, and the crooked places will be made straight, and the glory of the Lord shall be revealed, and all flesh shall
# see it together. This is our hope. With this faith, we will be able to hew out of the mountain of despair a stone of hope."


# vision
llama-mtmd-cli \
  -m ../models/mistral_omni_gguf/Home-Cook-Mistral-Small-Omni-2507-Q4_K_M.gguf \
  --mmproj ../models/mistral_omni_gguf/mmproj-model-f16.gguf \
  --image ../models/bliss.png  \
  -p "What is this"

# This image depicts a serene and expansive landscape featuring a vast, rolling green field under a clear blue sky dotted with 
# fluffy white clouds. The field appears to be lush and well-maintained, possibly a meadow or pasture. The horizon is marked by 
# a gentle, grassy hill that adds depth and dimension to the scene. The overall atmosphere is tranquil and inviting, 
# suggesting a peaceful countryside setting.
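
For scripting the two invocations above, here is a minimal Python sketch that assembles the same argument list. It uses only the flags that appear in the commands above (`-m`, `--mmproj`, `--audio`, `--image`, `-p`). The helper name `build_mtmd_cmd` is hypothetical and not part of llama.cpp, and requiring at least one media file is this helper's own choice, not a documented CLI rule.

```python
import shlex


def build_mtmd_cmd(model, mmproj, prompt, audio=None, image=None):
    """Return an argv list for llama-mtmd-cli (hypothetical helper).

    Only flags shown in the PR description are used.
    """
    if audio is None and image is None:
        # Restriction imposed by this sketch, not by the CLI itself.
        raise ValueError("pass audio= and/or image=")
    cmd = ["llama-mtmd-cli", "-m", str(model), "--mmproj", str(mmproj)]
    if audio is not None:
        cmd += ["--audio", str(audio)]
    if image is not None:
        cmd += ["--image", str(image)]
    cmd += ["-p", prompt]
    return cmd


if __name__ == "__main__":
    base = "../models/mistral_omni_gguf"
    cmd = build_mtmd_cmd(
        f"{base}/Home-Cook-Mistral-Small-Omni-2507-Q4_K_M.gguf",
        f"{base}/mmproj-model-f16.gguf",
        "Transcribe this",
        audio="../models/i-have-a-dream-30s.mp3",
    )
    # Print a shell-quoted command line; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Returning a list rather than a string keeps the prompt as a single argument, so no manual quoting is needed if the list is later handed to `subprocess.run`.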

Collaborator

@CISC CISC left a comment

Very cool! :)

@stduhpf
Contributor

stduhpf commented Jul 28, 2025

Ah, I was trying to make something like that work too. Nice job!

@ngxson ngxson marked this pull request as ready for review July 29, 2025 10:01
@stduhpf
Contributor

stduhpf commented Jul 29, 2025

Maybe the merge_mmproj_models.py script should be included in llama.cpp? (And maybe edited to support other architectures like qwen2.5-omni?)

@ngxson
Collaborator Author

ngxson commented Jul 29, 2025

merge_mmproj_models.py is quite a one-off solution for now, so I don't think it's useful to include it in llama.cpp. It's needed because the original models are literally 2 different models.

We don't need that for qwen2.5-omni, because the audio and vision tensors are both included in the original model.
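
To illustrate why a merge step is needed when the encoders come from two separate models, here is a toy Python sketch. It is not the merge_mmproj_models.py script and does not read GGUF files: tensors are plain lists, the function name `merge_tensor_maps` is hypothetical, and the tensor names are made up for illustration. It only shows the general idea of combining two name-to-tensor maps into one while refusing name collisions.

```python
def merge_tensor_maps(*sources):
    """Merge several (label, {tensor_name: tensor}) pairs into one dict.

    Raises ValueError if two sources define the same tensor name, since
    silently overwriting would corrupt one of the encoders.
    """
    merged = {}
    origin = {}  # tensor name -> label of the source that provided it
    for label, tensors in sources:
        for name, tensor in tensors.items():
            if name in merged:
                raise ValueError(
                    f"tensor {name!r} from {label!r} collides with {origin[name]!r}"
                )
            merged[name] = tensor
            origin[name] = label
    return merged


if __name__ == "__main__":
    # Made-up tensor names and values, for illustration only.
    vision = {"vision.proj.weight": [0.1, 0.2], "vision.proj.bias": [0.0]}
    audio = {"audio.proj.weight": [0.3, 0.4]}
    combined = merge_tensor_maps(("vision", vision), ("audio", audio))
    print(sorted(combined))
```

When both sets of tensors already ship in a single original model, as described above for qwen2.5-omni, there is nothing to combine and this step is unnecessary.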

3 participants