
Conversation


@LeosCtrt LeosCtrt commented Jun 19, 2025

Description

Hello, I propose an optional Soft Mixture of Experts (Soft MoE) implementation for RF-DETR. The idea comes from the paper From Sparse to Soft Mixtures of Experts, and it tends to improve overall model performance, so I thought it would be a nice option to add.

As the paper suggests, I set the default number of slots per expert to 1.

Adding this option requires the Python library "soft_moe", which provides a good Soft MoE layer wrapper; the source code is available here.
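For reference, here is a minimal sketch of how such a wrapper could be used. It assumes the soft_moe package exposes a SoftMoELayerWrapper with dim / num_experts / slots_per_expert / layer arguments as shown in that library's README; the dimensions and expert count below are illustrative values, not taken from this PR's diff.

```python
import torch
import torch.nn as nn

# Assumed interface: the soft_moe package provides a SoftMoELayerWrapper that
# turns a per-token layer (e.g. a transformer MLP) into a Soft MoE over that layer.
from soft_moe import SoftMoELayerWrapper

dim = 256          # token embedding dimension (illustrative value)
num_experts = 4    # illustrative expert count, not a value from this PR

moe_mlp = SoftMoELayerWrapper(
    dim=dim,
    num_experts=num_experts,
    slots_per_expert=1,   # default suggested in "From Sparse to Soft Mixtures of Experts"
    layer=nn.Linear,      # expert layer type; remaining kwargs are forwarded to it
    in_features=dim,
    out_features=dim,
)

x = torch.rand(2, 196, dim)   # (batch, tokens, dim)
y = moe_mlp(x)                # tokens are softly dispatched to expert slots and recombined
print(y.shape)                # torch.Size([2, 196, 256])
```

In the PR itself the idea is the same: the relevant RF-DETR feed-forward block is wrapped so that, when the option is enabled, tokens are routed through the Soft MoE instead of a single MLP.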

Type of change


  • New feature (non-breaking change which adds functionality)

How has this change been tested? Please provide a test case or example of how you tested the change.

I tested it on a small private dataset and it slightly improved accuracy (roughly +1% mAP, a small but real gain). I am not able to test it on the full COCO dataset, but I expect it would also improve overall accuracy a bit.

As expected, the MoE RF-DETR model is slightly slower than the base model, but still very fast.

@isaacrob-roboflow
Collaborator

we are leveraging a pretrained backbone, so without robust pretrained weights we can't use a different architecture

@isaacrob-roboflow
Collaborator

additionally, have you looked at TRT speed for this? many of these types of architectures actually end up being meaningfully slower in terms of latency when compiled due to memory bottlenecks
