Imo, single training will always beat merging, because it allows the model to distribute the weights based on the training data of multiple classes. Merging has no such consideration.
Maybe a naive question, but it's been on my mind for a long time. Please feel free to correct me if I say anything stupid.
LoRA types
Conceptually, in the image and video field, there are broadly two kinds of LoRAs: character (or subject) LoRAs on the one hand, and style/concept LoRAs on the other.
I don't know if that's the right semantics, but you get the idea.
LoRA rules
There are no precise rules, but there are broad outlines, which I recap below for the character case.
There are dozens of interesting tutorials about creating datasets and preparing a training session. That's not the point here.
LoRA training
Here's where I'm going with this. Let's imagine a textbook case: training a LoRA of a specific individual, whatever the base model, say Arnold Schwarzenegger, from a mixture of images and videos.
Based on the generic rules of character LoRA training, in a dozen images and/or videos we must present the face to the model from several angles and with varied expressions (avoiding neutral faces), as well as the half body and the full body, ideally also from several angles.
Mechanically, we are already extremely limited by the size of the corpus, and even more so if the character has distinctive body language and facial mannerisms (gait, facial tics, etc.).
Multi-Phase LoRA training
Let's take the Arnold example and imagine a corpus where the model is only shown his face in close-up, from several angles and with different expressions: smiling, angry, laughing... It's a safe bet that the model will reproduce the face almost perfectly if the training has been done correctly. But with no indication of his body, as soon as it comes to producing a full-body video, the result will most likely be, let's say, weird.
So here's the idea. It has probably been explored already, but I'd like your point of view, especially on the notion of merging. Wouldn't it be ideal, when training a character LoRA, to multiply the specialized trainings (one on face close-ups, one on the half body, one on the full body, and so on) and then merge the LoRAs, so as to produce a more complex LoRA that helps the model generalize across each type of training?
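To anchor the discussion, here is a minimal sketch of what naive LoRA merging amounts to numerically. Everything here is illustrative: the `merge_loras` helper, the layer key, and the 0.6/0.4 weights are hypothetical, and real merge scripts also deal with alpha scaling, key matching, dtypes, etc. The point is simply that merging is a weighted sum of per-layer weight deltas ΔW = B·A, with no access to either LoRA's training data.

```python
import torch

def merge_loras(loras, weights):
    """Naively merge LoRAs as a weighted sum of their per-layer weight deltas.

    Each LoRA is a dict mapping a layer name to an (A, B) pair, where the
    layer's weight update is delta_W = B @ A (rank <= r). Merging just adds
    the deltas: delta_W_merged = sum_i w_i * (B_i @ A_i).
    """
    merged = {}
    for lora, w in zip(loras, weights):
        for layer, (A, B) in lora.items():
            delta = w * (B @ A)
            merged[layer] = merged.get(layer, torch.zeros_like(delta)) + delta
    return merged

# Hypothetical example: a "face" LoRA and a "full body" LoRA on one layer.
r, d_in, d_out = 16, 1024, 1024
face_lora = {"attn.q_proj": (torch.randn(r, d_in), torch.randn(d_out, r))}
body_lora = {"attn.q_proj": (torch.randn(r, d_in), torch.randn(d_out, r))}
merged = merge_loras([face_lora, body_lora], weights=[0.6, 0.4])
```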
LoRA merging
I am calling here on experts who can give their point of view on the principle of merging. I imagine there is no miracle, and that there is potentially an equivalent of the signal-to-noise ratio from the audio field. What I would like to know is to what extent the loss of precision introduced by merging is compensated for by the multiplicity of trainings, each of which can potentially be much higher quality.
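On the signal/noise intuition: one concrete, easy-to-demonstrate source of precision loss is rank. Summing two rank-r deltas produces a matrix whose rank can reach 2r, and if the merged result is compressed back to rank r (as some merge workflows do via a truncated SVD), part of the combined update is necessarily thrown away. A small sketch with random matrices standing in for the two specialized LoRAs, purely to illustrate the effect:

```python
import torch

torch.manual_seed(0)
r, d = 16, 1024

# Two independent rank-r deltas, stand-ins for two specialized LoRAs
delta_face = torch.randn(d, r) @ torch.randn(r, d)
delta_body = torch.randn(d, r) @ torch.randn(r, d)
merged = delta_face + delta_body  # rank can be as high as 2r

# Compress the merged delta back to rank r with a truncated SVD
U, S, Vh = torch.linalg.svd(merged, full_matrices=False)
low_rank = U[:, :r] @ torch.diag(S[:r]) @ Vh[:r, :]

# Fraction of the merged update's "energy" that survives the truncation
retained = (S[:r] ** 2).sum() / (S ** 2).sum()
print(f"energy retained at rank {r}: {retained:.1%}")
```

A single training on the mixed corpus never pays this particular cost, since the optimizer fits one rank-r update to all the data at once, which is essentially the point made in the reply above.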
What would be your choice: a single training on a mixed corpus, or multiple specialized trainings followed by merging?