Comments (2)
@win10ogod I also have a similar question. When using the passthrough method, is there any principled way to select layers from each model? Could we use something like task-arithmetic values to pick the most useful layers?
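For context, passthrough already lets you pick layer ranges from each model by hand via `slices` in the merge config. A minimal sketch (the model names and layer ranges here are placeholders, not a recommendation) that stitches the first half of one model onto the second half of another:

```yaml
# Hypothetical passthrough config: take layers 0-15 from model A
# and layers 16-31 from model B. Which ranges are actually "most
# useful" is exactly the open question in this thread.
slices:
  - sources:
      - model: example-org/model-a
        layer_range: [0, 16]
  - sources:
      - model: example-org/model-b
        layer_range: [16, 32]
merge_method: passthrough
dtype: float16
```

The open part of the question is how to choose those ranges automatically (e.g., scoring layers with task-arithmetic deltas) rather than by trial and error.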
from mergekit.
I am pretty sure you are asking the same question I am looking at (if not, sorry to hijack this post).
I read the Model Soups paper mentioned by @cg123 here: https://arxiv.org/pdf/2203.05482.pdf
In section 4, the authors compare "soups" with ensembling.
If I am not mistaken, my understanding is that souping is well suited to models that share the same initialization weights (seed); otherwise the models take completely different optimization paths, and averaging their weights is either irrelevant or requires post-training (fine-tuning) that may or may not be beneficial. Ensembling, on the other hand, is suited to different models, since it acts at the logit level, hence taking the best of each model's path.
Ensembling is de facto superior to "soups" (as the paper refers to them).
So the question is: do methods other than Linear better emulate ensembling for models that do not share the same initialization?
Am I correct?
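To make the soup-vs-ensemble distinction concrete, here is a toy NumPy sketch (not mergekit code; the "models" are just made-up weight matrices). Souping averages parameters and runs one forward pass; ensembling runs every model and averages the logits. With any nonlinearity the two generally give different outputs, which is one way to see why averaging weights from models on different optimization paths is not the same as ensembling them:

```python
import numpy as np

def forward(w, x):
    # Stand-in for a model's forward pass: a linear map plus a
    # nonlinearity. The tanh is what makes soup != ensemble.
    return np.tanh(w @ x)

# Two toy "models" with deliberately different weights, as if they
# had been trained from different initializations.
w_a = np.array([[1.0, 0.0], [0.0, 1.0]])
w_b = np.array([[0.0, 1.0], [1.0, 0.0]])
x = np.array([2.0, 3.0])

# Model soup: average the parameters, then do a single forward pass.
soup_out = forward((w_a + w_b) / 2, x)

# Ensemble: forward through each model, then average the logits.
ens_out = (forward(w_a, x) + forward(w_b, x)) / 2

# The two disagree because tanh(mean(w) @ x) != mean(tanh(w @ x)).
print(soup_out, ens_out)
```

If `forward` were purely linear, the two would coincide exactly; the gap only appears once nonlinear layers are involved, which is the regime all transformer models live in.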