Hi I noticed that you used parametric ReLU for these experiments, and I tried replacin

Why vanilla ReLU cannot train at all. about siamese-triplet HOT 6 CLOSED

w-hc commented on May 27, 2024

Why vanilla ReLU cannot train at all.

from siamese-triplet.

Comments (6)

w-hc commented on May 27, 2024

No, it still does not work. The basic classification training loss gets stuck at 2.3, i.e. random guessing. I don't think initialization can play such a huge role. It might really have to do with the fact that the final output is in 2D, and in this case throwing away 3/4 of the space is too difficult for the network.
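For reference, the 2.3 figure is what cross-entropy gives for a uniform guess, assuming 10 classes (the thread does not name the dataset, but the "10 beams" mentioned in the update suggest 10 classes). A minimal sketch; `uniform_cross_entropy` is a hypothetical helper, not part of siamese-triplet:

```python
import math

def uniform_cross_entropy(num_classes: int) -> float:
    """Cross-entropy (in nats) when every class is predicted with equal probability."""
    return -math.log(1.0 / num_classes)

# Assumption: 10 classes.
loss = uniform_cross_entropy(10)
print(f"{loss:.4f}")  # 2.3026
```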

Update:

Lowering the starting learning rate to 1e-3 helps make a pure ReLU network trainable. It still converges much slower than with PReLU. I should have spent a little more time on hyper-parameter search. The embedding looks as expected: 10 beams squeezed into the first quadrant.
Using vanilla ReLU for all layers and changing only the nonlinearity in the classification net to a PReLU, with default initialization, makes the network converge faster and gives good-looking embeddings. Constraining them to a single quadrant is really too difficult.
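To illustrate the quadrant argument, here is a small sketch in plain Python. It assumes the activation is applied element-wise to the 2D embedding before the classifier, as the comment implies. The sample points are made up, and 0.25 is the default initial slope of `torch.nn.PReLU` (in a real network the slope is learned).

```python
def relu(x: float) -> float:
    # Negative inputs are clamped to zero.
    return x if x > 0.0 else 0.0

def prelu(x: float, slope: float = 0.25) -> float:
    # Negative inputs are scaled, not discarded, so their sign survives.
    return x if x >= 0.0 else slope * x

# One hypothetical 2D embedding per quadrant.
points = [(1.0, 2.0), (-1.0, 2.0), (-1.0, -2.0), (1.0, -2.0)]

relu_out = [(relu(x), relu(y)) for x, y in points]
prelu_out = [(prelu(x), prelu(y)) for x, y in points]

for p, r, q in zip(points, relu_out, prelu_out):
    print(p, "-> ReLU", r, "| PReLU", q)
```

After ReLU every point has non-negative coordinates, so all classes must be separated inside the first quadrant. PReLU keeps the signs and leaves the whole plane available.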

from siamese-triplet.

adambielski commented on May 27, 2024

ReLU should work just as well. I chose PReLU because its outputs work better for visualizations. Personally, I also like the idea of a learnable slope in the activation function (although various papers show it's not always better).

from siamese-triplet.

w-hc commented on May 27, 2024

But I tried replacing PReLU with ReLU, keeping the rest of the config unchanged, and the training does not converge. The default setup seems reasonable, though, so this is a little surprising.

from siamese-triplet.

adambielski commented on May 27, 2024

For training with ReLU, you need to initialize the convolutional layers more carefully. E.g. you can use Kaiming initialization with the gain for the ReLU nonlinearity. You can do it by adding these lines to the initialization of EmbeddingNet:

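# Add at the end of EmbeddingNet.__init__, after the layers are defined.
for m in self.modules():
    if isinstance(m, nn.Conv2d):
        # Kaiming (He) normal init with the gain for ReLU.
        nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')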
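As a rough sketch of what that call does: with `mode='fan_out'` and `nonlinearity='relu'`, `kaiming_normal_` draws weights from a zero-mean normal distribution with standard deviation `sqrt(2) / sqrt(fan_out)`, where `fan_out` for a conv layer is `out_channels * kernel_height * kernel_width`. The helper name and the layer shape below are made up for illustration and are not taken from EmbeddingNet.

```python
import math

def kaiming_normal_std_relu(out_channels: int, kernel_h: int, kernel_w: int) -> float:
    """Std of the normal distribution used by Kaiming init (fan_out mode, ReLU gain)."""
    fan_out = out_channels * kernel_h * kernel_w
    gain = math.sqrt(2.0)  # recommended gain for ReLU
    return gain / math.sqrt(fan_out)

# Hypothetical conv layer: 32 output channels, 5x5 kernel.
std = kaiming_normal_std_relu(32, 5, 5)
print(f"{std:.4f}")  # 0.0500
```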

from siamese-triplet.

adambielski commented on May 27, 2024

@w-hc did you try training with this change?

from siamese-triplet.

w-hc commented on May 27, 2024

I will try in a minute. Sorry.

from siamese-triplet.
