Comments (8)
Hi,
It's hard to tell without seeing the data. I would suggest that you 1) leave the Adam settings at their defaults, set a smaller learning rate, or try another optimizer; 2) make the embedding layer trainable or remove it entirely (static mode); 3) decrease the model size; 4) check the weight initialization; 5) use a larger validation set. Apart from the optimizer, which could have a different implementation, I see no reason for a significant difference between the frameworks.
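A rough sketch of what a couple of these suggestions could look like in Keras. This is not the model from the issue: the vocabulary/sequence sizes and filter settings are assumptions, the import path is tf.keras, and older standalone Keras spells the argument `lr` rather than `learning_rate`.

```python
from tensorflow.keras import layers, models, optimizers

vocab_size, embed_dim, seq_len = 20000, 300, 32        # assumed sizes, not from the thread

model = models.Sequential([
    layers.Input(shape=(seq_len,)),
    # static mode: keep the pretrained embeddings frozen (set trainable=True to fine-tune them)
    layers.Embedding(vocab_size, embed_dim, trainable=False),
    layers.Conv1D(64, 3, activation="relu"),            # deliberately small model, as suggested
    layers.GlobalMaxPooling1D(),
    layers.Dense(1, activation="sigmoid"),
])

# default Adam settings except for a smaller learning rate
model.compile(optimizer=optimizers.Adam(learning_rate=1e-4),
              loss="binary_crossentropy", metrics=["accuracy"])
```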
Thanks for the quick answer,
I've played around with other optimizers and learning rates for a bit, and the same kind of thing happens every time. The first few iterations show a normal increase, then the accuracy starts jumping up and down by about 5%. Here's an example of it with the code above and Adagrad instead of Adam.
This also happens with lower learning rates: the initial steady climb is just slower, and the jumps are less extreme (±2% instead of 5%).
Here's what my data typically looks like (before padding to size 32)
The embeddings are the pretrained fastText embeddings provided by Facebook.
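For context, this is roughly how padding to length 32 and loading a fastText `.vec` file into a Keras embedding matrix is often done. The file name, toy data, and variable names below are placeholders, not the thread's actual code.

```python
import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

MAX_LEN, EMBED_DIM = 32, 300

# toy stand-ins for the real tokenised data and vocabulary
sequences = [[4, 12, 7], [9, 2, 2, 15, 3]]
word_index = {"good": 4, "movie": 12}

# pad/truncate every sample to the fixed length of 32 used above
X = pad_sequences(sequences, maxlen=MAX_LEN, padding="post", truncating="post")

# fastText .vec format: an "n_words dim" header line, then "word v1 v2 ... v300"
vectors = {}
with open("cc.en.300.vec", encoding="utf-8") as f:     # placeholder file name
    next(f)                                            # skip the header line
    for line in f:
        parts = line.rstrip().split(" ")
        vectors[parts[0]] = np.asarray(parts[1:], dtype="float32")

embedding_matrix = np.zeros((len(word_index) + 1, EMBED_DIM))
for word, i in word_index.items():
    if word in vectors:
        embedding_matrix[i] = vectors[word]            # words missing from fastText stay all-zero

# then e.g.: Embedding(len(word_index) + 1, EMBED_DIM,
#                      weights=[embedding_matrix], trainable=False)
```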
I feel like the only difference between the implementation given in my first message and the Denny Britz one is the reshape layer; I couldn't get it to work and figured that your way of doing it (adding one hidden layer) was roughly equivalent. Do you think that could be the issue, or should I just keep playing around with hyperparameters?
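For reference, a reshape-free way to express the Kim-style sentence CNN is with parallel Conv1D branches in the functional API. This is only a minimal sketch under assumed sizes and filter settings, not a reconstruction of either implementation being compared.

```python
from tensorflow.keras import layers, models

seq_len, vocab_size, embed_dim = 32, 20000, 300        # assumed sizes

inp = layers.Input(shape=(seq_len,))
emb = layers.Embedding(vocab_size, embed_dim)(inp)

# one convolutional branch per filter size, each followed by max-over-time pooling
branches = []
for k in (3, 4, 5):
    c = layers.Conv1D(100, k, activation="relu")(emb)
    branches.append(layers.GlobalMaxPooling1D()(c))

x = layers.Concatenate()(branches)
x = layers.Dropout(0.5)(x)
out = layers.Dense(1, activation="sigmoid")(x)

model = models.Model(inp, out)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```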
The Reshape layer isn't trainable and shouldn't have an effect. Try increasing the validation set size, or explicitly fix this set for both frameworks.
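One simple way to pin down an identical validation set for both frameworks is to split once with a fixed seed and save the indices, so both scripts load the same split. The array names and sizes below are placeholders.

```python
import numpy as np

# stand-ins for the padded inputs and labels (about 200K samples in the thread)
X = np.zeros((200000, 32), dtype="int32")
y = np.random.randint(0, 2, size=200000)

rng = np.random.RandomState(42)                 # fixed seed -> reproducible split
indices = rng.permutation(len(X))
val_size = 10000                                # explicit validation size

val_idx, train_idx = indices[:val_size], indices[val_size:]
np.save("val_idx.npy", val_idx)                 # reuse this file in the TensorFlow script

X_train, y_train = X[train_idx], y[train_idx]
X_val, y_val = X[val_idx], y[val_idx]
```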
I just increased the validation set from 10K to 40K samples (from a total of 200K labelled samples) and this is the result
I will try setting the same validation set for both this implementation and the Denny Britz one tomorrow and keep you updated; it's getting pretty late here.
Thanks for taking the time to help me out anyway!
Batch size can be an issue too. Try setting it to 64-128.
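Continuing the split sketch above (reusing `model`, `X_train`, and the fixed validation arrays from the earlier sketches), the batch size is just an argument to `fit()`; the epoch count here is arbitrary.

```python
history = model.fit(X_train, y_train,
                    validation_data=(X_val, y_val),   # the explicitly fixed validation set
                    batch_size=64,                    # suggested range: 64-128
                    epochs=20)
```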
Alright, I've set the same validation set for both Denny Britz's CNN and the Keras implementation, and I can confirm that the same trend keeps happening. The TF CNN acts as expected (both validation and training accuracy grow steadily, then at some point we overfit, so the validation accuracy goes down while the training accuracy keeps going up).
But with the Keras implementation, the behavior noted above keeps happening: the first few iterations work as expected, but then the validation accuracy goes crazy.
I've also set the batch size to 64, but the same behavior happens
The good thing is that from time to time I reach accuracies I never reach with the Denny Britz CNN; the bad thing is that it looks so messy I can't really use it for a paper.
I then decided to try the Keras CNN on my whole dataset (2.5 million samples) and really lowered the learning rate (100 times smaller than the default RMSprop one; see the sketch after this comment). This is the result
It kind of works, I guess? It's very slow, but it seems to be going up steadily; after 100 epochs (about 5 hours) it barely made it past 0.8 validation accuracy though.
The weird thing is that I checked, and the optimizer Denny uses is the same one I used at first, Adam with default parameters, so it would seem the problem comes from the CNN and not the optimizer.
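A sketch of the RMSprop setting described above: the Keras default learning rate for RMSprop is 0.001, so 100 times smaller is 1e-5. Older standalone Keras spells the argument `lr`; `model` here is the one from the earlier sketches.

```python
from tensorflow.keras import optimizers

# 100x smaller than the RMSprop default of 0.001
model.compile(optimizer=optimizers.RMSprop(learning_rate=1e-5),
              loss="binary_crossentropy", metrics=["accuracy"])
```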
Well, one last idea: set the embedding layer to trainable=True.
I tried it and it didn't really change much, except for the training time. I think I will just have to keep tweaking the parameters. A low learning rate avoids the erratic validation accuracy but gets stuck in a local minimum.
The funny thing is how different the results can be for learning rates of the same order of magnitude: for example, 0.00003 gets stuck at 0.8 after 50 epochs, while 0.00005 reaches 0.83 after only a few epochs but then keeps jumping up and down with no clear sign of improvement.
Such is life I guess
Also, I forgot to mention that my Keras is built on top of TF, not Theano. But I doubt that's the problem, since incompatibility issues between the two backends usually result in a crash, and I didn't run into one.
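As a side note, the backend and data-format settings can be checked directly; a channels_first vs channels_last mismatch on 2D convolution inputs does not necessarily crash, so it can be worth a quick look. This uses the Keras 2 API; Keras 1.x used `K.image_dim_ordering()` instead.

```python
from keras import backend as K

print(K.backend())             # 'tensorflow' or 'theano'
print(K.image_data_format())   # 'channels_last' (TF-style) or 'channels_first' (Theano-style)
```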
Related Issues (20)
- error when retraining word vector HOT 3
- Error in w2v.py line 52 HOT 4
- TypeError: __init__() takes at least 3 arguments (2 given) HOT 1
- Running instructions HOT 1
- expected input_4 to have shape (None, 185) but got array with shape (1665, 35) HOT 1
- how to run trained model on sample sentence HOT 1
- Using local directory dataset does not yield the marked results HOT 1
- Using Glove or GoogleNews? HOT 1
- Wrong model for Y.Kim's TextCNN HOT 1
- Using pre-trained google word embeddings HOT 3
- Negative dimension size caused by subtracting 3 from 1
- Only words, no sentences HOT 1
- Question/inquiry (问题咨询) HOT 8
- How to train the model with multi-class dataset HOT 2
- accuracy HOT 3
- The model always predicts the same label HOT 1
- Two fully-connected layers after convolutions HOT 1
- as for the CNN-non-static model initialization issue
- Multiple Dropouts different from Original Paper and Denny Britz