Comments (8)

alexander-rakhlin commented on July 20, 2024

Hi,
It's hard to tell without seeing the data. I would suggest: 1) leave the Adam settings at their defaults, set a smaller learning rate, or try another optimizer; 2) make the embedding layer trainable, or remove it (static mode); 3) decrease the model size; 4) check the weight initialization; 5) use a larger validation set. Apart from the optimizer, which could be implemented differently, I see no reason for a significant difference between the frameworks.
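
To illustrate 1) and 2), a minimal Keras sketch (the model, vocabulary size, and values below are placeholders, not your actual code):

```python
from keras.models import Sequential
from keras.layers import Embedding, Conv1D, GlobalMaxPooling1D, Dense
from keras.optimizers import Adam

# Toy CNN for sentence classification, only to illustrate the suggestions.
model = Sequential([
    # 2) trainable=True lets the embeddings be fine-tuned (non-static mode)
    Embedding(input_dim=20000, output_dim=300, input_length=32, trainable=True),
    Conv1D(filters=100, kernel_size=3, activation='relu'),
    GlobalMaxPooling1D(),
    Dense(1, activation='sigmoid'),
])

# 1) keep Adam's other settings at their defaults but lower the learning rate
#    (newer Keras versions spell the argument `learning_rate` instead of `lr`).
model.compile(optimizer=Adam(lr=1e-4),
              loss='binary_crossentropy',
              metrics=['accuracy'])
```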

Quenjy commented on July 20, 2024

Thanks for the quick answer,

I've played around with other optimizers and learning rates for a bit, and the same kind of thing happens every time. The first few iterations show a normal increase, then the accuracy starts jumping up and down by about 5%. Here's an example with the code above and Adagrad instead of Adam.

[screenshot: accuracy curve with Adagrad, showing the same jumps]

This also happens with lower learning rates; the initial steady climb is just slower, and the jumps are less extreme (±2% instead of 5%).
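
Concretely, the only thing changing between runs is the optimizer passed to compile, along these lines (values are illustrative, not the exact ones behind the plot):

```python
from keras.optimizers import Adam, Adagrad

# Optimizer settings I've been cycling through (illustrative values).
optimizers_tried = {
    'adam_default': Adam(),            # Keras default lr = 0.001
    'adam_small':   Adam(lr=1e-4),
    'adagrad':      Adagrad(lr=0.01),  # Keras default lr for Adagrad
}
# model.compile(optimizer=optimizers_tried['adagrad'],
#               loss='binary_crossentropy', metrics=['accuracy'])
```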

Here's what my data typically looks like (before padding to size 32)

[screenshot: a few example data samples, before padding to length 32]

The embeddings are the pretrained fastText embeddings provided by Facebook.

I feel like the only difference between the implementation given in my first message and the Denny Britz one is the Reshape layer. I couldn't get it to work and figured that your way of doing it (adding one hidden layer) was roughly equivalent. Do you think that could be the issue, or should I just keep playing around with hyperparameters?
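
For reference, the vectors are wired into the embedding layer roughly like this (a sketch; the file path and word_index are placeholders for my actual preprocessing):

```python
import numpy as np
from keras.layers import Embedding

EMBEDDING_DIM = 300
word_index = {'hello': 1, 'world': 2}  # placeholder; comes from my tokenizer

# The .vec format starts with a "<vocab_size> <dim>" header line,
# then one word followed by its vector per line.
embeddings = {}
with open('wiki.en.vec', encoding='utf-8') as f:  # placeholder path
    next(f)
    for line in f:
        parts = line.rstrip().split(' ')
        embeddings[parts[0]] = np.asarray(parts[1:], dtype='float32')

embedding_matrix = np.zeros((len(word_index) + 1, EMBEDDING_DIM))
for word, i in word_index.items():
    if word in embeddings:
        embedding_matrix[i] = embeddings[word]

embedding_layer = Embedding(len(word_index) + 1, EMBEDDING_DIM,
                            weights=[embedding_matrix], input_length=32,
                            trainable=False)  # static mode
```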

alexander-rakhlin commented on July 20, 2024

The Reshape layer isn't trainable and shouldn't have an effect. Try increasing the validation set size, or explicitly fix the validation set so both frameworks use the same one.
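
Something like this (a sketch with placeholder arrays; the point is just that both implementations receive exactly the same held-out samples):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays standing in for the padded sequences and labels.
X = np.random.randint(0, 20000, size=(100000, 32))
y = np.random.randint(0, 2, size=(100000,))

# One fixed split, reused by both the Keras and the TensorFlow code,
# instead of each framework drawing its own random dev set.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.1, random_state=42, stratify=y)

# Keras side: pass the split explicitly instead of using validation_split.
# model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=20)
```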

Quenjy commented on July 20, 2024

I just increased the validation set from 10K to 40K samples (from a total of 200K labelled samples) and this is the result

[screenshot: validation accuracy with the 40K validation set, still jumping around]

I will try setting the same validation set for both this implementation and the Denny Britz one tomorrow and keep you updated; it's getting pretty late here.

Thanks for taking the time to help me out anyway!

alexander-rakhlin commented on July 20, 2024

Batch size can be an issue too. Try setting it to 64-128.

Quenjy commented on July 20, 2024

Alright, I've set the same validation set for both Denny Britz's CNN and the Keras implementation, and I can confirm that the same trend keeps happening. The TF CNN acts as expected: both validation and training accuracy grow steadily, then at some point we overfit, so the validation accuracy goes down while the training accuracy keeps going up.

But with the Keras implementation, the behavior noted above keeps happening: the first few iterations work as expected, but then the validation accuracy goes crazy.

I've also set the batch size to 64, but the same behavior happens.

[screenshot: validation accuracy with batch size 64, same erratic behavior]

The good thing is that from time to time I reach accuracies I never reach with the Denny Britz CNN; the bad thing is that it looks so messy I can't really use it for a paper.

I then decided to try the Keras CNN on my whole dataset (2.5 million samples) and really lowered the learning rate (100 times smaller than the default RMSprop one). This is the result:

[screenshot: validation accuracy on the full dataset with the very low RMSprop learning rate, climbing slowly]

It kind of works, I guess? It's very slow but seems to be going up steadily; after 100 epochs (about 5 hours) it barely made it past 0.8 validation accuracy, though.
The weird thing is that I checked, and the optimizer Denny uses is the same one I used at first (Adam with default parameters), so it would seem the problem comes from the CNN and not the optimizer.
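
For reference, the only change for that run was the optimizer line (a sketch; RMSprop's default learning rate in Keras is 0.001, so 100 times smaller means 1e-5):

```python
from keras.optimizers import RMSprop

# Keras default for RMSprop is lr=0.001; "100 times smaller" -> 1e-5.
slow_rmsprop = RMSprop(lr=1e-5)
# model.compile(optimizer=slow_rmsprop,
#               loss='binary_crossentropy', metrics=['accuracy'])
```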

alexander-rakhlin commented on July 20, 2024

Well, one last idea: set the embedding layer to trainable=True.
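
i.e. the same pretrained embedding layer as in your sketch above, just with the flag flipped:

```python
# Same pretrained-fastText embedding layer as in the earlier sketch,
# but with trainable=True so the vectors are fine-tuned during training.
embedding_layer = Embedding(len(word_index) + 1, EMBEDDING_DIM,
                            weights=[embedding_matrix], input_length=32,
                            trainable=True)
# If the flag is flipped on an already-built model, remember to recompile
# for the change to take effect.
```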

Quenjy commented on July 20, 2024

I tried it and it didn't really change much, except for the training time. I think I will just have to keep tweaking the parameters. A low learning rate avoids the erratic validation accuracy but gets stuck in a local minimum.
The funny thing is how different the results can be for learning rates on the same order of magnitude: for example, 0.00003 gets stuck at 0.8 after 50 epochs, while 0.00005 reaches 0.83 after only a few epochs but then keeps jumping up and down with no clear sign of improvement.

Such is life I guess

Also, I forgot to mention that my Keras is built on top of TensorFlow, not Theano. But I doubt that's the problem, since incompatibility issues between the two backends usually result in a crash, and I didn't run into one.
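
(For what it's worth, the active backend can be checked like this:)

```python
from keras import backend as K

# Reports which backend this Keras install is running on,
# e.g. 'tensorflow' or 'theano'.
print(K.backend())
```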
