Comments (8)
Hi,
It's hard to tell without seeing the data. I would suggest that you 1) leave the Adam settings at their defaults, set a smaller learning rate, or try another optimizer; 2) make the embedding layer trainable or remove it entirely (static mode); 3) decrease the model size; 4) check the weight initialization; 5) use a larger validation set. Apart from the optimizer, which could have a different implementation, I see no reason for a significant difference between the frameworks.
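A rough sketch of what a couple of these suggestions could look like in Keras. This is not the model from the issue: the vocabulary/sequence sizes and filter settings are assumptions, the import path is tf.keras, and older standalone Keras spells the argument `lr` rather than `learning_rate`.

```python
from tensorflow.keras import layers, models, optimizers

vocab_size, embed_dim, seq_len = 20000, 300, 32        # assumed sizes, not from the thread

model = models.Sequential([
    layers.Input(shape=(seq_len,)),
    # static mode: keep the pretrained embeddings frozen (set trainable=True to fine-tune them)
    layers.Embedding(vocab_size, embed_dim, trainable=False),
    layers.Conv1D(64, 3, activation="relu"),            # deliberately small model, as suggested
    layers.GlobalMaxPooling1D(),
    layers.Dense(1, activation="sigmoid"),
])

# default Adam settings except for a smaller learning rate
model.compile(optimizer=optimizers.Adam(learning_rate=1e-4),
              loss="binary_crossentropy", metrics=["accuracy"])
```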
Thanks for the quick answer,
I've played around with other optimizers and learning rates for a bit, and the same kind of thing happens every time. The first few iterations show a normal increase, then the accuracy starts jumping up and down by about 5%. Here's an example of it with the code above and Adagrad instead of Adam.
This also happens with lower learning rates: the initial steady climb is just slower, and the jumps are less extreme (±2% instead of 5%).
Here's what my data typically looks like (before padding to size 32)
The embeddings are the pretrained fastText embeddings provided by Facebook.
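For context, this is roughly how padding to length 32 and loading a fastText `.vec` file into a Keras embedding matrix is often done. The file name, toy data, and variable names below are placeholders, not the thread's actual code.

```python
import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

MAX_LEN, EMBED_DIM = 32, 300

# toy stand-ins for the real tokenised data and vocabulary
sequences = [[4, 12, 7], [9, 2, 2, 15, 3]]
word_index = {"good": 4, "movie": 12}

# pad/truncate every sample to the fixed length of 32 used above
X = pad_sequences(sequences, maxlen=MAX_LEN, padding="post", truncating="post")

# fastText .vec format: an "n_words dim" header line, then "word v1 v2 ... v300"
vectors = {}
with open("cc.en.300.vec", encoding="utf-8") as f:     # placeholder file name
    next(f)                                            # skip the header line
    for line in f:
        parts = line.rstrip().split(" ")
        vectors[parts[0]] = np.asarray(parts[1:], dtype="float32")

embedding_matrix = np.zeros((len(word_index) + 1, EMBED_DIM))
for word, i in word_index.items():
    if word in vectors:
        embedding_matrix[i] = vectors[word]            # words missing from fastText stay all-zero

# then e.g.: Embedding(len(word_index) + 1, EMBED_DIM,
#                      weights=[embedding_matrix], trainable=False)
```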
I feel like the only difference between the implementation given in my first message and the Denny Britz one is the reshape layer; I couldn't get it to work and figured that your way of doing it (adding one hidden layer) was roughly equivalent. Do you think that could be the issue, or should I just keep playing around with hyperparameters?
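For reference, a reshape-free way to express the Kim-style sentence CNN is with parallel Conv1D branches in the functional API. This is only a minimal sketch under assumed sizes and filter settings, not a reconstruction of either implementation being compared.

```python
from tensorflow.keras import layers, models

seq_len, vocab_size, embed_dim = 32, 20000, 300        # assumed sizes

inp = layers.Input(shape=(seq_len,))
emb = layers.Embedding(vocab_size, embed_dim)(inp)

# one convolutional branch per filter size, each followed by max-over-time pooling
branches = []
for k in (3, 4, 5):
    c = layers.Conv1D(100, k, activation="relu")(emb)
    branches.append(layers.GlobalMaxPooling1D()(c))

x = layers.Concatenate()(branches)
x = layers.Dropout(0.5)(x)
out = layers.Dense(1, activation="sigmoid")(x)

model = models.Model(inp, out)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```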
The Reshape layer isn't trainable and shouldn't have an effect. Try increasing the validation set size, or explicitly fix this set for both frameworks.
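One simple way to pin down an identical validation set for both frameworks is to split once with a fixed seed and save the indices, so both scripts load the same split. The array names and sizes below are placeholders.

```python
import numpy as np

# stand-ins for the padded inputs and labels (about 200K samples in the thread)
X = np.zeros((200000, 32), dtype="int32")
y = np.random.randint(0, 2, size=200000)

rng = np.random.RandomState(42)                 # fixed seed -> reproducible split
indices = rng.permutation(len(X))
val_size = 10000                                # explicit validation size

val_idx, train_idx = indices[:val_size], indices[val_size:]
np.save("val_idx.npy", val_idx)                 # reuse this file in the TensorFlow script

X_train, y_train = X[train_idx], y[train_idx]
X_val, y_val = X[val_idx], y[val_idx]
```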
I just increased the validation set from 10K to 40K samples (from a total of 200K labelled samples) and this is the result
I will try setting the same validation set for both this implementation and the Denny Britz one tomorrow and keep you updated; it's getting pretty late here.
Thanks for taking the time to help me out anyway!
Batch size can be an issue too. Try setting it to 64-128.
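Continuing the split sketch above (reusing `model`, `X_train`, and the fixed validation arrays from the earlier sketches), the batch size is just an argument to `fit()`; the epoch count here is arbitrary.

```python
history = model.fit(X_train, y_train,
                    validation_data=(X_val, y_val),   # the explicitly fixed validation set
                    batch_size=64,                    # suggested range: 64-128
                    epochs=20)
```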
Alright, I've set the same validation set for both Denny Britz's CNN and the Keras implementation, and I can confirm that the same trend keeps happening. The TF CNN acts as expected (both validation and training accuracy grow steadily, then at some point we overfit, so the validation accuracy goes down while the training accuracy keeps going up).
But with the Keras implementation, the behavior noted above keeps happening: the first few iterations work as expected, but then the validation accuracy goes crazy.
I've also set the batch size to 64, but the same behavior happens
The good thing is that from time to time I reach accuracies I never reach with the Denny Britz CNN; the bad thing is that it looks so messy I can't really use it for a paper.
I then decided to try the Keras CNN on my whole dataset (2.5 million samples) and really lowered the learning rate (100 times smaller than the default RMSprop one; see the sketch after this comment). This is the result
It kind of works, I guess? It's very slow, but it seems to be going up steadily; after 100 epochs (about 5 hours) it barely made it past 0.8 validation accuracy though.
The weird thing is that I checked, and the optimizer Denny uses is the same one I used at first, Adam with default parameters, so it would seem the problem comes from the CNN and not the optimizer.
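A sketch of the RMSprop setting described above: the Keras default learning rate for RMSprop is 0.001, so 100 times smaller is 1e-5. Older standalone Keras spells the argument `lr`; `model` here is the one from the earlier sketches.

```python
from tensorflow.keras import optimizers

# 100x smaller than the RMSprop default of 0.001
model.compile(optimizer=optimizers.RMSprop(learning_rate=1e-5),
              loss="binary_crossentropy", metrics=["accuracy"])
```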
Well, one last idea: set the embedding layer to trainable=True.
I tried it and it didn't really change much, except for the training time. I think I will just have to keep tweaking the parameters. A low learning rate avoids the erratic validation accuracy but gets stuck in a local minimum.
The funny thing is how different the results can be for learning rates of the same order of magnitude: for example, 0.00003 gets stuck at 0.8 after 50 epochs, while 0.00005 reaches 0.83 after only a few epochs but then keeps jumping up and down with no clear sign of improvement.
Such is life I guess
Also, I forgot to mention that my Keras is built on top of TF, not Theano. But I doubt that's the problem, since incompatibility issues between the two backends usually result in a crash, and I didn't run into one.
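As a side note, the backend and data-format settings can be checked directly; a channels_first vs channels_last mismatch on 2D convolution inputs does not necessarily crash, so it can be worth a quick look. This uses the Keras 2 API; Keras 1.x used `K.image_dim_ordering()` instead.

```python
from keras import backend as K

print(K.backend())             # 'tensorflow' or 'theano'
print(K.image_data_format())   # 'channels_last' (TF-style) or 'channels_first' (Theano-style)
```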
Related Issues (20)
- error when retraining word vector HOT 3
- Error in w2v.py line 52 HOT 4
- TypeError: __init__() takes at least 3 arguments (2 given) HOT 1
- Running instructions HOT 1
- expected input_4 to have shape (None, 185) but got array with shape (1665, 35) HOT 1
- how to run trained model on sample sentence HOT 1
- Using local directory dataset does not yield the marked results HOT 1
- Using Glove or GoogleNews? HOT 1
- Wrong model for Y.Kim's TextCNN HOT 1
- Using pre-trained google word embeddings HOT 3
- Negative dimension size caused by subtracting 3 from 1
- Only words, no sentences HOT 1
- Question/inquiry (问题咨询) HOT 8
- How to train the model with multi-class dataset HOT 2
- accuracy HOT 3
- The model always predicts the same label HOT 1
- Two fully-connected layers after convolutions HOT 1
- as for the CNN-non-static model initialization issue
- Multiple Dropouts different from Original Paper and Denny Britz