Sentiment Analysis with Convolutional Networks

Here is one of my submissions to the Kaggle challenge 'Bag of Words Meets Bags of Popcorn'.

It is based on the idea, proposed by Yoon Kim [http://arxiv.org/abs/1408.5882], of combining pre-trained word2vec embeddings with convolutional networks.

The code consists of two IPython Notebooks:

  1. Process Kaggle Dataset Train+Test.ipynb contains the data pre-processing.

  2. Train CNN IMDB.ipynb implements a convolutional network with one convolutional layer.

This model, trained for 3 epochs, yields AUC = 0.96823 on the test data.

An ensemble of three convolutional networks (with different numbers of convolutional layers and feature maps) gives AUC = 0.97310.
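
The README does not say how the three networks were combined or how AUC was computed. As a minimal sketch, assuming an equal-weight average of each model's predicted probabilities, the following shows one way to ensemble scores and measure AUC as the fraction of (positive, negative) pairs ranked correctly. The function names and the toy numbers are invented for illustration and are not taken from the notebooks.

```python
def average_predictions(model_outputs):
    """Equal-weight average of per-example probabilities from several models.

    Assumption: the ensemble is a plain average; the README does not confirm this.
    """
    n_models = len(model_outputs)
    return [sum(scores) / n_models for scores in zip(*model_outputs)]


def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs where the positive
    example gets the higher score; tied pairs count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only.
labels = [1, 0, 1, 0]
model_a = [0.9, 0.4, 0.3, 0.2]
model_b = [0.8, 0.1, 0.6, 0.3]
model_c = [0.7, 0.5, 0.6, 0.1]

ensemble = average_predictions([model_a, model_b, model_c])
print("model_a AUC:", roc_auc(labels, model_a))
print("ensemble AUC:", roc_auc(labels, ensemble))
```

The pairwise loop is quadratic in the number of examples, so it is only meant for small checks; a library routine would be the usual choice on the full test set.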

Dependencies

sentiment-analysis-with-convolutional-networks's Issues

'list' object has no attribute 'min'

Hello, I'm trying the code and I found an error that I cannot fix in the file "Train CNN IMDB.ipynb":

Traceback (most recent call last):
  File "/home/ch//Sandbox/keras/kaggle_sentAnalysis/cnn_kaggle.py", line 136, in
    output = model.predict_proba(val_X, batch_size=10, verbose=1)
  File "/usr/local/lib/python2.7/dist-packages/keras/models.py", line 791, in predict_proba
    if preds.min() < 0. or preds.max() > 1.:
AttributeError: 'list' object has no attribute 'min'

Thank you.

Error after uploading the GoogleNews-vectors-negative300-SLIM.bin file

@vsl9

I get this error when trying to load 'GoogleNews-vectors-negative300-SLIM.bin' (code given below).

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-21-b567896a8aa6> in <module>
     14 print('vocab size: ' + str(len(vocab)))
     15 print('max sentence length: ' + str(max_l))
---> 16 w2v = load_bin_vec(wv_from_bin, vocab)
     17 print(w2v)
     18 print('word2vec loaded!')

<ipython-input-20-59822c213c28> in load_bin_vec(fname, vocab)
     49     """
     50     word_vecs = {}
---> 51     with open(fname, 'rb') as f:
     52         header = f.readline()
     53         vocab_size, layer1_size = map(int, header.split())

TypeError: expected str, bytes or os.PathLike object, not Word2VecKeyedVectors

The code is:

w2v_file = 'GoogleNews-vectors-negative300-SLIM.bin'
revs, vocab = build_data_train_test(data_train, train_ratio=0.6, clean_string=True)
max_l = np.max(pd.DataFrame(revs)['num_words'])
print('data loaded!')
print('number of sentences: ' + str(len(revs)))
print('vocab size: ' + str(len(vocab)))
print('max sentence length: ' + str(max_l))
w2v = load_bin_vec(w2v_file, vocab)
print(w2v)
print('word2vec loaded!')
print('num words already in word2vec: ' + str(len(w2v)))

add_unknown_words(w2v, vocab)
W, word_idx_map = get_W(w2v)
cPickle.dump([revs, W, word_idx_map, vocab], open('imdb-train-val-testN.pickle', 'wb'))
print('dataset created successfully!')

Any help or guidance is highly appreciated.
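
One observation from the traceback above: the failing call is load_bin_vec(wv_from_bin, vocab), and load_bin_vec passes its first argument to open(), so it expects a file path rather than an already-loaded Word2VecKeyedVectors object. The listed code passes w2v_file (a path), which does not match the call in the traceback. If the vectors have already been loaded into an object that supports `word in obj` and `obj[word]` (gensim's KeyedVectors does), a possible alternative is to build the word-to-vector dictionary directly. This is only a sketch: vecs_from_keyed_vectors is a hypothetical helper, and a plain dict stands in for the loaded model.

```python
def vecs_from_keyed_vectors(keyed_vectors, vocab):
    """Collect vectors for in-vocabulary words from an already-loaded model.

    Hypothetical helper: `keyed_vectors` only needs to support
    `word in keyed_vectors` and `keyed_vectors[word]`.
    """
    word_vecs = {}
    for word in vocab:
        if word in keyed_vectors:
            word_vecs[word] = keyed_vectors[word]
    return word_vecs


# A plain dict stands in for the loaded word2vec model (illustration only).
fake_model = {"good": [0.1, 0.2], "bad": [-0.1, -0.3]}
vocab = {"good": 3, "bad": 1, "popcorn": 2}

w2v = vecs_from_keyed_vectors(fake_model, vocab)
print("num words already in word2vec: " + str(len(w2v)))
```

Words missing from the model ("popcorn" here) are simply skipped, which leaves them for a later step such as add_unknown_words to handle.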

The output file

Hey, the output file currently gives values ranging between 0.0 and 1.0. Is it not supposed to be binary?

How do I get the sentiment value to be binary?

0 being negative, and 1 being positive.
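
Values between 0.0 and 1.0 read as predicted scores for the positive class, and the results in the README are reported as AUC, which depends on how examples are ranked by score rather than on hard labels. If binary labels are wanted anyway, one common approach is to apply a threshold. In the sketch below, to_binary is a hypothetical helper and the 0.5 cut-off is an assumption, not something taken from the notebooks.

```python
def to_binary(probabilities, threshold=0.5):
    """Map each probability to 1 (positive) if it reaches the threshold, else 0 (negative)."""
    return [1 if p >= threshold else 0 for p in probabilities]


# Invented scores for illustration.
scores = [0.03, 0.5, 0.97, 0.49]
print(to_binary(scores))
```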

Accuracy

Hi,
what accuracy did the Yoon Kim CNN achieve with these settings?

Keras reshape error

Seems like the line "model.add(Reshape(1, conv_input_height, conv_input_width))" no longer works. It throws an error:


TypeError                                 Traceback (most recent call last)
in ()
     26 # Reshape word vectors from Embedding to tensor format suitable for Convolutional layer
     27 # reshape from three dimensional to four dimensional
---> 28 model.add(Reshape(1,conv_input_height, conv_input_width) )
     29
     30 # first convolutional layer

TypeError: __init__() takes exactly 2 arguments (4 given)


Based on the Keras documentation, I changed it to:
model.add(Reshape((1, conv_input_height, conv_input_width)))

And yet it throws another error now:


 34 def _prod(a, axis=None, dtype=None, out=None, keepdims=False):

---> 35 return umr_prod(a, axis, dtype, out, keepdims)
36
37 def _any(a, axis=None, dtype=None, out=None, keepdims=False):

TypeError: long() argument must be a string or a number, not 'NoneType'


I have upgraded both Keras and Theano to their bleeding-edge versions. It has been very hard to find any alternative working example of an embedding followed by a 2D convolution with Keras. Any idea why this happens and how we might fix it?

Thanks a lot!!!

Why indices when creating sentences?

Hello there,

I was going through the way you constructed the sentence matrix and there is one thing I didn't understand. Why did you take the indices of the words in the vocabulary when creating the sentence matrices? I would have imagined that, as input for the convolutional neural network, we would construct matrices out of the word vectors for each word in the review. Or am I missing something here?

And I am sorry to have posted this question as an issue. I would have contacted you by email but I couldn't find any contact information.

Thanks for your help
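
A plausible reading, given that the preprocessing code elsewhere on this page calls get_W to produce both W and word_idx_map: the sentences are stored as word indices, and an embedding step then looks up row i of W for index i, so the network still receives a matrix of word vectors. Storing indices keeps the dataset small and lets the embedding weights be initialized from word2vec. The sketch below illustrates the lookup with toy values; the helper names, the tiny W, and the use of index 0 for padding are assumptions for illustration.

```python
# Toy embedding matrix: row i holds the vector for word index i.
# Assumption: row 0 is an all-zero vector reserved for padding.
W = [
    [0.0, 0.0, 0.0],
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
    [0.7, 0.8, 0.9],
]
word_idx_map = {"great": 1, "movie": 2, "boring": 3}


def sentence_to_indices(words, word_idx_map, max_len, pad_idx=0):
    """Turn a tokenized sentence into a fixed-length list of word indices."""
    indices = [word_idx_map[w] for w in words if w in word_idx_map]
    indices = indices[:max_len]
    return indices + [pad_idx] * (max_len - len(indices))


def embed(indices, W):
    """Look up one row of W per index: this is what an embedding layer does."""
    return [W[i] for i in indices]


indices = sentence_to_indices(["great", "movie"], word_idx_map, max_len=4)
sentence_matrix = embed(indices, W)  # 4 rows (words) x 3 columns (vector size)
print(indices)
print(sentence_matrix)
```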

Keras "object() doesn't take any parameter" error

Hi there,

I was just trying to implement your model and I ran into trouble trying to run the CNN using Keras. When I run model.compile with the parameters you provide, I get the following error:

 TypeError                                 Traceback (most recent call last)
<ipython-input-24-8079df85a0aa> in <module>()
----> 1 model.compile(loss='categorical_crossentropy', optimizer='adadelta')

D:\Users\Dhruv.Sharma\AppData\Local\Continuum\Anaconda\lib\site-packages\keras-0.1.2-py2.7.egg\keras\models.pyc in compile(self, optimizer, loss, class_mode, theano_mode)
332         for r in self.regularizers:
333             train_loss = r(train_loss)
--> 334         updates = self.optimizer.get_updates(self.params, self.constraints, train_loss)
335 
336         if type(self.X_train) == list:

D:\Users\Dhruv.Sharma\AppData\Local\Continuum\Anaconda\lib\site-packages\keras-0.1.2-py2.7.egg\keras\optimizers.pyc in get_updates(self, params, constraints, loss)
138 
139             new_p = p - self.lr * update
--> 140             updates.append((p, c(new_p))) # apply constraints
141 
142             # update delta_accumulator

TypeError: object() takes no parameters
