
Comments (8)

alexander-rakhlin commented on July 20, 2024

Looks okay. embedding_weights must be a list of length 1 containing an ndarray with shape (len(vocabulary_inv), num_features). It is wrapped in a list for compatibility with Keras layer.set_weights().

from cnn-for-sentence-classification-in-keras.
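That expected format can be sketched with plain NumPy (the sizes below are made up for illustration; only the list-of-one-ndarray shape matters to layer.set_weights()):

```python
import numpy as np

# Hypothetical sizes for illustration only
vocab_size = 5      # stands in for len(vocabulary_inv)
num_features = 4    # word2vec embedding dimensionality

# A single ndarray of shape (vocab_size, num_features), wrapped in a
# one-element list, matching what keras layer.set_weights() expects
embedding_weights = [np.random.uniform(-0.25, 0.25, (vocab_size, num_features))]

print(len(embedding_weights))      # 1
print(embedding_weights[0].shape)  # (5, 4)
```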

alexander-rakhlin commented on July 20, 2024

Hi,

In for w in vocabulary_inv, w iterates over words, not indexes.


chunjoe commented on July 20, 2024

Hi,

Thank you for your quick reply.

Here you mentioned that vocabulary_inv is a dict {int: str}.

In for w in vocabulary_inv, is w a word from a list of words?


alexander-rakhlin commented on July 20, 2024

Sorry, vocabulary_inv is a list of strings, not a dict. And w is a string (i.e. a word).


chunjoe commented on July 20, 2024

Sorry to disturb you again, but something still seems strange...

In sentiment_cnn.py, vocabulary_inv is a dictionary {int: str}. It is then passed to train_word2vec as a parameter:

vocabulary = imdb.get_word_index()
vocabulary_inv = dict((v, k) for k, v in vocabulary.items())
vocabulary_inv[0] = "<PAD/>"

In w2v.py, I don't see where vocabulary_inv is converted to a list.
I also added print(type(vocabulary_inv)) in w2v.py, and the program printed <class 'dict'>.
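The inversion above can be checked with a toy dict standing in for imdb.get_word_index() (the words and indexes here are made up for illustration):

```python
# Toy stand-in for imdb.get_word_index(): {word: index}
vocabulary = {"the": 1, "movie": 2, "great": 3}

# Same inversion as in sentiment_cnn.py: {index: word}, with index 0
# reserved for the padding token
vocabulary_inv = dict((v, k) for k, v in vocabulary.items())
vocabulary_inv[0] = "<PAD/>"

print(type(vocabulary_inv))  # <class 'dict'>
print(vocabulary_inv[1])     # the
```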


alexander-rakhlin commented on July 20, 2024

This discrepancy arose after I switched to the new Keras data source. In the previous major version, the data source was data_helpers.load_data(), which returns vocabulary_inv as a list. I will fix it when I have more time; it should be a dict everywhere.


chunjoe commented on July 20, 2024

Thank you very much!!!

I wrote the following code. I know it wastes a little memory...
As a workaround for the problem, is this code right?

vocabulary_inv_list = [vocabulary_inv[i] for i in range(len(vocabulary_inv))]
embedding_weights = [np.array([embedding_model[w] if w in embedding_model
                               else np.random.uniform(-0.25, 0.25, embedding_model.vector_size)
                               for w in vocabulary_inv_list])]
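That lookup-with-fallback logic can be exercised with a toy {word: vector} dict in place of a real trained model (the words and vector_size here are made up; a real gensim model exposes vector_size as an attribute rather than a separate variable):

```python
import numpy as np

# Toy stand-in for a trained word2vec model: {word: vector}
vector_size = 3
embedding_model = {"cat": np.ones(vector_size), "dog": np.zeros(vector_size)}

vocabulary_inv_list = ["cat", "dog", "<PAD/>"]  # "<PAD/>" is not in the model

# Known words take their trained vector; unknown words get a random one
embedding_weights = [np.array([embedding_model[w] if w in embedding_model
                               else np.random.uniform(-0.25, 0.25, vector_size)
                               for w in vocabulary_inv_list])]

print(embedding_weights[0].shape)  # (3, 3): one row per word
```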


alexander-rakhlin commented on July 20, 2024

Please see the updated version.

