Code Monkey home page Code Monkey logo

word2veclite's Introduction

Word2VecLite

Word2VecLite is a Python implementation of Word2Vec that makes it easy to understand how Word2Vec works. This package is intended to be used in conjunction with this blog post.

Installation

  • In your target folder, clone the repository with the command:

    git clone https://github.com/cbellei/word2veclite.git
  • Then, inside the same folder (as always, it is advisable to use a virtual environment):

    pip install .

NOTE: if you get a ModuleNotFoundError: No module named 'setuptools.wheel' error, you may need to update pip and setuptools via the command pip install -U pip setuptools

  • Make sure you install the correct dependencies by typing pip install -r requirements.txt

NOTE: this should work on both a conda environment and a standard virtual environment

  • To check that the package has been installed, in the Python shell type:

    import word2veclite
  • If everything works correctly, the package will be imported without errors.

Dependencies

  • Word2VecLite is tested on Python 3.6 and depends on NumPy, Keras (see requirements.txt for version

information). The unit tests in test/test_word2veclite.py depend on Tensorflow.

How to use Word2VecLite

Input. You need to: 1. Define a corpus of text (the vocabulary will be built from it) 2. Define which method you want to use: cbow or skipgram 3. Decide how many nodes should make the hidden layer 4. Define the size of the context window around the center word 5. Define learning rate 6. Decide how many epochs you want to train the neural network for

Output. Word2VecLite outputs the embeddings W1 and W2 of the neural network and the history of the loss vs. epochs.

Example

  • In IPython, type:

    from word2veclite import Word2Vec
    
    corpus = "I like playing football with my friends"
    cbow = Word2Vec(method="cbow", corpus=corpus,
                    window_size=1, n_hidden=2,
                    n_epochs=10, learning_rate=0.8)
    W1, W2, loss_vs_epoch = cbow.run()
    
    print(W1)
    #[[ 0.99870389  0.20697257]
    # [-1.01911559  2.26364436]
    # [-0.69737232  0.14131477]
    # [ 3.28315183  1.13801973]
    # [-1.42944927 -0.62142097]
    # [ 0.65359329 -2.21415048]
    # [-0.22343751 -1.17927987]]
    
    print(W2)
    #[[-0.97080793  1.21120331  2.15603796 -1.79083151  3.38445043 -1.65295511
    #   1.36685097]
    # [2.77323464  0.78710269  2.74152617  0.08953005  0.04400675 -1.34149651
    #   -2.19375528]]
    
    print(loss_vs_epoch)
    #[14.328868654443703, 12.290456644464603, 10.366644621637064,
    # 9.1759777684446622, 8.4233626997233895, 7.3952948684910256,
    # 6.1727393307549736, 5.1639476117698191, 4.6333377088153043,
    # 4.2944697259465485]

Licence

Apache License, Version 2.0

word2veclite's People

Contributors

cbellei avatar jeremy9959 avatar

Watchers

James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.