
Word Embeddings Benchmarks

The Word Embedding Benchmark (web) package focuses on providing methods for easily evaluating word embeddings and reporting results on common benchmarks (analogy, similarity and categorization).

The research goal of the package is to help drive research on word embeddings by making results easily accessible and reproducible, since the literature currently contains many contradictory results. This should also help answer the question of whether we should devise new methods for evaluating word embeddings.

To evaluate your embedding (converted to word2vec format or to a pickled Python dict) on all fast-running benchmarks, execute ./scripts/eval_on_all.py <path-to-file>. See here for results on the embeddings available in the package.
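If your vectors currently live in memory as a {word: vector} mapping, the sketch below shows one way to write them out in both accepted forms using only the standard library. It assumes the plain-text (not binary) word2vec variant and a dict of plain Python lists; whether ./scripts/eval_on_all.py expects exactly these layouts (for example, NumPy arrays inside the pickle, or a particular file extension) is an assumption you should check against the script. The helper names save_word2vec_text and save_dict_pickle are hypothetical and not part of the package.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical helpers, not part of the web package.
# Assumption: the evaluation script reads the *text* word2vec variant.


def save_word2vec_text(embeddings, path):
    """Write a {word: vector} dict in the plain-text word2vec format.

    The first line is "<vocab_size> <dimension>"; every following line is a
    word followed by its space-separated vector components. Words must not
    contain spaces, since spaces separate the fields.
    """
    dim = len(next(iter(embeddings.values())))
    with open(path, "w", encoding="utf-8") as f:
        f.write(f"{len(embeddings)} {dim}\n")
        for word, vec in embeddings.items():
            if len(vec) != dim:
                raise ValueError(f"{word!r} has {len(vec)} components, expected {dim}")
            f.write(word + " " + " ".join(repr(float(x)) for x in vec) + "\n")


def save_dict_pickle(embeddings, path):
    """Pickle the raw {word: vector} dict unchanged."""
    with open(path, "wb") as f:
        pickle.dump(embeddings, f)


if __name__ == "__main__":
    toy = {"king": [0.1, 0.2, 0.3], "queen": [0.1, 0.25, 0.3]}
    out_dir = Path(tempfile.mkdtemp())
    save_word2vec_text(toy, out_dir / "toy.txt")
    save_dict_pickle(toy, out_dir / "toy.pkl")
    # The header line records vocabulary size and dimension.
    print((out_dir / "toy.txt").read_text(encoding="utf-8").splitlines()[0])  # prints "2 3"
```

The resulting file path is what you would then pass as <path-to-file>.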

Warnings and Disclaimers:

  • The analogy test does not normalize word embeddings internally.
  • The package is currently under development, and we expect an official release within the next few months. The main issue you might run into at the moment is rather long embedding loading times (especially if you use fetchers).
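Because the analogy test does not normalize vectors itself, you may want to scale them to unit length before evaluating. The minimal sketch below L2-normalizes a {word: vector} dict and then answers "a is to b as c is to ?" with the common 3CosAdd rule: pick the word whose vector is most similar to b - a + c, excluding the three query words. It is a plain-Python illustration of that technique, not the package's own analogy solver, and the function names are hypothetical.

```python
import math

# Hypothetical helpers illustrating 3CosAdd; not part of the web package.


def l2_normalize(embeddings):
    """Return a copy of {word: vector} with every vector scaled to unit L2 norm."""
    normalized = {}
    for word, vec in embeddings.items():
        norm = math.sqrt(sum(x * x for x in vec))
        # Leave all-zero vectors unchanged to avoid dividing by zero.
        normalized[word] = [x / norm for x in vec] if norm > 0 else list(vec)
    return normalized


def solve_analogy(embeddings, a, b, c):
    """3CosAdd: return the word d maximizing cos(d, b - a + c), excluding a, b, c.

    Expects unit-length vectors: then the dot product with the target ranks
    candidates exactly as cosine similarity does, because the target's norm
    is the same for every candidate.
    """
    target = [vb - va + vc for va, vb, vc in zip(embeddings[a], embeddings[b], embeddings[c])]
    best_word, best_score = None, -math.inf
    for word, vec in embeddings.items():
        if word in (a, b, c):
            continue
        score = sum(x * y for x, y in zip(vec, target))
        if score > best_score:
            best_word, best_score = word, score
    return best_word


if __name__ == "__main__":
    toy = {
        "man": [1.0, 0.0],
        "woman": [1.0, 1.0],
        "king": [3.0, 0.0],
        "queen": [3.0, 3.0],
        "apple": [0.0, -1.0],
    }
    print(solve_analogy(l2_normalize(toy), "man", "king", "woman"))  # prints "queen"
```

Normalizing first matters because raw dot products favor candidates with large norms regardless of their direction.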

Please also refer to our recent publication on evaluation methods https://arxiv.org/abs/1702.02170.

Features:

  • scikit-learn API and conventions
  • 18 popular datasets
  • 11 word embeddings (word2vec, HPCA, morphoRNNLM, GloVe, LexVec, ConceptNet, HDC/PDC and others)
  • methods to solve analogy, similarity and categorization tasks
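Similarity benchmarks such as WS353, SimLex999 or MEN are conventionally scored by the Spearman rank correlation between the model's similarity for each word pair and the human rating for that pair. The dependency-free sketch below implements that standard protocol with cosine similarity and tie-aware ranks. It is not the package's own implementation: the name score_similarity is hypothetical, and skipping pairs with out-of-vocabulary words is an assumption that may differ from how the package treats missing words.

```python
import math

# Hypothetical sketch of similarity-benchmark scoring; not part of web's API.


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions i..j are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def score_similarity(embeddings, pairs, human_scores):
    """Correlate cosine similarities with human ratings.

    Assumption: pairs containing an out-of-vocabulary word are skipped.
    """
    model, gold = [], []
    for (w1, w2), score in zip(pairs, human_scores):
        if w1 in embeddings and w2 in embeddings:
            model.append(cosine(embeddings[w1], embeddings[w2]))
            gold.append(score)
    return spearman(model, gold)
```

At least two in-vocabulary pairs with non-constant values are needed; otherwise the rank variance is zero and the correlation is undefined.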

Included datasets:

  • TR9856
  • WordRep
  • Google Analogy
  • MSR Analogy
  • SemEval2012
  • AP
  • BLESS
  • Battig
  • ESSLI (2b, 2a, 1c)
  • WS353
  • MTurk
  • RG65
  • RW
  • SimLex999
  • MEN

Note: embeddings are currently not hosted on a proper server. If the download is too slow, consider downloading the embeddings manually from the original sources referenced in the docstrings.

Dependencies

Please see the requirements.txt and pip_requirements.txt files.

Install

This package uses setuptools. You can install it by running:

python setup.py install

If you run into problems during installation, you may first need to install the dependencies:

pip install -r requirements.txt

If the dependencies listed in requirements.txt are already installed, you can install the package in your home directory with:

python setup.py install --user

To install for all users on Unix/Linux:

python setup.py build
sudo python setup.py install

You can also install it in development mode with:

python setup.py develop

Examples

See the examples folder.

License

The code is licensed under MIT; however, the embeddings available through the package may be distributed under different licenses. If you are unsure, please contact the authors (references are included in the docstrings).

Contributors

kudkudak, danielhers, lamiane, alexandres, stonesjtu, rafis, jlowryduda
