
mcsg's Introduction

ICML-2019 Supporting Code for Submission #960 (Model Comparison for Semantic Grouping)

This repo consists of a small set of modifications to SentEval that reproduce the results in accepted submission #960 @ICML2019. Manuscript can be found here.

NOTE: This repo is a fork of SentEval. Any commits prior to 2019 are not associated with publication #960 @ICML2019.

Dependencies

This code is written in Python. The dependencies are:

Download datasets

To get all the transfer task datasets, run the following in data/ (requires Bash >= 4.0):

./get_transfer_data_ptb.bash

This will automatically download and preprocess the datasets, and store them in data/senteval_data (warning: macOS users may have to use p7zip instead of unzip). Note: we provide PTB or MOSES tokenization.

WARNING: Extracting the MRPC MSI file requires the "cabextract" command-line tool (i.e. apt-get/yum install cabextract).

This will also download glove.840B.300d.txt and enwiki_vocab_min200.txt (the SIF frequencies from Arora et al. 2016).

To download the other word vectors, please go to GoogleNews-word2vec and FastText, then convert the binary files into the same text file format as glove.840B.300d.txt and place them in /data/word_vectors. We could not upload them to GitHub since they exceed the allowed disk quota.
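
The conversion step can be sketched as follows. This is a minimal illustration, not part of this repo: the function name word2vec_bin_to_glove_txt is hypothetical, and the sketch assumes the standard word2vec binary layout (a "vocab_size dim" header line, then for each entry the word, a space, and dim little-endian float32 values). It writes one "word v1 v2 ... vd" line per entry with no header line, which is how glove.840B.300d.txt is laid out. FastText's own binary model format is different and is not covered here.

```python
import struct


def word2vec_bin_to_glove_txt(bin_path, txt_path):
    """Convert a word2vec-style binary file to GloVe-style text (no header line).

    Hypothetical helper for illustration; it is not shipped with this repo.
    """
    with open(bin_path, "rb") as src, open(txt_path, "w", encoding="utf-8") as dst:
        # Header line: "<vocab_size> <dim>\n"
        vocab_size, dim = (int(x) for x in src.readline().split())
        for _ in range(vocab_size):
            # Read the word byte by byte until the separating space.
            chars = bytearray()
            while True:
                ch = src.read(1)
                if not ch:
                    raise EOFError("unexpected end of file while reading a word")
                if ch == b" ":
                    break
                if ch != b"\n":  # some writers emit a newline after each vector
                    chars.extend(ch)
            word = chars.decode("utf-8", errors="replace")
            # Then read exactly `dim` little-endian float32 values.
            vector = struct.unpack("<%df" % dim, src.read(4 * dim))
            dst.write(word + " " + " ".join("%.6f" % v for v in vector) + "\n")
    return vocab_size, dim
```

Calling word2vec_bin_to_glove_txt(input_path, output_path) on a downloaded binary file would produce a text file to place in the word vectors folder; the output file name the example scripts expect is not stated here, so check the scripts before choosing one.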

Reproduce Results for Submission #960

To reproduce the results, run:

cd examples
python arora.py # To reproduce Arora et al. (2016)'s SIF+PCA results
python gaussian.py # To reproduce our Gaussian-AIC/TIC results
python vmf.py  # To reproduce our vMF-AIC/TIC results

This reproduces the results for glove.840B.300d.txt; the scripts may then crash if you have not downloaded the other word vectors.

Similarity code

All code for the similarity metrics described in the paper lives in the similarity folder. This is where the core contributions of our work are.

mcsg's People

Contributors

aconneau, franciscovargas, jegou, kamenbrestnichki, keksimusprime, oroszgy, sojvai, tscheepers
