
umwe's People

Contributors

ccsasuke, glample, prabhakar267, sufuf3, swapnil3597


umwe's Issues

Impossible to import Faiss-GPU

I used conda install faiss-gpu -c pytorch to install faiss-gpu; my version is 1.4.0. I get the warning in the title when running the training scripts, even though I can see that my Python process is using the GPU. Could you give me some suggestions?

Also, what is your training time on GPU versus CPU?

Thanks.
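A quick, hedged sanity check (not taken from the repo itself): confirm that faiss actually imports and sees a GPU in the exact Python environment the training scripts run in, since the message in the title suggests the import is failing at training time.

# run inside the same conda environment used for the training scripts
import faiss
print(faiss.get_num_gpus())  # a working faiss-gpu build should report a value > 0

If the import fails here too, the problem is with the faiss-gpu installation rather than with the umwe code.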

RuntimeError: Can't call numpy() on Variable that requires grad. Use var.detach().numpy() instead.

Hi! I am studying the UMWE paper and trying to reproduce results with this model.
I got the error below:

.......................
INFO - 02/27/19 10:02:36 - 0:34:57 - 1500 source words - csls_knn_10 - Precision at k = 10: 83.533333
INFO - 02/27/19 10:02:36 - 0:34:57 - word translation precision@1: 68.61333
INFO - 02/27/19 10:03:09 - 0:35:30 - Loaded europarl de-en (773704 sentences).
Traceback (most recent call last):
File "unsupervised.py", line 161, in <module>
evaluator.all_eval(to_log)
File "/multi_embedd/umwe/src/evaluation/evaluator.py", line 335, in all_eval
self.sent_translation(to_log)
File "/multi_embedd/umwe/src/evaluation/evaluator.py", line 228, in sent_translation
method=method, idf=idf
File "/multi_embedd/umwe/src/evaluation/sent_translation.py", line 74, in get_sent_translation_accuracy
emb2 = emb2.cpu().numpy()
RuntimeError: Can't call numpy() on Variable that requires grad. Use var.detach().numpy() instead.

Could anyone give me advice or comments?
Thanks.
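A hedged workaround, following the hint in the error message itself: detach the tensor from the autograd graph before the NumPy conversion at the line shown in the traceback. This is a sketch, not a confirmed patch against the repo.

# src/evaluation/sent_translation.py, around line 74
emb2 = emb2.detach().cpu().numpy()  # detach() removes the grad requirement so numpy() is allowed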

Seg fault in eval with faiss-gpu

I'm trying to run unsupervised.py, and I consistently get a seg fault in the word translation evaluation unless I drop faiss. The version of faiss I'm using is:

Name Version Build Channel
faiss-gpu 1.4.0 py36_cuda8.0.61_1 pytorch

The command I'm running (with only a few languages and a small epoch size, in the hope that it might then run) is:

python unsupervised.py --src_langs es fr --tgt_lang en --epoch_size 10000 --exp_path /tmp/umwe_190103/ --emb_dir /tmp/umwe_190103/umwe-data/fasttext_vectors --dico_eval /tmp/umwe_190103/umwe-data/crosslingual/dictionaries

And the error looks like:

INFO - 10/16/18 20:01:17 - 0:01:06 - Found 2416 pairs of words in /tmp/umwe_190103/umwe-data/crosslingual/dictionaries/es-en.5000-6500.txt (1500 unique). 0 other pairs contained at least one unknown word (0 in lang1, 0 in lang2)
unsupervised2.sh: line 36: 190190 Segmentation fault (core dumped) python unsupervised.py --src_langs es fr --tgt_lang en --epoch_size 10000 --exp_path /tmp/umwe_190103/ --emb_dir /tmp/umwe_190103/umwe-data/fasttext_vectors --dico_eval /tmp/umwe_190103/umwe-data/crosslingual/dictionaries

Any suggestions welcome!
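One hedged way to narrow this down is to check whether faiss-gpu segfaults on its own, outside umwe, with a standalone index smoke test in the same environment (the dimension and array sizes below are arbitrary):

# standalone faiss-gpu smoke test; all sizes are made up
import numpy as np
import faiss

d = 300                                          # embedding dimension
xb = np.random.rand(10000, d).astype('float32')  # fake database vectors
xq = np.random.rand(5, d).astype('float32')      # fake query vectors

res = faiss.StandardGpuResources()               # GPU scratch resources
index = faiss.GpuIndexFlatIP(res, d)             # exact inner-product index on the GPU
index.add(xb)
D, I = index.search(xq, 10)                      # nearest-neighbor search
print(I.shape)                                   # expected: (5, 10)

If this snippet also crashes, the segfault is coming from the faiss 1.4.0 / CUDA 8.0 build itself rather than from the evaluation code.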

MULTILINGUAL PSEUDO-SUPERVISED REFINEMENT : AssertionError

Hi, I got this error when training multilingual embeddings using this command line:
python unsupervised.py --exp_path $res_dir --dis_most_frequent 0 --n_epochs 30 --device cuda --export txt --src_langs fr de --tgt_lang en --emb_dim 100 --max_vocab -1 --src_embs $fr_em $de_em --tgt_emb $de_em

Do you have any idea, please, how I can solve this problem?
Thank you

----> MULTILINGUAL PSEUDO-SUPERVISED REFINEMENT <----

INFO - 04/06/20 18:00:10 - 0:15:59 - * Reloading the best fr to en model from /debug/19k7t4cexy/best_mapping_fr2en.t7 ...
INFO - 04/06/20 18:00:10 - 0:15:59 - * Reloading the best de to en model from /debug/19k7t4cexy/best_mapping_de2en.t7 ...
INFO - 04/06/20 18:00:10 - 0:15:59 - Starting refinement iteration 0...
INFO - 04/06/20 18:00:10 - 0:15:59 - Building the train dictionary ...
Traceback (most recent call last):
File "/tools/git/umwe/unsupervised.py", line 190, in
trainer.build_dictionary()
File "/tools/git/umwe/src/trainer.py", line 229, in build_dictionary
self.dicos[(lang1, lang2)] = build_dictionary(src_emb, tgt_emb, self.params)
File "/tools/git/umwe/src/dico_builder.py", line 157, in build_dictionary
s2t_candidates = get_candidates(src_emb, tgt_emb, params)
File "/tools/git/umwe/src/dico_builder.py", line 109, in get_candidates
assert all_scores.size() == all_pairs.size() == (n_src, 2)
AssertionError
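For context, a hedged reconstruction of the shapes the failing assertion expects: get_candidates keeps, for every source word, its two best-scoring target candidates plus a (source index, best target index) pair, so both tensors should come out as (n_src, 2). Everything in the sketch below is an illustrative stand-in, not the real embeddings.

# illustrative only: random stand-ins for the similarity scores
import torch
n_src, n_tgt = 5, 7
scores = torch.rand(n_src, n_tgt)                # similarity of each source word to each target word
all_scores, all_targets = scores.topk(2, dim=1)  # top-2 scores and target indices per source word
all_pairs = torch.stack([torch.arange(n_src), all_targets[:, 0]], dim=1)  # (source index, best target index)
assert all_scores.size() == all_pairs.size() == (n_src, 2)

The assertion only fails when one of those tensors does not have that shape, which suggests the candidate set came out empty or malformed for this run.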

train on other embeddings

Could this be used to map words into embeddings other than fastText, for example paragram embeddings? Would anything have to change?

assertion error assert os.path.isfile(params.tgt_emb)

I am getting this error:

File "unsupervised.py", line 110, in assert os.path.isfile(params.tgt_emb) AssertionError

I have placed embeddings in a directory and I am using this command:

python unsupervised.py --src_embs /data_disk/embeddings/crawl-300d-2M-subword.vec --src_langs en --tgt_lang fr --exp_path /data_disk/umwe/data

I tried --emb_dir, but it's not part of the options.
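A hedged reading of the assertion: line 110 of unsupervised.py checks that params.tgt_emb points at an existing file, and the command above never passes --tgt_emb, so an explicit target embedding path appears to be required alongside --src_embs. Something like the following, where the French vector path is only a placeholder:

python unsupervised.py --src_langs en --src_embs /data_disk/embeddings/crawl-300d-2M-subword.vec --tgt_lang fr --tgt_emb /data_disk/embeddings/cc.fr.300.vec --exp_path /data_disk/umwe/data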

Not using multiple GPUs

Hi,
I have 2 GPUs available. When running this code, it uses only one GPU and the other sits idle. I am getting a warning about memory on GPU 1. Is this code capable of using multiple GPUs?

WARN: increase temp memory to avoid cudaMalloc, or decrease query/add size

Thanks
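Two hedged notes, not verified against the repo: the quoted warning is emitted by faiss rather than by umwe, and faiss itself allows the GPU temporary memory pool to be enlarged when you construct the GPU resources by hand; whether umwe exposes such a knob, or can spread its work across a second GPU, would have to be checked in its source.

# faiss-side illustration only; the 2 GB figure is arbitrary and this is not a umwe option
import faiss
res = faiss.StandardGpuResources()
res.setTempMemory(2 * 1024 * 1024 * 1024)  # reserve ~2 GB of scratch space to reduce repeated cudaMalloc calls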

"cross-lingual word similarity score average" is nan

moved over from email correspondence

I'm having an issue when trying to map the fastText-provided "fr" and "tl" source embeddings to the target "en" with the following command:

python unsupervised.py --dis_most_frequent 50000 --src_langs fr tl --tgt_lang en

Everything runs (and ends) smoothly, but the "cross-lingual word similarity score average" is always "nan" after every iteration. I've tried with several different language combinations and I never see a value. The train log is attached.
train.log
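One hedged possibility, not verified against the evaluation code: if no cross-lingual word-similarity dataset is available for one of the language pairs (tl-en looks like a likely candidate), that pair may contribute a nan score, and a plain mean over the pairs then propagates it. A toy illustration of the propagation, with made-up numbers:

# toy illustration only; the scores below are invented
import numpy as np
scores = {'fr-en': 0.62, 'tl-en': float('nan')}  # one pair has no word-similarity data
print(np.mean(list(scores.values())))            # nan: a single nan poisons the plain mean
print(np.nanmean(list(scores.values())))         # 0.62: ignoring nan recovers the usable scores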
