ccsasuke / umwe
This project is forked from facebookresearch/muse.
Unsupervised Multilingual Word Embeddings (EMNLP 2018)
Home Page: https://arxiv.org/abs/1808.08933
License: Other
I used conda install faiss-gpu -c pytorch to install faiss-gpu. My version is 1.4.0, and I encountered the above warning when running the training scripts. However, I can see that my Python process was using the GPU. Could you give me some suggestions?
What are your training times using GPU and CPU?
Thanks.
Hi! I am learning about the UMWE paper and trying to reproduce the results of this model.
I got errors as below:
.......................
INFO - 02/27/19 10:02:36 - 0:34:57 - 1500 source words - csls_knn_10 - Precision at k = 10: 83.533333
INFO - 02/27/19 10:02:36 - 0:34:57 - word translation precision@1: 68.61333
INFO - 02/27/19 10:03:09 - 0:35:30 - Loaded europarl de-en (773704 sentences).
Traceback (most recent call last):
File "unsupervised.py", line 161, in <module>
evaluator.all_eval(to_log)
File "/multi_embedd/umwe/src/evaluation/evaluator.py", line 335, in all_eval
self.sent_translation(to_log)
File "/multi_embedd/umwe/src/evaluation/evaluator.py", line 228, in sent_translation
method=method, idf=idf
File "/multi_embedd/umwe/src/evaluation/sent_translation.py", line 74, in get_sent_translation_accuracy
emb2 = emb2.cpu().numpy()
RuntimeError: Can't call numpy() on Variable that requires grad. Use var.detach().numpy() instead.
Could anyone give me advice or comments?
Thanks.
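The error message itself points at the fix: the mapped embedding tensor still carries gradient history, so .numpy() refuses to convert it. A minimal PyTorch sketch of the failure and the one-line repair (variable names chosen for illustration):

```python
import torch

# A tensor that requires grad cannot be converted to NumPy directly.
emb2 = torch.randn(4, 3, requires_grad=True)

# emb2.cpu().numpy()  # raises: Can't call numpy() on Tensor that requires grad

# Detaching first breaks the autograd link; then the conversion succeeds.
arr = emb2.detach().cpu().numpy()
print(arr.shape)  # (4, 3)
```

Changing line 74 of src/evaluation/sent_translation.py to emb2 = emb2.detach().cpu().numpy() (and any sibling emb1 conversion, if present) should therefore let the sentence-translation evaluation run; this is inferred from the traceback, not a confirmed patch.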
I'm trying to run unsupervised.py, and I consistently get a segmentation fault in the word translation evaluation unless I drop faiss. The version of faiss I'm using is:
Name Version Build Channel
faiss-gpu 1.4.0 py36_cuda8.0.61_1 pytorch
The command I'm running (with fewer languages and a small epoch size, in the hope that it might then run) is:
python unsupervised.py --src_langs es fr --tgt_lang en --epoch_size 10000 --exp_path /tmp/umwe_190103/ --emb_dir /tmp/umwe_190103/umwe-data/fasttext_vectors --dico_eval /tmp/umwe_190103/umwe-data/crosslingual/dictionaries
And the error looks like:
INFO - 10/16/18 20:01:17 - 0:01:06 - Found 2416 pairs of words in /tmp/umwe_190103/umwe-data/crosslingual/dictionaries/es-en.5000-6500.txt (1500 unique). 0 other pairs contained at least one unknown word (0 in lang1, 0 in lang2)
unsupervised2.sh: line 36: 190190 Segmentation fault (core dumped) python unsupervised.py --src_langs es fr --tgt_lang en --epoch_size 10000 --exp_path /tmp/umwe_190103/ --emb_dir /tmp/umwe_190103/umwe-data/fasttext_vectors --dico_eval /tmp/umwe_190103/umwe-data/crosslingual/dictionaries
Any suggestions welcome!
Hi, I got this error when training multilingual embeddings with the following command line:
python unsupervised.py --exp_path $res_dir --dis_most_frequent 0 --n_epochs 30 --device cuda --export txt --src_langs fr de --tgt_lang en --emb_dim 100 --max_vocab -1 --src_embs $fr_em $de_em --tgt_emb $de_em
Do you have any idea how I can solve this problem, please?
Thank you
INFO - 04/06/20 18:00:10 - 0:15:59 - * Reloading the best fr to en model from /debug/19k7t4cexy/best_mapping_fr2en.t7 ...
INFO - 04/06/20 18:00:10 - 0:15:59 - * Reloading the best de to en model from /debug/19k7t4cexy/best_mapping_de2en.t7 ...
INFO - 04/06/20 18:00:10 - 0:15:59 - Starting refinement iteration 0...
INFO - 04/06/20 18:00:10 - 0:15:59 - Building the train dictionary ...
Traceback (most recent call last):
File "/tools/git/umwe/unsupervised.py", line 190, in <module>
trainer.build_dictionary()
File "/tools/git/umwe/src/trainer.py", line 229, in build_dictionary
self.dicos[(lang1, lang2)] = build_dictionary(src_emb, tgt_emb, self.params)
File "/tools/git/umwe/src/dico_builder.py", line 157, in build_dictionary
s2t_candidates = get_candidates(src_emb, tgt_emb, params)
File "/tools/git/umwe/src/dico_builder.py", line 109, in get_candidates
assert all_scores.size() == all_pairs.size() == (n_src, 2)
AssertionError
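For what it's worth, the failing assertion checks a shape invariant: the candidate search is expected to return, for every one of the n_src source words, its two best target candidates, so both the score and the index tensors must have shape (n_src, 2). A NumPy sketch of that invariant (toy data and hypothetical names, not the project's code):

```python
import numpy as np

rng = np.random.default_rng(0)
n_src, n_tgt, dim = 5, 8, 4
src_emb = rng.standard_normal((n_src, dim))
tgt_emb = rng.standard_normal((n_tgt, dim))

sims = src_emb @ tgt_emb.T                    # (n_src, n_tgt) similarity matrix
all_pairs = np.argsort(-sims, axis=1)[:, :2]  # top-2 target indices per source word
all_scores = np.take_along_axis(sims, all_pairs, axis=1)

# The invariant the failing assert enforces:
assert all_scores.shape == all_pairs.shape == (n_src, 2)
```

If fewer than two candidates survive (for example after heavy vocabulary filtering, or because the same file is passed as both a source and the target embedding, as in the command above), the shapes diverge and the assert fires; that is one plausible cause, not a confirmed diagnosis.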
Could this be used to map words to another embeddings than fasttext, something like paragram embeddings? Would anything have to change?
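Regarding other embeddings: MUSE-style code reads plain word2vec-text vectors (one "word v1 v2 ..." line per word, with an optional "count dim" header), so any embeddings saved in that format, paragram included, should load without code changes as long as the dimension matches --emb_dim. A minimal reader illustrating the expected format (a sketch, not the project's actual loader):

```python
import io
import numpy as np

def load_vec_text(fileobj, emb_dim):
    """Minimal reader for word2vec-style text embeddings."""
    words, vecs = [], []
    first = fileobj.readline().split()
    if len(first) == 2:               # optional "<n_words> <dim>" header
        assert int(first[1]) == emb_dim
    else:                             # no header: first line is already a vector
        words.append(first[0])
        vecs.append([float(x) for x in first[1:]])
    for line in fileobj:
        parts = line.rstrip().split(" ")
        words.append(parts[0])
        vecs.append([float(x) for x in parts[1:]])
    return words, np.array(vecs)

# Toy example: two 3-dimensional "paragram-like" vectors with a header.
toy = io.StringIO("2 3\ncat 0.1 0.2 0.3\ndog 0.4 0.5 0.6\n")
words, emb = load_vec_text(toy, emb_dim=3)
print(words, emb.shape)  # ['cat', 'dog'] (2, 3)
```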
I am getting this error:
File "unsupervised.py", line 110, in <module>
    assert os.path.isfile(params.tgt_emb)
AssertionError
I have placed embeddings in a directory and I am using this command:
python unsupervised.py --src_embs /data_disk/embeddings/crawl-300d-2M-subword.vec --src_langs en --tgt_lang fr --exp_path /data_disk/umwe/data
I tried --emb_dir, but it's not part of the options.
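The assertion fires because the command supplies only --src_embs: params.tgt_emb is left empty, so the os.path.isfile precondition at line 110 fails before training starts. A small stdlib sketch of that check (the Params class here is just a stand-in for the script's parsed arguments):

```python
import os
import tempfile

class Params:
    pass

params = Params()
params.tgt_emb = ""                      # what an omitted --tgt_emb looks like
print(os.path.isfile(params.tgt_emb))    # False -> AssertionError in the script

# Supplying a path to an existing file satisfies the check:
with tempfile.NamedTemporaryFile(suffix=".vec") as f:
    params.tgt_emb = f.name
    print(os.path.isfile(params.tgt_emb))  # True
```

So the likely fix is to pass the target-language (fr) vectors explicitly with --tgt_emb, the flag this fork does expose, rather than --emb_dir.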
Hi
I have 2 GPUs available. When running this code, it uses only one GPU while the other sits idle. I am getting a warning about memory on GPU 1. Is this code capable of using multiple GPUs?
WARN: increase temp memory to avoid cudaMalloc, or decrease query/add size
Thanks
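From the warning text, this looks like faiss's GPU index complaining about its pre-allocated scratch memory rather than the training code itself, and the training script appears to be single-GPU. A hedged sketch of two common knobs (the setTempMemory call is from faiss's GPU-resources bindings; the sizes and device index are placeholders):

```python
import os

# Pick which GPU the (single-GPU) training process uses -- a standard CUDA
# environment variable, not a UMWE-specific feature. Must be set before
# torch/faiss initialize CUDA.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"   # e.g. use the second GPU

# If faiss-gpu is available, its scratch allocation can be enlarged to
# avoid repeated cudaMalloc calls (and the warning about them):
try:
    import faiss
    res = faiss.StandardGpuResources()
    res.setTempMemory(1024 * 1024 * 1024)  # reserve ~1 GiB of temp memory
except Exception:
    pass  # faiss or a GPU is unavailable; the env-var part still applies
```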
Moved over from email correspondence:
Having an issue when trying to map fastText-provided "fr" and "tl" source embeddings to target "en" with the following command:
python unsupervised.py --dis_most_frequent 50000 --src_langs fr tl --tgt_lang en
Everything runs (and ends) smoothly, but the "cross-lingual word similarity score average" is always "nan" after every iteration. I've tried with several different language combinations and I never see a value. The train log is attached.
train.log
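One hedged guess about the nan: if any single per-dataset similarity score is nan (say, because a word-similarity evaluation file is missing for one of the languages), a plain mean over the datasets propagates it, even though training itself is healthy. A tiny NumPy illustration of the difference:

```python
import numpy as np

# Hypothetical per-dataset correlation scores; one dataset is missing.
scores = [0.71, 0.65, float("nan")]

print(np.mean(scores))     # nan -- a single missing score poisons the average
print(np.nanmean(scores))  # ~0.68 -- the mean over the datasets that exist
```

If that is what is happening, checking that the monolingual and cross-lingual evaluation files were actually downloaded for every language pair would be the first thing to try.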