ccsasuke / umwe
This project is forked from facebookresearch/muse.
Unsupervised Multilingual Word Embeddings (EMNLP 2018)
Home Page: https://arxiv.org/abs/1808.08933
License: Other
I used conda install faiss-gpu -c pytorch to install faiss-gpu. My version is 1.4.0, and I encountered the above warning when running the training scripts. However, I can see that my Python process was using the GPU. Could you give me some suggestions?
What are your training times using GPU and CPU?
Thanks.
Hi! I am learning about the UMWE paper and trying to reproduce the results of this model.
I got errors as below:
.......................
INFO - 02/27/19 10:02:36 - 0:34:57 - 1500 source words - csls_knn_10 - Precision at k = 10: 83.533333
INFO - 02/27/19 10:02:36 - 0:34:57 - word translation precision@1: 68.61333
INFO - 02/27/19 10:03:09 - 0:35:30 - Loaded europarl de-en (773704 sentences).
Traceback (most recent call last):
File "unsupervised.py", line 161, in <module>
evaluator.all_eval(to_log)
File "/multi_embedd/umwe/src/evaluation/evaluator.py", line 335, in all_eval
self.sent_translation(to_log)
File "/multi_embedd/umwe/src/evaluation/evaluator.py", line 228, in sent_translation
method=method, idf=idf
File "/multi_embedd/umwe/src/evaluation/sent_translation.py", line 74, in get_sent_translation_accuracy
emb2 = emb2.cpu().numpy()
RuntimeError: Can't call numpy() on Variable that requires grad. Use var.detach().numpy() instead.
Could anyone give me advice or comments?
Thanks.
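The error message itself points at the fix: the mapped embedding tensor still carries gradient history, so .numpy() refuses to convert it. A minimal PyTorch sketch of the failure and the one-line repair (variable names chosen for illustration):

```python
import torch

# A tensor that requires grad cannot be converted to NumPy directly.
emb2 = torch.randn(4, 3, requires_grad=True)

# emb2.cpu().numpy()  # raises: Can't call numpy() on Tensor that requires grad

# Detaching first breaks the autograd link; then the conversion succeeds.
arr = emb2.detach().cpu().numpy()
print(arr.shape)  # (4, 3)
```

Changing line 74 of src/evaluation/sent_translation.py to emb2 = emb2.detach().cpu().numpy() (and any sibling emb1 conversion, if present) should therefore let the sentence-translation evaluation run; this is inferred from the traceback, not a confirmed patch.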
I'm trying to run unsupervised.py, and I consistently get a segmentation fault in the word translation evaluation unless I drop faiss. The version of faiss I'm using is:
Name Version Build Channel
faiss-gpu 1.4.0 py36_cuda8.0.61_1 pytorch
The command I'm running (with fewer languages and a small epoch size, in the hope that it might then run) is:
python unsupervised.py --src_langs es fr --tgt_lang en --epoch_size 10000 --exp_path /tmp/umwe_190103/ --emb_dir /tmp/umwe_190103/umwe-data/fasttext_vectors --dico_eval /tmp/umwe_190103/umwe-data/crosslingual/dictionaries
And the error looks like:
INFO - 10/16/18 20:01:17 - 0:01:06 - Found 2416 pairs of words in /tmp/umwe_190103/umwe-data/crosslingual/dictionaries/es-en.5000-6500.txt (1500 unique). 0 other pairs contained at least one unknown word (0 in lang1, 0 in lang2)
unsupervised2.sh: line 36: 190190 Segmentation fault (core dumped) python unsupervised.py --src_langs es fr --tgt_lang en --epoch_size 10000 --exp_path /tmp/umwe_190103/ --emb_dir /tmp/umwe_190103/umwe-data/fasttext_vectors --dico_eval /tmp/umwe_190103/umwe-data/crosslingual/dictionaries
Any suggestions welcome!
Hi, I got this error when training multilingual embeddings with the following command line:
python unsupervised.py --exp_path $res_dir --dis_most_frequent 0 --n_epochs 30 --device cuda --export txt --src_langs fr de --tgt_lang en --emb_dim 100 --max_vocab -1 --src_embs $fr_em $de_em --tgt_emb $de_em
Do you have any idea how I can solve this problem, please?
Thank you
INFO - 04/06/20 18:00:10 - 0:15:59 - * Reloading the best fr to en model from /debug/19k7t4cexy/best_mapping_fr2en.t7 ...
INFO - 04/06/20 18:00:10 - 0:15:59 - * Reloading the best de to en model from /debug/19k7t4cexy/best_mapping_de2en.t7 ...
INFO - 04/06/20 18:00:10 - 0:15:59 - Starting refinement iteration 0...
INFO - 04/06/20 18:00:10 - 0:15:59 - Building the train dictionary ...
Traceback (most recent call last):
File "/tools/git/umwe/unsupervised.py", line 190, in <module>
trainer.build_dictionary()
File "/tools/git/umwe/src/trainer.py", line 229, in build_dictionary
self.dicos[(lang1, lang2)] = build_dictionary(src_emb, tgt_emb, self.params)
File "/tools/git/umwe/src/dico_builder.py", line 157, in build_dictionary
s2t_candidates = get_candidates(src_emb, tgt_emb, params)
File "/tools/git/umwe/src/dico_builder.py", line 109, in get_candidates
assert all_scores.size() == all_pairs.size() == (n_src, 2)
AssertionError
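For what it's worth, the failing assertion checks a shape invariant: the candidate search is expected to return, for every one of the n_src source words, its two best target candidates, so both the score and the index tensors must have shape (n_src, 2). A NumPy sketch of that invariant (toy data and hypothetical names, not the project's code):

```python
import numpy as np

rng = np.random.default_rng(0)
n_src, n_tgt, dim = 5, 8, 4
src_emb = rng.standard_normal((n_src, dim))
tgt_emb = rng.standard_normal((n_tgt, dim))

sims = src_emb @ tgt_emb.T                    # (n_src, n_tgt) similarity matrix
all_pairs = np.argsort(-sims, axis=1)[:, :2]  # top-2 target indices per source word
all_scores = np.take_along_axis(sims, all_pairs, axis=1)

# The invariant the failing assert enforces:
assert all_scores.shape == all_pairs.shape == (n_src, 2)
```

If fewer than two candidates survive (for example after heavy vocabulary filtering, or because the same file is passed as both a source and the target embedding, as in the command above), the shapes diverge and the assert fires; that is one plausible cause, not a confirmed diagnosis.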
Could this be used to map words to another embeddings than fasttext, something like paragram embeddings? Would anything have to change?
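Regarding other embeddings: MUSE-style code reads plain word2vec-text vectors (one "word v1 v2 ..." line per word, with an optional "count dim" header), so any embeddings saved in that format, paragram included, should load without code changes as long as the dimension matches --emb_dim. A minimal reader illustrating the expected format (a sketch, not the project's actual loader):

```python
import io
import numpy as np

def load_vec_text(fileobj, emb_dim):
    """Minimal reader for word2vec-style text embeddings."""
    words, vecs = [], []
    first = fileobj.readline().split()
    if len(first) == 2:               # optional "<n_words> <dim>" header
        assert int(first[1]) == emb_dim
    else:                             # no header: first line is already a vector
        words.append(first[0])
        vecs.append([float(x) for x in first[1:]])
    for line in fileobj:
        parts = line.rstrip().split(" ")
        words.append(parts[0])
        vecs.append([float(x) for x in parts[1:]])
    return words, np.array(vecs)

# Toy example: two 3-dimensional "paragram-like" vectors with a header.
toy = io.StringIO("2 3\ncat 0.1 0.2 0.3\ndog 0.4 0.5 0.6\n")
words, emb = load_vec_text(toy, emb_dim=3)
print(words, emb.shape)  # ['cat', 'dog'] (2, 3)
```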
I am getting this error:
File "unsupervised.py", line 110, in <module>
    assert os.path.isfile(params.tgt_emb)
AssertionError
I have placed embeddings in a directory and I am using this command:
python unsupervised.py --src_embs /data_disk/embeddings/crawl-300d-2M-subword.vec --src_langs en --tgt_lang fr --exp_path /data_disk/umwe/data
I tried --emb_dir, but it's not part of the options.
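The assertion fires because the command supplies only --src_embs: params.tgt_emb is left empty, so the os.path.isfile precondition at line 110 fails before training starts. A small stdlib sketch of that check (the Params class here is just a stand-in for the script's parsed arguments):

```python
import os
import tempfile

class Params:
    pass

params = Params()
params.tgt_emb = ""                      # what an omitted --tgt_emb looks like
print(os.path.isfile(params.tgt_emb))    # False -> AssertionError in the script

# Supplying a path to an existing file satisfies the check:
with tempfile.NamedTemporaryFile(suffix=".vec") as f:
    params.tgt_emb = f.name
    print(os.path.isfile(params.tgt_emb))  # True
```

So the likely fix is to pass the target-language (fr) vectors explicitly with --tgt_emb, the flag this fork does expose, rather than --emb_dir.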
Hi
I have 2 GPUs available. When running this code, it uses only one GPU while the other sits idle. I am getting a warning about memory on GPU 1. Is this code capable of using multiple GPUs?
WARN: increase temp memory to avoid cudaMalloc, or decrease query/add size
Thanks
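From the warning text, this looks like faiss's GPU index complaining about its pre-allocated scratch memory rather than the training code itself, and the training script appears to be single-GPU. A hedged sketch of two common knobs (the setTempMemory call is from faiss's GPU-resources bindings; the sizes and device index are placeholders):

```python
import os

# Pick which GPU the (single-GPU) training process uses -- a standard CUDA
# environment variable, not a UMWE-specific feature. Must be set before
# torch/faiss initialize CUDA.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"   # e.g. use the second GPU

# If faiss-gpu is available, its scratch allocation can be enlarged to
# avoid repeated cudaMalloc calls (and the warning about them):
try:
    import faiss
    res = faiss.StandardGpuResources()
    res.setTempMemory(1024 * 1024 * 1024)  # reserve ~1 GiB of temp memory
except Exception:
    pass  # faiss or a GPU is unavailable; the env-var part still applies
```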
Moved over from email correspondence:
Having an issue when trying to map fastText-provided "fr" and "tl" source embeddings to target "en" with the following command:
python unsupervised.py --dis_most_frequent 50000 --src_langs fr tl --tgt_lang en
Everything runs (and ends) smoothly, but the "cross-lingual word similarity score average" is always "nan" after every iteration. I've tried with several different language combinations and I never see a value. The train log is attached.
train.log
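One hedged guess about the nan: if any single per-dataset similarity score is nan (say, because a word-similarity evaluation file is missing for one of the languages), a plain mean over the datasets propagates it, even though training itself is healthy. A tiny NumPy illustration of the difference:

```python
import numpy as np

# Hypothetical per-dataset correlation scores; one dataset is missing.
scores = [0.71, 0.65, float("nan")]

print(np.mean(scores))     # nan -- a single missing score poisons the average
print(np.nanmean(scores))  # ~0.68 -- the mean over the datasets that exist
```

If that is what is happening, checking that the monolingual and cross-lingual evaluation files were actually downloaded for every language pair would be the first thing to try.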