siegfang / word2vec
A parallel Java implementation of word2vec
I'd like to ask why you also didn't implement negative sampling. I saw the issue you originally raised on NLPchina, and I have the same question.
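For reference on the negative sampling question: in the original C word2vec, negative sampling replaces the Huffman-tree walk of hierarchical softmax with one positive example (the observed target word, label 1) plus `negative` noise words (label 0) drawn from a unigram table in which each word's share is proportional to its count raised to the 3/4 power. The sketch below mirrors that update for a single hidden vector and target word. It is not code from this repository; the class and method names (`NegativeSamplingSketch`, `buildUnigramTable`, `update`) and the array layout are assumptions for illustration, and it uses an exact sigmoid instead of the C code's precomputed lookup table.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical sketch of the negative-sampling update used by the original C word2vec. */
public class NegativeSamplingSketch {

    /** Unigram table: word i fills a share of slots proportional to counts[i]^0.75. */
    static int[] buildUnigramTable(long[] counts, int tableSize) {
        final double power = 0.75;
        double total = 0;
        for (long c : counts) total += Math.pow(c, power);
        int[] table = new int[tableSize];
        int word = 0;
        double cumulative = Math.pow(counts[0], power) / total;
        for (int a = 0; a < tableSize; a++) {
            table[a] = word;
            if ((double) a / tableSize > cumulative && word < counts.length - 1) {
                word++;
                cumulative += Math.pow(counts[word], power) / total;
            }
        }
        return table;
    }

    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x)); // exact sigmoid; the C code uses a lookup table
    }

    /**
     * One update for a (hidden vector, target word) pair. The target is the positive
     * example (label 1); `negative` words drawn from the table are negatives (label 0).
     * Updates syn1neg in place and returns the gradient for the hidden vector (neu1e).
     */
    static double[] update(double[] neu1, double[][] syn1neg, int target, int negative,
                           int[] table, double alpha, Random rnd) {
        double[] neu1e = new double[neu1.length];
        for (int d = 0; d <= negative; d++) {
            int word = (d == 0) ? target : table[rnd.nextInt(table.length)];
            if (d > 0 && word == target) continue; // skip noise samples that hit the target
            double label = (d == 0) ? 1.0 : 0.0;
            double dot = 0;
            for (int c = 0; c < neu1.length; c++) dot += neu1[c] * syn1neg[word][c];
            double g = (label - sigmoid(dot)) * alpha;
            // accumulate the hidden-vector gradient before changing the output vector
            for (int c = 0; c < neu1.length; c++) neu1e[c] += g * syn1neg[word][c];
            for (int c = 0; c < neu1.length; c++) syn1neg[word][c] += g * neu1[c];
        }
        return neu1e;
    }

    public static void main(String[] args) {
        int[] table = buildUnigramTable(new long[] {50, 20, 5, 1}, 10_000);
        double[] neu1 = {0.1, -0.2, 0.3};
        double[][] syn1neg = new double[4][neu1.length];
        update(neu1, syn1neg, 2, 5, table, 0.025, new Random(42));
        System.out.println(Arrays.toString(syn1neg[2])); // target row moved toward neu1
    }
}
```

Compared with hierarchical softmax, each training pair touches `negative + 1` output vectors instead of roughly log2(vocabulary size) inner nodes, and no Huffman tree is needed.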
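The program keeps failing with the message "word2vec的词向量为空,请先训练模型。" ("The word2vec word vectors are empty; please train the model first."), and I don't know what went wrong.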
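Have you looked into the Python implementation of sentence2vec? I'd like to port it to Java, but I don't know whether sentence2vec works on the same principle as word2vec. Any guidance would be appreciated.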
How about it?
Hi, good work on the Java version. Thanks.
I ran it on the text8 test data from the original word2vec release and got the following error (running the original C code on the same dataset was fine; it took only about 1 minute):
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOfRange(Arrays.java:2694)
    at java.lang.String.<init>(String.java:203)
    at java.lang.String.substring(String.java:1913)
    at java.lang.String.split(String.java:2288)
    at java.lang.String.split(String.java:2355)
    at org.nlp.util.Tokenizer.<init>(Tokenizer.java:31)
    at test.TestWord2Vec.readByJava(TestWord2Vec.java:29)
    at test.TestWord2Vec.main(TestWord2Vec.java:61)
Could you provide the two files referenced in the test code, "D:/data/corpus.dat" and "D:/data/corpus.nn"?
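Thanks for the advice. I tried files of different sizes with -Xmx2G: up to 85 MB there is no problem, but at 86 MB I start getting OutOfMemoryError: Java heap space.
Thanks for the article; I'll read it.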
java -Xmx2G test/TestWord2Vec text8_86000000 > output_86000000
Apr 04, 2014 1:26:44 AM org.nlp.vec.Word2Vec readTokens
INFO: create temp file successfully in/home/aura/projects/deepLearning_java/word2vec/src/temp/tempCorpus1287275066488351187.txt
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOfRange(Arrays.java:2694)
    at java.lang.String.<init>(String.java:203)
    at java.lang.StringBuilder.toString(StringBuilder.java:405)
    at org.nlp.util.Tokenizer.toString(Tokenizer.java:89)
    at org.nlp.vec.Word2Vec.readTokens(Word2Vec.java:181)
    at test.TestWord2Vec.readByJava(TestWord2Vec.java:29)
    at test.TestWord2Vec.main(TestWord2Vec.java:66)
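Both traces fail while the whole corpus is held in memory as Java strings: `Tokenizer` calls `String.split` on the full text, and `readTokens` goes through `StringBuilder.toString`, and both of these allocate further copies. Java strings store UTF-16 `char`s, so an 86 MB ASCII file already takes roughly 172 MB as a single `String` before those copies and the token array are counted. text8 is also a single line with no line breaks, so reading it line by line does not help. One way around this, sketched below under the assumption that this pass only needs word counts to build the vocabulary, is to stream tokens character by character. `StreamingWordCount` and `countWords` are hypothetical names, not part of this repository.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: count whitespace-separated words without loading the whole file. */
public class StreamingWordCount {

    static Map<String, Integer> countWords(Reader source) {
        Map<String, Integer> counts = new HashMap<>();
        StringBuilder token = new StringBuilder(); // holds only the current word
        try (BufferedReader in = new BufferedReader(source)) {
            int ch;
            while ((ch = in.read()) != -1) {
                if (Character.isWhitespace(ch)) {
                    flush(token, counts);
                } else {
                    token.append((char) ch);
                }
            }
            flush(token, counts); // last word if the input does not end in whitespace
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counts;
    }

    /** Adds the buffered word (if any) to the counts and clears the buffer. */
    private static void flush(StringBuilder token, Map<String, Integer> counts) {
        if (token.length() > 0) {
            counts.merge(token.toString(), 1, Integer::sum);
            token.setLength(0);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            Map<String, Integer> counts = countWords(
                    Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8));
            System.out.println(counts.size() + " distinct words");
        } else {
            System.out.println(countWords(new StringReader("the cat the dog")));
        }
    }
}
```

Memory then scales with the vocabulary size rather than the corpus size; the training pass would need the same streaming treatment to avoid keeping all tokens in memory.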
HuffmanNeuron outNext = (HuffmanNeuron) pathNeurons.get(neuronIndex+1);
double g = (1 - outNext.code - f) * alpha;
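The two lines above compute the hierarchical-softmax gradient. In the original C word2vec the corresponding step is `g = (1 - vocab[word].code[d] - f) * alpha`, where `f` is the sigmoid of the dot product between the hidden vector and the inner node's vector, and `code[d]` is the Huffman bit (0 or 1) of the branch taken at that node, so the sigmoid's target is `1 - code`. Whether reading the bit from `pathNeurons.get(neuronIndex+1)` matches this depends on how the repository stores codes on its `HuffmanNeuron` objects, which the snippet alone does not show. A minimal sketch of one path walk, with hypothetical names and without the C code's clipping and lookup table:

```java
import java.util.Arrays;

/** Hypothetical sketch of one hierarchical-softmax walk along a Huffman path. */
public class HierarchicalSoftmaxSketch {

    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    /**
     * @param hidden   hidden-layer vector (neu1 in the C code)
     * @param nodeVecs output vectors of the inner nodes on the path (rows of syn1), updated in place
     * @param codes    Huffman bit of the branch taken at each inner node
     * @return accumulated gradient for the hidden vector (neu1e in the C code)
     */
    static double[] trainPath(double[] hidden, double[][] nodeVecs, int[] codes, double alpha) {
        double[] hiddenGrad = new double[hidden.length];
        for (int d = 0; d < codes.length; d++) {
            double[] node = nodeVecs[d];
            double dot = 0;
            for (int c = 0; c < hidden.length; c++) dot += hidden[c] * node[c];
            double f = sigmoid(dot);
            // target is 1 - code: f is pushed toward 1 on a 0-branch and toward 0 on a 1-branch
            double g = (1 - codes[d] - f) * alpha;
            for (int c = 0; c < hidden.length; c++) hiddenGrad[c] += g * node[c];
            for (int c = 0; c < hidden.length; c++) node[c] += g * hidden[c];
        }
        return hiddenGrad;
    }

    public static void main(String[] args) {
        double[][] nodes = {{0.0, 0.0}, {0.0, 0.0}};
        trainPath(new double[] {0.2, -0.1}, nodes, new int[] {0, 1}, 0.1);
        System.out.println(Arrays.toString(nodes[0]) + " " + Arrays.toString(nodes[1]));
    }
}
```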
Have you worked on sentence vectors?
Could you please upload the training data?
I'm using word2vec on Ubuntu. Following the tutorial, I ran ./word2vec -train resultbig.txt -output vectors.bin -cbow 0 -size 200 -window 5 -negative 0 -hs 1 -sample 1e-3 -threads 12 -binary 1 and it generated vectors.bin. How do I open the .bin file that word2vec produces?
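With `-binary 1`, the original C tool writes the file as a text header `<vocab_size> <layer_size>` plus a newline, then for each word: the word, a space, `layer_size` raw 4-byte floats, and a newline. `distance.c` in the same distribution reads this format. The floats are written in the machine's native byte order, which the sketch below assumes is little-endian (as on x86 Ubuntu); it also assumes the words are UTF-8 and reads the whole file into memory, which suits small models only. Alternatively, rerunning with `-binary 0` writes the same layout as plain text. `Word2VecBinReader`, `parse` and `load` are hypothetical names.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical reader for the binary vectors file written by the C word2vec tool (-binary 1). */
public class Word2VecBinReader {

    /** Parses the file content; assumes little-endian 4-byte floats and UTF-8 words. */
    static Map<String, float[]> parse(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        int words = Integer.parseInt(readToken(buf));
        int size = Integer.parseInt(readToken(buf));
        Map<String, float[]> vectors = new LinkedHashMap<>(); // keeps file order
        for (int w = 0; w < words; w++) {
            String word = readToken(buf);
            float[] vec = new float[size];
            for (int i = 0; i < size; i++) vec[i] = buf.getFloat();
            vectors.put(word, vec);
        }
        return vectors;
    }

    /** Skips leftover spaces/newlines, reads up to the next separator, and consumes it. */
    private static String readToken(ByteBuffer buf) {
        int start = buf.position();
        while (start < buf.limit() && (buf.get(start) == ' ' || buf.get(start) == '\n')) start++;
        int end = start;
        while (end < buf.limit() && buf.get(end) != ' ' && buf.get(end) != '\n') end++;
        String token = new String(buf.array(), start, end - start, StandardCharsets.UTF_8);
        buf.position(Math.min(end + 1, buf.limit()));
        return token;
    }

    static Map<String, float[]> load(String path) throws IOException {
        return parse(Files.readAllBytes(Paths.get(path)));
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java Word2VecBinReader vectors.bin");
            return;
        }
        System.out.println(load(args[0]).size() + " vectors loaded");
    }
}
```

Exactly one separator is consumed after each word, so float bytes that happen to equal a space or newline are never skipped.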