word2vec's People

Contributors

siegfang

word2vec's Issues

Implementation of negative sampling

May I ask why you have not implemented negative sampling either? I saw the issue you originally raised at NLPchina, and I have the same question.

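A question about sentence2vec

Have you looked into the Python implementation of sentence2vec? I would like to port it to Java, but I am not sure whether sentence2vec works on the same principle as word2vec. Any pointers would be appreciated.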

OutOfMemoryError: Java heap space

Good work on making a Java version, thanks.
I ran the text8 test data from the original word2vec release and got the following error (running the original C code on the same dataset was fine and took only about 1 minute):

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOfRange(Arrays.java:2694)
at java.lang.String.<init>(String.java:203)
at java.lang.String.substring(String.java:1913)
at java.lang.String.split(String.java:2288)
at java.lang.String.split(String.java:2355)
at org.nlp.util.Tokenizer.<init>(Tokenizer.java:31)
at test.TestWord2Vec.readByJava(TestWord2Vec.java:29)
at test.TestWord2Vec.main(TestWord2Vec.java:61)

Thanks for the advice, and for the article, which I will read. I tried different sizes of the text8 file with -Xmx2G: at 85 MB or less there is no problem, but at 86 MB I still get OutOfMemoryError: Java heap space:

java -Xmx2G test/TestWord2Vec text8_86000000 > output_86000000
Apr 04, 2014 1:26:44 AM org.nlp.vec.Word2Vec readTokens
INFO: create temp file successfully in/home/aura/projects/deepLearning_java/word2vec/src/temp/tempCorpus1287275066488351187.txt
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOfRange(Arrays.java:2694)
at java.lang.String.<init>(String.java:203)
at java.lang.StringBuilder.toString(StringBuilder.java:405)
at org.nlp.util.Tokenizer.toString(Tokenizer.java:89)
at org.nlp.vec.Word2Vec.readTokens(Word2Vec.java:181)
at test.TestWord2Vec.readByJava(TestWord2Vec.java:29)
at test.TestWord2Vec.main(TestWord2Vec.java:66)
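
Both traces fail while building very large strings (String.split in the first, StringBuilder.toString in the second), which suggests the whole corpus is being held in memory as text. Reading line by line would probably not help for text8, since as far as I know it is one long line of space-separated words. A minimal sketch of reading tokens one at a time from a stream instead is shown below. The class and method names (StreamingTokenCounter, countTokens) are made up for illustration and are not part of this repository, and the sketch only counts word frequencies rather than reproducing what org.nlp.util.Tokenizer does.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of the repository: counts whitespace-separated
// tokens without ever holding the whole corpus in a single String.
public class StreamingTokenCounter {

    /** Reads characters one at a time and counts each token as it completes. */
    public static Map<String, Integer> countTokens(Reader in) throws IOException {
        Map<String, Integer> counts = new HashMap<>();
        BufferedReader reader = new BufferedReader(in);
        StringBuilder token = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) {
            if (Character.isWhitespace(c)) {
                flush(token, counts);      // whitespace ends the current token
            } else {
                token.append((char) c);
            }
        }
        flush(token, counts);              // last token if input has no trailing space
        return counts;
    }

    // Records the buffered token (if any) and resets the buffer for reuse.
    private static void flush(StringBuilder token, Map<String, Integer> counts) {
        if (token.length() > 0) {
            counts.merge(token.toString(), 1, Integer::sum);
            token.setLength(0);
        }
    }

    public static void main(String[] args) throws IOException {
        Map<String, Integer> counts = countTokens(new StringReader("the cat saw the dog"));
        System.out.println(counts.get("the")); // prints 2
    }
}
```

With this approach memory use grows with the vocabulary size rather than with the file size, so passing a FileReader for a larger slice of text8 should not require a bigger heap just to tokenize. Whether the training step itself then fits in 2 GB is a separate question.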

How do I open the bin file generated by word2vec?

I am using word2vec on Ubuntu. Following the tutorial, I ran ./word2vec -train resultbig.txt -output vectors.bin -cbow 0 -size 200 -window 5 -negative 0 -hs 1 -sample 1e-3 -threads 12 -binary 1 and it produced vectors.bin. How can I open this file?
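
With -binary 1 the file is not text, so an editor will not show it usefully. In the original C tool the layout is a text header "vocabSize dimension" ending in a newline, then for each word the word itself, a space, the vector as raw 4-byte floats, and a newline. A minimal sketch of a reader in Java follows. The class name Word2VecBinReader is hypothetical and not part of this repository, and the sketch assumes the floats are little-endian (true when the file was written on an x86 machine) and that words are UTF-8.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical reader for the binary format written by the C word2vec tool.
// Assumption: floats are little-endian, as written on x86 hardware.
public class Word2VecBinReader {

    public static Map<String, float[]> read(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(new BufferedInputStream(raw));
        int vocabSize = Integer.parseInt(readToken(in)); // header: "<vocab> <dim>\n"
        int dim = Integer.parseInt(readToken(in));
        Map<String, float[]> vectors = new LinkedHashMap<>();
        byte[] buf = new byte[4 * dim];
        for (int i = 0; i < vocabSize; i++) {
            String word = readToken(in);   // word is terminated by a single space
            in.readFully(buf);             // then dim raw 4-byte floats
            ByteBuffer bb = ByteBuffer.wrap(buf).order(ByteOrder.LITTLE_ENDIAN);
            float[] v = new float[dim];
            for (int j = 0; j < dim; j++) {
                v[j] = bb.getFloat();
            }
            vectors.put(word, v);
        }
        return vectors;
    }

    // Reads bytes up to the next space or newline, skipping leading separators
    // (this also consumes the newline that follows each vector).
    private static String readToken(DataInputStream in) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            if (b == ' ' || b == '\n') {
                if (bytes.size() > 0) {
                    break;                 // separator ends the token
                }
            } else {
                bytes.write(b);
            }
        }
        if (bytes.size() == 0) {
            throw new EOFException("unexpected end of file");
        }
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Usage: java Word2VecBinReader vectors.bin
        try (InputStream in = new FileInputStream(args[0])) {
            Map<String, float[]> vectors = read(in);
            System.out.println("words loaded: " + vectors.size());
        }
    }
}
```

If you only want to look at the vectors, retraining with -binary 0 writes the same information as plain text, one word per line followed by its components.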
