Comments (4)

bhoppi commented on August 10, 2024

Is your input corpus in LIBSVM format?
And what command arguments did you use?
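One quick way to answer the first question is to check that every line of the corpus looks like a LIBSVM record. This is a minimal sketch that assumes the common layout `<label> <index>:<value> ...` with one document per line; the helper name `check_libsvm` is hypothetical, and zen's exact expectations for the label field are not confirmed here.

```shell
# Hypothetical helper: print every line that does not look like a LIBSVM record.
# Assumed layout per line: <label> <index>:<value> [<index>:<value> ...]
# Reads the files given as arguments, or stdin if none are given.
check_libsvm() {
  awk '
    {
      # The label must be numeric and at least one index:value pair must follow.
      ok = ($1 ~ /^[-+]?[0-9]+(\.[0-9]+)?$/) && NF >= 2
      for (i = 2; i <= NF && ok; i++) {
        # Each feature must be <integer index>:<numeric value>.
        if ($i !~ /^[0-9]+:[-+]?[0-9]+(\.[0-9]+)?$/) ok = 0
      }
      if (!ok) print "bad line " NR ": " $0
    }
  ' "$@"
}
```

For a corpus on HDFS, something like `hdfs dfs -cat hdfs://goyoo/user/yuyue/10w_doc.libsvm | check_libsvm` streams the file through the check; empty output means every line parsed.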

from zen.

razrLeLe commented on August 10, 2024

Yes, I used LIBSVM format. The submit command is as follows:
spark-submit --conf spark.eventLog.enabled=true --conf spark.eventLog.dir=hdfs://goyoo/user/yuyue/log --executor-cores 4 --num-executors 21 --driver-memory 4G --executor-memory 4G --master yarn-client --class com.github.cloudml.zen.examples.ml.LDADriver zen-examples-0.3-SNAPSHOT-spark1.6.1.jar -numPartitions=21 -LDAAlgorithm=LightLDA -numThreads=16 -numTopics=500 -alpha=0.01 -beta=0.01 -alphaAS=1.0 -totalIter=1500 hdfs://goyoo/user/yuyue/10w_doc.libsvm hdfs://goyoo/user/yuyue/zen_10w_result


bhoppi commented on August 10, 2024
  1. What's wrong: -numThreads is the number of threads allocated to each partition, so it must be <= --executor-cores; otherwise the job won't start.
  2. May need tuning: I don't know how large your corpus is, but if it is very big, --executor-memory 4G may not be enough, and you may need to increase it if OOM happens.
  3. Other suggestions: -LDAAlgorithm=ZenLDA is the fastest of all the LDA implementations. You also need something like -chkptinterval=100 to checkpoint every 100 iterations; otherwise the job becomes very slow after a few hundred iterations, because the very long RDD lineage eats up driver memory.
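Applying these three points to the original command could look roughly like this sketch. The paths, jar, and cluster sizes are copied from the command above; the `-chkptinterval` spelling is taken from the comment and not verified against the LDADriver source. The wrapper function `submit_lda` and the `DRY_RUN`, `EXECUTOR_CORES`, `NUM_THREADS`, and `EXECUTOR_MEMORY` variables are hypothetical conveniences, not zen options.

```shell
# Hypothetical wrapper around the submit command with the suggestions applied.
# DRY_RUN=1 (the default) only prints the command; DRY_RUN=0 actually submits.
submit_lda() {
  local executor_cores=${EXECUTOR_CORES:-4}
  local num_threads=${NUM_THREADS:-4}          # per-partition threads, must be <= executor cores
  local executor_memory=${EXECUTOR_MEMORY:-4G} # raise if executors hit OOM on a big corpus

  if (( num_threads > executor_cores )); then
    echo "error: -numThreads=$num_threads exceeds --executor-cores=$executor_cores" >&2
    return 1
  fi

  # -LDAAlgorithm=ZenLDA switches to the faster sampler;
  # -chkptinterval=100 checkpoints every 100 iterations to cut the RDD lineage.
  local cmd=(
    spark-submit
    --conf spark.eventLog.enabled=true
    --conf spark.eventLog.dir=hdfs://goyoo/user/yuyue/log
    --executor-cores "$executor_cores" --num-executors 21
    --driver-memory 4G --executor-memory "$executor_memory"
    --master yarn-client
    --class com.github.cloudml.zen.examples.ml.LDADriver
    zen-examples-0.3-SNAPSHOT-spark1.6.1.jar
    -numPartitions=21 -LDAAlgorithm=ZenLDA -numThreads="$num_threads"
    -numTopics=500 -alpha=0.01 -beta=0.01 -alphaAS=1.0 -totalIter=1500
    -chkptinterval=100
    hdfs://goyoo/user/yuyue/10w_doc.libsvm hdfs://goyoo/user/yuyue/zen_10w_result
  )

  if [[ "${DRY_RUN:-1}" == 1 ]]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

Running `submit_lda` prints the final command for review, and `NUM_THREADS=16 submit_lda` fails fast with the same mismatch as the original submission instead of producing a job that never starts.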


razrLeLe commented on August 10, 2024

@bhoppi Thanks so much for your help. I finally found that something was wrong with the preprocessing of the corpus: it made every word count in every document zero, which caused the exception.
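A corpus that is syntactically valid LIBSVM can still have this problem, so a cheap pre-flight check on the preprocessed output can help. This sketch assumes one document per line as `<label> <index>:<count> ...`; the helper name `find_empty_docs` is hypothetical.

```shell
# Hypothetical pre-flight check: report documents with no positive word counts,
# i.e. lines whose index:count pairs are all zero or that have no pairs at all.
# Assumed layout per line: <label> <index>:<count> [<index>:<count> ...]
find_empty_docs() {
  awk '
    {
      nz = 0
      for (i = 2; i <= NF; i++) {
        split($i, kv, ":")          # kv[1] = term index, kv[2] = count
        if (kv[2] + 0 > 0) nz++
      }
      if (nz == 0) { bad++; print "empty document at line " NR }
    }
    END { printf "%d of %d documents have no positive word counts\n", bad, NR }
  ' "$@"
}
```

If the summary line reports that every document is empty, the problem is in preprocessing rather than in the LDA job settings.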

