talkingdata / fregata
A lightweight, super fast, large scale machine learning library on Spark.
License: Other
17/02/10 18:20:31 INFO TaskSetManager: Lost task 345.2 in stage 1.0 (TID 18697) on executor worker8.spark.training.m.com: java.lang.NullPointerException (null) [duplicate 908]
17/02/10 18:20:31 INFO TaskSetManager: Lost task 141.2 in stage 1.0 (TID 18698) on executor worker3.spark.training.m.com: java.lang.NullPointerException (null) [duplicate 909]
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 370 in stage 1.0 failed 4 times, most recent failure: Lost task 370.3 in stage 1.0 (TID 18666, worker4.spark.training.m.com): java.lang.NullPointerException
at fregata.model.classification.LogisticRegression.run(LogisticRegression.scala:83)
at fregata.model.classification.LogisticRegression.run(LogisticRegression.scala:73)
at fregata.model.ModelTrainer$$anonfun$run$1.apply$mcVI$sp(ModelTrainer.scala:21)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
at fregata.model.ModelTrainer$class.run(ModelTrainer.scala:19)
at fregata.model.classification.LogisticRegression.run(LogisticRegression.scala:73)
at fregata.spark.model.SparkTrainer$$anonfun$1.apply(SparkTrainer.scala:26)
at fregata.spark.model.SparkTrainer$$anonfun$1.apply(SparkTrainer.scala:24)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1952)
at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:1025)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
at org.apache.spark.rdd.RDD.reduce(RDD.scala:1007)
at org.apache.spark.rdd.RDD$$anonfun$treeAggregate$1.apply(RDD.scala:1150)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
at org.apache.spark.rdd.RDD.treeAggregate(RDD.scala:1127)
at org.apache.spark.rdd.RDD$$anonfun$treeReduce$1.apply(RDD.scala:1058)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
at org.apache.spark.rdd.RDD.treeReduce(RDD.scala:1036)
at fregata.spark.model.SparkTrainer.run(SparkTrainer.scala:28)
at fregata.spark.model.SparkTrainer$$anonfun$run$1.apply$mcVI$sp(SparkTrainer.scala:15)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
at fregata.spark.model.SparkTrainer.run(SparkTrainer.scala:13)
at fregata.spark.model.classification.LogisticRegression$.run(LogisticRegression.scala:29)
at com.meitu.rec.longTermRecTest$.fregata_lr(longTermRecTest.scala:69)
at com.meitu.rec.longTermRecTest$.run(longTermRecTest.scala:59)
at com.meitu.rec.longTermRecTest$$anonfun$main$1.apply(longTermRecTest.scala:49)
at com.meitu.rec.longTermRecTest$$anonfun$main$1.apply(longTermRecTest.scala:48)
at scala.Option.map(Option.scala:145)
at com.meitu.rec.longTermRecTest$.main(longTermRecTest.scala:48)
at com.meitu.rec.longTermRecTest.main(longTermRecTest.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.NullPointerException
at fregata.model.classification.LogisticRegression.run(LogisticRegression.scala:83)
at fregata.model.classification.LogisticRegression.run(LogisticRegression.scala:73)
at fregata.model.ModelTrainer$$anonfun$run$1.apply$mcVI$sp(ModelTrainer.scala:21)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
at fregata.model.ModelTrainer$class.run(ModelTrainer.scala:19)
at fregata.model.classification.LogisticRegression.run(LogisticRegression.scala:73)
at fregata.spark.model.SparkTrainer$$anonfun$1.apply(SparkTrainer.scala:26)
at fregata.spark.model.SparkTrainer$$anonfun$1.apply(SparkTrainer.scala:24)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
I have done this.
Teacher Zhang:
I read your article "Fregata: A Lightweight Large-Scale Machine Learning Library, Fast and with No Parameter Tuning", and I have a few questions.
Is the Parameter Server the implementation by Mu Li, or one you wrote yourselves? What I really want to know is whether there are multiple Parameter Server implementations.
Also, why does LogisticRegression.run(trainData, localEpochNum, epochNum) fail when localEpochNum or epochNum is set to a value other than 1? It throws the following error:
ERROR TaskSetManager: Task 699 in stage 4.0 failed 4 times; aborting job
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 699 in stage 4.0 failed 4 times, most recent failure: Lost task 699.3 in stage 4.0: java.lang.NullPointerException
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1443)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1431)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1430)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1430)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:810)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:810)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:810)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1652)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1611)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1600)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:630)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1874)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1994)
at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:1025)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
at org.apache.spark.rdd.RDD.reduce(RDD.scala:1007)
at org.apache.spark.rdd.RDD$$anonfun$treeAggregate$1.apply(RDD.scala:1150)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
at org.apache.spark.rdd.RDD.treeAggregate(RDD.scala:1127)
at org.apache.spark.rdd.RDD$$anonfun$treeReduce$1.apply(RDD.scala:1058)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
at org.apache.spark.rdd.RDD.treeReduce(RDD.scala:1036)
at fregata.spark.model.SparkTrainer.run(SparkTrainer.scala:28)
at fregata.spark.model.SparkTrainer$$anonfun$run$1.apply$mcVI$sp(SparkTrainer.scala:15)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
at fregata.spark.model.SparkTrainer.run(SparkTrainer.scala:13)
at fregata.spark.model.classification.LogisticRegression$.run(LogisticRegression.scala:29)
"/Volumes/takun/data/libsvm/a9a",
,"/Volumes/takun/data/libsvm/a9a.t"
/Volumes/takun/data/libsvm/mnist2
I have a question about trainer.ps.set(ws)
at the end of the "run" function in SparkTrainer.scala. If we use ps.set() rather than ps.adjust(), doesn't that make epochNum meaningless? Do I understand the code correctly?
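For what it's worth, the distinction behind this question can be sketched generically. The names below (ParamServer, set, adjust) only mirror the fregata calls; this is an illustration of overwrite-versus-accumulate semantics, not fregata's actual LocalParameterServer:

```python
# Sketch of the two parameter-server update styles the question contrasts.
# Hypothetical class; not fregata's implementation.

class ParamServer:
    def __init__(self, size):
        self.w = [0.0] * size

    def set(self, ws):
        # Overwrite: previously accumulated state is discarded.
        self.w = list(ws)

    def adjust(self, deltas):
        # Incremental: each epoch's deltas are accumulated.
        for i, d in enumerate(deltas):
            self.w[i] += d

ps = ParamServer(3)
ps.adjust([0.1, 0.2, 0.3])
ps.adjust([0.1, 0.2, 0.3])   # the second epoch's contribution is kept
print(ps.w)                  # [0.2, 0.4, 0.6]

ps.set([0.1, 0.2, 0.3])
ps.set([0.1, 0.2, 0.3])      # the second call just overwrites the first
print(ps.w)                  # [0.1, 0.2, 0.3]
```

If set() discards accumulated state, repeating it per epoch would indeed make extra epochs a no-op, which is the crux of the question.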
LBFGS or SGD?
here is the stack trace:
java.lang.NullPointerException
at fregata.param.LocalParameterServer$$anonfun$adjust$1.apply$mcVID$sp(Parameter.scala:35)
at fregata.util.VectorUtil$.forV(VectorUtil.scala:44)
at fregata.param.LocalParameterServer.adjust(Parameter.scala:34)
at fregata.optimize.sgd.StochasticGradientDescent$$anonfun$run$1.apply(StochasticGradientDescent.scala:25)
at fregata.optimize.sgd.StochasticGradientDescent$$anonfun$run$1.apply(StochasticGradientDescent.scala:20)
at scala.collection.immutable.Stream.foreach(Stream.scala:594)
at fregata.optimize.sgd.StochasticGradientDescent.run(StochasticGradientDescent.scala:20)
at fregata.model.classification.LogisticRegression.run(LogisticRegression.scala:83)
at fregata.model.classification.LogisticRegression.run(LogisticRegression.scala:74)
at fregata.model.ModelTrainer$$anonfun$run$1.apply$mcVI$sp(ModelTrainer.scala:21)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
at fregata.model.ModelTrainer$class.run(ModelTrainer.scala:19)
at fregata.model.classification.LogisticRegression.run(LogisticRegression.scala:74)
at fregata.spark.model.SparkTrainer$$anonfun$1.apply(SparkTrainer.scala:26)
at fregata.spark.model.SparkTrainer$$anonfun$1.apply(SparkTrainer.scala:24)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:766)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:766)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
at org.apache.spark.scheduler.Task.run(Task.scala:85)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
I have done this.
I notice that the calculation of the greedy step size in the softmax code is inconsistent with the paper's formulation.
That is, the variable 'probs(yi)' is not equal to 'b_k' of equation (10). Did you forget to apply an exponential function to it? The same problem applies to the variable 'product'.
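For reference, the relation the comment appeals to can be written out generically; whether fregata's 'probs(yi)' and 'product' actually correspond this way is exactly what the comment questions. A minimal sketch (not fregata code):

```python
import math

def softmax(scores):
    # b_k = exp(s_k) / sum_j exp(s_j), computed stably by shifting by the max.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

# Raw inner products w_k . x (the "product" values) are not probabilities;
# only after the exponential and normalization do they lie in (0,1) and sum to 1.
scores = [2.0, 1.0, 0.5]
probs = softmax(scores)
print(sum(probs))  # 1.0 up to float rounding
```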
I ran the softmax on a tiny dataset, but it fails with an error. The data:
1 2:1.0 3:1.0
2 1:1.0 2:1.0 3:1.0
3 1:1.0 2:1.0 3:1.0
Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 7, executor id: 49, host: hadoop-afd-tretw.bj): java.lang.IndexOutOfBoundsException: 2 not in [-2,2)
at breeze.linalg.DenseVector$mcD$sp.apply$mcD$sp(DenseVector.scala:71)
at fregata.util.VectorUtil$$anonfun$wxpb$2.apply$mcVID$sp(VectorUtil.scala:29)
at fregata.util.VectorUtil$.forV(VectorUtil.scala:44)
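A hedged guess at the mechanism: an "index k not in [-n,n)" error from breeze means a dense vector was sized smaller than an index actually used, e.g. a dimension (class count or feature count) inferred as 2 where the data needs 3. A purely illustrative parse of the three LIBSVM lines above (this does not reproduce fregata's own dimension inference):

```python
# LIBSVM feature indices are 1-based; labels here are 1, 2, 3.
lines = [
    "1 2:1.0 3:1.0",
    "2 1:1.0 2:1.0 3:1.0",
    "3 1:1.0 2:1.0 3:1.0",
]

labels, max_feature = set(), 0
for line in lines:
    parts = line.split()
    labels.add(int(parts[0]))
    for kv in parts[1:]:
        idx, _ = kv.split(":")
        max_feature = max(max_feature, int(idx))

num_classes = len(labels)
print(num_classes, max_feature)  # 3 3
# Any vector of length 2 (e.g. num_classes - 1, or a 0-based miscount of
# the 1-based indices) cannot hold index 2: hence "2 not in [-2,2)".
```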
My dataset has 3 million records, the feature dimension is about 100 million, and each record has roughly 200 non-zero values. Submitting with the parameters below, the job keeps failing with exit code 143 (a memory problem). Could you share the submit parameters you used to train at 1 billion x 1 billion scale? Thanks!
--conf spark.driver.maxResultSize=12g
--conf spark.network.timeout=1200s
--conf spark.executor.heartbeatInterval=1200s
--master yarn-client
--num-executors 1000
--executor-cores 2
--executor-memory 12g
--driver-memory 10g
I trained a model with 2,500,000 samples and 900 features, but the resulting weights Wi were all NaN. When I randomly chose 1000 samples from the dataset, the result seemed fine.
Is there a limitation on the size of the dataset or on feature values?
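Not an authoritative answer, but all-NaN weights on the full dataset while a small subsample trains fine is a classic symptom of SGD diverging numerically (e.g. unscaled features combined with a too-large effective step), rather than a hard dataset-size limit. A toy, non-fregata illustration:

```python
import math

def sgd_linreg(data, lr, steps=200):
    # Plain SGD on squared loss for a single weight; returns the final weight.
    w = 0.0
    for _ in range(steps):
        for x, y in data:
            grad = 2.0 * (w * x - y) * x
            w -= lr * grad
    return w

# Large feature magnitude + large step: updates grow geometrically,
# overflow to inf, and a subsequent inf - inf update produces NaN.
w_bad = sgd_linreg([(10.0, 1.0)], lr=1.0)
print(math.isnan(w_bad))    # True

# The same setup with the feature scaled down converges instead.
w_ok = sgd_linreg([(0.1, 1.0)], lr=1.0)
print(math.isfinite(w_ok))  # True
```

Scaling or normalizing feature values, or reducing the step size, is usually the first thing to try before suspecting a size limit.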
Hi, the RDT code example is not working with 0.0.2. Please help.
Is there a way to dump or export model weights?