angel-ml / sona
Spark On Angel, arming Spark with a powerful Parameter Server, which enables Spark to train very large models.
License: Apache License 2.0
Hi, I'm running the SONA example and the job FAILED; the stdout log is below. Please help!
2019-12-26 14:09:19 INFO SignalUtils:54 - Registered signal handler for TERM
2019-12-26 14:09:19 INFO SignalUtils:54 - Registered signal handler for HUP
2019-12-26 14:09:19 INFO SignalUtils:54 - Registered signal handler for INT
2019-12-26 14:09:19 INFO SecurityManager:54 - Changing view acls to: deepthought
2019-12-26 14:09:19 INFO SecurityManager:54 - Changing modify acls to: deepthought
2019-12-26 14:09:19 INFO SecurityManager:54 - Changing view acls groups to:
2019-12-26 14:09:19 INFO SecurityManager:54 - Changing modify acls groups to:
2019-12-26 14:09:19 INFO SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(deepthought); groups with view permissions: Set(); users with modify permissions: Set(deepthought); groups with modify permissions: Set()
2019-12-26 14:09:20 INFO UserGroupInformation:964 - Login successful for user deepthought using keytab file deepthought.keytab-4169bc48-f895-42c2-9dde-091feb49f3c5
2019-12-26 14:09:20 INFO ApplicationMaster:54 - Preparing Local resources
2019-12-26 14:09:22 WARN Client:677 - Exception encountered while connecting to the server : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby. Visit https://s.apache.org/sbnn-error
2019-12-26 14:09:28 INFO ApplicationMaster:54 - ApplicationAttemptId: appattempt_1576380960005_2467808_000001
2019-12-26 14:09:28 INFO AMCredentialRenewer:54 - Scheduling login from keytab in 64776907 millis.
2019-12-26 14:09:28 INFO ApplicationMaster:54 - Starting the user application in a separate Thread
2019-12-26 14:09:28 ERROR ApplicationMaster:91 - Uncaught exception:
java.lang.ClassNotFoundException: org.apache.spark.angel.examples.JsonRunnerExamples
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.spark.deploy.yarn.ApplicationMaster.startUserApplication(ApplicationMaster.scala:715)
at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:491)
at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:345)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply$mcV$sp(ApplicationMaster.scala:260)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply(ApplicationMaster.scala:260)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply(ApplicationMaster.scala:260)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$5.run(ApplicationMaster.scala:815)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1692)
at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:814)
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:259)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:839)
at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
2019-12-26 14:09:28 INFO ApplicationMaster:54 - Final app status: FAILED, exitCode: 13, (reason: Uncaught exception: java.lang.ClassNotFoundException: org.apache.spark.angel.examples.JsonRunnerExamples)
2019-12-26 14:09:28 INFO ShutdownHookManager:54 - Shutdown hook called
My SONA-example submit script:
source ./spark-on-angel-env.sh
export HADOOP_CONF_DIR=/usr/lib/hadoop/etc/hadoop
$SPARK_HOME/bin/spark-submit \
--master yarn-cluster \
--driver-java-options "-Djava.library.path=/usr/lib/hadoop/lib/native" \
--keytab /home/deepthought/deepthought.keytab \
--principal deepthought \
--queue longyuan.p0 \
--conf spark.ps.jars=$SONA_ANGEL_JARS \
--conf spark.ps.instances=10 \
--conf spark.ps.cores=2 \
--conf spark.ps.memory=6g \
--jars $SONA_SPARK_JARS \
--name "LR-spark-on-angel" \
--files /data/angel/sona-0.1.0-bin/jsons/logreg.json \
--driver-memory 10g \
--num-executors 10 \
--executor-cores 2 \
--executor-memory 4g \
--class org.apache.spark.angel.examples.JsonRunnerExamples \
./../lib/angelml-${SONA_VERSION}.jar \
data:viewfs://hadoop-bd/user/deepthought/test/angel/sona-0.1.0-bin/data/angel/a9a/a9a_123d_train.libsvm \
modelPath:viewfs://hadoop-bd/user/deepthought/test/output \
jsonFile:./lr.json \
lr:0.1
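The `ClassNotFoundException` in the log above usually means the application jar handed to spark-submit (here the relative path `./../lib/angelml-${SONA_VERSION}.jar`, which resolves against wherever the script is launched from) is not the jar that actually contains `org.apache.spark.angel.examples.JsonRunnerExamples`. Before submitting, it is worth verifying the class entry is present, e.g. with `jar tf ../lib/angelml-${SONA_VERSION}.jar | grep JsonRunnerExamples`. The same check in Java (the `JarClassCheck` class and its method are hypothetical helpers for illustration, not part of SONA):

```java
import java.io.File;
import java.io.FileOutputStream;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

public class JarClassCheck {

    /** True if the jar at jarPath contains a .class entry for className. */
    public static boolean containsClass(String jarPath, String className) throws Exception {
        String entryName = className.replace('.', '/') + ".class";
        try (ZipFile zip = new ZipFile(jarPath)) {
            for (Enumeration<? extends ZipEntry> entries = zip.entries(); entries.hasMoreElements(); ) {
                if (entries.nextElement().getName().equals(entryName)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        // Build a tiny stand-in jar so the check is runnable without SONA installed.
        File jar = File.createTempFile("angelml-demo", ".jar");
        jar.deleteOnExit();
        try (ZipOutputStream out = new ZipOutputStream(new FileOutputStream(jar))) {
            out.putNextEntry(new ZipEntry("org/apache/spark/angel/examples/JsonRunnerExamples.class"));
            out.closeEntry();
        }
        System.out.println(containsClass(jar.getPath(), "org.apache.spark.angel.examples.JsonRunnerExamples"));
    }
}
```

If the class is present, the next suspect is the relative jar path itself; passing an absolute path to spark-submit removes that ambiguity.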
And my spark-on-angel-env.sh:
export JAVA_HOME=/usr
export HADOOP_HOME=/usr/lib/hadoop
export SPARK_HOME=/usr/local/spark/spark-2.3.1-bin-hadoop2.6
export SONA_HOME=/data/angel/sona-0.1.0-bin
export SONA_HDFS_HOME=viewfs://hadoop-bd/user/deepthought/test/angel/sona-0.1.0-bin
export SONA_VERSION=0.1.0
export ANGEL_VERSION=3.0.1
export ANGEL_UTILS_VERSION=0.1.1
export ANGEL_MLCORE_VERSION=0.1.2
...(the remaining default content is unchanged)...
Originally posted by @wqh17101 in #48 (comment)
The full log is attached: log2.txt
I get the same error as Angel-ML/angel#827 with Angel 3.0.
Please don't worry if you encounter a problem like this:
java.lang.NoSuchMethodError: org.json4s.jackson.JsonMethods$.parse(Lorg/json4s/JsonInput;Z)Lorg/json4s/JsonAST$JValue;
We are working on a fix for this problem.
I ran the demo from this page:
import com.tencent.angel.sona.core.DriverContext
import org.apache.spark.angel.ml.classification.AngelClassifier
import org.apache.spark.angel.ml.feature.LabeledPoint
import org.apache.spark.angel.ml.linalg.Vectors
import org.apache.spark.SparkConf
import org.apache.spark.sql.{DataFrameReader, SparkSession}
val spark = SparkSession.builder()
.master("yarn-cluster")
.appName("AngelClassification")
.getOrCreate()
val sparkConf = spark.sparkContext.getConf
val driverCtx = DriverContext.get(sparkConf)
driverCtx.startAngelAndPSAgent()
val libsvm = spark.read.format("libsvmex")
val dummy = spark.read.format("dummy")
val trainData = libsvm.load("./data/angel/a9a/a9a_123d_train.libsvm")
val classifier = new AngelClassifier()
.setModelJsonFile("./angelml/src/test/jsons/logreg.json")
.setNumClass(2)
.setNumBatch(10)
.setMaxIter(2)
.setLearningRate(0.1)
val model = classifier.fit(trainData)
model.write.overwrite().save("trained_models/lr")
It fails while running this line in the REPL:
scala> val model = classifier.fit(trainData)
19/09/03 13:35:44 WARN UDTRegistration: Cannot register UDT for org.apache.spark.angel.ml.linalg.Vector, which is already registered.
19/09/03 13:35:44 WARN UDTRegistration: Cannot register UDT for org.apache.spark.angel.ml.linalg.DenseVector, which is already registered.
19/09/03 13:35:44 WARN UDTRegistration: Cannot register UDT for org.apache.spark.angel.ml.linalg.SparseVector, which is already registered.
19/09/03 13:35:44 WARN UDTRegistration: Cannot register UDT for org.apache.spark.angel.ml.linalg.Matrix, which is already registered.
19/09/03 13:35:44 WARN UDTRegistration: Cannot register UDT for org.apache.spark.angel.ml.linalg.DenseMatrix, which is already registered.
19/09/03 13:35:44 WARN UDTRegistration: Cannot register UDT for org.apache.spark.angel.ml.linalg.SparseMatrix, which is already registered.
19/09/03 13:35:45 ERROR Executor: Exception in task 0.0 in stage 12.0 (TID 12)
java.lang.Exception: Pls. startAngel first!
at com.tencent.angel.sona.core.ExecutorContext.sparkWorkerContext$lzycompute(ExecutorContext.scala:32)
at com.tencent.angel.sona.core.ExecutorContext.sparkWorkerContext(ExecutorContext.scala:30)
at com.tencent.angel.sona.core.ExecutorContext$.checkGraphModelPool(ExecutorContext.scala:65)
at com.tencent.angel.sona.core.ExecutorContext$.toGraphModelPool(ExecutorContext.scala:78)
at org.apache.spark.angel.ml.common.Trainer.trainOneBatch(Trainer.scala:43)
at org.apache.spark.angel.ml.classification.AngelClassifier$$anonfun$train$1$$anonfun$apply$mcVI$sp$1$$anonfun$8.apply(AngelClassifier.scala:245)
at org.apache.spark.angel.ml.classification.AngelClassifier$$anonfun$train$1$$anonfun$apply$mcVI$sp$1$$anonfun$8.apply(AngelClassifier.scala:245)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.reduceLeft(TraversableOnce.scala:185)
at scala.collection.AbstractIterator.reduceLeft(Iterator.scala:1336)
at org.apache.spark.rdd.RDD$$anonfun$reduce$1$$anonfun$14.apply(RDD.scala:1015)
at org.apache.spark.rdd.RDD$$anonfun$reduce$1$$anonfun$14.apply(RDD.scala:1013)
at org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:2123)
at org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:2123)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
19/09/03 13:35:45 WARN TaskSetManager: Lost task 0.0 in stage 12.0 (TID 12, localhost, executor driver): java.lang.Exception: Pls. startAngel first!
at com.tencent.angel.sona.core.ExecutorContext.sparkWorkerContext$lzycompute(ExecutorContext.scala:32)
at com.tencent.angel.sona.core.ExecutorContext.sparkWorkerContext(ExecutorContext.scala:30)
at com.tencent.angel.sona.core.ExecutorContext$.checkGraphModelPool(ExecutorContext.scala:65)
at com.tencent.angel.sona.core.ExecutorContext$.toGraphModelPool(ExecutorContext.scala:78)
at org.apache.spark.angel.ml.common.Trainer.trainOneBatch(Trainer.scala:43)
at org.apache.spark.angel.ml.classification.AngelClassifier$$anonfun$train$1$$anonfun$apply$mcVI$sp$1$$anonfun$8.apply(AngelClassifier.scala:245)
at org.apache.spark.angel.ml.classification.AngelClassifier$$anonfun$train$1$$anonfun$apply$mcVI$sp$1$$anonfun$8.apply(AngelClassifier.scala:245)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.reduceLeft(TraversableOnce.scala:185)
at scala.collection.AbstractIterator.reduceLeft(Iterator.scala:1336)
at org.apache.spark.rdd.RDD$$anonfun$reduce$1$$anonfun$14.apply(RDD.scala:1015)
at org.apache.spark.rdd.RDD$$anonfun$reduce$1$$anonfun$14.apply(RDD.scala:1013)
at org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:2123)
at org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:2123)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
19/09/03 13:35:45 ERROR TaskSetManager: Task 0 in stage 12.0 failed 1 times; aborting job
My Spark version is 2.3.0, and I launched spark-shell with:
spark-shell \
--conf spark.ps.jars=$SONA_ANGEL_JARS \
--conf spark.ps.instances=10 \
--conf spark.ps.cores=2 \
--conf spark.ps.memory=6g \
--jars $SONA_SPARK_JARS \
--name "demo1" \
--driver-memory 10g \
--num-executors 10 \
--executor-cores 2 \
--executor-memory 4g
Error message: scala.reflect.internal.MissingRequirementError: object java.lang.Object in compiler mirror not found.
Serialization stack:
- object not serializable (class: com.tencent.angel.ml.math2.utils.LabeledData, value: com.tencent.angel.ml.math2.utils.LabeledData@4da494b8)
2020-05-09 18:05:47 INFO DAGScheduler:54 - Job 0 failed: count at FTRLExample.scala:109, took 0.395223 s
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0 in stage 0.0 (TID 0) had a not serializable result: com.tencent.angel.ml.math2.utils.LabeledData
Serialization stack:
- object not serializable (class: com.tencent.angel.ml.math2.utils.LabeledData, value: com.tencent.angel.ml.math2.utils.LabeledData@4da494b8)
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1599)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1587)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1586)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1586)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:831)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1820)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1769)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1758)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:642)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2027)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2048)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2067)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2092)
at org.apache.spark.rdd.RDD.count(RDD.scala:1162)
at com.tencent.angel.sona.examples.online_learning.FTRLExample$.train(FTRLExample.scala:109)
at com.tencent.angel.sona.examples.online_learning.FTRLExample$.main(FTRLExample.scala:51)
at com.tencent.angel.sona.examples.online_learning.FTRLExample.main(FTRLExample.scala)
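The `not serializable result: com.tencent.angel.ml.math2.utils.LabeledData` failure above happens because `LabeledData` does not implement `java.io.Serializable`, so Spark's default Java serializer cannot ship task results containing it back to the driver. A workaround sketch (not verified against SONA; the flag shown is standard Spark configuration, not a SONA option) is to switch to Kryo, which does not require classes to implement `Serializable`:

```shell
# Sketch only: switch serialization to Kryo so non-Serializable classes
# such as LabeledData can cross the executor/driver boundary.
${SPARK_HOME}/bin/spark-submit \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  ... # remaining arguments as in the scripts above
```

If Kryo merely moves the failure elsewhere, the root cause may instead be a version mismatch between the cluster's Spark build and the SONA/Angel jars listed in `spark-on-angel-env.sh`, which is worth checking next.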
Spark 2.4.4
Hadoop 3.1
Angel 3.0.1
SONA 0.1.0
My spark-submit parameters:
${SPARK_HOME}/bin/spark-submit \
--master yarn \
--deploy-mode client \
--conf spark.ps.jars=${SONA_ANGEL_JARS} \
--conf spark.ps.instances=1 \
--conf spark.ps.cores=1 \
--conf spark.ps.memory=2g \
--conf spark.yarn.queue=default \
--jars ${SONA_SPARK_JARS} \
--name "LR-spark-on-angel" \
--files ${SONA_HOME}/bin/lr.json \
--driver-memory 2g \
--num-executors 2 \
--executor-cores 2 \
--executor-memory 3g \
--class org.apache.spark.angel.examples.JsonRunnerExamples \
${SONA_HOME}/lib/angelml-${SONA_VERSION}.jar \
data:/src/sona/data/angel/a9a/a9a_123d_train.libsvm \
modelPath:/test/spark-output-model \
jsonFile:./lr.json \
lr:0.1
And I get the error log below:
2019-10-16 06:28:18,138 INFO storage.BlockManagerInfo: Added broadcast_5_piece0 in memory on cc442b7f4080:37167 (size: 8.2 KB, free: 1450.3 MB)
2019-10-16 06:28:18,188 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on cc442b7f4080:37167 (size: 1844.0 B, free: 1450.3 MB)
2019-10-16 06:28:18,214 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 2.0 (TID 2, cc442b7f4080, executor 1): java.io.InvalidClassException: com.tencent.angel.sona.core.ExecutorContext; no valid constructor
at java.io.ObjectStreamClass$ExceptionInfo.newInvalidClassException(ObjectStreamClass.java:169)
at java.io.ObjectStreamClass.checkDeserialize(ObjectStreamClass.java:874)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2043)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$8.apply(TorrentBroadcast.scala:308)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:309)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1$$anonfun$apply$2.apply(TorrentBroadcast.scala:235)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:211)
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1326)
at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:207)
at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:66)
at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:66)
at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:96)
at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
at org.apache.spark.angel.ml.util.FeatureStats.partitionStatsWithPS(FeatureStats.scala:106)
at org.apache.spark.angel.ml.classification.AngelClassifier$$anonfun$4.apply(AngelClassifier.scala:166)
at org.apache.spark.angel.ml.classification.AngelClassifier$$anonfun$4.apply(AngelClassifier.scala:166)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2019-10-16 06:28:18,215 INFO scheduler.TaskSetManager: Starting task 0.1 in stage 2.0 (TID 3, cc442b7f4080, executor 1, partition 0, PROCESS_LOCAL, 8272 bytes)
2019-10-16 06:28:18,227 INFO scheduler.TaskSetManager: Lost task 0.1 in stage 2.0 (TID 3) on cc442b7f4080, executor 1: java.io.InvalidClassException (com.tencent.angel.sona.core.ExecutorContext; no valid constructor) [duplicate 1]
2019-10-16 06:28:18,227 INFO scheduler.TaskSetManager: Starting task 0.2 in stage 2.0 (TID 4, cc442b7f4080, executor 1, partition 0, PROCESS_LOCAL, 8272 bytes)
2019-10-16 06:28:18,237 INFO scheduler.TaskSetManager: Lost task 0.2 in stage 2.0 (TID 4) on cc442b7f4080, executor 1: java.io.InvalidClassException (com.tencent.angel.sona.core.ExecutorContext; no valid constructor) [duplicate 2]
2019-10-16 06:28:18,238 INFO scheduler.TaskSetManager: Starting task 0.3 in stage 2.0 (TID 5, cc442b7f4080, executor 1, partition 0, PROCESS_LOCAL, 8272 bytes)
2019-10-16 06:28:18,247 INFO scheduler.TaskSetManager: Lost task 0.3 in stage 2.0 (TID 5) on cc442b7f4080, executor 1: java.io.InvalidClassException (com.tencent.angel.sona.core.ExecutorContext; no valid constructor) [duplicate 3]
2019-10-16 06:28:18,248 ERROR scheduler.TaskSetManager: Task 0 in stage 2.0 failed 4 times; aborting job
2019-10-16 06:28:18,249 INFO cluster.YarnClusterScheduler: Removed TaskSet 2.0, whose tasks have all completed, from pool
2019-10-16 06:28:18,251 INFO cluster.YarnClusterScheduler: Cancelling stage 2
2019-10-16 06:28:18,251 INFO cluster.YarnClusterScheduler: Killing all running tasks in stage 2: Stage cancelled
2019-10-16 06:28:18,252 INFO scheduler.DAGScheduler: ResultStage 2 (reduce at AngelClassifier.scala:167) failed in 0.132 s due to Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2.0 (TID 5, cc442b7f4080, executor 1): java.io.InvalidClassException: com.tencent.angel.sona.core.ExecutorContext; no valid constructor
at java.io.ObjectStreamClass$ExceptionInfo.newInvalidClassException(ObjectStreamClass.java:169)
at java.io.ObjectStreamClass.checkDeserialize(ObjectStreamClass.java:874)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2043)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$8.apply(TorrentBroadcast.scala:308)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:309)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1$$anonfun$apply$2.apply(TorrentBroadcast.scala:235)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:211)
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1326)
at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:207)
at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:66)
at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:66)
at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:96)
at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
at org.apache.spark.angel.ml.util.FeatureStats.partitionStatsWithPS(FeatureStats.scala:106)
at org.apache.spark.angel.ml.classification.AngelClassifier$$anonfun$4.apply(AngelClassifier.scala:166)
at org.apache.spark.angel.ml.classification.AngelClassifier$$anonfun$4.apply(AngelClassifier.scala:166)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Driver stacktrace:
2019-10-16 06:28:18,253 INFO scheduler.DAGScheduler: Job 2 failed: reduce at AngelClassifier.scala:167, took 0.136431 s
2019-10-16 06:28:18,254 ERROR yarn.ApplicationMaster: User class threw exception: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2.0 (TID 5, cc442b7f4080, executor 1): java.io.InvalidClassException: com.tencent.angel.sona.core.ExecutorContext; no valid constructor
at java.io.ObjectStreamClass$ExceptionInfo.newInvalidClassException(ObjectStreamClass.java:169)
at java.io.ObjectStreamClass.checkDeserialize(ObjectStreamClass.java:874)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2043)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$8.apply(TorrentBroadcast.scala:308)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:309)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1$$anonfun$apply$2.apply(TorrentBroadcast.scala:235)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:211)
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1326)
at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:207)
at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:66)
at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:66)
at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:96)
at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
at org.apache.spark.angel.ml.util.FeatureStats.partitionStatsWithPS(FeatureStats.scala:106)
at org.apache.spark.angel.ml.classification.AngelClassifier$$anonfun$4.apply(AngelClassifier.scala:166)
at org.apache.spark.angel.ml.classification.AngelClassifier$$anonfun$4.apply(AngelClassifier.scala:166)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Driver stacktrace:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2.0 (TID 5, cc442b7f4080, executor 1): java.io.InvalidClassException: com.tencent.angel.sona.core.ExecutorContext; no valid constructor
at java.io.ObjectStreamClass$ExceptionInfo.newInvalidClassException(ObjectStreamClass.java:169)
at java.io.ObjectStreamClass.checkDeserialize(ObjectStreamClass.java:874)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2043)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$8.apply(TorrentBroadcast.scala:308)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:309)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1$$anonfun$apply$2.apply(TorrentBroadcast.scala:235)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:211)
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1326)
at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:207)
at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:66)
at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:66)
at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:96)
at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
at org.apache.spark.angel.ml.util.FeatureStats.partitionStatsWithPS(FeatureStats.scala:106)
at org.apache.spark.angel.ml.classification.AngelClassifier$$anonfun$4.apply(AngelClassifier.scala:166)
at org.apache.spark.angel.ml.classification.AngelClassifier$$anonfun$4.apply(AngelClassifier.scala:166)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1889)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1877)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1876)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1876)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:926)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2110)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2059)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2048)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:737)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2158)
at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:1035)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
at org.apache.spark.rdd.RDD.reduce(RDD.scala:1017)
at org.apache.spark.angel.ml.classification.AngelClassifier.train(AngelClassifier.scala:167)
at org.apache.spark.angel.ml.classification.AngelClassifier.train(AngelClassifier.scala:48)
at org.apache.spark.angel.ml.Predictor.fit(Predictor.scala:118)
at org.apache.spark.angel.examples.JsonRunnerExamples$.main(JsonRunnerExamples.scala:109)
at org.apache.spark.angel.examples.JsonRunnerExamples.main(JsonRunnerExamples.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:684)
Caused by: java.io.InvalidClassException: com.tencent.angel.sona.core.ExecutorContext; no valid constructor
at java.io.ObjectStreamClass$ExceptionInfo.newInvalidClassException(ObjectStreamClass.java:169)
at java.io.ObjectStreamClass.checkDeserialize(ObjectStreamClass.java:874)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2043)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$8.apply(TorrentBroadcast.scala:308)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:309)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1$$anonfun$apply$2.apply(TorrentBroadcast.scala:235)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:211)
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1326)
at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:207)
at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:66)
at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:66)
at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:96)
at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
at org.apache.spark.angel.ml.util.FeatureStats.partitionStatsWithPS(FeatureStats.scala:106)
at org.apache.spark.angel.ml.classification.AngelClassifier$$anonfun$4.apply(AngelClassifier.scala:166)
at org.apache.spark.angel.ml.classification.AngelClassifier$$anonfun$4.apply(AngelClassifier.scala:166)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2019-10-16 06:28:18,258 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2.0 (TID 5, cc442b7f4080, executor 1): java.io.InvalidClassException: com.tencent.angel.sona.core.ExecutorContext; no valid constructor
at java.io.ObjectStreamClass$ExceptionInfo.newInvalidClassException(ObjectStreamClass.java:169)
at java.io.ObjectStreamClass.checkDeserialize(ObjectStreamClass.java:874)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2043)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$8.apply(TorrentBroadcast.scala:308)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:309)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1$$anonfun$apply$2.apply(TorrentBroadcast.scala:235)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:211)
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1326)
at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:207)
at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:66)
at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:66)
at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:96)
at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
at org.apache.spark.angel.ml.util.FeatureStats.partitionStatsWithPS(FeatureStats.scala:106)
at org.apache.spark.angel.ml.classification.AngelClassifier$$anonfun$4.apply(AngelClassifier.scala:166)
at org.apache.spark.angel.ml.classification.AngelClassifier$$anonfun$4.apply(AngelClassifier.scala:166)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1889)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1877)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1876)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1876)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:926)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2110)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2059)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2048)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:737)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2158)
at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:1035)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
at org.apache.spark.rdd.RDD.reduce(RDD.scala:1017)
at org.apache.spark.angel.ml.classification.AngelClassifier.train(AngelClassifier.scala:167)
at org.apache.spark.angel.ml.classification.AngelClassifier.train(AngelClassifier.scala:48)
at org.apache.spark.angel.ml.Predictor.fit(Predictor.scala:118)
at org.apache.spark.angel.examples.JsonRunnerExamples$.main(JsonRunnerExamples.scala:109)
at org.apache.spark.angel.examples.JsonRunnerExamples.main(JsonRunnerExamples.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:684)
Caused by: java.io.InvalidClassException: com.tencent.angel.sona.core.ExecutorContext; no valid constructor
at java.io.ObjectStreamClass$ExceptionInfo.newInvalidClassException(ObjectStreamClass.java:169)
at java.io.ObjectStreamClass.checkDeserialize(ObjectStreamClass.java:874)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2043)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$8.apply(TorrentBroadcast.scala:308)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:309)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1$$anonfun$apply$2.apply(TorrentBroadcast.scala:235)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:211)
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1326)
at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:207)
at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:66)
at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:66)
at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:96)
at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
at org.apache.spark.angel.ml.util.FeatureStats.partitionStatsWithPS(FeatureStats.scala:106)
at org.apache.spark.angel.ml.classification.AngelClassifier$$anonfun$4.apply(AngelClassifier.scala:166)
at org.apache.spark.angel.ml.classification.AngelClassifier$$anonfun$4.apply(AngelClassifier.scala:166)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
)
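For anyone hitting the same failure: `InvalidClassException: ...; no valid constructor` is Java serialization's rule that deserialization must invoke the no-arg constructor of the first non-serializable ancestor of the object being read. A minimal standalone reproduction (hypothetical classes, not SONA's actual `ExecutorContext`):

```java
import java.io.*;

// Minimal reproduction (hypothetical classes, NOT SONA's ExecutorContext):
// Java deserialization must invoke the no-arg constructor of the first
// non-serializable ancestor of the object being read. If that ancestor has
// no accessible no-arg constructor, readObject() fails with
// "InvalidClassException: ...; no valid constructor".
public class NoValidCtorDemo {
    // Non-serializable parent whose only constructor takes an argument.
    static class Base {
        Base(int x) {}
    }

    // Serializable child: writing works, but reading cannot reconstruct Base.
    static class Child extends Base implements Serializable {
        Child() { super(0); }
    }

    static String error = "";

    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(new Child());   // serialization itself succeeds
        }
        try (ObjectInputStream ois =
                 new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
            ois.readObject();               // fails here, as in the executor log
        } catch (InvalidClassException e) {
            error = e.getMessage();
            System.out.println("caught: " + error);
        }
    }
}
```

The class-side fixes are to give the non-serializable parent a no-arg constructor or make it `Serializable` itself. Since the failure above happens while the executor deserializes a broadcast with `JavaDeserializationStream`, one workaround sometimes reported for errors like this is switching to Kryo (`--conf spark.serializer=org.apache.spark.serializer.KryoSerializer`), which does not require a no-arg constructor; whether that applies here depends on the SONA build in use.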
2019-10-16 06:28:18,265 INFO spark.SparkContext: Invoking stop() from shutdown hook
2019-10-16 06:28:18,269 INFO server.AbstractConnector: Stopped Spark@3f71da43{HTTP/1.1,[http/1.1]}{0.0.0.0:0}
2019-10-16 06:28:18,270 INFO ui.SparkUI: Stopped Spark web UI at http://cc442b7f4080:34269
2019-10-16 06:28:18,273 INFO yarn.YarnAllocator: Driver requested a total number of 0 executor(s).
2019-10-16 06:28:18,275 INFO cluster.YarnClusterSchedulerBackend: Shutting down all executors
2019-10-16 06:28:18,275 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
2019-10-16 06:28:18,278 INFO cluster.SchedulerExtensionServices: Stopping SchedulerExtensionServices
(serviceOption=None,
services=List(),
started=false)
2019-10-16 06:28:18,336 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
2019-10-16 06:28:18,376 INFO memory.MemoryStore: MemoryStore cleared
2019-10-16 06:28:18,376 INFO storage.BlockManager: BlockManager stopped
2019-10-16 06:28:18,377 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
2019-10-16 06:28:18,379 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
2019-10-16 06:28:18,389 INFO spark.SparkContext: Successfully stopped SparkContext
2019-10-16 06:28:18,391 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with FAILED (diag message: User class threw exception: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2.0 (TID 5, cc442b7f4080, executor 1): java.io.InvalidClassException: com.tencent.angel.sona.core.ExecutorContext; no valid constructor
)
2019-10-16 06:28:18,400 INFO impl.AMRMClientImpl: Waiting for application to be successfully unregistered.
2019-10-16 06:28:18,502 INFO yarn.ApplicationMaster: Deleting staging directory hdfs://localhost:9000/user/root/.sparkStaging/application_1571206972109_0001
2019-10-16 06:28:18,527 INFO util.ShutdownHookManager: Shutdown hook called
2019-10-16 06:28:18,527 INFO util.ShutdownHookManager: Deleting directory /usr/local/hadoop/tmp/nm-local-dir/usercache/root/appcache/application_1571206972109_0001/spark-04921a2c-c3eb-4a0d-be35-808fccf8473f
2019-10-16 06:28:18,530 INFO psagent.PSAgent: stop heartbeat thread!
2019-10-16 06:28:18,530 INFO psagent.PSAgent: stop op log merger
2019-10-16 06:28:18,530 INFO psagent.PSAgent: stop clock cache
2019-10-16 06:28:18,530 INFO psagent.PSAgent: stop matrix cache
2019-10-16 06:28:18,530 INFO psagent.PSAgent: stop user request adapater
2019-10-16 06:28:18,531 INFO psagent.PSAgent: stop rpc dispacher
2019-10-16 06:28:18,531 INFO transport.ChannelManager2: Channel manager stop
2019-10-16 06:28:18,540 INFO broadcast.TorrentBroadcast: Destroying Broadcast(3) (from destroy at DriverContext.scala:128)
2019-10-16 06:28:18,540 WARN util.ShutdownHookManager: ShutdownHook '$anon$1' failed, java.lang.NullPointerException
java.lang.NullPointerException
at org.apache.spark.broadcast.TorrentBroadcast$.unpersist(TorrentBroadcast.scala:321)
at org.apache.spark.broadcast.TorrentBroadcast.doDestroy(TorrentBroadcast.scala:198)
at org.apache.spark.broadcast.Broadcast.destroy(Broadcast.scala:111)
at org.apache.spark.broadcast.Broadcast.destroy(Broadcast.scala:98)
at com.tencent.angel.sona.core.DriverContext$$anonfun$com$tencent$angel$sona$core$DriverContext$$doStopAngel$1.apply(DriverContext.scala:128)
at com.tencent.angel.sona.core.DriverContext$$anonfun$com$tencent$angel$sona$core$DriverContext$$doStopAngel$1.apply(DriverContext.scala:125)
at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
at com.tencent.angel.sona.core.DriverContext.com$tencent$angel$sona$core$DriverContext$$doStopAngel(DriverContext.scala:125)
at com.tencent.angel.sona.core.DriverContext$$anon$1.run(DriverContext.scala:90)
at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
The binning process in GBDT requires caching two copies of the dataset, which is inefficient for memory-limited users.
Implement a two-phase data loading method, which is more memory-efficient.
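The two-phase idea can be sketched like this (a hypothetical standalone helper, not SONA's actual GBDT binning code): pass 1 streams the data once to derive bin edges, pass 2 streams it again and emits bin indices directly, so the raw copy and the binned copy never need to be cached at the same time:

```java
import java.util.Arrays;
import java.util.function.Supplier;
import java.util.stream.DoubleStream;

// Hypothetical sketch of two-phase binning: two streaming passes instead of
// caching both the raw features and the binned copy simultaneously.
public class TwoPhaseBinning {
    // Phase 1: scan once to find equal-width bin edges from min/max.
    static double[] binEdges(Supplier<DoubleStream> data, int numBins) {
        double min = data.get().min().orElse(0);
        double max = data.get().max().orElse(0);
        double[] edges = new double[numBins + 1];
        for (int i = 0; i <= numBins; i++)
            edges[i] = min + (max - min) * i / numBins;
        return edges;
    }

    // Phase 2: scan again, mapping each value straight to its bin index,
    // so only the binned representation is materialized.
    static int[] toBins(Supplier<DoubleStream> data, double[] edges) {
        return data.get().mapToInt(v -> {
            int idx = Arrays.binarySearch(edges, v);
            int bin = idx >= 0 ? idx : -idx - 2;     // insertion point -> bin
            return Math.min(Math.max(bin, 0), edges.length - 2);
        }).toArray();
    }

    public static void main(String[] args) {
        Supplier<DoubleStream> data = () -> DoubleStream.of(1, 2, 3, 4, 5, 6, 7, 8);
        double[] edges = binEdges(data, 4);
        System.out.println(Arrays.toString(toBins(data, edges)));
    }
}
```

Real GBDT implementations use quantile sketches rather than equal-width edges, but the memory trade-off is the same: two passes over the source in exchange for holding only one representation in memory.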
Has SONA been published to the Maven central repository?
spark-submit --master yarn-cluster --conf spark.ps.jars=hdfs:///user/brook/sona-0.1.0-bin/lib/fastutil-7.1.0.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/htrace-core-2.05.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/sizeof-0.3.0.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/kryo-shaded-4.0.0.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/minlog-1.3.0.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/memory-0.8.1.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/commons-pool-1.6.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/netty-all-4.1.17.Final.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/hll-1.6.0.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/jniloader-1.1.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/native_system-java-1.1.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/arpack_combined_all-0.1.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/core-1.1.2.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/netlib-native_ref-linux-armhf-1.1-natives.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/netlib-native_ref-linux-i686-1.1-natives.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/netlib-native_ref-linux-x86_64-1.1-natives.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/netlib-native_system-linux-armhf-1.1-natives.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/netlib-native_system-linux-i686-1.1-natives.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/netlib-native_system-linux-x86_64-1.1-natives.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/jettison-1.4.0.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/json4s-native_2.11-3.2.11.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/angel-format-0.1.1.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/angel-mlcore-0.1.2.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/angel-ps-core-3.0.1.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/angel-ps-mllib-3.0.1.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/angel-ps-psf-3.0.1.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/angel-math-0.1.1.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/angel-ps-graph-3.0.1.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/core-0.1.0.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/angelml-0.1.0.jar,hdfs:///user/brook/angel-2.1.0-bin/lib/scala-library-2.11.8.jar
--conf spark.ps.instances=2 --conf spark.ps.cores=3 --conf spark.ps.memory=5g
--jars hdfs:///user/brook/sona-0.1.0-bin/lib/fastutil-7.1.0.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/htrace-core-2.05.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/sizeof-0.3.0.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/kryo-shaded-4.0.0.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/minlog-1.3.0.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/memory-0.8.1.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/commons-pool-1.6.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/netty-all-4.1.17.Final.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/hll-1.6.0.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/jniloader-1.1.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/native_system-java-1.1.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/arpack_combined_all-0.1.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/core-1.1.2.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/netlib-native_ref-linux-armhf-1.1-natives.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/netlib-native_ref-linux-i686-1.1-natives.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/netlib-native_ref-linux-x86_64-1.1-natives.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/netlib-native_system-linux-armhf-1.1-natives.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/netlib-native_system-linux-i686-1.1-natives.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/netlib-native_system-linux-x86_64-1.1-natives.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/jettison-1.4.0.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/json4s-native_2.11-3.2.11.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/angel-format-0.1.1.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/angel-mlcore-0.1.2.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/angel-ps-core-3.0.1.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/angel-ps-mllib-3.0.1.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/angel-ps-psf-3.0.1.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/angel-math-0.1.1.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/angel-ps-graph-3.0.1.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/core-0.1.0.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/angelml-0.1.0.jar,hdfs:///user/brook/angel-2.1.0-bin/lib/scala-library-2.11.8.jar
--files ./deepfm.json --driver-memory 2g --num-executors 2 --executor-cores 3 --executor-memory 5g
--class com.tencent.angel.sona.examples.JsonRunnerExamples
../lib/angelml-0.1.0.jar
jsonFile:./deepfm.json
dataFormat:libsvm
data:a9a_123d_train.libsvm
modelPath:model_dfm
predictPath:pred_dfm
actionType:train
numBatch:500
maxIter:2
lr:4.0
numField:39
This is my submit command.
Both Wide&Deep and DeepFM give this error. Looking forward to your help!
java.lang.NullPointerException
at sun.net.util.URLUtil.urlNoFragString(URLUtil.java:50)
at sun.misc.URLClassPath.getLoader(URLClassPath.java:512)
at sun.misc.URLClassPath.getNextLoader(URLClassPath.java:484)
at sun.misc.URLClassPath.getResource(URLClassPath.java:238)
at java.net.URLClassLoader$1.run(URLClassLoader.java:365)
at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
at net.qihoo.spinner.HdfsClassLoader.loadClass(HdfsClassLoader.java:35)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at net.qihoo.spinner.HYReflection.newInstance(HYReflection.java:70)
at net.qihoo.spinner.SpinnerViewFs.initializeProperties(SpinnerViewFs.java:18)
at net.qihoo.spinner.SpinnerDistributedFileSystem.initialize(SpinnerDistributedFileSystem.java:47)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2689)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:386)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.spark.deploy.yarn.Client.org$apache$spark$deploy$yarn$Client$$cleanupStagingDirInternal$1(Client.scala:199)
at org.apache.spark.deploy.yarn.Client.cleanupStagingDir(Client.scala:217)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:182)
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1155)
at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1523)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:880)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Exception in thread "main" java.lang.RuntimeException: java.lang.NullPointerException
at net.qihoo.spinner.HYReflection.newInstance(HYReflection.java:93)
at net.qihoo.spinner.SpinnerViewFs.initializeProperties(SpinnerViewFs.java:18)
at net.qihoo.spinner.SpinnerDistributedFileSystem.initialize(SpinnerDistributedFileSystem.java:47)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2689)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:386)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.spark.deploy.yarn.Client.org$apache$spark$deploy$yarn$Client$$cleanupStagingDirInternal$1(Client.scala:199)
at org.apache.spark.deploy.yarn.Client.cleanupStagingDir(Client.scala:217)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:182)
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1155)
at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1523)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:880)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.NullPointerException
at sun.net.util.URLUtil.urlNoFragString(URLUtil.java:50)
at sun.misc.URLClassPath.getLoader(URLClassPath.java:512)
at sun.misc.URLClassPath.getNextLoader(URLClassPath.java:484)
at sun.misc.URLClassPath.getResource(URLClassPath.java:238)
at java.net.URLClassLoader$1.run(URLClassLoader.java:365)
at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
at net.qihoo.spinner.HdfsClassLoader.loadClass(HdfsClassLoader.java:35)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at net.qihoo.spinner.HYReflection.newInstance(HYReflection.java:70)
... 15 more
I'd like to participate in SONA development. How can I get involved? We are currently using Angel, but community updates are slow, so I want to contribute.
Do I need to update the angel-mlcore library or the json4s library?
An exception or error caused a run to abort: org.json4s.native.JsonMethods$.parse(Lorg/json4s/JsonInput;Z)Lorg/json4s/JsonAST$JValue;
java.lang.NoSuchMethodError: org.json4s.native.JsonMethods$.parse(Lorg/json4s/JsonInput;Z)Lorg/json4s/JsonAST$JValue;
at com.tencent.angel.ml.core.utils.JsonUtils$.parseAndUpdateJson(JsonUtils.scala:400)
at org.apache.spark.angelml.param.AngelGraphParams$class.updateFromJson(AngelGraphParams.scala:31)
at org.apache.spark.angelml.classification.AngelClassifier.updateFromJson(AngelClassifier.scala:31)
at org.apache.spark.angelml.classification.AngelClassifier.updateFromJson(AngelClassifier.scala:31)
at org.apache.spark.angelml.param.ParamsHelper$class.finalizeConf(ParamsHelper.scala:43)
at org.apache.spark.angelml.classification.AngelClassifier.finalizeConf(AngelClassifier.scala:31)
at org.apache.spark.angelml.classification.AngelClassifier.train(AngelClassifier.scala:177)
at org.apache.spark.angelml.classification.AngelClassifier.train(AngelClassifier.scala:31)
at org.apache.spark.angelml.Predictor.fit(Predictor.scala:118)
at org.apache.spark.angelml.classification.AngelClassificationSuite$$anonfun$1.apply$mcV$sp(AngelClassificationSuite.scala:31)
at org.apache.spark.angelml.classification.AngelClassificationSuite$$anonfun$1.apply(AngelClassificationSuite.scala:18)
at org.apache.spark.angelml.classification.AngelClassificationSuite$$anonfun$1.apply(AngelClassificationSuite.scala:18)
[2020-06-25 22:33:23.064]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=100M; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=200M; support was removed in 8.0
[2020-06-25 22:33:23.065]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=100M; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=200M; support was removed in 8.0
For more detailed output, check the application tracking page: http://ecs-hn1b-bd-cdp-edg-2:8188/applicationhistory/app/application_1591113812497_103797 Then click on links to logs of each attempt.
. Failing the application.
at com.tencent.angel.client.yarn.AngelYarnClient.updateMaster(AngelYarnClient.java:517)
at com.tencent.angel.client.yarn.AngelYarnClient.startPSServer(AngelYarnClient.java:170)
at com.tencent.angel.client.AngelPSClient.startPS(AngelPSClient.java:115)
at com.tencent.angel.sona.core.DriverContext.startAngelAndPSAgent(DriverContext.scala:97)
at com.tencent.angel.sona.examples.JsonRunnerExamples$.main(JsonRunnerExamples.scala:69)
at com.tencent.angel.sona.examples.JsonRunnerExamples.main(JsonRunnerExamples.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:851)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:926)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:935)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Exception in thread "main" com.tencent.angel.exception.AngelException: java.io.IOException: Failed to run job : Application application_1591113812497_103797 failed 2 times (global limit =3; local limit is =2) due to AM Container for appattempt_1591113812497_103797_000002 exited with exitCode: 1
Failing this attempt.Diagnostics: [2020-06-25 22:33:23.063]Exception from container-launch.
Container id: container_e74_1591113812497_103797_02_000001
Exit code: 1
[2020-06-25 22:33:23.064]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=100M; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=200M; support was removed in 8.0
[2020-06-25 22:33:23.065]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=100M; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=200M; support was removed in 8.0
For more detailed output, check the application tracking page: http://ecs-hn1b-bd-cdp-edg-2:8188/applicationhistory/app/application_1591113812497_103797 Then click on links to logs of each attempt.
. Failing the application.
at com.tencent.angel.client.yarn.AngelYarnClient.startPSServer(AngelYarnClient.java:176)
at com.tencent.angel.client.AngelPSClient.startPS(AngelPSClient.java:115)
at com.tencent.angel.sona.core.DriverContext.startAngelAndPSAgent(DriverContext.scala:97)
at com.tencent.angel.sona.examples.JsonRunnerExamples$.main(JsonRunnerExamples.scala:69)
at com.tencent.angel.sona.examples.JsonRunnerExamples.main(JsonRunnerExamples.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:851)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:926)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:935)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.IOException: Failed to run job : Application application_1591113812497_103797 failed 2 times (global limit =3; local limit is =2) due to AM Container for appattempt_1591113812497_103797_000002 exited with exitCode: 1
Failing this attempt.Diagnostics: [2020-06-25 22:33:23.063]Exception from container-launch.
Container id: container_e74_1591113812497_103797_02_000001
Exit code: 1
[2020-06-25 22:33:23.064]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=100M; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=200M; support was removed in 8.0
[2020-06-25 22:33:23.065]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=100M; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=200M; support was removed in 8.0
For more detailed output, check the application tracking page: http://ecs-hn1b-bd-cdp-edg-2:8188/applicationhistory/app/application_1591113812497_103797 Then click on links to logs of each attempt.
. Failing the application.
at com.tencent.angel.client.yarn.AngelYarnClient.updateMaster(AngelYarnClient.java:517)
at com.tencent.angel.client.yarn.AngelYarnClient.startPSServer(AngelYarnClient.java:170)
... 16 more
20/06/25 22:33:15 INFO spark.SparkContext: Invoking stop() from shutdown hook
20/06/25 22:33:15 INFO server.AbstractConnector: Stopped Spark@7a389761{HTTP/1.1,[http/1.1]}{0.0.0.0:4041}
20/06/25 22:33:15 INFO ui.SparkUI: Stopped Spark web UI at http://ecs-hn1a-xng-alg-rcmd-edg-1:4041
20/06/25 22:33:15 INFO cluster.YarnClientSchedulerBackend: Interrupting monitor thread
20/06/25 22:33:15 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
20/06/25 22:33:15 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
20/06/25 22:33:15 INFO cluster.SchedulerExtensionServices: Stopping SchedulerExtensionServices
(serviceOption=None,
services=List(),
started=false)
20/06/25 22:33:15 INFO cluster.YarnClientSchedulerBackend: Stopped
20/06/25 22:33:15 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
20/06/25 22:33:15 INFO memory.MemoryStore: MemoryStore cleared
20/06/25 22:33:15 INFO storage.BlockManager: BlockManager stopped
20/06/25 22:33:15 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
20/06/25 22:33:15 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
20/06/25 22:33:15 INFO spark.SparkContext: Successfully stopped SparkContext
20/06/25 22:33:15 INFO util.ShutdownHookManager: Shutdown hook called
20/06/25 22:33:15 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-a689fb57-937d-4d2b-bf50-f59d08e9b6b8
20/06/25 22:33:15 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-2d255dca-b9e1-45ad-ba9d-6b8beae996e3
20/06/25 22:33:15 INFO client.AngelClient: stop the application
20/06/25 22:33:15 INFO client.AngelClient: master is null, just kill the application
20/06/25 22:33:15 INFO impl.YarnClientImpl: Killed application application_1591113812497_103797
Please publish the SONA jar to the Maven ** repository, to make it easier to use.
When the partition number of the training set is 10, the AUC is 0.67; but when the partition number is 100, the AUC drops to 0.51. Why?
Are there any examples of online machine learning with Spark Streaming? Is it supported?