sona's People

Contributors

bluesjjw, ccchengff, danleivv, endymecy, ouyangwen-it, paynie, rachelsunrh, strikew, wangcaihua, xs-li, xujie32


sona's Issues

Bug: running the demo with the latest SONA version fails

Hi, I'm running the SONA example and it ends with FAILED; the stdout log is below.
Please help!

2019-12-26 14:09:19 INFO  SignalUtils:54 - Registered signal handler for TERM
2019-12-26 14:09:19 INFO  SignalUtils:54 - Registered signal handler for HUP
2019-12-26 14:09:19 INFO  SignalUtils:54 - Registered signal handler for INT
2019-12-26 14:09:19 INFO  SecurityManager:54 - Changing view acls to: deepthought
2019-12-26 14:09:19 INFO  SecurityManager:54 - Changing modify acls to: deepthought
2019-12-26 14:09:19 INFO  SecurityManager:54 - Changing view acls groups to: 
2019-12-26 14:09:19 INFO  SecurityManager:54 - Changing modify acls groups to: 
2019-12-26 14:09:19 INFO  SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(deepthought); groups with view permissions: Set(); users  with modify permissions: Set(deepthought); groups with modify permissions: Set()
2019-12-26 14:09:20 INFO  UserGroupInformation:964 - Login successful for user deepthought using keytab file deepthought.keytab-4169bc48-f895-42c2-9dde-091feb49f3c5
2019-12-26 14:09:20 INFO  ApplicationMaster:54 - Preparing Local resources
2019-12-26 14:09:22 WARN  Client:677 - Exception encountered while connecting to the server : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby. Visit https://s.apache.org/sbnn-error
2019-12-26 14:09:28 INFO  ApplicationMaster:54 - ApplicationAttemptId: appattempt_1576380960005_2467808_000001
2019-12-26 14:09:28 INFO  AMCredentialRenewer:54 - Scheduling login from keytab in 64776907 millis.
2019-12-26 14:09:28 INFO  ApplicationMaster:54 - Starting the user application in a separate Thread
2019-12-26 14:09:28 ERROR ApplicationMaster:91 - Uncaught exception: 
java.lang.ClassNotFoundException: org.apache.spark.angel.examples.JsonRunnerExamples
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	at org.apache.spark.deploy.yarn.ApplicationMaster.startUserApplication(ApplicationMaster.scala:715)
	at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:491)
	at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:345)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply$mcV$sp(ApplicationMaster.scala:260)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply(ApplicationMaster.scala:260)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply(ApplicationMaster.scala:260)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$5.run(ApplicationMaster.scala:815)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1692)
	at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:814)
	at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:259)
	at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:839)
	at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
2019-12-26 14:09:28 INFO  ApplicationMaster:54 - Final app status: FAILED, exitCode: 13, (reason: Uncaught exception: java.lang.ClassNotFoundException: org.apache.spark.angel.examples.JsonRunnerExamples)
2019-12-26 14:09:28 INFO  ShutdownHookManager:54 - Shutdown hook called

My SONA example submit script:

source ./spark-on-angel-env.sh
export HADOOP_CONF_DIR=/usr/lib/hadoop/etc/hadoop

$SPARK_HOME/bin/spark-submit \
        --master yarn-cluster \
        --driver-java-options "-Djava.library.path=/usr/lib/hadoop/lib/native" \
        --keytab /home/deepthought/deepthought.keytab \
        --principal deepthought \
        --queue longyuan.p0 \
	--conf spark.ps.jars=$SONA_ANGEL_JARS \
	--conf spark.ps.instances=10 \
	--conf spark.ps.cores=2 \
	--conf spark.ps.memory=6g \
	--jars $SONA_SPARK_JARS \
	--name "LR-spark-on-angel" \
	--files /data/angel/sona-0.1.0-bin/jsons/logreg.json \
	--driver-memory 10g \
	--num-executors 10 \
	--executor-cores 2 \
	--executor-memory 4g \
	--class org.apache.spark.angel.examples.JsonRunnerExamples \
	./../lib/angelml-${SONA_VERSION}.jar \
	data:viewfs://hadoop-bd/user/deepthought/test/angel/sona-0.1.0-bin/data/angel/a9a/a9a_123d_train.libsvm \
	modelPath:viewfs://hadoop-bd/user/deepthought/test/output \
	jsonFile:./lr.json \
	lr:0.1

and my spark-on-angel-env.sh:

export JAVA_HOME=/usr
export HADOOP_HOME=/usr/lib/hadoop
export SPARK_HOME=/usr/local/spark/spark-2.3.1-bin-hadoop2.6
export SONA_HOME=/data/angel/sona-0.1.0-bin
export SONA_HDFS_HOME=viewfs://hadoop-bd/user/deepthought/test/angel/sona-0.1.0-bin
export SONA_VERSION=0.1.0
export ANGEL_VERSION=3.0.1
export ANGEL_UTILS_VERSION=0.1.1
export ANGEL_MLCORE_VERSION=0.1.2

...<default content below unchanged>...
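For what it's worth, a ClassNotFoundException in yarn-cluster mode usually means the class named with --class is not actually inside the jar passed to spark-submit. Since a jar is just a zip archive, the packaging can be checked directly. A minimal Python sketch (the jar path is illustrative; only the class name comes from the script above):

```python
import zipfile

def jar_contains_class(jar_path: str, class_name: str) -> bool:
    """Return True if the compiled class file is present in the jar.

    A jar is a zip archive, so we simply look for the entry
    org/apache/spark/angel/examples/JsonRunnerExamples.class by name.
    """
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()

# Example usage (adjust the path to your angelml jar):
# jar_contains_class("angelml-0.1.0.jar",
#                    "org.apache.spark.angel.examples.JsonRunnerExamples")
```

If the class is missing, the relative path `./../lib/angelml-${SONA_VERSION}.jar` may be resolving to the wrong jar from the directory the script runs in.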

Error when running the model fit function

I ran the demo from this page:

import com.tencent.angel.sona.core.DriverContext
import org.apache.spark.angel.ml.classification.AngelClassifier
import org.apache.spark.angel.ml.feature.LabeledPoint
import org.apache.spark.angel.ml.linalg.Vectors
import org.apache.spark.SparkConf
import org.apache.spark.sql.{DataFrameReader, SparkSession}

val spark = SparkSession.builder()
  .master("yarn-cluster")
  .appName("AngelClassification")
  .getOrCreate()

val sparkConf = spark.sparkContext.getConf
val driverCtx = DriverContext.get(sparkConf)

driverCtx.startAngelAndPSAgent()

val libsvm = spark.read.format("libsvmex")
val dummy = spark.read.format("dummy")

val trainData = libsvm.load("./data/angel/a9a/a9a_123d_train.libsvm")

val classifier = new AngelClassifier()
  .setModelJsonFile("./angelml/src/test/jsons/logreg.json")
  .setNumClass(2)
  .setNumBatch(10)
  .setMaxIter(2)
  .setLearningRate(0.1)

val model = classifier.fit(trainData)

model.write.overwrite().save("trained_models/lr")

While running this line:

scala> val model = classifier.fit(trainData)
19/09/03 13:35:44 WARN UDTRegistration: Cannot register UDT for org.apache.spark.angel.ml.linalg.Vector, which is already registered.
19/09/03 13:35:44 WARN UDTRegistration: Cannot register UDT for org.apache.spark.angel.ml.linalg.DenseVector, which is already registered.
19/09/03 13:35:44 WARN UDTRegistration: Cannot register UDT for org.apache.spark.angel.ml.linalg.SparseVector, which is already registered.
19/09/03 13:35:44 WARN UDTRegistration: Cannot register UDT for org.apache.spark.angel.ml.linalg.Matrix, which is already registered.
19/09/03 13:35:44 WARN UDTRegistration: Cannot register UDT for org.apache.spark.angel.ml.linalg.DenseMatrix, which is already registered.
19/09/03 13:35:44 WARN UDTRegistration: Cannot register UDT for org.apache.spark.angel.ml.linalg.SparseMatrix, which is already registered.
19/09/03 13:35:45 ERROR Executor: Exception in task 0.0 in stage 12.0 (TID 12)
java.lang.Exception: Pls. startAngel first!
	at com.tencent.angel.sona.core.ExecutorContext.sparkWorkerContext$lzycompute(ExecutorContext.scala:32)
	at com.tencent.angel.sona.core.ExecutorContext.sparkWorkerContext(ExecutorContext.scala:30)
	at com.tencent.angel.sona.core.ExecutorContext$.checkGraphModelPool(ExecutorContext.scala:65)
	at com.tencent.angel.sona.core.ExecutorContext$.toGraphModelPool(ExecutorContext.scala:78)
	at org.apache.spark.angel.ml.common.Trainer.trainOneBatch(Trainer.scala:43)
	at org.apache.spark.angel.ml.classification.AngelClassifier$$anonfun$train$1$$anonfun$apply$mcVI$sp$1$$anonfun$8.apply(AngelClassifier.scala:245)
	at org.apache.spark.angel.ml.classification.AngelClassifier$$anonfun$train$1$$anonfun$apply$mcVI$sp$1$$anonfun$8.apply(AngelClassifier.scala:245)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
	at scala.collection.Iterator$class.foreach(Iterator.scala:893)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
	at scala.collection.TraversableOnce$class.reduceLeft(TraversableOnce.scala:185)
	at scala.collection.AbstractIterator.reduceLeft(Iterator.scala:1336)
	at org.apache.spark.rdd.RDD$$anonfun$reduce$1$$anonfun$14.apply(RDD.scala:1015)
	at org.apache.spark.rdd.RDD$$anonfun$reduce$1$$anonfun$14.apply(RDD.scala:1013)
	at org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:2123)
	at org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:2123)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
	at org.apache.spark.scheduler.Task.run(Task.scala:109)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
19/09/03 13:35:45 WARN TaskSetManager: Lost task 0.0 in stage 12.0 (TID 12, localhost, executor driver): java.lang.Exception: Pls. startAngel first!
	at com.tencent.angel.sona.core.ExecutorContext.sparkWorkerContext$lzycompute(ExecutorContext.scala:32)
	at com.tencent.angel.sona.core.ExecutorContext.sparkWorkerContext(ExecutorContext.scala:30)
	at com.tencent.angel.sona.core.ExecutorContext$.checkGraphModelPool(ExecutorContext.scala:65)
	at com.tencent.angel.sona.core.ExecutorContext$.toGraphModelPool(ExecutorContext.scala:78)
	at org.apache.spark.angel.ml.common.Trainer.trainOneBatch(Trainer.scala:43)
	at org.apache.spark.angel.ml.classification.AngelClassifier$$anonfun$train$1$$anonfun$apply$mcVI$sp$1$$anonfun$8.apply(AngelClassifier.scala:245)
	at org.apache.spark.angel.ml.classification.AngelClassifier$$anonfun$train$1$$anonfun$apply$mcVI$sp$1$$anonfun$8.apply(AngelClassifier.scala:245)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
	at scala.collection.Iterator$class.foreach(Iterator.scala:893)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
	at scala.collection.TraversableOnce$class.reduceLeft(TraversableOnce.scala:185)
	at scala.collection.AbstractIterator.reduceLeft(Iterator.scala:1336)
	at org.apache.spark.rdd.RDD$$anonfun$reduce$1$$anonfun$14.apply(RDD.scala:1015)
	at org.apache.spark.rdd.RDD$$anonfun$reduce$1$$anonfun$14.apply(RDD.scala:1013)
	at org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:2123)
	at org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:2123)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
	at org.apache.spark.scheduler.Task.run(Task.scala:109)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

19/09/03 13:35:45 ERROR TaskSetManager: Task 0 in stage 12.0 failed 1 times; aborting job

My Spark version is 2.3.0, and I ran this code in spark-shell:

spark-shell \
  --conf spark.ps.jars=$SONA_ANGEL_JARS \
  --conf spark.ps.instances=10 \
  --conf spark.ps.cores=2 \
  --conf spark.ps.memory=6g \
  --jars $SONA_SPARK_JARS \
  --name "demo1" \
  --driver-memory 10g \
  --num-executors 10 \
  --executor-cores 2 \
  --executor-memory 4g
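The "Pls. startAngel first!" message is a guard thrown from com.tencent.angel.sona.core.ExecutorContext: the executor-side context refuses to build its worker state unless Angel and the PS agent were started beforehand. A toy Python sketch of that guard pattern (illustrative only; the class and method names below are stand-ins, not the real Scala implementation):

```python
class ExecutorContextSketch:
    """Toy model of a lazily built worker context behind a 'started' guard."""

    def __init__(self):
        self._started = False
        self._worker_context = None

    def start_angel(self):
        # Roughly plays the role of DriverContext.startAngelAndPSAgent().
        self._started = True

    def worker_context(self):
        # Build the context lazily, but only if Angel was started first;
        # otherwise fail fast, like the executor does in the log above.
        if not self._started:
            raise RuntimeError("Pls. startAngel first!")
        if self._worker_context is None:
            self._worker_context = object()
        return self._worker_context
```

In a spark-shell session this means driverCtx.startAngelAndPSAgent() must have completed successfully, and the Angel/PS state must be visible to the executors, before classifier.fit(trainData) runs.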

Task had a not serializable result: com.tencent.angel.ml.math2.utils.LabeledData

Serialization stack:
- object not serializable (class: com.tencent.angel.ml.math2.utils.LabeledData, value: com.tencent.angel.ml.math2.utils.LabeledData@4da494b8)
2020-05-09 18:05:47 INFO DAGScheduler:54 - Job 0 failed: count at FTRLExample.scala:109, took 0.395223 s
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0 in stage 0.0 (TID 0) had a not serializable result: com.tencent.angel.ml.math2.utils.LabeledData
Serialization stack:
- object not serializable (class: com.tencent.angel.ml.math2.utils.LabeledData, value: com.tencent.angel.ml.math2.utils.LabeledData@4da494b8)
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1599)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1587)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1586)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1586)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:831)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1820)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1769)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1758)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:642)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2027)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2048)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2067)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2092)
at org.apache.spark.rdd.RDD.count(RDD.scala:1162)
at com.tencent.angel.sona.examples.online_learning.FTRLExample$.train(FTRLExample.scala:109)
at com.tencent.angel.sona.examples.online_learning.FTRLExample$.main(FTRLExample.scala:51)
at com.tencent.angel.sona.examples.online_learning.FTRLExample.main(FTRLExample.scala)
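Spark serializes task results with Java serialization by default, so any object returned from a task to the driver must be serializable; the log above shows com.tencent.angel.ml.math2.utils.LabeledData is not. The failure mode can be sketched in Python with pickle playing the same role (the class below is a made-up stand-in, not the real LabeledData):

```python
import pickle
import threading

class UnserializableRecord:
    """Made-up stand-in for a task result holding a non-serializable field."""
    def __init__(self):
        # A lock, like a native handle, cannot be pickled; pickling any
        # object whose state contains it fails the same way.
        self._lock = threading.Lock()

record = UnserializableRecord()
try:
    pickle.dumps(record)
    print("serialized fine")
except TypeError as err:
    print("object not serializable:", err)
```

The analogous fix on the Spark side is to avoid returning such objects to the driver (e.g. map them to plain serializable values before a collect/reduce), rather than counting the raw records.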

java.io.InvalidClassException: com.tencent.angel.sona.core.ExecutorContext; no valid constructor

Spark 2.4.4
Hadoop 3.1
Angel 3.0.1
SONA 0.1.0

My parameters:

${SPARK_HOME}/bin/spark-submit \
--master yarn \
--deploy-mode client \
--conf spark.ps.jars=${SONA_ANGEL_JARS} \
--conf spark.ps.instances=1 \
--conf spark.ps.cores=1 \
--conf spark.ps.memory=2g \
--conf spark.yarn.queue=default \
--jars ${SONA_SPARK_JARS} \
--name "LR-spark-on-angel" \
--files ${SONA_HOME}/bin/lr.json \
--driver-memory 2g \
--num-executors 2 \
--executor-cores 2 \
--executor-memory 3g \
--class org.apache.spark.angel.examples.JsonRunnerExamples \
${SONA_HOME}/lib/angelml-${SONA_VERSION}.jar \
data:/src/sona/data/angel/a9a/a9a_123d_train.libsvm \
modelPath:/test/spark-output-model \
jsonFile:./lr.json \
lr:0.1
I get the error log below:

2019-10-16 06:28:18,138 INFO storage.BlockManagerInfo: Added broadcast_5_piece0 in memory on cc442b7f4080:37167 (size: 8.2 KB, free: 1450.3 MB)
2019-10-16 06:28:18,188 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on cc442b7f4080:37167 (size: 1844.0 B, free: 1450.3 MB)
2019-10-16 06:28:18,214 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 2.0 (TID 2, cc442b7f4080, executor 1): java.io.InvalidClassException: com.tencent.angel.sona.core.ExecutorContext; no valid constructor
	at java.io.ObjectStreamClass$ExceptionInfo.newInvalidClassException(ObjectStreamClass.java:169)
	at java.io.ObjectStreamClass.checkDeserialize(ObjectStreamClass.java:874)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2043)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
	at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
	at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$8.apply(TorrentBroadcast.scala:308)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
	at org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:309)
	at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1$$anonfun$apply$2.apply(TorrentBroadcast.scala:235)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:211)
	at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1326)
	at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:207)
	at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:66)
	at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:66)
	at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:96)
	at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
	at org.apache.spark.angel.ml.util.FeatureStats.partitionStatsWithPS(FeatureStats.scala:106)
	at org.apache.spark.angel.ml.classification.AngelClassifier$$anonfun$4.apply(AngelClassifier.scala:166)
	at org.apache.spark.angel.ml.classification.AngelClassifier$$anonfun$4.apply(AngelClassifier.scala:166)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:123)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

2019-10-16 06:28:18,215 INFO scheduler.TaskSetManager: Starting task 0.1 in stage 2.0 (TID 3, cc442b7f4080, executor 1, partition 0, PROCESS_LOCAL, 8272 bytes)
2019-10-16 06:28:18,227 INFO scheduler.TaskSetManager: Lost task 0.1 in stage 2.0 (TID 3) on cc442b7f4080, executor 1: java.io.InvalidClassException (com.tencent.angel.sona.core.ExecutorContext; no valid constructor) [duplicate 1]
2019-10-16 06:28:18,227 INFO scheduler.TaskSetManager: Starting task 0.2 in stage 2.0 (TID 4, cc442b7f4080, executor 1, partition 0, PROCESS_LOCAL, 8272 bytes)
2019-10-16 06:28:18,237 INFO scheduler.TaskSetManager: Lost task 0.2 in stage 2.0 (TID 4) on cc442b7f4080, executor 1: java.io.InvalidClassException (com.tencent.angel.sona.core.ExecutorContext; no valid constructor) [duplicate 2]
2019-10-16 06:28:18,238 INFO scheduler.TaskSetManager: Starting task 0.3 in stage 2.0 (TID 5, cc442b7f4080, executor 1, partition 0, PROCESS_LOCAL, 8272 bytes)
2019-10-16 06:28:18,247 INFO scheduler.TaskSetManager: Lost task 0.3 in stage 2.0 (TID 5) on cc442b7f4080, executor 1: java.io.InvalidClassException (com.tencent.angel.sona.core.ExecutorContext; no valid constructor) [duplicate 3]
2019-10-16 06:28:18,248 ERROR scheduler.TaskSetManager: Task 0 in stage 2.0 failed 4 times; aborting job
2019-10-16 06:28:18,249 INFO cluster.YarnClusterScheduler: Removed TaskSet 2.0, whose tasks have all completed, from pool 
2019-10-16 06:28:18,251 INFO cluster.YarnClusterScheduler: Cancelling stage 2
2019-10-16 06:28:18,251 INFO cluster.YarnClusterScheduler: Killing all running tasks in stage 2: Stage cancelled
2019-10-16 06:28:18,252 INFO scheduler.DAGScheduler: ResultStage 2 (reduce at AngelClassifier.scala:167) failed in 0.132 s due to Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2.0 (TID 5, cc442b7f4080, executor 1): java.io.InvalidClassException: com.tencent.angel.sona.core.ExecutorContext; no valid constructor
	at java.io.ObjectStreamClass$ExceptionInfo.newInvalidClassException(ObjectStreamClass.java:169)
	at java.io.ObjectStreamClass.checkDeserialize(ObjectStreamClass.java:874)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2043)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
	at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
	at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$8.apply(TorrentBroadcast.scala:308)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
	at org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:309)
	at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1$$anonfun$apply$2.apply(TorrentBroadcast.scala:235)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:211)
	at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1326)
	at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:207)
	at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:66)
	at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:66)
	at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:96)
	at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
	at org.apache.spark.angel.ml.util.FeatureStats.partitionStatsWithPS(FeatureStats.scala:106)
	at org.apache.spark.angel.ml.classification.AngelClassifier$$anonfun$4.apply(AngelClassifier.scala:166)
	at org.apache.spark.angel.ml.classification.AngelClassifier$$anonfun$4.apply(AngelClassifier.scala:166)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:123)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

Driver stacktrace:
2019-10-16 06:28:18,253 INFO scheduler.DAGScheduler: Job 2 failed: reduce at AngelClassifier.scala:167, took 0.136431 s
2019-10-16 06:28:18,254 ERROR yarn.ApplicationMaster: User class threw exception: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2.0 (TID 5, cc442b7f4080, executor 1): java.io.InvalidClassException: com.tencent.angel.sona.core.ExecutorContext; no valid constructor
	at java.io.ObjectStreamClass$ExceptionInfo.newInvalidClassException(ObjectStreamClass.java:169)
	at java.io.ObjectStreamClass.checkDeserialize(ObjectStreamClass.java:874)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2043)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
	at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
	at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$8.apply(TorrentBroadcast.scala:308)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
	at org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:309)
	at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1$$anonfun$apply$2.apply(TorrentBroadcast.scala:235)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:211)
	at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1326)
	at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:207)
	at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:66)
	at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:66)
	at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:96)
	at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
	at org.apache.spark.angel.ml.util.FeatureStats.partitionStatsWithPS(FeatureStats.scala:106)
	at org.apache.spark.angel.ml.classification.AngelClassifier$$anonfun$4.apply(AngelClassifier.scala:166)
	at org.apache.spark.angel.ml.classification.AngelClassifier$$anonfun$4.apply(AngelClassifier.scala:166)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:123)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

Driver stacktrace:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2.0 (TID 5, cc442b7f4080, executor 1): java.io.InvalidClassException: com.tencent.angel.sona.core.ExecutorContext; no valid constructor
	at java.io.ObjectStreamClass$ExceptionInfo.newInvalidClassException(ObjectStreamClass.java:169)
	at java.io.ObjectStreamClass.checkDeserialize(ObjectStreamClass.java:874)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2043)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
	at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
	at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$8.apply(TorrentBroadcast.scala:308)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
	at org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:309)
	at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1$$anonfun$apply$2.apply(TorrentBroadcast.scala:235)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:211)
	at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1326)
	at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:207)
	at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:66)
	at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:66)
	at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:96)
	at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
	at org.apache.spark.angel.ml.util.FeatureStats.partitionStatsWithPS(FeatureStats.scala:106)
	at org.apache.spark.angel.ml.classification.AngelClassifier$$anonfun$4.apply(AngelClassifier.scala:166)
	at org.apache.spark.angel.ml.classification.AngelClassifier$$anonfun$4.apply(AngelClassifier.scala:166)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:123)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1889)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1877)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1876)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1876)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926)
	at scala.Option.foreach(Option.scala:257)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:926)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2110)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2059)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2048)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:737)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2158)
	at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:1035)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
	at org.apache.spark.rdd.RDD.reduce(RDD.scala:1017)
	at org.apache.spark.angel.ml.classification.AngelClassifier.train(AngelClassifier.scala:167)
	at org.apache.spark.angel.ml.classification.AngelClassifier.train(AngelClassifier.scala:48)
	at org.apache.spark.angel.ml.Predictor.fit(Predictor.scala:118)
	at org.apache.spark.angel.examples.JsonRunnerExamples$.main(JsonRunnerExamples.scala:109)
	at org.apache.spark.angel.examples.JsonRunnerExamples.main(JsonRunnerExamples.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:684)
Caused by: java.io.InvalidClassException: com.tencent.angel.sona.core.ExecutorContext; no valid constructor
	at java.io.ObjectStreamClass$ExceptionInfo.newInvalidClassException(ObjectStreamClass.java:169)
	at java.io.ObjectStreamClass.checkDeserialize(ObjectStreamClass.java:874)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2043)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
	at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
	at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$8.apply(TorrentBroadcast.scala:308)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
	at org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:309)
	at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1$$anonfun$apply$2.apply(TorrentBroadcast.scala:235)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:211)
	at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1326)
	at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:207)
	at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:66)
	at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:66)
	at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:96)
	at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
	at org.apache.spark.angel.ml.util.FeatureStats.partitionStatsWithPS(FeatureStats.scala:106)
	at org.apache.spark.angel.ml.classification.AngelClassifier$$anonfun$4.apply(AngelClassifier.scala:166)
	at org.apache.spark.angel.ml.classification.AngelClassifier$$anonfun$4.apply(AngelClassifier.scala:166)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:123)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
2019-10-16 06:28:18,258 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2.0 (TID 5, cc442b7f4080, executor 1): java.io.InvalidClassException: com.tencent.angel.sona.core.ExecutorContext; no valid constructor
)
2019-10-16 06:28:18,265 INFO spark.SparkContext: Invoking stop() from shutdown hook
2019-10-16 06:28:18,269 INFO server.AbstractConnector: Stopped Spark@3f71da43{HTTP/1.1,[http/1.1]}{0.0.0.0:0}
2019-10-16 06:28:18,270 INFO ui.SparkUI: Stopped Spark web UI at http://cc442b7f4080:34269
2019-10-16 06:28:18,273 INFO yarn.YarnAllocator: Driver requested a total number of 0 executor(s).
2019-10-16 06:28:18,275 INFO cluster.YarnClusterSchedulerBackend: Shutting down all executors
2019-10-16 06:28:18,275 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
2019-10-16 06:28:18,278 INFO cluster.SchedulerExtensionServices: Stopping SchedulerExtensionServices
(serviceOption=None,
 services=List(),
 started=false)
2019-10-16 06:28:18,336 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
2019-10-16 06:28:18,376 INFO memory.MemoryStore: MemoryStore cleared
2019-10-16 06:28:18,376 INFO storage.BlockManager: BlockManager stopped
2019-10-16 06:28:18,377 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
2019-10-16 06:28:18,379 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
2019-10-16 06:28:18,389 INFO spark.SparkContext: Successfully stopped SparkContext
2019-10-16 06:28:18,391 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with FAILED (diag message: User class threw exception: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2.0 (TID 5, cc442b7f4080, executor 1): java.io.InvalidClassException: com.tencent.angel.sona.core.ExecutorContext; no valid constructor
)
2019-10-16 06:28:18,400 INFO impl.AMRMClientImpl: Waiting for application to be successfully unregistered.
2019-10-16 06:28:18,502 INFO yarn.ApplicationMaster: Deleting staging directory hdfs://localhost:9000/user/root/.sparkStaging/application_1571206972109_0001
2019-10-16 06:28:18,527 INFO util.ShutdownHookManager: Shutdown hook called
2019-10-16 06:28:18,527 INFO util.ShutdownHookManager: Deleting directory /usr/local/hadoop/tmp/nm-local-dir/usercache/root/appcache/application_1571206972109_0001/spark-04921a2c-c3eb-4a0d-be35-808fccf8473f
2019-10-16 06:28:18,530 INFO psagent.PSAgent: stop heartbeat thread!
2019-10-16 06:28:18,530 INFO psagent.PSAgent: stop op log merger
2019-10-16 06:28:18,530 INFO psagent.PSAgent: stop clock cache
2019-10-16 06:28:18,530 INFO psagent.PSAgent: stop matrix cache
2019-10-16 06:28:18,530 INFO psagent.PSAgent: stop user request adapater
2019-10-16 06:28:18,531 INFO psagent.PSAgent: stop rpc dispacher
2019-10-16 06:28:18,531 INFO transport.ChannelManager2: Channel manager stop
2019-10-16 06:28:18,540 INFO broadcast.TorrentBroadcast: Destroying Broadcast(3) (from destroy at DriverContext.scala:128)
2019-10-16 06:28:18,540 WARN util.ShutdownHookManager: ShutdownHook '$anon$1' failed, java.lang.NullPointerException
java.lang.NullPointerException
	at org.apache.spark.broadcast.TorrentBroadcast$.unpersist(TorrentBroadcast.scala:321)
	at org.apache.spark.broadcast.TorrentBroadcast.doDestroy(TorrentBroadcast.scala:198)
	at org.apache.spark.broadcast.Broadcast.destroy(Broadcast.scala:111)
	at org.apache.spark.broadcast.Broadcast.destroy(Broadcast.scala:98)
	at com.tencent.angel.sona.core.DriverContext$$anonfun$com$tencent$angel$sona$core$DriverContext$$doStopAngel$1.apply(DriverContext.scala:128)
	at com.tencent.angel.sona.core.DriverContext$$anonfun$com$tencent$angel$sona$core$DriverContext$$doStopAngel$1.apply(DriverContext.scala:125)
	at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
	at com.tencent.angel.sona.core.DriverContext.com$tencent$angel$sona$core$DriverContext$$doStopAngel(DriverContext.scala:125)
	at com.tencent.angel.sona.core.DriverContext$$anon$1.run(DriverContext.scala:90)
	at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
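The root cause of the `InvalidClassException: com.tencent.angel.sona.core.ExecutorContext; no valid constructor` above is a Java serialization rule: when a `Serializable` class is deserialized, its first non-serializable superclass must expose an accessible no-arg constructor. A minimal, hedged reproduction of the failure mode (illustrative class names, not SONA's actual code):

```java
import java.io.*;

// Hedged minimal reproduction (illustrative names, not SONA's classes):
// Java serialization requires the first non-serializable superclass of a
// Serializable class to have an accessible no-arg constructor; if it does
// not, writeObject succeeds but readObject throws InvalidClassException.
public class InvalidClassRepro {

    static class ParentWithoutNoArgCtor {   // not Serializable, no no-arg ctor
        final String tag;
        ParentWithoutNoArgCtor(String tag) { this.tag = tag; }
    }

    static class SerializableChild extends ParentWithoutNoArgCtor
            implements Serializable {
        SerializableChild(String tag) { super(tag); }
    }

    /** Serialize then deserialize; return the failure message, or null on success. */
    static String roundTripFailure(Object obj) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(buf);
            out.writeObject(obj);            // writing succeeds
            out.close();
            ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray()));
            in.readObject();                 // the constructor check happens here
            return null;
        } catch (InvalidClassException e) {
            return e.getMessage();           // "<class>; no valid constructor"
        } catch (IOException | ClassNotFoundException e) {
            return e.toString();
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTripFailure(new SerializableChild("x")));
    }
}
```

On Spark this surfaces when such an object is broadcast and deserialized on an executor, which matches the `TorrentBroadcast` frames in the trace; giving the non-serializable parent a no-arg constructor (or making it `Serializable`) is the usual fix.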

More efficient data loading for GBDT

The binning process in GBDT requires caching two copies of the dataset, which is inefficient for memory-constrained users.

Implement a two-phase data loading method that is more memory-efficient.
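The two-phase idea can be sketched as follows; the names and the exact-sort quantile step are illustrative (a real implementation would use a streaming quantile sketch), not SONA's actual API:

```java
import java.util.Arrays;

// Hedged sketch of two-phase data loading (illustrative, not SONA's API):
// pass 1 scans the raw values once to fix the bin boundaries, pass 2 maps
// each value to its bin as it is re-read, so only the compact binned copy
// needs to stay cached instead of two full copies of the dataset.
public class TwoPhaseBinning {

    /** Pass 1: pick candidate split points from one scan of the data. */
    static double[] candidateSplits(double[] values, int numBins) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);                         // real impl: quantile sketch
        int step = Math.max(1, sorted.length / numBins);
        int count = (sorted.length - 1) / step;
        double[] splits = new double[count];
        for (int i = 0; i < count; i++) {
            splits[i] = sorted[(i + 1) * step];      // evenly spaced quantiles
        }
        return splits;
    }

    /** Pass 2: map a raw value to its bin index in [0, splits.length]. */
    static int bin(double value, double[] splits) {
        int idx = Arrays.binarySearch(splits, value);
        return idx >= 0 ? idx : -idx - 1;            // insertion point for misses
    }
}
```

Phase 1 needs only one pass over the raw data to fix the split points; phase 2 can then bin records while reading, so the raw copy never has to be cached alongside the binned one.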

Some doc errors and missing data


In the trainer script, the class given is the predictor.

Also, the GBDT sample data (dna/* in the Angel repo) is missing.
Maybe we can use sample_multiclass_classification_data.txt as a replacement.

DeepFM fails

Screenshot 2019-10-18 11 26 53

Screenshot 2019-10-18 11 27 27

spark-submit --master yarn-cluster --conf spark.ps.jars=hdfs:///user/brook/sona-0.1.0-bin/lib/fastutil-7.1.0.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/htrace-core-2.05.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/sizeof-0.3.0.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/kryo-shaded-4.0.0.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/minlog-1.3.0.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/memory-0.8.1.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/commons-pool-1.6.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/netty-all-4.1.17.Final.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/hll-1.6.0.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/jniloader-1.1.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/native_system-java-1.1.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/arpack_combined_all-0.1.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/core-1.1.2.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/netlib-native_ref-linux-armhf-1.1-natives.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/netlib-native_ref-linux-i686-1.1-natives.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/netlib-native_ref-linux-x86_64-1.1-natives.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/netlib-native_system-linux-armhf-1.1-natives.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/netlib-native_system-linux-i686-1.1-natives.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/netlib-native_system-linux-x86_64-1.1-natives.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/jettison-1.4.0.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/json4s-native_2.11-3.2.11.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/angel-format-0.1.1.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/angel-mlcore-0.1.2.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/angel-ps-core-3.0.1.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/angel-ps-mllib-3.0.1.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/angel-ps-psf-3.0.1.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/angel-math-0.1.1.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/angel-ps-graph-3.0.1.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/core-0.1.0.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/angelml-0.1.0.jar,hdfs:///user/brook/angel-2.1.0-bin/lib/scala-library-2.11.8.jar
--conf spark.ps.instances=2 --conf spark.ps.cores=3 --conf spark.ps.memory=5g
--jars hdfs:///user/brook/sona-0.1.0-bin/lib/fastutil-7.1.0.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/htrace-core-2.05.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/sizeof-0.3.0.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/kryo-shaded-4.0.0.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/minlog-1.3.0.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/memory-0.8.1.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/commons-pool-1.6.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/netty-all-4.1.17.Final.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/hll-1.6.0.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/jniloader-1.1.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/native_system-java-1.1.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/arpack_combined_all-0.1.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/core-1.1.2.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/netlib-native_ref-linux-armhf-1.1-natives.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/netlib-native_ref-linux-i686-1.1-natives.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/netlib-native_ref-linux-x86_64-1.1-natives.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/netlib-native_system-linux-armhf-1.1-natives.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/netlib-native_system-linux-i686-1.1-natives.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/netlib-native_system-linux-x86_64-1.1-natives.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/jettison-1.4.0.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/json4s-native_2.11-3.2.11.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/angel-format-0.1.1.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/angel-mlcore-0.1.2.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/angel-ps-core-3.0.1.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/angel-ps-mllib-3.0.1.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/angel-ps-psf-3.0.1.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/angel-math-0.1.1.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/angel-ps-graph-3.0.1.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/core-0.1.0.jar,hdfs:///user/brook/sona-0.1.0-bin/lib/angelml-0.1.0.jar,hdfs:///user/brook/angel-2.1.0-bin/lib/scala-library-2.11.8.jar
--files ./deepfm.json --driver-memory 2g --num-executors 2 --executor-cores 3 --executor-memory 5g
--class com.tencent.angel.sona.examples.JsonRunnerExamples
../lib/angelml-0.1.0.jar
jsonFile:./deepfm.json
dataFormat:libsvm
data:a9a_123d_train.libsvm
modelPath:model_dfm
predictPath:pred_dfm
actionType:train
numBatch:500
maxIter:2
lr:4.0
numField:39

This is my submit command.
Both Wide&Deep and DeepFM give this error. Looking forward to your help!

JavaNullPointerException

java.lang.NullPointerException
at sun.net.util.URLUtil.urlNoFragString(URLUtil.java:50)
at sun.misc.URLClassPath.getLoader(URLClassPath.java:512)
at sun.misc.URLClassPath.getNextLoader(URLClassPath.java:484)
at sun.misc.URLClassPath.getResource(URLClassPath.java:238)
at java.net.URLClassLoader$1.run(URLClassLoader.java:365)
at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
at net.qihoo.spinner.HdfsClassLoader.loadClass(HdfsClassLoader.java:35)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at net.qihoo.spinner.HYReflection.newInstance(HYReflection.java:70)
at net.qihoo.spinner.SpinnerViewFs.initializeProperties(SpinnerViewFs.java:18)
at net.qihoo.spinner.SpinnerDistributedFileSystem.initialize(SpinnerDistributedFileSystem.java:47)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2689)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:386)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.spark.deploy.yarn.Client.org$apache$spark$deploy$yarn$Client$$cleanupStagingDirInternal$1(Client.scala:199)
at org.apache.spark.deploy.yarn.Client.cleanupStagingDir(Client.scala:217)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:182)
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1155)
at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1523)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:880)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Exception in thread "main" java.lang.RuntimeException: java.lang.NullPointerException
at net.qihoo.spinner.HYReflection.newInstance(HYReflection.java:93)
at net.qihoo.spinner.SpinnerViewFs.initializeProperties(SpinnerViewFs.java:18)
at net.qihoo.spinner.SpinnerDistributedFileSystem.initialize(SpinnerDistributedFileSystem.java:47)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2689)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:386)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.spark.deploy.yarn.Client.org$apache$spark$deploy$yarn$Client$$cleanupStagingDirInternal$1(Client.scala:199)
at org.apache.spark.deploy.yarn.Client.cleanupStagingDir(Client.scala:217)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:182)
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1155)
at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1523)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:880)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.NullPointerException
at sun.net.util.URLUtil.urlNoFragString(URLUtil.java:50)
at sun.misc.URLClassPath.getLoader(URLClassPath.java:512)
at sun.misc.URLClassPath.getNextLoader(URLClassPath.java:484)
at sun.misc.URLClassPath.getResource(URLClassPath.java:238)
at java.net.URLClassLoader$1.run(URLClassLoader.java:365)
at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
at net.qihoo.spinner.HdfsClassLoader.loadClass(HdfsClassLoader.java:35)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at net.qihoo.spinner.HYReflection.newInstance(HYReflection.java:70)
... 15 more
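The root NPE comes from the JDK's class-path handling: `URLUtil.urlNoFragString` dereferences a URL without a null check, so a null entry in a class loader's URL array produces exactly this trace. That suggests the custom `net.qihoo.spinner.HdfsClassLoader` was constructed with a null URL (for example from an unset configuration value). A minimal repro, assuming nothing beyond the JDK (this is an illustration, not sona or Spinner code):

```java
import java.net.URL;
import java.net.URLClassLoader;

// Minimal repro of the NPE above: a null entry in a URLClassLoader's
// classpath array blows up inside sun.misc.URLClassPath / URLUtil on the
// first class or resource lookup (on JDK 9+ the NPE can surface even
// earlier, at construction time).
public class NullUrlRepro {

    // Returns true when the null classpath entry causes a NullPointerException.
    public static boolean nullUrlTriggersNpe() {
        try {
            URLClassLoader loader = new URLClassLoader(new URL[]{null});
            loader.loadClass("does.not.Exist");
            return false; // unreachable: the lookup should fail first
        } catch (NullPointerException expected) {
            return true;  // same failure mode as the stack trace above
        } catch (ClassNotFoundException unexpected) {
            return false; // would mean the null entry was silently tolerated
        }
    }

    public static void main(String[] args) {
        System.out.println(nullUrlTriggersNpe());
    }
}
```

If this matches, the fix belongs on the Spinner side (validate the URLs handed to `HdfsClassLoader`) rather than in sona itself.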

Want to take part in sona's development

I'd like to take part in sona's development. How can I get involved? I'm currently using Angel, but community updates are slow, so I want to contribute.

java.lang.NoSuchMethodError in JsonUtils

Do I need to update the angel-mlcore library or the json4s library?

An exception or error caused a run to abort: org.json4s.native.JsonMethods$.parse(Lorg/json4s/JsonInput;Z)Lorg/json4s/JsonAST$JValue; 
java.lang.NoSuchMethodError: org.json4s.native.JsonMethods$.parse(Lorg/json4s/JsonInput;Z)Lorg/json4s/JsonAST$JValue;
	at com.tencent.angel.ml.core.utils.JsonUtils$.parseAndUpdateJson(JsonUtils.scala:400)
	at org.apache.spark.angelml.param.AngelGraphParams$class.updateFromJson(AngelGraphParams.scala:31)
	at org.apache.spark.angelml.classification.AngelClassifier.updateFromJson(AngelClassifier.scala:31)
	at org.apache.spark.angelml.classification.AngelClassifier.updateFromJson(AngelClassifier.scala:31)
	at org.apache.spark.angelml.param.ParamsHelper$class.finalizeConf(ParamsHelper.scala:43)
	at org.apache.spark.angelml.classification.AngelClassifier.finalizeConf(AngelClassifier.scala:31)
	at org.apache.spark.angelml.classification.AngelClassifier.train(AngelClassifier.scala:177)
	at org.apache.spark.angelml.classification.AngelClassifier.train(AngelClassifier.scala:31)
	at org.apache.spark.angelml.Predictor.fit(Predictor.scala:118)
	at org.apache.spark.angelml.classification.AngelClassificationSuite$$anonfun$1.apply$mcV$sp(AngelClassificationSuite.scala:31)
	at org.apache.spark.angelml.classification.AngelClassificationSuite$$anonfun$1.apply(AngelClassificationSuite.scala:18)
	at org.apache.spark.angelml.classification.AngelClassificationSuite$$anonfun$1.apply(AngelClassificationSuite.scala:18)
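This `NoSuchMethodError` usually means two incompatible json4s binaries are in play: the `parse(JsonInput, Boolean)` signature that angel-mlcore was compiled against is not the one on the runtime classpath (the method's signature changed between json4s 3.2.x and 3.5.x, gaining an extra parameter). So the usual fix is not upgrading angel-mlcore but aligning the json4s version with the one your Spark distribution ships. A sketch of a Maven pin, assuming a Maven build; the artifact name and versions are illustrative, so match whatever json4s jar actually sits in your Spark distribution's `jars/` directory:

```xml
<!-- Illustrative only: force a single json4s version that matches the one
     bundled with your Spark distribution (e.g. 3.2.11 for Spark 2.3.x,
     3.5.3 for Spark 2.4.x; check $SPARK_HOME/jars to be sure). -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.json4s</groupId>
      <artifactId>json4s-native_2.11</artifactId>
      <version>3.2.11</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```

Running `mvn dependency:tree` on your project and grepping for `org.json4s` is a quick way to confirm which versions are being pulled in before pinning.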

Help needed: Failed to run job : Application application_1591113812497_103797 failed 2 times, the job fails to run

[2020-06-25 22:33:23.064]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=100M; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=200M; support was removed in 8.0

[2020-06-25 22:33:23.065]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=100M; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=200M; support was removed in 8.0

For more detailed output, check the application tracking page: http://ecs-hn1b-bd-cdp-edg-2:8188/applicationhistory/app/application_1591113812497_103797 Then click on links to logs of each attempt.
. Failing the application.
at com.tencent.angel.client.yarn.AngelYarnClient.updateMaster(AngelYarnClient.java:517)
at com.tencent.angel.client.yarn.AngelYarnClient.startPSServer(AngelYarnClient.java:170)
at com.tencent.angel.client.AngelPSClient.startPS(AngelPSClient.java:115)
at com.tencent.angel.sona.core.DriverContext.startAngelAndPSAgent(DriverContext.scala:97)
at com.tencent.angel.sona.examples.JsonRunnerExamples$.main(JsonRunnerExamples.scala:69)
at com.tencent.angel.sona.examples.JsonRunnerExamples.main(JsonRunnerExamples.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:851)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:926)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:935)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Exception in thread "main" com.tencent.angel.exception.AngelException: java.io.IOException: Failed to run job : Application application_1591113812497_103797 failed 2 times (global limit =3; local limit is =2) due to AM Container for appattempt_1591113812497_103797_000002 exited with exitCode: 1
Failing this attempt.Diagnostics: [2020-06-25 22:33:23.063]Exception from container-launch.
Container id: container_e74_1591113812497_103797_02_000001
Exit code: 1

[2020-06-25 22:33:23.064]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=100M; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=200M; support was removed in 8.0

[2020-06-25 22:33:23.065]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=100M; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=200M; support was removed in 8.0

For more detailed output, check the application tracking page: http://ecs-hn1b-bd-cdp-edg-2:8188/applicationhistory/app/application_1591113812497_103797 Then click on links to logs of each attempt.
. Failing the application.
at com.tencent.angel.client.yarn.AngelYarnClient.startPSServer(AngelYarnClient.java:176)
at com.tencent.angel.client.AngelPSClient.startPS(AngelPSClient.java:115)
at com.tencent.angel.sona.core.DriverContext.startAngelAndPSAgent(DriverContext.scala:97)
at com.tencent.angel.sona.examples.JsonRunnerExamples$.main(JsonRunnerExamples.scala:69)
at com.tencent.angel.sona.examples.JsonRunnerExamples.main(JsonRunnerExamples.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:851)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:926)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:935)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.IOException: Failed to run job : Application application_1591113812497_103797 failed 2 times (global limit =3; local limit is =2) due to AM Container for appattempt_1591113812497_103797_000002 exited with exitCode: 1
Failing this attempt.Diagnostics: [2020-06-25 22:33:23.063]Exception from container-launch.
Container id: container_e74_1591113812497_103797_02_000001
Exit code: 1

[2020-06-25 22:33:23.064]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=100M; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=200M; support was removed in 8.0

[2020-06-25 22:33:23.065]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=100M; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=200M; support was removed in 8.0

For more detailed output, check the application tracking page: http://ecs-hn1b-bd-cdp-edg-2:8188/applicationhistory/app/application_1591113812497_103797 Then click on links to logs of each attempt.
. Failing the application.
at com.tencent.angel.client.yarn.AngelYarnClient.updateMaster(AngelYarnClient.java:517)
at com.tencent.angel.client.yarn.AngelYarnClient.startPSServer(AngelYarnClient.java:170)
... 16 more
20/06/25 22:33:15 INFO spark.SparkContext: Invoking stop() from shutdown hook
20/06/25 22:33:15 INFO server.AbstractConnector: Stopped Spark@7a389761{HTTP/1.1,[http/1.1]}{0.0.0.0:4041}
20/06/25 22:33:15 INFO ui.SparkUI: Stopped Spark web UI at http://ecs-hn1a-xng-alg-rcmd-edg-1:4041
20/06/25 22:33:15 INFO cluster.YarnClientSchedulerBackend: Interrupting monitor thread
20/06/25 22:33:15 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
20/06/25 22:33:15 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
20/06/25 22:33:15 INFO cluster.SchedulerExtensionServices: Stopping SchedulerExtensionServices
(serviceOption=None,
services=List(),
started=false)
20/06/25 22:33:15 INFO cluster.YarnClientSchedulerBackend: Stopped
20/06/25 22:33:15 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
20/06/25 22:33:15 INFO memory.MemoryStore: MemoryStore cleared
20/06/25 22:33:15 INFO storage.BlockManager: BlockManager stopped
20/06/25 22:33:15 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
20/06/25 22:33:15 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
20/06/25 22:33:15 INFO spark.SparkContext: Successfully stopped SparkContext
20/06/25 22:33:15 INFO util.ShutdownHookManager: Shutdown hook called
20/06/25 22:33:15 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-a689fb57-937d-4d2b-bf50-f59d08e9b6b8
20/06/25 22:33:15 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-2d255dca-b9e1-45ad-ba9d-6b8beae996e3
20/06/25 22:33:15 INFO client.AngelClient: stop the application
20/06/25 22:33:15 INFO client.AngelClient: master is null, just kill the application
20/06/25 22:33:15 INFO impl.YarnClientImpl: Killed application application_1591113812497_103797

ftrlFm bug

When the training set has 10 partitions, the AUC is 0.67; but when it has 100 partitions, the AUC drops to 0.51. Why?

Could you tell me which Scala version is used?

While using sona with Spark 2.3.1 and Scala 2.11.8, I get the following error:
[screenshot: the reported error]
I checked scala-library-2.11.8.jar and the method really is not there, so I'd like to know which Scala version you are using.
[screenshot: contents of scala-library-2.11.8.jar]
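A quick way to see which Scala runtime the driver actually loaded is to ask it at runtime. A diagnostic sketch in plain Java (it assumes only the JDK plus, optionally, a scala-library on the classpath; `scala.util.Properties.versionNumberString` is the static forwarder the Scala object of that name generates):

```java
// Diagnostic sketch: report the Scala runtime version actually on the
// classpath via reflection; prints a fallback string when no scala-library
// is present instead of throwing.
public class ScalaVersionCheck {

    public static String scalaVersion() {
        try {
            Class<?> props = Class.forName("scala.util.Properties");
            return (String) props.getMethod("versionNumberString").invoke(null);
        } catch (ReflectiveOperationException e) {
            return "scala not on classpath";
        }
    }

    public static void main(String[] args) {
        System.out.println(scalaVersion());
    }
}
```

Run with the same classpath as your Spark driver; if the reported version does not match the Scala version sona was built against, a `NoSuchMethodError` like the one in the screenshot is the expected symptom.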
