sasha-polev / aerospark
Aerospike Spark Connector
License: Apache License 2.0
Not a bug. Just a typo in description.
Hi,
I slightly modified the AerospikeRDD and was using it happily. But at one point one of the nodes had problems, so I added a timeout to the query. Then I tried to catch the exception which I think should be thrown from:
https://github.com/sasha-polev/aerospark/blob/master/src/main/scala/com/osscube/spark/aerospike/rdd/AerospikeRDD.scala#L84
but the exception is not caught, and it kills the executor and then the whole job. My code looks like this (simplified):
try {
  val res: RecordSet = client.query(qp, stmt)
} catch {
  case ae: AerospikeException => System.out.println("E" + ae.toString)
}
Should I catch the exception somewhere else?
Thanks
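One possibility worth checking (an assumption, not a confirmed diagnosis): client.query returns a RecordSet that is consumed lazily, so the AerospikeException may only surface while the results are iterated inside AerospikeRDD.compute on the executor, after the query call itself has returned. In that case the catch has to wrap the iteration as well, on the executor side. A plain-Scala sketch of the pattern, with illustrative names:

```scala
import scala.util.{Try, Success, Failure}

// Stand-in for the per-partition query that can throw (e.g. on a node timeout).
def riskyQuery(partition: Int): Seq[String] =
  if (partition == 1) throw new RuntimeException("node timeout")
  else Seq(s"rows-from-$partition")

// Handle the failure where it happens, inside the code that runs on the
// executor, instead of letting it kill the task and the whole job.
def computePartition(partition: Int): Seq[String] =
  Try(riskyQuery(partition)) match {
    case Success(rows) => rows
    case Failure(_)    => Seq.empty // or log and return partial results
  }

val results = (0 to 2).flatMap(computePartition)
// partition 1 fails quietly; the other partitions' rows survive
```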
Hi Sasha,
I'm trying to run the following query:
select field from tmp where id='1'
and I'm getting the following exception:
(bins:(FAILURE:UDF: Execution Error 2 : /opt/aerospike/usr/udf/lua/spark_filters.lua:32: bad argument #1 to 'ipairs' (table expected, got nil)))
I'm running on Spark 1.5.0 and using aerospike-client-3.1.5.jar.
Thanks,
Sasi
Hi Sasha,
I ran the following code:
scf.aeroSInput(host + ":" + port, "select......", sqlCtx, 1);
and I got two different responses:
Thanks,
Sasi
Hi all,
I need to get the key value from sets, so if I do e.g.
select key from ns.set
then I get no results found.
I'm using Aerospark connector.
Is there any way to get it? When I do select * from ns.set I see it under bin key.
Thanks,
Hi Sasha,
I found a bug that seems to be from the connector.
SparkContextFunctions scf = new SparkContextFunctions(sparkCtx);
DataFrame df = scf.aeroSInput(aeroSpikeAddress, query, sqlCtx, 4);
df.registerTempTable(TableConfiguration.TABLE_NAME);
Looks like the DataFrame doesn't get refreshed/synced with Aerospike.
If you run the same scenario when there's already data in Aerospike, it doesn't happen.
Sasi
I have created the .jar file as per the instructions (aerospike-spark-assembly-1.3.jar). How can I use it in another Scala/Spark application with this jar as a dependency?
Thanks!
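One common way (a sketch with placeholder paths, not project-specific instructions) is either to pass the assembly jar on the Spark command line, or to drop it into your sbt project's lib/ directory, which sbt treats as unmanaged dependencies:

```shell
# Pass the assembly jar to the shell or to spark-submit (paths illustrative):
spark-shell --jars /path/to/aerospike-spark-assembly-1.3.jar
spark-submit --jars /path/to/aerospike-spark-assembly-1.3.jar \
  --class your.app.Main your-app.jar

# Or, for an sbt project, copy it into lib/ so sbt picks it up unmanaged:
cp aerospike-spark-assembly-1.3.jar /path/to/your-project/lib/
```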
java.lang.NullPointerException
at com.osscube.spark.aerospike.rdd.AeroRelation.schema(AeroRelation.scala:78)
at org.apache.spark.sql.execution.datasources.LogicalRelation.<init>(LogicalRelation.scala:31)
at org.apache.spark.sql.execution.datasources.CreateTempTableUsing.run(ddl.scala:97)
at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:57)
at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:57)
at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:69)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:140)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:138)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:138)
at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:927)
at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:927)
at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:144)
at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:129)
at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:719)
While compiling aerospark I got this error. Please advise:
[INFO] Compiling 7 source files to /home/dev/aerospark/target/classes at 1428584618789
[ERROR] /home/dev/aerospark/src/main/scala/com/osscube/spark/aerospike/rdd/AerospikeRDD.scala:77: error: overloaded method value setAggregateFunction with alternatives:
[INFO](x$1: ClassLoader,x$2: String,x$3: String,x$4: String,x$5: <repeated...>[com.aerospike.client.Value])Unit
[INFO](x$1: String,x$2: String,x$3: <repeated...>[com.aerospike.client.Value])Unit
[INFO] cannot be applied to (String, String, Array[com.aerospike.client.Value], Boolean)
[INFO] newSt.setAggregateFunction("spark_filters", "multifilter", udfFilters, true)
Hi,
Is there a way to use the limit keyword in queries?
If not, is there a feature request?
Thanks,
Hi,
Is there a way to connect to multiple Aerospike hosts?
Is there any constant to tell Spark which host Aerospike is located on?
Thanks,
Sasi
Hi,
I'm facing an issue in my environment where my DataFrame doesn't reflect the real state of my Aerospike bin.
I have the following code:
SparkContextFunctions sparkContextFunctions = new SparkContextFunctions(spark_context);
String aeroSpikeAddress = host + ":" + port;
String query = "select * from bin";
dataFrame = sparkContextFunctions.aeroSInput(aeroSpikeAddress, query, sqlCtx, 6);
dataFrame.registerTempTable("testTbl");
dataFrame.persist(StorageLevel.MEMORY_ONLY());
After creating the temp table I do the following:
subscribersDataFrame.where(whereClause).collect();
The first time I get the same values as the Aerospike bin, but if I add or delete a row I don't see the change.
If I drop the table and create it again, then I get the right state.
Thanks.
Sasi
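For what it's worth, this is consistent with Spark's caching semantics rather than necessarily being a connector bug: once a DataFrame is persisted and registered, it is a snapshot of the scan that built it, and later writes to Aerospike stay invisible until the cache is dropped and the query re-run, which is exactly the drop-and-recreate workaround described above. A plain-Scala illustration of the snapshot behaviour, no Spark required:

```scala
import scala.collection.mutable.ArrayBuffer

val source = ArrayBuffer(1, 2, 3)    // stands in for the Aerospike set
var cache: Option[Seq[Int]] = None   // stands in for the persisted DataFrame

def query(): Seq[Int] = cache match {
  case Some(rows) => rows            // cached: source changes are invisible
  case None =>
    val rows = source.toList         // the "scan" materializes a snapshot
    cache = Some(rows)
    rows
}

val first = query()                  // snapshot of the current source
source += 4                          // row added after the snapshot
val second = query()                 // still the old snapshot: stale
cache = None                         // analogous to unpersist() + re-create
val third = query()                  // fresh snapshot, sees the new row
```

In Spark terms, clearing the cache corresponds to calling dataFrame.unpersist() and then re-creating the DataFrame (and temp table) from a new aeroSInput call.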
Hi,
We are saving blob data, which is serializable, and we want to split it into bins and records.
Is there a way to do so?
Thanks,
Sasi
Hi,
I tried to create an RDD from a select with a non-fixed data structure. I mean that one record contains bin "a" and another bin "b". When I use select * from ..
I get only one bin name in the results (because of the caching mentioned below), but when I use all the bin names, select a,b from ..,
I get an exception because null is converted to a value.
I found out that the structure is cached, but even when I commented the caching out the problem was the same.
I am willing to help if this issue is reasonable.
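A plain-Scala sketch (illustrative only, not the connector's actual code path) of one way to handle records whose bins differ: project every record onto the union of all bin names, treating each bin as optional, so a missing bin becomes None instead of a null that breaks conversion:

```scala
// Two records with disjoint bins, as bin-name -> value maps.
val records: Seq[Map[String, Any]] = Seq(
  Map("a" -> 1),   // this record only has bin "a"
  Map("b" -> "x")  // this record only has bin "b"
)

// The union schema: every bin name seen across all records.
val schema: Seq[String] = records.flatMap(_.keys).distinct

// Each row covers the full schema, with None where the record lacks the bin.
val rows: Seq[Seq[Option[Any]]] =
  records.map(r => schema.map(bin => r.get(bin)))
```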
(I couldn't find a way to file the issue in https://github.com/aerospike/aerospark so I'm doing it here so it's not lost.)
When I tried to build aerospark I followed the instructions at https://github.com/aerospike/aerospark#how-to-build, i.e.
$ oss # switch to the directory with OSS projects
$ git clone https://github.com/aerospike/aerospike-helper
$ cd aerospike-helper/java
$ mvn clean install -DskipTests
It finished successfully.
$ oss
$ git clone https://github.com/aerospike/aerospark.git
$ cd aerospark
$ sbt 'set test in assembly := {}' clean assembly
That failed due to the following errors:
[error] 1 error was encountered during merge
java.lang.RuntimeException: deduplicate: different file contents found in the following:
/Users/jacek/dev/oss/aerospark/lib/aerospike-helper-java-1.0.6.jar:META-INF/maven/com.aerospike/aerospike-client/pom.xml
/Users/jacek/.m2/repository/com/aerospike/aerospike-helper-java/1.0.6/aerospike-helper-java-1.0.6.jar:META-INF/maven/com.aerospike/aerospike-client/pom.xml
/Users/jacek/.ivy2/cache/com.aerospike/aerospike-client/jars/aerospike-client-3.3.0.jar:META-INF/maven/com.aerospike/aerospike-client/pom.xml
at sbtassembly.Assembly$.applyStrategies(Assembly.scala:140)
at sbtassembly.Assembly$.x$1$lzycompute$1(Assembly.scala:25)
at sbtassembly.Assembly$.x$1$1(Assembly.scala:23)
at sbtassembly.Assembly$.stratMapping$lzycompute$1(Assembly.scala:23)
at sbtassembly.Assembly$.stratMapping$1(Assembly.scala:23)
at sbtassembly.Assembly$.inputs$lzycompute$1(Assembly.scala:67)
at sbtassembly.Assembly$.inputs$1(Assembly.scala:57)
at sbtassembly.Assembly$.apply(Assembly.scala:83)
at sbtassembly.Assembly$$anonfun$assemblyTask$1.apply(Assembly.scala:240)
at sbtassembly.Assembly$$anonfun$assemblyTask$1.apply(Assembly.scala:237)
at scala.Function1$$anonfun$compose$1.apply(Function1.scala:47)
at sbt.$tilde$greater$$anonfun$$u2219$1.apply(TypeFunctions.scala:40)
at sbt.std.Transform$$anon$4.work(System.scala:63)
at sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:228)
at sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:228)
at sbt.ErrorHandling$.wideConvert(ErrorHandling.scala:17)
at sbt.Execute.work(Execute.scala:237)
at sbt.Execute$$anonfun$submit$1.apply(Execute.scala:228)
at sbt.Execute$$anonfun$submit$1.apply(Execute.scala:228)
at sbt.ConcurrentRestrictions$$anon$4$$anonfun$1.apply(ConcurrentRestrictions.scala:159)
at sbt.CompletionService$$anon$2.call(CompletionService.scala:28)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
[error] (*:assembly) deduplicate: different file contents found in the following:
[error] /Users/jacek/dev/oss/aerospark/lib/aerospike-helper-java-1.0.6.jar:META-INF/maven/com.aerospike/aerospike-client/pom.xml
[error] /Users/jacek/.m2/repository/com/aerospike/aerospike-helper-java/1.0.6/aerospike-helper-java-1.0.6.jar:META-INF/maven/com.aerospike/aerospike-client/pom.xml
[error] /Users/jacek/.ivy2/cache/com.aerospike/aerospike-client/jars/aerospike-client-3.3.0.jar:META-INF/maven/com.aerospike/aerospike-client/pom.xml
[error] Total time: 43 s, completed Oct 22, 2016 12:44:42 AM
Any idea how to fix it?
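The failure is sbt-assembly refusing to merge identically named META-INF files that arrive both from the jar bundled in lib/ and from the ~/.m2 and ~/.ivy2 copies of the client, so deleting the duplicate jar from lib/ may already resolve it. Another common fix (a sketch of the standard sbt-assembly pattern, not a tested build for this repo) is a merge strategy that discards the conflicting META-INF maven descriptors:

```scala
// build.sbt — discard duplicate META-INF/maven descriptors and fall back to
// the default strategy for everything else.
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", "maven", xs @ _*) => MergeStrategy.discard
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}
```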
Hi, using the latest git master and Spark 1.3.1, I received the following error.
Thanks in advance for the assistance.
[amilkowski@localhost aerospark]$ which spark-shell
/opt/local/src/spark/spark-1.3.1/bin/spark-shell
[amilkowski@localhost aerospark]$
[amilkowski@localhost aerospark]$ asinfo
6 : edition
Aerospike Community Edition
7 : version
Aerospike Community Edition build 3.5.14
8 : build
3.5.14
9 : services
I executed the following:
CREATE TEMPORARY TABLE aero USING com.osscube.spark.aerospike.rdd OPTIONS (initialHost "127.0.0.1:3000", select "select column1,column2,intColumn1 from test.one_million where intColumn1 between -1000 and 1000000", partitionsPerServer "2");
select * from aero;
select count(distinct column2) from aero where intColumn1 > -1000000 and intColumn1 < 100000;
> select count(distinct column2) from aero where intColumn1 > -1000000 and intColumn1 < 100000;
UDF Filters applied: [column2, intColumn1],[3, intColumn1, -1000000, 100000],[3, intColumn1, -1000000, 100000]
useUDF: true
UDF params: (3,intColumn1,,List((-1000000,100000)))-(3,intColumn1,,List((-1000000,100000)))
15/07/30 16:31:04 ERROR Executor: Exception in task 0.0 in stage 34.0 (TID 29)
java.lang.Exception: (gen:0),(exp:0),(bins:(FAILURE:UDF: Execution Error 2 : /opt/aerospike/usr/udf/lua/spark_filters.lua:32: bad argument #1 to 'ipairs' (table expected, got nil)))
at com.osscube.spark.aerospike.rdd.AerospikeRDD$$anonfun$compute$4.apply(AerospikeRDD.scala:111)
at com.osscube.spark.aerospike.rdd.AerospikeRDD$$anonfun$compute$4.apply(AerospikeRDD.scala:92)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:389)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at org.apache.spark.sql.execution.Aggregate$$anonfun$execute$1$$anonfun$6.apply(Aggregate.scala:129)
at org.apache.spark.sql.execution.Aggregate$$anonfun$execute$1$$anonfun$6.apply(Aggregate.scala:126)
at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634)
at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:64)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
15/07/30 16:31:04 WARN TaskSetManager: Lost task 0.0 in stage 34.0 (TID 29, localhost): java.lang.Exception: (gen:0),(exp:0),(bins:(FAILURE:UDF: Execution Error 2 : /opt/aerospike/usr/udf/lua/spark_filters.lua:32: bad argument #1 to 'ipairs' (table expected, got nil)))
[stack trace identical to the executor trace above]
15/07/30 16:31:04 ERROR TaskSetManager: Task 0 in stage 34.0 failed 1 times; aborting job
15/07/30 16:31:04 ERROR SparkSQLDriver: Failed in [select count(distinct column2) from aero where intColumn1 > -1000000 and intColumn1 < 100000]
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 34.0 failed 1 times, most recent failure: Lost task 0.0 in stage 34.0 (TID 29, localhost): java.lang.Exception: (gen:0),(exp:0),(bins:(FAILURE:UDF: Execution Error 2 : /opt/aerospike/usr/udf/lua/spark_filters.lua:32: bad argument #1 to 'ipairs' (table expected, got nil)))
[stack trace identical to the executor trace above]
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1204)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1193)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1192)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 34.0 failed 1 times, most recent failure: Lost task 0.0 in stage 34.0 (TID 29, localhost): java.lang.Exception: (gen:0),(exp:0),(bins:(FAILURE:UDF: Execution Error 2 : /opt/aerospike/usr/udf/lua/spark_filters.lua:32: bad argument #1 to 'ipairs' (table expected, got nil)))
[executor and driver stack traces identical to those above]
15/07/30 16:31:04 ERROR CliDriver: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 34.0 failed 1 times, most recent failure: Lost task 0.0 in stage 34.0 (TID 29, localhost): java.lang.Exception: (gen:0),(exp:0),(bins:(FAILURE:UDF: Execution Error 2 : /opt/aerospike/usr/udf/lua/spark_filters.lua:32: bad argument #1 to 'ipairs' (table expected, got nil)))
[executor and driver stack traces identical to those above]
Hi Sasha,
I have the following architecture: 1 client (embedded Spark in JBoss), and 1 Aerospike cluster which contains 3 Aerospike nodes, 3 slaves and 1 master.
If I run the client when all the slaves/Aerospike nodes are up and running, I get results, but if one of the Aerospike nodes is down I get a "no route to host" error and no result is returned from Spark.
[2016-02-15 15:50:28,812] WARN : (com.cxtrm.mgmt.subscriber.spark.SparkDataAccessorBean:148) -[WorkerThread#2[10.0.40.218:60444]] Failed to count subscribers Job aborted due to stage failure: Task 1 in stage 135.0 failed 4 times, most r
ecent failure: Lost task 1.3 in stage 135.0 (TID 196, 10.0.42.168): com.aerospike.client.AerospikeException$Connection: Error Code 11: Failed to connect to host(s):
10.0.42.169:3000 Error Code 11: java.net.NoRouteToHostException: No route to host
at com.aerospike.client.cluster.Cluster.seedNodes(Cluster.java:400)
at com.aerospike.client.cluster.Cluster.tend(Cluster.java:269)
at com.aerospike.client.cluster.Cluster.waitTillStabilized(Cluster.java:234)
at com.aerospike.client.cluster.Cluster.initTendThread(Cluster.java:144)
at com.aerospike.client.AerospikeClient.<init>(AerospikeClient.java:208)
at com.aerospike.client.AerospikeClient.<init>(AerospikeClient.java:172)
at com.osscube.spark.aerospike.rdd.AerospikeRDD.compute(AerospikeRDD.scala:83)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
at org.apache.spark.rdd.MapPartitionsWithPreparationRDD.compute(MapPartitionsWithPreparationRDD.scala:63)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Driver stacktrace:
[2016-02-15 15:50:36,002] ERROR : (org.apache.spark.scheduler.TaskSetManager:75) -[task-result-getter-3] Task 0 in stage 140.0 failed 4 times; aborting job
[2016-02-15 15:50:36,004] WARN : (com.cxtrm.mgmt.subscriber.spark.SparkDataAccessorBean:120) -[WorkerThread#5[10.0.40.218:60429]] Failed to add rowJob aborted due to stage failure: Task 0 in stage 140.0 failed 4 times, most recent failure: Lost task 0.3 in stage 140.0 (TID 204, 10.0.42.168): com.aerospike.client.AerospikeException$Connection: Error Code 11: Failed to connect to host(s):
10.0.42.169:3000 Error Code 11: java.net.SocketTimeoutException: connect timed out
at com.aerospike.client.cluster.Cluster.seedNodes(Cluster.java:400)
at com.aerospike.client.cluster.Cluster.tend(Cluster.java:269)
at com.aerospike.client.cluster.Cluster.waitTillStabilized(Cluster.java:234)
at com.aerospike.client.cluster.Cluster.initTendThread(Cluster.java:144)
at com.aerospike.client.AerospikeClient.<init>(AerospikeClient.java:208)
at com.aerospike.client.AerospikeClient.<init>(AerospikeClient.java:172)
at com.osscube.spark.aerospike.rdd.AerospikeRDD.compute(AerospikeRDD.scala:83)
After restarting JBoss I get the results.
There's a GUI which displays the results and acts as a remote client.
Regards,
Sasi