spark-in-action / first-edition Goto Github PK
View Code? Open in Web Editor NEWThe book's repo
Home Page: https://www.manning.com/books/spark-in-action
The book's repo
Home Page: https://www.manning.com/books/spark-in-action
The task is to count the number of buy and sell orders per second.
The code example does not take into account the time stamp at all. How is it possible to know that what is reduced is actually within a second. What i understood was that the mini-batch was every 3 seconds.
Honestly i am quite confused, when we say the number of sell and buy per second, what do we mean exactly ? Do we mean buy and sell that have a time stamp that fall within 1 second of distance, do we mean what we get per second, independently of the time stamp ?
Can this be at least clarified ?
Hi,
sources of dashboard in:
https://github.com/spark-in-action/first-edition/releases/
do not contain sources of dashboard.
Section 1.5.1 (Downloading and starting the VM) talks about installing Oracle VirtualBox and Vagrant.
I already have VirtualBox running. Do I need to install Vagrant on my Mac or in VirtualBox.
`import org.apache.spark._
import org.apache.spark.streaming._
val ssc = new StreamingContext(sc, Seconds(5))
val filestream = ssc.textFileStream("/home/spark/ch06input")
import java.sql.Timestamp
case class Order(time: java.sql.Timestamp, orderId:Long, clientId:Long, symbol:String, amount:Int, price:Double, buy:Boolean)
import java.text.SimpleDateFormat
val orders = filestream.flatMap(line => {
val dateFormat = new SimpleDateFormat("yyyy-MM-dd hh:mm:ss")
val s = line.split(",")
try {
assert(s(6) == "B" || s(6) == "S")
List(Order(new Timestamp(dateFormat.parse(s(0)).getTime()), s(1).toLong, s(2).toLong, s(3), s(4).toInt, s(5).toDouble, s(6) == "B"))
}
catch {
case e : Throwable => println("Wrong line format ("+e+"): "+line)
List()
}
})
val numPerType = orders.map(o => (o.buy, 1L)).reduceByKey((c1, c2) => c1+c2)
numPerType.repartition(1).saveAsTextFiles("/home/spark/ch06output/output", "txt")
ssc.start()`
I've been following the book in chapter06 step by step, 1. spark-shell --master local[*]. 2. create ssc. 3. ssc.start. 4. excute the shell script. 5. waiting for results. Since I'm in the spark-shell (in VM of course), but couldn't get any results of this case study, all files in ch06output/ directory are empty. Don't know why, anyone can help me...
While doing spark-submit for the realtime dashboard application, i get following error ๐
spark@spark-in-action:~/uc1-docker$ ./run-all.sh
Zookeeper already started
Kafka already started
sia-dashboard already running
Submitting Spark job
Starting Kafka direct stream to broker list: 192.168.10.2:9092
Exception in thread "main" java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class
at kafka.utils.Pool.(Pool.scala:28)
at kafka.consumer.FetchRequestAndResponseStatsRegistry$.(FetchRequestAndResponseStats.scala:60)
at kafka.consumer.FetchRequestAndResponseStatsRegistry$.(FetchRequestAndResponseStats.scala)
at kafka.consumer.SimpleConsumer.(SimpleConsumer.scala:39)
at org.apache.spark.streaming.kafka.KafkaCluster.connect(KafkaCluster.scala:52)
at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$org$apache$spark$streaming$kafka$KafkaCluster$$withBrokers$1.apply(KafkaCluster.scala:345)
at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$org$apache$spark$streaming$kafka$KafkaCluster$$withBrokers$1.apply(KafkaCluster.scala:342)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
at org.apache.spark.streaming.kafka.KafkaCluster.org$apache$spark$streaming$kafka$KafkaCluster$$withBrokers(KafkaCluster.scala:342)
at org.apache.spark.streaming.kafka.KafkaCluster.getPartitionMetadata(KafkaCluster.scala:125)
at org.apache.spark.streaming.kafka.KafkaCluster.getPartitions(KafkaCluster.scala:112)
at org.apache.spark.streaming.kafka.KafkaUtils$.getFromOffsets(KafkaUtils.scala:211)
at org.apache.spark.streaming.kafka.KafkaUtils$.createDirectStream(KafkaUtils.scala:484)
at org.sia.loganalyzer.StreamingLogAnalyzer$.main(StreamingLogAnalyzer.scala:76)
at org.sia.loganalyzer.StreamingLogAnalyzer.main(StreamingLogAnalyzer.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:729)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: scala.collection.GenTraversableOnce$class
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 25 more
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.