Unable to get Scala code to read blogs.json via the Schema definition provided in the book.
[Code]
`package main.scala.chapter3
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions.{col, expr}
object Example3_7 {
def main(args: Array[String]) {
val spark = SparkSession
.builder
.appName("Example-3_7")
.getOrCreate()
if (args.length <= 0) {
println("usage Example3_7 <file path to blogs.json")
System.exit(1)
}
//get the path to the JSON file
val jsonFile = args(0)
//define our schema as before
val schema = StructType(Array(
StructField("Campaigns", ArrayType(StringType), true),
StructField("First", StringType, true),
StructField("Hits", LongType, true),
StructField("Id", LongType, true),
StructField("Last", StringType, true),
StructField("Published", StringType, true),
StructField("Url", StringType, true)
))
//Create a DataFrame by reading from the JSON file a predefined Schema
val blogsDF = spark.read.schema(schema).json(jsonFile)
//show the DataFrame schema as output
blogsDF.show(false)
// print the schemas
println(blogsDF.printSchema)
println(blogsDF.schema)
}
}`
[End Code]
LongType and IntegerType were both used for Hits and Id, but both have generated the following error on my machine
scala> kmontano18@DESKTOP-PRKRT1A:~$ spark-shell -i scalaSchema.scala blogs.json
blogs.json:1: error: identifier expected but integer literal found.
{"Id":1, "First": "Jules", "Last":"Damji", "Url":"https://tinyurl.1", "Published":"1/4/2016", "Hits": 4535, "Campaigns": ["twitter", "LinkedIn"]}
However, when reading the json via Spark's implicit read, it generated the expected schema, albeit in a different order
scala> val df = spark.read.json("blogs.json")
df: org.apache.spark.sql.DataFrame = [Campaigns: array, First: string ... 5 more fields]
scala> df.printSchema()
root
|-- Campaigns: array (nullable = true)
| |-- element: string (containsNull = true)
|-- First: string (nullable = true)
|-- Hits: long (nullable = true)
|-- Id: long (nullable = true)
|-- Last: string (nullable = true)
|-- Published: string (nullable = true)
|-- Url: string (nullable = true)