Using spark-bigquery connector in AWS EMR Zeppelin (spark-bigquery issue, closed)

samelamin commented on July 23, 2024

Comments (4)

samelamin commented on July 23, 2024

It looks like you need to create an uber jar, because the connector on its own needs the Google client library. Most clusters already have it, but it sounds like you need to load it into Zeppelin specifically.

I would suggest creating a fat jar and checking whether Spark can run a sample application with it on EMR. Once that works, it should be simpler to port it to Zeppelin.
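
One possible sample application for that first check is a small program, submitted with spark-submit, that reports whether each class on the connector's code path can be loaded and which jar supplied it. This is only a sketch: ClasspathCheck is a hypothetical name, and the default class names are copied from the stack trace Jeeva-Ganesan posted in this thread.

```scala
object ClasspathCheck {
  // Prefer the context class loader (spark-submit puts the user jar there).
  private def loader: ClassLoader =
    Option(Thread.currentThread.getContextClassLoader).getOrElse(getClass.getClassLoader)

  // Where `className` was loaded from: a jar/dir URL, "bootstrap" when there is
  // no code source (typically a JDK class), or None if it cannot be loaded.
  def locate(className: String): Option[String] =
    try {
      // initialize = false: load the class without running its static initializers
      val cls = Class.forName(className, false, loader)
      val src = cls.getProtectionDomain.getCodeSource
      Some(if (src == null || src.getLocation == null) "bootstrap" else src.getLocation.toString)
    } catch {
      case _: ClassNotFoundException | _: LinkageError => None
    }

  def main(args: Array[String]): Unit = {
    // Default: classes on the failing code path, taken from the stack trace in this thread.
    val names =
      if (args.nonEmpty) args.toSeq
      else Seq(
        "com.samelamin.spark.bigquery.BigQueryRelation",
        "com.google.cloud.hadoop.io.bigquery.BigQueryStrings",
        "com.google.common.base.Preconditions")
    names.foreach(n => println(s"$n -> ${locate(n).getOrElse("MISSING")}"))
  }
}
```

For example, `spark-submit --class ClasspathCheck your-uber-jar.jar` (the jar name is a placeholder). If Preconditions resolves to a jar under Spark's own jars folder rather than to the uber jar, the cluster's Guava is the one being used.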

Jeeva-Ganesan commented on July 23, 2024

Hi. I created an uber jar that includes the Google client library, using sbt:

libraryDependencies ++= {
  val sparkVer = "2.2.1"
  val sparkbqVer = "0.2.4"
  Seq(
    "org.apache.spark" %% "spark-core" % sparkVer % "compile" withSources(),
    "org.apache.spark" %% "spark-sql" % sparkVer % "provided", //% "compile" withSources(),
    "org.apache.spark" %% "spark-hive" % sparkVer, //% "provided" withSources(),
    "com.github.samelamin" %% "spark-bigquery" % sparkbqVer,
    "com.google.api-client" % "google-api-client" % "1.23.0"
  )
}

This is the error I get when I submit my Spark job with spark-submit:

Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;Ljava/lang/Object;)V
        at com.google.cloud.hadoop.io.bigquery.BigQueryStrings.parseTableReference(BigQueryStrings.java:68)
        at com.samelamin.spark.bigquery.BigQueryRelation.getConvertedSchema(BigQueryRelation.scala:19)
        at com.samelamin.spark.bigquery.BigQueryRelation.schema(BigQueryRelation.scala:13)
        at org.apache.spark.sql.execution.datasources.LogicalRelation$.apply(LogicalRelation.scala:77)
        at org.apache.spark.sql.SparkSession.baseRelationToDataFrame(SparkSession.scala:424)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:172)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:146)
        at dataload.pull_gbq_data$.main(pull_gbq_data.scala:18)
        at dataload.pull_gbq_data.main(pull_gbq_data.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:775)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
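
The first line of this error uses the JVM's method-descriptor notation: inside the parentheses, Z means boolean and Lpkg/Name; is a class type, and the trailing V is a void return type. Decoded, the missing method is checkArgument(boolean, String, Object, Object), so the Preconditions class that was actually loaded at runtime does not have that overload. The decoder below is only an illustration of the notation; DescriptorDecoder is a hypothetical helper, not part of Spark or Guava.

```scala
object DescriptorDecoder {
  // Single-character codes from the JVM descriptor grammar.
  private val primitives = Map(
    'Z' -> "boolean", 'B' -> "byte", 'C' -> "char", 'S' -> "short",
    'I' -> "int", 'J' -> "long", 'F' -> "float", 'D' -> "double", 'V' -> "void")

  // Parses one type starting at index i; returns (readable type name, next index).
  private def parseType(d: String, i: Int): (String, Int) = d.charAt(i) match {
    case '[' =>
      val (elem, next) = parseType(d, i + 1)
      (elem + "[]", next)
    case 'L' =>
      val end = d.indexOf(';', i)
      (d.substring(i + 1, end).replace('/', '.'), end + 1)
    case c if primitives.contains(c) => (primitives(c), i + 1)
    case c => throw new IllegalArgumentException(s"bad descriptor char '$c' at $i")
  }

  // Turns "(ZLjava/lang/String;)V" into "void (boolean, java.lang.String)".
  def decode(descriptor: String): String = {
    require(descriptor.startsWith("("), "method descriptor must start with '('")
    val params = scala.collection.mutable.ListBuffer.empty[String]
    var i = 1
    while (descriptor.charAt(i) != ')') {
      val (t, next) = parseType(descriptor, i)
      params += t
      i = next
    }
    val (ret, _) = parseType(descriptor, i + 1) // return type follows ')'
    s"$ret (${params.mkString(", ")})"
  }
}
```

For instance, `DescriptorDecoder.decode("(ZLjava/lang/String;Ljava/lang/Object;Ljava/lang/Object;)V")` returns `void (boolean, java.lang.String, java.lang.Object, java.lang.Object)`.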

samelamin commented on July 23, 2024

That is a different error, @Jeeva-Ganesan, and it has to do with Guava.

You need to shade Guava into the uber jar; if you search for it, you will see what I mean.

I think anything above Guava 18 should fix it.
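
To confirm whether the Guava a job actually sees has the method from the error, checkArgument(boolean, String, Object, Object), a sketch like the following asks the JVM directly via reflection. GuavaOverloadCheck is a hypothetical name; the idea is to run it with spark-submit against the same jars as the failing job.

```scala
object GuavaOverloadCheck {
  private def loader: ClassLoader =
    Option(Thread.currentThread.getContextClassLoader).getOrElse(getClass.getClassLoader)

  // True if `className` can be loaded and has a public method `name` with exactly `params`.
  def hasMethod(className: String, name: String, params: Class[_]*): Boolean =
    try {
      Class.forName(className, false, loader).getMethod(name, params: _*)
      true
    } catch {
      case _: ClassNotFoundException | _: NoSuchMethodException => false
    }

  def main(args: Array[String]): Unit = {
    // classOf[Boolean] is the primitive boolean, matching the 'Z' in the error's descriptor.
    val ok = hasMethod(
      "com.google.common.base.Preconditions", "checkArgument",
      classOf[Boolean], classOf[String], classOf[Object], classOf[Object])
    println(
      if (ok) "checkArgument(boolean, String, Object, Object) is available"
      else "overload missing: an older Guava (or no Guava) is being loaded")
  }
}
```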

Jeeva-Ganesan commented on July 23, 2024

Yes, I tried that. I changed my build file like this:

assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("com.google.guava.**" -> "my_conf.@1")
    .inLibrary("com.google.guava" % "config" % "23.6")
    .inProject
)

libraryDependencies ++= {
  val sparkVer = "2.2.1"
  val sparkbqVer = "0.2.4"
  Seq(
    "org.apache.spark" %% "spark-core" % sparkVer % "provided", //compile" withSources(),
    "org.apache.spark" %% "spark-sql" % sparkVer % "provided", //% "compile" withSources(),
    "org.apache.spark" %% "spark-hive" % sparkVer, //% "provided" withSources(),
    "com.github.samelamin" %% "spark-bigquery" % sparkbqVer % "compile",
    "com.google.api-client" % "google-api-client" % "1.23.0" % "compile",
    "com.google.guava" % "guava" % "23.6"
  )
}

I still got the error. I ended up downloading the latest Guava jar and placing it in Spark's jars folder (after deleting the existing one). Then it worked.
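
A likely reason the shade rule above made no difference: the pattern com.google.guava.** matches Guava's Maven groupId, but shading renames Java packages, and Guava's classes live under com.google.common. The inLibrary coordinates also name an artifact called config rather than guava. Below is a sketch of a rule that renames the package across the whole uber jar, assuming an sbt-assembly version with shading support (0.14 or later, using the same `in assembly` syntax as above) and Guava bundled as a normal dependency; the prefix shaded.guava is an arbitrary choice.

```scala
// build.sbt (sketch). Requires the sbt-assembly plugin, e.g. added in project/plugins.sbt.
assemblyShadeRules in assembly := Seq(
  // Rename Guava's Java package (com.google.common), not its Maven groupId (com.google.guava).
  // inAll also rewrites references inside the other bundled jars, such as the BigQuery
  // connector classes, so they call the bundled Guava instead of the cluster's older copy.
  // "shaded.guava" is an arbitrary prefix chosen for this example.
  ShadeRule.rename("com.google.common.**" -> "shaded.guava.@1").inAll
)
```

The design trade-off versus swapping the jar in Spark's jars folder: shading keeps the newer Guava private to this one application, whereas replacing the cluster's jar changes the Guava version for every job that runs on it.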
