Comments (29)
Hi Galvin,
Try using the code from tag v0.4.3 rather than from branch master; it should work fine. At the same time, comment out the dbc_user_name-related lines in build.sbt to avoid errors. The latest branch also contains ML code.
Thanks.
from spark-sql-perf.
I have executed the tpcds1_4 queries with 92/99 passing, and written up instructions for using spark-sql-perf.
Anyone who runs into problems can follow the instructions; here's the link:
https://galvinyang.github.io/2016/07/09/spark-sql-perf%20test/
Hi @GalvinYang
Thanks a ton for your blog. It has been super helpful especially for someone who is starting off from scratch.
But I am having trouble retrieving results if I follow the README file.
tpcds.createResultsTable() gives me a "createResultsTable is not a member of com.databricks.spark.sql.perf.tpcds.TPCDS" error.
sqlContext.table("sqlPerformance") gives me org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table or view 'sqlperformance' not found in database 'sparktest'.
When I try to get results from a particular run using sqlContext.table("sqlPerformance").filter("timestamp = 1476844414082"),
I get this: org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table or view 'sqlperformance' not found in database 'sparktest'.
This doesn't make sense because, at the very end of the experiment run, I got Results written to table: 'sqlPerformance' at /spark/sql/performance/timestamp=1476844414082.
Do you have any idea how to solve this?
Thanks in advance!
Can you paste the errors you are getting when running bin/run --benchmark DatasetPerformance?
This is the default benchmark class; once you are able to compile and run it, you will see static output.
The build is incomplete. It gives me the entire log as error messages, so I am not able to figure out what is going wrong in the build. Execution gets stuck after a certain step. Please find the log attached.
spark-sql-perf-build-log.txt
I don't see any error.
Let the program run to completion; this is not the complete log.
Hi all, I am getting the following error:
java.lang.NoSuchMethodError: org.apache.spark.sql.SQLContext.createDataFrame(Lorg/apache/spark/rdd/RDD;Lorg/apache/spark/sql/types/StructType;)Lorg/apache/spark/sql/Dataset;
I am using Spark 1.6.1 and Scala 2.11.8. Do I need to change the Scala version to get it to work?
NoSuchMethodError usually means that you have an incompatibility between libraries.
I think the default Scala for Spark 1.6.1 is 2.10 (you can try that).
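To avoid this class of error, one approach is to pin the Scala and Spark versions in build.sbt to exactly what the cluster runs, and mark Spark as "provided" so the runtime's own jars are authoritative. A minimal sketch; the version numbers below are the ones discussed in this thread, so adjust them to your installation:

```scala
// build.sbt sketch, assuming a Spark 1.6.1 cluster built with Scala 2.10.
// "provided" keeps the cluster's own Spark jars from clashing with the build.
scalaVersion := "2.10.5"

val sparkVersion = "1.6.1"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion % "provided",
  "org.apache.spark" %% "spark-sql"  % sparkVersion % "provided"
)
```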
I tried with both 2.10.4 and 2.10.5. I am still facing the same issue.
Hi
I am facing the issues below when trying to run this code. Could anyone advise on these issues so I can proceed?
- For the command bin/run --benchmark DatasetPerformance, it gets stuck for hours, as in the spark-sql-perf-build-log.txt log attached by npaluskar above.
- I am also facing the NoSuchMethodError issue with Scala 2.10.4 and Spark 1.6.1. Please let us know the resolution, if any.
- If I use the Spark 2.0.0 preview, I am able to generate data and create external tables, but I get stuck at the val tpcds = new TPCDS (sqlContext = sqlContext) statement due to a Scala crash, as mentioned in #70.
- For the command bin/run --benchmark DatasetPerformance getting stuck for hours, as in the spark-sql-perf-build-log.txt log attached by npaluskar above: this happened to me when I ran the command a second time. I am not sure why, but it happens every time you run the command a second time; my first run was successful. So you might want to restart the session and try again.
- For the NoSuchMethodError issue with Scala 2.10.4 and Spark 1.6.1: I am still trying to figure it out.
- For the Spark 2.0.0 preview getting stuck at the val tpcds = new TPCDS (sqlContext = sqlContext) statement due to the Scala crash mentioned in #70: I am not aware of this, as I am still stuck at step 2.
Can you verify your TPCDS.scala class?
Are you using Spark 2.0?
Yes, TPCDS.scala is the same for me. I am using Spark 1.6.1.
Yes, chawla. I am using the same file as you mentioned, and I am using Spark 2.0.0.
There are more APIs in Spark 2.0 (especially for spark-sql-perf).
From your spark-sql-perf-master directory, try sbt. It should give you a command prompt; then type compile, and then run --benchmark DatasetPerformance:
spark-sql-perf-master:> sbt
> compile
[warn] ...
[success]
> run --benchmark DatasetPerformance
Alternatively, from the spark-sql-perf-master directory, try ./bin/run --benchmark DatasetPerformance.
Yes. I used sbt to compile and create a jar file for spark-sql-perf-master, and used it to launch the Spark shell with the command bin/spark-shell --jars /home/cloudera/spark-sql-perf-master/target/scala-2.10/spark-sql-perf_2.10-0.4.8-SNAPSHOT.jar.
./bin/run --benchmark DatasetPerformance ran well this time, as suggested by nachiket.
I then ran the commands below for the experiment:
import com.databricks.spark.sql.perf.tpcds.Tables
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._
val tables = new Tables(sqlContext, "/home/cloudera/tpcds-kit-master/tools/", 1)
tables.genData("hdfs://192.168.126.130:8020/tmp/temp2", "parquet", false, false, false, false, false)
tables.createExternalTables("hdfs://192.168.126.128:8020/tmp/temp2", "parquet", "sparkperf", false)
// Setup TPC-DS experiment
import com.databricks.spark.sql.perf.tpcds.TPCDS
val tpcds = new TPCDS (sqlContext = sqlContext)
This command crashed the compiler, causing the Spark shell 2.0.0 to restart.
Hi Nachiket,
I tried with the Spark 2.0.0 preview and Scala 2.11.8 (I changed the build.sbt in the spark-sql-perf code and compiled it), and the commands ran fine.
Thanks.
Hi, I have tried spark-sql-perf with Spark 2.0 as above, and it fails at
val tpcds = new TPCDS (sqlContext = sqlContext)
This command crashed the compiler and caused the Spark shell 2.0.0 to restart.
Then I wanted to try compiling the jar with Scala 2.11.8, changing scalaVersion := "2.10.4" to "2.11.8" in build.sbt.
But it fails at libraryDependencies += "com.typesafe" %% "scalalogging-slf4j" % "1.1.0": the package cannot be found.
Can anyone give a solution?
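For what it's worth, the old com.typesafe scalalogging-slf4j artifact was only published for Scala 2.10; the library was later renamed. One possible substitution in build.sbt when cross-building for 2.11 is sketched below; treat the exact version as an assumption and check Maven Central:

```scala
// Replacement for the 2.10-only "scalalogging-slf4j" when building with Scala 2.11.
libraryDependencies += "com.typesafe.scala-logging" %% "scala-logging-slf4j" % "2.1.2"
```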
Thanks for your answer. I have checked out v0.4.3 and commented out the dbc-related lines, but then compilation failed:
[info] Compiling 20 Scala sources to /data/ygmz/sparksqlperf/spark-sql-perf/target/scala-2.10/classes...
[warn] /data/ygmz/sparksqlperf/spark-sql-perf/src/main/scala/com/databricks/spark/sql/perf/CpuProfile.scala:107: non-variable type argument String in type pattern Seq[String] is unchecked since it is eliminated by erasure
[warn] case Row(stackLines: Seq[String], count: Long) => stackLines.map(toStackElement) -> count :: Nil
[warn] ^
[error] /data/ygmz/sparksqlperf/spark-sql-perf/src/main/scala/com/databricks/spark/sql/perf/DatasetPerformance.scala:102: object creation impossible, since:
[error] it has 2 unimplemented members.
[error] /** As seen from anonymous class $anon, the missing signatures are as follows.
[error] * For convenience, these are usable as stub implementations.
[error] */
[error] def bufferEncoder: org.apache.spark.sql.Encoder[com.databricks.spark.sql.perf.SumAndCount] = ???
[error] def outputEncoder: org.apache.spark.sql.Encoder[Double] = ???
[error] val average = new Aggregator[Long, SumAndCount, Double] {
[error] ^
[warn] one warning found
[error] one error found
[error] (compile:compileIncremental) Compilation failed
[error] Total time: 328 s, completed 2016-7-7 11:16:06
How can I get past this?
Hi all,
I tried to generate TPC-DS data in parallel with spark-sql-perf, but Spark throws exceptions like the ones below:
...
scala> tables.genData("hdfs://ocdpCluster/tpcds", "parquet", true, true, false, true, false)
Pre-clustering with partitioning columns with query
SELECT
cs_sold_date_sk,cs_sold_time_sk,cs_ship_date_sk,cs_bill_customer_sk,cs_bill_cdemo_sk,cs_bill_hdemo_sk,cs_bill_addr_sk,cs_ship_customer_sk,cs_ship_cdemo_sk,cs_ship_hdemo_sk,cs_ship_addr_sk,cs_call_center_sk,cs_catalog_page_sk,cs_ship_mode_sk,cs_warehouse_sk,cs_item_sk,cs_promo_sk,cs_order_number,cs_quantity,cs_wholesale_cost,cs_list_price,cs_sales_price,cs_ext_discount_amt,cs_ext_sales_price,cs_ext_wholesale_cost,cs_ext_list_price,cs_ext_tax,cs_coupon_amt,cs_ext_ship_cost,cs_net_paid,cs_net_paid_inc_tax,cs_net_paid_inc_ship,cs_net_paid_inc_ship_tax,cs_net_profit
FROM
catalog_sales_text
DISTRIBUTE BY
cs_sold_date_sk
.
Generating table catalog_sales in database to hdfs://ocdpCluster/tpcds/catalog_sales with save mode Overwrite.
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
java.io.FileNotFoundException: Path is not a file: /tpcds/catalog_sales/cs_sold_date_sk=2450815
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:75)
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1828)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1799)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1712)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:652)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2147)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2145)
...
How can I resolve this?
Thanks
I use spark-sql-perf-0.4.3. I got an error when generating data:
cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD
Hi @GalvinYang,
I saw your blog, which was very helpful for understanding the spark-sql-perf tool. Now I have a question and need your help: if I use Spark 1.6.2 for the TPC-DS benchmark, does that mean I can't use tags/v0.4.3, since that code is based on Spark 2.0.0? Do I instead have to use an older version (e.g. tags/v0.3.2, also setting scalaVersion := "2.10.4" with sparkVersion := "1.6.2" in build.sbt) to compile the spark-sql-perf jar and launch spark-shell with it to test?
Thanks in advance !
Hi Zhou,
Sorry for the late reply.
I tried it with Spark 2.0 because we needed to verify the SQL support in Spark 2.0. If you want to test with Spark 1.6.x, you can try your approach; if it does not work, try different versions.
That said, I don't think it is necessary to test on Spark 1.6.x, since many people have done it before and you can find their results on Google.
Hi @GalvinYang,
Thanks a lot for your reply and blog! I can now compile the spark-sql-perf jar with tags/v0.3.2 after following the experience in your blog. Your blog is very helpful for us : )
Hi experts,
I am now using spark-sql-perf to generate 1 TB of TPC-DS data with partitionTables enabled, like tables.genData("hdfs://ip:8020/tpctest", "parquet", true, true, false, false, false). But I found that some of the big tables (e.g. store_sales) take longer to complete. I observed that all the data is first written to /tpcds_1t/store_sales/_temporary/0 and then moved to /tpcds_1t/store_sales on HDFS, and these moves on HDFS take a long time to complete. Has anyone come across the same issue? How can it be resolved?
Thanks in advance!
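The move out of _temporary is performed by Hadoop's FileOutputCommitter during job commit. On Hadoop 2.7+, one commonly suggested mitigation is committer algorithm version 2, which commits task output directly into the destination directory and avoids the large final rename. A sketch of how it might be passed to the shell; the jar path is illustrative, and whether v2 is safe for your workload depends on your tolerance for partial output on task failure:

```shell
# Sketch: enable FileOutputCommitter algorithm v2 (Hadoop 2.7+) to reduce
# the final rename out of _temporary. The jar path below is just an example.
spark-shell \
  --conf spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2 \
  --jars /path/to/spark-sql-perf.jar
```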
@GalvinYang
Hi,
I am facing the issue below when trying to run this code. For this command:
tables.createExternalTables("file:///home/tpctest/", "parquet", "mydata", false)
java.lang.RuntimeException: [1.1] failure: ``with'' expected but identifier CREATE found
CREATE DATABASE IF NOT EXISTS mydata
^
at scala.sys.package$.error(package.scala:27)
at org.apache.spark.sql.catalyst.AbstractSparkSQLParser.parse(AbstractSparkSQLParser.scala:36)
at org.apache.spark.sql.catalyst.DefaultParserDialect.parse(ParserDialect.scala:67)
at org.apache.spark.sql.SQLContext$$anonfun$2.apply(SQLContext.scala:211)
at org.apache.spark.sql.SQLContext$$anonfun$2.apply(SQLContext.scala:211)
at org.apache.spark.sql.execution.SparkSQLParser$$anonfun$org$apache$spark$sql$execution$SparkSQLParser$$others$1.apply(SparkSQLParser.scala:114)
at org.apache.spark.sql.execution.SparkSQLParser$$anonfun$org$apache$spark$sql$execution$SparkSQLParser$$others$1.apply(SparkSQLParser.scala:113)
at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:136)
at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:135)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
..............
I used spark-sql-perf-0.2.4, scala-2.10.5, and spark-1.6.1.
This command, however, works fine:
tables.createTemporaryTables("file:///home/wl/tpctest/", "parquet")
And the tpcds.createResultsTable() command fails the same way as tables.createExternalTables().
Can you help me resolve this problem?
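A likely cause, assuming a default setup: in Spark 1.6.x, a plain SQLContext uses a limited SQL parser that does not understand DDL such as CREATE DATABASE, which createExternalTables issues internally; createTemporaryTables never runs that statement, which would explain why it works. Using a HiveContext (requires Spark built with Hive support) is one thing worth trying. A hedged sketch for the spark-shell session; the dsdgen path is illustrative:

```scala
// Sketch for spark-shell on Spark 1.6.x: HiveContext's parser accepts
// CREATE DATABASE, unlike the plain SQLContext parser.
// Assumes `sc` from the shell and the spark-sql-perf jar on the classpath.
import org.apache.spark.sql.hive.HiveContext
import com.databricks.spark.sql.perf.tpcds.Tables

val sqlContext = new HiveContext(sc)
val tables = new Tables(sqlContext, "/path/to/dsdgen", 1)  // dsdgen dir is an example
tables.createExternalTables("file:///home/tpctest/", "parquet", "mydata", false)
```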
Hello everyone,
I need some help running the benchmark. While executing the statement below, I get the attached exception in the Spark shell. Please help me resolve this.
val experiment = tpcds.runExperiment(tpcds.interactiveQueries)
Results written to table: 'sqlPerformance' at /spark/sql/performance/timestamp=1489665992654
17/03/16 17:37:07 ERROR FileOutputCommitter: Mkdirs failed to create file:/spark/sql/performance/timestamp=1489665992654/_temporary/0
17/03/16 17:37:07 WARN TaskSetManager: Stage 171 contains a task of very large size (330 KB). The maximum recommended task size is 100 KB.
17/03/16 17:37:07 WARN TaskSetManager: Lost task 0.0 in stage 171.0 (TID 5124, 10.6.45.231, executor 0): java.io.IOException: Mkdirs failed to create file:/spark/sql/performance/timestamp=1489665992654/_temporary/0/_temporary/attempt_20170316173707_0171_m_000000_0 (exists=false, cwd=file:/home/taniya/spark/spark-2.1.0-bin-hadoop2.7/work/app-20170316172533-0001/0)
execution.docx
Attached is the full log.
**** The issue is resolved. The error was due to a permissions issue.
Thanks,
Tania
@GalvinYang Thanks for your blog. It helped me a lot to get the test running!
@reshragh I am also facing a similar issue viewing the results. Has it been resolved for you?
While retrieving results using tpcds.createResultsTable(), it gives me a "createResultsTable is not a member of com.databricks.spark.sql.perf.tpcds.TPCDS" error.
And I figured out from the source code that there is no such method as createResultsTable in TPCDS.scala.
sqlContext.table("sqlPerformance") gives me org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table or view 'sqlperformance' not found in database 'xyz', even though I got Results written to table: 'sqlPerformance' at /spark/sql/performance/timestamp=1489749887680.
I tried from the console to view the results by importing the json.
val df = spark.read.json("/spark/sql/performance/timestamp=1489749887680/part-00000-8d5f1472-0846-4ec5-81e1-358a7a271840.json")
df.show()
+--------------------+---------+--------------------+------+-------------+
| configuration|iteration| results| tags| timestamp|
+--------------------+---------+--------------------+------+-------------+
|[8,[file:/home/ta...| 1|[[5.54E-4,Wrapped...|[true]|1489749887680|
|[8,[file:/home/ta...| 2|[[5.55E-4,Wrapped...|[true]|1489749887680|
|[8,[file:/home/ta...| 3|[[6.49E-4,Wrapped...|[true]|1489749887680|
+--------------------+---------+--------------------+------+-------------+
But I am not able to interpret the results from here.
Is there any other way to retrieve the results? Any help is highly appreciated.
Thanks in advance!
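As a workaround for reading the raw JSON output outside Spark, the result rows can be parsed directly. Here is a small Python sketch under the assumption, drawn from the DataFrame schema printed above, that each row carries a results array whose elements have a query name and an executionTime; verify the field names against your own file:

```python
import json

def summarize_results(json_line):
    """Map query name -> executionTime for one spark-sql-perf result row.

    Field names ("results", "name", "executionTime") are assumptions based
    on the schema shown above; check your JSON for the exact names.
    """
    row = json.loads(json_line)
    return {r["name"]: r.get("executionTime") for r in row.get("results", [])}

# Hypothetical sample row mimicking that schema:
sample = json.dumps({
    "iteration": 1,
    "timestamp": 1489749887680,
    "results": [
        {"name": "q1", "executionTime": 1234.5},
        {"name": "q2", "executionTime": 987.6},
    ],
})
print(summarize_results(sample))  # {'q1': 1234.5, 'q2': 987.6}
```

Each line of the part-00000-*.json file is one such row, so looping over the file with this helper gives per-query timings per iteration.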
Hi @GalvinYang,
Thanks for your blog. Is this blog also available in English, or is there another blog like it?
Thanks in advance