Comments (9)
It would be great to add these. I think the biggest barrier here is going to be adapting the data generation code to produce all of the fact tables. Right now, we use the streaming mode of dsdgen
and I don't think you can do this for the tables that have foreign key dependencies.
from spark-sql-perf.
What do you think about de-coupling dsdgen from the test kit itself and
simply providing instructions on how to run dsdgen by itself?
Is it correct to assume that most users are probably using Hive and have
created these tables and loaded data already?
For our tests, we used externally generated data on HDFS (path passed in),
created DataFrames, and used the spark-csv reader to load the data, like this:
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.types.StructType

def importTable(sqlContext: SQLContext, filename: String,
                schema: StructType, tablename: String): Unit = {
  // Read pipe-delimited dsdgen output with an explicit schema (no header row)
  val df = sqlContext.read.format("com.databricks.spark.csv")
    .schema(schema).option("delimiter", "|").load(filename)
  // Register the DataFrame so queries can refer to it by its TPC-DS table name
  df.registerTempTable(tablename)
}
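For reference, a minimal sketch of driving importTable over several tables. The tableFile helper and the one-.dat-file-per-table layout are illustrative assumptions about how the externally generated data sits on HDFS, not spark-sql-perf API:

```scala
// Hypothetical helper: map a TPC-DS table name to its data file under a base
// path, assuming dsdgen wrote one .dat file per table.
def tableFile(basePath: String, tablename: String): String =
  s"$basePath/$tablename.dat"

// Usage sketch (schemas come from the TPC-DS spec; sqlContext from the app):
//   for (t <- Seq("store_sales", "date_dim", "item"))
//     importTable(sqlContext, tableFile("hdfs:///tpcds/sf1000", t), schemas(t), t)
```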
A few queries do have modifications -- thought I'd mention that, but they
should be good enough for this kit.
Will package the queries up and send them soon.
From: Michael Armbrust [email protected]
To: databricks/spark-sql-perf [email protected]
Cc: Jesse F Chen/San Francisco/IBM@IBMUS
Date: 09/17/2015 12:26 PM
Subject: Re: [spark-sql-perf] Can we put all working queries into this
test kit? There are 86 out of 99 working in Spark 1.5 (#23)
from spark-sql-perf.
We don't necessarily need to block adding the queries on adding the data generation, but in my experience generating larger scale factors (SF1500 - SF15000) is actually a significant challenge. So I would definitely like to add support for generating them in the context of a Spark job.
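For what it's worth, the Spark-job approach can be sketched roughly as follows. dsdgen's -parallel/-child flags split a scale factor into independent chunks, so each Spark task can generate one chunk locally. The helper below only assembles the command line; the names (dsdgenCommand, outputDir) and the exact wiring into a Spark job are illustrative assumptions, not spark-sql-perf API:

```scala
// Sketch: build the dsdgen invocation for one chunk of a parallel run.
// Per the TPC-DS toolkit, `dsdgen -scale N -parallel P -child C` generates
// chunk C of P for scale factor N.
def dsdgenCommand(dsdgenDir: String,
                  scaleFactor: Int,
                  parallel: Int,
                  child: Int,
                  outputDir: String): Seq[String] =
  Seq(
    s"$dsdgenDir/dsdgen",
    "-scale", scaleFactor.toString,
    "-parallel", parallel.toString,
    "-child", child.toString,
    "-dir", outputDir,
    "-force"
  )

// Inside a Spark job, one chunk per task might look like:
//   sc.parallelize(1 to parallel, parallel).foreach { child =>
//     import scala.sys.process._
//     dsdgenCommand(dir, sf, parallel, child, out).!
//   }
// (assumes dsdgen binaries and outputDir are reachable on every executor)
```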
from spark-sql-perf.
Definitely nice to have data generation done in a Spark job. What's the best way to upload a gzip file containing all 86 queries in text files?
from spark-sql-perf.
I wouldn't upload them as a zip file. I'd do one of the following:
- Add the files in src/main/resources/... and create a harness that reads them from the classloader and creates query objects for each. Put this as another trait in the tpcds directory.
- Hard code them as strings as we have in the other tpcds files.
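A rough sketch of the first option, assuming the queries live as .sql files bundled under src/main/resources; the resource layout and the helper names (queryName, loadQuery) are illustrative, not existing spark-sql-perf code:

```scala
import scala.io.Source

// Derive a query name from a bundled resource path, e.g. ".../q23a.sql" -> "q23a"
def queryName(resourcePath: String): String =
  resourcePath.split('/').last.stripSuffix(".sql")

// Read one bundled query from the classpath and pair it with its name.
// getResourceAsStream resolves against the classloader, so the .sql files
// travel inside the jar with no external paths involved.
def loadQuery(resourcePath: String): (String, String) = {
  val stream = getClass.getClassLoader.getResourceAsStream(resourcePath)
  require(stream != null, s"missing resource: $resourcePath")
  val sql = Source.fromInputStream(stream, "UTF-8").mkString
  (queryName(resourcePath), sql)
}
```

A trait in the tpcds directory could then map loadQuery over the known resource paths to build the query objects the harness runs.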
from spark-sql-perf.
Do you have any update on this? I'd be interested in testing out the new queries... Thanks! E.
from spark-sql-perf.
This is still being worked on; stay tuned, please. We will implement this as the first option from Michael's comment above - that makes sense.
from spark-sql-perf.
Any news? :) I may have some free time in the next few days; if you could PR the queries, I can have a look at adding some Scala glue... thanks!
from spark-sql-perf.
Great job!
from spark-sql-perf.