jianlirong / tpc-ds Goto Github PK
View Code? Open in Web Editor NEWThis project forked from runningjon/tpc-ds
HAWQ and Greenplum TPC-DS benchmark
This project forked from runningjon/tpc-ds
HAWQ and Greenplum TPC-DS benchmark
######################################################################################## TPC-DS benchmark scripts for HAWQ and Greenplum database. ######################################################################################## ######################################################################################## Prerequisites ######################################################################################## 1. Greenplum Database or Apache HAWQ installed and running 2. Connectivity is possible to the MASTER_HOST and from the Data Nodes / Segment Hosts 3. Root access ######################################################################################## Installation ######################################################################################## 1. ssh to the host that will be home to these scripts and files with root ssh root@mdw 2. Download the tpcds.sh file curl https://raw.githubusercontent.com/pivotalguru/TPC-DS/master/tpcds.sh > tpcds.sh chmod 755 tpcds.sh 3. Execute tpcds.sh and specify the scale in Gigabtes and the number of threads that will create the data ./tpcds.sh 100 8 This example will create a 100 GB TPC-DS dataset and run all of the queries. Note: Increase the number of threads when building larger datasets and the host has available cores to process the data. Aditionally, a quiet mode with no user prompts is available when an optional third parameter is provided with any value. Example: ./tpcds.sh 100 8 shhh ######################################################################################## Notes ######################################################################################## - tpch_variables.sh file will be created with variables you can adjust - The default location for the repo is /pivotalguru/TPC-DS but can be changed in the tpch_variables.sh file. - Additional storage options are in /pivotalguru/TPC-DS/functions.sh - Files for the benchmark will be created in /pivotalguru/TPC-DS/data[1-8]/ You can update these directories to be symbolic links to better utilize the disk volumes you have available. - Example of running tpcds as root as a background process: nohup ./tpcds.sh 3000 48 shh > tpcds.log 2>&1 < tpcds.log & ######################################################################################## TPC-DS Minor Modifications ######################################################################################## 1. Change to SQL queries that subtracted or added days were modified slightly: Old: and (cast('2000-02-28' as date) + 30 days) New: and (cast('2000-02-28' as date) + '30 days'::interval) This was done on queries: 12, 16, 20, 21, 32, 37, 40, 77, 80, 82, 92, 94, 95, and 98. 2. Change to queries with ORDER BY on column alias to use sub-select. Old: select sum(ss_net_profit) as total_sum ,s_state ,s_county ,grouping(s_state)+grouping(s_county) as lochierarchy ,rank() over ( partition by grouping(s_state)+grouping(s_county), case when grouping(s_county) = 0 then s_state end order by sum(ss_net_profit) desc) as rank_within_parent from store_sales ,date_dim d1 ,store where d1.d_month_seq between 1212 and 1212+11 and d1.d_date_sk = ss_sold_date_sk and s_store_sk = ss_store_sk and s_state in ( select s_state from (select s_state as s_state, rank() over ( partition by s_state order by sum(ss_net_profit) desc) as ranking from store_sales, store, date_dim where d_month_seq between 1212 and 1212+11 and d_date_sk = ss_sold_date_sk and s_store_sk = ss_store_sk group by s_state ) tmp1 where ranking <= 5 ) group by rollup(s_state,s_county) order by lochierarchy desc ,case when lochierarchy = 0 then s_state end ,rank_within_parent limit 100; New: select * from ( --new select sum(ss_net_profit) as total_sum ,s_state ,s_county ,grouping(s_state)+grouping(s_county) as lochierarchy ,rank() over ( partition by grouping(s_state)+grouping(s_county), case when grouping(s_county) = 0 then s_state end order by sum(ss_net_profit) desc) as rank_within_parent from store_sales ,date_dim d1 ,store where d1.d_month_seq between 1212 and 1212+11 and d1.d_date_sk = ss_sold_date_sk and s_store_sk = ss_store_sk and s_state in ( select s_state from (select s_state as s_state, rank() over ( partition by s_state order by sum(ss_net_profit) desc) as ranking from store_sales, store, date_dim where d_month_seq between 1212 and 1212+11 and d_date_sk = ss_sold_date_sk and s_store_sk = ss_store_sk group by s_state ) tmp1 where ranking <= 5 ) group by rollup(s_state,s_county) ) AS sub --new order by lochierarchy desc ,case when lochierarchy = 0 then s_state end ,rank_within_parent limit 100; This was done on queries: 36 and 70.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.