The official TPC-DS tools can be found at tpc.org.
This tpcds-kit is ported from gregrahn/tpcds-kit (thanks to gregrahn). It is based on v2.7.0 and has been modified to:
- Allow compilation under macOS (commit 2ec45c5)
- Address obvious query template bugs like
Make sure the required development tools are installed:
- Ubuntu:
sudo apt-get install gcc make flex bison byacc git
- CentOS/RHEL:
sudo yum install gcc make flex bison byacc git
- macOS:
xcode-select --install
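Before building, it can help to confirm that these tools are actually on PATH, which avoids a confusing compiler or make error later. The following is a minimal POSIX shell sketch; missing_tools is a hypothetical helper and is not part of this kit.

```shell
# Hypothetical helper: print each named tool that is not found on PATH,
# one per line. Prints nothing when every tool is available.
missing_tools() {
  for tool in "$@"; do
    # command -v is the POSIX way to check whether a command exists.
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
    fi
  done
}
```

For example, missing_tools gcc make flex bison byacc git prints nothing when everything is installed and otherwise lists the missing commands.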
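Clone this repository and change into it:
git clone https://github.com/huaj1101/tidb-tpcds-kit.git
cd tidb-tpcds-kit
Then build the tools for your platform: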
- Linux:
cd tools && make OS=LINUX -j8 && cd -
- macOS:
cd tools && make OS=MACOS -j8 && cd -
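The two build commands differ only in the OS value passed to make. As a sketch, that value can be derived from the output of uname -s; tpcds_make_os is a hypothetical helper that covers only the two platforms listed above, and any other values the Makefile may accept are not handled here.

```shell
# Hypothetical helper: map `uname -s` output to the OS= value used above.
# Only Linux and macOS (Darwin) are handled; anything else is an error.
tpcds_make_os() {
  case "$1" in
    Linux)  echo LINUX ;;
    Darwin) echo MACOS ;;
    *)      echo "unsupported platform: $1" >&2; return 1 ;;
  esac
}
```

Usage: cd tools && make OS="$(tpcds_make_os "$(uname -s)")" -j8 && cd -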
Data generation is done via dsdgen:
# "-sc" sets the scale factor, i.e. the approximate volume of data to generate in GB.
cd tools && ./dsdgen -sc 100 -f && cd -
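For large scale factors, upstream dsdgen can split generation into chunks with its -parallel and -child options, where each child process (numbered from 1) produces one chunk. The sketch below only prints the commands so they can be reviewed or run concurrently; dsdgen_chunk_cmds is a hypothetical helper, and it reuses the -sc and -f flags from the command above. Chunked runs write separate per-chunk files rather than one file per table, and ./load_data.sh may not expect those file names, so check it before loading chunked output.

```shell
# Hypothetical helper: print one dsdgen command per chunk.
# $1 = scale factor, $2 = number of chunks (assumes upstream -parallel/-child).
dsdgen_chunk_cmds() {
  scale=$1
  chunks=$2
  i=1
  while [ "$i" -le "$chunks" ]; do
    echo "./dsdgen -sc $scale -parallel $chunks -child $i -f"
    i=$((i + 1))
  done
}
```

For example, to generate four chunks concurrently: cd tools && dsdgen_chunk_cmds 100 4 | xargs -P 4 -I{} sh -c '{}' && cd -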
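Create the tpcds database and its tables, load the generated data, then collect statistics and apply the settings in tools/set_variables.sql (replace the host, port, user, and password with those of your TiDB cluster):
mysql -h 192.168.6.128 -P 4000 -u root -p123 -D test -e "drop database if exists tpcds;"
mysql -h 192.168.6.128 -P 4000 -u root -p123 -D test -e "create database tpcds;"
mysql -h 192.168.6.128 -P 4000 -u root -p123 -D tpcds < tools/tpcds.sql
./load_data.sh
mysql -h 192.168.6.128 -P 4000 -u root -p123 -D tpcds < tools/analyze_tables.sql
mysql -h 192.168.6.128 -P 4000 -u root -p123 -D tpcds < tools/set_variables.sql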
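The mysql commands above repeat the same connection parameters. One way to keep them in a single place is a small wrapper function. This is a sketch: tidb and the TIDB_* environment variables are hypothetical names, with defaults copied from the commands above.

```shell
# Connection settings (hypothetical names); override via the environment,
# e.g. TIDB_HOST=127.0.0.1. Defaults mirror the example commands above.
TIDB_HOST=${TIDB_HOST:-192.168.6.128}
TIDB_PORT=${TIDB_PORT:-4000}
TIDB_USER=${TIDB_USER:-root}
TIDB_PASS=${TIDB_PASS:-123}

# Run the mysql client against TiDB, forwarding any extra arguments,
# so -D, -e, and stdin redirection work as with plain mysql.
tidb() {
  mysql -h "$TIDB_HOST" -P "$TIDB_PORT" -u "$TIDB_USER" -p"$TIDB_PASS" "$@"
}
```

For example: tidb -D test -e "create database tpcds;" or tidb -D tpcds < tools/analyze_tables.sql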
Sample queries from the 100GB scale factor can be found in the queries/ directory. The query-templates/ directory contains the Apache Impala TPC-DS query templates, which can be used with dsqgen (found in the official TPC-DS tools) to generate queries for other scale factors or to generate more queries with different substitution variables.
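As a sketch of batch query generation, the helper below prints one dsqgen command per query*.tpl template in a directory. dsqgen_cmds is hypothetical; the flag names (-DIRECTORY, -TEMPLATE, -DIALECT, -SCALE, -FILTER) follow upstream dsqgen, and the dialect argument is an assumption: it must name a dialect template that your copy of the templates actually provides.

```shell
# Hypothetical helper: print one dsqgen command per query*.tpl template.
# $1 = templates directory, $2 = scale factor, $3 = dialect name (assumed).
dsqgen_cmds() {
  dir=$1
  scale=$2
  dialect=$3
  for tpl in "$dir"/query*.tpl; do
    [ -e "$tpl" ] || continue  # skip if the glob matched nothing
    name=$(basename "$tpl")
    printf './dsqgen -DIRECTORY %s -TEMPLATE %s -DIALECT %s -SCALE %s -FILTER Y\n' \
      "$dir" "$name" "$dialect" "$scale"
  done
}
```

For example, running dsqgen_cmds ../query-templates 100 followed by a dialect name from tools/ prints the commands, which can be reviewed and then executed.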