runningjon / tpc-ds Goto Github PK

Greenplum TPC-DS benchmark

Shell 4.38% HTML 2.64% Smarty 25.17% SQLPL 0.34% PLSQL 0.35% C 61.79% Makefile 2.18% Yacc 0.85% C++ 0.39% Lex 0.40% Objective-C 0.29% PLpgSQL 0.09% TSQL 1.13%

tpc-ds's Introduction

########################################################################################
TPC-DS benchmark scripts for Greenplum database.
########################################################################################
Supported versions:
Greenplum 4.3, 5.*, 6.*
Open Source Greenplum 5.*, 6.*
Beta: PostgreSQL 10.*

########################################################################################
TPC-DS Information
########################################################################################
Version 2.x now uses TPC-DS version 2.1 for the standard queries.  

Version 2.2.x now supports Greenplum version 5.x.

########################################################################################
Query Options
########################################################################################
You can have the queries execute with "EXPLAIN ANALYZE" in order to see exactly the 
query plan used, the cost, the memory used, etc.  This is done in tpcds_variables.sh
like this:
EXPLAIN_ANALYZE="true"

########################################################################################
Storage Options
########################################################################################
Table storage is defined in functions.sh and is configured for optimal performance.

########################################################################################
Prerequisites
########################################################################################
1. Greenplum Database or PostgreSQL 10.x
2. Connectivity is possible to the MASTER_HOST and from the Data Nodes / Segment Hosts
3. Root access

########################################################################################
Installation
########################################################################################
1. ssh to the master host with root
ssh gpadmin@mdw; sudo bash

2. Download the tpcds.sh file
curl https://raw.githubusercontent.com/pivotalguru/TPC-DS/master/tpcds.sh > tpcds.sh
chmod 755 tpcds.sh

########################################################################################
Variables and Configuration
########################################################################################
By default, the installation will create the scripts in /pivotalguru/TPC-DS on the 
Master host.  This can be changed by editing the dynamically configured 
tpcds_variables.sh file that is created the first time tpcds.sh is run.  

Also by default, TPC-DS files are generated on each Segment Host / Data Node in the 
Segement's PGDATA/pivotalguru directory.  If there isn't enough space in this directory
in each Segment, you can create a symbolic link to a drive location that does have 
enough space.

########################################################################################
Execution
########################################################################################
1. Execute tpcds.sh
nohup ./tpcds.sh > tpcds.log 2>&1 < tpcds.log &

########################################################################################
Notes
########################################################################################
- tpcds_variables.sh file will be created with variables you can adjust
- Files for the benchmark will be created in a sub-directory named pivotalguru located 
in each segment directory on each segment host / data node.
You can update these directories to be symbolic links to better utilize the disk 
volumes you have available.
- Example of running tpcds as root as a background process:

########################################################################################
TPC-DS Minor Modifications
########################################################################################
1.  Change to SQL queries that subtracted or added days were modified slightly:

Old:
and (cast('2000-02-28' as date) + 30 days)

New:
and (cast('2000-02-28' as date) + '30 days'::interval)

This was done on queries: 5, 12, 16, 20, 21, 32, 37, 40, 77, 80, 82, 92, 94, 95, and 98.

2.  Change to queries with ORDER BY on column alias to use sub-select.

Old: 
select  
    sum(ss_net_profit) as total_sum
   ,s_state
   ,s_county
   ,grouping(s_state)+grouping(s_county) as lochierarchy
   ,rank() over (
 	partition by grouping(s_state)+grouping(s_county),
 	case when grouping(s_county) = 0 then s_state end 
 	order by sum(ss_net_profit) desc) as rank_within_parent
 from
    store_sales
   ,date_dim       d1
   ,store
 where
    d1.d_month_seq between 1212 and 1212+11
 and d1.d_date_sk = ss_sold_date_sk
 and s_store_sk  = ss_store_sk
 and s_state in
             ( select s_state
               from  (select s_state as s_state,
 			    rank() over ( partition by s_state order by sum(ss_net_profit) desc) as ranking
                      from   store_sales, store, date_dim
                      where  d_month_seq between 1212 and 1212+11
 			    and d_date_sk = ss_sold_date_sk
 			    and s_store_sk  = ss_store_sk
                      group by s_state
                     ) tmp1 
               where ranking <= 5
             )
 group by rollup(s_state,s_county)
 order by
   lochierarchy desc
  ,case when lochierarchy = 0 then s_state end
  ,rank_within_parent
 limit 100;

New:
select * from ( --new
select  
    sum(ss_net_profit) as total_sum
   ,s_state
   ,s_county
   ,grouping(s_state)+grouping(s_county) as lochierarchy
   ,rank() over (
 	partition by grouping(s_state)+grouping(s_county),
 	case when grouping(s_county) = 0 then s_state end 
 	order by sum(ss_net_profit) desc) as rank_within_parent
 from
    store_sales
   ,date_dim       d1
   ,store
 where
    d1.d_month_seq between 1212 and 1212+11
 and d1.d_date_sk = ss_sold_date_sk
 and s_store_sk  = ss_store_sk
 and s_state in
             ( select s_state
               from  (select s_state as s_state,
 			    rank() over ( partition by s_state order by sum(ss_net_profit) desc) as ranking
                      from   store_sales, store, date_dim
                      where  d_month_seq between 1212 and 1212+11
 			    and d_date_sk = ss_sold_date_sk
 			    and s_store_sk  = ss_store_sk
                      group by s_state
                     ) tmp1 
               where ranking <= 5
             )
 group by rollup(s_state,s_county)
) AS sub --new
 order by
   lochierarchy desc
  ,case when lochierarchy = 0 then s_state end
  ,rank_within_parent
 limit 100;

This was done on queries: 36 and 70.

3. Query templates were modified to exclude columns not found in the query.  In these cases, the common 
table expression used aliased columns but the dynamic filters included both the alias name as well as the
original name.  Referencing the original column name instead of the alias causes the query parser to not
find the column. 

This was done on query 86.

4.  Added table aliases.
This was done on queries: 2, 14, and 23.

5.  Added "limit 100" to very large result set queries.  For the larger tests (e.g. 15TB), a few of the 
TPC-DS queries can output a very large number of rows which are just discarded.  
This was done on queries: 64, 34, and 71.

tpc-ds's People

Contributors

Stargazers

Watchers

Forkers

cgvarela digoal addisonhuddy jianlirong skahler-vmware xuegang tspannhw zhaodiankui wowjason shangshujie365 iamlyg98 joe2hpimn dcomingore yydzero jemishp hasonhai arsenvlad huxiao64 robins phan-pivotal nunb tom-deng vaquarkhan mixergit mindis szhang1 zhangteng5513 xinchun-wang mdivk dawson2000 nilkanthpatel04 tingweipei lwz9103 mineminemine123 bluejoyer ysung-pivotal owlstudio-ustb ranchoyuan malikbilal pivotaldata rz1970 aubao edespino berlin-ab yangru jernster southpolekat morganwu277 mengjin001 frastellin jimmy777 jiannengsun diarworld tvesely kurt023-github dystudio renjunlong-1 chenbaggio lichongjie lgtianxiawudi ivannovick l-wang xxgoracle clxman phoenixhadoop dshadowzh jerryliu306 se7enshidb gordonyan777 plaidcloud galler-alexander greenbeam2002 gaoyao vo0v ivanievlev zhengtongyan fruitboy1226 tibibok yibowang20 arenadata momo1578 mikekilpatrick dulong41 enchanterblue nandreya gitsurmangen

tpc-ds's Issues

Generate data step hangs

I seem to be experiencing the same issue described in issue #8. The generate data steps seems to hang, and the same .dat files listed in that issue are empty for me as well. I have confirmed connectivity to all the nodes using gpssh-exkeys, and I'm not able to follow what exactly the fix to the issue was by @dimon777. Is there any guidance or additional info available?

execute tpcds.sh error

executing on kslave2 ./start_gpfdist.sh 9001 /data/hawq/segment/pivotalguru_1
Started gpfdist on port 9001
executing on kslave2 ./start_gpfdist.sh 9002 /data/hawq/segment/pivotalguru_2
Started gpfdist on port 9002
executing on kslave2 ./start_gpfdist.sh 9003 /data/hawq/segment/pivotalguru_3
Started gpfdist on port 9003
executing on kslave2 ./start_gpfdist.sh 9004 /data/hawq/segment/pivotalguru_4
Started gpfdist on port 9004
executing on kslave2 ./start_gpfdist.sh 9005 /data/hawq/segment/pivotalguru_5
Started gpfdist on port 9005
executing on kslave2 ./start_gpfdist.sh 9006 /data/hawq/segment/pivotalguru_6
Started gpfdist on port 9006
executing on kslave1 ./start_gpfdist.sh 9001 /data/hawq/segment/pivotalguru_1
Started gpfdist on port 9001
executing on kslave1 ./start_gpfdist.sh 9002 /data/hawq/segment/pivotalguru_2
Started gpfdist on port 9002
executing on kslave1 ./start_gpfdist.sh 9003 /data/hawq/segment/pivotalguru_3
Started gpfdist on port 9003
executing on kslave1 ./start_gpfdist.sh 9004 /data/hawq/segment/pivotalguru_4
Started gpfdist on port 9004
executing on kslave1 ./start_gpfdist.sh 9005 /data/hawq/segment/pivotalguru_5
Started gpfdist on port 9005
executing on kslave1 ./start_gpfdist.sh 9006 /data/hawq/segment/pivotalguru_6
Started gpfdist on port 9006
psql -v ON_ERROR_STOP=ON -f /pivotalguru/TPC-DS/04_load/051.insert.call_center.sql | grep INSERT | awk -F ' ' '{print $3}'
psql:/pivotalguru/TPC-DS/04_load/051.insert.call_center.sql:1: ERROR: External scan error: There are more external files (URLs) than primary segments that can read them (COptTasks.cpp:1756)

Should 02_init/rollout.sh set search path for ADMIN_USER?

It is running "ALTER USER $USER SET search_path=...", but I think it should be "ALTER USER $ADMIN_USER SET search_path..." as in fact the script is run by ADMIN_USER via "su -l $ADMIN_USER". This will cause that no queries got run when ADMIN_USER is not the login user.

execute on 04_load error

tpcds.log is :

psql -v ON_ERROR_STOP=ON -f /data0/servers/tpcds/TPC-DS/04_load/051.insert.call_center.sql | grep INSERT | awk -F ' ' '{print $3}'
psql:/data0/servers/tpcds/TPC-DS/04_load/051.insert.call_center.sql:1: ERROR:  Could not allocate enough memory! bucket number of result hash table and external table should match each other (cdbdatalocality.c:4225)

hawq-site.xml:

        <property>
                <name>hawq_rm_nvseg_perquery_limit</name>
                <value>512</value>
        </property>

        <property>
                <name>hawq_rm_nvseg_perquery_perseg_limit</name>
                <value>6</value>
        </property>

Thank you

what's the difference with score and qphds

When run tpcds.sh on our greenplum cluster, we get the result bellow:
/data/pivotalguru/TPC-DS/09_score/rollout.sh
Scale Factor 3000
Load 1472.137763
Analyze 170.170859
1 User Queries 74.69139
Concurrent Queries 3100.06968
Q 1485
TPT 373.45695
TTT 3100.06968
TLD 73.606888
Score 670

We are confused on score result. Because we don't know the score meaning, this result is good or bad?

For tpc-ds standard , it use QphDS to measure the cluster's performance. How can I convert score to qphds?

imp option - clarification needed

Does imp option performs test of Impala queries on HAWQ or it performs testing of Impala queries on Impala?

I've run:

./rollout.sh 1 false imp true 5 true true true true true true true true true 1

and see that it initiated tpcds workflow going against HAWQ. I was expecting it to run TPC-DS test against Impala. Please clarify.

I have figured this out: all 3 types of queries are executed against HAWQ, which makes sense. Closing. Thanks.

relation "pg_filespace_entry" does not exist

Greenplum version: PostgreSQL 9.4beta1 (Greenplum Database 6.0.0-alpha.0 build dev-oss) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 6.3.1 20170216 (Red Hat 6.3.1-3), 64-bit compiled on Nov 12 2018 03:35:52

this issue happened both in tpc-h and tpc-ds

请教问题

请问为什么会执行脚本的时候要求输出密码呢？
grammar_support.o keywords.o list.o nulls.o permute.o print.o QgenMain.o query_handler.o r_params.o scaling.o StringBuffer.o substitution.o tdefs.o text.o tokenizer.o w_inventory.o y.tab.o release.o scd.o build_support.o parallel.o -lm
gcc -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -DYYDEBUG -DLINUX -g -Wall -c -o checksum.o checksum.c
gcc -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -DYYDEBUG -DLINUX -g -Wall -o checksum checksum.o -lm
Password:
gsql: FATAL: Invalid username/password,login denied.

ERROR: http response code 404 from gpfdist

Hello
I am getting an error while running the script.
Help me figure it out
Sincerely, Sergey Berezin.

stop gpfdist on all ports
92962
killing 93075
killing 93128
93166
killing 93204
280453
killing 280505
killing280543
killing280581
killing280670
killing290935
killing291026
killing291069
killing291107
killing291145
executing on vitrina02 ./start_gpfdist.sh 4001 /data1/primary/gpseg0/pivotalguru
Started gpfdist on port 4001
executing on vitrina02 ./start_gpfdist.sh 4002 /data1/primary/gpseg1/pivotalguru
Started gpfdist on port 4002
executing on vitrina02 ./start_gpfdist.sh 4003 /data1/primary/gpseg2/pivotalguru
Started gpfdist on port 4003
executing on vitrina02 ./start_gpfdist.sh 4004 /data1/primary/gpseg3/pivotalguru
Started gpfdist on port 4004
executing on vitrina02 ./start_gpfdist.sh 4005 /data1/primary/gpseg4/pivotalguru
Started gpfdist on port 4005
executing on vitrina03 ./start_gpfdist.sh 4001 /data1/primary/gpseg5/pivotalguru
Started gpfdist on port 4001
executing on vitrina03 ./start_gpfdist.sh 4002 /data1/primary/gpseg6/pivotalguru
Started gpfdist on port 4002
executing on vitrina03 ./start_gpfdist.sh 4003 /data1/primary/gpseg7/pivotalguru
Started gpfdist on port 4003
executing on vitrina03 ./start_gpfdist.sh 4004 /data1/primary/gpseg8/pivotalguru
Started gpfdist on port 4004
executing on vitrina03 ./start_gpfdist.sh 4005 /data1/primary/gpseg9/pivotalguru
Started gpfdist on port 4005
executing on vitrina04 ./start_gpfdist.sh 4001 /data1/primary/gpseg10/pivotalguru
Started gpfdist on port 4001
executing on vitrina04 ./start_gpfdist.sh 4002 /data1/primary/gpseg11/pivotalguru
Started gpfdist on port 4002
executing on vitrina04 ./start_gpfdist.sh 4003 /data1/primary/gpseg12/pivotalguru
Started gpfdist on port 4003
executing on vitrina04 ./start_gpfdist.sh 4004 /data1/primary/gpseg13/pivotalguru
Started gpfdist on port 4004
executing on vitrina04 ./start_gpfdist.sh 4005 /data1/primary/gpseg14/pivotalguru
Started gpfdist on port 4005
psql -v ON_ERROR_STOP=1 -f /pivotalguru/TPC-DS/04_load/001.gpdb.time_dim.sql | grep INSERT | awk -F ' ' '{print $3}'
psql:/pivotalguru/TPC-DS/04_load/001.gpdb.time_dim.sql:1: ERROR: http response code 404 from gpfdist (gpfdist://vitrina03:4002/time_dim_[0-9]_[0-9 ].dat): HTTP/1.0 404 file not found (seg6 slice1 192.168.11.24:10001 pid=97255)

Sharing TPC-DS test results of HAWQ & SparkSQL

@pivotalguru

Hi Jon,

I ran TPC-DS Benchmark for Hawq & SparkSQL on Hadoop. Could you please help me verify the numbers?

Ref:
https://github.com/pivotalguru/TPC-DS
https://github.com/databricks/spark-sql-perf

A sub-set of 19 queries were executed to evaluate the performance of Hawq & Spark-SQL.

I have attached the system setup and the results of the tests. Please share your comments.
hawq-vs-spark-verify.pdf

Thanks,
Tania

Canceling query because of high VMEM usage.

Hi.
I met Error during the test of TPC-DS and can't continue now:

ERROR:  Canceling query because of high VMEM usage. Used: 4613MB, available 507MB, red zone: 4608MB (runaway_cleaner.c:189)  (seg19 slice5 192.168.65.11:6007 pid=55634) (cdbdisp.c:254)

I tried to modify a lot of parameters but still have no effect, e.g:

gp_vmem_protect_limit：5120
shared_buffers：128MB
work_mem：32MB
maintenance_work_mem：64MB
effective_cache_size：512MB

I set gp_vmem_protect_limit to 8192, 12288, 15360 by gpconfig but still not work.

Env

greenplum version: 5.1.8
operating system: CentOS Linux release 7.3.1611 (Core)

There are 3 nodes in my Greenplum cluster.

node6: 256G mem, 12 * 3.7T disk
node7: 256G mem, 12 * 3.7T disk
node11: 128G mem, 12 * 3.7T disk

[gpadmin@node6 TPC-DS]$ df -h
Filesystem           Size  Used Avail Use% Mounted on
/dev/mapper/cl-root   50G  7.3G   43G  15% /
devtmpfs             126G     0  126G   0% /dev
tmpfs                126G     0  126G   0% /dev/shm
tmpfs                126G  9.4M  126G   1% /run
tmpfs                126G     0  126G   0% /sys/fs/cgroup
/dev/sdc1           1014M  140M  875M  14% /boot
/dev/sdd             3.7T  307G  3.4T   9% /data/data9
/dev/sdl1            3.7T  307G  3.4T   9% /data/data6
/dev/sdj             3.7T  307G  3.4T   9% /data/data11
/dev/sdg1            3.7T  308G  3.4T   9% /data/data2
/dev/sdk1            3.7T  307G  3.4T   9% /data/data5
/dev/sdf             3.7T  307G  3.4T   9% /data/data10
/dev/sdi1            3.7T  307G  3.4T   9% /data/data4
/dev/sdm             3.7T  306G  3.4T   9% /data/data12
/dev/sdb             3.7T  307G  3.4T   9% /data/data8
/dev/sde1            3.7T  2.1T  1.6T  58% /data/data1
/dev/sda             3.7T  307G  3.4T   9% /data/data7
/dev/sdh1            3.7T  307G  3.4T   9% /data/data3
/dev/mapper/cl-home  392G   29G  364G   8% /home
tmpfs                 26G     0   26G   0% /run/user/0
tmpfs                 26G     0   26G   0% /run/user/1000

node6:
              total        used        free      shared  buff/cache   available
Mem:           251G        2.0G        243G        5.3G        5.5G        243G
Swap:          4.0G          0B        4.0G

node7:
              total        used        free      shared  buff/cache   available
Mem:           251G        2.0G        243G        5.3G        5.5G        243G
Swap:          2.2G          0B        2.2G

node11:
              total        used        free      shared  buff/cache   available
Mem:           125G        1.3G        118G        5.1G        5.3G        118G
Swap:          2.2G          0B        2.2G

tpcds_variables.sh

REPO="TPC-DS"
REPO_URL="https://github.com/pivotalguru/TPC-DS"
ADMIN_USER="gpadmin"
INSTALL_DIR="/pivotalguru"
EXPLAIN_ANALYZE="false"
RANDOM_DISTRIBUTION="false"
MULTI_USER_COUNT="5"
GEN_DATA_SCALE="1024"
SINGLE_USER_ITERATIONS="1"
RUN_COMPILE_TPCDS="false"
RUN_GEN_DATA="true"
RUN_INIT="true"
RUN_DDL="true"
RUN_LOAD="true"
RUN_SQL="true"
RUN_SINGLE_USER_REPORT="true"
RUN_MULTI_USER="true"
RUN_MULTI_USER_REPORT="true"
RUN_SCORE="true"

Error log

tail tpcds.log

20190802:18:49:05:011726 analyzedb:node6:gpadmin-[INFO]:-Done.
/pivotalguru/TPC-DS/05_sql/rollout.sh
source ~/.bashrc

psql -v ON_ERROR_STOP=1 -A -q -t -P pager=off -v EXPLAIN_ANALYZE="" -f /pivotalguru/TPC-DS/05_sql/101.tpcds.01.sql | wc -l
psql -v ON_ERROR_STOP=1 -A -q -t -P pager=off -v EXPLAIN_ANALYZE="" -f /pivotalguru/TPC-DS/05_sql/102.tpcds.02.sql | wc -l
psql:/pivotalguru/TPC-DS/05_sql/102.tpcds.02.sql:59: ERROR:  Canceling query because of high VMEM usage. Used: 4610MB, available 510MB, red zone: 4608MB (runaway_cleaner.c:189)  (seg32 slice2 192.168.65.7:6008 pid=24111) (cdbdisp.c:254)

note:
the 59th line of 102.tpcds.02.sql is :

order by d_week_seq1;

Observation

I observed memory changes during execute 05_sql/102.tpcds.02.sql separately in psql, here are three stage:

In the beginning, the free memory of all nodes drops at approximately 1G.
- node6: 239G -> 120G
- node7: 239G -> 120G
- node11: 111G -> 320M
When node11 drops to 320M, it starts to stay the same, and the other two nodes continue to decline
- node6: 120G -> 90G
- node7: 120G -> 90G
- node11: keep at 320M
node11 occurs Error
- if set gp_vmem_protect_limit to 8192, 12288, its error message like this:

ERROR:  Canceling query because of high VMEM usage. Used: 4613MB, available 507MB, red zone: 4608MB (runaway_cleaner.c:189)  (seg19 slice5 192.168.65.11:6007 pid=55634) (cdbdisp.c:254)

if set gp_vmem_protect_limit to 15360, its error message is "out of memory, cannot allocat ....8192...":

please help

I also tried to modify the resource queue parameters, GEN_DATA_SCALE, but still can not be completed.

What is the difference between data overflowing into the spill file and data in vmem? When will it overflow and when will it be cancelled?

I don't know how to finish the TPC-DS test. please help. :simple_smile:

ERROR: could not open file "../log/rollout_gen_data.log" for reading: No such file or directory

Whatever we did to surpass this problem gave no results so far. Help needed...

This week we've been trying to get some scoring on our Greenplum 6.4.0 test cluster, faced a few challenges with memory and disk space, that was expected... but this is the error that stopped any further progress:

psql -v ON_ERROR_STOP=1 -a -f /pivotalguru/TPC-DS/06_single_user_reports/07.copy.gen_data.sql -v LOGFILE="'/pivotalguru/TPC-DS/06_single_user_reports/../log/rollout_gen_data.log'"
COPY tpcds_reports.gen_data FROM :LOGFILE WITH DELIMITER '|';
psql:/pivotalguru/TPC-DS/06_single_user_reports/07.copy.gen_data.sql:1: ERROR:  could not open file "/pivotalguru/TPC-DS/06_single_user_reports/../log/**rollout_gen_data.log**" for reading: No such file or directory

I tried all sorts of options in tpcds_variables.sh, but it seems that apart from EXPLAIN_ANALYZE logging any other logs don't work as expected. If this is based on where Greenplum stores its own log files -- that changed quite a bit from version 5.x to 6.x

Has anyone ever confirmed this to work with 6.x or this is specific to our environment? Greenplum OSS was deployed with go-gpdb installer. To provide expected env variables I had to create greenplum_path.sh and source it in .bashrc for tpcds.sh to start correctly (go-gpdb originally creates a different env file and sources it in /etc/profile.d/gpdb.profile.sh)

My greenplum_path.sh:

export GPHOME=/usr/local/greenplum-db-6.4.0
export PYTHONPATH=$GPHOME/lib/python
export PYTHONHOME=$GPHOME/ext/python
export PATH=$GPHOME/bin:$PYTHONHOME/bin:$PATH
export LD_LIBRARY_PATH=$GPHOME/lib:$PYTHONHOME/lib:$LD_LIBRARY_PATH
export OPENSSL_CONF=$GPHOME/etc/openssl.cnf
export MASTER_DATA_DIRECTORY=/data/master/gp_6.4.0_20200320124955-1
export PGPORT=3002
export PGDATABASE=gpadmin
export singleOrMulti=multi

Can not execute tpcds.sh in offline environments

Hi,

We try to execute TPC-DS tests in offline GPDB clusters but I stuck at external network check GitHub access check.

I guess it's caused by script_check() in tpcds.sh script.

It would be great if someone can improve TPC-DS scripts to make it workable to run in offline environments smoothly.

Thanks.

Setting RUN_COMPILE_TPCDS="false" does not disable compiling

Hi,

I find that setting RUN_COMPILE_TPCDS=false does not disable compiling.

I guess that 00_compile_tpcds/rollout.sh does not check the variable.

This is a bit inconvenient and hope this can be improved.

Thanks.

Session report not avaialbe

I've run a very small test (just to confirm everything works). I see this in the terminal output (99 rows):

SET
Timing is on.
query_id        session_1       session_2       session_3       session_4       session_5       session_6       session_7       session_8     session_9        session_10
01      1.1033  1.1025  1.1017  1.1031  0.965   0       0       0       0       0
02      47.47847        16.16351        3.3238  3.3301  3.3333  0       0       0       0       0
03      0.859   1.147   1.1487  1.1455  1.1648  0       0       0       0       0
04      4.4209  7.7177  7.7695  5.5675  6.6685  0       0       0       0       0
05      3.3159  3.3548  3.3931  3.33    3.3234  0       0       0       0       0
06      1.1655  1.1672  1.1722  1.1707  1.1647  0       0       0       0       0
...
...
98      0.736   0.763   0.733   0.722   0.772   0       0       0       0       0
99      2.2103  2.2241  2.2124  2.2235  2.2242  0       0       0       0       0
(99 rows)
Time: 9.187 ms

Is this query execution time in seconds for each session? Is there a way to obtain this report in the log directory? I can't find this report in there.

Selected scale factor is NOT valid && Connection timed out

Hello. I am getting some errors, described below. And because of this, the script is not finalized. Please help me to sort out the problem.

psql:/date/TPC-DS/05_sql/171.tpcds.71.sql:40: ERROR: failed to acquire resources on one or more segments
DETAIL: could not connect to server: Connection timed out
Is the server running on host "10.3.11.38" and accepting
TCP/IP connections on port 10029?
(seg77 10.3.11.38:10029)

psql -v ON_ERROR_STOP=0 -A -q -t -P pager=off -v EXPLAIN_ANALYZE="" -f /date/TPC-DS/05_sql/172.tpcds.72.sql | wc -l
psql:/date/TPC-DS/05_sql/172.tpcds.72.sql:28: ERROR: failed to acquire resources on one or more segments
DETAIL: could not connect to server: Connection timed out
Is the server running on host "10.3.11.38" and accepting
TCP/IP connections on port 10006?
(seg54 10.3.11.38:10006)

Copyright Transaction Processing Performance Council (TPC) 2001 - 2015
Warning: Selected scale factor is NOT valid for result publication
stream_number: 1
mv /date1/TPC-DS/07_multi_user/query_0.sql /date1/TPC-DS/07_multi_user/1/
stream_number: 2
mv /date1/TPC-DS/07_multi_user/query_1.sql /date1/TPC-DS/07_multi_user/2/
stream_number: 3
mv /date1/TPC-DS/07_multi_user/query_2.sql /date1/TPC-DS/07_multi_user/3/
stream_number: 4
mv /date1/TPC-DS/07_multi_user/query_3.sql /date1/TPC-DS/07_multi_user/4/
stream_number: 5
mv /date1/TPC-DS/07_multi_user/query_4.sql /date1/TPC-DS/07_multi_user/5/
/date1/TPC-DS/07_multi_user/test.sh 5000 1 false
/date1/TPC-DS/07_multi_user/test.sh 5000 2 false
/date1/TPC-DS/07_multi_user/test.sh 5000 3 false
/date1/TPC-DS/07_multi_user/test.sh 5000 4 false
/date1/TPC-DS/07_multi_user/test.sh 5000 5 false

Unable to load more than 50GB data in hdfs through tcpds script

Hi,
I am running hawq tpc-ds benchmarks. I have a cluster with 1 master(512 GB RAM) and 6 segments(130GB RAM/segment). I am using tpcds.sh script to load data in hdfs. I am able to run the tests successfully till 50 GB. But beyond that I am getting the many errors in the segmentdd/pg_logs. Attached is the log.

_"3rd party error log:
2017-02-21 13:39:21.899176, p205200, th139795568425216, ERROR Pipeline: Failed to build pipeline for block [block pool ID: BP-1649228503-10.200.6.49-1487239694963 block ID 1073832818_272724] file /hawq_default/16385/16508/18823/11, new generation stamp is 0,
Pipeline.cpp: 240: HdfsIOException: Failed to add new datanode into pipeline for block: [block pool ID: BP-1649228503-10.200.6.49-1487239694963 block ID 1073832818_272724] file /hawq_default/16385/16508/18823/11, set ""output.replace-datanode-on-failure"" to ""false"" to disable this feature.
@ Hdfs::Internal::PipelineImpl::
hawq-2017-02-21_130521.txt****

buildForAppendOrRecovery(bool)
@ Hdfs::Internal::PipelineImpl::send(std::shared_ptrHdfs::Internal::Packet)
@ Hdfs::Internal::OutputStreamImpl::sendPacket(std::shared_ptrHdfs::Internal::Packet)
@ Hdfs::Internal::OutputStreamImpl::appendInternal(char const*, long)
@ Hdfs::Internal::OutputStreamImpl::append(char const*, long)
@ hdfsWrite**
--------------------------------------------------_**
"3rd party error log:
2017-02-21 13:39:21.977826, p205198, th139795568425216, ERROR Failed to flush pipeline on datanode ILDSS8(192.168.4.12) for block [block pool ID: BP-1649228503-10.200.6.49-1487239694963 block ID 1073832864_273847] file /hawq_default/16385/16508/18775/10.
TcpSocket.cpp: 69: HdfsNetworkException: Read 8 bytes failed from ""192.168.4.12:50010"": (errno: 104) Connection reset by peer
@ Hdfs::Internal::TcpSocketImpl::read(char*, int)
@ Hdfs::Internal::BufferedSocketReaderImpl::readVarint32(int, int)
@ Hdfs::Internal::PipelineImpl::processResponse()
@ Hdfs::Internal::PipelineImpl::checkResponse(bool)
@ Hdfs::Internal::PipelineImpl::waitForAcks(bool)
@ Hdfs::Internal::OutputStreamImpl::flushInternal(bool)
@ Hdfs::Internal::OutputStreamImpl::sync()

Do I need to set any parameter in hadoop or tpcds configuration files, or my system parameters need tuning? Not sure if I am missing anything. Please help.

Thanks,
Tania

hawq-2017-02-21_130521.txt

Very poor HDFS throughput

I am observing very low HDFS throughput on queries on 9 nodes cluster, 100GB test run:
Observations:

test run as: ./rollout.sh 100 false tpcds true 5 true true true true true true true true true 1
~90MB/sec for read and writes clusterwide. I've seen 1.9GB/sec during dataload and table analyze phase.
72 postgres processes on each datanode and they consume 80% - 90% of CPU each and 0% MEM, doing very little I/O.
35036 files on HDFS with 2MB size per each file. Is this normal? Our block size is 128MB
Top example on one of the nodes:

Cpu(s):  0.0%us, 77.4%sy,  0.9%ni, 21.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  264403536k total, 172780840k used, 91622696k free,  2986328k buffers
Swap:  4194300k total,        0k used,  4194300k free, 155959360k cached

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
486160 gpadmin   39  19  872m  28m  10m D 88.7  0.0   4:32.52 postgres
486926 gpadmin   39  19  872m  28m  10m R 87.4  0.0   4:34.75 postgres
486405 gpadmin   39  19  872m  28m  10m R 86.4  0.0   4:23.99 postgres
487162 gpadmin   39  19  872m  28m  10m R 80.4  0.0   4:30.14 postgres
486761 gpadmin   39  19  872m  28m  10m R 78.8  0.0   4:28.41 postgres
486256 gpadmin   39  19  872m  28m  10m D 76.5  0.0   4:30.63 postgres

Action plan:

Revisit memory configuration for HAWQ
Look at explain plan for queries
Look at vsegs (what exactly?, as I am not so hands on on GP/Hawq)
Where can I start looking to troubleshoot this and what could be the problem here?

hawq_rm_nvseg_perquery_perseg_limit clarification

This is not an issue but a question:
hawq_rm_nvseg_perquery_perseg_limit parameter what is the guideline to set this parameter? I have read documentation on it but don't fully understand implications. For example if I have 40 CPU datanode should I set this parameter to a much higher value than default 6, say 12, 24, 32?

Could not allocate enough memory

I got this error when running rollout.sh:

Started gpfdist on port 9005
executing on hadoop01.home.net ./start_gpfdist.sh 9006 /data/hawq/segmentdd/pivotalguru_6
Started gpfdist on port 9006
psql -v ON_ERROR_STOP=ON -f /home/hawq/TPC-DS/04_load/051.insert.call_center.sql | grep INSERT | awk -F ' ' '{print $3}'
psql:/home/hawq/TPC-DS/04_load/051.insert.call_center.sql:1: ERROR:  Could not allocate enough memory! bucket number of result hash table and external table should match each other (cdbdatalocality.c:4203)

I am using latest hawq repository code. CentOS 7, 8 node CDH Hadoop distribution. Everything seems to run just fine, including Hawq itself.
Please advise.

Generating data takes long time

I am running on RHEL 6.5 (Note 1GB dataset)

./rollout.sh 1 false tpcds false 5 true true true true true true true true true 1

And it already took 5 hours:

Now generating data.  This make take a while.
Generating data......................................................................................................................................................

Where I can check for logs and what maybe wrong?

Changes in Postgresql.conf causing to Stop Greenplum

Have run this test successfully on Greenplum 5.16, however on Greenplum 4.3, because of bug, it is causing Greenplum startup failure, with below message:
2019-04-04 07:23:51.931038 GMT,,,p28056,th845141952,,,,0,,,seg-1,,,,,"FATAL","22023","ORCA is not supported by this build",,,,,,,,"assign_optimizer","guc_gp.c",5957,

On checking found that script(02_init/rollout.sh), inserts multiple records in postgresql.conf.
And below inserted line, causes Greenplum startup failure:
optimizer=on

This is done through below code:
/pivotalguru/TPC-DS/02_init/rollout.sh(L47)
psql -v ON_ERROR_STOP=1 -q -t -A -c "show optimizer" | grep -i "on" | wc -l; exit ${PIPESTATUS[0

Creating socket failed during dataload

All 6 gpfdist logs show this message:

[2017-01-10 11:50:45] [WRN /tmp/build/78017950/hdb_apache/src/bin/gpfdist/src/gpfdist/gpfdist.c:2068] Creating the socket failed

Serving HTTP on port 9006, directory /opt/gpadmin/segmentdd/pivotalguru_6
[2017-01-10 11:54:14] [WRN /tmp/build/78017950/hdb_apache/src/bin/gpfdist/src/gpfdist/gpfdist.c:2187] signal 15 received. gpfdist exits

Is this error critical? The data seem to loaded fine.

Thanks.

Insufficient memory reserved for statement

When I run the script, the 4th query ( 104.tpcds.04.sql ) throws me the following error:

psql: TPC-DS/05_sql/104.tpcds.04.sql:115: ERROR: insufficient memory reserved for statement
HINT: Increase statement memory or reduce the number of Parquet tables to be scanned.

runningjon / tpc-ds Goto Github PK

tpc-ds's Introduction

tpc-ds's People

Contributors

Stargazers

Watchers

Forkers

tpc-ds's Issues

Env

Error log

Observation

please help

Recommend Projects

Recommend Topics

Recommend Org