runningjon / tpc-ds
Greenplum TPC-DS benchmark
########################################################################################
TPC-DS benchmark scripts for Greenplum database.
########################################################################################

Supported versions:
Greenplum 4.3, 5.*, 6.*
Open Source Greenplum 5.*, 6.*
Beta: PostgreSQL 10.*

########################################################################################
TPC-DS Information
########################################################################################
Version 2.x now uses TPC-DS version 2.1 for the standard queries.
Version 2.2.x now supports Greenplum version 5.x.

########################################################################################
Query Options
########################################################################################
You can have the queries execute with "EXPLAIN ANALYZE" in order to see exactly the query plan used, the cost, the memory used, etc. This is done in tpcds_variables.sh like this:
EXPLAIN_ANALYZE="true"

########################################################################################
Storage Options
########################################################################################
Table storage is defined in functions.sh and is configured for optimal performance.

########################################################################################
Prerequisites
########################################################################################
1. Greenplum Database or PostgreSQL 10.x
2. Connectivity is possible to the MASTER_HOST and from the Data Nodes / Segment Hosts
3. Root access

########################################################################################
Installation
########################################################################################
1. ssh to the master host with root:
ssh gpadmin@mdw; sudo bash
2. Download the tpcds.sh file:
curl https://raw.githubusercontent.com/pivotalguru/TPC-DS/master/tpcds.sh > tpcds.sh
chmod 755 tpcds.sh

########################################################################################
Variables and Configuration
########################################################################################
By default, the installation will create the scripts in /pivotalguru/TPC-DS on the Master host. This can be changed by editing the dynamically configured tpcds_variables.sh file that is created the first time tpcds.sh is run.
Also by default, TPC-DS files are generated on each Segment Host / Data Node in the Segment's PGDATA/pivotalguru directory. If there isn't enough space in this directory on each Segment, you can create a symbolic link to a drive location that does have enough space.

########################################################################################
Execution
########################################################################################
1. Execute tpcds.sh:
nohup ./tpcds.sh > tpcds.log 2>&1 < tpcds.log &

########################################################################################
Notes
########################################################################################
- The tpcds_variables.sh file will be created with variables you can adjust.
- Files for the benchmark will be created in a sub-directory named pivotalguru located in each segment directory on each segment host / data node. You can update these directories to be symbolic links to better utilize the disk volumes you have available (a relocation sketch follows the Minor Modifications section below).
- For an example of running tpcds.sh as root as a background process, see the Execution section above.

########################################################################################
TPC-DS Minor Modifications
########################################################################################
1. SQL queries that subtracted or added days were modified slightly:
Old: and (cast('2000-02-28' as date) + 30 days)
New: and (cast('2000-02-28' as date) + '30 days'::interval)
This was done on queries: 5, 12, 16, 20, 21, 32, 37, 40, 77, 80, 82, 92, 94, 95, and 98.

2. Queries with an ORDER BY on a column alias were changed to use a sub-select.
Old:
select sum(ss_net_profit) as total_sum
,s_state
,s_county
,grouping(s_state)+grouping(s_county) as lochierarchy
,rank() over (
partition by grouping(s_state)+grouping(s_county),
case when grouping(s_county) = 0 then s_state end
order by sum(ss_net_profit) desc) as rank_within_parent
from store_sales
,date_dim d1
,store
where d1.d_month_seq between 1212 and 1212+11
and d1.d_date_sk = ss_sold_date_sk
and s_store_sk = ss_store_sk
and s_state in (
select s_state
from (select s_state as s_state,
rank() over ( partition by s_state order by sum(ss_net_profit) desc) as ranking
from store_sales, store, date_dim
where d_month_seq between 1212 and 1212+11
and d_date_sk = ss_sold_date_sk
and s_store_sk = ss_store_sk
group by s_state
) tmp1
where ranking <= 5
)
group by rollup(s_state,s_county)
order by lochierarchy desc
,case when lochierarchy = 0 then s_state end
,rank_within_parent
limit 100;

New:
select * from ( --new
select sum(ss_net_profit) as total_sum
,s_state
,s_county
,grouping(s_state)+grouping(s_county) as lochierarchy
,rank() over (
partition by grouping(s_state)+grouping(s_county),
case when grouping(s_county) = 0 then s_state end
order by sum(ss_net_profit) desc) as rank_within_parent
from store_sales
,date_dim d1
,store
where d1.d_month_seq between 1212 and 1212+11
and d1.d_date_sk = ss_sold_date_sk
and s_store_sk = ss_store_sk
and s_state in (
select s_state
from (select s_state as s_state,
rank() over ( partition by s_state order by sum(ss_net_profit) desc) as ranking
from store_sales, store, date_dim
where d_month_seq between 1212 and 1212+11
and d_date_sk = ss_sold_date_sk
and s_store_sk = ss_store_sk
group by s_state
) tmp1
where ranking <= 5
)
group by rollup(s_state,s_county)
) AS sub --new
order by lochierarchy desc
,case when lochierarchy = 0 then s_state end
,rank_within_parent
limit 100;

This was done on queries: 36 and 70.

3. Query templates were modified to exclude columns not found in the query. In these cases, the common table expression used aliased columns but the dynamic filters included both the alias name as well as the original name. Referencing the original column name instead of the alias causes the query parser to not find the column. This was done on query 86.

4. Added table aliases. This was done on queries: 2, 14, and 23.

5. Added "limit 100" to queries with very large result sets. For the larger tests (e.g. 15TB), a few of the TPC-DS queries can output a very large number of rows which are just discarded. This was done on queries: 64, 34, and 71.
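As noted above, the per-segment pivotalguru data directories can be symbolic links when the default location is short on space. A minimal sketch of relocating one of them, run on the segment host before generating data; both paths are purely illustrative:

mkdir -p /bigdisk/pivotalguru_gpseg0                          # target directory on the larger volume
rm -rf /data1/primary/gpseg0/pivotalguru                      # remove the default directory if it already exists
ln -s /bigdisk/pivotalguru_gpseg0 /data1/primary/gpseg0/pivotalguru
chown -h gpadmin:gpadmin /data1/primary/gpseg0/pivotalguru    # keep ownership consistent with the segment directory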
I seem to be experiencing the same issue described in issue #8. The generate data step seems to hang, and the same .dat files listed in that issue are empty for me as well. I have confirmed connectivity to all the nodes using gpssh-exkeys, but I am not able to follow what exactly the fix from @dimon777 was. Is there any guidance or additional info available?
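For reference, a quick check that could be run against the segment hosts to see whether generation is still active (the host names match this cluster, and dsdgen is the TPC-DS data generator the kit compiles):

gpssh -h kslave1 -h kslave2 -e "ps -ef | grep dsdgen | grep -v grep"
gpssh -h kslave1 -h kslave2 -e "ls -lh /data/hawq/segment/pivotalguru_*/*.dat"

If dsdgen is no longer running but the .dat files are still empty, the generation step most likely failed rather than hung.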
executing on kslave2 ./start_gpfdist.sh 9001 /data/hawq/segment/pivotalguru_1
Started gpfdist on port 9001
executing on kslave2 ./start_gpfdist.sh 9002 /data/hawq/segment/pivotalguru_2
Started gpfdist on port 9002
executing on kslave2 ./start_gpfdist.sh 9003 /data/hawq/segment/pivotalguru_3
Started gpfdist on port 9003
executing on kslave2 ./start_gpfdist.sh 9004 /data/hawq/segment/pivotalguru_4
Started gpfdist on port 9004
executing on kslave2 ./start_gpfdist.sh 9005 /data/hawq/segment/pivotalguru_5
Started gpfdist on port 9005
executing on kslave2 ./start_gpfdist.sh 9006 /data/hawq/segment/pivotalguru_6
Started gpfdist on port 9006
executing on kslave1 ./start_gpfdist.sh 9001 /data/hawq/segment/pivotalguru_1
Started gpfdist on port 9001
executing on kslave1 ./start_gpfdist.sh 9002 /data/hawq/segment/pivotalguru_2
Started gpfdist on port 9002
executing on kslave1 ./start_gpfdist.sh 9003 /data/hawq/segment/pivotalguru_3
Started gpfdist on port 9003
executing on kslave1 ./start_gpfdist.sh 9004 /data/hawq/segment/pivotalguru_4
Started gpfdist on port 9004
executing on kslave1 ./start_gpfdist.sh 9005 /data/hawq/segment/pivotalguru_5
Started gpfdist on port 9005
executing on kslave1 ./start_gpfdist.sh 9006 /data/hawq/segment/pivotalguru_6
Started gpfdist on port 9006
psql -v ON_ERROR_STOP=ON -f /pivotalguru/TPC-DS/04_load/051.insert.call_center.sql | grep INSERT | awk -F ' ' '{print $3}'
psql:/pivotalguru/TPC-DS/04_load/051.insert.call_center.sql:1: ERROR: External scan error: There are more external files (URLs) than primary segments that can read them (COptTasks.cpp:1756)
It is running "ALTER USER $USER SET search_path=...", but I think it should be "ALTER USER $ADMIN_USER SET search_path=..." since the script is actually run as ADMIN_USER via "su -l $ADMIN_USER". As a result, no queries get run when ADMIN_USER is not the login user.
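A sketch of the change I mean (the search_path value is whatever the script already sets; "tpcds" is only a placeholder schema name here):

Old: psql -c "ALTER USER $USER SET search_path=tpcds,public;"
New: psql -c "ALTER USER $ADMIN_USER SET search_path=tpcds,public;"

The second form applies the setting to the user that actually executes the queries.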
tpcds.log shows:
psql -v ON_ERROR_STOP=ON -f /data0/servers/tpcds/TPC-DS/04_load/051.insert.call_center.sql | grep INSERT | awk -F ' ' '{print $3}'
psql:/data0/servers/tpcds/TPC-DS/04_load/051.insert.call_center.sql:1: ERROR: Could not allocate enough memory! bucket number of result hash table and external table should match each other (cdbdatalocality.c:4225)
hawq-site.xml:
<property>
<name>hawq_rm_nvseg_perquery_limit</name>
<value>512</value>
</property>
<property>
<name>hawq_rm_nvseg_perquery_perseg_limit</name>
<value>6</value>
</property>
Thank you
When running tpcds.sh on our Greenplum cluster, we get the result below:
/data/pivotalguru/TPC-DS/09_score/rollout.sh
Scale Factor 3000
Load 1472.137763
Analyze 170.170859
1 User Queries 74.69139
Concurrent Queries 3100.06968
Q 1485
TPT 373.45695
TTT 3100.06968
TLD 73.606888
Score 670
We are confused by the score result. We don't know what the score means, so we can't tell whether this result is good or bad.
The TPC-DS standard uses QphDS to measure a cluster's performance. How can I convert the score to QphDS?
Does the imp option run the Impala queries against HAWQ, or does it test the Impala queries on Impala?
I've run:
./rollout.sh 1 false imp true 5 true true true true true true true true true 1
and saw that it initiated the TPC-DS workflow against HAWQ. I was expecting it to run the TPC-DS test against Impala. Please clarify.
I have figured this out: all 3 types of queries are executed against HAWQ, which makes sense. Closing. Thanks.
Greenplum version: PostgreSQL 9.4beta1 (Greenplum Database 6.0.0-alpha.0 build dev-oss) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 6.3.1 20170216 (Red Hat 6.3.1-3), 64-bit compiled on Nov 12 2018 03:35:52
This issue happened in both TPC-H and TPC-DS.
Why does the script ask for a password when it is executed?
grammar_support.o keywords.o list.o nulls.o permute.o print.o QgenMain.o query_handler.o r_params.o scaling.o StringBuffer.o substitution.o tdefs.o text.o tokenizer.o w_inventory.o y.tab.o release.o scd.o build_support.o parallel.o -lm
gcc -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -DYYDEBUG -DLINUX -g -Wall -c -o checksum.o checksum.c
gcc -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -DYYDEBUG -DLINUX -g -Wall -o checksum checksum.o -lm
Password:
gsql: FATAL: Invalid username/password,login denied.
Hello
I am getting an error while running the script.
Please help me figure it out.
Sincerely, Sergey Berezin.
stop gpfdist on all ports
92962
killing 93075
killing 93128
93166
killing 93204
280453
killing 280505
killing 280543
killing 280581
killing 280670
killing 290935
killing 291026
killing 291069
killing 291107
killing 291145
executing on vitrina02 ./start_gpfdist.sh 4001 /data1/primary/gpseg0/pivotalguru
Started gpfdist on port 4001
executing on vitrina02 ./start_gpfdist.sh 4002 /data1/primary/gpseg1/pivotalguru
Started gpfdist on port 4002
executing on vitrina02 ./start_gpfdist.sh 4003 /data1/primary/gpseg2/pivotalguru
Started gpfdist on port 4003
executing on vitrina02 ./start_gpfdist.sh 4004 /data1/primary/gpseg3/pivotalguru
Started gpfdist on port 4004
executing on vitrina02 ./start_gpfdist.sh 4005 /data1/primary/gpseg4/pivotalguru
Started gpfdist on port 4005
executing on vitrina03 ./start_gpfdist.sh 4001 /data1/primary/gpseg5/pivotalguru
Started gpfdist on port 4001
executing on vitrina03 ./start_gpfdist.sh 4002 /data1/primary/gpseg6/pivotalguru
Started gpfdist on port 4002
executing on vitrina03 ./start_gpfdist.sh 4003 /data1/primary/gpseg7/pivotalguru
Started gpfdist on port 4003
executing on vitrina03 ./start_gpfdist.sh 4004 /data1/primary/gpseg8/pivotalguru
Started gpfdist on port 4004
executing on vitrina03 ./start_gpfdist.sh 4005 /data1/primary/gpseg9/pivotalguru
Started gpfdist on port 4005
executing on vitrina04 ./start_gpfdist.sh 4001 /data1/primary/gpseg10/pivotalguru
Started gpfdist on port 4001
executing on vitrina04 ./start_gpfdist.sh 4002 /data1/primary/gpseg11/pivotalguru
Started gpfdist on port 4002
executing on vitrina04 ./start_gpfdist.sh 4003 /data1/primary/gpseg12/pivotalguru
Started gpfdist on port 4003
executing on vitrina04 ./start_gpfdist.sh 4004 /data1/primary/gpseg13/pivotalguru
Started gpfdist on port 4004
executing on vitrina04 ./start_gpfdist.sh 4005 /data1/primary/gpseg14/pivotalguru
Started gpfdist on port 4005
psql -v ON_ERROR_STOP=1 -f /pivotalguru/TPC-DS/04_load/001.gpdb.time_dim.sql | grep INSERT | awk -F ' ' '{print $3}'
psql:/pivotalguru/TPC-DS/04_load/001.gpdb.time_dim.sql:1: ERROR: http response code 404 from gpfdist (gpfdist://vitrina03:4002/time_dim_[0-9]_[0-9 ].dat): HTTP/1.0 404 file not found (seg6 slice1 192.168.11.24:10001 pid=97255)
@pivotalguru
Hi Jon,
I ran TPC-DS Benchmark for Hawq & SparkSQL on Hadoop. Could you please help me verify the numbers?
Ref:
https://github.com/pivotalguru/TPC-DS
https://github.com/databricks/spark-sql-perf
A subset of 19 queries was executed to evaluate the performance of HAWQ and Spark SQL.
I have attached the system setup and the results of the tests. Please share your comments.
hawq-vs-spark-verify.pdf
Thanks,
Tania
Hi.
I hit an error during the TPC-DS test and cannot continue:
ERROR: Canceling query because of high VMEM usage. Used: 4613MB, available 507MB, red zone: 4608MB (runaway_cleaner.c:189) (seg19 slice5 192.168.65.11:6007 pid=55634) (cdbdisp.c:254)
I tried modifying a number of parameters, but none had any effect. For example:
I set gp_vmem_protect_limit to 8192, 12288, and 15360 with gpconfig, but it still did not work.
There are 3 nodes in my Greenplum cluster.
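For reference, this is roughly how each change was applied (gp_vmem_protect_limit only takes effect after a restart; the value shown is one of the ones I tried):

gpconfig -c gp_vmem_protect_limit -v 8192
gpstop -ar                              # restart so the new limit is picked up
gpconfig -s gp_vmem_protect_limit       # confirm the value on master and segments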
[gpadmin@node6 TPC-DS]$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/cl-root 50G 7.3G 43G 15% /
devtmpfs 126G 0 126G 0% /dev
tmpfs 126G 0 126G 0% /dev/shm
tmpfs 126G 9.4M 126G 1% /run
tmpfs 126G 0 126G 0% /sys/fs/cgroup
/dev/sdc1 1014M 140M 875M 14% /boot
/dev/sdd 3.7T 307G 3.4T 9% /data/data9
/dev/sdl1 3.7T 307G 3.4T 9% /data/data6
/dev/sdj 3.7T 307G 3.4T 9% /data/data11
/dev/sdg1 3.7T 308G 3.4T 9% /data/data2
/dev/sdk1 3.7T 307G 3.4T 9% /data/data5
/dev/sdf 3.7T 307G 3.4T 9% /data/data10
/dev/sdi1 3.7T 307G 3.4T 9% /data/data4
/dev/sdm 3.7T 306G 3.4T 9% /data/data12
/dev/sdb 3.7T 307G 3.4T 9% /data/data8
/dev/sde1 3.7T 2.1T 1.6T 58% /data/data1
/dev/sda 3.7T 307G 3.4T 9% /data/data7
/dev/sdh1 3.7T 307G 3.4T 9% /data/data3
/dev/mapper/cl-home 392G 29G 364G 8% /home
tmpfs 26G 0 26G 0% /run/user/0
tmpfs 26G 0 26G 0% /run/user/1000
node6:
total used free shared buff/cache available
Mem: 251G 2.0G 243G 5.3G 5.5G 243G
Swap: 4.0G 0B 4.0G
node7:
total used free shared buff/cache available
Mem: 251G 2.0G 243G 5.3G 5.5G 243G
Swap: 2.2G 0B 2.2G
node11:
total used free shared buff/cache available
Mem: 125G 1.3G 118G 5.1G 5.3G 118G
Swap: 2.2G 0B 2.2G
tpcds_variables.sh
REPO="TPC-DS"
REPO_URL="https://github.com/pivotalguru/TPC-DS"
ADMIN_USER="gpadmin"
INSTALL_DIR="/pivotalguru"
EXPLAIN_ANALYZE="false"
RANDOM_DISTRIBUTION="false"
MULTI_USER_COUNT="5"
GEN_DATA_SCALE="1024"
SINGLE_USER_ITERATIONS="1"
RUN_COMPILE_TPCDS="false"
RUN_GEN_DATA="true"
RUN_INIT="true"
RUN_DDL="true"
RUN_LOAD="true"
RUN_SQL="true"
RUN_SINGLE_USER_REPORT="true"
RUN_MULTI_USER="true"
RUN_MULTI_USER_REPORT="true"
RUN_SCORE="true"
tail tpcds.log
20190802:18:49:05:011726 analyzedb:node6:gpadmin-[INFO]:-Done.
/pivotalguru/TPC-DS/05_sql/rollout.sh
source ~/.bashrc
psql -v ON_ERROR_STOP=1 -A -q -t -P pager=off -v EXPLAIN_ANALYZE="" -f /pivotalguru/TPC-DS/05_sql/101.tpcds.01.sql | wc -l
psql -v ON_ERROR_STOP=1 -A -q -t -P pager=off -v EXPLAIN_ANALYZE="" -f /pivotalguru/TPC-DS/05_sql/102.tpcds.02.sql | wc -l
psql:/pivotalguru/TPC-DS/05_sql/102.tpcds.02.sql:59: ERROR: Canceling query because of high VMEM usage. Used: 4610MB, available 510MB, red zone: 4608MB (runaway_cleaner.c:189) (seg32 slice2 192.168.65.7:6008 pid=24111) (cdbdisp.c:254)
note:
the 59th line of 102.tpcds.02.sql is:
order by d_week_seq1;
I observed the memory changes while executing 05_sql/102.tpcds.02.sql separately in psql; there were three stages:
ERROR: Canceling query because of high VMEM usage. Used: 4613MB, available 507MB, red zone: 4608MB (runaway_cleaner.c:189) (seg19 slice5 192.168.65.11:6007 pid=55634) (cdbdisp.c:254)
I also tried modifying the resource queue parameters and GEN_DATA_SCALE, but the run still cannot complete.
What is the difference between data spilling to the spill files and data held in vmem? When does it spill, and when does the query get cancelled?
I don't know how to finish the TPC-DS test. Please help. :simple_smile:
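Is checking the gp_toolkit workfile views while the query runs the right way to see what is spilling? For example (a sketch; I am assuming gp_toolkit is available, as it ships with Greenplum):

psql -c "select * from gp_toolkit.gp_workfile_usage_per_query;"
psql -c "select * from gp_toolkit.gp_workfile_usage_per_segment;"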
Whatever we did to get past this problem has given no results so far. Help is needed...
This week we have been trying to get some scoring on our Greenplum 6.4.0 test cluster. We faced a few challenges with memory and disk space, which was expected... but this is the error that stopped any further progress:
psql -v ON_ERROR_STOP=1 -a -f /pivotalguru/TPC-DS/06_single_user_reports/07.copy.gen_data.sql -v LOGFILE="'/pivotalguru/TPC-DS/06_single_user_reports/../log/rollout_gen_data.log'"
COPY tpcds_reports.gen_data FROM :LOGFILE WITH DELIMITER '|';
psql:/pivotalguru/TPC-DS/06_single_user_reports/07.copy.gen_data.sql:1: ERROR: could not open file "/pivotalguru/TPC-DS/06_single_user_reports/../log/rollout_gen_data.log" for reading: No such file or directory
I tried all sorts of options in tpcds_variables.sh, but it seems that, apart from EXPLAIN_ANALYZE logging, the other logs do not work as expected. If this is based on where Greenplum stores its own log files, note that this changed quite a bit from version 5.x to 6.x.
Has anyone confirmed this to work with 6.x, or is this specific to our environment? Greenplum OSS was deployed with the go-gpdb installer. To provide the expected environment variables, I had to create greenplum_path.sh and source it in .bashrc for tpcds.sh to start correctly (go-gpdb originally creates a different env file and sources it in /etc/profile.d/gpdb.profile.sh).
My greenplum_path.sh:
export GPHOME=/usr/local/greenplum-db-6.4.0
export PYTHONPATH=$GPHOME/lib/python
export PYTHONHOME=$GPHOME/ext/python
export PATH=$GPHOME/bin:$PYTHONHOME/bin:$PATH
export LD_LIBRARY_PATH=$GPHOME/lib:$PYTHONHOME/lib:$LD_LIBRARY_PATH
export OPENSSL_CONF=$GPHOME/etc/openssl.cnf
export MASTER_DATA_DIRECTORY=/data/master/gp_6.4.0_20200320124955-1
export PGPORT=3002
export PGDATABASE=gpadmin
export singleOrMulti=multi
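The sourcing step mentioned above was essentially a one-liner like this (the path to my greenplum_path.sh is only a placeholder here):

echo 'source /home/gpadmin/greenplum_path.sh' >> ~/.bashrc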
Hi,
We are trying to execute the TPC-DS tests on offline GPDB clusters, but I am stuck at the external network / GitHub access check.
I guess it is caused by script_check()
in the tpcds.sh script.
It would be great if someone could improve the TPC-DS scripts so they can run smoothly in offline environments.
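For example, script_check() could honor an opt-out flag; a rough sketch using a hypothetical OFFLINE_MODE variable (not currently in tpcds_variables.sh), placed at the top of the existing check:

if [ "${OFFLINE_MODE:-false}" = "true" ]; then
  echo "OFFLINE_MODE is set; skipping the external network / GitHub access check"
  return 0
fi
# otherwise fall through to the existing GitHub access check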
Thanks.
Hi,
I find that setting RUN_COMPILE_TPCDS=false
does not disable compiling.
I guess that 00_compile_tpcds/rollout.sh
does not check the variable.
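A rough sketch of the kind of guard I would expect near the top of 00_compile_tpcds/rollout.sh (the variable name matches tpcds_variables.sh; the rest of the script's structure is assumed):

if [ "$RUN_COMPILE_TPCDS" != "true" ]; then
  echo "RUN_COMPILE_TPCDS is not true; skipping the compile step"
  exit 0
fi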
This is a bit inconvenient, and I hope it can be improved.
Thanks.
I've run a very small test (just to confirm everything works). I see this in the terminal output (99 rows):
SET
Timing is on.
query_id session_1 session_2 session_3 session_4 session_5 session_6 session_7 session_8 session_9 session_10
01 1.1033 1.1025 1.1017 1.1031 0.965 0 0 0 0 0
02 47.47847 16.16351 3.3238 3.3301 3.3333 0 0 0 0 0
03 0.859 1.147 1.1487 1.1455 1.1648 0 0 0 0 0
04 4.4209 7.7177 7.7695 5.5675 6.6685 0 0 0 0 0
05 3.3159 3.3548 3.3931 3.33 3.3234 0 0 0 0 0
06 1.1655 1.1672 1.1722 1.1707 1.1647 0 0 0 0 0
...
...
98 0.736 0.763 0.733 0.722 0.772 0 0 0 0 0
99 2.2103 2.2241 2.2124 2.2235 2.2242 0 0 0 0 0
(99 rows)
Time: 9.187 ms
Is this the query execution time in seconds for each session? Is there a way to obtain this report in the log directory? I cannot find it there.
Hello. I am getting the errors described below, and because of them the script does not finish. Please help me sort out the problem.
psql:/date/TPC-DS/05_sql/171.tpcds.71.sql:40: ERROR: failed to acquire resources on one or more segments
DETAIL: could not connect to server: Connection timed out
Is the server running on host "10.3.11.38" and accepting
TCP/IP connections on port 10029?
(seg77 10.3.11.38:10029)
psql -v ON_ERROR_STOP=0 -A -q -t -P pager=off -v EXPLAIN_ANALYZE="" -f /date/TPC-DS/05_sql/172.tpcds.72.sql | wc -l
psql:/date/TPC-DS/05_sql/172.tpcds.72.sql:28: ERROR: failed to acquire resources on one or more segments
DETAIL: could not connect to server: Connection timed out
Is the server running on host "10.3.11.38" and accepting
TCP/IP connections on port 10006?
(seg54 10.3.11.38:10006)
Copyright Transaction Processing Performance Council (TPC) 2001 - 2015
Warning: Selected scale factor is NOT valid for result publication
stream_number: 1
mv /date1/TPC-DS/07_multi_user/query_0.sql /date1/TPC-DS/07_multi_user/1/
stream_number: 2
mv /date1/TPC-DS/07_multi_user/query_1.sql /date1/TPC-DS/07_multi_user/2/
stream_number: 3
mv /date1/TPC-DS/07_multi_user/query_2.sql /date1/TPC-DS/07_multi_user/3/
stream_number: 4
mv /date1/TPC-DS/07_multi_user/query_3.sql /date1/TPC-DS/07_multi_user/4/
stream_number: 5
mv /date1/TPC-DS/07_multi_user/query_4.sql /date1/TPC-DS/07_multi_user/5/
/date1/TPC-DS/07_multi_user/test.sh 5000 1 false
/date1/TPC-DS/07_multi_user/test.sh 5000 2 false
/date1/TPC-DS/07_multi_user/test.sh 5000 3 false
/date1/TPC-DS/07_multi_user/test.sh 5000 4 false
/date1/TPC-DS/07_multi_user/test.sh 5000 5 false
Hi,
I am running the HAWQ TPC-DS benchmarks. I have a cluster with 1 master (512 GB RAM) and 6 segments (130 GB RAM per segment). I am using the tpcds.sh script to load data into HDFS. I am able to run the tests successfully up to 50 GB, but beyond that I get many errors in segmentdd/pg_logs. The log is attached.
_"3rd party error log:
2017-02-21 13:39:21.899176, p205200, th139795568425216, ERROR Pipeline: Failed to build pipeline for block [block pool ID: BP-1649228503-10.200.6.49-1487239694963 block ID 1073832818_272724] file /hawq_default/16385/16508/18823/11, new generation stamp is 0,
Pipeline.cpp: 240: HdfsIOException: Failed to add new datanode into pipeline for block: [block pool ID: BP-1649228503-10.200.6.49-1487239694963 block ID 1073832818_272724] file /hawq_default/16385/16508/18823/11, set ""output.replace-datanode-on-failure"" to ""false"" to disable this feature.
@ Hdfs::Internal::PipelineImpl::
hawq-2017-02-21_130521.txt****
buildForAppendOrRecovery(bool)
@ Hdfs::Internal::PipelineImpl::send(std::shared_ptrHdfs::Internal::Packet)
@ Hdfs::Internal::OutputStreamImpl::sendPacket(std::shared_ptrHdfs::Internal::Packet)
@ Hdfs::Internal::OutputStreamImpl::appendInternal(char const*, long)
@ Hdfs::Internal::OutputStreamImpl::append(char const*, long)
@ hdfsWrite**
--------------------------------------------------_**
"3rd party error log:
2017-02-21 13:39:21.977826, p205198, th139795568425216, ERROR Failed to flush pipeline on datanode ILDSS8(192.168.4.12) for block [block pool ID: BP-1649228503-10.200.6.49-1487239694963 block ID 1073832864_273847] file /hawq_default/16385/16508/18775/10.
TcpSocket.cpp: 69: HdfsNetworkException: Read 8 bytes failed from ""192.168.4.12:50010"": (errno: 104) Connection reset by peer
@ Hdfs::Internal::TcpSocketImpl::read(char*, int)
@ Hdfs::Internal::BufferedSocketReaderImpl::readVarint32(int, int)
@ Hdfs::Internal::PipelineImpl::processResponse()
@ Hdfs::Internal::PipelineImpl::checkResponse(bool)
@ Hdfs::Internal::PipelineImpl::waitForAcks(bool)
@ Hdfs::Internal::OutputStreamImpl::flushInternal(bool)
@ Hdfs::Internal::OutputStreamImpl::sync()
Do I need to set any parameters in the Hadoop or TPC-DS configuration files, or do my system parameters need tuning? I am not sure if I am missing anything. Please help.
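If it matters, the error itself suggests disabling the replace-datanode-on-failure behaviour. I am assuming this property would go into the libhdfs3 client configuration (hdfs-client.xml for HAWQ), but please correct me if the file or placement is wrong:

<property>
<name>output.replace-datanode-on-failure</name>
<value>false</value>
</property>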
Thanks,
Tania
I am observing very low HDFS throughput on queries on a 9-node cluster, 100 GB test run:
Observations:
test run as: ./rollout.sh 100 false tpcds true 5 true true true true true true true true true 1
~90 MB/sec for reads and writes cluster-wide. I have seen 1.9 GB/sec during the data load and table analyze phases.
There are 72 postgres processes on each datanode; each consumes 80%-90% CPU and 0% MEM while doing very little I/O.
35,036 files on HDFS at about 2 MB per file. Is this normal? Our block size is 128 MB.
Top example on one of the nodes:
Cpu(s): 0.0%us, 77.4%sy, 0.9%ni, 21.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 264403536k total, 172780840k used, 91622696k free, 2986328k buffers
Swap: 4194300k total, 0k used, 4194300k free, 155959360k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
486160 gpadmin 39 19 872m 28m 10m D 88.7 0.0 4:32.52 postgres
486926 gpadmin 39 19 872m 28m 10m R 87.4 0.0 4:34.75 postgres
486405 gpadmin 39 19 872m 28m 10m R 86.4 0.0 4:23.99 postgres
487162 gpadmin 39 19 872m 28m 10m R 80.4 0.0 4:30.14 postgres
486761 gpadmin 39 19 872m 28m 10m R 78.8 0.0 4:28.41 postgres
486256 gpadmin 39 19 872m 28m 10m D 76.5 0.0 4:30.63 postgres
Action plan:
This is not an issue but a question:
What is the guideline for setting the hawq_rm_nvseg_perquery_perseg_limit parameter? I have read the documentation on it but do not fully understand the implications. For example, if I have a 40-CPU datanode, should I set this parameter to a much higher value than the default of 6, say 12, 24, or 32?
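Concretely, would bumping it be as simple as something like the following (a sketch; I believe hawq config writes the property into hawq-site.xml and a restart is needed to pick it up, but please correct me):

hawq config -c hawq_rm_nvseg_perquery_perseg_limit -v 12
hawq restart cluster -a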
I got this error when running rollout.sh:
Started gpfdist on port 9005
executing on hadoop01.home.net ./start_gpfdist.sh 9006 /data/hawq/segmentdd/pivotalguru_6
Started gpfdist on port 9006
psql -v ON_ERROR_STOP=ON -f /home/hawq/TPC-DS/04_load/051.insert.call_center.sql | grep INSERT | awk -F ' ' '{print $3}'
psql:/home/hawq/TPC-DS/04_load/051.insert.call_center.sql:1: ERROR: Could not allocate enough memory! bucket number of result hash table and external table should match each other (cdbdatalocality.c:4203)
I am using the latest HAWQ repository code on CentOS 7 with an 8-node CDH Hadoop distribution. Everything else seems to run just fine, including HAWQ itself.
Please advise.
I am running on RHEL 6.5 (Note 1GB dataset)
./rollout.sh 1 false tpcds false 5 true true true true true true true true true 1
and it has already taken 5 hours:
Now generating data. This make take a while.
Generating data......................................................................................................................................................
Where can I check for logs, and what might be wrong?
I have run this test successfully on Greenplum 5.16; however, on Greenplum 4.3, because of a bug, it causes a Greenplum startup failure with the message below:
2019-04-04 07:23:51.931038 GMT,,,p28056,th845141952,,,,0,,,seg-1,,,,,"FATAL","22023","ORCA is not supported by this build",,,,,,,,"assign_optimizer","guc_gp.c",5957,
On checking, I found that the script (02_init/rollout.sh) inserts multiple records into postgresql.conf.
The inserted line below causes the Greenplum startup failure:
optimizer=on
This is done through the code below:
/pivotalguru/TPC-DS/02_init/rollout.sh(L47)
psql -v ON_ERROR_STOP=1 -q -t -A -c "show optimizer" | grep -i "on" | wc -l; exit ${PIPESTATUS[0]}
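A possible guard in 02_init/rollout.sh would be to test whether the session accepts the setting before writing it cluster-wide; a rough sketch (how the script actually applies the setting is assumed here):

if psql -v ON_ERROR_STOP=1 -q -t -A -c "set optimizer=on; show optimizer;" | grep -qi "on"; then
  gpconfig -c optimizer -v on    # or whatever mechanism the script currently uses to set it
else
  echo "ORCA is not supported by this build; leaving optimizer unset"
fi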
All 6 gpfdist logs show this message:
[2017-01-10 11:50:45] [WRN /tmp/build/78017950/hdb_apache/src/bin/gpfdist/src/gpfdist/gpfdist.c:2068] Creating the socket failed
Serving HTTP on port 9006, directory /opt/gpadmin/segmentdd/pivotalguru_6
[2017-01-10 11:54:14] [WRN /tmp/build/78017950/hdb_apache/src/bin/gpfdist/src/gpfdist/gpfdist.c:2187] signal 15 received. gpfdist exits
Is this error critical? The data seems to have loaded fine.
Thanks.
When I run the script, the 4th query (104.tpcds.04.sql) throws the following error:
psql: TPC-DS/05_sql/104.tpcds.04.sql:115: ERROR: insufficient memory reserved for statement
HINT: Increase statement memory or reduce the number of Parquet tables to be scanned.
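Is increasing the statement memory for this run the right fix? For example, something along these lines (statement_mem as the relevant GUC is only my guess based on the hint; on HAWQ the resource queue memory settings may be the actual knob):

PGOPTIONS="-c statement_mem=2GB" psql -v ON_ERROR_STOP=1 -A -q -t -P pager=off -f TPC-DS/05_sql/104.tpcds.04.sql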