Comments (8)
from tpc-ds.
How many nodes? 6 datanodes, 8 nodes total
How much RAM per node? 8GB
How much swap per node? 16GB
How much RAM is being used by other processes? 1.3GB is used by CDH processes on each node. 6.5GB is free.
Are you using YARN or the default resource manager? For HAWQ the default RM is used.
Are you using randomly distributed tables? I am not sure, since the tables are generated by TPC-DS.
What are the values for these GUCs?
hawq_rm_stmt_vseg_memory 128mb
hawq_rm_memory_limit_perseg 4GB
default_hash_table_bucket_number 24
hawq_rm_nvseg_perquery_perseg_limit 6
OS settings:
vm.overcommit_ratio = 50
vm.overcommit = this parameter doesn't exist in CentOS7.1
Thank you.
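For reference, here is a rough sanity check of the numbers above. It is only a sketch: it assumes each of the 6 datanodes hosts one HAWQ segment, that hawq_rm_nvseg_perquery_perseg_limit caps the virtual segments one query may use on a segment, and that hawq_rm_stmt_vseg_memory is the quota of each virtual segment. The variable names are made up for illustration.

```shell
#!/usr/bin/env bash
# Values copied from the answers above.
datanodes=6                       # assumption: one HAWQ segment per datanode
vseg_memory_mb=128                # hawq_rm_stmt_vseg_memory
perseg_limit_mb=$((4 * 1024))     # hawq_rm_memory_limit_perseg = 4GB
nvseg_perquery_perseg=6           # hawq_rm_nvseg_perquery_perseg_limit
bucket_number=24                  # default_hash_table_bucket_number

# Memory one query can claim on a single node when it uses every allowed vseg.
per_node_query_mb=$((nvseg_perquery_perseg * vseg_memory_mb))

# Upper bound on virtual segments a single query can get across the cluster.
max_vsegs_per_query=$((datanodes * nvseg_perquery_perseg))

echo "per-node memory for one query: ${per_node_query_mb} MB (limit ${perseg_limit_mb} MB)"
echo "max vsegs per query: ${max_vsegs_per_query} (bucket number ${bucket_number})"
```

Under these assumptions a single query stays well below the per-segment memory limit, so the raw memory figures alone do not obviously explain the error.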
I see the code says this:
    if ((context.resultRelationHashSegNum < context.externTableForceSegNum
            && context.externTableForceSegNum != 0)
        || (context.resultRelationHashSegNum < context.externTableLocationSegNum)) {
        elog(ERROR, "Could not allocate enough memory! "
            "bucket number of result hash table and external table should match each other");
    }
What should I adjust to avoid this error? I don't know how to translate these context fields to GUCs (if my case is indeed a misconfiguration).
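To make the condition easier to reason about, here is the same boolean logic restated as a small shell function. This is only a sketch of the quoted check, not HAWQ code: the function name and the sample numbers are hypothetical, and it says nothing about which GUC feeds each field.

```shell
#!/usr/bin/env bash
# Succeeds (exit status 0) when the quoted condition would raise the error.
# Arguments mirror the three context fields in the quoted C code:
#   $1 resultRelationHashSegNum
#   $2 externTableForceSegNum
#   $3 externTableLocationSegNum
would_raise_error() {
  local result_hash_seg_num=$1
  local extern_force_seg_num=$2
  local extern_location_seg_num=$3

  if { [ "$result_hash_seg_num" -lt "$extern_force_seg_num" ] \
         && [ "$extern_force_seg_num" -ne 0 ]; } \
     || [ "$result_hash_seg_num" -lt "$extern_location_seg_num" ]; then
    return 0
  fi
  return 1
}

# Hypothetical numbers: a result table with 24 buckets and an external
# table that needs 48 segments for its locations.
if would_raise_error 24 0 48; then
  echo "error would be raised"
fi
```

Read this way, the error fires whenever the hash segment count of the result relation is smaller than either of the two external table counts, so the two sides have to be brought into line one way or the other.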
Thank you,
I will try these. Unfortunately I have no control over whether CDH or Hortonworks is used, or over the CentOS version (v7 seems to be supported: https://cwiki.apache.org/confluence/display/HAWQ/Build+and+Install). In my tests I have not seen any issues with this configuration. My goal is to test HAWQ on the available Hadoop platform and compare it with Impala.
Yes, I am aware of Cloudera's query adjustments for the Impala TPC-DS test. I also read the Pivotal article on this. And I fully agree: the way Cloudera did this is unacceptable and misleading for those who rely on the TPC-DS benchmark to judge a platform.
The issue was that the .bashrc in my environment didn't have the GREENPLUM_PATH variable set. After adding this:
export GREENPLUM_PATH=/usr/local/hawq/greenplum_path.sh
and changing the RANDOM_DISTRIBUTION flag to true in the ./rollout.sh call, the test started working fine.
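In case it helps anyone hitting the same thing, a quick check like the one below can confirm the variable is visible before running the scripts. It is a minimal sketch: the function name is made up, and it only tests that GREENPLUM_PATH is non-empty and points to a readable file, not that the file is a valid HAWQ environment script.

```shell
#!/usr/bin/env bash
# Succeeds only when GREENPLUM_PATH is set, non-empty, and names a readable file.
greenplum_path_is_usable() {
  [ -n "${GREENPLUM_PATH:-}" ] && [ -r "$GREENPLUM_PATH" ]
}

if greenplum_path_is_usable; then
  echo "GREENPLUM_PATH is set to $GREENPLUM_PATH"
else
  echo "GREENPLUM_PATH is unset or does not point to a readable file" >&2
fi
```

Run it in a fresh login shell for the user that executes the test, since that is where .bashrc gets picked up.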