Comments (8)
HDFS can't keep up with the write load, so the client marks the DataNode as failed. Eventually other nodes "fail" the same way, and there aren't enough DataNodes left in the write pipeline to handle the insert, so it errors out. To avoid this, you want the HDFS write to retry on the remaining nodes rather than fail the node outright. That is why the error message from HAWQ tells you to set output.replace-datanode-on-failure to false.
If you are using Ambari, go to HAWQ > Advanced > Advanced hdfs-client and set output.replace-datanode-on-failure to false (uncheck the box).
If you are not using Ambari, edit /usr/local/hawq/etc/hdfs-client.xml so that output.replace-datanode-on-failure is set to false, then copy the updated file to every node in the cluster, as sketched below.
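A minimal sketch of the non-Ambari route, assuming the default install path /usr/local/hawq/etc/ and that the property already exists in the file with value true; "hostfile" is a hypothetical file listing the cluster's hosts, one per line:

    # hdfs-client.xml: the property block should end up looking like this
    #   <property>
    #     <name>output.replace-datanode-on-failure</name>
    #     <value>false</value>
    #   </property>

    # Flip true -> false on the value line that follows the property name
    sed -i '/<name>output.replace-datanode-on-failure<\/name>/{n;s/true/false/}' \
        /usr/local/hawq/etc/hdfs-client.xml

    # Push the updated file to every node (hostfile is a hypothetical host list)
    while read -r host; do
        scp /usr/local/hawq/etc/hdfs-client.xml "$host":/usr/local/hawq/etc/
    done < hostfile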
Thanks Jon for your quick response. I am not using Ambari, and I tried the changes you mentioned. Things improved, but I can see a new set of exceptions in the Hadoop logs and pg_logs; Hadoop is probably failing to load the data. I am attaching the logs here for your reference. Thank you once again for your help.
hadoop-bigdata-datanode-ILDSS2_errors.txt
pg_log_ILDSS2_errors.txt
Yes, something is wrong in HDFS, possibly a misconfiguration.
Review the settings here:
http://hdb.docs.pivotal.io/211/hdb/install/install-cli.html
http://hdb.docs.pivotal.io/211/hawq/requirements/system-requirements.html
Are you using hash distribution or random? This is set in the tpcds_variables.sh file; you should use random with HAWQ. You could also reduce the number of virtual segments, which would decrease the load on HDFS, by changing hawq_rm_nvseg_perquery_perseg_limit from the default of 6 to 4. Both changes are sketched below.
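A hedged sketch of both knobs, assuming the kit's tpcds_variables.sh exposes a RANDOM_DISTRIBUTION flag (check your copy of the script for the exact variable name) and that the hawq config utility is available on the master:

    # In tpcds_variables.sh: use random distribution for HAWQ
    RANDOM_DISTRIBUTION="true"

    # On the HAWQ master: lower the per-segment virtual segment limit from 6 to 4
    hawq config -c hawq_rm_nvseg_perquery_perseg_limit -v 4

    # Restart so the new value takes effect (-a skips the confirmation prompt)
    hawq restart cluster -a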
I am using Random distribution. I will review the settings and rerun the test. Thanks a lot for your help.
There were a few discrepancies in the system requirements. I have corrected those and it's loading the data now. Thanks Jon. :)
Hi, I am very surprised to see that when I use a scale factor of 1000 (1 TB), the data loaded into HDFS is only 55-56% of the scale factor (total DFS used: 560 GB, total non-DFS: 1254 GB).
Afterloading1TBdata.pdf
Compression! All tables are stored in Parquet format, and the medium and large tables are also compressed with Snappy. You can look at the size of the raw files stored in the POSIX filesystem to see how large the generated data is: they are located in each segment directory under the pivotalguru subdirectory and should total 1 TB across all nodes.
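A quick way to compare the two numbers (the /hawq_default HDFS path and the segment data directory below are assumed examples; substitute your own locations):

    # Compressed Parquet size as stored in HDFS
    hdfs dfs -du -s -h /hawq_default

    # Raw generated flat files on each segment host
    # (run on every node and sum the results; the path is an assumed example)
    du -sh /data/hawq/segment/pivotalguru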
Oh, sure! I missed that. Thanks for your help.
Related Issues (20)
- imp option - clarification needed HOT 1
- Creating socket failed during dataload HOT 2
- hawq_rm_nvseg_perquery_perseg_limit clarification HOT 1
- Very poor HDFS throughput HOT 2
- Sharing TPC-DS test results of HAWQ & SparkSQL
- Generate data step hangs HOT 14
- relation "pg_filespace_entry" does not exist HOT 7
- Changes in Postgresql.conf causing to Stop Greenplum HOT 5
- Canceling query because of high VMEM usage. HOT 2
- ERROR: could not open file "../log/rollout_gen_data.log" for reading: No such file or directory HOT 10
- Can not execute tpcds.sh in offline environments HOT 2
- Setting RUN_COMPILE_TPCDS="false" does not disable compiling HOT 2
- Asking a question HOT 5
- what's the difference with score and qphds HOT 2
- Should 02_init/rollout.sh set search path for ADMIN_USER? HOT 3
- ERROR: http response code 404 from gpfdist HOT 19
- Selected scale factor is NOT valid && Connection timed out HOT 7
- Generating data takes long time HOT 4
- Session report not available HOT 6