zdata-inc / ambari-extensions
zData Ambari Stack containing HAWQ, Chorus, and Greenplum
Home Page: http://zdata-inc.github.io/ambari-extensions
This issue addresses the following warnings generated in the logs:
[WARN]:-File permission mismatch. The gpadmin owns the Greenplum Database installation directory.
[WARN]:-You are currently logged in as gpadmin and may not have sufficient
[WARN]:-permissions to run the Greenplum binaries and management utilities.
The issue stems from how gpinitsystem (specifically /usr/local/hawq/bin/lib/gp_bash_functions.sh, which gpinitsystem calls) checks the permissions of the files located in /usr/local/hawq. It does this by running ls -la on the file /usr/local/hawq/bin/initdb and piping the output through the sed commands sed -e 's/...\(.\)....../\1/g' (to retrieve the user-executable permission) and sed -e 's/......\(.\).../\1/g' (to retrieve the group-executable permission).
The issue becomes obvious when looking at the output of the ls command on an SELinux enabled system (even if it is set to permissive):
-rwxr-xr-x. 1 gpadmin root 463708 Aug 8 2014 initdb
^ Note the dot
Without the trailing dot (which signifies an SELinux security context on the file) the sed expressions work correctly. With it, though, the group-executable sed returns 'x.' rather than 'x', because the final dot is never matched and therefore never replaced.
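The behavior is easy to reproduce in a shell. The permission strings below are illustrative examples matching the ls output above:

```shell
# Permission field as printed by ls; on an SELinux-enabled system a
# trailing dot marks a security context on the file.
perms_plain='-rwxr-xr-x'
perms_selinux='-rwxr-xr-x.'

# Group-executable extraction as performed by gp_bash_functions.sh:
echo "$perms_plain"   | sed -e 's/......\(.\).../\1/g'   # prints: x
echo "$perms_selinux" | sed -e 's/......\(.\).../\1/g'   # prints: x.
```

One possible fix (not from the repo, just a sketch) is to strip a trailing context/ACL marker first, e.g. sed -e 's/[.+]$//', before extracting the permission bits.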
The role_command_order.json file in the zData stack version declares that the HAWQ service install must not start until HDFS and ZOOKEEPER have been started; however, Ambari's ActionQueue.py adds the execution command to install the HAWQ service anyway, starting the HAWQ install prematurely.
This is taken from the ambari-agent.log file on the master:
INFO 2015-03-05 18:44:03,933 ActionQueue.py:110 - Adding EXECUTION_COMMAND for service HAWQ of cluster zdata to the queue.
INFO 2015-03-05 18:44:03,967 ActionQueue.py:203 - Executing command with id = 3-2 for role = HAWQ_MASTER of cluster zdata.
Prior to this log output, the status of both HDFS and ZOOKEEPER came back as NOT running, and their status commands threw a component-not-running exception.
Either this is an Ambari bug, the role_command_order.json is missing information, or there is a timeout or threshold of status failures before Ambari moves on to the next command and installs another service.
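For reference, the dependency is declared along these lines in role_command_order.json. The exact role and command names below are assumptions based on Ambari's usual format, not copied from the repo:

```json
{
  "general_deps": {
    "HAWQMASTER-INSTALL": ["NAMENODE-START", "ZOOKEEPER_SERVER-START"],
    "HAWQMASTER-START": ["NAMENODE-START", "DATANODE-START", "ZOOKEEPER_SERVER-START"]
  }
}
```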
Add basic Redis service.
Questions to be answered:
Force setting of number of segments per node.
Set port to 5432.
Remove default database name.
Allow Greenplum installations to have a hot standby master.
This is done by allowing a cardinality of 1-2 for the master component, and configuring the second instance as a hot standby master.
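In Ambari stack terms this maps to the component's cardinality in metainfo.xml. A minimal sketch, with the component name assumed from context:

```xml
<!-- Sketch: one required master plus one optional hot standby master. -->
<component>
  <name>GREENPLUM_MASTER</name>
  <category>MASTER</category>
  <cardinality>1-2</cardinality>
</component>
```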
Look into automatically attempting to recover the master segment if it is being started and had previously failed. This may not be possible: Ambari may not expose the information needed to distinguish a normal cluster start from a master recovery start. If it is not possible, create a custom action.
Log where it breaks
stderr: /var/lib/ambari-agent/data/errors-85.txt
2015-06-10 19:13:54,272 - Error while executing command 'restart':
Traceback (most recent call last):
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 123, in execute
method(env)
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 232, in restart
self.stop(env)
File "/var/lib/ambari-agent/cache/stacks/PHD/9.9.9.zData/services/GREENPLUM/package/scripts/master.py", line 51, in stop
user=params.admin_user
File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 148, in __init__
self.env.run()
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 149, in run
self.run_action(resource, action)
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 115, in run_action
provider_action()
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 241, in action_run
raise ex
Fail: Execution of 'gpstop -a -M smart -v' returned 2. 20150610:19:13:54:030140 gpstop:master:gpadmin-[INFO]:-Starting gpstop with args: -a -M smart -v
20150610:19:13:54:030140 gpstop:master:gpadmin-[DEBUG]:-Setting level of parallelism to: 64
20150610:19:13:54:030140 gpstop:master:gpadmin-[INFO]:-Gathering information and validating the environment...
20150610:19:13:54:030140 gpstop:master:gpadmin-[DEBUG]:-Checking if GPHOME env variable is set.
20150610:19:13:54:030140 gpstop:master:gpadmin-[DEBUG]:-Checking if MASTER_DATA_DIRECTORY env variable is set.
20150610:19:13:54:030140 gpstop:master:gpadmin-[DEBUG]:-Checking if LOGNAME or USER env variable is set.
20150610:19:13:54:030140 gpstop:master:gpadmin-[DEBUG]:---Checking that current user can use GP binaries
20150610:19:13:54:030140 gpstop:master:gpadmin-[DEBUG]:-Obtaining master's port from master data directory
20150610:19:13:54:030140 gpstop:master:gpadmin-[DEBUG]:-Read from postgresql.conf port=6543
20150610:19:13:54:030140 gpstop:master:gpadmin-[ERROR]:-gpstop error: postmaster.pid file does not exist. is Greenplum instance already stopped?
stdout: /var/lib/ambari-agent/data/output-85.txt
2015-06-10 19:13:53,820 - Could not verify stack version by calling '/usr/bin/distro-select versions > /tmp/tmp_ik32J'. Return Code: 1, Output: .
2015-06-10 19:13:53,824 - Execute['mkdir -p /var/lib/ambari-agent/data/tmp/AMBARI-artifacts/; curl -kf -x "" --retry 10 http://master.ambaricluster.local:8080/resources//UnlimitedJCEPolicyJDK7.zip -o /var/lib/ambari-agent/data/tmp/AMBARI-artifacts//UnlimitedJCEPolicyJDK7.zip'] {'environment': ..., 'not_if': 'test -e /var/lib/ambari-agent/data/tmp/AMBARI-artifacts//UnlimitedJCEPolicyJDK7.zip', 'ignore_failures': True, 'path': ['/bin', '/usr/bin/']}
2015-06-10 19:13:53,838 - Skipping Execute['mkdir -p /var/lib/ambari-agent/data/tmp/AMBARI-artifacts/; curl -kf -x "" --retry 10 http://master.ambaricluster.local:8080/resources//UnlimitedJCEPolicyJDK7.zip -o /var/lib/ambari-agent/data/tmp/AMBARI-artifacts//UnlimitedJCEPolicyJDK7.zip'] due to not_if
2015-06-10 19:13:53,839 - Group['hadoop'] {'ignore_failures': False}
2015-06-10 19:13:53,840 - Modifying group hadoop
2015-06-10 19:13:53,854 - Group['nobody'] {'ignore_failures': False}
2015-06-10 19:13:53,854 - Modifying group nobody
2015-06-10 19:13:53,864 - Group['nagios'] {'ignore_failures': False}
2015-06-10 19:13:53,864 - Modifying group nagios
2015-06-10 19:13:53,877 - User['nobody'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'nobody']}
2015-06-10 19:13:53,877 - Modifying user nobody
2015-06-10 19:13:53,888 - User['nagios'] {'gid': 'nagios', 'ignore_failures': False, 'groups': [u'hadoop']}
2015-06-10 19:13:53,888 - Modifying user nagios
2015-06-10 19:13:53,898 - User['ambari-qa'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': ['users']}
2015-06-10 19:13:53,898 - Modifying user ambari-qa
2015-06-10 19:13:53,911 - File['/var/lib/ambari-agent/data/tmp/changeUid.sh'] {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555}
2015-06-10 19:13:53,912 - Execute['/var/lib/ambari-agent/data/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa 2>/dev/null'] {'not_if': 'test $(id -u ambari-qa) -gt 1000'}
2015-06-10 19:13:53,920 - Skipping Execute['/var/lib/ambari-agent/data/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa 2>/dev/null'] due to not_if
2015-06-10 19:13:53,932 - Execute['/bin/echo 0 > /selinux/enforce'] {'only_if': 'test -f /selinux/enforce'}
2015-06-10 19:13:54,130 - Execute['gpstop -a -M smart -v'] {'user': 'gpadmin'}
2015-06-10 19:13:54,272 - Error while executing command 'restart':
Traceback (most recent call last):
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 123, in execute
method(env)
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 232, in restart
self.stop(env)
File "/var/lib/ambari-agent/cache/stacks/PHD/9.9.9.zData/services/GREENPLUM/package/scripts/master.py", line 51, in stop
user=params.admin_user
File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 148, in __init__
self.env.run()
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 149, in run
self.run_action(resource, action)
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 115, in run_action
provider_action()
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 241, in action_run
raise ex
Fail: Execution of 'gpstop -a -M smart -v' returned 2. 20150610:19:13:54:030140 gpstop:master:gpadmin-[INFO]:-Starting gpstop with args: -a -M smart -v
20150610:19:13:54:030140 gpstop:master:gpadmin-[DEBUG]:-Setting level of parallelism to: 64
20150610:19:13:54:030140 gpstop:master:gpadmin-[INFO]:-Gathering information and validating the environment...
20150610:19:13:54:030140 gpstop:master:gpadmin-[DEBUG]:-Checking if GPHOME env variable is set.
20150610:19:13:54:030140 gpstop:master:gpadmin-[DEBUG]:-Checking if MASTER_DATA_DIRECTORY env variable is set.
20150610:19:13:54:030140 gpstop:master:gpadmin-[DEBUG]:-Checking if LOGNAME or USER env variable is set.
20150610:19:13:54:030140 gpstop:master:gpadmin-[DEBUG]:---Checking that current user can use GP binaries
20150610:19:13:54:030140 gpstop:master:gpadmin-[DEBUG]:-Obtaining master's port from master data directory
20150610:19:13:54:030140 gpstop:master:gpadmin-[DEBUG]:-Read from postgresql.conf port=6543
20150610:19:13:54:030140 gpstop:master:gpadmin-[ERROR]:-gpstop error: postmaster.pid file does not exist. is Greenplum instance already stopped?
Either back out automatically if installation fails, or provide a back-out script which, in addition to calling Greenplum's generated backout script if one exists, also backs out the changes Ambari made while installing Greenplum.
We have overlapping definitions for HAWQ and PXF with PHD 3.0.0. I would like the option to release a tar file that contains only Greenplum and no other service definitions. Also, we may not want to package and release Minecraft to customers. It would be nice to have an "includes" list where we specify which services get bundled.
Add GPCC to Ambari, either as its own service (most likely), or as an installation option during Greenplum configuration.
https://wiki.zdatainc.com/index.php/Greenplum_Command_Center
Look into simplifying integration of LDAP and Kerberos with Greenplum.
Move Kernel Parameters, pg_hba.conf, and postgresql.conf to Ambari configurables. Look into disabling the modification of Greenplum settings which cannot be changed after instantiation.
Title is self-explanatory. We need to host zData Chorus, most likely on S3, and use that URI as the default in the installation.
When a single Greenplum segment is being started, assume it failed and attempt to run gpsegrecover. Potentially check whether the cluster is started (master/standby master are up); if the Greenplum cluster isn't started, don't do anything.
Inheriting from the PHD 3.0 stack version won't copy over local repo files, yet inheriting from 2.1 works just fine. It would be nice to inherit from the latest.
Various modifications can be made to the post_copy_commands section in the Greenplum installation procedure.
The sed to change GPHOME in greenplum_path.sh does not need to be run on all hosts, but the symlink creation does. This should allow the code to be restructured more clearly.
Provide some sort of interface in Ambari to failover a Greenplum cluster to its standby, and Failback to master once it's fixed.
The default path for the Greenplum master data directory should be /data1/. If you set it to /data1/gpseg-1, Greenplum will create /data1/gpseg-1/gpseg-1.
The SELinux status check should be more comprehensive; currently it only checks that the file exists and doesn't check its contents (which can be set to 0, i.e. permissive).
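A more thorough check would look at the file's contents as well. A minimal sketch, where the function name and the path parameter are ours:

```shell
# Return success only when SELinux is actually enforcing: the enforce
# file must exist AND contain 1 (it holds 0 when SELinux is permissive).
selinux_enforcing() {
    enforce_file="${1:-/selinux/enforce}"
    [ -f "$enforce_file" ] && [ "$(cat "$enforce_file")" = "1" ]
}
```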
We need to fix hawq_master.py to get the hdfs username and not assume a hardcoded value of "hdfs" for the user that owns the root hdfs directory.
I believe it has to do with the pg_hba.conf file being misconfigured, but I'm not sure yet.
I will look into it further and update the results here.
Create basic autotools for installing and uninstalling the service correctly. This will alleviate issues with installing into an already operational Ambari server, and allow the installation method to change later (which will be important when the project transitions from a stack version to its own stack).
Remote datanodes have the following in their logs when started and cannot connect to the namenode. The namenode can be pinged and ssh'd into though.
INFO ipc.Client (Client.java:handleConnectionFailure(783)) - Retrying connect to server: master.ambaricluster.local/172.28.128.3:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1 SECONDS)
When stopping the Greenplum service on branch feature/Pivotal-Ambari, I get "Host Role in invalid state".
JAVA_HOME is hardcoded in PXF/package/scripts/params.py as a workaround. It points to where Ambari Server installs the JDK, which is /usr/jdk64/jdk1.7.0_67/jre.
Java may be installed elsewhere. Using bigtop-detect-javahome to auto-detect Java doesn't work because that path isn't among the ones it checks. We need to write something simple to find Java on the system and put it in a utility function.
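A rough sketch of such a utility. The function name is ours, and the install prefixes searched in the loop are guesses that would need to match the environments we target:

```shell
# Hypothetical utility: locate a usable Java home. Checks JAVA_HOME,
# then java on PATH, then common install prefixes.
find_java_home() {
    if [ -n "$JAVA_HOME" ] && [ -x "$JAVA_HOME/bin/java" ]; then
        echo "$JAVA_HOME"
        return 0
    fi
    if command -v java >/dev/null 2>&1; then
        # Resolve alternatives/symlinks back to the real install dir.
        java_bin=$(readlink -f "$(command -v java)")
        echo "${java_bin%/bin/java}"
        return 0
    fi
    for dir in /usr/jdk64/* /usr/java/* /usr/lib/jvm/*; do
        if [ -x "$dir/bin/java" ]; then
            echo "$dir"
            return 0
        fi
    done
    return 1
}
```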
Stack advisor was originally taken from HDP stack, needs a serious overhaul.
Old, decrepit code needs to be removed; possibly create a class holding the general code and inherit from it for the zData-stack-specific code.
Greenplum's gpadmin keys should be distributed manually through the root user. This will simplify the work, and allow gpadmin user to be created without a password.
Install gpperfmon with Greenplum if specified in the configurations. Look at https://wiki.zdatainc.com/index.php/Greenplum_Command_Center#Installation
Current default data directory structure:
/data1/primary/gpsegX
/data2/primary/gpsegX
...
/dataN/primary/gpsegN
New structure should be:
/data1/primary/
gpsegX
gpsegX
/data2/primary/
gpsegX
gpsegX
Will need to implement some sort of pattern expression for data and mirror data directory templates.
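A minimal sketch of such a template expansion, assuming a printf-style %d slot for the mount number (the template syntax and function name are our invention, not an existing convention in the repo):

```shell
# Sketch: expand a directory template carrying a %d slot for the mount
# number, round-robining the segments across the data mounts.
expand_segment_dirs() {
    template="$1"    # e.g. '/data%d/primary'
    mounts="$2"      # number of data mounts
    segments="$3"    # total number of primary segments
    i=0
    while [ "$i" -lt "$segments" ]; do
        base=$(printf "$template" $((i % mounts + 1)))
        printf '%s/gpseg%d\n' "$base" "$i"
        i=$((i + 1))
    done
}
```

For example, expand_segment_dirs '/data%d/primary' 2 4 emits /data1/primary/gpseg0, /data2/primary/gpseg1, /data1/primary/gpseg2, /data2/primary/gpseg3, alternating the mounts.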
Gpseginstall breaks when root's ~/.ssh/ directory doesn't contain an id_rsa key for the cluster, or when known_hosts isn't populated with the public keys of the other nodes in the cluster.
Solution: Populate known_hosts before gpseginstall by running something like this:
while read host; do ssh-keyscan "$host" >> ~/.ssh/known_hosts; done < /usr/local/greenplum-db/greenplum_hosts
Sometimes major errors happen, but after they get fixed and Ambari retries the install, the log checker still finds the old Fatal errors and the installer won't pass.
gpinitsystem runs Pivotal's checkhdfs command, which segfaults. The segfault is assumed to occur because checkhdfs is written by Pivotal, but the hadoop variant is provided by Hortonworks.
The solution should probably be to stop this command from running during gpinitsystem, and allow it to run when/if we eventually migrate from HDP (Hortonworks Data Platform) to PHD (Pivotal HD).
The Greenplum segments are controlled by the master, so no code is run on the segments themselves to stop them. That said, the segments shouldn't report success until all their relevant processes have actually been stopped or started.
Each Greenplum segment should watch its processes on start and stop and return successfully only when they have all started or stopped.
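A sketch of what the wait on the segment side could look like; the function name and the pgrep pattern are assumptions, not code from the stack scripts:

```shell
# Poll until no postgres process is still attached to the segment's
# data directory, or give up after the timeout (in seconds).
wait_for_segment_stop() {
    datadir="$1"
    timeout="${2:-60}"
    elapsed=0
    while pgrep -f "postgres.*${datadir}" >/dev/null 2>&1; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}
```

A mirror-image loop (wait until the processes appear) would cover the start case.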
Bootstrap files have been cleaned up on development, and those changes cherry-picked to release/0.4.x.
Now, on development, further changes need to be made, such as removing the master-bootstrap.sh script and allowing a selection of functions by passing a name as the first argument, e.g. 'pivotal' and 'vanilla'.