zdata-inc / ambari-extensions
zData Ambari Stack containing HAWQ, Chorus, and Greenplum
Home Page: http://zdata-inc.github.io/ambari-extensions
This issue addresses the following warnings generated in the logs:
[WARN]:-File permission mismatch. The gpadmin owns the Greenplum Database installation directory.
[WARN]:-You are currently logged in as gpadmin and may not have sufficient
[WARN]:-permissions to run the Greenplum binaries and management utilities.
The issue stems from how gpinitsystem (specifically /usr/local/hawq/bin/lib/gp_bash_functions.sh, which gpinitsystem calls) checks the permissions of the files located in /usr/local/hawq. It does this by running ls -la on the file /usr/local/hawq/bin/initdb and piping the output through the sed commands sed -e 's/...\(.\)....../\1/g' (to retrieve the user-executable permission) and sed -e 's/......\(.\).../\1/g' (to retrieve the group-executable permission).
The issue becomes obvious when looking at the output of the ls command on an SELinux enabled system (even if it is set to permissive):
-rwxr-xr-x. 1 gpadmin root 463708 Aug 8 2014 initdb
^ Note the dot
Without the trailing dot (which signifies an SELinux security context on the file) the sed expressions work correctly. With it, though, the group-executable sed returns 'x.' rather than 'x', because the final dot is never matched and therefore never replaced.
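The behavior is easy to reproduce in a shell. The permission strings below are illustrative examples matching the ls output above:

```shell
# Permission field as printed by ls; on an SELinux-enabled system a
# trailing dot marks a security context on the file.
perms_plain='-rwxr-xr-x'
perms_selinux='-rwxr-xr-x.'

# Group-executable extraction as performed by gp_bash_functions.sh:
echo "$perms_plain"   | sed -e 's/......\(.\).../\1/g'   # prints: x
echo "$perms_selinux" | sed -e 's/......\(.\).../\1/g'   # prints: x.
```

One possible fix (not from the repo, just a sketch) is to strip a trailing context/ACL marker first, e.g. sed -e 's/[.+]$//', before extracting the permission bits.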
The role_command_order.json file in the zData stack version declares that the HAWQ service install must not start until HDFS and ZOOKEEPER have been started; however, Ambari's ActionQueue.py adds the execution command to install the HAWQ service anyway, starting the HAWQ install prematurely.
This is taken from the ambari-agent.log file on the master:
INFO 2015-03-05 18:44:03,933 ActionQueue.py:110 - Adding EXECUTION_COMMAND for service HAWQ of cluster zdata to the queue.
INFO 2015-03-05 18:44:03,967 ActionQueue.py:203 - Executing command with id = 3-2 for role = HAWQ_MASTER of cluster zdata.
Prior to this log output, the status of both HDFS and ZOOKEEPER came back as NOT running, and their status commands threw a component-not-running exception.
Either this is an Ambari bug, the role_command_order.json is missing information, or there is a timeout or threshold of status failures before Ambari moves on to the next command and installs another service.
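For reference, the dependency is declared along these lines in role_command_order.json. The exact role and command names below are assumptions based on Ambari's usual format, not copied from the repo:

```json
{
  "general_deps": {
    "HAWQMASTER-INSTALL": ["NAMENODE-START", "ZOOKEEPER_SERVER-START"],
    "HAWQMASTER-START": ["NAMENODE-START", "DATANODE-START", "ZOOKEEPER_SERVER-START"]
  }
}
```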
Add basic Redis service.
Questions to be answered:
Force setting of number of segments per node.
Set port to 5432.
Remove default database name.
Allow Greenplum installations to have a hot standby master.
This is done by allowing a cardinality of 1-2 for the master component, and configuring the second instance as a hot standby master.
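In Ambari stack terms this maps to the component's cardinality in metainfo.xml. A minimal sketch, with the component name assumed from context:

```xml
<!-- Sketch: one required master plus one optional hot standby master. -->
<component>
  <name>GREENPLUM_MASTER</name>
  <category>MASTER</category>
  <cardinality>1-2</cardinality>
</component>
```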
Look into automatically attempting to recover the master segment if it is being started and had previously failed. This may not be possible: Ambari may not expose the information needed to distinguish a normal cluster start from a master recovery start. If it is not possible, create a custom action.
Log where it breaks
stderr: /var/lib/ambari-agent/data/errors-85.txt
2015-06-10 19:13:54,272 - Error while executing command 'restart':
Traceback (most recent call last):
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 123, in execute
method(env)
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 232, in restart
self.stop(env)
File "/var/lib/ambari-agent/cache/stacks/PHD/9.9.9.zData/services/GREENPLUM/package/scripts/master.py", line 51, in stop
user=params.admin_user
File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 148, in __init__
self.env.run()
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 149, in run
self.run_action(resource, action)
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 115, in run_action
provider_action()
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 241, in action_run
raise ex
Fail: Execution of 'gpstop -a -M smart -v' returned 2. 20150610:19:13:54:030140 gpstop:master:gpadmin-[INFO]:-Starting gpstop with args: -a -M smart -v
20150610:19:13:54:030140 gpstop:master:gpadmin-[DEBUG]:-Setting level of parallelism to: 64
20150610:19:13:54:030140 gpstop:master:gpadmin-[INFO]:-Gathering information and validating the environment...
20150610:19:13:54:030140 gpstop:master:gpadmin-[DEBUG]:-Checking if GPHOME env variable is set.
20150610:19:13:54:030140 gpstop:master:gpadmin-[DEBUG]:-Checking if MASTER_DATA_DIRECTORY env variable is set.
20150610:19:13:54:030140 gpstop:master:gpadmin-[DEBUG]:-Checking if LOGNAME or USER env variable is set.
20150610:19:13:54:030140 gpstop:master:gpadmin-[DEBUG]:---Checking that current user can use GP binaries
20150610:19:13:54:030140 gpstop:master:gpadmin-[DEBUG]:-Obtaining master's port from master data directory
20150610:19:13:54:030140 gpstop:master:gpadmin-[DEBUG]:-Read from postgresql.conf port=6543
20150610:19:13:54:030140 gpstop:master:gpadmin-[ERROR]:-gpstop error: postmaster.pid file does not exist. is Greenplum instance already stopped?
stdout: /var/lib/ambari-agent/data/output-85.txt
2015-06-10 19:13:53,820 - Could not verify stack version by calling '/usr/bin/distro-select versions > /tmp/tmp_ik32J'. Return Code: 1, Output: .
2015-06-10 19:13:53,824 - Execute['mkdir -p /var/lib/ambari-agent/data/tmp/AMBARI-artifacts/; curl -kf -x "" --retry 10 http://master.ambaricluster.local:8080/resources//UnlimitedJCEPolicyJDK7.zip -o /var/lib/ambari-agent/data/tmp/AMBARI-artifacts//UnlimitedJCEPolicyJDK7.zip'] {'environment': ..., 'not_if': 'test -e /var/lib/ambari-agent/data/tmp/AMBARI-artifacts//UnlimitedJCEPolicyJDK7.zip', 'ignore_failures': True, 'path': ['/bin', '/usr/bin/']}
2015-06-10 19:13:53,838 - Skipping Execute['mkdir -p /var/lib/ambari-agent/data/tmp/AMBARI-artifacts/; curl -kf -x "" --retry 10 http://master.ambaricluster.local:8080/resources//UnlimitedJCEPolicyJDK7.zip -o /var/lib/ambari-agent/data/tmp/AMBARI-artifacts//UnlimitedJCEPolicyJDK7.zip'] due to not_if
2015-06-10 19:13:53,839 - Group['hadoop'] {'ignore_failures': False}
2015-06-10 19:13:53,840 - Modifying group hadoop
2015-06-10 19:13:53,854 - Group['nobody'] {'ignore_failures': False}
2015-06-10 19:13:53,854 - Modifying group nobody
2015-06-10 19:13:53,864 - Group['nagios'] {'ignore_failures': False}
2015-06-10 19:13:53,864 - Modifying group nagios
2015-06-10 19:13:53,877 - User['nobody'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'nobody']}
2015-06-10 19:13:53,877 - Modifying user nobody
2015-06-10 19:13:53,888 - User['nagios'] {'gid': 'nagios', 'ignore_failures': False, 'groups': [u'hadoop']}
2015-06-10 19:13:53,888 - Modifying user nagios
2015-06-10 19:13:53,898 - User['ambari-qa'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': ['users']}
2015-06-10 19:13:53,898 - Modifying user ambari-qa
2015-06-10 19:13:53,911 - File['/var/lib/ambari-agent/data/tmp/changeUid.sh'] {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555}
2015-06-10 19:13:53,912 - Execute['/var/lib/ambari-agent/data/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa 2>/dev/null'] {'not_if': 'test $(id -u ambari-qa) -gt 1000'}
2015-06-10 19:13:53,920 - Skipping Execute['/var/lib/ambari-agent/data/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa 2>/dev/null'] due to not_if
2015-06-10 19:13:53,932 - Execute['/bin/echo 0 > /selinux/enforce'] {'only_if': 'test -f /selinux/enforce'}
2015-06-10 19:13:54,130 - Execute['gpstop -a -M smart -v'] {'user': 'gpadmin'}
2015-06-10 19:13:54,272 - Error while executing command 'restart':
Traceback (most recent call last):
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 123, in execute
method(env)
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 232, in restart
self.stop(env)
File "/var/lib/ambari-agent/cache/stacks/PHD/9.9.9.zData/services/GREENPLUM/package/scripts/master.py", line 51, in stop
user=params.admin_user
File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 148, in __init__
self.env.run()
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 149, in run
self.run_action(resource, action)
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 115, in run_action
provider_action()
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 241, in action_run
raise ex
Fail: Execution of 'gpstop -a -M smart -v' returned 2. 20150610:19:13:54:030140 gpstop:master:gpadmin-[INFO]:-Starting gpstop with args: -a -M smart -v
20150610:19:13:54:030140 gpstop:master:gpadmin-[DEBUG]:-Setting level of parallelism to: 64
20150610:19:13:54:030140 gpstop:master:gpadmin-[INFO]:-Gathering information and validating the environment...
20150610:19:13:54:030140 gpstop:master:gpadmin-[DEBUG]:-Checking if GPHOME env variable is set.
20150610:19:13:54:030140 gpstop:master:gpadmin-[DEBUG]:-Checking if MASTER_DATA_DIRECTORY env variable is set.
20150610:19:13:54:030140 gpstop:master:gpadmin-[DEBUG]:-Checking if LOGNAME or USER env variable is set.
20150610:19:13:54:030140 gpstop:master:gpadmin-[DEBUG]:---Checking that current user can use GP binaries
20150610:19:13:54:030140 gpstop:master:gpadmin-[DEBUG]:-Obtaining master's port from master data directory
20150610:19:13:54:030140 gpstop:master:gpadmin-[DEBUG]:-Read from postgresql.conf port=6543
20150610:19:13:54:030140 gpstop:master:gpadmin-[ERROR]:-gpstop error: postmaster.pid file does not exist. is Greenplum instance already stopped?
Either back out automatically if installation fails, or provide a back-out script which, in addition to calling Greenplum's generated backout script if one exists, also backs out the changes Ambari made while installing Greenplum.
We have overlapping definitions for HAWQ and PXF with PHD 3.0.0. I would like the option to release a tar file that contains only Greenplum and no other service definitions. Also, we may not want to package and release Minecraft to customers. It would be nice to have an "includes" list where we specify which services get bundled.
Add GPCC to Ambari, either as its own service (most likely), or as an installation option during Greenplum configuration.
https://wiki.zdatainc.com/index.php/Greenplum_Command_Center
Look into simplifying integration of LDAP and Kerberos with Greenplum.
Move Kernel Parameters, pg_hba.conf, and postgresql.conf to Ambari configurables. Look into disabling the modification of Greenplum settings which cannot be changed after instantiation.
Title is self-explanatory. We need to host zData Chorus, most likely on S3, and use that URI as the default in the installation.
When a single Greenplum segment is being started, assume it failed and attempt to run gpsegrecover. Potentially check whether the cluster is started (master/standby master are up); if the Greenplum cluster isn't started, don't do anything.
Inheriting from the PHD 3.0 stack version won't copy over local repo files, yet inheriting from 2.1 works just fine. It would be nice to inherit from the latest.
Various modifications can be made to the post_copy_commands section in the Greenplum installation procedure.
The sed to change GPHOME in greenplum_path.sh does not need to be run on all hosts, but the symlink creation does. This should allow the code to be restructured more clearly.
Provide some sort of interface in Ambari to failover a Greenplum cluster to its standby, and Failback to master once it's fixed.
The default path for the Greenplum master data directory should be /data1/. If you set it to /data1/gpseg-1, Greenplum will create /data1/gpseg-1/gpseg-1.
The SELinux status check should be more comprehensive; currently it only checks that the file exists and doesn't check its contents (which can be set to 0, i.e. permissive).
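A more thorough check would look at the file's contents as well. A minimal sketch, where the function name and the path parameter are ours:

```shell
# Return success only when SELinux is actually enforcing: the enforce
# file must exist AND contain 1 (it holds 0 when SELinux is permissive).
selinux_enforcing() {
    enforce_file="${1:-/selinux/enforce}"
    [ -f "$enforce_file" ] && [ "$(cat "$enforce_file")" = "1" ]
}
```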
We need to fix hawq_master.py to get the hdfs username and not assume a hardcoded value of "hdfs" for the user that owns the root hdfs directory.
I believe it has to do with the pg_hba.conf file being misconfigured, but I'm not sure yet.
I will look into it further and update the results here.
Create basic autotools for installing and uninstalling the service correctly. This will alleviate issues with installing into an already operational Ambari server, and allow the installation method to change later (which will be important when the project transitions from a stack version to its own stack).
Remote datanodes have the following in their logs when started and cannot connect to the namenode. The namenode can be pinged and ssh'd into though.
INFO ipc.Client (Client.java:handleConnectionFailure(783)) - Retrying connect to server: master.ambaricluster.local/172.28.128.3:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1 SECONDS)
When stopping the Greenplum service on branch feature/Pivotal-Ambari, I get "Host Role in invalid state".
JAVA_HOME is hardcoded in PXF/package/scripts/params.py as a workaround. It points to where Ambari Server installs the JDK, which is /usr/jdk64/jdk1.7.0_67/jre.
Java may be installed elsewhere. Using bigtop-detect-javahome to auto-detect Java doesn't work because that path isn't among the ones it checks. We need to write something simple to find Java on the system and put it in a utility function.
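A rough sketch of such a utility. The function name is ours, and the install prefixes searched in the loop are guesses that would need to match the environments we target:

```shell
# Hypothetical utility: locate a usable Java home. Checks JAVA_HOME,
# then java on PATH, then common install prefixes.
find_java_home() {
    if [ -n "$JAVA_HOME" ] && [ -x "$JAVA_HOME/bin/java" ]; then
        echo "$JAVA_HOME"
        return 0
    fi
    if command -v java >/dev/null 2>&1; then
        # Resolve alternatives/symlinks back to the real install dir.
        java_bin=$(readlink -f "$(command -v java)")
        echo "${java_bin%/bin/java}"
        return 0
    fi
    for dir in /usr/jdk64/* /usr/java/* /usr/lib/jvm/*; do
        if [ -x "$dir/bin/java" ]; then
            echo "$dir"
            return 0
        fi
    done
    return 1
}
```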
Stack advisor was originally taken from HDP stack, needs a serious overhaul.
Old, decrepit code needs to be removed; possibly create a class holding the general code and inherit from it for the zData-stack-specific code.
Greenplum's gpadmin keys should be distributed manually through the root user. This will simplify the work, and allow gpadmin user to be created without a password.
Install gpperfmon with Greenplum if specified in the configurations. Look at https://wiki.zdatainc.com/index.php/Greenplum_Command_Center#Installation
Current default data directory structure:
/data1/primary/gpsegX
/data2/primary/gpsegX
...
/dataN/primary/gpsegN
New structure should be:
/data1/primary/
gpsegX
gpsegX
/data2/primary/
gpsegX
gpsegX
Will need to implement some sort of pattern expression for data and mirror data directory templates.
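A minimal sketch of such a template expansion, assuming a printf-style %d slot for the mount number (the template syntax and function name are our invention, not an existing convention in the repo):

```shell
# Sketch: expand a directory template carrying a %d slot for the mount
# number, round-robining the segments across the data mounts.
expand_segment_dirs() {
    template="$1"    # e.g. '/data%d/primary'
    mounts="$2"      # number of data mounts
    segments="$3"    # total number of primary segments
    i=0
    while [ "$i" -lt "$segments" ]; do
        base=$(printf "$template" $((i % mounts + 1)))
        printf '%s/gpseg%d\n' "$base" "$i"
        i=$((i + 1))
    done
}
```

For example, expand_segment_dirs '/data%d/primary' 2 4 emits /data1/primary/gpseg0, /data2/primary/gpseg1, /data1/primary/gpseg2, /data2/primary/gpseg3, alternating the mounts.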
Gpseginstall breaks when root's ~/.ssh/ directory doesn't contain an id_rsa key for the cluster, or when known_hosts isn't populated with the public keys of the other nodes in the cluster.
Solution: Populate known_hosts before gpseginstall by running something like this:
while read host; do ssh-keyscan "$host" >> ~/.ssh/known_hosts; done < /usr/local/greenplum-db/greenplum_hosts
Sometimes major errors happen, but after they get fixed and Ambari retries the install, the log checker still finds the old Fatal errors and the installer won't pass.
gpinitsystem runs Pivotal's checkhdfs command, which segfaults. The segfault is assumed to occur because checkhdfs is written by Pivotal, but the hadoop variant is provided by Hortonworks.
The solution should probably be to stop this command from running during gpinitsystem, and allow it to run when/if we eventually migrate from HDP (Hortonworks Data Platform) to PHD (Pivotal HD).
The Greenplum segments are controlled by the master, so no code is run on the segments themselves to stop them. That said, the segments shouldn't report success until all their relevant processes have actually been stopped or started.
Each Greenplum segment should watch its processes on start and stop and return successfully only when they have all started or stopped.
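A sketch of what the wait on the segment side could look like; the function name and the pgrep pattern are assumptions, not code from the stack scripts:

```shell
# Poll until no postgres process is still attached to the segment's
# data directory, or give up after the timeout (in seconds).
wait_for_segment_stop() {
    datadir="$1"
    timeout="${2:-60}"
    elapsed=0
    while pgrep -f "postgres.*${datadir}" >/dev/null 2>&1; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}
```

A mirror-image loop (wait until the processes appear) would cover the start case.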
Bootstrap files have been cleaned up on development, and those changes cherry-picked to release/0.4.x.
Now, on development, further changes need to be made, such as removing the master-bootstrap.sh script and allowing a selection of functions by passing a name as the first argument, e.g. 'pivotal' and 'vanilla'.