
ambari-extensions's Introduction

zData Ambari Extensions

Ambari is a tool which makes provisioning, managing, and monitoring of Apache Hadoop deployments easy. zData's Ambari Extensions builds atop Ambari to provide easy deployment and management of HAWQ, Chorus, and soon other Pivotal technologies.

The master branch contains code to work with Apache Ambari. There is a pivotal branch which works with Pivotal Ambari.

Visit the project's documentation for quick start guides and more information.

Getting started with Vagrant

  1. Requires the following plugins: vagrant-hostmanager, vagrant-reload

    vagrant plugin install vagrant-hostmanager
    vagrant plugin install vagrant-reload
    
    vagrant plugin install vagrant-aws # Optional, provision on AWS
    vagrant plugin install vagrant-cachier # Optional, cache downloaded packages to speed up provisioning
  2. Create boxes with VirtualBox

    vagrant up # Bring up master, slave1

    Note: Copy vagrant-env.conf.sample to vagrant-env.conf and modify the values to change various vagrant settings such as the number of slave machines.

  3. Connect to the VMs created: master, slave1

    master.ambaricluster.local
    slave1.ambaricluster.local
    
    vagrant ssh master
    vagrant ssh slave1
    

Additional steps to deploy to AWS:

  1. Install a dummy box:

    vagrant box add dummy https://github.com/mitchellh/vagrant-aws/raw/master/dummy.box
  2. Configure Vagrant with your unique Amazon access key and secret. Create a user Vagrantfile at ~/.vagrant.d/Vagrantfile containing:

    Vagrant.configure('2') do |config|
        config.vm.provider :aws do |aws, override|
            aws.access_key_id = ENV['AWS_KEY']
            aws.secret_access_key = ENV['AWS_SECRET']
        end
    end

    Then export AWS_KEY and AWS_SECRET as environment variables in your ~/.bashrc:

    export AWS_KEY="THEKEY"
    export AWS_SECRET="THESECRET"
  3. Configure Vagrant with a keypair it can use to connect to the created boxes. Generate a new keypair, then add the following lines inside the config.vm.provider :aws block of your user Vagrantfile:

    aws.keypair_name = 'vagrant'
    override.ssh.private_key_path = '~/.ssh/aws-vagrant'

    The keypair_name variable should be the name of the keypair on AWS, and private_key_path should be the path to the corresponding private key on your local machine.

    Your user Vagrantfile should now look like:

    Vagrant.configure('2') do |config|
        config.vm.provider :aws do |aws, override|
            aws.access_key_id = ENV['AWS_KEY']
            aws.secret_access_key = ENV['AWS_SECRET']
            aws.keypair_name = 'vagrant'
            override.ssh.private_key_path = '~/.ssh/aws-vagrant'
        end
    end
  4. Create the boxes on AWS

    vagrant up --provider=aws --no-parallel
    vagrant hostmanager

More information about getting started with Ambari using vagrant is available here.

Services

Greenplum

Installs and manages the Pivotal Greenplum database software.

What The Service Does Not Do
  • Does not automatically create or set up an XFS filesystem.
  • Does not set the I/O scheduler to deadline.
  • Does not configure read-ahead.
  • Does not disable transparent hugepages.

Chorus

Installs and manages zData Chorus.

Minimum Tuning Values
  • minimum_memory = 256M
  • maximum_memory = 256M
  • young_heap_size = 128M
  • max_perm_size = 256M

HAWQ

Installs and manages the Pivotal HAWQ Hadoop SQL engine.

PXF

Installs and manages the Pivotal Extension Framework (PXF) and patches it to work with Hortonworks' Hadoop.

Development

Writing features for both Vanilla and Pivotal

Sometimes it's possible to use the same code for both Pivotal and Apache Ambari. When this is the case, you can use git to simplify merging a feature branch into both branches.

git checkout master
git checkout -b feature/##

# Do feature

# Pivotal port
git checkout -b feature/##_pivotal_port
git rebase --onto pivotal master feature/##_pivotal_port
git checkout pivotal
git merge --ff-only feature/##_pivotal_port
git branch -D feature/##_pivotal_port

# Merge to master
git checkout master
git merge feature/##

Running tests

Tests are written with unittest2 and require the Ambari source code to be available.

  1. Set an environment variable so the unit tests can find the Ambari source directory:

    git clone https://git-wip-us.apache.org/repos/asf/ambari.git ~/source-ambari
    export AMBARI_SOURCE="$HOME/source-ambari"
    
  2. Install necessary python packages

    pip install unittest2 mock
    
  3. Run tests:

    (cd tests; python -m unittest discover)
    

Retrieve Artifacts

To install HAWQ you will need some files from Pivotal. You can find these files at https://network.pivotal.io/products/pivotal-hd. Create an account if you don't have one already, and place the downloaded files in the artifacts folder located in the project root.

Licensing

Ambari is an open source deployment tool; users must follow the license agreements provided by Hortonworks and Pivotal for their respective software.

ambari-extensions's People

Contributors

bdelamotte, tommyku2081


ambari-extensions's Issues

Host Role in invalid state

When stopping the Greenplum service on branch feature/Pivotal-Ambari, I get "Host Role in invalid state".

Only create two data directories

Current default data directory structure:

/data1/primary/gpsegX
/data2/primary/gpsegX
...
/dataN/primary/gpsegN

New structure should be:

/data1/primary/
    gpsegX
    gpsegX
/data2/primary/
    gpsegX
    gpsegX

Will need to implement some sort of pattern expression for data and mirror data directory templates.
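
As an illustration, the expansion could work roughly like the sketch below (hypothetical: the '/data(1..2)/primary' template syntax and the function name are invented for this example, not part of the project):

import re

def expand_directory_template(template, segments_per_host):
    """Expand a template like '/data(1..2)/primary' into one base directory
    per segment, cycling round-robin over the expanded paths."""
    match = re.search(r'\((\d+)\.\.(\d+)\)', template)
    if match:
        start, end = int(match.group(1)), int(match.group(2))
        bases = [re.sub(r'\(\d+\.\.\d+\)', str(i), template, count=1)
                 for i in range(start, end + 1)]
    else:
        bases = [template]
    return [bases[i % len(bases)] for i in range(segments_per_host)]

print(expand_directory_template('/data(1..2)/primary', 4))
# ['/data1/primary', '/data2/primary', '/data1/primary', '/data2/primary']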

Make Greenplum more configurable

Move Kernel Parameters, pg_hba.conf, and postgresql.conf to Ambari configurables. Look into disabling the modification of Greenplum settings which cannot be changed after instantiation.

Run gpsegrecover when starting a single Greenplum segment

When a single Greenplum segment is being started, assume it previously failed and attempt to run gpsegrecover. Potentially check whether the cluster is started (master/standby master are running); if the Greenplum cluster isn't started, don't do anything.

During Greenplum stops and starts segments should monitor processes

The Greenplum segments are controlled by the master, so no code is run on the segments themselves in order to stop them. That said, the segments shouldn't report back successful until all of their relevant processes have actually been stopped/started.

Each Greenplum segment should watch its processes on start and stop and return successfully when they all are started or stopped.
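
A minimal sketch of such a wait, assuming a segment's postgres processes can be identified by matching the segment data directory in their command line with pgrep (the function name, polling interval, and timeout are illustrative):

import subprocess
import time

def wait_for_segment_processes(data_directory, should_be_running, timeout=300):
    """Poll until the postgres processes for the given segment data directory
    are all running (or all gone) rather than returning immediately."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        # pgrep -f matches against the full command line, which includes the
        # segment's -D <data_directory> argument; exit status 0 means a match.
        running = subprocess.call(['pgrep', '-f', data_directory]) == 0
        if running == should_be_running:
            return
        time.sleep(5)
    raise RuntimeError('Timed out waiting on segment processes for %s' % data_directory)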

Fresh Install of Hawq fails because HDFS and Zookeeper aren't started

The role_command_order.json file in the zdata stack version declares that the HAWQ service install should not start until HDFS and ZOOKEEPER have been started; however, Ambari's ActionQueue.py adds the execution command to install the HAWQ service anyway, which starts the HAWQ install prematurely.

This is taken from the ambari-agent.log file on the master:

INFO 2015-03-05 18:44:03,933 ActionQueue.py:110 - Adding EXECUTION_COMMAND for service HAWQ of cluster zdata to the queue.
INFO 2015-03-05 18:44:03,967 ActionQueue.py:203 - Executing command with id = 3-2 for role = HAWQ_MASTER of cluster zdata.

Prior to this log output, the status of both HDFS and ZOOKEEPER came back as NOT running, and their status commands threw a "component not running" exception.

Either this is an Ambari bug, the role_command_order.json is missing information, or there is a timeout or threshold of status failures before moving on to the next command for installing another service.

hawq_master.py hardcoding hdfs user

We need to fix hawq_master.py to get the hdfs username and not assume a hardcoded value of "hdfs" for the user that owns the root hdfs directory.
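
A hedged sketch of how params.py could look the value up from the cluster configuration instead (assuming the hdfs_user property lives under hadoop-env, as it does in the standard HDP stacks):

from resource_management.libraries.script.script import Script

config = Script.get_config()

# Assumption: the HDFS superuser is exposed via hadoop-env as in HDP stacks;
# fall back to 'hdfs' only when the property is absent.
hdfs_superuser = config['configurations'].get('hadoop-env', {}).get('hdfs_user', 'hdfs')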

Known_host issue arising when Greenplum master is not installed on same machine as Ambari

gpseginstall breaks if root's ~/.ssh/ directory doesn't contain an id_rsa key for the cluster, or if known_hosts isn't populated with the public keys of the other nodes in the cluster.

Solution: Populate known_hosts before gpseginstall by running something like this:

while read host; do ssh-keyscan $host >> ~/.ssh/known_hosts; done < /usr/local/greenplum-db/greenplum_hosts

Refactor post_copy_commands in Greenplum installation

Various modifications can be made to the post_copy_commands section in the Greenplum installation procedure.
The sed to change GPHOME in greenplum_path.sh does not need to be run on all hosts, but the symlink creation does. This should allow the code to be restructured so it is clearer.
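
A hedged sketch of that split using Ambari's resource_management resources (the paths and version directory are illustrative, not the project's actual values):

from resource_management.core.resources.system import Execute, Link

# Needed on every host: point the well-known prefix at the unpacked release.
Link('/usr/local/greenplum-db', to='/usr/local/greenplum-db-4.3.x')

# Per the note above, this does not need to run on every host: rewrite GPHOME
# in greenplum_path.sh so it matches the symlinked prefix.
Execute("sed -i 's|^GPHOME=.*|GPHOME=/usr/local/greenplum-db|' "
        "/usr/local/greenplum-db/greenplum_path.sh")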

Clicking "Restart All" while the cluster is already stopped fails and blows up

Log where it breaks

stderr:   /var/lib/ambari-agent/data/errors-85.txt

2015-06-10 19:13:54,272 - Error while executing command 'restart':
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 123, in execute
    method(env)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 232, in restart
    self.stop(env)
  File "/var/lib/ambari-agent/cache/stacks/PHD/9.9.9.zData/services/GREENPLUM/package/scripts/master.py", line 51, in stop
    user=params.admin_user
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 148, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 149, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 115, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 241, in action_run
    raise ex
Fail: Execution of 'gpstop -a -M smart -v' returned 2. 20150610:19:13:54:030140 gpstop:master:gpadmin-[INFO]:-Starting gpstop with args: -a -M smart -v
20150610:19:13:54:030140 gpstop:master:gpadmin-[DEBUG]:-Setting level of parallelism to: 64
20150610:19:13:54:030140 gpstop:master:gpadmin-[INFO]:-Gathering information and validating the environment...
20150610:19:13:54:030140 gpstop:master:gpadmin-[DEBUG]:-Checking if GPHOME env variable is set.
20150610:19:13:54:030140 gpstop:master:gpadmin-[DEBUG]:-Checking if MASTER_DATA_DIRECTORY env variable is set.
20150610:19:13:54:030140 gpstop:master:gpadmin-[DEBUG]:-Checking if LOGNAME or USER env variable is set.
20150610:19:13:54:030140 gpstop:master:gpadmin-[DEBUG]:---Checking that current user can use GP binaries
20150610:19:13:54:030140 gpstop:master:gpadmin-[DEBUG]:-Obtaining master's port from master data directory
20150610:19:13:54:030140 gpstop:master:gpadmin-[DEBUG]:-Read from postgresql.conf port=6543
20150610:19:13:54:030140 gpstop:master:gpadmin-[ERROR]:-gpstop error: postmaster.pid file does not exist.  is Greenplum instance already stopped?
stdout:   /var/lib/ambari-agent/data/output-85.txt

2015-06-10 19:13:53,820 - Could not verify stack version by calling '/usr/bin/distro-select versions > /tmp/tmp_ik32J'. Return Code: 1, Output: .
2015-06-10 19:13:53,824 - Execute['mkdir -p /var/lib/ambari-agent/data/tmp/AMBARI-artifacts/;     curl -kf -x "" --retry 10     http://master.ambaricluster.local:8080/resources//UnlimitedJCEPolicyJDK7.zip -o /var/lib/ambari-agent/data/tmp/AMBARI-artifacts//UnlimitedJCEPolicyJDK7.zip'] {'environment': ..., 'not_if': 'test -e /var/lib/ambari-agent/data/tmp/AMBARI-artifacts//UnlimitedJCEPolicyJDK7.zip', 'ignore_failures': True, 'path': ['/bin', '/usr/bin/']}
2015-06-10 19:13:53,838 - Skipping Execute['mkdir -p /var/lib/ambari-agent/data/tmp/AMBARI-artifacts/;     curl -kf -x "" --retry 10     http://master.ambaricluster.local:8080/resources//UnlimitedJCEPolicyJDK7.zip -o /var/lib/ambari-agent/data/tmp/AMBARI-artifacts//UnlimitedJCEPolicyJDK7.zip'] due to not_if
2015-06-10 19:13:53,839 - Group['hadoop'] {'ignore_failures': False}
2015-06-10 19:13:53,840 - Modifying group hadoop
2015-06-10 19:13:53,854 - Group['nobody'] {'ignore_failures': False}
2015-06-10 19:13:53,854 - Modifying group nobody
2015-06-10 19:13:53,864 - Group['nagios'] {'ignore_failures': False}
2015-06-10 19:13:53,864 - Modifying group nagios
2015-06-10 19:13:53,877 - User['nobody'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'nobody']}
2015-06-10 19:13:53,877 - Modifying user nobody
2015-06-10 19:13:53,888 - User['nagios'] {'gid': 'nagios', 'ignore_failures': False, 'groups': [u'hadoop']}
2015-06-10 19:13:53,888 - Modifying user nagios
2015-06-10 19:13:53,898 - User['ambari-qa'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': ['users']}
2015-06-10 19:13:53,898 - Modifying user ambari-qa
2015-06-10 19:13:53,911 - File['/var/lib/ambari-agent/data/tmp/changeUid.sh'] {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555}
2015-06-10 19:13:53,912 - Execute['/var/lib/ambari-agent/data/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa 2>/dev/null'] {'not_if': 'test $(id -u ambari-qa) -gt 1000'}
2015-06-10 19:13:53,920 - Skipping Execute['/var/lib/ambari-agent/data/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa 2>/dev/null'] due to not_if
2015-06-10 19:13:53,932 - Execute['/bin/echo 0 > /selinux/enforce'] {'only_if': 'test -f /selinux/enforce'}
2015-06-10 19:13:54,130 - Execute['gpstop -a -M smart -v'] {'user': 'gpadmin'}
2015-06-10 19:13:54,272 - Error while executing command 'restart':
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 123, in execute
    method(env)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 232, in restart
    self.stop(env)
  File "/var/lib/ambari-agent/cache/stacks/PHD/9.9.9.zData/services/GREENPLUM/package/scripts/master.py", line 51, in stop
    user=params.admin_user
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 148, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 149, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 115, in run_action
    provider_action()
"""  
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 241, in action_run
    raise ex
Fail: Execution of 'gpstop -a -M smart -v' returned 2. 20150610:19:13:54:030140 gpstop:master:gpadmin-[INFO]:-Starting gpstop with args: -a -M smart -v
20150610:19:13:54:030140 gpstop:master:gpadmin-[DEBUG]:-Setting level of parallelism to: 64
20150610:19:13:54:030140 gpstop:master:gpadmin-[INFO]:-Gathering information and validating the environment...
20150610:19:13:54:030140 gpstop:master:gpadmin-[DEBUG]:-Checking if GPHOME env variable is set.
20150610:19:13:54:030140 gpstop:master:gpadmin-[DEBUG]:-Checking if MASTER_DATA_DIRECTORY env variable is set.
20150610:19:13:54:030140 gpstop:master:gpadmin-[DEBUG]:-Checking if LOGNAME or USER env variable is set.
20150610:19:13:54:030140 gpstop:master:gpadmin-[DEBUG]:---Checking that current user can use GP binaries
20150610:19:13:54:030140 gpstop:master:gpadmin-[DEBUG]:-Obtaining master's port from master data directory
20150610:19:13:54:030140 gpstop:master:gpadmin-[DEBUG]:-Read from postgresql.conf port=6543
20150610:19:13:54:030140 gpstop:master:gpadmin-[ERROR]:-gpstop error: postmaster.pid file does not exist.  is Greenplum instance already stopped?

Add backout code for Greenplum installation failures.

Either back out automatically if installation fails, or provide a back out script which, in addition to calling Greenplum's generated backout script if one exists, also backs out the changes made while installing Greenplum by Ambari.

Find a better way to learn JAVA_HOME

JAVA_HOME is hardcoded in PXF/package/scripts/params.py as a workaround. It points to where Ambari Server installs the JDK, which is /usr/jdk64/jdk1.7.0_67/jre.

Java may be installed elsewhere. Using bigtop-detect-javahome to auto-detect Java doesn't work because that possible path isn't included in its list. We need to write something simple to find Java on the system and put it in a utility function.
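
A minimal sketch of such a utility (the function name and fallback paths are illustrative):

import os

def find_java_home():
    """Best-effort JAVA_HOME detection: the environment variable first, then
    the java binary on PATH (resolving symlinks), then common install paths."""
    if os.environ.get('JAVA_HOME'):
        return os.environ['JAVA_HOME']
    for directory in os.environ.get('PATH', '').split(os.pathsep):
        candidate = os.path.join(directory, 'java')
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            # .../jre/bin/java -> drop the trailing /bin to get the Java home
            return os.path.dirname(os.path.dirname(os.path.realpath(candidate)))
    for home in ('/usr/jdk64/jdk1.7.0_67/jre', '/usr/lib/jvm/java'):
        if os.path.isdir(home):
            return home
    raise RuntimeError('Unable to locate a Java installation')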

Update documentation

  • Update Greenplum code documentation, cleanup.
  • Update HAWQ code documentation, cleanup.
  • Update PXF code documentation, cleanup.
  • Update github site documentation.
  • Update readme.md.

Use our make script to define which services should be packaged up

We have overlapping definitions for HAWQ and PXF with PHD 3.0.0. I would like the option to release a tar file that contains only Greenplum and no other service definitions. Also, we may not want to package and release Minecraft to customers. It would be nice to have an "includes" list where we specify which services should be bundled up.
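
A hypothetical sketch of how an "includes" list could drive packaging (the directory layout, names, and output file are assumptions, not the project's actual make script):

import os
import tarfile

# Hypothetical whitelist of service definitions to bundle into a release.
INCLUDED_SERVICES = ['GREENPLUM']

def package_services(services_dir, output_path):
    """Build a release tarball containing only the whitelisted service folders."""
    with tarfile.open(output_path, 'w:gz') as tar:
        for name in INCLUDED_SERVICES:
            tar.add(os.path.join(services_dir, name), arcname='services/' + name)

package_services('src/services', 'ambari-extensions-greenplum.tar.gz')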

Add Redis Service

Add basic Redis service.

Questions to be answered:

  • Allow high availability configurations with Sentinel? (Requires more work)
  • Allow non-restart reconfigurations via set command? (Alternative is faster but requires restarts for any config changes)
  • Should Sentinels automatically be installed on all Redis nodes, or be distributed manually as a type of slave?

Add basic makefile for installation

Create basic autotools support for installing and uninstalling the services correctly. This will alleviate issues with installing into an already operational Ambari server, and it allows the installation method to change later (which will be important when the project transitions from a stack version to its own stack).

checkhdfs segfaults during gpinitsystem

gpinitsystem runs Pivotal's checkhdfs command, which segfaults. The segfault is assumed to occur because checkhdfs is written by Pivotal, but the hadoop variant is provided by Hortonworks.

The solution should probably just be to stop this command from running during gpinitsystem, and allow it to run when/if we eventually migrate from HDP (Hortonworks Data Platform) to PHD (Pivotal HD).

Allow the creation of a hot standby master for Greenplum

Allow Greenplum installations to have a hot standby master.
This is done by allowing a cardinality of 1-2 for master, and setting the second as a hot standby master.

Look into automatically attempting to recover the master segment if it is being started and had previously failed. It's not clear whether this is possible; the information needed to distinguish a normal cluster start from a master recovery start may not be available from Ambari. If it is not possible, create a custom action.

Remote datanodes cannot connect to namenode with vagrant environment

Remote datanodes have the following in their logs when started and cannot connect to the namenode. The namenode can be pinged and ssh'd into though.

INFO  ipc.Client (Client.java:handleConnectionFailure(783)) - Retrying connect to server: master.ambaricluster.local/172.28.128.3:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1 SECONDS)

Cleanup bootstrap files for Vagrant/ Development

Bootstrap files have been cleaned up on development, and those changes cherry-picked to release/0.4.x.

Further changes now need to be made on development, such as removing the master-bootstrap.sh script and allowing a selection of functions by passing a name as the first argument (for example, 'pivotal' or 'vanilla').

Greenplum master data directory

The default path for the Greenplum master data directory should be /data1/.
If you set it to /data1/gpseg-1, Greenplum will create /data1/gpseg-1/gpseg-1.

Warning thrown during gpinitsystem if initdb has extended SELinux ACLs

This is addressing the generated output in the logs:

[WARN]:-File permission mismatch.  The gpadmin owns the Greenplum Database installation directory.
[WARN]:-You are currently logged in as gpadmin and may not have sufficient
[WARN]:-permissions to run the Greenplum binaries and management utilities.

The issue stems from how gpinitsystem (specifically /usr/local/hawq/bin/lib/gp_bash_functions.sh, which gpinitsystem calls) checks the permissions of the files located in /usr/local/hawq. It does this by running ls -la on /usr/local/hawq/bin/initdb and filtering the output through sed -e 's/...\(.\)....../\1/g' (to retrieve the user executable permission) and sed -e 's/......\(.\).../\1/g' (to retrieve the group executable permission).

The issue becomes obvious when looking at the output of the ls command on an SELinux enabled system (even if it is set to permissive):

-rwxr-xr-x. 1 gpadmin root 463,708 Aug  8  2014 initdb*
          ^ Note the dot

Without the trailing dot (which signifies there are extended SELinux ACLs on the file) the sed works correctly. With it, though, the sed returns 'x.', because the final dot is never matched and therefore isn't replaced.
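
For comparison, reading the permission bits directly (sketched here in Python, since the stack scripts are Python) avoids parsing ls output entirely and is unaffected by the SELinux marker:

import os
import stat

def executable_bits(path):
    """Return (user_exec, group_exec) for a file, independent of the trailing
    '.' that ls -l appends when the file has an SELinux security context."""
    mode = os.stat(path).st_mode
    return bool(mode & stat.S_IXUSR), bool(mode & stat.S_IXGRP)

print(executable_bits('/usr/local/hawq/bin/initdb'))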

Cleanup stack_advisor

The stack advisor was originally taken from the HDP stack and needs a serious overhaul.
Old, decrepit code needs to be removed; possibly create a class with the general code and inherit from it for zData-stack-specific code.
