
ambari-extensions's Introduction

zData Ambari Extensions

Ambari is a tool which makes provisioning, managing, and monitoring of Apache Hadoop deployments easy. zData's Ambari Extensions builds atop Ambari to provide easy deployment and management of HAWQ, Chorus, and soon other Pivotal technologies.

The master branch contains code to work with Apache Ambari. There is a pivotal branch which works with Pivotal Ambari.

Visit the project's documentation for quick start guides and more information.

Getting started with Vagrant

  1. Requires the following plugins: vagrant-hostmanager, vagrant-reload

    vagrant plugin install vagrant-hostmanager
    vagrant plugin install vagrant-reload
    
    vagrant plugin install vagrant-aws # Optional, provision on AWS
    vagrant plugin install vagrant-cachier # Optional, cache downloaded packages to speed up provisioning
  2. Create boxes with VirtualBox

    vagrant up # Bring up master, slave1

    Note: Copy vagrant-env.conf.sample to vagrant-env.conf and modify the values to change various vagrant settings such as the number of slave machines.

  3. Connect to the VMs created: master, slave1

    master.ambaricluster.local
    slave1.ambaricluster.local
    
    vagrant ssh master
    vagrant ssh slave1
    

Additional steps to deploy to AWS:

  1. Install a dummy box:

    vagrant box add dummy https://github.com/mitchellh/vagrant-aws/raw/master/dummy.box
  2. Configure Vagrant with your unique Amazon access key and secret. Create a user Vagrantfile at ~/.vagrant.d/Vagrantfile containing:

    Vagrant.configure('2') do |config|
        config.vm.provider :aws do |aws, override|
            aws.access_key_id = ENV['AWS_KEY']
            aws.secret_access_key = ENV['AWS_SECRET']
        end
    end

    Then export AWS_KEY and AWS_SECRET as environment variables in your ~/.bashrc:

    export AWS_KEY="THEKEY"
    export AWS_SECRET="THESECRET"
  3. Configure Vagrant with a keypair it can use to connect to the created boxes. Generate a new keypair, then add the following lines inside the config.vm.provider :aws block of your user Vagrantfile:

    aws.keypair_name = 'vagrant'
    override.ssh.private_key_path = '~/.ssh/aws-vagrant'

    The keypair_name variable should be the name of the keypair on AWS, and private_key_path should be the path to the corresponding private key on your local machine.

    Your user Vagrantfile should now look like:

    Vagrant.configure('2') do |config|
        config.vm.provider :aws do |aws, override|
            aws.access_key_id = ENV['AWS_KEY']
            aws.secret_access_key = ENV['AWS_SECRET']
            aws.keypair_name = 'vagrant'
            override.ssh.private_key_path = '~/.ssh/aws-vagrant'
        end
    end
  4. Create the boxes on AWS

    vagrant up --provider=aws --no-parallel
    vagrant hostmanager

More information about getting started with Ambari using vagrant is available here.

Services

Greenplum

Installs and manages the Pivotal Greenplum database software.

What The Service Does Not Do
  • Does not automatically create or set up an XFS filesystem.
  • Does not set the I/O scheduler to deadline.
  • Does not configure read-ahead.
  • Does not disable transparent hugepages.

Chorus

Installs and manages zData Chorus.

Minimum Tuning Values
  • minimum_memory = 256M
  • maximum_memory = 256M
  • young_heap_size = 128M
  • max_perm_size = 256M

HAWQ

Installs and manages the Pivotal HAWQ Hadoop SQL engine.

PXF

Installs and manages the Pivotal Extension Framework (PXF) and patches it to work with Hortonworks' Hadoop.

Development

Writing features for both Vanilla and Pivotal

Sometimes it's possible to use the same code for both Pivotal and Apache Ambari. When this is the case, you can use git to simplify merging a feature branch into both branches.

git checkout master
git checkout -b feature/##

# Do feature

# Pivotal port
git checkout -b feature/##_pivotal_port
git rebase --onto pivotal master feature/##_pivotal_port
git checkout pivotal
git merge --ff-only feature/##_pivotal_port
git branch -D feature/##_pivotal_port

# Merge to master
git checkout master
git merge feature/##

Running tests

Tests are written with unittest2 and require the Ambari source code to be available.

  1. Set an environment variable so the unit tests can find the Ambari source directory:

    git clone https://git-wip-us.apache.org/repos/asf/ambari.git ~/source-ambari
    export AMBARI_SOURCE="$HOME/source-ambari"
    
  2. Install necessary python packages

    pip install unittest2 mock
    
  3. Run tests:

    (cd tests; python -m unittest discover)
    

Retrieve Artifacts

To install HAWQ you will need some files from Pivotal. You can find these files at https://network.pivotal.io/products/pivotal-hd. Create an account if you don't have one already, and place the downloaded files in the artifacts folder located in the project root.

Licensing

Ambari is an open source deployment tool; users must follow the license agreements provided by Hortonworks and Pivotal for their respective software.

ambari-extensions's People

Contributors

bdelamotte, tommyku2081


ambari-extensions's Issues

Host Role in invalid state

When stopping the Greenplum service on branch feature/Pivotal-Ambari, I get "Host Role in invalid state".

Only create two data directories

Current default data directory structure:

/data1/primary/gpsegX
/data2/primary/gpsegX
...
/dataN/primary/gpsegN

New structure should be:

/data1/primary/
    gpsegX
    gpsegX
/data2/primary/
    gpsegX
    gpsegX

Will need to implement some sort of pattern expression for data and mirror data directory templates.
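
As an illustration, the expansion could work roughly like the sketch below (hypothetical: the '/data(1..2)/primary' template syntax and the function name are invented for this example, not part of the project):

import re

def expand_directory_template(template, segments_per_host):
    """Expand a template like '/data(1..2)/primary' into one base directory
    per segment, cycling round-robin over the expanded paths."""
    match = re.search(r'\((\d+)\.\.(\d+)\)', template)
    if match:
        start, end = int(match.group(1)), int(match.group(2))
        bases = [re.sub(r'\(\d+\.\.\d+\)', str(i), template, count=1)
                 for i in range(start, end + 1)]
    else:
        bases = [template]
    return [bases[i % len(bases)] for i in range(segments_per_host)]

print(expand_directory_template('/data(1..2)/primary', 4))
# ['/data1/primary', '/data2/primary', '/data1/primary', '/data2/primary']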

Make Greenplum more configurable

Move Kernel Parameters, pg_hba.conf, and postgresql.conf to Ambari configurables. Look into disabling the modification of Greenplum settings which cannot be changed after instantiation.

Run gpsegrecover when starting a single Greenplum segment

When a single Greenplum segment is being started, assume it previously failed and attempt to run gpsegrecover. Potentially check whether the cluster is started (master/standby master are running); if the Greenplum cluster isn't started, don't do anything.

During Greenplum stops and starts segments should monitor processes

The Greenplum segments are controlled by the master, so no code is run on the segments themselves in order to stop them. That said, the segments shouldn't report back successful until all of their relevant processes have actually been stopped/started.

Each Greenplum segment should watch its processes on start and stop and return successfully when they all are started or stopped.
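
A minimal sketch of such a wait, assuming a segment's postgres processes can be identified by matching the segment data directory in their command line with pgrep (the function name, polling interval, and timeout are illustrative):

import subprocess
import time

def wait_for_segment_processes(data_directory, should_be_running, timeout=300):
    """Poll until the postgres processes for the given segment data directory
    are all running (or all gone) rather than returning immediately."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        # pgrep -f matches against the full command line, which includes the
        # segment's -D <data_directory> argument; exit status 0 means a match.
        running = subprocess.call(['pgrep', '-f', data_directory]) == 0
        if running == should_be_running:
            return
        time.sleep(5)
    raise RuntimeError('Timed out waiting on segment processes for %s' % data_directory)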

Fresh Install of Hawq fails because HDFS and Zookeeper aren't started

The role_command_order.json file in the zdata stack version declares that the HAWQ service install should not start until HDFS and ZOOKEEPER have been started; however, Ambari's ActionQueue.py adds the execution command to install the HAWQ service anyway, which starts the HAWQ install prematurely.

This is taken from the ambari-agent.log file on the master:

INFO 2015-03-05 18:44:03,933 ActionQueue.py:110 - Adding EXECUTION_COMMAND for service HAWQ of cluster zdata to the queue.
INFO 2015-03-05 18:44:03,967 ActionQueue.py:203 - Executing command with id = 3-2 for role = HAWQ_MASTER of cluster zdata.

Prior to this log output, the status of both HDFS and ZOOKEEPER came back as NOT running, and their status commands threw a "component not running" exception.

Either this is an Ambari bug, the role_command_order.json is missing information, or there is a timeout or threshold of status failures before moving on to the next command for installing another service.

hawq_master.py hardcoding hdfs user

We need to fix hawq_master.py to get the hdfs username and not assume a hardcoded value of "hdfs" for the user that owns the root hdfs directory.
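
A hedged sketch of how params.py could look the value up from the cluster configuration instead (assuming the hdfs_user property lives under hadoop-env, as it does in the standard HDP stacks):

from resource_management.libraries.script.script import Script

config = Script.get_config()

# Assumption: the HDFS superuser is exposed via hadoop-env as in HDP stacks;
# fall back to 'hdfs' only when the property is absent.
hdfs_superuser = config['configurations'].get('hadoop-env', {}).get('hdfs_user', 'hdfs')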

Known_host issue arising when Greenplum master is not installed on same machine as Ambari

gpseginstall breaks if root's ~/.ssh/ directory doesn't contain an id_rsa key for the cluster, or if known_hosts isn't populated with the public keys of the other nodes in the cluster.

Solution: Populate known_hosts before gpseginstall by running something like this:

while read host; do ssh-keyscan $host >> ~/.ssh/known_hosts; done < /usr/local/greenplum-db/greenplum_hosts

Refactor post_copy_commands in Greenplum installation

Various modifications can be made to the post_copy_commands section in the Greenplum installation procedure.
The sed to change GPHOME in greenplum_path.sh does not need to be run on all hosts, but the symlink creation does. This should allow the code to be restructured so it is clearer.
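
A hedged sketch of that split using Ambari's resource_management resources (the paths and version directory are illustrative, not the project's actual values):

from resource_management.core.resources.system import Execute, Link

# Needed on every host: point the well-known prefix at the unpacked release.
Link('/usr/local/greenplum-db', to='/usr/local/greenplum-db-4.3.x')

# Per the note above, this does not need to run on every host: rewrite GPHOME
# in greenplum_path.sh so it matches the symlinked prefix.
Execute("sed -i 's|^GPHOME=.*|GPHOME=/usr/local/greenplum-db|' "
        "/usr/local/greenplum-db/greenplum_path.sh")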

Clicking "Restart All" while the cluster is already stopped fails and blows up

Log where it breaks

stderr:   /var/lib/ambari-agent/data/errors-85.txt

2015-06-10 19:13:54,272 - Error while executing command 'restart':
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 123, in execute
    method(env)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 232, in restart
    self.stop(env)
  File "/var/lib/ambari-agent/cache/stacks/PHD/9.9.9.zData/services/GREENPLUM/package/scripts/master.py", line 51, in stop
    user=params.admin_user
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 148, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 149, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 115, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 241, in action_run
    raise ex
Fail: Execution of 'gpstop -a -M smart -v' returned 2. 20150610:19:13:54:030140 gpstop:master:gpadmin-[INFO]:-Starting gpstop with args: -a -M smart -v
20150610:19:13:54:030140 gpstop:master:gpadmin-[DEBUG]:-Setting level of parallelism to: 64
20150610:19:13:54:030140 gpstop:master:gpadmin-[INFO]:-Gathering information and validating the environment...
20150610:19:13:54:030140 gpstop:master:gpadmin-[DEBUG]:-Checking if GPHOME env variable is set.
20150610:19:13:54:030140 gpstop:master:gpadmin-[DEBUG]:-Checking if MASTER_DATA_DIRECTORY env variable is set.
20150610:19:13:54:030140 gpstop:master:gpadmin-[DEBUG]:-Checking if LOGNAME or USER env variable is set.
20150610:19:13:54:030140 gpstop:master:gpadmin-[DEBUG]:---Checking that current user can use GP binaries
20150610:19:13:54:030140 gpstop:master:gpadmin-[DEBUG]:-Obtaining master's port from master data directory
20150610:19:13:54:030140 gpstop:master:gpadmin-[DEBUG]:-Read from postgresql.conf port=6543
20150610:19:13:54:030140 gpstop:master:gpadmin-[ERROR]:-gpstop error: postmaster.pid file does not exist.  is Greenplum instance already stopped?
stdout:   /var/lib/ambari-agent/data/output-85.txt

2015-06-10 19:13:53,820 - Could not verify stack version by calling '/usr/bin/distro-select versions > /tmp/tmp_ik32J'. Return Code: 1, Output: .
2015-06-10 19:13:53,824 - Execute['mkdir -p /var/lib/ambari-agent/data/tmp/AMBARI-artifacts/;     curl -kf -x "" --retry 10     http://master.ambaricluster.local:8080/resources//UnlimitedJCEPolicyJDK7.zip -o /var/lib/ambari-agent/data/tmp/AMBARI-artifacts//UnlimitedJCEPolicyJDK7.zip'] {'environment': ..., 'not_if': 'test -e /var/lib/ambari-agent/data/tmp/AMBARI-artifacts//UnlimitedJCEPolicyJDK7.zip', 'ignore_failures': True, 'path': ['/bin', '/usr/bin/']}
2015-06-10 19:13:53,838 - Skipping Execute['mkdir -p /var/lib/ambari-agent/data/tmp/AMBARI-artifacts/;     curl -kf -x "" --retry 10     http://master.ambaricluster.local:8080/resources//UnlimitedJCEPolicyJDK7.zip -o /var/lib/ambari-agent/data/tmp/AMBARI-artifacts//UnlimitedJCEPolicyJDK7.zip'] due to not_if
2015-06-10 19:13:53,839 - Group['hadoop'] {'ignore_failures': False}
2015-06-10 19:13:53,840 - Modifying group hadoop
2015-06-10 19:13:53,854 - Group['nobody'] {'ignore_failures': False}
2015-06-10 19:13:53,854 - Modifying group nobody
2015-06-10 19:13:53,864 - Group['nagios'] {'ignore_failures': False}
2015-06-10 19:13:53,864 - Modifying group nagios
2015-06-10 19:13:53,877 - User['nobody'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'nobody']}
2015-06-10 19:13:53,877 - Modifying user nobody
2015-06-10 19:13:53,888 - User['nagios'] {'gid': 'nagios', 'ignore_failures': False, 'groups': [u'hadoop']}
2015-06-10 19:13:53,888 - Modifying user nagios
2015-06-10 19:13:53,898 - User['ambari-qa'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': ['users']}
2015-06-10 19:13:53,898 - Modifying user ambari-qa
2015-06-10 19:13:53,911 - File['/var/lib/ambari-agent/data/tmp/changeUid.sh'] {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555}
2015-06-10 19:13:53,912 - Execute['/var/lib/ambari-agent/data/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa 2>/dev/null'] {'not_if': 'test $(id -u ambari-qa) -gt 1000'}
2015-06-10 19:13:53,920 - Skipping Execute['/var/lib/ambari-agent/data/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa 2>/dev/null'] due to not_if
2015-06-10 19:13:53,932 - Execute['/bin/echo 0 > /selinux/enforce'] {'only_if': 'test -f /selinux/enforce'}
2015-06-10 19:13:54,130 - Execute['gpstop -a -M smart -v'] {'user': 'gpadmin'}
2015-06-10 19:13:54,272 - Error while executing command 'restart':
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 123, in execute
    method(env)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 232, in restart
    self.stop(env)
  File "/var/lib/ambari-agent/cache/stacks/PHD/9.9.9.zData/services/GREENPLUM/package/scripts/master.py", line 51, in stop
    user=params.admin_user
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 148, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 149, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 115, in run_action
    provider_action()
"""  
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 241, in action_run
    raise ex
Fail: Execution of 'gpstop -a -M smart -v' returned 2. 20150610:19:13:54:030140 gpstop:master:gpadmin-[INFO]:-Starting gpstop with args: -a -M smart -v
20150610:19:13:54:030140 gpstop:master:gpadmin-[DEBUG]:-Setting level of parallelism to: 64
20150610:19:13:54:030140 gpstop:master:gpadmin-[INFO]:-Gathering information and validating the environment...
20150610:19:13:54:030140 gpstop:master:gpadmin-[DEBUG]:-Checking if GPHOME env variable is set.
20150610:19:13:54:030140 gpstop:master:gpadmin-[DEBUG]:-Checking if MASTER_DATA_DIRECTORY env variable is set.
20150610:19:13:54:030140 gpstop:master:gpadmin-[DEBUG]:-Checking if LOGNAME or USER env variable is set.
20150610:19:13:54:030140 gpstop:master:gpadmin-[DEBUG]:---Checking that current user can use GP binaries
20150610:19:13:54:030140 gpstop:master:gpadmin-[DEBUG]:-Obtaining master's port from master data directory
20150610:19:13:54:030140 gpstop:master:gpadmin-[DEBUG]:-Read from postgresql.conf port=6543
20150610:19:13:54:030140 gpstop:master:gpadmin-[ERROR]:-gpstop error: postmaster.pid file does not exist.  is Greenplum instance already stopped?

Add backout code for Greenplum installation failures.

Either back out automatically if installation fails, or provide a back out script which, in addition to calling Greenplum's generated backout script if one exists, also backs out the changes made while installing Greenplum by Ambari.

Find a better way to learn JAVA_HOME

JAVA_HOME is hardcoded in PXF/package/scripts/params.py as a workaround. It points to where Ambari Server installs the JDK, which is /usr/jdk64/jdk1.7.0_67/jre.

Java may be installed elsewhere. Using bigtop-detect-javahome to auto-detect Java doesn't work because that possible path isn't included in its list. We need to write something simple to find Java on the system and put it in a utility function.
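
A minimal sketch of such a utility (the function name and fallback paths are illustrative):

import os

def find_java_home():
    """Best-effort JAVA_HOME detection: the environment variable first, then
    the java binary on PATH (resolving symlinks), then common install paths."""
    if os.environ.get('JAVA_HOME'):
        return os.environ['JAVA_HOME']
    for directory in os.environ.get('PATH', '').split(os.pathsep):
        candidate = os.path.join(directory, 'java')
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            # .../jre/bin/java -> drop the trailing /bin to get the Java home
            return os.path.dirname(os.path.dirname(os.path.realpath(candidate)))
    for home in ('/usr/jdk64/jdk1.7.0_67/jre', '/usr/lib/jvm/java'):
        if os.path.isdir(home):
            return home
    raise RuntimeError('Unable to locate a Java installation')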

Update documentation

  • Update Greenplum code documentation, cleanup.
  • Update HAWQ code documentation, cleanup.
  • Update PXF code documentation, cleanup.
  • Update github site documentation.
  • Update readme.md.

Use our make script to define which services should be packaged up

We have overlapping definitions for HAWQ and PXF with PHD 3.0.0. I would like the option to release a tar file that contains only Greenplum and no other service definitions. Also, we may not want to package and release Minecraft to customers. It would be nice to have an "includes" list where we specify which services should be bundled up.
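
A hypothetical sketch of how an "includes" list could drive packaging (the directory layout, names, and output file are assumptions, not the project's actual make script):

import os
import tarfile

# Hypothetical whitelist of service definitions to bundle into a release.
INCLUDED_SERVICES = ['GREENPLUM']

def package_services(services_dir, output_path):
    """Build a release tarball containing only the whitelisted service folders."""
    with tarfile.open(output_path, 'w:gz') as tar:
        for name in INCLUDED_SERVICES:
            tar.add(os.path.join(services_dir, name), arcname='services/' + name)

package_services('src/services', 'ambari-extensions-greenplum.tar.gz')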

Add Redis Service

Add basic Redis service.

Questions to be answered:

  • Allow high availability configurations with Sentinel? (Requires more work)
  • Allow non-restart reconfigurations via set command? (Alternative is faster but requires restarts for any config changes)
  • Should Sentinels automatically be installed on all Redis nodes, or be distributed manually as a type of slave?

Add basic makefile for installation

Create basic autotools support for installing and uninstalling the services correctly. This will alleviate issues with installing into an already operational Ambari server, and it allows the installation method to change later (which will be important when the project transitions from a stack version to its own stack).

checkhdfs segfaults during gpinitsystem

gpinitsystem runs Pivotal's checkhdfs command, which segfaults. The segfault is assumed to occur because checkhdfs is written by Pivotal, but the hadoop variant is provided by Hortonworks.

The solution should probably just be to stop this command from running during gpinitsystem, and allow it to run when/if we eventually migrate from HDP (Hortonworks Data Platform) to PHD (Pivotal HD).

Allow the creation of a hot standby master for Greenplum

Allow Greenplum installations to have a hot standby master.
This is done by allowing a cardinality of 1-2 for master, and setting the second as a hot standby master.

Look into automatically attempting to recover the master segment if it is being started and had previously failed. It's not clear whether this is possible; the information needed to distinguish a normal cluster start from a master recovery start may not be available from Ambari. If it is not possible, create a custom action.

Remote datanodes cannot connect to namenode with vagrant environment

Remote datanodes have the following in their logs when started and cannot connect to the namenode. The namenode can be pinged and ssh'd into though.

INFO  ipc.Client (Client.java:handleConnectionFailure(783)) - Retrying connect to server: master.ambaricluster.local/172.28.128.3:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1 SECONDS)

Cleanup bootstrap files for Vagrant/ Development

Bootstrap files have been cleaned up on development, and those changes cherry-picked to release/0.4.x.

Further changes now need to be made on development, such as removing the master-bootstrap.sh script and allowing a selection of functions by passing a name as the first argument (for example, 'pivotal' or 'vanilla').

Greenplum master data directory

The default path for the Greenplum master data directory should be /data1/.
If you set it to /data1/gpseg-1, Greenplum will create /data1/gpseg-1/gpseg-1.

Warning thrown during gpinitsystem if initdb has extended SELinux ACLs

This is addressing the generated output in the logs:

[WARN]:-File permission mismatch.  The gpadmin owns the Greenplum Database installation directory.
[WARN]:-You are currently logged in as gpadmin and may not have sufficient
[WARN]:-permissions to run the Greenplum binaries and management utilities.

The issue stems from how gpinitsystem (specifically /usr/local/hawq/bin/lib/gp_bash_functions.sh, which gpinitsystem calls) checks the permissions of the files located in /usr/local/hawq. It does this by running ls -la on /usr/local/hawq/bin/initdb and filtering the output through sed -e 's/...\(.\)....../\1/g' (to retrieve the user executable permission) and sed -e 's/......\(.\).../\1/g' (to retrieve the group executable permission).

The issue becomes obvious when looking at the output of the ls command on an SELinux enabled system (even if it is set to permissive):

-rwxr-xr-x. 1 gpadmin root 463,708 Aug  8  2014 initdb*
          ^ Note the dot

Without the trailing dot (which signifies there are extended SELinux ACLs on the file) the sed works correctly. With it, though, the sed returns 'x.', because the final dot is never matched and therefore isn't replaced.
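
For comparison, reading the permission bits directly (sketched here in Python, since the stack scripts are Python) avoids parsing ls output entirely and is unaffected by the SELinux marker:

import os
import stat

def executable_bits(path):
    """Return (user_exec, group_exec) for a file, independent of the trailing
    '.' that ls -l appends when the file has an SELinux security context."""
    mode = os.stat(path).st_mode
    return bool(mode & stat.S_IXUSR), bool(mode & stat.S_IXGRP)

print(executable_bits('/usr/local/hawq/bin/initdb'))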

Cleanup stack_advisor

The stack advisor was originally taken from the HDP stack and needs a serious overhaul.
Old, decrepit code needs to be removed; possibly create a class with the general code and inherit from it for zData-stack-specific code.
