
ambari-extensions Issues

Warning thrown during gpinitsystem if initdb has extended SELinux ACLs

This is addressing the generated output in the logs:

[WARN]:-File permission mismatch.  The gpadmin owns the Greenplum Database installation directory.
[WARN]:-You are currently logged in as gpadmin and may not have sufficient
[WARN]:-permissions to run the Greenplum binaries and management utilities.

The issue stems from how gpinitsystem (specifically /usr/local/hawq/bin/lib/gp_bash_functions.sh, which is called by gpinitsystem) checks the permissions of the files under /usr/local/hawq. It does this by running ls -la on /usr/local/hawq/bin/initdb and piping the permission string through sed -e 's/...\(.\)....../\1/g' (to retrieve the user-executable bit) and sed -e 's/......\(.\).../\1/g' (to retrieve the group-executable bit).

The issue becomes obvious when looking at the output of the ls command on an SELinux enabled system (even if it is set to permissive):

-rwxr-xr-x. 1 gpadmin root 463,708 Aug  8  2014 initdb*
          ^ Note the dot

Without the trailing dot (which signifies there are extended SELinux ACLs on the file), the sed works correctly. With it, though, the sed returns 'x.': each pattern consumes exactly ten characters per match, so the eleventh character, the dot, is never matched and therefore never replaced.
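
To make the failure concrete, here is a minimal Python sketch of the same substitution (the sed pattern translated to re.sub; truncating the mode string to its first ten characters is one possible fix, not necessarily the project's actual patch):

    import re

    perm = "-rwxr-xr-x."  # ls output on an SELinux-enabled system; note the trailing dot

    # Equivalent of: sed -e 's/...\(.\)....../\1/g' (user-executable bit)
    print(re.sub(r"...(.)......", r"\1", perm))       # prints "x." -- the dot is never matched

    # Possible fix: drop the SELinux indicator before extracting the bit
    print(re.sub(r"...(.)......", r"\1", perm[:10]))  # prints "x"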

Fresh Install of Hawq fails because HDFS and Zookeeper aren't started

The role_command_order.json file in the zdata stack version declares that the HAWQ service install should not start until HDFS and ZOOKEEPER have been started; however, Ambari's ActionQueue.py adds the execution command to install the HAWQ service anyway, starting the HAWQ install prematurely.

This is taken from the ambari-agent.log file on the master:

INFO 2015-03-05 18:44:03,933 ActionQueue.py:110 - Adding EXECUTION_COMMAND for service HAWQ of cluster zdata to the queue.
INFO 2015-03-05 18:44:03,967 ActionQueue.py:203 - Executing command with id = 3-2 for role = HAWQ_MASTER of cluster zdata.

Prior to this log output, the status of both HDFS and ZOOKEEPER came back as not running, and their status commands threw a "component not running" exception.

Either this is an Ambari bug, the role_command_order.json is missing information, or there is a timeout or a threshold of status failures after which Ambari moves on to the next command and installs another service.
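
For reference, a sketch of the kind of dependency role_command_order.json is expected to declare; the exact role/command keys used by the zdata stack are assumptions modeled on the stock Ambari convention of "ROLE-COMMAND" entries:

    {
      "general_deps": {
        "_comment": "sketch only; the real file's entries may differ",
        "HAWQ_MASTER-INSTALL": ["NAMENODE-START", "ZOOKEEPER_SERVER-START"],
        "HAWQ_SEGMENT-INSTALL": ["NAMENODE-START", "DATANODE-START", "ZOOKEEPER_SERVER-START"]
      }
    }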

Add Redis Service

Add basic Redis service.

Questions to be answered:

  • Allow high availability configurations with Sentinel? (Requires more work)
  • Allow non-restart reconfigurations via the CONFIG SET command? (The alternative is faster to implement but requires restarts for any config changes.)
  • Should Sentinels be installed automatically on all Redis nodes, or distributed manually as a type of slave component?

Update documentation

  • Update Greenplum code documentation, cleanup.
  • Update HAWQ code documentation, cleanup.
  • Update PXF code documentation, cleanup.
  • Update github site documentation.
  • Update readme.md.

Allow the creation of a hot standby master for Greenplum

Allow Greenplum installations to have a hot standby master.
This is done by allowing a cardinality of 1-2 for master, and setting the second as a hot standby master.

Look into automatically attempting to recover the master segment if it is being started and had previously failed. It is not clear this is possible: the information needed to distinguish a normal cluster start from a master recovery start may not be available from Ambari. If it is not possible, create a custom action.

Clicking "Restart All" while the cluster is already stopped fails and blows up

Log of where it breaks:

stderr:   /var/lib/ambari-agent/data/errors-85.txt

2015-06-10 19:13:54,272 - Error while executing command 'restart':
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 123, in execute
    method(env)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 232, in restart
    self.stop(env)
  File "/var/lib/ambari-agent/cache/stacks/PHD/9.9.9.zData/services/GREENPLUM/package/scripts/master.py", line 51, in stop
    user=params.admin_user
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 148, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 149, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 115, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 241, in action_run
    raise ex
Fail: Execution of 'gpstop -a -M smart -v' returned 2. 20150610:19:13:54:030140 gpstop:master:gpadmin-[INFO]:-Starting gpstop with args: -a -M smart -v
20150610:19:13:54:030140 gpstop:master:gpadmin-[DEBUG]:-Setting level of parallelism to: 64
20150610:19:13:54:030140 gpstop:master:gpadmin-[INFO]:-Gathering information and validating the environment...
20150610:19:13:54:030140 gpstop:master:gpadmin-[DEBUG]:-Checking if GPHOME env variable is set.
20150610:19:13:54:030140 gpstop:master:gpadmin-[DEBUG]:-Checking if MASTER_DATA_DIRECTORY env variable is set.
20150610:19:13:54:030140 gpstop:master:gpadmin-[DEBUG]:-Checking if LOGNAME or USER env variable is set.
20150610:19:13:54:030140 gpstop:master:gpadmin-[DEBUG]:---Checking that current user can use GP binaries
20150610:19:13:54:030140 gpstop:master:gpadmin-[DEBUG]:-Obtaining master's port from master data directory
20150610:19:13:54:030140 gpstop:master:gpadmin-[DEBUG]:-Read from postgresql.conf port=6543
20150610:19:13:54:030140 gpstop:master:gpadmin-[ERROR]:-gpstop error: postmaster.pid file does not exist.  is Greenplum instance already stopped?
stdout:   /var/lib/ambari-agent/data/output-85.txt

2015-06-10 19:13:53,820 - Could not verify stack version by calling '/usr/bin/distro-select versions > /tmp/tmp_ik32J'. Return Code: 1, Output: .
2015-06-10 19:13:53,824 - Execute['mkdir -p /var/lib/ambari-agent/data/tmp/AMBARI-artifacts/;     curl -kf -x "" --retry 10     http://master.ambaricluster.local:8080/resources//UnlimitedJCEPolicyJDK7.zip -o /var/lib/ambari-agent/data/tmp/AMBARI-artifacts//UnlimitedJCEPolicyJDK7.zip'] {'environment': ..., 'not_if': 'test -e /var/lib/ambari-agent/data/tmp/AMBARI-artifacts//UnlimitedJCEPolicyJDK7.zip', 'ignore_failures': True, 'path': ['/bin', '/usr/bin/']}
2015-06-10 19:13:53,838 - Skipping Execute['mkdir -p /var/lib/ambari-agent/data/tmp/AMBARI-artifacts/;     curl -kf -x "" --retry 10     http://master.ambaricluster.local:8080/resources//UnlimitedJCEPolicyJDK7.zip -o /var/lib/ambari-agent/data/tmp/AMBARI-artifacts//UnlimitedJCEPolicyJDK7.zip'] due to not_if
2015-06-10 19:13:53,839 - Group['hadoop'] {'ignore_failures': False}
2015-06-10 19:13:53,840 - Modifying group hadoop
2015-06-10 19:13:53,854 - Group['nobody'] {'ignore_failures': False}
2015-06-10 19:13:53,854 - Modifying group nobody
2015-06-10 19:13:53,864 - Group['nagios'] {'ignore_failures': False}
2015-06-10 19:13:53,864 - Modifying group nagios
2015-06-10 19:13:53,877 - User['nobody'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'nobody']}
2015-06-10 19:13:53,877 - Modifying user nobody
2015-06-10 19:13:53,888 - User['nagios'] {'gid': 'nagios', 'ignore_failures': False, 'groups': [u'hadoop']}
2015-06-10 19:13:53,888 - Modifying user nagios
2015-06-10 19:13:53,898 - User['ambari-qa'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': ['users']}
2015-06-10 19:13:53,898 - Modifying user ambari-qa
2015-06-10 19:13:53,911 - File['/var/lib/ambari-agent/data/tmp/changeUid.sh'] {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555}
2015-06-10 19:13:53,912 - Execute['/var/lib/ambari-agent/data/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa 2>/dev/null'] {'not_if': 'test $(id -u ambari-qa) -gt 1000'}
2015-06-10 19:13:53,920 - Skipping Execute['/var/lib/ambari-agent/data/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa 2>/dev/null'] due to not_if
2015-06-10 19:13:53,932 - Execute['/bin/echo 0 > /selinux/enforce'] {'only_if': 'test -f /selinux/enforce'}
2015-06-10 19:13:54,130 - Execute['gpstop -a -M smart -v'] {'user': 'gpadmin'}
2015-06-10 19:13:54,272 - Error while executing command 'restart':
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 123, in execute
    method(env)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 232, in restart
    self.stop(env)
  File "/var/lib/ambari-agent/cache/stacks/PHD/9.9.9.zData/services/GREENPLUM/package/scripts/master.py", line 51, in stop
    user=params.admin_user
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 148, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 149, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 115, in run_action
    provider_action()
"""  
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 241, in action_run
    raise ex
Fail: Execution of 'gpstop -a -M smart -v' returned 2. 20150610:19:13:54:030140 gpstop:master:gpadmin-[INFO]:-Starting gpstop with args: -a -M smart -v
20150610:19:13:54:030140 gpstop:master:gpadmin-[DEBUG]:-Setting level of parallelism to: 64
20150610:19:13:54:030140 gpstop:master:gpadmin-[INFO]:-Gathering information and validating the environment...
20150610:19:13:54:030140 gpstop:master:gpadmin-[DEBUG]:-Checking if GPHOME env variable is set.
20150610:19:13:54:030140 gpstop:master:gpadmin-[DEBUG]:-Checking if MASTER_DATA_DIRECTORY env variable is set.
20150610:19:13:54:030140 gpstop:master:gpadmin-[DEBUG]:-Checking if LOGNAME or USER env variable is set.
20150610:19:13:54:030140 gpstop:master:gpadmin-[DEBUG]:---Checking that current user can use GP binaries
20150610:19:13:54:030140 gpstop:master:gpadmin-[DEBUG]:-Obtaining master's port from master data directory
20150610:19:13:54:030140 gpstop:master:gpadmin-[DEBUG]:-Read from postgresql.conf port=6543
20150610:19:13:54:030140 gpstop:master:gpadmin-[ERROR]:-gpstop error: postmaster.pid file does not exist.  is Greenplum instance already stopped?
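
One way to make the stop action tolerate an already-stopped cluster is to check for postmaster.pid (the file gpstop itself complains about) before calling gpstop. A minimal sketch of what the stop method in master.py could do; the parameter names and Execute usage are assumptions modeled on the log above:

    import os
    from resource_management.core.resources.system import Execute

    def stop(master_data_directory, admin_user):
        # gpstop exits non-zero when postmaster.pid is missing, i.e. the
        # cluster is already down; skip the call instead of failing the restart.
        pid_file = os.path.join(master_data_directory, 'postmaster.pid')
        if not os.path.exists(pid_file):
            return  # nothing to stop
        Execute('gpstop -a -M smart -v', user=admin_user)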

Add backout code for Greenplum installation failures.

Either back out automatically if installation fails, or provide a backout script which, in addition to calling Greenplum's generated backout script if one exists, also backs out the changes Ambari made while installing Greenplum.

Use our make script to define which services should be packaged up

We have overlapping definitions for HAWQ and PXF with PHD 3.0.0. I would like the option to release a tar file that contains only Greenplum and no other service definitions. Also, we may not want to package and release Minecraft to customers. It would be nice to have an "includes" list where we specify which services get bundled up, as sketched below.
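
A minimal sketch of what such an "includes" mechanism could look like, written as a hypothetical Python packaging helper rather than the existing make script (the service names and paths are illustrative):

    import tarfile

    INCLUDED_SERVICES = ['GREENPLUM']   # e.g. ship only Greenplum, no HAWQ/PXF/Minecraft

    with tarfile.open('ambari-extensions.tar.gz', 'w:gz') as tar:
        for service in INCLUDED_SERVICES:
            # Hypothetical source layout; adjust to wherever services actually live.
            tar.add('src/services/%s' % service, arcname='services/%s' % service)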

Make Greenplum more configurable

Move kernel parameters, pg_hba.conf, and postgresql.conf to Ambari configurables. Look into disabling the modification of Greenplum settings which cannot be changed after the cluster has been initialized.

Run gpsegrecover when starting a single Greenplum segment

When a single Greenplum segment is being started, assume it previously failed and attempt to run gpsegrecover. Potentially check whether the cluster is started (master/standby master are up); if the Greenplum cluster isn't started, don't do anything.
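
A rough sketch of that start logic (checking the master's postmaster.pid is an assumed stand-in for a real liveness check, and the bare gpsegrecover invocation is an assumption since the issue doesn't specify its arguments):

    import os
    from resource_management.core.resources.system import Execute

    def start_segment(master_data_directory, admin_user):
        # If the master isn't running, the cluster isn't started: do nothing.
        if not os.path.exists(os.path.join(master_data_directory, 'postmaster.pid')):
            return
        # Assume this lone segment previously failed and attempt recovery.
        Execute('gpsegrecover', user=admin_user)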

Refactor post_copy_commands in Greenplum installation

Various modifications can be made to the post_copy_commands section in the Greenplum installation procedure.
The sed to change GPHOME in greenplum_path.sh does not need to be run on all hosts, but the symlink creation does. This should allow a restructuring of the code that will be clearer.

Greenplum master data directory

The default path for the Greenplum master data directory should be /data1/.
If you set it to /data1/gpseg-1, Greenplum will create /data1/gpseg-1/gpseg-1, because gpinitsystem appends the segment directory name to whatever path is configured.

hawq_master.py hardcoding hdfs user

We need to fix hawq_master.py to look up the HDFS username rather than assuming a hardcoded value of "hdfs" for the user that owns the root HDFS directory.
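
A minimal sketch of the usual Ambari pattern for looking this up; whether the zdata stack exposes the value under the stock hadoop-env/hdfs_user key is an assumption:

    from resource_management.libraries.script.script import Script

    config = Script.get_config()
    # Stock HDP-style stacks expose the HDFS service account here (assumed key):
    hdfs_user = config['configurations']['hadoop-env']['hdfs_user']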

Add basic makefile for installation

Create basic autotools for installing and uninstalling the service correctly. This will alleviate issues with installing into an already operational Ambari server, and will allow the installation method to change (which will be important when the project transitions from a stack version to its own stack).

Remote datanodes cannot connect to namenode with vagrant environment

When started, remote datanodes log the following and cannot connect to the namenode, even though the namenode can be pinged and SSH'd into.

INFO  ipc.Client (Client.java:handleConnectionFailure(783)) - Retrying connect to server: master.ambaricluster.local/172.28.128.3:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1 SECONDS)

Host Role in invalid state

When stopping the Greenplum service on branch feature/Pivotal-Ambari, I get "Host Role in invalid state".

Find a better way to learn JAVA_HOME

JAVA_HOME is hardcoded in PXF/package/scripts/params.py as a workaround. It points to where Ambari Server installs the JDK, which is /usr/jdk64/jdk1.7.0_67/jre.

Java may be installed elsewhere, though. Using bigtop-detect-javahome to auto-detect Java doesn't work because that path isn't among the ones it checks. We just need to write something simple that finds Java on the system and put it in a utility function.
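
A sketch of such a utility function; the candidate paths beyond the Ambari Server JDK location are assumptions:

    import os

    def find_java_home():
        candidates = [
            os.environ.get('JAVA_HOME'),
            '/usr/jdk64/jdk1.7.0_67/jre',   # where Ambari Server installs its JDK
            '/usr/lib/jvm/java',            # common distro symlink (assumption)
        ]
        for path in candidates:
            if path and os.path.exists(os.path.join(path, 'bin', 'java')):
                return path
        raise Exception('Unable to locate a Java installation')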

Cleanup stack_advisor

The stack advisor was originally taken from the HDP stack and needs a serious overhaul.
Old, decrepit code needs to be removed; possibly create a class holding the general code and inherit from it for the zData-stack-specific code, as sketched below.
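
A sketch of the proposed split; the class and method names are hypothetical:

    class BaseStackAdvisor(object):
        """Recommendation/validation code shared across stacks."""
        def recommend_configurations(self, services, hosts):
            return {}  # general recommendations live here

    class ZDataStackAdvisor(BaseStackAdvisor):
        """zData-specific overrides layered on top of the shared code."""
        def recommend_configurations(self, services, hosts):
            recommendations = super(ZDataStackAdvisor, self).recommend_configurations(services, hosts)
            # ...apply zData-specific tweaks to the shared recommendations...
            return recommendations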

Only create two data directories

Current default data directory structure:

/data1/primary/gpsegX
/data2/primary/gpsegX
...
/dataN/primary/gpsegN

New structure should be:

/data1/primary/
    gpsegX
    gpsegX
/data2/primary/
    gpsegX
    gpsegX

We will need to implement some sort of pattern expression for the data and mirror data directory templates.
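
For example, a sketch of one possible expansion, using a hypothetical template syntax (not an existing config format in the project):

    def expand_data_dirs(template, num_dirs, segments_per_host):
        """Distribute segment directories round-robin across num_dirs data dirs."""
        dirs = [template.replace('%N', str(i + 1)) for i in range(num_dirs)]
        return [dirs[seg % num_dirs] for seg in range(segments_per_host)]

    # Four segments spread over the two data directories:
    print(expand_data_dirs('/data%N/primary', 2, 4))
    # ['/data1/primary', '/data2/primary', '/data1/primary', '/data2/primary']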

Known_host issue arising when Greenplum master is not installed on same machine as Ambari

Gpseginstall breaks because root's ~/.ssh/ directory doesn't contain an id_rsa key for the cluster, or because known_hosts isn't populated with the public keys of the other nodes in the cluster.

Solution: Populate known_hosts before gpseginstall by running something like this:

while read host; do ssh-keyscan $host >> ~/.ssh/known_hosts; done < /usr/local/greenplum-db/greenplum_hosts

checkhdfs segfaults during gpinitsystem

gpinitsystem runs Pivotal's checkhdfs command, which segfaults. The segfault is assumed to occur because checkhdfs is written by Pivotal while the Hadoop variant in use is provided by Hortonworks.

The solution should probably be just to stop this command from running during gpinitsystem, and to allow it to run when/if we eventually migrate from HDP (Hortonworks Data Platform) to PHD (Pivotal HD).

During Greenplum stops and starts segments should monitor processes

The Greenplum segments are controlled by the master, so no code is run on the segments themselves to stop them. Even so, the segments shouldn't report back success until all of their relevant processes have been stopped or started.

Each Greenplum segment should watch its processes on start and stop, and only return success once they have all started or stopped.
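
A sketch of what each segment's start/stop could poll for; identifying the segment's postgres processes by matching the data directory with pgrep -f is an assumption:

    import time
    import subprocess

    def wait_for_segment(data_directory, should_be_running, timeout=120):
        """Poll until the segment's processes match the desired state or we time out."""
        deadline = time.time() + timeout
        while time.time() < deadline:
            # pgrep exits 0 when at least one matching process exists
            running = subprocess.call(['pgrep', '-f', data_directory]) == 0
            if running == should_be_running:
                return True
            time.sleep(2)
        return False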

Cleanup bootstrap files for Vagrant/ Development

Bootstrap files have been cleaned up on development, and those changes cherry-picked to release/0.4.x.

Now, further changes need to be made on development, such as removing the master-bootstrap.sh script and allowing a selection of functions by passing a name as the first argument; examples being 'pivotal' and 'vanilla'.
