
ansible-spark-cluster's Issues

jdk download fails when running common/tasks/java.yml

The following failure occurs when attempting to run the setup-ambari-cluster playbook:

fatal: [elyra-wtf2]: FAILED! => {"changed": false, "dest": "/tmp/ansible-install/jdk-8u144-linux-x64.rpm", "failed": true, "msg": "Request failed", "response": "HTTP Error 404: Not Found", "state": "absent", "status_code": 404, "url": "http://download.oracle.com/otn-pub/java/jdk/8u144-b01/090f390dda5b47b9b721c7dfaa008135/jdk-8u144-linux-x64.rpm"}

After thinking this might be because 8u144 is no longer available, I attempted to update the files to use 8u152 after determining the appropriate build identifier and md5 hash, but got the same kind of issue:

fatal: [elyra-wtf2]: FAILED! => {"changed": false, "dest": "/tmp/ansible-install/jdk-8u152-linux-x64.rpm", "failed": true, "msg": "Request failed", "response": "HTTP Error 404: Not Found", "state": "absent", "status_code": 404, "url": "http://download.oracle.com/otn-pub/java/jdk/8u152-b16/b6979be30bdc4077dc93cd99134ad84d/jdk-8u152-linux-x64.rpm"}

Since there doesn't seem to be a way to "get the latest", it would be nice to figure out a better way to determine the download URL and what is causing the 404. (Oracle appears to move superseded JDK builds to its archive site behind an OTN login, which would explain why pinned otn-pub URLs eventually start returning 404.)

The workaround:

  • Download the appropriate rpm file to /tmp/ansible-install on each node.
  • Update roles/common/defaults/main.yml with any file name changes.
  • Update roles/common/tasks/main.yml so it does not delete the install_temp_dir directory.
  • Update roles/common/tasks/java.yml so it skips the download and does not delete the rpm from install_temp_dir.
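The workaround could also be baked into the role itself by guarding the download. A minimal sketch of such a guard, using hypothetical variable names (install_temp_dir, java_rpm_file, java_download_url) that may differ from the role's actual defaults:

```yaml
# Sketch only: download the JDK rpm unless one has already been pre-staged.
# Variable names are assumptions, not the role's confirmed defaults.
- name: check for a pre-staged JDK rpm
  stat:
    path: "{{ install_temp_dir }}/{{ java_rpm_file }}"
  register: jdk_rpm

- name: download JDK rpm from Oracle
  get_url:
    url: "{{ java_download_url }}"
    dest: "{{ install_temp_dir }}/{{ java_rpm_file }}"
  when: not jdk_rpm.stat.exists
```

With a guard like this, pre-staging the rpm on each node is enough; the role no longer needs to be edited to skip the download step.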

Unable to see submitted jobs in the Spark history server?

I am trying to run some jobs on the spark-cluster; the jobs finish, but I am not able to see them in the Spark history server:

[screenshot: 2018-09-09 at 12:59:33 pm]

But nothing shows up in the Spark history server:

[screenshot: 2018-09-09 at 1:23:15 pm]

Just to add to this: I installed pyspark into Anaconda by running conda install -c conda-forge pyspark in order to be able to load the pyspark module.
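One thing worth checking (a guess, not confirmed from this setup): the Spark History Server only lists applications that wrote event logs, so a job can finish successfully and still never appear in the UI. The relevant settings live in $SPARK_HOME/conf/spark-defaults.conf; the log directory below is an example path, not necessarily what this playbook configures:

```
spark.eventLog.enabled           true
spark.eventLog.dir               hdfs:///spark-history
spark.history.fs.logDirectory    hdfs:///spark-history
```

spark.eventLog.dir (where running jobs write) and spark.history.fs.logDirectory (where the History Server reads) must point at the same location for finished jobs to show up.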

Can't run `setup-enterprise-gateway.yml` with remote hosts (Mac -> Ambari cluster)

When running the playbook setup-enterprise-gateway.yml on my Mac to set up Enterprise Gateway on a remote Ambari cluster, it fails at TASK [notebook : download and install elyra] with the error:

TASK [notebook : download and install elyra] *******************************************
fatal: [notagain-node-1]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect 
to the host via ssh: ssh: connect to host localhost port 22: Connection refused\r\n", 
"unreachable": true}

A bit more context (no errors prior):

...

TASK [notebook : debug] *****************************************************************************************************************************************************
ok: [notagain-node-1] => {
    "msg": "Downloading Elyra: http://9.30.252.137/dist/elyra/jupyter_enterprise_gateway-0.7.0.dev0-py2.py3-none-any.whl"
}
ok: [notagain-node-2] => {
    "msg": "Downloading Elyra: http://9.30.252.137/dist/elyra/jupyter_enterprise_gateway-0.7.0.dev0-py2.py3-none-any.whl"
}
ok: [notagain-node-3] => {
    "msg": "Downloading Elyra: http://9.30.252.137/dist/elyra/jupyter_enterprise_gateway-0.7.0.dev0-py2.py3-none-any.whl"
}
ok: [notagain-node-4] => {
    "msg": "Downloading Elyra: http://9.30.252.137/dist/elyra/jupyter_enterprise_gateway-0.7.0.dev0-py2.py3-none-any.whl"
}

TASK [notebook : download and install elyra] ********************************************************************************************************************************
fatal: [notagain-node-1]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: connect to host localhost port 22: Connection refused\r\n", "unreachable": true}

NO MORE HOSTS LEFT **********************************************************************************************************************************************************
	to retry, use: --limit @/Users/ckadner/PycharmProjects/spark-cluster-install/setup-enterprise-gateway.retry

PLAY RECAP ******************************************************************************************************************************************************************
notagain-node-1            : ok=24   changed=11   unreachable=1    failed=0   
notagain-node-2            : ok=22   changed=9    unreachable=0    failed=0   
notagain-node-3            : ok=22   changed=9    unreachable=0    failed=0   
notagain-node-4            : ok=22   changed=9    unreachable=0    failed=0   
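The "connect to host localhost port 22: Connection refused" message suggests that, for this one task, Ansible tried to ssh to localhost rather than to the remote node. Two things worth ruling out (both hypotheses, not confirmed from the log): a delegate_to: localhost or local_action in the "download and install elyra" task, and an inventory entry without an explicit ansible_host, which lets the host alias fall back to local resolution. A minimal inventory sketch for the latter (host name taken from the log; the domain is an assumption based on the other issues in this repository):

```
[nodes]
notagain-node-1  ansible_host=notagain-node-1.fyre.ibm.com
```

If the task does delegate to localhost intentionally (e.g. to download the wheel on the control machine), running the playbook from a Mac without a local sshd would produce exactly this failure.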

Detailed steps for setting up a working development environment

Use the standard Ansible scripts to set up the environment on a brand-new stack/cluster with 4 nodes based on Red Hat 7.3. Here are the steps from @sxguo to set up the environment on the master node:

  • SSH into the master node as root.
  • Install Ansible on RHEL by executing the following steps on the master node.
  • Update the Ansible configuration on the master node:
    • Add (i.e. uncomment) the following configuration in /etc/ansible/ansible.cfg:

      [defaults]
      host_key_checking = False
      hash_behaviour = merge

  • git clone https://github.com/lresende/spark-cluster-install on your local machine
  • Zip up spark-cluster-install folder on local machine and upload the archive to the master node
  • Unzip the archive to create spark-cluster-install folder on the master node
  • cd spark-cluster-install on the master node
  • Edit hosts-fyre-spark with node names/ips for your cluster on the master node
  • Execute ansible-playbook --verbose setup-ambari-cluster.yml -i hosts-fyre-spark on the master node

Once this is done, start Enterprise Gateway on the master node as shown below:

$ cd /opt/elyra/bin
$ start_elyra.sh

This will result in the following exception:

[E 2017-10-13 15:10:28.215 EnterpriseGatewayApp] Exception 'AuthenticationException' 
occurred when creating a SSHClient connecting to '172.16.193.76' with user 'elyra', 
message='Authentication failed.'.

Note that EG_REMOTE_USER is set to elyra in /opt/elyra/bin/start_elyra.sh. Change the value of EG_REMOTE_USER to root in /opt/elyra/bin/start_elyra.sh and run it again. This time, it will launch Enterprise Gateway successfully.
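The manual edit described above can be scripted. A sketch, shown on a stand-in copy of the file since the real one lives at /opt/elyra/bin/start_elyra.sh, and assuming the script sets the variable via a plain export line:

```shell
# Create a stand-in copy of start_elyra.sh containing the default setting,
# then flip EG_REMOTE_USER from elyra to root in place.
printf 'export EG_REMOTE_USER=elyra\n' > start_elyra.sh
sed -i 's/^export EG_REMOTE_USER=.*/export EG_REMOTE_USER=root/' start_elyra.sh
cat start_elyra.sh   # prints: export EG_REMOTE_USER=root
```

Running as root sidesteps the AuthenticationException, but the underlying issue is presumably that the elyra user's ssh key is not authorized on the worker nodes.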

Stuck on task "verify connection to ambari-server port 8081"

My hosts:

[master]
holycow-node-1   ansible_host=holycow-node-1.fyre.ibm.com   ansible_host_id=1

[nodes]
holycow-node-2   ansible_host=holycow-node-2.fyre.ibm.com   ansible_host_id=2
holycow-node-3   ansible_host=holycow-node-3.fyre.ibm.com   ansible_host_id=3
holycow-node-4   ansible_host=holycow-node-4.fyre.ibm.com   ansible_host_id=4

Command:

ansible-playbook --verbose setup-ambari.yml -i hosts-fyre -c paramiko

Log:

TASK [ambari : restart ambari-server on master node] **************************************************************************************************************************************
skipping: [holycow-node-2] => {"changed": false, "skip_reason": "Conditional result was False", "skipped": true}
skipping: [holycow-node-3] => {"changed": false, "skip_reason": "Conditional result was False", "skipped": true}
skipping: [holycow-node-4] => {"changed": false, "skip_reason": "Conditional result was False", "skipped": true}
changed: [holycow-node-1] => {"changed": true, "cmd": "/usr/sbin/ambari-server restart", "delta": "0:00:20.229347", "end": "2017-11-09 17:05:04.540365", "failed": false, "rc": 0, "start": "2017-11-09 17:04:44.311018", "stderr": "", "stderr_lines": [], "stdout": "Using python  /usr/bin/python\nRestarting ambari-server\nAmbari Server is not running\nAmbari Server running with administrator privileges.\nOrganizing resource files at /var/lib/ambari-server/resources...\nAmbari database consistency check started...\nServer PID at: /var/run/ambari-server/ambari-server.pid\nServer out at: /var/log/ambari-server/ambari-server.out\nServer log at: /var/log/ambari-server/ambari-server.log\nWaiting for server start......................\nServer started listening on 8081\n\nDB configs consistency check: no errors and warnings were found.", "stdout_lines": ["Using python  /usr/bin/python", "Restarting ambari-server", "Ambari Server is not running", "Ambari Server running with administrator privileges.", "Organizing resource files at /var/lib/ambari-server/resources...", "Ambari database consistency check started...", "Server PID at: /var/run/ambari-server/ambari-server.pid", "Server out at: /var/log/ambari-server/ambari-server.out", "Server log at: /var/log/ambari-server/ambari-server.log", "Waiting for server start......................", "Server started listening on 8081", "", "DB configs consistency check: no errors and warnings were found."]}

TASK [ambari : verify connection to ambari-server port 8081] ******************************************************************************************************************************
Enter passphrase for key '/Users/ckadner/.ssh/id_rsa': Enter passphrase for key '/Users/ckadner/.ssh/id_rsa': Enter passphrase for key '/Users/ckadner/.ssh/id_rsa': ok: [holycow-node-1] => {"attempts": 1, "cache_control": "no-store", "changed": false, "connection": "close", "content_type": "text/plain", "cookies": {"AMBARISESSIONID": "5v***ew2a63t"}, "expires": "Thu, 01 Jan 1970 00:00:00 GMT", "failed": false, "msg": "OK (unknown bytes)", "pragma": "no-cache", "redirected": false, "set_cookie": "AMBARISESSIONID=5v***ew2a63t;Path=/;HttpOnly", "status": 200, 
"url": "http://holycow-node-1:8081/api/v1/hosts", "user": "VALUE_SPECIFIED_IN_NO_LOG_PARAMETER", "vary": "Accept-Encoding, User-Agent", "x_content_type_options": "nosniff", "x_frame_options": "DENY", "x_xss_protection": "1; mode=block"}

Notice the URL: "url": "http://holycow-node-1:8081/api/v1/hosts"

It should be: "url": "http://holycow-node-1.fyre.ibm.com:8081/api/v1/hosts"

The problem:

https://github.com/lresende/spark-cluster-install/blob/9fea4448ef5ff927194a83dd3f33d9bd992db1a6/roles/ambari/tasks/install.yml#L62

The url in this task is missing the ansible_domain qualifier; it should read:

url: "http://{{ groups['master'][0] }}.{{ ansible_domain }}:8081/api/v1/hosts"

I will create a PR shortly.

@marcindulak -- FYI

Unable to load kernels using enterprise_gateway?

Thanks for this solution; we love the idea of connecting Jupyter notebooks to a Spark cluster. I went through the Ansible playbooks and was able to get the setup up and running on gcloud compute engine. Right now I am facing an issue while trying to connect my Jupyter notebook to the cluster: [W 21:49:37.661 NotebookApp] Error loading kernelspec 'python3', and I cannot figure out what I am missing. I followed these instructions to connect my notebook:

export KG_URL=http://spark-master:8888
export KG_HTTP_USER=elyra
export KG_HTTP_PASS=
export KG_REQUEST_TIMEOUT=30
export KERNEL_USERNAME=${KG_HTTP_USER}
jupyter notebook \
  --NotebookApp.session_manager_class=nb2kg.managers.SessionManager \
  --NotebookApp.kernel_manager_class=nb2kg.managers.RemoteKernelManager \
  --NotebookApp.kernel_spec_manager_class=nb2kg.managers.RemoteKernelSpecManager

Any help on this?
