buildfarm_deployment's People

Contributors

ayrton04, cottsay, cr7pt0gr4ph7, dirk-thomas, gavanderhoorn, j-rivero, jjekircp, jonazpiazu, lucasw, mathias-luedtke, mikaelarguedas, nuclearsandwich, patrickcjh, pjreed, randominsano, rayman, tfoote, wjwwood

buildfarm_deployment's Issues

Make puppet more verbose on errors

Currently, when running ./reconfigure.bash machine, the script exits silently even though puppet reported errors. To verify that puppet applied the configuration correctly, one currently has to examine "/var/log/puppet.log".
The proposal would be to propagate puppet error messages to the wrapper bash script or to clearly signal an error and point the user to examine the "/var/log/puppet.log" file.
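
One way the second option could look is a small check that runs after puppet and scans the log for error-level lines. This is only a sketch: it assumes that error lines carry the "(err):" marker seen in the puppet.log excerpts elsewhere on this page, and the function name and exit-code convention are made up for illustration. Since the log may accumulate across runs, a real wrapper would probably need to restrict the scan to the current run.

```python
import sys


def find_puppet_errors(log_text):
    """Return the log lines that carry Puppet's '(err):' level marker."""
    return [line for line in log_text.splitlines() if "(err):" in line]


def report_puppet_errors(log_path="/var/log/puppet.log"):
    """Print error lines to stderr and return 1 if any were found, else 0."""
    with open(log_path, encoding="utf-8", errors="replace") as handle:
        errors = find_puppet_errors(handle.read())
    for line in errors:
        print(line, file=sys.stderr)
    if errors:
        # Point the user at the full log, as the proposal suggests.
        print("puppet reported %d error(s), see %s" % (len(errors), log_path),
              file=sys.stderr)
        return 1
    return 0
```

The wrapper script could then call report_puppet_errors() as its last step and exit with the returned value, so that a failed run is visible to the caller.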

New version of rtyler/jenkins module doesn't work for master deployment

If rtyler/jenkins 1.6.1 is pulled in via librarian-puppet, the master deployment fails with the following error in puppet.log:

2015-10-15 15:29:23 -0400 Puppet (err): Duplicate declaration: User[jenkins] is already declared; cannot redeclare at /root/buildfarm_deployment/master/manifests/site.pp:425 on node my-master
2015-10-15 15:29:23 -0400 Puppet (err): Duplicate declaration: User[jenkins] is already declared; cannot redeclare at /root/buildfarm_deployment/master/manifests/site.pp:425 on node my-master

There is no issue with rtyler/jenkins 1.5.0. I worked around the issue with the following change to master/Puppetfile, but did not obtain a root cause:

-mod 'rtyler/jenkins'
+mod 'rtyler/jenkins', '1.5.0'

new aufs systems take more disk space.

We need to find a way to clean up the aufs databases or else raise the recommended disk space required.

Disk usage on a machine which originally ran devicemapper and was then switched to aufs:

182G    aufs
21M containers
838M    devicemapper
1.5M    execdriver
90M graph
14M init
148K    linkgraph.db
28K repositories-aufs
4.0K    repositories-devicemapper
4.0K    tmp
8.0K    trust
60M vfs
7.7M    volumes
183G    total

Improve documentation about ssh keys

Based on the recent question on the mailing list: https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/ros-sig-buildfarm/P32OdoGFYkg/awEAi8Bx4jgJ

The description of the ssh_keys for all three machine types is pretty confusing.
I would recommend something like this:

 * Configure as many public ssh keys as you want for administrators to log in.
 * On the repo machine make sure there is a public ssh key which matches the private ssh key `jenkins::private_ssh_key` provisioned on the master.

The key jenkins-slave::authorized_keys described for the repo machine seems to not exist in the config repository.
Is this a private ssh key? If so the documentation should state that explicitly.

Also, the indentation of the configuration options for the repo machine seems to be wrong.

protect "newest" images from garbage collection

As a follow-up on #39, a certain set of new images (e.g. not older than 12 hours) should never be deleted, even if the desired free disk space constraints are not fulfilled.

This prevents images from being deleted while they are still in use.
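
A minimal sketch of such an age filter, assuming the cleanup script already has a list of (image id, creation time) pairs; the function name and the data layout are hypothetical, and the 12 hour threshold is just the example value from above:

```python
from datetime import datetime, timedelta


def images_safe_to_delete(images, now, min_age=timedelta(hours=12)):
    """Return ids of images at least min_age old, oldest first.

    images is an iterable of (image_id, created) pairs where created is a
    datetime. Anything newer than min_age is never returned, no matter how
    little disk space is left.
    """
    cutoff = now - min_age
    old = [(created, image_id) for image_id, created in images if created <= cutoff]
    return [image_id for _created, image_id in sorted(old)]
```

The caller would then delete from the front of the returned list until the free space target is met, and simply stop when the list is exhausted.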

on a fresh install master's /var/lib/jenkins/jobs is owned by root

drwxr-xr-x 4 root root 4096 Feb 27 19:23 jobs

It looks to be coming from these two files being generated:

root@master:~/buildfarm_deployment_config# ls -l /var/lib/jenkins/jobs/
total 8
drwxr-xr-x 3 root root 4096 Feb 27 19:23 indigo_rosdistro-cache
drwxr-xr-x 3 root root 4096 Feb 27 19:23 jade_rosdistro-cache

Reconfiguring will fail until jenkins has permissions on the jobs directory.
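
To check for this state before reconfiguring, something along these lines could be used. It is only a sketch: the helper name is made up, and it looks at the directory and its direct children only, which is what the listing above shows.

```python
import os


def paths_not_owned_by(directory, uid):
    """Return the directory and its direct children whose owner differs from uid."""
    candidates = [directory] + [
        os.path.join(directory, name) for name in sorted(os.listdir(directory))
    ]
    return [path for path in candidates if os.stat(path).st_uid != uid]
```

On the master this could be called with "/var/lib/jenkins/jobs" and the uid from pwd.getpwnam("jenkins").pw_uid (assuming the account is named jenkins); a non-empty result lists the paths that would need a chown.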

jenkins remoting issue on sourcedeb jobs

The connection was dropped at the very end and caused a failure?

http://build.ros.org:8080/job/Isrc_uS__libntcan__ubuntu_saucy__source/4/console

# BEGIN SECTION: Clean up to save disk space on slaves
06:36:46 + rm -fr sourcedeb/source
06:36:46 + echo # END SECTION
06:36:46 # END SECTION
06:36:46 SSH: Connecting from host [ip-172-31-6-103]
06:36:46 SSH: Connecting with configuration [repo] ...
06:36:46 SSH: Disconnecting configuration [repo] ...
06:36:46 SSH: Transferred 4 file(s)
06:36:46 Build step 'Send files or execute commands over SSH' changed build result to SUCCESS
06:36:51 Waiting for the completion of Irel_import-package
06:44:24 Irel_import-package #2844 completed. Result was SUCCESS
06:44:24 FATAL: java.io.IOException: Backing channel is disconnected.
06:44:24 hudson.remoting.RemotingSystemException: java.io.IOException: Backing channel is disconnected.
06:44:24    at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:266)
06:44:24    at com.sun.proxy.$Proxy51.stop(Unknown Source)
06:44:24    at com.cloudbees.jenkins.plugins.sshagent.SSHAgentBuildWrapper$SSHAgentEnvironment.tearDown(SSHAgentBuildWrapper.java:407)
06:44:24    at hudson.model.Build$BuildExecution.doRun(Build.java:173)
06:44:24    at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:537)
06:44:24    at hudson.model.Run.execute(Run.java:1741)
06:44:24    at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
06:44:24    at hudson.model.ResourceController.execute(ResourceController.java:98)
06:44:24    at hudson.model.Executor.run(Executor.java:408)
06:44:24 Caused by: java.io.IOException: Backing channel is disconnected.
06:44:24    at hudson.remoting.RemoteInvocationHandler.channelOrFail(RemoteInvocationHandler.java:185)
06:44:24    at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:250)
06:44:24    ... 8 more
06:44:24 [description-setter] Could not determine description.
06:44:24 ERROR: Step ‘E-mail Notification’ failed: no workspace for Isrc_uS__libntcan__ubuntu_saucy__source #4
06:44:24 Warning: you have no plugins providing access control for builds, so falling back to legacy behavior of permitting any downstream builds to be triggered
06:44:27 Finished: FAILURE

Watching the web UI, I've also seen executors briefly reporting "Dead" in red, which might be related.

setup squid with refresh pattern

We need to support changing the refresh_pattern

From our old config we used:

# Always check for stale downloads (allow 1 minute)
refresh_pattern . 0 0 1
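
For reference, the fields of that directive are the regex, the minimum age in minutes, a percentage, and the maximum age in minutes. If the value becomes configurable, the line could be rendered from parameters roughly like this (a sketch only; the function and parameter names are made up and do not correspond to any existing option in the deployment):

```python
def refresh_pattern_line(regex, min_minutes, percent, max_minutes):
    """Render one squid refresh_pattern directive from its four fields."""
    for value in (min_minutes, percent, max_minutes):
        if value < 0:
            raise ValueError("refresh_pattern fields must not be negative")
    return "refresh_pattern %s %d %d %d" % (regex, min_minutes, percent, max_minutes)
```

Calling refresh_pattern_line(".", 0, 0, 1) reproduces the line from the old config.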

priority sorter plugin has new options required for job specific priorities

See changelog upgrading from 2.x: https://wiki.jenkins-ci.org/display/JENKINS/Priority+Sorter+Plugin

jenkins@ip-172-31-5-54:~$ cat jenkins.advancedqueue.PriorityConfiguration.xml
<?xml version='1.0' encoding='UTF-8'?>
<jenkins.advancedqueue.PriorityConfiguration plugin="[email protected]">
  <jobGroups class="linked-list">
    <jenkins.advancedqueue.JobGroup>
      <id>0</id>
      <priority>-1</priority>
      <jobGroupStrategy class="jenkins.advancedqueue.jobinclusion.strategy.AllJobsJobInclusionStrategy"/>
      <description>All</description>
      <runExclusive>false</runExclusive>
      <useJobFilter>false</useJobFilter>
      <jobPattern>.*</jobPattern>
      <usePriorityStrategies>true</usePriorityStrategies>
      <priorityStrategies>
        <jenkins.advancedqueue.JobGroup_-PriorityStrategyHolder>
          <id>0</id>
          <priorityStrategy class="jenkins.advancedqueue.priority.strategy.JobPropertyStrategy"/>
        </jenkins.advancedqueue.JobGroup_-PriorityStrategyHolder>
      </priorityStrategies>
    </jenkins.advancedqueue.JobGroup>
  </jobGroups>
</jenkins.advancedqueue.PriorityConfiguration>

docker cleanup script needs to delete images by name

Images with multiple tags cannot be deleted by ID. All the tags need to be removed first.

foote/squid-in-a-can_pr_11                                                      latest              27bb194efb3c        13 days ago         231.9 MB
osrf/ubuntu_armhf                                                                trusty              fd2852b4b035        2 weeks ago         237.6 MB
osrf/ubuntu_32bit                                                                saucy               d6a6e4bd19d5        2 weeks ago         151.7 MB
osrf/ubuntu_32bit                                                                trusty              0978a1b4c8d1        2 weeks ago         181.6 MB
ubuntu                                                                           utopic              cfaba6b5fefe        3 weeks ago         194.4 MB
ubuntu                                                                           trusty              5ba9dab47459        3 weeks ago         188.3 MB
devel_build_and_test__indigo_rail_maps                                           latest              07e2b0812a8a        5 weeks ago         648 MB
devel_build_and_test__indigo_cob_substitute                                      latest              07e2b0812a8a        5 weeks ago         648 MB
devel_build_and_install__indigo_world_canvas                                     latest              b39ca217cff6        5 weeks ago         648 MB
devel_build_and_install__indigo_rail_maps                                        latest              b39ca217cff6        5 weeks ago         648 MB
devel_build_and_install__indigo_robot_upstart                                    latest              b39ca217cff6        5 weeks ago         648 MB
devel_build_and_install__indigo_xdot                                             latest              b39ca217cff6        5 weeks ago         648 MB
devel_build_and_install__indigo_cob_substitute                                   latest              b39ca217cff6        5 weeks ago         648 MB
devel_build_and_install__indigo_cmake_modules                                    latest              fc7af26294ac        5 weeks ago         443.6 MB
devel_build_and_install__indigo_rosprofiler                                      latest              fc7af26294ac        5 weeks ago         443.6 MB
devel_build_and_install__indigo_rosh_desktop_plugins                             latest              fc7af26294ac        5 weeks ago         443.6 MB
root@ip-172-31-10-242:~# docker rmi fc7af26294ac
Error response from daemon: Conflict, cannot delete image fc7af26294ac because it is tagged in multiple repositories, use -f to force
FATA[0000] Error: failed to remove one or more images   
root@ip-172-31-10-242:~# docker rmi devel_build_and_install__indigo_cmake_modules
Untagged: devel_build_and_install__indigo_cmake_modules:latest
root@ip-172-31-10-242:~# docker rmi fc7af26294ac
Error response from daemon: Conflict, cannot delete image fc7af26294ac because it is tagged in multiple repositories, use -f to force
FATA[0000] Error: failed to remove one or more images   
root@ip-172-31-10-242:~# docker rmi devel_build_and_install__indigo_rosprofiler devel_build_and_install__indigo_rosh_desktop_plugins
Untagged: devel_build_and_install__indigo_rosprofiler:latest
Untagged: devel_build_and_install__indigo_rosh_desktop_plugins:latest
Deleted: fc7af26294ac0fe922b7b63916a6d652b634ca04a471232eca1f34600913e296
Deleted: d9f50372f7fa6da91826d53419c7acfa5d83f4bb98548008735e61557b0b95bc
Deleted: 0bc6fb6b449bebbb2cefd80024b810132a157a8a5eb04237996a04a0ac612392
Deleted: 60742dae31c766d429f2082638d27a4f704b03b0e88c1ea54fdcbaa7e1f662be
Deleted: b11e744f87e2c68170bd53ce9d45c747a3ffe449d8b6e724b912d346a2f42776
Deleted: ae2709197dce8bc483a83883c3d64dfdb8f4eceeaff6476e5a754637e881801a
Deleted: ba555191ac6af3bcc2942c52740a497a402f89da722ec8edda94fc986f85e82c
Deleted: 94c90dee9322c07d302a08bc87b8ff21d5098027e46499fa5dc51062a0a88011
Deleted: 0b0f559469d6ae80c283e0023591e70fab5b988b622f35344d52ae7fd2d33ad6
Deleted: d9bd6d9f58d3e478fba08996372e07c7ba4f73f29f86a02bcd169aeb14505ea1
Deleted: 354569f142469561c417e1b0ceeb133a49f1c5aa729ca64b4f4b8e67d3e51883
Deleted: af43ff1f17e552f549e7caca31e598433e0504037ea09929d13ff6fb2e51f998
Deleted: eb967f4f0f9d1e22d4ab1df50fed8e46cdd3736fd1c3f252295beb537c5627ff
Deleted: 1781096ed21c936a987a13180a5c159b2251c8c706164d5e7996e8aaec787393
Deleted: 57300e65ca04e20af0752a8a7650bca134fef234b21951636bfbf9b3e0ff87c6
Deleted: f5dcca35b84c77d8ca0c3c7bdf0915930bee88fb30168ef043b58160be0081b2
Deleted: d25efa59157f57e7547d1169a7a92d4d7f3e4816832e059313fd12e20a767832
Deleted: 079d69bb15f729b75f336799ccf18af199491a6353c5be3cdd2fa320a2b273bb
Deleted: b5eb9f7fad4543bc8d34b6a8ea309633836e307cee8bce546f0f1e90c668ab46
Deleted: 83fa0bdfb111c233dc08601f924383e448e3dee175598f464397b2cdd4f3a325
Deleted: 01811a121776367df9d6c2853f5b5ac485133ea190c59250ab1cae3bdaaee8bf
Deleted: c52a6079eb4aad0a1bad431a8fb6992f445337f6d407defee03dfb78bc94a583
Deleted: 8eaa4ff06b53ff7730c4d7a7e21b4426a4b46dee064ca2d5d90d757dc7ea040a
Deleted: f62feddc05dc67da9b725361f97d7ae72a32e355ce1585f9a60d090289120f73
Deleted: 607c5d1cca71dd3b6c04327c3903363079b72ab3e5e4289d74fb00a9ac7ec2aa
Deleted: 3b363fd9d7dab4db9591058a3f43e806f6fa6f7e2744b63b2df4b84eadb0685a
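
The session above suggests that the script should collect every repository:tag name pointing at an ID and pass them all to docker rmi in one call. A rough sketch, assuming the script already has the rows of the image listing as (repository, tag, id) tuples; the function name is hypothetical and untagged "<none>" rows are simply skipped here:

```python
def rmi_command(rows, image_id):
    """Build a `docker rmi` argument list naming every tag of image_id.

    rows holds (repository, tag, id) tuples as shown in the image listing.
    Returns an empty list if no tagged row matches the id.
    """
    names = [
        "%s:%s" % (repository, tag)
        for repository, tag, row_id in rows
        if row_id == image_id and "<none>" not in (repository, tag)
    ]
    return ["docker", "rmi"] + names if names else []
```

The returned list can be handed to subprocess.call as is. Once the last name is removed, docker deletes the underlying layers, as seen in the "Deleted:" lines above.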

docker cleanup script still interfering with builds

The following build failed: http://54.183.26.131:8080/job/Ibin_uT32__rtmbuild__ubuntu_trusty_i386__binary/3/console

The excerpt from the build console:

14:14:47 Step 23 : RUN echo "ros-indigo-message-generation: 0.2.10-0trusty" && python3 -u /tmp/wrapper_scripts/apt-get.py update-and-install -q -y ros-indigo-message-generation
14:14:47  ---> Running in 6391121c452d
14:14:48 ros-indigo-message-generation: 0.2.10-0trusty
...
14:15:01  ---> fe19c151741d
14:15:01 Removing intermediate container 6391121c452d
14:15:01 Step 24 : RUN echo "ros-indigo-openrtm-aist: 1.1.0-25trusty" && python3 -u /tmp/wrapper_scripts/apt-get.py update-and-install -q -y ros-indigo-openrtm-aist
14:15:02  ---> Running in 91aefaee0645
14:15:02 time="2015-02-07T22:15:02Z" level="info" msg="open /var/lib/docker/containers/91aefaee06458d1ffb797b37825832564e3bdc0902197254f94ef5190fa5dbba/resolv.conf: no such file or directory" 
14:15:02 Build step 'Execute shell' marked build as failure

The corresponding output from the cleanup log file:

2015-02-07 22:15:02,082 removing container 91aefaee0645
2015-02-07 22:15:02,226 successfully removed container: 91aefaee0645

Security settings do not persist beyond a

We want this snippet in the config.xml

It appears if you run the cli command for full_control, but does not

  <useSecurity>true</useSecurity>
  <authorizationStrategy class="hudson.security.FullControlOnceLoggedInAuthorizationStrategy"/>
  <securityRealm class="hudson.security.HudsonPrivateSecurityRealm">
    <disableSignup>true</disableSignup>
    <enableCaptcha>false</enableCaptcha>
  </securityRealm>

For some reason the config.xml is not generated until you manually visit the configuration page. And without the config.xml, the value set through the CLI does not persist.
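
To detect this state from a script, the generated file could be checked for the two settings from the snippet above. This is a sketch under the assumption that the snippet sits directly below the root element of config.xml; the function name is made up.

```python
import xml.etree.ElementTree as ET

FULL_CONTROL = "hudson.security.FullControlOnceLoggedInAuthorizationStrategy"


def security_enabled(config_xml):
    """Return True if the config enables security with the full-control strategy."""
    root = ET.fromstring(config_xml)
    use_security = root.findtext("useSecurity", default="").strip() == "true"
    strategy = root.find("authorizationStrategy")
    strategy_class = strategy.get("class") if strategy is not None else None
    return use_security and strategy_class == FULL_CONTROL
```

A missing config.xml would have to be handled by the caller before parsing, since that is exactly the case described here.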

docker cleanup issues

The list of currently known issues is:

  • the current state of the cleanup script still interferes with the builds: #46 (comment)
  • cleaning up until 50% disk space is free is wasting valuable resources on the slave
  • the disk cleanup is slower in freeing disk space than new data is generated while the farm is busy (which disables slaves regularly - but that should only be the exception)
  • the cleanup script is trying to remove base images which is not desired
  • does not terminate cleanly: #50 (comment)

This ticket should stay open until any remaining issues related to the docker cleanup script have been resolved. One use case which should work before closing this ticket is to perform a series of full rebuilds on the farm:

  • without any jobs failing due to docker cleanup related issues as well as
  • the farm running continuously without having to disable slaves due to running low on disk space.

jobs occasionally don't have internet connectivity

Example failed jobs:
http://54.183.26.131:8080/job/Jrel_release-status-page/1685/console
http://54.183.26.131:8080/job/Jrel_arm_release-status-page/1685/console
http://54.183.26.131:8080/job/Irel_release-status-page/1685/console
http://54.183.26.131:8080/job/Irel_arm_release-status-page/1688/console

6. Run Dockerfile - status page

# BEGIN SECTION: Run Dockerfile - status page
19:17:47 + rm -fr /home/jenkins-slave/workspace/Jrel_release-status-page/debian_repo_cache
19:17:47 + rm -fr /home/jenkins-slave/workspace/Jrel_release-status-page/status_page
19:17:47 + mkdir -p /home/jenkins-slave/workspace/Jrel_release-status-page/debian_repo_cache
19:17:47 + mkdir -p /home/jenkins-slave/workspace/Jrel_release-status-page/status_page
19:17:47 + docker run --cidfile=/home/jenkins-slave/workspace/Jrel_release-status-page/docker_generate_status_page/docker.cid --net=host -v /home/jenkins-slave/workspace/Jrel_release-status-page/ros_buildfarm:/tmp/ros_buildfarm:ro -v /home/jenkins-slave/workspace/Jrel_release-status-page/debian_repo_cache:/tmp/debian_repo_cache -v /home/jenkins-slave/workspace/Jrel_release-status-page/status_page:/tmp/status_page status_page_generation
19:17:48 The build file contains the following targets:
19:17:48 - trusty source
19:17:48 - trusty amd64
19:17:48 - trusty i386
19:17:48 - utopic source
19:17:48 - utopic amd64
19:17:48 - utopic i386
19:17:48 - vivid source
19:17:48 - vivid amd64
19:17:48 - vivid i386
19:18:20 Traceback (most recent call last):
19:18:20   File "/usr/lib/python3.4/urllib/request.py", line 1232, in do_open
19:18:20     h.request(req.get_method(), req.selector, req.data, headers)
19:18:20   File "/usr/lib/python3.4/http/client.py", line 1065, in request
19:18:20     self._send_request(method, url, body, headers)
19:18:20   File "/usr/lib/python3.4/http/client.py", line 1103, in _send_request
19:18:20     self.endheaders(body)
19:18:20   File "/usr/lib/python3.4/http/client.py", line 1061, in endheaders
19:18:20     self._send_output(message_body)
19:18:20   File "/usr/lib/python3.4/http/client.py", line 906, in _send_output
19:18:20     self.send(msg)
19:18:20   File "/usr/lib/python3.4/http/client.py", line 841, in send
19:18:20     self.connect()
19:18:20   File "/usr/lib/python3.4/http/client.py", line 819, in connect
19:18:20     self.timeout, self.source_address)
19:18:20   File "/usr/lib/python3.4/socket.py", line 509, in create_connection
19:18:20     raise err
19:18:20   File "/usr/lib/python3.4/socket.py", line 500, in create_connection
19:18:20     sock.connect(sa)
19:18:20 socket.timeout: timed out
19:18:20
19:18:20 During handling of the above exception, another exception occurred:
19:18:20
19:18:20 Traceback (most recent call last):
19:18:20   File "/usr/lib/python3/dist-packages/rosdistro/loader.py", line 48, in load_url
19:18:20     fh = urlopen(url, timeout=timeout)
19:18:20   File "/usr/lib/python3.4/urllib/request.py", line 153, in urlopen
19:18:20     return opener.open(url, data, timeout)
19:18:20   File "/usr/lib/python3.4/urllib/request.py", line 455, in open
19:18:20     response = self._open(req, data)
19:18:20   File "/usr/lib/python3.4/urllib/request.py", line 473, in _open
19:18:20     '_open', req)
19:18:20   File "/usr/lib/python3.4/urllib/request.py", line 433, in _call_chain
19:18:20     result = func(*args)
19:18:20   File "/usr/lib/python3.4/urllib/request.py", line 1258, in http_open
19:18:20     return self.do_open(http.client.HTTPConnection, req)
19:18:20   File "/usr/lib/python3.4/urllib/request.py", line 1235, in do_open
19:18:20     raise URLError(err)
19:18:20 urllib.error.URLError: <urlopen error timed out>
19:18:20
19:18:20 During handling of the above exception, another exception occurred:
19:18:20
19:18:20 Traceback (most recent call last):
19:18:20   File "/usr/lib/python3.4/urllib/request.py", line 1232, in do_open
19:18:20     h.request(req.get_method(), req.selector, req.data, headers)
19:18:20   File "/usr/lib/python3.4/http/client.py", line 1065, in request
19:18:20     self._send_request(method, url, body, headers)
19:18:20   File "/usr/lib/python3.4/http/client.py", line 1103, in _send_request
19:18:20     self.endheaders(body)
19:18:20   File "/usr/lib/python3.4/http/client.py", line 1061, in endheaders
19:18:20     self._send_output(message_body)
19:18:20   File "/usr/lib/python3.4/http/client.py", line 906, in _send_output
19:18:20     self.send(msg)
19:18:20   File "/usr/lib/python3.4/http/client.py", line 841, in send
19:18:20     self.connect()
19:18:20   File "/usr/lib/python3.4/http/client.py", line 819, in connect
19:18:20     self.timeout, self.source_address)
19:18:20   File "/usr/lib/python3.4/socket.py", line 509, in create_connection
19:18:20     raise err
19:18:20   File "/usr/lib/python3.4/socket.py", line 500, in create_connection
19:18:20     sock.connect(sa)
19:18:20 socket.timeout: timed out
19:18:20
19:18:20 During handling of the above exception, another exception occurred:
19:18:20
19:18:20 Traceback (most recent call last):
19:18:20   File "/usr/lib/python3/dist-packages/rosdistro/loader.py", line 48, in load_url
19:18:20     fh = urlopen(url, timeout=timeout)
19:18:20   File "/usr/lib/python3.4/urllib/request.py", line 153, in urlopen
19:18:20     return opener.open(url, data, timeout)
19:18:20   File "/usr/lib/python3.4/urllib/request.py", line 455, in open
19:18:20     response = self._open(req, data)
19:18:20   File "/usr/lib/python3.4/urllib/request.py", line 473, in _open
19:18:20     '_open', req)
19:18:20   File "/usr/lib/python3.4/urllib/request.py", line 433, in _call_chain
19:18:20     result = func(*args)
19:18:20   File "/usr/lib/python3.4/urllib/request.py", line 1258, in http_open
19:18:20     return self.do_open(http.client.HTTPConnection, req)
19:18:20   File "/usr/lib/python3.4/urllib/request.py", line 1235, in do_open
19:18:20     raise URLError(err)
19:18:20 urllib.error.URLError: <urlopen error timed out>
19:18:20
19:18:20 During handling of the above exception, another exception occurred:
19:18:20
19:18:20 Traceback (most recent call last):
19:18:20   File "/usr/lib/python3.4/urllib/request.py", line 1232, in do_open
19:18:20     h.request(req.get_method(), req.selector, req.data, headers)
19:18:20   File "/usr/lib/python3.4/http/client.py", line 1065, in request
19:18:20     self._send_request(method, url, body, headers)
19:18:20   File "/usr/lib/python3.4/http/client.py", line 1103, in _send_request
19:18:20     self.endheaders(body)
19:18:20   File "/usr/lib/python3.4/http/client.py", line 1061, in endheaders
19:18:20     self._send_output(message_body)
19:18:20   File "/usr/lib/python3.4/http/client.py", line 906, in _send_output
19:18:20     self.send(msg)
19:18:20   File "/usr/lib/python3.4/http/client.py", line 841, in send
19:18:20     self.connect()
19:18:20   File "/usr/lib/python3.4/http/client.py", line 819, in connect
19:18:20     self.timeout, self.source_address)
19:18:20   File "/usr/lib/python3.4/socket.py", line 509, in create_connection
19:18:20     raise err
19:18:20   File "/usr/lib/python3.4/socket.py", line 500, in create_connection
19:18:20     sock.connect(sa)
19:18:20 socket.timeout: timed out
19:18:20
19:18:20 During handling of the above exception, another exception occurred:
19:18:20
19:18:20 Traceback (most recent call last):
19:18:20   File "/usr/lib/python3/dist-packages/rosdistro/loader.py", line 48, in load_url
19:18:20     fh = urlopen(url, timeout=timeout)
19:18:20   File "/usr/lib/python3.4/urllib/request.py", line 153, in urlopen
19:18:20     return opener.open(url, data, timeout)
19:18:20   File "/usr/lib/python3.4/urllib/request.py", line 455, in open
19:18:20     response = self._open(req, data)
19:18:20   File "/usr/lib/python3.4/urllib/request.py", line 473, in _open
19:18:20     '_open', req)
19:18:20   File "/usr/lib/python3.4/urllib/request.py", line 433, in _call_chain
19:18:20     result = func(*args)
19:18:20   File "/usr/lib/python3.4/urllib/request.py", line 1258, in http_open
19:18:20     return self.do_open(http.client.HTTPConnection, req)
19:18:20   File "/usr/lib/python3.4/urllib/request.py", line 1235, in do_open
19:18:20     raise URLError(err)
19:18:20 urllib.error.URLError: <urlopen error timed out>
19:18:20
19:18:20 During handling of the above exception, another exception occurred:
19:18:20
19:18:20 Traceback (most recent call last):
19:18:20   File "/tmp/ros_buildfarm/scripts/status/build_release_status_page.py", line 34, in <module>
19:18:20     main()
19:18:20   File "/tmp/ros_buildfarm/scripts/status/build_release_status_page.py", line 30, in main
19:18:20     args.cache_dir, args.output_dir, copy_resources=args.copy_resources)
19:18:20   File "/tmp/ros_buildfarm/ros_buildfarm/status_page.py", line 50, in build_release_status_page
19:18:20     dist = get_cached_distribution(index, rosdistro_name)
19:18:20   File "/usr/lib/python3/dist-packages/rosdistro/__init__.py", line 146, in get_cached_distribution
19:18:20     cache = get_distribution_cache(index, dist_name)
19:18:20   File "/usr/lib/python3/dist-packages/rosdistro/__init__.py", line 172, in get_distribution_cache
19:18:20     yaml_gz_str = load_url(url, skip_decode=True)
19:18:20   File "/usr/lib/python3/dist-packages/rosdistro/loader.py", line 58, in load_url
19:18:20     return load_url(url, retry=retry - 1, retry_period=retry_period, timeout=timeout)
19:18:20   File "/usr/lib/python3/dist-packages/rosdistro/loader.py", line 58, in load_url
19:18:20     return load_url(url, retry=retry - 1, retry_period=retry_period, timeout=timeout)
19:18:20   File "/usr/lib/python3/dist-packages/rosdistro/loader.py", line 59, in load_url
19:18:20     raise URLError(str(e) + ' (%s)' % url)
19:18:20 urllib.error.URLError: <urlopen error <urlopen error timed out> (http://ros.org/rosdistro/jade-cache.yaml.gz)>
19:18:20 Build step 'Execute shell' marked build as failure

A slightly different error:

6. Run Dockerfile - status page

# BEGIN SECTION: Run Dockerfile - status page
19:15:12 + rm -fr /home/jenkins-slave/workspace/Irel_arm_release-status-page/debian_repo_cache
19:15:12 + rm -fr /home/jenkins-slave/workspace/Irel_arm_release-status-page/status_page
19:15:12 + mkdir -p /home/jenkins-slave/workspace/Irel_arm_release-status-page/debian_repo_cache
19:15:12 + mkdir -p /home/jenkins-slave/workspace/Irel_arm_release-status-page/status_page
19:15:12 + docker run --cidfile=/home/jenkins-slave/workspace/Irel_arm_release-status-page/docker_generate_status_page/docker.cid --net=host -v /home/jenkins-slave/workspace/Irel_arm_release-status-page/ros_buildfarm:/tmp/ros_buildfarm:ro -v /home/jenkins-slave/workspace/Irel_arm_release-status-page/debian_repo_cache:/tmp/debian_repo_cache -v /home/jenkins-slave/workspace/Irel_arm_release-status-page/status_page:/tmp/status_page status_page_generation
19:15:14 The build file contains the following targets:
19:15:14 - trusty source
19:15:14 - trusty armhf
19:15:42 Traceback (most recent call last):
19:15:42   File "/usr/lib/python3.4/urllib/request.py", line 1232, in do_open
19:15:42     h.request(req.get_method(), req.selector, req.data, headers)
19:15:42   File "/usr/lib/python3.4/http/client.py", line 1065, in request
19:15:42     self._send_request(method, url, body, headers)
19:15:42   File "/usr/lib/python3.4/http/client.py", line 1103, in _send_request
19:15:42     self.endheaders(body)
19:15:42   File "/usr/lib/python3.4/http/client.py", line 1061, in endheaders
19:15:42     self._send_output(message_body)
19:15:42   File "/usr/lib/python3.4/http/client.py", line 906, in _send_output
19:15:42     self.send(msg)
19:15:42   File "/usr/lib/python3.4/http/client.py", line 841, in send
19:15:42     self.connect()
19:15:42   File "/usr/lib/python3.4/http/client.py", line 819, in connect
19:15:42     self.timeout, self.source_address)
19:15:42   File "/usr/lib/python3.4/socket.py", line 509, in create_connection
19:15:42     raise err
19:15:42   File "/usr/lib/python3.4/socket.py", line 500, in create_connection
19:15:42     sock.connect(sa)
19:15:42 socket.timeout: timed out
19:15:42
19:15:42 During handling of the above exception, another exception occurred:
19:15:42
19:15:42 Traceback (most recent call last):
19:15:42   File "/usr/lib/python3/dist-packages/rosdistro/loader.py", line 48, in load_url
19:15:42     fh = urlopen(url, timeout=timeout)
19:15:42   File "/usr/lib/python3.4/urllib/request.py", line 153, in urlopen
19:15:42     return opener.open(url, data, timeout)
19:15:42   File "/usr/lib/python3.4/urllib/request.py", line 461, in open
19:15:42     response = meth(req, response)
19:15:42   File "/usr/lib/python3.4/urllib/request.py", line 571, in http_response
19:15:42     'http', request, response, code, msg, hdrs)
19:15:42   File "/usr/lib/python3.4/urllib/request.py", line 493, in error
19:15:42     result = self._call_chain(*args)
19:15:42   File "/usr/lib/python3.4/urllib/request.py", line 433, in _call_chain
19:15:42     result = func(*args)
19:15:42   File "/usr/lib/python3.4/urllib/request.py", line 676, in http_error_302
19:15:42     return self.parent.open(new, timeout=req.timeout)
19:15:42   File "/usr/lib/python3.4/urllib/request.py", line 455, in open
19:15:42     response = self._open(req, data)
19:15:42   File "/usr/lib/python3.4/urllib/request.py", line 473, in _open
19:15:42     '_open', req)
19:15:42   File "/usr/lib/python3.4/urllib/request.py", line 433, in _call_chain
19:15:42     result = func(*args)
19:15:42   File "/usr/lib/python3.4/urllib/request.py", line 1258, in http_open
19:15:42     return self.do_open(http.client.HTTPConnection, req)
19:15:42   File "/usr/lib/python3.4/urllib/request.py", line 1235, in do_open
19:15:42     raise URLError(err)
19:15:42 urllib.error.URLError: <urlopen error timed out>
19:15:42
19:15:42 During handling of the above exception, another exception occurred:
19:15:42
19:15:42 Traceback (most recent call last):
19:15:42   File "/usr/lib/python3.4/urllib/request.py", line 1232, in do_open
19:15:42     h.request(req.get_method(), req.selector, req.data, headers)
19:15:42   File "/usr/lib/python3.4/http/client.py", line 1065, in request
19:15:42     self._send_request(method, url, body, headers)
19:15:42   File "/usr/lib/python3.4/http/client.py", line 1103, in _send_request
19:15:42     self.endheaders(body)
19:15:42   File "/usr/lib/python3.4/http/client.py", line 1061, in endheaders
19:15:42     self._send_output(message_body)
19:15:42   File "/usr/lib/python3.4/http/client.py", line 906, in _send_output
19:15:42     self.send(msg)
19:15:42   File "/usr/lib/python3.4/http/client.py", line 841, in send
19:15:42     self.connect()
19:15:42   File "/usr/lib/python3.4/http/client.py", line 819, in connect
19:15:42     self.timeout, self.source_address)
19:15:42   File "/usr/lib/python3.4/socket.py", line 509, in create_connection
19:15:42     raise err
19:15:42   File "/usr/lib/python3.4/socket.py", line 500, in create_connection
19:15:42     sock.connect(sa)
19:15:42 socket.timeout: timed out
19:15:42
19:15:42 During handling of the above exception, another exception occurred:
19:15:42
19:15:42 Traceback (most recent call last):
19:15:42   File "/usr/lib/python3/dist-packages/rosdistro/loader.py", line 48, in load_url
19:15:42     fh = urlopen(url, timeout=timeout)
19:15:42   File "/usr/lib/python3.4/urllib/request.py", line 153, in urlopen
19:15:42     return opener.open(url, data, timeout)
19:15:42   File "/usr/lib/python3.4/urllib/request.py", line 455, in open
19:15:42     response = self._open(req, data)
19:15:42   File "/usr/lib/python3.4/urllib/request.py", line 473, in _open
19:15:42     '_open', req)
19:15:42   File "/usr/lib/python3.4/urllib/request.py", line 433, in _call_chain
19:15:42     result = func(*args)
19:15:42   File "/usr/lib/python3.4/urllib/request.py", line 1258, in http_open
19:15:42     return self.do_open(http.client.HTTPConnection, req)
19:15:42   File "/usr/lib/python3.4/urllib/request.py", line 1235, in do_open
19:15:42     raise URLError(err)
19:15:42 urllib.error.URLError: <urlopen error timed out>
19:15:42
19:15:42 During handling of the above exception, another exception occurred:
19:15:42
19:15:42 Traceback (most recent call last):
19:15:42   File "/tmp/ros_buildfarm/scripts/status/build_release_status_page.py", line 34, in <module>
19:15:42     main()
19:15:42   File "/tmp/ros_buildfarm/scripts/status/build_release_status_page.py", line 30, in main
19:15:42     args.cache_dir, args.output_dir, copy_resources=args.copy_resources)
19:15:42   File "/tmp/ros_buildfarm/ros_buildfarm/status_page.py", line 50, in build_release_status_page
19:15:42     dist = get_cached_distribution(index, rosdistro_name)
19:15:42   File "/usr/lib/python3/dist-packages/rosdistro/__init__.py", line 146, in get_cached_distribution
19:15:42     cache = get_distribution_cache(index, dist_name)
19:15:42   File "/usr/lib/python3/dist-packages/rosdistro/__init__.py", line 172, in get_distribution_cache
19:15:42     yaml_gz_str = load_url(url, skip_decode=True)
19:15:42   File "/usr/lib/python3/dist-packages/rosdistro/loader.py", line 58, in load_url
19:15:42     return load_url(url, retry=retry - 1, retry_period=retry_period, timeout=timeout)
19:15:42   File "/usr/lib/python3/dist-packages/rosdistro/loader.py", line 58, in load_url
19:15:42     return load_url(url, retry=retry - 1, retry_period=retry_period, timeout=timeout)
19:15:42   File "/usr/lib/python3/dist-packages/rosdistro/loader.py", line 65, in load_url
19:15:42     return contents.decode('utf-8')
19:15:42 UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
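
A note on the second error: byte 0x8b at position 1 matches the second byte of the gzip magic number, so it looks like the third attempt did fetch the .yaml.gz file and then tried to decode it as text. In the traceback the outer call passes skip_decode=True, but the retry calls at loader.py line 58 do not pass it on, which would explain this. The sketch below is not the rosdistro code; it is a hypothetical retry helper (fetch is any callable returning bytes) showing the flag being forwarded:

```python
import gzip


def load_with_retry(fetch, retry=2, skip_decode=False):
    """Call fetch() and retry on OSError, keeping skip_decode across retries.

    fetch is a hypothetical zero-argument callable that returns bytes.
    URLError is a subclass of OSError, so timeouts like the ones above
    would be retried here.
    """
    try:
        contents = fetch()
    except OSError:
        if retry <= 0:
            raise
        # Forward skip_decode so a successful retry still returns raw bytes.
        return load_with_retry(fetch, retry=retry - 1, skip_decode=skip_decode)
    return contents if skip_decode else contents.decode("utf-8")


# gzip data starts with the bytes 0x1f 0x8b, which is not valid UTF-8.
GZIP_MAGIC = gzip.compress(b"example")[:2]
```

With the flag forwarded, a fetch that fails once and then succeeds hands back the compressed bytes untouched instead of raising UnicodeDecodeError.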

squid service does not start on startup

It only starts after the first reconfigure run triggered by the */15 cron schedule.

It should start at startup, or the iptables rules should wait until squid is verified to be working.
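
One way to "wait until squid is verified to be working" is to poll its listening port before applying the iptables rules. This is a minimal sketch under stated assumptions: the helper name wait_for_port is hypothetical, and a successful TCP connect is only a rough proxy for squid being ready.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A startup script could call wait_for_port("127.0.0.1", 3128) and only install the redirect rules when it returns True. 3128 is squid's default port; whether this deployment uses it is an assumption.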

Incompatibility with tracywebtech-pip v1.3.4

Running reconfigure.bash master | slave | repo results in the following error (from /var/log/puppet.log):

Could not find dependency Exec[install-pip3] for Exec[install-docker-py]
at /root/buildfarm_deployment/slave/modules/pip/manifests/install.pp:33

I tracked this error back to TracyWebTech/puppet-pip#5

It works if I freeze tracywebtech-pip to version 1.3.2 in Puppetfile:

mod 'tracywebtech-pip', '1.3.2'

Another solution would be to provide Exec[install-pip3] with checks for pip3, perhaps using pip::installation.

Package 'jenkins' has no installation candidate

I just wanted to try the buildfarm and run the local deployment test.

The master isn't starting. I looked in the puppet log file and noticed that the jenkins package has no installation candidate:

2015-08-31 10:06:50 +0000 Puppet (err): Execution of '/usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold install jenkins' returned 100: Reading package lists...
Building dependency tree...
Reading state information...
Package jenkins is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source

E: Package 'jenkins' has no installation candidate
2015-08-31 10:06:50 +0000 /Stage[main]/Jenkins::Package/Package[jenkins]/ensure (err): change from purged to present failed: Execution of '/usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold install jenkins' returned 100: Reading package lists...
Building dependency tree...
Reading state information...
Package jenkins is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source

E: Package 'jenkins' has no installation candidate
2015-08-31 10:06:50 +0000 /Stage[main]/Main/File/etc/default/jenkins: Dependency Package[jenkins] has failures: true

I don't know what to do.

run jenkins slave on master

Please update the master configuration as well as the master on the test farm to run a Jenkins slave with the label "slave-on-master".

clean up still interfering with builds

Follow-up to #43 and #40. It looks like the latest builds are still failing: http://54.183.26.131:8080/job/Jbin_uT32__tf2_ros__ubuntu_trusty_i386__binary/3/consoleFull

I logged into the slave but couldn't find any hashes in the log file matching the errors in the console output, e.g.:

Error removing intermediate container bed26c5d4211: The given container is <nil>

But it looks like there are still multiple cron jobs deleting docker stuff:

# HEADER: This file was autogenerated at 2015-02-03 00:30:17 +0000 by puppet.
# HEADER: While it can still be managed manually, it is definitely not recommended.
# HEADER: Note particularly that the comments starting with 'Puppet Name' should
# HEADER: not be deleted, as doing so could cause duplicate cron jobs.
# Puppet Name: docker_cleanup_images
15 */2 * * * bash -c "python3 -u /home/jenkins-slave/cleanup_docker_images.py"
# Puppet Name: docker_cleanup_containers
5 */2 * * * bash -c "docker ps -aq | xargs -L1 docker rm "
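
The docker_cleanup_containers entry above pipes every ID from docker ps -aq to docker rm, so it can also remove stopped intermediate containers that a build in progress still expects to find. The sketch below shows a narrower selection: only containers that exited some time ago. It is an illustration, not the project's cleanup script; the record layout (id, state, finished_at) and the function name are hypothetical.

```python
def removable_containers(containers, now, min_age_seconds=3600):
    """Select IDs of exited containers that finished at least min_age_seconds ago."""
    ids = []
    for container in containers:
        # Skip anything that is running or was only just created.
        if container["state"] != "exited":
            continue
        # Leave recently finished containers alone; a build may still use them.
        if now - container["finished_at"] >= min_age_seconds:
            ids.append(container["id"])
    return ids


# Hypothetical records, with timestamps in seconds.
sample = [
    {"id": "aaa111", "state": "running", "finished_at": 0},
    {"id": "bbb222", "state": "exited", "finished_at": 9900},
    {"id": "ccc333", "state": "exited", "finished_at": 1000},
]
print(removable_containers(sample, now=10000))
```

The age threshold is the design choice that matters here: it trades some disk space for not racing against builds that are still running.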

Upgrade to docker 1.9

It's supposed to fix disk space leaking.
It's supposed to fix hang on docker pull.

If it does not work, roll back to 1.7.1.

/etc/init.d/jenkins-slave: No such file or directory

buildfarmdeploymentconfig_slave_1 log file:
bash: /etc/init.d/jenkins-slave: No such file or directory
puppet.log:

2015-09-02 11:30:38 +0000 Puppet (info): Computing checksum on file /etc/init.d/docker
2015-09-02 11:30:38 +0000 /Stage[main]/Docker::Service/File[/etc/init.d/docker] (info): Filebucketed /etc/init.d/docker to puppet with sum 870b6e52007eca22390f36e3b2547954
2015-09-02 11:30:38 +0000 /Stage[main]/Docker::Service/File[/etc/init.d/docker]/ensure (notice): ensure changed 'file' to 'link'
2015-09-02 11:30:38 +0000 /Stage[main]/Docker::Service/File[/etc/init.d/docker] (info): Scheduling refresh of Service[docker]
2015-09-02 11:30:38 +0000 /Stage[main]/Docker::Service/Service[docker] (err): Could not evaluate: undefined method `[]' for nil:NilClass
2015-09-02 11:30:38 +0000 /Stage[main]/Docker::Service/Service[docker] (notice): Triggered 'refresh' from 2 events
2015-09-02 11:30:38 +0000 /User[jenkins-slave] (notice): Dependency Service[docker] has failures: true
2015-09-02 11:30:38 +0000 /User[jenkins-slave] (warning): Skipping because of failed dependencies
2015-09-02 11:30:38 +0000 /Stage[main]/Main/File[/home/jenkins-slave/cleanup_docker_images.py] (notice): Dependency Service[docker] has failures: true
2015-09-02 11:30:38 +0000 /Stage[main]/Main/File[/home/jenkins-slave/cleanup_docker_images.py] (warning): Skipping because of failed dependencies
2015-09-02 11:30:38 +0000 /Stage[main]/Jenkins::Slave/Exec[get_swarm_client] (notice): Dependency Service[docker] has failures: true
2015-09-02 11:30:38 +0000 /Stage[main]/Jenkins::Slave/Exec[get_swarm_client] (warning): Skipping because of failed dependencies
2015-09-02 11:30:38 +0000 /Stage[main]/Main/File[/home/jenkins-slave/.ccache] (notice): Dependency Service[docker] has failures: true
2015-09-02 11:30:38 +0000 /Stage[main]/Main/File[/home/jenkins-slave/.ccache] (warning): Skipping because of failed dependencies
2015-09-02 11:30:38 +0000 /Stage[main]/Main/Cron[docker_cleanup_images] (notice): Dependency Service[docker] has failures: true
2015-09-02 11:30:38 +0000 /Stage[main]/Main/Cron[docker_cleanup_images] (warning): Skipping because of failed dependencies
2015-09-02 11:30:38 +0000 /Stage[main]/Jenkins::Slave/File[/etc/default/jenkins-slave] (notice): Dependency Service[docker] has failures: true
2015-09-02 11:30:38 +0000 /Stage[main]/Jenkins::Slave/File[/etc/default/jenkins-slave] (warning): Skipping because of failed dependencies
2015-09-02 11:30:38 +0000 /Stage[main]/Jenkins::Slave/File[/etc/init.d/jenkins-slave] (notice): Dependency Service[docker] has failures: true
2015-09-02 11:30:38 +0000 /Stage[main]/Jenkins::Slave/File[/etc/init.d/jenkins-slave] (warning): Skipping because of failed dependencies
2015-09-02 11:30:38 +0000 /Stage[main]/Main/Cron[docker_cleanup_containers] (notice): Dependency Service[docker] has failures: true
2015-09-02 11:30:38 +0000 /Stage[main]/Main/Cron[docker_cleanup_containers] (warning): Skipping because of failed dependencies
2015-09-02 11:30:38 +0000 Puppet (info): Computing checksum on file /etc/dbus-1/system.d/Upstart.conf
2015-09-02 11:30:38 +0000 /Stage[main]/Upstart::Config/File[/etc/dbus-1/system.d/Upstart.conf] (info): Filebucketed /etc/dbus-1/system.d/Upstart.conf to puppet with sum 64be74cddb0c74b7d98202b40389784c
2015-09-02 11:30:38 +0000 /Stage[main]/Upstart::Config/File[/etc/dbus-1/system.d/Upstart.conf]/content (notice): content changed '{md5}64be74cddb0c74b7d98202b40389784c' to '{md5}0e7eadb0a62687e1ebb1b35021ca97cf'
2015-09-02 11:30:38 +0000 /Package[daemon] (notice): Dependency Service[docker] has failures: true
2015-09-02 11:30:38 +0000 /Package[daemon] (warning): Skipping because of failed dependencies
2015-09-02 11:30:38 +0000 /Stage[main]/Jenkins::Slave/Service[jenkins-slave] (notice): Dependency Service[docker] has failures: true
2015-09-02 11:30:38 +0000 /Stage[main]/Jenkins::Slave/Service[jenkins-slave] (warning): Skipping because of failed dependencies
2015-09-02 11:30:38 +0000 Puppet (info): Creating state file /var/lib/puppet/state/state.yaml
2015-09-02 11:30:38 +0000 Puppet (notice): Finished catalog run in 103.56 seconds

This error (/Stage[main]/Docker::Service/Service[docker]: Could not evaluate: undefined method `[]' for nil:NilClass) appears in all three puppet.log files, but master and repo still start on fig up.
I'm running the example config for deployment testing.

I'm not sure whether I'm looking in the right place or how to handle this problem.

update puppet forge API

warning observed: Replacing Puppet Forge API URL to use v3 https://forgeapi.puppetlabs.com. You should update your Puppetfile

deploy jenkins admin user via jenkins puppet api

Instead of using the XML templates.

Under the hood this uses the Jenkins CLI tools, which handle the password hashing and let us change the password more easily, without having to compute or harvest the hash manually.

Wrong puppet-concat version

When running ./reconfigure.bash (https://github.com/ros-infrastructure/buildfarm_deployment_config/blob/master/reconfigure.bash) repo the following error occurs:

Error executing puppet module install. Check that this command succeeds:

puppet module install --version 2.0.1 --target-dir /root/buildfarm_deployment/master/.tmp/librarian/cache/source/puppet/forge/forgeapi_puppetlabs_com/puppetlabs-concat/2.0.1 --module_repository https://forgeapi.puppetlabs.com --modulepath /root/buildfarm_deployment/master/.tmp/librarian/cache/source/puppet/forge/forgeapi_puppetlabs_com/puppetlabs-concat/2.0.1 --module_working_dir
/root/buildfarm_deployment/master/.tmp/librarian/cache/source/puppet/forge/forgeapi_puppetlabs_com/puppetlabs-concat/2.0.1 --ignore-dependencies puppetlabs-concat

    Error:

    Error: Could not install 'puppetlabs-concat' (v2.0.1)

      No releases matching '2.0.1' are available from https://forgeapi.puppetlabs.com

Current fix:
Freeze puppetlabs-concat to version 1.2.3 in Puppetfile (https://github.com/ipa-mdl/buildfarm_deployment/blob/master/master/Puppetfile)

make repo, master and slave non colliding

A common use case will be a very small system for which one machine is likely enough. We should update the configs to support everything running on the same machine. DIND is not reliable enough to be used for more than basic testing.

This will require making sure that none of the puppet config elements collide, and that none of the other configuration elements collide either.

This will require refactoring a lot of the things into proper puppet classes so they can share resources and not collide on definitions.
