ros-infrastructure / buildfarm_deployment
License: Apache License 2.0
Currently, running ./reconfigure.bash machine
exits silently even when puppet errored. To verify that puppet ran correctly one currently has to examine "/var/log/puppet.log".
The proposal is to propagate puppet error messages to the wrapper bash script, or at least to clearly signal an error and point the user to the "/var/log/puppet.log" file.
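A minimal sketch of what that signaling could look like in the wrapper (the helper name and the exact invocation are illustrative, not the current script; `--detailed-exitcodes` is a real puppet flag):

```shell
#!/bin/bash
# Sketch: with --detailed-exitcodes, puppet apply returns 0 (no changes)
# or 2 (changes applied) on success, and 1, 4, or 6 on failure.
puppet_failed() {
    case "$1" in
        1|4|6) return 0 ;;  # failure exit codes
        *)     return 1 ;;  # 0 and 2 are success
    esac
}

# Usage inside reconfigure.bash (sketch):
#   puppet apply --detailed-exitcodes manifests/site.pp 2>&1 | tee -a /var/log/puppet.log
#   if puppet_failed "${PIPESTATUS[0]}"; then
#       echo "ERROR: puppet apply failed; see /var/log/puppet.log" >&2
#       exit 1
#   fi
```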
If rtyler/jenkins 1.6.1 is pulled in via librarian-puppet, the master deployment fails with the following error in puppet.log:
2015-10-15 15:29:23 -0400 Puppet (err): Duplicate declaration: User[jenkins] is already declared; cannot redeclare at /root/buildfarm_deployment/master/manifests/site.pp:425 on node my-master
There is no issue with rtyler/jenkins 1.5.0. I worked around the issue with the following change to master/Puppetfile, but did not find the root cause:
-mod 'rtyler/jenkins'
+mod 'rtyler/jenkins', '1.5.0'
We need to find a way to clean up the aufs databases, or else raise the recommended space required.
The following is from a machine which was originally running devicemapper and was then switched to aufs:
182G aufs
21M containers
838M devicemapper
1.5M execdriver
90M graph
14M init
148K linkgraph.db
28K repositories-aufs
4.0K repositories-devicemapper
4.0K tmp
8.0K trust
60M vfs
7.7M volumes
183G total
Having all data stored under /var/repos
might not be a good approach if it is to be hosted on different subdomains. The naming could also be changed in that process.
Based on the recent question on the mailing list: https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/ros-sig-buildfarm/P32OdoGFYkg/awEAi8Bx4jgJ
The description of the ssh_keys
for all three machine types is pretty confusing.
I would recommend something like this:
* Configure as many public ssh keys as you want for administrators to log in.
* On the repo machine make sure there is a public ssh key which matches the private ssh key `jenkins::private_ssh_key` provisioned on the master.
The key jenkins-slave::authorized_keys
described for the repo machine seems to not exist in the config repository.
Is this a private ssh key? If so the documentation should state that explicitly.
Also the indentation of the configuration options for the repo machine seems to be wrong.
Follow up coming out of #41
It's currently unset. Needed so that #34 will not go into spam.
As a follow-up on #39, a certain set of new images (e.g. not older than 12 hours) should never be deleted, even if the desired free disk space constraints are not fulfilled.
This prevents images from being deleted while they are still in use.
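One possible shape for that age guard, sketched with an assumed helper name (`image_is_fresh`) and GNU date; the creation timestamp would come from `docker inspect --format '{{.Created}}'`:

```shell
#!/bin/bash
# Sketch: return success (0) if the image was created within the last
# min_age_hours, i.e. it is too fresh to delete.
image_is_fresh() {
    local created="$1" min_age_hours="${2:-12}"
    local created_s now_s
    created_s=$(date -d "$created" +%s) || return 1
    now_s=$(date +%s)
    [ $(( now_s - created_s )) -lt $(( min_age_hours * 3600 )) ]
}

# In the cleanup loop (sketch): skip anything younger than 12 hours,
# even when the free-disk-space target is not yet met.
#   image_is_fresh "$created" 12 && continue
```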
More info:
docker/docker-py#525
https://stackoverflow.com/questions/27341064/how-do-i-fix-importerror-cannot-import-name-incompleteread
https://bugs.launchpad.net/ubuntu/+source/python-pip/+bug/1306991
The workaround is to install pip from PyPI instead of the Ubuntu package.
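A sketch of that workaround (assumes root and network access; get-pip.py is the standard PyPI bootstrap script, and the final reinstall of requests addresses the ImportError linked above):

```shell
#!/bin/bash
# Sketch: replace the Ubuntu-packaged pip with one from PyPI.
install_pip_from_pypi() {
    apt-get remove -y python-pip
    curl -sSL https://bootstrap.pypa.io/get-pip.py -o /tmp/get-pip.py
    python /tmp/get-pip.py
    # reinstall requests so pip stops using the stale system copy
    pip install --upgrade requests
}
# Run as root: install_pip_from_pypi
```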
It looks to be coming from these two directories being generated:
root@master:~/buildfarm_deployment_config# ls -l /var/lib/jenkins/jobs/
total 8
drwxr-xr-x 3 root root 4096 Feb 27 19:23 indigo_rosdistro-cache
drwxr-xr-x 3 root root 4096 Feb 27 19:23 jade_rosdistro-cache
The jobs directory itself is also owned by root:
drwxr-xr-x 4 root root 4096 Feb 27 19:23 jobs
Reconfiguring will fail until jenkins has permissions on the jobs directory.
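A sketch of the interim fix (function name and parameters are illustrative): hand the generated directories back to the jenkins user before reconfiguring.

```shell
#!/bin/bash
# Sketch: recursively restore ownership of the Jenkins jobs directory.
fix_jobs_ownership() {
    local jobs_dir="${1:-/var/lib/jenkins/jobs}"
    local owner="${2:-jenkins:jenkins}"
    chown -R "$owner" "$jobs_dir"
}
# e.g. fix_jobs_ownership /var/lib/jenkins/jobs jenkins:jenkins
```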
Also keep the logs from multiple invocations.
If necessary compress them to avoid too much disk usage.
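One way this could look (sketch; the helper name is illustrative): archive the previous log with a timestamp and gzip it before each new run.

```shell
#!/bin/bash
# Sketch: instead of overwriting the log on every invocation,
# move the old one aside and compress it.
rotate_log() {
    local log="$1"
    [ -f "$log" ] || return 0          # nothing to rotate
    local stamp
    stamp=$(date +%Y%m%d-%H%M%S)
    mv "$log" "${log}.${stamp}"
    gzip "${log}.${stamp}"             # compress to limit disk usage
}
# Call rotate_log /var/log/puppet.log before invoking puppet.
```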
The workaround I brought in does not seem to be working correctly.
I manually added jenkins to docker to get it to run. 6bb2646#diff-19552b043be83d06edbf5f4122b95bf3R427
The connection was dropped at the very end and caused a failure?
http://build.ros.org:8080/job/Isrc_uS__libntcan__ubuntu_saucy__source/4/console
# BEGIN SECTION: Clean up to save disk space on slaves
06:36:46 + rm -fr sourcedeb/source
06:36:46 + echo # END SECTION
06:36:46 # END SECTION
06:36:46 SSH: Connecting from host [ip-172-31-6-103]
06:36:46 SSH: Connecting with configuration [repo] ...
06:36:46 SSH: Disconnecting configuration [repo] ...
06:36:46 SSH: Transferred 4 file(s)
06:36:46 Build step 'Send files or execute commands over SSH' changed build result to SUCCESS
06:36:51 Waiting for the completion of Irel_import-package
06:44:24 Irel_import-package #2844 completed. Result was SUCCESS
06:44:24 FATAL: java.io.IOException: Backing channel is disconnected.
06:44:24 hudson.remoting.RemotingSystemException: java.io.IOException: Backing channel is disconnected.
06:44:24 at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:266)
06:44:24 at com.sun.proxy.$Proxy51.stop(Unknown Source)
06:44:24 at com.cloudbees.jenkins.plugins.sshagent.SSHAgentBuildWrapper$SSHAgentEnvironment.tearDown(SSHAgentBuildWrapper.java:407)
06:44:24 at hudson.model.Build$BuildExecution.doRun(Build.java:173)
06:44:24 at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:537)
06:44:24 at hudson.model.Run.execute(Run.java:1741)
06:44:24 at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
06:44:24 at hudson.model.ResourceController.execute(ResourceController.java:98)
06:44:24 at hudson.model.Executor.run(Executor.java:408)
06:44:24 Caused by: java.io.IOException: Backing channel is disconnected.
06:44:24 at hudson.remoting.RemoteInvocationHandler.channelOrFail(RemoteInvocationHandler.java:185)
06:44:24 at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:250)
06:44:24 ... 8 more
06:44:24 [description-setter] Could not determine description.
06:44:24 ERROR: Step ‘E-mail Notification’ failed: no workspace for Isrc_uS__libntcan__ubuntu_saucy__source #4
06:44:24 Warning: you have no plugins providing access control for builds, so falling back to legacy behavior of permitting any downstream builds to be triggered
06:44:27 Finished: FAILURE
Watching the web UI, I've also seen executors briefly reporting "Dead" in red, which might be related.
We need to support changing the refresh_pattern
From our old config we used:
# Always check for stale downloads (allow 1 minute)
refresh_pattern . 0 0 1
See changelog upgrading from 2.x: https://wiki.jenkins-ci.org/display/JENKINS/Priority+Sorter+Plugin
jenkins@ip-172-31-5-54:~$ cat jenkins.advancedqueue.PriorityConfiguration.xml
<?xml version='1.0' encoding='UTF-8'?>
<jenkins.advancedqueue.PriorityConfiguration plugin="[email protected]">
<jobGroups class="linked-list">
<jenkins.advancedqueue.JobGroup>
<id>0</id>
<priority>-1</priority>
<jobGroupStrategy class="jenkins.advancedqueue.jobinclusion.strategy.AllJobsJobInclusionStrategy"/>
<description>All</description>
<runExclusive>false</runExclusive>
<useJobFilter>false</useJobFilter>
<jobPattern>.*</jobPattern>
<usePriorityStrategies>true</usePriorityStrategies>
<priorityStrategies>
<jenkins.advancedqueue.JobGroup_-PriorityStrategyHolder>
<id>0</id>
<priorityStrategy class="jenkins.advancedqueue.priority.strategy.JobPropertyStrategy"/>
</jenkins.advancedqueue.JobGroup_-PriorityStrategyHolder>
</priorityStrategies>
</jenkins.advancedqueue.JobGroup>
</jobGroups>
</jenkins.advancedqueue.PriorityConfiguration>
We need the Python 3 jenkinsapi package from pip.
Images tagged in multiple repositories will not be deleted by ID. All the tags need to be removed first.
foote/squid-in-a-can_pr_11 latest 27bb194efb3c 13 days ago 231.9 MB
osrf/ubuntu_armhf trusty fd2852b4b035 2 weeks ago 237.6 MB
osrf/ubuntu_32bit saucy d6a6e4bd19d5 2 weeks ago 151.7 MB
osrf/ubuntu_32bit trusty 0978a1b4c8d1 2 weeks ago 181.6 MB
ubuntu utopic cfaba6b5fefe 3 weeks ago 194.4 MB
ubuntu trusty 5ba9dab47459 3 weeks ago 188.3 MB
devel_build_and_test__indigo_rail_maps latest 07e2b0812a8a 5 weeks ago 648 MB
devel_build_and_test__indigo_cob_substitute latest 07e2b0812a8a 5 weeks ago 648 MB
devel_build_and_install__indigo_world_canvas latest b39ca217cff6 5 weeks ago 648 MB
devel_build_and_install__indigo_rail_maps latest b39ca217cff6 5 weeks ago 648 MB
devel_build_and_install__indigo_robot_upstart latest b39ca217cff6 5 weeks ago 648 MB
devel_build_and_install__indigo_xdot latest b39ca217cff6 5 weeks ago 648 MB
devel_build_and_install__indigo_cob_substitute latest b39ca217cff6 5 weeks ago 648 MB
devel_build_and_install__indigo_cmake_modules latest fc7af26294ac 5 weeks ago 443.6 MB
devel_build_and_install__indigo_rosprofiler latest fc7af26294ac 5 weeks ago 443.6 MB
devel_build_and_install__indigo_rosh_desktop_plugins latest fc7af26294ac 5 weeks ago 443.6 MB
root@ip-172-31-10-242:~# docker rmi fc7af26294ac
Error response from daemon: Conflict, cannot delete image fc7af26294ac because it is tagged in multiple repositories, use -f to force
FATA[0000] Error: failed to remove one or more images
root@ip-172-31-10-242:~# docker rmi devel_build_and_install__indigo_cmake_modules
Untagged: devel_build_and_install__indigo_cmake_modules:latest
root@ip-172-31-10-242:~# docker rmi fc7af26294ac
Error response from daemon: Conflict, cannot delete image fc7af26294ac because it is tagged in multiple repositories, use -f to force
FATA[0000] Error: failed to remove one or more images
root@ip-172-31-10-242:~# docker rmi devel_build_and_install__indigo_rosprofiler devel_build_and_install__indigo_rosh_desktop_plugins
Untagged: devel_build_and_install__indigo_rosprofiler:latest
Untagged: devel_build_and_install__indigo_rosh_desktop_plugins:latest
Deleted: fc7af26294ac0fe922b7b63916a6d652b634ca04a471232eca1f34600913e296
Deleted: d9f50372f7fa6da91826d53419c7acfa5d83f4bb98548008735e61557b0b95bc
Deleted: 0bc6fb6b449bebbb2cefd80024b810132a157a8a5eb04237996a04a0ac612392
Deleted: 60742dae31c766d429f2082638d27a4f704b03b0e88c1ea54fdcbaa7e1f662be
Deleted: b11e744f87e2c68170bd53ce9d45c747a3ffe449d8b6e724b912d346a2f42776
Deleted: ae2709197dce8bc483a83883c3d64dfdb8f4eceeaff6476e5a754637e881801a
Deleted: ba555191ac6af3bcc2942c52740a497a402f89da722ec8edda94fc986f85e82c
Deleted: 94c90dee9322c07d302a08bc87b8ff21d5098027e46499fa5dc51062a0a88011
Deleted: 0b0f559469d6ae80c283e0023591e70fab5b988b622f35344d52ae7fd2d33ad6
Deleted: d9bd6d9f58d3e478fba08996372e07c7ba4f73f29f86a02bcd169aeb14505ea1
Deleted: 354569f142469561c417e1b0ceeb133a49f1c5aa729ca64b4f4b8e67d3e51883
Deleted: af43ff1f17e552f549e7caca31e598433e0504037ea09929d13ff6fb2e51f998
Deleted: eb967f4f0f9d1e22d4ab1df50fed8e46cdd3736fd1c3f252295beb537c5627ff
Deleted: 1781096ed21c936a987a13180a5c159b2251c8c706164d5e7996e8aaec787393
Deleted: 57300e65ca04e20af0752a8a7650bca134fef234b21951636bfbf9b3e0ff87c6
Deleted: f5dcca35b84c77d8ca0c3c7bdf0915930bee88fb30168ef043b58160be0081b2
Deleted: d25efa59157f57e7547d1169a7a92d4d7f3e4816832e059313fd12e20a767832
Deleted: 079d69bb15f729b75f336799ccf18af199491a6353c5be3cdd2fa320a2b273bb
Deleted: b5eb9f7fad4543bc8d34b6a8ea309633836e307cee8bce546f0f1e90c668ab46
Deleted: 83fa0bdfb111c233dc08601f924383e448e3dee175598f464397b2cdd4f3a325
Deleted: 01811a121776367df9d6c2853f5b5ac485133ea190c59250ab1cae3bdaaee8bf
Deleted: c52a6079eb4aad0a1bad431a8fb6992f445337f6d407defee03dfb78bc94a583
Deleted: 8eaa4ff06b53ff7730c4d7a7e21b4426a4b46dee064ca2d5d90d757dc7ea040a
Deleted: f62feddc05dc67da9b725361f97d7ae72a32e355ce1585f9a60d090289120f73
Deleted: 607c5d1cca71dd3b6c04327c3903363079b72ab3e5e4289d74fb00a9ac7ec2aa
Deleted: 3b363fd9d7dab4db9591058a3f43e806f6fa6f7e2744b63b2df4b84eadb0685a
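The cleanup script could collect every tag pointing at an ID before removing, so deleting the last tag deletes the image without resorting to `docker rmi -f`. A sketch (`tags_for_id` is a hypothetical helper; `docker images --format` assumes a newer docker than the one shown above, and older versions would need to parse the plain column output):

```shell
#!/bin/bash
# Sketch: given "repo:tag id" lines on stdin, print the tags that
# point at one image ID.
tags_for_id() {
    awk -v id="$1" '$2 == id { print $1 }'
}

# Usage (sketch):
#   docker images --format '{{.Repository}}:{{.Tag}} {{.ID}}' \
#     | tags_for_id fc7af26294ac \
#     | xargs -r docker rmi
```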
We deployed the new config (ros-infrastructure/buildfarm_deployment_config#4), but the old configs remained, so it triggered the union of all the configs, new and old.
To enable remote login of administrators to debug or administer the machines.
The following build failed: http://54.183.26.131:8080/job/Ibin_uT32__rtmbuild__ubuntu_trusty_i386__binary/3/console
The excerpt from the build console:
14:14:47 Step 23 : RUN echo "ros-indigo-message-generation: 0.2.10-0trusty" && python3 -u /tmp/wrapper_scripts/apt-get.py update-and-install -q -y ros-indigo-message-generation
14:14:47 ---> Running in 6391121c452d
14:14:48 ros-indigo-message-generation: 0.2.10-0trusty
...
14:15:01 ---> fe19c151741d
14:15:01 Removing intermediate container 6391121c452d
14:15:01 Step 24 : RUN echo "ros-indigo-openrtm-aist: 1.1.0-25trusty" && python3 -u /tmp/wrapper_scripts/apt-get.py update-and-install -q -y ros-indigo-openrtm-aist
14:15:02 ---> Running in 91aefaee0645
14:15:02 time="2015-02-07T22:15:02Z" level="info" msg="open /var/lib/docker/containers/91aefaee06458d1ffb797b37825832564e3bdc0902197254f94ef5190fa5dbba/resolv.conf: no such file or directory"
14:15:02 Build step 'Execute shell' marked build as failure
The corresponding output from the cleanup log file:
2015-02-07 22:15:02,082 removing container 91aefaee0645
2015-02-07 22:15:02,226 successfully removed container: 91aefaee0645
We want this snippet in the config.xml.
It appears if you run the CLI command for full_control, but it does not persist:
<useSecurity>true</useSecurity>
<authorizationStrategy class="hudson.security.FullControlOnceLoggedInAuthorizationStrategy"/>
<securityRealm class="hudson.security.HudsonPrivateSecurityRealm">
<disableSignup>true</disableSignup>
<enableCaptcha>false</enableCaptcha>
</securityRealm>
For some reason the config.xml is not generated until you manually visit the configuration page. And without the config.xml the CLI setting of this value does not persist.
The list of currently known issues is:
This ticket should stay open until any remaining issues related to the docker cleanup script have been resolved. One use case which should work before closing this ticket is to perform a series of full rebuilds on the farm:
Example failed jobs:
http://54.183.26.131:8080/job/Jrel_release-status-page/1685/console
http://54.183.26.131:8080/job/Jrel_arm_release-status-page/1685/console
http://54.183.26.131:8080/job/Irel_release-status-page/1685/console
http://54.183.26.131:8080/job/Irel_arm_release-status-page/1688/console
6. Run Dockerfile - status page
# BEGIN SECTION: Run Dockerfile - status page
19:17:47 + rm -fr /home/jenkins-slave/workspace/Jrel_release-status-page/debian_repo_cache
19:17:47 + rm -fr /home/jenkins-slave/workspace/Jrel_release-status-page/status_page
19:17:47 + mkdir -p /home/jenkins-slave/workspace/Jrel_release-status-page/debian_repo_cache
19:17:47 + mkdir -p /home/jenkins-slave/workspace/Jrel_release-status-page/status_page
19:17:47 + docker run --cidfile=/home/jenkins-slave/workspace/Jrel_release-status-page/docker_generate_status_page/docker.cid --net=host -v /home/jenkins-slave/workspace/Jrel_release-status-page/ros_buildfarm:/tmp/ros_buildfarm:ro -v /home/jenkins-slave/workspace/Jrel_release-status-page/debian_repo_cache:/tmp/debian_repo_cache -v /home/jenkins-slave/workspace/Jrel_release-status-page/status_page:/tmp/status_page status_page_generation
19:17:48 The build file contains the following targets:
19:17:48 - trusty source
19:17:48 - trusty amd64
19:17:48 - trusty i386
19:17:48 - utopic source
19:17:48 - utopic amd64
19:17:48 - utopic i386
19:17:48 - vivid source
19:17:48 - vivid amd64
19:17:48 - vivid i386
19:18:20 Traceback (most recent call last):
19:18:20 File "/usr/lib/python3.4/urllib/request.py", line 1232, in do_open
19:18:20 h.request(req.get_method(), req.selector, req.data, headers)
19:18:20 File "/usr/lib/python3.4/http/client.py", line 1065, in request
19:18:20 self._send_request(method, url, body, headers)
19:18:20 File "/usr/lib/python3.4/http/client.py", line 1103, in _send_request
19:18:20 self.endheaders(body)
19:18:20 File "/usr/lib/python3.4/http/client.py", line 1061, in endheaders
19:18:20 self._send_output(message_body)
19:18:20 File "/usr/lib/python3.4/http/client.py", line 906, in _send_output
19:18:20 self.send(msg)
19:18:20 File "/usr/lib/python3.4/http/client.py", line 841, in send
19:18:20 self.connect()
19:18:20 File "/usr/lib/python3.4/http/client.py", line 819, in connect
19:18:20 self.timeout, self.source_address)
19:18:20 File "/usr/lib/python3.4/socket.py", line 509, in create_connection
19:18:20 raise err
19:18:20 File "/usr/lib/python3.4/socket.py", line 500, in create_connection
19:18:20 sock.connect(sa)
19:18:20 socket.timeout: timed out
19:18:20
19:18:20 During handling of the above exception, another exception occurred:
19:18:20
19:18:20 Traceback (most recent call last):
19:18:20 File "/usr/lib/python3/dist-packages/rosdistro/loader.py", line 48, in load_url
19:18:20 fh = urlopen(url, timeout=timeout)
19:18:20 File "/usr/lib/python3.4/urllib/request.py", line 153, in urlopen
19:18:20 return opener.open(url, data, timeout)
19:18:20 File "/usr/lib/python3.4/urllib/request.py", line 455, in open
19:18:20 response = self._open(req, data)
19:18:20 File "/usr/lib/python3.4/urllib/request.py", line 473, in _open
19:18:20 '_open', req)
19:18:20 File "/usr/lib/python3.4/urllib/request.py", line 433, in _call_chain
19:18:20 result = func(*args)
19:18:20 File "/usr/lib/python3.4/urllib/request.py", line 1258, in http_open
19:18:20 return self.do_open(http.client.HTTPConnection, req)
19:18:20 File "/usr/lib/python3.4/urllib/request.py", line 1235, in do_open
19:18:20 raise URLError(err)
19:18:20 urllib.error.URLError: <urlopen error timed out>
19:18:20
19:18:20 During handling of the above exception, another exception occurred:
19:18:20
19:18:20 Traceback (most recent call last):
19:18:20 File "/usr/lib/python3.4/urllib/request.py", line 1232, in do_open
19:18:20 h.request(req.get_method(), req.selector, req.data, headers)
19:18:20 File "/usr/lib/python3.4/http/client.py", line 1065, in request
19:18:20 self._send_request(method, url, body, headers)
19:18:20 File "/usr/lib/python3.4/http/client.py", line 1103, in _send_request
19:18:20 self.endheaders(body)
19:18:20 File "/usr/lib/python3.4/http/client.py", line 1061, in endheaders
19:18:20 self._send_output(message_body)
19:18:20 File "/usr/lib/python3.4/http/client.py", line 906, in _send_output
19:18:20 self.send(msg)
19:18:20 File "/usr/lib/python3.4/http/client.py", line 841, in send
19:18:20 self.connect()
19:18:20 File "/usr/lib/python3.4/http/client.py", line 819, in connect
19:18:20 self.timeout, self.source_address)
19:18:20 File "/usr/lib/python3.4/socket.py", line 509, in create_connection
19:18:20 raise err
19:18:20 File "/usr/lib/python3.4/socket.py", line 500, in create_connection
19:18:20 sock.connect(sa)
19:18:20 socket.timeout: timed out
19:18:20
19:18:20 During handling of the above exception, another exception occurred:
19:18:20
19:18:20 Traceback (most recent call last):
19:18:20 File "/usr/lib/python3/dist-packages/rosdistro/loader.py", line 48, in load_url
19:18:20 fh = urlopen(url, timeout=timeout)
19:18:20 File "/usr/lib/python3.4/urllib/request.py", line 153, in urlopen
19:18:20 return opener.open(url, data, timeout)
19:18:20 File "/usr/lib/python3.4/urllib/request.py", line 455, in open
19:18:20 response = self._open(req, data)
19:18:20 File "/usr/lib/python3.4/urllib/request.py", line 473, in _open
19:18:20 '_open', req)
19:18:20 File "/usr/lib/python3.4/urllib/request.py", line 433, in _call_chain
19:18:20 result = func(*args)
19:18:20 File "/usr/lib/python3.4/urllib/request.py", line 1258, in http_open
19:18:20 return self.do_open(http.client.HTTPConnection, req)
19:18:20 File "/usr/lib/python3.4/urllib/request.py", line 1235, in do_open
19:18:20 raise URLError(err)
19:18:20 urllib.error.URLError: <urlopen error timed out>
19:18:20
19:18:20 During handling of the above exception, another exception occurred:
19:18:20
19:18:20 Traceback (most recent call last):
19:18:20 File "/usr/lib/python3.4/urllib/request.py", line 1232, in do_open
19:18:20 h.request(req.get_method(), req.selector, req.data, headers)
19:18:20 File "/usr/lib/python3.4/http/client.py", line 1065, in request
19:18:20 self._send_request(method, url, body, headers)
19:18:20 File "/usr/lib/python3.4/http/client.py", line 1103, in _send_request
19:18:20 self.endheaders(body)
19:18:20 File "/usr/lib/python3.4/http/client.py", line 1061, in endheaders
19:18:20 self._send_output(message_body)
19:18:20 File "/usr/lib/python3.4/http/client.py", line 906, in _send_output
19:18:20 self.send(msg)
19:18:20 File "/usr/lib/python3.4/http/client.py", line 841, in send
19:18:20 self.connect()
19:18:20 File "/usr/lib/python3.4/http/client.py", line 819, in connect
19:18:20 self.timeout, self.source_address)
19:18:20 File "/usr/lib/python3.4/socket.py", line 509, in create_connection
19:18:20 raise err
19:18:20 File "/usr/lib/python3.4/socket.py", line 500, in create_connection
19:18:20 sock.connect(sa)
19:18:20 socket.timeout: timed out
19:18:20
19:18:20 During handling of the above exception, another exception occurred:
19:18:20
19:18:20 Traceback (most recent call last):
19:18:20 File "/usr/lib/python3/dist-packages/rosdistro/loader.py", line 48, in load_url
19:18:20 fh = urlopen(url, timeout=timeout)
19:18:20 File "/usr/lib/python3.4/urllib/request.py", line 153, in urlopen
19:18:20 return opener.open(url, data, timeout)
19:18:20 File "/usr/lib/python3.4/urllib/request.py", line 455, in open
19:18:20 response = self._open(req, data)
19:18:20 File "/usr/lib/python3.4/urllib/request.py", line 473, in _open
19:18:20 '_open', req)
19:18:20 File "/usr/lib/python3.4/urllib/request.py", line 433, in _call_chain
19:18:20 result = func(*args)
19:18:20 File "/usr/lib/python3.4/urllib/request.py", line 1258, in http_open
19:18:20 return self.do_open(http.client.HTTPConnection, req)
19:18:20 File "/usr/lib/python3.4/urllib/request.py", line 1235, in do_open
19:18:20 raise URLError(err)
19:18:20 urllib.error.URLError: <urlopen error timed out>
19:18:20
19:18:20 During handling of the above exception, another exception occurred:
19:18:20
19:18:20 Traceback (most recent call last):
19:18:20 File "/tmp/ros_buildfarm/scripts/status/build_release_status_page.py", line 34, in <module>
19:18:20 main()
19:18:20 File "/tmp/ros_buildfarm/scripts/status/build_release_status_page.py", line 30, in main
19:18:20 args.cache_dir, args.output_dir, copy_resources=args.copy_resources)
19:18:20 File "/tmp/ros_buildfarm/ros_buildfarm/status_page.py", line 50, in build_release_status_page
19:18:20 dist = get_cached_distribution(index, rosdistro_name)
19:18:20 File "/usr/lib/python3/dist-packages/rosdistro/__init__.py", line 146, in get_cached_distribution
19:18:20 cache = get_distribution_cache(index, dist_name)
19:18:20 File "/usr/lib/python3/dist-packages/rosdistro/__init__.py", line 172, in get_distribution_cache
19:18:20 yaml_gz_str = load_url(url, skip_decode=True)
19:18:20 File "/usr/lib/python3/dist-packages/rosdistro/loader.py", line 58, in load_url
19:18:20 return load_url(url, retry=retry - 1, retry_period=retry_period, timeout=timeout)
19:18:20 File "/usr/lib/python3/dist-packages/rosdistro/loader.py", line 58, in load_url
19:18:20 return load_url(url, retry=retry - 1, retry_period=retry_period, timeout=timeout)
19:18:20 File "/usr/lib/python3/dist-packages/rosdistro/loader.py", line 59, in load_url
19:18:20 raise URLError(str(e) + ' (%s)' % url)
19:18:20 urllib.error.URLError: <urlopen error <urlopen error timed out> (http://ros.org/rosdistro/jade-cache.yaml.gz)>
19:18:20 Build step 'Execute shell' marked build as failure
A slightly different error:
6. Run Dockerfile - status page
# BEGIN SECTION: Run Dockerfile - status page
19:15:12 + rm -fr /home/jenkins-slave/workspace/Irel_arm_release-status-page/debian_repo_cache
19:15:12 + rm -fr /home/jenkins-slave/workspace/Irel_arm_release-status-page/status_page
19:15:12 + mkdir -p /home/jenkins-slave/workspace/Irel_arm_release-status-page/debian_repo_cache
19:15:12 + mkdir -p /home/jenkins-slave/workspace/Irel_arm_release-status-page/status_page
19:15:12 + docker run --cidfile=/home/jenkins-slave/workspace/Irel_arm_release-status-page/docker_generate_status_page/docker.cid --net=host -v /home/jenkins-slave/workspace/Irel_arm_release-status-page/ros_buildfarm:/tmp/ros_buildfarm:ro -v /home/jenkins-slave/workspace/Irel_arm_release-status-page/debian_repo_cache:/tmp/debian_repo_cache -v /home/jenkins-slave/workspace/Irel_arm_release-status-page/status_page:/tmp/status_page status_page_generation
19:15:14 The build file contains the following targets:
19:15:14 - trusty source
19:15:14 - trusty armhf
19:15:42 Traceback (most recent call last):
19:15:42 File "/usr/lib/python3.4/urllib/request.py", line 1232, in do_open
19:15:42 h.request(req.get_method(), req.selector, req.data, headers)
19:15:42 File "/usr/lib/python3.4/http/client.py", line 1065, in request
19:15:42 self._send_request(method, url, body, headers)
19:15:42 File "/usr/lib/python3.4/http/client.py", line 1103, in _send_request
19:15:42 self.endheaders(body)
19:15:42 File "/usr/lib/python3.4/http/client.py", line 1061, in endheaders
19:15:42 self._send_output(message_body)
19:15:42 File "/usr/lib/python3.4/http/client.py", line 906, in _send_output
19:15:42 self.send(msg)
19:15:42 File "/usr/lib/python3.4/http/client.py", line 841, in send
19:15:42 self.connect()
19:15:42 File "/usr/lib/python3.4/http/client.py", line 819, in connect
19:15:42 self.timeout, self.source_address)
19:15:42 File "/usr/lib/python3.4/socket.py", line 509, in create_connection
19:15:42 raise err
19:15:42 File "/usr/lib/python3.4/socket.py", line 500, in create_connection
19:15:42 sock.connect(sa)
19:15:42 socket.timeout: timed out
19:15:42
19:15:42 During handling of the above exception, another exception occurred:
19:15:42
19:15:42 Traceback (most recent call last):
19:15:42 File "/usr/lib/python3/dist-packages/rosdistro/loader.py", line 48, in load_url
19:15:42 fh = urlopen(url, timeout=timeout)
19:15:42 File "/usr/lib/python3.4/urllib/request.py", line 153, in urlopen
19:15:42 return opener.open(url, data, timeout)
19:15:42 File "/usr/lib/python3.4/urllib/request.py", line 461, in open
19:15:42 response = meth(req, response)
19:15:42 File "/usr/lib/python3.4/urllib/request.py", line 571, in http_response
19:15:42 'http', request, response, code, msg, hdrs)
19:15:42 File "/usr/lib/python3.4/urllib/request.py", line 493, in error
19:15:42 result = self._call_chain(*args)
19:15:42 File "/usr/lib/python3.4/urllib/request.py", line 433, in _call_chain
19:15:42 result = func(*args)
19:15:42 File "/usr/lib/python3.4/urllib/request.py", line 676, in http_error_302
19:15:42 return self.parent.open(new, timeout=req.timeout)
19:15:42 File "/usr/lib/python3.4/urllib/request.py", line 455, in open
19:15:42 response = self._open(req, data)
19:15:42 File "/usr/lib/python3.4/urllib/request.py", line 473, in _open
19:15:42 '_open', req)
19:15:42 File "/usr/lib/python3.4/urllib/request.py", line 433, in _call_chain
19:15:42 result = func(*args)
19:15:42 File "/usr/lib/python3.4/urllib/request.py", line 1258, in http_open
19:15:42 return self.do_open(http.client.HTTPConnection, req)
19:15:42 File "/usr/lib/python3.4/urllib/request.py", line 1235, in do_open
19:15:42 raise URLError(err)
19:15:42 urllib.error.URLError: <urlopen error timed out>
19:15:42
19:15:42 During handling of the above exception, another exception occurred:
19:15:42
19:15:42 Traceback (most recent call last):
19:15:42 File "/usr/lib/python3.4/urllib/request.py", line 1232, in do_open
19:15:42 h.request(req.get_method(), req.selector, req.data, headers)
19:15:42 File "/usr/lib/python3.4/http/client.py", line 1065, in request
19:15:42 self._send_request(method, url, body, headers)
19:15:42 File "/usr/lib/python3.4/http/client.py", line 1103, in _send_request
19:15:42 self.endheaders(body)
19:15:42 File "/usr/lib/python3.4/http/client.py", line 1061, in endheaders
19:15:42 self._send_output(message_body)
19:15:42 File "/usr/lib/python3.4/http/client.py", line 906, in _send_output
19:15:42 self.send(msg)
19:15:42 File "/usr/lib/python3.4/http/client.py", line 841, in send
19:15:42 self.connect()
19:15:42 File "/usr/lib/python3.4/http/client.py", line 819, in connect
19:15:42 self.timeout, self.source_address)
19:15:42 File "/usr/lib/python3.4/socket.py", line 509, in create_connection
19:15:42 raise err
19:15:42 File "/usr/lib/python3.4/socket.py", line 500, in create_connection
19:15:42 sock.connect(sa)
19:15:42 socket.timeout: timed out
19:15:42
19:15:42 During handling of the above exception, another exception occurred:
19:15:42
19:15:42 Traceback (most recent call last):
19:15:42 File "/usr/lib/python3/dist-packages/rosdistro/loader.py", line 48, in load_url
19:15:42 fh = urlopen(url, timeout=timeout)
19:15:42 File "/usr/lib/python3.4/urllib/request.py", line 153, in urlopen
19:15:42 return opener.open(url, data, timeout)
19:15:42 File "/usr/lib/python3.4/urllib/request.py", line 455, in open
19:15:42 response = self._open(req, data)
19:15:42 File "/usr/lib/python3.4/urllib/request.py", line 473, in _open
19:15:42 '_open', req)
19:15:42 File "/usr/lib/python3.4/urllib/request.py", line 433, in _call_chain
19:15:42 result = func(*args)
19:15:42 File "/usr/lib/python3.4/urllib/request.py", line 1258, in http_open
19:15:42 return self.do_open(http.client.HTTPConnection, req)
19:15:42 File "/usr/lib/python3.4/urllib/request.py", line 1235, in do_open
19:15:42 raise URLError(err)
19:15:42 urllib.error.URLError: <urlopen error timed out>
19:15:42
19:15:42 During handling of the above exception, another exception occurred:
19:15:42
19:15:42 Traceback (most recent call last):
19:15:42 File "/tmp/ros_buildfarm/scripts/status/build_release_status_page.py", line 34, in <module>
19:15:42 main()
19:15:42 File "/tmp/ros_buildfarm/scripts/status/build_release_status_page.py", line 30, in main
19:15:42 args.cache_dir, args.output_dir, copy_resources=args.copy_resources)
19:15:42 File "/tmp/ros_buildfarm/ros_buildfarm/status_page.py", line 50, in build_release_status_page
19:15:42 dist = get_cached_distribution(index, rosdistro_name)
19:15:42 File "/usr/lib/python3/dist-packages/rosdistro/__init__.py", line 146, in get_cached_distribution
19:15:42 cache = get_distribution_cache(index, dist_name)
19:15:42 File "/usr/lib/python3/dist-packages/rosdistro/__init__.py", line 172, in get_distribution_cache
19:15:42 yaml_gz_str = load_url(url, skip_decode=True)
19:15:42 File "/usr/lib/python3/dist-packages/rosdistro/loader.py", line 58, in load_url
19:15:42 return load_url(url, retry=retry - 1, retry_period=retry_period, timeout=timeout)
19:15:42 File "/usr/lib/python3/dist-packages/rosdistro/loader.py", line 58, in load_url
19:15:42 return load_url(url, retry=retry - 1, retry_period=retry_period, timeout=timeout)
19:15:42 File "/usr/lib/python3/dist-packages/rosdistro/loader.py", line 65, in load_url
19:15:42 return contents.decode('utf-8')
19:15:42 UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
This should be possible with an exec with an unless condition.
It will remove the need for 3aa0a3d
and https://github.com/ros-infrastructure/buildfarm_deployment/blob/master/repo/Dockerfile#L29-L35
Instead of #20.
It only starts after the first reconfigure on the */15 schedule.
It should start at startup, or the iptables rules should wait until squid is verified to be working.
In the main config I had to enable 99 levels and set the default to 99.
(It was 5 and 3 and jobs were not being prioritized.)
And the http://54.183.26.131:8080/advanced-build-queue/ settings need to be updated too.
There needs to be a default group giving default priority.
And add a policy to take the priority from the job.
I've fixed it on the farm, but the templates need to be updated with these changes.
Please also deploy and test this on the current test farm.
Running reconfigure.bash master | slave | repo results in (from /var/log/puppet.log):
Could not find dependency Exec[install-pip3] for Exec[install-docker-py]
at /root/buildfarm_deployment/slave/modules/pip/manifests/install.pp:33
I tracked this error back to TracyWebTech/puppet-pip#5
It works if I freeze tracywebtech-pip to version 1.3.2 in Puppetfile:
mod 'tracywebtech-pip', '1.3.2'
Another solution would be to provide Exec[install-pip3] with checks for pip3, perhaps using pip::installation.
#5 was an exploration in that direction.
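The missing idempotency amounts to a guard like the following (the function name is illustrative and bootstrap.pypa.io is pip's official installer; the puppet-pip module may bootstrap pip differently):

```shell
# Shell equivalent of giving Exec[install-pip3] an "unless" check:
# do nothing when pip3 is already on the PATH.
install_pip3_if_missing() {
  command -v pip3 >/dev/null 2>&1 && return 0
  curl -fsSL https://bootstrap.pypa.io/get-pip.py | python3
}
```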
The notification sender address is nobody@nowhere by default; it should be "The ROS Build Farm [email protected]".
This should be a parameter in the templates and exposed in the config.
Or at least it should be documented to be updated by the user.
It builds up slower, but still needs to be cleaned up.
It also needs docker-py and python3-dateutils to support that.
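A hedged sketch of such a cleanup pass (the real script uses docker-py; this is only the shell equivalent of the idea, not its actual logic): remove exited containers first so their image layers become unreferenced, then remove the dangling images.

```shell
# Reclaim disk under /var/lib/docker; xargs -r avoids invoking docker
# with an empty argument list when there is nothing to remove.
docker_disk_cleanup() {
  docker ps -aq -f status=exited | xargs -r docker rm
  docker images -q -f dangling=true | xargs -r docker rmi
}
```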
It will make our lives simpler to reproduce and test.
Related blog post: http://blog.trifork.com/2014/03/11/using-supervisor-with-docker-to-manage-processes-supporting-image-inheritance/
The package appeared to be installed on the filesystem, but jenkins listed it as "available" and many jobs were crashing because it was not installed.
The workaround is to install it via the GUI.
We should expose Jenkins on port 80.
The recommended approach is a reverse proxy, so that Jenkins can be integrated with other web services on the same machine: https://wiki.jenkins-ci.org/display/JENKINS/Running+Jenkins+behind+Apache
Note we could change Jenkins to listen on port 80 directly, but then it could not share the port with other services hosted via Apache, as in #42.
I just wanted to try the buildfarm and run the local deployment test.
The master isn't starting. I looked in the puppet log file and noticed that there is no jenkins installation candidate:
2015-08-31 10:06:50 +0000 Puppet (err): Execution of '/usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold install jenkins' returned 100: Reading package lists...
Building dependency tree...
Reading state information...
Package jenkins is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source
E: Package 'jenkins' has no installation candidate
2015-08-31 10:06:50 +0000 /Stage[main]/Jenkins::Package/Package[jenkins]/ensure (err): change from purged to present failed: Execution of '/usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold install jenkins' returned 100: Reading package lists...
Building dependency tree...
Reading state information...
Package jenkins is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source
E: Package 'jenkins' has no installation candidate
2015-08-31 10:06:50 +0000 /Stage[main]/Main/File/etc/default/jenkins: Dependency Package[jenkins] has failures: true
I don't know what to do.
Please update the master configuration as well as the master on the test farm to run a Jenkins slave with the label "slave-on-master".
Follow-up to #43 and #40. It looks like the latest builds are still failing: http://54.183.26.131:8080/job/Jbin_uT32__tf2_ros__ubuntu_trusty_i386__binary/3/consoleFull
I logged into the slave but couldn't find any hashes in the log file matching the errors in the console output, e.g.:
Error removing intermediate container bed26c5d4211: The given container is <nil>
But it looks like there are still multiple cron jobs deleting docker stuff:
# HEADER: This file was autogenerated at 2015-02-03 00:30:17 +0000 by puppet.
# HEADER: While it can still be managed manually, it is definitely not recommended.
# HEADER: Note particularly that the comments starting with 'Puppet Name' should
# HEADER: not be deleted, as doing so could cause duplicate cron jobs.
# Puppet Name: docker_cleanup_images
15 */2 * * * bash -c "python3 -u /home/jenkins-slave/cleanup_docker_images.py"
# Puppet Name: docker_cleanup_containers
5 */2 * * * bash -c "docker ps -aq | xargs -L1 docker rm "
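Note that the container entry above runs docker rm with no arguments whenever docker ps -aq prints nothing, which fails and generates cron mail. GNU xargs' -r (--no-run-if-empty) flag avoids the empty invocation:

```shell
printf '' | xargs echo removing      # GNU xargs still runs the command once
printf '' | xargs -r echo removing   # -r: no input, no invocation
```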
It's supposed to fix disk space leaking.
It's supposed to fix hang on docker pull.
If it's not working, roll back to 1.7.1.
It has been removed from Puppet Forge and its repository is gone.
buildfarmdeploymentconfig_slave_1 log file:
bash: /etc/init.d/jenkins-slave: No such file or directory
puppet.log:
2015-09-02 11:30:38 +0000 Puppet (info): Computing checksum on file /etc/init.d/docker
2015-09-02 11:30:38 +0000 /Stage[main]/Docker::Service/File[/etc/init.d/docker] (info): Filebucketed /etc/init.d/docker to puppet with sum 870b6e52007eca22390f36e3b2547954
2015-09-02 11:30:38 +0000 /Stage[main]/Docker::Service/File[/etc/init.d/docker]/ensure (notice): ensure changed 'file' to 'link'
2015-09-02 11:30:38 +0000 /Stage[main]/Docker::Service/File[/etc/init.d/docker] (info): Scheduling refresh of Service[docker]
2015-09-02 11:30:38 +0000 /Stage[main]/Docker::Service/Service[docker] (err): Could not evaluate: undefined method `[]' for nil:NilClass
2015-09-02 11:30:38 +0000 /Stage[main]/Docker::Service/Service[docker] (notice): Triggered 'refresh' from 2 events
2015-09-02 11:30:38 +0000 /User[jenkins-slave] (notice): Dependency Service[docker] has failures: true
2015-09-02 11:30:38 +0000 /User[jenkins-slave] (warning): Skipping because of failed dependencies
2015-09-02 11:30:38 +0000 /Stage[main]/Main/File[/home/jenkins-slave/cleanup_docker_images.py] (notice): Dependency Service[docker] has failures: true
2015-09-02 11:30:38 +0000 /Stage[main]/Main/File[/home/jenkins-slave/cleanup_docker_images.py] (warning): Skipping because of failed dependencies
2015-09-02 11:30:38 +0000 /Stage[main]/Jenkins::Slave/Exec[get_swarm_client] (notice): Dependency Service[docker] has failures: true
2015-09-02 11:30:38 +0000 /Stage[main]/Jenkins::Slave/Exec[get_swarm_client] (warning): Skipping because of failed dependencies
2015-09-02 11:30:38 +0000 /Stage[main]/Main/File[/home/jenkins-slave/.ccache] (notice): Dependency Service[docker] has failures: true
2015-09-02 11:30:38 +0000 /Stage[main]/Main/File[/home/jenkins-slave/.ccache] (warning): Skipping because of failed dependencies
2015-09-02 11:30:38 +0000 /Stage[main]/Main/Cron[docker_cleanup_images] (notice): Dependency Service[docker] has failures: true
2015-09-02 11:30:38 +0000 /Stage[main]/Main/Cron[docker_cleanup_images] (warning): Skipping because of failed dependencies
2015-09-02 11:30:38 +0000 /Stage[main]/Jenkins::Slave/File[/etc/default/jenkins-slave] (notice): Dependency Service[docker] has failures: true
2015-09-02 11:30:38 +0000 /Stage[main]/Jenkins::Slave/File[/etc/default/jenkins-slave] (warning): Skipping because of failed dependencies
2015-09-02 11:30:38 +0000 /Stage[main]/Jenkins::Slave/File[/etc/init.d/jenkins-slave] (notice): Dependency Service[docker] has failures: true
2015-09-02 11:30:38 +0000 /Stage[main]/Jenkins::Slave/File[/etc/init.d/jenkins-slave] (warning): Skipping because of failed dependencies
2015-09-02 11:30:38 +0000 /Stage[main]/Main/Cron[docker_cleanup_containers] (notice): Dependency Service[docker] has failures: true
2015-09-02 11:30:38 +0000 /Stage[main]/Main/Cron[docker_cleanup_containers] (warning): Skipping because of failed dependencies
2015-09-02 11:30:38 +0000 Puppet (info): Computing checksum on file /etc/dbus-1/system.d/Upstart.conf
2015-09-02 11:30:38 +0000 /Stage[main]/Upstart::Config/File[/etc/dbus-1/system.d/Upstart.conf] (info): Filebucketed /etc/dbus-1/system.d/Upstart.conf to puppet with sum 64be74cddb0c74b7d98202b40389784c
2015-09-02 11:30:38 +0000 /Stage[main]/Upstart::Config/File[/etc/dbus-1/system.d/Upstart.conf]/content (notice): content changed '{md5}64be74cddb0c74b7d98202b40389784c' to '{md5}0e7eadb0a62687e1ebb1b35021ca97cf'
2015-09-02 11:30:38 +0000 /Package[daemon] (notice): Dependency Service[docker] has failures: true
2015-09-02 11:30:38 +0000 /Package[daemon] (warning): Skipping because of failed dependencies
2015-09-02 11:30:38 +0000 /Stage[main]/Jenkins::Slave/Service[jenkins-slave] (notice): Dependency Service[docker] has failures: true
2015-09-02 11:30:38 +0000 /Stage[main]/Jenkins::Slave/Service[jenkins-slave] (warning): Skipping because of failed dependencies
2015-09-02 11:30:38 +0000 Puppet (info): Creating state file /var/lib/puppet/state/state.yaml
2015-09-02 11:30:38 +0000 Puppet (notice): Finished catalog run in 103.56 seconds
This error (/Stage[main]/Docker::Service/Service[docker] (err): Could not evaluate: undefined method `[]' for nil:NilClass) appears in all three puppet.log files, but master and repo do start on fig up.
I'm running the example config for deployment testing.
I'm not sure whether I'm looking in the right place or how to handle this problem.
It will need some global config for setting the section delimiters.
string name: collapsing-console-sections
home page: https://wiki.jenkins-ci.org/display/JENKINS/Collapsing+Console+Sections+Plugin
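The delimiters are regular expressions matched against console output; a job script would then emit matching markers. The marker format below is a hypothetical convention, not something the plugin prescribes:

```shell
# A build step wrapped in hypothetical section markers; the plugin's
# section-start/section-end regex fields would then be configured as
# "^=== BEGIN .* ===$" and "^=== END ===$".
echo "=== BEGIN compile ==="
true   # stand-in for the real build command
echo "=== END ==="
```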
Warning observed: "Replacing Puppet Forge API URL to use v3 https://forgeapi.puppetlabs.com. You should update your Puppetfile"
Instead of using the XML templates.
Under the hood this uses the Jenkins CLI tools, which handle the password hashing and let us change the password easily instead of manually computing or harvesting the hash.
Alert: https://wiki.jenkins-ci.org/display/SECURITY/Jenkins+Security+Advisory+2015-11-11
The primary one I've detected is Push over SSH needs the jenkins ssh key added with read access: https://wiki.jenkins-ci.org/display/JENKINS/Slave+To+Master+Access+Control
I edited http://54.183.26.131:8080/administrativeMonitor/slaveToMasterAccessControl/rule/ to add: allow read /var/lib/jenkins/.ssh/id_rsa
This will allow us to easily roll out updates like #10 to running systems.
When running ./reconfigure.bash repo (https://github.com/ros-infrastructure/buildfarm_deployment_config/blob/master/reconfigure.bash), the following error occurs:
Error executing puppet module install. Check that this command succeeds:
puppet module install --version 2.0.1 --target-dir /root/buildfarm_deployment/master/.tmp/librarian/cache/source/puppet/forge/forgeapi_puppetlabs_com/puppetlabs-concat/2.0.1 --module_repository https://forgeapi.puppetlabs.com --modulepath /root/buildfarm_deployment/master/.tmp/librarian/cache/source/puppet/forge/forgeapi_puppetlabs_com/puppetlabs-concat/2.0.1 --module_working_dir
/root/buildfarm_deployment/master/.tmp/librarian/cache/source/puppet/forge/forgeapi_puppetlabs_com/puppetlabs-concat/2.0.1 --ignore-dependencies puppetlabs-concat
Error:
Error: Could not install 'puppetlabs-concat' (v2.0.1)
No releases matching '2.0.1' are available from https://forgeapi.puppetlabs.com
Current fix:
Freeze puppetlabs-concat to version 1.2.3 in Puppetfile (https://github.com/ipa-mdl/buildfarm_deployment/blob/master/master/Puppetfile)
A common use case will be a very small system for which one machine is likely enough. We should update the configs to support everything running on the same machine. DIND is not reliable enough to be used for more than basic testing.
This will require making sure all the puppet config elements don't collide. And that all the configuration elements also don't collide.
This will require refactoring a lot of the things into proper puppet classes so they can share resources and not collide on definitions.
On larger systems you need to set the Java VM memory limit higher to keep the system running.
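For the Debian/Ubuntu package that means raising the heap cap in JAVA_ARGS in /etc/default/jenkins; the value below is illustrative and should be tuned to the master's RAM:

```shell
# /etc/default/jenkins (excerpt) -- cap the Jenkins master heap at 4 GiB
JAVA_ARGS="-Xmx4g -Djava.awt.headless=true"
```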
This will save time and bandwidth.