bloomberg / chef-bcpc

Bloomberg Clustered Private Cloud distribution

License: Apache License 2.0

Shell 2.28% Ruby 16.24% HTML 5.95% Python 73.60% Makefile 0.44% Jinja 1.49% Vim Script 0.01%

chef openstack vagrant ceph ansible

chef-bcpc's Issues

Upgrade-able chef deployments

The focus of Chef is on writing idempotent scripts which can be re-run with no effect. This is helpful unless you have a cluster of services upon which you and the open source community are iterating like madmen.

We are faced with the need to upgrade a working, production cluster in-place, and I think that this need will become even more common in the future as we provide bcpc with production SLAs in more places.

My recommendation is two-fold:
I
Our recipes should completely build/rebuild configs, so we don't have to sprinkle not_ifs and only_ifs around the recipes. We should also break up configurations that manage multiple services (e.g. HAP and DNS) into a master template and a partial template. Following DRY principles, we would take haproxy.conf.erb and break it up into a master template which then iterates through attributes that look like this:

default[:bcpc][:hap_services] = [
  { :name => "ldap-389ds",
    # lambdas defer these lookups until the template is rendered
    :src_ip => -> () { node[:bcpc][:management][:vip] },
    :src_port => 389,
    :dst_ip => -> () { node[:bcpc][:management][:vip] },
    :dst_port => 389,
    :listen_options => ['timeout  client 1h',
                        'timeout  server 1h',
                        'mode tcp',
                        'balance leastconn',
                        'option  tcplog',
                        'option tcpka'],
    :server_options => "check inter 1s rise 1 fall 1 observe layer4",
    :servers => -> () { get_servers }
  }
  ...
]

The master template then iterates over the array and passes each entry to a partial template that looks like this:

<%="listen #{name} #{src_ip}:#{src_port}"%>
  <% listen_options.each do |opt| -%>
     <%= opt %>
  <% end -%>
<% servers.each do |server| -%>
  <%= "server #{server.hostname} #{dst_ip}:#{dst_port} #{server_options}" %>
<% end -%>

This has three benefits: it is DRY; it is declarative, so we have better visibility into which ports belong to which service; and, most importantly, we can inject new services either by adding to the main attributes file or through an upgrade attributes file that pushes another service onto the array.

II
The second, and probably more contentious piece is that we need to move to a better git feature branch management model. We no longer have a single code base running in production, so we should treat our source management accordingly.
I heartily recommend the git-flow (http://nvie.com/posts/a-successful-git-branching-model/) workflow which will allow us to make branches for new, experimental features as well as make branches for installations into our production system. This way we could hot-patch a specific branch when we need to add new features, but we could also completely recreate a production environment if needed.

beaver service restart hangs on initialization

When enrolling a node via Chef, restarting the beaver service hung until I killed off the beaver processes on that node. Opening an issue to remember to investigate whether this is repeatable or not.

[2013-08-17T21:12:41-04:00] INFO: Processing service[beaver] action restart (bcpc::beaver line 80)
[2013-08-17T21:15:52-04:00] INFO: service[beaver] restarted

Accommodate VIP movement when restarting haproxy

I've seen an intermittent bug when standing up a new cluster of headnodes where the keystone service catalog entries (keystone recipe), the glance upload of the cirros image (glance recipe), or the nova secgroup setup (nova-setup recipe) sometimes get re-run even though the entries are already there. I've only seen it occur when running chef-client again on an existing headnode after new headnodes are added (i.e., when the chef-client run is just regenerating configs and restarting processes that need references to all existing headnodes, like haproxy and mysql).

I think this is due to the fact that when haproxy restarts, the VIP may move (if the keepalived healthcheck occurs in the window between stop and start of haproxy). Since this bug can cause the chef recipes to duplicate entries (which isn't harmful per se, just sloppy), we should probably give the cluster a small amount of time to settle the VIP before hammering away at the openstack APIs for setup (since it's likely that the guard commands like not_if statements are failing and then the subsequent commands under the guard succeed).

I'm still not 100% sure this is what's happening, but I have a patch (hack) that I can't repro this bug under. I'll commit it but keep this open in case that's not the culprit.
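
For illustration only, the "give the cluster time to settle" idea could be sketched as a small retry helper that polls a cheap probe before the setup commands run. The helper name wait_for, the attempt count, and the delay are hypothetical, and this is not the patch mentioned above.

```shell
# Hypothetical helper: retry a probe command until it succeeds or attempts run out.
# Usage: wait_for ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_for() {
    attempts=$1
    delay=$2
    shift 2
    while [ "$attempts" -gt 0 ]; do
        "$@" && return 0              # probe succeeded, stop waiting
        attempts=$((attempts - 1))
        sleep "$delay"
    done
    return 1                          # probe never succeeded
}
```

A recipe could call something like this with a probe against the VIP before evaluating its guards, so a guard is less likely to fail merely because the VIP is in transit.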

Upgrade hypervisors to trusty (14.04)

We should think about when we want to upgrade from precise/12.04 to trusty/14.04.

One notable change that we know should be fixed in upstream packages captured in 14.04 is #54 where keepalived can drop the VIP under load. I expect there's more.

Thoughts?

Ruby locale issues w/ chef-client

I don't profess to be a ruby or chef expert (or novice, for that matter).

When sshing into the nodes to run chef-client (to test updated recipes), I hit the following error:

[2013-06-05T18:06:16-05:00] FATAL: ArgumentError: package[python-ujson_1.30-1_amd64.deb] (bcpc::beaver line 34) had an error: ArgumentError: invalid byte sequence in US-ASCII

I was able to fix it by explicitly setting the locale prior to running chef-client:

export LC_ALL=en_US.UTF-8

This isn't a very specific bug, but I figure I'd file something in case anyone else hits this problem or knows how I've misconfigured the system.
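
A minimal sketch of the same workaround scoped to a single command, so the login shell's locale stays untouched. The wrapper name run_with_utf8_locale is hypothetical, and it assumes the en_US.UTF-8 locale is generated on the node.

```shell
# Hypothetical wrapper: run one command with a UTF-8 locale forced in its environment.
# Usage: run_with_utf8_locale COMMAND [ARGS...]
run_with_utf8_locale() {
    env LC_ALL=en_US.UTF-8 LANG=en_US.UTF-8 "$@"
}
```

For example: run_with_utf8_locale sudo chef-client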

Install sshpass on bootstrap node by default

nodessh.sh won't work without sshpass installed.

vagrant@bcpc-bootstrap:~/chef-bcpc$ ./nodessh.sh Test-Laptop 10.0.100.11 -
Error: sshpass required for this tool. You should be able to 'sudo apt-get install sshpass' to get it

We should install it by default.

DHCP lease for fixed IPs too long

DNSMasq hands out fixed IPs with lease times of a week (nova.conf.erb: dhcp_lease_time=604800). For low-churn situations where the number of VMs doesn't approach the size of the DHCP pool, you wouldn't notice, but when machines turn over often, the DHCP pool runs out of unleased IPs. This causes a denial of service where no new VM in the same DHCP pool gets an IP at startup until leases start expiring. This is most obvious in the startup messages when CloudInit complains that the eth0 interface is not configured and times out waiting for it.

I propose setting the DHCP lease time to something less than an hour, or an hour at most. Every tenant gets their own DNSMASQ instance, and lease renewal should be relatively cheap.
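
A rough back-of-the-envelope sketch of why the lease time matters. It assumes an IP stays leased for the full lease time after its VM is gone, so the number of addresses tied up at steady state is roughly the VM creation rate multiplied by the lease duration. The rate of 5 VMs per hour is an invented figure for illustration.

```shell
# Hypothetical estimate: addresses held at steady state.
# Usage: leases_outstanding VMS_PER_HOUR LEASE_SECONDS
leases_outstanding() {
    echo $(( $1 * $2 / 3600 ))
}

leases_outstanding 5 604800   # one-week lease (current dhcp_lease_time)
leases_outstanding 5 3600     # one-hour lease (proposed upper bound)
```

With the one-week lease, even this modest churn ties up more addresses than a /24 pool contains; with a one-hour lease it ties up a handful.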

New version of rabbit breaks 'guest' login

Running a fresh install of bcpc includes RabbitMQ 3.3. The platform installs fine, but doesn't work properly because processes can no longer connect as 'guest' to Rabbit. This appears to be intentional:

http://www.rabbitmq.com/blog/2014/04/02/breaking-things-with-rabbitmq-3-3/

The suggested workaround on that page (adding an empty loopback_users to re-enable guest) doesn't appear to work. After some poking about, I was able to get openstack back up by:

Modifying the rabbitmq recipe to put 'ostack' into the data bag instead of 'guest' PRIOR to cheffing the head node.
Creating an "ostack" user in rabbit and giving it '.*' '.*' '.*' permissions on '/'. It was not created automatically.
Changing the line rabbitmq_userid=guest to rabbitmq_userid=ostack (adding it if absent) in:

/etc/glance/glance-api.conf
/etc/cinder/cinder.conf
/etc/nova/nova.conf

I'm sure that's not exhaustive. I can see in the rabbit logs that some things still can't connect, and I'm not able to log into the rabbitmq management page, but this at least allows me to bring up a VM.

Issues with keepalived dropping VIP (running on VirtualBox).

I'm not sure what the right fix for this is, so I thought I would submit an issue and perhaps generate discussion.

I see occasional issues with keepalived dropping the VIP. It seems like it happens when the system is busy, particularly ceph. I am able to reproduce this easily by attempting an upload to glance:

    glance --insecure image-create --name=ubuntu-12.04 --is-public=True --container-format=bare --disk-format=raw --file ubuntu-12.04-server-cloudimg-amd64-disk1.raw

This will fail about half-way through with:

    Error communicating with http://10.0.100.5:9292 [Errno 32] Broken pipe

The root cause of that failure is that the IP was dropped. In fact, this was so consistent that I was only able to complete the upload by disabling keepalived and adding the IP myself :).

The laptop running the VMs is pretty beefy (16GB RAM, SSD, quad-core), but I'm assuming that all the ceph-osds in action is a bit too much, and get_monstatus starts taking longer than the timeout. I ran time get_monstatus continuously during a fresh image upload and saw wild variation (unedited):

...

real    0m1.076s
user    0m0.048s
sys 0m0.036s

real    0m4.813s
user    0m0.048s
sys 0m0.008s

real    0m10.869s
user    0m0.052s
sys 0m0.012s

real    0m0.922s
user    0m0.052s
sys 0m0.008s

I think it was at this point where the IP was dropped on this upload (makes sense!). This actually isn't inconsistent with what I've seen with ceph before, where querying the monitor can take a bit of time.

Note that I think this is highly unlikely to be an issue when deploying to physical machines.

Rethink ceph-fs usage due to cold-start issues

If you have a cluster with a single head node, when you restart that machine, the ceph-mds won't restart by default...and if you run:

# service ceph-mds start id=`hostname`

It will come up, but the mds gets stuck in "replay" state. This is likely due to a bug in ceph-mds not handling a cold-start scenario properly. =(

Furthermore, since /mnt is now in /etc/fstab as a ceph-fs mount, we also fail on bootup in a cold-start scenario as ceph-fuse hangs. Ubuntu is smart enough to detect it is hanging, but requires you to hit a button to proceed skipping the mount.

ubuntu@bcpc-vm1:~$ ceph -v
ceph version 0.56.6 (95a0bda7f007a33b0dc7adf4b330778fa1e5d70c)
ubuntu@bcpc-vm1:~$ ceph mds stat
e28: 1/1/1 up {0=bcpc-vm1=up:replay}

Given the mds hang, I can't enroll any new machines in my cluster. It is probably worth trying out the new cuttlefish release, but...well...

CC @pchandra @cbaenziger

Creating multiple head nodes fails with ceph-mon aborts

When creating multiple head nodes with Ceph Cuttlefish (confirmed to still exist with 0.61.7), ceph-mon on the additional headnodes will abort on startup.

On the new monitor node, do:

# ceph-mon --cluster=ceph --id=<hostname> --public_addr=<storage_ip> -f

Then once it gets quorum, ctrl-c it, then rerun chef-client.

There is an upstream issue filed with Ceph that should resolve the underlying issue. We will close this issue when we have confirmed that the upstream issue is resolved.

td-agent complains on restart

2014-05-06 17:03:12 -0400 [warn]: out_record_reformer: output_tag is deprecated. Use tag option instead.

Prob want to just change that.

DNS Tenancy to Domain conversion doesn't handle dots in tenancy names

If I have a tenancy like "New Site.Com", it is not properly handled in the mysql function. The subdomain winds up looking like "new-site.com.bcpc.whatever.com" which is wrong. It should look more like "new-site-com.bcpc.whatever.com" so it's not creating new subdomains willy nilly.
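
The intended mapping can be sketched in shell, purely to pin down the rule (the real conversion lives in the mysql function). The function name tenancy_to_label is hypothetical, and the rule shown (lowercase, then replace anything outside a-z, 0-9 and hyphen with a hyphen) is an assumption about the desired behavior.

```shell
# Hypothetical sketch: turn a tenancy name into a single DNS label.
# Usage: tenancy_to_label "New Site.Com"
tenancy_to_label() {
    printf '%s' "$1" \
        | tr '[:upper:]' '[:lower:]' \
        | sed -e 's/[^a-z0-9-]/-/g'    # dots and spaces both become hyphens
}
```

With this rule "New Site.Com" becomes new-site-com, so it stays a single label under the cluster domain.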

Ceph monitor logs fill up root volume quickly

I'm seeing the ceph monitor logs quickly spew out info every 100ms into the log file of the form:

2013-09-22 09:01:01.465453 7fcf55fb3700  1 mon.bcpc-vm1@0(leader).paxos(paxos active c 2260..2937) is_readable now=2013-09-22 09:01:01.465455 lease_expire=0.000000 has v0 lc 2937
2013-09-22 09:01:01.465490 7fcf55fb3700  1 mon.bcpc-vm1@0(leader).paxos(paxos active c 2260..2937) is_readable now=2013-09-22 09:01:01.465492 lease_expire=0.000000 has v0 lc 2937
2013-09-22 09:01:01.612839 7fcf557b2700  1 mon.bcpc-vm1@0(leader).paxos(paxos active c 2260..2937) is_readable now=2013-09-22 09:01:01.612840 lease_expire=0.000000 has v0 lc 2937
2013-09-22 09:01:02.795764 7fcf557b2700  1 mon.bcpc-vm1@0(leader).paxos(paxos active c 2260..2937) is_readable now=2013-09-22 09:01:02.795765 lease_expire=0.000000 has v0 lc 2937

At run-time, the following command reduces the paxos file logging:

$ ceph tell mon.* injectargs '--debug_paxos 0/5'

To make it permanent, the ceph.conf change would be:

[mon]
        debug paxos = 0/5

I'm not sure if we should file an upstream issue or just incorporate this into our scripts.

Netlink errors in keepalived

When bringing up a headnode, keepalived is happy (the VRRP_Script lines below) and then 2-3 minutes later, it logs some Netlink: filter function error messages. This happens reliably for me when I'm testing in VMs, so I think it's an issue. Googling around says that once you see them, you should restart keepalived to make sure it's still working properly. In testing, once I restart keepalived, those messages don't pop back up (I waited 30+ mins and didn't see anything).

Aug 18 11:41:02 bcpc-vm2 Keepalived_vrrp: VRRP_Script(chk_haproxy) succeeded
Aug 18 11:41:02 bcpc-vm2 Keepalived_vrrp: VRRP_Script(chk_ceph) succeeded
Aug 18 11:42:55 bcpc-vm2 Keepalived_vrrp: Netlink: filter function error
Aug 18 11:42:55 bcpc-vm2 Keepalived_healthcheckers: Netlink: filter function error
Aug 18 11:42:56 bcpc-vm2 Keepalived_healthcheckers: Netlink: filter function error
Aug 18 11:42:56 bcpc-vm2 Keepalived_vrrp: Netlink: filter function error
Aug 18 11:43:14 bcpc-vm2 Keepalived_vrrp: Netlink: filter function error
Aug 18 11:43:14 bcpc-vm2 Keepalived_healthcheckers: Netlink: filter function error
Aug 18 11:43:14 bcpc-vm2 Keepalived_healthcheckers: Netlink: filter function error
Aug 18 11:43:14 bcpc-vm2 Keepalived_vrrp: Netlink: filter function error

Google's DNS Server Seems To Creep Into Networks Table

On a new cluster, the nova database, table networks, is getting the column dns1 set to 8.8.4.4. Strangely, this isn't seen anywhere in our setup for the nova-networks.

For example, the environment file had no mention of this DNS server:
ubuntu@foohost:/chef-bcpc$ knife environment show foo_env | grep 8.8.4.4
ubuntu@foohost:/chef-bcpc$

And we don't pass the DNS server in the recipe:
bash-3.2$ grep -i network ./cookbooks/bcpc/recipes/nova-setup.rb
nova-manage network create --label fixed --fixed_range_v4=#{node[:bcpc][:fixed][:cidr]} --num_networks=#{node[:bcpc][:fixed][:num_networks]} --multi_host=T --network_size=#{node[:bcpc][:fixed][:network_size]} --vlan=#{node[:bcpc][:fixed][:vlan_start]}
only_if ". /root/adminrc; nova-manage network list | grep \"No networks found\""

Perhaps we need to use --dns1 and --dns2, as the default in nova/network/manager.py is 8.8.4.4:
cfg.StrOpt('flat_network_dns',
default='8.8.4.4',
help='Dns for simple network'),

Include ethtool in default install for troubleshooting

Ethtool is useful for troubleshooting network issues. We should include it by default.

DNS for clusters may prove troublesome

DNS currently has some questions with:
add VIP as preferred DNS server - 67d805f
And
create DNS entries for hypervisors and enable recursion - 987e878

The issues are added as code review comments to the commits. However, none of them is causing a catastrophic problem at this time.

automated_install.sh script error on Mac

Running the automated_install.sh script generates an error in the "sed -i" commands executed in the script.

For example: sed -i 's/vb.gui = true/vb.gui = false/' Vagrantfile

This is specific to the behavior of sed on Mac and can be fixed by

sed -i.bu 's/vb.gui = true/vb.gui = false/' Vagrantfile (or)

sed -i '' 's/vb.gui = true/vb.gui = false/' Vagrantfile
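
A sketch of how the first form could be wrapped so the scripts run unchanged on Linux and Mac: a backup suffix attached directly to -i is accepted by both GNU and BSD sed, and the backup file can be removed afterwards. The helper name sed_inplace is hypothetical.

```shell
# Hypothetical helper: in-place sed that behaves the same with GNU and BSD sed.
# Usage: sed_inplace SED_EXPRESSION FILE
sed_inplace() {
    # An attached backup suffix (-i.bak) is accepted by both implementations.
    sed -i.bak -e "$1" "$2" && rm -f "$2.bak"
}
```

For example: sed_inplace 's/vb.gui = true/vb.gui = false/' Vagrantfile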

Unify VM and physical machine Cobblering

Today, VMs are booted using enroll_cobbler.sh, whereas bare-metal is booted using cluster-enroll-cobbler.sh. This presents a logical disconnect for new adopters of the project as they mature through the stack, since both scripts do largely the same thing.

DNS SOA records do not have valid nameserver

Our DNS records are currently using localhost for the nameserver field. From RFC 1035 section 3.3.13. "SOA RDATA format", the MNAME field of the SOA record should be: "The <domain-name> of the name server that was the original or primary source of data for this zone."

Thanks go to Erdal Gerda for noticing this!

SMP performance with VirtualBox

FWIW, I've enabled SMP on my local VirtualBox (4.3.10) on Mac OS X 10.9.mumble - the performance/corruption issues that we saw a year ago don't seem to recur. So, I'd like to re-enable SMP in vbox_create.sh - just up each bcpc-vm to have 2 CPUs - specifically, set CLUSTER_VM_CPUs to 2. The responsiveness of the head nodes is significantly better with 2 VCPUs. If others could try it out as well, that'd be great. =)

DNS slows down to be unusable as MySQL load climbs

With our view for PowerDNS, I see that as MySQL load climbs the view can take unacceptably long to run. Further, when there are a lot of DNS lookups for addresses not in DNS, PowerDNS's caching does little to help.

Migrate all the scripts in the repo root to a subdir

Just opening a placeholder for anyone feeling motivated to move the automation scripts to a subdir, since it's getting crowded IMHO. It'll take a little bit of review and testing since some may need a little re-writing to accommodate relative dirs, other assumptions, etc.

Chef apt packages at apt.opscode.com incompatible with latest Ubuntu 12.04 LTS

If you try to install Chef using apt and pointing at an apt mirror, it fails due to incompatible dependencies. For example:

The following packages have unmet dependencies:
chef-server : Depends: chef-server-api (>= 10.18.2) but it is not going to be installed
Depends: chef-solr (>= 10.18.2) but it is not going to be installed
Depends: chef-expander (>= 10.18.2) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.

Using aptitude to try to get to the bottom of it, it appears the chef packages are incompatible with tzdata version 2013g-0ubuntu0.12.04 which was released on Oct 13th.

ceph-mon and ceph-mds are not starting on reboot

ceph-mon and ceph-mds are not starting on head node reboot.

Immediate remediation:

service ceph-mon start id=`hostname`
service ceph-mds start id=`hostname`

Investigating.

Need a script to create router instances for VirtualBox setup

Since vbox_create.sh just creates three basic VMs, we need to automate the creation of the utility router instance that provides DHCP and routing capabilities. It might actually be good to tie this into a scripted cobbler setup so that we can just PXE boot the VMs with the correct Ubuntu images via a preseed file. (Perhaps have yet-another-VM that does Cobbler? Or, get pfSense to do it? Or?)

I've got a start on a VirtualBox script to create a pfSense VM, but I need to confirm and automate this. Quick note is that the VirtualBox bridging doesn't work for me with pfsense-2.0.3 (kernel panic when FreeBSD 8.1 sees en1 bridged), so I need to upgrade to the latest pfsense 2.1 snapshot and enable VirtIO to bridge to en1. Random notes for those following along at home:

http://snapshots.pfsense.org/FreeBSD_RELENG_8_3/amd64/pfSense_HEAD/nanobsd/
http://doc.pfsense.org/index.php/VirtIO_Driver_Support

$VBM modifyvm $vm --nic4 bridged --bridgeadapter4 "en1: Wi-Fi (AirPort)" --nictype4 virtio

VM's don't know their tenancy is part of their DNS name

After 7210d70, the VMs no longer know their proper DNS name.

The metadata server returns the . via http://169.254.169.254/latest/meta-data/public-hostname. Also, stock Ubuntu VM images end up setting the VM's FQDN to . (also missing the tenancy name in the actual FQDN).

insecure_private_key: No such file or directory

Took me a few minutes to figure this out from the Vagrant documentation

Seems like we would want to attempt to fetch that from Github if it is not found in $HOME/.vagrant.d/insecure_private_key

I'll send over PR later, this issue is merely just to remind me.

Document sourcedir for `vagrant up`

Another minor thing that took me a minute to figure out:

When running vagrant up manually, ensure you are in /path/to/chef-bcpc/vbox

VirtualBox Version Check Fails

On systems where $VBM isn't defined before virtualbox_env.sh:check_version() is run, the version check fails with:

$ ./tests/automated_install.sh 
#### Setup configuration files
#### Setup VB's and Bootstrap
./virtualbox_env.sh: line 22: --version: command not found
ERROR: VirtualBox 4.3.8r92456 is less than 4.3.x!
  Only VirtualBox >= 4.3.x is officially supported.
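
One possible shape for a fix, sketched under assumptions: default VBM to VBoxManage on the PATH when the caller has not set it, and compare the major and minor components numerically. The function name version_at_least is hypothetical and this is not the actual check_version() code.

```shell
# Assumption: fall back to VBoxManage on the PATH when VBM is unset or empty.
: "${VBM:=VBoxManage}"

# Hypothetical helper: succeed when a version such as 4.3.8r92456 is at least MAJOR.MINOR.
# Usage: version_at_least VERSION MAJOR MINOR
version_at_least() {
    va_major=${1%%.*}                 # text before the first dot
    va_rest=${1#*.}                   # text after the first dot
    va_minor=${va_rest%%.*}           # text before the second dot
    [ "$va_major" -gt "$2" ] && return 0
    [ "$va_major" -eq "$2" ] && [ "$va_minor" -ge "$3" ]
}
```

The check itself could then read: version_at_least "$($VBM --version)" 4 3 || echo "Only VirtualBox >= 4.3.x is officially supported."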

SMART monitoring is disabled by default

I disabled SMART monitoring by default in the diamond recipe because we found it spammed syslog with voluminous condition check errors from the hpsa driver (HP Storage Array). I'd like to make this a preference from the user, or find a way to avoid the spammage. If the latter then I'll re-enable the SMART monitoring.

On a mac, sed -i cmd file should be sed -i -e cmd file

Issue seen in mac with sed.

For e.g., sed -i 's/vb.gui = true/vb.gui = false/' Vagrantfile will raise an error.

Change this to : sed -i -e 's/vb.gui = true/vb.gui = false/' Vagrantfile which should work fine.

DNS: Stop generating NS records for tenancy subdomains

It appears that we are generating NS records for all of our tenancy subdomains. This appears to be the offending bit in powerdns.rb powerdns-table-records_forward-view:

SELECT domains.id+500 AS id, domains.id AS domain_id, domains.name AS name, 'NS' AS type, '#{node[:bcpc][:management][:vip]}' AS content, 300 AS ttl, NULL AS prio, NULL AS change_date FROM domains WHERE id > (SELECT MAX(id) FROM domains_static) UNION

I'd like to first verify with the DNS guys, but I'm pretty sure we don't need this.

vbox_create fails on OS X Mavericks

Chriss-MacBook-Pro:chef-bcpc cmorgan$ VBoxManage hostonlyif create
0%...
Progress state: NS_ERROR_FAILURE
VBoxManage: error: Failed to create the host-only adapter
VBoxManage: error: VBoxNetAdpCtl: Error while adding new interface: failed to open /dev/vboxnetctl: No such file or directory

VBoxManage: error: Details: code NS_ERROR_FAILURE (0x80004005), component HostNetworkInterface, interface IHostNetworkInterface
VBoxManage: error: Context: "int handleCreate(HandlerArg*, int, int*)" at line 68 of file VBoxManageHostonly.cpp

Vagrant-built VMs don't PXE boot on default VirtualBox 4.2 due to built in DHCP server

Trying to build on a fresh install of VirtualBox 4.2, the bootstrap node installed fine but the VM picked up an IP address of 192.168.56.101. That IP is from the range VirtualBox gives out in its default DHCP server, which isn't carrying the PXE information. Quick fix was to delete the DHCP server from VirtualBox.

Warning message: ERROR: RuntimeError: Please set EDITOR environment variable

In cluster-assign-role.sh run, noticed this warning message:
ERROR: RuntimeError: Please set EDITOR environment variable

Load default Ubuntu images with glance

So, afaict Cirros is a total pos. Pretty much unusable beyond making sure that the basic settings of your OpenStack installation are running.

As such, it would be amazing if we could load the default Ubuntu 12.04 image into glance

I'll try to take a stab at adding this as I already have it loaded locally.

Switch to omnibus Chef installers

I'm too busy to create a branch and PR right now...but here's a first-cut patch to switch to the Chef omnibus installers. This installs Chef 10 on the client and Chef 11 on the server. The chef-client run fails as the latest Chef 10 client Omnibus packages don't create a 'chef' user which chef-client cookbook expects. Oy.

The Chef 11 client packages require typing in a password for the knife configure --initial run...which is lame.

diff --git a/Vagrantfile b/Vagrantfile
index c62f13e..578a576 100644
--- a/Vagrantfile
+++ b/Vagrantfile
@@ -11,14 +11,12 @@ $local_mirror = nil

 if $local_mirror.nil?
   $repos_script = <<EOH
-    echo "deb http://apt.opscode.com precise-0.10 main" > /etc/apt/sources.list.d/opscode.list
 EOH
 else
   $repos_script = <<EOH
     sed -i s/archive.ubuntu.com/#{$local_mirror}/g /etc/apt/sources.list
     sed -i s/security.ubuntu.com/#{$local_mirror}/g /etc/apt/sources.list
     sed -i s/^deb-src/\#deb-src/g /etc/apt/sources.list
-    echo "deb http://#{$local_mirror}/chef precise-0.10 main" > /etc/apt/sources.list.d/opscode.list
 EOH
 end

diff --git a/cookbooks/bcpc/files/default/build_bins.sh b/cookbooks/bcpc/files/default/build_bins.sh
index f113f2b..2ae75e6 100755
--- a/cookbooks/bcpc/files/default/build_bins.sh
+++ b/cookbooks/bcpc/files/default/build_bins.sh
@@ -29,6 +29,19 @@ if [ -z `gem list --local fpm | grep fpm | cut -f1 -d" "` ]; then
   gem install fpm --no-ri --no-rdoc
 fi

+# Fetch chef client and server debs
+CHEF_CLIENT_URL=https://opscode-omnibus-packages.s3.amazonaws.com/ubuntu/12.04/x86_64/chef_10.30.4-1.ubuntu.12.04_amd64.deb
+#CHEF_CLIENT_URL=https://opscode-omnibus-packages.s3.amazonaws.com/ubuntu/12.04/x86_64/chef_11.10.4-1.ubuntu.12.04_amd64.deb
+CHEF_SERVER_URL=https://opscode-omnibus-packages.s3.amazonaws.com/ubuntu/12.04/x86_64/chef-server_11.0.11-1.ubuntu.12.04_amd64.deb
+if [ ! -f chef-client.deb ]; then
+   $CURL -o chef-client.deb ${CHEF_CLIENT_URL}
+fi
+
+if [ ! -f chef-server.deb ]; then
+   $CURL -o chef-server.deb ${CHEF_SERVER_URL}
+fi
+FILES="chef-client.deb chef-server.deb $FILES"
+
 # Build kibana3 installable bundle
 if [ ! -f kibana3.tgz ]; then
     git clone https://github.com/elasticsearch/kibana.git kibana3
diff --git a/setup_chef_cookbooks.sh b/setup_chef_cookbooks.sh
index 7ed81ae..7205d21 100755
--- a/setup_chef_cookbooks.sh
+++ b/setup_chef_cookbooks.sh
@@ -26,7 +26,7 @@ if [[ -f .chef/knife.rb ]]; then
   knife client delete $USER -y || true
   mv .chef/ ".chef_found_$(date +"%m-%d-%Y %H:%M:%S")"
 fi
-echo -e ".chef/knife.rb\nhttp://$BOOTSTRAP_IP:4000\n\n\n\n\n\n.\n" | knife configure --initial
+echo -e ".chef/knife.rb\nhttps://$BOOTSTRAP_IP\n\n\n/etc/chef-server/chef-webui.pem\n\n/etc/chef-server/chef-validator.pem\n.\n" | knife configure --initial

 cp -p .chef/knife.rb .chef/knife-proxy.rb

diff --git a/setup_chef_server.sh b/setup_chef_server.sh
index 033324f..1ac722c 100755
--- a/setup_chef_server.sh
+++ b/setup_chef_server.sh
@@ -16,39 +16,22 @@ if [[ -z "$CURL" ]]; then
    exit
 fi

-if [[ ! -f /etc/apt/sources.list.d/opscode.list ]]; then
-  cp opscode.list /etc/apt/sources.list.d/
-fi
-
-# When rerunning a bootstrap, the 'apt-get update' gets very slow if
-# the bootstrap node happens to be our apt mirror, so only do this if
-# the package we're after is not installed at all
-#
-# See http://askubuntu.com/questions/44122/upgrade-a-single-package-with-apt-get
-#
-if dpkg -s opscode-keyring 2>/dev/null | grep -q Status.*installed; then
-  echo opscode-keyring is installed
-else 
-  apt-get update
-  apt-get --allow-unauthenticated -y install opscode-keyring
-  apt-get update
-fi
-
 if dpkg -s chef 2>/dev/null | grep -q Status.*installed; then
   echo chef is installed
 else
-  DEBCONF_DB_FALLBACK=File{$(pwd)/debconf-chef.conf} DEBIAN_FRONTEND=noninteractive apt-get -y --force-yes install chef
+  dpkg -i cookbooks/bcpc/files/default/bins/chef-client.deb
 fi

 if dpkg -s chef-server 2>/dev/null | grep -q Status.*installed; then
   echo chef-server is installed
 else
-  DEBCONF_DB_FALLBACK=File{$(pwd)/debconf-chef.conf} DEBIAN_FRONTEND=noninteractive apt-get -y --force-yes install chef-server
+  dpkg -i cookbooks/bcpc/files/default/bins/chef-server.deb
+  sudo chef-server-ctl reconfigure
 fi

-
-chmod +r /etc/chef/validation.pem
-chmod +r /etc/chef/webui.pem
+chmod +r /etc/chef-server/admin.pem
+chmod +r /etc/chef-server/chef-validator.pem
+chmod +r /etc/chef-server/chef-webui.pem

 # copy our ssh-key to be authorized for root
 if [[ -f $HOME/.ssh/authorized_keys && ! -f /root/.ssh/authorized_keys ]]; then

Upgrade Ceph to Firefly (v0.80.x)

We are currently on Dumpling (v0.67.x). Firefly isn't final just yet, but we should think about our timing and updating to Firefly (v0.80.x).

https://ceph.com/docs/master/release-notes/#v0-80-firefly

Pip Breaks Behind MITM Proxies

For those using the Hadoop branch, you will now hit the following bug when run behind a MITM proxy without a trusted cert, as Pip has gone SSL-only:

Successfully installed pip2pi pip
Cleaning up...
+ /usr/local/bin/pip install setuptools --no-use-wheel --upgrade
Cannot fetch index base URL https://pypi.python.org/simple/
Could not find any downloads that satisfy the requirement setuptools in /usr/lib/python2.7/dist-packages
Downloading/unpacking setuptools
Cleaning up...
No distributions at all found for setuptools in /usr/lib/python2.7/dist-packages
Storing debug log for failure in /home/vagrant/.pip/pip.log

Discussion I have found so far can further be found at:

automated_install.sh script error on Mac due to difference in "nc" syntax

"nc" command on Mac doesn't seem to recognize the -q option used in the scripts.

Workaround Havana upstream novncproxy issue and new websockify package

nova-novncproxy does not currently start with the latest proposed packages for Havana due to an older version of websockify.

In /var/log/upstart/nova-novncproxy.log:

TypeError: __init__() got an unexpected keyword argument 'no_parent'

See https://bugs.launchpad.net/ubuntu/+source/nova/+bug/1228490 for more info.

There is a ppa which has a new upstream websockify package:

# cat /etc/apt/sources.list.d/gdahlman-havana-precise.list 
deb http://ppa.launchpad.net/gdahlman/havana/ubuntu devel main
deb-src http://ppa.launchpad.net/gdahlman/havana/ubuntu devel main
# apt-get update
# apt-get upgrade websockify
# service nova-novncproxy restart

Doing add-apt-repository ppa:gdahlman/havana doesn't directly work for me as the distro name needs to be devel. YMMV.

vagrant 1.2.7 doesn't permit VMs to use .1 address

Vagrant 1.2.7 no longer allows VMs to be statically assigned the .1 address. We can change it to .3 instead. (This did work with Vagrant 1.2.2.)

See hashicorp/vagrant#1750 for upstream "fix".

There are errors in the configuration of this machine. Please fix
the following errors and try again:

vm:
* Static IPs cannot end in ".1" since that address is always
reserved for the router. Please use another ending.
* Static IPs cannot end in ".1" since that address is always
reserved for the router. Please use another ending.
* Static IPs cannot end in ".1" since that address is always
reserved for the router. Please use another ending.

keystone dies on idle cluster

A large cluster on real hardware was left idle for nearly two weeks. Keystone died on all head nodes. 'sudo service keystone status' reports 'stop/waiting' and in Kibana3 I can see muttering about tokens being revoked.

Need a way to carve out IP's from float net

Today, the Chef environment variable:
['bcpc']['floating']['available_subnet'] = "192.168.43.128/25"

Can be used to set a specific IP range as the float range, however, our hypervisor machines sit in our float range, so we would like to carve out a /24 from the range.

Perhaps we could add a new variable:
['bcpc']['floating']['exclude_subnet']
Or some other IP set operation.

The nova-manage command to delete a range from an already setup range is, for example:
nova-manage floating delete 191.168.0.0/24
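
A sketch of the kind of IP set check an exclude option would need: confirm that the subnet to carve out actually lies inside the float range before deleting it. The function names ip_to_int and cidr_contains are hypothetical, as are the example subnets in the usage note.

```shell
# Hypothetical helper: dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1                         # split the address on dots
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Hypothetical helper: succeed when INNER (a CIDR) lies entirely within OUTER (a CIDR).
# Usage: cidr_contains OUTER INNER
cidr_contains() {
    outer_ip=${1%/*}; outer_len=${1#*/}
    inner_ip=${2%/*}; inner_len=${2#*/}
    mask=$(( (4294967295 << (32 - outer_len)) & 4294967295 ))
    [ "$inner_len" -ge "$outer_len" ] || return 1
    [ $(( $(ip_to_int "$outer_ip") & mask )) -eq $(( $(ip_to_int "$inner_ip") & mask )) ]
}
```

A recipe could guard the delete with it, for example: cidr_contains 192.168.43.0/24 192.168.43.128/25 && nova-manage floating delete 192.168.43.128/25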

Graphite fails to Redirect

If one goes to https://<VIP>/graphite today, one gets a complaint about SSL redirection being broken:

Bad Request

Your browser sent a request that this server could not understand.
Reason: You're speaking plain HTTP to an SSL-enabled server port.
Instead use the HTTPS scheme to access this URL, please.

    Hint: https://bogus_host_without_reverse_dns:8888/

Apache/2.2.22 (Ubuntu) Server at bogus_host_without_reverse_dns Port 8888

This is not the right thing and should be fixed...

Recipe compile error in /var/chef/cache/cookbooks/apt/providers/repository.rb

The bootstrap_chef.sh phase of bringing up the bootstrap node of CHEF-BCPC fails with this :

================================================================================
Recipe Compile Error in /var/chef/cache/cookbooks/apt/providers/repository.rb
================================================================================

NameError
---------
undefined local variable or method `use_inline_resources' for #<Class:0x7f460e1a2d58>

Cookbook Trace:
---------------
  /var/chef/cache/cookbooks/apt/providers/repository.rb:20:in `class_from_file'

Relevant File Content:
----------------------
/var/chef/cache/cookbooks/apt/providers/repository.rb:

 13:  # Unless required by applicable law or agreed to in writing, software
 14:  # distributed under the License is distributed on an "AS IS" BASIS,
 15:  # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 16:  # See the License for the specific language governing permissions and
 17:  # limitations under the License.
 18:  #
 19:  
 20>> use_inline_resources
 21:  
 22:  def whyrun_supported?
 23:    true
 24:  end
 25:  
 26:  # install apt key from keyserver
 27:  def install_key_from_keyserver(key, keyserver)
 28:    execute "install-key #{key}" do
 29:      if !node['apt']['key_proxy'].empty?

[Mon, 10 Jun 2013 11:58:07 -0400] ERROR: Running exception handlers
[Mon, 10 Jun 2013 11:58:07 -0400] FATAL: Saving node information to /var/chef/cache/failed-run-data.json
[Mon, 10 Jun 2013 11:58:07 -0400] ERROR: Exception handlers complete
[Mon, 10 Jun 2013 11:58:07 -0400] FATAL: Stacktrace dumped to /var/chef/cache/chef-stacktrace.out
[Mon, 10 Jun 2013 11:58:07 -0400] FATAL: NameError: undefined local variable or method `use_inline_resources' for #<Class:0x7f460e1a2d58>

Mac bootstrap VM create warning

After machine is booted and ready, I see this warning:

[bootstrap] The guest additions on this VM do not match the installed version of
VirtualBox! In most cases this is fine, but in rare cases it can
prevent things such as shared folders from working properly. If you see
shared folder errors, please make sure the guest additions within the
virtual machine match the version of VirtualBox you have installed on
your host and reload your VM.

Guest Additions Version: 4.1.12
VirtualBox Version: 4.3

Percona is Breaking Build

The current version of Percona (5.5.37-25.10-756.precise) breaks the build as all local requests fail as follows:

Access denied for user 'root'@'localhost' (using password: YES)

This is regardless of root having host % in mysql.user. The same cookbooks work with version 5.5.34-25.9-607.precise.
