May 15 2009

The machine that vanished.

Today I lost a machine, a physical one: I couldn't find it in my rack anymore. One moment I was logged on to it; then I told it to boot off the network for a fresh installation, and it was gone.

When you run several ad hoc build and development environments, you often grab whatever hardware is available, add it to your pool and hope it doesn't bite back. Time always works against you when you have to build a fresh platform from a pool of hardware waiting to be reused.

I had half a rack of hardware ready to be redeployed. The default boot order of most of these machines is Disk, then Network, so we trigger a fresh network install by overwriting the MBR: with no valid boot sector on the disk, the machine falls through to a network boot. So, after a quick check that nothing relevant was left on this one machine, we sent it to the reboot pool.

The host was supposed to boot off the network, but I didn't even see a DHCP request coming in. So off to the lab I went... where was that machine? None of the consoles I tried was the right one, until I found one box with a really, really old installation, a machine that had come back from a different office.

And then it all became clear... unlike all the other machines, this one had a two-disk RAID setup, which we weren't actually using. We had indeed cleared the boot sector of the first disk, but not the second, and we had never really wiped that second disk. So when the first disk failed to boot, rather than falling through to the network, the machine booted the old copy on the second disk.
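To make the failure mode concrete, here is a minimal Python sketch of a simplified boot model, not a description of any real BIOS. It assumes a disk counts as bootable when its first 512-byte sector ends with the standard MBR signature bytes 0x55 0xAA, and that the firmware tries each disk in order before falling back to the network. The function names are hypothetical, and the "disks" are plain files here; pointing `wipe_mbr` at real block devices destroys their boot sectors.

```python
import os

# Classic MBR layout: 446 bytes boot code + 64 bytes partition table + 2-byte signature.
MBR_SIZE = 512
BOOT_SIGNATURE = b"\x55\xaa"


def has_boot_signature(path):
    """Return True if the first sector of `path` ends with the 0x55AA signature."""
    with open(path, "rb") as f:
        sector = f.read(MBR_SIZE)
    return len(sector) == MBR_SIZE and sector[510:512] == BOOT_SIGNATURE


def wipe_mbr(paths):
    """Overwrite the first sector of every given disk with zeros.

    The lesson from the vanished machine: wipe *every* disk, not just the first.
    """
    for path in paths:
        with open(path, "r+b") as f:  # r+b writes in place without truncating
            f.write(b"\x00" * MBR_SIZE)
            f.flush()
            os.fsync(f.fileno())


def first_boot_device(disks):
    """Hypothetical model of a 'Disk, Network' boot order: boot the first disk
    that still has a boot signature, otherwise fall back to the network."""
    for path in disks:
        if has_boot_signature(path):
            return path
    return "network"
```

With two bootable disks, wiping only the first still leaves `first_boot_device([disk1, disk2])` returning `disk2`, which is exactly the old installation showing up again; only after `wipe_mbr([disk1, disk2])` does it return `"network"`.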

Wiping that second disk solved the problem... for once it wasn't a DNS problem, but the RAID setup wasn't really helpful either :)

PS. Yes, relabeling the machines is still on the to-do list... maybe next year :)

Feb 06 2009

Image Sprawl, and the new cure...

When I tell people that copying VMs around, as is frequently done in the VMware world, is one of the most stupid ideas on this planet, I get the weirdest looks.

In my world it is. I want my infrastructure to be reproducible: I want to be able to throw any machine in my infrastructure off the 10th floor of a building and be up and running again in no time. If I spread a bunch of VM copies around, who knows what kind of life they start leading. Some will get upgrades, some won't...
If I get an image from someone, how did it get into that state? Nobody knows...

To me, Image Sprawl is more than not being able to manage your virtual machines; it also applies to physical machines that are deployed from a golden image.

Now rewind about four-something years... back then I wrote a paper for Linux-Kongress titled "Automating Xen Virtual Machine Deployment", which described a hybrid way of bootstrapping an infrastructure.
Quickly summarized: you use the speed of images to deploy a minimal image, which Luke today calls a Stem Cell, and then use centralized package management and a configuration management tool to keep the machines up to par. Two things have changed since then: we replaced CFEngine with Puppet, and today some people care a bit more about the infrastructure side of the web. I guess we have to thank Amazon and the Cloud Hype for that.

But fundamentally... not that much has changed :)

Jan 18 2009

Is anybody else confused about Chef?

Chef absolutely confuses me...

Luke is confused too:

I’m clearly disappointed that someone who has been a high-profile user of Puppet but has never contributed much in the way of code (Ohloh claims 2 commits) would decide to start a whole new project rather than attempt to contribute to Puppet

Now, if you know me a bit, you know that reinventing the wheel, or creating near-identical projects for no clear reason, is something I dislike.

Chef's FAQ doesn't really list a clear reason why they wanted to create a new project.

I could understand it if Chef were written in a totally different language... but hmm, it's written in Ruby again. I can only think of one other area where two major competing tools are written in the same language, and that is OTRS and RT; I'm still wondering how that can happen.

One of the core values of an Open Source project is that you can contribute, adapt, and even fork... so why would you want to start over from scratch?
Launching a competing open source project that way doesn't really seem like a smart thing to do.

Maybe one way to explain it is the European vs. American style of Open Source adoption... Luke has the more European approach (consultancy, build new features, support, train, evangelize, earn a good living), whereas Opscode, with Jesse Robbins in charge, might head for a more American style (productize, dual license, cash out).

So can the Chefs please explain why they didn't contribute to Puppet? As for their FAQ, well, it doesn't really answer any of the questions.