Mar 29 2011

Vagrant & Rubylibs

I was testing some MySQL puppet modules on my Vagrant box earlier this week and one of them required augeas.
I kept running into "Could not find a default provider for augeas", even though all the appropriate augeas, augeas-lib and ruby-augeas packages were installed. I inspected the different ruby directories and the files were right where I expected them, in /usr/lib/ruby/site_ruby/1.8.

Since all the files seemed to be in the right place, my next option was to strace a small ruby script that required augeas. Guess what that showed ..

  stat64("/opt/ruby/lib/ruby/site_ruby/1.8/augeas.rb", 0xbfd2af1c) = -1 ENOENT (No such file or directory)
  stat64("/opt/ruby/lib/ruby/site_ruby/1.8/", 0xbfd2af1c) = -1 ENOENT (No such file or directory)
  stat64("/opt/ruby/lib/ruby/site_ruby/1.8/i686-linux/augeas.rb", 0xbfd2af1c) = -1 ENOENT (No such file or directory)
  stat64("/opt/ruby/lib/ruby/site_ruby/1.8/i686-linux/", 0xbfd2af1c) = -1 ENOENT (No such file or directory)
  stat64("/opt/ruby/lib/ruby/site_ruby/augeas.rb", 0xbfd2af1c) = -1 ENOENT (No such file or directory)
  stat64("/opt/ruby/lib/ruby/site_ruby/", 0xbfd2af1c) = -1 ENOENT (No such file or directory)
  stat64("/opt/ruby/lib/ruby/vendor_ruby/1.8/augeas.rb", 0xbfd2af1c) = -1 ENOENT (No such file or directory)
  stat64("/opt/ruby/lib/ruby/vendor_ruby/1.8/", 0xbfd2af1c) = -1 ENOENT (No such file or directory)
  stat64("/opt/ruby/lib/ruby/vendor_ruby/1.8/i686-linux/augeas.rb", 0xbfd2af1c) = -1 ENOENT (No such file or directory)
  stat64("/opt/ruby/lib/ruby/vendor_ruby/1.8/i686-linux/", 0xbfd2af1c) = -1 ENOENT (No such file or directory)
  stat64("/opt/ruby/lib/ruby/vendor_ruby/augeas.rb", 0xbfd2af1c) = -1 ENOENT (No such file or directory)
  stat64("/opt/ruby/lib/ruby/vendor_ruby/", 0xbfd2af1c) = -1 ENOENT (No such file or directory)
  stat64("/opt/ruby/lib/ruby/1.8/augeas.rb", 0xbfd2af1c) = -1 ENOENT (No such file or directory)
  stat64("/opt/ruby/lib/ruby/1.8/", 0xbfd2af1c) = -1 ENOENT (No such file or directory)
  stat64("/opt/ruby/lib/ruby/1.8/i686-linux/augeas.rb", 0xbfd2af1c) = -1 ENOENT (No such file or directory)
  stat64("/opt/ruby/lib/ruby/1.8/i686-linux/", 0xbfd2af1c) = -1 ENOENT (No such file or directory)
  stat64("./augeas.rb", 0xbfd2af1c) = -1 ENOENT (No such file or directory)
  stat64("./", 0xbfd2af1c) = -1 ENOENT (No such file or directory)

Indeed ... the Vagrant box puts its default ruby in /opt/ruby .. and obviously there were no ruby-augeas files in there.
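The same check can be done without strace by asking the interpreter where it lives and which directories it searches. What follows is only a minimal sketch, assuming a current Ruby rather than the 1.8 on that box; `candidate_paths` and `locate_feature` are helper names made up for this illustration, and the lookup only considers `.rb` files and native extensions on the load path.

```ruby
require "rbconfig"

# Hypothetical helper: list the files Ruby could pick up for `feature`
# in each load-path directory (.rb source or a native extension).
def candidate_paths(feature, load_path = $LOAD_PATH)
  extensions = [".rb", ".#{RbConfig::CONFIG['DLEXT']}"]
  load_path.flat_map do |dir|
    extensions.map { |ext| File.join(dir.to_s, feature + ext) }
  end
end

# Hypothetical helper: first candidate that exists on disk, or nil.
def locate_feature(feature, load_path = $LOAD_PATH)
  candidate_paths(feature, load_path).find { |path| File.file?(path) }
end

if __FILE__ == $PROGRAM_NAME
  # Which interpreter is actually running, and where is its site_ruby?
  puts "interpreter: #{RbConfig.ruby}"
  puts "sitelibdir:  #{RbConfig::CONFIG['sitelibdir']}"
  found = locate_feature("augeas")
  puts(found ? "augeas found at #{found}" : "augeas is not on this interpreter's load path")
end
```

Run with the ruby that puppet itself uses; an interpreter under /opt/ruby next to packages installed under /usr/lib/ruby would show up as a mismatch in the first two lines.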

Mar 04 2011

24 hours of Puppet Drama

Over the past couple of days I've been fighting with a weird puppet problem. We eventually cracked it, and I promised a bunch of you I'd fully explain it here ;)

So we were deploying 2 Blade chassis at a pretty remote location with a mix of physical and virtual machines, some 48 instances in total. This is a pretty standard rollout; we've got a bunch of similar platforms in our lab, so we knew about a couple of glitches, what to expect etc.

I was just keeping an eye on the deployment, looking at the logs to see if things were running fine, when suddenly a couple of puppet runs didn't come through. We had seen such behaviour before; usually it's a matter of running them again a couple of times and they will come through. (Upgrading ruby and putting passenger in front of puppet actually solved those issues.
We'd even had a loop built into the platform that runs puppet a couple of times until it returns with the correct exit code, just to make sure.)
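Such a retry loop could look roughly like the sketch below. It is not the loop from our platform; `run_until_ok` is a made-up name, and the bounded number of attempts is an assumption added for the illustration, so that a client that can never succeed eventually gives up instead of retrying forever.

```ruby
# Hypothetical helper: call the block up to max_attempts times and stop
# as soon as it returns one of the accepted exit codes.
def run_until_ok(max_attempts:, ok_codes: [0], delay: 0)
  last_code = nil
  1.upto(max_attempts) do |attempt|
    last_code = yield(attempt)
    if ok_codes.include?(last_code)
      return { ok: true, attempts: attempt, code: last_code }
    end
    # Wait between attempts, but not after the final one.
    sleep(delay) if delay > 0 && attempt < max_attempts
  end
  { ok: false, attempts: max_attempts, code: last_code }
end

# Hypothetical wrapper, never called automatically: the command line is
# an assumption and should be replaced by whatever the platform runs.
def run_puppet_once
  system("puppet", "agent", "--test")
  $?.exitstatus
end
```

A boot script would then call something like `run_until_ok(max_attempts: 5, delay: 30) { run_puppet_once }` and log the result when `ok` is false.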

We were first redeploying the A chain of our setup from scratch, so that in the event of failure we could still bring up the B chain of the platform and be up and running again. Machines were actually coming up .. slowly .. some of them took a bit longer. One machine's clock was seriously off and SSL was barfing on it, so we set the BIOS clock and restarted. It was the machine with 6 VMs, so it took a while, but everything was back on schedule .. then suddenly things went downhill fast. More and more puppet runs started failing, and at some point none of our puppet runs were working anymore.
I'd see the puppetmaster perfectly compile its catalog

  notice: Compiled catalog for ctl-0-a

Then the client .. not wanting to get it ..

  Mar 1 11:10:45 ctl-0-a puppet-agent[3674]: Not using expired catalog for ctl-0-a from cache; expired at Tue Mar 01 09:50:06 +0000 2011
  Mar 1 11:10:45 ctl-0-a puppet-agent[3674]: Using cached catalog
  Mar 1 11:10:45 ctl-0-a puppet-agent[3674]: Could not retrieve catalog; skipping run

We had gone from about 60% of our freshly deployed boxen working fine, to not one.
So what do you do .. indeed .. you turn on debugging.
You put both your puppetmaster and client in debug. Nothing, no errors, no nothing ..

I asked some colleagues, asked on irc .. lots of ideas, but none of them actually cracked the problem. I did what I knew had solved similar problems before:

I switched our serialization format from yaml back to pson, and back. No luck.
I upgraded ruby to a version from the repository. No luck.
I upgraded our Puppet 2.6 to a version from the TMZ Epel repo. We cleaned out the ssl certificates on all sides multiple times, cleaned out /var/lib/puppet, uninstalled puppet and reinstalled it.
It wasn't a DNS problem.

I had started stripping my manifests down to empty runs. Those worked, so I started uncommenting the actual manifests again ... Then in the middle of the debugging our VPN connection to the remote location broke down, and we'd only be getting it back in the morning .. about 12 hours later. Not fun. Murphy, obviously ..

So the next morning we dived right back in ... making those manifests bigger again, removing all the stages. 1 or 2 successful runs, then with the same config .. back to failure. On and off .. successful and unsuccessful ... it wasn't in the manifests ..

So we decided to roll the puppetmaster back to its previous version. That one was known to be stable, and there obviously was something really fishy going on, so that was the safest bet.

Wrong. The machine came up, but it took longer than expected, and when trying to connect new clients to it .. nothing worked anymore .. same problem as before .. the puppetmaster compiled the catalog, the clients didn't get anything. We started to suspect faulty hardware .. but how could that be .. the puppet client looked like the only malfunctioning thing around.

Then Dim0 suggested I look at the one logfile I hadn't looked at, /var/log/puppet/masterhttp.log, and then we saw it. It was being flooded with ssl errors, ssl errors from clients that shouldn't even be connecting to the puppetmaster at all.

  [2011-03-02 13:32:00] ERROR OpenSSL::SSL::SSLError: tlsv1 alert decrypt error
  /usr/lib/ruby/site_ruby/1.8/puppet/network/http/webrick.rb:44:in `accept'
  /usr/lib/ruby/site_ruby/1.8/puppet/network/http/webrick.rb:44:in `listen'
  /usr/lib/ruby/1.8/webrick/server.rb:173:in `call'
  /usr/lib/ruby/1.8/webrick/server.rb:173:in `start_thread'
  /usr/lib/ruby/1.8/webrick/server.rb:162:in `start'
  /usr/lib/ruby/1.8/webrick/server.rb:162:in `start_thread'
  /usr/lib/ruby/1.8/webrick/server.rb:95:in `start'
  /usr/lib/ruby/1.8/webrick/server.rb:92:in `each'
  /usr/lib/ruby/1.8/webrick/server.rb:92:in `start'
  /usr/lib/ruby/1.8/webrick/server.rb:23:in `start'
  /usr/lib/ruby/1.8/webrick/server.rb:82:in `start'
  /usr/lib/ruby/site_ruby/1.8/puppet/network/http/webrick.rb:42:in `listen'
  /usr/lib/ruby/site_ruby/1.8/puppet/network/http/webrick.rb:41:in `initialize'
  /usr/lib/ruby/site_ruby/1.8/puppet/network/http/webrick.rb:41:in `new'
  /usr/lib/ruby/site_ruby/1.8/puppet/network/http/webrick.rb:41:in `listen'
  /usr/lib/ruby/1.8/thread.rb:135:in `synchronize'
  /usr/lib/ruby/site_ruby/1.8/puppet/network/http/webrick.rb:38:in `listen'
  /usr/lib/ruby/site_ruby/1.8/puppet/network/server.rb:127:in `listen'
  /usr/lib/ruby/site_ruby/1.8/puppet/network/server.rb:142:in `start'
  /usr/lib/ruby/site_ruby/1.8/puppet/daemon.rb:124:in `start'
  /usr/lib/ruby/site_ruby/1.8/puppet/application/master.rb:114:in `main'
  /usr/lib/ruby/site_ruby/1.8/puppet/application/master.rb:46:in `run_command'
  /usr/lib/ruby/site_ruby/1.8/puppet/application.rb:287:in `run'
  /usr/lib/ruby/site_ruby/1.8/puppet/application.rb:393:in `exit_on_fail'
  /usr/lib/ruby/site_ruby/1.8/puppet/application.rb:287:in `run'
  /usr/sbin/puppetmasterd:4
  [2011-03-02 13:32:00] ERROR OpenSSL::SSL::SSLError: tlsv1 alert decrypt error

What happened was that 'we' decided to bring one of the backup machines back online. After all, once the slow-starting server came through, it would be the passive node in the cluster, no worries there, right? Wrong.
This physical machine had 6 virtual machines with old ssl certificates that got stuck in a loop which was put there to make sure their puppet run came through correctly at boot time.

Those 7 rogue clients, which generated little to no relevant traffic on the network, were saturating the default webrick. Killing them solved the problem and we were back to regular deployment in no time.
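A flood like that is easy to miss when you only tail the log, so counting the errors per minute helps. Below is a minimal sketch, assuming the timestamp format shown in the excerpt above; `ssl_errors_per_minute` is a name made up for the illustration. The lines shown here carry no client address, so this only tells you how bad the flood is, not who is causing it.

```ruby
# Hypothetical helper: count "ERROR OpenSSL::SSL::SSLError" lines per minute.
# Backtrace lines carry no timestamp and are skipped.
def ssl_errors_per_minute(lines)
  pattern = /\A\s*\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2}\] ERROR OpenSSL::SSL::SSLError/
  counts = Hash.new(0)
  lines.each do |line|
    match = line.match(pattern)
    counts[match[1]] += 1 if match
  end
  counts
end

if __FILE__ == $PROGRAM_NAME && ARGV[0]
  # Usage: ruby ssl_flood.rb /var/log/puppet/masterhttp.log
  ssl_errors_per_minute(File.foreach(ARGV[0])).sort.each do |minute, count|
    puts "#{minute}  #{count}"
  end
end
```

A steady stream of errors in every single minute, even while no deployment is running, is the hint that something is hammering the master in a loop.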

The sad part is that our upcoming release already has passenger, a fresher version of ruby etc., so most of the above mentioned errors won't occur there anymore.
But in short .. don't use the default webrick .. it will kill you :)

And no, not everything is a freaking dns problem, ssl is a big pain in the B too .. :)

Jun 01 2010

PuppetCamp Europe 2010

Last week was pretty heavy on conferences for me. On Wednesday I had to give my Building Virtual Appliances talk at the Sizing Server event on Advanced Virtualization and Hybrid Cloud Computing, but the most important part of the week was the first edition of Puppetcamp Europe.

When the first ideas about PuppetCamp Europe started I asked Luke when and where it'd be held. He replied that I should know, as I was supposed to organise it ... I thanked him for the honour, he went on to ask Patrick, and he accepted ... I hope I helped him out enough :) I even handed out a personal invitation to some of the most famous configuration mgmt people on this planet, and Inuits sponsored the event too.

Luke started with the opening talk, talking about the future and past of puppet, about version numbers (2.6 does sound familiar and stable, doesn't it?), about ..
During @puppetmasterd's talk @kartar played Bugmaster, which was great and almost realtime.

The real fun started with the Open Spaces ... after everybody had introduced themselves (a mix of usual suspects, first timers and oldskoolers from irc #puppet that finally got faces), different sessions were proposed, ranging from Puppet 101, Alternative Puppet Architectures, Puppet HA and MultiMaster Puppet to Dating for PuppetMasters.

Over the 2 days of open spaces, different ideas came up, e.g. on how to scale puppet. Different people are letting their puppet clients run from cron in batches, but probably the weirdest idea I heard was to run Puppet in JRuby in order to speed it up.
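One common way to get such batches is to derive each client's cron minute from its hostname, so the runs spread out without anybody maintaining a schedule by hand. A minimal sketch of that idea, not something that was shown at the camp: `cron_minutes` is a made-up name, MD5 is just a convenient stable hash, and the interval is assumed to divide 60 evenly.

```ruby
require "digest/md5"

# Hypothetical helper: deterministic minutes-of-the-hour for a host.
# The same hostname always lands on the same minutes, and different
# hosts are spread over the interval by the hash.
def cron_minutes(hostname, interval: 30)
  offset = Digest::MD5.hexdigest(hostname).to_i(16) % interval
  (offset...60).step(interval).to_a
end

if __FILE__ == $PROGRAM_NAME
  # Host names are made up for the example.
  %w[ctl-0-a ctl-0-b web-1-a web-1-b].each do |host|
    puts "#{host}: minute field #{cron_minutes(host).join(',')}"
  end
end
```

The resulting list goes into the minute field of the crontab entry that starts the puppet run on that host.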

Lots of talk on certificates and how to solve the pains with them .. e.g. in a HA setup you need to create an authority chain .. there was also talk about having a --trust-my-network feature that would disable certificates. Luke was open to accepting such a patch, or a patch that would make the whole certificate setup more pluggable.
That would for sure be a feature a lot of people would want to use ..

The Thursday evening conference dinner was "Stoofvlees met Frieten" for most of us .. but for me it was a London Devops Curry in Gent, with @unixdaemon, @ripienaar and some others ;)

But with lots of interesting chatter, free beer and free icecream there's for sure going to be another similar event in Europe next year ..

Mar 30 2010

#Devops / Ruby Meetup , Antwerp, April 8, 2010

Joshua Timberman will be in town (Antwerpen, that is) for Loadays. As he is arriving on Thursday, Botchagalupe suggested we should have a Devops / Ruby get-together.

So I'm dutifully announcing the Devops/Ruby meetup next Thursday, April 8th, in Antwerp.

The plan is to meet up for beers and chatter in our favourite Antwerp geek pub, Kulminator, Vleminckveld 32, Antwerp, around 20h00-ish ..

Topics will be devops, ruby and much more :)

No need to register .. just show up ..

If for some reason the Kulminator is too crowdy, smokey, closedy you should be able to find us next door in the Zeppos :)

Sep 02 2007

Juliux End of Life Announcement

Raf posted the Juliux End Of Life announcement a couple of days ago.
We started Juliux about 2 years ago with the idea of creating an open source alternative to the back then proprietary RedHat Satellite and the back then totally broken ZLM from Novell. We set out to build a web-based platform from where you could do package management for a hybrid set of servers, with as a second version target to also integrate some form of configuration management. We even presented our proof of concept at the 2006 UKUUG Lisa Conference.

RedHat in the meanwhile announced it was open sourcing Satellite, but more importantly Puppet became more popular and more feature rich.
One of the features that came in puppet that made me rethink Juliux was the fact that you could do package management, including installation etc., within Puppet, so I started thinking about making Juliux a web frontend that used Puppet as a back end.

The ideas were good, but we really didn't have time to focus on developing the things we wanted to write. So when I ran into PuppetShow the time was right to refocus the little effort we were spending, and as Raf announced, he will now be working on PuppetShow when he has time.

Funny .. there was a time when I was thinking about rebranding Juliux to PuppetMaster, or Master of Puppets, but I actually like PuppetShow better .. So we had similar ideas, only Luke had more time to implement his.

Time often is an issue in open source development. There's lots of stuff that I have in my head that could easily be written .. but I don't always have time to do it myself, or other people available who can write it for me .. :( Well .. maybe one day :)