heartbeat

Feb 10 2011

Ensure Running

Has anyone noticed that pretty much every Puppet module one finds on the internet by default enables the service it tries to configure?

Looking at it from a single-machine point of view, it makes sense to include the module, have it configure your service, and enable it directly by default.

So I started wondering... isn't there anybody out there building clusters? Setups where services have to be configured on multiple nodes but should NOT be actively running on all of them by default, because an external tool (e.g. the Pacemaker framework) manages that for you.

Agreed, it's a small patch to get the functionality you want, but it adds extra overhead every time you upgrade the modules.

So if it doesn't bother you, please split your Puppet module in two parts: one you call to configure the service, and another you call to enable it, if you want to. Something like the sketch below.
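
A minimal sketch of what I mean (module, class and file names are made up): one class owns the package and the config file, a separate class owns the running state, so cluster nodes where Pacemaker controls the daemon simply never include the second class.

  class myservice {
    package { 'myservice':
      ensure => installed,
    }

    file { '/etc/myservice.conf':
      ensure  => file,
      source  => 'puppet:///modules/myservice/myservice.conf',
      require => Package['myservice'],
    }
  }

  # only nodes that should actually run the daemon include this class
  class myservice::enabled {
    include myservice

    service { 'myservice':
      ensure    => running,
      enable    => true,
      subscribe => File['/etc/myservice.conf'],
    }
  }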

thnx!

Nov 04 2010

High Availability MySQL Cookbook, the review

When I read on the internetz that Alex Davies was about to publish a Packt book on MySQL HA, I pinged my contacts at Packt and suggested that I'd review the book.

I've run into Alex at some UKUUG conferences before, and he's got a solid background in MySQL Cluster and other HA alternatives, so I was looking forward to reading the book.

Alex starts off with a couple of in-depth chapters on MySQL Cluster. He does mention that it's not a fit for every problem, but I'd hoped he'd do so a bit more prominently... an upfront chapter outlining the different approaches and when each one is a match would have been better. As it stands, the avid reader might be 80 pages into MySQL Cluster before he realizes it's not going to be a match for his problem.

I really loved the part where Alex correctly mentions that you should probably be using Puppet or similar to manage the config files of your environment, rather than scp'ing them around your different boxes.

Alex then goes on to describe setting up MySQL replication and multi-master replication, with the different approaches one can take there. He gives some nice tips on using LVM to reduce the downtime when you have to transfer the dataset of an already existing MySQL setup; good stuff, roughly the idea sketched below.
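
The trick, as I'd summarize it (my sketch, not the book's exact procedure; volume group and path names are made up): hold the read lock only for the seconds it takes to snapshot, then copy the frozen dataset while mysqld keeps serving.

  # in one mysql session, so the read lock stays held:
  #   FLUSH TABLES WITH READ LOCK;
  #   SHOW MASTER STATUS;   -- note binlog file and position for the new slave
  # from a second terminal, while the lock is held:
  lvcreate --snapshot --size 5G --name mysql-snap /dev/vg0/mysql
  # back in the mysql session, release the lock; the impact ends here:
  #   UNLOCK TABLES;
  # copy the consistent snapshot across at leisure:
  mount /dev/vg0/mysql-snap /mnt/snap
  rsync -a /mnt/snap/ newslave:/var/lib/mysql/
  umount /mnt/snap && lvremove -f /dev/vg0/mysql-snap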

He then goes on to describe MySQL with shared storage. If you only mount your redundant SAN disk on one MySQL node at a time, my preference would probably be a Pacemaker stack rather than a Red Hat Cluster based setup, but his setup seems to work too. Alex quickly touches on using GFS to have your data disk mounted simultaneously on both nodes (keep in mind, still with only one active mysqld) and then goes on to describe a full DRBD based MySQL HA setup.

The last chapter, titled Performance Tuning, gives some very nice tips on tuning both your regular storage and your GFS setup, and also covers the tuning parameters for MySQL Cluster.

I was also really happy to see the appendices on the basic installation, where he advocates the use of Cobbler, Kickstart and LVM.

One of the better books I've read in the past couple of years... certainly the best book from Packt so far. I hope there is more quality stuff coming from that direction!

Nov 18 2009

Got Interviewed

by @botchagalupe, on virtualization, open source tools and DNS problems

Oct 16 2009

Heartbeat 2 OpenAIS

While upgrading a pretty recent Heartbeat cluster to OpenAIS earlier today, I ran into the following weird situation:

  Last updated: Fri Oct 16 08:50:03 2009
  Stack: openais
  Current DC: CO_NMS-1 - partition with quorum
  Version: 1.0.5-462f1569a43740667daf7b0f6b521742e9eb8fa7
  4 Nodes configured, 2 expected votes
  1 Resources configured.
  ============

  Online: [ CO_NMS-1 CO_NMS-2 ]
  OFFLINE: [ co_nms-1 co_nms-2 ]

or

  crm(live)node# show
  co_nms-1(5c48ab4f-767f-e2dc-20ec-5969cddad152): normal
  co_nms-2(922ff786-eca9-bed0-d79d-8222727a2c5b): normal
  CO_NMS-1: normal
  CO_NMS-2: normal

Woohoo... OpenAIS must have realized I have uppercase and lowercase nodes :)

Funny to see... but quickly solved.
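
For the curious: cleaning this up is typically just a matter of removing the stale duplicate node entries, something along these lines with the crm shell (assuming, as in my case, the lowercase entries are the dead ones):

  crm node delete co_nms-1
  crm node delete co_nms-2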

Feb 02 2009

Everything is a fine whitespace problem ...

A couple of days ago I was working on a Linux Heartbeat v2 setup.
Upon inserting an XML snippet into the CIB, cibadmin started eating memory fast until the OOM killer kicked in.

The environment was running a fairly old heartbeat-2.0.8 version, so I upgraded to heartbeat-2.1.4-2.1, and there I got a nice warning that my XML syntax wasn't correct.

There was a stray whitespace in the XML (note the space before the = after operation):

  <expression attribute="#replicationvalue" id="is_lagged" operation ="gt" ... />

Removing the whitespace solves the problem, also on the older version. The crash itself is already fixed upstream... but you might run into it anyhow.
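
For reference, the same snippet as the parser wants it:

  <expression attribute="#replicationvalue" id="is_lagged" operation="gt" ... />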

Sep 24 2008

Bug in ifconfig ?

So earlier this week I ran into the weirdest problem with Linux-HA. Heartbeat was happily adding an IP address as an active resource to one of my nodes when needed, but upon removal it failed to remove the IP from the stack. Further debugging showed that the Heartbeat scripts claimed the IP wasn't actually on the stack.

It was... but the output from ifconfig was different from what Heartbeat expected it to be.

Heartbeat checks the output of ifconfig and expects to find the IP address it added itself on a :0 or similar alias interface. Now, ifconfig truncates long interface names in its output. That means that with an interface called eth0:0 the output lists it perfectly, and Heartbeat is smart enough to remove the IP again when the node goes to standby. If, however, you have a three-digit VLAN on a bond interface, Heartbeat will add :0 to bond0.129; the resource adds the IP address perfectly, but when Heartbeat later checks all the :0 interfaces, bond0.129:0 won't be found because ifconfig prints it as plain bond0.129. The result is a potentially painful situation where two nodes still share an IP address.

So where's the actual problem: ifconfig or Heartbeat? I'd say both, but the easiest fix will be in Heartbeat; after all, there are other, preferred ways of adding an IP address to an interface. ip addr add comes to mind :)
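
To see the difference for yourself (addresses and interface names are just examples):

  # add an address with an alias label on a long interface name
  ip addr add 192.168.1.10/24 dev bond0.129 label bond0.129:0

  # ifconfig truncates the name column, so the :0 label is invisible here
  ifconfig | grep bond0.129

  # ip shows the label just fine
  ip -o addr show dev bond0.129

  # clean up
  ip addr del 192.168.1.10/24 dev bond0.129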

So I filed a bug report :)

Feb 06 2008

It's February again

It seems that for the past four years, February has been the month when O'Reilly really loves me and decides to publish one of my articles.

This year's article was co-written with my colleague Johan Huysmans and tackles the creation of highly available gateways.

Although every HA situation is different and this is a pretty simple setup, it's a good starting point for more complex ones.

Enjoy the read

PS. Yes, I know, in 2006 I also had a January article :)

Oct 17 2007

Virtual Machine Replication

I don't know which planet I've been on for the past couple of years, days or hours, but since when do VMware's VMotion, XenSource's XenMotion or Virtual Iron's Virtual Iron support replication?

Live migration, yes. Replication, no.

I've discussed this kind of technology with Mark, Vincent, Moshe and others a zillion times already. Continuous mirroring or real-time replication of a virtual machine is really difficult to do, and I haven't heard of a working, scalable solution yet (shared-memory issues such as we had with openMosix are still among the problems to be tackled).

Live replication would mean mirroring the full state of your virtual machine in real time to another running virtual machine: every piece of disk, memory and screen you are using has to be replicated to the other side of the wire as it changes. Yes, you can take snapshots of filesystems and checkpoints of virtual machines. But continuous checkpointing over the network? I'd love to see that... outside of a lab.

So with a promise like that, our good friends the CIOs will be dreaming, and the vendors will be blamed for not delivering what was promised.

As for using just live migration features as an alternative to a real high-availability solution: I know different vendors are singing this song, but it's a bad one.

Live migration gives you the opportunity to move your applications away from a machine when you notice it starting to misbehave, hence giving you better overall uptime. If, however, you don't notice the machine is failing, or it just suddenly stops working, or your application crashes, you are out of luck.
Live migration won't help anymore since you are too late: you can't migrate a machine that's dead. The only thing you can do is quickly redeploy your virtual machine on another node, which for me doesn't really qualify as a clustered or HA solution.

Real HA looks at all the aspects of an application: the state of the application itself, the state of the server it is running on, and the state of the network it is connected to. It has an alternative ready if any of these fail: session data is replicated, data storage is redundant, and your network has multiple paths. If your monitoring decides something went wrong, the alternative takes over with no visible interruption for the end user. You don't have to wait until your application is restarted on the other side of the network, your virtual machine is rebooted, your filesystems are rechecked and your database has recovered; no, it happens right away.

But virtual machine replication as an alternative for HA? I'd call that wishful thinking and vapourware today.

Sep 05 2007

LinuxConference Europe 2007 5/X

So today is the last day of LinuxConference Europe. Down the stairs in the same building there's a bunch of weirdos sitting at round tables for some highly elite and secret meeting, also known as the Kernel Summit.

I just heard someone say that they are figuring out which new bug they are planning to introduce into the new kernels.

I'm in LMB's tutorial on Linux HA, so I'll be musing about one of my favourite topics today :)
Or I'll just pay attention ;)

I'm wondering why Lars just modified one of his slides... maybe I'll ask him over lunch...