High Availability Storage Foundation on SLES 10

So last Friday I already teased you people by asking what you get if you mix stuff such as OCFS2, iSCSI, Xen and Linux-HA2. Here's the full story.

Guess what: it's all about the release of SLES 10 and the High Availability Storage Foundation.

Last week we sat down with Jo De Baer to discuss and proofread his document on the Novell High Availability Storage Foundation.
Jo spent a lot of time preparing this document and a fully working setup, and we were the happy few who actually got to validate both. Great workshop. We did find some issues with the setup and some errors in the document, which have been fixed in the meantime, but overall it's really good work.

The overall concept of the Novell High Availability Storage Foundation is to provide an easier-to-install package including iSCSI, OCFS2, Heartbeat2 and Xen in SLES 10. SUSE did some work to make sure that OCFS2 and Heartbeat2 work together nicely, and to define Xen as a Heartbeat resource.

The goal is to create a virtual machine that is actually a cluster resource, so that when there is a problem with it, you just launch another one on another machine. For this you need to be able to access the same filesystem from every machine in order to launch the same virtual machine, and that's where the concept of exporting a loopback image pool over iSCSI with OCFS2 comes in.
Novell is clearly going for disk images, which isn't my preference. I love LVM for this and have always run into issues with loopback. They have now been optimising and bugfixing the loopback drivers for both performance and stability, so at least that argument should be gone in the future. The idea of being able to move an image around to multiple machines by just copying a file is interesting for testing and playing around, but it doesn't belong in a managed infrastructure that should survive the 10th floor test. Well, at least they don't recommend using loopback images for your real data, just for the OS.

Anyway, among the problems you run into when trying to do such a setup is how to integrate OCFS2 with HA-2.
As OCFS2 is a cluster filesystem with its own membership and heartbeat functions, it doesn't play nice with HA-2 by default. Luckily the OCFS2 CMS is pluggable, so you can disable it and hand this function to Heartbeat, which is exactly how we set it up.

The second problem is how to tell Heartbeat that Xen is a resource. Yes, you can shut down your VMs on one node and restart them on the other manually, but then what is Heartbeat doing? You want Heartbeat to do that for you.
And one can indeed define Xen as a Heartbeat resource today.
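
To make the idea of "a virtual machine as a cluster resource" a bit more concrete, here is a toy sketch in Python. It is not Heartbeat code and it doesn't touch any Heartbeat or Xen API: the names `Node`, `VmResource` and `place`, and the image path, are all made up for illustration, and I simply assume the shared image pool is visible on every node.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    alive: bool = True
    running: set = field(default_factory=set)  # names of the VMs started on this node


@dataclass
class VmResource:
    name: str
    image: str  # hypothetical path of the disk image on the shared pool


def place(vm, nodes):
    """Return the node that runs vm, starting it elsewhere if its node died."""
    # If the VM already runs on a live node, leave it alone.
    for node in nodes:
        if node.alive and vm.name in node.running:
            return node
    # Forget about copies that were running on dead nodes.
    for node in nodes:
        if not node.alive:
            node.running.discard(vm.name)
    # Start the VM on the first live node; every node sees the same image.
    for node in nodes:
        if node.alive:
            node.running.add(vm.name)
            return node
    return None  # no live node left


nodes = [Node("node1"), Node("node2")]
vm = VmResource("vm1", "/shared/images/vm1.img")
first = place(vm, nodes)    # starts on node1
nodes[0].alive = False      # node1 goes down
second = place(vm, nodes)   # the same image is booted on node2
```

The real thing obviously has to deal with fencing, timeouts and the like. The sketch only shows why the shared filesystem matters: the second node can only take over if it can reach the very same image.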

Of course this setup isn't really ideal. I much prefer having failover between different applications in virtual machines. Those things are already in production today, and they give you the advantage that you don't have to boot a full machine again before your application is ready, hence less downtime. But still, it's a nice case showing that these different technologies really do work together.

Now the next steps in this kind of setup could be adding multiple virtual machines to a resource group and, rather than shutting down a machine, migrating it to another host based on the definitions of the resource groups.
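
Again as a toy sketch only, this is roughly the decision I have in mind: move a whole group of virtual machines together, and migrate them when the source host is still up instead of shutting them down and restarting them. The function `move_group` and the node and VM names are invented for the example; nothing here is a Heartbeat or Xen interface.

```python
def move_group(group, placement, source, target, source_alive):
    """Move every VM of the group that sits on source over to target.

    Returns a list of (vm, action) pairs. The action is "migrate" when the
    source host is still up and "restart" when it is already gone.
    """
    action = "migrate" if source_alive else "restart"
    moved = []
    for vm in group:
        if placement.get(vm) == source:
            placement[vm] = target
            moved.append((vm, action))
    return moved


# Hypothetical layout: which VM currently runs on which host.
placement = {"web": "node1", "db": "node1", "mail": "node2"}
moved = move_group(["web", "db"], placement, "node1", "node2", source_alive=True)
```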

So now that SLES 10 is out, it's no secret anymore what's in there. I'm not going to document everything further in here. Jo did a great job writing a huge document, so that even our Sales guy understood what it was all about, so I'll point you to the real juice when Jo puts it online :)