ha | Everything is a Freaking DNS problem

ha

Nov 10 2007

Scaling Drupal

By: Kris Buytaert Tags:

John Quinn writes about Scaling Drupal. He is taking a one-step-at-a-time approach and is still writing his 4th and 5th stages.

His first step, obviously, is moving the database off the Drupal machine onto a separate database server, and he chooses MySQL for this purpose. Moving your DB to a different machine is a good thing to do.

However, then he gets this crazy idea of using NFS to share his Drupal shared files :(
(He even dares to mention that the setup ease is good.) Folks, we abandoned NFS in the late nineties. NFS is still a recipe for disaster: it has performance issues, it has stability issues (stale locks), and no security admin in his right mind will tolerate portmap running in his DMZ.
(Also think about the IO path that one has to follow to serve a static file to a surfer when the file is stored on a remote NFS volume.)

On top of that, he adds complexity in a phase where it isn't needed yet. Because he needs to manage and secure NFS, and he is storing his critical files on the other side of the Ethernet cable, he created a single point of failure that he didn't need to create yet.
Yes, as soon as you start to scale you need to look at a scalable and redundant way to share your files.
When those files are pretty static, you'll start out with a set of rsync scripts, or scripts that push them to the different servers upon deploying your application. When they change often, you start looking into filesystems or block devices that bring you replication, such as DRBD or Lustre.
But if his NFS server goes down today he is screwed, much harder than when his database has a hiccup.
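
To make the "set of rsync scripts" idea a bit more concrete, here is a minimal sketch of a deploy-time push in Python. It is not what John uses; the host names and paths are made up for illustration, and it assumes rsync over SSH is already working between the machines. It only builds and runs plain `rsync -az --delete` commands, one per webserver.

```python
import subprocess


def build_rsync_command(source_dir, host, dest_dir):
    """Return the argv list that mirrors source_dir to dest_dir on host."""
    # The trailing slash makes rsync copy the directory's contents
    # rather than the directory itself.
    source = source_dir.rstrip("/") + "/"
    return ["rsync", "-az", "--delete", source, f"{host}:{dest_dir}"]


def push_files(source_dir, hosts, dest_dir, dry_run=True):
    """Push the files to every host and return the hosts that failed."""
    failed = []
    for host in hosts:
        command = build_rsync_command(source_dir, host, dest_dir)
        if dry_run:
            # Only show what would be executed.
            print(" ".join(command))
            continue
        result = subprocess.run(command)
        if result.returncode != 0:
            failed.append(host)
    return failed


if __name__ == "__main__":
    # Hypothetical webservers and paths, purely for illustration.
    webservers = ["web1.example.com", "web2.example.com"]
    push_files("/var/www/drupal/files", webservers, "/var/www/drupal/files")
```

Each webserver ends up with its own local copy, so a single dead machine no longer takes the files away from all the others. The price is that the copies are only as fresh as the last push, which is why this only fits files that are pretty static.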

One could discuss the order of scaling: adding more webservers might not always be the first step to take, and depending on the application one might want to tackle the database first.
He decides to share the load of his application over multiple Drupal instances using Apache mod_proxy, then adds Linux-HA to make it highly available.
I'm interested in knowing why he chose Apache mod_proxy and not LVS.

Although using NFS, for me, belongs in a How NOT to scale tutorial, his other steps give you a good idea of the steps to take.

I'm looking forward to his next steps :) I hope that in part 4 he also removes NFS in favour of a solution with no performance and locking issues, one that really takes away a big fat single point of failure. In part 5 he discusses how to scale your database environment. The actual order of implementing steps 2 and 5 will be different for each setup.

Anyway, I'm following up on his next steps. Interesting reading.

Oct 17 2007

Virtual Machine Replication

By: Kris Buytaert Tags:

I don't know on which planet I have been for the past couple of years, days or hours, but since when do
VMware’s VMotion, XenSource’s XenMotion or Virtual Iron’s Virtual Iron support replication?

Live migration yes, but replication, no.

I have discussed this kind of technology with Mark and Vincent, Moshe and others a zillion times already. Continuously mirroring or realtime replication of a virtual machine is really difficult to do, and I haven't heard of a working, scalable solution yet. (Shared memory issues, such as the ones we had with openMosix, are still amongst the issues to be tackled.)

Live replication would mean that you mirror the full state of your virtual machine in realtime to another running virtual machine. Every piece of disk, memory and screen you are using has to be replicated to the other side of the wire in realtime. Yes, you can take snapshots of filesystems and checkpoints of virtual machines. But continuous checkpointing over the network? I'd love to see that (outside of a lab).

So with a promise like that, our good friends the CIOs will be dreaming, and the vendors will be blamed for not delivering what they promised.

But on the subject of using just live migration features as an alternative to a real High Availability solution: I know different vendors are singing this song, but it's a bad one.

Using live migration in your infrastructure gives you the opportunity to move your applications away from a misbehaving machine when you notice it starts behaving badly, hence giving you a better overall uptime. If, however, you don't notice the machine is failing, or if it just suddenly stops working, or if your application crashes, you are out of luck.
Live migration won't work anymore since you are too late: you can't migrate a machine that's dead. The only thing you can do is quickly redeploy your virtual machine on another node, which for me doesn't really qualify as a clustered or HA solution.

Real HA looks at all the aspects of an application: the state of the application, the state of the server it is running on and the state of the network it is connected to. It has an alternative ready if any of these aspects fails. Session data is replicated, data storage is done redundantly and your network has multiple paths. If your monitoring decides something went wrong, the alternative should take over with no visible interruption for the end user. You don't have to wait until your application is restarted on the other side of the network, you don't have to wait until your virtual machine is rebooted, your filesystems are rechecked and your database has recovered. No, it happens right away.
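
As a toy illustration of that idea, here is a small Python sketch of the decision a cluster manager makes. It is in no way how Linux-HA or any vendor product is implemented; the `Node` class and `choose_active` function are invented for this example. The point it tries to show is that all three aspects (application, server, network) are watched, and that the standby is already running, so taking over means picking it rather than booting it.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """One running instance of the service, with the three monitored aspects."""
    name: str
    application_ok: bool = True
    server_ok: bool = True
    network_ok: bool = True

    def healthy(self):
        # A node only counts as usable if every aspect is fine.
        return self.application_ok and self.server_ok and self.network_ok


def choose_active(active, standbys):
    """Keep the active node while it is healthy, else pick a healthy standby."""
    if active.healthy():
        return active
    for node in standbys:
        if node.healthy():
            return node
    # Nothing left to fail over to.
    return None


if __name__ == "__main__":
    primary = Node("node1")
    secondary = Node("node2")
    primary.application_ok = False  # the application crashed, the machine did not
    print(choose_active(primary, [secondary]).name)  # prints "node2"
```

Note that the crashed application case is exactly the one live migration cannot help with: the virtual machine itself looks alive, so there is nothing to trigger a migration.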

But virtual machine replication as an alternative to HA? I'd call that wishful thinking and vapourware today.

Sep 14 2007

On the Future of Lustre

By: Kris Buytaert Tags:

So Sun bought ClusterFS. I'm wondering what their focus will be now. What will be the prime platform on which Lustre is developed: Solaris or Linux? Will other efforts in the open source cluster filesystem area react to this? Will Lustre development speed up? Will management become less complex?
Time will tell. I'm keeping an eye on it.

Kris Buytaert's blog

Sep 05 2007

LinuxConference Europe 2007 5/X

By: Kris Buytaert Tags:

So today is the last day of LinuxConference Europe. Down the stairs in the same building there are a bunch of weirdos sitting at round tables for some highly elite and secret meeting, also known as the KernelSummit.

I just heard someone say that they are figuring out which new bug they are planning to introduce into the new kernels.

I'm in LMB's tutorial on Linux HA, so I'll be musing about one of my favourite topics today :)
Or I'll just pay attention ;)

I'm wondering why Lars just modified one of his slides... maybe I'll ask him over lunch...

Aug 31 2007

Ganeti

By: Kris Buytaert Tags:

I already mentioned Ganeti before when going over the LinuxConference Europe schedule.
Google just posted the news a couple of hours ago, and you can go and check out the project on Google Code.

Quoting from their site:

"Ganeti is a virtual server management software tool built on top of Xen virtual machine monitor and other Open Source software.
However, Ganeti requires pre-installed virtualization software on your servers in order to function. Once installed, the tool will take over the management part of the virtual instances (Xen DomU), e.g. disk creation management, operating system installation for these instances (in co-operation with OS-specific install scripts), and startup, shutdown, failover between physical systems. It has been designed to facilitate cluster management of virtual servers and to provide fast and simple recovery after physical failures using commodity hardware. "

Disk management can use plain LVM volumes, local-disk RAID1 mirrors or across-the-network RAID1 (using DRBD) for quick recovery in case of physical system failure.

I'll certainly attend the talk next week and I'll be keeping an eye on what happens.

Dec 13 2006

MySQL Cluster Disk Data Storage, the fine print

By: Kris Buytaert Tags:

As I eventually read under Cluster Disk Data Storage:

Important: In MySQL 5.1.8 and later, there can exist only one log file group at any given time.

Somehow the error handling could improve:
ERROR 1515 (HY000): Failed to create LOGFILE GROUP
makes me go look at what's wrong with the cluster status :)

Jul 17 2006

High Availability Storage Foundation, on SLES 10

By: Kris Buytaert Tags:

So last Friday I already teased you people by asking what you get if you mix stuff such as OCFS2, iSCSI, Xen and Linux-HA2. Here's the full story.

Guess what: it's all about the release of SLES 10 and the High Availability Storage Foundation.

Last week we sat down with Jo De Baer to discuss and proofread his document on the Novell High Availability Storage Foundation.
Jo spent a lot of time preparing this document and a fully working setup, and we were the happy few who actually validated both. Great workshop: we did find some issues with the setup and some errors in the document, which have been fixed in the meantime, but overall it is really good work.

The overall concept of the Novell High Availability Storage Foundation is to provide an easier-to-install package including iSCSI, OCFS2, Heartbeat2 and Xen in SLES 10. SUSE did some work to make sure that OCFS2 and Heartbeat2 work together nicely, and to define Xen as a Heartbeat resource.

The goal is to create a virtual machine that is actually a cluster resource, so that when there is a problem with it, you just launch another one on another machine. For this you need to be able to access the same filesystem in order to launch the same virtual machine, and that's where the concept of exporting a loopback image pool over iSCSI with OCFS2 comes in.
Novell is clearly going for disk images, something I don't prefer. I love LVM for this and have always been running into issues with loopback. They have now been optimising and bugfixing the loopback drivers for both performance and stability, so at least that argument should be gone in the future. The idea of being able to move an image around to multiple machines by just copying a file is interesting for testing and playing around, but it doesn't belong in a managed infrastructure that should survive the 10th floor test. Well, at least they don't recommend using loopback images for your real data, just for the OS.

Anyway, amongst the problems you run into when trying to do such a setup is how to integrate OCFS2 with HA-2.
As OCFS2 is a cluster filesystem with its own membership and heartbeat functions, it doesn't play nice with HA-2 by default. Luckily the OCFS2 CMS is pluggable, so you can disable it and give this function to Heartbeat, which is exactly how we set it up.

The second problem is how to tell Heartbeat that Xen is a resource. Yes, you can shut down your VMs on one node and then restart them on the other manually, but then what is Heartbeat doing? So you want Heartbeat to do that for you.
And one can define Xen as a Heartbeat resource today.

Of course this setup isn't really ideal. I much prefer having failover between different applications in virtual machines. Those setups are already in production today and give you the advantage that you don't have to boot a full machine again before your application is ready, hence less downtime. But still, it's a nice case showing that these different technologies really do work together.

Now the next steps in this kind of setup could be adding multiple virtual machines to a resource group and, rather than shutting down a machine, migrating it to another host based on the definitions of the resource groups.

So now that SLES 10 is out, it's no secret anymore what's in there. I'm not going to document everything further in here. Jo did a great job writing a huge document, so that even our sales guy understood what it was all about, so I'll point you to the real juice when Jo puts it online :)

Jan 17 2006

GFS and DNS

By: Kris Buytaert Tags:

Jan 17 20:32:35 cnode1 ccsd4672: cluster.conf (cluster name = my_cluster, version = 5) found.
Your host name maps to the loopback device rather than a real network interface. Please change your /etc/hosts file so that your host name has a proper IP address, as a cluster cannot function over the loopback interface.
root@cnode1 modules# Jan 17 20:32:42 cnode1 ccsd4672: Unable to connect to cluster infrastructure after 30 seconds.
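
For what it's worth, the condition the log complains about is easy to check before starting the cluster. The Python sketch below is just an illustration and has nothing to do with how ccsd does it: it parses the text of a hosts file and reports whether a given host name sits on a loopback line. The sample file contents and the 192.168.0.11 address are assumptions made up for the example; only the node name cnode1 comes from the log above.

```python
import ipaddress


def loopback_names(hosts_text):
    """Return the set of names that a hosts file maps to a loopback address."""
    names = set()
    for line in hosts_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        address, aliases = fields[0], fields[1:]
        try:
            is_loopback = ipaddress.ip_address(address).is_loopback
        except ValueError:
            # Not a parsable IP address, skip the line.
            continue
        if is_loopback:
            names.update(aliases)
    return names


def hostname_on_loopback(hostname, hosts_text):
    """True if hostname appears on a loopback line of the hosts file."""
    return hostname in loopback_names(hosts_text)


# Hypothetical hosts files: the first reproduces the problem, the second fixes it.
BROKEN_HOSTS = """\
127.0.0.1   cnode1 localhost.localdomain localhost
"""

FIXED_HOSTS = """\
127.0.0.1      localhost.localdomain localhost
192.168.0.11   cnode1
"""

if __name__ == "__main__":
    print(hostname_on_loopback("cnode1", BROKEN_HOSTS))  # prints True
    print(hostname_on_loopback("cnode1", FIXED_HOSTS))   # prints False
```

On a real node you would feed it the contents of /etc/hosts and the node's own host name. A layout like the first sample, with the host name on the 127.0.0.1 line, is the kind of thing that triggers the message above.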