Virtual Machine Replication

I don't know which planet I have been on for the past couple of years, days or hours, but since when do VMware’s VMotion, XenSource’s XenMotion or Virtual Iron’s Virtual Iron support replication?

Live migration, yes. Replication, no.

I have discussed this kind of technology with Mark, Vincent, Moshe and others a zillion times already. Continuously mirroring a virtual machine, or replicating it in real time, is really difficult to do, and I haven't heard of a working, scalable solution yet. (Shared memory issues, such as the ones we had with openMosix, are still among the problems to be tackled.)

Live replication would mean that you mirror the full state of your virtual machine in real time to another running virtual machine. Every piece of disk, memory and screen you are using has to be replicated to the other side of the wire in real time. Yes, you can take snapshots of filesystems and checkpoints of virtual machines. But continuous checkpointing over the network? I'd love to see that (outside of a lab).
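To get a feeling for why continuous checkpointing over the network is hard, here is a back-of-the-envelope sketch in Python. The 4 KiB page size is the usual x86 default; the dirty-page rates and link speeds are made-up illustrative numbers, not measurements, and the function names are hypothetical.

```python
PAGE_SIZE_BYTES = 4096  # common x86 page size; an assumption for this sketch


def replication_bandwidth_mbit(dirty_pages_per_sec: int,
                               page_size: int = PAGE_SIZE_BYTES) -> float:
    """Megabits per second needed to ship every dirtied memory page once."""
    return dirty_pages_per_sec * page_size * 8 / 1_000_000


def can_keep_up(dirty_pages_per_sec: int, link_mbit: float) -> bool:
    """True if the link can carry the memory changes as fast as they happen."""
    return replication_bandwidth_mbit(dirty_pages_per_sec) <= link_mbit


if __name__ == "__main__":
    # Hypothetical workloads: a quiet VM and a busy one, on a 1 Gbit/s link.
    for rate in (10_000, 50_000):
        needed = replication_bandwidth_mbit(rate)
        print(rate, "pages/s needs", needed, "Mbit/s, keeps up:",
              can_keep_up(rate, link_mbit=1000))
```

With these assumed numbers, a VM dirtying 50,000 pages per second needs roughly 1.6 Gbit/s for memory alone, before any disk or screen traffic is counted.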

So with a promise like that, our good friends the CIOs will be dreaming, and the vendors will be blamed for not delivering what those CIOs were promised.

But on the subject of using just live migration features as an alternative to a real High Availability solution: I know different vendors are singing this song, but it's a bad one.

Using live migration in your infrastructure gives you the opportunity to move your applications away from a misbehaving machine when you notice it starting to behave badly, hence giving you better overall uptime. If, however, you don't notice the machine is failing, or if it just suddenly stops working, or if your application crashes, you are out of luck.
Live migration won't work anymore since you are too late: you can't migrate a machine that's dead. The only thing you can do is quickly redeploy your virtual machine on another node, which for me doesn't really qualify as a clustered or HA solution.

Real HA looks at all the aspects of an application: the state of the application, the state of the server it is running on and the state of the network it is connected to. It has an alternative ready if any of these aspects fail. Session data is replicated, data storage is done redundantly and your network has multiple paths. If your monitoring decides something went wrong, an alternative should take over with no visible interruption for the end user. You don't have to wait until your application is restarted on the other side of the network, until your virtual machine is rebooted, your filesystems are rechecked and your database has recovered. No, it happens right away.
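As a rough illustration of that idea, the sketch below checks the application, the server and the network on an active node and switches to a standby that is already running as soon as any one of them fails. It is a minimal sketch, not how any particular HA product works: the `Node` class, the `choose_active` function and the probe callables are all hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Node:
    name: str
    # Aspect name ("application", "server", "network") -> health probe.
    checks: Dict[str, Callable[[], bool]]

    def failed_aspects(self) -> List[str]:
        """Names of all aspects whose probe currently reports a failure."""
        return [aspect for aspect, probe in self.checks.items() if not probe()]


def choose_active(active: Node, standby: Node) -> Node:
    """Return the node that should serve traffic right now.

    The standby is assumed to be running already, with replicated session
    data, so taking over does not involve a reboot or a recovery phase.
    """
    if not active.failed_aspects():
        return active
    if not standby.failed_aspects():
        return standby
    raise RuntimeError("no healthy node available")


if __name__ == "__main__":
    healthy = {"application": lambda: True, "server": lambda: True,
               "network": lambda: True}
    broken_app = dict(healthy, application=lambda: False)
    node_a = Node("a", broken_app)
    node_b = Node("b", healthy)
    print("serving from", choose_active(node_a, node_b).name)
```

The point of the sketch is that a crashed application on an otherwise healthy host triggers the switch just as a dead host would, which is exactly the case live migration cannot cover.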

But virtual machine replication as an alternative to HA? I'd call that wishful thinking and vapourware today.


#1 Kris Buytaert : Identical does not exist

Imagine two identical servers, installed from identical media, on which you create an identical user. This identical user then runs ssh-keygen on each of them.

What's the chance that the keys these identical users generate are identical?
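A small Python sketch of the same thought experiment. It does not call ssh-keygen; `keygen_with_local_entropy` and `keygen_with_shared_seed` are hypothetical stand-ins that only produce 256 random bits, which is enough to show where two "identical" machines diverge.

```python
import random
import secrets


def keygen_with_local_entropy() -> bytes:
    """Stand-in for key generation on a real machine.

    Each call draws from the operating system's entropy pool, so two
    otherwise identical machines get different results.
    """
    return secrets.token_bytes(32)


def keygen_with_shared_seed(seed: int) -> int:
    """Stand-in for a replica whose randomness is fed in from outside.

    Only for illustration: the random module is not suitable for real keys.
    """
    return random.Random(seed).getrandbits(256)


if __name__ == "__main__":
    # Two "identical" servers, each using its own entropy.
    print(keygen_with_local_entropy() == keygen_with_local_entropy())
    # Two replicas that are handed the same seed as an input.
    print(keygen_with_shared_seed(42) == keygen_with_shared_seed(42))
```

The two machines only stay identical when the randomness itself is treated as something that has to be copied across, which is one more stream of state to replicate.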

#2 PJ : I can't help but wonder if

I can't help but wonder if "replication" could be done not by copying state, but by starting out with identical state and copying just the inputs over the wire. So identical VMs start up, the user drives the primary, and the user's interaction with it is replicated to the secondary 'hot spare'. If the primary dies, the secondary should be right there in the same state, just waiting to be made primary. It's like transferring just the deltas, but in this kind of limited case the only delta is what the user is doing. Note that yes, this clearly falls down if the apps in question deal with the outside world much: those data streams would have to be replicated too. Multicast maybe?
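A toy Python sketch of that idea, assuming the "VM" is a fully deterministic state machine whose only inputs are the events we forward. The `Replica` and `ReplicatedPair` classes are hypothetical, and the forwarding is a plain method call where a real setup would need a network and an ordering guarantee.

```python
class Replica:
    """A deterministic state machine: same inputs in the same order, same state."""

    def __init__(self) -> None:
        self.state = {}
        self.applied = 0  # number of input events applied so far

    def apply(self, event) -> None:
        key, value = event
        self.state[key] = value
        self.applied += 1


class ReplicatedPair:
    """Primary plus hot spare; only the inputs are copied, never the state."""

    def __init__(self) -> None:
        self.primary = Replica()
        self.secondary = Replica()

    def handle(self, event) -> None:
        self.primary.apply(event)
        # In a real setup this would go over the wire to the other host.
        self.secondary.apply(event)

    def fail_over(self) -> Replica:
        """Promote the hot spare after the primary has died."""
        self.primary = self.secondary
        return self.primary


if __name__ == "__main__":
    pair = ReplicatedPair()
    for event in [("cursor", (10, 20)), ("text", "hello"), ("cursor", (11, 20))]:
        pair.handle(event)
    promoted = pair.fail_over()
    print(promoted.state, promoted.applied)
```

This only works as long as nothing else influences the state: timers, interrupts, random numbers and network traffic would all have to be captured and forwarded as inputs too.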

As a completely random aside, I'm really sad that openMosix has mostly died. I think there's still room out there for good SSI cluster technology.