Jul 27 2008

OLS Day 4

H. Peter Anvin started off with a long history of IBM booting in order to get to network booting: how we went from ROMs placed on network cards up to PXE.
He explained how he went from SYSLINUX to PXELINUX, adding ISOLINUX etc.

Next to his work there was another boot project, the Net Boot Image Proposal, originally from NetBSD, which aimed at creating a ROM image. It then became Etherboot, which just in 2008 became gPXE, which is now aiming at full PXE support. Initially these were 2 different projects with a totally different approach.

However some NICs were broken and PXELINUX didn't solve those problems, but gPXE could. So the PXELINUX and gPXE folks met up, which eventually led to a joint effort. Syslinux will get a Lua scripting interface where you can boot a certain kernel based on hardware specification.

Next was David Lutterkort's talk about Augeas .. luckily he didn't spoil it over dinner yesterday ;-) Very interesting stuff .. surely something that's on my todo list.

Next was a talk on Vesper (Virtual Embraced Space Prober).
When clustering virtual machines there are different possibilities. What does Heartbeat already do to cluster virtual machines? Heartbeat just uses a VM as a resource. A heartbeat is too "slow" to monitor its actual resources ..

Their solution is to probe the virtual machine in an event-driven way, so immediate failure detection is possible.
It also gives you a better way to debug the failure.
Vesper is a framework to handle Kprobes in a virtual environment.
Sungho Kim pointed us to the CIMHA project, which aims to create an integrated suite to provision and manage HA clusters in virtual environments. It will also constantly analyze the kernel/hardware health of cluster nodes by using VESPER. Not sure if I'll put that on my todo list, however.

What is bringup, you'd ask? According to Tim Hockin it is the process of making a new piece of hardware boot up YOUR OS.
He also told us about Drummond's law .. which teaches us that when you ship to a customer something will fail, and when you start debugging it goes away ..
To circumvent a lot of problems they have been developing a set of tools, starting off with SGABios and iotools. He then went on to show how he digs into the CPU and PCI information using their new ppfs.
His project is PrettyPrint.

Then off to the closing session of this last Ottawa Linux Symposium. This will be the last OLS for a while as they are tearing down the conference center .. so the Linux Symposium will move to Montreal next year ..

We're off to the Black Thorn Cafe for the final closing event now

Jul 26 2008

OLS Day 3

As mentioned earlier, Day 3 of the 2008 Ottawa Linux Symposium started off with our own talk about the different monitoring tools available on this planet.

I planned on heading to the SynergyFS talk, but phone calls and the hallway track, with interesting discussions on configuration management, the use of live migration and obviously open source monitoring tools and the scaling thereof, got in the way of that ..

So the OpenVZ live migration talk came next ... Andrey also spoke at the Virtualization Mini summit and luckily he went a bit deeper into their live migration strategy .. however, as someone noted .. why spend so much time on developing something that has already existed for ages ... so yes .. I guess I'll have to make that wheel T-shirt one day ...

After lunch I headed into the Performance Inspector talk. Very interesting to see a tool that might help to debug Java code and correlate it to platform stuff .. I however need to figure out if it would actually fit our environments or if it requires too many other dependencies.

Next was the Live Migration with Pass Through Device for
They propose to use guest PCI hot removal and hot add so that the guest can be migrated.

The topic has been discussed already a couple of times here in Ottawa (and in Boston at the Xen summit)
What puzzled me was the bonding mode used... I'm used to using miimon and thus the linkstate of my physical interfaces to decide between active and backup, but the arp_interval is another way to test if there is traffic on the (virtual) network interface and decide which one should be active.
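
To make the difference between the two monitoring styles concrete, here is a minimal toy model in Python. It is only a sketch under assumptions: the `Slave` class, the function names and the numbers are made up for illustration, and the real bonding driver's ARP monitor is more involved than a single timeout check. The point it shows is that a virtual interface can report its link as up while no traffic flows, which link-state monitoring cannot see.

```python
from dataclasses import dataclass


@dataclass
class Slave:
    """Hypothetical view of one slave interface of a bond."""
    name: str
    carrier_up: bool        # what miimon-style link monitoring looks at
    ms_since_last_rx: int   # simplified stand-in for what ARP monitoring looks at


def pick_active_mii(slaves):
    """Link-state monitoring: the first slave with carrier wins."""
    for slave in slaves:
        if slave.carrier_up:
            return slave.name
    return None


def pick_active_arp(slaves, arp_interval_ms):
    """Traffic-based monitoring: the first slave that recently saw traffic wins.

    Simplification: a slave counts as alive if it received something within
    one arp_interval; this is an assumption, not the driver's exact rule.
    """
    for slave in slaves:
        if slave.ms_since_last_rx <= arp_interval_ms:
            return slave.name
    return None


# eth0 has link but has been silent for 5 seconds, eth1 is passing traffic.
slaves = [Slave("eth0", True, 5000), Slave("eth1", True, 200)]
print(pick_active_mii(slaves))        # link state alone keeps eth0 active
print(pick_active_arp(slaves, 1000))  # traffic check fails over to eth1
```

In this toy case the link-state check keeps the silent interface active, while the traffic-based check moves to the interface that is actually receiving.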

The Auditing the Edgy and Complicated talk was a lot of fun.. it reminded me a lot of the time when I was more involved in the security area.. the stories from the trenches haven't really changed .. and it's still a pretty insecure environment out there which we should work on more often.

No pictures from space this time (as at LCA 2005), but an outreach to the community to synchronise

The blogosphere already had its say about this a couple of months ago .. so now the kernel community can discuss it.

We're going to skip the Whiskey thingie .. I know a couple of people that will have heavy heads tomorrow morning .. but they are used to it .. we're off to get some decent food.. and probably some drinks ..

Jul 25 2008

OLS Day 2

Where does kernel documentation hide ?

It hides in all kinds of different places: in /usr/src/linux/Documentation, which isn't on the web, in papers, in blogs, in Google Video, even in Flash files .. so Google won't find them, or at least not all of them.
Rob Landley had a 6 month fellowship with the Linux Foundation focusing on cleaning up the documentation ...

The result is, now if we could only turn it into a wiki ..

After Rob's talk I went into Measuring database performance with NFSv4 .. sadly this Netapp focussed talk covered a proprietary database that I don't care about. The speaker claimed that his proprietary vendor prefers NFS to store its data .. if that's the case I wonder why they are focussing on their own cluster filesystem (OCFS2).
Maybe he should have focussed on an open database .. Obviously Netapp is in the business of selling their storage and not cluster filesystems, but I find it hard to believe .... Also, the fact that they only talked about client optimization because the server side is a proprietary device wasn't really helping their case.

Obviously I went to the Virtualization of Linux Servers: a comparative study talk, which was standing room only, but I'll save my thoughts on that one for a separate post.

Just one thought here .. there is a new benchmark in town, namely the number of kernels you can build per hour :)

Next talk was on the Corosync Cluster Engine. isn't up yet and the source code is about to be released, but it aims at becoming the common cluster infrastructure that can be used by different cluster tools. It will be used as a backend for Pacemaker and is already used by Red Hat folks ..

Werner gave an interesting talk on the building of the OpenMoko NEO, during which "nobody" said FSO is Android done right .. FSO being Werner described the traditional problems a young engineering company has to go through in order to get from prototype to actual mass production.

So over the past couple of days I ran into people who have much more time to travel and go to events than I do, and who have been to both the Xen summit and the KVM Forum. Interesting to know that the KVM Forum had about 50-60 people and the Xen summit about 100 people. Now these are developer summits .. not end user conferences like with other technologies.

After dinner there was a BOF about Direct Function assignment benefits by folks from Neterion

The advantages are that it eliminates the overhead of network virtualization,
allows support for a large number of guests with a compact number of ports,
and allows using native OS drivers as is ..

The idea of using bonding of a physically PCI bridged device with a virtual NIC sounds really tempting to enable migration of Virtual machines with a hardware dependency , however I still have some thoughts on that.

Jul 25 2008

OLS 2008 Presentation

So Tom and I just finished our Systems Monitoring Shootout talk here at OLS 2008.

The talk was fairly well attended and generated a lot of hallway chatter afterwards. (We almost ran out of time so we took the Q&A into the hallway so the next speaker could start his talk.)

I've placed the presentation online already for your viewing pleasure ..

The Vote for your favourite monitoring tool is still open so please vote !

While here in Ottawa we got news that our talk was also selected for the upcoming Nagios conference in Germany in September.. so Tom will be presenting it there again.
Most probably with even more findings !

Anyway .. back to the conference now .. trying to catch up with my other writings :)

Jul 24 2008

OLS Day 1

So Ottawa Linux Symposium, at last .. OLS has been in my planning for ages .. but finally this year I made it ..
Compared to or , OLS is obviously one of the bigger conferences around. With approximately 600 visitors it might even be the biggest.

First on the schedule was Matthew Wilcox's keynote, titled The Kernel Report.

Matthew asked some interesting questions, such as
What would happen if we ran the Mindcraft tests again now on the same hardware.. some Linux 2.6 version against Vista .. who would win ?
A reply from the audience obviously was .. Vista probably wouldn't run ..

He gave a good overview of new features in the kernel that need my attention, such as pluggable congestion control, UBIFS and SMACK.

After the state of the kernel keynote I went to the Cloud computing talk from Gerrit Huizenga. A bit disappointing, as he didn't really talk about potential implementations and just tried to define it ..
Obviously Gerrit works for Big Blue, as after 3 minutes I lost count of how many times he mentioned the 3 letter word.

Probably good for people that are new to it.. not for people that already build these kinda me ..
But still fairly well presented.

Then into Arnaldo "Caipirinia" Carvalho de Melo's talk, If I turn this knob .. what happens.

An interesting talk to get deeper into debugging scheduling , cpu affinity and realtime related issues.

Arnaldo mentioned tools such as python-schedutils, python-linux-procfs, ait, tuna and tuneit. Lots of work to be done here :)
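
As a small taste of what turning one of these knobs looks like, here is a minimal sketch of CPU affinity handling. Note the assumptions: it uses the `os` module from the Python standard library (Linux only) instead of the tools from the talk, whose APIs I'm not reproducing here, and `run_pinned` is a made-up helper name.

```python
import os


def run_pinned(func, cpu=None, pid=0):
    """Run func() with the process pinned to a single CPU, then restore.

    pid=0 means the calling process. If no CPU is given, the lowest
    numbered CPU from the currently allowed set is used.
    """
    original = os.sched_getaffinity(pid)
    target = min(original) if cpu is None else cpu
    os.sched_setaffinity(pid, {target})
    try:
        return func()
    finally:
        # Always restore the original mask, even if func() raises.
        os.sched_setaffinity(pid, original)


# Report the affinity mask as seen from inside the pinned section.
seen = run_pinned(lambda: os.sched_getaffinity(0))
print("pinned to:", seen)
print("restored to:", os.sched_getaffinity(0))
```

Running a benchmark once pinned and once unpinned is a cheap way to see whether scheduler migrations matter for a given workload.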

After lunch I was late for the Second Arches workshop about Fedora on different architectures, but I was in time for the talk about Virtualization Workloads by Andrew Theurer

The big question, which some thesis students I know have also run into, is how do you define relevant workloads for virtualization.
Initially one just takes the oldschool test to figure out the overhead that the Hypervisor adds.
Then you try to scale up the number of virtual machines and figure out what happens.
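
The two steps above boil down to some simple arithmetic. Here is a minimal sketch, with made-up function names and purely hypothetical scores (they are not measurements from the talk), assuming the benchmark reports a "higher is better" throughput number.

```python
def overhead_pct(native_score, guest_score):
    """Throughput lost by a single guest compared to bare metal, in percent."""
    return (native_score - guest_score) / native_score * 100.0


def scaling_efficiency(native_score, guest_scores):
    """Combined throughput of all guests as a fraction of the native score."""
    return sum(guest_scores) / native_score


# Hypothetical numbers: bare metal scores 1000, one guest alone scores 920.
print(overhead_pct(1000, 920))

# Hypothetical numbers: three guests running side by side on the same host.
print(scaling_efficiency(1000, [300, 300, 300]))
```

The first number tells you what the hypervisor costs a single machine, the second how much of the hardware is still doing useful work once you start stacking guests.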

So there are a couple of workloads around
VMmark, created by VMware .. it requires Exchange etc. .. runs 6 different benchmarks: web, mail, database, fileserver, a Java spec that requires a BEA JVM, and an idle server.

Also vConsolidate, created by Intel, requires proprietary tools. It has a very similar approach to VMmark, only running just 5 tests.

So one of the problems is that one can't just build an image that you can reuse to run on your environment, mainly due to licensing issues on most of these platforms.

With that in mind there is work ongoing at SPEC, which should be finished in Q1 2009, to create a standardized virtualization benchmark.

However that's still not going to solve the fact that one can't just take a prebuilt image and run that on one's test setup.

After the Workload talk I went to the talk about Korset, an Automated Zero False Alarm IDS. It basically tracks the regular behaviour of certain binaries and finds out when it deviates.
We ran into the Korset guy already at the Speaker boat trip .. one of the questions I had was why kill the offending process: you might want to keep track of what's happening and alert someone .. but you might not want to kill it off by default.
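
To illustrate the idea, and my question about it, here is a toy sketch. It is not Korset's actual implementation: the table of allowed transitions, the call names and the `kill_on_deviation` switch are all made up for illustration. It only shows the general technique of checking a trace of calls against a model of expected behaviour, with alerting as the default response instead of killing.

```python
# Hypothetical model: which call may follow which for one particular binary.
ALLOWED = {
    "start": {"open"},
    "open": {"read", "close"},
    "read": {"read", "write", "close"},
    "write": {"read", "write", "close"},
    "close": {"open", "exit"},
}


def first_deviation(trace, allowed=ALLOWED, start="start"):
    """Return (index, previous_call, offending_call) or None if the trace fits."""
    state = start
    for index, call in enumerate(trace):
        if call not in allowed.get(state, set()):
            return index, state, call
        state = call
    return None


def respond(trace, kill_on_deviation=False):
    """Decide what to do with a trace: 'ok', 'alert' or 'kill'."""
    if first_deviation(trace) is None:
        return "ok"
    return "kill" if kill_on_deviation else "alert"


print(respond(["open", "read", "close", "exit"]))
print(first_deviation(["open", "read", "execve"]))
print(respond(["open", "read", "execve"]))
```

With the default setting a deviating process is only reported, which leaves room to watch what it does next before deciding to kill it.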