Dec 19 2008

MySQL 5.1 on Fedora 10

So I wanted a fresh MySQL 5.1 release on my Fedora 10 box.
Fedora comes with a mysql 5.0 package and the MySQL community packages for 5.1 conflict with that in different ways. (to start with MySQL vs mysql)

For some weird reason libmysqlclient (in mysql-libs) is required by redland, which is required by I probaly never need the RDF funtionalities that redland provides me but I`m stuck with them anyhow.

So Fred pointed me to

Remi repo for mysql 5.1

  1. yum --enablerepo remi install mysql-server

Now gives me a very nice MySQL 5.1 to play around with.

Dec 18 2008

Free Beer, Meetup today has a lot of active open source users and contributors.
The weird thing is that we never meet apart from Fosdem.
Yes, we occasionally run into eachother at other events but there's not enough beer involved.

I've had different people ask me what and where are the Open Source gatherings in Belgium, and we must admit that apart
from Fosdem there isn't that much in our little country.

There were a couple of MySQL User Group events, some Drupal ones, some LUG had meetings altough I have the idea most of them have dried out :(

So I have this crazy idea of inviting you all to grab a beer, maybe even free beer :) on december 29 some in a pub in Antwerp

I`m open for suggestions on good locations.

Oh and everybody is welcome, both readers and writers :)

PS. Yes I know that some of you will be drinking in Berlin at that time .. we'll drink an extra one for you.

Dec 07 2008

CCK & Views, My First Experiences

and probably my last one ..

I mentioned a couple of weeks ago that I was looking into a way of converting my static page with published papers, articles and presentations in a more dynamic page where I could create RSS feeds from the updates and feed them into another site for everybody to use.

Some people suggested that I'd have a look at CCK & Views for this. While up till now when creating an app I had usually written my own Drupal module with my own database schema and manually written SQL Queries. This indeed looked like the perfect opportunity to dig into the CCK and Views thingie.

Now I must admit that I`m not really fond of "generated queries"

I've had nightmares before when having to debug 2Mb Hibernate queries that could be reduced to a single select value from table where query but Hibernate used a zillion join on tables I didn't need.
So I was a bit afraid of what Views would do ..

So here's what happened, I created 2 pages .. presentations. and publications. Personally I'd go for 1 table each with some smart indexes and then the only thing you want is SELECT * FROM table ORDER BY date

Now if I'd create a special content type for this I probably get some more benefits.
And I could get the data from the tables with a rather simple query like this :

  1. SELECT node.nid,
  2. content_type_presentation.field_datum_value,
  3. content_type_presentation.field_location_value,
  4. content_type_presentation.field_country_value ,
  5. content_type_presentation.field_event_link_title,
  6. content_type_presentation.field_event_link_url
  7. FRON content_type_presentation, node
  8. WHERE node.type="presentation" AND node.nid = content_type_presentation.nid
  9. ORDER BY content_type_presentation.field_datum_value desc;

However when using Views , this is what is being created.

  1. SELECT DISTINCT(node.nid) AS nid,
  2. node_data_field_country.field_datum_value AS node_data_field_country_field_datum_value,
  3. node_data_field_country.nid AS node_data_field_country_nid,
  4. node.type AS node_type,
  5. node_data_field_country.field_event_link_url AS node_data_field_country_field_event_link_url,
  6. node_data_field_country.field_event_link_title AS node_data_field_country_field_event_link_title,
  7. node_data_field_country.field_event_link_attributes AS node_data_field_country_field_event_link_attributes,
  8. node_data_field_country.field_location_value AS node_data_field_country_field_location_value,
  9. node_data_field_country.field_country_value AS node_data_field_country_field_country_value,
  10. node_data_field_link.field_link_url AS node_data_field_link_field_link_url,
  11. node_data_field_link.field_link_title AS node_data_field_link_field_link_title,
  12. node_data_field_link.field_link_attributes AS node_data_field_link_field_link_attributes,
  13. node_data_field_link.nid AS node_data_field_link_nid
  14. FROM node node
  15. LEFT JOIN content_type_presentation node_data_field_country ON node.vid = node_data_field_country.vid
  16. LEFT JOIN content_field_link node_data_field_link ON node.vid = node_data_field_link.vid
  17. WHERE UPPER(node_data_field_country.field_country_value) != UPPER('')
  18. ORDER BY node_data_field_country_field_datum_value DESC

So there's some joins, a zillion more fields that I don't really need being fetched from my database.
The explain also tells me some interesting things.. like the Using temporary AND Using Filesort

  1. | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
  2. +----+-------------+-------------------------+--------+---------------+---------+---------+-------------------------------------------+------+----------------------------------------------+
  3. | 1 | SIMPLE | node_data_field_country | ALL | PRIMARY | NULL | NULL | NULL | 33 | Using where; Using temporary; Using filesort |
  4. | 1 | SIMPLE | node | eq_ref |vid | vid | 4 |buytaert_blog.node_data_field_country.vid | 1 | |
  5. | 1 | SIMPLE | node_data_field_link | eq_ref | PRIMARY | PRIMARY | 4 | buytaert_blog.node.vid | 1 | |
  6. +----+-------------+-------------------------+--------+---------------+---------+---------+-------------------------------------------+------+----------------------------------------------+

Now my simpler query uses the same amount of rows to return the result but doesn't build a temporary table.

  1. +----+-------------+---------------------------+------+-----------------------+------+---------+---------------------------------------------+------+----------------+
  2. | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
  3. +----+-------------+---------------------------+------+-----------------------+------+---------+---------------------------------------------+------+----------------+
  4. | 1 | SIMPLE | content_type_presentation | ALL | nid | NULL | NULL | NULL | 33 | Using filesort |
  5. | 1 | SIMPLE | node | ref | PRIMARY,node_type,nid | nid | 4 | buytaert_blog.content_type_presentation.nid | 1 | Using where |
  6. +----+-------------+---------------------------+------+-----------------------+------+---------+---------------------------------------------+------+----------------+

In this small example there isn't much difference in performance to be noticed, but with bigger tables you will see other results. However I still prefer my personal query approach.

The other thing that bugs me is the excessive use of aliases. The MySQL internal experts can shed a light on the performance impact of using them, but if you are running queries remotely you at least have an increase in bandwidth usage. Not a problem now, but who knows later when you are big and famous :)

Now Drupal seems to do it pretty ok .. it's aliasing table.field to table_field which still makes sense. Not unlike some other tools that alias table.field to dynamically_generated_unreproducable_unreadable_key_word_with_no_relevant_meaning , try debugging that for a change. And yes I see valuable use for resultset aliasing, but I also se it is often overused.

But the short summary ... given my background I prefer writing the module and crafting my own queries over having Views create
a query for me.

Dec 06 2008

The Eclipse of Sun

If you've been reading this blog for a while you'll know that
for the past decade I've had a love-hate relation ship with Sun.

Sun has done a lot of good things in their open source journey such as MySQL, openOffice , Virtualbox and opensourcing Java. But on the other hand they had troubles with creating really organic opensource.

I've always wondered why a company that bought StarDivision because it was cheaper than buying the MS Office license keeps wasting money on building it's own operating systems, albeit in a pretty open model, but with zero to no relevant community.

Yes I've ranted on their marketing boys and girls when they polluted forums I liked. But no I don't want this to happen. (Slicing up Sun)

However I`m in fear it actually might happen, the funny thing is that I already mentioned the same idea in offline conversations during the last couple of months.

My Advice to Sun. : Stop putting money in things you can co-create with other organisations, if there's only 2 features in your product that count for some people, contribute those 2 features to something bigger.
(That's D-Trace and ZFS and the Linux Kernel I`m talking about in case you didn't realise) Why keep throwing money down the drain ?

Also let Google, Percona and others contribute to MySQL, don't keep it as closed as it is today. As after al it's more Free software than it is Open Source, but it has a great community which I love to be part of .. and I hope for a long time to come..

I`m going to try to keep my mouth shut on this for a while.. as I think Jonathan is listening anyhow. If he was, he would have known that at some point in time when you step out of the shower and there is too much hair dropping from your skull every morning, it's time to cut off the ponytail.. , not opensource it. I did that years ago.. and it's growing much thicker again now.

My hair that is ..

Dec 01 2008

Tab Dump , Drupal Performance

Have to close some tabs .. what more can I say ..

2 interresting articles about Drupal and MySQL performance tuning

here and here


Nov 18 2008

Do we want an Open Source MySQL Monitoring tool ?

Matt Reid wants to know what we want in an Open Source MySQL monitoring solution ?

He is working on the second incarnation of Monolith and wants input from the MySQL community.

Now for me the bigger question is if we want an isolated tool that runs stand alone, or a tool which we can integrate it in something we already have.

To me there is a difference between a tool that I want to use to debug my environment, such as Mytop or MySQL Activity Report, in that case I need some tool that quickly installs with little dependencies and little impact.

On the other side I want a tool that is constantly there, that tells me about trends and performance history. But there I don't want an isolated toool, I want something fully integrated where I can correlate different measurements from disk io, memory usage etc , that tool should also tell me about the things that go

We did some research earlier this year to figure out the current state of Open Source monitoring tools. Different tools have a different audience.. some go for the network layer, others take the os level and other even try to go deep inside the applications.

Given that knowledge we even had the idea to refocus that research comparing different monitoring tools such as Zabbix, Zenoss, Hyperic and Nagios again but this time with a focus on monitoring MySQL and submit that as an abstract for the upcoming MySQL conference, we didn't .. maybe next time.

There's plenty of frameworks already that will allow you to send alerts on all of the occasions you list, or allow you to graph all the values you want. And yes we want to see those values too.

But do we want yet another tool , yet another URL to browse to or do we want those alerts and graphs integrated in an existing tool such as Zabbix, Zenoss or
Hyperic .. I guess I prefer the integrated approach.

Nov 01 2008

The Want to be Social Network

Kaj wrote about using Dopplr to keep track of where colegues and friends are traveling and finding out about accidental meetups. (I even use it to track where one of the Inuits collegues work plans)
Now I have never met Giuseppe , but we shared slides before so I'd love to meet him one day .. connecting to him via Dopplr gives me that opportunity .. who knows one day Dopplr will tell me we have matching travel schedules.

Earlier this week LinkedIn announced Tripit as one of their first integrated applications, I was a bit dissapointed I use Tripit to let Dopplr learn about my trips , but I really prefer Dopplr over Tripit.

Now there are 2 things I would like to acoomplish with these kind of tools.
First of all I don't mind giving all of my LinkedIn contacts, so my whole addresbook, access to my upcoming travel plans so I can meet up with them again if our plans happen to match. On the other hand I use LinkedIn as my addressbook, so it only contains people I've met, or collaborated with (sometimes even over a decade ago) . And there are a bunch of people out there that I haven't met yet but that I'd love to meet one day and buy some beers or even dinner/lunch. I'd love to connect to them on Dopplr to make that happen but I don't want to have them connected on LinkedIn (yet).

So when Dopplr integrates with LinkedIn I hope they think about this kind of scenario ..

Oh and btw.. Xaprb rocks !

Oct 31 2008

How Sun Will be rescued

I probably ranted a bit too much on the marketing push that Sun has been trying to make into the Open Source community.
The economical situation isn't really perfect so Sun does deserve some credit too.

Yesterday Techcrunch published an exclusive interview with Jonathan Schwartz on the future of Sun and how Sun will be rescued .

[YouTube Movie Embedded]

More details are here

Oct 28 2008

Virtualizing MySQL , are you stupid ?

or timebound ? or don't you have any load on your DB at all ?

I personally don't see many reasons to virtualize your database, apart from the , we plan to start small and scale out, or the we need it now and we don't have the hardware yet , putting your database on a virtual platform where you have to share resources with other virtual machines doesn't really sound like a tempting proposition to me. Small, almost idle databases , maybe. But enterprise production level databases no thnx.

Sheeri Cabral also mentions the above reasons .. and there also .. Enterprise Production use isn't listed.

Databases typically require a good amount of memory , and steady disk access.
So if you are in a production environment with a fairly loaded database, would you want a 4Gb machine with full direct memory access, Or 3.5Gb of virtual memory that can be ballooned to 3 if underused. My pick is at the 4Gb real memory.

The original article at Sun argues the use of Virtual Harddisk to move around workloads between different servers or even Virtualization platforms. But it fails to describe the guaranteed performance penalty of not using raw disks but a filesystem on top of a loopback device. How many layers do you want before actually write to the disks. Good practice in a virtual environment is to dedicate full disks or LVM parts to the virtual machine hence lowering the overhead, but most (default) setups do the opposite.

And don't get me started about the myth of using virtualization for high-availability :)

Now can somebody please remove all the clueless marketing people from , thnx. (they can be identified with by a source and posts that mainly talk about Sun products including only a slight hint to MySQL)

(PS. What's a Market Development Engineer's job description anyhow ? , that's just a different name for Marketing Assistant right ?)

Oct 27 2008

T-Dose 2008 is over

It was fun, the good part about T-Dose is that is small enough to actually be able to speak with everybody you want to .. well almost .. there were still some people around I wanted to talk to but I didn't get the chance to . Specially Ber Kessels and Roy Scholten who filled in the gaps in the Drupal track. After my own talk I had to run to the other track so I could answer the tricky questions in our other talk about Open Source Monitoring Tools. And I never really made it back to the Drupal room. So Ber, Roy , next time you run into me I`ll buy you a Beer !

Anyway Pics are up (so Geert now finally has pictures of himself on stage)

Social event pic is also up ..

Slides (Drupal/ Virtualization) were already up

Somehow I had problems seeing al the sessions I wanted to see this year. lots of interesting things happening at the same time and therefore forcing me to choose for specific sessions. JP and Jeroen already announced they will be there again next year .. I just hope to have a better planned Drupal track then ...

T-Dose kind of concluded my current scheduled list of talks , I`m looking for new interesting conference venues to visit .. specially in southern Europe .. so if anybody has ideas :)