Everything is a Freaking DNS problem - monitoringsucks http://127.0.0.1:8080/blog/taxonomy/term/1469/0 en Love, MonitoringLove http://127.0.0.1:8080/blog/love-monitoringlove <p>Last year we were pretty negative about monitoring. We shouted out that MonitoringSucked ... A year has passed and a lot has changed ... most importantly our newfound love for monitoring, thanks to an inspirational Ignite talk by <a href="https://twitter.com/ulfmansson" rel="nofollow">Ulf Mansson</a> at devopsdays Rome.</p> <p>Right after Fosdem about 20 people showed up at the #monitoringlove hacksessions hosted at the <a href="http://www.inuits.eu/" rel="nofollow">Inuits.eu</a> offices to work on Open Source monitoring projects and exchange ideas. Some were completely new, others already had a lot of experience.</p> <p>Amongst the projects that were worked on: Maciej was packaging Graphite for Debian, other people were fixing bugs in Puppet, and I spent some time with a <a href="https://github.com/krisbuytaert/vagrant-sensu" rel="nofollow">vagrant box</a> to deploy Sensu using Puppet. The last time I played with Sensu was on the flight back from PuppetConf. I gave up the fight with<br /> RabbitMQ and SSL because I had no internet connection .. and now Ulf just pointed out that I could disable SSL altogether, which resulted in having a POC up and running in no time.</p> <p>Patrick was hacking on the Chef counterpart of the vagrant-puppet Sensu setup, a part of <a href="https://github.com/monigusto" rel="nofollow">#monigusto</a>. Ulf Mansson was getting <a href="http://shopify.github.com/dashing/" rel="nofollow">dashing</a> to display on a Raspberry Pi ... 
pretty cool stuff<br /> And Jelle Smet was working on <a href="https://github.com/smetj/pyseps" rel="nofollow">Pyseps</a>, a Python based Simple Event Processing Server framework that consumes JSON docs from RabbitMQ and forwards them in real time to other queues using MongoDB query syntax.</p> <p>One of the more interesting discussions was around the topic of alerting: modeling business rules and input from a lot of different sources<br /> in order to send the right alerts to the right people. </p> <p>We explored different ideas, like using BPM tools such as Activiti or rules engines like Ruby Rools. There are some SaaS providers that try to solve this need, like PagerDuty and friends, but obviously there is still a lot of work to be done in order to create a viable alerting system based on different input sources.</p> <p>The monitoring problem is not solved yet .. and it will stay around for a couple of years .. but with the advent of events such as <a href="http://monitorama.com/" rel="nofollow">Monitorama</a> it's clear<br /> that an event like our #monitoringlove hacksessions is needed .. and is probably here to stay for a couple of years.</p> http://127.0.0.1:8080/blog/love-monitoringlove#comments devops infracoders monitoringlove monitoringsucks puppet sensu Wed, 13 Feb 2013 17:10:32 +0000 Kris Buytaert 1077 at http://127.0.0.1:8080/blog check_graphite http://127.0.0.1:8080/blog/checkgraphite <p>During my Puppetcamp Gent talk last week, I explained how to get alerts based on trends from Graphite. A number of people asked me how to do that.</p> <p>First let's quickly explain why you might want to do that.<br /> Sometimes you don't care about the current value of a metric .. as an example take a queueing system .. 
there is no problem if messages are added to the queue, not even if there are a lot of messages on the queue. There might however be a problem if the number of messages on a queue stays too high over a certain period.</p> <p>In this example I'm monitoring the queue length of a HornetQ setup, which is exposed by JMX.<br /> On the server running HornetQ I have an exported resource that tells the JMXTrans server to send the MessageCount to Graphite<br /> (you could also do this using collectd plugins) </p> <p><div class="geshifilter"><pre class="text geshifilter-text" style="font-family:monospace;">@@jmxtrans::graphite {&quot;MessageCountMonitor-${::fqdn}&quot;:
 jmxhost =&gt; hiera('hornetqserver'),
 jmxport =&gt; &quot;5446&quot;,
 objtype =&gt; 'org.hornetq:type=Queue,*',
 attributes =&gt; '&quot;MessageCount&quot;,&quot;MessagesAdded&quot;,&quot;ConsumerCount&quot;',
 resultalias =&gt; &quot;hornetq&quot;,
 typenames =&gt; &quot;name&quot;,
 graphitehost =&gt; hiera('graphite'),
 graphiteport =&gt; &quot;2003&quot;,
}</pre></div></p> <p>This gives me a predictable URL on which I can get the Graphite view. </p> <p>The next step then is to configure a Nagios check that verifies this data. For that I use the <a href="http://github.com/datacratic/check_graphite" rel="nofollow">check_graphite</a> plugin from Datacratic ..</p> <p>It can work with an nrpe config like<br /> <div class="geshifilter"><pre class="text geshifilter-text" style="font-family:monospace;">### File managed with puppet ###
### Served by: '&lt;%= scope.lookupvar('::servername') %&gt;'
### Module: '&lt;%= scope.to_hash['module_name'] %&gt;'
### Template source: '&lt;%= template_source %&gt;'

command[check_hornetq]=/usr/lib64/nagios/plugins/check_graphite -u &quot;http://&lt;%= graphitehost%&gt;/render?target=servers.&lt;%= scope.lookupvar('::fqdn').gsub(/\./,'_')%&gt;_5446.hornetq.docstore_private_trigger_notification.MessageCount&amp;from=-30minutes&amp;rawData=true&quot; -w 2000 -c 20000</pre></div></p> <p>I define this check on the host where HornetQ is running, as it will then map to that host in Icinga/Nagios rather than throw a host error on an unrelated host.</p> http://127.0.0.1:8080/blog/checkgraphite#comments graphite icinga monitoringlove monitoringsucks puppet Tue, 05 Feb 2013 09:10:15 +0000 Kris Buytaert 1076 at http://127.0.0.1:8080/blog #monitoringlove hackfest http://127.0.0.1:8080/blog/monitoringlove-hackfest <p>The age of #monitoringsucks is over, we're now transitioning into a #monitoringlove period. </p> <p>That however doesn't mean all the work is done. We still need to do a lot of work, and a lot of people are working on a lot of stuff.</p> <p>Therefore, like last year, we are opening up our offices again right after Fosdem for a #monitoringlove hackfest.</p> <p>That's right, on February 4 and 5 a bunch of people interested in fixing the problem will be meeting, discussing and hacking stuff together in Antwerp. In short, a #monitoringlove hackathon.</p> <p>Inuits is opening up their offices for everybody who wants to join the effort. Please let us (@KrisBuytaert) know if you want to join us in Antwerp. We'll provide caffeine, wireless, chairs and some snacks. 
</p> <p>Please register upfront at: <a href="http://monitoringlove2013.eventbrite.com/" rel="nofollow">http://monitoringlove2013.eventbrite.com/</a></p> <p>Obviously if you can't make it to Antwerp you can join the effort on ##monitoringsucks on Freenode or on Twitter.</p> <p>The <a href="http://www.inuits.eu/contact" rel="nofollow">location will be Duboisstraat 50, Antwerp</a><br /> It is about 10 minutes' walk from Antwerp Central train station.<br /> Depending on traffic, Antwerp is about half an hour north of Brussels, and there are hotels within walking distance of the venue.</p> <p>Plenty of parking space is available on the other side of the park. </p> <p>Read last year's report <a href="http://www.krisbuytaert.be/blog/we-didnt-fix-it">http://www.krisbuytaert.be/blog/we-didnt-fix-it</a> to get an idea of what will happen...</p> <p>PS. Yes, I'm trying to get another event off the ground in the days before Fosdem, but I'm still awaiting confirmation of the venue ..</p> http://127.0.0.1:8080/blog/monitoringlove-hackfest#comments fosdem inuits monitoringlove monitoringsucks opensource Tue, 13 Nov 2012 20:54:24 +0000 Kris Buytaert 1074 at http://127.0.0.1:8080/blog Our #monitoringsucks rpm repository is available http://127.0.0.1:8080/blog/our-monitoringsucks-rpm-repository-available <p>Not only have our Rubygems builds changed, but so has my internal #monitoringsucks repository.</p> <p>You might have noticed a variety of vagrant- projects on my github account</p> <p><a href="http://github.com/KrisBuytaert/vagrant-ganglia" rel="nofollow">http://github.com/KrisBuytaert/vagrant-ganglia</a><br /> <a href="http://github.com/KrisBuytaert/vagrant-graphite" rel="nofollow">http://github.com/KrisBuytaert/vagrant-graphite</a><br /> <a href="http://github.com/KrisBuytaert/vagrant-puppet-logstash" rel="nofollow">http://github.com/KrisBuytaert/vagrant-puppet-logstash</a><br /> being the #monitoringsucks part of them. 
All of those Vagrant projects are basically my test setups to play with those new tools.</p> <p>They contain a bunch of puppet modules that install and configure these tools. (Note that they mostly consist<br /> of git submodules pointing to other puppet module repositories.)</p> <p>Given that I also like to have my software cleanly installed from a package, some of these tools had to be packaged, or I had to create a personal / internal repository that makes available upstream packages that were hiding on the internet.</p> <p>I've forked this repository off the internal Inuits repository so you all can also benefit from these efforts.<br /> (You gotta love pulp :))</p> <p>That means you can now install all of the above mentioned #monitoringsucks tools from our public repo with </p> <p><div class="geshifilter"><pre class="text geshifilter-text" style="font-family:monospace;">yumrepo { 'monitoringsucks':
 baseurl =&gt; 'http://pulp.inuits.eu/pulp/repos/monitoring',
 descr =&gt; 'MonitoringSuck at Inuits',
 gpgcheck =&gt; '0',
}</pre></div></p> <p>Patches to both the Vagrant projects and the puppet modules are welcome ...</p> http://127.0.0.1:8080/blog/our-monitoringsucks-rpm-repository-available#comments devops ganglia graphite 
logstash monitoring monitoringsucks puppet repo vagrant Sat, 11 Aug 2012 19:49:39 +0000 Kris Buytaert 1068 at http://127.0.0.1:8080/blog The ultimate 2012 open source and devops conference http://127.0.0.1:8080/blog/ultimate-2012-open-source-and-devops-conference <p>Kent Skaar pinged me last week, asking for feedback on Lisa'11 and input for Lisa 2012. </p> <p>I thought I should share my advice to him with the rest of the world. </p> <p>So if I were to host an event similar to Lisa I'd have either<br /> Jordan Sissel or Mitchell Hashimoto give the keynote, because over the past 24 months those people have written more relevant tools for me than anyone else :) </p> <p>I'd have someone talk about Kanban for Operations. There are 2 names that pop up: Dominica DeGrandis and Mattias Skarin.</p> <p>I'd have the Ubuntu folks talk about JuJu and I'd have RI Pienaar talk about MCollective .. while you have RI, have him talk about Hiera too. Have Dean Wilson carry RI's bags and put him unknowingly on a panel. (Masquerade it as a pub with hidden cameras) </p> <p>Obviously, as #monitoringsucks, you want to hear about new monitoring tool initiatives and how people are dealing with them, so you want people talking about Graphite, Collectd, Statsd, Sensu, Icinga-MQ, and how people are reviving Ganglia and using that in large scale environments.</p> <p>You want someone to demystify queues, I mean .. who still knows about the differences between Active, Rabbit, Zero, Hornet and many other Q's ?</p> <p>You want people talking about how they deal with logs, so talks about Logstash and Graylog2. </p> <p>You want to cover Test Driven Infrastructure: how do you test your infrastructure? Someone to demystify Cucumber and Webrat, and talk about testing Charms, Modules, and Cookbooks.</p> <p>Oh and filesystems, distributed ones, the Ceph, FraunhoferFS, Moose, KosmosFS, Glusters, Swifts of this world ... 
you want people to talk about their experiences, good and bad, with any of the above, someone who can actually compare those rather than hearsay stuff. :) With recent updates on what's going on in these projects.</p> <p>Now someone please organise this for me :) In a warm and sunny place ... preferably with 27 holes next door, and daycare for my kids :) </p> <p>PS. Yes, the absence of any openstack related topic is on purpose .. that's for 2013 :)</p> http://127.0.0.1:8080/blog/ultimate-2012-open-source-and-devops-conference#comments devops monitoringsucks Fri, 10 Feb 2012 22:15:35 +0000 Kris Buytaert 1061 at http://127.0.0.1:8080/blog We didn't fix it http://127.0.0.1:8080/blog/we-didnt-fix-it <p>MonitoringSucks and we didn't fix it.</p> <p>Earlier this week <a href="http://www.inuits.eu">Inuits</a> hosted a 2 day hackfest titled #MonitoringSucks. A good number of people with a variety of backgrounds showed up on Monday morning. I don't know why, but people had high expectations for this event. Did they really expect us to fix the #monitoringsucks problem in a mere 2 days ?</p> <p>Next to myself we had Patrick Debois, <a href="https://twitter.com/#!/gkarekinian">Grégory Karékinian</a>, <a href="https://twitter.com/#!/sjourdan">Stefan Jourdan</a>, <a href="https://twitter.com/#!/hatofmonkeys">Colin Humphreys</a>, <a href="https://twitter.com/#!/acrmp">Andrew Crump</a>, <a href="https://twitter.com/#!/ohadlevy/">Ohad Levy</a>, Frank Marien, Toshaan Bharvani, Devdas Bhagat, <a href="https://twitter.com/#!/mpasternacki/">Maciej Pasternacki</a>, <a href="https://twitter.com/xtaran">Axel Beckert</a>, <a href="https://twitter.com/#!/smetj/">Jelle Smet</a>, Noa Resare @blippie, John John Tedro @udoprog, Christian Trabold @ctrabold and obviously some people I missed.<br /> <br /><br /> A good mixture of Fosdem visitors who stayed a little longer in our cold country and locals with ideas. 
We had people from TomTom, RedHat, Spotify, Booking.com, Inuits and Atlassian, coming from Belgium, The Netherlands, France, Israel, the UK, Sweden, Germany, Poland and Switzerland if I'm not mistaken.</p> <p>The format was pretty open; much of the first day was spent around the drawing board.</p> <p><img src="http://www.krisbuytaert.be/images/drawingboard.jpg" alt="people around the drawingboard" /><br /> (Ohad Levy, Jelle Smet, Patrick Debois and Frank Marien) discussing a variety of topics </p> <p>This monitoring topic is complex, and there are different areas that need to be covered. The drawing below documents how we split the problem into different areas, and listed the different tools people use for these areas.</p> <p><img src="http://www.krisbuytaert.be/images/componnents.jpg" /></p> <ul> <li>Collection: Collectd, Nagios, Ganglia </li><li>Transport: XMPP, Smiple, SMTP, 0mq, AMQP, rsyslog, irc, stomp </li><li>Storage: rrd, graphite, opentsdb, hbase </li><li>Filtering: logstash, esper </li><li>Visualisation: Graphite </li><li>Notification: PagerDuty </li><li>Reporting: Jasper </li></ul> <p>Obviously the above list is far from complete. </p> <p><img src="http://www.krisbuytaert.be/images/morediscussions.jpg" /><br /> <br /><br /> The afternoon discussion continued where we left off before lunch, just after the power cut. Only now we started refocusing on filtering and aggregating values using Logstash.<br /> @patrickdebois had earlier been talking about the idea of using Logstash as a way to collect data, transform it and throw it either to another tool or onto a queue.<br /> Looking at Logstash it kind of makes sense. Logstash already has a zillion input types, filters and outputs, including popular queues such as amqp and zeromq. 
Yes, the default behaviour for a lot of people is to get data from different inputs, filter it and then send it to ElasticSearch, but much more is possible with the available outputs.</p> <p><br /><br /> <img src="http://www.krisbuytaert.be/images/logstash.jpg" /><br /> <br /><br /> It was only on Tuesday that people really started writing code.<br /> So what really came out of the #monitoringsucks hackfest ?</p> <p>A couple of people were working on packaging existing tools for their favourite distro. Others were working on integrating a number of other already existing tools (e.g. Patrick working on more inputs for Logstash, me working on replacing logster with Logstash, setting up Kibana etc.). New tools were learned, items were added to todo lists (Kibana (doesn't work on older Firefox instances), Tattle, statsd) and items were scratched from todo lists (Graylog2 (Kibana replaces that as a good frontend for Logstash)).</p> <p>A lot of experiences with different tools were exchanged.</p> <p>Frank Marien showed us a demo of his freshly released <a href="https://extremon.org/">ExtreMon</a> framework. A really promising project.</p> <p>The sad part about a workshop like this one is that you enter with a bunch of ideas, and leave with even more ideas, hence more work. We haven't solved the problem yet, but a lot more people are now thinking about the problem and how to solve it with a more modular (unix style) approach. 
With different little tools, all being good at something and all being interconnectable.</p> http://127.0.0.1:8080/blog/we-didnt-fix-it#comments devops extrememon monitoringsucks Thu, 09 Feb 2012 23:03:52 +0000 Kris Buytaert 1060 at http://127.0.0.1:8080/blog #monitoringsucks hackathon 6&7 February Practical details: http://127.0.0.1:8080/blog/monitoringsucks-hackathon-67-february-practical-details <p>As announced <a href="http://www.krisbuytaert.be/blog/monitoringsucks-and-well-fix-it">earlier</a>, next Monday and Tuesday we're opening up the Inuits offices for everybody working on monitoring problems.</p> <p>There's already a <a href="https://github.com/monitoringsucks/werefixingit/wiki">good number of people</a> that have confirmed their presence, and some people have asked for practical details. </p> <p>As for practical details .. the plan is simple.<br /> I'm going to be at the place somewhere between 8:30 and 9:00 on Monday. ( Hey .. it's the day after Fosdem you know :)) </p> <p>The only thing I've planned is a get-to-know-each-other round around 10:30. After that I'm expecting the hackathon to be self organising. </p> <p>There will be water, coffee, etc., IP connectivity, and electricity. </p> <p>The location is still Duboisstraat 50, Antwerp</p> <p><img src="http://www.inuits.eu/sites/default/files/contact_inuitsmap.png" /> </p> <p>Free parking is on the Hardenvoort or Kempenstraat (3 minutes' walk), paid parking right in front of the door.</p> http://127.0.0.1:8080/blog/monitoringsucks-hackathon-67-february-practical-details#comments devops monitoringsucks Wed, 01 Feb 2012 07:48:55 +0000 Kris Buytaert 1059 at http://127.0.0.1:8080/blog Graphite, JMXTrans, Ganglia, Logster, Collectd, say what ? 
http://127.0.0.1:8080/blog/graphite-jmxtrans-ganglia-logster-collectd-say-what <p>Given that @patrickdebois is working on improving data collection, I thought it would be a good idea to describe the setup I currently have hacked together.</p> <p>(Something which can be used as a starting point to improve stuff, and I have to write documentation anyhow) </p> <p>I currently have 3 sources and one target, which will eventually expand to at least another target and most probably more sources too.</p> <p><img src="http://www.krisbuytaert.be/images/VagrantGraphite.jpg" /></p> <p>The first of the 3 sources is typical system data, which I collect using collectd. However, I'm using collectd-carbon from <a href="https://github.com/indygreg/collectd-carbon.git" title="https://github.com/indygreg/collectd-carbon.git">https://github.com/indygreg/collectd-carbon.git</a> to send that data to Graphite.</p> <p>I'm parsing the Apache and Tomcat logfiles with logster, currently sending them only to Graphite, but logster has an option to send them to Ganglia too.</p> <p>And I'm using JMXTrans to collect JMX data from Java apps that have this data exposed and send it to Graphite. (JMXTrans also comes with a Ganglia target option) </p> <p>Rather than going in depth over the config, it's probably easier to point to a Vagrant box I built, <a href="https://github.com/KrisBuytaert/vagrant-graphite" title="https://github.com/KrisBuytaert/vagrant-graphite">https://github.com/KrisBuytaert/vagrant-graphite</a>, which brings up a machine that does pretty much all of this on localhost.</p> <p>Obviously it's still a work in progress and lots of classes will need to be parametrized and cleaned up. But it's a working setup, and not just on my machine .. 
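<br /> <br /> All three sources above end up feeding the same Graphite target. As a rough illustration of what arrives there, here is a minimal sketch, assuming Carbon's plaintext listener on its usual port 2003 and a made-up metric path and host. It is not how collectd-carbon, logster or JMXTrans are implemented, just the shape of the data they send.<br />

```python
import socket
import time


def carbon_line(path, value, timestamp=None):
    """Format one metric in Carbon's plaintext protocol: 'path value timestamp\\n'."""
    if timestamp is None:
        timestamp = int(time.time())
    return "%s %s %d\n" % (path, value, int(timestamp))


def send_to_graphite(lines, host="localhost", port=2003, timeout=5.0):
    """Send pre-formatted plaintext lines to a Carbon listener over TCP.

    host and port are assumptions; adjust them to your own Graphite setup.
    """
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)


if __name__ == "__main__":
    # Hypothetical metric path, for illustration only.
    line = carbon_line("servers.localhost.demo.queue_length", 42)
    print(line, end="")
    # send_to_graphite([line])  # uncomment when a Carbon listener is running
```

Keeping the formatting separate from the sending makes it easy to check the line format without a running Graphite.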
</p> http://127.0.0.1:8080/blog/graphite-jmxtrans-ganglia-logster-collectd-say-what#comments collectd devops ganglia graphite jmxtrans logster monitoringsucks Tue, 03 Jan 2012 20:46:47 +0000 Kris Buytaert 1058 at http://127.0.0.1:8080/blog #monitoringsucks and we'll fix it ! http://127.0.0.1:8080/blog/monitoringsucks-and-well-fix-it <p>If you are hacking on monitoring solutions, and want to talk to your peers solving the problem,<br /> block the Monday and Tuesday after Fosdem in your calendar !</p> <p>That's right, on February 6 and 7 a bunch of people interested in fixing the problem will be meeting, discussing and hacking stuff together in Antwerp.</p> <p>In short, a #monitoringsucks hackathon.</p> <p>Inuits is opening up their offices for everybody who wants to join the effort. Please let us (@KrisBuytaert and @patrickdebois) know if you want to join us in Antwerp.</p> <p>Obviously if you can't make it to Antwerp you can join the effort on ##monitoringsucks on Freenode or on Twitter.</p> <p>The <a href="http://www.inuits.eu/contact" rel="nofollow">location will be Duboisstraat 50, Antwerp</a><br /> It is about 10 minutes' walk from Antwerp Central train station.<br /> Depending on traffic, Antwerp is about half an hour north of Brussels, and there are hotels within walking distance of the venue.</p> <p>Plenty of parking space is available on the other side of the park.</p> http://127.0.0.1:8080/blog/monitoringsucks-and-well-fix-it#comments collectd devops ganglia graphite icinga monitoring monitoringsucks munin nagios rrd Tue, 03 Jan 2012 18:23:00 +0000 Kris Buytaert 1057 at http://127.0.0.1:8080/blog