Kris Buytaert's blog

Jan 07 2016

Bimodal IT, redefined

There's been a lot of discussion about the silliness of the term Bimodal IT, aka the next big thing for IT organisations that don't dare to change but still want to sound cool.

So here is my idea to reuse that term for something relevant.

BiModal IT is the idea where you take a fully automated infrastructure built on the principles of Infrastructure as Code, one that gets periodic idempotent updates (e.g. every 15 or 30 minutes, or whenever you orchestrate a run) and consistency checks, and where the source code for that infrastructure is versioned, tested and delivered through a traditional Continuous Delivery pipeline for the majority of your services. On top of that you add realtime reconfiguration capabilities, based on service discovery, for the services that really do change fast or in a truly elastic way, using tools like Consul, consul-template, etcd etc.

That way you have two modes of managing your infrastructure, aka BiModal.
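
As a minimal sketch of that second, realtime mode (the service name, file paths and the nginx reload command are placeholder assumptions, and it presumes a local Consul agent plus consul-template on the host): consul-template watches the catalog and re-renders a config file the moment instances come or go, while the slower Infrastructure as Code pipeline keeps managing everything else.

  # /etc/consul-template/web-upstreams.ctmpl (placeholder template) would contain:
  #   upstream web {
  #   {{ range service "web" }}
  #     server {{ .Address }}:{{ .Port }};
  #   {{ end }}
  #   }

  # watch Consul and re-render the nginx config in realtime,
  # reloading nginx only when the rendered content actually changes
  consul-template -template \
    "/etc/consul-template/web-upstreams.ctmpl:/etc/nginx/conf.d/web-upstreams.conf:service nginx reload"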

Jul 28 2015

The power of packaging software, package all the things

Software delivery is hard; plenty of people all over this planet are struggling with delivering software in their own controlled environments. They have invented great patterns that will build an artifact, then do some magic, and the application is up and running.

When talking about continuous delivery, people invariably discuss their delivery pipeline and the different components that need to be in that pipeline.
Often, the focus on getting the application deployed or upgraded from that pipeline is so strong that teams
forget how to deploy their environment from scratch.

After running a number of tests on the code, compiling it where needed, people want to move forward quickly and deploy their release artifact on an actual platform.
This deployment typically happens via a file upload or a source-control checkout onto the dedicated computer on which the application resides.
Sometimes, dedicated tools are integrated to simulate what a developer would do manually on a computer to get the application running. Copy three files left, one right, and make sure you restart the service. Although this is obviously already a large improvement over people manually pasting commands from a 42 page run book, it doesn’t solve all problems.

Think of the guy who quickly makes a change on the production server and never commits it (say goodbye to git pull as your upgrade process).
If you package your software, there are a couple of things you get for free from your packaging system.
Questions like "has this file been modified since I deployed it?", "where did this file come from?", "when was it deployed?" and
"what version of software X do I have running on all my servers?" are easily answered by the same
tools we already use for every other package on the system. Not only can you use existing tools, you are also using tools that are well known by your ops team and that they
already use for every other piece of software on your systems.
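
As a rough illustration (the package and file names below are made up), those questions map directly onto the package manager's own query tools:

  # has this file been modified since it was deployed?
  rpm -V myapp                       # Debian/Ubuntu: dpkg -V myapp

  # where did this file come from?
  rpm -qf /etc/myapp/settings.yml    # Debian/Ubuntu: dpkg -S /etc/myapp/settings.yml

  # which version is installed, and when was it installed?
  rpm -qi myapp                      # Debian/Ubuntu: dpkg -s myapp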

If your build process creates a package and uploads it to a package repository which is available to the hosts in the environment you want to deploy to, there is no need anymore for
a script that copies the artifact from a 3rd-party location, and even less for that 42 page text document which never gets updated and still tells you to download yaja.3.1.9.war from a location where you can only find
3.2 and 3.1.8, while the developer who knows whether you can use 3.2, or why 3.1.9 got removed, just left for the long weekend.
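
As a sketch of what that build step could look like, here is the fictional yaja example from above wrapped into an RPM with fpm and pushed to an internal repository; the repository host and upload mechanism are assumptions, use whatever repo tooling you already run:

  # wrap the build artifact into an operating system package (use -t deb for Debian/Ubuntu)
  fpm -s dir -t rpm -n yaja -v 3.1.9 --iteration 1 \
      --prefix /usr/share/yaja yaja-3.1.9.war

  # publish it to the repository the target hosts already point at
  rsync yaja-3.1.9-1.x86_64.rpm repo.example.com:/var/www/repo/
  ssh repo.example.com createrepo /var/www/repo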

Another, and maybe even more important, thing is the sadly growing practice of having yet another tool in place that translates that 42 page text document into a bunch of shell scripts created from a drag and drop interface; typically that "deploy tool" is even triggered from within the pipeline. Apart from the fact that it usually encourages a pattern of non-reusable code, distributes even more ssh keys, or adds yet another agent on all systems, it doesn't take into account that you want to think of your servers as cattle and be able to deploy new instances of your application fast.
Do you really want to deploy your five new nodes on AWS with a full Apache stack ready for production, then reconfigure your load balancers, only to figure out that someone needs to go click in your continuous integration or deployment tool to deploy the application to the new hosts? That one manual action someone forgets?
Imvho deployment tools are a phase in the maturity process of a product team. Yes, it's a step up from manually deploying software, but it creates more and different problems; once your team grows in maturity, refactoring out that tool is trivial.

The obvious and trivial approach to this problem, and one that comes with even more benefits, is called packaging. When you package your artifacts as operating system packages (e.g., .deb or .rpm),
you can include that package in the list of packages to be deployed at installation time (via Kickstart or debootstrap). Similarly, when your configuration management tool
(e.g., Puppet or Chef) provisions the computer, you can specify which version of the application you want deployed by default.
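
For new machines that boils down to installing a pinned version from your repository at provisioning time; in Puppet terms the same idea is a package resource with ensure set to an explicit version. A minimal sketch, with a made-up package name and version:

  # RHEL/CentOS: install an explicit version at build time (Kickstart %post, cloud-init, ...)
  yum install -y yaja-3.1.9

  # Debian/Ubuntu equivalent
  apt-get install -y yaja=3.1.9-1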

So, when you’re designing how you want to deploy your application, think about deploying new instances or deploying to existing setups (or rather, upgrading your application).
Doing so will make life so much easier when you want to deploy a new batch of servers.

May 11 2015

On the importance of idempotence.

A couple of months ago we were seeing weird behaviour, with Consul not knowing all its members, at a customer where we had deployed Consul for service registration as a POC.
The first couple of weeks we hadn't noticed any difficulties, but after a while we had the impression that the number of nodes in the cluster wasn't stable.

Obviously the first thought is that such a new tool probably isn't stable enough, so it's expected behaviour, but rest assured that was not the case.

We set out to monitor the number of nodes frequently, with a simple cron job to feed a graph:

  NOW=`date +%s`
  HOST=`hostname -f`
  MEMBERS=`/usr/local/bin/consul members | wc -l`

  # ship the member count to Graphite's plaintext listener (port 2003)
  echo "consul_members.$HOST $MEMBERS $NOW" | nc graphite 2003

It didn't take us very long to see that the number of members in the cluster indeed wasn't stable: frequently there were fewer nodes in the cluster, and then the expected number slowly came back on our graph.

Some digging taught us that the changes in the number of nodes were in sync with our Puppet runs.
But we weren't reconfiguring Consul anymore; there were no changes in the configuration of our nodes.
Yet Puppet triggered a restart of Consul on every run, because it thought it had rewritten the Consul config file.
Which was weird, as the values in that file were the same.

On closer inspection we noticed that the values in the file indeed didn't change, but the order of the values in the file did.
From a functional point of view that did not introduce any changes, but Puppet rightfully assumed the configuration file
had changed and thus dutifully restarted the service.

The actual problem lay in the implementation of the writing of the config file, which was in JSON.
The ancient Ruby library just took the hash and wrote it out in no specific order, each time potentially resulting
in a file with the content in a different order.

A bug fix to the Puppet module made sure that the hash was written out in a sorted way, so each run resulted in the
same file being generated.
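
A quick way to convince yourself that two renders of such a file only differ in key order (a sketch with made-up filenames, assuming jq is installed):

  # jq -S sorts object keys, so a pure ordering difference diffs clean
  diff <(jq -S . consul-config.run1.json) <(jq -S . consul-config.run2.json) \
    && echo "functionally identical, only the key order differed"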

After that bugfix our graph of the number of nodes in the cluster obviously flatlined, as restarts were no longer being introduced.

This is yet another example of the importance of idempotence. When we trigger a configuration run, we want to
be absolutely sure that it won't change the state of the system if that state is already defined the way we want.
Rewriting the config file should only happen if it gets new content.

The yak is shaved.. and sometimes it's not a funky DNS problem but just a legacy Ruby library one..

May 03 2015

What done REALLY looks like in devops

Steve Ropa blogged about What done looks like in devops. I must say I respectfully, but fully, disagree with Steve here.

For those of you that remember, I gave an Ignite about my views on the use of the Definition of Done back at #devopsdays 2013 in Amsterdam.

In the early days we talked about the #devops movement partly being a reaction against the late Friday night deployments where the ops people got a tarball with some minimalistic notes and were supposed to put the stuff in production. The work of the development team was Done, but the operations team's work had just started.

Things have improved.. like Steve mentions, for a lot of teams done now means that their software is deployable, that we have metrics from it, and that we can monitor the application.

But let's face it.. even if all of that is in place, there is still going to be maintenance, security fixes, major stack upgrades and minor application changes, and we all still need to keep the delivery pipelines running.

A security patch on an application stack means that both the ops and the developers need to figure out the required changes together.

Building and delivering value to your end users is something that never ends, we are never actually done.

So let me repeat ,

"Done is when your last enduser is in his grave"
In other words, when the application is decommissioned.

And that is the shared responsibility mindset devops really brings: everybody caring about the value they are bringing to their customers, both developers and operations people, thinking about keeping the application running, and not assuming that because a list of requirements has been validated at the end of a sprint we are done. Because we never are...

BTW, here are my original slides for that #devopsdays Amsterdam talk.


Feb 09 2015

2014 vs 2015 interest in Open Source Configuration Management

A couple of people asked me for the results of the survey of the 2015 vs 2014 Configuration Management Camp room interests.

This is a bunch of people, 350 last year and 420 this year, telling us what tools they are interested in, so we can map the right room sizes to the communities.

2014:

2015:

Enjoy.. but remember, there's Lies, Damn Lies and Statistics..
PS. This is a mostly European audience.

Sep 20 2014

On Systemd and devops

If it's not broken, don't fix it.
Those who don't understand Unix are doomed to reinvent it, poorly
Complexity is the enemy of reliability.

Those are some of the more frequently heard arguments in the systemd discussion. Indeed, I see and hear a lot of senior Linux people react openly, and probably way too late, against the introduction of systemd in a lot of our favorite Linux distributions.

To me this is a typical example of the devops gap: the gap between developers writing code and operations needing to manage that code on production platforms at scale.
Too often developers write code that they think is useful and relevant while not listening to their target audience; in this case not the end users of the systems, but the people that maintain the platforms, the people that work with these tools on a daily basis.

I have had numerous conversations with people in favor of and against systemd, and to this day I have not found a single general purpose use case that could convince me of the relevance of this large change in our platforms. I've found edge cases where it might be relevant, but not mainstream ones. I've also seen many more people against it than in favor. I've invited speakers to conferences to come and teach me. I've probably spoken to the wrong people.

But this is not supposed to be yet another systemd rant.. I want to tackle a bigger problem: the problem that this change, and some others, have been forced upon us by distributions that should be open and listen to their users; apparently both Debian and Fedora/RHEL somehow largely failed to listen to their respective communities. Yes, we know that e.g. Fedora is the development platform and acts as a preview of what might come up in RHEL, and thus CentOS, later, but not everything eventually ends up in RHEL. So it's not like we didn't have an 'acceptance' platform where we could play with the new technology. The main problem here is that we had no simple way to stop the pipeline; it really feels like that long-ago Friday evening rush deploy, not like a good conversation between developers and actual ops on the benefits and problems of implementing these changes. This feels like the developers of the distributions deciding what goes in from their own little silo and voting in a 'private' committee.

It also feels like the ops people were too busy to react: "Someone else will respond to this change, it's trivial, this change is wrong, someone else will block this for sure".

And there's the fact that operating system developers, like our Fedora and Debian friends, kinda live in their own silo (specifically not listing CentOS here..).

So my bigger question is.. how do we prevent this from happening again? How do we make sure that distributions actually listen to their core users and not just the distribution developers?

Rest assured, systemd is not the only case with this problem.. there are plenty of cases where features that were used by people, sometimes even what people considered the core feature of a project, got changed or even ripped out by the developers because they didn't realize they were being used, sometimes almost killing that open source project by accident.
And I don't want that to happen to some of my favourite open source projects..

Aug 12 2014

Upcoming Conferences

After not being able to give my planned Ignite at #devopsdays Amsterdam because I was down with the flu, here are some fresh opportunities to listen to my rants :)

In September I`ll be talking at PuppetConf 2014 in San Francisco, USA, about some of the horror stories we went through over the past couple of years when deploying infrastructure in an automated fashion.

Just one week later I`ll be opening the #devops track at DrupalCon Amsterdam together with @cyberswat (Kevin Bridges), where we'll talk about the current state of #drupal and #devops. We'll be reopening the #drupal and #devops survey shortly; more info about that later here..

Just a couple of weeks later I will be ranting about packaging software on Linux at LinuxCon Europe in Dusseldorf, Germany.

And in November I`m headed to Nuremberg, Germany, where I will be opening the Open Source Monitoring Conference, reflecting on the current state of Open Source Monitoring: do we love it.. or does it still suck :)

That's all ..
for now ..

Aug 12 2014

Fedora 20 Annoyances

  yum install docker

  docker run --rm -i -t -e "FLAPJACK_BUILD_REF=6ba5794" \
  > -e "FLAPJACK_PACKAGE_VERSION=1.0.0~rc3~20140729T232100-6ba5794-1" \
  > flapjack/omnibus-ubuntu bash -c \
  > "cd omnibus-flapjack ; \
  > git pull ; \
  > bundle install --binstubs ; \
  > bin/omnibus build --log-level=info flapjack ; \
  > bash"
  docker - version 1.5
  Copyright 2003, Ben Jansens <ben@orodu.net>

  Usage: docker [OPTIONS]

  Options:
    -help           Show this help.
    -display DISLPAY
                    The X display to connect to.
    -border         The width of the border to put around the
                    system tray icons. Defaults to 1.
    -vertical       Line up the icons vertically. Defaults to
                    horizontally.
    -wmaker         WindowMaker mode. This makes docker a
                    fixed size (64x64) to appear nicely in
                    in WindowMaker.
                    Note: In this mode, you have a fixed
                    number of icons that docker can hold.
    -iconsize SIZE  The size (width and height) to display
                    icons as in the system tray. Defaults to
                    24.


  [root@mine ~]# rpm -qf /bin/docker
  docker-1.5-10.fc20.x86_64
  [root@mine ~]# rpm -qi docker
  Name        : docker
  Version     : 1.5
  Release     : 10.fc20
  Architecture: x86_64
  Install Date: Sun 03 Aug 2014 12:41:57 PM CEST
  Group       : User Interface/X
  Size        : 40691
  License     : GPL+
  Signature   : RSA/SHA256, Wed 07 Aug 2013 10:02:56 AM CEST, Key ID 2eb161fa246110c1
  Source RPM  : docker-1.5-10.fc20.src.rpm
  Build Date  : Sat 03 Aug 2013 10:30:23 AM CEST
  Build Host  : buildvm-07.phx2.fedoraproject.org
  Relocations : (not relocatable)
  Packager    : Fedora Project
  Vendor      : Fedora Project
  URL         : http://icculus.org/openbox/2/docker/
  Summary     : KDE and GNOME2 system tray replacement docking application
  Description :
  Docker is a docking application (WindowMaker dock app) which acts as a system
  tray for KDE and GNOME2. It can be used to replace the panel in either
  environment, allowing you to have a system tray without running the KDE/GNOME
  panel or environment.
  [root@mine ~]# yum remove docker
  [root@mine ~]# yum install docker-io

  Installed:
    docker-io.x86_64 0:1.0.0-9.fc20

  Complete!

Aug 12 2014

Ubuntu 14.04 on a Dell XPS 15 9530

So I had the chance to unbox and install a fresh Dell XPS 15 9530.
Dell ships the XPS 13 with Ubuntu, but for some crazy reason it does not do the same with the XPS 15, which imvho is much more
appropriate for a developer as it actually has a usable screen size.

That means they also force you to boot into some proprietary OS the very first time before you can even try to get into the boot options menu,
which is where you need to be.

The goal was to get a fresh Ubuntu on the box, nothing more.

There's a large number of tips scattered over the internet, such as:

- Use the USB 2 connector (that's the one furthest to the back on the right side of the device)
- Disable secure booting
- Put the box in Legacy mode
- Do not enable external repositories, and do not install updates during the installation.

The combination of all of them worked. I booted from USB and 20 minutes later stuff worked. But it really was after trying them all..
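
For completeness, getting the installer onto the USB stick is the usual dd routine; the ISO name and target device below are placeholders, so double check the device node before running it:

  # write the Ubuntu installer to the USB stick (replace /dev/sdX with the real device!)
  sudo dd if=ubuntu-14.04-desktop-amd64.iso of=/dev/sdX bs=4M && sync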

Profit.

Wireless works, touchscreen works, touchpad works, HDMI works.

The screen resolution is an awesome 3000x1940.

PS. For those who wonder .. I`m still on Fedora 20 :)

Jun 04 2014

Jenkins, Puppet, Graphite, Logstash and YOU

This is a repost of an article I wrote for the Acquia Blog some time ago.

As mentioned before, devops can be summarized by talking about culture, automation, monitoring, metrics and sharing. Although devops is not about tooling, there are a number of open source tools out there that will be able to help you achieve your goals. Some of those tools will also enable better communication between your development and operations teams.

When we talk about Continuous Integration and Continuous Deployment, we need a number of tools to help us. We need to be able to build reproducible artifacts which we can test, and we need a reproducible infrastructure which we can manage in a fast and sane way. To do that we need a Continuous Integration framework like Jenkins.

Formerly known as Hudson, Jenkins has been around for a while. The open source project was initially very popular in the Java community but has since gained popularity in different environments. Jenkins allows you to create reproducible build and test scenarios and report on them. It provides you with a uniform and managed way to build, test, release and trigger the deployment of new artifacts, for both traditional software and infrastructure-as-code-based projects. Jenkins has a vibrant community that builds new plugins for the tool in different kinds of languages. People use it to build their deployment pipelines, automatically check out new versions of the source code, and syntax test and style test it. If needed, users can compile the software, trigger unit tests, and upload a tested artifact into a repository so it is ready to be deployed on a new platform level.
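
As a sketch only, not a prescription, the shell steps behind such a Jenkins job for an infrastructure-as-code repository might look roughly like this (tool choices, paths and the package name are assumptions):

  #!/bin/bash
  set -e

  # syntax and style checks on the Puppet code
  puppet parser validate manifests/site.pp
  puppet-lint --fail-on-warnings manifests/

  # unit tests (rspec-puppet via rake)
  bundle install --path vendor
  bundle exec rake spec

  # package the tested code as a versioned artifact for the repository
  fpm -s dir -t rpm -n myorg-puppet-modules -v "1.0.${BUILD_NUMBER}" modules/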

Jenkins can then trigger an automated deployment of the tested software on its new target platform. Whether that is development, testing, user acceptance or production is just a parameter. Deployment should not be something we try first in production; it should be done the same way on all platforms. The deltas between these platforms should be managed using a configuration management tool such as Puppet, Chef or friends.

In a way this means that Infrastructure as Code is a testing dependency: you also want to be able to deploy a platform to exactly the same state it was in before you ran your tests, so that you can compare the results of your test runs and make sure they are correct. This means you need to be able to control the starting point of your tests, and tools like Puppet and Chef can help you here. Which tool you use is the least important part of the discussion; the important part is that you adopt one of them and start treating your infrastructure the same way you treat your code base: as a tested, stable, reproducible piece of software that you can deploy over and over in a predictable fashion.

Configuration management tools such as Puppet, Chef and CFEngine are just one part of the ecosystem, and integration with orchestration and monitoring tools is needed, as you want feedback on how your platform is behaving after the changes have been introduced. Lots of people measure the impact of a new deploy, and with that we obviously move to the M part of CAMS.

There, Graphite is one of the most popular tools to store metrics. Plenty of other tools in the same area have tried to go where Graphite is going, but when it comes to flexibility, scalability and ease of use, not many tools allow developers and operations people to build dashboards for any metric they can think of in a matter of seconds.

Just sending a keyword, a timestamp and a value to the Graphite platform gives you a large choice of actions that can be done with that metric. You can graph it, transform it, or even set an alert on it. Graphite takes away the complexity of similar tools and offers an easy-to-use API, so developers can integrate their own self-service metrics into dashboards to be used by everyone.
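
That interface really is as simple as it sounds; a hedged one-liner (metric name and Graphite hostname made up) against the plaintext listener on port 2003:

  # the plaintext protocol is just "metric.path value unix-timestamp"
  echo "shop.checkout.duration_ms 123 $(date +%s)" | nc graphite.example.com 2003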

One last tool that deserves our attention is Logstash. Initially just a tool to aggregate, index and search the log files of our platform, which are too often a sadly missed source of relevant information about how our applications behave, Logstash and its Kibana + Elasticsearch ecosystem are now quickly evolving into a real-time analytics platform, implementing the Collect, Ship+Transform, Store and Display pattern we see emerging a lot in the #monitoringlove community. Logstash now allows us to turn boring old logfiles, which people only started searching upon failure, into valuable information that product owners and business managers use to learn about the behavior of their users.

Together with the Graphite-based dashboards we mentioned above, these tools help people start sharing their information and communicating better. When thinking about these tools, think about what you are doing, what goals you are trying to reach, and where you need to improve. Because after all, devops is not about solving a technical problem; it's about solving a business problem and bringing better value to the end user at a more sustainable pace. And in that sense the biggest tool we need to use is YOU, as the person who enables communication.