Category Archives: DevOps

“You are not done, when it works”

The title is a quote from Robert C. Martin during a conference talk on clean code. For a long time now (I am getting old) I have followed this approach and it has produced remarkable results. For a bit more than four years I had been responsible for a corporate integration platform. I had built and run it following DevOps principles so that everything was fully automated. No manual maintenance work had been necessary at all, log files were archived, deleted after their retention period etc. This had freed up a lot of time. Time which I used to keep my codebase clean.

Particular focus was put onto the structure. And that had been a really tough job. Much tougher than I anticipated, to be quite frank. But I had paid off. Instead of writing a lot of conventional documentation, the well thought-out structure allowed me to find stuff in a completely intuitive way. Because, believe me, six months after you have written something, you do not remember much about it. But if you need a certain kind of functionality and do not only find the corresponding module immediately, but also from a first look make a correct guess how the parameters are meant to be used, that is truly rewarding.

Having experienced this first hand has greatly influenced my work since then. And I am more convinced then ever, that this is not beautifying for the sake of it. But instead it is a mandatory requirement and a prerequisite for business agility.

DevOps and Ownership

“You build it, you run it” has been my mantra for many years now. A number of times I was approached by management and they asked who should be operating stuff that I had built. Because, allegedly, my time was too precious for doing such a mundane task like operations.

This is to all managers: Operations is neither mundane nor something for junior staff. It is in fact exactly the opposite. Operations is what keeps the organization alive. Operations is where the best people should be, because here the rubber (the developed software) hits the road. Operations is your last line of defense, when (not if) something goes catastrophically wrong. Operations is a key influencing factor on your organization’s ROI. Operations determines your ability to be agile on the market. Operations is key for customer satisfaction. I could go on and on, but likely you long got my point.

Of course there are some aspects to operations that, when things are done the wrong way, are repetitive and far from challenging. But that should mostly be behind us. Yes, in the 1960s we had people who did nothing but enter data. And until not too long ago a lot of operations was just ticking off check boxes on a to-do list. But with things like infrastructure as code (see my recent post on starting with Chef Infra Server), this should really be something from the past. What you need today are people who take pride in running a lean, highly automated, highly resilient IT organization.

And that is where it should be clear to everybody, that DevOps is much more about organization, knowledge, and collaboration beyond traditional “borders”, than about technology.

By the way: My response to management about who should run my stuff, has always been “me”. Because the applications were built to be as maintenance-free as possible. Only the occasional support ticket had to be answered and with proper logging/auditing that is nothing that takes a lot of time. And fixing the occasional bug was not a big deal either, thanks to Clean Code and test-automation.

This allowed me to support 6 business-critical applications as a “side-project”, i.e. no time was officially allocated. Comparable applications operated by other departments had at least three people full-time for support only.

Getting Started with Chef Infra Server

A while ago Chef Software announced that they would move all source code to the Apache 2.0 license (see announcement for details), which is something I welcome. Not so much welcomed by many was the fact that they also announced to stop “free binary distributions”. In the past you could freely download and use the core parts of their offering, if that was sufficient for your needs. What upset many people was that the heads-up period for this change was rather short and many answers were left open. It also did not help that naturally their web site held many references to the old model, so people were confused.

In the meantime it seems that Chef has loosened their position on binary distributions a bit. There is now a number of binaries that are available under the Apache 2.0 license and they can be found here. This means that you can use Chef freely, if you are willing to compromise on some features. Thanks a lot for this!

This post will describe what I did to set up a fresh Chef environment with only freely available parts. You need just two things to get started with Chef: the server and the administration & development kit. The latter goes by the name of ChefDK and can be installed on all machines on which development and administration work happens. It comes with various command line tools that allow you to perform the tasks needed.

Interestingly, you will find almost no references to ChefDK on the official web pages. Instead its successor “Chef Workstation” will be positioned as the tool to use. There is only one slight problem here: The latest free version is pretty old (v0.4.2) and did not work for me, as well as various other people. That was when I decided to download the latest free version of ChefDK and give it a try. It worked immediately and since I had not needed any of the additional features that come with Chef Workstation, I never looked back.

No GUI is part of those free components. Of course Chef offer such a GUI (web-based) which is named Chef Management Console. It is basically a wrapper over the server’s REST API. Unfortunately the Management Console is “free” only up to 25 nodes. For that reason, but also because its functionality is somewhat limited compared to the command line tools, I decided to not cover it here.

Please check the licenses by yourself, when you follow the instructions below. It is solely your own responsibility to ensure compliance.

Below you will find a description of what I did to get things up and running. If you have a different environment (e.g. use Ubuntu instead of CentOS) you will need to check the details for your needs. But overall the approach should stay the same.

Environment

The environment I will use looks like this

  • Chef server: Linux VM with CentOS 7 64 bit (minimal selection of programs)
  • Chef client 1: Linux VM like for Chef server
  • Development and administration: Windows 10 Pro 64bit (v1909)

I am not sure yet whether I will expand this in the future. If you are interested, please drop a comment below.

Please check that your system meets the prerequisites for running Chef server.

Component Versions

The download is a bit tricky, since we don’t want to end up with something that falls under a commercial license. As of this writing (April 2020) the following component binaries are the latest that come under an Apache 2.0 license. I verified the latter by clicking at “License Information” underneath each of the binaries that I plan to use.

  • Chef Infra Server: v12.19.31 (go here to check for changes)
  • Chef DK: 3.13.1 (go here to check for changes)

As to the download method Chef offer various methods. Typically I would recommend to use the package manager of your Linux distribution, but this will likely cause issues from a license perspective sooner or later.

Server Installation and Initial Setup

So what we will do instead is perform a manual download by executing the following steps (they are a sub-set of the official steps and all I needed to do on my system):

  • All steps below assume that you are logged in as root on your designated Chef server. If you use sudo, please adjust accordingly.
  • Ensure required programs are installed
    yum install -y curl wget
  • Open ports 80 and 443 in the firwall
    firewall-cmd --permanent --zone public --add-service http && firewall-cmd --permanent --zone public --add-service https && firewall-cmd --reload
  • Disable SELinux
    setenforce Permissive
  • Download install script from Chef (more information here)
    curl -L https://omnitruck.chef.io/install.sh > chef-install.sh
  • Make install script executable
    chmod 755 chef-install.sh
  • Download and install Chef server binary package: The RPM will end up somewhere in /tmp and be installed automatically for you. This will take a while (the download size is around 243 MB), depending on your Internet connection’s bandwidth.
    ./chef-install.sh -P chef-server -v "12.19.31"
  • Perform initial setup and start all necessary components, this will take quite a while
    chef-server-ctl reconfigure
  • Create admin user
    chef-server-ctl user-create USERNAME FIRSTNAME LASTNAME EMAIL 'PASSWORD' --filename USERNAME.pem
  • Create organization
    chef-server-ctl org-create ORG_SHORT_NAME 'Org Full Name' --association-user USERNAME --filename ORG_SHORT_NAME-validator.pem
  • Copy both certificates (USERNAME.pem and ORG_SHORT_NAME-validator.pem) to your Windows machine. I use FileZilla (installers without bloatware can be found here) for such cases.
ChefDK Installation and Initial Setup

What I describe below is a condensed version of what worked for me. More details can be found on the official web pages.

  • I use $HOME in the context below to refer to the user’s home directory on the Windows machine.  You must manually translate it to the correct value (e.g. C:\Users\chris in my case).
  • Download the latest free version of ChefDK for Windows 10 from here and install it
  • Check success of installation by running the following command from a command prompt:
    chef -v
  • Create directory and base version of configuration file for connectivity by running
    knife configure (it may look like it hangs, just give it some time)
  • Copy USERNAME.pem and SHORTNAME-validator.pem to $HOME/.chef
  • Add your server’s certificate (self-signed!) to the list of trusted certificates with
    knife ssl fetch
  • Verify that things work by executing knife environment list, it should return _default as the only existing environment
  • The generated configuration file was named $HOME/.chef/credentials in my case and I decided to rename it config.rb (which is the new name in the official documentation) and also update the contents:
    • Remove the line with [default] at the beginning which seemed to cause issues
    • Add knife[:editor] = '"C:\Program Files\Notepad++\notepad++.exe" -nosession -multiInst' as the Windows equivalent of setting the EDITOR environment variable on Linux.
Initial Project

We will  create a very simple project here

  • Go into the directory where you want all your Chef development work to reside (I use $HOME/src; the comment regarding the use of $HOME from above still applies) and open a command prompt
  • Create a new Chef repo (where all development files live)
    chef generate repo chef-repo (chef-repo is the name, you can of course change that)
  • You will see that  a new directory ($HOME/src/chef-repo) has been created with a number of files in it. Among them is  ./cookbooks/example , which we will upload as a first test. Cookbooks are where instructions are stored in Chef.
  • To be able to upload it, the cookbook path must be configured, so you need to add to $HOME/.chef/config.rb the following line:
           cookbook_path   ["$HOME/src/chef-repo/cookbooks"]
    (example:  cookbook_path ["c:/Users/chris/src/chef-repo/cookbooks"])
  • You can now upload the cookbook via knife cookbook upload example
Client Setup

In order to have the cookbook executed you must now add it to the recipe list (they take the cooking theme seriously at Chef) of the machines, where you want it to run. But first you must bootstrap this machine for Chef.

  • The bootstrap happens with the following command (I recommend to check all possible options by via knife bootstrap --help) executed on your Windows machine :
    knife bootstrap MACHINE_FQDN --node-name MACHINE_NAME_IN_CHEF --ssh-user root --ssh-password ROOT_PASSWORD
  • You can now add the recipe to the client’s run-list for execution:
       knife node run_list add MACHINE_NAME_IN_CHEF example
    and should get a message similar to
      MACHINE_NAME_IN_CHEF :
        run_list:
          recipe[example]
  • You can now check the execution by logging into your client and execute chef-client as root.  It will also be executed about every 30 minutes or so. But checking the result directly is always a good idea after you changed something.

Congratulation, you can now maintain your machines in a fully automated fashion!

Ignoring Warnings in Log Files

I know that many people think along the lines of ignoring warnings in log files. But I am responsible for an environment where we  run applications that are sort-of business critical. So I need to take warnings seriously, especially if their wording is not really clear. The consequence is that any unnecessary warning causes operational problems. Because who can tell me, kind-of “written in blood”, that I can absolutely always and forever ignore this warning? Only in that case could I consider adding an exception to the log monitoring system and of course to the system documentation, the operations manual, etc. So for me, and from my consulting past I know many customer think the same, this is not just a small nuisance but a real issue.

On the other hand I have had many discussions with people who told me that I could just ignore this or that entry. In many cases it turned out after some discussion, that the log level was actually chosen badly and INFO would have been more appropriate. In that respect the semantics of the commonly used log levels deserve a closer look. Here are two good links (link 1, link2) for definitions. When I first read them, my initial thought was that I might have overreached with the first paragraph of this post. But looking at the example from link 1 about WARN a bit closer, I think my concerns are still valid.

So what can be done? Reality is that you rarely have the ability to get a log statement changed. So you do need a scalable approach to deal with log messages that you consciously choose to ignore. It involves primarily two things: Firstly, you need to have documentation why the decision was made that a given log message is not critical. Secondly, there should be an automated link with your log file monitoring system, that configures an exception in it. Depending on your business this whole area might also be regulated, so the legal side may very well play a role as well. But that is outside the scope of this post.

I know this post is not really actionable, but still wanted to share my thoughts.

 

Starting with Continuous Integration

In this post I look at things to consider when an organization wants to introduce Continuous Integration (CI). As in so many other situations the non-technical challenges are more difficult to solve than some nitty-gritty details.

Start Small Right Now

If ever there was a place for the proverb “the better is the enemy of the good” it is here. Waiting days, weeks, or months because you have not sorted out all details is the worst you can do. Instead you should start immediately by just installing a CI server (Jenkins is the de facto standard) and set up a simple job that does nothing but check out the source code from the VCS and compile it.

More advanced stuff like test automation, setting up delivery pipelines, integration with binary repositories like Artifactory or Nexus is not needed in the beginning.

Agile Automatically?

Most development teams that have not used CI so far are probably operating in a more or less non-agile fashion. That is fine and can stay as it is! Because while CI is virtually a prerequisite for agile development, that does absolutely not mean that teams following a waterfall model will not benefit considerably from CI.

So establishing CI can but does not have to be the first step of moving towards agile development. In fact I would argue that introducing CI is a large-enough step for an existing development organization. Only when this has been “digested”, you should think about moving towards agile. Otherwise too many things would be changed in parallel, similar to combining a new release of your own software with an upgrade of the underlying platform, e.g. the database server.

Frequency of Builds

This is the only part where I strongly recommend that you start at full throttle. What I mean by that is that you resist the temptation to run your builds only once a day or even less frequently. Ideally, every commit into the VCS triggers a build via a post-commit hook (here is more information for Git and Subversion). But polling the VCS every e.g. 10 minutes is a good-enough approximation in most cases. And it is also a little bit easier to set up when you just start on the whole topic.

Why am I so adamant on this particular point? I think that almost-instant feedback is at the very core of CI and the only way to deliver it is by running the build. All the points below change the amount of details that are provided or reduce the risk of introducing bugs into the code. But this hugely powerful feeling you get after your first commit triggers a build, is the important aspect for successful adoption in my view.

Test Automation

Start with “compilation works” as the lowest common denominator. When you want to start adding the use of “proper” test frameworks, feel free to do so. But is nothing you need on day one.

When you are ready to do more, you need to focus on those parts of your code that are most relevant for the business. Resist the temptation of striving for large test coverage of your code for the sake of it (having a KPI on this is a really bad idea). Otherwise people will start writing test for trivial helper functions, testing which on their own is of low relevance.

Instead take the critical parts of the business logic and develop a way to test them end-to-end (if possible without the GUI yet). With this approach you will implicitly cover all the lower-level stuff underneath automatically. Unless you have someone on your team with practical experience on integration testing frameworks (e.g. Citrus), I would not start with a full-blown approach but rather develop a few custom scripts.

The point in time when to start with more advanced topics, especially automated performance tests, depends on your individual situation and I will not make recommendations about it here. But what you should do as soon as possible, is read up on the subject and get an understanding about the different types of test and what they are good for. You do not need to implement everything now, but this will allow you to make informed judgements about the path you choose.

In Closing

You should now have an idea how to get started with CI quickly and in a way that delivers positive results pretty much from day one. Gaining traction in the organization should be your first priority in the beginning. There is a widespread misconception that things like CI, while theoretically the right to do, slow developers down. Nothing could be further from the truth. But unless you fight this impression fiercely, sooner or later management will ask for by-passing that “nice new thing” and get code out of the code faster using the old way.

Related posts:

How to Implement Test-Driven Development

Test-Driven Development (TDD) is something I have long had difficulties with. Not because I consider it a bad concept, but found it very difficult to start doing. In hindsight it appears that the advice given in the respective books and online articles was not suitable. So here is the approach that finally worked for me.

It boils down to deviating from the pure doctrine. Instead of writing a test before starting on a new piece of code, I start with the actual code right away. Yes, that violates the core principle, although only for a while. But I have found that in most cases my understanding of the problem is still somewhat vague when I start working on it. So for my brain it is better if I do not have to split its capacity between solving the actual problem and thinking about how to devise a proper test and what all that means for the structure of the future code.

Once the initial version of the working code is there and manually validated, I do add the test. From then on I am in a position to refactor the code without the risk of breaking something. And of course this refactoring is needed because the first version of any code is never really good. While you could write “better” initial code, this would require spending more time upfront than you otherwise need for refactoring later. And it also ignores the fact that you only really understand the problem, when you have finished implementing the solution.

What I later realized was that my approach also helped me to write more testable code. But instead of consciously having to work on it, this sneaked in as a by-product of my modified way of doing TDD. For me this is a more natural way of learning and the results are typically better than following some formal approach.