How Docker Manages Its Massive Open Source Project

Editor's Note: This is the final piece of our three-part interview with Jérôme Petazzoni, Senior Engineer at Docker. Check out Part 1: How Docker Was Born and Part 2: How Docker Fits Into The Current DevOps Landscape.

Less than 14 months after launching, Docker has become one of the fastest-growing open source projects in the world. With over 400 contributors pushing code at any given moment, over 600 issues closed per month, and hundreds of pull requests merged per week, we wondered how Docker keeps up with it all. The last part of our interview with Jérôme dives into Docker's stack, tools, and process for managing their massive project.

Note: Docker announced the Docker Governance Advisory Board shortly after we conducted this interview.

Docker Ecosystem



Highlights

"I think one of the biggest challenges that we have been facing in managing the pull requests and the contributions is that the project is so active, and there are so many pull requests going on at any time, that even when the maintainers commit more than half of their time to review pull requests, you still get a lot of old pull requests piling up in the depths of the queue. ... We invested more time in tools to have some visibility on open issues and pull requests. For instance, we want to be able to easily detect those pull requests that can't be merged because they need to be rebased first, due to some other change in the code base."

"Another thing to help with the volume of contributions was to improve the build and test process. In Docker 0.6, we added the ability to run Docker within Docker. This allows funny inception memes, but more importantly, it brings the ability to build, then test Docker, in a predictable manner, using the Docker way."

"There is a lot of discussion internally about the perfect Docker hacking environment. Some of us are considering moving away from a normal distro and going to something that boots Docker. Then everything would be in a container. Your graphic environment would be in a container. Your editor would be in a container. Everything would be in a container."



Docker's Stack

Go, GitHub, Gordon, Docker, boot2docker, PagerDuty, Pingdom, dotCloud


<!--more-->

LS: I want to get a little bit into how you guys actually build and ship Docker itself. Can you talk a little bit about your process, just going through build, test, and deploy. Some of the things you guys use internally.

J: Initially, it was just Go code. Using the normal Go workflow, "go build" -

LS: By the way, was it always in Go? Was dotCloud in Go?

J: The internal container engine used in the dotCloud PaaS was in Python. Picking Go for Docker was a sure way to avoid reusing the legacy code. But there are other reasons.

LS: The other one was that it's not Ruby and it's not Python, right?

J: Yeah. It's something that everybody can love. Also, to ease the adoption process, Go has this nice feature that you can compile a Go program into a single binary that embeds all the libraries required to run it. So when you want someone to be able to test Docker, they just need to download that binary, run it, and that's it. No extra library, no dependency, nothing.
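That distribution model can be seen with any small Go program. This is a hedged, minimal stand-in (the program and build command below are illustrative, not Docker's own): the standard toolchain compiles the source plus every library it uses into one self-contained executable that runs with no interpreter or shared libraries to install first.

```go
// main.go: a minimal stand-in for a tool shipped as one binary.
// Compiling it with the standard Go toolchain produces a single
// self-contained executable:
//
//	go build -o demo main.go
//
package main

import "fmt"

// banner is split out into a function so its behavior is easy to check.
func banner() string {
	return "one binary, no external dependencies"
}

func main() {
	fmt.Println(banner())
}
```

The resulting file can simply be copied to another machine of the same OS and architecture and executed directly.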

LS: Same sort of concept that you guys are trying to address.

J: Exactly. Docker makes deployments easier, but it had to be easy to deploy itself.

LS: What's your development process?

J: We embraced a very typical open source development workflow very early. Everything happens through pull requests. Even the core maintainers themselves never commit to master; they make pull requests as well. The code gets reviewed by other maintainers, and you have to get the majority of the maintainers to approve your changes to be able to merge them.

Everything happens on GitHub. Pull requests, issues, everything is public. The discussions around the project happen either on a public IRC channel (when we need some interactivity) or on a public mailing list (when we need long-running conversations). Of course, there are some face-to-face meetings as well, but since an increasing number of core maintainers are outside of the company, we make sure that when there is a face-to-face meeting or whiteboarding session, the outcome is immediately shared with the community, to get feedback and comments.

We have a weekly IRC meeting; it is a kind of staff meeting for the Docker open source project. It's open for anyone to join. Every contributor is invited to talk about what they are going to work on, and give feedback on what was previously done.

It's a really open process, and we're rather proud of the way it works. There are no behind-the-scenes talks happening only at Docker Inc., or anything like that. It's extremely public and open.

And sometimes, it's hard to tell who is an external contributor and who is an employee of the company. That person might be very active on the IRC channels, contribute a lot of code, and be involved in many topics; you will see their name all around the place. But they might be working for another company that has a big involvement in Docker, and is investing a lot in the project. Reaching that point is a really nice achievement.

LS: Yeah, when you can't tell the difference between employees and contributors. That's very cool. This completely open process was there from the beginning, right? Even since the first version of Docker was out?

J: That was from the very beginning, yeah. The specifics like the exact governance and decision process had to be refined a little bit, but there was no drastic change. It was a really progressive improvement to adapt to the bigger scale of the project and the larger number of contributors.

LS: Is it easy to have that governance process in GitHub? To be able to say, all right, if this many people approve... maybe you're using the GitHub API or something, but if this many people approve a pull request, then that constitutes...

J: Today, to be perfectly honest, we rely a lot on the good will and behavior of the maintainers. Which means that if some maintainer goes crazy and decides to merge some stuff, technically they could. Even though everybody would notice and would say, "What's happening?"

We considered using the GitHub API to automate and enforce the approval process. Maintainers would have to write something specific in the comments to indicate, "I approve this commit." Then a web hook would trigger after each comment, and whenever it detected that a majority of maintainers had approved a change, it would be merged automatically.
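The core of that approval-counting idea can be sketched in a few lines of Go. Everything here is hypothetical, not Docker's actual implementation: the `comment` type, the "LGTM" comment convention, and `majorityApproved` are all made-up names standing in for data a web hook would receive from the GitHub API.

```go
package main

import (
	"fmt"
	"strings"
)

// comment is a minimal stand-in for a GitHub pull-request comment.
type comment struct {
	author string
	body   string
}

// majorityApproved reports whether more than half of the maintainers left
// a comment starting with the (hypothetical) approval keyword "LGTM".
func majorityApproved(comments []comment, maintainers map[string]bool) bool {
	approvals := map[string]bool{}
	for _, c := range comments {
		if maintainers[c.author] && strings.HasPrefix(c.body, "LGTM") {
			approvals[c.author] = true // count each maintainer at most once
		}
	}
	return len(approvals)*2 > len(maintainers)
}

func main() {
	maintainers := map[string]bool{"alice": true, "bob": true, "carol": true}
	comments := []comment{
		{"alice", "LGTM, nice cleanup"},
		{"dave", "LGTM"}, // not a maintainer: ignored
		{"bob", "LGTM after rebase"},
	}
	fmt.Println(majorityApproved(comments, maintainers)) // 2 of 3 approve: true
}
```

A real hook would fetch the comment list for the pull request on each event and merge via the API once this predicate becomes true.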

We made some experiments in that direction and decided that, even though at some point it may make sense to have something like that, it's not yet necessary for the project. It works very well as it is right now.

However, we invested more time in tools to have some visibility on open issues and pull requests. For instance, we want to be able to easily detect those pull requests that can't be merged because they need to be rebased first, due to some other change in the code base.

Those tools are available in a repository called Gordon (Gordon is the name of our pet turtle in the office; think Flash Gordon!). Those tools are also used to extract some interesting metrics on pull requests and issues.

LS: So Gordon gives you more metadata around issues.

J: Yeah. There are many ways to look at it. I'm really a big fan of the command line; so for me, it's a kind of command line interface for GitHub. It makes it easy to spot the most active pull requests at the moment. Or the ones that have been lying around for a while. Maybe the person who submitted that pull request is just waiting for us for some feedback, so we should do that instead of leaving that open for a while.

I think one of the biggest challenges that we have been facing in managing the pull requests and the contributions is that the project is so active, and there are so many pull requests going on at any time, that even when the maintainers commit more than half of their time to review pull requests, you still get a lot of old pull requests piling up in the depths of the queue.

And let's say you do a pull request, but you don't have a lot of time and availability right now to follow up. So you come back one week later. One week is not a lot of time, but it's enough for the code base to change significantly. Sometimes, before even getting to the essence of your pull request, the maintainers will tell you, "Sorry, but you should rebase." Or, "Sorry, but we now support a new execution driver, so you should also implement your feature for the new execution driver as well." In some cases, that can be frustrating.

LS: I was about to say, it's frustrating for the contributor but also even for you guys, right, because you want to be able to merge these changes in, but the code base is changing so quickly that you can't get to everything.

J: Exactly. We are absolutely aware of that. It's something that we strive to address. It's a good opportunity to make a call to arms to all people who are already contributors and would like to step up in the community to become maintainers of specific subsystems. We are actively looking for maintainers.

LS: So that's how you're partly addressing this, by getting more maintainers.

J: Right, more maintainers. And trying to get more visibility on those pull requests that deserve more love.

LS: Does Gordon help a lot with just being able to triage some of the pull requests?

J: It helps. From my point of view, navigating through sixty pull requests with a command line tool feels less scary, more manageable in a kind of way. And you get some special features. For instance, you can show the pull requests that can or cannot be merged. Or display the number of approvals for each pull request, to find those that are closer to being merged. Gordon helps with that kind of thing.
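That "closest to being merged" view boils down to a filter and a sort. This sketch is illustrative only: the `pullRequest` struct, its fields, and `closestToMerge` are assumed names, not Gordon's actual data model.

```go
package main

import (
	"fmt"
	"sort"
)

// pullRequest holds the fields a Gordon-like tool might track.
type pullRequest struct {
	number    int
	mergeable bool // false when the branch needs a rebase first
	approvals int
}

// closestToMerge returns only the mergeable pull requests, most-approved
// first, mimicking a "what is closest to being merged" listing.
func closestToMerge(prs []pullRequest) []pullRequest {
	var out []pullRequest
	for _, pr := range prs {
		if pr.mergeable {
			out = append(out, pr)
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].approvals > out[j].approvals })
	return out
}

func main() {
	prs := []pullRequest{
		{number: 101, mergeable: true, approvals: 1},
		{number: 102, mergeable: false, approvals: 3}, // needs a rebase: hidden
		{number: 103, mergeable: true, approvals: 2},
	}
	for _, pr := range closestToMerge(prs) {
		fmt.Printf("#%d (%d approvals)\n", pr.number, pr.approvals)
	}
}
```

A command line front end would populate the slice from the GitHub API and print this ranking instead of a raw list of sixty open pull requests.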

Another thing to help with the volume of contributions was to improve the build and test process.

In Docker 0.6, we added the ability to run Docker within Docker. This allows funny inception memes, but more importantly, it brings the ability to build, then test Docker, in a predictable manner, using the Docker way.

Before, you needed a special development environment on your machine; or you could compile Docker inside Docker, but then you needed a few extra steps to extract that build from the container, stop Docker, upgrade the binary, and so on. With Docker in Docker all those problems are gone. It made the build process ridiculously easy and reproducible. It solved the issue of "it doesn't work because you're using a different Go version or a different library version. Or you built with those special flags for your custom environment." Everybody builds Docker using the same image. Everybody uses the same recipe to build the tools that build Docker.

The Docker repository contains a Dockerfile, a recipe to build the Docker builder. You run that. The result is a container image, and your build of Docker is in there. That container image also has the tools that we use to do releases, to upload the binaries to the mirrors. The process is really streamlined.
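The shape of that recipe can be sketched as a hypothetical Dockerfile. Everything below is illustrative (the base image, package list, and paths are made up, and the real file in the Docker repository is more involved); the point is that the Go version and dependencies come from the image, not from the contributor's machine.

```dockerfile
# Hypothetical builder recipe: every contributor builds with the same
# Go version and the same libraries, because both come from this image.
FROM ubuntu:12.04
RUN apt-get update && apt-get install -y git golang
WORKDIR /src/docker
ADD . /src/docker
# Building the image, then running it, produces the docker binary inside
# the resulting container:
#   docker build -t docker-builder .
#   docker run docker-builder
CMD ["go", "build", "-o", "docker", "."]
```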

While this is not strictly required (our build process is not that crazy complicated), it's a good way to do some dogfooding. To use Docker ourselves. And, of course, to make nasty dependency issues go away.

LS: So that means everybody is basically developing in a VM. No one is developing locally.

J: In fact, you can develop locally. I do because I have a Linux machine, and sometimes I appreciate the extra speed that you get from running Docker natively.

But, yeah, almost everybody is working in the VM. Then you can use shared folders or anything like that, so that you can still use TextMate or whatever your favorite editor is to edit code. You don't have to run VI in a VM or something like that.

LS: Very interesting. I guess all the contributors, maybe they are doing that as well, and maybe they're not. Do you guys recommend that if you're contributing that you run Docker within Docker?

J: Absolutely. Yeah. We have some contributing guidelines addressing all those things. The guidelines describe the whole contribution process, with the maintainer approval system. They also explain how to set up your development environment, and how you should hack on Docker.

Because the point is to make contributing easier so that you get more contributions. Not only for Docker itself, but for the documentation. One common misconception is that to do something in Docker, you need to be a Go guru, a container guru, and a Linux kernel guru. This is all wrong.

First, Go is an easy language. If you know only one language and its syntax is very different from Go, then maybe you won't feel at ease. But otherwise, you will be just fine. Go is really readable, and writable as well.

Then, you don't have to be extremely familiar with the internals, because there are many areas that can use some improvement and do not require in-depth knowledge of everything about containers and Docker itself.

Just to give you a completely random example: recently, we decided to go from dash flags to dash-dash flags. Instead of doing "docker run -link something", you are now able to do "docker run --link something", because it's more unix-y. You don't have to use the dash-dash form yet; the single dash will be deprecated at some point. So we have a warning that shows up. We realized that the warning was a little bit confusing, because it said, "Hey! -link is deprecated. It will soon go away. Check the documentation." Some people actually asked me, "Are you actually going to deprecate that feature? Because the command line told me that." I said, "No, no way." And we realized that, yeah, the message was confusing. We should say, "-link is deprecated in favor of --link," or something like that.

This is not really a bug; it's just improving the experience, and it doesn't require knowledge of Docker internals. At the end of the day, it's an easy way to get your name in the list of contributors for the next Docker release.
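The fix being described is just a clearer message. A sketch of the improved wording, with the caveat that the function name and exact text here are illustrative, not Docker's actual output:

```go
package main

import "fmt"

// deprecationWarning produces the clearer wording discussed above: it says
// the old spelling is deprecated *in favor of* the new one, instead of
// implying the feature itself is going away.
func deprecationWarning(oldFlag, newFlag string) string {
	return fmt.Sprintf("Warning: %s is deprecated in favor of %s", oldFlag, newFlag)
}

func main() {
	fmt.Println(deprecationWarning("-link", "--link"))
}
```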

LS: Very cool. Do you want to talk a little bit about testing and how you guys handle it?

J: Absolutely. We address testing in two ways: unit testing and functional (or integration) testing.

For unit testing, we're using the basic "go test" framework. We found out that it was great for unit testing. It can also be extended with coverage testing, to make sure that all the code was indeed executed during test runs.

At the same time, we have functional tests, to exercise more complex end-to-end scenarios. That brings a lot of interesting challenges. Because today, you can still test all the features of Docker on a single machine; but imagine that tomorrow, we have support for FreeBSD. Then it means that when you do changes, you will need to run the test suite not only on Linux but on FreeBSD as well.

LS: Cool. Just in terms of your workflow, are you using anything for notifications, so that you know when builds are passing or failing? When issues are being submitted and all that?

J: We have a basic IRC bot to notify about pull requests. It doesn't notify about regular issue tickets, because there are so many of them that it would be a lot of noise. That's mostly it.

LS: For the private registry, where are you guys hosting that? Are you allowed to say?

J: Yeah, it's fine to talk about it. Currently, the registry is still on EC2. It's still running on dotCloud. We built a PaaS, so we figured we could just use it.

We also have a new hosting platform, built entirely on top of Docker. It's basically containers on physical machines. We got a handful of physical machines in a data center really close to EC2, for latency and throughput reasons. And we deployed Docker directly on top of that. That's what we use for new projects. It's Docker all the way, in the sense that the team using those resources doesn't even have SSH access to the machines. They only have Docker API access, and they have to use that for everything.

We wanted to see what the workflow looks like. Is it convenient, enjoyable, or painful to deploy something with Docker like that? So far, the overall feedback is that it works well. It's still very rough, but it's promising. We already see that we need a lot of extra tools to orchestrate containers.

For instance, when you lose a machine because it crashes or is disconnected, someone, or something, has to provision a new machine, redeploy it, and restart containers. Initially, the developers would do it manually. But it's better if we can agree on a lingua franca, a common language between the various dev teams and ops teams, to express what containers should be running. So whenever a machine goes away, anyone can bring everything back up, even if they have no idea exactly what's running. That's harder if one dev team is using shell scripts, another one is using Ansible, and another one, maybe, Fabric. That's one of the things that we are working on right now: trying to see if we can strike a balance between leaving as much freedom as possible to the developers (to let them use the best tools for what they need to do), and at the same time, having an ops team or an ops process for automatically restarting things if problems happen.

LS: Anything worth mentioning for your monitoring?

J: Right now, we are still using the good ol' Pingdom and PagerDuty. The things that we had on the PaaS. So, nothing specific to Docker and containers at this point. We learned one thing: it is better if you can reduce all your monitoring to HTTP. If I'm to monitor the status of your SQL database, I want you to just put a really simple HTTP endpoint in front of it. Something with just a "/ping" route, which will open the connection, make the dumbest and cheapest request possible, and tell me if everything went right. Then I will do simple HTTP requests to monitor the health of the database. It means that I can use Pingdom to monitor all the things. The PaaS is using a big and very complicated Nagios set-up, but for Docker we still have a pretty simple stack. So we decided to remove Nagios from the equation for now; we use Pingdom directly with HTTP checks, and we hook that into PagerDuty. Those were the tools that we were using on the PaaS, and we were extremely happy with them, so we're using them again.

LS: In terms of database for the registry...

J: We tried really hard to reduce the need for a database in the registry. Almost everything is kept in metadata files alongside the layers. When you push something to the registry, you're actually pushing a bunch of tarballs corresponding to the various layers that compose your container. For each layer, you have a small JSON file containing some metadata, and describing how the layers are connected with each other. For optimization purposes, the registry builds the whole ancestry of each image. So when you start to download something, the registry can tell you immediately the list of layers needed by that image, and you can download them all in parallel immediately. Well, even that ancestry is not using a database. Everything is stored as plain files, and there is a caching mechanism so that the metadata is cached in Redis. If you nuke that Redis, it will just repopulate itself as people download images.
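Building that ancestry is a walk up the parent chain recorded in each layer's metadata file. The sketch below is illustrative: the `layerMeta` struct and its field names are assumptions mirroring the idea of a per-layer JSON file with a parent pointer, not the registry's actual schema.

```go
package main

import "fmt"

// layerMeta mirrors the small metadata file stored next to each layer:
// an id and a pointer to its parent layer.
type layerMeta struct {
	id     string
	parent string // empty for a base layer
}

// ancestry walks the parent chain and returns every layer an image needs,
// child first: the full list a client can then fetch in parallel.
func ancestry(layers map[string]layerMeta, id string) []string {
	var chain []string
	for id != "" {
		chain = append(chain, id)
		id = layers[id].parent
	}
	return chain
}

func main() {
	layers := map[string]layerMeta{
		"base": {id: "base"},
		"mid":  {id: "mid", parent: "base"},
		"app":  {id: "app", parent: "mid"},
	}
	fmt.Println(ancestry(layers, "app")) // [app mid base]
}
```

Precomputing this list per image is what lets the registry answer "which layers do I need?" in one round trip; caching it in something like Redis is then a pure optimization, since it can always be rebuilt from the plain files.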

The code for the registry is publicly available, and anyone can run their own with exactly the same behavior as ours. The key difference in our registry is a flag that lets you mark a specific image as private. But except for that flag, you can run the same registry anywhere.

There is even a public image available, so if you do "docker run stackbrew/registry" you end up with a Docker registry running locally in less than a minute. By default, it will store images locally in the file system. But you can provide a little config file that uses S3 or Swift or Elliptics or other object storage mechanisms if you need to scale beyond the local file system.

LS: Do you use anything external to GitHub for code review?

J: Not at this point. We considered things like Gerrit, which automates the approval process, for instance. As I said earlier, at that scale, it's not necessary yet. I don't know. Maybe in six, twelve, or eighteen months, we'll get to the point where we have so many people, not only contributors but maintainers, that we need something automated to prevent someone from doing something wrong, either by intent or by mistake. We are not at that point yet.

There is a lot of discussion internally about the perfect Docker hacking environment. Some of us are considering moving away from a normal distro and going to something that boots Docker. Then everything would be in a container. Your graphic environment would be in a container. Your editor would be in a container. Everything would be in a container.

It started as a kind of joke, but then you realize that nothing prevents you from bringing your graphics environment within a container. And at some point, you realize that it could even have some upsides. Next time I have to set up my new workstation, I might actually just boot Docker...

LS: Somewhat unrelated: I know you guys moved away from Vagrant to boot2docker. And when was that, by the way? Because that was really recent right?

J: Yeah, it's really recent. I think we officially told people: it's still fine to use Vagrant, obviously, but you will be more likely to get help within the community if you use boot2docker, because we are shifting to it. It's simpler and has fewer moving parts.

It's lightweight. It's the first time that I can actually have a one-liner that downloads a VM image and runs it in less than a minute. Usually, downloading the VM image takes forever, because it's like 500 MB. This one is just 25 MB. Of course, if you have Vagrant for other purposes, it's fine to keep it to run Docker. But if you're considering using Vagrant just because you need Docker, then no. Go straight to boot2docker, because the experience will be more polished and straightforward.

Then came the boot2docker wrapper on the Mac, as a kind of experiment. The idea was to get something that feels like the native experience, when I'm on my local machine and I do "docker something", and it runs locally. Except that here, when I do "docker something", it talks to a Docker daemon running in a VM on the Mac.

The result is very impressive. It works as well as a native Linux machine. The next step might be to do that for Windows developers. Or Linux developers who don't want to run Docker natively for some reason. Someone came up with the wording "headless hypervisor", to say that it's a VM in which you run containers. It's a kind of hypervisor for containers, and it's headless because you don't care at all about the virtual display, the screen, the console of this VM. Because the only interaction will be through the Docker API anyway.

