When you read a success story about migrating from a monolith to microservices, the authors usually have a clear idea of what they already have and what they want to achieve. They have weighed all the pros and cons and, out of the plethora of available candidates, chosen Kubernetes. They have faced seemingly insurmountable problems, resolved them through superhuman effort, and finally reached the kind of happy ending that happens “a year and a half into production.”
Was that how it was for us? Definitely not.
We didn’t spend a lot of time deliberating over a migration to microservices like that. One day we just decided, “Why not give it a try?” There was no need to choose among orchestrators at that point: most of the dinosaurs were already on their last legs, except for Nomad, which was still showing some signs of life. Kubernetes had become the de facto standard, and I had experience working with it. So we plucked up our courage and decided to try running something non-critical in Kubernetes.
Considering that at that time all our infrastructure was in AWS, we also didn’t spend much time deciding to use EKS.
I’m struggling to remember who we chose as the guinea pig for the run in EKS — it might have been Jenkins. Or Prometheus. It’s difficult to say, but gradually all the new services were launched in EKS, and on the whole, everyone liked the approach.
The only thing that we didn’t understand was how to organize CI/CD.
At that time, we had the heady mix of Ansible/Terraform/Bitbucket, and we were not entirely satisfied with the results. Besides, we tried to practice delivery engineering and didn’t have a dedicated DevOps team, and there were many other factors too.
What did we need?
- Unification: we never required our teams to use a strictly defined stack, but in CI/CD we wanted some certainty.
- Decentralization: as mentioned earlier, we did not have a dedicated DevOps team, nor the desire (or need) to start one.
- Relevance: not bleeding edge, but we wanted a tech stack that was on trend.
- We also wanted the obvious things like speed, convenience, flexibility, etc.
It was safe to say that Helm was the standard for installing and running applications in EKS, so we didn’t use Ansible or Terraform for the management and templating of Kubernetes objects, although this solution was offered. We only used Helm (although there were lots of questions and complaints about it).
We also didn’t use Ansible or Terraform to manage Helm charts. It didn’t fit with our desire for decentralization and wasn’t exactly convenient. Again, because we didn’t have a DevOps team, any developer could deploy a service to EKS with the help of Helm, and we didn’t need (or want) to be involved in that process. We therefore took the most controversial route: we wrote our own wrapper for Helm so that it would work like an automatic transmission. More specifically, it reduced the interaction with the user to a single decision: to go or not to go (in our case, to deploy or not to deploy). Later, we added a general Helm chart to this wrapper, so the developer only needed to supply two input values to deploy:
- What to deploy (docker image)
- Where to deploy (dev, stage, prod, etc.)
All in all, the deployment of a service was run from that service’s own repository, by its own developer, exactly when and how the developer needed it. Our participation in this process was reduced to minimal consultation on some borderline cases and occasionally eliminating errors (where would we be without them?) in the wrapper.
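To make the idea concrete, here is a minimal sketch of what such a wrapper might look like. It is not our actual tool: the chart path `charts/general-service`, the `image.repository` and `image.tag` value names, the use of the environment as the namespace, and the explicit `service` argument (which could just as well be derived from the repository name) are all assumptions made for illustration. The only Helm features it relies on are `helm upgrade --install`, `--namespace`, `--set`, and `--wait`.

```python
import subprocess

# Hypothetical defaults: the chart location and environment names are assumptions.
GENERAL_CHART = "charts/general-service"
ENVIRONMENTS = ("dev", "stage", "prod")


def build_helm_command(service, image, environment, chart=GENERAL_CHART):
    """Turn the developer's inputs (what, where) into one helm command."""
    if environment not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {environment!r}")

    # Split "repository:tag"; a colon that belongs to a registry port
    # (e.g. "host:5000/app") is not treated as a tag separator.
    repository, sep, tag = image.rpartition(":")
    if not sep or "/" in tag:
        repository, tag = image, "latest"

    return [
        "helm", "upgrade", "--install", service, chart,
        "--namespace", environment,          # assumption: one namespace per environment
        "--set", f"image.repository={repository}",
        "--set", f"image.tag={tag}",
        "--wait",                            # block until the release is ready
    ]


def deploy(service, image, environment):
    """Run the deployment; raises CalledProcessError if helm fails."""
    subprocess.run(build_helm_command(service, image, environment), check=True)
```

A developer would then call something like `deploy("billing", "registry.example.com/billing:1.4.2", "stage")` from the service’s own pipeline. Keeping the command construction separate from the call to `helm` makes the wrapper easy to test without a cluster.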
And then we lived happily ever after. But our story isn’t about that at all.
In fact, I was asked to talk about why we use Kubernetes, not how it went. If I am honest (and as you can surely tell), I don’t have a clear answer. Maybe it would be better if I told you why we are continuing to use Kubernetes.
With Kubernetes, we were able to:
- Better utilize EC2 instances
- Obtain a better mix of decentralization (all services arrive in Kubernetes from authorized repositories, and we are not involved in the process) and centralization (we always see when, how, and from where a service arrives, whether through logs, audit records, or events)
- Conveniently scale a cluster (we use a combination of Cluster Autoscaler and Horizontal Pod Autoscaler)
- Debug the infrastructure conveniently (not forgetting that Kubernetes is only one layer of abstraction on top of several others, and even in the worst-case scenario there is standard RHEL under the hood … well, at the very least we have that)
- Get high levels of fault tolerance and self-healing for the infrastructure
- Get a single (well, almost) and understandable CI/CD
- Significantly shorten time to market (TTM)
- Have an excellent reason to write this post
And although we didn’t get anything new, we like what we got.