In this article I will try to summarize my favorite tools for Kubernetes with special emphasis on the newest and lesser known tools which I think will become very popular.
This is just my personal list based on my experience but, in order to avoid biases, I will try to also mention alternatives to each tool so you can compare and decide based on your needs. I will keep this article as short as I can and I will try to provide links so you can explore more on your own. My goal is to answer the question: “How can I do X in Kubernetes?” by describing tools for different software development tasks.
K3D is my favorite way to run Kubernetes(K8s) clusters on my laptop. It is extremely lightweight and very fast. It is a wrapper around K3S using Docker. So, you only need Docker to run it and it has a very low resource usage. The only problem is that it is not fully K8s compliant, but this shouldn’t be an issue for local development. For test environments you can use other solutions. K3D is faster than Kind, but Kind is fully compliant.
Lens is an IDE for K8s for SREs, Ops and Developers. It works with any Kubernetes distribution: on-prem or in the cloud. It is fast, easy to use and provides real time observability. With Lens it is very easy to manage many clusters. This is a must have if you are a cluster operator.
- K9sis an excellent choice for those who prefer a lightweight terminal alternative. K9s continually watches Kubernetes for changes and offers subsequent commands to interact with your observed resources.
Helm shouldn’t need an introduction, it is the most famous package manager for Kubernetes. And yes, you should use package managers in K8s, same as you use it in programming languages. Helm allows you to pack your application in Charts which abstract complex application into reusable simple components that are easy to define, install and update.
It also provides a powerful templating engine. Helm is mature, has lots of pre defined charts, great support and it is easy to use.
- Kustomize is a newer and great alternative for helm which does not use a templating engine but an overlay engine where you have base definitions and overlays on top of them.
I believe that GitOps is one of the best ideas of the last decade. In software development, we should use a single source of truth to track all the moving pieces required to build software and Git is a the perfect tool to do that. The idea is to have a Git repository that contains the application code and also declarative descriptions of the infrastructure(IaC) which represent the desired production environment state; and an automated process to make the desired environment match the described state in the repository.
GitOps: versioned CI/CD on top of declarative infrastructure. Stop scripting and start shipping.
Although with Terraform or similar tools you can have your infrastructure as code(IaC), this is not enough to be able to sync your desired state in Git with production. We need a way to continuous monitor the environments and make sure there is no configuration drift. With Terraform you will have to write scripts that run
terraform apply and check if the status matches the Terraform state but this is tedious and hard to maintain.
Kubernetes has been build with the idea of control loops from the ground up, this means that Kubernetes is always watching the state of the cluster to make sure it matches the desired state, for example, that the number of replicas running matches the desired number of replicas. The idea of GitOps is to extend this to applications, so you can define your services as code, for example, by defining Helm Charts, and use a tool that leverages K8s capabilities to monitor the state of your App and adjust the cluster accordingly. That is, if update your code repo, or your helm chart the production cluster is also updated. This is true continuous deployment. The core principle is that application deployment and lifecycle management should be automated, auditable, and easy to understand.
For me this idea is revolutionary and if done properly, will enable organizations to focus more on features and less on writing scripts for automation. This concept can be extended to other areas of Software Development, for example, you can store your documentation in your code to track the history of changes and make sure the documentation is up to date; or track architectural decision using ADRs.
In my opinion, the best GitOps tool in Kubernetes is ArgoCD. You can read more about here. ArgoCD is part of the Argo ecosystem which includes some other great tools, some of which, we will discuss later.
With ArgoCD you can have each environment in a code repository where you define all the configuration for that environment. Argo CD automates the deployment of the desired application state in the specified target environments.
Argo CD is implemented as a kubernetes controller which continuously monitors running applications and compares the current, live state against the desired target state (as specified in the Git repo). Argo CD reports and visualizes the differences and can automatically or manually sync the live state back to the desired target state.
- Flux which just released a new version with many improvements. It offers very similar functionality.
Argo Workflows and Argo Events
In Kubernetes, you may also need to run batch jobs or complex workflows. This could be part of your data pipeline, asynchronous processes or even CI/CD. On top of that, you may need to run even driven microservices that react to certain events like a file was uploaded or a message was sent to a queue. For all of this, we have Argo Workflows and Argo Events.
Although they are separate projects, they tend to be deployed together.
Argo Workflows is an orchestration engine similar to Apache Airflow but native to Kubernetes. It uses custom CRDs to define complex workflows using steps or DAGs using YAML which feels more natural in K8s. It has an nice UI, retries mechanisms, cron based jobs, inputs and outputs tacking and much more. You can use it to orchestrate data pipelines, batch jobs and much more.
Sometimes, you may want to integrate your pipelines with Async services like stream engines like Kafka, queues, webhooks or deep storage services. For example, you may want to react to events like a file uploaded to S3. For this, you will use Argo Events.
These two tools combines provide an easy and powerful solution for all your pipelines needs including CI/CD pipelines which will allow you to run your CI/CD pipelines natively in Kubernetes.
We just saw how we can run Kubernetes native CI/CD pipelines using Argo Workflows. One common task is to build Docker images, this is usually tedious in Kubernetes since the build process actually runs on a container itself and you need to use workarounds to use the Docker engine of the host.
The bottom line is that you shouldn’t use Docker to build your images: use Kanico instead. Kaniko doesn’t depend on a Docker daemon and executes each command within a Dockerfile completely in userspace. This enables building container images in environments that can’t easily or securely run a Docker daemon, such as a standard Kubernetes cluster. This removes all the issues regarding building images inside a K8s cluster.
Istio is the most famous service mesh on the market, it is open source and very popular. I won’t go into details regarding what a service mesh is because it is a huge topic, but if you are building microservices, and probably you should, then you will need a service mesh to manage the communication, observability, error handling, security and all of the other cross cutting aspects that come as part of the microservice architecture. Instead of polluting the code of each microservice with duplicate logic, leverage the service mesh to do it for you.
In short, a service mesh is a dedicated infrastructure layer that you can add to your applications. It allows you to transparently add capabilities like observability, traffic management, and security, without adding them to your own code.
Istio if used to run microseconds and although you can run Istio and use microservices anywhere, Kubernetes has been proven over and over again as the best platform to run them. Istio can also extend your K8s cluster to other services such as VMs allowing you to have Hybrid environments which are extremely useful when migrating to Kubernetes.
- Linkerd is a lighter and probably faster service mesh. Linkerd is built for security from the ground up, ranging from features like on-by-default mTLS, a data plane that is built in a Rust, memory-safe language, and regular security audits
- Consul is a service mesh built for any runtime and cloud provider, so it works great for hybrid deployments across K8s and cloud providers. It is a great choice if not all your workloads run on Kubernetes.
We mentioned already that you can use Kubernetes to run your CI/CD pipeline using Argo Workflows or a similar tools using Kanico to build your images. The next logical step is to continue and do continuous deployments. This is is extremely challenging to do in a real word scenario due to the high risk involved, that’s why most companies just do continuous delivery, which means that they have the automation in place but they still have manual approvals and verification, this manual step is cause by the fact that the team cannot fully trust their automation.
So how do you build that trust to be able to get rid of all the scripts and fully automate everything from source code all the way to production? The answer is: observability. You need to focus the resources more on metrics and gather all the data needed to accurately represent the state of your application. The goal is to use a set of metrics to build that trust. If you have all the data in Prometheus then you can automate the deployment because you can automate the progressive roll out of your application based on those metrics.
In short, you need more advanced deployment techniques than what K8s offers out of the box which are Rolling Updates. We need progressive delivery using canary deployments. The goal is to progressively route traffic to the new version of an application, wait for metrics to be collected, analyze them and match them against pre define rules. If everything is okay, we increase the traffic; if there are any issues we roll back the deployment.
To do this in Kubernetes, you can use Argo Rollouts which offers Canary releases and much more.
Argo Rollouts is a Kubernetes controller and set of CRDs which provide advanced deployment capabilities such as blue-green, canary, canary analysis, experimentation, and progressive delivery features to Kubernetes.
Although Service Meshes like Istio provide Canary Releases, Argo Rollouts makes this process much easier and developer centric since it was built specifically for this purpose. On top of that Argo Rollouts can be integrated with any service mesh.
Argo Rollouts Features:
- Blue-Green update strategy
- Canary update strategy
- Fine-grained, weighted traffic shifting
- Automated rollbacks and promotions or Manual judgement
- Customizable metric queries and analysis of business KPIs
- Ingress controller integration: NGINX, ALB
- Service Mesh integration: Istio, Linkerd, SMI
- Metric provider integration: Prometheus, Wavefront, Kayenta, Web, Kubernetes Jobs
- Istioas a service mesh for canary releases. Istio is much more than a progressive delivery tool, it is a full service mesh. Istio does not automate the deployment, Argo Rollouts can integrate with Istio to achieve this.
- Flagger is very similar to Argo Rollouts and it very well integrated with Flux, so if your ar using Flux consider Flagger.
- Spinnaker was the first continuous delivery tool for Kubernetes, it has many features but it is a bit more complicated to use and set up.
Crossplane is my new favorite K8s tool, I’m very exited about this project because it brings to Kubernetes a critical missing piece: manage 3rd party services as if they were K8s resources. This means, that you can provision a cloud provider databases such AWS RDS or GCP Cloud SQL like you would provision a database in K8s, using K8s resources defined in YAML.
With Crossplane, there is no need to separate infrastructure and code using different tools and methodologies. You can define everything using K8s resources. This way, you don’t need to learn new tools such as Terraform and keep them separately.
Crossplane is an open source Kubernetes add-on that enables platform teams to assemble infrastructure from multiple vendors, and expose higher level self-service APIs for application teams to consume, without having to write any code.
Crossplane extends your Kubernetes cluster, providing you with CRDs for any infrastructure or managed cloud service. Furthermore, it allows you to fully implement continuous deployment because contrary to other tools such Terraform, Crossplane uses existing K8s capabilities such as control loops to continuously watch your cluster and detect any configuration drifting acting on it automatically. For example, if you define a managed database instance and someone manually change it, Crossplane will automatically detect the issue and set it back to the previous value. This enforces infrastructure as code and GitOps principles. Crossplane works great with Argo CD which can watch the source code and make sure your code repo is the single source of truth and any changes in the code are propagated to the cluster and also external cloud services. Without Crossplane you could only implement GitOps in your K8s services but not your cloud services without using a separate process, now you can do this, which is awesome.
- Terraform which is the most famous IaC tool but it is not native to K8s, requires new skills and does not automatically watches from configuration drifts.
- Pulumi which is a Terraform alternative which works using programming languages that can be tested and understood by developers.
I already talked about Serverless in the past, so check my previous article to know more about this. The problem with Serverless is that it is tightly coupled to the cloud provider since the provider can create a great ecosystem for event driven applications.
For Kubernetes, if you want to run functions as code and use an event driven architecture, your best choice is Knative. Knative is build to run functions on Kubernetes creating an abstraction on top of a Pod.
- Focused API with higher level abstractions for common app use-cases.
- Stand up a scalable, secure, stateless service in seconds.
- Loosely coupled features let you use the pieces you need.
- Pluggable components let you bring your own logging and monitoring, networking, and service mesh.
- Knative is portable: run it anywhere Kubernetes runs, never worry about vendor lock-in.
- Idiomatic developer experience, supporting common patterns such as GitOps, DockerOps, ManualOps.
- Knative can be used with common tools and frameworks such as Django, Ruby on Rails, Spring, and many more.
- Argo Events provide an event-driven workflow engine for Kubernetes that can integrate with cloud engines such as AWS Lambda. It is not FaaS but provides an event driven architecture to Kubernetes.
Kubernetes provides great flexibility in order to empower agile autonomous teams but with great power comes great responsibility. There has to be a set of best practices and rules to ensure a consistent and cohesive way to deploy and manage workloads which are compliant with the companies policies and security requirements.
There are several tools to enable this but none were native to Kubernetes… until now. Kyverno is a policy engine designed for Kubernetes, policies are managed as Kubernetes resources and no new language is required to write policies. Kyverno policies can validate, mutate, and generate Kubernetes resources.
You can apply any kind of policy regarding best practices, networking or security. For example, you can enforce that all your service have labels or all containers run as non root. You can check some policy examples here. Policies can be applied to the whole cluster or to a given namespace. You can also choose if you just want to audit the policies or enforce them blocking users from deploying resources.
- Open Policy Agent is a famous cloud native policy-based control engine. It used its own declarative language and it works in many environments, not only on Kubernetes. It is more difficult to manage than Kyvernobut more powerful.
One problem with Kubernetes is that developers need to know and understand very well the platform and the cluster configuration. Many would argue that the level of abstraction in K8s is too low and this causes a lot of friction for developers who just want to focus on writing and shipping applications.
The Open Application Model (OAM) was created to overcome this problem. The idea is to create a higher level of abstraction around applications which is independent of the underlying runtime. You can read the spec here.
Focused on application rather than container or orchestrator, Open Application Model [OAM] brings modular, extensible, and portable design for modeling application deployment with higher level yet consistent API.
Kubevela is an implementation of the OAM model. KubeVela is runtime agnostic, natively extensible, yet most importantly, application-centric . In Kubevela applications are first class citizens implemented as Kubernetes resources. There is a distinction between cluster operators(Platform Team) and developers (Application Team). Cluster operators manage the cluster and the different environments by defining components(deployable/provisionable entities that compose your application like helm charts) and traits. Developers define applications by assembling components and traits.
KubeVela is a Cloud Native Computing Foundation sandbox project and although it is still in its infancy, it can change the way we use Kubernetes in the near future allowing developers to focus on applications without being Kubernetes experts. However, I do have some concerns regarding the applicability of the OAM in the real world since some services like system applications, ML or big data processes depend considerably on low level details which could be tricky to incorporate in the OAM model.
- Shipa follows a similar approach enabling platform and developer teams to work together to easily deploy application to Kubernetes.
A very important aspect in any development process is Security, this has always been an issue for Kubernetes since companies who wanted to migrate to Kubernetes couldn’t easily implement their current security principles.
Snyk tries to mitigate this by providing a security framework that can easily integrate with Kubernetes. It can detect vulnerabilities in container images, your code, open source projects and much more.
If you run your workload in Kubernetes and you use volumes to store data, you need to create and manage backups. Velero provides a simple backup/restore process, disaster recovery mechanisms and data migrations.
Unlike other tools which directly access the Kubernetes etcd database to perform backups and restores, Velero uses the Kubernetes API to capture the state of cluster resources and to restore them when necessary. Additionally, Velero enables you to backup and restore your application persistent data alongside the configurations.
Another common process in software development is to manage schema evolution when using relational databases.
SchemaHero is an open-source database schema migration tool that converts a schema definition into migration scripts that can be applied in any environment. It uses Kubernetes declarative nature to manage database schema migrations. You just specify the desired state and SchemaHero manages the rest.
- LiquidBase is the most famous alternative. It is more difficult to use and it’s not Kubernetes native but it has more features.
Bitnami Sealed Secrets
We already cover many GitOps tools such as ArgoCD. Our goal is to keep everything in Git and use Kubernetes declarative nature to keep the environments in sync. We just saw how we can (and we should) keep our source of truth in Git and have automated processes handle the configuration changes.
One thing that it was usually hard to keep in Git were secrets such DB passwords or API keys, this is because you should never store secrets in your code repository. One common solution is to use an external vault such as AWS Secret Manageror HashiCorpVaultto store the secrets but this creates a lot of friction since you need to have a separate process to handle secrets. Ideally, we would like a way to safely store secrets in Git just like any other resource.
Sealed Secrets were created to overcome this issue allowing you to store your sensitive data in Git by using strong encryption. Bitnami Sealed Secrets integrate natively in Kubernetes allowing you to decrypt the secrets only by the Kubernetes controller running in Kubernetes and no one else. The controller will decrypt the data and create native K8s secrets which are safely stored. This enables us to store absolutely everything as code in our repo allowing us to perform continuous deployment safely without any external dependencies.
Sealed Secrets is composed of two parts:
- A cluster-side controller
- A client-side utility:
kubeseal utility uses asymmetric crypto to encrypt secrets that only the controller can decrypt. These encrypted secrets are encoded in a
SealedSecret K8s resource that you can store in Git.
Many companies use multi tenancy to manage different customers. This is quite common in software development but difficult to implement in Kubernetes. Namespaces are a great way to create logical partitions of the cluster as isolated slices but this is not enough in order to securely isolate customers, we need to enforce network policies, quotas and more. You can create network policies and rules per name space but this is a tedious process that it is difficult to scale. Also, tenants will not able to use more than one namespace which is a big limitation.
Hierarchical Namespaces were created to overcome some of these issues. The idea is to have a parent namespace per tenant with common network policies and quotas for the tenants and allow the creation of child namespaces. This is a great improvement but it does not have native support for a tenant in terms of security and governance. Furthermore, it hasn’t reach production status yet but version 1.0 is expected to be release in the next months.
A common approach to currently solve this, is to create a cluster per customer, this is secure and provides everything a tenant will need but this is hard to manage and very expensive.
Capsule is a tool which provides native Kubernetes support for multiple tenants within a single cluster. With Capsule, you can have a single cluster for all your tenants. Capsule will provide an “almost” native experience for the tenants(with some minor restrictions) who will be able to create multiple namespaces and use the cluster as it was entirely available for them hiding the fact that the cluster is actually shared.
In a single cluster, the Capsule Controller aggregates multiple namespaces in a lightweight Kubernetes abstraction called Tenant, which is a grouping of Kubernetes Namespaces. Within each tenant, users are free to create their namespaces and share all the assigned resources while the Policy Engine keeps the different tenants isolated from each other.
The Network and Security Policies, Resource Quota, Limit Ranges, RBAC, and other policies defined at the tenant level are automatically inherited by all the namespaces in the tenant similar to Hierarchical Namespaces. Then users are free to operate their tenants in autonomy, without the intervention of the cluster administrator. Capsule is GitOps ready since it is declarative and all the configuration can be stored in Git.
VCluster goes one step further in terms of multi tenancy, it offers virtual clusters inside a Kubernetes cluster. Each cluster runs on a regular namespace and it is fully isolated. Virtual clusters have their own API server and a separate data store, so every Kubernetes object you create in the vcluster only exists inside the vcluster. Also, you can use kube context with virtual clusters to use them like regular clusters.
As long as you can create a deployment inside a single namespace, you will be able to create a virtual cluster and become admin of this virtual cluster, tenants can create namespaces, install CRDs, configure permissions and much more.
vCluster uses k3s as its API server to make virtual clusters super lightweight and cost-efficient; and since k3s clusters are 100% compliant, virtual clusters are 100% compliant as well. vclusters are super lightweight (1 pod), consume very few resources and run on any Kubernetes cluster without requiring privileged access to the underlying cluster. Compared to Capsule, it does use a bit more resources but it offer more flexibility since multi tenancy is just one of the use cases.
- kube-burner is used for stress testing. It provides metrics and alerts.
- kubewatch is used for monitoring but mainly focus on push notifications based on Kubernetes events like resource creation or deletion. It can integrate with many tools like Slack.
- kube-fledged is a Kubernetes add-on for creating and managing a cache of container images directly on the worker nodes of a Kubernetes cluster. As a result, application pods start almost instantly, since the images need not be pulled from the registry.
In this article we have reviewed my favorite Kubernetes tools. I focused on Open Source projects that can be incorporated in any Kubernetes distribution. I didn’t cover comercial solutions such as OpenShift or Cloud Providers Add-Ons since I wanted to keep i generic, but I do encourage to explore what your cloud provider can offer you if you run Kubernetes on the cloud or using a comercial tool.
My goal is to show you that you can do everything you do on-prem in Kubernetes. I also focused more in less known tools which I think may have a lot of potential such Crossplane, Argo Rollouts or Kubevela. The tools that I’m more excited about are vCluster, Crossplane and ArgoCD/Workflows.
Feel free to get in touch if you have any questions or need any advice.