Categoria: Cloud
Selecting Performance Monitoring Tools
System monitoring is a helpful approach to provide the user with data regarding the actual timing behavior of the system. Users can perform further analysis using the data that these monitors provide. One of the goals of system monitoring is to determine whether the current execution meets the specified technical requirements.
These monitoring tools retrieve commonly viewed information, and can be used by way of the command line or a graphical user interface, as determined by the system administrator. These tools display information about the Linux system, such as free disk space, the temperature of the CPU, and other essential components, as well as networking information, such as the system IP address and current rates of upload and download.
Monitoring Tools
The Linux kernel maintains counterstructures for counting events, that increment when an event occurs. For example, disk reads and writes, and process system calls, are events that increment counters with values stored as unsigned integers. Monitoring tools read these counter values. These tools provide either per process statistics maintained in process structures, or system-wide statistics in the kernel. Monitoring tools are typically viewable by non-privileged users. The ps and top commands provide process statistics, including CPU and memory.
Monitoring Processes Using the ps Command
Troubleshooting a system requires understanding how the kernel communicates with processes, and how processes communicate with each other. At process creation, the system assigns a state to the process.
Use the ps aux command to list all users with extended user-oriented details; the resulting list includes the terminal from which processes are started, as well as processes without a terminal. A ?
sign in the TTY
column represents that the process did not start from a terminal.
[user@host]$
ps aux
USER PID %CPU %MEM VSZ RSSTTY
STAT START TIME COMMAND user 1350 0.0 0.2 233916 4808 pts/0 Ss 10:00 0:00 -bash root 1387 0.0 0.1 244904 2808?
Ss 10:01 0:00 /usr/sbin/anacron -s root 1410 0.0 0.0 0 0 ? I 10:08 0:00 [kworker/0:2... root 1435 0.0 0.0 0 0 ? I 10:31 0:00 [kworker/1:1... user 1436 0.0 0.2 266920 3816 pts/0 R+ 10:48 0:00 ps aux
The Linux version of ps supports three option formats:
- UNIX (POSIX) options, which may be grouped and must be preceded by a dash.
- BSD options, which may be grouped and must not include a dash.
- GNU long options, which are preceded by two dashes.
The output below uses the UNIX options to list every process with full details:
[user@host]$
ps -ef
UID PID PPID C STIME TTY TIME CMD root 2 0 0 09:57 ? 00:00:00 [kthreadd] root 3 2 0 09:57 ? 00:00:00 [rcu_gp] root 4 2 0 09:57 ? 00:00:00 [rcu_par_gp] ...output omitted...
Key Columns in ps OutputPID
This column shows the unique process ID.TIME
This column shows the total CPU time consumed by the process in hours:minutes:seconds format, since the start of the process.%CPU
This column shows the CPU usage during the previous second as the sum across all CPUs expressed as a percentage.RSS
This column shows the non-swapped physical memory that a process consumes in kilobytes in the resident set size, RSS
column.%MEM
This column shows the ratio of the process’ resident set size to the physical memory on the machine, expressed as a percentage.
Use the -p
option together with the pidof command to list the sshd
processes that are running.
[user@host ~]$
ps -p $(pidof sshd)
PID TTY STAT TIME COMMAND 756 ? Ss 0:00 /usr/sbin/sshd -D [email protected]... 1335 ? Ss 0:00 sshd: user [priv] 1349 ? S 0:00 sshd: user@pts/0
Use the following command to list of all processes sorted by memory usage in descending order:
[user@host ~]$
ps ax --format pid,%mem,cmd --sort -%mem
PID %MEM CMD 713 1.8 /usr/libexec/sssd/sssd_nss --uid 0 --gid 0 --logger=files 715 1.8 /usr/libexec/platform-python -s /usr/sbin/firewalld --nofork --nopid 753 1.5 /usr/libexec/platform-python -Es /usr/sbin/tuned -l -P 687 1.2 /usr/lib/polkit-1/polkitd --no-debug 731 0.9 /usr/sbin/NetworkManager --no-daemon ...output omitted...
Various other options are available for ps including the o
option to customize the output and columns shown.
Monitoring Process Using top
The top command provides a real-time report of process activities with an interface for the user to filter and manipulate the monitored data. The command output shows a system-wide summary at the top and process listing at the bottom, sorted by the top CPU consuming task by default. The -n 1
option terminates the program after a single display of the process list. The following is an example output of the command:
[user@host ~]$
top -n 1
Tasks: 115 total, 1 running, 114 sleeping, 0 stopped, 0 zombie %Cpu(s): 0.0 us, 3.2 sy, 0.0 ni, 96.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st MiB Mem : 1829.0 total, 1426.5 free, 173.6 used, 228.9 buff/cache MiB Swap: 0.0 total, 0.0 free, 0.0 used. 1495.8 avail Mem PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 1 root 20 0 243968 13276 8908 S 0.0 0.7 0:01.86 systemd 2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd 3 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 rcu_gp ...output omitted...
Useful Key Combinations to Sort FieldsRES
Use Shift+M to sort the processes based on resident memory.PID
Use Shift+N to sort the processes based on process ID.TIME+
Use Shift+T to sort the processes based on CPU time.
Press F and select a field from the list to use any other field for sorting.
IMPORTANT
The top command imposes a significant overhead on the system due to various system calls. While running the top command, the process running the top command is often the top CPU-consuming process.
Monitoring Memory Usage
The free command lists both free and used physical memory and swap memory. The -b
, -k
, -m
, -g
options show the output in bytes, KB, MB, or GB, respectively. The -s
option is passed as an argument that specifies the number of seconds between refreshes. For example, free -s 1 produces an update every 1 second.
[user@host ~]$
free -m
total used free shared buff/cache available Mem: 1829 172 1427 16 228 1496 Swap: 0 0 0
The near zero values in the buff/cache
and available
columns indicate a low memory situation. If the available memory is more than 20% of the total, and the used memory is close to the total memory, then these values indicate a healthy system.
Monitoring File System Usage
One stable identifier that is associated with a file system is its UUID, a very long hexadecimal number that acts as a universally unique identifier. This UUID is part of the file system and remains the same as long as the file system is not recreated. The lsblk -fp command lists the full path of the device, along with the UUIDs and mount points, as well as the type of file system in the partition. If the file system is not mounted, the mount point displays as blank.
[user@host ~]$
lsblk -fp
NAME FSTYPE LABEL UUID MOUNTPOINT /dev/vda ├─/dev/vda1 xfs 23ea8803-a396-494a-8e95-1538a53b821c /boot ├─/dev/vda2 swap cdf61ded-534c-4bd6-b458-cab18b1a72ea [SWAP] └─/dev/vda3 xfs 44330f15-2f9d-4745-ae2e-20844f22762d / /dev/vdb └─/dev/vdb1 xfs 46f543fd-78c9-4526-a857-244811be2d88
The findmnt command allows the user to take a quick look at what is mounted where, and with which options. Executing the findmnt command without any options lists out all the mounted file systems in a tree layout. Use the -s
option to read the file systems from the /etc/fstab
file. Use the -S
option to search the file systems by the source disk.
[user@host ~]$
findmnt -S /dev/vda1
TARGET SOURCE FSTYPE OPTIONS / /dev/vda1 xfs rw,relatime,seclabel,attr2,inode64,noquota
The df command provides information about the total usage of the file systems. The -h
option transforms the output into a human-readable form.
[user@host ~]$
df -h
Filesystem Size Used Avail Use% Mounted on devtmpfs 892M 0 892M 0% /dev tmpfs 915M 0 915M 0% /dev/shm tmpfs 915M 17M 899M 2% /run tmpfs 915M 0 915M 0% /sys/fs/cgroup /dev/vda1 10G 1.5G 8.6G 15% / tmpfs 183M 0 183M 0% /run/user/1000
The du command displays the total size of all the files in a given directory and its subdirectories. The -s
option suppresses the output of detailed information and displays only the total. Similar to the df -h command, the -h
option displays the output into a human-readable form.
[user@host ~]$
du -sh /home/user
16K /home/user
Using GNOME System Monitor
The System Monitor available on the GNOME desktop provides statistical data about the system status, load, and processes, as well as the ability to manipulate those processes. Similar to other monitoring tools, such as the top, ps, and free commands, the System Monitor provides both the system-wide and per-process data. These monitoring tools retrieve commonly viewed information, and can be used by way of the command line or a graphical user interface, as determined by the system administrator. Use the gnome-system-monitor command to access the application from a command terminal.
To view the CPU usage, go to the Resources tab and look at the CPU History chart.
Figure 2.2: CPU usage history in System Monitor
The virtual memory is the sum of the physical memory and the swap space in a system. A running process maps the location in physical memory to files on disk. The memory map displays the total virtual memory consumed by a running process, which determines the memory cost of running that process instance. The memory map also displays the shared libraries used by the process.
Figure 2.3: Memory map of a process in System Monitor
To display the memory map of a process in System Monitor, locate a process in the Processes tab, right-click a process in the list, and select Memory Maps.
6 Important things you need to run Kubernetes in production
Kubernetes adoption is at an all-time high. Almost every major IT organization invests in a container strategy and Kubernetes is by far the most-used and most popular container orchestration technology. While there are many flavors of Kubernetes, managed solutions like AKS, EKS and GKE are by far the most popular. Kubernetes is a very complex platform, but setting up a Kubernetes cluster is fairly easy as long as you choose a managed cloud solution. I would never advise self-managing a Kubernetes cluster unless you have a very good reason to do so.
Running Kubernetes comes with many benefits, but setting up a solid platform yourself without strong Kubernetes knowledge takes time. Setting up a Kubernetes stack according to best-practices requires expertise, and is necessary to set up a stable cluster that is future-proof. Simply running a manged cluster and deploying your application is not enough. Some additional things are needed to run a production-ready Kubernetes cluster. A good Kubernetes setup makes the life of developers a lot easier and gives them time to focus on delivering business value. In this article, I will share the most important things you need to run a Kubernetes stack in production.
1 – Infrastructure as Code (IaC)
First of all, managing your cloud infrastructure using Desired State configuration (Infrastructure as Code – IaC) comes with a lot of benefits and is a general cloud infrastructure best practice. Specifying it declarative as code will enable you to test your infrastructure (changes) in non-production environments. It discourages or prevents manual deployments, making your infrastructure deployments more consistent, reliable and repeatable. Teams implementing IaC deliver more stable environments rapidly and at scale. IaC tools like Terraform or Pulumi work great to deploy your entire Kubernetes cluster in your cloud of choice together with networking, load balancers, DNS configuration and of course an integrated Container Registry.
2 – Monitoring & Centralized logging
Kubernetes is a very stable platform. Its self-healing capabilities will solve many issues and if you don’t know where to look you wouldn’t even notice. However, that does not mean monitoring is unimportant. I have seen teams running production without proper monitoring, and suddenly a certificate expired, or a node memory overcommit caused an outage. You can easily prevent these failures with proper monitoring in place. Prometheus and Grafana are Kubernetes’ most used monitoring solutions and can be used to monitor both your platform and applications. Alerting (e.g. using the Alertmanager) should be set up for critical issues with your Kubernetes cluster, so that you can prevent downtime, failures or even data loss.
Apart from monitoring using metrics, it is also important to run centralized components like Fluentd or Filebeat to collect logging and send them to a centralized logging platform like ElasticSearch so that application error logs and log events can be traced and in a central place. These tools can be set up centrally, so standard monitoring is automatically in place for all apps without developer effort.
3 – Centralized Ingress Controller with SSL certificate management
Kubernetes has a concept of Ingress. A simple configuration that describes how traffic should flow from outside of Kubernetes to your application. A central Ingress Controller (e.g. Nginx) can be installed in the cluster to manage all incoming traffic for every application. When an Ingress Controller is linked to a public Cloud LoadBalancer, all traffic is automatically loadBalanced among Nodes, and sent to the right pods IP Addresses.
A Ingress Controller gives many benefits, because of its Centralization. It can also take care of HTTPS and SSL. An integrated component called cert-manager is a centrally deployed application in Kubernetes that takes care of HTTPS certificates. It can be configured using Let’s Encrypt, wildcard certificates or even a private Certification Authority for internal company-trusted certificates. All incoming traffic will be automatically encrypted using the HTTPS certificates and forwarded to the correct Kubernetes pods. Another thing developers won’t need to worry about.
4 – Role-Based Access Control (RBAC)
Not everyone should be a Kubernetes Administrator. We should always apply the principle of Least Privilege when it comes to Kubernetes access. Role-Based Access Control should be applied to the whole Kubernetes stack (Kubernetes API, deployment tools, dashboards, etc.). When we integrate Kubernetes with an IAM solution like Keycloak, Azure AD or AWS Cognito, we can centrally manage authentication and authorization using OAuth2 / OIDC for both platform tools and applications. Roles and groups can be defined to give users access to the resources they need to access based on their team or role.
5 – GitOps Deployments
Everyone who works with Kubernetes uses kubectl one way or another. But manually deploying to Kubernetes using the ‘kubectl apply’ command is not a best practice, most certainly not in production. Kubernetes desired state configuration should be present in GIT, and we need a deployment platform that rolls out to Kubernetes. ArgoCD and Flux are the two leading GitOps platforms for Kubernetes deployments. Both work very well for handling real-time declarative state management, making sure that Git is the single source of truth for the Kubernetes state. Even if a rogue developer tries to manually change something in production, the GitOps platform will immediately roll back the change to the desired change. With a GitOps bootstrapping technique we can manage environments, teams, projects, roles, policies, namespaces, clusters, appgroups and applications. With Git only. GitOps makes sure that all changes to all Kubernetes environments are 100% traceable, easily automated and manageable.
6 – Secret Management
Kubernetes secret manifests are used to inject secrets into your containers, either as environment variables or file mappings. Preferably, not everyone should be able to access all secrets, especially in production. Using Role-Based Access Control on secrets for team members and applications is a security best practice. Secrets can be injected into Kubernetes using a CI / CD tooling or (worse) via a local development environment, but this can result in configuration state drift. This is not traceable, and not easily manageable. The best way to sync secrets is using a central vault, like Azure Key Vault, Hashicorp Vault, AWS Secrets Manager with a central secrets operator like External Secrets Operator. This way, secret references can be stored in GIT, pointing to an entry in an external secrets Vault. For more security-focused companies it is also an option to lock out all developers from secrets in Kubernetes using RBAC. They will be able to reference secrets, and use them in containers, but will never be able to directly access them.
Conclusion
Spinning-up a managed Kubernetes cluster is easy, but setting it up correctly takes time if you don’t have the expertise. It is very important to have a good Infrastructure as Code solution, proper monitoring, RBAC and Deployment mechanisms that are secure, manageable and traceable. The earlier, the better. Setting up your Kubernetes cluster according to best practices using standardized open source tooling will help you save time, failures and headaches, especially in the long run. Of course, these are the most basic requirements for your Kubernetes stack, especially for enterprise-level companies. Other important considerations that have not been mentioned are ServiceMesh, Security scanning / compliance, end-to-end traceability, which will be discussed in a future article.
Helm vs Operator
When it comes to running complex application workloads on Kubernetes two technologies standout — Helm and Kubernetes Operators. In this post we compare them and discuss how they actually complement each other towards solving problems of day-1 and day-2 operations when it comes to running complex application workloads on Kubernetes. We also present guidelines for creating Helm charts for Operators.
What is Helm?
The basic idea of Helm is to enable reusability of Kubernetes YAML artifacts through templatization. Helm allows defining Kubernetes YAMLs with marked up properties. The actual values for these properties are defined in a separate file. Helm takes the templatized YAMLs and the values file and merges them before deploying the merged YAMLs into a cluster. The package consisting of templatized Kubernetes YAMLs and the values file is called a ‘Helm chart’. Helm project has gained considerable popularity as it solves one of the key problems that enterprises face — creating custom YAMLs for deploying the same application workload with different settings, or deploying it in different environments (dev/test/prod).
What is a Kubernetes Operator?
The basic idea of a Kubernetes Operator is to extend the Kubernetes’s level-driven reconciliation loop and API set towards running stateful application workloads natively on Kubernetes. A Kubernetes Operator consists of Kubernetes Custom Resource(s) / API(s) and Kubernetes Custom Controller(s). The Custom Resources represent an API that takes declarative inputs on the workload being abstracted and the Custom Controller implements corresponding actions on the workload. Kubernetes Operators are typically implemented in a standard programming language (Golang, Python, Java, etc.). An Operator is packaged as a container image and is deployed in a Kubernetes cluster using Kubernetes YAMLs. Once deployed, new Custom Resources (e.g. Mysql, Cassandra, etc.) are available to the end users similar to built-in Resources (e.g. Pod, Service, etc.). This allows them to orchestrate their application workflows more effectively leveraging additional Custom Resources.
Day-1 vs. Day-2 operations
Helm and Operators represent two different phases in managing complex application workloads on Kubernetes. Helm’s primary focus is on the day-1 operation of deploying Kubernetes artifacts in a cluster. The ‘domain’ that it understands is that of Kubernetes YAMLs that are composed of available Kubernetes Resources / APIs. Operators, on the other hand, are primarily focused on addressing day-2 management tasks of stateful / complex workloads such as Postgres, Cassandra, Spark, Kafka, SSL Cert Mgmt etc. on Kubernetes.
Both these mechanisms are complementary to deploying and managing such workloads on Kubernetes. To Helm, Operator deployment YAMLs represent one of the artifact types that can be potentially templatized and deployed with different settings and in different environments. To an Operator developer, Helm represents a standard tool to package, distribute and install Operator deployment YAMLs without tie-in to any Kubernetes vendor or distribution. As an Operator developer, it is tremendously useful for your users if you create Helm chart for its deployment. Below are some guidelines that you should follow when creating such Operator Helm charts.
Guidelines for creating Operator Helm charts
- Register CRDs in Helm chart (and not in Operator code)
As mentioned earlier, an Operator consists of one or more Custom Resources and associated Controller(s). In order for the Custom Resources to be recognized in a cluster, they need to be first registered in the cluster using Kubernetes’s meta API of ‘Custom Resource Definition (CRD)’. The CRD itself is a Kubernetes resource. Our first guideline is that registering the Custom Resources using CRDs should be done as Kubernetes YAML files in a Helm chart, rather than in Operator’s code (Golang/Python, etc.). The primary reason for this is that installing CRD YAML requires cluster-scope permission whereas an Operator Pod may not require cluster-scope permissions for its day-2 operations on an application workload. If you include CRD registration in your Operator’s code, then you will have to deploy your Operator Pod with cluster-scope permissions, which may come in the way of your security preferences. By defining CRD registration in Helm chart, you only need to give the Operator Pod those permissions that are necessary for it to perform the actual day-2 operations on the underlying application workload.
2. Make sure CRDs are getting installed prior to the Operator deployment
The second guideline is to define crd-install hook as part of the Operator Helm chart. What is this hook? ‘crd-install’ is a special annotation that you can add to your Helm chart’s CRD manifest. When installing a chart in a cluster, Helm installs any YAML manifests with this annotation before installing any other manifests of a chart. By adding this annotation you will ensure that the Custom Resources that your Operator works with are registered in the cluster before the Operator is deployed. This will prevent errors that happen if Operator starts running without its CRDs registered in the cluster. Note that in Helm 3.0, this hook is going away. Instead, CRDs are getting a special directory inside the charts directory.
3. Define Custom Resource validation rules in CRD YAML
Next guideline is about adding Custom Resource validation rules as part of CRD YAML. Kubernetes Custom Resources are designed to follow the Open API Spec. Additionally from Operator logic perspective, Custom Resource instances should have valid values for its defined Spec properties. You can define validity rules for the Custom Resource Spec properties as part of CRD YAML. These validation rules are used by Kubernetes machinery before a Custom Resource YAML reaches your Operator through one of the CRUD (Create, Read, Update, Delete) operations on the Custom Resource. This guideline helps prevent end user of the Custom Resource making unintentional errors while using it.
4. Use values.yaml or ConfigMaps for Operator configurables
Your Operator itself may need to be customized for different environments or even for different namespaces within a cluster. The customizations may range from small modifications, such as customizing the base MySQL image used by a MySQL Operator, to larger modifications such as installing different default set of plugins on different Moodle instances by a Moodle Operator. As mentioned earlier, Helm supports mechanisms of Spec property markers and Values.yaml file for templatization/customization. You can leverage these for small modifications that typically can be represented as singular properties in Operator deployment manifests. For supporting larger modifications to an Operator, consider passing the configuration data via ConfigMaps. The names of such ConfigMaps need to be passed to the Operator. For that you can use Helm’s property markers and Values.yaml file.
5. Add Platform-as-Code annotations to enable easy discovery and consumption of Operator’s Custom Resources
The last guideline that we have is to add ‘Platform-as-Code’ annotations on your Operator’s CRD YAML. There are two annotations that we recommend — ‘composition’ and ‘man’. The value of the ‘platform-as-code/composition’ annotation is listing of all Kubernetes built-in resources that are created by the Operator when the Custom Resource instance is created. The value of the ‘platform-as-code/man’ annotation is the name of a ConfigMap that packages ‘man page’ like usage information about the Custom Resource. Make sure to include this ConfigMap in your Operator’s Helm chart. These two annotations are part of our KubePlus API Add-on that simplifies discovery, binding and orchestration of Kubernetes Custom Resources to create platform workflows in typical multi-Operator environments.
Conclusion
Helm and Operators are complementary technologies. Helm is geared towards performing day-1 operations of templatization and deployment of Kubernetes YAMLs — in this case Operator deployment. Operator is geared towards handling day-2 operations of managing application workloads on Kubernetes. You will need both when running stateful / complex workloads on Kubernetes.
At CloudARK, we have been helping platform engineering teams build their custom platform stacks assembling multiple Operators to run their enterprise workloads. Our novel Platform-as-Code technology is designed to simplify workflow management in multi-Operator environments by bringing in consistency across Operators and simplifying consumption of Custom Resources towards defining required platform workflows. For this we have developed comprehensive Operator curation guidelines and open source Platform-as-Code tooling.
Reach out to us to find out how you too can build your Kubernetes Native platforms with Platform-as-Code approach.
Additional References:
Best 11 Kubernetes tool for 2021
Introduction
In this article I will try to summarize my favorite tools for Kubernetes with special emphasis on the newest and lesser known tools which I think will become very popular.
This is just my personal list based on my experience but, in order to avoid biases, I will try to also mention alternatives to each tool so you can compare and decide based on your needs. I will keep this article as short as I can and I will try to provide links so you can explore more on your own. My goal is to answer the question: “How can I do X in Kubernetes?” by describing tools for different software development tasks.
K3D
K3D is my favorite way to run Kubernetes(K8s) clusters on my laptop. It is extremely lightweight and very fast. It is a wrapper around K3S using Docker. So, you only need Docker to run it and it has a very low resource usage. The only problem is that it is not fully K8s compliant, but this shouldn’t be an issue for local development. For test environments you can use other solutions. K3D is faster than Kind, but Kind is fully compliant.
Alternatives
Krew
Krew is an essential tool to manage Kubectl plugins, this is a must have for any K8s user. I won’t go into the details of the more than 145 plugins available but at least install kubens and kubectx.
Lens
Lens is an IDE for K8s for SREs, Ops and Developers. It works with any Kubernetes distribution: on-prem or in the cloud. It is fast, easy to use and provides real time observability. With Lens it is very easy to manage many clusters. This is a must have if you are a cluster operator.
Alternatives
- K9sis an excellent choice for those who prefer a lightweight terminal alternative. K9s continually watches Kubernetes for changes and offers subsequent commands to interact with your observed resources.
Helm
Helm shouldn’t need an introduction, it is the most famous package manager for Kubernetes. And yes, you should use package managers in K8s, same as you use it in programming languages. Helm allows you to pack your application in Charts which abstract complex application into reusable simple components that are easy to define, install and update.
It also provides a powerful templating engine. Helm is mature, has lots of pre defined charts, great support and it is easy to use.
Alternatives
- Kustomize is a newer and great alternative for helm which does not use a templating engine but an overlay engine where you have base definitions and overlays on top of them.
ArgoCD
I believe that GitOps is one of the best ideas of the last decade. In software development, we should use a single source of truth to track all the moving pieces required to build software and Git is a the perfect tool to do that. The idea is to have a Git repository that contains the application code and also declarative descriptions of the infrastructure(IaC) which represent the desired production environment state; and an automated process to make the desired environment match the described state in the repository.
GitOps: versioned CI/CD on top of declarative infrastructure. Stop scripting and start shipping.
Although with Terraform or similar tools you can have your infrastructure as code(IaC), this is not enough to be able to sync your desired state in Git with production. We need a way to continuous monitor the environments and make sure there is no configuration drift. With Terraform you will have to write scripts that run terraform apply
and check if the status matches the Terraform state but this is tedious and hard to maintain.
Kubernetes has been build with the idea of control loops from the ground up, this means that Kubernetes is always watching the state of the cluster to make sure it matches the desired state, for example, that the number of replicas running matches the desired number of replicas. The idea of GitOps is to extend this to applications, so you can define your services as code, for example, by defining Helm Charts, and use a tool that leverages K8s capabilities to monitor the state of your App and adjust the cluster accordingly. That is, if update your code repo, or your helm chart the production cluster is also updated. This is true continuous deployment. The core principle is that application deployment and lifecycle management should be automated, auditable, and easy to understand.
For me this idea is revolutionary and if done properly, will enable organizations to focus more on features and less on writing scripts for automation. This concept can be extended to other areas of Software Development, for example, you can store your documentation in your code to track the history of changes and make sure the documentation is up to date; or track architectural decision using ADRs.
In my opinion, the best GitOps tool in Kubernetes is ArgoCD. You can read more about here. ArgoCD is part of the Argo ecosystem which includes some other great tools, some of which, we will discuss later.
With ArgoCD you can have each environment in a code repository where you define all the configuration for that environment. Argo CD automates the deployment of the desired application state in the specified target environments.
Argo CD is implemented as a kubernetes controller which continuously monitors running applications and compares the current, live state against the desired target state (as specified in the Git repo). Argo CD reports and visualizes the differences and can automatically or manually sync the live state back to the desired target state.
Alternatives
- Flux which just released a new version with many improvements. It offers very similar functionality.
Argo Workflows and Argo Events
In Kubernetes, you may also need to run batch jobs or complex workflows. This could be part of your data pipeline, asynchronous processes or even CI/CD. On top of that, you may need to run even driven microservices that react to certain events like a file was uploaded or a message was sent to a queue. For all of this, we have Argo Workflows and Argo Events.
Although they are separate projects, they tend to be deployed together.
Argo Workflows is an orchestration engine similar to Apache Airflow but native to Kubernetes. It uses custom CRDs to define complex workflows using steps or DAGs using YAML which feels more natural in K8s. It has an nice UI, retries mechanisms, cron based jobs, inputs and outputs tacking and much more. You can use it to orchestrate data pipelines, batch jobs and much more.
Sometimes, you may want to integrate your pipelines with Async services like stream engines like Kafka, queues, webhooks or deep storage services. For example, you may want to react to events like a file uploaded to S3. For this, you will use Argo Events.
These two tools combines provide an easy and powerful solution for all your pipelines needs including CI/CD pipelines which will allow you to run your CI/CD pipelines natively in Kubernetes.
Alternatives
Kanico
We just saw how we can run Kubernetes native CI/CD pipelines using Argo Workflows. One common task is to build Docker images, this is usually tedious in Kubernetes since the build process actually runs on a container itself and you need to use workarounds to use the Docker engine of the host.
The bottom line is that you shouldn’t use Docker to build your images: use Kanico instead. Kaniko doesn’t depend on a Docker daemon and executes each command within a Dockerfile completely in userspace. This enables building container images in environments that can’t easily or securely run a Docker daemon, such as a standard Kubernetes cluster. This removes all the issues regarding building images inside a K8s cluster.
Istio
Istio is the most famous service mesh on the market, it is open source and very popular. I won’t go into details regarding what a service mesh is because it is a huge topic, but if you are building microservices, and probably you should, then you will need a service mesh to manage the communication, observability, error handling, security and all of the other cross cutting aspects that come as part of the microservice architecture. Instead of polluting the code of each microservice with duplicate logic, leverage the service mesh to do it for you.
In short, a service mesh is a dedicated infrastructure layer that you can add to your applications. It allows you to transparently add capabilities like observability, traffic management, and security, without adding them to your own code.
Istio if used to run microseconds and although you can run Istio and use microservices anywhere, Kubernetes has been proven over and over again as the best platform to run them. Istio can also extend your K8s cluster to other services such as VMs allowing you to have Hybrid environments which are extremely useful when migrating to Kubernetes.
Alternatives
- Linkerd is a lighter and probably faster service mesh. Linkerd is built for security from the ground up, ranging from features like on-by-default mTLS, a data plane that is built in a Rust, memory-safe language, and regular security audits
- Consul is a service mesh built for any runtime and cloud provider, so it works great for hybrid deployments across K8s and cloud providers. It is a great choice if not all your workloads run on Kubernetes.
Argo Rollouts
We mentioned already that you can use Kubernetes to run your CI/CD pipeline using Argo Workflows or a similar tools using Kanico to build your images. The next logical step is to continue and do continuous deployments. This is is extremely challenging to do in a real word scenario due to the high risk involved, that’s why most companies just do continuous delivery, which means that they have the automation in place but they still have manual approvals and verification, this manual step is cause by the fact that the team cannot fully trust their automation.
So how do you build that trust to be able to get rid of all the scripts and fully automate everything from source code all the way to production? The answer is: observability. You need to focus the resources more on metrics and gather all the data needed to accurately represent the state of your application. The goal is to use a set of metrics to build that trust. If you have all the data in Prometheus then you can automate the deployment because you can automate the progressive roll out of your application based on those metrics.
In short, you need more advanced deployment techniques than what K8s offers out of the box which are Rolling Updates. We need progressive delivery using canary deployments. The goal is to progressively route traffic to the new version of an application, wait for metrics to be collected, analyze them and match them against pre define rules. If everything is okay, we increase the traffic; if there are any issues we roll back the deployment.
To do this in Kubernetes, you can use Argo Rollouts which offers Canary releases and much more.
Argo Rollouts is a Kubernetes controller and set of CRDs which provide advanced deployment capabilities such as blue-green, canary, canary analysis, experimentation, and progressive delivery features to Kubernetes.
Although Service Meshes like Istio provide Canary Releases, Argo Rollouts makes this process much easier and developer centric since it was built specifically for this purpose. On top of that Argo Rollouts can be integrated with any service mesh.
Argo Rollouts Features:
- Blue-Green update strategy
- Canary update strategy
- Fine-grained, weighted traffic shifting
- Automated rollbacks and promotions or Manual judgement
- Customizable metric queries and analysis of business KPIs
- Ingress controller integration: NGINX, ALB
- Service Mesh integration: Istio, Linkerd, SMI
- Metric provider integration: Prometheus, Wavefront, Kayenta, Web, Kubernetes Jobs
Alternatives
- Istioas a service mesh for canary releases. Istio is much more than a progressive delivery tool, it is a full service mesh. Istio does not automate the deployment, Argo Rollouts can integrate with Istio to achieve this.
- Flagger is very similar to Argo Rollouts and it very well integrated with Flux, so if your ar using Flux consider Flagger.
- Spinnaker was the first continuous delivery tool for Kubernetes, it has many features but it is a bit more complicated to use and set up.
Crossplane
Crossplane is my new favorite K8s tool, I’m very exited about this project because it brings to Kubernetes a critical missing piece: manage 3rd party services as if they were K8s resources. This means, that you can provision a cloud provider databases such AWS RDS or GCP Cloud SQL like you would provision a database in K8s, using K8s resources defined in YAML.
With Crossplane, there is no need to separate infrastructure and code using different tools and methodologies. You can define everything using K8s resources. This way, you don’t need to learn new tools such as Terraform and keep them separately.
Crossplane is an open source Kubernetes add-on that enables platform teams to assemble infrastructure from multiple vendors, and expose higher level self-service APIs for application teams to consume, without having to write any code.
Crossplane extends your Kubernetes cluster, providing you with CRDs for any infrastructure or managed cloud service. Furthermore, it allows you to fully implement continuous deployment because contrary to other tools such Terraform, Crossplane uses existing K8s capabilities such as control loops to continuously watch your cluster and detect any configuration drifting acting on it automatically. For example, if you define a managed database instance and someone manually change it, Crossplane will automatically detect the issue and set it back to the previous value. This enforces infrastructure as code and GitOps principles. Crossplane works great with Argo CD which can watch the source code and make sure your code repo is the single source of truth and any changes in the code are propagated to the cluster and also external cloud services. Without Crossplane you could only implement GitOps in your K8s services but not your cloud services without using a separate process, now you can do this, which is awesome.
Alternatives
- Terraform which is the most famous IaC tool but it is not native to K8s, requires new skills and does not automatically watches from configuration drifts.
- Pulumi which is a Terraform alternative which works using programming languages that can be tested and understood by developers.
Knative
If you develop your applications in the cloud you probably have used some Serverless technologies such as AWS Lambdawhich is an event driven paradigm known as FaaS.
I already talked about Serverless in the past, so check my previous article to know more about this. The problem with Serverless is that it is tightly coupled to the cloud provider since the provider can create a great ecosystem for event driven applications.
For Kubernetes, if you want to run functions as code and use an event driven architecture, your best choice is Knative. Knative is build to run functions on Kubernetes creating an abstraction on top of a Pod.
Features:
- Focused API with higher level abstractions for common app use-cases.
- Stand up a scalable, secure, stateless service in seconds.
- Loosely coupled features let you use the pieces you need.
- Pluggable components let you bring your own logging and monitoring, networking, and service mesh.
- Knative is portable: run it anywhere Kubernetes runs, never worry about vendor lock-in.
- Idiomatic developer experience, supporting common patterns such as GitOps, DockerOps, ManualOps.
- Knative can be used with common tools and frameworks such as Django, Ruby on Rails, Spring, and many more.
Alternatives
- Argo Events provide an event-driven workflow engine for Kubernetes that can integrate with cloud engines such as AWS Lambda. It is not FaaS but provides an event driven architecture to Kubernetes.
kyverno
Kubernetes provides great flexibility in order to empower agile autonomous teams but with great power comes great responsibility. There has to be a set of best practices and rules to ensure a consistent and cohesive way to deploy and manage workloads which are compliant with the companies policies and security requirements.
There are several tools to enable this but none were native to Kubernetes… until now. Kyverno is a policy engine designed for Kubernetes, policies are managed as Kubernetes resources and no new language is required to write policies. Kyverno policies can validate, mutate, and generate Kubernetes resources.
You can apply any kind of policy regarding best practices, networking or security. For example, you can enforce that all your service have labels or all containers run as non root. You can check some policy examples here. Policies can be applied to the whole cluster or to a given namespace. You can also choose if you just want to audit the policies or enforce them blocking users from deploying resources.
Alternatives
- Open Policy Agent is a famous cloud native policy-based control engine. It used its own declarative language and it works in many environments, not only on Kubernetes. It is more difficult to manage than Kyvernobut more powerful.
Kubevela
One problem with Kubernetes is that developers need to know and understand very well the platform and the cluster configuration. Many would argue that the level of abstraction in K8s is too low and this causes a lot of friction for developers who just want to focus on writing and shipping applications.
The Open Application Model (OAM) was created to overcome this problem. The idea is to create a higher level of abstraction around applications which is independent of the underlying runtime. You can read the spec here.
Focused on application rather than container or orchestrator, Open Application Model [OAM] brings modular, extensible, and portable design for modeling application deployment with higher level yet consistent API.
Kubevela is an implementation of the OAM model. KubeVela is runtime agnostic, natively extensible, yet most importantly, application-centric . In Kubevela applications are first class citizens implemented as Kubernetes resources. There is a distinction between cluster operators(Platform Team) and developers (Application Team). Cluster operators manage the cluster and the different environments by defining components(deployable/provisionable entities that compose your application like helm charts) and traits. Developers define applications by assembling components and traits.
KubeVela is a Cloud Native Computing Foundation sandbox project and although it is still in its infancy, it can change the way we use Kubernetes in the near future allowing developers to focus on applications without being Kubernetes experts. However, I do have some concerns regarding the applicability of the OAM in the real world since some services like system applications, ML or big data processes depend considerably on low level details which could be tricky to incorporate in the OAM model.
Alternatives
- Shipa follows a similar approach enabling platform and developer teams to work together to easily deploy application to Kubernetes.
Snyk
A very important aspect in any development process is Security, this has always been an issue for Kubernetes since companies who wanted to migrate to Kubernetes couldn’t easily implement their current security principles.
Snyk tries to mitigate this by providing a security framework that can easily integrate with Kubernetes. It can detect vulnerabilities in container images, your code, open source projects and much more.
Velero
If you run your workload in Kubernetes and you use volumes to store data, you need to create and manage backups. Velero provides a simple backup/restore process, disaster recovery mechanisms and data migrations.
Unlike other tools which directly access the Kubernetes etcd database to perform backups and restores, Velero uses the Kubernetes API to capture the state of cluster resources and to restore them when necessary. Additionally, Velero enables you to backup and restore your application persistent data alongside the configurations.
Schema Hero
Another common process in software development is to manage schema evolution when using relational databases.
SchemaHero is an open-source database schema migration tool that converts a schema definition into migration scripts that can be applied in any environment. It uses Kubernetes declarative nature to manage database schema migrations. You just specify the desired state and SchemaHero manages the rest.
Alternatives
- LiquidBase is the most famous alternative. It is more difficult to use and it’s not Kubernetes native but it has more features.
Bitnami Sealed Secrets
We already cover many GitOps tools such as ArgoCD. Our goal is to keep everything in Git and use Kubernetes declarative nature to keep the environments in sync. We just saw how we can (and we should) keep our source of truth in Git and have automated processes handle the configuration changes.
One thing that it was usually hard to keep in Git were secrets such DB passwords or API keys, this is because you should never store secrets in your code repository. One common solution is to use an external vault such as AWS Secret Manageror HashiCorpVaultto store the secrets but this creates a lot of friction since you need to have a separate process to handle secrets. Ideally, we would like a way to safely store secrets in Git just like any other resource.
Sealed Secrets were created to overcome this issue allowing you to store your sensitive data in Git by using strong encryption. Bitnami Sealed Secrets integrate natively in Kubernetes allowing you to decrypt the secrets only by the Kubernetes controller running in Kubernetes and no one else. The controller will decrypt the data and create native K8s secrets which are safely stored. This enables us to store absolutely everything as code in our repo allowing us to perform continuous deployment safely without any external dependencies.
Sealed Secrets is composed of two parts:
- A cluster-side controller
- A client-side utility:
kubeseal
The kubeseal
utility uses asymmetric crypto to encrypt secrets that only the controller can decrypt. These encrypted secrets are encoded in a SealedSecret
K8s resource that you can store in Git.
Alternatives
Capsule
Many companies use multi tenancy to manage different customers. This is quite common in software development but difficult to implement in Kubernetes. Namespaces are a great way to create logical partitions of the cluster as isolated slices but this is not enough in order to securely isolate customers, we need to enforce network policies, quotas and more. You can create network policies and rules per name space but this is a tedious process that it is difficult to scale. Also, tenants will not able to use more than one namespace which is a big limitation.
Hierarchical Namespaces were created to overcome some of these issues. The idea is to have a parent namespace per tenant with common network policies and quotas for the tenants and allow the creation of child namespaces. This is a great improvement but it does not have native support for a tenant in terms of security and governance. Furthermore, it hasn’t reach production status yet but version 1.0 is expected to be release in the next months.
A common approach to currently solve this, is to create a cluster per customer, this is secure and provides everything a tenant will need but this is hard to manage and very expensive.
Capsule is a tool which provides native Kubernetes support for multiple tenants within a single cluster. With Capsule, you can have a single cluster for all your tenants. Capsule will provide an “almost” native experience for the tenants(with some minor restrictions) who will be able to create multiple namespaces and use the cluster as it was entirely available for them hiding the fact that the cluster is actually shared.
In a single cluster, the Capsule Controller aggregates multiple namespaces in a lightweight Kubernetes abstraction called Tenant, which is a grouping of Kubernetes Namespaces. Within each tenant, users are free to create their namespaces and share all the assigned resources while the Policy Engine keeps the different tenants isolated from each other.
The Network and Security Policies, Resource Quota, Limit Ranges, RBAC, and other policies defined at the tenant level are automatically inherited by all the namespaces in the tenant similar to Hierarchical Namespaces. Then users are free to operate their tenants in autonomy, without the intervention of the cluster administrator. Capsule is GitOps ready since it is declarative and all the configuration can be stored in Git.
vCluster
VCluster goes one step further in terms of multi tenancy, it offers virtual clusters inside a Kubernetes cluster. Each cluster runs on a regular namespace and it is fully isolated. Virtual clusters have their own API server and a separate data store, so every Kubernetes object you create in the vcluster only exists inside the vcluster. Also, you can use kube context with virtual clusters to use them like regular clusters.
As long as you can create a deployment inside a single namespace, you will be able to create a virtual cluster and become admin of this virtual cluster, tenants can create namespaces, install CRDs, configure permissions and much more.
vCluster uses k3s as its API server to make virtual clusters super lightweight and cost-efficient; and since k3s clusters are 100% compliant, virtual clusters are 100% compliant as well. vclusters are super lightweight (1 pod), consume very few resources and run on any Kubernetes cluster without requiring privileged access to the underlying cluster. Compared to Capsule, it does use a bit more resources but it offer more flexibility since multi tenancy is just one of the use cases.
Other Tools
- kube-burner is used for stress testing. It provides metrics and alerts.
- kubewatch is used for monitoring but mainly focus on push notifications based on Kubernetes events like resource creation or deletion. It can integrate with many tools like Slack.
- kube-fledged is a Kubernetes add-on for creating and managing a cache of container images directly on the worker nodes of a Kubernetes cluster. As a result, application pods start almost instantly, since the images need not be pulled from the registry.
Conclusion
In this article we have reviewed my favorite Kubernetes tools. I focused on Open Source projects that can be incorporated in any Kubernetes distribution. I didn’t cover comercial solutions such as OpenShift or Cloud Providers Add-Ons since I wanted to keep i generic, but I do encourage to explore what your cloud provider can offer you if you run Kubernetes on the cloud or using a comercial tool.
My goal is to show you that you can do everything you do on-prem in Kubernetes. I also focused more in less known tools which I think may have a lot of potential such Crossplane, Argo Rollouts or Kubevela. The tools that I’m more excited about are vCluster, Crossplane and ArgoCD/Workflows.
Feel free to get in touch if you have any questions or need any advice.
How does ‘kubectl exec’ work?
Last Friday, one of my colleagues approached me and asked a question about how to exec a command in a pod with client-go. I didn’t know the answer and I noticed that I had never thought about the mechanism in “kubectl exec”. I had some ideas about how it should be, but I wasn’t 100% sure. I noted the topic to check again and I have learnt a lot after reading some blogs, docs and source codes. In this blog post, I am going to share my understanding and findings.
Setup
I cloned https://github.com/ecomm-integration-ballerina/kubernetes-cluster in order to create a k8s cluster in my MacBook. I fixed IP addresses of the nodes in kubelet configurations since the default configuration didn’t let me run kubectl exec
. You can find the root cause here.
- Any machine = my MacBook
- IP of master node = 192.168.205.10
- IP of worker node = 192.168.205.11
- API server port = 6443
Components
- kubectl exec process: When we run “kubectl exec …” in a machine, a process starts. You can run it in any machine which has an access to k8s api server.
- api server: Component on the master that exposes the Kubernetes API. It is the front-end for the Kubernetes control plane.
- kubelet: An agent that runs on each node in the cluster. It makes sure that containers are running in a pod.
- container runtime: The software that is responsible for running containers. Examples: docker, cri-o, containerd…
- kernel: kernel of the OS in the worker node which is responsible to manage processes.
- target container: A container which is a part of a pod and which is running on one of the worker nodes.
Findings
1. Activities in Client Side
- Create a pod in default namespace
// any machine $ kubectl run exec-test-nginx --image=nginx
- Then run an exec command and
sleep 5000
to make observation// any machine $ kubectl exec -it exec-test-nginx-6558988d5-fgxgg -- sh # sleep 5000
- We can observe the kubectl process (pid=8507 in this case)
// any machine $ ps -ef |grep kubectl 501 8507 8409 0 7:19PM ttys000 0:00.13 kubectl exec -it exec-test-nginx-6558988d5-fgxgg -- sh
- When we check network activities of the process, we can see that it has some connections to api-server (192.168.205.10.6443)
// any machine $ netstat -atnv |grep 8507 tcp4 0 0 192.168.205.1.51673 192.168.205.10.6443 ESTABLISHED 131072 131768 8507 0 0x0102 0x00000020 tcp4 0 0 192.168.205.1.51672 192.168.205.10.6443 ESTABLISHED 131072 131768 8507 0 0x0102 0x00000028
- Let’s check the code. kubectl creates a POST request with subresource
exec
and sends a rest request. req := restClient.Post().
Resource(“pods”).
Name(pod.Name).
Namespace(pod.Namespace).
SubResource(“exec”)
req.VersionedParams(&corev1.PodExecOptions{
Container: containerName,
Command: p.Command,
Stdin: p.Stdin,
Stdout: p.Out != nil,
Stderr: p.ErrOut != nil,
TTY: t.Raw,
}, scheme.ParameterCodec)
return p.Executor.Execute(“POST”, req.URL(), p.Config, p.In, p.Out, p.ErrOut, t.Raw, sizeQueue)This Gist brought to you by gist-it.view rawstaging/src/k8s.io/kubectl/pkg/cmd/exec/exec.go
2. Activities in Master Node
- We can observe the request in api-server side as well.
handler.go:143] kube-apiserver: POST "/api/v1/namespaces/default/pods/exec-test-nginx-6558988d5-fgxgg/exec" satisfied by gorestful with webservice /api/v1 upgradeaware.go:261] Connecting to backend proxy (intercepting redirects) https://192.168.205.11:10250/exec/default/exec-test-nginx-6558988d5-fgxgg/exec-test-nginx?command=sh&input=1&output=1&tty=1 Headers: map[Connection:[Upgrade] Content-Length:[0] Upgrade:[SPDY/3.1] User-Agent:[kubectl/v1.12.10 (darwin/amd64) kubernetes/e3c1340] X-Forwarded-For:[192.168.205.1] X-Stream-Protocol-Version:[v4.channel.k8s.io v3.channel.k8s.io v2.channel.k8s.io channel.k8s.io]]
Notice that the http request includes a protocol upgrade request. SPDY allows for separate stdin/stdout/stderr/spdy-error “streams” to be multiplexed over a single TCP connection. - Api server receives the request and binds it into a
PodExecOptions
// PodExecOptions is the query options to a Pod’s remote exec call
type PodExecOptions struct {
metav1.TypeMeta
// Stdin if true indicates that stdin is to be redirected for the exec call
Stdin bool
// Stdout if true indicates that stdout is to be redirected for the exec call
Stdout bool
// Stderr if true indicates that stderr is to be redirected for the exec call
Stderr bool
// TTY if true indicates that a tty will be allocated for the exec call
TTY bool
// Container in which to execute the command.
Container string
// Command is the remote command to execute; argv array; not executed within a shell.
Command []string
}This Gist brought to you by gist-it.view rawpkg/apis/core/types.go - To be able to take necessary actions, api-server needs to know which location it should contact.// ExecLocation returns the exec URL for a pod container. If opts.Container is blank
// and only one container is present in the pod, that container is used.
func ExecLocation(
getter ResourceGetter,
connInfo client.ConnectionInfoGetter,
ctx context.Context,
name string,
opts *api.PodExecOptions,
) (*url.URL, http.RoundTripper, error) {
return streamLocation(getter, connInfo, ctx, name, opts, opts.Container, “exec”)
}This Gist brought to you by gist-it.view rawpkg/registry/core/pod/strategy.go
Of course the endpoint is derived from node info. nodeName := types.NodeName(pod.Spec.NodeName)
if len(nodeName) == 0 {
// If pod has not been assigned a host, return an empty location
return nil, nil, errors.NewBadRequest(fmt.Sprintf(“pod %s does not have a host assigned”, name))
}
nodeInfo, err := connInfo.GetConnectionInfo(ctx, nodeName)This Gist brought to you by gist-it.view rawpkg/registry/core/pod/strategy.go
GOTCHA! KUBELET HAS A PORT (node.Status.DaemonEndpoints.KubeletEndpoint.Port
) TO WHICH API-SERVER CAN CONNECT.// GetConnectionInfo retrieves connection info from the status of a Node API object.
func (k *NodeConnectionInfoGetter) GetConnectionInfo(ctx context.Context, nodeName types.NodeName) (*ConnectionInfo, error) {
node, err := k.nodes.Get(ctx, string(nodeName), metav1.GetOptions{})
if err != nil {
return nil, err
}
// Find a kubelet-reported address, using preferred address type
host, err := nodeutil.GetPreferredNodeAddress(node, k.preferredAddressTypes)
if err != nil {
return nil, err
}
// Use the kubelet-reported port, if present
port := int(node.Status.DaemonEndpoints.KubeletEndpoint.Port)
if port <= 0 {
port = k.defaultPort
}
return &ConnectionInfo{
Scheme: k.scheme,
Hostname: host,
Port: strconv.Itoa(port),
Transport: k.transport,
}, nil
}This Gist brought to you by gist-it.view rawpkg/kubelet/client/kubelet_client.go
Master-Node Communication > Master to Cluster > apiserver to kubeletThese connections terminate at the kubelet’s HTTPS endpoint. By default, the apiserver does not verify the kubelet’s serving certificate, which makes the connection subject to man-in-the-middle attacks, and unsafe to run over untrusted and/or public networks. - Now, api server knows the endpoint and it opens a connections.// Connect returns a handler for the pod exec proxy
func (r *ExecREST) Connect(ctx context.Context, name string, opts runtime.Object, responder rest.Responder) (http.Handler, error) {
execOpts, ok := opts.(*api.PodExecOptions)
if !ok {
return nil, fmt.Errorf(“invalid options object: %#v”, opts)
}
location, transport, err := pod.ExecLocation(r.Store, r.KubeletConn, ctx, name, execOpts)
if err != nil {
return nil, err
}
return newThrottledUpgradeAwareProxyHandler(location, transport, false, true, true, responder), nil
}This Gist brought to you by gist-it.view rawpkg/registry/core/pod/rest/subresources.go - Let’s check what is going on the master node.
First, learn the ip of the worker node. It is 192.168.205.11
in this case.
// any machine
$ kubectl get nodes k8s-node-1 -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
k8s-node-1 Ready <none> 9h v1.15.3 192.168.205.11 <none> Ubuntu 16.04.6 LTS 4.4.0-159-generic docker://17.3.3
Then get the kubelet port. It is 10250
in this case.
// any machine
$ kubectl get nodes k8s-node-1 -o jsonpath='{.status.daemonEndpoints.kubeletEndpoint}'
map[Port:10250]
Then check the network. Is there a connection to worker node(192.168.205.11)? THE CONNECTİON IS THERE. When I kill the exec process, it disappears so I know it is set by api-server because of my exec command.
// master node
$ netstat -atn |grep 192.168.205.11
tcp 0 0 192.168.205.10:37870 192.168.205.11:10250 ESTABLISHED
...
- Now the connection between kubectl and api-server is still open and there is another connection between api-server and kubelet.
3. Activities in Worker Node
- Let’s continue by connecting to the worker node and checking what is going on the worker node.
First, we can observe the connection here as well. The second line. 192.168.205.10
is the IP of master node.
// worker node
$ netstat -atn |grep 10250
tcp6 0 0 :::10250 :::* LISTEN
tcp6 0 0 192.168.205.11:10250 192.168.205.10:37870 ESTABLISHED
What about our sleep command? HOORAYYYY!! OUR COMMAND IS THERE!!!!
// worker node
$ ps -afx
...
31463 ? Sl 0:00 \_ docker-containerd-shim 7d974065bbb3107074ce31c51f5ef40aea8dcd535ae11a7b8f2dd180b8ed583a /var/run/docker/libcontainerd/7d974065bbb3107074ce31c51
31478 pts/0 Ss 0:00 \_ sh
31485 pts/0 S+ 0:00 \_ sleep 5000
...
- Wait! How did kubelet do it?
- kubelet has a daemon which serves an api over a port for api-server requests.// Server is the library interface to serve the stream requests.
type Server interface {
http.Handler
// Get the serving URL for the requests.
// Requests must not be nil. Responses may be nil iff an error is returned.
GetExec(*runtimeapi.ExecRequest) (*runtimeapi.ExecResponse, error)
GetAttach(req *runtimeapi.AttachRequest) (*runtimeapi.AttachResponse, error)
GetPortForward(*runtimeapi.PortForwardRequest) (*runtimeapi.PortForwardResponse, error)
// Start the server.
// addr is the address to serve on (address:port) stayUp indicates whether the server should
// listen until Stop() is called, or automatically stop after all expected connections are
// closed. Calling Get{Exec,Attach,PortForward} increments the expected connection count.
// Function does not return until the server is stopped.
Start(stayUp bool) error
// Stop the server, and terminate any open connections.
Stop() error
}This Gist brought to you by gist-it.view rawpkg/kubelet/server/streaming/server.go - kubelet computes a response endpoint for exec requests.func (s *server) GetExec(req *runtimeapi.ExecRequest) (*runtimeapi.ExecResponse, error) {
if err := validateExecRequest(req); err != nil {
return nil, err
}
token, err := s.cache.Insert(req)
if err != nil {
return nil, err
}
return &runtimeapi.ExecResponse{
Url: s.buildURL(“exec”, token),
}, nil
}This Gist brought to you by gist-it.view rawpkg/kubelet/server/streaming/server.go
Don’t confuse. It doesn’t return the result of the command. It returns an endpoint for communication.
type ExecResponse struct {
// Fully qualified URL of the exec streaming server.
Url string `protobuf:"bytes,1,opt,name=url,proto3" json:"url,omitempty"`
XXX_NoUnkeyedLiteral struct{} `json:"-"`
XXX_sizecache int32 `json:"-"`
}
This Gist brought to you by gist-it.view rawstaging/src/k8s.io/cri-api/pkg/apis/runtime/v1alpha2/api.pb.go
kubelet implements RuntimeServiceClient
interface which is part of Container Runtime Interface.
// For semantics around ctx use and closing/ending streaming RPCs, please refer to https://godoc.org/google.golang.org/grpc#ClientConn.NewStream.
type RuntimeServiceClient interface {
// Version returns the runtime name, runtime version, and runtime API version.
Version(ctx context.Context, in *VersionRequest, opts ...grpc.CallOption) (*VersionResponse, error)
// RunPodSandbox creates and starts a pod-level sandbox. Runtimes must ensure
// the sandbox is in the ready state on success.
RunPodSandbox(ctx context.Context, in *RunPodSandboxRequest, opts ...grpc.CallOption) (*RunPodSandboxResponse, error)
// StopPodSandbox stops any running process that is part of the sandbox and
// reclaims network resources (e.g., IP addresses) allocated to the sandbox.
// If there are any running containers in the sandbox, they must be forcibly
// terminated.
// This call is idempotent, and must not return an error if all relevant
// resources have already been reclaimed. kubelet will call StopPodSandbox
// at least once before calling RemovePodSandbox. It will also attempt to
// reclaim resources eagerly, as soon as a sandbox is not needed. Hence,
// multiple StopPodSandbox calls are expected.
StopPodSandbox(ctx context.Context, in *StopPodSandboxRequest, opts ...grpc.CallOption) (*StopPodSandboxResponse, error)
// RemovePodSandbox removes the sandbox. If there are any running containers
// in the sandbox, they must be forcibly terminated and removed.
// This call is idempotent, and must not return an error if the sandbox has
// already been removed.
RemovePodSandbox(ctx context.Context, in *RemovePodSandboxRequest, opts ...grpc.CallOption) (*RemovePodSandboxResponse, error)
// PodSandboxStatus returns the status of the PodSandbox. If the PodSandbox is not
// present, returns an error.
PodSandboxStatus(ctx context.Context, in *PodSandboxStatusRequest, opts ...grpc.CallOption) (*PodSandboxStatusResponse, error)
// ListPodSandbox returns a list of PodSandboxes.
ListPodSandbox(ctx context.Context, in *ListPodSandboxRequest, opts ...grpc.CallOption) (*ListPodSandboxResponse, error)
// CreateContainer creates a new container in specified PodSandbox
CreateContainer(ctx context.Context, in *CreateContainerRequest, opts ...grpc.CallOption) (*CreateContainerResponse, error)
// StartContainer starts the container.
StartContainer(ctx context.Context, in *StartContainerRequest, opts ...grpc.CallOption) (*StartContainerResponse, error)
// StopContainer stops a running container with a grace period (i.e., timeout).
// This call is idempotent, and must not return an error if the container has
// already been stopped.
// TODO: what must the runtime do after the grace period is reached?
StopContainer(ctx context.Context, in *StopContainerRequest, opts ...grpc.CallOption) (*StopContainerResponse, error)
// RemoveContainer removes the container. If the container is running, the
// container must be forcibly removed.
// This call is idempotent, and must not return an error if the container has
// already been removed.
RemoveContainer(ctx context.Context, in *RemoveContainerRequest, opts ...grpc.CallOption) (*RemoveContainerResponse, error)
// ListContainers lists all containers by filters.
ListContainers(ctx context.Context, in *ListContainersRequest, opts ...grpc.CallOption) (*ListContainersResponse, error)
// ContainerStatus returns status of the container. If the container is not
// present, returns an error.
ContainerStatus(ctx context.Context, in *ContainerStatusRequest, opts ...grpc.CallOption) (*ContainerStatusResponse, error)
// UpdateContainerResources updates ContainerConfig of the container.
UpdateContainerResources(ctx context.Context, in *UpdateContainerResourcesRequest, opts ...grpc.CallOption) (*UpdateContainerResourcesResponse, error)
// ReopenContainerLog asks runtime to reopen the stdout/stderr log file
// for the container. This is often called after the log file has been
// rotated. If the container is not running, container runtime can choose
// to either create a new log file and return nil, or return an error.
// Once it returns error, new container log file MUST NOT be created.
ReopenContainerLog(ctx context.Context, in *ReopenContainerLogRequest, opts ...grpc.CallOption) (*ReopenContainerLogResponse, error)
// ExecSync runs a command in a container synchronously.
ExecSync(ctx context.Context, in *ExecSyncRequest, opts ...grpc.CallOption) (*ExecSyncResponse, error)
// Exec prepares a streaming endpoint to execute a command in the container.
Exec(ctx context.Context, in *ExecRequest, opts ...grpc.CallOption) (*ExecResponse, error)
// Attach prepares a streaming endpoint to attach to a running container.
Attach(ctx context.Context, in *AttachRequest, opts ...grpc.CallOption) (*AttachResponse, error)
// PortForward prepares a streaming endpoint to forward ports from a PodSandbox.
PortForward(ctx context.Context, in *PortForwardRequest, opts ...grpc.CallOption) (*PortForwardResponse, error)
// ContainerStats returns stats of the container. If the container does not
// exist, the call returns an error.
ContainerStats(ctx context.Context, in *ContainerStatsRequest, opts ...grpc.CallOption) (*ContainerStatsResponse, error)
// ListContainerStats returns stats of all running containers.
ListContainerStats(ctx context.Context, in *ListContainerStatsRequest, opts ...grpc.CallOption) (*ListContainerStatsResponse, error)
// UpdateRuntimeConfig updates the runtime configuration based on the given request.
UpdateRuntimeConfig(ctx context.Context, in *UpdateRuntimeConfigRequest, opts ...grpc.CallOption) (*UpdateRuntimeConfigResponse, error)
// Status returns the status of the runtime.
Status(ctx context.Context, in *StatusRequest, opts ...grpc.CallOption) (*StatusResponse, error)
}
This Gist brought to you by gist-it.view rawstaging/src/k8s.io/cri-api/pkg/apis/runtime/v1alpha2/api.pb.go
It just uses gRPC to invoke a method through Container Runtime Interface.
type runtimeServiceClient struct {
cc *grpc.ClientConn
}
This Gist brought to you by gist-it.view rawstaging/src/k8s.io/cri-api/pkg/apis/runtime/v1alpha2/api.pb.go
func (c *runtimeServiceClient) Exec(ctx context.Context, in *ExecRequest, opts ...grpc.CallOption) (*ExecResponse, error) {
out := new(ExecResponse)
err := c.cc.Invoke(ctx, "/runtime.v1alpha2.RuntimeService/Exec", in, out, opts...)
if err != nil {
return nil, err
}
return out, nil
}
This Gist brought to you by gist-it.view rawstaging/src/k8s.io/cri-api/pkg/apis/runtime/v1alpha2/api.pb.go
Container Runtime is responsible to implement RuntimeServiceServer
// RuntimeServiceServer is the server API for RuntimeService service.
type RuntimeServiceServer interface {
// Version returns the runtime name, runtime version, and runtime API version.
Version(context.Context, *VersionRequest) (*VersionResponse, error)
// RunPodSandbox creates and starts a pod-level sandbox. Runtimes must ensure
// the sandbox is in the ready state on success.
RunPodSandbox(context.Context, *RunPodSandboxRequest) (*RunPodSandboxResponse, error)
// StopPodSandbox stops any running process that is part of the sandbox and
// reclaims network resources (e.g., IP addresses) allocated to the sandbox.
// If there are any running containers in the sandbox, they must be forcibly
// terminated.
// This call is idempotent, and must not return an error if all relevant
// resources have already been reclaimed. kubelet will call StopPodSandbox
// at least once before calling RemovePodSandbox. It will also attempt to
// reclaim resources eagerly, as soon as a sandbox is not needed. Hence,
// multiple StopPodSandbox calls are expected.
StopPodSandbox(context.Context, *StopPodSandboxRequest) (*StopPodSandboxResponse, error)
// RemovePodSandbox removes the sandbox. If there are any running containers
// in the sandbox, they must be forcibly terminated and removed.
// This call is idempotent, and must not return an error if the sandbox has
// already been removed.
RemovePodSandbox(context.Context, *RemovePodSandboxRequest) (*RemovePodSandboxResponse, error)
// PodSandboxStatus returns the status of the PodSandbox. If the PodSandbox is not
// present, returns an error.
PodSandboxStatus(context.Context, *PodSandboxStatusRequest) (*PodSandboxStatusResponse, error)
// ListPodSandbox returns a list of PodSandboxes.
ListPodSandbox(context.Context, *ListPodSandboxRequest) (*ListPodSandboxResponse, error)
// CreateContainer creates a new container in specified PodSandbox
CreateContainer(context.Context, *CreateContainerRequest) (*CreateContainerResponse, error)
// StartContainer starts the container.
StartContainer(context.Context, *StartContainerRequest) (*StartContainerResponse, error)
// StopContainer stops a running container with a grace period (i.e., timeout).
// This call is idempotent, and must not return an error if the container has
// already been stopped.
// TODO: what must the runtime do after the grace period is reached?
StopContainer(context.Context, *StopContainerRequest) (*StopContainerResponse, error)
// RemoveContainer removes the container. If the container is running, the
// container must be forcibly removed.
// This call is idempotent, and must not return an error if the container has
// already been removed.
RemoveContainer(context.Context, *RemoveContainerRequest) (*RemoveContainerResponse, error)
// ListContainers lists all containers by filters.
ListContainers(context.Context, *ListContainersRequest) (*ListContainersResponse, error)
// ContainerStatus returns status of the container. If the container is not
// present, returns an error.
ContainerStatus(context.Context, *ContainerStatusRequest) (*ContainerStatusResponse, error)
// UpdateContainerResources updates ContainerConfig of the container.
UpdateContainerResources(context.Context, *UpdateContainerResourcesRequest) (*UpdateContainerResourcesResponse, error)
// ReopenContainerLog asks runtime to reopen the stdout/stderr log file
// for the container. This is often called after the log file has been
// rotated. If the container is not running, container runtime can choose
// to either create a new log file and return nil, or return an error.
// Once it returns error, new container log file MUST NOT be created.
ReopenContainerLog(context.Context, *ReopenContainerLogRequest) (*ReopenContainerLogResponse, error)
// ExecSync runs a command in a container synchronously.
ExecSync(context.Context, *ExecSyncRequest) (*ExecSyncResponse, error)
// Exec prepares a streaming endpoint to execute a command in the container.
Exec(context.Context, *ExecRequest) (*ExecResponse, error)
// Attach prepares a streaming endpoint to attach to a running container.
Attach(context.Context, *AttachRequest) (*AttachResponse, error)
// PortForward prepares a streaming endpoint to forward ports from a PodSandbox.
PortForward(context.Context, *PortForwardRequest) (*PortForwardResponse, error)
// ContainerStats returns stats of the container. If the container does not
// exist, the call returns an error.
ContainerStats(context.Context, *ContainerStatsRequest) (*ContainerStatsResponse, error)
// ListContainerStats returns stats of all running containers.
ListContainerStats(context.Context, *ListContainerStatsRequest) (*ListContainerStatsResponse, error)
// UpdateRuntimeConfig updates the runtime configuration based on the given request.
UpdateRuntimeConfig(context.Context, *UpdateRuntimeConfigRequest) (*UpdateRuntimeConfigResponse, error)
// Status returns the status of the runtime.
Status(context.Context, *StatusRequest) (*StatusResponse, error)
}
This Gist brought to you by gist-it.view rawstaging/src/k8s.io/cri-api/pkg/apis/runtime/v1alpha2/api.pb.go
- If it is so, we need to observe a connection between kubelet and container runtime. Right? Let’s check.
Run this command before and after running exec command and check the diff. This one is the diff in my case.
// worker node
$ ss -a -p |grep kubelet
...
u_str ESTAB 0 0 * 157937 * 157387 users:(("kubelet",pid=5714,fd=33))
...
Hımmm. There is a new connection via unix sockets between kubelet(pid=5714) and something. Who can be? YES. IT IS DOCKER(pid=1186).
// worker node
$ ss -a -p |grep 157387
...
u_str ESTAB 0 0 * 157937 * 157387 users:(("kubelet",pid=5714,fd=33))
u_str ESTAB 0 0 /var/run/docker.sock 157387 * 157937 users:(("dockerd",pid=1186,fd=14))
...
Remember. This is the docker daemon process(pid=1186) which runs our command.
// worker node.
$ ps -afx
...
1186 ? Ssl 0:55 /usr/bin/dockerd -H fd://
17784 ? Sl 0:00 \_ docker-containerd-shim 53a0a08547b2f95986402d7f3b3e78702516244df049ba6c5aa012e81264aa3c /var/run/docker/libcontainerd/53a0a08547b2f95986402d7f3
17801 pts/2 Ss 0:00 \_ sh
17827 pts/2 S+ 0:00 \_ sleep 5000
...
4. Activities in Container Runtime
- Let’s check cri-o’s source code to understand how it can happen. The logic is similar in docker.
It has a server which implements RuntimeServiceServer.
// Server implements the RuntimeService and ImageService
type Server struct {
config libconfig.Config
seccompProfile *seccomp.Seccomp
stream StreamService
netPlugin ocicni.CNIPlugin
hostportManager hostport.HostPortManager
appArmorProfile string
hostIP string
bindAddress string
*lib.ContainerServer
monitorsChan chan struct{}
defaultIDMappings *idtools.IDMappings
systemContext *types.SystemContext // Never nil
updateLock sync.RWMutex
seccompEnabled bool
appArmorEnabled bool
This Gist brought to you by gist-it.view rawserver/server.go
// Exec prepares a streaming endpoint to execute a command in the container.
func (s *Server) Exec(ctx context.Context, req *pb.ExecRequest) (resp *pb.ExecResponse, err error) {
const operation = "exec"
defer func() {
recordOperation(operation, time.Now())
recordError(operation, err)
}()
resp, err = s.getExec(req)
if err != nil {
return nil, fmt.Errorf("unable to prepare exec endpoint: %v", err)
}
return resp, nil
}
This Gist brought to you by gist-it.view rawserver/container_exec.go
At the end of the chain, container runtime executes the command in the worker node.
// ExecContainer prepares a streaming endpoint to execute a command in the container.
func (r *runtimeOCI) ExecContainer(c *Container, cmd []string, stdin io.Reader, stdout, stderr io.WriteCloser, tty bool, resize <-chan remotecommand.TerminalSize) error {
processFile, err := prepareProcessExec(c, cmd, tty)
if err != nil {
return err
}
defer os.RemoveAll(processFile.Name())
args := []string{rootFlag, r.root, "exec"}
args = append(args, "--process", processFile.Name(), c.ID())
execCmd := exec.Command(r.path, args...)
if v, found := os.LookupEnv("XDG_RUNTIME_DIR"); found {
execCmd.Env = append(execCmd.Env, fmt.Sprintf("XDG_RUNTIME_DIR=%s", v))
}
var cmdErr, copyError error
if tty {
cmdErr = ttyCmd(execCmd, stdin, stdout, resize)
} else {
if stdin != nil {
// Use an os.Pipe here as it returns true *os.File objects.
// This way, if you run 'kubectl exec <pod> -i bash' (no tty) and type 'exit',
// the call below to execCmd.Run() can unblock because its Stdin is the read half
// of the pipe.
r, w, err := os.Pipe()
if err != nil {
return err
}
go func() { _, copyError = pools.Copy(w, stdin) }()
execCmd.Stdin = r
}
if stdout != nil {
execCmd.Stdout = stdout
}
if stderr != nil {
execCmd.Stderr = stderr
}
cmdErr = execCmd.Run()
}
if copyError != nil {
return copyError
}
if exitErr, ok := cmdErr.(*exec.ExitError); ok {
return &utilexec.ExitErrorWrapper{ExitError: exitErr}
}
return cmdErr
}
This Gist brought to you by gist-it.view rawinternal/oci/runtime_oci.go
Finally, kernel executes commands.
Reminders
- api-server can also initialize a connection to kubelet.
- These connections persist until the interactive exec ends.
- Connection between kubectl and api-server
- Connection between api-server and kubelet
- Connection between kubelet and container runtime
- kubectl or api-server cannot run anything in the worker nodes. kubelet can run but it also interacts with container runtime for this kind of actions.
NGINX/PHP-FPM graceful shutdown and 502 errors
We have a PHP application running with Kubernetes in pods with two dedicated containers — NGINX и PHP-FPM.
The problem is that during downscaling clients get 502 errors. E.g. when a pod is stopping, its containers can not correctly close existing connections.
So, in this post, we will take a closer look at the pods’ termination process in general, and NGINX and PHP-FPM containers in particular.
Testing will be performed on the AWS Elastic Kubernetes Service by the Yandex.Tank utility.
Ingress resource will create an AWS Application Load Balancer with the AWS ALB Ingress Controller.
Для управления контейнерами на Kubernetes WorkerNodes испольузется Docker.
Pod Lifecycle — Termination of Pods
So, let’s take an overview of the pods’ stopping and termination process.
Basically, a pod is a set of processes running on a Kubernetes WorkerNode, which are stopped by standard IPC (Inter-Process Communication) signals.
To give the pod the ability to finish all its operations, a container runtime at first ties softly stop it (graceful shutdown) by sending a SIGTERM
signal to a PID 1 in each container of this pod (see docker stop). Also, a cluster starts counting a grace period before force kill this pod by sending a SIGKILL
signal.
The SIGTERM
can be overridden by using the STOPSIGNAL
in an image used to spin up a container.
Thus, the whole flow of the pod’s deleting is (actually, the part below is a kinda copy of the official documentation):
- a user issues a
kubectl delete pod
orkubectl scale deployment
command which triggers the flow and the cluster start countdown of the grace period with the default value set to the 30 second - the API server of the cluster updates the pod’s status — from the Running state, it becomes the Terminating (see Container states). A
kubelet
on the WorkerNode where this pod is running, receives this status update and start the pod’s termination process: - if a container(s) in the pod have a
preStop
hook –kubelet
will run it. If the hook is still running the default 30 seconds on the grace period – another 2 seconds will be added. The grace period can be set with theterminationGracePeriodSeconds
- when a
preStop
hook is finished, akubelet
will send a notification to the Docker runtime to stop containers related to the pod. The Docker daemon will send theSIGTERM
signal to a process with the PID 1 in each container. Containers will get the signal in random order. - simultaneously with the beginning of the graceful shutdown — Kubernetes Control Plane (its
kube-controller-manager
) will remove the pod from the endpoints (see Kubernetes – Endpoints) and a corresponding Service will stop sending traffic to this pod - after the grace period countdown is finished, a
kubelet
will start force shutdown – Docker will send theSIGKILL
signal to all remaining processes in all containers of the pod which can not be ignored and those process will be immediately terminated without change to correctly finish their operations kubelet
triggers deletion of the pod from the API server- API server deletes a record about this pod from the
etcd
Actually, there are two issues:
- the NGINX and PHP-FPM perceives the
SIGTERM
signal as a force как “brutal murder” and will finish their processes immediately , и завершают работу немедленно, without concern about existing connections (see Controlling nginx and php-fpm(8) – Linux man page) - the 2 and 3 steps — sending the
SIGTERM
and an endpoint deletion – are performed at the same time. Still, an Ingress Service will update its data about endpoints not momentarily and a pod can be killed before then an Ingress will stop sending traffic to it causing 502 error for clients as the pod can not accept new connections
E.g. if we have a connection to an NGINX server, the NGINX master process during the fast shutdown will just drop this connection and our client will receive the 502 error, see the Avoiding dropped connections in nginx containers with “STOPSIGNAL SIGQUIT”.
NGINX STOPSIGNAL
and 502
Okay, now we got some understanding of how it’s going — let’s try to reproduce the first issue with NGINX.
The example below is taken from the post above and will be deployed to a Kubernetes cluster.
Prepare a Dockerfile:
FROM nginx
RUN echo 'server {\n\
listen 80 default_server;\n\
location / {\n\
proxy_pass http://httpbin.org/delay/10;\n\
}\n\
}' > /etc/nginx/conf.d/default.conf
CMD ["nginx", "-g", "daemon off;"]
Here NGINX will proxy_pass
a request to the http://httpbin.org which will respond with a 10 seconds delay to emulate a PHP-backend.
Build an image and push it to a repository:
$ docker build -t setevoy/nginx-sigterm .
$ docker push setevoy/nginx-sigterm
Now, add a Deployment manifest to spin up 10 pods from this image.
Here is the full file with a Namespace, Service, and Ingress, in the following part of this post, will add only updated parts of the manifest:
---
apiVersion: v1
kind: Namespace
metadata:
name: test-namespace
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: test-deployment
namespace: test-namespace
labels:
app: test
spec:
replicas: 10
selector:
matchLabels:
app: test
template:
metadata:
labels:
app: test
spec:
containers:
- name: web
image: setevoy/nginx-sigterm
ports:
- containerPort: 80
resources:
requests:
cpu: 100m
memory: 100Mi
readinessProbe:
tcpSocket:
port: 80
---
apiVersion: v1
kind: Service
metadata:
name: test-svc
namespace: test-namespace
spec:
type: NodePort
selector:
app: test
ports:
- protocol: TCP
port: 80
targetPort: 80
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
name: test-ingress
namespace: test-namespace
annotations:
kubernetes.io/ingress.class: alb
alb.ingress.kubernetes.io/scheme: internet-facing
alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}]'
spec:
rules:
- http:
paths:
- backend:
serviceName: test-svc
servicePort: 80
Deploy it:
$ kubectl apply -f test-deployment.yaml
namespace/test-namespace created
deployment.apps/test-deployment created
service/test-svc created
ingress.extensions/test-ingress created
Check the Ingress:
$ curl -I aadca942-testnamespace-tes-5874–698012771.us-east-2.elb.amazonaws.com
HTTP/1.1 200 OK
And we have 10 pods running:
$ kubectl -n test-namespace get pod
NAME READY STATUS RESTARTS AGE
test-deployment-ccb7ff8b6–2d6gn 1/1 Running 0 26s
test-deployment-ccb7ff8b6–4scxc 1/1 Running 0 35s
test-deployment-ccb7ff8b6–8b2cj 1/1 Running 0 35s
test-deployment-ccb7ff8b6-bvzgz 1/1 Running 0 35s
test-deployment-ccb7ff8b6-db6jj 1/1 Running 0 35s
test-deployment-ccb7ff8b6-h9zsm 1/1 Running 0 20s
test-deployment-ccb7ff8b6-n5rhz 1/1 Running 0 23s
test-deployment-ccb7ff8b6-smpjd 1/1 Running 0 23s
test-deployment-ccb7ff8b6-x5dc2 1/1 Running 0 35s
test-deployment-ccb7ff8b6-zlqxs 1/1 Running 0 25s
Prepare a load.yaml
for the Yandex.Tank:
phantom:
address: aadca942-testnamespace-tes-5874-698012771.us-east-2.elb.amazonaws.com
header_http: "1.1"
headers:
- "[Host: aadca942-testnamespace-tes-5874-698012771.us-east-2.elb.amazonaws.com]"
uris:
- /
load_profile:
load_type: rps
schedule: const(100,30m)
ssl: false
console:
enabled: true
telegraf:
enabled: false
package: yandextank.plugins.Telegraf
config: monitoring.xml
Here, we will perform 1 request per second to pods behind our Ingress.
Run tests:
All good so far.
Now, scale down the Deployment to only one pod:
$ kubectl -n test-namespace scale deploy test-deployment — replicas=1
deployment.apps/test-deployment scaled
Pods became Terminating:
$ kubectl -n test-namespace get pod
NAME READY STATUS RESTARTS AGE
test-deployment-647ddf455–67gv8 1/1 Terminating 0 4m15s
test-deployment-647ddf455–6wmcq 1/1 Terminating 0 4m15s
test-deployment-647ddf455-cjvj6 1/1 Terminating 0 4m15s
test-deployment-647ddf455-dh7pc 1/1 Terminating 0 4m15s
test-deployment-647ddf455-dvh7g 1/1 Terminating 0 4m15s
test-deployment-647ddf455-gpwc6 1/1 Terminating 0 4m15s
test-deployment-647ddf455-nbgkn 1/1 Terminating 0 4m15s
test-deployment-647ddf455-tm27p 1/1 Running 0 26m
…
And we got our 502 errors:
Next, update the Dockerfile — add the STOPSIGNAL SIGQUIT
:
FROM nginx
RUN echo 'server {\n\
listen 80 default_server;\n\
location / {\n\
proxy_pass http://httpbin.org/delay/10;\n\
}\n\
}' > /etc/nginx/conf.d/default.conf
STOPSIGNAL SIGQUIT
CMD ["nginx", "-g", "daemon off;"]
Build, push:
$ docker build -t setevoy/nginx-sigquit .
docker push setevoy/nginx-sigquit
Update the Deployment with the new image:
...
spec:
containers:
- name: web
image: setevoy/nginx-sigquit
ports:
- containerPort: 80
...
Redeploy, and check again.
Run tests:
Scale down the deployment again:
$ kubectl -n test-namespace scale deploy test-deployment — replicas=1
deployment.apps/test-deployment scaled
And no errors this time:
Great!
Traffic, preStop
, and sleep
But still, if repeat tests few times we still can get some 502 errors:
This time most likely we are facing the second issue — endpoints update is performed at the same time when the SIGTERM
Is sent.
Let’s add a preStop
hook with the sleep
to give some time to update endpoints and our Ingress, so after the cluster will receive a request to stop a pod, a kubelet
on a WorkerNode will wait for 5 seconds before sending the SIGTERM
:
...
spec:
containers:
- name: web
image: setevoy/nginx-sigquit
ports:
- containerPort: 80
lifecycle:
preStop:
exec:
command: ["/bin/sleep","5"]
...
Repeat tests — and now everything is fine
Our PHP-FPM had no such issue as its image was initially built with the STOPSIGNAL SIGQUIT
.
Other possible solutions
And of course, during debugging I’ve tried some other approaches to mitigate the issue.
See links at the end of this post and here I’ll describe them in short terms.
preStop
and nginx -s quit
One of the solutions was to add a preStop
hook which will send QUIT
to NGINX:
lifecycle:
preStop:
exec:
command:
- /usr/sbin/nginx
- -s
- quit
Or:
...
lifecycle:
preStop:
exec:
command:
- /bin/sh
- -SIGQUIT
- 1
....
But it didn’t help. Not sure why as the idea seems to be correct — instead of waiting for the TERM
from Kubernetes/Docker – we gracefully stopping the NGINX master process by sending QUIT
.
You can also run the strace
utility check which signal is really received by the NGINX.
NGINX + PHP-FPM, supervisord
, and stopsignal
Our application is running in two containers in one pod, but during the debugging, I’ve also tried to use a single container with both NGINX and PHP-FPM, for example, trafex/alpine-nginx-php7.
There I’ve tried to add to stopsignal
to the supervisor.conf
for both NGINX and PHP-FPM with the QUIT
value, but this also didn’t help although the idea also seems to be correct.
Still, one can try this way.
PHP-FPM, and process_control_timeout
In the Graceful shutdown in Kubernetes is not always trivial and on the Stackoveflow in the Nginx / PHP FPM graceful stop (SIGQUIT): not so graceful question is a note that FPM’s master process is killed before its child and this can lead to the 502 as well.
Not our current case, but pay your attention to the process_control_timeout
.
NGINX, HTTP, and keep-alive session
Also, it can be a good idea to use the [Connection: close]
header – then the client will close its connection right after a request is finished and this can decrease 502 errors count.
But anyway they will persist if NGINX will get the SIGTERM
during processing a request.
See the HTTP persistent connection.
Deploying Kubernetes on bare metal with Rancher 2.0
Contents
- Install Rancher server
- Create a Kubernetes cluster
- Add Kubernetes nodes
- Install StorageOS as the Kubernetes storage class
- Understand Nginx Ingress in Rancher
Install Rancher
Create a VM with Docker and Docker Compose installed and install Rancher 2.0 with docker compose:
- Rancher docker-compose file: docker-compose.yaml
- Run these commands to install Rancher with docker compose:
git clone https://github.com/polinchw/rancher-docker-compose
cd rancher-docker-compose
docker-compose up -d
Create your Kubernetes cluster with Rancher
Install a custom Kubernetes cluster with Rancher. Use the ‘Custom’ cluster.
Add Kubernetes nodes and join the Kubernetes cluster
Run the following commands on all the VMs that your Kubernetes cluster will run on. The final docker command will have the VM join the new Kubernetes cluster.
Replace the –server and –token with your Rancher server and cluster token.
#!/bin/bash
#sudo apt update
#sudo apt -y dist-upgrade
#Ubuntu (Docker install)
#sudo apt -y install docker.io
sudo apt -y install linux-image-extra-$(uname -r)
#Debian 9 (Docker install)
#sudo apt -y install apt-transport-https ca-certificates curl gnupg2 software-properties-common
#curl -fsSL https://download.docker.com/linux/debian/gpg | sudo apt-key add -
#sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/debian $(lsb_release -cs) stable"
#sudo apt update
#sudo apt -y install docker-ce
sudo mkdir -p /etc/systemd/system/docker.service.d/
sudo cat <<EOF > /etc/systemd/system/docker.service.d/mount_propagation_flags.conf
[Service]
MountFlags=shared
EOF
sudo systemctl daemon-reload
sudo systemctl restart docker.service
#This is dependent on your Rancher server
sudo docker run -d --privileged --restart=unless-stopped --net=host -v /etc/kubernetes:/etc/kubernetes -v /var/run:/var/run rancher/rancher-agent:v2.1.0-rc9 --server https://75.77.159.159 --token rb8k8kkqw55jqnqbbf4ssdjqtw6hndhfxxcghgv8257kx4p6qsqq55 --ca-checksum 641b2888ce3f1091d20149a495d10457154428f440475b42291b6af1b6c0dd06 --etcd --controlplane --worker
Download the kub config file for the cluster
After you download the kub config file you can use it by running this command:
export KUBECONFIG=$HOME/.kube/rancher-config
Install Helm on the cluster
git clone https://github.com/polinchw/set-up-tiller
cd set-up-tiller
chmod u+x set-up-tiller.sh
./set-up-tiller.sh
helm init --service-account tiller
Install StorageOS Helm Chart
helm repo add storageos https://charts.storageos.com
helm install --name storageos --namespace storageos-operator --version 1.1.3 storageos/storageoscluster-operator
Add the Storage OS Secret
apiVersion: v1
kind: Secret
metadata:
name: storageos-api
namespace: default
labels:
app: storageos
type: kubernetes.io/storageos
data:
# echo -n '<secret>' | base64
apiUsername: c3RvcmFnZW9z
apiPassword: c3RvcmFnZW9z
Add the StorageOSCluster
apiVersion: storageos.com/v1
kind: StorageOSCluster
metadata:
name: example-storageos
namespace: default
spec:
secretRefName: storageos-api
secretRefNamespace: default
csi:
enable: true
Set StorageOS as the default storage class
kubectl patch storageclass fast -p ‘{“metadata”: {“annotations”:{“storageclass.kubernetes.io/is-default-class”:”true”}}}’
Using the default Nginx Igress
Rancher automatically installs the nginx ingress controller on all the nodes in the cluster.
If you are able to expose one of the VMs in the cluster to the outside world with a public IP then you can connect to the ingress based services on ports 80 and 443.
Any app you want to be accessible through the default nginx ingress must be added to the ‘default’ project in Rancher.
Tutorial: creazione di un cluster con un’attività Fargate utilizzando la CLI di Amazon ECS
Questo tutorial mostra come configurare un cluster e distribuire un servizio con attività che utilizzano il tipo di lancio Fargate.
Prerequisiti
Verifica i seguenti requisiti preliminari:
- Configura un account AWS.
- Installa la CLI di Amazon ECS. Per ulteriori informazioni, consulta Installazione dell’interfaccia a riga di comando Amazon ECS.
- Istalla e configura la AWS CLI. Per ulteriori informazioni, consulta la sezione relativa all’interfaccia a riga di comando di AWS.
Fase 1: Crea il ruolo IAM per l’esecuzione dell’attività
L’agente del container Amazon ECS effettua chiamate all’API di AWS per tuo conto, pertanto richiede una policy e un ruolo IAM che consentano al servizio di stabilire che l’agente appartiene a te. Questo ruolo IAM viene definito un ruolo IAM di esecuzione delle attività. Se disponi già di un ruolo per l’esecuzione delle attività pronto per essere utilizzato, puoi ignorare questa fase. Per ulteriori informazioni, consulta Ruolo IAM per l’esecuzione di attività Amazon ECS.
Per creare il ruolo IAM per l’esecuzione delle attività utilizzando AWS CLI
- Crea un file denominato
task-execution-assume-role.json
con i seguenti contenuti:{ "Version": "2012-10-17", "Statement": [ { "Sid": "", "Effect": "Allow", "Principal": { "Service": "ecs-tasks.amazonaws.com" }, "Action": "sts:AssumeRole" } ] }
- Crea il ruolo per l’esecuzione delle attività
aws iam --region us-west-2 create-role --role-name ecsTaskExecutionRole --assume-role-policy-document file://task-execution-assume-role.json
- Collega la policy relativa al ruolo per l’esecuzione delle attività:
aws iam --region us-west-2 attach-role-policy --role-name ecsTaskExecutionRole --policy-arn arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy
Fase 2: Configura la CLI di Amazon ECS
Per poter effettuare richieste API a tuo nome, la CLI di Amazon ECS necessita delle credenziali, che può estrarre da variabili di ambiente, da un profilo AWS o da un profilo Amazon ECS. Per ulteriori informazioni, consulta Configurazione della CLI di Amazon ECS.
Per creare una configurazione della CLI di Amazon ECS
- Crea una configurazione cluster, che definisce la regione AWS, i prefissi di creazione delle risorse e il nome del cluster da utilizzare con la CLI di Amazon ECS:
ecs-cli configure --cluster tutorial --default-launch-type FARGATE --config-name tutorial --region us-west-2
- Crea un profilo CLI utilizzando l’ID chiave di accesso e la chiave segreta:
ecs-cli configure profile --access-key AWS_ACCESS_KEY_ID --secret-key AWS_SECRET_ACCESS_KEY --profile-name tutorial-profile
Fase 3: creare un cluster e configurare il gruppo di sicurezza
Per creare un cluster ECS e un gruppo di sicurezza
- Creare un cluster Amazon ECS con il comando ecs-cli up. Poiché nella configurazione del cluster hai specificato Fargate come tipo di lancio predefinito, il comando crea un cluster vuoto e un VPC configurato con due sottoreti pubbliche.
ecs-cli up --cluster-config tutorial --ecs-profile tutorial-profile
Possono essere necessari alcuni minuti perché le risorse vengano create e il comando venga completato. L’output di questo comando contiene gli ID di sottorete e i VPC creati. Prendi nota di questi ID poiché verranno utilizzati in un secondo momento. - Utilizzando l’AWS CLI, recupera l’ID del gruppo di sicurezza predefinito per il VPC. Utilizza l’ID VPC dell’output precedente:
aws ec2 describe-security-groups --filters Name=vpc-id,Values=VPC_ID --region us-west-2
L’output di questo comando contiene l’ID del gruppo di sicurezza, utilizzato nella fase successiva. - Tramite l’AWS CLI, aggiungi una regola del gruppo di sicurezza per consentire l’accesso in entrata sulla porta 80:
aws ec2 authorize-security-group-ingress --group-id security_group_id --protocol tcp --port 80 --cidr 0.0.0.0/0 --region us-west-2
Fase 4: Crea un file Compose
In questa fase dovrai generare un semplice file Docker Compose che crea un’applicazione Web PHP. Attualmente, la CLI di Amazon ECS supporta le versione 1, 2 e 3 della sintassi del file di Docker Compose. Questo tutorial utilizza Docker Compose v3.
Di seguito è riportato il file di Compose, che puoi denominare docker-compose.yml
. Il container web
espone la porta 80 per il traffico in entrata al server Web, oltre a configurare il posizionamento dei log di container nel gruppo di log CloudWatch creato in precedenza. Questa riportata è la best practice per le attività Fargate.
version: '3'
services:
web:
image: amazon/amazon-ecs-sample
ports:
- "80:80"
logging:
driver: awslogs
options:
awslogs-group: tutorial
awslogs-region: us-west-2
awslogs-stream-prefix: web
Nota
Se il tuo account contiene già un gruppo di log CloudWatch Logs denominato tutorial
nella regione us-west-2
, scegli un nome univoco in modo che l’interfaccia a riga di comando ECS crei un nuovo gruppo di log per questo tutorial.
Oltre alle informazioni del file di Docker Compose, dovrai specificare alcuni parametri specifici di Amazon ECS necessari per il servizio. Utilizzando gli ID per VPC, sottorete e gruppo di sicurezza ottenuti nel passo precedente, crea un file denominato ecs-params.yml
con il seguente contenuto:
version: 1
task_definition:
task_execution_role: ecsTaskExecutionRole
ecs_network_mode: awsvpc
task_size:
mem_limit: 0.5GB
cpu_limit: 256
run_params:
network_configuration:
awsvpc_configuration:
subnets:
- "subnet ID 1"
- "subnet ID 2"
security_groups:
- "security group ID"
assign_public_ip: ENABLED
Fase 5: Distribuisci il file Compose a un cluster
Dopo aver creato il file Compose, puoi distribuirlo al cluster con il comando ecs-cli compose service up. Per impostazione predefinita, il comando cerca i file denominati docker-compose.yml
ed ecs-params.yml
nella directory corrente; puoi specificare un altro file di Docker Compose con l’opzione --file
e un altro file ECS Params con l’opzione --ecs-params
. Per impostazione predefinita, le risorse create da questo comando contengono la directory corrente nel titolo, ma puoi sostituire questo valore con l’opzione --project-name
. L’opzione --create-log-groups
crea i gruppi di log CloudWatch per i log di container.
ecs-cli compose --project-name tutorial service up --create-log-groups --cluster-config tutorial --ecs-profile tutorial-profile
Fase 6: Visualizza i container in esecuzione su un cluster
Dopo aver distribuito il file Compose, puoi visualizzare i container in esecuzione nel servizio con il comando ecs-cli compose service ps.
ecs-cli compose --project-name tutorial service ps --cluster-config tutorial --ecs-profile tutorial-profile
Output:
Name State Ports TaskDefinition Health tutorial/0c2862e6e39e4eff92ca3e4f843c5b9a/web RUNNING 34.222.202.55:80->80/tcp tutorial:1 UNKNOWN
Nell’esempio precedente, dal file Compose puoi visualizzare sia il container web
, sia l’indirizzo IP e la porta del server Web. Se il browser Web fa riferimento a tale indirizzo, viene visualizzata l’applicazione Web PHP. Nell’output è riportato anche il valore task-id
del container. Copia l’ID attività, che ti servirà nella fase successiva.
Fase 7: Visualizza i log di container
Visualizza i log per l’attività:
ecs-cli logs --task-id 0c2862e6e39e4eff92ca3e4f843c5b9a --follow --cluster-config tutorial --ecs-profile tutorial-profile
Nota
L’opzione --follow
indica alla CLI di Amazon ECS di eseguire continuamente il polling per i log.
Fase 8: Dimensiona le attività sul cluster
Con il comando ecs-cli compose service scale puoi ampliare il numero di attività per aumentare il numero di istanze dell’applicazione. In questo esempio, il conteggio in esecuzione dell’applicazione viene portato a due.
ecs-cli compose --project-name tutorial service scale 2 --cluster-config tutorial --ecs-profile tutorial-profile
Ora nel cluster saranno presenti due container in più:
ecs-cli compose --project-name tutorial service ps --cluster-config tutorial --ecs-profile tutorial-profile
Output:
Name State Ports TaskDefinition Health tutorial/0c2862e6e39e4eff92ca3e4f843c5b9a/web RUNNING 34.222.202.55:80->80/tcp tutorial:1 UNKNOWN tutorial/d9fbbc931d2e47ae928fcf433041648f/web RUNNING 34.220.230.191:80->80/tcp tutorial:1 UNKNOWN
Fase 9: visualizzare l’applicazione Web
Inserisci l’indirizzo IP dell’attività nel browser Web per visualizzare una pagina Web contenente l’applicazione Web Simple PHP App (App PHP semplice).
Fase 10: Elimina
Al termine di questo tutorial, dovrai eliminare le risorse in modo che non comportino ulteriori addebiti. Per prima cosa, elimina il servizio: in questo modo i container esistenti verranno interrotti e non tenteranno di eseguire altre attività.
ecs-cli compose --project-name tutorial service down --cluster-config tutorial --ecs-profile tutorial-profile
Ora arresta il cluster per eliminare le risorse create in precedenza con il comando ecs-cli up.
ecs-cli down --force --cluster-config tutorial --ecs-profile tutorial-profile
Ready-to-use commands and tips for kubectl
Kubectl is the most important Kubernetes command-line tool that allows you to run commands against clusters. We at Flant internally share our knowledge of using it via formal wiki-like instructions as well as Slack messages (we also have a handy and smart search engine in place — but that’s a whole different story…). Over the years, we have accumulated a large number of various kubectl tips and tricks. Now, we’ve decided to share some of our cheat sheets with a wider community.
I am sure our readers might be familiar with many of them. But still, I hope you will learn something new and, thereby, improve your productivity.
NB: While some of the commands & techniques listed below were compiled by our engineers, others were found on the Web. In the latter case, we checked them thoroughly and found them useful.
Well, let’s get started!
Getting lists of pods and nodes
1. I guess you are all aware of how to get a list of pods across all Kubernetes namespaces using the --all-namespaces
flag. Many people are so used to it that they have not noticed the emergence of its shorter version, -A
(it exists since at least Kubernetes 1.15).
2. How do you find all non-running pods (i.e., with a state other than Running
)?
kubectl get pods -A --field-selector=status.phase!=Running | grep -v Complete
By the way, examining the --field-selector
flag more closely (see the relevant documentation) might be a good general recommendation.
3. Here is how you can get the list of nodes and their memory size:
kubectl get no -o json | \
jq -r '.items | sort_by(.status.capacity.memory)[]|[.metadata.name,.status.capacity.memory]| @tsv'
4. Getting the list of nodes and the number of pods running on them:
kubectl get po -o json --all-namespaces | \
jq '.items | group_by(.spec.nodeName) | map({"nodeName": .[0].spec.nodeName, "count": length}) | sort_by(.count)'
5. Sometimes, DaemonSet does not schedule a pod on a node for whatever reason. Manually searching for them is a tedious task, so here is a mini-script to get a list of such nodes:
ns=my-namespace
pod_template=my-pod
kubectl get node | grep -v \"$(kubectl -n ${ns} get pod --all-namespaces -o wide | fgrep ${pod_template} | awk '{print $8}' | xargs -n 1 echo -n "\|" | sed 's/[[:space:]]*//g')\"
6. This is how you can use kubectl top
to get a list of pods that eat up CPU and memory resources:
# cpu
kubectl top pods -A | sort --reverse --key 3 --numeric
# memory
kubectl top pods -A | sort --reverse --key 4 --numeric
7. Sorting the list of pods (in this case, by the number of restarts):
kubectl get pods --sort-by=.status.containerStatuses[0].restartCount
Of course, you can sort them by other fields, too (see PodStatus and ContainerStatus for details).
Getting other data
1. When tuning the Ingress resource, we inevitably go down to the service itself and then search for pods based on its selector. I used to look for this selector in the service manifest, but later switched to the -o wide
flag:
kubectl -n jaeger get svc -o wide
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTORjaeger-cassandra ClusterIP None <none> 9042/TCP 77d app=cassandracluster,cassandracluster=jaeger-cassandra,cluster=jaeger-cassandra
As you can see, in this case, we get the selector used by our service to find the appropriate pods.
2. Here is how you can easily print limits and requests of each pod:
kubectl get pods -n my-namespace -o=custom-columns='NAME:spec.containers[*].name,MEMREQ:spec.containers[*].resources.requests.memory,MEMLIM:spec.containers[*].resources.limits.memory,CPUREQ:spec.containers[*].resources.requests.cpu,CPULIM:spec.containers[*].resources.limits.cpu'
3. The kubectl run
command (as well as create
, apply
, patch
) has a great feature that allows you to see the expected changes without actually applying them — the --dry-run
flag. When it is used with -o yaml
, this command outputs the manifest of the required object. For example:
kubectl run test --image=grafana/grafana --dry-run -o yamlapiVersion: apps/v1
kind: Deployment
metadata:
creationTimestamp: null
labels:
run: test
name: test
spec:
replicas: 1
selector:
matchLabels:
run: test
strategy: {}
template:
metadata:
creationTimestamp: null
labels:
run: test
spec:
containers:
- image: grafana/grafana
name: test
resources: {}
status: {}
All you have to do now is to save it to a file, delete a couple of system/unnecessary fields, et voila.
NB: Please note that the kubectl run
behavior has been changed in Kubernetes v1.18 (now, it generates Pods instead of Deployments). You can find a great summary on this issue here.
4. Getting a description of the manifest of a given resource:
kubectl explain hpaKIND: HorizontalPodAutoscaler
VERSION: autoscaling/v1DESCRIPTION:
configuration of a horizontal pod autoscaler.FIELDS:
apiVersion <string>
APIVersion defines the versioned schema of this representation of an
object. Servers should convert recognized schemas to the latest internal
value, and may reject unrecognized values. More info:
https://git.k8s.io/community/contributors/devel/api-conventions.md#resourceskind <string>
Kind is a string value representing the REST resource this object
represents. Servers may infer this from the endpoint the client submits
requests to. Cannot be updated. In CamelCase. More info:
https://git.k8s.io/community/contributors/devel/api-conventions.md#types-kindsmetadata <Object>
Standard object metadata. More info:
https://git.k8s.io/community/contributors/devel/api-conventions.md#metadataspec <Object>
behaviour of autoscaler. More info:
https://git.k8s.io/community/contributors/devel/api-conventions.md#spec-and-status.status <Object>
current information about the autoscaler.
Well, that is a piece of extensive and very helpful information, I must say.
Networking
1. Here is how you can get internal IP addresses of cluster nodes:
kubectl get nodes -o json | \
jq -r '.items[].status.addresses[]? | select (.type == "InternalIP") | .address' | \
paste -sd "\n" -
2. And this way, you can print all services and their respective nodePorts:
kubectl get --all-namespaces svc -o json | \
jq -r '.items[] | [.metadata.name,([.spec.ports[].nodePort | tostring ] | join("|"))]| @tsv'
3. In situations where there are problems with the CNI (for example, with Flannel), you have to check the routes to identify the problem pod. Pod subnets that are used in the cluster can be very helpful in this task:
kubectl get nodes -o jsonpath='{.items[*].spec.podCIDR}' | tr " " "\n"
Logs
1. Print logs with a human-readable timestamp (if it is not set):
kubectl -n my-namespace logs -f my-pod --timestamps2020-07-08T14:01:59.581788788Z fail: Microsoft.EntityFrameworkCore.Query[10100]
Logs look so much better now, don’t they?
2. You do not have to wait until the entire log of the pod’s container is printed out — just use --tail
:
kubectl -n my-namespace logs -f my-pod --tail=50
3. Here is how you can print all the logs from all containers of a pod:
kubectl -n my-namespace logs -f my-pod --all-containers
4. Getting logs from all pods using a label to filter:
kubectl -n my-namespace logs -f -l app=nginx
5. Getting logs of the “previous” container (for example, if it has crashed):
kubectl -n my-namespace logs my-pod --previous
Other quick actions
1. Here is how you can quickly copy secrets from one namespace to another:
kubectl get secrets -o json --namespace namespace-old | \
jq '.items[].metadata.namespace = "namespace-new"' | \
kubectl create-f -
2. Run these two commands to create a self-signed certificate for testing:
openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout tls.key -out tls.crt -subj "/CN=grafana.mysite.ru/O=MyOrganization"
kubectl -n myapp create secret tls selfsecret --key tls.key --cert tls.crt
Helpful links on the topic
In lieu of conclusion — here is a small list of similar publications and cheat sheets’ collections we’ve found online:
- The official cheatsheet from the Kubernetes documentation;
- A short practical introduction plus a detailed 2-page table by Linux Academy. It provides novice engineers with all the basic kubectl commands at a glance:
- An exhaustive list of commands by Blue Matador divided into sections;
- A gist compilation of links to kubectl cheatsheets, articles on the topic, as well as some commands;
- The Kubernetes-Cheat-Sheet GitHub repository by another enthusiast containing kubectl commands categorized by topics;
- The kubectl-aliases GitHub repository which is a real paradise for alias lovers.