How does ‘kubectl exec’ work?

Last Friday, one of my colleagues approached me and asked a question about how to exec a command in a pod with client-go. I didn’t know the answer and I noticed that I had never thought about the mechanism in “kubectl exec”. I had some ideas about how it should be, but I wasn’t 100% sure. I noted the topic to check again and I have learnt a lot after reading some blogs, docs and source codes. In this blog post, I am going to share my understanding and findings.

Setup

I cloned https://github.com/ecomm-integration-ballerina/kubernetes-cluster in order to create a k8s cluster in my MacBook. I fixed IP addresses of the nodes in kubelet configurations since the default configuration didn’t let me run kubectl exec. You can find the root cause here.

  • Any machine = my MacBook
  • IP of master node = 192.168.205.10
  • IP of worker node = 192.168.205.11
  • API server port = 6443

Components

  • kubectl exec process: When we run “kubectl exec …” in a machine, a process starts. You can run it in any machine which has an access to k8s api server.
  • api server: Component on the master that exposes the Kubernetes API. It is the front-end for the Kubernetes control plane.
  • kubelet: An agent that runs on each node in the cluster. It makes sure that containers are running in a pod.
  • container runtime: The software that is responsible for running containers. Examples: docker, cri-o, containerd…
  • kernel: kernel of the OS in the worker node which is responsible to manage processes.
  • target container: A container which is a part of a pod and which is running on one of the worker nodes.

Findings

1. Activities in Client Side

  • Create a pod in default namespace// any machine $ kubectl run exec-test-nginx --image=nginx
  • Then run an exec command and sleep 5000 to make observation// any machine $ kubectl exec -it exec-test-nginx-6558988d5-fgxgg -- sh # sleep 5000
  • We can observe the kubectl process (pid=8507 in this case)// any machine $ ps -ef |grep kubectl 501 8507 8409 0 7:19PM ttys000 0:00.13 kubectl exec -it exec-test-nginx-6558988d5-fgxgg -- sh
  • When we check network activities of the process, we can see that it has some connections to api-server (192.168.205.10.6443)// any machine $ netstat -atnv |grep 8507 tcp4 0 0 192.168.205.1.51673 192.168.205.10.6443 ESTABLISHED 131072 131768 8507 0 0x0102 0x00000020 tcp4 0 0 192.168.205.1.51672 192.168.205.10.6443 ESTABLISHED 131072 131768 8507 0 0x0102 0x00000028
  • Let’s check the code. kubectl creates a POST request with subresource exec and sends a rest request.                req := restClient.Post().
                            Resource(“pods”).
                            Name(pod.Name).
                            Namespace(pod.Namespace).
                            SubResource(“exec”)
                    req.VersionedParams(&corev1.PodExecOptions{
                            Container: containerName,
                            Command:   p.Command,
                            Stdin:     p.Stdin,
                            Stdout:    p.Out != nil,
                            Stderr:    p.ErrOut != nil,
                            TTY:       t.Raw,
                    }, scheme.ParameterCodec)

                    return p.Executor.Execute(“POST”, req.URL(), p.Config, p.In, p.Out, p.ErrOut, t.Raw, sizeQueue)This Gist brought to you by gist-it.view rawstaging/src/k8s.io/kubectl/pkg/cmd/exec/exec.go

2. Activities in Master Node

  • We can observe the request in api-server side as well.handler.go:143] kube-apiserver: POST "/api/v1/namespaces/default/pods/exec-test-nginx-6558988d5-fgxgg/exec" satisfied by gorestful with webservice /api/v1 upgradeaware.go:261] Connecting to backend proxy (intercepting redirects) https://192.168.205.11:10250/exec/default/exec-test-nginx-6558988d5-fgxgg/exec-test-nginx?command=sh&input=1&output=1&tty=1 Headers: map[Connection:[Upgrade] Content-Length:[0] Upgrade:[SPDY/3.1] User-Agent:[kubectl/v1.12.10 (darwin/amd64) kubernetes/e3c1340] X-Forwarded-For:[192.168.205.1] X-Stream-Protocol-Version:[v4.channel.k8s.io v3.channel.k8s.io v2.channel.k8s.io channel.k8s.io]] Notice that the http request includes a protocol upgrade request. SPDY allows for separate stdin/stdout/stderr/spdy-error “streams” to be multiplexed over a single TCP connection.
  • Api server receives the request and binds it into a PodExecOptions// PodExecOptions is the query options to a Pod’s remote exec call
    type PodExecOptions struct {
            metav1.TypeMeta

            // Stdin if true indicates that stdin is to be redirected for the exec call
            Stdin bool

            // Stdout if true indicates that stdout is to be redirected for the exec call
            Stdout bool

            // Stderr if true indicates that stderr is to be redirected for the exec call
            Stderr bool

            // TTY if true indicates that a tty will be allocated for the exec call
            TTY bool

            // Container in which to execute the command.
            Container string

            // Command is the remote command to execute; argv array; not executed within a shell.
            Command []string
    }This Gist brought to you by gist-it.view rawpkg/apis/core/types.go
  • To be able to take necessary actions, api-server needs to know which location it should contact.// ExecLocation returns the exec URL for a pod container. If opts.Container is blank
    // and only one container is present in the pod, that container is used.
    func ExecLocation(
            getter ResourceGetter,
            connInfo client.ConnectionInfoGetter,
            ctx context.Context,
            name string,
            opts *api.PodExecOptions,
    ) (*url.URL, http.RoundTripper, error) {
            return streamLocation(getter, connInfo, ctx, name, opts, opts.Container, “exec”)
    }This Gist brought to you by gist-it.view rawpkg/registry/core/pod/strategy.go
    Of course the endpoint is derived from node info.        nodeName := types.NodeName(pod.Spec.NodeName)
            if len(nodeName) == 0 {
                    // If pod has not been assigned a host, return an empty location
                    return nil, nil, errors.NewBadRequest(fmt.Sprintf(“pod %s does not have a host assigned”, name))
            }
            nodeInfo, err := connInfo.GetConnectionInfo(ctx, nodeName)This Gist brought to you by gist-it.view rawpkg/registry/core/pod/strategy.go
    GOTCHA! KUBELET HAS A PORT (node.Status.DaemonEndpoints.KubeletEndpoint.Port) TO WHICH API-SERVER CAN CONNECT.// GetConnectionInfo retrieves connection info from the status of a Node API object.
    func (k *NodeConnectionInfoGetter) GetConnectionInfo(ctx context.Context, nodeName types.NodeName) (*ConnectionInfo, error) {
            node, err := k.nodes.Get(ctx, string(nodeName), metav1.GetOptions{})
            if err != nil {
                    return nil, err
            }

            // Find a kubelet-reported address, using preferred address type
            host, err := nodeutil.GetPreferredNodeAddress(node, k.preferredAddressTypes)
            if err != nil {
                    return nil, err
            }

            // Use the kubelet-reported port, if present
            port := int(node.Status.DaemonEndpoints.KubeletEndpoint.Port)
            if port <= 0 {
                    port = k.defaultPort
            }

            return &ConnectionInfo{
                    Scheme:    k.scheme,
                    Hostname:  host,
                    Port:      strconv.Itoa(port),
                    Transport: k.transport,
            }, nil
    }This Gist brought to you by gist-it.view rawpkg/kubelet/client/kubelet_client.go
    Master-Node Communication > Master to Cluster > apiserver to kubeletThese connections terminate at the kubelet’s HTTPS endpoint. By default, the apiserver does not verify the kubelet’s serving certificate, which makes the connection subject to man-in-the-middle attacks, and unsafe to run over untrusted and/or public networks.
  • Now, api server knows the endpoint and it opens a connections.// Connect returns a handler for the pod exec proxy
    func (r *ExecREST) Connect(ctx context.Context, name string, opts runtime.Object, responder rest.Responder) (http.Handler, error) {
            execOpts, ok := opts.(*api.PodExecOptions)
            if !ok {
                    return nil, fmt.Errorf(“invalid options object: %#v”, opts)
            }
            location, transport, err := pod.ExecLocation(r.Store, r.KubeletConn, ctx, name, execOpts)
            if err != nil {
                    return nil, err
            }
            return newThrottledUpgradeAwareProxyHandler(location, transport, false, true, true, responder), nil
    }This Gist brought to you by gist-it.view rawpkg/registry/core/pod/rest/subresources.go
  • Let’s check what is going on the master node.

First, learn the ip of the worker node. It is 192.168.205.11 in this case.

// any machine
$ kubectl get nodes k8s-node-1 -o wide
NAME         STATUS   ROLES    AGE   VERSION   INTERNAL-IP      EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
k8s-node-1   Ready    <none>   9h    v1.15.3   192.168.205.11   <none>        Ubuntu 16.04.6 LTS   4.4.0-159-generic   docker://17.3.3

Then get the kubelet port. It is 10250 in this case.

// any machine
$ kubectl get nodes k8s-node-1 -o jsonpath='{.status.daemonEndpoints.kubeletEndpoint}'
map[Port:10250]

Then check the network. Is there a connection to worker node(192.168.205.11)? THE CONNECTİON IS THERE. When I kill the exec process, it disappears so I know it is set by api-server because of my exec command.

// master node
$ netstat -atn |grep 192.168.205.11
tcp        0      0 192.168.205.10:37870    192.168.205.11:10250    ESTABLISHED
...

  • Now the connection between kubectl and api-server is still open and there is another connection between api-server and kubelet.

3. Activities in Worker Node

  • Let’s continue by connecting to the worker node and checking what is going on the worker node.

First, we can observe the connection here as well. The second line. 192.168.205.10 is the IP of master node.

  // worker node
  $ netstat -atn |grep 10250
  tcp6       0      0 :::10250                :::*                    LISTEN
  tcp6       0      0 192.168.205.11:10250    192.168.205.10:37870    ESTABLISHED

What about our sleep command? HOORAYYYY!! OUR COMMAND IS THERE!!!!

  // worker node
  $ ps -afx
  ...
  31463 ?        Sl     0:00      \_ docker-containerd-shim 7d974065bbb3107074ce31c51f5ef40aea8dcd535ae11a7b8f2dd180b8ed583a /var/run/docker/libcontainerd/7d974065bbb3107074ce31c51
  31478 pts/0    Ss     0:00          \_ sh
  31485 pts/0    S+     0:00              \_ sleep 5000
  ...

  • Wait! How did kubelet do it?
  • kubelet has a daemon which serves an api over a port for api-server requests.// Server is the library interface to serve the stream requests.
    type Server interface {
            http.Handler

            // Get the serving URL for the requests.
            // Requests must not be nil. Responses may be nil iff an error is returned.
            GetExec(*runtimeapi.ExecRequest) (*runtimeapi.ExecResponse, error)
            GetAttach(req *runtimeapi.AttachRequest) (*runtimeapi.AttachResponse, error)
            GetPortForward(*runtimeapi.PortForwardRequest) (*runtimeapi.PortForwardResponse, error)

            // Start the server.
            // addr is the address to serve on (address:port) stayUp indicates whether the server should
            // listen until Stop() is called, or automatically stop after all expected connections are
            // closed. Calling Get{Exec,Attach,PortForward} increments the expected connection count.
            // Function does not return until the server is stopped.
            Start(stayUp bool) error
            // Stop the server, and terminate any open connections.
            Stop() error
    }This Gist brought to you by gist-it.view rawpkg/kubelet/server/streaming/server.go
  • kubelet computes a response endpoint for exec requests.func (s *server) GetExec(req *runtimeapi.ExecRequest) (*runtimeapi.ExecResponse, error) {
            if err := validateExecRequest(req); err != nil {
                    return nil, err
            }
            token, err := s.cache.Insert(req)
            if err != nil {
                    return nil, err
            }
            return &runtimeapi.ExecResponse{
                    Url: s.buildURL(“exec”, token),
            }, nil
    }This Gist brought to you by gist-it.view rawpkg/kubelet/server/streaming/server.go

Don’t confuse. It doesn’t return the result of the command. It returns an endpoint for communication.

type ExecResponse struct {
        // Fully qualified URL of the exec streaming server.
        Url                  string   `protobuf:"bytes,1,opt,name=url,proto3" json:"url,omitempty"`
        XXX_NoUnkeyedLiteral struct{} `json:"-"`
        XXX_sizecache        int32    `json:"-"`
}

This Gist brought to you by gist-it.view rawstaging/src/k8s.io/cri-api/pkg/apis/runtime/v1alpha2/api.pb.go

kubelet implements RuntimeServiceClient interface which is part of Container Runtime Interface.

// For semantics around ctx use and closing/ending streaming RPCs, please refer to https://godoc.org/google.golang.org/grpc#ClientConn.NewStream.
type RuntimeServiceClient interface {
        // Version returns the runtime name, runtime version, and runtime API version.
        Version(ctx context.Context, in *VersionRequest, opts ...grpc.CallOption) (*VersionResponse, error)
        // RunPodSandbox creates and starts a pod-level sandbox. Runtimes must ensure
        // the sandbox is in the ready state on success.
        RunPodSandbox(ctx context.Context, in *RunPodSandboxRequest, opts ...grpc.CallOption) (*RunPodSandboxResponse, error)
        // StopPodSandbox stops any running process that is part of the sandbox and
        // reclaims network resources (e.g., IP addresses) allocated to the sandbox.
        // If there are any running containers in the sandbox, they must be forcibly
        // terminated.
        // This call is idempotent, and must not return an error if all relevant
        // resources have already been reclaimed. kubelet will call StopPodSandbox
        // at least once before calling RemovePodSandbox. It will also attempt to
        // reclaim resources eagerly, as soon as a sandbox is not needed. Hence,
        // multiple StopPodSandbox calls are expected.
        StopPodSandbox(ctx context.Context, in *StopPodSandboxRequest, opts ...grpc.CallOption) (*StopPodSandboxResponse, error)
        // RemovePodSandbox removes the sandbox. If there are any running containers
        // in the sandbox, they must be forcibly terminated and removed.
        // This call is idempotent, and must not return an error if the sandbox has
        // already been removed.
        RemovePodSandbox(ctx context.Context, in *RemovePodSandboxRequest, opts ...grpc.CallOption) (*RemovePodSandboxResponse, error)
        // PodSandboxStatus returns the status of the PodSandbox. If the PodSandbox is not
        // present, returns an error.
        PodSandboxStatus(ctx context.Context, in *PodSandboxStatusRequest, opts ...grpc.CallOption) (*PodSandboxStatusResponse, error)
        // ListPodSandbox returns a list of PodSandboxes.
        ListPodSandbox(ctx context.Context, in *ListPodSandboxRequest, opts ...grpc.CallOption) (*ListPodSandboxResponse, error)
        // CreateContainer creates a new container in specified PodSandbox
        CreateContainer(ctx context.Context, in *CreateContainerRequest, opts ...grpc.CallOption) (*CreateContainerResponse, error)
        // StartContainer starts the container.
        StartContainer(ctx context.Context, in *StartContainerRequest, opts ...grpc.CallOption) (*StartContainerResponse, error)
        // StopContainer stops a running container with a grace period (i.e., timeout).
        // This call is idempotent, and must not return an error if the container has
        // already been stopped.
        // TODO: what must the runtime do after the grace period is reached?
        StopContainer(ctx context.Context, in *StopContainerRequest, opts ...grpc.CallOption) (*StopContainerResponse, error)
        // RemoveContainer removes the container. If the container is running, the
        // container must be forcibly removed.
        // This call is idempotent, and must not return an error if the container has
        // already been removed.
        RemoveContainer(ctx context.Context, in *RemoveContainerRequest, opts ...grpc.CallOption) (*RemoveContainerResponse, error)
        // ListContainers lists all containers by filters.
        ListContainers(ctx context.Context, in *ListContainersRequest, opts ...grpc.CallOption) (*ListContainersResponse, error)
        // ContainerStatus returns status of the container. If the container is not
        // present, returns an error.
        ContainerStatus(ctx context.Context, in *ContainerStatusRequest, opts ...grpc.CallOption) (*ContainerStatusResponse, error)
        // UpdateContainerResources updates ContainerConfig of the container.
        UpdateContainerResources(ctx context.Context, in *UpdateContainerResourcesRequest, opts ...grpc.CallOption) (*UpdateContainerResourcesResponse, error)
        // ReopenContainerLog asks runtime to reopen the stdout/stderr log file
        // for the container. This is often called after the log file has been
        // rotated. If the container is not running, container runtime can choose
        // to either create a new log file and return nil, or return an error.
        // Once it returns error, new container log file MUST NOT be created.
        ReopenContainerLog(ctx context.Context, in *ReopenContainerLogRequest, opts ...grpc.CallOption) (*ReopenContainerLogResponse, error)
        // ExecSync runs a command in a container synchronously.
        ExecSync(ctx context.Context, in *ExecSyncRequest, opts ...grpc.CallOption) (*ExecSyncResponse, error)
        // Exec prepares a streaming endpoint to execute a command in the container.
        Exec(ctx context.Context, in *ExecRequest, opts ...grpc.CallOption) (*ExecResponse, error)
        // Attach prepares a streaming endpoint to attach to a running container.
        Attach(ctx context.Context, in *AttachRequest, opts ...grpc.CallOption) (*AttachResponse, error)
        // PortForward prepares a streaming endpoint to forward ports from a PodSandbox.
        PortForward(ctx context.Context, in *PortForwardRequest, opts ...grpc.CallOption) (*PortForwardResponse, error)
        // ContainerStats returns stats of the container. If the container does not
        // exist, the call returns an error.
        ContainerStats(ctx context.Context, in *ContainerStatsRequest, opts ...grpc.CallOption) (*ContainerStatsResponse, error)
        // ListContainerStats returns stats of all running containers.
        ListContainerStats(ctx context.Context, in *ListContainerStatsRequest, opts ...grpc.CallOption) (*ListContainerStatsResponse, error)
        // UpdateRuntimeConfig updates the runtime configuration based on the given request.
        UpdateRuntimeConfig(ctx context.Context, in *UpdateRuntimeConfigRequest, opts ...grpc.CallOption) (*UpdateRuntimeConfigResponse, error)
        // Status returns the status of the runtime.
        Status(ctx context.Context, in *StatusRequest, opts ...grpc.CallOption) (*StatusResponse, error)
}

This Gist brought to you by gist-it.view rawstaging/src/k8s.io/cri-api/pkg/apis/runtime/v1alpha2/api.pb.go

It just uses gRPC to invoke a method through Container Runtime Interface.

type runtimeServiceClient struct {
        cc *grpc.ClientConn
}

This Gist brought to you by gist-it.view rawstaging/src/k8s.io/cri-api/pkg/apis/runtime/v1alpha2/api.pb.go

func (c *runtimeServiceClient) Exec(ctx context.Context, in *ExecRequest, opts ...grpc.CallOption) (*ExecResponse, error) {
        out := new(ExecResponse)
        err := c.cc.Invoke(ctx, "/runtime.v1alpha2.RuntimeService/Exec", in, out, opts...)
        if err != nil {
                return nil, err
        }
        return out, nil
}

This Gist brought to you by gist-it.view rawstaging/src/k8s.io/cri-api/pkg/apis/runtime/v1alpha2/api.pb.go

Container Runtime is responsible to implement RuntimeServiceServer

// RuntimeServiceServer is the server API for RuntimeService service.
type RuntimeServiceServer interface {
        // Version returns the runtime name, runtime version, and runtime API version.
        Version(context.Context, *VersionRequest) (*VersionResponse, error)
        // RunPodSandbox creates and starts a pod-level sandbox. Runtimes must ensure
        // the sandbox is in the ready state on success.
        RunPodSandbox(context.Context, *RunPodSandboxRequest) (*RunPodSandboxResponse, error)
        // StopPodSandbox stops any running process that is part of the sandbox and
        // reclaims network resources (e.g., IP addresses) allocated to the sandbox.
        // If there are any running containers in the sandbox, they must be forcibly
        // terminated.
        // This call is idempotent, and must not return an error if all relevant
        // resources have already been reclaimed. kubelet will call StopPodSandbox
        // at least once before calling RemovePodSandbox. It will also attempt to
        // reclaim resources eagerly, as soon as a sandbox is not needed. Hence,
        // multiple StopPodSandbox calls are expected.
        StopPodSandbox(context.Context, *StopPodSandboxRequest) (*StopPodSandboxResponse, error)
        // RemovePodSandbox removes the sandbox. If there are any running containers
        // in the sandbox, they must be forcibly terminated and removed.
        // This call is idempotent, and must not return an error if the sandbox has
        // already been removed.
        RemovePodSandbox(context.Context, *RemovePodSandboxRequest) (*RemovePodSandboxResponse, error)
        // PodSandboxStatus returns the status of the PodSandbox. If the PodSandbox is not
        // present, returns an error.
        PodSandboxStatus(context.Context, *PodSandboxStatusRequest) (*PodSandboxStatusResponse, error)
        // ListPodSandbox returns a list of PodSandboxes.
        ListPodSandbox(context.Context, *ListPodSandboxRequest) (*ListPodSandboxResponse, error)
        // CreateContainer creates a new container in specified PodSandbox
        CreateContainer(context.Context, *CreateContainerRequest) (*CreateContainerResponse, error)
        // StartContainer starts the container.
        StartContainer(context.Context, *StartContainerRequest) (*StartContainerResponse, error)
        // StopContainer stops a running container with a grace period (i.e., timeout).
        // This call is idempotent, and must not return an error if the container has
        // already been stopped.
        // TODO: what must the runtime do after the grace period is reached?
        StopContainer(context.Context, *StopContainerRequest) (*StopContainerResponse, error)
        // RemoveContainer removes the container. If the container is running, the
        // container must be forcibly removed.
        // This call is idempotent, and must not return an error if the container has
        // already been removed.
        RemoveContainer(context.Context, *RemoveContainerRequest) (*RemoveContainerResponse, error)
        // ListContainers lists all containers by filters.
        ListContainers(context.Context, *ListContainersRequest) (*ListContainersResponse, error)
        // ContainerStatus returns status of the container. If the container is not
        // present, returns an error.
        ContainerStatus(context.Context, *ContainerStatusRequest) (*ContainerStatusResponse, error)
        // UpdateContainerResources updates ContainerConfig of the container.
        UpdateContainerResources(context.Context, *UpdateContainerResourcesRequest) (*UpdateContainerResourcesResponse, error)
        // ReopenContainerLog asks runtime to reopen the stdout/stderr log file
        // for the container. This is often called after the log file has been
        // rotated. If the container is not running, container runtime can choose
        // to either create a new log file and return nil, or return an error.
        // Once it returns error, new container log file MUST NOT be created.
        ReopenContainerLog(context.Context, *ReopenContainerLogRequest) (*ReopenContainerLogResponse, error)
        // ExecSync runs a command in a container synchronously.
        ExecSync(context.Context, *ExecSyncRequest) (*ExecSyncResponse, error)
        // Exec prepares a streaming endpoint to execute a command in the container.
        Exec(context.Context, *ExecRequest) (*ExecResponse, error)
        // Attach prepares a streaming endpoint to attach to a running container.
        Attach(context.Context, *AttachRequest) (*AttachResponse, error)
        // PortForward prepares a streaming endpoint to forward ports from a PodSandbox.
        PortForward(context.Context, *PortForwardRequest) (*PortForwardResponse, error)
        // ContainerStats returns stats of the container. If the container does not
        // exist, the call returns an error.
        ContainerStats(context.Context, *ContainerStatsRequest) (*ContainerStatsResponse, error)
        // ListContainerStats returns stats of all running containers.
        ListContainerStats(context.Context, *ListContainerStatsRequest) (*ListContainerStatsResponse, error)
        // UpdateRuntimeConfig updates the runtime configuration based on the given request.
        UpdateRuntimeConfig(context.Context, *UpdateRuntimeConfigRequest) (*UpdateRuntimeConfigResponse, error)
        // Status returns the status of the runtime.
        Status(context.Context, *StatusRequest) (*StatusResponse, error)
}

This Gist brought to you by gist-it.view rawstaging/src/k8s.io/cri-api/pkg/apis/runtime/v1alpha2/api.pb.go

  • If it is so, we need to observe a connection between kubelet and container runtime. Right? Let’s check.

Run this command before and after running exec command and check the diff. This one is the diff in my case.

// worker node
$ ss -a -p |grep kubelet
...
u_str  ESTAB      0      0       * 157937                * 157387                users:(("kubelet",pid=5714,fd=33))
...

Hımmm. There is a new connection via unix sockets between kubelet(pid=5714) and something. Who can be? YES. IT IS DOCKER(pid=1186).

// worker node
$ ss -a -p |grep 157387
...
u_str  ESTAB      0      0       * 157937                * 157387                users:(("kubelet",pid=5714,fd=33))
u_str  ESTAB      0      0      /var/run/docker.sock 157387                * 157937                users:(("dockerd",pid=1186,fd=14))
...

Remember. This is the docker daemon process(pid=1186) which runs our command.

// worker node.
$ ps -afx
...
 1186 ?        Ssl    0:55 /usr/bin/dockerd -H fd://
17784 ?        Sl     0:00      \_ docker-containerd-shim 53a0a08547b2f95986402d7f3b3e78702516244df049ba6c5aa012e81264aa3c /var/run/docker/libcontainerd/53a0a08547b2f95986402d7f3
17801 pts/2    Ss     0:00          \_ sh
17827 pts/2    S+     0:00              \_ sleep 5000
...

4. Activities in Container Runtime

  • Let’s check cri-o’s source code to understand how it can happen. The logic is similar in docker.

It has a server which implements RuntimeServiceServer.

// Server implements the RuntimeService and ImageService
type Server struct {
        config          libconfig.Config
        seccompProfile  *seccomp.Seccomp
        stream          StreamService
        netPlugin       ocicni.CNIPlugin
        hostportManager hostport.HostPortManager

        appArmorProfile string
        hostIP          string
        bindAddress     string

        *lib.ContainerServer
        monitorsChan      chan struct{}
        defaultIDMappings *idtools.IDMappings
        systemContext     *types.SystemContext // Never nil

        updateLock sync.RWMutex

        seccompEnabled  bool
        appArmorEnabled bool

This Gist brought to you by gist-it.view rawserver/server.go

// Exec prepares a streaming endpoint to execute a command in the container.
func (s *Server) Exec(ctx context.Context, req *pb.ExecRequest) (resp *pb.ExecResponse, err error) {
        const operation = "exec"
        defer func() {
                recordOperation(operation, time.Now())
                recordError(operation, err)
        }()

        resp, err = s.getExec(req)
        if err != nil {
                return nil, fmt.Errorf("unable to prepare exec endpoint: %v", err)
        }

        return resp, nil
}

This Gist brought to you by gist-it.view rawserver/container_exec.go

At the end of the chain, container runtime executes the command in the worker node.

// ExecContainer prepares a streaming endpoint to execute a command in the container.
func (r *runtimeOCI) ExecContainer(c *Container, cmd []string, stdin io.Reader, stdout, stderr io.WriteCloser, tty bool, resize <-chan remotecommand.TerminalSize) error {
        processFile, err := prepareProcessExec(c, cmd, tty)
        if err != nil {
                return err
        }
        defer os.RemoveAll(processFile.Name())

        args := []string{rootFlag, r.root, "exec"}
        args = append(args, "--process", processFile.Name(), c.ID())
        execCmd := exec.Command(r.path, args...)
        if v, found := os.LookupEnv("XDG_RUNTIME_DIR"); found {
                execCmd.Env = append(execCmd.Env, fmt.Sprintf("XDG_RUNTIME_DIR=%s", v))
        }
        var cmdErr, copyError error
        if tty {
                cmdErr = ttyCmd(execCmd, stdin, stdout, resize)
        } else {
                if stdin != nil {
                        // Use an os.Pipe here as it returns true *os.File objects.
                        // This way, if you run 'kubectl exec <pod> -i bash' (no tty) and type 'exit',
                        // the call below to execCmd.Run() can unblock because its Stdin is the read half
                        // of the pipe.
                        r, w, err := os.Pipe()
                        if err != nil {
                                return err
                        }
                        go func() { _, copyError = pools.Copy(w, stdin) }()

                        execCmd.Stdin = r
                }
                if stdout != nil {
                        execCmd.Stdout = stdout
                }
                if stderr != nil {
                        execCmd.Stderr = stderr
                }

                cmdErr = execCmd.Run()
        }

        if copyError != nil {
                return copyError
        }
        if exitErr, ok := cmdErr.(*exec.ExitError); ok {
                return &utilexec.ExitErrorWrapper{ExitError: exitErr}
        }
        return cmdErr
}

This Gist brought to you by gist-it.view rawinternal/oci/runtime_oci.go

Finally, kernel executes commands.

Reminders

  • api-server can also initialize a connection to kubelet.
  • These connections persist until the interactive exec ends.
    • Connection between kubectl and api-server
    • Connection between api-server and kubelet
    • Connection between kubelet and container runtime
  • kubectl or api-server cannot run anything in the worker nodes. kubelet can run but it also interacts with container runtime for this kind of actions.

Lascia un commento

Il tuo indirizzo email non sarà pubblicato. I campi obbligatori sono contrassegnati *