Kubernetes: Basics

April 6, 2018 9-minute read

What is Kubernetes

Led by Google, the Kubernetes project aims to build a robust platform for running many containers. It allows for the automation of deployments, scaling, and management of containerized applications. Kubernetes enables teams to more effortlessly move workloads across cloud infrastructures by grouping containers into logical units for easy management and discovery.

Why migrate to Kubernetes?

more efficient use of cloud resources – resulting in cleaner architecture and huge potential savings in cloud costs
removal of centralized team release responsibilities in lieu of specific functional teams
decentralized schedule for release cycles across product areas

Architecture: Master nodes

A Kubernetes cluster is composed of two types of nodes; master nodes and worker nodes. Master nodes run the control plane, consisting of the following components:

API server is the external REST interface for the cluster. Clients interact with the cluster via the API, which validates and persists changes within etcd.
The Scheduler is responsible for the resolution of desired state changes, by deciding where resources/objects should be placed within the cluster.
The Controller Manager is the core component that manages controller resource implementations within the cluster where users can specify number of replicas or failover minimum criteria across the cluster.
etcd distributed key-value store

All cluster nodes run kubelets which interact with the Docker APIs in order to execute cluster commands, execute/destroy containers, etc…

Architecture: Worker Nodes, Networking

Each worker node runs kube-proxy, a simple network proxy and load balancer. Kubernetes, kube-proxy integration with Container Networking Interface (CNI) allows for cluster network orchestration. Each Pod gets its own IP address. Pod lifecycle events trigger attachment/removal from the underlying software-defined network. CNI implementations allow the cluster to cross-resolve Pods without resorting to NAT.

There are various implementations of CNI. The most commonly used ones are:

Calico
Weavenet by WeaveWorks
Flannel

Namespaces

provide a logical partition of the cluster’s resources
kubernetes resources are scoped to a namespace
all resources within a namespace must have unique names
kubernetes allows for assigned quotas for resource limitations within a namespace
namespace of default is used if none other is specified
the kube-system is used for etcd instances, the API servers, controller managers, schedulers, kube proxies, and weave networking CNI pods

API Versioning

Kubernetes provides a clear and consistent API by versioning at the API level rather than at the resource level. The Kubernetes API maturity levels include; stable, beta, alpha.

Kubernetes employs API Groups to aid in frictionless extension of the platform. An API Group consists of a REST path and API version. The core or legacy API Group is /api/v1 and newer, named groups are at REST path: /apis/$GROUP_NAME/$VERSION.

Use kubectl api-versions to get a list of all APIs supported by your clusters API Server.

Pods

A pod the smallest deployable unit in Kubernetes. A pod is a logical group of one or more containers that share the same IP address and port space.

Containers within a pod can find each other via localhost, and can also communicate with each other using standard inter-process communication mechanisms.

Pods are not durable.

apiVersion: v1
kind: Pod
metadata:
  name: redis
spec:
  containers:
  - name: redis
    image: redis

Pods and Selectors/Labels

A label is a key/value pair that is attached to a Kubernetes resource. Labels can be attached to resources at creation time, as well as added and modified at any later time.

A label selector can be to organize Kubernetes resources that have labels. An equality-based selector defines a condition for selecting resources that have the specified label value. A set-based selector defines a condition for selecting resources that have a label value within the specified set of values.

Services

A service uses a selector to define a logical group of pods and defines a policy to access such logical groups. Services are durable and provide some mechanism of pod discovery. There are several types of services in Kubernetes:

A ClusterIP service exposes pods to connections from inside the cluster (this is the default).
A NodePort service exposes pods to external traffic by forwarding traffic from a port on each node of the cluster to the container port.
A LoadBalancer service also exposes pods to external traffic, as NodePort service does, and additionally supplies load balancer functionality.

apiVersion: v1
kind: Pod
metadata:
  name: redis
  labels:
    color: purple
spec:
  containers:
  - name: redis
    image: redis
    ports:
    - containerPort: 6379
      name: redis-server
apiVersion: v1
kind: Service
metadata:
  name: redis-svc
  labels:
    name: redis-svc
spec:
  ports:
    - port: 2222
      targetPort: redis-server
  selector:
    color: purple

Pod Accessibility, Debugging

The aforementioned example defined a pod with a redis container and a corresponding ClusterIP-based service. The network infrastructure of our cluster is such that there are no network pathways to the cluster except for the API Server, thus, how do we interact with our container and service?

Run a busybox container in “interactive” mode allowing you to access cluster services within your namespace: kubectl run -i --tty busybox --image=busybox --restart=Never --rm=true -- sh
Execute a port-forward from your machine to a specific pod: kubectl port-forward $pod_name $local_port:$remote_port

Controllers, Deployments, and Replicas

A controller manages a set of pods and ensures that the cluster is in the specified state. Unlike manually created pods, the pods maintained by a replication controller are automatically replaced if they fail, get deleted, or are terminated.

A deployment defines a desired state for logical group of pods and replica sets. It creates new resources or replaces the existing resources, if necessary. A deployment can be updated, rolled out, or rolled back.

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: redis-deployment
spec:
  replicas: 1
  template:
    metadata:
      labels:
        color: purple
    spec:
      containers:
      - name: redis
        image: redis
        ports:
        - containerPort: 6379
          name: redis-server
apiVersion: v1
kind: Service
metadata:
  name: redis-svc
  labels:
    name: redis-svc
spec:
  ports:
    - port: 2222
      targetPort: redis-server
  selector:
    color: purple

A replication controller is responsible for running the specified number of pod copies (replicas) across the cluster.

A replica set is the next-generation replication controller. A replication controller supports only equality-based selectors (== vs !=), while a replica set supports set-based selectors (environment in [production, qa]).

Persistence: Volumes and Claims

A container file system is ephemeral: if a container crashes, the changes to its file system are lost. A volume is defined at the pod level, and is used to preserve data across container crashes. A volume can be also used to share data between containers in a pod. A volume has the same lifecycle as the the pod that encloses it—when a pod is deleted, the volume is deleted as well.

A persistent volume has a lifecycle independent of any individual pod.

A persistent volume claim defines a specific amount of storage requested and specific access modes. Kubernetes finds a matching persistent volume and binds it with the persistent volume claim.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: redis-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 128M
  storageClassName: gp2
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: redis-deploy
spec:
  replicas: 1
  template:
    metadata:
      labels:
        color: purple
    spec:
      containers:
      - name: master
        image: redis
        ports:
        - containerPort: 6379
        volumeMounts:
        - name: redis-data-vol
          mountPath: /data
      volumes:
      - name: redis-data-vol
        persistentVolumeClaim:
          claimName: redis-pvc
apiVersion: v1
kind: Service
metadata:
  name: redis-svc
  labels:
    name: redis-svc
spec:
  ports:
    - port: 2222
      targetPort: 6379
  selector:
    color: purple

Attachment to PVCs must adhere to the PVC access modes:

ReadWriteOnce – one bound Pod with R/W access
ReadOnlyMany – many bound Pods with R only access
ReadWriteMany – many bound Pods with R/W access

Cloud providers implement access modes based on their underlying resource specifications which are not consistent across providers. For example, AWS only implements ReadWriteOnce meaning PVCs must only be bound one-to-one with Pods.

See the persistent volume provider access modes table here for more info.

StatefulSets

Similar to Deployments, StatefulSets manage Pods that are based on an identical container spec. However, although their specs are the same, the Pods in a StatefulSet are not interchangeable. Each Pod has a persistent identifier that it maintains across any rescheduling enabling stable network ids and storage (PVCs).

StatefulSets also operate according to the Controller pattern. You define your desired state in a StatefulSet object, and the StatefulSet controller makes any necessary updates to the get there from the current state.

apiVersion: v1
kind: Service
metadata:
  name: redis-svc
  labels:
    app: redis
spec:
  ports:
  - port: 2222
    targetPort: 6379
  clusterIP: None
  selector:
    color: purple
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: redis
spec:
  serviceName: "redis"
  replicas: 2
  selector:
    matchLabels:
      color: purple
  template:
    metadata:
      labels:
        color: purple
    spec:
      containers:
      - name: redis
        image: redis
        ports:
        - containerPort: 6379
        volumeMounts:
        - name: redis-pvc
          mountPath: /data
  volumeClaimTemplates:
  - metadata:
      name: redis-pvc
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 1Gi

Config Maps and Secrets

A Kubernetes secret allows users to pass sensitive information to containers. A secret can then be referenced when declaring a container definition and read from with containers.

A Kubernetes config map allows users to externalize application configuration parameters from a container image and define application configuration details, such as key/value pairs, directory content, or file content. Config map values can be consumed by applications through:

environment variables
local disks (mount points)
command line arguments

Config Maps: Environment Vars

apiVersion: v1
kind: ConfigMap
metadata:
  name: test-configmap
data:
  data-1: value-1
  data-2: value-2
apiVersion: v1
kind: Pod
metadata:
  name: config-env-test-pod
spec:
  containers:
    - name: test-container
      image: busybox
      command: [ "/bin/sh", "-c", "env" ]
      env:
        - name: KUBE_CONFIG_1
          valueFrom:
            configMapKeyRef:
              name: test-configmap
              key: data-1
        - name: KUBE_CONFIG_2
          valueFrom:
            configMapKeyRef:
              name: test-configmap
              key: data-2
  restartPolicy: Never

Config Maps: Mounted Volumes

apiVersion: v1
kind: ConfigMap
metadata:
  name: test-configmap
data:
  data-1: value-1
apiVersion: v1
kind: Pod
metadata:
  name: config-volume-test-pod
spec:
  containers:
    - name: test-container
      image: busybox
      command: [ "/bin/sh", "-c", "cat/etc/config/path/to/special-key" ]
      volumeMounts:
      - name: config-volume
        mountPath: /etc/config
  volumes:
    - name: config-volume
      configMap:
        name: test-configmap
        items:
        - key: data-1
          path: path/to/special-key
  restartPolicy: Never

Health Checking

Kubernetes allows for two types of health checking of Pods:

Readiness probes allow the kubelet to assess if a service is ready to start accepting traffic. A Pod is considered ready when all of its Containers are ready.
Liveness probes allow the kubelet to monitor and assess when to restart a Container.

There are three different Health Check evaluation criteria available for both the Readiness and Liveliness probes:

HTTP Health Checks - The Kubelet will call a web hook. If it returns between 200 and 399, it is considered success, failure otherwise.
Container Exec - The Kubelet will execute a command inside your container. If it exits with status 0 it will be considered a success.
TCP Socket - The Kubelet will attempt to open a socket to your container. If it can establish a connection, the container is considered healthy, if it can’t it is considered a failure.

apiVersion: v1
kind: Pod
metadata:
  name: redis-tcp-probe
spec:
  containers:
  - name: redis
    image: redis
    livenessProbe:
      tcpSocket:
        port: 6379
      initialDelaySeconds: 30
      timeoutSeconds: 1
    ports:
    - containerPort: 6379

Init Containers

In addition to having multiple containers, Pods can also have zero to multiple InitContainers. These are useful for bootstrapping data stores, prepping dirs, waiting on custom services, or really anything that you want to happen before your regular containers start.

InitContainers run serially and must all exit cleanly for your regular containers begin execution.

apiVersion: v1
kind: Pod
metadata:
  name: redis
spec:
  containers:
  - name: redis
    image: redis
    volumeMounts:
    - name: redis-data-vol
      mountPath: /data
  initContainers:
  - image: busybox
    name: seed-rdb-data-file
    volumeMounts:
    - name: redis-data-vol
      mountPath: /data
    command: ["/bin/sh", "-c", "echo UkVESVMwMDA4+glyZWRpcy12ZXIFNC4wLjH6CnJlZGlzLWJpdHPAQPoFY3RpbWXCZzvlWfoIdXNlZC1tZW3CcHoPAPoMYW9mLXByZWFtYmxlwAD6B3JlcGwtaWQoN2E1YjQzNmZmNTFkODYwMGQxOGEyNzYwYmRmY2EwOWRmZDM0YTZlY/oLcmVwbC1vZmZzZXTAAP4A+wEAAA5mYXZvcml0ZS1jb2xvcgdibHVycGxl/ydubEDBnEVJ | base64 -d > /data/dump.rdb"]
  volumes:
  - name: redis-data-vol
    emptyDir: {}

Resource Limits

Pods can specify the amount of memory and CPU it requires to function. Specifying physical resource requirements allows the Kubernetes Scheduler to make better decisions about where it will place Pods.

Memory limits are measured in bytes and can be declared as bytes, as fixed point integers (eg: 128M), and power-of-two equivalents (eg: 123Mi).

CPU limits are measured as units, generally as a CPU core and can be fractional; 1, .5, etc… A Pod with a CPU limit of .5 is allowed 2x the power of one limited to .25.

See Kubernetes Resource Requests and Limits

apiVersion: v1
kind: Pod
metadata:
  name: redis
spec:
  containers:
  - name: redis
    image: redis
resources:
  limits:
    cpu: 100m
    memory: 128Mi
  requests:
    cpu: 100m
    memory: 128Mi

Resource Scheduling

Out of the box, the Kubernetes Scheduler works to ensure that:

Pods are only placed on nodes with free physical resources
Pods of the same StatefulSet or ReplicaSet are spread across different nodes
resource utilization across Nodes remains as evenly balanced as possible

Kubernetes does allow customizations, however allowing for:

node affinity/anti-affinity
taints and tolerations
pod affinity/anti-affinity
custom schedulers

Pod Events

What happens when a Pod is created?

Create pod instruction, manifest transmitted to and verified by the Kubernetes API Server
The API Server writes data to etcd
The Scheduler listens for changes (new pod) via the API Server and binds the pod to a node. The API server receives the bind operation result and writes the data to etcd.
Kubelets listens for changes (bind pod) via the API Server and executes docker commands on the node. The kubelet transmits execution result and status information the API Server which writes data to etcd.

Additional Topics

Jobs (cluster level)
Cron jobs (namespaced)
RBAC
Ingress rules