A TLDR (Too Long; Didn't Read) Introduction to Kubernetes
Site Reliability Engineering (SRE) is Google's approach to software development and operations (DevOps). Software engineers with a mixed skillset covering Unix systems and networking are hired to perform tasks that have in the past typically been performed manually by operations.
Engineers tend to automate their operational tasks and spend about 50% of their time on software development. The mixed skillset and the higher degree of automation also allow for implicit documentation across the team and a quicker release of new software artifacts into production. Google's SRE principles are very similar to those of DevOps: team members homogeneously share responsibility over "the availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning of their service(s)".
As such, monitoring, i.e. logging, event processing, and automated alerts (along with dashboards), is necessary to allow a prompt response and to guarantee that service-level agreements or indicators (e.g., latency, error rate, throughput, availability) are respected. Moreover, a postmortem is to be written for each recorded incident, in order to trigger an investigation that can determine the root cause and potentially yield a better understanding of what did not work in the monitoring capabilities.
Demand forecasting and capacity planning are important to verify (mainly by performing load testing) that sizing of resources has been done properly.
In SRE the cluster management software has a crucial role, since we do not act on individual nodes but simply submit jobs to a master, which is in charge of finding suitable resources and monitoring job execution. The Kubernetes cluster manager described in this post is derived from Google's Borg, on which Google's site reliability engineering methodology is based.
Kubernetes (K8s) is an open source orchestrator for containerized applications (i.e. based on containers, see my previous blog post on Docker). The reasons for containers are mainly runtime immutability and reproducibility, which provide projects with higher release speed and stability, and allow for better decoupling between load balancers, application APIs, and actual service implementations, generally by using declarative languages to configure set points and service-level agreements for the cluster.
Kubernetes Architecture
This is how the cluster is organized:
- Nodes - distinguished into a master, generally running the API server and managing/scheduling the cluster resources, and a set of workers running the actual application containers;
- Domain name system (DNS) service to index all services available in the cluster and allow for service discovery;
- Kubernetes UI showing resources and running services;
- Proxy, running on every cluster node to route traffic to the individual services running on that node.
This is how a Kubernetes application is organized:
- Namespace, used to organize cluster resources and distinguish them across applications
- Pods, with a pod managing containers and volumes on an individual computing environment (i.e., a single cluster node); as such, a pod is the smallest deployable unit in K8s. A pod groups resources that must necessarily run on the same node. Within a pod, resources share the same IP address and hostname. Each pod has a YAML or JSON manifest, used to configure the pod in a declarative manner. Based on the manifest, K8s seeks a machine where the pod can fit and instantiates it there. The pod manifest lists the containers the pod has to run and the minimum (requests) and maximum (limits) resources for each, as well as the exposed ports and mounted volumes. The pod can also expose an HTTP interface for readiness and liveness health checks. For instance:
apiVersion: v1
kind: Pod
metadata:
  name: podname
spec:
  volumes:
    - name: "volumeName"
      hostPath:
        path: "/path/on/host"
  containers:
    - image: containerImageName
      name: containerName
      resources:
        requests:
          memory: "256Mi"
        limits:
          memory: "512Mi"
      volumeMounts:
        - mountPath: "/mount/point"
          name: "volumeName"
      ports:
        - containerPort: 8080
          name: portname
          protocol: TCP
      readinessProbe:
        httpGet:
          path: /pathrds
          port: 8080
        periodSeconds: 5
        initialDelaySeconds: 0
        timeoutSeconds: 1
        successThreshold: 1
        failureThreshold: 3
      livenessProbe:
        httpGet:
          path: /pathlvl
          port: 8080
        periodSeconds: 5
        initialDelaySeconds: 0
        timeoutSeconds: 1
        successThreshold: 1
        failureThreshold: 3
with the readiness and liveness health checks reachable at the defined port and path (which need to be forwarded). Specifically, periodSeconds defines the interval between checks, while failureThreshold and successThreshold define the number of consecutive trials before considering the check failed or succeeded, respectively.
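Probes are not limited to HTTP GETs: Kubernetes also supports TCP-socket and command-execution checks. A minimal sketch (the port and the command are illustrative placeholders):

```yaml
livenessProbe:
  tcpSocket:          # succeeds if a TCP connection to the port can be opened
    port: 8080
  periodSeconds: 5
readinessProbe:
  exec:               # succeeds if the command exits with status 0
    command: ["cat", "/tmp/ready"]
  periodSeconds: 5
```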
- ReplicaSets, which specify how many replicas of a pod should be instantiated to meet higher demand and to provide the service with self-healing capabilities. ReplicaSets are used to define scalable services; in a microservice architecture, a ReplicaSet can be used to control a specific microservice. The manifest specifies in a declarative way the desired state in terms of the number of replicas of the same pod, while a control loop monitors the current state, in order to potentially terminate and restart unresponsive pods and keep the system in the desired state. ReplicaSets manage sets of pods but are not directly coupled to them, i.e., they start and control them using the K8s API but are not tied to the specific instances, since they are meant to control stateless services; this also allows for adopting existing pods or adding new ones at run time (elastic scaling). An example specification is explained here and reported below:
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: replicaset-unique-name
spec:
  replicas: 2
  selector:
    matchLabels:
      label1: "value1"
  template:
    metadata:
      labels:
        label1: "value1"
    spec:
      containers:
        - name: servicename
          image: "dockerimage:version"
As visible, the ReplicaSet simply wraps the pod template, adding metadata and the number of pod replicas to be kept. The labels (together with the selector) are used to discover and distinguish the pods and running containers belonging to a specific ReplicaSet. Based on those labels, the K8s API will return a different set of pods, consequently leading to different control actions (e.g. ramping up additional pods if there are not enough). A ReplicaSet can also be autoscaled, for instance based on certain computing resources, such as CPU or memory (see the kubectl table below).
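Autoscaling can also be configured declaratively with a HorizontalPodAutoscaler object instead of the kubectl command; a minimal sketch, where the target name and the thresholds are illustrative:

```yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: rs-autoscaler
spec:
  scaleTargetRef:       # the object whose replica count is controlled
    apiVersion: apps/v1
    kind: ReplicaSet
    name: rsname        # placeholder: name of the ReplicaSet to scale
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 80
```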
- DaemonSets are used, contrary to ReplicaSets, to define a service that should run on each individual node (thus the name daemon rather than replica) and that does not require coordination across different nodes (as a ReplicaSet does). An example specification is reported below:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: unique-ds-name
  namespace: nsname
  labels:
    label: "value"
spec:
  selector:
    matchLabels:
      label1: "value1"
  template:
    metadata:
      labels:
        label1: "value1"
    spec:
      nodeSelector:
        labelName: "value"
      containers:
        ...
As previously, the DaemonSet wraps a pod specification. The DaemonSet instantiates a pod on each cluster node, unless a nodeSelector is specified. In the example, labelName is set to value, so that only nodes having that label set to that value can be used to host a pod (see the table below for how to set labels on nodes). A rolling update can be set to automatically update all pods in the set.
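The rolling-update behavior is requested under the DaemonSet spec; a minimal sketch (the maxUnavailable value is illustrative):

```yaml
spec:
  updateStrategy:
    type: RollingUpdate      # replace pods node by node instead of all at once
    rollingUpdate:
      maxUnavailable: 1      # at most one node's pod may be down during the update
```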
- Jobs are processes that are expected to terminate upon fulfilling their purpose. In case of failure of a controlled pod, it will be re-created based on the template specification. A job can be defined using the restart=OnFailure option and started using the run command:
kubectl run jobname \
--image=dockerimage \
--restart=OnFailure \
-- --flag1 \
--var1 val1
Alternatively, a YAML file can be specified, as usual:
apiVersion: batch/v1
kind: Job
metadata:
  name: jname
  labels:
    ...
spec:
  parallelism: 2
  template:
    metadata:
      ...
    spec:
      containers:
        ...
- Deployments can be used to manage multiple versions and to specify releases, which can be rolled out without any downtime. A deployment can be specified using the usual YAML format:
apiVersion: apps/v1
kind: Deployment
metadata:
  ...
spec:
  replicas: 2
  selector:
    ...
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1
  template:
    ...
with the strategy part defining how the rollout should take place: either RollingUpdate, which rolls out a different version without any downtime, or Recreate, which terminates all pods and recreates them using the newer version. For RollingUpdate, maxUnavailable defines the maximum number (or percentage) of pods that can be unavailable during a rolling update; setting it to 0 means that no pod may become unavailable, so additional pods must be started before older ones are terminated. maxSurge defines how many additional pods can be created in the process; setting it to the number of pods in the deployment (or to 100%) means ramping up all pods of the newer version before terminating any of the older ones.
Installing Kubernetes
The "kubeadm init" command can be used to first start a master node, which each worker node then joins using "kubeadm join".
Running Kubernetes
Kubernetes can be run directly on one of the public cloud providers, such as Google (i.e., the Google container service) or Azure (i.e., the Azure container service), or installed on Amazon EC2. Kubernetes can be tested quickly using minikube, a single-node Kubernetes cluster, which can be started simply by running "minikube start".
Deploying a Service
A service can be deployed using the run command:
kubectl run deploymentname \
--image=dockerimage \
--replicas=n \
--labels="lab1=val1,lab2=val2"
where a Docker image is specified to be started, along with a number of replicas and a set of labels to be used as metadata to annotate the instance.
The labels being used in the system can be retrieved with:
kubectl get deployments --show-labels
Similarly, we can retrieve the objects carrying a certain label with:
kubectl get deployments -l "label=value"
Labels can also be added using the label command:
kubectl label deployments deploymentname "label=value"
kubectl label nodes nodename "label=value"
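To make the deployment reachable through the cluster DNS mentioned earlier, it can be exposed with a Service; a minimal sketch using the labels above (the name and ports are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: deploymentname-svc
spec:
  selector:            # route traffic to pods carrying these labels
    lab1: "val1"
    lab2: "val2"
  ports:
    - port: 80         # port exposed by the service
      targetPort: 8080 # port the container listens on
```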
Kubernetes Commands
A Kubernetes cluster is controlled from the kubectl CLI. Here are some commands for a quick reference:
Kubectl Command | Description |
---|---|
kubectl get nodes | Lists all the cluster worker nodes |
kubectl describe nodes node-id | Returns information on a node (e.g. resources, number of pods) |
kubectl --namespace=nsname | The flag should be used to select a specific namespace |
kubectl get pods | Lists all pods running on the cluster |
kubectl apply -f pod-manifest.yaml | Instantiates a pod based on a manifest yaml file |
kubectl logs podname | Retrieves the logs for the podname pod |
kubectl describe pods podname | Returns info on the running podname pod |
kubectl describe rs rsname | Returns info on the running replicaset rsname |
kubectl exec podname command | Executes the command in the podname pod |
kubectl exec -it podname command | Executes the command and opens an interactive session in the podname pod (e.g. bash) |
kubectl port-forward podname lport:rport | Forwards lport:rport on podname |
kubectl delete pods/podname | Gracefully terminates and then deletes podname |
kubectl delete -f pod-manifest.yaml | Gracefully terminates and then deletes the pod or replicaset defined in pod-manifest.yaml |
kubectl edit deployment/deploymentname | Fetches the deployment, opens its manifest in an editor |
kubectl scale rs rsname --replicas=n | Forces the number of replicas to scale to n |
kubectl scale deployment deploymentname --replicas=n | Forces the number of replicas to scale to n |
kubectl autoscale rs rsname --min=1 --max=10 --cpu-percent=80 | Autoscales the number of pods between 1 and 10 based on an 80% CPU threshold |
kubectl get hpa | Returns the defined autoscalers |
kubectl delete rs rsname | Deletes the replicaset rsname |
kubectl describe daemonset dsname | Returns info for the dsname daemonset |
kubectl label nodes nodename "label=value" | Adds the label to the node metadata |
kubectl rollout status deployments dname | Returns the status of a deployment rollout |
kubectl rollout pause deployments dname | Pauses a deployment rollout |
kubectl rollout resume deployments dname | Resumes a deployment rollout |
kubectl rollout history deployment dname | Retrieves the deployment rollout history |
Have fun!
Andrea.
Bibliography
- K. Hightower et al. Kubernetes: Up and Running. Dive into the Future of Infrastructure. O'Reilly, 2017.
- B. Beyer et al. Site Reliability Engineering: How Google Runs Production Systems. O'Reilly, 2016. Available online at https://landing.google.com/sre/book.