Sunday, June 9, 2019

Creating your K8s cluster: possibilities

Kubernetes has been growing in popularity, as it offers a unified architecture to host containerized services that can be easily and seamlessly released, monitored, and scaled, and can run on-premise, on public and private clouds, as well as in hybrid setups. Given this great flexibility, K8s is not only leading the microservice market but also that of analytics, previously dominated by Hadoop-based solutions.

There exist multiple possibilities:
As discussed in a previous post, Kubernetes Operations (KOPS) or a specific SDK client can be used to create and manage K8s clusters.
In this post I want to achieve the following:
  1. show how a Kubernetes cluster can be easily set up locally on multiple nodes, which in this example are provided as Vagrant VMs; to complicate my life a bit, the master runs on Arch Linux and the workers on CentOS and Ubuntu;
  2. show how the data-mill project can be set up to manage the newly created cluster, in particular regarding the definition of application-level flavours to be used for Data Science purposes.


1. K8s Cluster installation using Kubeadm

1.1 Vagrant setup

To connect to each node, a key should be created with ssh-keygen -t rsa -b 4096 -f key -N "", automatically loaded by the provisioner, and added to ~/.ssh/authorized_keys on each VM.
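As a minimal sketch (the exact provisioning hook depends on the Vagrantfile, and /vagrant/key.pub assumes the project folder is the default synced folder), the key is generated on the host and the public part is appended to the vagrant user's authorized_keys from a shell provisioner:

    # on the host, before vagrant up
    ssh-keygen -t rsa -b 4096 -f key -N ""

    # inside a shell provisioner, on each VM (runs as root by default)
    mkdir -p /home/vagrant/.ssh
    cat /vagrant/key.pub >> /home/vagrant/.ssh/authorized_keys
    chmod 700 /home/vagrant/.ssh && chmod 600 /home/vagrant/.ssh/authorized_keys
    chown -R vagrant:vagrant /home/vagrant/.ssh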

This Vagrantfile is used to specify the resources needed to provision a master and 2 worker VMs hosting the k8s agents. Specifically, we define different operating systems for the nodes: Arch Linux for the master, and Ubuntu and CentOS for the two workers, respectively.

The VMs can be created by running:  vagrant up
Once created we can monitor their status with:  vagrant global-status
And similarly destroyed with:  vagrant destroy
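For completeness, a typical session looks as follows (the machine name master is an assumption; use whatever names your Vagrantfile defines):

    vagrant up                  # provision all VMs defined in the Vagrantfile
    vagrant global-status       # list the VMs and their state
    vagrant ssh master          # open a shell on a specific VM
    vagrant destroy -f          # tear everything down without prompting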

1.2 K8s setup

To simplify the configuration, we divided the setup into two scripts:
  • start.sh - exports the environment variables and calls the setup script

    #!/usr/bin/env bash

    export CLUSTER_USER="vagrant"
    export MASTER_HOST="192.168.50.10"
    export MASTER_PORT="6443"
    export WORKERS="192.168.50.11 192.168.50.12"
    export SSH_KEY_PATH="$HOME/Documents/k8s-setup/nodes/key"
    export CIDR_NET="10.244.0.0/16"
    export KUBE_VERSION="1.14.1"

    ./setup.sh
  • setup.sh - performs the installation operations on the master and the worker nodes. The start script uploads the private SSH key to the master node, along with the setup script itself; the setup script is also uploaded to each of the worker nodes. The main difference is that the init action is invoked on the setup script for the master node, while the add action is invoked on the same script for each of the worker nodes (a simplified sketch of this flow is given below). Specifically, the setup script consists of the following steps:
    • on the master node (init)
      • installation of docker, using overlay2 as file system
      • deactivation of memory swapping (otherwise the kubelet won't start)
      • installation of kubeadm, kubectl, cni (container network interface)
      • cluster init, using kubeadm init
      • export of the join token and upload of the join command to the worker nodes
      • setting KUBECONFIG to the just generated (by kubeadm) admin.conf file, so that kubectl can be used to interact with the cluster (from the master node)
      • adding flannel as cluster resource to manage container networking
      • adding kubernetes dashboard to manage the cluster
    • on the worker nodes (add)
      • installation of docker, using overlay2 as file system
      • deactivation of memory swapping (otherwise the kubelet won't start)
      • installation of kubeadm, kubectl, cni
      • addition of the worker to the cluster using kubeadm join
      • when using Vagrant, the kubelet config file (/etc/systemd/system/kubelet.service.d/10-kubeadm.conf) is modified to specify the node IP (e.g., Environment="KUBELET_EXTRA_ARGS=--node-ip=192.168.50.11"), as otherwise eth0 is used, while the nodes communicate over eth1
      • to use kubectl on the master we set KUBECONFIG=~/.kube/admin.conf; we can copy this file and use the same approach to interact with the cluster from any worker node (if necessary)
    In order to interact with the cluster, the KUBECONFIG variable is set for kubectl on whichever node we wish to connect from, i.e., our k8s client node, by copying the admin.conf from the master node (KUBECONFIG=~/.kube/admin.conf). This is already done by the setup.sh script, so we can debug our nodes with kubectl --kubeconfig admin.conf describe nodes, as well as kubectl --kubeconfig admin.conf get nodes -o json to view possible taints that prevent certain pods from starting. A smarter approach is however to append the config path to the existing value of KUBECONFIG and then switch the current cluster context, e.g. KUBECONFIG=$KUBECONFIG:$HOME/admin.conf.
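On the client node this looks as follows (a minimal sketch; the context name kubernetes-admin@kubernetes is the one kubeadm typically generates, check kubectl config get-contexts for the actual name):

    export KUBECONFIG=$KUBECONFIG:$HOME/admin.conf
    kubectl config get-contexts                              # list the merged contexts
    kubectl config use-context kubernetes-admin@kubernetes   # switch to the new cluster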
While the methodology has general value, the actual scripts are meant for demonstration purposes and should not be considered production-ready.
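To make the steps above concrete, here is a heavily simplified sketch of what a setup script of this kind boils down to (this is not the actual setup.sh: package installation is elided, the flannel manifest URL reflects the repository layout at the time of writing, and MASTER_HOST and CIDR_NET are the variables exported by start.sh, here assumed to be forwarded to the nodes):

    #!/usr/bin/env bash
    # simplified sketch, not the actual setup.sh
    ACTION=$1   # "init" on the master, "add" on a worker

    # common steps: docker with the overlay2 storage driver, kubeadm/kubelet/kubectl, CNI plugins
    sudo swapoff -a   # the kubelet refuses to start with swap enabled
    # ... distro-specific package installation goes here ...

    if [ "$ACTION" = "init" ]; then
        # bootstrap the control plane; the pod CIDR must match the one in the flannel manifest
        sudo kubeadm init --apiserver-advertise-address="$MASTER_HOST" --pod-network-cidr="$CIDR_NET"
        # make kubectl usable on the master node
        mkdir -p "$HOME/.kube" && sudo cp /etc/kubernetes/admin.conf "$HOME/.kube/admin.conf"
        sudo chown "$(id -u):$(id -g)" "$HOME/.kube/admin.conf"
        export KUBECONFIG="$HOME/.kube/admin.conf"
        # add flannel (and, in the same way, the kubernetes dashboard) as cluster resources
        kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
        # export the join command, to be uploaded to the workers
        sudo kubeadm token create --print-join-command > join.sh
    else
        # join the cluster using the command generated on the master
        sudo bash join.sh
        # on Vagrant, also pin the node IP in 10-kubeadm.conf so the kubelet does not advertise eth0
    fi

With the environment exported by start.sh, the whole procedure is then triggered from the host by running ./start.sh.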

2. Setting up the data-mill project

We introduced data-mill in a previous post, and showed how this project simplifies the deployment of so-called flavours, i.e. organized collections of components for data ingestion, processing, and serving.

2.1 Installation

The installation of data-mill is described here.


The default destination folder is set with:

export DATA_MILL_HOME=$HOME

An installation script is provided in the project folder to copy the files of the latest released version, as well as to make the project callable from anywhere via a link in the /usr/local/bin folder.

wget https://raw.githubusercontent.com/data-mill-cloud/data-mill/master/install.sh --directory-prefix=$DATA_MILL_HOME
cd $DATA_MILL_HOME
sudo chmod +x install.sh
./install.sh
rm install.sh


2.2 Flavour setup and installation of single components


Data-mill can handle 3 types of clusters:
  • local - using either minikube, microk8s, or multipass-microk8s to spawn a single-node local K8s cluster for development purposes;
  • remote - using KOPS to provision a K8s cluster on GKE or EKS;
  • hybrid - using a local KUBECONFIG file to manage an existing cluster, regardless of whether it is available locally or remotely. In this mode the cluster lifecycle is managed outside data-mill, for instance using the methodology just described in Sect. 1.
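For instance, to hand the cluster built in Sect. 1 over to data-mill, its kubeconfig can first be copied from the Vagrant master with the key generated earlier (a sketch using the paths and IP from the setup above; adjust them to your environment):

    scp -i $HOME/Documents/k8s-setup/nodes/key vagrant@192.168.50.10:.kube/admin.conf $HOME/admin.conf
    export KUBECONFIG=$KUBECONFIG:$HOME/admin.conf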
The default flavour contains a hybrid cluster definition. We can easily set up the project by calling start:

./run.sh -h -s -f flavours/default.yaml

This sets the kubeconfig for the project and performs helm init on the cluster. The default flavour lists all as the components to use; it is not meant as a specific flavour but as a reference of all available components. Installing all components at once is in fact a deprecated feature, given the heavy resource requirements, but feel free to do it. We can then add components to our newly created cluster:

./run.sh -h -i -f flavours/default.yaml -c grafana

And similarly remove them, with:

./run.sh -h -u -f flavours/default.yaml -c grafana

Or just use a specific flavour (e.g. datalake) instead of the default one (which would otherwise install every component):

./run.sh -h -i -f flavours/datalake_flavour.yaml

Easy, isn't it?
