Saturday, December 4, 2021

TLDR Intro to Skaffold

Introduction

Skaffold is a tool that helps developers with the build and deployment lifecycle of k8s applications. The core idea is to add a skaffold.yaml file defining the various pipeline stages, such as build, test, deploy and portForward.

Specifically, the main commands and stages are:

  • init generates a skaffold.yaml file after detecting Docker, Jib and Buildpacks builders;
  • file sync - copies changed files to already deployed containers, to avoid the need to rebuild, redeploy and restart the corresponding pod; sync can be: i) manual (with src and dest fields defined a priori), ii) infer (with dest inferred by the builder, e.g. from the Dockerfile's ADD/COPY commands) and iii) auto (with both src and dest configured automatically by skaffold for supported builders).
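    For instance, a minimal sketch of a manual sync configuration (the artifact name and the src/dest paths are illustrative):

    build:
      artifacts:
      - image: app
        sync:
          manual:
          - src: "src/**/*.js"
            dest: /app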
  • build wraps specific build tools to unify and ease local, in-cluster (e.g. using kaniko) and remote builds, as well as the management of target images on registries. As shown in the example below, dependencies between artifacts can also be specified, so that a base image is built first and then inherited by another artifact:

    apiVersion: skaffold/v2beta26
    kind: Config
    build:
      artifacts:
      - image: app
        context: app
        requires:
        - image: base
          alias: BASE
      - image: base
        context: base
    deploy:
      kubectl:
        manifests:
        - app/k8s-pod.yaml

    Specifically, the base image is built from the Dockerfile in the base folder (as specified by its context), while the app image is built from the Dockerfile in the app folder. This also allows the app image to copy artifacts from the base image, as in a Docker multi-stage build.
  • test to perform checks after building and before the deploy stage
  • tag to define how built images should be tagged, such as using a digest of the source files (inputDigest), a gitCommit, a datetime, or a template based on environment variables (envTemplate), for instance:

    build:
      tagPolicy:
        envTemplate:
          template: "{{.FOO}}"
      artifacts:
      - image: gcr.io/k8s-skaffold/example

    which tags the image using the value of the FOO environment variable;

  • deploy 
    • performs a render step to generate the final K8s manifests from plain manifests, Helm charts or Kustomize overlays
    • deploys the rendered manifests to the cluster
    • waits until the deployed resources stabilize, as specified by health checks
Besides those basic building blocks, there exist shortcuts to trigger end-to-end pipelines:
  • run - builds and deploys the workflow defined in the skaffold.yaml only once (as a CI/CD tool);
  • dev - initially behaves like run for an end-to-end build and deployment, but then skaffold keeps watching the source files, so that upon any change the artifacts are re-built, pushed, tested and re-deployed to the target cluster;
  • debug - behaves like skaffold dev, but also configures containers for debugging by exposing debugging ports that can be port-forwarded to the local machine and attached to from an IDE.
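Port forwarding can also be declared in the skaffold.yaml itself, so that the selected cluster ports are forwarded to the local machine. A minimal sketch (resource name and ports are illustrative):

portForward:
- resourceType: service
  resourceName: app
  port: 8080
  localPort: 9000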

See the full skaffold specification reference here.


---
Cheers!


References

  • https://skaffold.dev/
  • https://github.com/GoogleContainerTools/skaffold/tree/main/examples

Saturday, November 20, 2021

TLDR Intro to Kustomize

1. Motivation

Suppose you have to deploy off-the-shelf a complex component, be it defined as plain Yaml or a Helm chart, which requires some customization. A possibility may be to edit the yaml or fork the existing Helm chart into a new one. Cool, you have now become the maintainer! From now on you will have to integrate upstream changes into your derived version. Besides the additional work required to ensure compatibility, this brings in the risk of adding bugs and misconfigurations.

2. Kustomize

2.1 About

Kustomize comes to the rescue by allowing for the combination of source configurations and the definition and application of declarative changes as overrides or patches. For instance, a single base configuration may be defined for a deployment, while multiple flavours are derived from it depending on the target environment of choice (e.g. test, QA, prod).

2.2 Installation

Kustomize has been fully integrated with kubectl since version 1.14, so there are no additional requirements to take care of. Alternatively, kustomize is a Go project that can be installed with a usual go get (i.e. go get sigs.k8s.io/kustomize/kustomize/v3). See here for an overview.

If you need kubectl or have an older version, you may want to have a look here to update it.

Depending on your choice you will be using one of the following:

  1. kustomize build <path>
  2. kubectl kustomize <path> 
To render and apply in a single step, the latter can be replaced with kubectl apply -k <path>.

2.3 Basic Usage

The basic idea behind kustomize is to define a kustomization.yaml file along with the existing configuration files. The kustomization defines modifications and the list of resources to which they are applied.

For instance [1]:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
commonLabels:
  app: my-app
resources:
- deployment.yaml
- service.yaml

When running kustomize build or kubectl kustomize on the directory, a yaml configuration is rendered and printed to stdout.

Kustomize can also compose configurations residing in different directories by creating multiple bases. As in the example above, a base is defined by creating a kustomization.yaml file listing a number of resources. Once done, a base can be referenced by listing it under bases, for instance:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
commonLabels:
  app: my-app
bases:
  - ./mysql
  - ./my-app

2.4 Overlays

The power of kustomize comes from the possibility of defining overlays over base configurations [2]. Overlays define patches that specialize the base configurations, either as JSON patches (e.g. patchesJson6902) or as partial manifests merged over the base resources (e.g. patchesStrategicMerge). Here is a complete list of those features.

For instance:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- deployment.yaml

patchesJson6902:
- target:
    group: apps
    version: v1
    kind: Deployment
    name: my-nginx
  path: patch.yaml

with the patch.yaml being:

- op: replace
  path: /spec/replicas
  value: 3

where the overlay changes the number of replicas to 3.
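For comparison, the same change can be expressed as a strategic merge patch, i.e. a partial manifest merged over the matching base resource. A minimal sketch (the deployment name is illustrative and must match the base resource):

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- deployment.yaml

patchesStrategicMerge:
- replica-patch.yaml

with replica-patch.yaml being:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-nginx
spec:
  replicas: 3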

3. Integration in GitOps - ArgoCD

There are multiple possibilities to use Kustomize in ArgoCD: i) simply apply kustomize to the path monitored by the ArgoCD Application resource, as in this example, and ii) apply kustomize to the output rendered by Helm [3, 4]. The latter requires the following steps:

1. kustomized-helm is added to the plugins in the argocd-cm configmap

configManagementPlugins: |
  - name: kustomized-helm
    init:
      command: ["/bin/sh", "-c"]
      args: ["helm dependency build"]
    generate:
      command: [sh, -c]
      args: ["helm template --release-name release-name . > all.yaml && kustomize build"]


2. a kustomization.yaml is placed in the folder monitored by the ArgoCD Application, in order to refer to the all.yaml rendered as output of the helm template command

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - all.yaml
patchesJson6902:
...

3. the plugin is referenced directly by the ArgoCD Application under spec.source.plugin.name: kustomized-helm

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapplication
  namespace: argocd
spec:
  project: myproject
  source:
    path: myapplication
    repoURL: {{ .Values.spec.source.repoURL }}
    targetRevision: {{ .Values.spec.source.targetRevision }}
    plugin:
      name: kustomized-helm
  destination:
    namespace: myproject
    server: {{ .Values.spec.destination.server }}

A full example of the integration between ArgoCD and Kustomize is reported here.

You are done. If you want to read more, please have a look at the official documentation [5] and this very good overview [6].

Cheers.

Resources

  1. https://www.mirantis.com/blog/introduction-to-kustomize-part-1-creating-a-kubernetes-app-out-of-multiple-pieces/
  2. https://www.densify.com/kubernetes-tools/kustomize
  3. https://dev.to/camptocamp-ops/use-kustomize-to-post-render-helm-charts-in-argocd-2ml6
  4. https://github.com/argoproj/argocd-example-apps/tree/master/plugins/kustomized-helm
  5. https://kubernetes.io/docs/tasks/manage-kubernetes-objects/kustomization/
  6. https://skryvets.com/blog/2019/05/15/kubernetes-kustomize-json-patches-6902/

Wednesday, October 6, 2021

TLDR intro to Delta Lake

What is delta lake?

Delta Lake is a project originated by the creators of Apache Spark, specifically designed to provide a thin layer on top of existing data lakes. The data is saved as snappy-compressed parquet files. The main goal of Delta is to bring ACID guarantees to data lakes, so as to combine the horizontal scalability of OLAP analytical workloads with the transactional reliability of OLTP ones. Ultimately, the goal of Delta is to unify streaming and batch data processing. This is achieved mainly by adding a transaction log recording every change made to the data. This allows rolling back to previous versions in what is called time traveling, as well as providing an audit log of data modifications and enabling support for delete, update and merge operations.

The protocol has the following properties [1]:

  • serializable writes - multiple concurrent writes to a delta table are serializable; 
  • snapshot isolation on reads - readers can read a consistent table snapshot even in presence of multiple writers;
  • scalability to billions of partition files
  • self-describing - metadata are stored alongside data to eliminate the need for a separate metastore and simplify data management, as well as provide backward compatibility with existing infrastructure
  • incremental processing - tailing the transaction log is enough to determine data added in a certain time period, in order to efficiently convert the table/file to a stream;

Transaction log

When writing a Spark DataFrame to Delta (e.g., df.write.format("delta").save(path)) or when creating a Delta-formatted Hive table, a _delta_log folder is created alongside the data files.

The transaction log indicates which files are to be included in the currently active version. Without it, when multiple processes write to the same folder and one of them fails, there is no mechanism to clean up the files that were already written; readers with direct access to the data files would simply retrieve all of them, including the partial results.

For each transaction there exists a CRC file along with the JSON file. The CRC file contains information such as the number of files and their size, which helps Spark optimize its queries.

Upon modifications (insert, delete, update or merge) the transaction log is updated through atomic operations named commits. Each commit is written as a JSON file, starting with 000000.json, and subsequent commits generate additional JSON files named with an ascending ID. Each commit records the files being added under add.path and those being removed under remove.path, as in the sketch below.
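Since commits are plain JSON files, the log can be inspected directly, without even using the Delta reader. A minimal PySpark sketch (the table path is illustrative):

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("delta-log-inspection").getOrCreate()

# each line of a commit file holds a single action: add, remove, commitInfo, metaData or protocol
log = spark.read.json("/path/to/delta-table/_delta_log/*.json")

(log.where(col("add").isNotNull() | col("remove").isNotNull())
    .select(col("add.path").alias("added_file"), col("remove.path").alias("removed_file"))
    .show(truncate=False))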

An OPTIMIZE operation is available to compact the files in the current state. This has the effect of adding a new transaction, with the same data residing in a smaller number of larger files. The effect of OPTIMIZE is also visible by looking at the metrics numFilesAdded and numFilesRemoved.

Upon data removal, the operation is recorded in the transaction log. In this case, a new commit is added that rewrites the affected files without the deleted entries. Although the data is no longer visible in the latest version of the table, the old files are in fact retained and it is possible to roll back to a previous state. Since those files are not automatically removed from disk, a VACUUM operation exists.

Since a streaming producer would be writing many transactions, the small-file problem would quickly arise by solely relying on commit files. The solution is periodic checkpointing, specifically meant to save the entire table snapshot every nth commit. This also spares readers from having to replay the whole transaction log (i.e., many tiny, inefficient JSON files) to reconstruct the table state. In practice this is done incrementally: Spark applies each commit and caches the resulting table state, so that it can be directly reused by dependent Spark operations. Upon reaching the 10th commit, a checkpoint file is produced and saved in Parquet format.

When looking at the parquet checkpoint file, you will notice that all previous transactions are contained, along with a stats_parsed column containing information similar to that of the individual CRC files.

Optimistic concurrency control

Transactions from concurrent writers are guaranteed to complete without conflict. This happens implicitly when they work on different parts of the table (different partitions). When producers write to the same parts of the table simultaneously, the optimistic protocol applies. Serializability is achieved by implementing a policy of mutual exclusion:

if there is a conflict (another writer committed in the meantime), check whether the data that was read has changed; if so, read the latest version and attempt to commit again with a newer id; otherwise, just commit the version and go on.

For instance, suppose:

  • user 1 and user 2 both read 000000.json
  • user 1 and user 2 attempt to commit 000001.json at the same time
  • only one of the two commits can become 000001.json
  • the other user sees that a newer commit named 000001.json now exists, verifies that it does not conflict with what was read, and commits 000002.json instead
Therefore, the solution lets one of the users succeed and the other one retry with another commit.
When both users attempt to delete the same data, the only solution is instead to let one succeed and the other fail with an error, since the delete is not idempotent: after reading the newer version, the second user can no longer apply the delete operation.

This mechanism is implemented using multiversion concurrency control (MVCC), which provides transactional guarantees (i.e. serializability and snapshot isolation) without needing to physically lock the resource, consequently allowing for higher performance.

Table Utils

Table History

Table history, i.e. the list of operations along with user, timestamp and other metadata, can be retrieved with DESCRIBE HISTORY <table> or with DeltaTable.forPath(spark, "mypath").history(). Table history is retained by default for 30 days and can be configured using the config spark.databricks.delta.logRetentionDuration. Upon new commits, the transaction log is cleaned up of the commits older than the set retention period.

Vacuum

While the transaction log is automatically cleaned up at every new commit (for entries older than the set retention), the data files must be deleted explicitly. Dangling files, i.e. files no longer referenced because they were overwritten or deleted, can be removed by running the VACUUM operation. VACUUM is never called automatically; its default retention threshold is 7 days, i.e. only files older than 7 days will be removed. Generally, this interval should be longer than the longest-running transaction or the longest period that any input source can lag behind the most recent update to the table. The config spark.databricks.delta.deletedFileRetentionDuration controls the threshold between the time files were marked for deletion and the moment they can actually be deleted by VACUUM.
The config spark.databricks.delta.vacuum.parallelDelete.enabled can be set to true to vacuum delete the files in parallel (as based on the number of shuffle partitions).
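For instance, a minimal PySpark sketch (the table path is illustrative):

from delta.tables import DeltaTable

# assumes an existing SparkSession `spark` configured with the delta package
delta_table = DeltaTable.forPath(spark, "/path/to/delta-table")
# remove files no longer referenced by the table and older than the retention threshold (168 hours = 7 days)
delta_table.vacuum(168)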

Miscellaneous

Convert Parquet table to Delta table

Converting Parquet tables to Delta can be easily done with:
  • CONVERT TO DELTA parquet.`<path-to-table>`
  • DeltaTable.convertToDelta(spark, "parquet.`<path-to-table>`")
Conversion back to Parquet is easily done by running VACUUM with a retention of 0 hours to delete all dangling data files and then deleting the _delta_log directory.

Integration with non-Spark systems

To allow non-Spark systems, such as Presto, to integrate with Delta lake without accessing the transaction log, it is possible to generate a manifest file with:
  • GENERATE symlink_format_manifest FOR TABLE delta.`<path-to-delta-table>` 
  • DeltaTable.forPath("path-to-delta-table").generate("symlink_format_manifest").
A folder named _symlink_format_manifest, containing the manifest files, is created at the delta table path.

Selecting specific table version

Versions of Delta tables can be accessed by timestamp or a version number. These can be listed with:
  • DESCRIBE HISTORY <table-name>

A specific version can be queried, such as:
  • SELECT * FROM <table> VERSION AS OF <version>
  • SELECT * FROM <table> TIMESTAMP AS OF <datetime>
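The same can be done from the DataFrame API; a minimal PySpark sketch (the path and the version/timestamp values are illustrative):

# time travel by version number or by timestamp
df_v1 = spark.read.format("delta").option("versionAsOf", 1).load("/path/to/delta-table")
df_ts = spark.read.format("delta").option("timestampAsOf", "2021-10-01").load("/path/to/delta-table")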

Delta as a Stream

A Delta table can be used as a source in a stream processing pipeline, such as Spark Streaming. For instance:
  • spark.readStream.format("delta").load("delta-table-path")
  • spark.readStream.format("delta").table("delta-table-name")
Similarly, delta can be used as a sink:
  • stream.writeStream.format("delta").outputMode("append").option("checkpointLocation", <path>).start("delta-table-path")
  • stream.writeStream.format("delta").outputMode("append").option("checkpointLocation", <path>).table("delta-table-name")

Monday, October 4, 2021

Speeding up large scale data quality validation: the Gilberto project

We already discussed in previous posts the importance of data quality validation as part of the data operations toolbox, along with metadata management solutions such as Mastro. However, we never really discussed the integration of those two aspects. Data catalogues and feature stores are becoming standard design patterns: on the one hand, catalogues enable data annotation, discovery and lineage tracking; on the other hand, feature stores assure basic quality constraints on produced features, in order to enable reuse and stewardship of data assets by continuously monitoring them for deviations from expected requirements. While catalogues allow for building a dependency graph of data assets, they do not go into the statistical details of the monitored assets. Similarly, feature stores are meant to abstract repetitive feature extraction processes that can occur in different environments, be it a data science notebook or a scheduled pipeline, often written with different frameworks and languages; feature stores also allow for versioning those computations in order to achieve repeatability. A similar goal is pursued for ML models by means of model registries.

As it appears, catalogues and feature stores target lineage and computational aspects rather than data-centric ones, that is, without going into data quality matters such as completeness, accuracy, timeliness, integrity, validity (e.g. with respect to a type schema) and distribution. These are targeted by so-called metrics stores, which store the data quality metrics calculated by specific runs of quality validation pipelines.

Example data quality frameworks are Great Expectations, Tensorflow Data Validation (TFX DV) and Deequ.

All of them require, to some extent, some customization for the data source of interest. Great Expectations is a Python-centric tool meant to provide a meta-language to define constraints that can be run by multiple backends. Similarly, TFX DV originated from the Tensorflow project and can extract meaningful statistics from data represented in common formats, such as TFRecord. Deequ is a library written in Scala for Spark and offers, in my opinion, the most general-purpose tool of the three, especially when targeting data generally sitting in data warehouses, on either HDFS or S3, as is common nowadays and where Spark really excels. Deequ benefits from the great integration of Spark with modern data processing technologies to offer mainly the following:

  • profiling - extraction of statistics on input Data(frames);
  • constraint suggestion - extraction of meaningful constraints based on profiled Data(frames)
  • validation - enforcement of Checks to detect deviations
  • anomaly detection - to detect deviations over time from common characteristics
Deequ is an amazing tool, but it still requires some customization to load those data sources and define checks. Moreover, whereas computed metrics can be saved to so-called metrics repositories, these are provided as either an InMemoryMetricsRepository or a FileSystemMetricsRepository. The former is basically a concurrent hash map, while the latter is a connector writing a metrics.json file to HDFS or S3. Clearly, this has various drawbacks: most of all, writing all metrics to a single file blob does not scale and does not allow for querying from Presto and other engines alike.

To overcome these issues we:
  • introduce the Gilberto project, meant to curtail the boilerplate coding required with Deequ; the developer can define checks in Scala code files, which can be deployed on a distributed FS alongside the artifact or mounted on a local volume, for instance on a k8s cluster;
    An example Check file


    Gilberto is able to use reflection to dynamically load and enforce those checks and return standard error codes. This makes the tool easy to integrate in workflow management systems, such as Argo Workflows. Gilberto is meant to be run on both YARN and k8s alike. Check the sections YARN_DEPLOY and K8S_DEPLOY for an example. For K8s, the master branch contains a full-fledged deployment script. You can use that in combination with a version available in one of the other branches, such as for Spark 3.1.2 or Spark 2.4.7, which you can either build locally or pull from Dockerhub.


  • introduce metrics stores in the Mastro project, meant to store various kinds of metricsets, including those generated by Deequ/Gilberto and sent via a PUT over a REST interface.
    type definition for MetricSet

    As such, Metric Sets can be easily integrated with Mastro's Data Assets (stored in the catalogue) and Feature Sets (stored in the feature store), which closes the gap we discussed at the beginning of this post. Also, since the format is the same used by Deequ's existing Metrics Repositories, this enables anomaly detection use cases, as metrics can be retrieved by tag and time, also using a REST interface.
    type definition for DeequMetric
    Getting started with mastro is super easy. Besides a docker-compose file there is also a Helm Chart to help you get started on K8s. The only prerequisite is a DB; we use bitnami/mongo for most of our tests.

Should you be interested, please have a look at the projects. I am looking forward to hearing your feedback!

Andrea



Thursday, September 23, 2021

Testing Terraform Scripts without spending a fortune


Introduction

How many times did you test your infrastructure by actually setting it up entirely? How much did it cost? In this post I showcase the localstack framework, which aims to be a mock of AWS. Yes, you heard it right: the idea is to test your AWS-based services against the framework, rather than spending money to actually set the infrastructure up and test your application on it, with the many moving parts involved possibly breaking on the way.

Localstack

Localstack is a mock service for AWS, meaning that you can test your AWS-based services without actually needing to spin up anything. Specifically, this is the list of AWS services available in the free Localstack version:

  • ACM
  • API Gateway
  • CloudFormation
  • CloudWatch
  • CloudWatch Logs
  • DynamoDB
  • DynamoDB Streams
  • EC2
  • Elasticsearch Service
  • EventBridge (CloudWatch Events)
  • Firehose
  • IAM
  • Kinesis
  • KMS
  • Lambda
  • Redshift
  • Route53
  • S3
  • SecretsManager
  • SES
  • SNS
  • SQS
  • SSM
  • StepFunctions
  • STS

Localstack can be installed as a pip package or directly run as a Docker container. The project also provides a docker-compose.yml file.

All services are exposed at the so-called edge port, 4566. To test localstack you can set up the AWS CLI, downloading it directly from here:

curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install


We can now create an AWS profile with dummy credentials, since they are not validated by localstack anyway:

aws configure --profile localstack

We can now test the creation of a S3 bucket:

aws s3 --profile localstack --endpoint-url http://localhost:4566 mb s3://mybucket

The bucket is empty.

aws s3 --profile localstack --endpoint-url http://localhost:4566 ls s3://mybucket

Let's create an empty file and upload it to the bucket: 

touch test_file.txt

aws s3 --profile localstack --endpoint-url http://localhost:4566 cp test_file.txt s3://mybucket

upload: ./test_file.txt to s3://mybucket/test_file.txt

Cool, now we got something: 

aws s3 --profile localstack --endpoint-url http://localhost:4566 ls s3://mybucket
2020-11-03 19:49:32          0 test_file.txt

Terraform

Terraform has become a de-facto standard Infrastructure-as-Code (IaC) tool for building and managing infrastructure in an unambiguous and repeatable way.

Getting Started with Terraform

For a tutorial with Terraform, please have a look here.

Terraform consists of:
  • Resources to be managed; orthogonally to those, meta-arguments can be defined, such as depends_on, count to create multiple instances of the same resource type, and lifecycle to define Terraform-related behavior, for instance upon update or deletion;
  • Modules - grouping a set of resources into a reusable named component that can be published and maintained as a whole; this enables code reuse and a more maintainable architecture; a natural design pattern is to separate code into a repository for modules and another one for the live infrastructure;
  • Providers - managers of specific resource types; providers are indexed on the Terraform Registry and can come from Hashicorp, verified organizations or community members; providers that are no longer maintained are listed as "Archived". For instance, the AWS Provider is maintained directly by Hashicorp. The documentation is available here and the Github repo here.
  • Input Variables - used to abstract and parametrize modules and configurations;
  • Outputs - specifying values to export from a module; Terraform prints the specified output values to stdout when applying the configuration; you can alternatively query those values explicitly using the terraform output command, which optionally takes an output name (e.g. terraform output region) to act as a resource query;
  • Data Sources - defining a reference to information defined outside of Terraform;

As a declarative language, Terraform has no general control-flow constructs such as for loops, although it provides a basic conditional (ternary) expression which, combined with the count meta-argument, can be used to define multiple variants of the modeled infrastructure, deploying some resources rather than others based on data or variable values.
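For instance, a minimal sketch of a conditionally created resource (the variable and bucket names are illustrative):

variable "create_bucket" {
  type    = bool
  default = false
}

# the bucket is only created when the variable is set to true
resource "aws_s3_bucket" "optional" {
  count  = var.create_bucket ? 1 : 0
  bucket = "my-optional-bucket"
}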

These are the required steps to run terraform applications:
  1. Sign up for an AWS account, create a new non-root user and assign some policies
  2. Create a ~/.aws/credentials file with a new profile for the account created at 1 or export AWS_SECRET_ACCESS_KEY and AWS_ACCESS_KEY_ID
  3. Install Terraform using a package manager or by downloading the binary from here or here
  4. terraform init to initialize the Terraform project
  5. terraform plan to see changes to the infrastructure with respect to the applied tf file
  6. terraform apply to apply the changes to the infrastructure (or terraform apply -auto-approve to skip confirmation)
  7. Once done, terraform destroy to terminate all resources managed by the current configuration;

In this github repo, we provide a few tutorials with Terraform. In particular, in the last one we create and deploy a lambda function on Localstack.
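To point Terraform at Localstack rather than at the real AWS endpoints, the AWS provider can be configured with dummy credentials and custom endpoints. A minimal sketch (region, credentials and the set of overridden endpoints are illustrative and depend on the provider version and the services you actually use):

provider "aws" {
  region     = "us-east-1"
  access_key = "test"
  secret_key = "test"

  # skip the validations that would require a real AWS account
  skip_credentials_validation = true
  skip_metadata_api_check     = true
  skip_requesting_account_id  = true
  s3_force_path_style         = true

  # redirect API calls to the Localstack edge port
  endpoints {
    s3     = "http://localhost:4566"
    lambda = "http://localhost:4566"
    iam    = "http://localhost:4566"
  }
}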

Enjoy!

Friday, July 16, 2021

Kubernetes Patterns

In this post I want to write down a few notes on a freely available book by Red Hat, named "Kubernetes Patterns - Reusable Elements for Designing Cloud-Native Applications". Specifically, the book classifies common k8s patterns into:

  • Foundational
    1. predictable demands - containers shall declare a resource profile and stick to the specified requirements; this is the case of resources.requests and resources.limits (e.g. cpu and memory) - see the sketch after this list.
    2. declarative deployment - the creation of pods is driven by a declarative Deployment resource instead of an imperative pod definition;
    3. health probe - any container shall expose health-specific APIs to allow for its observability; this is the case of containers[x].livenessProbe and containers[x].readinessProbe, for instance using an existing REST interface as httpGet.{path, port} or as exec.command;
    4. managed lifecycle - defines how applications shall react to lifecycle events, such as SIGTERM (graceful shutdown) and SIGKILL (forced shutdown), as well as postStart hooks (i.e., containers[x].lifecycle.postStart.exec.command, to run a specific command after the container is created, but asynchronously with respect to the main container's process) and preStop hooks (i.e., containers[x].lifecycle.preStop.exec.command), run right before a container is terminated with a SIGTERM notification; instead of a command, an httpGet.{port, path} can be used;
    5. automated placement - defines a method to manage container distribution in multi-node clusters; this is the case of node selectors, such as spec.nodeSelector.{disktype, ...}, as well as node affinity/antiaffinity (i.e. spec.affinity.nodeAffinity) and inter-pod affinity/antiaffinity (spec.affinity.podAffinity).
  • Behavioral
    1. batch Job - a batch workload to be run once to completion; this is the case of the Job resource.
    2. periodic Job - a batch workload to be scheduled for periodic run; this is the case of the CronJob resource.
    3. daemon service - defines how pods can be run as daemons on specific nodes; In unix, the name daemon denotes a long-running, background process generally providing infrastructural functionalities, such as disk or network resources. This matches the k8s daemonset resource, meant to generate pods that run on all or selected cluster nodes in order to provide additional capabilities to the rest of the cluster pods;
    4. singleton service - ensures that only one instance of a service is running at a time; with out-of-application locking (i.e. without application-specific locking mechanisms) this is generally achieved through a single-replica StatefulSet or ReplicaSet, whereas with in-application locking (e.g. when using zookeeper as coordination middleware) a leader is elected out of any simple deployment - or generally a group of non-strictly related pods. Viceversa, a PodDisruptionBudget can be set to limit the max number of instances that are simultaneously down (to prevent unavailability).
    5. stateful service - defines approaches to run stateful workloads; this is the case of StatefulSets, which, as opposed to ReplicaSets, provide: a stable identity for connected network resources, given that any pod generated by the set has a name with an ordinal index; startup ordering, i.e. the sequence in which instances are scaled up/down; as well as stable storage, given that instead of a persistentVolumeClaim a volumeClaimTemplate can be used to generate PVCs and uniquely bind them to the pods that constitute the StatefulSet;
    6. service discovery - defines mechanisms for discovery of services and mapped pod instances; this is commonly achieved using the app selector (i.e. spec.selector.app);
    7. self awareness - defines methods for introspection and metadata injection; for instance, it is possible to set environment variables with information that will only be available after the pod is started, such as env[x].valueFrom.fieldRef.fieldPath.{status.podIP, spec.nodeName, metadata.uid, ...} or env[x].valueFrom.resourceFieldRef.container;
  • Structural
    1. init container - provides a way of running initialization tasks along with application containers; those containers can be listed in the spec.initContainers: [] section;
    2. sidecar container - runs an additional container to extend the application container with additional functionalities; this can be done by running multiple containers, which may interact via a shared mount point or a socket;
    3. adapter - a sidecar meant to expose a different interface from the one initially provided by the application container, without modifying it; for instance, we may expose additional monitoring metrics for an existing container image by accessing the monitoring data of the application container on a shared disk/network location and exposing it on a REST interface.
    4. ambassador - a proxy to be used by the application container to interact with other services; For instance, an additional service, such as a distributed cache or key-value store may be accessed using a specific client, which is started as sidecar and exposes a local interface; the interface can be accessed locally (at localhost, given that they run on the same pod) by the application container which has thus access to the distributed service;
  • Configuration
    1. envVar configuration - defines how variables are used to store configurations, for instance those can be set i) from a literal, ii) using env.valueFrom.configMapKeyRef{name, key} to retrieve it from an existing config map or iii) env.valueFrom.secretKeyRef{name, key} to retrieve it from an existing secret; alternatively an entire config map or secret can be loaded using envFrom{configMapRef.name, prefix};
    2. configuration resource - uses config maps and secrets to store config information; those can then be referenced as specified above or mounted as files;
    3. immutable configuration - uses purpose-built container images to store immutable configs; the data is then mounted from the image and accessed by the application container;
    4. configuration template - uses configuration templates to generate various versions only differing slightly;
  • Advanced
    1. controller - defines methods to watch for changes on resource definitions - such as labels (indexed for queries, but with restrictions on the allowed characters), annotations (unindexed, though with relaxed constraints on values) and configmaps - and react accordingly; this is typically an observe-analyze-act cycle, also called a reconciliation loop, which can be run as an event-handling routine along with those available by default, for instance those managing ReplicaSets, Deployments and so on;
    2. operator - defines controllers watching changes on a custom resource; this is thus more flexible than a controller, since custom resources can be defined and the k8s API extended fully (as opposed to a configmap, which does not define a new k8s resource type);
    3. elastic scale - defines dynamic scaling, both horizontal and vertical; there exists a HorizontalPodAutoscaler that can be configured with a minimum and maximum number of replicas and a target on the metrics (e.g. cpu, memory, custom ones) the Pods are expected to stay within before the scaling process occurs - respectively by ramping up additional pods or terminating superfluous ones (desiredReplicas = currentReplicas * (currentMetricValue/desiredMetricValue)). A VerticalPodAutoscaler is often preferred to the horizontal one, given that adding and removing pods in stateful applications can be a disruptive process; vertical scaling allows for tuning the requests and limits of containers based on their actual usage; this can affect both the creation of new Pods and the update of existing ones, which may be evicted and rescheduled with different requests/limits values.
    4. image builder - defines approaches for image build within the cluster; this is opposed to building images outside the cluster using CI pipelines. For instance, Openshift provides a BuildConfig resource that can specify both docker build and source-to-image build processes and directly push to the internal registry.
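To make the foundational patterns more concrete, here is a minimal pod sketch combining predictable demands (resource requests/limits) with liveness and readiness probes; the image name, ports, paths and thresholds are illustrative:

apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
  - name: app
    image: registry.example.com/my-app:1.0
    resources:
      requests:
        cpu: 100m
        memory: 128Mi
      limits:
        cpu: 200m
        memory: 256Mi
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 30
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080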

That's all folks!

Bibliography