
News

Posted over 7 years ago by Kirill
In previous tutorials we discussed ways to enable and manage persistent storage in Kubernetes. In particular, we saw how to use PersistentVolumes to bind various external storage options like Amazon EBS to the pods, introducing statefulness to your Kubernetes workloads.

However, exposing persistent storage to pods is only one component of statefulness. One limitation of this approach is that we use PersistentVolumes and associated PersistentVolumeClaims within a standard deployment API designed primarily to manage stateless apps. As Kubernetes users and administrators, we want to manage pods in stateful apps in a controlled and predictable way, combining persistent storage with a sticky identity for the pods. Fortunately, Kubernetes developers have addressed these concerns with the StatefulSet workload API object, designed to create and manage applications that are stateful by design.

In this article, we discuss the architecture and key concepts of the StatefulSet API and walk you through the process of creating a working StatefulSet. In upcoming tutorials, we'll move on to the discussion of scaling, rolling updates and rollouts, and other methods for managing applications created with StatefulSets. Let's start!

What Are StatefulSets?

A StatefulSet is an alternative to deployments when it comes to managing a set of pods based on an identical container spec. Unlike a deployment, however, a StatefulSet maintains a sticky identity for its pods. As we remember, each time a pod is restarted in a deployment, it gets a new identity, including a new UID and IP address. Under these circumstances, deployments are more suitable for stateless applications where the pod's identity does not matter and where some service can load-balance client requests between pods that perform identical tasks.

StatefulSets take a radically different approach. Each pod in a StatefulSet gets a persistent identifier (UID) maintained across any rescheduling and restart. Moreover, since StatefulSets are currently required to use headless services to manage network identities for pods, each pod gets a sticky network sub-domain and SRV records that do not change across the pod's rescheduling.

The question arises: why do we actually need all these features for pods? The answer is quite simple: maintaining state and network identity across pod restarts and rescheduling is one of the major requirements for stateful apps, and it is not currently supported by deployments and other stateless controllers in Kubernetes. Therefore, StatefulSets are extremely useful if your application needs the following features:

- Stable and unique network identifiers
- Stable, persistent storage
- Ordered deployment and scaling
- Ordered deletion and termination
- Ordered, automated rolling updates

Note: StatefulSets are not available in Kubernetes prior to 1.5 and were a beta feature until the 1.9 release. You should consider using StatefulSets only with the latest releases of the Kubernetes platform.
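A quick, optional way to confirm which Kubernetes version your cluster runs (and thus whether StatefulSets are a stable feature there) is to query the client and server versions:

kubectl version --short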
Creating a StatefulSet

Unlike deployments, which only require a deployment API object to be defined, StatefulSets come with a number of prerequisites. Before using them, you must:

- provision the storage using a PersistentVolume provisioner based on the requested StorageClass, or use storage pre-provisioned by the Kubernetes administrator;
- define a headless service. Such services are not assigned a cluster IP and do not support load balancing, which reduces coupling to the Kubernetes system and gives users the freedom to do service discovery their own way;
- finally, create a StatefulSet API object.

In what follows, we walk you through this process, discussing key StatefulSet concepts and methods along the way.

Tutorial

In this tutorial, we create a StatefulSet for the Apache HTTP server using a pre-provisioned PersistentVolume based on a pre-defined StorageClass and give network identities to the pods in the StatefulSet using a headless service. To complete this tutorial, we need:

- a running Kubernetes cluster. We use a local single-node cluster with Kubernetes 1.9 deployed with Minikube;
- the kubectl command line tool installed and configured to communicate with the cluster. See how to install kubectl here.

Step #1 Create a Headless Service

As you remember, a service defines a set of pods that perform the same task and some policy to access them. The headless service defined below is used for the DNS lookups between Apache HTTP pods and clients within the Kubernetes cluster.

apiVersion: v1
kind: Service
metadata:
  name: httpd-service
  labels:
    app: httpd
spec:
  ports:
  - port: 80
    name: web
  clusterIP: None
  selector:
    app: httpd

This service spec:

- creates a service named "httpd-service" and assigns pods labeled "httpd" to it. All future pods with that label will be associated with this headless service;
- specifies that the service is headless by setting the clusterIP field to None. For more details about headless services, see the official documentation;
- creates a named port "web" that maps to port 80 used by the service.

Let's save this spec in the httpd-service.yaml file and create the service using the following command:

kubectl create -f httpd-service.yaml
service "httpd-service" created

Now, check whether this headless service was successfully created:

kubectl describe service httpd-service

The response should be:

Name:              httpd-service
Namespace:         default
Labels:            app=httpd
Annotations:       <none>
Selector:          app=httpd
Type:              ClusterIP
IP:                None
Port:              web  80/TCP
TargetPort:        80/TCP
Endpoints:         <none>
Session Affinity:  None
Events:            <none>

That's it! Now we have a headless service ready to assign network identities to our future pods.
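Note that the describe output shows Type: ClusterIP with IP: None, which is exactly what makes the service headless. As an optional one-line check, the following command should print None:

kubectl get service httpd-service -o jsonpath='{.spec.clusterIP}'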
set a "retain" reclaim policy for the volumes provisioned by this StorageClass.See more about reclaimPolicy parameter in our recent article. Let's save this spec in the fast-sc.yaml and create the StorageClass using the following command: kubectl create -f fast-sc.yaml storageclass.storage.k8s.io "fast" created Then, let's check whether our new StorageClass was created: kubectl get sc which should return something like this: NAME PROVISIONER AGE fast k8s.io/minikube-hostpath 1m standard (default) k8s.io/minikube-hostpath 56m So far, so good! Now we have a StorageClass that can dynamically provision hostPathvolumes for the pods in our StatefulSet. Let's move on! Step # 3 Creating a StatefulSet Now, as we have both a headless service and a custom StorageClass defined and created, we can do the same with our StatefulSet. apiVersion: apps/v1 kind: StatefulSet metadata: name: apache-http spec: selector: matchLabels: app: httpd serviceName: "httpd-service" replicas: 3 template: metadata: labels: app: httpd spec: terminationGracePeriodSeconds: 10 containers: - name: httpd image: httpd:latest ports: - containerPort: 80 name: web volumeMounts: - name: www mountPath: /usr/local/apache2/htdocs volumeClaimTemplates: - metadata: name: www spec: accessModes: [ "ReadWriteOnce" ] storageClassName: "fast" resources: requests: storage: 2Gi Let's analyze key parameters of this spec: spec.selector.matchLabels -- as in the deployment object spec, this field defines a set of pods to be managed by the StatefulSet. In our case, the StatefulSet will select pods with the "httpd" label. This field's value should match the pod's label defined in the spec.template.metadata.labels. spec.serviceName -- the name of the headless service for this StatefulSet. As you see, we have specified the name of the headless service created in the first step of this tutorial. spec.replicas -- a number of pod replicas run by the StatefulSet. Default value is 1. spec.template.spec.terminationGracePeriodSeconds -- a lapse of time before a given pod is terminated (graceful termination). Force termination of pods can be achieved by setting the value to 0 , but  this is strongly discouraged. For more information, see this article. spec.template.spec.containers - container images with their corresponding settings. See our article about Kubernetes Pods to find out more. volumeClaimTemplates.storageClassName -- a StorageClass that manages dynamic volume provisioning for the pods running in this StatefulSet. Here, we are using our "fast" StorageClass created in the second step of this tutorial. volumeClaimTemplates.resources.requests.storage -- the request for storage resources from our "fast" StorageClass. This request is similar to how PersistentVolumeClaims (PVC) claim resources from their respective PersistentVolumes (PVs). Now, let's open two terminal windows to see how this StatefulSet works. In the first terminal window, type kubectl get pods -w -l app=httpd to watch the creation of the StatefulSet's pods. Then, save the spec above in the httpd-ss.yaml , and create a StatefulSet running the following command in the second terminal: kubectl create -f httpd-ss.yaml statefulset.apps "apache-http" created Then, in the first terminal, you'll notice that the StatefulSet's pods are created sequentially in order from {0..N-1}. For example, the pod apache-http-2 will not be launched until the pod apache-http-1 is running and ready. This approach is known as the ordered pod creation. 
Then, in the first terminal, you'll notice that the StatefulSet's pods are created sequentially, in order from {0..N-1}. For example, the pod apache-http-2 will not be launched until the pod apache-http-1 is running and ready. This approach is known as ordered pod creation.

kubectl get pods -w -l app=httpd
NAME            READY     STATUS              RESTARTS   AGE
apache-http-0   0/1       ContainerCreating   0          1s
apache-http-0   1/1       Running             0          5s
apache-http-1   0/1       Pending             0          0s
apache-http-1   0/1       Pending             0          0s
apache-http-1   0/1       ContainerCreating   0          0s
apache-http-1   1/1       Running             0          5s
apache-http-2   0/1       Pending             0          0s
apache-http-2   0/1       Pending             0          0s
apache-http-2   0/1       ContainerCreating   0          0s
apache-http-2   1/1       Running             0          4s

Once all pods are created, the same command will return the following response:

NAME            READY     STATUS    RESTARTS   AGE
apache-http-0   1/1       Running   0          8m
apache-http-1   1/1       Running   0          6m
apache-http-2   1/1       Running   0          4m

As you might have noticed, all pods in our StatefulSet have a sticky, unique identity based on a unique ordinal index appended to the StatefulSet name, so that a pod's name takes the form <statefulset name>-<ordinal>. This approach is different from how pods are created in deployments, for example. As you remember, pod names in a deployment are created by appending a random hash value, generated for each pod, to the deployment name:

NAME                                READY     STATUS    RESTARTS   AGE       LABELS
httpd-deployment-2955525241-27cj2   1/1       Running   0          15m       app=httpd,pod-template-hash=2955525241
httpd-deployment-2955525241-kfjj2   1/1       Running   0          15m       app=httpd,pod-template-hash=2955525241
httpd-deployment-2955525241-z6s5z   1/1       Running   0          15m       app=httpd,pod-template-hash=2955525241

After a pod in a deployment is terminated, the rescheduled pod gets a new UID. In contrast, a StatefulSet maintains the pod's identity across terminations and restarts, so that users always know which pods they are interacting with.
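To see the sticky identity in action, you can optionally delete one of the pods and watch the StatefulSet recreate it under exactly the same name. A minimal sketch of that experiment, assuming the pod names from this tutorial:

kubectl delete pod apache-http-1
kubectl get pods -w -l app=httpd

After a few seconds, a pod named apache-http-1 (rather than one with a random suffix) should reappear in the list.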
Step #4 Checking the Pods' Network in the StatefulSet

Each pod in a StatefulSet also has a stable network ID that persists across the pod's restarts. The hostname of each pod is derived from the name of the StatefulSet and the ordinal of the pod. In our example, the pods' hostnames will be apache-http-0, apache-http-1, and so on, depending on the number of replicas in the StatefulSet.

In its turn, the domain of the pods in our StatefulSet is controlled by the headless service "httpd-service" defined above. The domain managed by the service takes the form $(service name).$(namespace).svc.cluster.local, where "cluster.local" is the cluster domain. Pods in the StatefulSet get a matching DNS subdomain within this domain that takes the form $(podname).$(governing service domain), where the governing service domain is defined by the serviceName field of the StatefulSet.

To verify what domains and sub-domains were created for the pods in our StatefulSet, we can use a kubectl run command that executes a container (busybox) providing the nslookup command.

kubectl run -i --tty --image busybox dns-test --restart=Never --rm /bin/sh

nslookup apache-http-0.httpd-service
Server:    10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local

Name:      apache-http-0.httpd-service
Address 1: 172.17.0.4 apache-http-0.httpd-service.default.svc.cluster.local

nslookup apache-http-1.httpd-service
Server:    10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local

Name:      apache-http-1.httpd-service
Address 1: 172.17.0.5 apache-http-1.httpd-service.default.svc.cluster.local

nslookup apache-http-2.httpd-service
Server:    10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local

Name:      apache-http-2.httpd-service
Address 1: 172.17.0.6 apache-http-2.httpd-service.default.svc.cluster.local

As you see, the headless service has created a domain named httpd-service.default.svc.cluster.local and three sub-domains for the pods, following the DNS conventions described above. Each pod was assigned a unique IP address, sub-domain, and hostname. The CNAME of our headless service points to SRV records (one for each running pod), which in their turn point to the A records that contain the pods' IP addresses. That's how our pods can be discovered by clients.

Now, if you delete the pods using kubectl delete pod -l app=httpd, you'll notice that once the pods are recreated they have the same ordinals, hostnames, SRV records, and A records. However, the IP addresses associated with the pods might have changed. That's why it's important to configure other applications to access pods in a StatefulSet the right way. In particular, if you need to find the active members of a StatefulSet, you should query the CNAME of the headless service (e.g., httpd-service.default.svc.cluster.local). The SRV records associated with this CNAME will contain only the pods belonging to the StatefulSet. Alternatively, you can reach the pods using their SRV records directly (e.g., apache-http-0.httpd-service.default.svc.cluster.local) because these records are stable (they do not change across pod restarts).

Step #5 Checking Persistent Storage in the StatefulSet

Kubernetes will create one PersistentVolume for each volumeClaimTemplate specified in the StatefulSet definition. In our case, each pod will receive a single PV with the StorageClass "fast" and 2 Gi of provisioned storage. Please note that PersistentVolumes associated with the pods' PersistentVolumeClaims are not deleted when the pods or the StatefulSet are deleted; volume deletion must be done manually. Let's check what volumes were attached to the pods in our StatefulSet:

kubectl get pvc -l app=httpd
NAME                STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
www-apache-http-0   Bound     pvc-a340c764-6a66-11e8-a50f-0800270c281a   2Gi        RWO            fast           3h
www-apache-http-1   Bound     pvc-b50cfb0b-6a66-11e8-a50f-0800270c281a   2Gi        RWO            fast           3h
www-apache-http-2   Bound     pvc-b8b451f6-6a66-11e8-a50f-0800270c281a   2Gi        RWO            fast           3h

Since we have specified the StorageClass for our PVCs, the PersistentVolumes associated with this class (hostPath) will be dynamically provisioned and automatically bound to our claims. Also, the volumeMounts field in the StatefulSet spec ensures that the /usr/local/apache2/htdocs directory of the Apache HTTP server is backed by a PersistentVolume. The volume will persist any Apache HTTP server data across the pod's restarts and terminations, ensuring that our application is stateful.
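Before deleting anything, you can run a quick experiment to confirm that the storage really is persistent. The sketch below (assuming the pod names from this tutorial) writes a file into the volume of apache-http-0, deletes the pod, and reads the file back once the pod is recreated:

kubectl exec apache-http-0 -- sh -c 'echo "state survives" > /usr/local/apache2/htdocs/test.html'
kubectl delete pod apache-http-0
# wait until apache-http-0 is Running again, then:
kubectl exec apache-http-0 -- cat /usr/local/apache2/htdocs/test.html

If the last command prints "state survives", the PersistentVolume was re-attached to the recreated pod.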
Step #6 Deleting the StatefulSet

A StatefulSet supports two types of deletion: non-cascading and cascading. In a non-cascading delete (e.g., kubectl delete statefulset apache-http --cascade=false), the StatefulSet's pods are not deleted when the StatefulSet is deleted. In a cascading delete, both the pods and the StatefulSet are deleted. To clean up after the tutorial is completed, we'll need to do a cascading delete like this:

kubectl delete statefulset apache-http
statefulset "apache-http" deleted

This command will terminate the pods in your StatefulSet in reverse order, {N-1..0}. Before a given pod is terminated, all of its successors must be completely shut down. Note that this operation deletes the StatefulSet and its pods but not the headless service associated with your StatefulSet.

Delete our "fast" StorageClass:

kubectl delete storageclass fast
storageclass "fast" deleted

Delete our httpd-service service manually:

kubectl delete service httpd-service
service "httpd-service" deleted

Finally, remember that any PVCs and PVs associated with your StatefulSet should be deleted manually because Kubernetes prioritizes data safety over the automatic purge of all StatefulSet resources:

kubectl delete pvc -l app=httpd

Also, you'll need to delete the spec files we created if you don't need them anymore.

Conclusion

That's it! We have demonstrated how to use StatefulSets to give your pods persistent storage and a stable UID and network identity. Leveraging ordered pod creation and the sticky network identity managed by headless services, we can ensure that each pod of our StatefulSet maintains its state and identity across application restarts and rescheduling. These features make StatefulSets the first choice for stateful applications. In this article, however, we primarily focused on key concepts and the basic steps to create a StatefulSet from scratch. In the continuation of this article, we will discuss a StatefulSet's management options, including scaling, rolling updates, and rollouts, which will give you a deeper understanding of managing stateful apps in Kubernetes.
Posted over 7 years ago by Kirill
As you might remember from the previous tutorials, Kubernetes supports a wide variety of volume plugins for both stateless and stateful apps. Non-persistent volumes such as emptyDir or configMap are firmly linked to the pod's lifecycle: they are detached and deleted after the pod is terminated. However, with stateful applications like databases, we want volumes that persist data beyond the pod's lifecycle. Kubernetes solves this problem by introducing the PersistentVolume and PersistentVolumeClaim resources, which enable native and external persistent storage in your Kubernetes clusters.

In this tutorial, we'll explain how these two API resources can be used together to link various storage architectures (both native and CSP-based) to applications running in your cluster. To consolidate the theory, we are also going to walk you through a tutorial on using hostPath as a PersistentVolume in your stateful deployment. Let's start!

Why Do We Need Persistent Volumes?

The rationale for using a PersistentVolume resource in Kubernetes is quite simple. On the one hand, we have different storage infrastructures such as Amazon EBS (Elastic Block Storage), GCE Persistent Disk, or GlusterFS, each having its specific storage type (e.g., block storage, NFS, object storage, data center storage), architecture, and API. If we were to attach these diverse storage types manually, we would have to develop custom plugins to interact with the external drive's API: mounting the disk, requesting capacity, managing the disk's life cycle, and so on. We would also need to configure a cloud storage environment, all of which would result in unnecessary overhead.

Fortunately, the Kubernetes platform simplifies storage management for you. Its PersistentVolume subsystem is designed to solve the above-described problem by providing APIs that abstract away the details of the underlying storage infrastructure (e.g., AWS EBS, Azure Disk, etc.), allowing users and administrators to focus on the storage capacity and storage types their applications will consume rather than the subtle details of each storage provider's API. This sounds similar to the pod's resource model, doesn't it? As we remember from the previous tutorial, containers in a pod request resources in raw amounts of CPU and RAM, so users do not need to worry about the server flavors and memory types used by CSPs under the hood. PVs do the same for your storage, providing the right amount of resources for your pods regardless of what storage provider you opt to use.

Linking a persistent volume to your application involves several steps: provisioning a PV, requesting storage, and using the PV in your pod or deployment. Let's discuss these steps.

Provisioning a PersistentVolume (PV)

Kubernetes users need to define a PersistentVolume resource object that specifies the storage type and storage capacity, volume access modes, mount options, and other relevant storage details (see the discussion below). Once the PV is created, we have a working API abstraction that tells Kubernetes how to interact with the underlying storage provider using its Kubernetes volume plugin (see our article about Kubernetes Volumes to learn more).

Kubernetes supports static and dynamic provisioning of PVs. With static provisioning, PVs are created by the cluster administrator and draw on actual storage available in the cluster. This means that in order to use static provisioning, one needs to have the storage (e.g., Amazon EBS) capacity provisioned beforehand (see the tutorial below).
On the other hand, dynamic provisioning of volumes can be triggered when the volume type claimed by the user does not match any PVs available in the cluster. For dynamic provisioning to happen, the cluster administrator needs to enable the DefaultStorageClass admission plugin, which defines default storage classes for applications. (Note: this is a big topic in its own right, so it will be discussed in the next tutorials.)

Requesting Storage

To make PVs available to pods in the Kubernetes cluster, you should explicitly claim them using a PersistentVolumeClaim (PVC) resource. A PVC is bound to a PV that matches the storage type, capacity, and other requirements specified in the claim. Binding the PVC to the PV is handled by a control loop that watches for new PVCs and finds a matching PV. If no matching volume exists, the claim simply remains unbound. For example, if a PVC requests 200Gi and the cluster only has 100Gi PVs available, the claim won't be bound to any PV until a PV with 200Gi is added to the cluster. Likewise, a PVC is bound only to a PV whose capacity and volume mode both match the claim, not to a PV that matches on one attribute alone.
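To make the matching rule concrete, here is a hypothetical sketch of a 100Gi PV alongside a PVC requesting 200Gi (the names and the NFS server address are invented for illustration). The claim would stay unbound until a large-enough PV appears in the cluster:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-small                # hypothetical PV offering only 100Gi
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteOnce
  nfs:                          # NFS is used here just for illustration
    path: /exports/data
    server: 10.0.0.5
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: big-claim               # asks for more than any existing PV offers
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 200Gi            # no 200Gi PV exists, so the claim stays unbound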
Using PVs and PVCs in Pods

Pods can use PVs by specifying a PVC that matches their resource and volume definitions. This works as follows: when the PVC is specified, the cluster finds the claim in the pod's namespace, uses it to get the PV backing the claim, and mounts the corresponding volume into the container(s) in the pod. Note that PVCs should be referenced as volumes in a pod's spec. For example, below is a pod spec that uses a persistentVolumeClaim named "test-claim" referring to some PV:

kind: Pod
apiVersion: v1
metadata:
  name: test-pod
spec:
  containers:
  - name: nginx
    image: nginx
    volumeMounts:
    - mountPath: "/var/www/html"
      name: mypd
  volumes:
  - name: mypd
    persistentVolumeClaim:
      claimName: test-claim

Persistent Volume Types

We discussed the available volume types in the previous tutorial. However, not all of them are persistent. Below is a list of the persistent storage volume plugins that can be used by PVs and PVCs, with each plugin's storage type and a short description:

- gcePersistentDisk (block storage) -- a Google Compute Engine (GCE) Persistent Disk that provides SSD and HDD storage attached to nodes and pods in a K8s cluster.
- awsElasticBlockStore (block storage) -- an Amazon EBS volume is a persistent block storage volume offering consistent and low-latency performance.
- azureFile (network file shares) -- Microsoft Azure file volumes are fully managed file shares in Microsoft Azure accessible via the industry-standard Server Message Block (SMB) protocol.
- azureDisk (block storage) -- a Microsoft Azure data disk provides block storage with SSD and HDD options.
- fc (data center storage and Storage Area Networks) -- Fibre Channel is a high-speed networking technology for the lossless delivery of raw block data, primarily used in Storage Area Networks (SAN) and commercial data centers.
- FlexVolume (volume plugin framework) -- FlexVolume enables users to develop Kubernetes volume plugins for vendor-provided storage.
- flocker (container data storage and management) -- Flocker is an open-source container data volume manager for Dockerized applications that supports container portability across diverse storage types and cloud environments.
- nfs (network file system) -- NFS is a distributed file system protocol that allows users to access files over a computer network.
- iscsi (networked block storage) -- iSCSI (Internet Small Computer Systems Interface) is an IP-based storage networking protocol for connecting data storage facilities. It is used to facilitate data transfer over intranets and to manage storage over long distances by enabling location-independent data storage.
- rbd (Ceph block storage) -- Ceph RADOS Block Device (RBD) is a building block of Ceph Block Storage that leverages RADOS capabilities such as snapshotting, consistency, and replication.
- cephfs (object storage and interfaces for block and file storage) -- Ceph is a storage platform that implements object storage on a distributed computer cluster.
- cinder (block storage) -- Cinder is a block storage service for OpenStack designed to provide storage resources to end users that can be used by the OpenStack Compute Project (Nova).
- glusterfs (networked file system) -- Gluster is a distributed networked file system that aggregates storage from multiple servers into a single storage namespace.
- vsphereVolume (VMDK) -- a virtual machine disk (VMDK) provided by vSphere (VMware).
- quobyte (data center file system) -- the Quobyte volume plugin mounts the Quobyte data center file system.
- hostPath (local cluster file system) -- a hostPath volume mounts a directory from the host node's filesystem into a pod.
- portworxVolume (block storage) -- a portworxVolume is Portworx's elastic block storage layer that runs hyperconverged with Kubernetes. Portworx's storage system is designed to aggregate capacity across multiple servers, similarly to Gluster.
- scaleIO (shared block networked storage) -- ScaleIO is a software-defined storage product from Dell EMC that creates a server-based Storage Area Network (SAN) from local server storage, converting direct-attached storage into shared block storage.
- storageos (block storage) -- StorageOS aggregates storage across a cluster of servers and exposes it as high-throughput, low-latency block storage.

Defining a PersistentVolume API Resource

Now, let's discuss how PVs actually work. First, PVs are defined as Kubernetes API objects with a spec and a list of parameters. Below is an example of a typical PersistentVolume definition in Kubernetes.

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-nfs
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Recycle
  storageClassName: slow
  mountOptions:
    - hard
    - nfsvers=4.2
  nfs:
    path: /tmp
    server: 172.15.0.6

As you see, here we defined a PV for the NFS volume type. The key parameters defined in this spec are the following:

- spec.capacity -- the storage capacity of the PV. In this example, our PV has a capacity of 10Gi (gibibytes). The capacity property uses the same units as the Kubernetes resource model: it lets users represent storage as unadorned integers or fixed-point integers with one of the SI suffixes (E, P, T, G, M, K, m) or their binary equivalents (Ei, Pi, Ti, Gi, Mi, Ki). Currently, Kubernetes users can only request storage size; future attributes may include throughput, IOPS, etc.
- spec.volumeMode (available since Kubernetes v1.9) -- the volume mode property supports raw block devices and filesystems. Block storage mode offers raw, unformatted block storage that avoids filesystem overhead and, hence, ensures lower latency and higher throughput for mission-critical applications such as databases and object stores. Valid values for this field are "Filesystem" (the default) and "Block".
- spec.accessModes -- defines how the volume can be accessed.
  (Note: valid values vary across persistent storage providers.) In general, the supported field values are:
  - ReadWriteOnce -- the volume can be mounted as a read/write volume by a single node only.
  - ReadOnlyMany -- many nodes can mount the volume as read-only.
  - ReadWriteMany -- many nodes can mount the volume as read-write.
  Note: a volume can only be mounted with one access mode at a time, even if it supports many.
- spec.storageClassName -- the storage class of the volume, defined by a StorageClass resource. A PV of a given class can only be bound to PVCs requesting that class. A PV with no storageClassName defined can only be bound to PVCs that request no particular class.
- spec.persistentVolumeReclaimPolicy -- the reclaim policy for the volume. At the present moment, Kubernetes supports the following reclaim policies:
  - Retain: if this policy is enabled, the PV will continue to exist even after the PVC is deleted. However, it won't be available to another claim until the previous claimant's data on the volume is cleaned up and the PV is deleted manually. It's also worth noting that Retain really shines when users want to reuse the data -- for example, to use the data without a PV (e.g., switching to a traditional database model) or to use it on a different, new PV (migrating to another cluster, testing, etc.).
  - Recycle (deprecated): this reclaim policy performs a basic scrub operation (rm -rf /thevolume/*) on the volume and makes it available again for a new claim.
  - Delete: this reclaim policy deletes the PersistentVolume object from the Kubernetes API and the associated storage capacity in the external infrastructure (e.g., AWS EBS, Google Persistent Disk, etc.). AWS EBS, GCE PD, Azure Disk, and Cinder volumes support this reclaim policy.
- spec.mountOptions -- a K8s administrator can use this field to specify additional mount options supported by the storage provider. In the spec above, we mount an NFS share and specify that NFS version 4.2 should be used. Note: not all providers support mount options. For more information, see the official documentation.
- spec.nfs -- the list of NFS-specific options. Here, we say that the NFS volume should be mounted at the /tmp path of a server with the IP 172.15.0.6.

Defining a PVC

As we've already said, a PersistentVolumeClaim must be defined to claim resources of a given PV. Like PVs, PVCs are defined as Kubernetes API resource objects. Let's see an example:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: volume-claim
spec:
  accessModes:
    - ReadWriteMany
  volumeMode: Block
  resources:
    requests:
      storage: 10Gi
  storageClassName: slow
  selector:
    matchLabels:
      release: "stable"

This PVC:

- targets volumes that have the ReadWriteMany access mode (spec.accessModes);
- requests storage only from volumes that have the "Block" volume mode enabled (spec.volumeMode);
- claims 10Gi of storage from any matching PV (spec.resources.requests.storage);
- filters volumes that match the "slow" storage class (spec.storageClassName). Note: if a default StorageClass is set by the administrator, a PVC with no storageClassName can be bound only to PVs of that default class;
- specifies a label selector to further filter the set of volumes. Only volumes with the label release: "stable" can be bound to this claim.

This PVC matches the StorageClass of the PV defined above. However, it does not match the volumeMode and accessModes of that PV. Therefore, our PVC cannot be used to claim resources from the pv-nfs PV.
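For contrast, here is a hypothetical claim (the name nfs-claim is invented for illustration) that would bind to the pv-nfs volume defined above: it requests the PV's access mode and (default) volume mode, the "slow" storage class, and no more capacity than the PV offers:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: nfs-claim              # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce            # matches the PV's access mode
  volumeMode: Filesystem       # matches the PV's volume mode
  storageClassName: slow       # matches the PV's storage class
  resources:
    requests:
      storage: 10Gi            # does not exceed the PV's 10Gi capacity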
Using Persistent Volumes in a Pod

Once the PV and PVC are created, using persistent storage in a pod becomes straightforward:

kind: Pod
apiVersion: v1
metadata:
  name: persistent-pod
spec:
  containers:
  - name: httpd
    image: httpd
    volumeMounts:
    - mountPath: "/usr/local/apache2/htdocs"
      name: test-pv
  volumes:
  - name: test-pv
    persistentVolumeClaim:
      claimName: volume-claim

In this pod spec, we:

- pull the Apache HTTP Server image from the Docker Hub repository (spec.containers.image);
- define a volume "test-pv" and use the PVC "volume-claim" to claim some PV that matches this claim;
- mount the volume at the path /usr/local/apache2/htdocs in the httpd container.

That's it! Hopefully, you now understand the theory behind PVs and PVCs and the key options and parameters available to these API resources. Let's consolidate this knowledge with a tutorial.

Tutorial: Using hostPath Persistent Volumes in Kubernetes

In this tutorial, we'll create a PersistentVolume using the hostPath volume plugin and claim it for use in a deployment running Apache HTTP servers. hostPath volumes use a file or directory on the node and are suitable for development and testing purposes. Note: hostPath volumes have certain limitations to watch out for, and it's not recommended to use them in production. Also, in order for hostPath to work, we will need to run a single-node cluster. See the official documentation for more info.

To complete this example, we used the following prerequisites:

- a Kubernetes cluster deployed with Minikube. The Kubernetes version used was 1.10.0;
- the kubectl command line tool installed and configured to communicate with the cluster. See how to install kubectl here.

Step #1 Create a Directory on Your Node

First, let's create a directory on your node that will be used by the hostPath volume. This directory will be the webroot of the Apache HTTP server. You'll need to open a shell to the node in the cluster; since you are using Minikube, open a shell by running minikube ssh. In your shell, create a new directory. Use any directory that does not need root permissions (e.g., the user's home folder if you are on Linux):

mkdir /home/<user>/data

Then create an index.html file in this directory containing a custom greeting from the server (note: use the directory you created):

echo 'Hello from the hostPath PersistentVolume!' > /home/<user>/data/index.html

Step #2 Create a Persistent Volume

The next thing we need to do is to create a hostPath PersistentVolume that will be using this directory.

kind: PersistentVolume
apiVersion: v1
metadata:
  name: pv-local
  labels:
    type: local
spec:
  storageClassName: local
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/home/<user>/data"

This spec:

- defines a PV named "pv-local" with a 10Gi capacity (spec.capacity.storage);
- sets the PV's access mode to ReadWriteOnce, which allows the volume to be mounted as read-write by a single node (spec.accessModes);
- assigns the storageClassName "local" to the PersistentVolume;
- configures the hostPath volume plugin to mount the local directory at /home/<user>/data.

Save this spec in a file (e.g., hostpath-pv.yaml) and create the PV by running the following command:

kubectl create -f hostpath-pv.yaml
persistentvolume "pv-local" created

Let's check whether our PersistentVolume was created:

kubectl get pv pv-local
NAME       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM     STORAGECLASS   REASON    AGE
pv-local   10Gi       RWO            Retain           Available             local                    2m

This response indicates that the volume is already available but does not yet have any claim bound to it, so let's create one.
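Before creating the claim, you can optionally inspect the volume in more detail (its reclaim policy, the source path on the node, and any events) with kubectl describe:

kubectl describe pv pv-local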
Step #3 Create a PersistentVolumeClaim (PVC) for Your PersistentVolume

The next thing we need to do is to claim our PV using a PersistentVolumeClaim. Using this claim, we can request resources from the volume and make them available to our future pods.

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: hostpath-pvc
spec:
  storageClassName: local
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  selector:
    matchLabels:
      type: local

Our PVC does the following:

- filters the volumes labeled type: local in order to bind our specific hostPath volume and other hostPath volumes that might be created later (spec.selector.matchLabels);
- targets hostPath volumes that have the ReadWriteOnce access mode (spec.accessModes);
- requests a volume of at least 5Gi (spec.resources.requests.storage).

First, save this resource definition in hostpath-pvc.yaml, and then create it just as we did with the PV:

kubectl create -f hostpath-pvc.yaml
persistentvolumeclaim "hostpath-pvc" created

Let's check the claim by running the following command:

kubectl get pvc hostpath-pvc

The response should be something like this:

NAME           STATUS    VOLUME     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
hostpath-pvc   Bound     pv-local   10Gi       RWO            local          29s

As you see, our PVC was bound to the volume of the matching type. Let's verify that the PV we created was actually selected by the claim:

kubectl get pv pv-local

The response should be something like this:

NAME       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS    CLAIM                  STORAGECLASS   REASON    AGE
pv-local   10Gi       RWO            Retain           Bound     default/hostpath-pvc   local                    16m

Did you notice the difference from the previous status of our PV? It is now bound by the claim hostpath-pvc we just created (the claim lives in the default Kubernetes namespace). That's exactly what we wanted to achieve!

Step #4 Use the PersistentVolumeClaim as a Volume in Your Deployment

Now everything is ready for the use of your hostPath PV in any pod or deployment of your choice. To do this, we need to create a deployment with a PVC referring to the hostPath volume. Since we created a PV that mounts a directory with the index.html file, let's deploy the Apache HTTP server from the Docker Hub repository.

apiVersion: apps/v1 # use apps/v1beta2 for versions before 1.9.0
kind: Deployment
metadata:
  name: httpd
spec:
  replicas: 2
  selector:
    matchLabels:
      app: httpd
  template:
    metadata:
      labels:
        app: httpd
    spec:
      containers:
      - image: httpd:latest
        name: httpd
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: web
          mountPath: /usr/local/apache2/htdocs
      volumes:
      - name: web
        persistentVolumeClaim:
          claimName: hostpath-pvc

As you see, along with the standard deployment parameters like the container image and container port, we have also defined a volume named "web" that uses our PersistentVolumeClaim. This volume will be mounted, with our custom index.html, at /usr/local/apache2/htdocs, which is the default webroot directory of Apache HTTP for this Docker Hub image. The deployment will also have access to 5Gi of data in the hostPath volume.

Save this spec in httpd-deployment.yaml and create the deployment using the following command:

kubectl create -f httpd-deployment.yaml
deployment.apps "httpd" created

Let's check the deployment's details (e.g., with kubectl describe deployment httpd):

...
Mounts: /usr/local/apache2/htdocs from web (rw) Volumes: web: Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace) ClaimName: hostpath-pvc ReadOnly: false Along with other details, the output shows that the directory /usr/local/apache2/htdocs was mounted from the web (rw) volume and that our PersistentVolumeClaim was used to provision the storage. Now, let's verify that our Apache HTTP pods actually serve the index.html file we created in the first step. To do this, let's first find the UID of one of the pods and get a shell to the Apache server container running in this pod. kubectl get pods -l app=httpd NAME READY STATUS RESTARTS AGE httpd-5958bdc7f5-fg4l5 1/1 Running 0 15m httpd-5958bdc7f5-jjf9r 1/1 Running 0 15m We'll enter the httpd-5958bdc7f5-jjf9r pod using the following command: kubectl exec -it httpd-5958bdc7f5-jjf9r -- /bin/bash Now, we are inside the Apache2 container's filesystem. You may verify this by using Linux ls command. root@httpd-6c96b87dfc-g4r45:/usr/local/apache2# ls bin build cgi-bin conf error htdocs icons include logs modules The image uses Linux environment, so we can easily install curl to access our server: root@httpd-6c96b87dfc-g4r45:/usr/local/apache2# apt-get update root@httpd-6c96b87dfc-g4r45:/usr/local/apache2# apt-get install curl When the curl is installed, let's send a GET request to our server listening on localhost:80 (remember that containers within a Pod communicate via localhost). curl localhost:80 Hello from the hostPath PersistentVolume! That's it! Now you know how to define a PV and a PVC for hostPath volumes and use this storage type in your Kubernetes deployments. This tutorial demonstrated how both PV and PVC take care of the underlying storage infrastructure and filesystem, so users can focus on just how much storage they need for their deployment. Step #5 Clean Up This tutorial is over, so let's clean up after ourselves. Delete the Deployment: kubectl delete deployment httpd deployment "httpd" deleted Delete the PV: kubectl delete pv pv-local persistentvolume "pv-local" deleted Delete the PVC: kubectl delete pvc hostpath-pvc persistentvolumeclaim "hostpath-pvc" deleted Finally, don't forget to delete all files and folders like /home//data, PV, and PVC resource definition files we created.  Conclusion We hope that you now have a better understanding of how to create stateful applications in Kubernetes using PersistentVolume and PersistentVolumeClaim. As we've learned, persistent volumes are powerful abstractions that enable user access to diverse storage types supported by the Kubernetes platform. Using PVs you can attach and mount almost any type of persistent storage such as object-, file-, network- level storage to your pods and deployments. In addition, Kubernetes exposes a variety of storage options such as capacity, reclaim policy, volume modes, and access modes, making it easy for you to adjust different storage types to your particular application's requirements and needs. Kubernetes makes sure that your PVC is always bound to the right volume type available in your cluster, enabling the efficient usage of resources, high availability of applications, and integrity of your data across pod restarts and node failures. [Less]
Posted over 7 years ago by Kirill
As you might remember from the previous tutorials, Kubernetes supports a wide variety of volume plugins for both stateless and stateful apps. Non-persistent volumes such as emptyDir or configMap are firmly linked to the pod's lifecycle: they are detached and deleted after the pod is terminated. However, with stateful applications like databases, we want volumes that persist data beyond the pod's lifecycle. Kubernetes solves this problem by introducing the PersistentVolume and PersistentVolumeClaim resources that enable native and external persistent storage in your Kubernetes clusters. In this tutorial, we'll explain how these two API resources can be used together to link various storage architectures (both native and CSP-based) to applications running in your cluster. To consolidate the theory, we'll also walk you through a tutorial about using hostPath as a PersistentVolume in your stateful deployment. Let's start!

Why Do We Need Persistent Volumes?

The rationale for a PersistentVolume resource in Kubernetes is quite simple. On the one hand, we have different storage infrastructures such as Amazon EBS (Elastic Block Store), GCE Persistent Disk, or GlusterFS, each with its specific storage type (e.g., block storage, NFS, object storage, data center storage), architecture, and API. If we were to attach these diverse storage types manually, we would have to develop custom plugins to interact with each storage provider's API: mounting the disk, requesting capacity, managing the disk's life cycle, and so on. We would also need to configure a cloud storage environment, all of which would result in unnecessary overhead.

Fortunately, the Kubernetes platform simplifies storage management for you. Its PersistentVolume subsystem is designed to solve the above-described problem by providing APIs that abstract the details of the underlying storage infrastructure (e.g., AWS EBS, Azure Disk, etc.), allowing users and administrators to focus on the storage capacity and storage types their applications will consume rather than the subtle details of each storage provider's API. This sounds similar to the pod's resource model, doesn't it? As we remember from the previous tutorial, containers in a pod request resources in raw amounts of CPU and RAM, so users do not have to think about the server flavors and memory types used by CSPs under the hood. PVs do the same for your storage, providing the right amount of resources for your pods regardless of which storage provider you opt to use.

Linking a persistent volume to your application involves several steps: provisioning a PV, requesting storage, and using the PV in your pod or deployment. Let's discuss these steps.

Provisioning a PersistentVolume (PV)

Kubernetes users need to define a PersistentVolume resource object that specifies the storage type and storage capacity, volume access modes, mount options, and other relevant storage details (see the discussion below). Once the PV is created, we have a working API abstraction that tells Kubernetes how to interact with the underlying storage provider using its Kubernetes volume plugin (see our article about Kubernetes Volumes to learn more).

Kubernetes supports static and dynamic provisioning of PVs. With static provisioning, PVs are created by the cluster administrator and expose actual storage already available in the cluster. This means that in order to use static provisioning, you need storage capacity (e.g., an Amazon EBS volume) provisioned beforehand (see the tutorial below and the sketch that follows).
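To make static provisioning concrete, here is a minimal sketch of a statically provisioned PV backed by a pre-created EBS volume. This is our own illustration, not a spec from this article: the volume ID is a hypothetical placeholder, and the capacity and filesystem type are assumptions.

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-ebs-example                  # hypothetical name
spec:
  capacity:
    storage: 20Gi                       # assumed size of the pre-provisioned EBS volume
  accessModes:
    - ReadWriteOnce
  awsElasticBlockStore:
    volumeID: vol-0123456789abcdef0     # hypothetical ID of an EBS volume created beforehand
    fsType: ext4

The administrator creates the EBS volume first (e.g., via the AWS console or CLI) and only then registers it with the cluster through a PV like this one.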
Dynamic provisioning of volumes, on the other hand, can be triggered when a volume type claimed by the user does not match any PVs available in the cluster. For dynamic provisioning to happen, the cluster administrator needs to enable the DefaultStorageClass admission plugin that defines default storage classes for applications. (Note: this is a big topic in its own right, so it will be discussed in the next tutorials.)

Requesting Storage

To make PVs available to pods in the Kubernetes cluster, you should explicitly claim them using a PersistentVolumeClaim (PVC) resource. A PVC is bound to the PV that matches the storage type, capacity, and other requirements specified in the claim. Binding the PVC to the PV is handled by a control loop that watches for new PVCs and finds a matching PV. The claim will remain unbound if a matching volume does not exist. For example, if the PVC requests 200 Gi and the cluster only has 100 Gi PVs available, the claim won't be bound to any PV until a PV with at least 200 Gi is added to the cluster. Likewise, given two candidate PVs, the PVC will be bound to the one whose resource capacity and volume mode both match the claim, and not to a PV that satisfies the claim only partially.

Using PVs and PVCs in Pods

Pods use PVs by specifying a PVC that matches the pod's resource and volume requirements. This works as follows: when the PVC is specified, the cluster finds the claim in the pod's namespace, uses it to get the PV backing the claim, and mounts the corresponding volume into the container(s) in the pod. Note that PVCs are referenced as volumes in a Pod's spec. For example, below is a pod spec that uses a persistentVolumeClaim named "test-claim" referring to some PV:

kind: Pod
apiVersion: v1
metadata:
  name: test-pod
spec:
  containers:
    - name: nginx
      image: nginx
      volumeMounts:
        - mountPath: "/var/www/html"
          name: mypd
  volumes:
    - name: mypd
      persistentVolumeClaim:
        claimName: test-claim

Before a pod starts relying on a claim, it's worth verifying that the claim actually bound; see the quick check below.
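As a quick sanity check (using the claim name from the example above), you can confirm the binding status with kubectl; the STATUS column should read Bound before a pod starts using the claim:

kubectl get pvc test-claim

If the status shows Pending instead, no PV matching the claim's storage class, capacity, access mode, and volume mode exists in the cluster yet.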
Persistent Volume Types

We discussed available volume types in the previous tutorial. However, not all of them are persistent. Below is a list of the persistent storage volume plugins that can be used by PVs and PVCs, each with its storage type and a short description:

- gcePersistentDisk (block storage) -- A Google Compute Engine (GCE) Persistent Disk that provides SSD and HDD storage attached to nodes and pods in a K8s cluster.
- awsElasticBlockStore (block storage) -- An Amazon EBS volume is persistent block storage offering consistent and low-latency performance.
- azureFile (network file shares) -- Microsoft Azure file volumes are fully managed file shares in Microsoft Azure, accessible via the industry-standard Server Message Block (SMB) protocol.
- azureDisk (block storage) -- A Microsoft Azure data disk provides block storage with SSD and HDD options.
- fc (data center storage and Storage Area Networks) -- Fibre Channel is a high-speed networking technology for the lossless delivery of raw block data, used primarily in Storage Area Networks (SAN) and commercial data centers.
- FlexVolume (volume plugin interface) -- FlexVolume enables users to develop Kubernetes volume plugins for vendor-provided storage.
- flocker (container data storage and management) -- Flocker is an open-source container data volume manager for Dockerized applications that supports container portability across diverse storage types and cloud environments.
- nfs (network file system) -- NFS is a distributed file system protocol that allows users to access files over a computer network.
- iscsi (networked block storage) -- iSCSI (Internet Small Computer Systems Interface) is an IP-based storage networking protocol for connecting data storage facilities. It is used to facilitate data transfer over intranets and to manage storage over long distances by enabling location-independent data storage.
- rbd (Ceph block storage) -- Ceph RADOS Block Device (RBD) is a building block of Ceph Block Storage that leverages RADOS capabilities such as snapshotting, consistency, and replication.
- cephfs (object storage with block- and file-level interfaces) -- Ceph is a storage platform that implements object storage on a distributed computer cluster.
- cinder (block storage) -- Cinder is a block storage service for OpenStack designed to provide storage resources to end users and to the OpenStack Compute Project (Nova).
- glusterfs (networked file system) -- Gluster is a distributed networked file system that aggregates storage from multiple servers into a single storage namespace.
- vsphereVolume (VMDK) -- A virtual machine disk (VMDK) provided by VMware vSphere.
- quobyte (data center file system) -- The quobyte volume plugin mounts the Quobyte Data Center File System.
- hostPath (local cluster file system) -- hostPath volumes mount a directory from the host node's filesystem into a pod.
- portworxVolume (block storage) -- A portworxVolume is Portworx's elastic block storage layer that runs hyperconverged with Kubernetes. Portworx's storage system is designed to aggregate capacity across multiple servers, similarly to Gluster.
- scaleIO (shared block networked storage) -- ScaleIO is a software-defined storage product from Dell EMC that creates a server-based Storage Area Network (SAN) from local server storage, converting direct-attached storage into shared block storage.
- storageos (block storage) -- StorageOS aggregates storage across a cluster of servers and exposes it as high-throughput, low-latency block storage.

Defining a PersistentVolume API Resource

Now, let's discuss how PVs actually work. First, PVs are defined as Kubernetes API objects with a spec and a list of parameters. Below is an example of a typical PersistentVolume definition in Kubernetes:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-nfs
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Recycle
  storageClassName: slow
  mountOptions:
    - hard
    - nfsvers=4.2
  nfs:
    path: /tmp
    server: 172.15.0.6

As you see, here we defined a PV for the NFS volume type. The key parameters defined in this spec are the following:

spec.capacity -- The storage capacity of the PV. In this example, our PV has a capacity of 10Gi (gibibytes). The capacity property uses the same units as defined in the Kubernetes resource model: it allows users to represent storage as unadorned integers or as fixed-point integers with one of the SI suffixes (E, P, T, G, M, K, m) or their binary equivalents (Ei, Pi, Ti, Gi, Mi, Ki). Currently, Kubernetes users can only request storage size; future attributes may include throughput, IOPS, etc.

spec.volumeMode (available since Kubernetes v1.9) -- The volume mode property supports raw block devices in addition to filesystems. Block storage mode offers raw, unformatted block storage that avoids filesystem overhead and, hence, ensures lower latency and higher throughput for mission-critical applications such as databases and object stores. Valid values for this field are "Filesystem" (the default) and "Block".

spec.accessModes -- Defines how the volume can be accessed.
(Note: valid values vary across persistent storage providers.) In general, the supported field values are:

- ReadWriteOnce -- the volume can be mounted as read-write by a single node only.
- ReadOnlyMany -- many nodes can mount the volume as read-only.
- ReadWriteMany -- many nodes can mount the volume as read-write.

Note: a volume can be mounted with only one access mode at a time, even if it supports many.

spec.storageClassName -- The storage class of the volume, defined by a StorageClass resource. A PV of a given class can only be bound to PVCs requesting that class. A PV with no storageClassName defined can only be bound to PVCs that request no particular class.

spec.persistentVolumeReclaimPolicy -- The reclaim policy for the volume. At the present moment, Kubernetes supports the following reclaim policies:

- Retain: if this policy is enabled, the PV continues to exist even after the PVC is deleted. However, it is not available for another claim: the previous claimant's data remains on the volume, and the administrator must reclaim the PV manually. It's also worth noting that where Retain really shines is that users can reuse that data if they want -- for example, if they want to use the data without a PV (e.g., switch to a traditional database model), or if they want to use that data on a different, new PV (migrating to another cluster, testing, etc.).
- Recycle (deprecated): this reclaim policy performs a basic scrub operation (rm -rf /thevolume/*) on a given volume and makes it available again for a new claim.
- Delete: this reclaim policy deletes the PersistentVolume object from the Kubernetes API along with the associated storage asset in the external infrastructure (e.g., AWS EBS, Google Persistent Disk, etc.). AWS EBS, GCE PD, Azure Disk, and Cinder volumes support this reclaim policy.

spec.mountOptions -- A K8s administrator can use this field to specify additional mount options supported by the storage provider. In the spec above, we request a hard NFS mount (the hard option) and specify that NFS version 4.2 should be used. Note: not all providers support mount options. For more information, see the official documentation.

spec.nfs -- The list of the NFS-specific options. Here, we say that the NFS share exported at the /tmp path of the server with IP 172.15.0.6 should be mounted.

Defining a PVC

As we've already said, PersistentVolumeClaims must be defined to claim resources of a given PV. Similarly to PVs, PVCs are defined as Kubernetes API resource objects. Let's see an example:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: volume-claim
spec:
  accessModes:
    - ReadWriteMany
  volumeMode: Block
  resources:
    requests:
      storage: 10Gi
  storageClassName: slow
  selector:
    matchLabels:
      release: "stable"

This PVC:

- targets volumes that have the ReadWriteMany access mode (spec.accessModes).
- requests storage only from volumes that have the "Block" volume mode enabled (spec.volumeMode).
- claims 10Gi of storage from any matching PV (spec.resources.requests.storage).
- filters volumes that match the "slow" storage class (spec.storageClassName). Note: if a default StorageClass is set by the administrator, a PVC with no storageClassName can be bound only to PVs of that default class.
- specifies a label selector to further filter the set of volumes. Only volumes with the label release=stable can be bound to this claim.

This PVC matches the StorageClass of the PV defined above. However, it does not match the volumeMode and accessModes of that PV. Therefore, our PVC cannot be used to claim resources from the pv-nfs PV. A sketch of a claim that would bind pv-nfs follows below.
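To make the mismatch concrete, here is a minimal sketch of our own (not a spec from this article) of a claim that would be able to bind pv-nfs: it keeps the slow storage class, requests no more than the PV's 10Gi capacity, and uses the PV's ReadWriteOnce access mode and Filesystem volume mode. The label selector is dropped because pv-nfs carries no labels.

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: nfs-claim               # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce             # matches the PV's access mode
  volumeMode: Filesystem        # matches the PV's volume mode
  resources:
    requests:
      storage: 10Gi             # must not exceed the PV's capacity
  storageClassName: slow        # matches the PV's storage class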
Using Persistent Volumes in a Pod

Once the PV and PVC are created, using persistent storage in a pod becomes straightforward:

kind: Pod
apiVersion: v1
metadata:
  name: persistent-pod
spec:
  containers:
    - name: httpd
      image: httpd
      volumeMounts:
        - mountPath: "/usr/local/apache2/htdocs"
          name: test-pv
  volumes:
    - name: test-pv
      persistentVolumeClaim:
        claimName: volume-claim

In this pod spec, we:

- pull Apache HTTP Server from the Docker Hub repository (spec.containers.image).
- define a volume "test-pv" and use the PVC "volume-claim" to claim some PV that matches this claim.
- mount the volume at the path /usr/local/apache2/htdocs in the httpd container.

That's it! Hopefully, you now understand the theory behind PVs and PVCs and the key options and parameters available to these API resources. Let's consolidate this knowledge with a tutorial.

Tutorial: Using hostPath Persistent Volumes in Kubernetes

In this tutorial, we'll create a PersistentVolume using the hostPath volume plugin and claim it for use in a Deployment running Apache HTTP servers. hostPath volumes use a file or directory on the Node and are suitable for development and testing purposes. Note: hostPath volumes have certain limitations to watch out for, and they are not recommended for production use. Also, in order for hostPath to work, we need to run a single-node cluster. See the official documentation for more info.

To complete this example, we used the following prerequisites:

- A Kubernetes cluster deployed with Minikube. The Kubernetes version used was 1.10.0.
- A kubectl command line tool installed and configured to communicate with the cluster. See how to install kubectl here.

Step #1 Create a Directory on Your Node

First, let's create a directory on your Node that will be used by the hostPath volume. This directory will be the webroot of the Apache HTTP server. You need to open a shell to the Node in the cluster; since you are using Minikube, open a shell by running minikube ssh. In your shell, create a new directory. Use any directory that does not need root permissions (e.g., the user's home folder if you are on Linux):

mkdir /home//data

Then create an index.html file in this directory containing a custom greeting from the server (note: use the directory you created):

echo 'Hello from the hostPath PersistentVolume!' > /home//data/index.html

Step #2 Create a Persistent Volume

The next thing we need to do is to create a hostPath PersistentVolume that will be using this directory:

kind: PersistentVolume
apiVersion: v1
metadata:
  name: pv-local
  labels:
    type: local
spec:
  storageClassName: local
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/home//data"

This spec:

- defines a PV named "pv-local" with a 10Gi capacity (spec.capacity.storage).
- sets the PV's access mode to ReadWriteOnce, which allows the volume to be mounted as read-write by a single node (spec.accessModes).
- assigns the storageClassName "local" to the PersistentVolume.
- configures the hostPath volume plugin to mount the local directory at /home//data.

Save this spec in a file (e.g., hostpath-pv.yaml) and create the PV by running the following command:

kubectl create -f hostpath-pv.yaml
persistentvolume "pv-local" created

Let's check whether our PersistentVolume was created:

kubectl get pv

NAME       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS   REASON   AGE
pv-local   10Gi       RWO            Retain           Available           local                   2m

This response indicates that the volume is available but does not yet have any claim bound to it, so let's create one.
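If the table view above is too terse, kubectl can also print the full spec and status of the object, including the reclaim policy, labels, and the hostPath source we configured:

kubectl describe pv pv-local

This is often the quickest way to confirm that the storage class, capacity, and access modes came out as intended before writing the claim.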
Step #3 Create a PersistentVolumeClaim (PVC) for Your PersistentVolume

The next thing we need to do is to claim our PV using a PersistentVolumeClaim. Using this claim, we can request resources from the volume and make them available to our future pods.

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: hostpath-pvc
spec:
  storageClassName: local
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  selector:
    matchLabels:
      type: local

Our PVC does the following:

- filters the volumes labeled "local" to bind our specific hostPath volume and other hostPath volumes that might be created later (spec.selector.matchLabels).
- targets hostPath volumes that have the ReadWriteOnce access mode (spec.accessModes).
- requests a volume of at least 5Gi (spec.resources.requests.storage).

First, save this resource definition in hostpath-pvc.yaml, and then create it similarly to what we did with the PV:

kubectl create -f hostpath-pvc.yaml
persistentvolumeclaim "hostpath-pvc" created

Let's check the claim by running the following command:

kubectl get pvc hostpath-pvc

The response should be something like this:

NAME           STATUS   VOLUME     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
hostpath-pvc   Bound    pv-local   10Gi       RWO            local          29s

As you see, our PVC was bound to the volume of the matching type. Let's verify that the PV we created was actually selected by the claim:

kubectl get pv pv-local

The response should be something like this:

NAME       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                  STORAGECLASS   REASON   AGE
pv-local   10Gi       RWO            Retain           Bound    default/hostpath-pvc   local                   16m

Did you notice the difference from the previous status of our PV? It is now bound by the claim hostpath-pvc we just created (the claim lives in the default Kubernetes namespace). That's exactly what we wanted to achieve!

Step #4 Use the PersistentVolumeClaim as a Volume in Your Deployment

Now everything is ready for the use of your hostPath PV in any pod or deployment of your choice. To do this, we need to create a deployment with a PVC referring to the hostPath volume. Since we created a PV that mounts a directory with the index.html file, let's deploy Apache HTTP server from the Docker Hub repository.

apiVersion: apps/v1 # use apps/v1beta2 for versions before 1.9.0
kind: Deployment
metadata:
  name: httpd
spec:
  replicas: 2
  selector:
    matchLabels:
      app: httpd
  template:
    metadata:
      labels:
        app: httpd
    spec:
      containers:
        - image: httpd:latest
          name: httpd
          ports:
            - containerPort: 80
              name: web
          volumeMounts:
            - name: web
              mountPath: /usr/local/apache2/htdocs
      volumes:
        - name: web
          persistentVolumeClaim:
            claimName: hostpath-pvc

As you see, along with the standard deployment parameters like the container image and container port, we have also defined a volume named "web" that uses our PersistentVolumeClaim. This volume will be mounted with our custom index.html at /usr/local/apache2/htdocs, which is the default webroot directory of this Apache HTTP Docker Hub image. Also, because the claim bound the entire 10Gi PV, the deployment has access to the whole volume even though the claim requested only 5Gi (Kubernetes binds whole volumes, not slices of them).

Save this spec in httpd-deployment.yaml and create the deployment using the following command:

kubectl create -f httpd-deployment.yaml
deployment.apps "httpd" created

Let's check the deployment's details: ...
Mounts:
  /usr/local/apache2/htdocs from web (rw)
Volumes:
  web:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  hostpath-pvc
    ReadOnly:   false

Along with other details, the output shows that the directory /usr/local/apache2/htdocs was mounted from the web (rw) volume and that our PersistentVolumeClaim was used to provision the storage.

Now, let's verify that our Apache HTTP pods actually serve the index.html file we created in the first step. To do this, let's first find the name of one of the pods and get a shell to the Apache server container running in that pod.

kubectl get pods -l app=httpd

NAME                     READY     STATUS    RESTARTS   AGE
httpd-5958bdc7f5-fg4l5   1/1       Running   0          15m
httpd-5958bdc7f5-jjf9r   1/1       Running   0          15m

We'll enter the httpd-5958bdc7f5-jjf9r pod using the following command:

kubectl exec -it httpd-5958bdc7f5-jjf9r -- /bin/bash

Now, we are inside the Apache container's filesystem. You may verify this by using the Linux ls command:

root@httpd-5958bdc7f5-jjf9r:/usr/local/apache2# ls
bin  build  cgi-bin  conf  error  htdocs  icons  include  logs  modules

The image is based on a Linux environment, so we can easily install curl to access our server:

root@httpd-5958bdc7f5-jjf9r:/usr/local/apache2# apt-get update
root@httpd-5958bdc7f5-jjf9r:/usr/local/apache2# apt-get install curl

Once curl is installed, let's send a GET request to our server listening on localhost:80 (remember that containers within a Pod communicate via localhost):

curl localhost:80
Hello from the hostPath PersistentVolume!

That's it! Now you know how to define a PV and a PVC for hostPath volumes and use this storage type in your Kubernetes deployments. This tutorial demonstrated how PVs and PVCs take care of the underlying storage infrastructure and filesystem, so users can focus on just how much storage they need for their deployment.

Step #5 Clean Up

This tutorial is over, so let's clean up after ourselves. Delete the Deployment:

kubectl delete deployment httpd
deployment "httpd" deleted

Delete the PVC:

kubectl delete pvc hostpath-pvc
persistentvolumeclaim "hostpath-pvc" deleted

Delete the PV (with the claim gone, the volume can be removed cleanly):

kubectl delete pv pv-local
persistentvolume "pv-local" deleted

Finally, don't forget to delete the data directory we created on the node (/home//data) as well as the PV and PVC resource definition files.

Conclusion

We hope that you now have a better understanding of how to create stateful applications in Kubernetes using PersistentVolume and PersistentVolumeClaim. As we've learned, persistent volumes are powerful abstractions that give users access to the diverse storage types supported by the Kubernetes platform. Using PVs, you can attach and mount almost any type of persistent storage -- object-, file-, or network-level -- to your pods and deployments. In addition, Kubernetes exposes a variety of storage options such as capacity, reclaim policy, volume modes, and access modes, making it easy for you to adjust different storage types to your particular application's requirements. Kubernetes makes sure that your PVC is always bound to the right volume type available in your cluster, enabling efficient usage of resources, high availability of applications, and integrity of your data across pod restarts and node failures.
Posted over 7 years ago by Kirill
Kubernetes offers a number of resources like volumes, StatefulSets, or StorageClass that provide diverse storage options for applications running in your clusters. The platform's storage options include native storage, CSP disks, network file systems, object storage, and many more. In this article, we introduce you to Kubernetes Volumes and demonstrate how they can be used to share resources between containers in a stateless app. Let's start!

What Are Kubernetes Volumes?

Volumes are Kubernetes abstractions that allow containers to use various storage and file system types, share storage, and keep state. By themselves, containers do not maintain their state if terminated or restarted. To avoid losing information, containers must be defined with volumes mounted at specific paths in the container image's file system. Even though volumes outlive the containers to which they are attached, when a Pod dies, the volume dies, too. This does not necessarily mean that the volume's data is lost forever: its fate depends on the type of volume used (persistent vs. non-persistent).

It is therefore useful to distinguish between two types of applications that use volumes: stateless and stateful apps. Here is the difference. Stateless apps do not store the client data generated during sessions; when a session ends, all data generated by the user is lost. A typical example of a volume suitable for stateless applications is emptyDir. A volume of this type exists until the pod to which it's attached is removed from a node for some reason. When that happens, all data in the emptyDir is deleted forever. In contrast, stateful apps like databases need some way to store application and user data. Kubernetes has support for stateful apps implemented in such resources as PersistentVolume (PV). The latter is a volume plugin with a lifecycle independent of the pod that uses it. PersistentVolumes allow using external storage (e.g., AWS EBS) without knowledge of the underlying cloud environment.

Volumes are very powerful indeed because they abstract container storage from the underlying storage infrastructure (e.g., devices, file systems) and storage providers, much like Kubernetes resource requests and limits abstract CPU and memory from the VMs and bare-metal servers (see the image below). In addition, Kubernetes users can extend the platform with their own storage types: new volume plugins can be created for any conceivable type of storage using the Container Storage Interface (CSI) and FlexVolume interfaces that expose volume drivers to container environments.

Currently, Kubernetes supports over 25 volume plugins. Describing each of them is beyond the scope of this article, but you can find more information in the Kubernetes documentation. Just to get some idea of the various storage options supported by Kubernetes, it is useful to break down the available volume plugins by category:

- Volumes of cloud service providers (CSPs). One example of this type is awsElasticBlockStore, which allows mounting an AWS EBS volume into a Pod. The contents of this volume type are preserved when a Pod is removed. Other available CSP volume types include Microsoft's azureDisk and GCE's gcePersistentDisk, among others. CSP-based volume drivers are normally 'claimed' by PersistentVolumes that "rent" access to the underlying storage infrastructure.
- Object storage systems. For example, Kubernetes supports CephFS (cephfs), which provides interfaces for object-level, block-level, and file-level storage.
- Native Kubernetes volume types. emptyDir and configMap are two examples of volumes supported by Kubernetes natively. For example, a configMap volume can be used to inject configuration defined in a ConfigMap object for the use of containers in your Pod.
- Volumes for remote repositories. Volume plugins such as gitRepo can be used for cloning git repositories into empty directories for your Pods to access.
- Network filesystems for accessing files across the network. Kubernetes supports NFS (Network File System) (nfs), iscsi (an IP-based storage networking protocol for linking data storage facilities), Gluster (glusterfs), and some more.
- Persistent storage for stateful applications. For example, PersistentVolumes allow users to "claim" persistent storage options like GCE PersistentDisk without bothering about the details of a particular cloud environment. Other examples of persistent storage include StorageOS (storageos).
- Data center filesystems. Kubernetes ships with the Quobyte (quobyte) volume plugin that mounts the Quobyte Data Center File System.
- Secrets volumes. Kubernetes offers a secret volume for storing sensitive information in the Kubernetes API that can be mounted as files by Pods. Secret volumes use tmpfs (a RAM-backed filesystem), so they are never written to disk.

Defining a Volume

In Kubernetes, the provisioning of volumes for Pods is quite simple. You can use the spec.volumes field to specify what volumes to provide and the spec.containers.volumeMounts field to indicate where to mount these volumes into containers. Note: mount paths should be specified for each container in a Pod individually. Below is a simple example of defining and mounting a volume for some arbitrary Pod (Pod meta details are omitted for brevity):

...
spec:
  containers:
    - name: httpd
      image: httpd:latest
      ports:
        - containerPort: 80
      volumeMounts:
        - name: httpd-config
          mountPath: /etc/apache2/
  volumes:
    - name: httpd-config
      configMap:
        name: httpd-configmap

This Pod spec:

- creates a volume named httpd-config and tells the Pod to use a configMap volume to inject the Apache HTTP server configuration (a hedged sketch of the referenced ConfigMap object follows after the emptyDir example below). See the ConfigMap resource documentation to learn more.
- mounts the volume at /etc/apache2/, the directory where httpd keeps its configuration files.

Mounting volumes needs more explanation, though. In a nutshell, the container's filesystem is composed of the Docker image and volumes (containerization creates a partial view of the file system used by the application). The Docker image is at the root of this filesystem, and all new volumes are mounted at specified paths within this image. In our example, we have a container whose Docker filesystem might contain /var, /etc, /bin, and other directories used by the Apache HTTP server. By mounting the volume at the /etc/apache2/ location, we make the volume's contents accessible to the server at that path. In the example above, the volume is populated with Apache HTTP Server configuration data. If we instead wanted a completely empty volume, we could use an emptyDir volume type:

...
spec:
  containers:
    - image: httpd:latest
      name: httpd
      volumeMounts:
        - mountPath: /cache
          name: cache-volume
  volumes:
    - name: cache-volume
      emptyDir: {}

This way, we would have a completely empty directory, which is useful for a disk-based merge sort, caching, and more.
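The first example above references a ConfigMap named httpd-configmap without showing it. As a hedged sketch (the key name and configuration content here are our own assumptions, not part of the original example), it could look like this; each key under data becomes a file in the mounted directory:

apiVersion: v1
kind: ConfigMap
metadata:
  name: httpd-configmap
data:
  httpd.conf: |          # hypothetical key; appears as /etc/apache2/httpd.conf in the container
    ServerName localhost
    Listen 80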
Using SubPath

It is good practice for containers to have their own individual directories (folders) within a shared volume. This design is especially useful for stacked applications with several tightly coupled containers. The subPath field allows mounting a single volume multiple times with different sub-paths. In the example below, we define a Pod where NGINX data is mapped to the html sub-path and the MySQL database is stored in the mysql folder of a shared persistent "site-data" volume.

apiVersion: v1
kind: Pod
metadata:
  name: lemp
spec:
  containers:
    - name: mysql
      image: mysql
      volumeMounts:
        - mountPath: /var/lib/mysql
          name: site-data
          subPath: mysql
    - name: nginx
      image: nginx
      volumeMounts:
        - mountPath: /usr/share/nginx/html
          name: site-data
          subPath: html
  volumes:
    - name: site-data
      persistentVolumeClaim:
        claimName: lemp-site-data

Now, both NGINX and MySQL have their individual folders inside a shared volume. Note: it's worth mentioning that subPath currently has a few vulnerabilities to watch out for. See this article to learn more.

So far so good! By now we have a general understanding of how volumes work in Kubernetes and what volume types the platform offers out of the box. But what are some use cases for volumes? In what follows, we guide you through a tutorial showing how to use Kubernetes volumes to share data between two containers in a Pod. Let's go!

Tutorial: Communication between Containers Using Shared Storage

In this tutorial, we demonstrate a typical use case for shared storage: one container writes logs to a log file while another container (referred to as a sidecar logger) streams these logs to its own stdout. Since the kubelet controls stdout, the application's logs can then be accessed using kubectl logs PODNAME CONTAINER_NAME (see the image below). To complete this example, we need the following prerequisites:

- A working Kubernetes cluster. See our guide for more information about deploying a Kubernetes cluster with Supergiant. As another option, you can install a single-node Kubernetes cluster on a local system with Minikube.
- A kubectl command line tool installed and configured to communicate with the cluster. See how to install kubectl here.

We create a Pod that pulls an NGINX container (the application container) writing some logs and a sidecar container with a busybox Docker image, which provides several Unix tools like bash in a single executable file.

apiVersion: v1
kind: Pod
metadata:
  name: sidecar-logging
  labels:
    app: sidecar-logging
spec:
  containers:
    - name: nginx
      image: nginx:latest
      volumeMounts:
        - name: varlog
          mountPath: /var/log/nginx/
      ports:
        - containerPort: 80
    - name: busybox
      image: busybox
      args: [/bin/sh, -c, 'tail -f /var/log/nginx/access.log']
      volumeMounts:
        - name: varlog
          mountPath: /var/log/nginx/
  volumes:
    - name: varlog
      emptyDir: {}    # a scratch volume shared by both containers

In this Pod definition, we:

- create the Pod named 'sidecar-logging' with NGINX and busybox containers.
- define a volume "varlog" to be used by both containers. We mount it at the /var/log/nginx directory in both containers so that the sidecar container (busybox) has access to the NGINX log files (error.log and access.log).
- open containerPort: 80 for the NGINX container.
- using the Unix shell from the busybox distribution, ask busybox to tail logs from /var/log/nginx/access.log. tail is a native UNIX program that tracks the tail end of a text file as a stream and displays new lines as they are added, which is very useful for dynamic tracking of NGINX log files. The tracked log file stores logs generated by HTTP client requests to the NGINX server, such as visiting the page in a browser or sending curl requests to the server.
As you see, our sidecar logging container is quite simple, but it is enough to illustrate how shared storage can be used for inter-container communication. The next thing we need to do is to save the above Pod spec in sidecar-logging.yaml and deploy our Pod by running the following command:

kubectl create -f sidecar-logging.yaml

Our Pod is now running but is not exposed to the external world. In order to access the NGINX server from outside of our cluster, we need to create a NodePort Service like this:

kubectl expose pod sidecar-logging --type=NodePort --port=80
service "sidecar-logging" exposed

Now, we can access NGINX at yourhost:NodePort, which will redirect to the NGINX container listening on port 80. First, however, we need to find out what port Kubernetes assigned to the NodePort service:

kubectl describe service sidecar-logging

Name:              sidecar-logging
Namespace:         default
Labels:            app=sidecar-logging
Selector:          app=sidecar-logging
Type:              NodePort
IP:                10.3.231.160
Port:              80/TCP
NodePort:          31399/TCP
Endpoints:         10.2.6.6:80
Session Affinity:  None

As you see, we have port 31399 assigned. Now, we can trigger the server to write some logs to the /var/log/nginx/access.log file by accessing yourhost:31399 from your browser or by sending arbitrary curl requests. Let's check out what logs our sidecar logging container has to display by running kubectl logs sidecar-logging busybox. This returns access logs specifying the IP of the machine that sent the request, the HTTP resources to which requests were sent, the type of requests, the user agents (browsers), and the dates. You should have an output similar to this:

10.2.93.0 - - [24/May/2018:13:43:18 +0000] "GET / HTTP/1.1" 200 612 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36" "-"
10.2.93.0 - - [24/May/2018:13:43:18 +0000] "GET /robots.txt HTTP/1.1" 404 572 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36" "-"
10.2.93.0 - - [24/May/2018:13:43:19 +0000] "GET /favicon.ico HTTP/1.1" 404 572 "http://ec2-52-62-39-157.ap-southeast-2.compute.amazonaws.com:31399/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36" "-"

That's it! Now you understand how a sidecar container can use shared storage to collect logs from the main application and display them. The container could also ship logs to some logging backend, but the example above is enough to give you a basic idea of using shared storage in Kubernetes.

Conclusion

In this article, we examined Kubernetes volumes -- powerful abstractions that enable diverse storage options for Pods and containers. Thanks to over 25 supported volume types and the capacity to create new volume plugins, Kubernetes allows deploying applications with any storage type, both stateless and stateful. In this article, however, we largely focused on stateless applications that use volumes with a finite lifecycle. In our next articles, we will discuss persistent storage using PersistentVolumes and other storage resources for the deployment of full-fledged stateful applications in Kubernetes. Stay tuned for new content to find out more!
Posted over 7 years ago by Kirill
Kubernetes is a powerful platform for managing containerized applications. It supports their deployment, scheduling, replication, updating, monitoring, and much more. Kubernetes has become a complex system due to the addition of new abstractions, resource types, cloud integrations, and add-ons. Further, Kubernetes cluster networking is perhaps one of the most complex components of the Kubernetes infrastructure because it involves so many layers and parts (e.g., container-to-container networking, Pod networking, services, ingress, load balancers), and many users struggle to make sense of it all.

The goal of Kubernetes networking is to turn containers and Pods into bona fide "virtual hosts" that can communicate with each other across nodes while combining the benefits of VMs with a microservices architecture and containerization. Kubernetes networking is based on several layers, all serving this ultimate purpose:

- Container-to-container communication using localhost and the Pod's network namespace. This networking level enables the container network interfaces for tightly coupled containers that can communicate with each other on specified ports, much like conventional applications communicate via localhost.
- Pod-to-pod communication that enables communication of Pods across Nodes (if you want to learn more about Pods, see our recent article).
- Services. A Service abstraction defines a policy (microservice) for accessing Pods by other applications.
- Ingress, load balancing, and DNS.

Sounds like a lot of stuff, doesn't it? It is. That's why we decided to create a series of articles explaining Kubernetes networking from the bottom (container-to-container communication) to the top (pod networking, services, DNS, and load balancing). In the first part of the series, we discuss container-to-container and pod-to-pod networking. We demonstrate how Kubernetes networking is different from the "normal" Docker approach, what requirements it imposes on networking implementations, and how it achieves a homogeneous networking system that allows Pod communication across nodes. We think that by the end of this article you'll have a better understanding of Kubernetes networking that will prepare you for the deployment of full-fledged microservices applications using Kubernetes services, DNS, and load balancing.

Fundamentals of Kubernetes Networking

The Kubernetes platform aims to simplify cluster networking by creating a flat network structure that frees users from setting up dynamic port allocation to coordinate ports, designing custom routing rules and subnets, and using Network Address Translation (NAT) to move packets across different network segments. To achieve this, Kubernetes prohibits networking implementations involving any intentional network segmentation policy. In other words, Kubernetes aims to keep the networking architecture as simple as possible for the end user. The Kubernetes platform sets the following networking rules:

- All containers should communicate with each other without NAT.
- All nodes should communicate with all containers without NAT.
- The IP that a container sees itself as is the same IP that other containers see it as (in other words, Kubernetes bars any IP masquerading).
- Pods can communicate regardless of what Node they sit on.

To understand how Kubernetes implements these rules, let's first discuss the Docker model, which serves as a point of reference for Kubernetes networking.
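Before we turn to the Docker model, here is a quick, hedged way to observe the shared-IP rule on a running cluster. The pod and container names below are placeholders for any multi-container pod you have, and we assume its images ship the hostname utility:

kubectl get pod mypod -o wide                           # the IP column shows the pod's cluster-wide IP
kubectl exec mypod -c first-container -- hostname -i    # prints the IP as seen from inside
kubectl exec mypod -c second-container -- hostname -i   # prints the same IP again

Both exec commands should print the same address shown in the IP column: every container in the pod, and every peer in the cluster, sees one and the same IP.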
Overview of the Docker Networking Model

As you might know, Docker supports numerous network architectures like overlay networks and Macvlan networks, but its default networking solution is based on host-private networking implemented by the bridge networking driver. To clarify the terms: as with any other private network, Docker's host-private networking model is based on a private IP address space that can be freely used by anybody without the approval of an Internet registry but that has to be translated using NAT or a proxy server if the network needs to connect to the Internet. A host-private network is a private network that lives on one host, as opposed to a multi-host private network that covers multiple hosts.

Governed by this model, Docker's bridge driver implements the following. First, Docker creates a virtual bridge (docker0) and allocates a subnet from one of the private address blocks for that bridge. A network bridge is a device that creates a single merged network from multiple networks or network segments; by the same token, a virtual bridge is the analog of a physical network bridge used in virtual networking. Virtual network bridges like docker0 allow connecting virtual machines (VMs) or containers into a single virtual network. This is precisely what Docker's bridge driver is designed for.

To connect containers to the virtual network, Docker allocates a virtual ethernet device called veth attached to the bridge. Similarly to a virtual bridge, veth is a virtual analog of ethernet technology, which is used to connect hosts to a LAN or the Internet and to package and pass data using a wide variety of protocols. Each veth is mapped to the eth0 network interface, Linux's Ethernet interface that manages the Ethernet device and the connection between the host and the network. In Docker, each in-container eth0 is provided with an IP address from the bridge's address range. In this way, each container gets its own IP address from that range.

The above-described architecture is schematically represented in the image below. In this image, we see that both Container 1 and Container 2 are part of the virtual private network created by the docker0 bridge. Each of the containers has a veth interface connected to the docker0 bridge. Since both containers and their veth interfaces are on the same logical network, they can easily communicate if they manage to discover each other's IP addresses. However, since each container is allocated a unique veth, there is no shared network interface between them, which hinders coordinated communication, container isolation, and the ability to encapsulate them in a single abstraction like a pod. Docker allows solving this problem by allocating ports, which can then be forwarded or proxied to other containers. This has the limitation that containers must coordinate port usage very carefully or allocate ports dynamically.

Kubernetes Solution

Kubernetes bypasses the above-mentioned limitation by providing a shared network interface for containers. Using the analogy from the Docker model, Kubernetes allows containers to share a single veth interface, as in the image below. As a result, the Kubernetes model augments the default host-private networking approach in the following way:

- It allows both containers to be addressable on veth0 (e.g., 172.17.0.2 in the image above).
- It allows containers to access each other via allocated ports on localhost.
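To make the shared-interface idea more tangible, here is a rough emulation of it with plain Docker's --network container:<name> mode. This is only an illustrative sketch: the image tag and container names below are our own choices, not something Kubernetes runs for you.

# Start a minimal "sandbox" container that owns the network namespace
docker run -d --name pod-sandbox k8s.gcr.io/pause:3.1
# Attach two containers to that namespace; they now share one IP address
docker run -d --name web --network container:pod-sandbox nginx:alpine
# curl runs in a separate container, yet reaches NGINX over localhost
docker run --rm --network container:pod-sandbox appropriate/curl curl -s localhost:80

Because all three containers share the sandbox's network namespace, they behave like processes on a single virtual host -- which is exactly the effect Kubernetes achieves inside a Pod.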
Practically speaking, this is the same as running applications on a host, with the added benefits of container isolation and the ability to design tightly coupled container architectures. To implement this model, Kubernetes creates a special container for each pod that provides a network interface for the other containers. This container is started with a "pause" command and provides a virtual network interface for all containers in the pod, allowing them to communicate with each other.

By now, you have a better understanding of how container-to-container networking works in Kubernetes. As we have seen, it is largely based on an augmented version of the bridge driver, with the added benefit of a shared network interface that provides better isolation and communication for containerized applications.

Tutorial

Now, let's illustrate a possible scenario of communication between two containers running in a single pod. One of the most common examples of multi-container communication via localhost is when one container, like Apache HTTP Server or NGINX, is configured as a reverse proxy that proxies requests to a web application running in another container. Elaborating upon this case, we are going to discuss a situation where the NGINX container is configured to proxy requests from its default port (80) to the Ghost publishing platform listening on some other port (e.g., port 2368).

To complete this example, we'll need the following prerequisites:

- A running Kubernetes cluster. See the Supergiant GitHub wiki for more information about deploying a Kubernetes cluster with Supergiant. As an alternative, you can install a single-node Kubernetes cluster on a local system using Minikube.
- A kubectl command line tool installed and configured to communicate with the cluster. See how to install kubectl here.

Step #1: Define a ConfigMap

ConfigMaps are Kubernetes objects that decouple the app's configuration from the Pod's spec, enabling better modularity of your settings. In the example below, we define a ConfigMap for the NGINX server that includes a basic reverse proxy configuration.

apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-conf
data:
  nginx.conf: |-
    user nginx;
    worker_processes 2;
    error_log /var/log/nginx/error.log warn;
    pid /var/run/nginx.pid;
    events {
      worker_connections 1024;
    }
    http {
      sendfile on;
      keepalive_timeout 65;
      include /etc/nginx/conf.d/*.conf;
      server {
        listen 80 default_server;
        location /ghost {
          proxy_pass http://127.0.0.1:2368;
        }
      }
    }

In brief, this ConfigMap tells NGINX to proxy requests from its default port (localhost:80) to localhost:2368, on which Ghost listens for requests. The ConfigMap must be passed to Kubernetes before we can deploy the Pod. Save it in a file (e.g., nginx-config.yaml), and then run the following command:

kubectl create -f nginx-config.yaml

Step #2: Create a Deployment

The next thing we need to do is to create a Deployment for our two-container pod (see our recent article for a review of the Pod deployment options in Kubernetes).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tut-deployment
  labels:
    app: tut
spec:
  replicas: 2
  selector:
    matchLabels:
      app: tut
  template:
    metadata:
      labels:
        app: tut
    spec:
      containers:
      - name: ghost
        image: ghost:latest
        ports:
        - containerPort: 2368
      - name: nginx
        image: nginx:alpine
        ports:
        - containerPort: 80
        volumeMounts:
        - name: proxy-config
          mountPath: /etc/nginx/nginx.conf
          subPath: nginx.conf
      volumes:
      - name: proxy-config
        configMap:
          name: nginx-conf

This deployment spec:

- Defines a deployment named 'tut-deployment' (metadata.name) and assigns the label 'tut' to all pods of this deployment (metadata.labels.app).
- Sets the desired state of the deployment to 2 replicas (spec.replicas).
- Defines two containers: 'ghost', which uses the ghost image from the Docker repository, and 'nginx', which uses the nginx image.
- Opens container port 80 for the 'nginx' container (spec.containers.ports.containerPort).
- Creates a volume 'proxy-config' of the configMap type, which gives containers access to the ConfigMap resource 'nginx-conf' created in the previous step.
- Mounts the 'proxy-config' volume to the path /etc/nginx/nginx.conf to enable the container's access to the NGINX configuration.

To create this deployment, save the above manifest in the tut-deployment.yaml file and run the following command:

kubectl create -f tut-deployment.yaml

If everything is OK, you will be able to see the running deployment using kubectl get deployment tut-deployment:

NAME             DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
tut-deployment   2         2         2            0           13s

Step #3: Exposing a NodePort

Now that our Pods are running, we should expose NGINX's port 80 to the public Internet to see if the reverse proxy works. This can be done by exposing the Deployment as a Service (in the next tutorial, we are going to cover Kubernetes services in more detail):

kubectl expose deployment tut-deployment --type=NodePort --port=80
service "tut-deployment" exposed

After our deployment is exposed, we need to find the NodePort dynamically assigned to it:

kubectl describe service tut-deployment

This command will produce an output similar to this:

Name:             tut-deployment
Namespace:        default
Labels:           app=tut
Selector:         app=tut
Type:             NodePort
IP:               10.3.208.190
Port:             80/TCP
NodePort:         30234/TCP
Endpoints:        10.2.6.6:80,10.2.6.7:80
Session Affinity: None

We need the NodePort value, which is 30234 in our case. Now you can access the Ghost publishing platform through NGINX using http://YOURHOST:30234.

That's it! Now you see how containers can easily communicate via localhost using the Pod's built-in virtual network. As such, container-to-container networking is a building block of the next layer: pod-to-pod networking, discussed in the next section.

From Container-to-Container to Pod-to-Pod Communication

One of the most exciting features of Kubernetes is that pods and containers within pods can communicate with each other even if they land on different nodes. This is something that is not implemented in Docker by default (note: Docker supports multi-host connectivity as a custom solution available via the overlay driver). Before delving deeper into how Kubernetes implements pod-to-pod networking, let's first discuss how networking works at the pod level. As we remember from the previous tutorial, pods are abstractions that encapsulate containers to provide Kubernetes services like shared storage, networking interfaces, deployment, and updates to them. When Kubernetes creates a pod, it allocates an IP address to it.
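If you are curious, you can see the address Kubernetes assigned by querying the Pod object directly. A quick sketch (replace <pod-name> with one of the names returned by kubectl get pods):

# Print only the Pod's IP using a jsonpath expression
kubectl get pod <pod-name> -o jsonpath='{.status.podIP}'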
This IP is shared by all containers in that pod and allows them to communicate with each other using localhost (as we saw in the example above). This is known as the "IP-per-Pod" model. It is an extremely convenient model where pods can be treated much like physical hosts or VMs from the standpoint of port allocation, service discovery, load balancing, migration, and more.

So far, so good! But what if we want our pods to be able to communicate across nodes? This becomes a little more complicated. Referring to the example above, let's assume that we now have two nodes hosting two containers each. All these containers are connected using docker0 bridges and have shared veth0 network interfaces. However, on both nodes the Docker bridge (docker0) and the virtual ethernet interface (veth0) are now likely to have the same IP address because they were both created by the same default Docker function. Even if the veth IPs are different, we still face the problem of an individual node being unaware of the private network address space created on another node, which makes it difficult to reach pods on it.

How Does Kubernetes Solve this Problem?

Let's see how Kubernetes elegantly solves this problem. As we see in the image below, veth0, the custom bridge, eth0, and a gateway that connects the two nodes are now parts of a shared private network namespace centered around the gateway (10.100.0.1). This configuration implies that Kubernetes has somehow managed to create a separate network that covers two nodes. You may also notice that addresses are now assigned to bridges depending on what node a bridge lives on. So, for example, we now have a 10.0.1.x address space shared by the custom bridge and veth0 on Node 1 and a 10.0.2.x address space shared by the same components on Node 2. At the same time, eth0 on both nodes shares the address space of the common gateway, which allows both nodes to communicate (the 10.100.0.0 address space).

The design of this network is similar to an overlay network. (In a nutshell, an overlay network is a network built on top of another, lower-level network. For example, the Internet was originally built as an overlay over the telephone network.) A pod network in Kubernetes is an example of an overlay network that takes the individual private networks within each node and transforms them into a new software-defined network (SDN) with a shared namespace, which allows pods to communicate across nodes. That's how the Kubernetes magic works!

Kubernetes ships with this model by default, but there are several networking solutions that achieve the same result. Remember that any network implementation that violates the Kubernetes networking principles (mentioned above) will not work with Kubernetes. Some of the most popular networking implementations supported by Kubernetes are the following:

- Cisco Application Centric Infrastructure -- an integrated overlay and underlay SDN solution with support for containers, virtual machines, and bare metal servers.
- Cilium -- open source software for container applications with a strong security model.
- Flannel -- a simple overlay network that satisfies all Kubernetes requirements while being one of the easiest to install and run.

For more available networking solutions, see the official Kubernetes documentation.
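A quick way to convince yourself that this works on a real cluster is to compare pod IPs with the nodes the pods landed on, and then curl one pod from another. The pod names and IP below are placeholders, and the sketch assumes the source pod's image ships curl:

# List pod IPs together with the nodes they run on
kubectl get pods -o wide
# From a pod on one node, reach a pod on another node directly by its IP -- no NAT involved
kubectl exec <pod-on-node-1> -- curl -s <pod-ip-on-node-2>:80

Conclusion

In this article, we covered two basic components of the Kubernetes networking architecture: container-to-container networking and pod-to-pod networking.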
We have seen that Kubernetes uses overlay networking to create a flat network structure where containers and pods can communicate with each other across nodes. All routing rules and IP namespaces are managed by Kubernetes by default, so there is no need to bother with creating subnets and using dynamic port allocation. In fact, there are several out-of-the-box overlay network implementations to get you started. Kubernetes networking enables an easy migration of applications from VMs to pods, which can be treated as "virtual hosts" with the functionality of VMs but with the added benefits of container isolation and a microservices architecture. In our following tutorial, we discuss the next layer of Kubernetes networking: services, which are abstractions that implement microservices and service discovery for pods, enabling highly available applications accessible from outside a Kubernetes cluster.
Posted over 7 years ago by Kirill
In the first part of the Kubernetes networking series, we discussed the first two building blocks of Kubernetes networking: container-to-container communication and pod-to-pod communication. As you might recall, Kubernetes uses a flat networking architecture that allows containers and pods to communicate with each other across nodes without complex routing rules and NAT. However, we covered only the scenario of communication between two containers and pods and did not look into how a set of pods can be accessed both within a cluster and outside of it. This scenario is very important because in a Kubernetes cluster we usually deal with ReplicaSets of pods that represent multiple application instances (backends) serving some frontend apps or users. Pods in these ReplicaSets have a finite life cycle: they are perishable and non-renewable. Given these constraints, we must understand how to turn individual pods into bona fide microservices, load-balanced and exposed to other pods or users.

That's why, in this part of our Kubernetes networking series, we move on to Kubernetes services, which are one of the best features of the platform. We discuss how services work under the hood and how they can be created using Kubernetes native tools. By the end of this article, you'll have a better understanding of how to turn your pods into fully operational microservices capable of working at any scale.

Pod-to-Pod Communication without Services

As we remember, pods are ephemeral and mortal entities in Kubernetes. If a pod dies, it is not re-scheduled to a new node. Rather, Kubernetes may create an identical pod with the same name if needed, but with a different IP and UID. We cannot rely on pods' IPs because they are perishable. To illustrate this problem, let's imagine that we have two pods: one is a single-container pod running a simple Python server from the Docker repository, and the other is a client pod that just sends GET requests to the first pod using its IP address. Let's first define the deployment object for the server pods:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: service-tut-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: service-tut-pod
  template:
    metadata:
      labels:
        app: service-tut-pod
    spec:
      containers:
      - name: python-http-server
        image: python:2.7
        command: ["/bin/bash"]
        args: ["-c", "echo \"Hello from $(hostname)\" > index.html; python -m SimpleHTTPServer 80"]
        ports:
        - name: http
          containerPort: 80

This spec:

- Creates a deployment titled "service-tut-deployment" that is going to run two replicas of our simple HTTP server (spec.replicas).
- Creates a "service-tut-pod" selector for all pods running in the deployment.
- Pulls the python:2.7 image from the Docker repository and creates the index.html file for the simple Python HTTP server, which will respond "Hello from $(hostname)" to GET requests.
- Creates a named port "http" and opens containerPort 80 for it (spec.containers.ports.containerPort).
Save the deployment object into service-tut-deployment.yaml and run the following command to create it:

kubectl create -f service-tut-deployment.yaml

Now, we can use kubectl get pods to see the pods created by the Deployment:

NAME                                      READY     STATUS    RESTARTS   AGE
service-tut-deployment-1108534429-7p9ps   1/1       Running   0          26s
service-tut-deployment-1108534429-lqnnj   1/1       Running   0          26s

For our client pod (not yet defined) to be able to communicate with these pods, we should find the IP address of at least one of them. This can be done like this:

kubectl describe pod service-tut-deployment-1108534429-7p9ps

Here we are asking kubectl to describe one of the running pods, which sends the following output to the terminal:

Name:       service-tut-deployment-1108534429-7p9ps
Namespace:  default
Node:       ip-172-20-0-50.ap-southeast-2.compute.internal/172.20.0.50
Start Time: Mon, 21 May 2018 09:15:42 +0000
Labels:     app=service-tut-pod
            pod-template-hash=1108534429
Status:     Running
IP:         10.2.6.7

The last line is the pod's IP we need: 10.2.6.7. So far, so good! Next, we need to define a very simple client pod whose only purpose is to send HTTP requests to the pod's IP retrieved above. We are going to use a curl container from the Docker repository for that purpose.

apiVersion: v1
kind: Pod
metadata:
  name: service-tut-client
spec:
  containers:
  - name: curl
    image: appropriate/curl
    command: ["/bin/sh"]
    args: ["-c", "curl 10.2.6.7:80"]

As you see, this pod will curl one of our pods on its IP address and port 80. Save the pod object in client-pod.yaml and run the following command to create the client pod:

kubectl create -f client-pod.yaml

This pod will run the curl GET request to the specified backend Pod IP (10.2.6.7:80) and terminate once the operation is completed. We can find the server's response in the 'service-tut-client' pod logs like this:

kubectl logs service-tut-client

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    58  100    58    0     0  58000      0 --:--:-- --:--:-- --:--:-- 58000
Hello from service-tut-deployment-1108534429-7p9ps

As you see, our Python server responded with "Hello from service-tut-deployment-1108534429-7p9ps". However, our client pod is firmly attached to one of the backend pods. In essence, having a frontend pod with a hard-coded backend IP is a limited approach (see the image below). As we remember, pods are ephemeral entities, and this leads to a problem: frontends interacting with a set of backend pods have no way to track their IP addresses (which might change). For example, if our pod is terminated for some reason (as a best-effort pod, it is one of the main targets for termination in case of memory and CPU shortage), it will be replaced by a pod with a new IP. Correspondingly, a client using the previous IP is likely to break since there are no longer pods using this IP.

A Kubernetes service is the solution to this problem. Formally speaking, a Kubernetes service is a REST object that defines a logical set of pods and a policy for accessing them (e.g., a microservice). A service targets a set of pods using their label selector. With services, frontend clients need not worry about which backend pod they are actually accessing. Similarly to pods, services can be created using YAML manifests posted to the Kubernetes API server. Let's see how it works in the example below.

Linking a Deployment to a Service

We are going to link our pods to a service to decouple the frontend pods from the backend pods.
A simple service object is enough for our purpose:

kind: Service
apiVersion: v1
metadata:
  name: tut-service
spec:
  selector:
    app: service-tut-pod
  ports:
  - protocol: TCP
    port: 4000
    targetPort: http

This service spec:

- Creates a service named "tut-service" (metadata.name) and assigns a logical set of pods labeled 'service-tut-pod' to it. Now, all pods labeled 'service-tut-pod' can be accessed through the service. A service that targets pods based on their label selector is called a service with a selector (this is the type used in our manifest). Besides this, Kubernetes supports services without a selector, which are useful when you need to point your service to another cluster or namespace, or if you are migrating workloads to Kubernetes and some of your backends are outside of the Kubernetes cluster.
- Specifies the TCP protocol for the service's ports (spec.ports.protocol). (Note: Kubernetes Services support both TCP and UDP protocols.)
- Opens the service's port 4000 (spec.ports.port) and sets the field spec.ports.targetPort to "http". This field specifies the port on the backend pods to which our service will forward client requests. Note that Kubernetes supports named ports; thus targetPort: http in fact refers to port 80, defined for the backend pods in our 'service-tut-deployment' deployment. Named ports are very flexible and convenient to use. For example, you can change the port number in the next version of your backend software without breaking clients.

In addition to named ports, Kubernetes supports multi-port services that can expose more than one port. In this case, all ports should be named:

ports:
- name: http
  protocol: TCP
  port: 80
  targetPort: 2941
- name: https
  protocol: TCP
  port: 443
  targetPort: 2941

That's it! We've described our service object in detail. This service manifest looks quite simple, but a lot of things are happening under the hood. Before delving deeper into these details, let's save our service spec in tut-service.yaml and create the service. (Note: the backend pods should already be deployed so that the Service has endpoints to route traffic to.)

kubectl create -f tut-service.yaml
service "tut-service" created

Now, let's see the detailed description of our new service:

kubectl describe service tut-service

Name:             tut-service
Namespace:        default
Labels:
Selector:         app=service-tut-pod
Type:             ClusterIP
IP:               10.3.28.57
Port:             4000/TCP
Endpoints:        10.2.6.6:80,10.2.6.7:80
Session Affinity: None

The console output provides useful details:

- Selector: the label selector of the pods behind the service (service-tut-pod).
- Type: the service type, which is ClusterIP -- the default service type that allows client pods to access the service only if they run in the same cluster as the service's pods. Other available service types include NodePort, LoadBalancer, and some others to be discussed in the next tutorials.
- IP: the Virtual IP (VIP) of the service. The kube-proxy running on each node is responsible for implementing VIPs for services of any type other than ExternalName.
- Port: the service's port (4000).
- Endpoints: the IPs of all service pods.

If we were to create a service without a selector, no Endpoints object would be created, and we would have to define it manually, pointing the service to our backend similar to this:

kind: Endpoints
apiVersion: v1
metadata:
  name: tut-service
subsets:
- addresses:
  - ip: 1.2.3.4
  ports:
  - port: 9376
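For services with a selector, by contrast, Kubernetes creates and continuously updates the Endpoints object for us, and you can inspect it at any time:

# Show the backend addresses currently behind the service
kubectl get endpoints tut-service

However, let's go back to our 'tut-service' service. We are now ready to refer to it in our client pod to access the underlying backend pods.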
But first we need to make some changes to the client pod spec:

apiVersion: v1
kind: Pod
metadata:
  name: service-tut-client
spec:
  containers:
  - name: curl
    image: appropriate/curl
    command: ["/bin/sh"]
    args: ["-c", "curl tut-service:4000"]

As you see, we are now curling tut-service:4000 instead of the backend pod's IP as in the previous example. We can use the DNS name of the service ("tut-service") because our Kubernetes cluster uses the Kube-DNS add-on, which watches the Kubernetes API for new services and creates DNS records for each of them. If Kube-DNS is enabled across your cluster, all pods can perform name resolution of services automatically. However, you can certainly continue to use the ClusterIP of your service. To update our client pod with these new settings, run the following command:

kubectl apply -f client-pod.yaml

When you now look into the logs of the client pod, you'll see that responses from the backend service pods are split roughly 50/50, which suggests that the service acts as a load balancer using a round-robin algorithm or a random selection of pods. Now, instead of addressing a specific backend pod, the client pod can send its requests to the service, which will distribute them between the backend pods (see the image below). That's it! We don't have to use the pods' IPs anymore and can let the service load balance between regularly updated backend endpoints. However, how does Kubernetes actually implement this magic? Let's delve deeper into this question.

How Do Kubernetes Services Work?

To answer this question, let's go back to our service's description. The first thing that stands out is that our service uses an IP address space different from the one used by the backend pods to which it refers (the ClusterIP is from the 10.3.0.0 address space, whereas both pods belong to the 10.2.0.0 address space).

kubectl describe service tut-service

Name:             tut-service
Namespace:        default
Labels:
Selector:         app=service-tut-pod
Type:             ClusterIP
IP:               10.3.28.57
Port:             4000/TCP
Endpoints:        10.2.6.6:80,10.2.6.7:80
Session Affinity: None

This observation implies that services and their corresponding pods land on different networks. How, then, are they able to communicate? And how do services decide which pods to send client requests to? It turns out that services are assigned Virtual IPs (VIPs) implemented and managed by kube-proxy -- a network proxy running on each node, whose task is to reflect the services defined in the Kubernetes API and to execute simple TCP or UDP stream forwarding or round-robin TCP/UDP forwarding across a set of backends. In brief, kube-proxy is responsible for redirecting requests from the service VIP to the IPs of backend pods using packet rules. The proxy can act in several modes that differ in how they forward packets from clients to the backends. These three modes are userspace, iptables, and ipvs. Let's briefly discuss how they work.

Userspace Mode

This proxy mode is called "userspace" because kube-proxy performs most of its work in the OS userspace. When the backend service is created, the proxy opens a randomly chosen port on the local node and installs iptables rules (iptables allows defining chains of rules for managing network packets). These rules capture the traffic to the service's clusterIP and port and redirect it to the proxy's local port.
When the client (e.g., the 'service-tut-client' discussed above) connects to the service, the iptables rule kicks in and redirects the request packets to the service proxy's own port. The proxy then picks a backend pod, by default using a round-robin algorithm, and starts forwarding traffic to it. This mechanism implies that the service owner can use any port without worrying about possible port collisions. Regardless of this benefit, the userspace mode is the slowest one because kube-proxy has to frequently switch between userspace and kernel space. As a result, this mode works only at small and medium scales.

iptables Mode

In this mode, kube-proxy directly installs iptables rules that redirect traffic from the VIP to per-service rules. In their turn, these per-service rules are linked to per-endpoint rules that redirect to the backend pods. Backends are selected either randomly or based on session affinity. It is noteworthy that in the iptables mode packets are never copied to userspace, and kube-proxy does not need to be running for the VIP to work. Thus, this mode is much faster than the userspace mode because there is no need to switch back and forth between userspace and kernel space. On the downside, this mode depends on having working readiness probes because it cannot automatically try another pod if the one it selects fails to respond.

ipvs Mode

In this mode, kube-proxy directly interacts with netlink at the kernel level. Netlink is a Linux kernel interface used for inter-process communication (IPC) between the kernel and userspace processes, as well as between userspace processes. The mode works as follows: first, the proxy creates IPVS rules and syncs them with Kubernetes services and endpoints; these rules are then used to redirect traffic to one of the backend pods. IPVS is a new generation of packet rules that uses kernel hash tables and works entirely in kernel space, which means that ipvs is faster than both the iptables and userspace modes. It also offers more options for load balancing, including such algorithms as round-robin, destination hashing, least connection, never queue, and more. Note: the ipvs mode is a beta feature added in Kubernetes v1.9, and it requires the IPVS kernel modules to be installed on the node.
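If you are curious what kube-proxy actually programs in the iptables mode, you can peek at the NAT table on a cluster node. This is just an exploratory sketch: it assumes shell access to the node, and the generated chain names will differ from cluster to cluster.

# kube-proxy tags its rules with the service name, so grep can find them
sudo iptables-save -t nat | grep tut-service

Conclusion

In this article, we discussed how Kubernetes services enable the decoupling of frontend clients and backend pods. Working closely with kube-proxy, services redirect requests from their VIPs to labeled backend pods, so clients need not bother about which pods they actually access. This abstraction allows for the independent development of the frontend and backend components of your application, which is a cornerstone of the microservices architecture and of Kubernetes. This part of our Kubernetes networking series was largely devoted to the ClusterIP service type, which makes backend pods addressable only by clients running in the same cluster. In the next article of the series, we will discuss how to publish your Kubernetes services and expose them externally using the NodePort and LoadBalancer service types. We'll also discuss how you can easily create services of the LoadBalancer type using the Supergiant Kubernetes-as-a-Service platform.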
Posted over 7 years ago by Kirill
In the first part of the Kubernetes networking series, we previously discussed the first two building blocks of the Kubernetes networking: container-to-container communication and pod-to-pod communication. As you might recall ... [More] , Kubernetes uses a flat networking architecture that allows containers and pods to communicate with each other across nodes without using complex routing rules and NAT. However, we discussed only a scenario of communication between two containers and pods and did not look into how a set of pods can be accessed both within a cluster and outside of it. However, this scenario is very important because in a Kubernetes cluster we are usually dealing with the ReplicaSets of pods that represent multiple application instances (backends) serving some frontend apps or users. Pods in these ReplicaSets may have a finite life cycle, be perishable, and non-renewable. Given these constraints, we must understand how to turn individual pods into bona fide microservices, load-balanced and exposed to other pods or users. That's why, in this part of our Kubernetes networking series, we are moving to the discussion of Kubernetes services, which are one of the best features of the platform. We discuss how services work under the hood and how they can be created using Kubernetes native tools. By the end of this article, you'll have a better understanding of how to turn your pods into fully operational microservices capable of working at any scale. Pod-to-Pod Communication without Services As we remember, pods are ephemeral and mortal entities in Kubernetes. If the pod dies, it is not re-scheduled to a new node. Rather, Kubernetes may create an identical pod with the same name if needed but with a different IP and UID. We cannot rely on pods' IPs because they are perishable.  To illustrate this problem, let's imagine that we have two pods, one of which is a single-container pod running a simple Python server from the Docker repository and the other is a client pod that just sends GET requests to the first pod using its IP address. Let's first define the deployment object for the server pods: apiVersion: extensions/v1beta1 kind: Deployment metadata: name: service-tut-deployment spec: replicas: 2 selector: matchLabels: app: service-tut-pod template: metadata: labels: app: service-tut-pod spec: containers: - name: python-http-server image: python:2.7 command: ["/bin/bash"] args: ["-c", "echo \" Hello from $(hostname)\" > index.html; python -m SimpleHTTPServer 80"] ports: - name: http containerPort: 80 This spec: Creates the deployment titled "service-tut-deployment" that is going to run two replicas of our simple HTTP server (spec.replicas). Creates a "service-tut-pod" selector for all pods running in the deployment. Pulls the Python container from the Docker repo and creates the index.html file for the simple Python HTTP server that will respond "Hello from $(hostname)" to GET requests. Creates a named port "http" and opens containerPort:80 for it (spec.containers.ports.containerPort). 
Save the deployment object into service-tut-deployment.yaml , and run the following command to create it: kubectl create -f service-tut-deployment.yaml Now, we can use kubectl get pods to see the pods credted by the Deployment: NAME READY STATUS RESTARTS AGE service-tut-deployment-1108534429-7p9ps 1/1 Running 0 26s service-tut-deployment-1108534429-lqnnj 1/1 Running 0 26s For our client pod (not yet defined) to be able to communicate with these pods, we should find the IP address at least of one of them. This can be done like this: kubectl describe pod service-tut-deployment-1108534429-7p9ps As you see, we are asking kubectl to describe one of the running pods, which sends the following output to the terminal: Name: service-tut-deployment-1108534429-7p9ps Namespace: default Node: ip-172-20-0-50.ap-southeast-2.compute.internal/172.20.0.50 Start Time: Mon, 21 May 2018 09:15:42 +0000 Labels: app=service-tut-pod pod-template-hash=1108534429 Status: Running IP: 10.2.6.7 The last line is the pod's IP we need: 10.2.6.7. So far, so good! Next, we need to define a pretty dumb client pod whose only purpose is to send HTTP requests to the pod's IP retrieved above. We are going to use a curl container retrieved from the Docker repository for that purpose. apiVersion: v1 kind: Pod metadata: name: service-tut-client spec: containers: - name: curl image: appropriate/curl command: ["/bin/sh"] args: ["-c","curl 10.2.6.7:80 "] As you see, this pod will be curling one of our pods on its IP and 80 port. Save the pod object in client-pod.yaml , and run the following command to create the client pod: kubectl create -f client-pod.yaml This pod will run the curl GET request to the specified backend Pod IP (10.2.6.7:80) and terminate once the operation is completed. We can find the server's response in 'service-tut-client' pod logs like this: kubectl logs service-tut-client % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 100 58 100 58 0 0 58000 0 --:--:-- --:--:-- --:--:-- 58000 Hello from service-tut-deployment-1108534429-7p9ps As you see, our Python server responded with "Hello from service-tut-deployment-1108534429". However, our client pod is firmly attached to one of the backend pods. In essence, having a frontend pod with the hard-coded backend IP is a limited approach (see the image below). As we remember, pods are ephemeral entities, and this leads to a problem: frontends interacting with a set of backend pods don't have a way to track their IP addresses (which might change). For example, if our pod is terminated for some reason (for example, as a best-effort pod, it's one of the main targets for termination in case of memory and CPU shortage), it will be replaced by the pod with a new IP. Correspondingly, the client using the previous IP is likely to break since there are no longer pods using this IP. A Kubernetes service is a solution to this problem. Formally speaking, a Kubernetes service is a REST object that defines a logical set of pods and some policy for accessing them (e.g., microservice). A service targets a set of pods using their label selector. Using services, frontend clients may not worry about which backend pod they are actually accessing. Similarly to pods, services can be created using YAML manifests and posted to the Kubernetes API server. Let's see how it works in the example below. Linking Deployment to a Service We are going to link our pods to a service to decouple frontend pods from the backend pods. 
A simple service object is enough for our purpose. kind: Service apiVersion: v1 metadata: name: tut-service spec: selector: app: service-tut-pod ports: - protocol: TCP port: 4000 targetPort: http This service spec: Creates a service named "tut-service" (metadata.name) and assigns a logical set of pods labeled 'service-tut-pod' to it. Now, all pods labeled 'service-tut-pod' can be accessed by the service. A spods based on their label selector is called a service with a selector (this type is used in our service manifest). Besides this, Kubernetes supports services without a selector, which are useful when you need to point your service to another cluster or namespace or if you are migrating workloads to Kubernetes and some of your backends are outside of the Kubernetes cluster. Specifies a TCP protocol for the service's ports (spec.ports.protocol) (Note: Kubernetes Services support both TCP and UDP protocols.) Opens the service's port:4000 (spec.ports.port) and sets the field spec.ports.targetPort to "http". This field specifies the port on backend pods to which our service will forward client requests. Note, that Kubernetes supports named ports. Thus, targetPort:http , in fact, refers to the port:80 defined in our 'service-tut-deployment' deployment for backend pods. Named ports are very flexible and convenient to use. For example, you can change the port number in the next version of the backend software without breaking the clients. In addition to named ports, Kubernetes supports multi-port services that can expose more than one port. In this case, all ports should be named: ports: - name: http protocol: TCP port: 80 targetPort: 2941 - name: https protocol: TCP port: 443 targetPort: 2941 That's it! We've described our service object in detail. This service manifest looks quite simple, but, a lot of things are happening under the hood. Before delving deeper into these details, let's save our service spec in the tut-service.yaml and create a service. (Note: Kubernetes requires pods referenced by the service to be deployed in advance of the Service.) kubectl create -f tut-service.yaml service "tut-service" created Now, let's see the detailed description of our new service: kubectl describe service tut-service Name: tut-service Namespace: default Labels: Selector: app=service-tut-pod Type: ClusterIP IP: 10.3.28.57 Port: 4000/TCP Endpoints: 10.2.6.6:80,10.2.6.7:80 Session Affinity: None The console output provides useful details: Selector: a label selector of pods sharing the service (service-tut-pod). Type: a service type which is ClusterIP, a default service type that allows client pods to access a service only if they are running in the same cluster as the service pods. Other available service types include NodePort, Load Balancer, and some others to be discussed in the next tutorials. IP: a Virtual IP (VIP) of a service. A kube-proxy running on a node is responsible for assigning VIPs for the services of a type other than ExternalName. Port: a service's port (4000). Endpoints: The IPs of all service pods. If we were to create a service without a selector, no endpoints objects would be created, so we would have to define them manually pointing the service to our backend similar to this: kind: Endpoints apiVersion: v1 metadata: name: tut-service subsets: - addresses: - ip: 1.2.3.4 ports: - port: 9376 However, let's go back to our 'tut-service' service. We are now ready to refer to it in our client pod to access the underlying backend pods. 
That's it! We've described our service object in detail. The service manifest looks quite simple, but a lot of things are happening under the hood. Before delving deeper into those details, let's save our service spec in the tut-service.yaml file and create the service. (Note: the backend pods referenced by the service are already running at this point, so the service gets its endpoints immediately.)

kubectl create -f tut-service.yaml
service "tut-service" created

Now, let's see the detailed description of our new service:

kubectl describe service tut-service

Name:              tut-service
Namespace:         default
Labels:            
Selector:          app=service-tut-pod
Type:              ClusterIP
IP:                10.3.28.57
Port:              4000/TCP
Endpoints:         10.2.6.6:80,10.2.6.7:80
Session Affinity:  None

The console output provides useful details:

Selector: the label selector of the pods backing the service (service-tut-pod).
Type: the service type, which is ClusterIP, the default type that makes a service reachable only by clients running in the same cluster as the service's pods. Other available service types include NodePort, LoadBalancer, and some others to be discussed in the next tutorials.
IP: the Virtual IP (VIP) of the service. A kube-proxy running on each node is responsible for assigning VIPs to services of any type other than ExternalName.
Port: the service's port (4000).
Endpoints: the IPs of all the service's pods.

If we were to create a service without a selector, no Endpoints object would be created automatically, so we would have to define one manually, pointing the service to our backend, similar to this:

kind: Endpoints
apiVersion: v1
metadata:
  name: tut-service
subsets:
- addresses:
  - ip: 1.2.3.4
  ports:
  - port: 9376

However, let's go back to our 'tut-service' service. We are now ready to refer to it in our client pod to access the underlying backend pods. But first we need to make some changes to the client pod spec:

apiVersion: v1
kind: Pod
metadata:
  name: service-tut-client
spec:
  containers:
  - name: curl
    image: appropriate/curl
    command: ["/bin/sh"]
    args: ["-c","curl tut-service:4000"]

As you see, we are now curling tut-service:4000 instead of the backend pod's IP as in the previous example. We can use the DNS name of the service ("tut-service") because our Kubernetes cluster runs the Kube-DNS add-on, which watches the Kubernetes API for new services and creates DNS records for each of them. If Kube-DNS is enabled across your cluster, all pods can perform name resolution of services automatically. (You can certainly continue to use the ClusterIP of your service instead.)

Since most fields of a running pod cannot be updated in place, recreate the client pod from the updated manifest:

kubectl delete pod service-tut-client
kubectl create -f client-pod.yaml

If you recreate the client pod several times and inspect its logs, you'll see that responses from the backend pods are split approximately 50/50, which shows that the service acts as a load balancer using round-robin or random selection of pods. Now, instead of addressing a specific backend pod, the client pod sends its requests to the service, which distributes them between the backend pods. That's it! As you see, we don't have to use the pods' IPs anymore and can let the service load-balance between regularly updated backend endpoints.
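If you'd like to observe the distribution without recreating the pod over and over, one option (a quick sketch, not part of the original setup; the pod name here is hypothetical) is a client pod that curls the service ten times in a single run:

apiVersion: v1
kind: Pod
metadata:
  name: service-tut-client-loop
spec:
  restartPolicy: Never
  containers:
  - name: curl
    image: appropriate/curl
    command: ["/bin/sh"]
    # Loop ten times; -s silences curl's progress meter so the logs show only responses.
    args: ["-c","for i in $(seq 1 10); do curl -s tut-service:4000; done"]

After the pod completes, kubectl logs service-tut-client-loop should show a mix of "Hello from ..." responses coming from both backend pods.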
However, how does Kubernetes actually implement this magic? Let's delve deeper into this question.

How Do Kubernetes Services Work?

To answer this question, let's go back to our service's description. The first thing that stands out is that the service uses an IP address space different from the one used by the backend pods it refers to: the ClusterIP comes from the 10.3.0.0 address space, whereas both pods belong to the 10.2.0.0 address space.

kubectl describe service tut-service

Name:              tut-service
Namespace:         default
Labels:            
Selector:          app=service-tut-pod
Type:              ClusterIP
IP:                10.3.28.57
Port:              4000/TCP
Endpoints:         10.2.6.6:80,10.2.6.7:80
Session Affinity:  None

This observation implies that services and their corresponding pods land on different networks. How, then, are they able to communicate? And how do services decide which pods to send client requests to?

It turns out that services are assigned Virtual IPs (VIPs) created and managed by kube-proxy, a network proxy running on each node whose task is to reflect the services defined in the Kubernetes API and to perform simple TCP or UDP stream forwarding or round-robin TCP/UDP forwarding across a set of backends. In brief, kube-proxy is responsible for redirecting requests from the service's VIP to the IPs of the backend pods using packet rules. The proxy can act in several modes that differ in how they forward packets from clients to the backends. These three modes are userspace, iptables, and ipvs. Let's briefly discuss how they work.

Userspace Mode

This proxy mode is called "userspace" because kube-proxy performs most of its work in the OS userspace. When a service is created, the proxy opens a randomly chosen port on the local node and installs iptables rules (iptables allows defining chains of rules for the management of network packets). These rules capture the traffic to the service's clusterIP and port and redirect it to the proxy's local port. When a client (e.g., the 'service-tut-client' discussed above) connects to the service, the iptables rules kick in and redirect the request packets to the proxy's own port; the proxy then picks a backend pod, by default using a round-robin algorithm, and starts forwarding the traffic to it. This mechanism implies that the service owner can use any port without worrying about possible port collisions. Despite this benefit, the userspace mode is the slowest one, because kube-proxy has to frequently switch between userspace and kernel space. As a result, this mode works only at small and medium scale.

iptables Mode

In this mode, kube-proxy directly installs iptables rules that redirect the traffic from the VIP to per-service rules; these per-service rules, in turn, are linked to per-endpoint rules that redirect the traffic to the backend pods. Backends are selected either randomly or based on session affinity. It is noteworthy that in the iptables mode packets are never copied to userspace, and kube-proxy does not even need to be running for the VIP to work. Thus, this mode is much faster than the userspace mode, because there is no need to switch back and forth between userspace and kernel space. On the downside, the mode depends on having working readiness probes, because it cannot automatically try another pod if the one it selects fails to respond.

ipvs Mode

In this mode, kube-proxy interacts with netlink at the kernel level. Netlink is a Linux kernel interface used for inter-process communication (IPC) between the kernel and userspace processes, as well as between userspace processes themselves. The mode works as follows: first, the proxy creates IPVS rules and syncs them with the Kubernetes services and endpoints; these rules are then used to redirect the traffic to one of the backend pods. IPVS is a new generation of packet rules that use kernel hash tables and work entirely in the kernel space, which makes ipvs faster than both the iptables and userspace modes. This mode also offers more options for load balancing, including such algorithms as round-robin, destination hashing, least connection, never queue, and more.

Note: The ipvs mode is a beta feature added in Kubernetes v1.9, and it requires the IPVS kernel modules to be installed on the node.

Conclusion

In this article, we discussed how Kubernetes services enable the decoupling of frontend clients and backend pods. Working closely with kube-proxy, services redirect requests from their VIPs to the labeled backend pods, so clients need not bother about which pods they actually access. This abstraction allows for the independent development of the frontend and backend components of your application, which is a cornerstone of the microservices architecture and of Kubernetes.

This part of our Kubernetes networking series was largely devoted to the ClusterIP service type, which makes backend pods addressable only by clients running in the same cluster. In the next article of the series, we will discuss how to publish your Kubernetes services and expose them externally using the NodePort and LoadBalancer service types. We'll also discuss how you can easily create services of the LoadBalancer type using the Supergiant Kubernetes-as-a-Service platform.