Introduction
When running multiple services and applications on a Kubernetes cluster, a centralized cluster-level logging stack can help you quickly sort and analyze the large volume of log data produced by your Pods. A popular centralized logging solution is the Elasticsearch, Fluentd and Kibana (EFK) stack.
Elasticsearch is a distributed, scalable, real-time search engine that enables structured, full-text search and analytics. It is commonly used to index and search through large volumes of log data, but it can also be used to search many different types of documents.
Elasticsearch is commonly deployed alongside Kibana, a powerful frontend and data visualization dashboard for Elasticsearch. Kibana allows you to explore your Elasticsearch log data through a web interface and create dashboards and queries to quickly answer questions and gain insights about your Kubernetes applications.
In this tutorial, we’ll use Fluentd to collect, transform, and send log data to the Elasticsearch backend. Fluentd is a popular open source data collector that we will configure on our Kubernetes nodes to track container log files, filter and transform log data, and deliver it to the Elasticsearch cluster, where it will be indexed and stored.
We’ll start by setting up and launching a scalable Elasticsearch cluster, and then build the Kibana Kubernetes service and deployment. To conclude, we’ll configure Fluentd as a DaemonSet to run on each Kubernetes worker node.
If you’re looking for a managed Kubernetes hosting service, check out our simple, managed Kubernetes service built for growth.
Prerequisites
Before you begin this guide, make sure you have the following available:
- A Kubernetes 1.10+ cluster with role-based access control (RBAC) enabled. Make sure your cluster has enough resources available to roll out the EFK stack; if not, scale your cluster by adding worker nodes. We’ll deploy a 3-Pod Elasticsearch cluster (you can scale this down to 1 if necessary), as well as a single Kibana Pod. Every worker node will also run a Fluentd Pod. The cluster in this guide consists of 3 worker nodes and a managed control plane.
- The kubectl command-line tool installed on your local machine, configured to connect to your cluster. You can read more about installing kubectl in the official documentation.
Once you’ve set up these components, you’re ready to get started with this guide.
Step 1 — Creating a Namespace
Before we roll out an Elasticsearch cluster, we’ll first create a namespace into which we’ll install all of our logging instrumentation. Kubernetes lets you separate objects running in your cluster using a “virtual cluster” abstraction called Namespaces. In this guide, we’ll create a kube-logging namespace into which we’ll install the EFK stack components. This namespace will also allow us to quickly clean up and remove the logging stack without any loss of function to the Kubernetes cluster.
To get started, first inspect the existing namespaces in your cluster using kubectl:
- kubectl get namespaces
You should see the following three initial namespaces, which come preinstalled with your Kubernetes cluster:
Output
NAME          STATUS    AGE
default       Active    5m
kube-system   Active    5m
kube-public   Active    5m
The default namespace houses objects that are created without specifying a namespace. The kube-system namespace contains objects created and used by the Kubernetes system, such as kube-dns, kube-proxy, and kubernetes-dashboard. We recommend keeping this namespace clean and not contaminating it with your application and instrumentation workloads.
The kube-public namespace is another automatically created namespace that can be used to store objects that you would like to be readable and accessible across the cluster, even to unauthenticated users.
To create the kube-logging namespace, first open and edit a file named kube-logging.yaml using your favorite editor, such as nano:
- nano kube-logging.yaml
Inside your editor, paste the following namespace object YAML:
kind: Namespace
apiVersion: v1
metadata:
  name: kube-logging
Then, save and close the file.
Here, we specify the type of the Kubernetes object as a Namespace object. For more information about namespace objects, see the namespace tutorial in the official Kubernetes documentation. We also specify the version of the Kubernetes API used to create the object (v1) and give it a name, kube-logging.
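As an optional aside (a quick sketch, not required for this guide), the same namespace could also be created imperatively with kubectl; we use a manifest file here so that the object is easy to version and re-create:
# Imperative alternative to the kube-logging.yaml manifest shown above
- kubectl create namespace kube-logging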
Once you’ve created the kube-logging.yaml namespace object file, create the namespace using kubectl create with the -f filename flag:
- kubectl create -f kube-logging.yaml
You should see the following output:
Output
namespace/kube-logging created
You can then confirm that the namespace was successfully created:
- kubectl get namespaces
At this point, you should see the new kube-logging namespace:
Output
NAME           STATUS    AGE
default        Active    23m
kube-logging   Active    1m
kube-public    Active    23m
kube-system    Active    23m
We can now deploy an Elasticsearch cluster into this isolated logging namespace.
Step 2 — Creating the Elasticsearch StatefulSet
Now that we’ve created a namespace to house our logging stack, we can begin rolling out its various components. We’ll first start by deploying a 3-node Elasticsearch cluster.
In this guide, we use 3 Elasticsearch Pods to avoid the “split brain” problem that can occur in highly available, multi-node clusters. At a high level, “split brain” arises when one or more nodes can’t communicate with the others and multiple “split” masters get elected. With 3 nodes, if one gets disconnected from the cluster temporarily, the other two nodes can elect a new master and the cluster can continue functioning while the last node attempts to rejoin. To learn more, consult A new era for cluster coordination in Elasticsearch and Voting configurations.
Creating the Headless Service
To get started, we’ll create a headless Kubernetes service called elasticsearch that will define a DNS domain for all 3 Pods. A headless service does not perform load balancing or have a static IP; for more information about headless services, see the official Kubernetes documentation.
Open a file named elasticsearch_svc.yaml using your favorite editor:
- nano elasticsearch_svc.yaml
Paste in the following Kubernetes service YAML:
kind: Service
apiVersion: v1
metadata:
  name: elasticsearch
  namespace: kube-logging
  labels:
    app: elasticsearch
spec:
  selector:
    app: elasticsearch
  clusterIP: None
  ports:
    - port: 9200
      name: rest
    - port: 9300
      name: inter-node
Then save and close the file.
We define a Service called elasticsearch in the kube-logging namespace, and give it the app: elasticsearch label. We then set the .spec.selector to app: elasticsearch so that the Service selects Pods with the app: elasticsearch label. When we associate our Elasticsearch StatefulSet with this Service, the Service will return DNS A records that point to Elasticsearch Pods with the app: elasticsearch label.
Then we configure clusterIP: None, which makes the service headless. Finally, we define the ports 9200 and 9300 that are used to interact with the REST API and for communication between nodes, respectively.
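Once the Elasticsearch Pods from Step 2 are up and running, you can optionally confirm that this headless Service publishes one DNS A record per Pod (rather than a single ClusterIP) by running a lookup from a temporary Pod inside the cluster. This is a rough sketch; busybox:1.28 is an arbitrary image choice whose nslookup tool works well against cluster DNS:
# Launch a throwaway Pod in the kube-logging namespace and resolve the headless Service name
- kubectl run -it --rm dns-test --image=busybox:1.28 --restart=Never --namespace=kube-logging -- nslookup elasticsearch
The output should list the individual Pod IP addresses behind the elasticsearch Service.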
Create the service using kubectl:
- kubectl create -f elasticsearch_svc.yaml
You should see the following output:
Output
service/elasticsearch created
Finally, verify that the service was created correctly using kubectl get:
- kubectl get services --namespace=kube-logging
You should see the following:
Output
NAME            TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)             AGE
elasticsearch   ClusterIP   None         <none>        9200/TCP,9300/TCP   26s
Now that we’ve set up our headless service and a stable .elasticsearch.kube-logging.svc.cluster.local domain for our Pods, we can go ahead and create the StatefulSet.
Creating the StatefulSet
A Kubernetes StatefulSet allows you to assign a stable identity to Pods and grant them stable, persistent storage. Elasticsearch requires stable storage to persist data across Pod rescheduling and restarts. To learn more about the StatefulSet workload, consult the StatefulSets page in the official Kubernetes documentation.
Open a file named elasticsearch_statefulset.yaml in your favorite editor:
- nano elasticsearch_statefulset.yaml
We’ll move through the definition of the StatefulSet object section by section, pasting blocks into this file.
Begin by pasting in the following block:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: es-cluster
  namespace: kube-logging
spec:
  serviceName: elasticsearch
  replicas: 3
  selector:
    matchLabels:
      app: elasticsearch
  template:
    metadata:
      labels:
        app: elasticsearch
In this block, we define a StatefulSet named es-cluster in the kube-logging namespace. We then associate it with our previously created elasticsearch service using the serviceName field. This ensures that each Pod in StatefulSet will be accessible using the following DNS address: es-cluster-[0,1,2].elasticsearch.kube-logging.svc.cluster.local, where [0,1,2] corresponds to the integer ordinal assigned to the Pod.
We specify 3 replicas (Pods) and set the matchLabels selector to app: elasticsearch, which we then mirror in the .spec.template.metadata section. The .spec.selector.matchLabels and .spec.template.metadata.labels fields must match.
Now we can move on to the object spec. Paste in the following YAML block immediately below the preceding block:
. . .
    spec:
      containers:
      - name: elasticsearch
        image: docker.elastic.co/elasticsearch/elasticsearch:7.2.0
        resources:
          limits:
            cpu: 1000m
          requests:
            cpu: 100m
        ports:
        - containerPort: 9200
          name: rest
          protocol: TCP
        - containerPort: 9300
          name: inter-node
          protocol: TCP
        volumeMounts:
        - name: data
          mountPath: /usr/share/elasticsearch/data
        env:
          - name: cluster.name
            value: k8s-logs
          - name: node.name
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          - name: discovery.seed_hosts
            value: "es-cluster-0.elasticsearch,es-cluster-1.elasticsearch,es-cluster-2.elasticsearch"
          - name: cluster.initial_master_nodes
            value: "es-cluster-0,es-cluster-1,es-cluster-2"
          - name: ES_JAVA_OPTS
            value: "-Xms512m -Xmx512m"
Here, we define the Pods in the StatefulSet. We name the containers elasticsearch and use the docker.elastic.co/elasticsearch/elasticsearch:7.2.0 Docker image. At this point, you may modify this image tag to correspond to your own internal Elasticsearch image, or a different version. Note that for the purposes of this guide, only Elasticsearch 7.2.0 has been tested.
We then use the resources field to specify that the container needs at least 0.1 vCPU guaranteed and can burst up to 1 vCPU (which limits the Pod’s resource usage when performing a large initial ingest or dealing with a load spike). You should modify these values depending on your anticipated load and available resources. To learn more about resource requests and limits, consult the official Kubernetes documentation.
We then open and name ports 9200 and 9300 for the REST API and communication between nodes, respectively. We specify a volumeMount named data that will mount the PersistentVolume named data into the container in the path /usr/share/elasticsearch/data. We will define VolumeClaims for this StatefulSet in a later YAML block.
Finally, we set some environment variables in the container:
- cluster.name: The name of the Elasticsearch cluster, which in this guide is k8s-logs
- node.name: The node’s name, which we set to the .metadata.name field using valueFrom. This will resolve to es-cluster-[0,1,2], depending on the node’s assigned ordinal.
- discovery.seed_hosts: This field establishes a list of master-eligible nodes in the cluster that will initialize the node discovery process. In this guide, thanks to the headless service we set up earlier, our Pods have domains of the form es-cluster-[0,1,2].elasticsearch.kube-logging.svc.cluster.local, so we set this variable accordingly. Using the Kubernetes DNS resolution of the local namespace, we can shorten this to es-cluster-[0,1,2].elasticsearch. For more information about Elasticsearch discovery, see the official Elasticsearch documentation.
- cluster.initial_master_nodes: This field also specifies a list of master-eligible nodes that will participate in the master election process. Note that for this field you should identify nodes by their node.name, and not by their hostnames.
- ES_JAVA_OPTS: Here we set this to -Xms512m -Xmx512m, which tells the JVM to use a minimum and maximum heap size of 512 MB. You must adjust these parameters based on resource availability and the needs of your cluster. For more information, see Configuring the heap size.
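Several of these values, like the CPU request and limit above and the JVM heap size here, are tuning knobs rather than hard requirements. One way to revisit them later is to watch the Pods’ actual consumption once the cluster is running; this optional sketch assumes the metrics-server add-on (or an equivalent metrics pipeline) is installed in your cluster:
# Show current CPU and memory usage of the logging Pods (requires a metrics pipeline such as metrics-server)
- kubectl top pods --namespace=kube-logging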
The next block we’ll paste looks like this:
. . .
      initContainers:
      - name: fix-permissions
        image: busybox
        command: ["sh", "-c", "chown -R 1000:1000 /usr/share/elasticsearch/data"]
        securityContext:
          privileged: true
        volumeMounts:
        - name: data
          mountPath: /usr/share/elasticsearch/data
      - name: increase-vm-max-map
        image: busybox
        command: ["sysctl", "-w", "vm.max_map_count=262144"]
        securityContext:
          privileged: true
      - name: increase-fd-ulimit
        image: busybox
        command: ["sh", "-c", "ulimit -n 65536"]
        securityContext:
          privileged: true
In this block, we define several Init Containers that run before the main elasticsearch app container. These Init Containers each run to completion in the order in which they are defined. To learn more about Init Containers, consult the official Kubernetes documentation.
The first, called fix-permissions, runs a chown command to change the owner and group of the Elasticsearch data directory to 1000:1000, the UID of the Elasticsearch user. By default, Kubernetes mounts the data directory as root, making it inaccessible to Elasticsearch. For more information on this step, see Elasticsearch’s “Notes for Production Usage and Defaults.”
The second, called increase-vm-max-map, runs a command to increase the operating system’s limits on mmap counts, which by default may be too low, resulting in out-of-memory errors. For more information on this step, see the official Elasticsearch documentation.
The next startup container that runs is increase-fd-ulimit, which runs the ulimit command to increase the maximum number of open file descriptors. For more information on this step, see the “Production usage notes and defaults” in the official Elasticsearch documentation.
Now that we’ve defined our main app container and the Init Containers that run before it to tune the container OS, we can add the final piece to our StatefulSet object definition file: the volumeClaimTemplates.
Paste in the following volumeClaimTemplates block:
. . .
  volumeClaimTemplates:
  - metadata:
      name: data
      labels:
        app: elasticsearch
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: do-block-storage
      resources:
        requests:
          storage: 100Gi
In this block, we define the StatefulSet’s volumeClaimTemplates. Kubernetes will use this to create PersistentVolumes for the Pods. In the block above, we name it data (which is the name we refer to in the volumeMounts defined previously), and give it the same app: elasticsearch label as our StatefulSet.
We then specify its access mode as ReadWriteOnce, which means it can only be mounted as read-write by a single node. We define the storage class as do-block-storage in this guide, as we use a DigitalOcean Kubernetes cluster for demonstration purposes. You should change this value based on where you run your Kubernetes cluster. For more information, see the Persistent Volume documentation.
Finally, we specify that we’d like each PersistentVolume to be 100 GiB in size. You should adjust this value depending on your production needs.
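If you’re unsure which storage classes your cluster offers, you can list them before filling in storageClassName (a quick optional check):
# List the StorageClasses available in this cluster; the one marked (default) is used when none is specified
- kubectl get storageclass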
The complete StatefulSet spec should look similar to the following:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: es-cluster
  namespace: kube-logging
spec:
  serviceName: elasticsearch
  replicas: 3
  selector:
    matchLabels:
      app: elasticsearch
  template:
    metadata:
      labels:
        app: elasticsearch
    spec:
      containers:
      - name: elasticsearch
        image: docker.elastic.co/elasticsearch/elasticsearch:7.2.0
        resources:
          limits:
            cpu: 1000m
          requests:
            cpu: 100m
        ports:
        - containerPort: 9200
          name: rest
          protocol: TCP
        - containerPort: 9300
          name: inter-node
          protocol: TCP
        volumeMounts:
        - name: data
          mountPath: /usr/share/elasticsearch/data
        env:
          - name: cluster.name
            value: k8s-logs
          - name: node.name
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          - name: discovery.seed_hosts
            value: "es-cluster-0.elasticsearch,es-cluster-1.elasticsearch,es-cluster-2.elasticsearch"
          - name: cluster.initial_master_nodes
            value: "es-cluster-0,es-cluster-1,es-cluster-2"
          - name: ES_JAVA_OPTS
            value: "-Xms512m -Xmx512m"
      initContainers:
      - name: fix-permissions
        image: busybox
        command: ["sh", "-c", "chown -R 1000:1000 /usr/share/elasticsearch/data"]
        securityContext:
          privileged: true
        volumeMounts:
        - name: data
          mountPath: /usr/share/elasticsearch/data
      - name: increase-vm-max-map
        image: busybox
        command: ["sysctl", "-w", "vm.max_map_count=262144"]
        securityContext:
          privileged: true
      - name: increase-fd-ulimit
        image: busybox
        command: ["sh", "-c", "ulimit -n 65536"]
        securityContext:
          privileged: true
  volumeClaimTemplates:
  - metadata:
      name: data
      labels:
        app: elasticsearch
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: do-block-storage
      resources:
        requests:
          storage: 100Gi
Once you are satisfied with your Elasticsearch configuration, save and close the file.
Now, deploy the StatefulSet using kubectl:
- kubectl create -f elasticsearch_statefulset.yaml
You should see the following output:
Output
statefulset.apps/es-cluster created
You can monitor the StatefulSet as it is rolled out using kubectl rollout status:
- kubectl rollout status sts/es-cluster --namespace=kube-logging
You should see the following output as the cluster is deployed:
Output
Waiting for 3 pods to be ready...
Waiting for 2 pods to be ready...
Waiting for 1 pods to be ready...
partitioned roll out complete: 3 new pods have been updated...
Once all the Pods have been deployed, you can verify that your Elasticsearch cluster is functioning correctly by performing a request against its REST API.
To do this, first forward local port 9200 to port 9200 on one of the Elasticsearch nodes (es-cluster-0) using kubectl port-forward:
- kubectl port-forward es-cluster-0 9200:9200 --namespace=kube-logging
Then, in a separate terminal window, perform a curl request against the REST API:
- curl http://localhost:9200/_cluster/state?pretty
You should see the following output:
Output
{
  "cluster_name" : "k8s-logs",
  "compressed_size_in_bytes" : 348,
  "cluster_uuid" : "QD06dK7CQgids-GQZooNVw",
  "version" : 3,
  "state_uuid" : "mjNIWXAzQVuxNNOQ7xR-qg",
  "master_node" : "IdM5B7cUQWqFgIHXBp0JDg",
  "blocks" : { },
  "nodes" : {
    "u7DoTpMmSCixOoictzHItA" : {
      "name" : "es-cluster-1",
      "ephemeral_id" : "ZlBflnXKRMC4RvEACHIVdg",
      "transport_address" : "10.244.8.2:9300",
      "attributes" : { }
    },
    "IdM5B7cUQWqFgIHXBp0JDg" : {
      "name" : "es-cluster-0",
      "ephemeral_id" : "JTk1FDdFQuWbSFAtBxdxAQ",
      "transport_address" : "10.244.44.3:9300",
      "attributes" : { }
    },
    "R8E7xcSUSbGbgrhAdyAKmQ" : {
      "name" : "es-cluster-2",
      "ephemeral_id" : "9wv6ke71Qqy9vk2LgJTqaA",
      "transport_address" : "10.244.40.4:9300",
      "attributes" : { }
    }
  },
...
This indicates that our k8s-logs Elasticsearch cluster has been successfully created with 3 nodes: es-cluster-0, es-cluster-1, and es-cluster-2. The current master node is es-cluster-0.
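As an additional optional check (a quick sketch, not part of the main verification flow), you can also query Elasticsearch’s standard cluster health endpoint through the same port-forward:
# Ask Elasticsearch for its overall health; a healthy 3-node cluster should report "status" : "green" and "number_of_nodes" : 3
- curl http://localhost:9200/_cluster/health?pretty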
Now that your Elasticsearch cluster is up and running, you can move on to setting up a Kibana frontend for it.
Step 3 — Creating the Kibana Deployment and Service
To launch Kibana on Kubernetes, we’ll create a Service called kibana, and a Deployment consisting of one Pod replica. You can scale the number of replicas depending on your production needs, and optionally specify a LoadBalancer type for the Service to load balance requests across the Deployment Pods.
This time, we’ll create the Service and Deployment in the same file. Open a file called kibana.yaml in your favorite editor:
- nano kibana.yaml
Paste in the following service spec:
apiVersion: v1
kind: Service
metadata:
  name: kibana
  namespace: kube-logging
  labels:
    app: kibana
spec:
  ports:
  - port: 5601
  selector:
    app: kibana
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kibana
  namespace: kube-logging
  labels:
    app: kibana
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kibana
  template:
    metadata:
      labels:
        app: kibana
    spec:
      containers:
      - name: kibana
        image: docker.elastic.co/kibana/kibana:7.2.0
        resources:
          limits:
            cpu: 1000m
          requests:
            cpu: 100m
        env:
          - name: ELASTICSEARCH_URL
            value: http://elasticsearch:9200
        ports:
        - containerPort: 5601
Then save and close the file.
In this spec, we’ve defined a Service called kibana in the kube-logging namespace, and given it the app: kibana label. We’ve also specified that it should be accessible on port 5601 and use the app: kibana label to select the Service’s target Pods.
In the Deployment spec, we define a Deployment called kibana and specify that we’d like 1 Pod replica.
We use the image docker.elastic.co/kibana/kibana:7.2.0. At this point, you can substitute your own private or public Kibana image to use.
We specified that we would like at least 0.1 vCPU guaranteed for the Pod, reaching a limit of 1 vCPU. You can change these parameters depending on the expected load and available resources.
Next, we use the ELASTICSEARCH_URL environment variable to set the endpoint and port for the Elasticsearch cluster. With Kubernetes DNS, this endpoint corresponds to your elasticsearch service name. This domain will resolve to an IP address list for all 3 Elasticsearch Pods. For more information about Kubernetes DNS, see DNS for services and pods.
Finally, we set the Kibana container port to 5601, to which the Kibana Service will send requests.
Once you’re satisfied with your Kibana configuration, you can roll out the Service and Deployment using kubectl:
- kubectl create -f kibana.yaml
You should see the following output:
Output
service/kibana created
deployment.apps/kibana created
You can verify that the rollout succeeded by running the following command:
- kubectl rollout status deployment/kibana --namespace=kube-logging
You should see the following output:
Output
deployment "kibana" successfully rolled out
To access the Kibana interface, we’ll once again forward a local port to the Kubernetes node running Kibana. Grab the Kibana Pod details using kubectl get:
- kubectl get pods --namespace=kube-logging
Output
NAME                      READY   STATUS    RESTARTS   AGE
es-cluster-0              1/1     Running   0          55m
es-cluster-1              1/1     Running   0          54m
es-cluster-2              1/1     Running   0          54m
kibana-6c9fb4b5b7-plbg2   1/1     Running   0          4m27s
Here we observe that our Kibana Pod is called kibana-6c9fb4b5b7-plbg2.
Forward local port 5601 to port 5601 of this Pod:
- kubectl port-forward kibana-6c9fb4b5b7-plbg2 5601:5601 --namespace=kube-logging
You should see the following output:
Output
Forwarding from 127.0.0.1:5601 -> 5601
Forwarding from [::1]:5601 -> 5601
Now, in your web browser, visit the following URL:
http://localhost:5601
If you see the following Kibana welcome page, you’ve successfully deployed Kibana into your Kubernetes cluster:
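If you prefer a command-line check (an optional sketch; /api/status is Kibana’s built-in status endpoint), you can also query Kibana’s status API through the same port-forward:
# Query Kibana's status API; the JSON response reports an overall state, typically "green" once Kibana can reach Elasticsearch
- curl http://localhost:5601/api/status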
You can now move on to deploying the final component of the EFK stack: the log collector, Fluentd.
Step 4 — Creating the Fluentd DaemonSet
In this guide, we’ll set up Fluentd as a DaemonSet, a Kubernetes workload type that runs a copy of a given Pod on each node in the Kubernetes cluster. Using this DaemonSet controller, we’ll roll out a Fluentd logging agent Pod on every node in our cluster. To learn more about this logging architecture, consult “Using a node logging agent” in the official Kubernetes docs.
In Kubernetes, containerized applications that log to stdout and stderr have their log streams captured and redirected to JSON files on the nodes. The Fluentd Pod will tail these log files, filter the log events, transform the log data, and ship it off to the Elasticsearch logging backend we deployed in Step 2.
In addition to container logs, the Fluentd agent will tail Kubernetes system component logs like kubelet, kube-proxy, and Docker logs. For a complete list of sources tailed by the Fluentd logging agent, consult the kubernetes.conf file used to configure the logging agent. To learn more about logging in Kubernetes clusters, consult “Node-Level Logging” in the official Kubernetes documentation.
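If you have shell access to one of your worker nodes (which is not always possible on managed clusters), you can see these captured log files for yourself. This is a rough sketch of what to look for on a Docker-based node:
# On a worker node: each container's captured stdout/stderr stream appears here
ls /var/log/containers/
# ...as symlinks into the Docker log directory that the Fluentd DaemonSet will also mount
ls /var/lib/docker/containers/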
Begin by opening a file called fluentd.yaml in your favorite text editor:
- nano fluentd.yaml
Once again, we’ll paste in the Kubernetes object definitions block by block, providing context as we go. In this guide, we use the Fluentd DaemonSet spec provided by the Fluentd maintainers. Another helpful resource provided by the Fluentd maintainers is Kubernetes Fluentd.
First, paste in the following ServiceAccount definition:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluentd
  namespace: kube-logging
  labels:
    app: fluentd
Here, we create a Service Account called fluentd that the Fluentd Pods will use to access the Kubernetes API. We create it in the kube-logging namespace and once again give it the label app: fluentd. To learn more about Service Accounts in Kubernetes, consult Configure Service Accounts for Pods in the official Kubernetes docs.
Next, paste in the following ClusterRole block:
. . .
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: fluentd
  labels:
    app: fluentd
rules:
- apiGroups:
  - ""
  resources:
  - pods
  - namespaces
  verbs:
  - get
  - list
  - watch
Here we define a ClusterRole called fluentd to which we grant the get, list, and watch permissions on the pods and namespaces objects. ClusterRoles allow you to grant access to cluster-scoped Kubernetes resources, such as Nodes. To learn more about role-based access control and Cluster Roles, consult Using RBAC Authorization in the official Kubernetes documentation.
Now, paste in the following ClusterRoleBinding block:
. . .
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: fluentd
roleRef:
  kind: ClusterRole
  name: fluentd
  apiGroup: rbac.authorization.k8s.io
subjects:
- kind: ServiceAccount
  name: fluentd
  namespace: kube-logging
In this block, we define a ClusterRoleBinding called fluentd which binds the fluentd ClusterRole to the fluentd Service Account. This grants the fluentd Service Account the permissions listed in the fluentd ClusterRole.
At this point we can begin pasting in the actual DaemonSet spec:
. . .
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: kube-logging
  labels:
    app: fluentd
Here, we define a DaemonSet called fluentd in the kube-logging namespace and give it the app: fluentd label.
Next, paste in the following section:
. . .
spec:
  selector:
    matchLabels:
      app: fluentd
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      serviceAccount: fluentd
      serviceAccountName: fluentd
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      containers:
      - name: fluentd
        image: fluent/fluentd-kubernetes-daemonset:v1.4.2-debian-elasticsearch-1.1
        env:
          - name: FLUENT_ELASTICSEARCH_HOST
            value: "elasticsearch.kube-logging.svc.cluster.local"
          - name: FLUENT_ELASTICSEARCH_PORT
            value: "9200"
          - name: FLUENT_ELASTICSEARCH_SCHEME
            value: "http"
          - name: FLUENTD_SYSTEMD_CONF
            value: disable
Here, we match the app: fluentd label defined in .metadata.labels and then assign the DaemonSet the fluentd Service Account. We also select the app: fluentd Pods as those managed by this DaemonSet.
Next, we define a NoSchedule toleration to match the equivalent taint on Kubernetes master nodes. This will ensure that the DaemonSet also gets rolled out to the Kubernetes masters. If you don’t want to run a Fluentd Pod on your master nodes, remove this toleration. To learn more about Kubernetes taints and tolerations, consult “Taints and Tolerations” in the official Kubernetes docs.
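If you’d like to see which of your nodes actually carry taints before deciding whether to keep this toleration, one way to check is with a custom-columns query (an optional sketch):
# Print each node's name alongside any taints it carries; managed worker nodes will often show <none>
- kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints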
Next, we begin defining the Pod container, which we call fluentd.
We use the official Debian v1.4.2 image provided by the Fluentd maintainers. If you want to use your own private or public Fluentd image, or use a different image version, modify the image label in the container specification. The Dockerfile and the contents of this image are available in the Fluentd fluentd-kubernetes-daemonset Github repository.
Next, we configure Fluentd using some environment variables:
- FLUENT_ELASTICSEARCH_HOST: We set this to the Elasticsearch headless Service address defined earlier: elasticsearch.kube-logging.svc.cluster.local. This will resolve to a list of IP addresses for the 3 Elasticsearch Pods. The actual Elasticsearch host will most likely be the first IP address returned in this list. To distribute logs across the cluster, you will need to modify the configuration for Fluentd’s Elasticsearch Output plugin. To learn more about this plugin, consult Elasticsearch Output Plugin.
- FLUENT_ELASTICSEARCH_PORT: We set this to the Elasticsearch port we configured earlier, 9200.
- FLUENT_ELASTICSEARCH_SCHEME: We set this to http.
- FLUENTD_SYSTEMD_CONF: We set this to disable to suppress systemd-related output that is not configured in the container.
Finally, paste in the following section:
. . .
        resources:
          limits:
            memory: 512Mi
          requests:
            cpu: 100m
            memory: 200Mi
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
      terminationGracePeriodSeconds: 30
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
Here we specify a 512 MiB memory limit on the Fluentd Pod, and guarantee it 0.1 vCPU and 200 MiB of memory. You can tune these resource limits and requests depending on your anticipated log volume and available resources.
Next, we mount the /var/log and /var/lib/docker/containers host paths into the container using varlog and varlibdockercontainers volumeMounts. These volumes are defined at the end of the block.
The final parameter we define in this block is terminationGracePeriodSeconds, which gives Fluentd 30 seconds to shut down gracefully upon receiving a SIGTERM signal. After 30 seconds, the containers are sent a SIGKILL signal. The default value for terminationGracePeriodSeconds is 30s, so in most cases this parameter can be omitted. To learn more about gracefully terminating Kubernetes workloads, consult Google’s “Kubernetes best practices: terminating with grace.”
The entire Fluentd spec should look similar to the following:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluentd
  namespace: kube-logging
  labels:
    app: fluentd
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: fluentd
  labels:
    app: fluentd
rules:
- apiGroups:
  - ""
  resources:
  - pods
  - namespaces
  verbs:
  - get
  - list
  - watch
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: fluentd
roleRef:
  kind: ClusterRole
  name: fluentd
  apiGroup: rbac.authorization.k8s.io
subjects:
- kind: ServiceAccount
  name: fluentd
  namespace: kube-logging
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: kube-logging
  labels:
    app: fluentd
spec:
  selector:
    matchLabels:
      app: fluentd
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      serviceAccount: fluentd
      serviceAccountName: fluentd
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      containers:
      - name: fluentd
        image: fluent/fluentd-kubernetes-daemonset:v1.4.2-debian-elasticsearch-1.1
        env:
          - name: FLUENT_ELASTICSEARCH_HOST
            value: "elasticsearch.kube-logging.svc.cluster.local"
          - name: FLUENT_ELASTICSEARCH_PORT
            value: "9200"
          - name: FLUENT_ELASTICSEARCH_SCHEME
            value: "http"
          - name: FLUENTD_SYSTEMD_CONF
            value: disable
        resources:
          limits:
            memory: 512Mi
          requests:
            cpu: 100m
            memory: 200Mi
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
      terminationGracePeriodSeconds: 30
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
Once you have finished configuring the Fluentd DaemonSet, save and close the file.
Now, roll out the DaemonSet using kubectl:
- kubectl create -f fluentd.yaml
You should see the following output:
Output
serviceaccount/fluentd created
clusterrole.rbac.authorization.k8s.io/fluentd created
clusterrolebinding.rbac.authorization.k8s.io/fluentd created
daemonset.extensions/fluentd created
Verify that your DaemonSet rolled out successfully using kubectl:
- kubectl get ds --namespace=kube-logging
You should see the following status output:
Output
NAME      DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
fluentd   3         3         3         3            3           <none>          58s
This indicates that there are 3 fluentd Pods running, which corresponds to the number of nodes in our Kubernetes cluster.
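Before moving on to Kibana, you can optionally confirm at the Elasticsearch level that Fluentd has begun creating indices. This quick sketch assumes the port-forward to es-cluster-0 from Step 2 is still running (or has been re-established):
# List Elasticsearch indices; after a minute or two you should see one or more indices matching the logstash-* pattern
- curl http://localhost:9200/_cat/indices?v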
We can now check Kibana to verify that log data is collected and sent correctly to Elasticsearch.
With the kubectl port-forward still open, navigate to http://localhost:5601.
Click Discover in the left navigation menu:
You should see the following settings window:
This allows you to define the Elasticsearch indices you’d like to explore in Kibana. To learn more, consult Defining your index patterns in the official Kibana docs. For now, we’ll just use the logstash-* wildcard pattern to capture all the log data in our Elasticsearch cluster. Enter logstash-* in the text box and click Next step.
You will then be taken to the following page:
This allows you to configure which field Kibana will use to filter log data by time. From the drop-down menu, select the @timestamp field and press Create Index Pattern.
Now, press Discover on the left navigation menu.
You should see a histogram graph and some recent log entries:
At this point, you’ve successfully configured and deployed the EFK stack on your Kubernetes cluster. To learn how to use Kibana to analyze your log data, consult the Kibana User Guide.
In the next optional step, we’ll deploy a simple counter Pod that prints numbers to stdout, and find its logs in Kibana.
Step 5 (Optional) — Testing Container Logging
To demonstrate a basic Kibana use case of exploring the latest logs for a given Pod, we’ll deploy a minimal counter Pod that prints sequential numbers to stdout.
Let’s start by creating the Pod. Open a file called counter.yaml in your favorite editor:
- nano counter.yaml
Then, paste in the following Pod spec:
apiVersion: v1
kind: Pod
metadata:
  name: counter
spec:
  containers:
  - name: count
    image: busybox
    args: [/bin/sh, -c, 'i=0; while true; do echo "$i: $(date)"; i=$((i+1)); sleep 1; done']
Save and close the file.
This is a minimal Pod called counter that runs a while loop, printing numbers sequentially.
Deploy the counter Pod using kubectl:
- kubectl create -f counter.yaml
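Before returning to Kibana, you can optionally confirm that the Pod is emitting output by reading its logs directly with kubectl (a quick sketch; the Pod lands in the default namespace since the manifest doesn’t specify one):
# Tail the counter Pod's stdout; press CTRL+C to stop following
- kubectl logs counter --follow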
Once the Pod has been created and is running, return to your Kibana dashboard.
On the Discover page, in the search bar enter kubernetes.pod_name:counter. This filters the log data for Pods named counter.
You should then see a list of log entries for the counter Pod:
You can click any of the log entries to view additional metadata such as container name, Kubernetes node, namespace, and more.
Conclusion
In this guide, we’ve demonstrated how to set up Elasticsearch, Fluentd, and Kibana on a Kubernetes cluster. We have used a minimal logging architecture consisting of a single log agent pod running on each Kubernetes worker node.
Before deploying this logging stack into your production Kubernetes cluster, it’s best to tune the resource requests and limits as indicated throughout this guide. You may also want to set up X-Pack to enable built-in monitoring and security features.
The logging architecture we’ve used here consists of 3 Elasticsearch Pods, a single Kibana Pod (not load-balanced), and a set of Fluentd Pods rolled out as a DaemonSet. You may wish to scale this setup depending on your production use case. To learn more about scaling your Elasticsearch and Kibana stack, consult Scaling Elasticsearch.
Kubernetes also allows for more complex logging agent architectures that may better suit your use case. To learn more, consult Logging Architecture in the Kubernetes docs.