The Always-on Cluster

Is Kubernetes expected to be running 24/7?

Yes, Kubernetes is designed to run 24/7 and is expected to be continuously available to manage and orchestrate containerized applications.

Kubernetes provides high availability and fault tolerance by distributing workloads across multiple nodes in a cluster. If one node fails, its workloads are rescheduled onto other available nodes, so applications that run several replicas keep serving traffic while the failed pods are replaced.

Kubernetes also provides features such as self-healing and automated scaling, which rely on the continuous operation of the Kubernetes control plane components and the worker nodes. Shutting down the cluster regularly can disrupt these features and impact the stability and performance of the applications running on the cluster.

Maintenance and updates may require temporary downtime or cluster restarts. In such cases, it’s recommended to use rolling updates or other strategies that minimize downtime and disruption to your services.
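As a rough illustration of the rolling-update strategy mentioned above, the following sketch builds a Deployment manifest whose pods are replaced one at a time. The function name, the application name `web`, and the image `example.com/web:1.2.3` are hypothetical placeholders, not values from any real cluster.

```python
import json


def rolling_update_deployment(name: str, image: str, replicas: int) -> dict:
    """Build a Deployment manifest that replaces pods gradually."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "strategy": {
                "type": "RollingUpdate",
                # Never drop below the desired replica count; start one
                # extra pod at a time and wait for it before removing an old one.
                "rollingUpdate": {"maxUnavailable": 0, "maxSurge": 1},
            },
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


# Hypothetical application name and image, for illustration only.
manifest = rolling_update_deployment("web", "example.com/web:1.2.3", replicas=3)
print(json.dumps(manifest, indent=2))
```

The printed JSON could be saved to a file and passed to `kubectl apply -f`, which accepts JSON as well as YAML.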

What are the consequences and potential problems that can occur if the entire cluster is shut down each night and brought back up each morning?

Shutting down and bringing up a Kubernetes cluster each night can have several consequences and potential problems, including:

Downtime: Shutting down and restarting the cluster each day will result in downtime for your applications running on the cluster. This can impact your business operations and cause frustration for your users.
Resource allocation: When you bring the cluster back up, Kubernetes has to schedule every pod again, which can take some time depending on the number of pods and their resource requirements. Pods stay in the Pending state until enough nodes are ready, and when many pods start at once they compete for CPU, memory, and image pulls.
Data loss: Any data that is not stored on persistent volumes, such as data in a container's filesystem or in an emptyDir volume, is lost when the pods are removed during shutdown.
Cluster instability: Frequent shutdowns and restarts of the cluster can cause instability, as some components may not come back up properly or may not be in sync with each other. This can result in unexpected behavior and failures.
Cost: Shutting down each night saves less than it might appear. On AWS, stopped EC2 instances do not accrue compute charges, but their EBS volumes are still billed, and a managed control plane such as EKS is charged whether or not any worker nodes are running.
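To put rough numbers on the cost item above, here is a small worked estimate. All prices and sizes are hypothetical placeholders, not actual AWS rates; the only assumption carried over from AWS billing is that stopped instances accrue no compute charge while their attached volumes are billed for the whole month.

```python
DAYS_PER_MONTH = 30  # simplification for the estimate


def monthly_cost(nodes, hourly_rate, running_hours_per_day, volume_gb, gb_month_rate):
    """Estimate monthly cost as compute for running hours plus storage."""
    compute = nodes * hourly_rate * running_hours_per_day * DAYS_PER_MONTH
    # Storage is billed per GB-month regardless of whether the node is running.
    storage = nodes * volume_gb * gb_month_rate
    return compute + storage


# Hypothetical figures: 3 nodes, 0.10 per instance-hour,
# 100 GB volume per node at 0.08 per GB-month.
always_on = monthly_cost(3, 0.10, 24, 100, 0.08)
nightly_off = monthly_cost(3, 0.10, 12, 100, 0.08)

print(f"always on:        {always_on:.2f}")
print(f"off 12 h a night: {nightly_off:.2f}")
```

Under these assumed figures, halving the running hours halves only the compute portion of the bill; the storage portion is unchanged.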

If you need to perform maintenance or updates, consider using rolling updates or other strategies that minimize downtime and disruption to your services. Additionally, using persistent volumes to store data and implementing backups can help prevent data loss in case of unexpected shutdowns.

How would this cause data loss with an Elasticsearch deployment using persistent volumes? Is it possible for data to become corrupted?

Even if you are using persistent volumes to store data in your Kubernetes cluster, shutting down the cluster each night can still cause data loss or data corruption in some situations. Here’s why:

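Data loss: Whether the data survives depends on what backs the persistent volume. Network-attached volumes, such as AWS EBS, exist independently of the worker nodes and keep their data when the nodes are stopped or terminated. Volumes backed by node-local disks, such as local persistent volumes, hostPath, or instance store, are lost when the node is terminated unless the data is backed up to an external storage system or replicated to other nodes.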
Data corruption: If the worker nodes hosting the persistent volumes are not properly shut down, there is a risk of data corruption. This can happen if the node is powered off or terminated abruptly without allowing the applications to gracefully shut down, which can result in data being left in an inconsistent state.

For example, in the case of an Elasticsearch deployment using persistent volumes, shutting down the cluster each night can potentially cause data loss or data corruption if the data stored in the persistent volumes is not properly backed up or replicated to other nodes.
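For the Elasticsearch case, the sketch below lists the requests that Elasticsearch's documented full-cluster restart procedure sends before and after a planned shutdown. It only builds and prints the requests and does not contact a cluster; the `Step` type and the helper functions are hypothetical names introduced for illustration.

```python
import json
from typing import NamedTuple, Optional


class Step(NamedTuple):
    method: str
    path: str
    body: Optional[dict]


def pre_shutdown_steps() -> list:
    """Requests to send before stopping the Elasticsearch nodes."""
    return [
        # Keep replica shards from being reallocated while nodes disappear.
        Step("PUT", "/_cluster/settings",
             {"persistent": {"cluster.routing.allocation.enable": "primaries"}}),
        # Flush indices so less has to be replayed from the transaction log on restart.
        Step("POST", "/_flush", None),
    ]


def post_restart_steps() -> list:
    """Requests to send once all nodes have rejoined the cluster."""
    return [
        # Setting the value to null restores the default allocation behaviour.
        Step("PUT", "/_cluster/settings",
             {"persistent": {"cluster.routing.allocation.enable": None}}),
    ]


def render(step: Step) -> str:
    """Format a step as 'METHOD /path {json body}' for display."""
    body = "" if step.body is None else " " + json.dumps(step.body)
    return f"{step.method} {step.path}{body}"


for step in pre_shutdown_steps() + post_restart_steps():
    print(render(step))
```

Each printed line could be sent with any HTTP client against the cluster's API endpoint; indexing should also be stopped before the flush so that no new writes arrive during shutdown.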

To avoid these issues, it’s recommended to keep the cluster running continuously, or schedule the shutdowns during off-hours when they will have the least impact on your users and applications. Additionally, using persistent volumes to store data and implementing backups or replication can help prevent data loss or corruption in case of unexpected shutdowns. It’s also recommended to properly configure the Kubernetes control plane and worker nodes to ensure that they are gracefully shut down and restarted, which can minimize the risk of data corruption.
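On the application side, graceful shutdown usually means reacting to the SIGTERM signal that Kubernetes sends to a container before killing it once the pod's termination grace period expires. The following is a minimal sketch of that pattern; `worker_step` and `flush` are hypothetical callbacks standing in for the application's own work loop and its flush-to-disk routine.

```python
import signal
import threading

# Set when a shutdown has been requested.
stop_requested = threading.Event()


def handle_sigterm(signum, frame):
    """Record the shutdown request; the main loop reacts to it."""
    stop_requested.set()


def install_handler():
    """Register the SIGTERM handler (must be called from the main thread)."""
    signal.signal(signal.SIGTERM, handle_sigterm)


def run(worker_step, flush, poll_seconds=1.0):
    """Run worker_step repeatedly until shutdown is requested, then flush."""
    while not stop_requested.is_set():
        worker_step()
        # Sleep, but wake up early if SIGTERM arrives in the meantime.
        stop_requested.wait(poll_seconds)
    # Persist any buffered state before the process exits.
    flush()
```

A program would call `install_handler()` at startup and then `run(...)` with its own callbacks. The flush has to finish within the pod's `terminationGracePeriodSeconds`, after which the container is killed.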

Last update: 10 May 2023
Created: 10 May 2023