kops rolling update pods unclean exit

As we onboarded more and more services onto our Kubernetes cluster, we gained deeper insight into what works and what doesn't.

One thing we noticed during rolling updates of our Kubernetes cluster was that pods were not getting enough time to terminate gracefully, and of course our users were not thrilled about it. One user, who was running a Kafka stack on our cluster, complained that his Kafka indexes were getting corrupted as a result.

When he brought this to our attention, we immediately suspected that it had something to do with how kops performs the rolling update, because graceful termination is a well-known Kubernetes capability.

When we started investigating, we found that kops has a feature flag that turns on node draining. We thought we had found our culprit, but it turned out that its default value has been true since this PR was merged.

So we dug a little deeper, and that's when we found our issue.

kubectl, the CLI for Kubernetes, has a drain command that deletes the pods on a given node, thereby draining it. One of the flags exposed by that command is GracefulTerminationInSeconds, which is of type int (whose zero value is 0). The command uses Cobra to give that flag a default value of -1, and according to the kubectl documentation the behavior for different values is as follows:

  • -1 use the graceful termination seconds specified by the pod
  • 0 delete the pod immediately
  • > 0 use this as graceful termination seconds
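The three cases above can be sketched as a small Go function. This is only an illustration, not kubectl's actual code: the name effectiveGracePeriod, its parameters, and the 30-second pod value are hypothetical, and treating every negative value like -1 is an assumption.

```go
package main

import "fmt"

// effectiveGracePeriod is a hypothetical helper (not part of kubectl).
// drainValue is the value handed to the drain command; podValue is the
// grace period the pod itself asks for. Any negative drainValue is
// treated like -1 here, which is an assumption made for this sketch.
func effectiveGracePeriod(drainValue, podValue int) int {
	if drainValue < 0 {
		// -1: defer to the grace period specified by the pod.
		return podValue
	}
	// 0 means "delete immediately"; > 0 overrides the pod's own value.
	return drainValue
}

func main() {
	const podGrace = 30 // example value chosen for illustration
	for _, v := range []int{-1, 0, 120} {
		fmt.Printf("drain value %d: pod gets %d seconds\n",
			v, effectiveGracePeriod(v, podGrace))
	}
}
```

The middle case is the dangerous one: a value of 0 gives the pod no time at all, regardless of what the pod requested.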

Kubernetes also exposes those commands as exported package functions that can be used as an SDK, and kops makes use of that. However, kops does not set a value for the GracefulTerminationInSeconds parameter, so the field keeps Go's zero value and kops ends up running the drain with GracefulTerminationInSeconds = 0, which deletes the pods immediately.

As you can imagine, the fix was as easy as configuring the drain command with a default value of -1.
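The pitfall is easy to reproduce in a few lines of Go. The sketch below is an approximation under stated assumptions: DrainOptions is a hypothetical stand-in for the real options struct, the flag name grace-seconds is made up, and the standard library flag package is used in place of Cobra so that the example stays self-contained. The point it shows is that a default registered on a flag only applies when the flag is actually registered; a struct built directly in code gets Go's zero value.

```go
package main

import (
	"flag"
	"fmt"
)

// DrainOptions is a hypothetical stand-in for the drain options struct.
type DrainOptions struct {
	GracefulTerminationInSeconds int
}

// optionsFromCLI mimics the CLI path: registering the flag applies the
// default of -1. The stdlib flag package stands in for Cobra here.
func optionsFromCLI(args []string) (DrainOptions, error) {
	var opts DrainOptions
	fs := flag.NewFlagSet("drain", flag.ContinueOnError)
	fs.IntVar(&opts.GracefulTerminationInSeconds, "grace-seconds", -1,
		"seconds given to each pod to terminate; -1 uses the pod's own value")
	err := fs.Parse(args)
	return opts, err
}

func main() {
	// CLI path: no flag passed, so the registered default is used.
	cli, err := optionsFromCLI(nil)
	if err != nil {
		panic(err)
	}

	// SDK path as described in the post: the struct is built directly
	// and the field is never set, so it holds the zero value.
	sdk := DrainOptions{}

	// The fix: set the value explicitly when building the struct.
	fixed := DrainOptions{GracefulTerminationInSeconds: -1}

	fmt.Println("cli:  ", cli.GracefulTerminationInSeconds)
	fmt.Println("sdk:  ", sdk.GracefulTerminationInSeconds)
	fmt.Println("fixed:", fixed.GracefulTerminationInSeconds)
}
```

Because 0 is both the zero value of int and a meaningful setting ("delete immediately"), nothing in the type system flags the missing assignment, which is why the bug went unnoticed until pods started exiting uncleanly.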

We opened a PR with this fix here, and it is currently waiting for the four golden words from a kubernetes/kops member: lgtm.

We will update this post once the PR is merged.

Update: the PR has been accepted and merged.
