Updating nodes in AKS - Wojciech Lepczyński

Today's article is about updating the operating system on the nodes of an Azure Kubernetes Service (AKS) cluster. It is not about upgrading the AKS version, but about updating the nodes themselves. Microsoft Azure manages the so-called “master node” for us: it installs the latest patches, takes care of its security, and we do not have access to it. It is different with the other nodes, on which our containers run. AKS installs patches on them, but it will not restart these nodes automatically, so updates that require a reboot are not fully applied until the node is restarted.

Nodes running Windows Server do not receive daily updates. Instead, an AKS upgrade is performed, which deploys new nodes with the latest image and patches. You can find more information on this subject at https://docs.microsoft.com/pl-pl/azure/aks/use-multiple-node-pools#upgrade-a-node-pool

In this article, however, I will deal with Linux nodes, because I use them more often. In this case, you can use the free Kured utility, which is also recommended by Microsoft.

When AKS has installed updates that require a node restart, it signals this by creating the file /var/run/reboot-required on the node.
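
To see by hand what Kured will be looking for, the same check can be reproduced on a node with a few lines of shell. This is only a sketch: the function name reboot_required is made up for illustration, and the default path is the one used in this article.

```shell
#!/bin/sh
# Illustrative helper (hypothetical name): succeeds when the reboot sentinel exists.
# The default path is the one from this article; pass another path to override it.
reboot_required() {
    sentinel="${1:-/var/run/reboot-required}"
    [ -f "$sentinel" ]
}

if reboot_required; then
    echo "reboot required"
else
    echo "no reboot required"
fi
```

Run on a node, this prints one of the two messages depending on whether the sentinel file is present.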

KURED

The Kured service we are going to run checks for the file /var/run/reboot-required and restarts the nodes if necessary. Don't worry: you can configure it to run only at certain times, so an unexpected restart will not surprise us. Additionally, you can connect it to Prometheus and to a messenger such as Slack, so that we get notified about reboots.

I run Kured as a DaemonSet.

First, we add a ServiceAccount and assign it roles and permissions:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kured
rules:
- apiGroups: [""]
  resources: ["nodes"]
  verbs:     ["get", "patch"]
- apiGroups: [""]
  resources: ["pods"]
  verbs:     ["list","delete","get"]
- apiGroups: ["apps"]
  resources: ["daemonsets"]
  verbs:     ["get"]
- apiGroups: [""]
  resources: ["pods/eviction"]
  verbs:     ["create"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kured
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kured
subjects:
- kind: ServiceAccount
  name: kured
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: kube-system
  name: kured
rules:
# Allow kured to lock/unlock itself
- apiGroups:     ["apps"]
  resources:     ["daemonsets"]
  resourceNames: ["kured"]
  verbs:         ["update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: kube-system
  name: kured
subjects:
- kind: ServiceAccount
  namespace: kube-system
  name: kured
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: kured
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kured
  namespace: kube-system
---
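
Once these objects are applied, the permissions can be spot-checked with kubectl auth can-i by impersonating the ServiceAccount. The wrapper below is a hypothetical helper, not part of kured, and it assumes your own user is allowed to impersonate ServiceAccounts.

```shell
#!/bin/sh
# Hypothetical helper: asks the API server whether the kured ServiceAccount
# (kube-system/kured, as defined above) may perform the given action.
# All arguments are passed straight to "kubectl auth can-i".
kured_can() {
    kubectl auth can-i "$@" --as=system:serviceaccount:kube-system:kured
}
```

For example, kured_can patch nodes, kured_can create pods --subresource=eviction and kured_can update daemonsets/kured -n kube-system should each answer yes if the rules above were applied as written.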

In the second part of the file, we create a DaemonSet with the appropriate configuration:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kured            
  namespace: kube-system 
spec:
  selector:
    matchLabels:
      name: kured
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        name: kured
    spec:
      serviceAccountName: kured
      tolerations:
        - key: node-role.kubernetes.io/master
          effect: NoSchedule
      hostPID: true 
      restartPolicy: Always
      containers:
        - name: kured
          image: docker.io/weaveworks/kured:1.5.0
          imagePullPolicy: IfNotPresent
          securityContext:
            privileged: true # Give permission to nsenter /proc/1/ns/mnt
          env:
            - name: KURED_NODE_ID
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
          command:
            - /usr/bin/kured
            - --ds-name=kured
            - --ds-namespace=kube-system
            - --reboot-days=mon
            - --reboot-sentinel=/var/run/reboot-required
            - --start-time=2am
            - --end-time=6am
            - --period=0h30m0s
            - --time-zone=Europe/Warsaw
            - --slack-hook-url=https://mattermost.primesoft.pl/hooks/84kbr5kkq3ycbkri1983xxt73w
            - --slack-channel=arthurdoc-gitlab
            - --slack-username=restartk8s
            - --prometheus-url=http://10.7.7.4:9090

Configuration:

name of our DaemonSet: --ds-name=kured
namespace in which the DaemonSet runs: --ds-namespace=kube-system
days on which a reboot may take place: --reboot-days=mon
path to the sentinel file to check: --reboot-sentinel=/var/run/reboot-required
time from which reboots may start: --start-time=2am
time after which reboots may no longer start: --end-time=6am
how often the sentinel file is checked: --period=0h30m0s
time zone: --time-zone=Europe/Warsaw
address of our webhook: --slack-hook-url=https://PASTE_GENERATED_ADDRESS
channel name: --slack-channel=MATCH_NAME
username to be displayed: --slack-username=restartk8s
Prometheus address: --prometheus-url=http://PROMETHEUS_ADDRESS:9090
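
To make the schedule settings more tangible, here is a rough shell sketch of the kind of check they describe: is it Monday, and is the current hour between 2am and 6am in Europe/Warsaw? This is an illustration only, not kured's code. The function name is hypothetical, and treating the end of the window as exclusive is an assumption of the sketch.

```shell
#!/bin/sh
# Illustration only: kured implements its schedule check internally, and this
# is not its code. The function name is hypothetical.
# Succeeds when HOUR (0-23) lies in [START, END); treating END as exclusive
# is an assumption of this sketch.
in_reboot_window() {
    [ "$1" -ge "$2" ] && [ "$1" -lt "$3" ]
}

# Values from the article: Mondays, 2am to 6am, Europe/Warsaw.
now_day="$(TZ=Europe/Warsaw date +%u)"    # 1 = Monday ... 7 = Sunday
now_hour="$(TZ=Europe/Warsaw date +%H)"   # 00..23
now_hour="${now_hour#0}"                  # drop a leading zero: 08 -> 8

if [ "$now_day" -eq 1 ] && in_reboot_window "$now_hour" 2 6; then
    echo "inside the reboot window"
else
    echo "outside the reboot window"
fi
```

Outside the window, kured leaves a node running even if the sentinel file is present, which is what protects us from surprise restarts.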

We save the above parts in a single YAML file and deploy it in the standard way, for example:

kubectl apply -f FILE_NAME.yaml
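
After applying, it is worth confirming that the DaemonSet has actually rolled out on every node. A minimal sketch, assuming the name kured and the namespace kube-system from the manifest above; the wrapper function and the 120-second timeout are only for illustration.

```shell
#!/bin/sh
# Hypothetical wrapper: waits for the kured DaemonSet to finish rolling out,
# then prints its desired/ready pod counts.
kured_rollout() {
    kubectl rollout status daemonset/kured -n kube-system --timeout=120s &&
        kubectl get daemonset kured -n kube-system
}
```

Calling kured_rollout should end with a line for the kured DaemonSet in which the DESIRED and READY columns match the number of Linux nodes.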

We read logs in the traditional way:

kubectl logs POD_NAME -n kube-system

When everything runs fine and you enter the correct pod name, you should see kured's log output.
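
If you do not know the pod names, you can select the pods by the label name=kured that the DaemonSet template above assigns. The two helper functions below are hypothetical wrappers around ordinary kubectl calls, and the default of 20 log lines is an arbitrary choice.

```shell
#!/bin/sh
# Hypothetical helpers; the label name=kured comes from the DaemonSet template.

# Lists the kured pods together with the node each one runs on.
kured_pods() {
    kubectl get pods -n kube-system -l name=kured -o wide
}

# Prints the last N log lines (default 20) of every kured pod,
# each line prefixed with the pod it came from.
kured_logs() {
    kubectl logs -n kube-system -l name=kured --tail="${1:-20}" --prefix
}
```

For example, kured_logs 50 shows the last 50 lines from each kured pod.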

The current state of nodes and information about them can be obtained by running the command:

kubectl get nodes -o wide
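
Before rebooting a node, kured cordons and drains it, which is what the nodes and pods/eviction permissions granted earlier are for. While that happens, the node's STATUS column contains SchedulingDisabled. The sketch below filters for such nodes; the function name is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: prints the names of nodes whose STATUS column contains
# SchedulingDisabled, i.e. nodes that are currently cordoned.
cordoned_nodes() {
    kubectl get nodes --no-headers | awk '$2 ~ /SchedulingDisabled/ { print $1 }'
}
```

An empty result means no node is cordoned at the moment.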

TEST

If you want to check whether rebooting the AKS nodes works, first adjust the Kured configuration, for example so that the current day and time fall within the allowed window. Then log in to the node and run, for example:

touch /var/run/reboot-required

The virtual machines (nodes) on which our containers run can be found in a ‘Resource Group’ that is created automatically when AKS is created. Its name usually starts with MC_nameAKS … We log in there as we would to a regular VM.
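
If you prefer not to guess the name, the Azure CLI can report the automatically created resource group. A small sketch, assuming the az CLI is installed and logged in; the function name is hypothetical, and the two arguments are the resource group and the name of your own AKS cluster.

```shell
#!/bin/sh
# Hypothetical helper: prints the name of the node resource group of an AKS cluster.
# $1 = resource group that contains the AKS cluster, $2 = AKS cluster name.
aks_node_resource_group() {
    az aks show --resource-group "$1" --name "$2" \
        --query nodeResourceGroup --output tsv
}
```

For example, aks_node_resource_group RESOURCE_GROUP CLUSTER_NAME prints the name of the group in which the node VMs live.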

You can find more about kured on GitHub https://github.com/weaveworks/kured

