Last updated on April 20th, 2021
Today's article is about updating the operating system on the nodes of Azure Kubernetes Service (AKS). It is not about upgrading the AKS version, but about updating the nodes themselves. Microsoft Azure manages the so-called “Master Node” for us: it installs the latest patches and takes care of its security, and we have no access to it. It is different with the other nodes, on which our containers run. AKS installs patches on them, but it does not restart these nodes automatically, so updates that require a reboot are not fully applied until the node is restarted.
Nodes running Windows Server do not receive daily updates. Instead, an AKS upgrade is performed, which deploys new nodes with the latest image and patches. You can find information on this subject at https://docs.microsoft.com/pl-pl/azure/aks/use-multiple-node-pools#upgrade-a-node-pool
In this article, however, I will deal with Linux nodes, because I use them more often. In this case, you can use the free Kured utility, which is also recommended by Microsoft.
If updates have been installed and a node needs to be restarted, this is signalled by the file /var/run/reboot-required being created on that node.
The Kured service we are going to run checks for the file /var/run/reboot-required and restarts the nodes if necessary. Don't worry: you can configure it to run only at certain times, so we will not be surprised by an unexpected restart. Additionally, you can connect it to Prometheus and to a messenger, for example Slack, so that we get notified about reboots.
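The check itself is simple. As a minimal sketch (not Kured's actual implementation), the test for a pending reboot comes down to checking whether the sentinel file exists. The `reboot_pending` helper is a name made up for this example.

```shell
# Hypothetical helper: succeeds when the sentinel file exists.
# The default path is the file mentioned in the article.
reboot_pending() {
    sentinel="${1:-/var/run/reboot-required}"
    [ -f "$sentinel" ]
}

# Report the state of the node this script runs on.
if reboot_pending; then
    echo "reboot required"
else
    echo "no reboot required"
fi
```

Run on a node, this only reports the state. Kured additionally drains the node and coordinates reboots across the cluster.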
I run Kured as a DaemonSet.
First, we add a ServiceAccount and assign it the necessary roles and permissions:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kured
rules:
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["get", "patch"]
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["list","delete","get"]
  - apiGroups: ["apps"]
    resources: ["daemonsets"]
    verbs: ["get"]
  - apiGroups: [""]
    resources: ["pods/eviction"]
    verbs: ["create"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kured
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kured
subjects:
  - kind: ServiceAccount
    name: kured
    namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: kube-system
  name: kured
rules:
  # Allow kured to lock/unlock itself
  - apiGroups: ["apps"]
    resources: ["daemonsets"]
    resourceNames: ["kured"]
    verbs: ["update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: kube-system
  name: kured
subjects:
  - kind: ServiceAccount
    namespace: kube-system
    name: kured
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: kured
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kured
  namespace: kube-system
---
In the second part of the file, we create a DaemonSet with the appropriate configuration:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kured
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: kured
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        name: kured
    spec:
      serviceAccountName: kured
      tolerations:
        - key: node-role.kubernetes.io/master
          effect: NoSchedule
      hostPID: true
      restartPolicy: Always
      containers:
        - name: kured
          image: docker.io/weaveworks/kured:1.5.0
          imagePullPolicy: IfNotPresent
          securityContext:
            privileged: true # Give permission to nsenter /proc/1/ns/mnt
          env:
            - name: KURED_NODE_ID
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
          command:
            - /usr/bin/kured
            - --ds-name=kured
            - --ds-namespace=kube-system
            - --reboot-days=mon
            - --reboot-sentinel=/var/run/reboot-required
            - --start-time=2am
            - --end-time=6am
            - --period=0h30m0s
            - --time-zone=Europe/Warsaw
            - --slack-hook-url=https://mattermost.primesoft.pl/hooks/84kbr5kkq3ycbkri1983xxt73w
            - --slack-channel=arthurdoc-gitlab
            - --slack-username=restartk8s
            - --prometheus-url=http://10.7.7.4:9090
The parameters passed to Kured are:
- name of our DaemonSet: --ds-name=kured
- namespace in which the DaemonSet runs: --ds-namespace=kube-system
- days on which a restart is allowed: --reboot-days=mon
- path to the file to check: --reboot-sentinel=/var/run/reboot-required
- time from which restarts may start: --start-time=2am
- time at which restarts must end: --end-time=6am
- interval between checks during the hours we specified: --period=0h30m0s
- time zone: --time-zone=Europe/Warsaw
- address of our webhook: --slack-hook-url=https://PASTE_GENERATED_ADDRESS
- channel name: --slack-channel=CHANNEL_NAME
- username to be displayed: --slack-username=restartk8s
- Prometheus address: --prometheus-url=http://PROMETHEUS_ADDRESS:9090
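To make the combination of day, hours, and time zone more concrete, here is a simplified model of the reboot window configured above. It is only an illustration, not Kured's own code: the `in_reboot_window` function is a made-up name, and treating the end hour as exclusive is an assumption of this sketch.

```shell
# Simplified model of: --reboot-days=mon --start-time=2am --end-time=6am
# Assumption of this sketch: the end hour (6) is treated as exclusive.
in_reboot_window() {
    day="$1"              # abbreviated weekday in the C locale, e.g. "Mon"
    hour="${2#0}"         # hour 0-23; strip one leading zero ("08" -> "8")
    hour="${hour:-0}"     # a bare "0" becomes empty above, so restore it
    [ "$day" = "Mon" ] && [ "$hour" -ge 2 ] && [ "$hour" -lt 6 ]
}

# Evaluate the current moment in the configured time zone.
now_day="$(TZ=Europe/Warsaw LC_ALL=C date +%a)"
now_hour="$(TZ=Europe/Warsaw date +%H)"

if in_reboot_window "$now_day" "$now_hour"; then
    echo "inside the reboot window"
else
    echo "outside the reboot window"
fi
```

With these settings, a sentinel file that appears on Tuesday would wait until the following Monday night before the node is restarted.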
We save both parts above in a single YAML file and deploy it in the standard way, for example:
kubectl apply -f FILE_NAME.yaml
We read the logs in the usual way:
kubectl logs POD_NAME -n kube-system
When everything runs fine and you enter the correct pod name, you should see Kured's log output.
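If you do not know the pod names, the following sketch may help. It relies on the label name=kured set in the DaemonSet template above. The function names `kured_pods` and `kured_logs` are made up for this example, and the `--prefix` option assumes a reasonably recent kubectl.

```shell
# List the Kured pods (one per node) together with the node each runs on.
kured_pods() {
    kubectl get pods -n kube-system -l name=kured -o wide
}

# Print the last 20 log lines of every Kured pod, prefixed with the pod name.
kured_logs() {
    kubectl logs -n kube-system -l name=kured --prefix --tail=20
}
```

Calling `kured_pods` first shows which pod belongs to which node, and `kured_logs` then gives a quick overview without copying individual pod names.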
The current state of nodes and information about them can be obtained by running the command:
kubectl get nodes -o wide
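If you want to check that rebooting and updating of the AKS nodes works, first adjust the Kured configuration (for example the reboot days and hours) so that a restart is allowed at the time of the test. Then log in to the node and execute, for example, the command: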
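One way to do that, assuming the sentinel path from the configuration above, is to create the file by hand so that Kured sees a pending reboot on its next check. The `request_reboot` helper and the `SENTINEL` variable are names made up for this sketch.

```shell
# Sentinel file that Kured watches (the value of --reboot-sentinel).
SENTINEL="${SENTINEL:-/var/run/reboot-required}"

# Simulate a pending reboot by creating the sentinel file.
# On a real node this has to be run as root, for example after `sudo -i`.
request_reboot() {
    touch "$SENTINEL"
}
```

After calling `request_reboot` on the node, Kured should pick the file up on its next check inside the allowed window, and `kubectl get nodes -o wide` should then show the node's status changing while it is drained and restarted.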
The virtual machines (nodes) on which our containers run can be found in a resource group that is created automatically together with AKS. Its name usually starts with MC_nameAKS … We log in to them as we would to a regular VM.
You can find more about Kured on GitHub: https://github.com/weaveworks/kured