Today's article is about updating the operating system on Azure Kubernetes Service (AKS) nodes. It is not about upgrading the AKS (Kubernetes) version, but about patching the nodes themselves. Microsoft Azure manages the so-called “master node” (the control plane) for us: it installs the latest patches, takes care of its security, and we have no access to it. The worker nodes on which our containers run are a different story. AKS installs patches on them, but it does not restart these nodes itself, so updates that require a reboot (for example, a new kernel) are not fully applied until the node is restarted.
Nodes running Windows Server do not receive daily updates. Instead, the node pool is upgraded, which deploys new nodes with the latest image and patches. You can find more information on this subject at https://docs.microsoft.com/pl-pl/azure/aks/use-multiple-node-pools#upgrade-a-node-pool
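As a minimal sketch, assuming the Azure CLI (`az`) is installed, logged in, and recent enough to support node image upgrades, a Windows node pool can be moved to the latest node image like this. The helper name `upgrade_node_image` and the example arguments are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: upgrade_node_image RESOURCE_GROUP CLUSTER_NAME NODEPOOL_NAME
# Assumes the Azure CLI is installed and logged in to the right subscription.
upgrade_node_image() {
  # --node-image-only replaces the nodes with the latest node image
  # without changing the Kubernetes version of the node pool.
  az aks nodepool upgrade \
    --resource-group "$1" \
    --cluster-name "$2" \
    --name "$3" \
    --node-image-only
}
```

For example, `upgrade_node_image myResourceGroup myAKSCluster npwin` (all three names are placeholders).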
In this article, however, I will deal with Linux nodes, because I use them more often. In this case, you can use the free Kured utility, which is also recommended by Microsoft.
When the updates installed on a node require a restart, this is signaled by the file /var/run/reboot-required.
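To see the signal for yourself on a node, a small check of the same sentinel file is enough. This is a sketch: `reboot_required` is a hypothetical helper name, and the optional argument only exists so that another path can be checked (for example, in a test). On Ubuntu, the packages that triggered the request are usually listed in a companion file with the `.pkgs` suffix.

```shell
#!/bin/sh
# Hypothetical helper: reboot_required [SENTINEL_PATH]
# Returns 0 and prints the triggering packages (if listed) when a reboot is needed.
reboot_required() {
  sentinel="${1:-/var/run/reboot-required}"
  if [ -f "$sentinel" ]; then
    echo "reboot required"
    # Ubuntu lists the packages that requested the reboot in <sentinel>.pkgs.
    if [ -f "$sentinel.pkgs" ]; then
      cat "$sentinel.pkgs"
    fi
    return 0
  fi
  echo "no reboot required"
  return 1
}
```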
KURED
The Kured service that we are going to run checks for the file /var/run/reboot-required and restarts the node when it is present. Don't worry: you can restrict reboots to specific days and hours, so there will be no unexpected restarts, and Kured takes a lock so that only one node reboots at a time. Additionally, you can connect it to a messaging tool such as Slack to get notified about reboots, and to Prometheus so that reboots are held back while alerts are active.
I run Kured as a DaemonSet, so one instance runs on every node.
First, we create a ServiceAccount and grant it the required permissions through roles and role bindings:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kured
rules:
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["get", "patch"]
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["list", "delete", "get"]
- apiGroups: ["apps"]
  resources: ["daemonsets"]
  verbs: ["get"]
- apiGroups: [""]
  resources: ["pods/eviction"]
  verbs: ["create"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kured
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kured
subjects:
- kind: ServiceAccount
  name: kured
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: kube-system
  name: kured
rules:
# Allow kured to lock/unlock itself
- apiGroups: ["apps"]
  resources: ["daemonsets"]
  resourceNames: ["kured"]
  verbs: ["update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: kube-system
  name: kured
subjects:
- kind: ServiceAccount
  namespace: kube-system
  name: kured
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: kured
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kured
  namespace: kube-system
---
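Once the manifest has been applied, the permissions above can be checked by impersonating the kured ServiceAccount with `kubectl auth can-i`. This is a sketch, assuming kubectl points at the AKS cluster and your user is allowed to impersonate service accounts; `check_kured_rbac` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: check_kured_rbac
# Prints ok/FAIL per permission and returns non-zero if any check fails.
KURED_SA="system:serviceaccount:kube-system:kured"

check_kured_rbac() {
  failed=0
  # Each line: verb, resource, and optional extra kubectl flags.
  while read -r verb resource extra; do
    # $extra is unquoted on purpose so that it splits into separate flags;
    # stdin is redirected so kubectl cannot consume the heredoc.
    if kubectl auth can-i "$verb" "$resource" --as="$KURED_SA" $extra </dev/null >/dev/null; then
      echo "ok   $verb $resource $extra"
    else
      echo "FAIL $verb $resource $extra"
      failed=1
    fi
  done <<EOF
get nodes
patch nodes
list pods
delete pods
get daemonsets.apps
create pods --subresource=eviction
update daemonsets.apps/kured -n kube-system
EOF
  return "$failed"
}
```

`kubectl auth can-i` exits with status 0 when the action is allowed, which is what the `if` relies on.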
In the second part of the file, we create the DaemonSet with the appropriate configuration:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kured
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: kured
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        name: kured
    spec:
      serviceAccountName: kured
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      hostPID: true
      restartPolicy: Always
      containers:
      - name: kured
        image: docker.io/weaveworks/kured:1.5.0
        imagePullPolicy: IfNotPresent
        securityContext:
          privileged: true # Give permission to nsenter /proc/1/ns/mnt
        env:
        - name: KURED_NODE_ID
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        command:
        - /usr/bin/kured
        - --ds-name=kured
        - --ds-namespace=kube-system
        - --reboot-days=mon
        - --reboot-sentinel=/var/run/reboot-required
        - --start-time=2am
        - --end-time=6am
        - --period=0h30m0s
        - --time-zone=Europe/Warsaw
        - --slack-hook-url=https://PASTE_GENERATED_ADDRESS
        - --slack-channel=CHANNEL_NAME
        - --slack-username=restartk8s
        - --prometheus-url=http://PROMETHEUS_ADDRESS:9090
Configuration:
- the name of our DaemonSet: --ds-name=kured
- the namespace in which the DaemonSet runs: --ds-namespace=kube-system
- the days on which reboots are allowed: --reboot-days=mon
- the path of the file that signals a required reboot: --reboot-sentinel=/var/run/reboot-required
- the start of the time window in which reboots are allowed: --start-time=2am
- the end of that time window: --end-time=6am
- how often Kured checks whether a reboot is needed: --period=0h30m0s
- the time zone used for the start and end times: --time-zone=Europe/Warsaw
- the address of our incoming webhook: --slack-hook-url=https://PASTE_GENERATED_ADDRESS
- the channel name: --slack-channel=CHANNEL_NAME
- the user name displayed with notifications: --slack-username=restartk8s
- the Prometheus address; Kured does not reboot while alerts are active there: --prometheus-url=http://PROMETHEUS_ADDRESS:9090
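The interplay of `--reboot-days`, `--start-time` and `--end-time` can be illustrated with a simplified window check. This is not Kured's source code, only a sketch of the rule: a reboot may happen when the day is on the list and the hour falls between the start (inclusive) and the end (exclusive). `in_reboot_window` is a hypothetical name, and the sketch ignores minutes and windows that cross midnight.

```shell
#!/bin/sh
# Hypothetical helper: in_reboot_window DAYS START_HOUR END_HOUR DAY HOUR
#   DAYS  comma-separated list such as "mon" or "mon,tue"
#   hours 0-23; assumes START_HOUR < END_HOUR (no windows across midnight)
in_reboot_window() {
  case ",$1," in
    *",$4,"*) ;;          # the day is on the allowed list
    *) return 1 ;;
  esac
  # test(1) compares integers in base 10, so "03" is fine here.
  [ "$5" -ge "$2" ] && [ "$5" -lt "$3" ]
}
```

To check the current time against the configuration above: `in_reboot_window mon 2 6 "$(LC_ALL=C TZ=Europe/Warsaw date +%a | tr 'A-Z' 'a-z')" "$(TZ=Europe/Warsaw date +%H)" && echo "inside the reboot window"`.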
We save both parts above in a single YAML file and deploy it in the standard way, for example:
kubectl apply -f FILE_NAME.yaml
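A sketch of the same step that also waits until the DaemonSet has a ready pod on every node. The helper name `deploy_kured`, the default file name `kured.yaml`, and the 180-second timeout are assumptions for this example.

```shell
#!/bin/sh
# Hypothetical helper: deploy_kured [MANIFEST]
# Applies the manifest, then blocks until the kured DaemonSet rollout finishes.
deploy_kured() {
  manifest="${1:-kured.yaml}"
  kubectl apply -f "$manifest" &&
    kubectl rollout status daemonset/kured -n kube-system --timeout=180s
}
```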
We read the logs in the usual way:
kubectl logs POD_NAME -n kube-system
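Because DaemonSet pod names carry a random suffix, it can be easier to select the pods by the `name=kured` label from the DaemonSet template. This is a sketch: `kured_logs` is a hypothetical helper name, and `--prefix` requires a reasonably recent kubectl.

```shell
#!/bin/sh
# Hypothetical helper: kured_logs [TAIL_LINES]
kured_logs() {
  # List the kured pods together with the node each one runs on.
  kubectl get pods -n kube-system -l name=kured -o wide
  # Show the last lines of every kured pod, each prefixed with its pod name.
  kubectl logs -n kube-system -l name=kured --tail="${1:-20}" --prefix
}
```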
When everything works and you enter the correct pod name, you should see something like this:
You can get the current state of the nodes, along with details about them, by running:
kubectl get nodes -o wide
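While Kured is working, two columns are especially useful: whether a node is cordoned (Kured cordons and drains a node before rebooting it) and which kernel it runs (a reboot after a kernel update changes this value). A sketch using kubectl's custom columns; `node_patch_status` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: node_patch_status
# CORDONED shows "true" while a node is unschedulable, "<none>" otherwise.
node_patch_status() {
  kubectl get nodes -o custom-columns='NAME:.metadata.name,CORDONED:.spec.unschedulable,KERNEL:.status.nodeInfo.kernelVersion,OS:.status.nodeInfo.osImage'
}
```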
TEST
If you want to check whether rebooting and updating the AKS nodes works, first adjust the Kured configuration accordingly (for example, so that the reboot window covers the time of the test). Then log in to a node and run, for example:
sudo touch /var/run/reboot-required
The virtual machines (nodes) on which our containers run are located in a resource group created automatically together with the AKS cluster; its name usually starts with MC_nameAKS … We log in to them as to a regular VM.
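If you prefer not to log in over SSH, the sentinel file can also be created through the Azure CLI's run command on the node scale set. This is a sketch, assuming a VM scale set based node pool, a logged-in `az`, and that the first scale set in the node resource group is the one you want; `mark_node_for_reboot` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: mark_node_for_reboot RESOURCE_GROUP CLUSTER_NAME [INSTANCE_ID]
mark_node_for_reboot() {
  # Find the automatically created node resource group (MC_...).
  node_rg=$(az aks show -g "$1" -n "$2" --query nodeResourceGroup -o tsv) || return 1
  # Assumption: the first scale set in that group is the target node pool.
  vmss=$(az vmss list -g "$node_rg" --query "[0].name" -o tsv) || return 1
  # RunShellScript runs as root on the instance, so no sudo is needed.
  az vmss run-command invoke -g "$node_rg" -n "$vmss" \
    --command-id RunShellScript --instance-id "${3:-0}" \
    --scripts "touch /var/run/reboot-required"
}
```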
You can find more about kured on GitHub https://github.com/weaveworks/kured
You can find more articles about AKS in the Kubernetes category.