WS Proxmox node reboot

Tips & Notes

If you're expecting to reboot every node in the cluster, do the node with the containers last, to limit the amount of downtime and reboots for them
Updating a node: `apt update` and `apt full-upgrade`
Make sure all VMs are actually migratable before adding to a HA group
If there are containers on the device you are looking to reboot- you are going to need to also create a maintenance mode to cover them (for example teamspeak or awstats)
Containers will inherit the OS of their host, so you will also need to handle triggers related to their OS updating, where appropriate

If a VM or container is going to incur downtime, you must let the affected parties know in advance. Ideally they should be informed the previous day.

Check all Ceph pools are running on at least 3/2 replication
Check that all running VM's on the node you want to reboot are in HA (if not, add them or migrate them away manually)
- The `compute.*` VM's are not to be migrated! Rebooting a node with such a VM present requires shutting down the VM!
Check that Ceph is healthy -> No remapped PG's, or degraded data redundancy
You have communicated that downtime is expected to the users who will be affected (Ideally one day in advance)

Update the node: `apt update` and `apt full-upgrade`
Check the packages that are removed/updated/installed correctly and they are sane (to make sense)

Complete the pre-flight checks
If you want to reboot for a kernel update, make sure the kernel is updated by following the Update Process written above
Start maintenance mode for the Proxmox node and any containers running on the node
Start maintenance mode for Ceph, specify that we only want to surpress the trigger for health state being in warning by setting tag `ceph_health` equals `warning`
Let affected parties know that the maintenance period you told them about in the preflight checks is about to take place.

Place the node in HA maintenance mode so the cluster does not trigger failover or recovery actions `ha-manager crm-command node-maintenance enable <node>`
Reboot node through web GUI
Wait for node to come back up
Wait for OSD's to be back online
disable node maintenance mode `ha-manager crm-command node-maintenance disable <node>`
Remove noout flag on host: `ceph osd unset-group noout <node>` ,to do this:
If a kernel update was done, manually execute the `Operating system` item manually to detect the update. Manually executing the two items that indicate a reboot is also usefull if they were firing, to stop them/check no further reboots are needed.
Ackowledge & close triggers
Remove maintenance modes

Ensure that Kaboom API is running on Screwdriver or Paloma. This is to get the best performance for the VM.