WS Proxmox node reboot

From Delft Solutions

Revision as of 06:09, 29 February 2024 by LouisRaymond93 (talk | contribs)



Tips & Notes

If you expect to reboot every node in the cluster, reboot the node that hosts the containers last, to limit their downtime and the number of times they restart.
Updating a node: run `apt update`, then `apt full-upgrade`.

Pre-flight checks

Check that all Ceph pools are running on at least 3/2 replication (size 3, min_size 2).
Check that all running VMs on the node you want to reboot are in HA (if not, add them to HA or migrate them away manually).
Check that Ceph is healthy: no remapped PGs and no degraded data redundancy.
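
The pool and health checks above can be partly scripted. The following is a minimal sketch, not an established part of this procedure: the function names `pools_below_replication` and `health_is_ok` are hypothetical, and the parsing assumes that `ceph osd pool ls detail` prints one line per pool in the form `pool 1 'name' replicated size 3 min_size 2 ...`, which may differ between Ceph releases.

```shell
# Pre-flight helpers (sketch). Function names are hypothetical.

# Reads `ceph osd pool ls detail` output on stdin and prints the (quoted)
# name of every pool whose size is below 3 or whose min_size is below 2.
# Assumed line format: pool 1 'name' replicated size 3 min_size 2 ...
pools_below_replication() {
    awk '$1 == "pool" {
        size = 0; min = 0
        for (i = 1; i < NF; i++) {
            if ($i == "size")     size = $(i + 1)
            if ($i == "min_size") min  = $(i + 1)
        }
        if (size < 3 || min < 2) print $3
    }'
}

# Succeeds only when the first word of the given `ceph health` output
# is HEALTH_OK.
health_is_ok() {
    [ "${1%% *}" = "HEALTH_OK" ]
}
```

On a node, this could be used as `ceph osd pool ls detail | pools_below_replication` (no output means every pool meets 3/2) and `health_is_ok "$(ceph health)"`. A warning state still needs a manual look with `ceph health detail`, and HA membership of the VMs can be reviewed with `ha-manager status`.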

Reboot process

Start maintenance mode for the Proxmox node and any containers running on the node
Start maintenance mode for Ceph. Specify that only the trigger for the health state being in warning should be suppressed, by setting the tag `ceph_health` equal to `warning`.

Set noout flag on host: `ceph osd set-group noout <node>`
Reboot node through web GUI
Wait for node to come back up
Wait for the OSDs to be back online
Remove noout flag on host: `ceph osd unset-group noout <node>`

  To do this: gain SSH access to the host (log in through IPA), then run the command.

If a kernel update was done, manually execute the `Operating system` item to detect the update. Manually executing the two items that indicate a reboot is also useful if they were firing, to stop them and to check that no further reboots are needed.
Acknowledge & close triggers
Remove maintenance modes
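
The "wait for OSDs" step in the process above can be sketched as a small polling loop. This is an illustration under assumptions rather than the team's actual tooling: `all_osds_up` and `wait_for_osds` are hypothetical names, and the parsing assumes that `ceph osd stat` prints a summary containing `N osds: M up`. It checks the whole cluster, so an OSD that was already down before the reboot would keep the loop waiting.

```shell
# Succeeds when the `ceph osd stat` summary passed as $1 reports as many
# OSDs up as exist in total (assumed format: "6 osds: 6 up ...").
all_osds_up() {
    summary=$(printf '%s\n' "$1" | grep -oE '[0-9]+ osds: [0-9]+ up' | head -n 1)
    total=${summary%% *}        # number before " osds:"
    up=${summary#* osds: }      # drop everything up to "osds: "
    up=${up% up}                # drop the trailing " up"
    [ -n "$total" ] && [ "$total" = "$up" ]
}

# Polls until all OSDs are up.
# $1: command to query (default: ceph), $2: poll interval in seconds (default: 10).
wait_for_osds() {
    ceph_cmd=${1:-ceph}
    interval=${2:-10}
    until all_osds_up "$("$ceph_cmd" osd stat)"; do
        sleep "$interval"
    done
}

# Intended order on the host, with <node> as in the steps above:
#   ceph osd set-group noout <node>
#   (reboot the node through the web GUI and wait for it to come back)
#   wait_for_osds
#   ceph osd unset-group noout <node>
```

The command name is a parameter only so the loop can be tried against a stub; in normal use `wait_for_osds` is called without arguments.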

Aftercare

Ensure that the Kaboom API is running on Screwdriver or Paloma, so that the VM gets the best performance.
