WS Proxmox node reboot

From Delft Solutions
Revision as of 05:09, 29 February 2024 by LouisRaymond93 (talk | contribs)

Tips & Notes

  • If you expect to reboot every node in the cluster, reboot the node that hosts the containers last, to limit their downtime and the number of times they restart
  • To update a node, run `apt update` followed by `apt full-upgrade`

Pre-flight checks

  • Check that all Ceph pools run with at least 3/2 replication (size 3, min_size 2)
  • Check that all running VMs on the node you want to reboot are in HA (if not, add them to HA or migrate them away manually)
  • Check that Ceph is healthy: no remapped PGs and no degraded data redundancy
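
The three checks above can be scripted roughly as follows. This is a minimal sketch, not a vetted tool: the function names are made up for illustration, and the parsing assumes the plain-text output formats of `ceph health`, `ceph osd pool ls detail`, `qm list` and `ha-manager status` as they commonly appear (pool lines of the form `pool N 'name' replicated size S min_size M ...`, HA lines of the form `service vm:ID (node, state)`). Erasure-coded pools are ignored, and VM names are assumed to contain no spaces.

```shell
# Succeeds only when Ceph reports HEALTH_OK.
ceph_is_healthy() {
    [ "$(ceph health 2>/dev/null | cut -d' ' -f1)" = "HEALTH_OK" ]
}

# Prints the name of every replicated pool below size 3 / min_size 2.
# Assumption: "pool N 'name' replicated size S min_size M ..." line format.
pools_below_3_2() {
    ceph osd pool ls detail | awk '
        $1 == "pool" && $4 == "replicated" {
            size = 0; min = 0
            for (i = 5; i < NF; i++) {
                if ($i == "size") size = $(i + 1) + 0
                if ($i == "min_size") min = $(i + 1) + 0
            }
            if (size < 3 || min < 2) print $3
        }' | tr -d "'"
}

# Prints the IDs of running VMs on this node that are not listed as HA services.
# Assumption: STATUS is the third column of `qm list`.
running_vms_not_in_ha() {
    ha_status=$(ha-manager status)
    qm list | awk '$3 == "running" { print $1 }' | while read -r vmid; do
        printf '%s\n' "$ha_status" | grep -q "^service vm:$vmid " || echo "$vmid"
    done
}
```

Run on the node that is about to be rebooted, `ceph_is_healthy` should succeed and the other two functions should print nothing; any pool name or VM ID they print needs attention before continuing.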

Reboot process

  • Start maintenance mode for the Proxmox node and any containers running on the node
  • Start maintenance mode for Ceph, specifying that only the trigger for the health state being in warning should be suppressed, by setting the tag `ceph_health` equal to `warning`
    (Screenshot: Ceph-maintenance.png)
  • Set noout flag on host: `ceph osd set-group noout <node>`
  • Reboot node through web GUI
  • Wait for node to come back up
  • Wait for the OSDs to be back online
  • Remove noout flag on host: `ceph osd unset-group noout <node>`
    To do this, gain SSH access to the host, log in through IPA, and run the command above.

  • If a kernel update was done, manually execute the `Operating system` item to detect the update. Manually executing the two items that indicate a reboot is also useful if they were firing, to stop them and to check that no further reboots are needed.
  • Acknowledge & close triggers
  • Remove maintenance modes
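
The Ceph part of the process above (set noout, wait for the OSDs, unset noout) can be wrapped in a few helper functions. This is a sketch under stated assumptions: the function names are hypothetical, the argument is the node's host name as Ceph knows it, and `all_osds_up` relies on the plain-text `N osds: M up ...` format of `ceph osd stat`. It checks every OSD in the cluster, so it assumes no OSD on another node is down on purpose.

```shell
# Set / clear the noout flag for all OSDs on one host.
set_noout()   { ceph osd set-group noout "$1"; }
unset_noout() { ceph osd unset-group noout "$1"; }

# Succeeds when `ceph osd stat` reports as many OSDs up as exist.
# Assumption: output looks like "12 osds: 12 up (...), 12 in (...); epoch: e1".
all_osds_up() {
    ceph osd stat | awk '
        { for (i = 2; i < NF; i++)
              if ($i == "osds:") { total = $(i - 1) + 0; up = $(i + 1) + 0 } }
        END { exit !(total > 0 && total == up) }'
}

# Polls until all OSDs are up; gives up after $1 attempts, $2 seconds apart.
wait_for_osds() {
    attempts=${1:-60}
    interval=${2:-10}
    while [ "$attempts" -gt 0 ]; do
        all_osds_up && return 0
        attempts=$((attempts - 1))
        sleep "$interval"
    done
    return 1
}
```

A possible sequence is `set_noout <node>` before the reboot, then `wait_for_osds && unset_noout <node>` once the node is back, so the flag is only removed after the OSDs have rejoined.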

Aftercare

  • Ensure that Kaboom API is running on Screwdriver or Paloma, since the VM performs best on those nodes.
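
One way to check where an HA-managed VM currently runs is to parse `ha-manager status`. The sketch below assumes that Kaboom API is an HA service, that status lines look like `service vm:ID (node, state)`, and that the node names match `screwdriver` and `paloma` when lowercased. The function names and the service ID passed in are placeholders, not values taken from this page.

```shell
# Prints the node that currently hosts the given HA service ID (e.g. vm:<id>).
# Assumption: `ha-manager status` lines look like "service vm:ID (node, state)".
ha_service_node() {
    ha-manager status | sed -n "s/^service $1 (\([^,]*\),.*/\1/p"
}

# Succeeds when the service runs on Screwdriver or Paloma (case-insensitive).
on_preferred_node() {
    case "$(ha_service_node "$1" | tr 'A-Z' 'a-z')" in
        screwdriver|paloma) return 0 ;;
        *) return 1 ;;
    esac
}
```

If `on_preferred_node vm:<id>` fails, the service can be moved with `ha-manager migrate vm:<id> <node>`, using the real VM ID and one of the two node names.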