WS Proxmox node reboot
Revision as of 05:56, 27 February 2024
Tips & Notes
- If you're expecting to reboot every node in the cluster, reboot the node that hosts the containers last, to limit downtime and the number of reboots for them
- Updating a node: `apt update` and `apt full-upgrade`
Pre-flight checks
- Check all Ceph pools are running on at least 2/3 replication
- Check that all running VMs on the node you want to reboot are in HA (if not, add them to HA or migrate them away manually)
- Check that Ceph is healthy -> no remapped PGs and no degraded data redundancy
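The checks above can be partly scripted. The following is a minimal sketch, not an established procedure: the function names are hypothetical, and the parsing assumes that `ceph health` prints the status word first, that `qm list` prints the VMID and status in its first and third columns, and that `ha-manager status` lists HA resources as `service vm:<id> ...`. It only covers VMs, not containers.

```shell
# Succeeds only when the health summary on stdin starts with HEALTH_OK.
# Intended input: the output of `ceph health`.
health_is_ok() {
    read -r status _rest || true
    [ "$status" = "HEALTH_OK" ]
}

# Print the VMIDs of running VMs.
# Intended input: the output of `qm list` (assumed columns: VMID NAME STATUS ...).
running_vmids() {
    awk '$3 == "running" { print $1 }'
}

# Print the VMIDs of HA-managed VMs.
# Intended input: the output of `ha-manager status` (assumed lines: "service vm:<id> (...)").
ha_vmids() {
    awk '$1 == "service" && $2 ~ /^vm:/ { sub(/^vm:/, "", $2); print $2 }'
}

# Print running VMIDs that are not HA-managed.
# $1 = captured `qm list` output, $2 = captured `ha-manager status` output.
vms_missing_ha() {
    ha="$(printf '%s\n' "$2" | ha_vmids)"
    printf '%s\n' "$1" | running_vmids | while read -r id; do
        printf '%s\n' "$ha" | grep -qx "$id" || printf '%s\n' "$id"
    done
}
```

On the node to be rebooted this could be used as `ceph health | health_is_ok && echo "Ceph healthy"` and `vms_missing_ha "$(qm list)" "$(ha-manager status)"`; any VMID printed by the latter still needs to be added to HA or migrated. Pool replication can be read from the `size` and `min_size` fields in `ceph osd pool ls detail`.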
Reboot process
- Start maintenance mode for the Proxmox node and any containers running on the node
- Start maintenance mode for Ceph, and specify that only the trigger for the health state being in warning should be suppressed, by setting tag `ceph_health` equals `warning`
- Set noout flag on host: `ceph osd set-group noout <node>`
- Reboot node through web GUI
- Wait for node to come back up
- Wait for OSDs to be back online
- Remove noout flag on host: `ceph osd unset-group noout <node>`
- If a kernel update was done, manually execute the `Operating system` item so the update is detected. Manually executing the two items that indicate a required reboot is also useful if they were firing, both to clear them and to check that no further reboots are needed.
- Acknowledge & close triggers
- Remove maintenance modes
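The Ceph part of the reboot process can be wrapped in a small helper. This is a sketch under assumptions, not a tested procedure: `all_osds_up` and `reboot_guard` are hypothetical names, the 10-second poll interval is arbitrary, and the parsing assumes `ceph osd stat` prints a summary of the form `<total> osds: <up> up ...`. It checks OSDs cluster-wide rather than only those on the rebooted node, and it leaves the maintenance modes and the reboot itself to the operator.

```shell
# Succeeds when the `ceph osd stat` summary on stdin reports every OSD as up.
# Assumed format: "<total> osds: <up> up ..., <in> in ...".
all_osds_up() {
    awk '$2 == "osds:" && $4 ~ /^up/ { found = 1; ok = ($1 == $3) }
         END { exit !(found && ok) }'
}

# Set noout for one host, pause while the operator reboots it,
# wait until all OSDs are up again, then remove the flag.
# $1 = node name as known to Ceph.
reboot_guard() {
    node="$1"
    ceph osd set-group noout "$node" || return 1
    printf 'Reboot %s through the web GUI, then press Enter once it is back up.\n' "$node"
    read -r _reply || true
    # Poll until every OSD is reported up (interval chosen arbitrarily).
    until ceph osd stat | all_osds_up; do
        sleep 10
    done
    ceph osd unset-group noout "$node"
}
```

After sourcing the functions, `reboot_guard <node>` would be run from another cluster node, since the shell on the rebooted node does not survive the reboot.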
Aftercare
- Ensure that Kaboom API is running on Screwdriver or Paloma. This is to get the best performance for the VM.