WS Proxmox node reboot
Revision as of 05:22, 27 February 2024
Pre-flight checks
- Check that all Ceph pools are running with at least 2/3 replication (size 3, min_size 2)
- Check that all running VMs on the node you want to reboot are managed by HA (if not, add them to HA or migrate them away manually)
- Check that Ceph is healthy -> no remapped PGs and no degraded data redundancy
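The pre-flight checks above can be sketched as a script. This is a minimal sketch, not an official procedure: it assumes the `ceph` and `ha-manager` CLIs are available on a cluster node, and the only runnable logic is a small helper that parses `ceph health` output; the live commands are shown as comments.

```shell
#!/bin/bash
# Pre-flight sketch (assumptions: run as root on a Proxmox/Ceph cluster node).

# Succeeds only when a `ceph health` line (on stdin) reports HEALTH_OK,
# i.e. no remapped PGs, no degraded redundancy, no warnings at all.
ceph_health_ok() {
  read -r status _
  [ "$status" = "HEALTH_OK" ]
}

# On the node itself you would run:
#   ceph osd pool ls detail          # every pool should show size 3 min_size 2
#   ha-manager status                # all VMs on the target node listed here
#   ceph health | ceph_health_ok && echo "pre-flight health check passed"
```

Keeping the parsing in a helper makes the go/no-go decision explicit instead of relying on eyeballing the health output.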
Reboot process
- Start maintenance mode for the Proxmox node and any containers running on the node
- Start maintenance mode for Ceph; suppress only the trigger for the health state being in warning, by setting tag `ceph_health` equals `warning`
- Set noout flag on host: `ceph osd set-group noout <node>`
- Reboot node through web GUI
- Wait for node to come back up
- Wait for OSDs to be back online
- Remove noout flag on host: `ceph osd unset-group noout <node>`
- Acknowledge triggers
- Remove maintenance modes
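The noout-and-reboot sequence above can be sketched as follows. This is a hedged sketch under assumptions: node names like `pve1` and the 10-second polling interval are placeholders, and the only runnable logic is a helper that parses `ceph osd tree down` output, so it can be exercised without a live cluster.

```shell
#!/bin/bash
# Reboot-sequence sketch (assumption: `ceph osd tree down` lists only
# down OSDs grouped under their CRUSH host buckets).

# Succeeds when the `ceph osd tree down` output (on stdin) contains no
# host bucket for the given node, i.e. all of that node's OSDs are up.
node_osds_all_up() {
  ! grep -q "host $1"
}

# On the cluster you would run, with NODE set to the target node:
#   ceph osd set-group noout "$NODE"      # stop CRUSH from rebalancing the host
#   (reboot the node through the web GUI and wait for it to come back)
#   until ceph osd tree down | node_osds_all_up "$NODE"; do sleep 10; done
#   ceph osd unset-group noout "$NODE"    # re-enable normal recovery
```

Polling `ceph osd tree down` rather than sleeping a fixed time avoids removing the noout flag while OSDs are still starting, which would trigger unnecessary rebalancing.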