WS Proxmox node reboot

Pre-flight checks

  • Check that all Ceph pools are running with at least 2/3 replication
  • Check that all running VMs on the node you want to reboot are managed by HA (if not, add them, or migrate them away manually)
  • Check that Ceph is healthy: no remapped PGs and no degraded data redundancy (example check commands follow this list)
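A minimal sketch of these checks from a shell on any cluster node, assuming the standard Ceph and Proxmox VE command-line tools are available; the comments describe what to look for in the output:

```
# Per-pool replication settings: replicated pools should show size 3 / min_size 2
ceph osd pool ls detail

# Overall cluster health: expect HEALTH_OK, with no remapped PGs
# and no "degraded data redundancy" warnings
ceph status

# HA resources: every running VM on the node to be rebooted should be
# listed here (service vm:<vmid>) in state "started"
ha-manager status
```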

Reboot process

  • Start maintenance mode for the Proxmox node and any containers running on the node
  • Start maintenance mode for Ceph; suppress only the trigger for the health state being in warning by setting the tag `ceph_health` to `warning`
(Screenshot: Ceph-maintenance.png)
  • Set the noout flag for the host: `ceph osd set-group noout <node>` (see the command sketch after this list)
  • Reboot the node through the web GUI
  • Wait for the node to come back up
  • Wait for the OSDs to be back online
  • Remove the noout flag for the host: `ceph osd unset-group noout <node>`
  • Acknowledge the triggers
  • Remove the maintenance modes
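A sketch of the Ceph-related commands around the reboot, assuming shell access to the cluster; replace `<node>` with the actual host name as in the steps above:

```
# Before the reboot: prevent the OSDs on this host from being marked out while it is down
ceph osd set-group noout <node>

# (reboot the node through the web GUI and wait for it to return)

# After the reboot: confirm the host's OSDs are back up and in,
# and that the cluster no longer reports down OSDs
ceph osd tree
ceph -s

# Re-enable normal out-marking once everything is healthy again
ceph osd unset-group noout <node>
```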