* Make sure all VMs are actually migratable before adding them to an HA group
* If there are containers on the device you are looking to reboot, you will also need to create a maintenance mode to cover them (for example teamspeak or stats)
* Containers inherit the OS of their host, so you will also need to handle triggers related to their OS updating, where appropriate
* If a VM or container is going to incur downtime, you must let the affected parties know in advance. Ideally they should be informed the previous day.
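One way to spot VMs that are not migratable is to scan each VM's config for references to local storage, which usually blocks live migration. This is a minimal sketch, not part of the official tooling: it assumes the Proxmox `qm` CLI, and the storage ID `local` is an assumption you should adjust to match your own local storage names.

```shell
#!/bin/sh
# Hedged sketch: print the VMIDs on this node whose config references the
# (assumed) local storage "local:", i.e. candidates that may block migration.
find_unmigratable() {
    # `qm list` prints a header row, so skip line 1 and take the VMID column
    qm list | awk 'NR>1 {print $1}' | while read -r vmid; do
        if qm config "$vmid" | grep -q 'local:'; then
            echo "$vmid"   # this VM likely needs manual handling before HA
        fi
    done
}
```

Anything it prints should be migrated away by hand (or have its disks moved to shared storage) before the node goes into the HA group.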
== Pre-flight checks ==
* Check that all running VMs on the node you want to reboot are in HA (if not, add them or migrate them away manually)
* Check that Ceph is healthy: no remapped PGs or degraded data redundancy
* You have communicated that downtime is expected to the users who will be affected (ideally one day in advance)
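The HA and Ceph checks above can be run from any cluster node. `ha-manager status` and `ceph health` are the standard Proxmox and Ceph commands; wrapping them in a function that fails unless Ceph reports `HEALTH_OK` is this sketch's own convention, not an official tool.

```shell
#!/bin/sh
# Hedged sketch of the pre-flight checks. Returns non-zero unless the
# cluster reports HEALTH_OK, so it can gate the rest of a reboot script.
preflight() {
    ha-manager status      # every running VM on the node should appear here
    ceph health detail     # look for remapped PGs / degraded redundancy
    # Fail fast unless the overall health state is HEALTH_OK
    [ "$(ceph health | awk '{print $1}')" = "HEALTH_OK" ]
}
```

A reboot script could then start with `preflight || exit 1` so nothing proceeds against an unhealthy cluster.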
== Reboot process == | == Reboot process == | ||
* Start maintenance mode for the Proxmox node and any containers running on the node
* Start maintenance mode for Ceph; specify that we only want to suppress the trigger for the health state being in warning by setting tag `ceph_health` equals `warning`
* Let affected parties know that the maintenance period you told them about in the pre-flight checks is about to take place.
[[File:Ceph-maintenance.png|thumb]]
* Set noout flag on host: `ceph osd set-group noout <node>`
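Putting the noout step in context, the sequence around the actual reboot can be sketched as below. `ceph osd set-group` / `unset-group` are the real Ceph commands from the step above; the `systemctl reboot` call and the wait-for-`HEALTH_OK` loop are this sketch's assumptions about how you bring the node back and confirm recovery.

```shell
#!/bin/sh
# Hedged sketch of the reboot sequence around the noout flag.
reboot_node() {
    node="$1"
    # Stop CRUSH from marking this host's OSDs out and rebalancing data
    ceph osd set-group noout "$node"
    systemctl reboot              # run on the node being rebooted
    # Once the node is back up: clear the flag and wait for Ceph to settle
    ceph osd unset-group noout "$node"
    until [ "$(ceph health | awk '{print $1}')" = "HEALTH_OK" ]; do
        sleep 10
    done
}
```

Only end the Zabbix maintenance modes after the loop exits, i.e. once Ceph has returned to `HEALTH_OK`.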