WS Proxmox node reboot: Difference between revisions

WS Proxmox node reboot (view source)

Revision as of 06:16, 6 October 2025

463 bytes added , 6 October 2025

no edit summary

Dortund

125

edits

@@ Line 3: / Line 3: @@
 * Updating a node: `apt update` and `apt full-upgrade`
 * Make sure all VMs are actually migratable before adding to a HA group
-* If there are containers on the device you are looking to reboot- you are going to need to also create a maintenance mode to cover them (for example teamspeak or stats)
+* If there are containers on the device you are looking to reboot, you will also need to create a maintenance mode to cover them (for example teamspeak or awstats)
 * Containers will inherit the OS of their host, so you will also need to handle triggers related to their OS updating, where appropriate
 == Pre-Work ==
 * If a VM or container is going to incur downtime, you must let the affected parties know in advance. Ideally they should be informed the previous day.
-== Pre flight checks ==
+== Pre-flight checks ==
 * Check all Ceph pools are running on at least 3/2 replication
 * Check that all running VMs on the node you want to reboot are in HA (if not, add them or migrate them away manually)
+** '''The `compute.*` VMs are not to be migrated! Rebooting a node with such a VM present requires shutting down the VM!'''
 * Check that Ceph is healthy: no remapped PGs and no degraded data redundancy
 * You have communicated that downtime is expected to the users who will be affected (Ideally one day in advance)
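The Ceph checks above could be scripted roughly as follows. This is only a sketch: the function names `pools_below_3_2` and `ceph_is_clean` are made up for illustration, "3/2" is assumed to mean `size` 3 and `min_size` 2, and the parsing assumes the usual plain-text output of `ceph osd pool ls detail`, `ceph health` and `ceph pg stat`, which may differ between Ceph releases.

```shell
# Hypothetical helper: print every pool whose replication is below 3/2
# (assumed to mean size 3, min_size 2). Returns non-zero if any is found.
pools_below_3_2() {
    ceph osd pool ls detail | awk '
        $1 == "pool" {
            size = 0; minsize = 0
            for (i = 1; i < NF; i++) {
                if ($i == "size") size = $(i + 1)
                if ($i == "min_size") minsize = $(i + 1)
            }
            if (size < 3 || minsize < 2) { print $3; bad = 1 }
        }
        END { exit bad }'
}

# Hypothetical helper: succeed only if Ceph reports HEALTH_OK and the
# PG summary mentions no remapped or degraded PGs.
ceph_is_clean() {
    ceph_health_out=$(ceph health) || return 1
    case "$ceph_health_out" in
        HEALTH_OK*) ;;
        *) echo "Ceph not healthy: $ceph_health_out" >&2; return 1 ;;
    esac
    ceph_pg_out=$(ceph pg stat) || return 1
    case "$ceph_pg_out" in
        *remapped*|*degraded*) echo "PGs not clean: $ceph_pg_out" >&2; return 1 ;;
    esac
    return 0
}
```

Running `pools_below_3_2 && ceph_is_clean` on a node with Ceph admin access would then cover the two Ceph items; the HA and communication checks still have to be done by hand.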
+== Update Process ==
+* Update the node: `apt update` and `apt full-upgrade`
+* Check that the packages being removed, updated and installed are correct and sane (that the changes make sense)
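One way to review the package changes before applying them is to run a simulated upgrade and summarise it. The sketch below assumes the `Inst`/`Remv` line format printed by `apt-get --simulate`; the helper name `summarize_upgrade` is made up for illustration and is not part of the documented process.

```shell
# Hypothetical helper: read `apt-get --simulate` output on stdin,
# list every package that would be removed, and print totals.
summarize_upgrade() {
    awk '
        $1 == "Inst" { inst++ }
        $1 == "Remv" { remv++; print "REMOVE: " $2 }
        END { printf "%d to install/upgrade, %d to remove\n", inst, remv }'
}

# Usage (on the node, after `apt update`):
#   apt-get --simulate dist-upgrade | summarize_upgrade
```

Unexpected removals are usually the thing to look for; if the summary looks sane, continue with `apt full-upgrade`.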
 == Reboot process ==
+* Complete the pre-flight checks
+* If you are rebooting for a kernel update, make sure the kernel has been updated by following the Update Process above
 * Start maintenance mode for the Proxmox node and any containers running on the node
 * Start maintenance mode for Ceph, and specify that we only want to suppress the trigger for the health state being in warning by setting the tag `ceph_health` equal to `warning`
@@ Line 20: / Line 27: @@
 [[File:Ceph-maintenance.png|thumb]]
 * Set noout flag on host: `ceph osd set-group noout <node>`
+# Gain SSH access to the host
+# Log in through IPA
+# Run the command
 * '''Reboot''' node through web GUI
 * Wait for node to come back up
 * Wait for OSDs to be back online
 * Remove noout flag on host: `ceph osd unset-group noout <node>`, to do this:
-# gain ssh access to host
-# Log in through IPA
-# Run the command
 * If a kernel update was done, manually execute the `Operating system` item to detect the update. Manually executing the two items that indicate a reboot is also useful if they were firing, to stop them and to check that no further reboots are needed.
 * Acknowledge & close triggers
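The "wait for OSDs" step could be checked from the shell before the noout flag is removed. This is a rough sketch: `all_osds_up` is a made-up helper name, and it assumes that the one-line summary of `ceph osd stat` starts with `<total> osds: <up> up`, which should be verified on the installed Ceph version. It checks the whole cluster, not just the rebooted node.

```shell
# Hypothetical helper: succeed when the number of "up" OSDs reported by
# `ceph osd stat` equals the total number of OSDs.
all_osds_up() {
    ceph osd stat | awk '
        NR == 1 { ok = ($1 == $3) }
        END { exit !ok }'
}

# Usage around the reboot (replace <node> with the host name):
#   ceph osd set-group noout <node>
#   ... reboot through the web GUI and wait for the node ...
#   until all_osds_up; do sleep 10; done
#   ceph osd unset-group noout <node>
```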
