Rebooting VM

From Delft Solutions

Pre-flight checks

These instructions are not for rebooting Proxmox nodes or borders in our cluster! In the following instructions, 'host' refers to a VM, a CT, or a physical server.

Ensure the host is not actively being used

Someone might be working on the host. Rebooting it will interrupt their work, so make sure no one is using it.

You can do this by checking in with everyone, or by announcing a maintenance period in the 'Organisational' stream and making sure everyone has read the announcement.

Ensure rebooting the host does not interrupt an important process

Even if no one is actively working on a host, the host itself might be busy. Examples are a client VM running a database migration, or a backup process that is still running.

You can verify this by checking in with everyone again, announcing a maintenance period ahead of time, or checking the host itself (or Zabbix for S3 backups).
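As a rough sketch of what "checking the host itself" could look like, the helper below filters a process list for command names you consider important. The function name `busy_processes` and the example process names (`pg_dump`, `rsync`) are assumptions for illustration, not an agreed list; adjust them to whatever the host actually runs.

```shell
# Hypothetical helper: prints each watched command name that appears in a
# process list read from stdin (one command name per line, as produced by
# `ps -e -o comm=`). The watched names are passed as arguments.
busy_processes() {
    # Join the arguments into a newline-separated pattern list for grep.
    patterns=$(printf '%s\n' "$@")
    # -F: fixed strings, -x: match the whole line; sort -u removes duplicates.
    grep -Fx -e "$patterns" | sort -u
}

# Example usage on the host (process names are assumptions):
#   ps -e -o comm= | busy_processes pg_dump rsync
```

If the command prints nothing, none of the watched processes are running. This does not replace checking in with colleagues or looking at Zabbix, since it only covers processes you thought to list.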

Ensure you have SSH access to the host

For personal dev VMs this should not be an issue, but for work VMs you will need to request 'sla-temporary-access' to the host from the First Responder. It takes some time for the host to become aware of the change, so make sure to do this more than an hour ahead of time.

Rebooting

This is the process for rebooting the host itself:

  1. Perform the pre-flight checks listed above
  2. Create a maintenance period for the host itself
  3. SSH into the host
  4. Run `apt update`
  5. Verify that the packages to update are sane. List them with `apt list --upgradable`. See the section 'Package sanity check' for further details.
  6. Update packages with `apt full-upgrade`. Before accepting, check that no unexpected packages are being installed, updated, or removed.
  7. After the updates are completed, reboot the host with `reboot`
  8. Wait for the host to finish rebooting. To check, you can look at Zabbix, ping the host, or try to SSH into it again.
  9. On Zabbix, go to 'Configuration' -> 'Hosts', and click on 'items' for the rebooted host.
  10. Search for 'reboot', select the checkbox for both items and click on 'Execute now'.
  11. Search for 'operating', select the checkbox for the item that has a trigger that uses it, and click on 'Execute now'.
  12. Go to 'Monitoring' -> 'Problems'. Check the checkboxes for the triggers that fired for this host. Expected are '<host> has been restarted' and 'Operating system description has changed'. Scroll to the bottom and click on 'Mass update'. In the modal, check the checkboxes for 'close problem' and 'acknowledge', and in the 'message' box write the reason the host was rebooted, for example 'host was rebooted due to kernel update'.
  13. Remove the maintenance period from the host
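The waiting in step 8 can be scripted if you prefer not to retry by hand. The sketch below is one possible approach, not part of the official procedure: `wait_until` is a hypothetical helper that retries any probe command, and `example-host` stands in for the real hostname.

```shell
# Hypothetical helper: run a probe command repeatedly until it succeeds.
# Usage: wait_until <max_attempts> <delay_seconds> <command> [args...]
# Returns 0 as soon as the probe succeeds, 1 if all attempts fail.
wait_until() {
    attempts=$1
    delay=$2
    shift 2
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # Discard probe output; only its exit status matters here.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Example (hostname is an assumption): poll SSH every 10 seconds, 30 times.
#   wait_until 30 10 ssh -o BatchMode=yes -o ConnectTimeout=5 example-host true
```

Note that a host can still answer for a few seconds after `reboot` is issued, so a success immediately after step 7 may only mean the host has not gone down yet. The Zabbix checks in steps 9 to 12 remain the confirmation that the reboot actually happened.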

Package sanity check

Packages matching 'delftsolutions-*' are expected on almost all hosts. For GitLab hosts and GitLab runners, expected packages can also include the gitlab-ee package or gitlab-runner packages.
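To make the sanity check less error-prone on hosts with many pending updates, the output of `apt list --upgradable` can be filtered so that only the unexpected packages remain. This is a minimal sketch: `unexpected_packages` is a hypothetical helper, and it assumes package lines have the form `name/suite version arch [...]`. Since apt warns that its CLI output is not a stable interface, treat the result as an aid for the manual review, not a replacement for it.

```shell
# Hypothetical helper: reads `apt list --upgradable` output on stdin and
# prints the names of packages that match none of the expected patterns.
# 'delftsolutions-*' is always expected; extra glob patterns can be passed
# as arguments (for example on GitLab hosts).
unexpected_packages() {
    while IFS= read -r line; do
        case "$line" in
            */*) pkg=${line%%/*} ;;  # package line: keep the part before "/"
            *) continue ;;           # skip the "Listing..." header and blanks
        esac
        expected=no
        for pattern in 'delftsolutions-*' "$@"; do
            # Unquoted $pattern is matched as a glob by `case`.
            case "$pkg" in
                $pattern) expected=yes; break ;;
            esac
        done
        [ "$expected" = yes ] || printf '%s\n' "$pkg"
    done
}

# Example usage:
#   apt list --upgradable 2>/dev/null | unexpected_packages
# On a GitLab host or runner:
#   apt list --upgradable 2>/dev/null | unexpected_packages 'gitlab-ee' 'gitlab-runner*'
```

Any package name printed here deserves a closer look before you accept the `apt full-upgrade` prompt.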