= Checklist =
This checklist is a shorter, imperative version of [[Incident Handling#Full_procedure|the full procedure below]]. You are encouraged to read the [[Incident Handling#Full_procedure|full procedure]] at least once to better understand the reasoning behind each step.
=== Critical Incidents ===
Critical incidents must be resolved within 16 hours.
# Acknowledge the trigger in Zabbix.
# Check whether the incident is still ongoing.
# If it is ongoing and clients may be affected, notify the affected clients via Slack.
# Document all actions taken in the Zulip topic.
# Create a plan of action.
# Execute the plan and document the results in the Zabbix thread.
# If the incident is still unresolved, create a new plan and execute it.
# When resolved:
## Verify that the trigger is no longer firing.
## Mark the Zulip topic as resolved if there are no other open incidents for the host.
## Check for related triggers and resolve them.
Common issues that have occurred before and ''could'' occur again:
* SSH down: check for <code>MaxStartups</code> throttling and apply the custom SSH config.
* No backup: verify that the backup process is running and check the devteam email.
* HTTPS down on a Sunday: this can be caused by GitLab updates.
=== Non-Critical Incidents ===
Non-critical incidents must be acknowledged within 9 hours and resolved within 1 week.
# Acknowledge in the Zabbix thread.
# Check the metrics sheet for an existing milestone.
## If a milestone exists:
### Add the Lynx project ID to the Zulip topic.
### Add the 🔁 emoji if the ID has already been reported.
## If no milestone exists:
### Add it to the metrics sheet.
### Create a Lynx project (priority 99; set it to 20 after estimation).
### Create a Kimai activity.
### Document the IDs in the Zulip topic.
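The milestone branching above can be sketched in Python as a function that returns the ordered actions for a non-critical incident. This is a minimal illustration only: the <code>Milestone</code> class and the <code>non_critical_actions</code> function are hypothetical, nothing is read from the metrics sheet, Lynx or Kimai, and the action strings simply mirror the checklist.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Milestone:
    """Hypothetical model of an existing milestone in the metrics sheet."""
    lynx_project_id: int
    already_reported: bool = False


# Priority values from the checklist: 99 on creation, 20 after estimation.
INITIAL_PRIORITY = 99
ESTIMATED_PRIORITY = 20


def non_critical_actions(milestone: Optional[Milestone]) -> List[str]:
    """Return the ordered checklist actions for a non-critical incident."""
    actions = ["acknowledge in Zabbix thread", "check metrics sheet for milestone"]
    if milestone is not None:
        # Existing milestone: link the Lynx project and flag repeat reports.
        actions.append(f"add Lynx project {milestone.lynx_project_id} to Zulip topic")
        if milestone.already_reported:
            actions.append("add 🔁 emoji")
    else:
        # No milestone yet: register it everywhere before documenting the IDs.
        actions += [
            "add to metrics sheet",
            f"create Lynx project (priority {INITIAL_PRIORITY}, "
            f"then {ESTIMATED_PRIORITY} after estimation)",
            "create Kimai activity",
            "document IDs in Zulip topic",
        ]
    return actions
```

Returning a list rather than performing the actions keeps the sketch testable and makes the two branches easy to compare side by side.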
=== Informational Incidents ===
Informational incidents must be acknowledged within 72 hours.
# Acknowledge in Zabbix.
# Verify the issue.
# Take action if needed.
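The response times for the three incident types can be kept in a small lookup table, for example to compute when an incident must be acknowledged or resolved. This Python sketch assumes the clock starts when the incident is detected, which this page does not state; the names <code>DEADLINES</code> and <code>deadlines</code> are hypothetical. Critical incidents get no separate acknowledgement deadline and informational incidents no resolution deadline, because the checklist sets none.

```python
from datetime import datetime, timedelta
from typing import Dict, Optional

# Limits from the checklist; None means the checklist sets no deadline.
# Assumption: all limits are measured from the moment of detection.
DEADLINES: Dict[str, Dict[str, Optional[timedelta]]] = {
    "critical": {"acknowledge": None, "resolve": timedelta(hours=16)},
    "non-critical": {"acknowledge": timedelta(hours=9), "resolve": timedelta(weeks=1)},
    "informational": {"acknowledge": timedelta(hours=72), "resolve": None},
}


def deadlines(classification: str, detected_at: datetime) -> Dict[str, Optional[datetime]]:
    """Return absolute acknowledge/resolve deadlines for an incident."""
    limits = DEADLINES[classification]
    return {
        kind: (detected_at + delta if delta is not None else None)
        for kind, delta in limits.items()
    }
```

For example, a non-critical incident detected at 09:00 on 1 March 2024 must be acknowledged by 18:00 that day and resolved by 09:00 on 8 March 2024.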
=== External Reports ===
# Acknowledge receipt of the report.
# Classify the report as critical, non-critical or informational.
# Create a Zulip topic in SRE # Critical, SRE ## Non-critical or SRE ### Informational, depending on the classification, and include sufficient detail for others to act on it.
# Continue with the checklist above for that incident type.
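The classification step determines which Zulip stream receives the new topic. A hypothetical Python sketch of that mapping follows; the stream names are copied from the step above, while the <code>Severity</code> enum and the <code>stream_for</code> function are illustrative only. Unknown labels raise <code>ValueError</code>, so a report cannot be routed without a valid classification.

```python
from enum import Enum


class Severity(Enum):
    """The three classifications used by this checklist."""
    CRITICAL = "critical"
    NON_CRITICAL = "non-critical"
    INFORMATIONAL = "informational"


# Stream names as written in the checklist; check them against the
# Zulip organisation before relying on them.
ZULIP_STREAMS = {
    Severity.CRITICAL: "SRE # Critical",
    Severity.NON_CRITICAL: "SRE ## Non-critical",
    Severity.INFORMATIONAL: "SRE ### Informational",
}


def stream_for(classification: str) -> str:
    """Map a classification label to the Zulip stream for the new topic."""
    # Severity(value) raises ValueError for labels outside the enum.
    return ZULIP_STREAMS[Severity(classification.strip().lower())]
```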
= Full procedure =
== Zulip migration ==
Due to the migration to Zulip, the integration that was available on Mattermost is not yet available on Zulip. This leads to the following process changes: