Incident Handling: Difference between revisions

Incident Handling (view source)

Revision as of 01:49, 8 October 2025

113 bytes added , 8 October 2025

→‎Critical Incidents: textual improvements

Jakobbuis

118

edits

@@ Line 6: / Line 6: @@
 # Acknowledge trigger in Zabbix.
-# Check if incident is still ongoing.
+# Check if the incident is still ongoing.
-# If ongoing and clients are potentially affected, notify the affected clients via Slack (when done, communicate this internally).
+# Determine whether the incident is ongoing
-# Document all actions taken in Zulip topic.
+# Determine whether clients are potentially affected; if so:
-# Create plan of action.
+## Notify the affected clients (Slack preferred).
+## Share the message sent to the client in the incident Zulip thread.
+# Document all actions taken in the Zulip topic.
+# Create a plan of action.
 # Execute plan and document results in the Zulip thread.
-# If unresolved, create new plan.
+# If unresolved, create a new plan.
 # When resolved:
 ## Verify trigger is no longer firing.
-## Decide on when to notify affected clients (that you have notified of the incident) the incident has been resolved, and communicate this internally
+## Decide when to tell the clients you notified of the incident that it has been resolved, and communicate this decision internally.
-## Mark Zulip topic as resolved if no other incidents for host.
+## Mark Zulip topic as resolved if no other incidents for the host.
 ## Check for related triggers and resolve them.
 Common issues that have occurred previously, and ''could'' occur again:
 * SSH down: Check MaxStartups throttling, apply custom SSH config
-* No backup: Verify backup process is running, check devteam email
+* No backup: Verify backup process is running, check the devteam email
-* HTTPS down on Sunday: this can be due to Gitlab updates
+* HTTPS down on Sunday: this can be due to GitLab updates
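
For the "SSH down" item above, it can help to see what MaxStartups throttling means in practice. The following is a minimal sketch in Python, based on the behaviour described in the OpenSSH sshd_config manual: with a value of the form start:rate:full, sshd starts refusing unauthenticated connections with probability rate% once start of them are pending, and the probability rises linearly to 100% at full. The function names are hypothetical, and the values used are only illustrative; the custom SSH config mentioned on this page is not shown here.

```python
def parse_max_startups(value: str) -> tuple[int, int, int]:
    """Parse a MaxStartups value such as "10:30:100" into (start, rate, full).

    A single number (e.g. "10") is a hard limit: nothing is dropped below it
    and everything is dropped at or above it.
    """
    parts = [int(p) for p in value.strip().split(":")]
    if len(parts) == 1:
        return parts[0], 100, parts[0]
    if len(parts) != 3:
        raise ValueError(f"expected 'start:rate:full' or a single number, got {value!r}")
    start, rate, full = parts
    return start, rate, full


def drop_probability(unauthenticated: int, start: int, rate: int, full: int) -> float:
    """Probability (0.0 to 1.0) that a new connection attempt is refused,
    given the number of currently unauthenticated connections."""
    if unauthenticated < start:
        return 0.0
    if unauthenticated >= full:
        return 1.0
    # Linear ramp from rate% at `start` up to 100% at `full`.
    percent = rate + (100 - rate) * (unauthenticated - start) / (full - start)
    return percent / 100


if __name__ == "__main__":
    start, rate, full = parse_max_startups("10:30:100")  # illustrative value
    for pending in (5, 10, 55, 100):
        p = drop_probability(pending, start, rate, full)
        print(f"{pending:3d} pending unauthenticated connections: {p:.0%} refused")
```

If the SSH trigger fires while many unauthenticated connections are pending (for example during a brute-force scan), a non-zero drop probability here would be consistent with legitimate logins being refused intermittently.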
 === Non-Critical Incidents ===
