Incident Handling: Difference between revisions

no edit summary
(remove common issues in imperative checklist, better listed below)
No edit summary
Line 14: Line 14:
# Check if the incident is still ongoing.
# Determine whether the incident is ongoing
# If this report came in via SRE - Report:
## keep that thread open until the incident is resolved
## post a link to the SRE - Report thread in any related underlying technical threads in SRE # Critical, SRE ## Non-critical, or SRE ### Informational
# Determine whether clients are potentially affected. If so:
## notify the affected clients (Slack preferred)
## notify the affected clients (Slack preferred if available)
## share the message sent to the client in the incident Zulip thread
# Document all actions taken in the Zulip topic.
Line 26: Line 29:
## Mark Zulip topic as resolved if no other incidents for the host.
## Check for related triggers and resolve them.
## If there were any SRE - Report threads:
### post a summary that describes the incident at a high level, states that it is resolved, and explains how it was resolved.
### post the same summary to any client channels, such as Slack.
### close the thread in SRE - Report


=== Non-Critical Incidents ===