No edit summary |
(→Critical Incidents: textual improvements) |
||
| Line 6: | Line 6: | ||
# Acknowledge trigger in Zabbix. | # Acknowledge trigger in Zabbix. | ||
# Check if incident is still ongoing. | # Check if the incident is still ongoing. | ||
# | # Determine whether the incident is ongoing | ||
# Document all actions taken in Zulip topic. | # Determine whether clients are potentially affected, if so: | ||
# Create plan of action. | ## notify the affected clients (Slack preferred) | ||
## share the message sent to the client in the incident Zulip thread | |||
# Document all actions taken in the Zulip topic. | |||
# Create a plan of action. | |||
# Execute plan and document results in Zabbix thread. | # Execute plan and document results in Zabbix thread. | ||
# If unresolved, create new plan. | # If unresolved, create a new plan. | ||
# When resolved: | # When resolved: | ||
## Verify trigger is no longer firing. | ## Verify trigger is no longer firing. | ||
## Decide on when to notify affected clients (that you have notified of the incident) the incident has been resolved, and communicate this internally | ## Decide when to tell affected clients (those already notified of the incident) that the incident has been resolved, and communicate this decision internally | ||
## Mark Zulip topic as resolved if no other incidents for host. | ## Mark Zulip topic as resolved if no other incidents for the host. | ||
## Check for related triggers and resolve them. | ## Check for related triggers and resolve them. | ||
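The "Acknowledge trigger in Zabbix" step could also be done through the Zabbix JSON-RPC API instead of the web UI. The following is only a sketch under assumptions: it builds the request body for the <code>event.acknowledge</code> method (Zabbix acknowledges the problem event behind a trigger, not the trigger itself) and does not send it. The event ID and message are hypothetical, and authentication is left out because it differs between Zabbix versions.

```python
import json

# Bit flags for the "action" parameter of event.acknowledge.
ACKNOWLEDGE = 2   # mark the event as acknowledged
ADD_MESSAGE = 4   # attach a message to the event


def build_ack_request(event_ids, message, request_id=1):
    """Build (but do not send) a JSON-RPC body that acknowledges events.

    event_ids and message are supplied by the caller; nothing here is
    specific to this team's Zabbix setup.
    """
    return {
        "jsonrpc": "2.0",
        "method": "event.acknowledge",
        "params": {
            "eventids": [str(e) for e in event_ids],
            "action": ACKNOWLEDGE | ADD_MESSAGE,
            "message": message,
        },
        "id": request_id,
    }


if __name__ == "__main__":
    # Hypothetical event ID and message, for illustration only.
    body = build_ack_request([12345], "Investigating, see Zulip topic")
    print(json.dumps(body, indent=2))
```

The body would be POSTed to the server's <code>api_jsonrpc.php</code> endpoint with whatever credentials the installation requires.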
Common issues that have occurred previously, and ''could'' occur again: | Common issues that have occurred previously, and ''could'' occur again: | ||
* SSH down: Check MaxStartups throttling, apply custom SSH config | * SSH down: Check MaxStartups throttling, apply custom SSH config | ||
* No backup: Verify backup process is running, check devteam email | * No backup: Verify backup process is running, check the devteam email | ||
* HTTPS down on Sunday: this can be due to | * HTTPS down on Sunday: this can be due to GitLab updates | ||
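For the "SSH down" item, it can help to see what MaxStartups throttling means in numbers. The sketch below follows the rule described in the <code>sshd_config</code> manual for a <code>start:rate:full</code> value: below <code>start</code> unauthenticated connections nothing is dropped, at <code>start</code> new connections are refused with probability <code>rate</code> percent, and the probability rises linearly to 100 percent at <code>full</code>. The function name and the example values are illustrative and are not taken from the custom SSH config mentioned above.

```python
def drop_percent(unauthenticated, max_startups="10:30:100"):
    """Return the chance (in percent) that sshd refuses a new connection.

    unauthenticated: current number of unauthenticated connections.
    max_startups:    a "start:rate:full" string as used in sshd_config;
                     the default here mirrors the OpenSSH default.
    """
    start, rate, full = (int(part) for part in max_startups.split(":"))
    if unauthenticated < start:
        return 0.0    # below the threshold, nothing is dropped
    if unauthenticated >= full:
        return 100.0  # at or above "full", every attempt is refused
    # Linear ramp from "rate" percent at "start" to 100 percent at "full".
    return rate + (100 - rate) * (unauthenticated - start) / (full - start)


if __name__ == "__main__":
    for n in (5, 10, 35, 60):
        print(n, drop_percent(n, "10:30:60"))
```

If a host shows many simultaneous unauthenticated connections (for example from scanners), a high value here would explain why legitimate logins fail intermittently.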
=== Non-Critical Incidents === | === Non-Critical Incidents === | ||