Incident Handling: Difference between revisions

(→‎Critical Incidents: textual improvements)
# When an incident is in progress and person A is handling it, all incidents in area X are handled by person A rather than the FR, unless person A's working day ends. Person A should clearly tell the FR when their day is over.
# The FR always has the last word on which solution to apply to resolve an incident.
== Zulip migration ==
Because of the migration to Zulip, the integration that was available on Mattermost is not yet available on Zulip. This leads to the following process changes:
* Acknowledgements and trigger resolutions are not posted to Zulip by Zabbix
* Triggers are grouped in a topic on Zulip per host
* When an incident has been fully resolved, mark the topic as resolved, but only once all other incidents reported for that host are resolved as well
* There's no `?ongoing`; for now we can track open incidents by checking for unresolved topics
* The posting of incidents is less smart (only posting when not posted yet). To prevent an incident from going unreported due to network issues or the like, a message is posted again after an interval (8 hours for non-critical and lower, 1 hour for critical and above) for as long as the incident has not been acknowledged.
* Incidents can be manually tracked by creating a topic by hand and reporting the problem.
* There is no automatic GitLab issue creation or syncing anymore.
Finally, where this process says to do something on Mattermost, you should now do so on Zulip. The updates in the process chapters themselves are WIP.
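
The re-posting interval and the "unresolved topics" check described above can be sketched as follows. This is only an illustration, not the actual Zabbix or Zulip integration: the function and constant names are hypothetical, the incident is assumed to carry a simple critical/non-critical flag, and topic names are assumed to be available as plain strings. It relies on Zulip marking a resolved topic by prefixing its name with a check mark and a space.

```python
from datetime import datetime, timedelta

# Intervals taken from the list above; the constant names are hypothetical.
REMINDER_INTERVAL_CRITICAL = timedelta(hours=1)  # critical and above
REMINDER_INTERVAL_OTHER = timedelta(hours=8)     # non-critical and lower

# Zulip prefixes the name of a resolved topic with a check mark and a space.
RESOLVED_PREFIX = "✔ "


def reminder_due(last_posted, now, critical, acknowledged):
    """Return True when an unacknowledged incident should be posted again."""
    if acknowledged:
        return False
    interval = REMINDER_INTERVAL_CRITICAL if critical else REMINDER_INTERVAL_OTHER
    return now - last_posted >= interval


def open_incident_topics(topic_names):
    """Return the topics (one per host) that are not marked as resolved."""
    return [name for name in topic_names if not name.startswith(RESOLVED_PREFIX)]
```

For example, `open_incident_topics(["web01", "✔ db02"])` keeps only `web01`, which is the manual replacement for `?ongoing` until the integration is restored.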


== Critical incidents ==