Incident Handling: Difference between revisions

no edit summary
Line 101: Line 101:
=== Acknowledging ===
Fully acknowledging a non-critical incident requires the following tasks to have been completed:
* Acknowledging the incident on Zabbix
* Acknowledging the incident on Zabbix, which means you take responsibility for completing the steps listed below.
* Add the non-critical incident as a milestone in the metrics sheet
 
 
The next steps don't have to be completed immediately, as they have dependencies, but they should be started and scheduled for completion on the next work day.
 
Check whether there is already an uncompleted milestone for this host with this issue in the metrics sheet.
If a milestone is already present:
* Report the Lynx project ID for resolving this issue in the incident's topic.
 
If a milestone is NOT already present:
* Add the non-critical incident as a milestone in the metrics sheet, following the naming convention
** Start date is the date of the incident
** DoD states what needs to be true for the non-critical incident to be considered resolved
* Add the non-critical incident to Lynx as a project
** Follow the naming convention below for the title & project ID
** Tasks need to be added
** The final task needs to have the SLO deadline set as 'constraint'
** Project priority is set to 20 (as a default)
** Project priority is set to 99 while the project has not been estimated yet. After the estimation is done, the priority should be set to 20.
** The tasks are estimated for SP
* The Lynx project ID is reported in the non-critical incident's topic on Zulip
* The Lynx project ID is reported in the non-critical incident's topic on Zulip, and logged in the metrics sheet
* A Kimai activity is created in Kimai for the non-critical incident
* A Kimai activity is created for the non-critical incident, following the naming convention
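
The branching described above (an existing milestone versus a new one) and the priority rule can be sketched in a few lines of Python. This is only an illustration of the written procedure: the function names `remaining_steps` and `project_priority` are hypothetical, and nothing here talks to Zabbix, Lynx, Kimai, Zulip, or the metrics sheet.

```python
# Hypothetical helper names; the values 99 and 20 come from the text above.
UNESTIMATED_PRIORITY = 99
DEFAULT_PRIORITY = 20


def project_priority(estimated: bool) -> int:
    """Priority of the Lynx project before and after estimation."""
    return DEFAULT_PRIORITY if estimated else UNESTIMATED_PRIORITY


def remaining_steps(milestone_exists: bool) -> list[str]:
    """Steps still to be done after acknowledging the incident on Zabbix."""
    if milestone_exists:
        # An uncompleted milestone already covers this host and issue.
        return ["Report the existing Lynx project ID in the incident's topic"]
    return [
        "Add a milestone to the metrics sheet (start date, DoD)",
        "Add a Lynx project (tasks, SLO deadline on the final task)",
        "Estimate the tasks and move the project priority from 99 to 20",
        "Report the Lynx project ID on Zulip and log it in the metrics sheet",
        "Create a Kimai activity following the naming convention",
    ]


if __name__ == "__main__":
    for step in remaining_steps(milestone_exists=False):
        print("-", step)
```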


Checklist (outdated)
==== Naming convention ====
# Acknowledge on Zabbix and state who is responsible for resolving this in the description
* Kimai activity name needs to follow the pattern: '<YYYY-MM> <problem_title>'. For <problem_title>, incorporate the trigger title and hostname for clarity.
# Communicate plan/next steps (even if that is gathering information)
* Milestone name needs to follow the pattern: 'Delft Solutions Hosting Incident response work <kimai_activity_name>'
# Communicate findings/results of executed plan, go back to previous step if not resolved
* Lynx project name needs to follow the pattern: 'Delft Solutions Hosting Incident response work <kimai_activity_name>'
# If there is no resolution to the incident, evaluate if the trigger needs updating/disabling
* Lynx project ID needs to follow the pattern: 'SRE<YYMM><XXX>', where <XXX> is a three-letter shorthand that relates to the problem/host
# Resolve incident
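
As an illustration of the naming convention, the patterns can be written as small Python helpers. This is a minimal sketch under stated assumptions: the function names are hypothetical, the sample problem title and shorthand are made up, and upper-casing the three-letter shorthand is an assumption that the convention itself does not state.

```python
from datetime import date

# Fixed prefix shared by the milestone name and the Lynx project name.
PREFIX = "Delft Solutions Hosting Incident response work"


def kimai_activity_name(incident_date: date, problem_title: str) -> str:
    """Build '<YYYY-MM> <problem_title>'."""
    return f"{incident_date:%Y-%m} {problem_title}"


def milestone_name(activity_name: str) -> str:
    """Build the milestone name from the Kimai activity name."""
    return f"{PREFIX} {activity_name}"


def lynx_project_name(activity_name: str) -> str:
    """Build the Lynx project name; same pattern as the milestone name."""
    return f"{PREFIX} {activity_name}"


def lynx_project_id(incident_date: date, shorthand: str) -> str:
    """Build 'SRE<YYMM><XXX>' from the incident date and a three-letter shorthand."""
    if len(shorthand) != 3 or not shorthand.isalpha():
        raise ValueError("shorthand must be exactly three letters")
    # Assumption: the shorthand is upper-cased; the convention does not say so.
    return f"SRE{incident_date:%y%m}{shorthand.upper()}"


if __name__ == "__main__":
    when = date(2024, 3, 14)  # made-up incident date
    activity = kimai_activity_name(when, "Disk space low on web01")
    print(activity)
    print(milestone_name(activity))
    print(lynx_project_id(when, "web"))
```

Deriving every name from the Kimai activity name keeps the milestone and the Lynx project title identical, which is what the patterns above require.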


== Informational incidents ==