This write-up should be seen as one practical example that may help guide similar interventions in the future or serve as a starting point when assessing next steps in a hardware-related incident.
== Confirm it's a hardware issue ==
In this case we received two alerts:
1.
* iDRAC on banshee.idrac.ws.maxmaton.nl reporting critical failure
* Overall System Status is Critical (5)
2.
* Overall System Status is Critical (5)
* Problem with memory in slot DIMM.Socket.A1
To confirm the issue, we logged into the affected server (banshee) and ran the following commands:
<pre lang="bash">
# Search the journal of the current boot for memory-related messages
journalctl -b | grep -i memory
# Search the kernel messages for errors
journalctl -k | grep -i error
</pre>
We saw multiple entries reporting Hardware Error.
This was also confirmed by checking hardware health on the iDRAC interface:
[[File:Banshee.idrac.ws.maxmaton.nl restgui index.html 8ce2fb21ce62c14bc4975f040b973a5f(1).png|thumb|alt=Banshee's hardware health on iDRAC|Banshee's hardware health on iDRAC]]
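The log check described above can be reduced to a single number, which is handy when comparing hosts or re-checking after an intervention. The following is a minimal sketch and not part of the original procedure: the function names <code>count_hw_errors</code> and <code>hw_errors_this_boot</code> are hypothetical, and it assumes the kernel reports the fault with the string "Hardware Error", as it did on banshee.

```bash
#!/usr/bin/env bash

# Hypothetical helper (an assumption, not from the original write-up):
# counts the lines on stdin that mention "hardware error", ignoring case.
count_hw_errors() {
    # grep -c prints the number of matching lines. It exits with status 1
    # when that number is 0, so "|| true" keeps callers using "set -e" alive.
    grep -ci 'hardware error' || true
}

# Hypothetical wrapper: feeds the kernel messages of the current boot
# into the counter. --no-pager prevents journalctl from opening a pager.
hw_errors_this_boot() {
    journalctl -k -b --no-pager | count_hw_errors
}
```

On the affected host, running <code>hw_errors_this_boot</code> after sourcing the snippet prints the number of matching kernel log lines. A non-zero count supports the hardware diagnosis, but it should still be cross-checked against the iDRAC health view, since a count of zero does not rule out a fault.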