Bug in HP Agents 8.70 causes HP DL580 G7 running ESX to reboot on PSU failure [SOLVED]
We discovered a bug in HP Agents 8.70 installed on ESX 4.1 running on HP ProLiant DL580 G7 servers.
These machines come equipped with 4 PSUs. When 2 PSUs lose power at the same time, the HP Agents trigger a reboot. This crashes all VMs on that host (HA will restart them on another host if configured properly). The reboot makes no sense, as the system is perfectly capable of running on 2 PSUs.
The messages logfile shows the following events:
— hpasmlited: CRITICAL: System Power Supply: General Failure (Power Supply 1)
— hpasmlited: WARNING: System Power Supplies Not Redundant
— hpasmlited: CRITICAL: System Power Supply: General Failure (Power Supply 3)
— hpasmlited: A System Reboot has been requested by the management processor in 60 seconds.
If the power is lost on 2 PSUs with a pause in between (we tested with 30 seconds), the reboot is NOT triggered.
I don’t know if this bug is specific to our environment and/or to ESX, but I strongly suggest testing for it if you have DL580 G7 hosts running ESX(i). Pull the power from 2 PSUs at the same time and monitor the messages logfile for any reboot messages.
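If you would rather not watch the log by eye during the test, a small script can do the checking. The following is only a minimal sketch: the patterns are taken from the hpasmlited messages quoted above, while the function names and the default path /var/log/messages are my own assumptions and may need adjusting for your host.

```python
import re

# Patterns based on the hpasmlited messages quoted in this post.
PSU_FAILURE = re.compile(
    r"hpasmlited: CRITICAL: System Power Supply: General Failure "
    r"\(Power Supply (\d+)\)"
)
REBOOT_REQUEST = re.compile(r"hpasmlited: A System Reboot has been requested")


def scan_messages(lines):
    """Return (failed_psus, reboot_requested) for an iterable of log lines."""
    failed_psus = []
    reboot_requested = False
    for line in lines:
        match = PSU_FAILURE.search(line)
        if match:
            # Remember which PSU number reported the failure.
            failed_psus.append(int(match.group(1)))
        elif REBOOT_REQUEST.search(line):
            reboot_requested = True
    return failed_psus, reboot_requested


def scan_file(path="/var/log/messages"):
    """Scan a log file; the default path is an assumption, not verified."""
    with open(path, errors="replace") as handle:
        return scan_messages(handle)
```

Calling scan_file() after pulling the two power cords returns the PSU numbers that reported a failure and whether a reboot request was logged. If the second value is True, the host is affected.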
As a workaround, you can stop the hp-health services (which leaves you blind for the time being, since you lose hardware monitoring) or pray to god that 2 PSUs won’t fail at the same time 🙂
HP confirmed this as a bug and said it would be fixed in the next release.
[UPDATE] The bug is fixed in HP Agents for ESX 9.0.1a. The ESX 4.1 version can be downloaded here: http://h20000.www2.hp.com/bizsupport/TechSupport/SoftwareDescription.jsp?lang=en&cc=us&prodTypeId=15351&prodSeriesId=4194641&prodNameId=4194642&swEnvOID=4091&swLang=8&mode=2&taskId=135&swItem=MTX-75fa30cc409745a5be3ef0e37e