I was puzzled when I saw that the ESXi hosts of my new lab were generating critical CPU fan errors. Strange, I was sure I had mounted the fan correctly, otherwise the system would have shut down automatically to prevent serious damage. 🙂
I had a look at the IPMI web interface and saw that a fan was detected, but why was it called “FRNT_FAN1”?
The corresponding vCenter Server view showed the same information:
All disconnected fans and temperature sensors are listed as alerts. At first, I thought this was what triggered the alarm, so I analyzed the thresholds using ipmitool on a Linux host:
# ipmitool -H myesxi.domain.loc -U admin sdr type Fan
Password:
CPU_FAN1         | A0h | lnr |  0.0 | 0 RPM
FRNT_FAN1        | A2h | lnr |  0.0 | 0 RPM
FRNT_FAN2        | A3h | lnr |  0.0 | 0 RPM
FRNT_FAN3        | A4h | ok  |  0.0 | 800 RPM
FRNT_FAN4        | A5h | lnr |  0.0 | 0 RPM

# ipmitool -H myesxi.domain.loc -U admin sensor get "FRNT_FAN1"
Password:
Locating sensor record...
Sensor ID              : FRNT_FAN1 (0xa2)
Entity ID              : 0.0 (Unspecified)
Sensor Type (Threshold): Fan (0x04)
Sensor Reading         : 0 (+/- 0) RPM
Status                 : Lower Non-Recoverable
Nominal Reading        : 4480.000
Normal Minimum         : 1040.000
Normal Maximum         : 17920.000
Upper non-recoverable  : 20000.000
Upper critical         : 18960.000
Upper non-critical     : 18000.000
Lower non-recoverable  : 0.000
Lower critical         : 0.000
Lower non-critical     : 0.000
Positive Hysteresis    : 80.000
Negative Hysteresis    : 80.000
Minimum sensor range   : Unspecified
Maximum sensor range   : Unspecified
Event Message Control  : Per-threshold
Readable Thresholds    : lnr lcr lnc unc ucr unr
Settable Thresholds    : lnr lcr lnc unc ucr unr
Threshold Read Mask    : lnr lcr lnc unc ucr unr
Assertion Events       : lnc- lcr-
Assertions Enabled     : lnc- lcr-
Deassertions Enabled   : lnc- lcr-
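With more than a handful of sensors, it can help to filter the listing down to the ones in a given state. The following is a minimal sketch, not part of my original troubleshooting: `fans_in_state` is a hypothetical helper name, and it assumes the pipe-separated `sdr type Fan` layout shown above, with the sensor name in the first column and the status in the third.

```shell
# Hypothetical helper: print the names of all sensors whose status column
# matches the given state (e.g. lnr or ok).
# Reads "ipmitool ... sdr type Fan" output on stdin.
fans_in_state() {
    awk -F'|' -v state="$1" '
        NF >= 5 {                       # skip lines such as "Password:"
            name = $1; status = $3
            gsub(/^[ \t]+|[ \t]+$/, "", name)     # trim surrounding blanks
            gsub(/^[ \t]+|[ \t]+$/, "", status)
            if (status == state) print name
        }'
}

# Usage (same host and user as above; ipmitool prompts for the password):
#   ipmitool -H myesxi.domain.loc -U admin sdr type Fan | fans_in_state lnr
```

Fed with the listing above, this would print CPU_FAN1, FRNT_FAN1, FRNT_FAN2 and FRNT_FAN4, one per line.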
Disconnected fans raise an alarm with lnr (lower non-recoverable) severity. My idea was to set negative thresholds for the fan sensors, so that a reading of 0 RPM would no longer trigger the alert. ipmitool can read and set thresholds, but it does not accept negative values:
# ipmitool -U admin -H myesxi.domain.loc sensor thres "FRNT_FAN1" -- "-1" "-1" "-1"
Password:
Valid threshold '-1' for sensor 'FRNT_FAN1' not specified!
...
It is also not possible to disable unneeded sensors in my setup (ASUS P9D-M and ASMB7-IKVM).
Let’s focus on the other abnormality, the wrong fan naming. I had a look at the mainboard and verified that the CPU fan was plugged into the “CPU_FAN1” port. After some trial and error I figured out that the “FRNT_FAN1” port is the one IPMI detects as “CPU_FAN1”:
This also stopped the host hardware alert in ESXi / vCenter.
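A quick way to confirm this from the command line is to check the status of a single sensor after re-plugging the fan. This is only a sketch under the same assumptions about the `sdr type Fan` output format; `sensor_is_ok` is a hypothetical helper name, not an ipmitool feature.

```shell
# Hypothetical helper: return success (exit code 0) only if the named sensor
# appears in the listing and its status column reads "ok".
# Reads "ipmitool ... sdr type Fan" output on stdin.
sensor_is_ok() {
    awk -F'|' -v want="$1" '
        NF >= 5 {
            name = $1; status = $3
            gsub(/^[ \t]+|[ \t]+$/, "", name)     # trim surrounding blanks
            gsub(/^[ \t]+|[ \t]+$/, "", status)
            if (name == want) { found = 1; ok = (status == "ok") }
        }
        END { exit !(found && ok) }'    # non-zero if missing or not ok
}

# Usage (same host and user as before; ipmitool prompts for the password):
#   ipmitool -H myesxi.domain.loc -U admin sdr type Fan | sensor_is_ok CPU_FAN1 \
#       && echo "CPU_FAN1 is ok"
```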
In the end, this was sufficient to fix the issue. The disconnected fans and sensors are still listed as errors, but no alarm is generated. It seems that the fan port names on the mainboard are faulty, but this might also be the result of a firmware update of the ASMB7-IKVM card. I think the issue occurred right after upgrading the firmware, but I’m not 100% sure.