ASUS P9D-M, ASMB7-IKVM and false vCenter alarms

ESXi host with fan alert

I was puzzled when I saw that the ESXi hosts in my new lab were generating critical CPU fan errors. Strange, I was sure I had mounted the fan correctly; otherwise the system would have shut down automatically to prevent serious damage. 🙂

I had a look at the IPMI web interface and saw that a fan was detected, but why was it called “FRNT_FAN1”?

IPMI fan and temperature sensors

The corresponding vCenter Server view showed the same information:

vCenter fan and temperature sensors

All disconnected fans and temperature sensors are listed as alerts. At first, I thought that this was what triggered the alarm, so I examined the thresholds using ipmitool from a Linux host:

# ipmitool -H myesxi.domain.loc -U admin sdr type Fan
Password: 
CPU_FAN1         | A0h | lnr |  0.0 | 0 RPM
FRNT_FAN1        | A2h | lnr |  0.0 | 0 RPM
FRNT_FAN2        | A3h | lnr |  0.0 | 0 RPM
FRNT_FAN3        | A4h | ok  |  0.0 | 800 RPM
FRNT_FAN4        | A5h | lnr |  0.0 | 0 RPM
# ipmitool -H myesxi.domain.loc -U admin sensor get "FRNT_FAN1"
Password: 
Locating sensor record...
Sensor ID              : FRNT_FAN1 (0xa2)
 Entity ID             : 0.0 (Unspecified)
 Sensor Type (Threshold)  : Fan (0x04)
 Sensor Reading        : 0 (+/- 0) RPM
 Status                : Lower Non-Recoverable
 Nominal Reading       : 4480.000
 Normal Minimum        : 1040.000
 Normal Maximum        : 17920.000
 Upper non-recoverable : 20000.000
 Upper critical        : 18960.000
 Upper non-critical    : 18000.000
 Lower non-recoverable : 0.000
 Lower critical        : 0.000
 Lower non-critical    : 0.000
 Positive Hysteresis   : 80.000
 Negative Hysteresis   : 80.000
 Minimum sensor range  : Unspecified
 Maximum sensor range  : Unspecified
 Event Message Control : Per-threshold
 Readable Thresholds   : lnr lcr lnc unc ucr unr 
 Settable Thresholds   : lnr lcr lnc unc ucr unr 
 Threshold Read Mask   : lnr lcr lnc unc ucr unr 
 Assertion Events      : lnc- lcr- 
 Assertions Enabled    : lnc- lcr- 
 Deassertions Enabled  : lnc- lcr-
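
With several fan headers unpopulated, it can help to filter the sdr listing down to the sensors that are not ok. A minimal sketch in POSIX shell and awk, assuming the five-column `sdr type Fan` layout shown above; `list_failed_fans` is a hypothetical helper name, not an ipmitool feature:

```shell
# List fan sensors whose status column is not "ok".
# Expects `ipmitool ... sdr type Fan` output on stdin, e.g.
#   FRNT_FAN1        | A2h | lnr |  0.0 | 0 RPM
# list_failed_fans is a hypothetical helper name, not part of ipmitool.
list_failed_fans() {
    awk -F'|' '
        NF >= 5 {                                  # skip lines such as "Password:"
            name = $1; status = $3
            gsub(/^[ \t]+|[ \t]+$/, "", name)      # trim column padding
            gsub(/^[ \t]+|[ \t]+$/, "", status)
            if (status != "ok") print name ": " status
        }'
}
```

Piping the listing above through it, for example `ipmitool -H myesxi.domain.loc -U admin sdr type Fan | list_failed_fans`, should print the four disconnected headers and skip FRNT_FAN3.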

Disconnected fans generate an alarm with lnr (lower non-recoverable) severity. Since ipmitool can read and set thresholds, I had the idea of setting negative thresholds for the fan sensors to stop the alert. Unfortunately, negative values were not accepted:

# ipmitool -U admin -H myesxi.domain.loc sensor thres "FRNT_FAN1" -- "-1" "-1" "-1"
Password:
Valid threshold '-1' for sensor 'FRNT_FAN1' not specified!
...
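
Why a negative value would have been needed follows from the `sensor get` output earlier: as far as I understand the IPMI specification, a lower threshold counts as crossed when the reading is at or below it, so 0 RPM against a lower non-recoverable threshold of 0.000 already asserts. A minimal sketch of that comparison in POSIX shell and awk, assuming the `sensor get` field labels shown above; `lnr_state` is a hypothetical helper name, not an ipmitool feature:

```shell
# Report whether the lower non-recoverable threshold is crossed.
# Expects `ipmitool ... sensor get <name>` output on stdin.
# lnr_state is a hypothetical helper name, not part of ipmitool.
lnr_state() {
    awk -F':' '
        /^ *Sensor Reading/ {
            split($2, f, " ")                      # "0 (+/- 0) RPM" -> f[1] = "0"
            if (f[1] ~ /^-?[0-9.]+$/) { reading = f[1] + 0; have_r = 1 }
        }
        /^ *Lower non-recoverable/ { lnr = $2 + 0; have_t = 1 }
        END {
            if (!have_r || !have_t) { print "unknown"; exit }
            # Assumption: lower thresholds are crossed at or below their value.
            state = (reading <= lnr) ? "asserted" : "clear"
            print state
        }'
}
```

For example, `ipmitool -H myesxi.domain.loc -U admin sensor get "FRNT_FAN1" | lnr_state` would report `asserted` for the values listed above, while a fan spinning at 800 RPM with the same threshold would report `clear`.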

On top of that, it is not possible to disable unneeded sensors in my setup (ASUS P9D-M and ASMB7-IKVM).

Let’s focus on the other abnormality: the wrong fan naming. I had a look at the mainboard and verified that the CPU fan was plugged into the “CPU_FAN1” port. After a few attempts I figured out that the “FRNT_FAN1” port is detected as “CPU_FAN1” by IPMI, so I moved the CPU fan to that port:

IPMI fan and temperature sensors

This also stopped the host hardware alert in ESXi / vCenter.

In the end, this was sufficient to fix the issue. The disconnected fans and sensors are still listed as errors, but no alarm is generated. It seems that the fan port labels on the mainboard are wrong, though this might also be the result of a firmware update of the ASMB7-IKVM card. I think the issue first occurred right after upgrading the firmware, but I’m not 100% sure.

2 comments

    • Hi Serg,

      on my ASUS P9D-M mainboard, the labels of the CPU and front case fan ports were mixed up. So I simply attached the CPU fan to the front case fan port to fix the issue.

      Hope this helps!

      Best wishes,
      Christian.
