Kill a non-responding VM under ESXi

Occasionally a VM stops responding. Usually the simple solution is to restart the affected VM.

In rare cases the restart itself does not go through. An indicator of this is that the operation, which normally takes only a couple of seconds, runs for several minutes while the progress bar freezes.

I had this problem recently and was puzzled by the following error message when opening the VM console:

Unable to connect to the MKS: There is no VMware process running for config file ...

One workaround, described in a VMware Knowledge Base article, is to disable the hardware monitoring service, but that didn't help in my case.

I had to kill the VM process before I could start the VM again.

This requires SSH or console access to the ESXi host to be enabled. In the vSphere Client you can find these settings under "Configuration" in the security profile settings.

When SSH is enabled, the following notice is displayed in the overview of the ESXi host:

Configuration Issues
Remote Tech Support Mode (SSH) for the host xxx has been enabled

It is recommended to disable SSH again once the non-responding VM has been stopped.

Once an SSH connection to the ESXi host is established, you can easily find the parent process ID (PPID) of the VM (the second column of the vmx process lines) using ps and grep:

~ # ps|grep -i DEADVM
27321599      vmm0:DEADVM
25806081      vmm1:DEADVM
21190063 27333880 mks:DEADVM           /bin/vmx
21181872 27333880 vcpu-0:DEADVM        /bin/vmx
21161393 27333880 vcpu-1:DEADVM        /bin/vmx
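
If you prefer not to read the ID off the listing by eye, the lookup can be sketched as a small shell function. This is only a sketch: the function name vm_parent_id is made up for illustration, and it assumes that grep, awk and sort are available in the host's shell and that the ps output has the same column layout as shown above.

```shell
# Hypothetical helper (not part of ESXi): reads ps-style output on stdin and
# prints the unique second-column ID of the /bin/vmx lines that mention the
# given VM name (matched case-insensitively).
vm_parent_id() {
    grep -i -- "$1" |
        awk '$NF == "/bin/vmx" { print $2 }' |  # keep only the vmx lines
        sort -u                                   # collapse the duplicates
}
```

It would be called as `ps | vm_parent_id DEADVM`. If it prints more than one ID, the VM name probably matches several VMs and the listing should be checked manually.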

In this case the PPID is 27333880, and the first command below kills this process. If the process is still listed afterwards (re-run the previous command after a couple of seconds!), try again with kill signal 9:

# kill 27333880
# kill -9 27333880
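
The "try the normal kill first, then fall back to signal 9" sequence can also be written as a small function. Again this is only a sketch under assumptions: stop_process and its wait time are hypothetical, and it assumes the shell's kill accepts signal 0 as an existence check, which I have not verified on every ESXi release.

```shell
# Hypothetical helper: send the default signal (TERM), give the process some
# time to exit, and only use signal 9 (KILL) if it is still there.
stop_process() {
    pid="$1"
    wait_seconds="${2:-10}"            # assumed default grace period

    kill "$pid" 2>/dev/null            # polite request first
    i=0
    while [ "$i" -lt "$wait_seconds" ]; do
        # kill -0 sends no signal; it only checks that the process exists
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        i=$((i + 1))
    done
    kill -9 "$pid" 2>/dev/null         # last resort
}
```

With the ID from the example above the call would be `stop_process 27333880`, or `stop_process 27333880 30` to wait longer before escalating.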

The VM is now stopped. The ESXi host may mark the VM as "orphaned" for the next 30 seconds. Keep calm and wait; after a couple of seconds the VM should be able to boot again.
