VMware ESX and ESXi 3.5 U3 I/O failure on SAN LUN(s) and LUN queue is blocked indefinitely
KB Article 1008130
Updated Jan. 12, 2009

Products
VMware ESX

Product Versions
VMware ESX 3.5.x
VMware ESXi 3.5.x Embedded
VMware ESXi 3.5.x Installable

Symptoms

One or more of the following may be present:

* The VMware ESX or ESXi host might become disconnected from VirtualCenter.

* All paths to the LUNs are in standby state.

* esxcfg-rescan might take a long time to complete, or might never complete (hangs).

* Error messages matching this pattern are repeated continually in the vmkernel log:
vmkernel: cpu6:1177)SCSI: 675: Queue for device vml. has been blocked for 7 seconds.
vmkernel: cpu7:1184)SCSI: 675: Queue for device vml. has been blocked for 6399 seconds.
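
As a rough aid for spotting this symptom, the following Python sketch scans vmkernel log lines for the blocked-queue message and reports the longest blocked duration seen per device. The regular expression is modelled only on the two sample lines above, and the helper name blocked_queues is hypothetical, not a VMware tool; message wording on other builds may differ.

```python
import re

# Modelled on the vmkernel lines quoted in this article, e.g.
# "SCSI: 675: Queue for device vml. has been blocked for 7 seconds."
# Wording on other ESX/ESXi builds may differ (assumption).
BLOCKED_RE = re.compile(
    r"Queue for device (?P<dev>\S+) has been blocked for (?P<secs>\d+) seconds"
)


def blocked_queues(lines):
    """Return {device id: longest blocked duration in seconds} found in lines.

    `lines` is any iterable of strings, such as an open log file.
    blocked_queues is a hypothetical helper, not part of ESX.
    """
    worst = {}
    for line in lines:
        match = BLOCKED_RE.search(line)
        if match:
            dev = match.group("dev")
            secs = int(match.group("secs"))
            # Keep only the largest duration reported for each device.
            worst[dev] = max(worst.get(dev, 0), secs)
    return worst
```

A device whose reported duration keeps growing (7 seconds, then 6399 seconds in the sample above) matches the "blocked indefinitely" symptom. For example, with open("/var/log/vmkernel") as log: print(blocked_queues(log)); that log path is an assumption for ESX 3.5 and may differ on ESXi.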

Log entries preceding the first blocked message show storage events and a failover attempt.
Example:
vmkernel: 31:19:32:26.199 cpu3:3824)Fil3: 5004: READ error 0xbad00e5
vmkernel: 31:19:32:29.224 cpu1:3961)StorageMonitor: 196: vmhba0:0:0:0 status = 0/5 0x0 0x0 0x0
vmkernel: 31:19:32:29.382 cpu2:1144)FS3: 5034: Waiting for timed-out heartbeat [HB state abcdef02 offset 3736576 gen 26 stamp 2748610023852 uuid 4939b0cf-c85aa695-158d-00144f021dd4 jrnl drv 4.31]
vmkernel: 31:19:32:29.638 cpu3:1053)<6>qla2xxx_eh_device_reset(1): device reset failed
vmkernel: 31:19:32:29.638 cpu3:1053)WARNING: SCSI: 4279: Reset during HBA failover on vmhba1:2:1 returns Failure
vmkernel: 31:19:32:29.638 cpu3:1053)WARNING: SCSI: 3746: Could not switchover to vmhba1:2:1. Check Unit Ready Command returned an error instead of NOT READY for standby controller .
vmkernel: 31:19:32:29.638 cpu3:1053)WARNING: SCSI: 4622: Manual switchover to vmhba1:2:1 completed unsuccessfully.
vmkernel: 31:19:32:29.638 cpu3:1053)StorageMonitor: 196: vmhba0:2:1:0 status = 0/1 0x0 0x0 0x0
vmkernel: 31:19:32:29.640 cpu2:1067)scsi(1): Waiting for LIP to complete...
vmkernel: 31:19:32:29.640 cpu2:1067)<6>qla2x00_fw_ready ha_dev_f=0xc
vmkernel: 31:19:32:30.532 cpu2:1026)StorageMonitor: 196: vmhba0:0:0:0 status = 0/2 0x0 0x0 0x0
last message repeated 31 times
vmkernel: 31:19:32:31.535 cpu2:1067)<6>dpc1 port login OK: logged in ID 0x81
vmkernel: 31:19:32:31.541 cpu2:1067)<6>dpc1 port login OK: logged in ID 0x82
vmkernel: 31:19:32:31.547 cpu2:1067)<6>dpc1 port login OK: logged in ID 0x83
vmkernel: 31:19:32:31.568 cpu2:1067)<6>dpc1 port login OK: logged in ID 0x84
vmkernel: 31:19:32:31.573 cpu2:1067)<6>dpc1 port login OK: logged in ID 0x85
vmkernel: 31:19:32:31.576 cpu2:1067)<6>dpc1 port login OK: logged in ID 0x86
vmkernel: 31:19:32:32.531 cpu2:4267)StorageMonitor: 196: vmhba0:0:0:0 status = 0/2 0x0 0x0 0x0
last message repeated 31 times
vmkernel: 31:19:32:32.532 cpu1:3973)StorageMonitor: 196: vmhba0:0:0:0 status = 2/0 0x6 0x29 0x0
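
To pull out the storage events and failover attempt that precede the first blocked message, a minimal Python sketch such as the one below can filter the lines leading up to it. The keyword list is taken only from the example lines above, and the helper name failover_context and its 50-line window are hypothetical choices, not VMware defaults.

```python
# Substrings taken from the failover-related lines in the example above.
# Matching is case-sensitive; other drivers may log differently (assumption).
FAILOVER_HINTS = ("failover", "switchover", "device reset failed", "StorageMonitor")
BLOCKED_MARKER = "has been blocked for"


def failover_context(lines, window=50):
    """Return failover-related lines among the `window` lines that precede
    the first blocked-queue message, or [] if no such message exists.

    failover_context and its window default are hypothetical.
    """
    lines = list(lines)
    for idx, line in enumerate(lines):
        if BLOCKED_MARKER in line:
            start = max(0, idx - window)
            # Keep only earlier lines that mention a storage or failover event.
            return [prev for prev in lines[start:idx]
                    if any(hint in prev for hint in FAILOVER_HINTS)]
    return []
```

Only lines before the first blocked message are considered, since the article points to that window as the place where the failed failover appears.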

Resolution

This issue can occur on VMware ESX and ESXi hosts under the following conditions:

* Hypervisor version: VMware ESX 3.5 U3.
* SAN hardware: Active/Passive and Active/Active arrays (Fibre Channel and iSCSI).
* Trigger: VMFS3 metadata updates are in progress at the same time as a failover to an alternate path occurs for the LUN on which the VMFS3 volume resides.

A reboot is required to clear this condition.

VMware is working on a patch to address this issue. This knowledge base article will be updated after the patch is available.

© 2010 http://www.vmwarenews.de Creative Commons License
http://www.vmwarenews.de is licensed under a Creative Commons Attribution-ShareAlike 3.0 Germany License