What we need:
Determine why we get persistent iScsiPrt errors in a Hyper-V cluster on Windows Server 2012
4x identical Dell R720 hosts running Windows Server 2012 Standard
4x - iSCSI NICs: Intel(R) Gigabit 4P I350-t Adapter - Driver 126.96.36.199 - FW Family 13.0.0
MTU set to 9000
TCP Chimney disabled, auto-tuning disabled, congestion provider set to None
All offload features disabled in the iSCSI NIC properties
No teaming configuration is in place
RSS/TOE/TOEv2, Virtual Machine Queue (VMQ) and energy saving options are disabled
2x PCT6224 - FW 188.8.131.52
2 stack cables interconnecting them
STP port-fast enabled on all ethernet ports
MTU set to 9216 on all ethernet ports
Flow control is active in all ports
Speed auto-negotiated at 1000 Mbps full duplex
Multicast, broadcast and unicast storm control disabled
No errors logged on any ethernet or stack ports
3x EqualLogic (EQL) storage arrays
2x PS6100X + 1x PS4100E, all running firmware 6.0.4
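A quick sanity check on the jumbo-frame settings listed above. The host MTU of 9000 describes the IP payload, while the switch value of 9216 is assumed here to be a maximum frame size; that reading of the switch setting is an assumption, not something confirmed in this thread. The sketch adds the standard Ethernet overheads to the host MTU and compares the result with the switch limit.

```python
# Standard Ethernet overheads in bytes.
ETH_HEADER = 14  # destination MAC + source MAC + EtherType
VLAN_TAG = 4     # 802.1Q tag, only present if the iSCSI VLAN is tagged
FCS = 4          # frame check sequence


def max_frame_size(ip_mtu, vlan_tagged=True):
    """Largest on-wire frame produced by a host with the given IP MTU."""
    return ip_mtu + ETH_HEADER + (VLAN_TAG if vlan_tagged else 0) + FCS


def fits(host_mtu, switch_limit, vlan_tagged=True):
    """True if the host's largest frame is within the switch limit."""
    return max_frame_size(host_mtu, vlan_tagged) <= switch_limit


HOST_MTU = 9000      # value from the host configuration above
SWITCH_LIMIT = 9216  # value from the switch configuration above (assumed frame size)

print(max_frame_size(HOST_MTU))      # 9022
print(fits(HOST_MTU, SWITCH_LIMIT))  # True
```

Under that assumption the two values are compatible, so a size mismatch between hosts and switches would not by itself explain the timeouts.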
The customer is experiencing timeouts and disconnections:
* All four hosts are logging the following System events every minute:
ID 9 / Source iScsiPrt / Target did not respond in time for a SCSI request. The CDB is given in the dump data.
ID 39 / Source iScsiPrt / Initiator sent a task management command to reset the target. The target name is given in the dump data.
ID 129 / Source iScsiPrt / No Description
These events show up only in the Windows Server 2012 environment. We isolated the issue, and it does not occur with Windows Server 2008 on the same SAN.
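To confirm the one-minute cadence of events 9, 39 and 129, the System log entries can be bucketed per minute. The sketch below assumes the events have already been exported as (timestamp, event ID) pairs; the export step and the sample records are hypothetical and only illustrate the grouping.

```python
from collections import Counter
from datetime import datetime


def events_per_minute(records, event_ids=(9, 39, 129)):
    """Count matching events per (minute, event ID) bucket."""
    counts = Counter()
    for timestamp, event_id in records:
        if event_id in event_ids:
            minute = timestamp.replace(second=0, microsecond=0)
            counts[(minute, event_id)] += 1
    return counts


# Made-up sample records, for illustration only.
sample = [
    (datetime(2013, 7, 4, 10, 0, 5), 9),
    (datetime(2013, 7, 4, 10, 0, 6), 39),
    (datetime(2013, 7, 4, 10, 0, 6), 129),
    (datetime(2013, 7, 4, 10, 1, 4), 9),
    (datetime(2013, 7, 4, 10, 1, 30), 7036),  # unrelated event, ignored
]

for (minute, event_id), n in sorted(events_per_minute(sample).items()):
    print(minute.strftime("%H:%M"), event_id, n)
```

Comparing the buckets across the four hosts would show whether the errors hit all nodes in the same minute or drift independently.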
Troubleshooting steps so far:
- Disabled Windows Firewall and made sure the non-SAN subnets are excluded in the HIT Kit
- Make sure the latest drivers from Dell D&D are installed for the iSCSI NICs
- Jumbo frame size set up correctly.
- Disabled TOE, RSS and Large Send Offload; flow control configured
- Ran the following netsh commands:
    netsh interface tcp set global autotuninglevel=disabled
    netsh int tcp set global chimney=disabled
    netsh interface tcp set global rss=disabled
- Followed some practices from here: http://en.community.dell.com/support-forums/storage/f/3775/p/19480319/20326067.aspx
- RSS/TOE/TOEv2, Virtual Machine Queue (VMQ) and energy saving options are disabled
- All nodes are updated with the latest rollup update.
- Teaming not configured.
- Changed binding order on NICs
- Disabled NetBIOS on iSCSI
- Made sure that each NIC is NOT set to register its connection in DNS
- Removed File and Printer Sharing and Client for Microsoft Networks
- Following updates were applied:
http://support.microsoft.com/kb/2791465/en-US (KB2779768)
http://support.microsoft.com/?id=2813630 (superseded by KB2838669)
- Disabled TCP delayed ACK on the servers
- Switches: STP port-fast enabled on all ethernet ports + MTU set to 9216 on all ethernet ports + flow control active on all ports + unicast storm control disabled
- Captured iSCSI traffic with Wireshark
Can anyone help us?
- Edited by Walter Cesar Thursday, July 04, 2013 8:39 PM
Did you ever fix this? We have exactly the same issue, although our situation may be a little different.
Two sites (with different subnets) combined in one Hyper-V 2012 R2 Failover Cluster.
On each site 3 x R720 and 1 x PS6100X.
All 6 servers are connected to both EQL group IPs.
We don't get these 3 events when we create an individual Hyper-V Failover Cluster for each site.
- Edited by Bart van Kleef Wednesday, June 18, 2014 6:48 PM
I have three Hyper-V 2012 (not R2) nodes and I'm experiencing similar issues.
I get iScsiPrt errors shortly after a storage migration of a big VM ends: I launch the storage migration, it starts correctly and, roughly 60 seconds after it ends, iScsiPrt errors appear in the event log of the node. I lose iSCSI connectivity with the iSCSI target on the storage where the VM was before the migration. The migrated VM keeps working because it has already moved to the "new" storage, but all the CSVs on the "old" storage become inaccessible (they show as offline, or online with no access), so all the VMs on the old storage go offline.
I noticed that it doesn't happen every time I storage-migrate a VM, only sometimes and only with big VMs.
I tried both MPIO and MCS, since our storage supports both. I also tuned some iSCSI parameters on the initiator; the list is at the end of this message.
The only things I haven't tried yet are: disabling RSS, disabling TCP auto-tuning and disabling delayed ACK.
Can someone help me understand what is causing these problems? What is the AV you are referring to?
These are the parameters I tuned to increase the timeouts, although so far this has not solved the problem:
HKLM\SYSTEM\CurrentControlSet\Services\mpio\Parameters
UseCustomPathRecoveryInterval 0 -> 1
PDORemovePeriod 20 -> 120
PathRecoveryInterval 40 -> 40
EnableNOPOut 0 -> 1
MaxRequestHoldTime 60 -> 90
LinkDownTime 15 -> 35
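For anyone comparing settings, the list above can be written down as a small table and checked for values that were not actually changed. This is only a sketch: the "before" values are the ones quoted in this post, not verified defaults, and nothing here reads from or writes to the registry.

```python
# (before, after) pairs exactly as listed in this post.
PARAMS = {
    "UseCustomPathRecoveryInterval": (0, 1),
    "PDORemovePeriod": (20, 120),
    "PathRecoveryInterval": (40, 40),
    "EnableNOPOut": (0, 1),
    "MaxRequestHoldTime": (60, 90),
    "LinkDownTime": (15, 35),
}


def changed(params):
    """Parameters whose tuned value differs from the original value."""
    return {name: (old, new) for name, (old, new) in params.items() if old != new}


def unchanged(params):
    """Parameters that were listed but left at their original value."""
    return [name for name, (old, new) in params.items() if old == new]


for name, (old, new) in changed(PARAMS).items():
    print(f"{name}: {old} -> {new}")
print("left at original value:", unchanged(PARAMS))
```

The check shows that PathRecoveryInterval stayed at 40, so of the six entries only five were really tuned.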
Thanks in Advance,