iSCSIprt errors with persistent disconnections

    Question

  • What we need:

    ==========

    Determine why we have persistent iScsiPrt errors in a Windows Server 2012 Hyper-V cluster

      

    Environment:
    ==========

    4x identical Dell R720 hosts running Windows Server 2012 Standard

    4x - iSCSI NICs: Intel(R) Gigabit 4P I350-t Adapter - Driver 12.1.76.0 - FW Family 13.0.0

    MTU set to 9000

    Chimney disabled, auto-tuning disabled, congestion provider set to None

    All offload features disabled in the iSCSI NIC properties

    No teaming configuration is in place

    RSS/TOE/TOEv2, Virtual Machine Queue (VMQ) and energy saving options are disabled

    2x PCT6224 - FW 3.3.5.5

    2 stack cables interconnecting them

    STP port-fast enabled on all ethernet ports

    MTU set to 9216 on all Ethernet ports

    Flow control active on all ports

    Speed auto-negotiated at 1000 Mb/s full duplex

    Multicast, broadcast and unicast storm control disabled

    No errors logged on any Ethernet or stack ports

    3x EqualLogic (EQL) arrays

    2x PS6100X + 1x PS4100E, all running firmware 6.0.4
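    With MTU 9000 on the hosts and 9216 on the switch ports, it is worth confirming that full-size jumbo frames actually pass end to end. A minimal check from one of the hosts (the target IP below is a placeholder - substitute one of your EQL group IPs):

```shell
REM Ping with Don't Fragment set and an 8972-byte payload:
REM 9000-byte MTU minus 20 bytes IP header minus 8 bytes ICMP header.
REM 192.168.100.10 is a placeholder for the EQL group IP.
ping -f -l 8972 192.168.100.10

REM If this fails while a standard-size probe succeeds...
ping -f -l 1472 192.168.100.10
REM ...then some device in the path (NIC, switch port, or array)
REM is not passing 9000-byte frames, which can cause iSCSI timeouts.
```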

    Current situation:

    ===========

    Customer is experiencing timeouts and disconnections:

    * All four hosts are logging the following System events every minute:

    ID 9 / Source iScsiPrt / Target did not respond in time for a SCSI request. The CDB is given in the dump data.

    ID 39 / Source iScsiPrt / Initiator sent a task management command to reset the target. The target name is given in the dump data.

    ID 129 / Source iScsiPrt / No Description

    These events show up only in the Windows Server 2012 environment; we isolated the issue and it does not occur with Windows Server 2008 hosts on the same SAN.
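    To quantify the frequency and correlate the events across nodes, the three event IDs can be pulled from each host's System log. A PowerShell sketch (the host names are placeholders for your four cluster nodes; run from an elevated prompt):

```shell
# Pull iScsiPrt events 9, 39 and 129 from each node's System log
# so the timestamps can be lined up across hosts.
$nodes = 'HOST1','HOST2','HOST3','HOST4'   # placeholders
foreach ($n in $nodes) {
    Get-WinEvent -ComputerName $n -FilterHashtable @{
        LogName      = 'System'
        ProviderName = 'iScsiPrt'
        Id           = 9, 39, 129
    } -MaxEvents 50 |
        Select-Object MachineName, TimeCreated, Id, Message
}
```

    If all four hosts log the events at the same moments, the cause is more likely shared (switches or array) than host-local.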

    Troubleshooting steps so far:

    ===================

    • Disabled Windows Firewall and made sure the non-SAN subnets are excluded in the HIT Kit
    • Made sure the latest drivers from Dell Drivers & Downloads are installed for the iSCSI NICs
    • Jumbo frame size set up correctly.
    • Disabled TOE, RSS and Large Send Offload; set up flow control
    • Applied the global TCP settings:
      netsh interface tcp set global autotuninglevel=disabled
      netsh int tcp set global chimney=disabled
      netsh interface tcp set global rss=disabled
    • Followed some practices from here: http://en.community.dell.com/support-forums/storage/f/3775/p/19480319/20326067.aspx
    • RSS/TOE/TOEv2, Virtual Machine Queue (VMQ) and energy saving options are disabled
    • All nodes are updated with the latest rollup update.
    • Teaming not configured.
    • Changed binding order on NICs
    • Disabled NetBIOS on the iSCSI NICs
    • Made sure that each NIC is NOT set to register its connection in DNS
    • Removed File and Printer Sharing and Client for Microsoft Networks
    • Following updates were applied:

    http://support.microsoft.com/kb/2791465/en-US ( kb 2779768 )
    http://support.microsoft.com/kb/2795944/en-US
    http://support.microsoft.com/kb/2822241/en-US
    http://support.microsoft.com/kb/2808584/en-US
    http://support.microsoft.com/?id=2838669
    http://support.microsoft.com/?id=2813630 superseded by KB2838669
    http://support.microsoft.com/?id=2796000 
    http://support.microsoft.com/?id=2795997 
    http://support.microsoft.com/?id=2795993 
    http://support.microsoft.com/kb/2838669 

    • Disabled TCP Delayed ACK on the servers
    • Switches: STP port-fast enabled on all Ethernet ports + MTU set to 9216 on all Ethernet ports + flow control active on all ports + unicast storm control disabled
    • Captured iSCSI traffic with Wireshark
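    One step worth double-checking: disabling TCP Delayed ACK is a per-interface registry setting (TcpAckFrequency), and it is easy to miss one iSCSI interface or lose the value after a driver reinstall. A PowerShell sketch to verify it stuck on every TCP/IP interface of a host (interface GUIDs vary per machine):

```shell
# List TcpAckFrequency for each TCP/IP interface; the iSCSI-facing
# interfaces should show 1 (delayed ACK disabled). An absent value
# means the Windows default, i.e. delayed ACK still enabled.
$base = 'HKLM:\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Interfaces'
Get-ChildItem $base | ForEach-Object {
    $v = Get-ItemProperty -Path $_.PSPath
    [pscustomobject]@{
        Interface       = $_.PSChildName
        IPAddress       = ($v.IPAddress -join ',')
        TcpAckFrequency = $v.TcpAckFrequency
    }
}
```

    A reboot (or disable/enable of the NIC) is needed after changing the value for it to take effect.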

    Can anyone help us?

    VP


    • Edited by Vgn3r Pilr Thursday, July 04, 2013 8:39 PM
    Thursday, July 04, 2013 8:39 PM

Answers

All replies

  • Did you ever fix this? We have exactly the same issue, though our situation may be a little different.

    Two sites (with different subnets) combined in one Hyper-V 2012 R2 Failover Cluster.
    On each site 3 x R720 and 1 x PS6100X.
    All 6 servers are connected to both EQL group IPs.

    We don't get these 3 events when we create an individual Hyper-V Failover Cluster for each site.
    Wednesday, June 18, 2014 6:47 PM
  • Yes, we fixed it. We disabled the antivirus (AV) on both nodes. Try that.
    • Marked as answer by Vgn3r Pilr Wednesday, June 18, 2014 7:20 PM
    Wednesday, June 18, 2014 7:20 PM