Network/File copy performance on Dell R710's with Hyperthreading enabled

    Question

  • Anyone know if there's a fix to this, other than to upgrade to a R720? :)

    PROBLEM:  Slow file copy or network transfer speeds to/from a Dell R710 with dual teamed Broadcom NICs

    STEPS TO EXPOSE THE PROBLEM:

    1. Copy a large file from a non-R710 SERVERA to an R710 SERVERB and note the transfer speed.  On my 1 Gbps network I see speeds of 3 MBps
    2. Restart and disable logical processors (so, disable Hyperthreading) in the BIOS of SERVERB
    3. Copy another large file from the same SERVERA to the same SERVERB with HT now disabled and note the speed.  I now see speeds of 60MBps
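
    To put numbers on the steps above, here is a minimal Python sketch for timing a copy and comparing it with the raw link speed. It is not from the original post. The function names are made up for illustration, and it assumes "MBps" means megabytes per second and that a 1 Gbps link tops out at 125 MB/s before protocol overhead.

```python
import shutil
import time
from pathlib import Path


def throughput_mb_per_s(num_bytes: int, seconds: float) -> float:
    """Return throughput in megabytes per second (1 MB = 10**6 bytes)."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return num_bytes / seconds / 1_000_000


def link_utilization(mb_per_s: float, link_gbps: float = 1.0) -> float:
    """Fraction of the link's raw capacity in use (ignores protocol overhead)."""
    link_mb_per_s = link_gbps * 1000 / 8  # 1 Gbps = 125 MB/s
    return mb_per_s / link_mb_per_s


def timed_copy(src: str, dst: str) -> float:
    """Copy src to dst and return the observed throughput in MB/s."""
    size = Path(src).stat().st_size
    start = time.perf_counter()
    shutil.copyfile(src, dst)
    elapsed = time.perf_counter() - start
    return throughput_mb_per_s(size, elapsed)


# Hypothetical usage; the paths are placeholders, not from the post:
# rate = timed_copy(r"C:\temp\big.iso", r"\\SERVERB\share\big.iso")
# print(f"{rate:.1f} MB/s, {link_utilization(rate):.0%} of a 1 Gbps link")
```

    By this arithmetic, 3 MBps is roughly 2% of a gigabit link and 60 MBps is roughly 48%. Use a file large enough that caching does not dominate the measurement.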

    You could also see the problem by setting up transactional replication between SERVERA and SERVERB and reinitializing a subscription.  The initial snapshot going to SERVERB with HT off is going to be way faster than it will be going to SERVERB with HT on.  SQL replication is actually where I saw the problem.  Our replicator wasn't keeping up, yet the publisher, distributor and subscriber servers were all sitting there idle.  In the end, turning off Hyperthreading fixed (or at least worked around) the problem.

    You could also see the problem by making the R710 be an SCCM primary site.  The server sits there mostly idle, yet files back up in the millions

    I've seen all kinds of posts about disabling TCP Offloading or large segment offloading or what have you in the Broadcom BACS console, but those settings have zero effect on this problem.  Also, disabling SMB2 is not an option even though I've seen some say that will fix it too.  Upgrading to the latest Broadcom drivers will not fix it either.  The only thing I've found to fix it is disabling hyperthreading.

    Anyone seen this and is there a fix other than buying new hardware or disabling hyperthreading/SMB2?  That's not really a valid fix to cripple it like that.


    Number2 - (John Nelson)
    Microsoft MVP (2009) - System Center Configuration Manager
    http://number2blog.com

    Friday, March 23, 2012 20:38

Answers

  • Just to follow up on this...

    We believe we have this fixed now with Dell's help. 

    1. The servers all had 48GB installed, but only 32GB was showing.  Upgraded to Server 2008 R2 Enterprise to see the full 48GB
      (I don't believe this improved the WAN file copy speeds, just freed up the true amount of memory)
    2. Upgraded all drivers, firmware, BIOS, etc. to latest from Dell & Broadcom
      (This did not improve the WAN file copy speeds, in fact it might have made them worse, but nobody will help you troubleshoot until you perform this step)
    3. Dell tech informed us that the NIC settings should be set to this using NETSH
      I believe this was most of the real problem

    TCP Global Parameters
    ----------------------------------------------
    Receive-Side Scaling State : enabled
    Chimney Offload State : automatic
    NetDMA State : enabled
    Direct Cache Acess (DCA) : disabled
    Receive Window Auto-Tuning Level : normal
    Add-On Congestion Control Provider : ctcp
    ECN Capability : disabled
    RFC 1323 Timestamps : disabled

    so, commands to fix this using netsh would be like this:


    netsh int tcp set global rss=enabled
    netsh int tcp set global netdma=enabled
    netsh int tcp set global chimney=automatic
    netsh int tcp set global autotuninglevel=normal
    netsh int tcp set global congestionprovider=ctcp
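
    To check a server against the values above before changing anything, something like the following Python sketch could help. It is not from Dell or from the original post. It assumes an English-locale Windows box where `netsh int tcp show global` prints the same labels quoted above, and the helper names are hypothetical.

```python
import subprocess

# Target values copied from the settings quoted in this post, keyed by the
# label that `netsh int tcp show global` prints (English locale assumed).
# Each entry maps label -> (wanted value, netsh option name).
RECOMMENDED = {
    "Receive-Side Scaling State": ("enabled", "rss"),
    "Chimney Offload State": ("automatic", "chimney"),
    "NetDMA State": ("enabled", "netdma"),
    "Receive Window Auto-Tuning Level": ("normal", "autotuninglevel"),
    "Add-On Congestion Control Provider": ("ctcp", "congestionprovider"),
}


def parse_tcp_globals(text: str) -> dict:
    """Turn 'Label : value' lines from the netsh output into a dict."""
    settings = {}
    for line in text.splitlines():
        label, sep, value = line.partition(" : ")
        if sep:  # skip headings, separators and blank lines
            settings[label.strip()] = value.strip()
    return settings


def fix_commands(current: dict) -> list:
    """List the netsh commands needed to reach the recommended values."""
    commands = []
    for label, (wanted, option) in RECOMMENDED.items():
        if current.get(label, "").lower() != wanted:
            commands.append(f"netsh int tcp set global {option}={wanted}")
    return commands


def read_live_settings() -> dict:
    """Run netsh on the local machine (Windows only) and parse its output."""
    result = subprocess.run(
        ["netsh", "int", "tcp", "show", "global"],
        capture_output=True, text=True, check=True,
    )
    return parse_tcp_globals(result.stdout)


# Hypothetical usage on the server itself:
# for cmd in fix_commands(read_live_settings()):
#     print(cmd)
```

    The sketch only prints the commands rather than running them, so the changes can be reviewed first. Applying them needs an elevated prompt.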

    So in the end, it was really confusing how all of this could conspire to slow down network ONLY for WAN traffic, not LAN traffic, and only on the Dell 710...but it did.  Bill was helpful in that he pointed me to a Dell person who ultimately had the answer.


    Number2 - (John Nelson)
    Microsoft MVP (2009) - System Center Configuration Manager
    http://number2blog.com


    • Marked as answer by Number2 Monday, April 9, 2012 20:56
    • Edited by Number2 Monday, April 9, 2012 21:05
    Monday, April 9, 2012 20:55

All replies

  • Are you running the latest firmware from Dell for both the system and network controllers?
    • Proposed as answer by Bill - MCSE Saturday, March 24, 2012 14:10
    • Unproposed as answer by Number2 Saturday, March 24, 2012 18:45
    Saturday, March 24, 2012 14:10
  • Well, give me a little credit :)


    Number2 - (John Nelson)
    Microsoft MVP (2009) - System Center Configuration Manager
    http://number2blog.com

    Saturday, March 24, 2012 18:52
  • Can you test install a clean copy of Windows with hyper-v enabled? The HAL is different for single vs multi core processors. Is it Server 2003 R1? R2?

    Sunday, March 25, 2012 03:50
  •  

    Hi,

    Have you tried the suggestions Bill provided? If not, please try to update the BIOS, hard disk firmware, chipset and other hardware drivers to check the result.

    In addition, please also make sure that the system is up to date.

    For other troubleshooting information, please also refer to the following threads:

    Hyper-V File Copy Speeds Slow, from Host initiated copies 

    http://social.technet.microsoft.com/Forums/en-US/winserverhyperv/thread/ff9cfb50-649d-4e87-b87e-bddfb93e9883

    Windows 2008 network file transfers EXTREMELY slow 

    http://social.technet.microsoft.com/Forums/en-US/winservergen/thread/e807a6b5-5602-4600-ab4e-2e2057d2fc77

    Regards,

    Arthur Li

    TechNet Subscriber Support

    If you are a TechNet Subscription user and have any feedback on our support quality, please send your feedback here.


    Arthur Li

    TechNet Community Support

    Monday, March 26, 2012 07:26
    Moderator
  • I might be off the mark here, but I didn't see any references to Hyper-V, so I'm not sure what bearing that's going to have.

    In relation to the HALs, I don't believe there'd be an issue there as the PowerEdge T710 ships with 2 x E5606 Xeon CPUs, which are quad core units in their own right.

    John, I know this will be of cold comfort, but I can't help out with specifics as we don't run Dells here. However, whenever I've seen issues like this in the past, it has boiled down to network driver issues.

    Being a traditional HP shop person, I've encountered numerous issues over time with the teaming software as far back as the early Windows 2000 days. I've also come across my fair share of issues with Broadcom adapters (particularly when they started showing up as on-board adapters on the DL360 G2's around the 2003 timeframe).

    So, with that generally useless "I've seen weird things like that" stuff out of the way, all I can suggest in the absence of having that hardware is that it might pay to try sourcing a dual or quad port Intel server adapter and seeing if that provides any level of consistency, because although I have nothing to back it other than anecdotal experience, my instinct tells me this is going to be a thread scheduling issue with the Broadcom driver.

    Cheers,
    Lain

    Monday, March 26, 2012 08:28
  • It's maybe not the same trouble, but I found on some R710s that the teaming does not work at its best even with the NIC upgraded.

    I had to make a fail-over team. The service that was using the bandwidth was losing some packets from time to time; it was not consistent with the teaming.


    MCP | MCTS 70-236: Exchange Server 2007, Configuring

    Monday, March 26, 2012 14:58
    Moderator
  • Oh, sorry, I didn't put the OS details. I hate when people don't do that! :)

    This is Server 2008 R2 Service Pack 1, with all non zero-severity updates patched to it (sev > 0).

    I don't require Hyper-V on this box, are you saying just by having Hyper-V enabled, something magical will happen to fix this or were you just assuming the box was using Hyper-V?  My point was that with HyperTHREADING turned off, the problem gets better.

    This particular box I can't just muck with because it's production, but we have like 100+ servers that are 710's with Hyperthreading enabled so I could take a different one that's redundant or not being used and play with it.  But I don't like the idea of turning on Hyper-V on a server that doesn't need it.  I'd prefer just leaving it off and turning Hyperthreading off.  But if someone thinks there's troubleshooting value there, I'll consider whatever.


    Number2 - (John Nelson)
    Microsoft MVP (2009) - System Center Configuration Manager
    http://number2blog.com

    Monday, March 26, 2012 15:02
  • Yagmoth555, thanks for your response...originally this was set up as a failover instead of dual active.  I initially thought, "Oh, it should be teamed to 2Gbps with active-active instead of failover with active-passive" but changing that had no effect so I don't think it's the same issue as yours this time.


    Number2 - (John Nelson)
    Microsoft MVP (2009) - System Center Configuration Manager
    http://number2blog.com

    Monday, March 26, 2012 15:05
  • Hi John,

    As I mentioned above, I think the Hyper-V comment was misguided. I'd ignore that.

    I'm assuming you can reproduce the issue with Hyperthreading enabled on any one of the other 710's? If so, it would definitely be worth the 15 minutes it would take to add a different card that does not leverage the same driver (I only used Intel before as an example) and see if it makes any difference.

    If it's not reproducible on another supposedly identical system, then that raises more questions about the affected box than it answers.

    Cheers,
    Lain

    Monday, March 26, 2012 23:29
  • Yes, all our 710s have the exact same problem.

    Of course, the problem there is that it'd take 6 months of project planning and architecture review and red tape to replace a standard NIC with a non-standard NIC, so I'm not sure I can plunk another card in very quickly.

    The whole point was just to see if anyone was aware of this issue and knew of any fixes.  Sounds like there's nothing I'm missing.  I've had friends at other large companies tell me the problem has been fixed in the Dell 720s and Dell 820s, but this whole problem seems so weird.  Upgrading to a 720 or 820 isn't a fix obviously.


    Number2 - (John Nelson)
    Microsoft MVP (2009) - System Center Configuration Manager
    http://number2blog.com

    Tuesday, March 27, 2012 01:17
  • I apologize! I wrote hyper-v but meant hyperthreading! you can see where my mind is at.

    Regards, Bill

    Tuesday, March 27, 2012 03:47
  • But my point was you'll need to set one of these systems up on a bench with a clean install of Windows to troubleshoot it. Since you're running 2008, I don't think the HAL single vs multiprocessor idea would be an issue.

    Do you have Dell Gold support? Dell's techs are pretty good. Contacting them through their chat support site works pretty well and they should be able to handle this issue.

    http://www.dell.com/goldchat


    Regards, Bill

    • Marked as answer by Number2 Tuesday, March 27, 2012 03:57
    • Unmarked as answer by Number2 Monday, April 9, 2012 20:55
    Tuesday, March 27, 2012 03:54