Windows TCP Socket Buffer Hitting Plateau Too Early
Question
Note: This is a repost of a ServerFault Question edited over the course of a few days, originally here: http://serverfault.com/questions/608060/windows-tcp-window-scaling-hitting-plateau-too-early
Scenario: We have a number of Windows clients regularly uploading large files (FTP/SVN/HTTP PUT/SCP) to Linux servers that are ~100-160ms away. We have 1Gbit/s synchronous bandwidth at the office and the servers are either AWS instances or physically hosted in US DCs.
The initial report was that uploads to a new server instance were much slower than they could be. This bore out in testing and from multiple locations; clients were seeing stable 2-5Mbit/s to the host from their Windows systems.
I broke out iperf -s on an AWS instance and then ran it from a Windows client in the office:

iperf -c 1.2.3.4
[  5] local 10.169.40.14 port 5001 connected with 1.2.3.4 port 55185
[  5]  0.0-10.0 sec  6.55 MBytes  5.48 Mbits/sec

iperf -w1M -c 1.2.3.4
[  4] local 10.169.40.14 port 5001 connected with 1.2.3.4 port 55239
[  4]  0.0-18.3 sec   196 MBytes  89.6 Mbits/sec

The latter figure can vary significantly on subsequent tests (vagaries of AWS), but is usually between 70 and 130Mbit/s, which is more than enough for our needs. Wiresharking the session, I can see:
- iperf -c: Windows SYN - Window 64kb, Scale: 1 - Linux SYN, ACK: Window 14kb, Scale: 9 (*512)
- iperf -c -w1M: Windows SYN - Window 64kb, Scale: 1 - Linux SYN, ACK: Window 14kb, Scale: 9
Clearly the link can sustain this high throughput, but I have to explicitly set the window size to make any use of it, which most real-world applications won't let me do. The TCP handshakes use the same starting points in each case, but the forced one scales.
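For context, iperf's -w flag is nothing more exotic than a socket buffer request made before connecting, which is exactly the call most applications never make. A minimal Python sketch of the same thing (1.2.3.4:5001 are the placeholder server details from the tests above; the 1MB figure mirrors -w1M):

import socket

# Roughly what `iperf -w1M` does: request a large send buffer up front so the
# stack can keep that much data in flight. Most apps skip this and get the default.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 1024 * 1024)
s.connect(("1.2.3.4", 5001))
# The OS may grant more or less than requested; check what we actually got.
print("effective SO_SNDBUF:", s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))
s.sendall(b"\0" * (16 * 1024 * 1024))   # push 16MB of zeros, dd | nc style
s.close()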
Conversely, from a Linux client on the same network a straight iperf -c (using the system default 85kb) gives me:

[  5] local 10.169.40.14 port 5001 connected with 1.2.3.4 port 33263
[  5]  0.0-10.8 sec   142 MBytes   110 Mbits/sec

Without any forcing, it scales as expected. This can't be something in the intervening hops or our local switches/routers, and it seems to affect Windows 7 and 8 clients alike. I've read lots of guides on auto-tuning, but these are typically about disabling scaling altogether to work around terrible home networking kit.
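For anyone following along, the arithmetic matches the symptoms: TCP can't carry more than one window of data per round trip, so throughput tops out at window/RTT. A quick Python check using the window sizes and the 100-160ms RTT range above:

# Throughput ceiling = window / RTT.
def ceiling_mbit(window_bytes, rtt_s):
    return window_bytes * 8 / rtt_s / 1e6

for window in (64 * 1024, 1024 * 1024):        # un-scaled 64kb vs forced 1MB
    for rtt in (0.100, 0.160):
        print(f"{window // 1024}kb window at {rtt * 1000:.0f}ms: "
              f"{ceiling_mbit(window, rtt):.1f} Mbit/s max")
# 64kb gives a ~3-5Mbit/s ceiling (the plateau we see); 1MB gives ~52-84Mbit/s.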
Can anyone tell me what's happening here and give me a way of fixing it? (Preferably something I can push into the registry via GPO.)
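(For reference, the only registry value I'm aware of in this area that a GPO could push is the legacy Tcp1323Opts flag. Since Windows 7/8 already enable window scaling by default and autotuning governs the RWIN, I suspect it's a no-op for this problem; a Python sketch of what such a deployment would set, for completeness only, not a claimed fix. Needs an elevated prompt.)

import winreg

# Legacy pre-Vista knob: bit 0 = window scaling, bit 1 = timestamps.
# Illustrative only; on Windows 7/8 scaling is already on by default.
PARAMS = r"SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"
with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, PARAMS, 0,
                    winreg.KEY_READ | winreg.KEY_SET_VALUE) as key:
    winreg.SetValueEx(key, "Tcp1323Opts", 0, winreg.REG_DWORD, 3)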
Notes
The AWS Linux instance in question has the following kernel settings applied in sysctl.conf:

net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.core.rmem_default = 1048576
net.core.wmem_default = 1048576
net.ipv4.tcp_rmem = 4096 1048576 16777216
net.ipv4.tcp_wmem = 4096 1048576 16777216
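(For anyone reproducing this, a trivial Python sketch to confirm those values are live on the instance by reading them back from /proc:)

# Read the running values back to confirm sysctl.conf was applied.
for name in ("net.core.rmem_max", "net.core.wmem_max",
             "net.ipv4.tcp_rmem", "net.ipv4.tcp_wmem"):
    with open("/proc/sys/" + name.replace(".", "/")) as f:
        print(name, "=", f.read().strip())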
I've used dd if=/dev/zero | nc, redirecting to /dev/null at the server end, to rule out iperf and remove any other possible bottlenecks, but the results are much the same. Tests with ncftp (Cygwin, native Windows, Linux) scale in much the same way as the above iperf tests on their respective platforms.

First fix attempts.
- Enabling CTCP - This makes no difference; window scaling is identical. (If I understand this correctly, this setting increases the rate at which the congestion window is enlarged rather than the maximum size it can reach)
- Enabling TCP timestamps. - No change here either.
- Nagle's algorithm - That makes sense, and at least it means I can probably ignore those particular blips in the graph as any indication of the problem.
- pcap files: Zip file available here: https://www.dropbox.com/s/104qdysmk01lnf6/iperf-pcaps-10s-Win%2BLinux-2014-06-30.zip (Anonymised with bittwiste, extracts to ~150MB as there's one from each OS client for comparison)
Second fix attempts.
I've enabled ctcp and disabled chimney offloading:

TCP Global Parameters
----------------------------------------------
Receive-Side Scaling State          : enabled
Chimney Offload State               : disabled
NetDMA State                        : enabled
Direct Cache Access (DCA)           : disabled
Receive Window Auto-Tuning Level    : normal
Add-On Congestion Control Provider  : ctcp
ECN Capability                      : disabled
RFC 1323 Timestamps                 : enabled
Initial RTO                         : 3000
Non Sack Rtt Resiliency             : disabled

But sadly, there's no change in the throughput.
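(That parameter block is the output of netsh interface tcp show global. For anyone comparing several clients at once, a throwaway Python sketch to pull out the lines that matter here; the filter strings assume the English output shown above:)

import subprocess

# Run the same netsh query and keep only the settings relevant to this thread.
out = subprocess.run(["netsh", "interface", "tcp", "show", "global"],
                     capture_output=True, text=True).stdout
for line in out.splitlines():
    if any(k in line for k in ("Auto-Tuning", "Congestion", "Chimney", "Timestamps")):
        print(line.strip())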
I do have a cause/effect question here, though: The graphs are of the RWIN value set in the server's ACKs to the client. With Windows clients, am I right in thinking that Linux isn't scaling this value beyond that low point because the client's limited CWIN prevents even that buffer from being filled? Could there be some other reason that Linux is artificially limiting the RWIN?
Note: I've tried turning on ECN for the hell of it, but no change there.
Third fix attempts.
No change following disabling heuristics and RWIN autotuning. I've updated the Intel network drivers to the latest (12.10.28.0) with software that exposes functionality tweaks via device manager tabs. The card is an 82579V chipset on-board NIC. (I'm going to do some more testing from clients with Realtek or other vendors.)
Focusing on the NIC for a moment, I've tried the following (Mostly just ruling out unlikely culprits):
- Increased receive buffers to 2k from 256 and transmit buffers to 2k from 512 (both now at maximum) - No change
- Disabled all IP/TCP/UDP checksum offloading. - No change.
- Disabled Large Send Offload - Nada.
- Turned off IPv6, QoS scheduling - Nowt.
Further investigation
Trying to eliminate the Linux server side, I started up a Server 2012R2 instance and repeated the tests using iperf (Cygwin binary) and NTttcp.

With iperf, I had to explicitly specify -w1m on both sides before the connection would scale beyond ~5Mbit/s. (Incidentally, my maths can be checked: the BDP at ~5Mbit/s and 91ms latency is almost precisely 64kb. Spot the limit...)

The ntttcp binaries showed no such limitation. Using ntttcpr -m 1,0,1.2.3.5 on the server and ntttcp -s -m 1,0,1.2.3.5 -t 10 on the client, I can see much better throughput:

Copyright Version 5.28
Network activity progressing...

Thread  Time(s) Throughput(KB/s) Avg B / Compl
======  ======= ================ =============
     0    9.990         8155.355     65536.000

#####  Totals:  #####

   Bytes(MEG)    realtime(s) Avg Frame Size Throughput(MB/s)
================ =========== ============== ================
       79.562500      10.001       1442.556            7.955

Throughput(Buffers/s) Cycles/Byte       Buffers
===================== =========== =============
              127.287     308.256      1273.000

DPCs(count/s) Pkts(num/DPC)   Intr(count/s) Pkts(num/intr)
============= ============= =============== ==============
     1868.713         0.785        9336.366          0.157

Packets Sent Packets Received Retransmits  Errors Avg. CPU %
============ ================ =========== ======= ==========
       57833            14664           0       0      9.476

8MB/s puts it up at the levels I was getting with explicitly large windows in iperf. Oddly, though, 80MB in 1273 buffers = a 64kB buffer again. A further Wireshark capture shows a good, variable RWIN coming back from the server (scale factor 256) that the client seems to fulfil, so perhaps ntttcp is misreporting the send window.

Further PCAP files have been provided here: https://www.dropbox.com/s/dtlvy1vi46x75it/iperf%2Bntttcp%2Bftp-pcaps-2014-07-03.zip
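(Spelling out both of those calculations, since they keep landing on the same number:)

# Both figures from the ntttcp run point at the same 64kB limit.
total_bytes = 79.5625 * 1024 * 1024     # Bytes(MEG) reported above
print(total_bytes / 1273)               # per-buffer size -> 65536.0 exactly
print(65536 * 8 / 0.091 / 1e6)          # 64kB window at 91ms -> ~5.8 Mbit/s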
- Two more iperfs, both from Windows to the same Linux server as before (1.2.3.4): one with a 128k socket size and default 64k window (restricted to ~5Mbit/s again) and one with a 1MB send window and default 8kb socket size (scales higher).
- One ntttcp trace from the same Windows client to a Server 2012R2 EC2 instance (1.2.3.5). Here, the throughput scales well. Note: NTttcp does something odd on port 6001 before it opens the test connection. Not sure what's happening there.
- One FTP data trace, uploading 20MB of /dev/urandom to a near identical Linux host (1.2.3.6) using Cygwin ncftp. Again the limit is there. The pattern is much the same using Windows Filezilla.
Changing the iperf buffer length does make the expected difference to the time sequence graph (much more vertical sections), but the actual throughput is unchanged.

So we have a final question through all of this: where is this limitation creeping in? If we simply have user-space software not written to take advantage of Long Fat Networks, can anything be done in the OS to improve the situation?
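One quick diagnostic on that theme, for anyone who wants to check their own clients: ask Winsock what send buffer a socket gets when the application never requests one. A minimal Python sketch:

import socket

# Print the send buffer an application gets when it never asks for one.
# On the Windows clients above this appears to be small (the traces suggest
# 8kb-64kb), which at 100-160ms RTT caps an un-tuned upload at window/RTT.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
print("default SO_SNDBUF:", s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))
s.close()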
Monday, July 7, 2014 7:35 AM
All replies
Tuesday, July 8, 2014 9:17 AM
Hi,
From the above information, the slow copy issue began to happen after you set up a new Server 2012 R2 file server.
Does this issue happen on other servers?
Thank you.
Best regards,
Steven Song
Thursday, July 10, 2014 1:52 PM
Hi Steven, thanks for your reply.
The original tests were between Windows 7 and 8 clients and a Debian Linux server. I launched a new 2012 R2 server to test against just to rule out the Linux side of things, but it really looks like the local client buffer isn't scaling properly, regardless of the destination server.
Terry.
Thursday, July 10, 2014 2:08 PM
Hi Steve,
Have you had any more thoughts about this? It's still an issue for us and I'd love to get to the bottom of it.
Terry.
Thursday, July 24, 2014 6:20 PM
Hi Terry,
This issue is complicated and I am consulting with our group. Since there is no update so far, I'd like to suggest that you submit a service request to the MS Professional tech support service so that a dedicated Support Professional can further assist with this request.
Thank you.
Best regards,
Steven Song
Friday, July 25, 2014 1:00 PM
Thanks, Steven,
I've opened a ticket and will follow up if anything useful comes out of it.
Terry.
Friday, July 25, 2014 1:52 PM
Hi Terry,
Would you please let me know if there is any update from MS Professional Support? I would also like to know the cause of this issue.
Thank you.
Best regards,
Steven Song
Wednesday, August 6, 2014 8:50 AM