Thursday, March 22, 2012 15:33
We've been facing an issue in our RDS environment for the last few days. At random times of the day, each RDS Session Host disconnects all of its users at once. However, the two Session Hosts do not disconnect their users at the same moment. One Session Host may disconnect its users at 11:00 AM while the users on the other Session Host stay connected and are only disconnected later.
We currently have two Session Host servers: TERM1 and TERM2. Both run Windows Server 2008 R2 SP1 (French) and are hosted in a vSphere 4.0 infrastructure with VMXNET3 network adapters. There is also a Nexus 1000V virtual switch. We've been using TERM1 in this environment for over a year without any issues. TERM2 was built 2 days ago to replace TERM1 in an attempt to stop the impact on users.
The issue started last Friday morning, after we restarted TERM1 to add resources to it (+1 vCPU and +12 GB of RAM). After this restart, users started being kicked out at random. They could stay connected for 15 minutes, 2 minutes or even 2 hours before being disconnected. They are in fact disconnected, not logged off: when they reconnect a few seconds later, their session is still on the Session Host and is restored. The Session Host is configured to log off disconnected sessions after 60 minutes.
We urgently built TERM2 so we could move everyone to it until we fix TERM1, but TERM2 has the same issue, even though it is a brand-new Session Host built from a bare OS. One thing I found weird on TERM1 right after the restart is that an svchost.exe process consumed a full vCPU for about 5 minutes, and until it was done I couldn't access the network at all. The network icon near the clock showed No Connections, as if a cable had been unplugged, even though this is a virtual machine.
Looking at the security logs on TERM1 and TERM2, I can see people getting disconnected. Filtering the logs on event ID 4779, I saw a bunch of users being disconnected within 1-2 seconds of each other. However, the log message suggests that the user disconnected because of a network issue or clicked the disconnect button. I can't find anything else in either the Application or System logs. I used perfmon to check the Session Host servers for network packet errors, but found nothing. We have a Nexus 1000V in our vSphere 4 environment, and there's nothing in its logs that correlates with the disconnects.
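One way to make the "1-2 seconds" observation more systematic is to export the Security log (for example, filtered on event ID 4779 and saved as CSV from Event Viewer) and group the disconnects into bursts. The Python sketch below assumes the export has already been parsed into (timestamp, event_id, user) tuples; that tuple format, the function name, the 2-second window and the 3-user threshold are illustrative assumptions, not something from the original post.

```python
from datetime import datetime, timedelta

DISCONNECT_EVENT_ID = 4779  # "A session was disconnected from a Window Station."


def find_disconnect_bursts(events, window_seconds=2, min_users=3):
    """Group 4779 events into bursts of near-simultaneous disconnects.

    `events` is a list of (timestamp, event_id, user) tuples (assumed
    format). Consecutive disconnects no more than `window_seconds` apart
    belong to the same burst; only bursts involving at least `min_users`
    distinct users are returned.
    """
    disconnects = sorted(
        (ts, user) for ts, event_id, user in events
        if event_id == DISCONNECT_EVENT_ID
    )
    bursts, current = [], []
    for ts, user in disconnects:
        # A gap larger than the window closes the current burst.
        if current and ts - current[-1][0] > timedelta(seconds=window_seconds):
            bursts.append(current)
            current = []
        current.append((ts, user))
    if current:
        bursts.append(current)
    return [b for b in bursts if len({user for _, user in b}) >= min_users]


if __name__ == "__main__":
    t0 = datetime(2012, 3, 22, 11, 0, 0)
    sample = [
        (t0, 4779, "alice"),
        (t0 + timedelta(seconds=1), 4779, "bob"),
        (t0 + timedelta(seconds=2), 4779, "carol"),
        (t0 + timedelta(minutes=30), 4779, "dave"),  # isolated disconnect
    ]
    for burst in find_disconnect_bursts(sample):
        # prints: 2012-03-22 11:00:00 ['alice', 'bob', 'carol']
        print(burst[0][0], [user for _, user in burst])
```

A burst that catches most of the users on one host within a couple of seconds points to a server-side or network event rather than individual users clicking Disconnect.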
Thinking it might be a GPO issue, I used the gpresult tool to generate several gpresult reports for different users, but I can't find anything in the GPOs that would explain the disconnects. While reading about RDS issues over the last few days, I came across the "Protocol error" issue, which is caused by the RDP compression level being set to maximum by default. Since our Windows installations are in French (we're from Québec), the wording might differ in translation, but the errors that appear on the thin clients have nothing to do with a protocol error: they only state that the connection was lost. Also, TERM1 had been working for over a year with these same settings and GPOs.
Both servers are up to date with Windows Updates. Drivers are at their latest versions, with the latest VMware Tools available for our vSphere version. The client computers connecting to the servers run Windows XP and Windows 7, and were working properly before. It really looks like the issue is on the Terminal Servers, but we can't put our finger on the root cause. We have found a 'pattern', if you can call it one: when a user logs on to a Session Host, there is a chance that he and everyone else on that Session Host will be kicked off. Looking at the security logs, the disconnections seem to occur a few seconds after someone logs on. Unfortunately, it is never the same user, so we can't blame a corrupted profile. We don't use roaming profiles; the users had brand-new profiles on TERM2 and the issue happens there as well.
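The logon/disconnect 'pattern' can be checked against the logs: for each disconnect burst, list the logons that happened shortly before it. This Python sketch assumes logons were extracted from the Security log as (timestamp, user) tuples, for instance event ID 4624 with logon type 10 (RemoteInteractive); the function names and the 30-second lookback are hypothetical choices to tune against what the logs actually show.

```python
from datetime import datetime, timedelta


def logons_before_bursts(burst_starts, logons, lookback_seconds=30):
    """Map each burst start time to the users who logged on just before it.

    `logons` is a list of (timestamp, user) tuples, assumed to come from
    event ID 4624 with logon type 10 (RemoteInteractive).
    """
    window = timedelta(seconds=lookback_seconds)
    ordered = sorted(logons)  # keep users in logon order
    return {
        start: [user for ts, user in ordered if start - window <= ts <= start]
        for start in sorted(burst_starts)
    }


def pattern_ratio(report):
    """Fraction of bursts preceded by at least one logon in the window."""
    if not report:
        return 0.0
    return sum(1 for users in report.values() if users) / len(report)


if __name__ == "__main__":
    t0 = datetime(2012, 3, 22, 11, 0, 0)
    t1 = t0 + timedelta(hours=2)
    report = logons_before_bursts(
        [t0, t1],
        [(t0 - timedelta(seconds=5), "marc"),
         (t0 - timedelta(minutes=10), "julie"),  # too early to count
         (t1 - timedelta(seconds=3), "luc")],
    )
    # prints: ['marc'] ['luc'] 1.0
    print(report[t0], report[t1], pattern_ratio(report))
```

If the ratio stays close to 1.0 over several days, the 'pattern' holds and the logon sequence deserves a closer look; if it is low, the logons are probably coincidental.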
I'm currently collecting multiple perfmon counters to try to find something, but we're running out of ideas.
Do you guys have any ideas?
All replies
Monday, March 26, 2012 07:21 Moderator
Based on my past experience, this happens under heavy traffic to the server combined with large client packets (i.e. a lot of input activity on the client). As a result, the data stream gets corrupted and the RDS server disconnects the client.
Please try the following troubleshooting steps.
Make the following registry changes on the RDP client machine:
[HKEY_CURRENT_USER\Software\Microsoft\Terminal Server Client]
"Keep Alive Interval"=dword:00000001
Make the following changes on the terminal server:
Check the Group Policy on the terminal server:
Computer Configuration > Administrative Templates > Windows Components > Remote Desktop Services > Remote Desktop Session Host > Session Time Limits
Set the policy "Set time limit for active but idle Remote Desktop Services sessions" to Never.
Disable all Scalable Networking Pack (SNP) features on the server:
• netsh int tcp set global chimney=disabled
• netsh int tcp set global rss=disabled
• netsh int ip set global taskoffload=disabled
• netsh int tcp set global autotuninglevel=disabled
• netsh int tcp set global congestionprovider=none
• netsh int tcp set global ecncapability=disabled
• netsh int tcp set global timestamps=disabled
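Applying the seven commands above by hand on each Session Host is error-prone, so a small script can run them in one pass and record the return codes. This Python sketch is an assumption about how one might automate it, not part of the reply: it defaults to a dry run, and the hypothetical `snapshot_tcp_globals()` helper saves the output of `netsh int tcp show global` first so the previous values can be restored later. It has to be run from an elevated prompt on the server.

```python
import subprocess

# The SNP-related settings listed above, one netsh argument list per command.
SNP_DISABLE_COMMANDS = [
    ["netsh", "int", "tcp", "set", "global", "chimney=disabled"],
    ["netsh", "int", "tcp", "set", "global", "rss=disabled"],
    ["netsh", "int", "ip", "set", "global", "taskoffload=disabled"],
    ["netsh", "int", "tcp", "set", "global", "autotuninglevel=disabled"],
    ["netsh", "int", "tcp", "set", "global", "congestionprovider=none"],
    ["netsh", "int", "tcp", "set", "global", "ecncapability=disabled"],
    ["netsh", "int", "tcp", "set", "global", "timestamps=disabled"],
]


def snapshot_tcp_globals(path="tcp_globals_before.txt"):
    """Save the current global TCP settings so they can be restored later."""
    # errors="replace": netsh output on a French OS may contain accented
    # characters in a code page Python does not guess correctly.
    out = subprocess.run(
        ["netsh", "int", "tcp", "show", "global"],
        capture_output=True, text=True, errors="replace", check=True,
    ).stdout
    with open(path, "w", encoding="utf-8") as f:
        f.write(out)
    return out


def disable_snp(dry_run=True):
    """Run each command, or only list it when dry_run is True.

    Returns (command line, return code) pairs; the return code is None
    in dry-run mode.
    """
    results = []
    for cmd in SNP_DISABLE_COMMANDS:
        line = " ".join(cmd)
        if dry_run:
            results.append((line, None))
        else:
            proc = subprocess.run(cmd, capture_output=True, text=True,
                                  errors="replace")
            results.append((line, proc.returncode))
    return results


if __name__ == "__main__":
    for line, _ in disable_snp(dry_run=True):
        print(line)
```

Review the printed commands, call `snapshot_tcp_globals()`, and only then run `disable_snp(dry_run=False)`.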
Disable IPv6 on both the server and the client, then check whether the issue persists:
How to disable certain Internet Protocol version 6 (IPv6) components in Windows Vista, Windows 7 and Windows Server 2008
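Change the Security Layer setting to RDP Security Layer in TSCONFIG.MSC (properties of the RDP-Tcp connection). The corresponding registry value, under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Terminal Server\WinStations\RDP-Tcp, is:
SecurityLayer REG_DWORD 0x0 (0 = RDP Security Layer, 1 = Negotiate, 2 = SSL/TLS)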
In addition to the steps above, please install hotfix KB 981156 (http://support.microsoft.com/kb/981156) to update Termdd.sys to version 6.1.7601.21772.
Please also reinstall the integration components on your VMs (VMware Tools, since you are on vSphere) to see whether that fixes it.
Looking forward to your feedback.
Technology changes life……
- Marked as answer by Yuan Wang (Microsoft Contingent Staff, Moderator) Sunday, April 1, 2012 16:52