Users on both RDS Session Hosts are getting disconnected randomly

    Question

  • Hi,

    We've been facing an issue in our RDS environment for the last few days. At random times of the day, an RDS Session Host disconnects all of its users at once. The two Session Hosts do not drop their users at the same moment, however: one host might disconnect everyone at 11:00 AM while users on the other host stay connected and are disconnected later.

    We currently have 2 Session Host servers: TERM1 and TERM2. Both run Windows Server 2008 R2 SP1 (French) in a vSphere 4.0 infrastructure with VMXNET3 network cards and a Nexus 1000V virtual switch. We've been using TERM1 in this environment for over a year without any issues. TERM2 was built 2 days ago to replace TERM1 in an attempt to stop impacting the users.

    The issue started last Friday morning, after we restarted TERM1 to add resources to it (+1 vCPU and +12 GB of RAM). After this restart, users started being kicked out randomly. They could be connected for 2 minutes, 15 minutes or even 2 hours before being disconnected. They are in fact disconnected, not logged off: when they connect again a few seconds later, their session is still on the Session Host and is restored. The Session Host is configured to log off disconnected sessions after 60 minutes.

    We built TERM2 urgently to move everyone there until we fix TERM1, but TERM2 has the same issue, even though it is a brand new Session Host built from a bare OS. One thing I found weird on TERM1 right after the restart is that an svchost.exe process would eat up one vCPU for about 5 minutes, and until it was done I couldn't access the network at all. The network icon near the clock showed no connections, as if a cable had been unplugged. But remember, this is a virtual machine.

    Looking at the security logs on TERM1 and TERM2, I can see people getting disconnected. I filtered the logs on event ID 4779 and saw a bunch of users getting disconnected within 1-2 seconds of each other. However, the message in the log only suggests that the user disconnected because of a network issue or because the user hit the disconnect button. I can't find anything else in either the Application or System logs. Using perfmon, I analysed the network counters to see whether the Session Hosts were getting packet errors, but found none. Nothing in the Nexus 1000V logs correlates with the disconnects either.

    Thinking it might be a GPO issue, I used the gpresult tool and generated several gpresult reports for different users. I can't find anything in the GPOs that would explain the disconnects. While reading up on RDS issues over the last few days, I came across the "Protocol error" issue, which is reportedly caused by the RDP compression level being set to maximum by default. Since our Windows installations are in French (we're from Québec), there might be a translation issue, but the errors that appear on the thin clients don't mention a protocol error. They only state that the connection was lost. Also, TERM1 has been working for over a year with these settings and these GPOs.

    Both servers are up to date with Windows Updates. Drivers are current, with the latest VMware Tools available for our vSphere version. The client computers run Windows XP and Windows 7 and were working properly before. It really looks like the issue is on the Terminal Servers... but we can't put our finger on the root cause. We have found a 'pattern', if you can call it that: when a user logs into a Session Host, there is a chance that he and everyone else on that Session Host will be kicked off. Looking at the security logs, the disconnections seem to occur a few seconds after someone logs in. Unfortunately, it is never the same user, so we don't suspect a corrupted profile. We don't use roaming profiles; the users had brand new profiles on TERM2 and the issue happens there as well.
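
    A minimal sketch of how that pattern could be checked, assuming the Security log has been exported to (timestamp, event ID, account) rows. The sample rows, the function names and the 2-second and 30-second thresholds are all hypothetical; 4624 is the standard logon event and 4779 the session-disconnect event already mentioned above.

```python
from datetime import datetime, timedelta

# Hypothetical sample rows: (timestamp, event ID, account). Real rows would
# come from an export of the Security log on TERM1 or TERM2.
EVENTS = [
    (datetime(2012, 3, 22, 10, 58, 12), 4624, "alice"),
    (datetime(2012, 3, 22, 11, 0, 1), 4624, "bob"),
    (datetime(2012, 3, 22, 11, 0, 5), 4779, "alice"),
    (datetime(2012, 3, 22, 11, 0, 5), 4779, "carol"),
    (datetime(2012, 3, 22, 11, 0, 6), 4779, "dave"),
    (datetime(2012, 3, 22, 13, 30, 0), 4779, "erin"),
]


def find_bursts(events, gap=timedelta(seconds=2), min_size=3):
    """Group 4779 events whose timestamps are at most `gap` apart."""
    disconnects = sorted(e for e in events if e[1] == 4779)
    bursts, current = [], []
    for ev in disconnects:
        # A larger gap than allowed closes the current group.
        if current and ev[0] - current[-1][0] > gap:
            if len(current) >= min_size:
                bursts.append(current)
            current = []
        current.append(ev)
    if len(current) >= min_size:
        bursts.append(current)
    return bursts


def logon_before(events, burst, window=timedelta(seconds=30)):
    """Return the latest 4624 logon within `window` before the burst, or None."""
    start = burst[0][0]
    logons = [e for e in events
              if e[1] == 4624 and start - window <= e[0] <= start]
    return max(logons, default=None)


for burst in find_bursts(EVENTS):
    trigger = logon_before(EVENTS, burst)
    print(burst[0][0], len(burst), "disconnects; preceding logon:", trigger)
```

    If most bursts turn out to have a logon just before them, that would support the pattern; a single isolated disconnect such as the last sample row is ignored by the `min_size` threshold.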

    I'm currently running perfmon on multiple metrics to try to find something... however, we're running out of ideas.

    Do you guys have any ideas?

    Thank you,

    Guillaume.

    Thursday, March 22, 2012 15:33

Answers

  • Hi,


    Based on my past experience, this happens under heavy traffic to the server combined with large client packets (i.e. a lot of input activity on the client). As a result, the data stream gets corrupted and the RDS server disconnects the client.

    Please refer to the following steps to troubleshoot.


    STEP 1
    Make the following registry changes on the RDP client machine:
    [HKEY_CURRENT_USER\Software\Microsoft\Terminal Server Client]
    "Keep Alive Interval"=dword:00000001
    Make the following registry changes on the terminal server:
    [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Terminal Server]
    "KeepAliveInterval"=dword:00000001
    "KeepAliveEnable"=dword:00000001

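    To apply the server-side values from Step 1 identically on every Session Host, they can be captured as a .reg file. The following is only a sketch: `render_reg` and `SERVER_SETTINGS` are hypothetical names, and the key path and values are copied from the step above.

```python
# Server-side values from Step 1 (key path and names as given in the step).
SERVER_SETTINGS = {
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Terminal Server": {
        "KeepAliveInterval": 1,
        "KeepAliveEnable": 1,
    },
}


def render_reg(settings):
    """Render {key path: {value name: DWORD}} as .reg file text."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for key, values in settings.items():
        lines.append(f"[{key}]")
        for name, dword in values.items():
            # DWORDs are written as 8 hexadecimal digits.
            lines.append(f'"{name}"=dword:{dword:08x}')
        lines.append("")
    return "\r\n".join(lines)


print(render_reg(SERVER_SETTINGS))
```

    The printed text can be saved as a .reg file and imported on each server; exporting the existing key first keeps a way back.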

    STEP 2
    Check the group policy on the Terminal server.
    Computer Configuration -> Administrative Templates -> Windows Components -> Remote Desktop Services -> Remote Desktop Session Host -> Session Limits
    Set "Set time limit for active but idle Remote Desktop Services sessions" to Never
     

    STEP 3
    Disable all Scalable Networking Pack (SNP) features on the server:
                • netsh int tcp set global chimney=disabled
                • netsh int tcp set global rss=disabled
                • netsh int ip set global taskoffload=disabled
                • netsh int tcp set global autotuninglevel=disabled
                • netsh int tcp set global congestionprovider=none
                • netsh int tcp set global ecncapability=disabled
                • netsh int tcp set global timestamps=disabled
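
    Since these commands change global TCP settings, it may help to run them from a small script that prints each command first. This is a minimal sketch: `apply_commands` is a hypothetical helper, the command strings are exactly the ones listed above, and nothing is executed unless `dry_run=False` is passed from an elevated prompt on the server.

```python
import subprocess

# The seven commands from Step 3, unchanged.
SNP_COMMANDS = [
    "netsh int tcp set global chimney=disabled",
    "netsh int tcp set global rss=disabled",
    "netsh int ip set global taskoffload=disabled",
    "netsh int tcp set global autotuninglevel=disabled",
    "netsh int tcp set global congestionprovider=none",
    "netsh int tcp set global ecncapability=disabled",
    "netsh int tcp set global timestamps=disabled",
]


def apply_commands(commands, dry_run=True):
    """Print each command; run it only when dry_run is False."""
    argvs = [command.split() for command in commands]
    for argv in argvs:
        print(" ".join(argv))
        if not dry_run:
            # check=True stops at the first command that fails.
            subprocess.run(argv, check=True)
    return argvs


apply_commands(SNP_COMMANDS)  # dry run: only prints the commands
```

    Recording the output of `netsh int tcp show global` before making the change gives a baseline to restore if disabling these features does not help.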


    STEP 4
    Disable IPv6 on the server and the client, then check whether the issue persists.

    How to disable certain Internet Protocol version 6 (IPv6) components in Windows Vista, Windows 7 and Windows Server 2008
    http://support.microsoft.com/default.aspx?scid=kb;EN-US;929852    
      

    STEP 5
    Change the Security Layer value to RDP Security Layer in TSCONFIG.MSC. The equivalent registry value is:

    HKLM\SYSTEM\CurrentControlSet\Control\Terminal Server\WinStations\RDP-Tcp
    SecurityLayer REG_DWORD 0x0

    In addition to the steps above, please install the latest hotfix, KB 981156 (http://support.microsoft.com/kb/981156), to update Termdd.sys to version 6.1.7601.21772.


    Please also reinstall the Integration Services for your VMs to see if it’s fixed.

    Looking forward to your feedback.

     


    Technology changes life……

    Monday, March 26, 2012 07:21

All replies

  • Did you get your issue resolved? We are experiencing the exact same thing. We see it most with our external users, followed by RemoteApp users. With full desktop sessions we have not seen any disconnects so far.

    We will be trying these steps and reporting back.

    5 Windows Server 2008 R2 RDS Session Hosts

    VMware 5.x virtual servers

    Clients are XP, Win7 and Win8

    Tuesday, November 5, 2013 13:34
  • Are there integration services for VMware?
    Tuesday, November 5, 2013 14:57
  • I have done everything you asked, from step 1 through step 5. I will monitor after restarting the session servers tonight and let you know what we observe.
    Tuesday, November 5, 2013 14:58