Softgrid on TS/Citrix Windows 2003 R2 Server SP2 - Unexpected error 03-00001002

    Question

  • Hi Guys

     

    Below is the current config of our Citrix environment:

     

    HP DL360 G4 server with 4 GB RAM and dual 3.0 GHz Intel Xeon

    Windows Server 2003 R2 SP2 (fully patched without IE7)

    Citrix Presentation Server 4.5 Hotfix rollup 1

    Office 2003 SP3

    Softgrid for TS 4.1.1.310

    McAfee VirusScan Enterprise 8.5i Patch 3

     

    All Citrix servers are rebooted daily.

     

    At what appear to be random times the softgrid client fails, and this affects all users on the box. The user who first experiences the problem gets the following dialogue box:

     

    "The SoftGrid Client could not be started. An unexpected error occured. Please report the following error code to your System Administrator. Error Code: 411136-04F09003-00001002"

     

    followed by:

     

    "The System is too busy to complete the request. if the problem persists, please report it to your System Administrator"

     

    Whereas for everyone else the softgrid applications just stop working.

     

    The only way to fix it is to reboot the box. According to "Services" the softgrid services are running, but if I attempt to restart them they hang in the "starting" state. All other applications appear unaffected; however, it takes about 15 minutes for a user to log off safely from their session.

     

    This happens almost daily and can occur with 5 people or 35 people on the box.

     

    I can't find any circumstances that cause it or a guaranteed way to recreate it; it can happen at any point during the day.

     

    I have checked out http://support.microsoft.com/kb/930625 but this refers to version 3.x and those registry keys no longer exist.

     

    Below is an extract from C:\Program Files\Softricity\SoftGrid for Terminal Servers\sftlog.txt, which appears when the problem occurs:

     

    [10/16/2007 10:17:07.152 SWAP WRN] {hap=26:tid=52C0}
    Launch thread would not exit, terminating it

     

    [10/16/2007 10:17:09.465 USER WRN] {tid=52C0}
    Terminating thread Handle=0x00000750 Id=0x00003364 RefCnt=0x00000002 FromId=0x000052c0 Context=created in C:\Documents and Settings\All Users\Documents\SoftGrid Client\Dumps\sftlist-2932-21184-20071016-101707.dmp
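    For anyone who wants to pull these warnings out of sftlog.txt automatically rather than by eye, here is a minimal Python sketch. The header layout is inferred only from the two entries quoted above, so treat the pattern as an assumption; other entries in a real log may look different. The function names are made up for illustration.

```python
import re
from datetime import datetime

# Header layout inferred from the two entries quoted in the post, e.g.
#   [10/16/2007 10:17:07.152 SWAP WRN] {hap=26:tid=52C0}
# Whether every sftlog.txt entry follows this shape is an assumption.
HEADER = re.compile(
    r"\[(?P<stamp>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<component>\w+) (?P<level>\w+)\]\s*\{(?P<context>[^}]*)\}"
)


def parse_entries(text):
    """Return one dict per log entry: time, component, level, context, message."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            entry = match.groupdict()
            # Dates in the extract are month/day/year with milliseconds.
            entry["time"] = datetime.strptime(
                entry.pop("stamp"), "%m/%d/%Y %H:%M:%S.%f"
            )
            entry["message"] = ""
            entries.append(entry)
        elif line and entries:
            # Non-empty lines after a header belong to that entry's message.
            entries[-1]["message"] = (entries[-1]["message"] + " " + line).strip()
    return entries


def warnings_only(entries):
    """Keep only the entries logged at WRN level."""
    return [e for e in entries if e["level"] == "WRN"]


SAMPLE = """\
[10/16/2007 10:17:07.152 SWAP WRN] {hap=26:tid=52C0}
Launch thread would not exit, terminating it

[10/16/2007 10:17:09.465 USER WRN] {tid=52C0}
Terminating thread Handle=0x00000750 Id=0x00003364
"""

if __name__ == "__main__":
    for e in warnings_only(parse_entries(SAMPLE)):
        print(e["time"].isoformat(), e["component"], e["message"])
```

    Collecting the timestamps this way makes it easier to line up the moment the client fails with logons or other events on the server.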

     

    An analysis of the dump shows:

     

    User Mini Dump File with Full Memory: Only application data is available

    Symbol search path is: srv*C:\symbols*http://msdl.microsoft.com/download/symbols
    Executable search path is: c:\windows\i386
    Windows Server 2003 Version 3790 (Service Pack 2) MP (4 procs) Free x86 compatible
    Product: Server, suite: TerminalServer
    Debug session time: Wed Oct 10 15:35:03.000 2007 (GMT+1)
    System Uptime: 0 days 11:02:15.294
    Process Uptime: 0 days 11:00:13.000
    ....................................................................
    eax=00151590 ebx=00000000 ecx=0000005a edx=0012fcf0 esi=00000000 edi=00000028
    eip=7c8285ec esp=0012fb9c ebp=0012fc04 iopl=0         nv up ei pl zr na pe nc
    cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000246
    ntdll!KiFastSystemCallRet:
    7c8285ec c3              ret
    0:000>
    0:000> !analyze -show
    Error code: 0x0 - The operation completed successfully.
    0:000> !analyze -v
    *******************************************************************************
    *                                                                             *
    *                        Exception Analysis                                   *
    *                                                                             *
    *******************************************************************************

    *** ERROR: Symbol file could not be found.  Defaulted to export symbols for sftlist.exe -
    *** ERROR: Symbol file could not be found.  Defaulted to export symbols for sftcore.dll -
    *** ERROR: Symbol file could not be found.  Defaulted to export symbols for sftfsi.dll -
    *** ERROR: Symbol file could not be found.  Defaulted to export symbols for sftsync.dll -

    FAULTING_IP:
    +0
    00000000 ??              ???

    EXCEPTION_RECORD:  ffffffff -- (.exr 0xffffffffffffffff)
    ExceptionAddress: 00000000
       ExceptionCode: 80000003 (Break instruction exception)
      ExceptionFlags: 00000000
    NumberParameters: 0

    FAULTING_THREAD:  00000b8c

    DEFAULT_BUCKET_ID:  STATUS_BREAKPOINT

    PROCESS_NAME:  sftlist.exe

    ERROR_CODE: (NTSTATUS) 0x80000003 - {EXCEPTION}  Breakpoint  A breakpoint has been reached.

    NTGLOBALFLAG:  0

    APPLICATION_VERIFIER_FLAGS:  0

    LAST_CONTROL_TRANSFER:  from 7c82776b to 7c8285ec

    STACK_TEXT: 
    0012fb98 7c82776b 77e418b2 00000028 00000000 ntdll!KiFastSystemCallRet
    0012fb9c 77e418b2 00000028 00000000 00000000 ntdll!NtReadFile+0xc
    0012fc04 77f65edb 00000028 0012fcc8 0000021a kernel32!ReadFile+0x16c
    0012fc30 77f65f82 00000028 0012fcc8 0000021a advapi32!ScGetPipeInput+0x2a
    0012fca4 77fb75af 00000028 0012fcc8 0000021a advapi32!ScDispatcherLoop+0x51
    0012fee8 00438484 0047bebc 77e6474a 00142577 advapi32!StartServiceCtrlDispatcherA+0x93
    WARNING: Stack unwind information not available. Following frames may be wrong.
    0012ff18 0045315a 00400000 00000000 00142577 sftlist!SWClientSyncInterface::operator=+0x36d44
    0012ffc0 77e6f23b 00000000 00000000 7ffdf000 sftlist!SWClientSyncInterface::operator=+0x51a1a
    0012fff0 00000000 00452fd5 00000000 78746341 kernel32!BaseProcessStart+0x23


    STACK_COMMAND:  ~0s; .ecxr ; kb

    FOLLOWUP_IP:
    sftlist!SWClientSyncInterface::operator=+36d44
    00438484 85c0            test    eax,eax

    SYMBOL_STACK_INDEX:  6

    SYMBOL_NAME:  sftlist!SWClientSyncInterface::operator=+36d44

    FOLLOWUP_NAME:  MachineOwner

    MODULE_NAME: sftlist

    IMAGE_NAME:  sftlist.exe

    DEBUG_FLR_IMAGE_TIMESTAMP:  46ad89ff

    PRIMARY_PROBLEM_CLASS:  STATUS_BREAKPOINT

    BUGCHECK_STR:  APPLICATION_FAULT_STATUS_BREAKPOINT

    FAILURE_BUCKET_ID:  APPLICATION_FAULT_STATUS_BREAKPOINT_sftlist!SWClientSyncInterface::operator=+36d44

    BUCKET_ID:  APPLICATION_FAULT_STATUS_BREAKPOINT_sftlist!SWClientSyncInterface::operator=+36d44

    Followup: MachineOwner
    ---------

     

    Unfortunately the above analysis doesn't mean much to me. I'm hoping that someone out there can shed some light on what is going on and, ultimately, what is causing the problem.

     

    Any help, suggestions, advice, or anything remotely useful would be gratefully received.

     

    Thanks

    Paul

    Tuesday, October 16, 2007 10:15 AM

Answers

  • Hi Guys

     

    I have the answer!

     

    The good news is that this has nothing to do with softgrid!!

     

    We have been running with this configuration for a week now and have not seen the issue. I believe the "fix" is more of a delay than a proper fix: the system can now run longer before the problem would inevitably happen, and as we reboot every night we no longer see it.

     

    The following changes/additions to the registry are what have solved the problem:

     

    HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management\SessionPoolSize to 64

     

    HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management\SessionViewSize to 48

     

    See http://support.microsoft.com/kb/840342 for info on these registry keys. Additionally, check out this thread, which is what helped: http://support.citrix.com/forums/thread.jspa?forumID=137&threadID=92919&start=0&tstart=0
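    If you need to roll the two values out to several servers, a .reg file is one way to do it. The Python sketch below only builds the text of such a file; it does not touch the registry. It assumes the 64 and 48 quoted above are decimal numbers (the post does not say), and it writes the key as "Memory Management" with a space, which is how the key is named in the registry. The function name is made up for illustration.

```python
# Key that holds the two values; note the space in "Memory Management".
KEY = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control"
    r"\Session Manager\Memory Management"
)

# Values from the post. ASSUMPTION: they are decimal, not hexadecimal.
SETTINGS = {"SessionPoolSize": 64, "SessionViewSize": 48}


def reg_file_text(key, settings):
    """Build .reg file text that sets each name to a REG_DWORD value."""
    lines = ["Windows Registry Editor Version 5.00", "", "[" + key + "]"]
    for name, number in settings.items():
        # .reg files write DWORDs as eight hexadecimal digits.
        lines.append('"%s"=dword:%08x' % (name, number))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(reg_file_text(KEY, SETTINGS))
```

    Check the generated text against KB 840342 before importing it, and expect to reboot for the change to take effect.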

     

    For reference we are now running:

     

    32 bit Windows 2003 Server R2 SP2

    All patches (as of 06 December 2007)

    IE7

    Office 2003 SP3

    VirusScan Enterprise 8.5i Patch 3 + AntiSpyware Module

    Citrix Presentation Server 4.5 with:

    Hotfix Rollup Pack PSE450W2K3R01 http://support.citrix.com/article/CTX112618

    Hotfix PSE450R01W2K3035 http://support.citrix.com/article/CTX115275

    Hotfix PSE450R01W2K3003 http://support.citrix.com/article/CTX114104 

     

    Hope this helps anyone in the same situation!

     

    Cheers

    Paul

    Thursday, December 06, 2007 10:57 PM

All replies

  • As a bit more info, I'm currently working on the assumption that it is caused by the most recent user who logs on - they may not be the person who notices that softgrid has died but are (for some reason) ultimately the cause.

     

    I'm also trying a server without Antivirus to see if this is causing the problem.

     

    Does anyone have any ideas of what I can log/monitor/check for to find out what is happening?

     

    Cheers

    Paul

    Tuesday, October 16, 2007 10:28 AM
  • VirusScan 8.5i appears to cause the problem.

     

    The server without VirusScan has performed flawlessly for the last 2 days.

     

    Does anyone have any ideas what I need to tell McAfee when I open a support request?

     

    In the meantime I'm going to try reverting to VirusScan 8.0 and see if the problem still occurs. Will keep you posted.

     

    Cheers

    Paul

     

    Thursday, October 18, 2007 2:05 PM
  • A suggestion in the meantime: try to exclude SoftGrid client's data directory (C:\Documents and Settings\All Users\Documents\SoftGrid Client) from AV and see if that helps? Alternatively, exclude cache files in that directory (sftfs.*).

     

    /Kalle

    Friday, October 19, 2007 5:57 AM
    Moderator
  • Good idea!

     

    I've already tried excluding these processes from being scanned:

     

    "C:\Program Files\Softricity\SoftGrid for Terminal Servers\sftvsa.exe"

    "C:\Program Files\Softricity\SoftGrid for Terminal Servers\sftlist.exe"

     

    I will now add C:\Documents and Settings\All Users\Documents\SoftGrid Client\*.* to the exclusion list and see if this solves the problem

     

    Thanks

    Paul

    Friday, October 19, 2007 11:12 AM
  • Did this amendment to the Anti-Virus resolve the issue?

     

    I'm also getting this error occurring and a reboot normally clears it for a few days.

     

    Charles

    Monday, October 22, 2007 9:14 AM
  • Today is the first day of testing. It's only been in use for about 2 hours so far!

     

    I will post back in a day or so and let you know how it's going.

     

    Fingers crossed

     

    Cheers

    Paul

    Monday, October 22, 2007 9:23 AM
  • It's looking good - usually by this time of day the problem has occurred.

     

    Tomorrow I'll disable logons on a few of the Citrix boxes and get some user load onto my test box to see if it holds up!

    Monday, October 22, 2007 3:07 PM
  • It's not fixed... it's just happened again

     

    Anyone have any other suggestions?

    Tuesday, October 23, 2007 3:14 PM
  •  I should have thought of this earlier....

     

    I will also exclude the Q:\ drive from being scanned to see if that makes any difference

    Thursday, October 25, 2007 8:40 AM
  • Right guys, I'm barking up the wrong tree completely: exactly the same problem has occurred on a server without any antivirus installed.

     

    I'm at a loss now, anyone from Microsoft got any thoughts?

     

    We're not using Softgrid to provide many applications yet as we're still in the early stages, and this problem is really slowing down any progress.

     

    From looking at the sftlog.txt logs for yesterday I can see that the only Softgrid applications used on that server were:

    • Adobe Reader 8
    • MapInfo Professional 8.5

    Adobe Reader 8 was run 118 times and MapInfo Professional 8.5 was run once.

     

    It can't be MapInfo causing the problem, as it hadn't been run on the previous occasions when this problem occurred. Yesterday it was run at 09:50 and the problem occurred at around 15:30.

     

    An analysis of the dump shows exactly the same as the one in my original post - Can anyone decipher what that dump is showing?

     

    I'm really beginning to tear my hair out now!

     

    Cheers

    Paul

    Friday, October 26, 2007 8:08 AM
  • If the Machine Debug Manager service is present in the Adobe 8 package, disable it in the package.

    If it is not clear what package was opened, turn the logging on the TS client to verbose and look for
    Service 'MDM' failed to start. (rc 1AA0242F-41D)
    --this is a reference to the Machine Debug Manager. Look for the package that contains the Machine Debug Manager service and disable the service in that package.
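    As a rough aid for the search described above, this Python sketch scans log text for "failed to start" lines and returns the service name and return code. The pattern is built only from the single line quoted in this post, so whether other services are logged in exactly the same shape is an assumption. The function name is made up for illustration.

```python
import re

# Pattern built from the one line quoted in the post:
#   Service 'MDM' failed to start. (rc 1AA0242F-41D)
# ASSUMPTION: other service failures are logged in the same shape.
SERVICE_FAILURE = re.compile(
    r"Service '(?P<service>[^']+)' failed to start\. "
    r"\(rc (?P<rc>[0-9A-Fa-f-]+)\)"
)


def failed_services(log_text):
    """Return (service, return code) pairs for every matching log line."""
    return [
        (m.group("service"), m.group("rc"))
        for m in SERVICE_FAILURE.finditer(log_text)
    ]


if __name__ == "__main__":
    sample = "Service 'MDM' failed to start. (rc 1AA0242F-41D)\n"
    for service, rc in failed_services(sample):
        print(service, rc)
```

    Run it over the verbose log and then look at which package was being launched around each hit.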

    Friday, November 02, 2007 9:47 PM
  • Hi Jack

     

    Thanks, I will turn the logging to verbose to see if I can get more information from the logs and investigate the MDM service suggestion.

     

    I assume turning logging to verbose is the same for V4 as it is for V3? http://support.microsoft.com/kb/931585

     

    Have you had trouble with Adobe Reader in a Softgrid environment?

     

    I'm currently investigating whether this problem is related to something else that I've observed on our Citrix servers. I've posted it over at http://www.brianmadden.com/Forum/Topic/92473 as it's not Softgrid related, but I suspect it could be the hidden cause I'm searching for.

     

    Cheers

    Paul

    Monday, November 05, 2007 2:13 PM
  • I agree with Jack - this is almost always caused by some package that includes the Machine Debug Manager.  If that doesn't pan out I'd recommend opening a case with Microsoft support.  They can take a look at a complete memory dump and tell you why this is failing. 

     

    J.C. Hornbeck,

    Microsoft Corporation

     

    The SoftGrid Team blog: http://blogs.technet.com/softgrid/

    The SMS & MOM Team blog: http://blogs.technet.com/smsandmom/

     

    Tuesday, November 06, 2007 11:39 PM
  • Hi Paul

     

    Did the registry key changes fix the issue?

     

     

    Thanks

     

     

    Stuart

    Friday, January 11, 2008 9:25 AM
  • Hi Stuart

     

    Things have been working perfectly for about a month now

     

    Are you experiencing the same problem(s)?

     

    If this is of any use, one thing I've noticed that seems to cause problems is multiple people (IT staff) using the Citrix Access Management Console - I've restricted access now and definitely seen an improvement.

     

    Cheers

    Paul

    Friday, January 11, 2008 9:42 AM
  • Hi Paul

     

    Thanks very much for the quick reply.

     

    We are experiencing very similar problems to you. Users are reporting the 2 error messages in your starting post. In most cases the servers are not under heavy load.

     

    When I login to the server, I attempt to re-start the softgrid services and they hang on start-up. Sometimes the server will even hang, which requires a re-boot from the button.

     

    Did you notice any adverse effects from implementing the reg keys?

     

    Just out of curiosity, did you upgrade the softgrid client?

     

    Stuart
    Friday, January 11, 2008 10:26 AM
  • We have experienced no adverse effects with these registry keys - we are publishing full desktops as opposed to applications and have up to 20 sessions per server without problems.

     

    We are running Softgrid 4.1.1.310 - there is a new patch out SoftGrid 4.1 SP1 Hotfix Rollup Package 1 (4.1.2.21) but we haven't upgraded to that yet.

     

    When you next experience the problem, log onto the console and open Task Manager:

     

    1. Do you find Task Manager "sticky"?

    2. Have a look in the processes tab, add the username column and see if there are processes assigned to a username that is not logged onto the box (not listed in the task manager "users" tab)

    3. Are you using Citrix or just TS?
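    The check in step 2 can also be done on exported lists rather than by eye. Here is a minimal Python sketch, assuming you already have the process list (PID, image name, user name) and the list of logged-on users as plain data; how you collect them is up to you. The sample names and the function name are hypothetical.

```python
# Built-in accounts that own processes without having a session.
BUILT_IN = ("SYSTEM", "LOCAL SERVICE", "NETWORK SERVICE")


def orphaned_processes(processes, logged_on_users, ignore=BUILT_IN):
    """Return processes whose owner is neither logged on nor a built-in account.

    processes: iterable of (pid, image_name, user_name) tuples.
    logged_on_users: iterable of user names shown as logged on.
    """
    known = {u.lower() for u in logged_on_users} | {u.lower() for u in ignore}
    return [p for p in processes if p[2].lower() not in known]


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    procs = [
        (4, "System", "SYSTEM"),
        (1234, "acrord32.exe", "jsmith"),
        (2200, "winword.exe", "akhan"),
    ]
    for pid, image, user in orphaned_processes(procs, ["akhan"]):
        print(pid, image, user)
```

    Any process it reports belongs to a session that should already have gone, which is what step 2 is looking for.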


    Cheers

    Paul

     

    Friday, January 11, 2008 12:43 PM
  • Hi Paul

     

    We are running Citrix PS4 W2k3 R2.

     

    One of the Softgrid clients stopped launching applications today. The citrix server was still accepting connections.

     

    I tried to re-start the Softgrid services but got the following error –

    "Error 1114: A dynamic link library (DLL) initialisation routine failed to load"

     

    I checked inside the task manager. Did you ever see this error message?

     

    Stuart

     

    Monday, January 14, 2008 2:03 PM
  • Hi Stuart

     

    Yes I've seen that error and I've never managed to restart the service - always a reboot to clear the problem.

     

    I've also noticed that when this issue occurs all other applications (non-Softgrid ones) work fine, and new logons to the server work fine too. It's just that when a user tries to log off it takes about 15 to 20 minutes for the session to log off.

     

    Also, we've found that the server suffering from the problem fails to respond to the Citrix Management Console, and when logged in locally Terminal Services Manager is also unresponsive.

     

    I'm pretty certain that it's not a Softgrid issue; Softgrid just suffers badly when this problem arises. I say this because during my testing I was "lucky" enough to have the same problem on a Citrix box without Softgrid.

     

    It's handy to know you're using Citrix 4.0 - we have considered rolling back from 4.5 to see if the problem goes away.

     

    From browsing the web, reading forums etc. I'm getting the feeling it's due to a conflict between wdica.sys and win32k.sys - I've nothing to back this up though.

     

    Have a read of http://www.brianmadden.com/Forum/Topic/92473 - this explains what I'm referring to about Task Manager.

     

    Cheers

    Paul

    Monday, January 14, 2008 7:53 PM