Server slowdown once per week within a one-hour window

    Question

  • I am in desperate need of some community help. I have a virtual server, Windows 2003 SP2 Enterprise 32-bit, that stops responding once a week (same day) in the early afternoon, always within the same one-hour window. For example, if you're connected via RDP, your session will freeze and the taskbar will turn black. If you try to connect via RDP during this time, you cannot. I've also found that if you leave it alone, it will sometimes bounce back, but that doesn't help the angry users trying to work at the time.

    This system has been around a while. It appears to have been upgraded from Windows 2000, and it was a physical machine that was P2V'd.

    The most recent time it stopped responding (same day and time frame), we tried to reboot and it went into a blue screen reboot loop. The screen flashed so quickly that I could not read it. None of the safe modes worked. Luckily, Last Known Good Configuration did, and the server came back. I've changed the settings so it no longer reboots automatically on a crash and so it writes a dump for future crashes, as that option was also off.

    The server has 8 vCPUs and 8 GB of memory. It has two disks as well as two mapped raw LUNs, one very small and one of 1.3 TB. (The attached LUNs are from the same array as the VM datastore.) These LUNs were also attached when it was a physical machine. I was told this issue existed before it was virtualized, too.

    What is the best tool to use to capture the issue? I've tried perfmon, procmon, procexp, and even recently started exploring performance advisor, but I find that there's a real lack of good documentation out there on these tools with real world examples. I often find myself with data I don't know what to do with, or how to dig into it properly, or if I even captured what I wanted/needed. 

    I first need to find the main issue: memory, CPU, disk I/O, network, etc. Then I need to find which runaway process is crushing it once per week. I would be very grateful for any advice, suggestions, or direction.
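
    One way to approach that first step is to log a small set of counters covering all four suspects to a CSV file across the known window, so the data is on disk even if the interactive tools freeze. The sketch below only builds a typeperf command line. The counter selection, the 15-second interval, the 3-hour window, and the output path are all assumptions made for illustration, not recommendations from this thread.

```python
import subprocess
from typing import List

# Counter paths covering the four suspects (CPU, memory, disk, network).
# These are standard Windows counters; choosing these five is an assumption.
COUNTERS = [
    r"\Processor(_Total)\% Processor Time",
    r"\Memory\Available MBytes",
    r"\Memory\Pool Nonpaged Bytes",
    r"\PhysicalDisk(_Total)\Avg. Disk Queue Length",
    r"\Network Interface(*)\Bytes Total/sec",
]


def samples_for_window(window_minutes: int, interval_seconds: int) -> int:
    """Number of samples needed to cover the window at the given interval."""
    return (window_minutes * 60) // interval_seconds


def build_typeperf_command(output_csv: str, window_minutes: int = 180,
                           interval_seconds: int = 15) -> List[str]:
    """Build a typeperf argument list that logs COUNTERS to a CSV file."""
    count = samples_for_window(window_minutes, interval_seconds)
    return (["typeperf"] + COUNTERS +
            ["-si", str(interval_seconds),  # sample interval in seconds
             "-sc", str(count),             # stop after this many samples
             "-f", "CSV", "-o", output_csv,
             "-y"])                         # answer yes to overwrite prompts


if __name__ == "__main__":
    # Hypothetical output path; print the command for pasting into a console.
    print(subprocess.list2cmdline(build_typeperf_command(r"C:\perflogs\weekly.csv")))
```

    The script can run on any workstation. The printed command would then be run, or scheduled, on the server shortly before the expected window.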

    Sel



    Wednesday, June 19, 2013 2:00 PM

Answers

  • I think you should also rule out a memory leak in the application software installed on the server.

    If you have a test machine, you can install, one at a time, the same applications that run on the problematic VM.

    Don't install them all back to back: install the first application and watch it for a couple of days, or for however long you need to be satisfied that it is clean. Then install the next application.

    I think you have to do it this way because, as you said, every monitoring tool freezes up before it can capture a log.

    Good luck!


    Every second counts..make use of it.

    Friday, June 21, 2013 1:59 AM

All replies

  • Hi Sel,

    There are many tools, published by Microsoft and by third parties, for tracing issues like this.

    For initial monitoring, I would suggest that you:

    * Enable and monitor the event logs.

    * Investigate the error code reported in the blue screen dump, as described in http://support.microsoft.com/kb/972110

    * Refer to the link below for a few more tools for monitoring the server.

    http://msdn.microsoft.com/en-us/performance/cc825801.aspx

    Once you have found the issue, apply a remedy for it.

    HTH


    Thanks & Regards,
    Amit Katkar (MCITP Windows 2008)
    ------------------------------------------------------------
    This posting is provided "AS IS" with no warranties or guarantees and confers no rights.


    • Edited by Amit Katkar Wednesday, June 19, 2013 3:06 PM added correct link
    Wednesday, June 19, 2013 3:04 PM
  • Hi Amit,

    Thank you for the reply. 

    * I have been monitoring the event logs (System, Application, Security) for some time, and there is never anything definitive before, during, or after the severe slowdown.

    * I don't have any dump files yet. This slowdown problem has been going on for a while, but this is the first time it has blue screened after a reboot, so I just enabled the option to collect a dump going forward. The blue screen could be related, or it could be a separate problem from the weekly seizing up of the server.

    * It appears those tools are not compatible with Windows 2003.

    Sel



    • Edited by selymp Wednesday, June 19, 2013 7:21 PM updating
    Wednesday, June 19, 2013 6:46 PM
  • A good tool for inspecting BSOD dumps is BlueScreenView. Out of interest, what does this VM serve? Does your host provide any telemetry on exactly what happens and when, e.g. high CPU spikes, network load, or high disk I/O on the VM?
    Thursday, June 20, 2013 4:00 AM
  • Since you know the day and time, just do a simple check in Task Manager.

    During the window when you expect the machine to slow down, open Task Manager and check which process is consuming a lot of memory. That could at least give you a clue as to which process is bringing down the whole server.

    Also check the Performance tab to see whether CPU or memory usage spikes to its maximum.

    Have you checked Performance Monitor for any clues?

    Check out this link: http://www.windowsnetworking.com/articles-tutorials/windows-2003/Windows_2003_Performance_Monitor.html


    Every second counts..make use of it.

    Thursday, June 20, 2013 5:43 AM
  • The VM has some Siebel applications running on it. Sorry, I cannot elaborate more; I have nothing to do with the app side of things. Sadly, the VM performance tab on our hosts in vCenter has not displayed anything but real-time charts since our outsourcer took over and loaded monitoring software into it. *sigh*
    Thursday, June 20, 2013 2:01 PM
  • I have tried leaving Process Explorer open during this time, but it too stops responding before it reports anything of value. The only thing I noticed was that save.exe, which belongs to EMC NetWorker backups, was consuming a lot of resources. I ruled that out recently by removing the client from the console, killing the process, and completely uninstalling the client from the VM, and the problem still happened.

    I ran a Performance Monitor log a couple of weeks back for a few hours, and of course it was the one time in months that the server didn't freeze up completely, but I still thought I could wade through the logs to see what was using the most resources. Unfortunately, I can't make heads or tails of what I have, and trying to piece together how to use it from YouTube videos and forum posts has been a challenge thus far.

    I actually started with PerfMon, and that too was confusing to me, but I'll take a look at your link to see if I can make some sense of it. 

    Thanks for the replies. :)

    Sel

    Thursday, June 20, 2013 2:08 PM
  • Thanks for the reply. I think if I were going to go down that road (installing on a different server), I'd just create a fresh vm and ask the application owner to rebuild his environment. But since a lot of time and energy goes into these application set ups, I'm not sure how that will go over with that team. 

    Today's the day so I should get another shot at trying to capture the issue. :(

    Tuesday, June 25, 2013 1:45 PM
  • The plot thickens. Today I was again running Server Advisor and watching with Process Explorer while connected via RDP. The app guy IMs me saying he can no longer map to the server (any drive), nor can he manage the machine from another one. Meanwhile, I'm on the server working just fine. I try to map from my computer and from another server, and no dice. I tried to manage it from both: negative.

    I never saw any one process use more than 12% of the CPU (most of it was idle), and there was ample memory, so I decided to reboot it for him because he says the ability to map comes back after a reboot. So I did, and that's when it decided to hang. It took a long time to shut down, so I just reset the VM after about 5 to 7 minutes at the "shutting down" screen.

    I checked the Server Advisor report and don't see anything unusual. I started investigating on the SAN side, and I did find that the attached RDM is a metaLUN, which is probably spread out over a number of RAID groups. I still can't comprehend why it only happens on a certain day and time of the week, or why you can no longer manage or map to the server during this time. Something is seriously funky with this thing. *sigh*

    And as soon as I rebooted and the server came up, I could map to a drive again, the local drives of the vm and the raw disk attached too. So strange!
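
    A note on reading the 12% figure: Task Manager and Process Explorer report a process's CPU use as a share of all logical processors, so on a machine with 8 vCPUs a single thread that saturates one core shows up as roughly 12.5%. The arithmetic can be sketched as follows (the function names are illustrative only); it does not prove that a thread was spinning, only that 12% is not by itself evidence of a lightly loaded process.

```python
def single_core_share(vcpus: int) -> float:
    """Percent of total CPU capacity that one fully busy core represents."""
    if vcpus < 1:
        raise ValueError("vcpus must be at least 1")
    return 100.0 / vcpus


def cores_busy(total_percent: float, vcpus: int) -> float:
    """Convert a whole-machine CPU percentage into 'cores kept busy'."""
    return total_percent / single_core_share(vcpus)


if __name__ == "__main__":
    print(single_core_share(8))   # share of the machine held by one busy core
    print(cores_busy(12.0, 8))    # how many cores a 12% reading corresponds to
```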
    • Edited by selymp Tuesday, June 25, 2013 8:26 PM more info added
    Tuesday, June 25, 2013 8:21 PM
  • For the mapped drive issue, check out this article:

    http://support.microsoft.com/kb/297684?wa=wsignin1.0

    Or run this command:

    net config server /autodisconnect:-1


    Every second counts..make use of it.

    Wednesday, June 26, 2013 2:19 AM
  • Thanks again for the reply. I don't think that is the issue because that problem doesn't prevent you from accessing the shared drive, does it? I thought it disconnects but will reconnect when requested. 

    I probably should have been more detailed though. When I say "map to", in this case I meant access a shared drive on a remote server via the Run command. (\\server\Z$)

    So, when this issue happens, you can no longer map to a drive, make a new RDP connection, or manage the computer remotely via compmgmt.msc (until a reboot, and then like magic it all comes back).
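
    Because all three symptoms are remote ones, one option is to probe the server from another machine and timestamp exactly when each service stops answering. The sketch below is a minimal example under stated assumptions: it checks TCP 445 (SMB shares), 3389 (RDP), and 135 (the RPC endpoint mapper used by remote management), and the host name in the usage section is hypothetical. A successful TCP connect does not prove the service behind the port is healthy, so treat "up" as a weak signal and "DOWN" as the interesting one.

```python
import socket
from datetime import datetime
from typing import Dict

# Ports behind the three symptoms: SMB shares, RDP, and the RPC endpoint
# mapper that remote management tools rely on.
SERVICE_PORTS = {"smb": 445, "rdp": 3389, "rpc": 135}


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def probe(host: str, ports: Dict[str, int] = SERVICE_PORTS) -> Dict[str, bool]:
    """Check each named port once and return a name -> reachable mapping."""
    return {name: port_open(host, port) for name, port in ports.items()}


def format_line(results: Dict[str, bool], now: datetime) -> str:
    """Render one timestamped log line suitable for appending to a text file."""
    states = " ".join(f"{name}={'up' if ok else 'DOWN'}"
                      for name, ok in sorted(results.items()))
    return f"{now:%Y-%m-%d %H:%M:%S} {states}"


if __name__ == "__main__":
    # "server01" is a hypothetical host name; replace it with the real one.
    print(format_line(probe("server01"), datetime.now()))
```

    Run from a scheduled task every minute or so on a different machine, the resulting log would show whether SMB, RDP, and RPC drop at the same moment or one after another.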

    P.S. My ProcMon logs for this latest event were corrupt because I had to reboot the vm without taking the os down gracefully. :(


    • Edited by selymp Wednesday, June 26, 2013 6:14 PM added more info
    Wednesday, June 26, 2013 6:14 PM
  • Haven't found much new with this in the last week, but I'll get another shot at it tomorrow. My gut tells me to monitor the memory utilization for a handful of application processes.

    I'm trying to read up on the 32-bit memory address space and how much memory each process can use, but it's so confusing. On the memory side there are private bytes, working set, virtual memory size, etc., and for one process you sometimes see 10 or more instances (threads??), so I'm not sure whether each instance is counted on its own or whether I should be adding them up to tell if the process is leaking or using too much at a given moment.

    If anyone has a simple, straightforward way to monitor process memory usage with easy-to-read output, I'd love the help. I have played with PerfMon and set a bunch of counters for the suspect processes, but when I test them, the output is baffling to me. I also downloaded VMMap and would need a course to figure out what I'm looking at. I have Process Explorer open constantly, but it has no logging, so it's hard to capture what is happening in real time when things start crawling. I'll try ProcMon again (last week my logs were corrupted because I had to do a hard reboot).

    Are there any definite numbers or limits I should be looking for that would hint at a problem with one or more of the suspect processes? For example, the process with the largest virtual memory size right now is at 1 GB, but there are four other instances of the same name at between 120 MB and 200 MB each. Should I be adding them, or analyzing them individually?
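
    On the add-or-analyze question, both views can be useful. Entries with the same image name are separate processes (not threads), each with its own virtual address space, and on 32-bit Windows without the /3GB boot switch that space is limited to 2 GB of user-mode addresses per process. So the address-space limit is checked per instance, while the sum matters for how much the application as a whole takes from the server. The sketch below shows both, assuming the counter values have already been read into a dictionary keyed by Performance Monitor instance name (which appends #1, #2, ... to duplicates); the process name "app" and the sizes are made up to mirror the example above.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Default user-mode address space of a process on 32-bit Windows
# (without the /3GB boot switch): 2 GiB.
USER_VA_LIMIT = 2 * 1024 ** 3


def base_name(instance: str) -> str:
    """Strip Performance Monitor's '#N' suffix: 'app#2' -> 'app'."""
    return instance.split("#", 1)[0]


def summarize(virtual_bytes: Dict[str, int]) -> Dict[str, Tuple[int, int, float]]:
    """Per image name: (instances, summed bytes, largest instance as % of limit)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for instance, size in virtual_bytes.items():
        groups[base_name(instance)].append(size)
    return {
        name: (len(sizes), sum(sizes), 100.0 * max(sizes) / USER_VA_LIMIT)
        for name, sizes in groups.items()
    }


if __name__ == "__main__":
    MB = 1024 ** 2
    # Hypothetical readings of the Process object's "Virtual Bytes" counter.
    sample = {"app": 1024 * MB, "app#1": 120 * MB, "app#2": 150 * MB,
              "app#3": 180 * MB, "app#4": 200 * MB}
    for name, (count, total, worst) in summarize(sample).items():
        print(f"{name}: {count} instances, {total // MB} MB total, "
              f"largest at {worst:.0f}% of the 2 GB limit")
```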

    As always, any comments or advice are appreciated. 


    Monday, July 01, 2013 3:50 PM
  • "I should be adding them up to tell if the process is leaking or using too much at a certain moment in time."

    Please check which application is consuming a lot of memory.

    Analyze them individually: close the application you suspect is causing the problem (but make sure nobody is using it) and check system performance.

    Check out this link to find out more:

    What are Memory Leaks?

    http://msdn.microsoft.com/en-us/library/ms859408.aspx
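
    In a counter log, a leak typically shows up as a process's Private Bytes climbing steadily and never coming back down. One simple, readable summary is the average growth rate over the logged period. The sketch below fits a least-squares line to (time, private bytes) samples for a single process instance; how the samples are obtained, and what growth rate counts as suspicious, are left open here as assumptions the reader has to fill in.

```python
from typing import Sequence


def growth_per_hour(hours: Sequence[float],
                    private_bytes: Sequence[float]) -> float:
    """Least-squares slope of private bytes over time, in bytes per hour."""
    n = len(hours)
    if n < 2 or n != len(private_bytes):
        raise ValueError("need at least two matching samples")
    mean_t = sum(hours) / n
    mean_b = sum(private_bytes) / n
    covariance = sum((t - mean_t) * (b - mean_b)
                     for t, b in zip(hours, private_bytes))
    variance = sum((t - mean_t) ** 2 for t in hours)
    if variance == 0:
        raise ValueError("samples must span more than one point in time")
    return covariance / variance


if __name__ == "__main__":
    MB = 1024 ** 2
    # Hypothetical samples taken once an hour for one process instance.
    times = [0, 1, 2, 3, 4]
    sizes = [300 * MB, 310 * MB, 322 * MB, 329 * MB, 341 * MB]
    print(f"{growth_per_hour(times, sizes) / MB:.1f} MB per hour")
```

    A slope near zero over several days suggests the process is stable; a persistently positive slope for one instance is a reason to look at that process more closely.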

    If you have a software developer on hand, ask him or her for help.


    Every second counts..make use of it.

    Tuesday, July 02, 2013 10:15 AM