Disk related server performance?

  • Question

  • Hi

    At certain times, users on an Exchange cluster say their Outlook is slow. Switching them to cached mode has 'solved' the problem, but I think we're just masking it.

    I ran Perfmon on the server for the whole day. It's an Exchange 2007 Server, Single Copy Cluster, connected to an HP SAN.

    I'm not much of a SAN expert, so I was just comparing the results of Perfmon with other servers we don't have issues with.

    I can see that when the users are complaining of problems, the Avg. Disk sec/Read for the three affected mailbox stores gets much higher than for the others; we're talking values of about 0.147. Since the scale is 1000, I assume this means 147 ms? Am I correct?

    I saw this article here (http://www.r71.nl/kb/technical/185-disk-queue-length-vs-disk-latency-times-which-is-best-for-measuring-database-performance) that says really we should be looking at values less than 0.002. If so, this is really high!

    Am I correct? Are there any other tests we can run before we speak to my SAN team? I had some further questions...

    1. Is there any MS published data on what the values for Avg. Disk sec/Read and Avg. Disk sec/Write should be?

    2. The Avg. Disk sec/Write is OK; it seems to be the Avg. Disk sec/Read that has this high value for the database. I assume this means that users would have a problem 'reading' Exchange data? Does this tend to be more common?

    3. What's the 'fix' for this? More disks in the SAN to deal with the IO?
    Friday, April 15, 2011 10:26 PM

Answers

  • Yes, you're expected to run it during production hours to gauge the performance under normal conditions at peak user concurrency. Yes, your I/O latency is way too high: you're at 147 ms, and it should be under 20 ms. You need to get with your SAN guy and re-evaluate your disk layout, whether you have an incorrect RAID layout or you just don't have enough disks to support the I/O. You also need to do some more data gathering to determine where the bottleneck is. More than likely it's your disk, but you want to rule out bad HBA cards, caching issues, etc.
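
    As a quick sanity check on the arithmetic, here is a minimal Python sketch. The counter reports seconds, so 0.147 is 147 ms. The function names are made up for illustration, and the 20 ms limit is the figure from the TechNet guidance quoted in this thread.

```python
# Limit taken from the TechNet "Monitoring Mailbox Servers" guidance quoted in this thread.
READ_LATENCY_LIMIT_MS = 20.0


def seconds_to_ms(counter_value: float) -> float:
    """Avg. Disk sec/Read is reported in seconds; convert it to milliseconds."""
    return counter_value * 1000.0


def exceeds_limit(counter_value: float, limit_ms: float = READ_LATENCY_LIMIT_MS) -> bool:
    """True when a raw counter sample (in seconds) is above the latency limit."""
    return seconds_to_ms(counter_value) > limit_ms


observed = 0.147  # value reported by the original poster
print(f"{seconds_to_ms(observed):.0f} ms, over limit: {exceeds_limit(observed)}")
```

    The same check applies to any Avg. Disk sec/Read or sec/Write sample; only the limit would change if you use a different target.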

    The perfmon metrics are below.

    Monitoring Mailbox Servers
    http://technet.microsoft.com/en-us/library/bb201689(EXCHG.80).aspx

     

    LogicalDisk(*)\Avg. Disk sec/Read

    PhysicalDisk(*)\Avg. Disk sec/Read

    Shows the average time, in seconds, of a read of data from the disk.

    Note:
    When looking at disks using Perfmon.exe, an understanding of the underlying disk subsystem is key to determining which counters (physical disk or logical disk) to look at. Windows Clustering can use volume mount points to overcome the 26-drive limitation of the operating system, so drives may show up as numbers indicating physical disks rather than having drive letters. For more information about volume mount points, see Volume Mount Points and File Systems.

    Should be below 20 milliseconds (ms) at all times on average.

    For servers with more than 1,000 users, 20-ms disk times may not be fast enough to return responses to the client to accommodate user load. Check remote procedure call (RPC) averaged latencies to ensure these are within recommended values and adjust the disk subsystem for increased I/Os.
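
    If you logged a whole day, it may be easier to scan the log than to eyeball the graph. The sketch below assumes the Perfmon log was exported to CSV (for example with relog and its -f CSV option) and that the first column is the timestamp, with one column per counter. The server name, drive letters, and sample values are hypothetical, not taken from the poster's system.

```python
import csv
import io

LIMIT_MS = 20.0  # limit quoted from the TechNet guidance above


def slow_reads(csv_text, limit_ms=LIMIT_MS):
    """Return {counter_name: [(timestamp, latency_ms), ...]} for read samples over the limit."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    results = {}
    for row in reader:
        if not row:
            continue  # skip empty lines
        timestamp = row[0]
        for name, raw in zip(header[1:], row[1:]):
            if "Avg. Disk sec/Read" not in name:
                continue
            try:
                latency_ms = float(raw) * 1000.0  # counter is in seconds
            except ValueError:
                continue  # blank or non-numeric sample
            if latency_ms > limit_ms:
                results.setdefault(name, []).append((timestamp, round(latency_ms, 1)))
    return results


# Hypothetical sample data in the assumed CSV layout.
SAMPLE = r'''"(PDH-CSV 4.0) (GMT Standard Time)(0)","\\MBX01\LogicalDisk(E:)\Avg. Disk sec/Read","\\MBX01\LogicalDisk(F:)\Avg. Disk sec/Read"
"04/15/2011 10:00:00.000","0.147","0.008"
"04/15/2011 10:00:15.000","0.012"," "
'''

for counter, samples in slow_reads(SAMPLE).items():
    print(counter, samples)
```

    Listing the timestamps of the slow samples makes it easier to line them up with user complaints, and with whatever the SAN team sees on their side, before you talk to them.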


    James Chong MCITP | EA | EMA; MCSE | M+, S+ Security+, Project+, ITIL msexchangetips.blogspot.com
    Sunday, April 17, 2011 4:18 PM

All replies

  • Run the performance troubleshooter from the toolbox. If you choose the circumstances that you are seeing it will use the correct counters and then give you some feedback. That is what I would do to begin with.

    Simon.


    Simon Butler, Exchange MVP
    Blog | Exchange Resources | In the UK? Hire Me.
    Friday, April 15, 2011 11:20 PM
  • Hi Simon

    Can we run the performance troubleshooter at any time during the day? Does it cause an overhead to the server?

    Sunday, April 17, 2011 2:39 PM