none
high Write latencies in datafiles for some large databases.

    Question

  • Needed your guidance to fix one of the performance issue where the writes are slow.

    Some of the information:
    SQL Server 2008R2.
    OLTP environment.
    32GB RAM.
    16 Disks Raid 10.
    Write latency on some of the db files is over 700 ms (virtual file stats dmv).
    Over all DISK (not just file) latency in peak hours (perfmon)  goes frequently to 200 ms.
    Reads are pretty quick (below 30ms) in comparison to writes.
    Writes in the log file are pretty quick (below 10ms) in comparison to writes to the data files.

    I don’t see a huge buffer pool utilization (below 20 most of the times even during the peak hours) so I feel there is not a lot of activity which the server processes.

    This is a pure OLTP environment which has 60% select and 40% DML (inserts, updates, deletes).

    When there are more DMLs particularly the inserts as the new orders are accepted by the application, the database shows a lot of slowness and user complain that they are not able to generate the order sometime.
    I feel the application has a design issue causing page splits as I found the same using fn_dblog and the perfmon counter page splits/sec.

    I am then planning to move the fill factor all the way to 70% instead of 90% currently we have as the server default.
    I know will impact the selects but I feel selects are not causing much of the issues.
    We are also planning to add and create separate RAID disk groups for log, tempdb, data files and also move the largest (busiest) database to its own filegroup.

    Apart from page splits and creating separate disk groups, do you guys see any other thing that I may consider for fixing this high writes latency issue.

    Wednesday, March 12, 2014 5:18 AM

All replies

  • What does the below script return?

    WITH Waits AS
    (
      SELECT
        wait_type,
        wait_time_ms  /1000. AS wait_time_s, 
             100. * wait_time_ms / SUM(wait_time_ms)OVER()AS pct,
             ROW_NUMBER() OVER(ORDER BY wait_time_ms DESC) AS rn
          FROM sys.dm_os_wait_stats
          WHERE wait_type NOT IN (N'CLR_SEMAPHORE',    N'LAZYWRITER_SLEEP',
            N'RESOURCE_QUEUE',   N'SQLTRACE_BUFFER_FLUSH',
            N'SLEEP_TASK',       N'SLEEP_SYSTEMTASK',
            N'WAITFOR',          N'HADR_FILESTREAM_IOMGR_IOCOMPLETION',
            N'CHECKPOINT_QUEUE', N'REQUEST_FOR_DEADLOCK_SEARCH',
            N'XE_TIMER_EVENT',   N'XE_DISPATCHER_JOIN',
            N'LOGMGR_QUEUE',     N'FT_IFTS_SCHEDULER_IDLE_WAIT',
            N'BROKER_TASK_STOP', N'CLR_MANUAL_EVENT',
            N'CLR_AUTO_EVENT',   N'DISPATCHER_QUEUE_SEMAPHORE',
            N'TRACEWRITE',       N'XE_DISPATCHER_WAIT',
            N'BROKER_TO_FLUSH',  N'BROKER_EVENTHANDLER',
            N'FT_IFTSHC_MUTEX',  N'SQLTRACE_INCREMENTAL_FLUSH_SLEEP',
            N'DIRTY_PAGE_POLL',  N'SP_SERVER_DIAGNOSTICS_SLEEP',
    N'ONDEMAND_TASK_QUEUE',N'BROKER_RECEIVE_WAITFOR'
    )
    )
          --filter out additional irrelevant waits
    SELECT
      W1.wait_type,
      CAST(W1.wait_time_s AS DECIMAL(12,2))AS wait_time_s,
      CAST(W1.pct AS DECIMAL(12,2))AS pct,
      CAST(SUM(W2.pct)AS DECIMAL(12,2))AS running_pct
    FROM Waits AS W1
      JOIN Waits AS W2
        ON W2.rn<= W1.rn
    GROUP BY W1.rn, W1.wait_type, W1.wait_time_s, W1.pct
    HAVING SUM(W2.pct)-W1.pct < 90-- percentage threshold
    ORDER BY W1.rn;


    Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Consulting: Large scale of database and data cleansing
    Remote DBA Services: Improves MS SQL Database Performance
    SQL Server Integration Services: Business Intelligence

    Wednesday, March 12, 2014 5:55 AM
  • Uri, I am glad you responded. wait_type wait_time_s pct running_pct DBMIRROR_EVENTS_QUEUE 4962658.71 42.26 42.26 DBMIRRORING_CMD 4716832.07 40.17 82.43 LCK_M_IX 489246.39 4.17 86.59 PAGEIOLATCH_SH 386267.49 3.29 89.88 LCK_M_U 306206.13 2.61 92.49 Should be Mirroring related waits not be in exclusion list? Amongst the above, I can confirm that DBMIRROR_EVENTS_QUEUE, DBMIRRORING_CMD and PAGEIOLATCH_SH are frequently shows in waits. The server was restarting during the weekend (THree days back).
    Wednesday, March 12, 2014 9:57 AM
  • resending in the correct format

    Uri, I am glad you responded.

    Should be Mirroring related waits not be in exclusion list? Amongst the above, I can confirm that DBMIRROR_EVENTS_QUEUE, DBMIRRORING_CMD and PAGEIOLATCH_SH are frequently shows in waits. The server was restarting during the weekend (THree days back).

    Wednesday, March 12, 2014 10:01 AM
  • Hi,

    Guess mirroring is configured for your environemnt .I would like you to see REDO queue and SEND queue.How are they behaving is your mirror  lagging behind principal a lot.What type of mirroring is it Sync or Async.

    For mirroring best practice please refer to below links

    http://technet.microsoft.com/en-us/library/cc917681.aspx

    Things to consider when configuring mirroring

    One cannot rule out possibility of disk being slow.Or network not able to cope up with logs being generated to send.You should parallely ask your SAN team to look at disks .

    How often do you rebuild indexes and update stats.This could also be due to indexes being fragmented or missing indexes.


    Please mark this reply as the answer or vote as helpful, as appropriate, to make it useful for other readers


    • Edited by Shanky_621 Wednesday, March 12, 2014 11:17 AM
    Wednesday, March 12, 2014 10:58 AM
  • Thanks Shanky,

    Let me post the latest wait stats. I cleared the historical one six hours back so these stats represent the peak hours and probably the clear picture.

    wait_type            wait_time_s    pct        running_pct
    DBMIRROR_EVENTS_QUEUE        113786.19    51.17        51.17
    DBMIRRORING_CMD            92709.71    41.70        92.87

    Missing indexes, statistics, or fragmentation must not be a problem.
    Disk is certainly slow thats why we have ordered a new disk array.

    I dont have much experience in Mirroring so can you tell me some quick setting that I may change to make it faster.

    Wednesday, March 12, 2014 2:09 PM
  • Thanks Shanky,

    I dont have much experience in Mirroring so can you tell me some quick setting that I may change to make it faster.

    See main factors are CPU RAM and Network bandwith connecting Principal to Mirror.Principal and mirror should be as close in configuration as possible.

    You should know how much transaction your database is generating in a min or within half hour.So that you can decide how much bandwith you require

    Make sure you have proper value for MAX SERVER MEMORY for SQL server instance .

    Check redo logs and send queue to see how much transaction is pending on Mirror and how much is log generated repectively.

    If you are OK with some data loss go for Async mirroring (this also requires Enterprise edition)

    Know peak time of your transactions make sure no other resource puts load on network during this time.

    Instead of listening to me I would advise you to have a look at below article.It has all details

    http://support.microsoft.com/kb/2001270/en-gb 


    Please mark this reply as the answer or vote as helpful, as appropriate, to make it useful for other readers

    Wednesday, March 12, 2014 2:45 PM
  • I feel the baisc hygene issues in my environment are pretty much in place.

    Mirroring is ASYNC here.

    I digged further today and found that: 
    SELECT statement with update lock is causing the issue.

    There had been other issues with this environment too but gradually we fixed them all.
    This update lock looks to the main culprit which is shown in the wait stats (LCK_M_U) too.

    business says that they need to put this update lock on the queries in order to maintain consistency.
    I am too analyzing from my side.  Will post you guys with more observations.

    Wednesday, March 12, 2014 4:32 PM