none
Avg Disk Sec/Read high on drive which hosts sql log files

    Question

  • Avg Disk Queue Length & Avg Disk Sec/Write is less than or around 20msec (=0.02sec) in our sql drives, which is good. Per MS recommendations it needs to be around or lesser than 15-20msec. We have different LUN's for different sql files, Data, Log, Sys dbs, Tempdb all live on different luns.

    There is only one drive where Avg Disk Sec/Read is high, around 0.08 secs. Now, this drive only hosts SQL user database log files. SQL log drive should be as fast as possible, on writes. Per, my knowledge & reading, SQL only reads from the log drive when it is starting up (Page 42, MCTS Exam 70-432 SQL 2008 Self paced Kit - Mike Hotek). So, I think the high #'s seen there are because of the hourly (& half hourly on 2 databases) litespeed transaction log backups that we take.

    We have opened a case & EMC doesn't see a issue on the drive. I think there is no performance issue with sql here, sql writes should be good & reads high # caused by t-log backups should be neglected. Am I right when I say that there is no performace issue here?

    Appreciate your comments.

    Tuesday, June 29, 2010 12:27 AM

Answers

  • If the EMC grab and the perfmon data are not matching up, then it could be an issue with the difference between the IO queue size on the SAN and the OS. Windows starts counting IO duration from the time the OS has posted the IO and not when the physical drive picked it up. Also, check if you are seeing any stuck stalled IO warnings in the SQL Errorlogs. If not and no performance degradation is noticed, then it might not be a big cause for concern.


    This posting is provided "AS IS" with no warranties, and confers no rights.
    My Blog: http://troubleshootingsql.wordpress.com
    Twitter: @banerjeeamit
    SQL Server FAQ Blog on MSDN: http://blogs.msdn.com/sqlserverfaq
    • Marked as answer by SQLRocker Tuesday, July 27, 2010 1:48 PM
    Wednesday, June 30, 2010 3:07 PM

All replies

  • Yes, I was using Lightspeed a couple of gigs ago, and it seemed to use a humongous amount of space for temp files now and again (we could tell when we ran out of disk space!), so that would show up on your stats somewhere.

    Josh

     

    Tuesday, June 29, 2010 1:12 AM
  • Thanks for replying Josh, However I don't think its related to my question,my question is about disk performance queuing & how t-log backups are contributing to increased read numbers on perfmon, which should be the normal behaviour, even without Litespeed , the native t-log backups will give you high Avg Disk Sec/Read. Litespeed has worked great for us btw.

    I just wanted to know if what I was thinking in my earlier post is right or am I missing something.

    Tuesday, June 29, 2010 1:31 AM
  • SQLRocker,

     

    There are many things that can cause reads in the Transaction log files and of course T-Log backup is only one of them. Diff backup [any kind of backup], checkpoint, transaction rollbacks, database mirroring, DBCC CHECKDB, crash recovery etc and there are few others also that can cause log reads.

     

    Reference:

     

    http://www.sqlskills.com/BLOGS/PAUL/post/What-can-cause-log-reads-and-other-transaction-log-questions.aspx

     

    You haven’t mentioned the RAID level you are using for the T-logs.


    http://SankarReddy.com/
    Tuesday, June 29, 2010 1:58 AM
  • Thanks for replying Josh, However I don't think its related to my question,my question is about disk performance queuing & how t-log backups are contributing to increased read numbers on perfmon, which should be the normal behaviour, even without Litespeed , the native t-log backups will give you high Avg Disk Sec/Read. Litespeed has worked great for us btw.

    I just wanted to know if what I was thinking in my earlier post is right or am I missing something.


    Well I think I'm confirming your earlier post, Litespeed is doing an extra read, and then writing its own files, which if it causes head movements will give you longer times, but overall is nothing to worry about, maybe you can redirect it to a different file for output if you have yet another LUN you can requisition.

    Josh

     

    Tuesday, June 29, 2010 4:14 AM
  • SQLRocker,

     

    There are many things that can cause reads in the Transaction log files and of course T-Log backup is only one of them. Diff backup [any kind of backup], checkpoint, transaction rollbacks, database mirroring, DBCC CHECKDB, crash recovery etc and there are few others also that can cause log reads.

     

    Reference:

     

    http://www.sqlskills.com/BLOGS/PAUL/post/What-can-cause-log-reads-and-other-transaction-log-questions.aspx

     

    You haven’t mentioned the RAID level you are using for the T-logs.


    http://SankarReddy.com/

    Yup, there are lots of things that can cause higher Read IO against logs and Paul covered them nicely in the above blog post mentioned by Sankar in response to a request I made to him by email.  Log backups are just a small part of what might actually be reading from the Log files in the database.  Triggers would be a big contributor to LOG Read operations, followed by CHECKPOINT in simple reovery, log shipping and mirroring.
    Jonathan Kehayias
    http://sqlblog.com/blogs/jonathan_kehayias/
    http://www.twitter.com/SQLSarg
    http://www.sqlclr.net/
    Please click the Mark as Answer button if a post solves your problem!
    Tuesday, June 29, 2010 6:19 AM
  • I'd also suggest looking at the Virtual File Stats for the transaction logs.  These will show how many physical read  operations occurred for each file.  This will give you a picture of how much SQL Server is using the I/O system.  Note that the stats are cumulative since server restart, so you may want to take a snapshot every 10 minutes so you can capture the before and after effects of your log backups.

     

    SELECT mf.physical_name, db_name(vfs.database_id) AS DatabaseName, vfs.*

    FROM sys.dm_io_virtual_file_stats(null, null) vfs

    JOIN sys.master_files mf ON vfs.database_id = mf.database_id AND vfs.file_id = mf.file_id

    ORDER BY vfs.database_id, vfs.file_id

    Tuesday, June 29, 2010 7:26 AM
  • Thanks Sankar, Josh, Jim & as always Jonathan.

    I went through the article by Paul & the only thing relevant to our environment is the t-log backup. So, the t-log backups are causing the high numbers that we are seeing in perfmon (we use SCOM which gives us perfmon #’s + history, which is great). Also, after 40 mins past the hr, the backups are over & we do not see any spikes till the next hour.  

    So is this an issue at all? As I was saying before, writes need to be fast which they are, should we be concerned at all about high read #’s.  Sankar, Log drive is RAID 10, Disk is aligned, FAU is 64KB.

    Jim, I checked your query too, the num of reads & num of bytes read increased after every t-log backup.

    Tuesday, June 29, 2010 6:14 PM
  • The Virtual File Stats query will tell you how many read operations SQL Server is performing, and as you've said that the transaction logs are the only things on the disk, it's safe to assume that all read/write operations of any consequence will show up in the Virtual File Stats.  This means that the number of reads, and the Bytes Read should be much smaller than the number of writes, so any increase in the time taken to perform the read or write will reflected in the average.  If you're only seeing 2000 reads in an hour, I wouldn't be worried.

    Also consider that writes will typically be in very small blocks, as each write operation on the transaction log must be hardened immediately.  This is in contrast to a log backup, which can comfortably issue a large read - I don't believe the counters you are examining care whether the I/O operation is small or large.  Have a look at the Avg. Disk Bytes/Read vs the Avg. Disk Writes/Read to see if you can account for a difference in the Avg Disk Sec counters that would drag the average time up for Read operations.

    Finally, consider the I/O stall times from the Virtual File Stats query.  If the read stall time is very small, then SQL Server has not had to wait long to perform the read operation.  Without seeing the numbers, it's hard to say whether I feel it's a performance issue, but it sounds OK - a slight performance degradation when performing log backups, but this is to be expected.

    Tuesday, June 29, 2010 9:16 PM
  • Average Disk Sec/Read would just tell you the difference between the time when the OS received an acknowledgement of the IO completion request and the time when the OS posted that particular IO operation. If the disk driver is choosing to buffer the IOs after a certain limit and choosing to send all the acknowledgements together, then that would also cause the perfmon to repro goofy numbers. I have seen this happen on virtualized environments. When you perform backups, your IO pattern on that particular disk changes (in your case LUN on the SAN). This is because we read fairly larger blocks (as already mentioned previously in this post) when the log file contents are being dumped out into a file. If your log backup sizes are large, then the reads would be higher because the first part of the log backup involves the calculation of the last checkpoint time, open transaction, last backup time etc. All this information is obtained from the log file. So, high number of reads are definitely expected during the time when log backups are occurring. Also, the perfmon numbers would also involve the normal operations which are occurring on the database.

     


    This posting is provided "AS IS" with no warranties, and confers no rights.
    My Blog: http://troubleshootingsql.wordpress.com
    Twitter: @banerjeeamit
    SQL Server FAQ Blog on MSDN: http://blogs.msdn.com/sqlserverfaq
    Tuesday, June 29, 2010 10:30 PM
  • Thanks Jim & Amit for replying. Jim, these are the #'s from the query:

    700 reads/hr < 2000reads/hr
    200789 Writes in an hr, which is way higher than reads
    715442176 num of bytes read
    795824128 num of bytes written, again higher

    IO Stall Read ms 54944 = around 1 min

    How do the numbers look to you?

    Amit, this is a physical server. So, if high # of reads are expected during a t-log backup, then this is not a issue, is that correct?

    Tuesday, June 29, 2010 11:46 PM
  • Unless and until you consistently see values hovering around 30ms or very frequent spikes over 30ms, it means that your disk is able to handle the increased workload without any issues. In your case, the high number of reads would be associated with the 1 hour duration between subsequent log backups. If it's possible to change the log backup frequency to 30 minutes and stagger the log backups for the user databases and check the values, you might notice a difference. A consistent value of 800ms is definitely a cause of concern in terms of latency provided perfmon is reporting the numbers correctly. If the value goes down after the log backup, then it would mean that the extra amount of IO that is performed during the log backup operation is too much for the disk to handle. Before you start changing the backup schedule and frequency, I would suggest a SAN capture along with the Perfmon to corelate the data and find out if the SAN is also reporting a latency during that time period when perfmon shows higher disk response times.


    This posting is provided "AS IS" with no warranties, and confers no rights.
    My Blog: http://troubleshootingsql.wordpress.com
    Twitter: @banerjeeamit
    SQL Server FAQ Blog on MSDN: http://blogs.msdn.com/sqlserverfaq
    Wednesday, June 30, 2010 12:30 AM
  • Thanks Amit, We already are taking t-log backups every half hr on 2 of our biggest databases & the rest are staggered. The SAN capture was taken but EMC hasn't found anything wrong on the disk.
    Wednesday, June 30, 2010 1:39 PM
  • If the EMC grab and the perfmon data are not matching up, then it could be an issue with the difference between the IO queue size on the SAN and the OS. Windows starts counting IO duration from the time the OS has posted the IO and not when the physical drive picked it up. Also, check if you are seeing any stuck stalled IO warnings in the SQL Errorlogs. If not and no performance degradation is noticed, then it might not be a big cause for concern.


    This posting is provided "AS IS" with no warranties, and confers no rights.
    My Blog: http://troubleshootingsql.wordpress.com
    Twitter: @banerjeeamit
    SQL Server FAQ Blog on MSDN: http://blogs.msdn.com/sqlserverfaq
    • Marked as answer by SQLRocker Tuesday, July 27, 2010 1:48 PM
    Wednesday, June 30, 2010 3:07 PM
  • Thanks again Amit, We donot see anything on the sql logs & there is no noticable performance degradation.
    Wednesday, June 30, 2010 5:19 PM