none
DatabaseIntegrityCehck fails intermittently.

    Question

  • Have been running the Ola Hallengren scripts on our new 2016 Enterprise server for about 6 weeks or so now.  The database Integrity script only works intermittently on some instances.  Others are fine.  I cannot find a cause for these failures.

    

    The failure on the 9<sup>th</sup> failure above for example, seems to be fine for some databases, but then just stops:


    I did run the checkdb manually against the database shown, and got the following initial message, but it appeared to complete ok:

    DBCC CHECKDB will not check SQL Server catalog or Service Broker consistency because a database snapshot could not be created or because WITH TABLOCK was specified.

    Completed though with no issues:

    This seems to relate to disk space, but my disk space have no issues:

    So I’m a bit stumped.

    Any clues, most appreciated.

    Regards

    Ian

    Monday, December 10, 2018 5:34 PM

Answers

  • On the Advanced tab for the Step definition you can direct out of the job to a file. You need to do that, so that you can get full error messages.

    I'm not sure where the disk space came into the picture, but a common error with DBCC is OS error 665 which occurs when the hidden snapshot for DBCC cannot be created or be maintained. This is often due to fragmentation in the file system.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

    • Marked as answer by IanSE2950 Tuesday, December 11, 2018 10:42 AM
    Monday, December 10, 2018 10:39 PM
  • Your screenshot shows the partial output message and doesn't help to identify the issue. 

    Since you're using Ola's solution, you need to log to output files to see the full error message. By default, the output files are created in the SQL error log directory. 

    Once you have the error message logged in the output file, read it and see if it gives any pointers to why the job's failing. It's difficult to guess what the error could possibly be without seeing the complete error message.


    Please remember to click "Mark as Answer" if my response answered your question or click "Vote as helpful" if it helped you in any way.

    • Marked as answer by IanSE2950 Tuesday, December 11, 2018 10:42 AM
    Tuesday, December 11, 2018 2:51 AM

All replies

  • On the Advanced tab for the Step definition you can direct out of the job to a file. You need to do that, so that you can get full error messages.

    I'm not sure where the disk space came into the picture, but a common error with DBCC is OS error 665 which occurs when the hidden snapshot for DBCC cannot be created or be maintained. This is often due to fragmentation in the file system.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

    • Marked as answer by IanSE2950 Tuesday, December 11, 2018 10:42 AM
    Monday, December 10, 2018 10:39 PM
  • Your screenshot shows the partial output message and doesn't help to identify the issue. 

    Since you're using Ola's solution, you need to log to output files to see the full error message. By default, the output files are created in the SQL error log directory. 

    Once you have the error message logged in the output file, read it and see if it gives any pointers to why the job's failing. It's difficult to guess what the error could possibly be without seeing the complete error message.


    Please remember to click "Mark as Answer" if my response answered your question or click "Vote as helpful" if it helped you in any way.

    • Marked as answer by IanSE2950 Tuesday, December 11, 2018 10:42 AM
    Tuesday, December 11, 2018 2:51 AM
  • Just run Ola's DBCC command in the query window and see what is going on

    Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Consulting: Large scale of database and data cleansing
    Remote DBA Services: Improves MS SQL Database Performance
    SQL Server Integration Services: Business Intelligence

    Tuesday, December 11, 2018 5:54 AM
    Answerer
  • Hi everyone

    Thanks for the replies.

    I ran the output to a file as requested.  Actually ran this on different instance experiencing the same issues, all databases except one succeeded.  The failure message was as follows, not much help over what has been shown above:

    Date and time: 2018-12-11 10:09:19
    Command: DBCC CHECKDB ([xxxxxxx]) WITH NO_INFOMSGS, ALL_ERRORMSGS, DATA_PURITY
    Msg 5030, Level 16, State 12, Server XXXXX\XXXX, Line 1
    The database could not be exclusively locked to perform the operation.
    Msg 7926, Level 16, State 1, Server XXXXX\XXXX, Line 1
    Check statement aborted. The database could not be checked as a database snapshot could not be created and the database or table could not be locked. See Books Online for details of when this behavior is expected and what workarounds exist. Also see previous errors for more details.
    Outcome: Failed
    Duration: 00:00:20
    Date and time: 2018-12-11 10:09:39


    Tuesday, December 11, 2018 10:18 AM
  • Hi everyone

    When I reread message above, it gave me the extra clue to check the simple things.  I was overthinking and didn't check the obvious.  Turns out that some users were leaving persistent connections, and quite simply it was just the reason above ie Msg 5030!!!

    Oh dear....now just got to figure out how to get users to ensure they're logged out.  However, how would these consistency checks work where you may have a database that is accessed all the time, eg backend for websites?

    However, thanks for idea to redirect output, I changed the filename to make easier to locate, under the advanced options.....all working ok now :-)

    Regards

    Ian

    Tuesday, December 11, 2018 10:42 AM
  • DBCC CHECKDB *is* online, using an (internal) database snapshot.

    I would check how you configured that job. It seems like you (Ola's procedure) are executing DBCC CHECKDB with the TABLOCK option. Check the scheduled job, the job step and post back what it look like. We will take a look if you are specifying TabLock = 'Y'.

    If not, then it seems like Ola's DatabaseIntegrityCheck procedure captures the error fro a "regular" CHECKDB and falls back to using the TABLOCK option. That would mean that you have a problem, for instance with a highly fragmented file system so that the database snapshot cannot be produced.


    Tibor Karaszi, SQL Server MVP (Web Blog)

    Tuesday, December 11, 2018 1:16 PM
    Moderator

  • Check statement aborted. The database could not be checked as a database snapshot could not be created and the database or table could not be locked. See Books Online for details of when this behavior is expected and what workarounds exist. Also see previous errors for more details.

    This error is expected if the database snapshot cannot be created. When this happens, DBCC CHECKDB tries to acquire locks to get a transactionally consistent image of the database (offline checking). It tries to acquire the database-level exclusive lock to perform the consistency checks. It doesn't wait indefinitely to acquire locks, and times out if the exclusive lock cannot be obtained in a timely manner. 

    You'd have to investigate why CHECKDB is unable to generate a DB snapshot. Two reasons that come to mind are: your DB is on a non-NTFS file system or, you specified TABLOCK option in the DBCC CHECKDB command. 


    Please remember to click "Mark as Answer" if my response answered your question or click "Vote as helpful" if it helped you in any way.

    Tuesday, December 11, 2018 1:40 PM
  • This is the script (no Tablock I can see) - nevertheless when I "kicked" user off who was connected to offending database, it worked.  Is the default to try and lock the database? :

    /****** Object:  Job [DatabaseIntegrityCheck - USER_DATABASES]    Script Date: 11/12/2018 14:58:30 ******/
    BEGIN TRANSACTION
    DECLARE @ReturnCode INT
    SELECT @ReturnCode = 0
    /****** Object:  JobCategory [Database Maintenance]    Script Date: 11/12/2018 14:58:30 ******/
    IF NOT EXISTS (SELECT name FROM msdb.dbo.syscategories WHERE name=N'Database Maintenance' AND category_class=1)
    BEGIN
    EXEC @ReturnCode = msdb.dbo.sp_add_category @class=N'JOB', @type=N'LOCAL', @name=N'Database Maintenance'
    IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback

    END

    DECLARE @jobId BINARY(16)
    EXEC @ReturnCode =  msdb.dbo.sp_add_job @job_name=N'DatabaseIntegrityCheck - USER_DATABASES', 
    @enabled=1, 
    @notify_level_eventlog=2, 
    @notify_level_email=3, 
    @notify_level_netsend=0, 
    @notify_level_page=0, 
    @delete_level=0, 
    @description=N'Source: https://ola.hallengren.com', 
    @category_name=N'Database Maintenance', 
    @owner_login_name=N'sa', 
    @notify_email_operator_name=N'IT Admins', @job_id = @jobId OUTPUT
    IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
    /****** Object:  Step [DatabaseIntegrityCheck - USER_DATABASES]    Script Date: 11/12/2018 14:58:30 ******/
    EXEC @ReturnCode = msdb.dbo.sp_add_jobstep @job_id=@jobId, @step_name=N'DatabaseIntegrityCheck - USER_DATABASES', 
    @step_id=1, 
    @cmdexec_success_code=0, 
    @on_success_action=1, 
    @on_success_step_id=0, 
    @on_fail_action=2, 
    @on_fail_step_id=0, 
    @retry_attempts=0, 
    @retry_interval=0, 
    @os_run_priority=0, @subsystem=N'CmdExec', 
    @command=N'sqlcmd -E -S $(ESCAPE_SQUOTE(SRVR)) -d master -Q "EXECUTE [dbo].[DatabaseIntegrityCheck] @Databases = ''USER_DATABASES'', @LogToTable = ''Y''" -b', 
    @output_file_name=N'$(ESCAPE_SQUOTE(SQLLOGDIR))\DatabaseIntegrityCheck_$(ESCAPE_SQUOTE(JOBID))_$(ESCAPE_SQUOTE(STEPID))_$(ESCAPE_SQUOTE(STRTDT))_$(ESCAPE_SQUOTE(STRTTM)).txt', 
    @flags=0
    IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
    EXEC @ReturnCode = msdb.dbo.sp_update_job @job_id = @jobId, @start_step_id = 1
    IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
    EXEC @ReturnCode = msdb.dbo.sp_add_jobschedule @job_id=@jobId, @name=N'User Databases - Integrity Check - Schedule', 
    @enabled=1, 
    @freq_type=8, 
    @freq_interval=64, 
    @freq_subday_type=1, 
    @freq_subday_interval=0, 
    @freq_relative_interval=0, 
    @freq_recurrence_factor=1, 
    @active_start_date=20180615, 
    @active_end_date=99991231, 
    @active_start_time=230000, 
    @active_end_time=235959, 
    @schedule_uid=N'ef1aef58-1b57-4398-b933-31a61e4ba626'
    IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
    EXEC @ReturnCode = msdb.dbo.sp_add_jobserver @job_id = @jobId, @server_name = N'(local)'
    IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
    COMMIT TRANSACTION
    GOTO EndSave
    QuitWithRollback:
        IF (@@TRANCOUNT > 0) ROLLBACK TRANSACTION
    EndSave:
    GO

    Tuesday, December 11, 2018 3:02 PM
  • This is the script (no Tablock I can see) - nevertheless when I "kicked" user off who was connected to offending database, it worked.  Is the default to try and lock the database? :

    No, the default is to use an internal database snapshot. Only when a snapshot cannot be created is when the CHECKDB resorts to locking the DB to get a consistent image. In your case, locate the offending database and see why a snapshot cannot be created on it. Check the volume where the database files are stored and ensure it's NTFS formatted. 

    Please remember to click "Mark as Answer" if my response answered your question or click "Vote as helpful" if it helped you in any way.

    Tuesday, December 11, 2018 4:48 PM
  • All the databases reside on the same SAN volume.

    But this still doesn't explain why if I kick users off like I did the other day, it worked fine, given what you are saying?

    Regards

    Ian

    Wednesday, December 12, 2018 9:08 AM
  • Hi everyone

    Thanks for the replies.

    I ran the output to a file as requested.  Actually ran this on different instance experiencing the same issues, all databases except one succeeded.  The failure message was as follows, not much help over what has been shown above:

    Date and time: 2018-12-11 10:09:19
    Command: DBCC CHECKDB ([xxxxxxx]) WITH NO_INFOMSGS, ALL_ERRORMSGS, DATA_PURITY
    Msg 5030, Level 16, State 12, Server XXXXX\XXXX, Line 1
    The database could not be exclusively locked to perform the operation.
    Msg 7926, Level 16, State 1, Server XXXXX\XXXX, Line 1
    Check statement aborted. The database could not be checked as a database snapshot could not be created and the database or table could not be locked. See Books Online for details of when this behavior is expected and what workarounds exist. Also see previous errors for more details.
    Outcome: Failed
    Duration: 00:00:20
    Date and time: 2018-12-11 10:09:39


    As per error log it shows the DBCC Command Required Exclusive Lock on Database Level to perform snapshot before starting.

    It makes sense that something is blocking the DBCC command while taking the exclusive lock.

    I am aware you had your issue resolve by kicking that user but out of curiosity

    Can you check the isolation level of the user connection which is causing the problem and also for any open transactions the user connection has or any procedure running or maybe any transaction rollbacks etc and post the details ?

    ----------------------------

    Please Mark This As Answer if it solved your issue 
    Please Vote This As Helpful if it helps to solve your issue 
    Yuvraj Singh Bais 
    ----------------------------


    Wednesday, December 12, 2018 9:41 AM
  • Probably a little inaccurate to imply I "kicked" the users off.  What I did do, was to ask the user to gracefully disconnect themselves from the database.

    This issue occurs on any databases that have open connections when the integrity check is run by the looks of it.

    The original schedule was to do this at 10am on a Saturday.  I've changed this to 11pm on a Saturday now, which should hopefully lessen the chance of someone using the database, although we do have some staff who leave themselves logged in with open connections.

    However, when this happens again I will check the details above, could you advise about Isolation level?

    Regards

    Ian

    Wednesday, December 12, 2018 9:50 AM
  • I would suggest you have a quick look on the below for isolation levels

    https://www.red-gate.com/simple-talk/sql/t-sql-programming/questions-about-t-sql-transaction-isolation-levels-you-were-too-shy-to-ask/#tenth

    ----------------------------

    Please Mark This As Answer if it solved your issue 
    Please Vote This As Helpful if it helps to solve your issue 
    Yuvraj Singh Bais 
    ----------------------------

    Wednesday, December 12, 2018 10:12 AM
  • .. and did you check the file system? It is NTFS? Something else that seems fishy?

    Tibor Karaszi, SQL Server MVP (Web Blog)

    Wednesday, December 12, 2018 1:14 PM
    Moderator
  • Yes they are all NTFS volumes.  Indeed, the other working scripts, the databases etc are all stored on the SAN area, so definitely nothing to do with permissions.
    Wednesday, January 2, 2019 11:06 AM