none
SQL Error 18456, Serverity 14, State 11 Help!!

    Question

  • Hi All

    We have a 4 node cluster, A/A/A/P and each node is giving error 18456 severity 14 state 11.

    The error is slightly different depending which node you are on.

    If its an active node it gives

    Login failed for user 'NT AUTHORITY\ANONYMOUS LOGON', Reason: Token-based server access validation failed with an infrastructure error. [CLIENT: xxx.xxx.xxx.xxx]

    If its the passive node you get

    Login failed for user 'DOMAIN\SERVER$', Reason: Token-based server access validation failed with an infrastructure error. [CLIENT: <named pipe>]

    The errors are happening on all nodes

    So node 1 gets login errors from 2,3,4, node 2 from 1,3,4, node 3 from 1,2,4 etc.

    I have run the below as suggested in one post but to no joy.

    declarecLogins cursor

          for

          select Name

            from sys.server_principals

           where type_desc LIKE  'WINDOWS%'

          

    opencLogins

    declare@login varchar(100)

    fetchcLogins into @login

    while@@FETCH_STATUS=0

          begin

                begin try

                      declare @sql nvarchar(max)

                      set @sql = 'alter login ' + quotename(@login,'[') + ' with  name = ' + quotename(@login,'[')

                      execute (@sql)

                end try

                begin catch

                      if ERROR_NUMBER() = 15098

                            print @login  + ' failed validation with the system SID. This user/group account has been recreated. You will need to drop and recreate the login and associated database user accounts'

                      else

                            print @login + ' - ' + error_message() + ' (' + cast(error_number() as varchar(10)) + ')'

                end catch

                fetch cLogins into @login

          end

    deallocatecLogins

    There are no Windows scheduled tasks which span all servers, no SSIS packages which span the servers, no SQL jobs which span servers, no linked servers.

    The only thing I see strange is that the MS Cluster Service is running as local system, but NT SERVICE\ClusSvc has public access only which is all it needs.

    Any help would be appreciated.

    Thanks

    Wednesday, February 22, 2012 1:41 PM

All replies

  • Do you get the same error when connecting through SSMS or SQLCMD using a domain account or domain group?

    Did you have any changes in your domain infrastructure recently? Was everything working fine before?


    SQL Server Database Administrator

    Wednesday, February 22, 2012 2:24 PM
  • We can connect to SSMS via Windows or SQL authentication successfully, its only when the nodes try to speak to other nodes.

    As a side note, I have run a trace for Audit Logon Failed and its coming back with something trying to run SQLCMD as local system as its trying to run as NT AUTHORITY\ANONYMOUS LOGON.  Just trying to find out what as I can see SQLCMD being spawned in task manager, unless its happening to fast for it to keep up.

    I recently started this new job only a week ago so cannot comment on any changes.

    One thing I have noticed is that as its a 3rd party who provide our hosting, there is a service called WOTS which runs SrvAny.exe and in the folder there is a load of SQL scripts, now I think this might be some custom monitoring the 3rd party has, think I might log a call to see if it calls SQLCMD and if we can disable the service to see if it does stop the errors and then tell them to make it use a domain account which we can give the nessesary rights.  But not holding my breath on that one.

    Wednesday, February 22, 2012 2:38 PM
  • If you have SQLCMD running under the LocalSystem account, it means the original process must have local administrator rights and must be using some kind of impersonation technique such as PSexec to impersonate the LocalSystem account and then run SQLCMD. If your instance has the "NT AUTHORITY\SYSTEM" login with sysadmin privileges (which is the installation default), then that process will gain control over your instance. Be careful.

    You should work with your hosting provider to solve the server to server authentication problem. It is something related to the domain infrastructure and/or account permission configurations. Also, you should check out when the problem started happening, and then try to correlate with any changes done to your domain infrastructure. Trying to find the cause with no history whatsoever will be more difficult.


    SQL Server Database Administrator

    Wednesday, February 22, 2012 3:38 PM