locked
Another SQL MP Ver 6.7.7.0 question - having issues since upgrading RRS feed

  • Question

  • Hi,  I have upgraded my SQL MP to latest 6.7.7.0 and its turning into a bit of a nightmare.

    I've read may other post with people out there having similar issues - but haven't come across any confirmation of whether this is a buggy MP update (as has happened in the past).

    So was just firstly hoping if anyone can advise if this MP is good to go, and its just a case of working through my specific errors or am I wasting time if this version is inherently flawed.

    My issues are (mostly) multiple MSSQL2012: Discovery failed amd MSSQL: Monitoring failed.  I've tried a combination of several approaches to resolve - RunAs Account /Profile distribution to target SQL servers, Configuring the Always On settings for Component mgmt/WMI for replica nodes, using the SQL Addednum MP from Kevin Holman to use SSID for monitoring instead of using RunAs - all to no avail.  Sometimes one approach will resolve one alert but then be replaced by another eg Discovery alert disappears but then get monitoring failure alerts.

    So if anyone could point me to the latest state of play on this MP release (trawled loads of other posts which offer various answers which havent resolved my environment, or other post remain open/unanswered).

    Would be even better if there was a prescriptive guide (not the MP documentation) that could walk through how to set up monitoring of SQL from scratch which will work 100% of the time - I know this is a big ask since everyones environment is different but a bit Like Kevin holmans original set of guides for RunAs config for the 4/5 most common scenarios.  Even now I had read that Kevins addendum MP does away with the need for RunAs configuration which seemed to be contradicted in another post by the SQL team that "while we do not support monitoring SQL with SSID for HealthService"?...

    Apologies for the ramble/rant - but I've been wrestling with this for a couple of weeks not and not getting anywhere fast!

    Tuesday, November 1, 2016 11:52 AM

Answers

  • Yes, there is a known issue with using SID to monitor always on. The problem is that when SERVERA is checking the health of the AlwaysOn cluster, it might be checking the health on SERVERB, but SERVERA's health service SID doesn't have permissions on SERVERB. I have instructions from MS to take care of this, but it's really ugly. I'm seeing if there's a script or something easier to do this.

    "Fear disturbs your concentration"

    • Marked as answer by new2scom Thursday, November 17, 2016 10:38 AM
    Tuesday, November 1, 2016 3:51 PM
  • 1   -    I have updated my SQL addendum MP, and constantly update it as I find issues or customer recommendations.  6.7.2.0 had some issuse with the low priv script - which would result in some databases not getting a login (non-readable secondary replicas, etc.).  This would be easy to see and troubleshoot - as the documentation is clear that a SQL login to each DB is a requirement and you'd get an alert of failed to login to database.  :-)

    Regardless - you should be using my latest MP.

    2   -   I don't recommend going back to the old MP.  The new SQL MP is fine - it simply has TWO workflows (discoveries) that have issues when using Local System as the default agent action account and RunAs account for SQL workflows (via service SID or not).  It makes some ASSUMPTIONS that firewalls will be open and that the RunAs accounts will have network access.  Whether this was a good idea or not is debatable - as this is a significant change and one of the first MP's ever to need external access to other systems like that.... but regardless - it isnt hard to fix.  I posted a simple security configuration on my latest SQL addendum blog update:

    https://blogs.technet.microsoft.com/kevinholman/2016/08/25/sql-mp-run-as-accounts-no-longer-required-2/

     


    Kevin Holman http://blogs.technet.com/b/kevinholman

    • Marked as answer by new2scom Thursday, November 17, 2016 10:38 AM
    Thursday, November 3, 2016 5:47 AM

All replies

  • Yes, there is a known issue with using SID to monitor always on. The problem is that when SERVERA is checking the health of the AlwaysOn cluster, it might be checking the health on SERVERB, but SERVERA's health service SID doesn't have permissions on SERVERB. I have instructions from MS to take care of this, but it's really ugly. I'm seeing if there's a script or something easier to do this.

    "Fear disturbs your concentration"

    • Marked as answer by new2scom Thursday, November 17, 2016 10:38 AM
    Tuesday, November 1, 2016 3:51 PM
  • Hi thanks for this...

    So is there any working solution I can implement to use say for example Kevin Holmans Addendum MP to implement SSID For SCOM Health Service that eliminates the need for RunAs Config/distribution - EXCEPT for the SQL 2012 Always ON nodes, and for these servers I need to implement the traditional RunAs Account distribution to Specific Targets.

    Just to clarify - my standard monitoring account is 'Local System'.  I have also worked with the SQL Team to create a service account which I use as the RunAs Account - this account has SA permissions in SQL AND it is a member of an AD group which the SQL team add this Group to the local Administrators group of all the servers they manage, so therefore the RunAs account should also have local Admin effective permissions.

    In my scenario above can you/anyone suggest the best way to approach how to develop a monitoring plan for SQL going forward.  I've actually had to take a pretty drastic step of suspending monitoring of SQL on all our SQL estate (200+ servers) which is not a good thing to do but my console is being flooded with noise when testing various permutations.

    Any help MUCH appreciated...

    Tuesday, November 1, 2016 4:13 PM
  • Andy - can you shoot me an email with the instruction you got from MS on how to resolve it and I can investigate?

    new2scom:  are you seeing bad ALERTS in SCOM about configuration issues - or are you just seeing events logged on the always on servers logging errors in the OpsMgr event log?

    I have always on running and monitored with SID - I dont see any errors except, from the PRIMARY server of an Availability group - I see:

    Log Name:      Operations Manager
    Source:        Health Service Script
    Event ID:      7103
    Description:
    Management Group: SCOMA. Script: ModuleAction Module: DatabaseReplicaAlwaysOnDiscovery.ps1 Version: 6.7.7.0 : Error occured during Always On Database Replica discovery.
    Computer: SQL2A 
    Reason: Access is denied. (Exception from HRESULT: 0x80070005 (E_ACCESSDENIED)) 
    Position:214 
    Offset:13 

    and alert:   MSSQL 2012: Discovery failed

    Is this what you are seeing?


    Kevin Holman http://blogs.technet.com/b/kevinholman

    Tuesday, November 1, 2016 6:52 PM
  • Hi Kevin,

    just FYI I'm away from the office til next week but will definitely be keen to pick up this as a priority when I return..

    In the meantime from memory the alerts were a mixture of specific MSSQL 2012: Discovery Failure for SQL 2012 Always On nodes and a bunch of other MSSQL: Monitoring failure alerts for a number of different probes like stolen server memory, SPN registration and others.  My issue was been clouded by the fact I was trying different things to resolve but wasn't sure what the root cause was, whether it was the SQL Management pack being buggy or the limitation of Addendum MP were SID wasn't able to monitor 2012 AO components properly.  Just to say I did attempt to remove everything ie uninstall Addendum MP, and roll back to previous SQL MP version 6.7.2.0, Remove the RunAs config etc , but when I added changes back it I never got a working resolution and sort of went full circle with still no working solution.

    So at this stage I'd be happy to go back to a clean slate SQL monitoring wise, ie roll back/apply whatever 'good' version is recommended, and then apply the Addendum MP again if its recommended that it is a viable alternative to the cumbersome RunAs config/distribution method- even if it means that for certain groups of say 2012 Always on Nodes I have to use the Old RunAs method.  Am I right in thinking its OK to use a combination of methods ie the Addendum MP/SID method for the majority and standard RunAs for those 2012 servers which cause ongoing issues.

    If you could recommend a strategy for me to try and tackle this would be great ie should I definitely use SID method with the addendum MP (def prefer to remove need for RunAs if poss) and should I only use that alongside SQL MP ver 6.7.2.0 (which I think yhe Addendum MP was written for) or should I continue to try and deploy the latest 6.7.7.0 version and troubleshoot that.  Just for reference in terms of my SQL environment it would probably be akin to your 'scenario 5' in your guidance for setting up RunAs distribution as we have 200+ SQL serfers, 85% are managed internally by our SQL Team and the remaining 15% are hosted on our server infrastructure but are owned/managed directly by vendor/3rd party.

    Any advice much appreciated - I know I've maybe not answered your question directly but just wanted to respond off the top of my head before vacation - I would be very keen and happy to provide any additional detail on my return to help get to a workable solution for monitoring our SQL estate.

    Many thanks...

    Tuesday, November 1, 2016 9:24 PM
  • Just a quick update...

    I wasnt happy leaving my SCOM environment with SQL monitoring effectively non-existent so have pushed the holiday back a day to give me a chance to get back to some sort of workable state.

    So my plan is to:

    1. Completely uninstall SQL MP V.6.7.7.0 MP (and dependencies/Custom Override MP's) - DONE

    2. Reinstall SQL MP ver 6.7.2.0 - DONE

    3. Install Kevins Addendum MP and configure monitoring using SSID (Low Privilege) on all SQL agents - wont configure any RunAs account/distribution at this stage.

    This will hopefully get me to a place where I can at least start monitoring SQL again - I am expecting maybe some 2012 Always On issues - which I will pick up on again when I get back.

    Am hoping someone can then advise if I should/can safely proceed and upgrade SQL MP to Ver 6.7.7.0 (with addendum v 6.7.2.0)/SSID config) or if there is any new guidance/solutions for the 2012 Always On affected servers or whether I need to handle these with the original RunAs solution and distribute to these specific group of servers.

    Will keep you posted of any progress before I leave...

    Wednesday, November 2, 2016 10:23 AM
  • 1   -    I have updated my SQL addendum MP, and constantly update it as I find issues or customer recommendations.  6.7.2.0 had some issuse with the low priv script - which would result in some databases not getting a login (non-readable secondary replicas, etc.).  This would be easy to see and troubleshoot - as the documentation is clear that a SQL login to each DB is a requirement and you'd get an alert of failed to login to database.  :-)

    Regardless - you should be using my latest MP.

    2   -   I don't recommend going back to the old MP.  The new SQL MP is fine - it simply has TWO workflows (discoveries) that have issues when using Local System as the default agent action account and RunAs account for SQL workflows (via service SID or not).  It makes some ASSUMPTIONS that firewalls will be open and that the RunAs accounts will have network access.  Whether this was a good idea or not is debatable - as this is a significant change and one of the first MP's ever to need external access to other systems like that.... but regardless - it isnt hard to fix.  I posted a simple security configuration on my latest SQL addendum blog update:

    https://blogs.technet.microsoft.com/kevinholman/2016/08/25/sql-mp-run-as-accounts-no-longer-required-2/

     


    Kevin Holman http://blogs.technet.com/b/kevinholman

    • Marked as answer by new2scom Thursday, November 17, 2016 10:38 AM
    Thursday, November 3, 2016 5:47 AM
  • Hi Kevin,

    Thanks for explaining and pointing me to your updated Addendum MP.  After a rollback and reinstalling I'm now back to latest SQL MP Ver 6.7.7.0 and your Addendum 6.7.7.2...

    So plan now is to configure/enable 2008/2012 agents to use SSID and then run corresponding task to create the healthservice login for SQL as per the guide.  My original plan (as the SQL Team preferred as they always like most secure) was/is to create the SQL account as Low Priv, but incase/try to expedite/avoid troubleshooting other minor config issues, if I've run the task to create account as Low Priv but then want to try creating as SysAdmin can I just run that task aswell - or will the SQL Team first have to remove the account created as Low Priv first - basically can I run either ofr both create tasks on the same agent?

    One other query - not sure this is an real issue yet or just the fact that the import of MP's is still quite fresh and all discoveries etc have yet to complete...What I'm seeing is some agents/machines appearing in some of the SQL views including the Addendum seed discoveries, and I'm also getting some SQL Alerts (MSSQL:Monitoring Failed for the DiscoverySQL2008/12DBEngine scriptsfor agent but these machines have been identified as targets to 'undiscover SQL' - as they ar not machines our SQL Team manage - this has been done using the excludelist method on SQL DB Engine discovery method and also undiscover SQL role at the Installation Seed level - and these customizations have been saved to an Override pack which I've reimported after having the issues (and I've reran the remove-SCOMdisableClassInstance cmdlet)  So what I'm trying to say is - do you think there should be no issue/conflict over the addendum MP seed discovery/DB Engine computers listed with machines that have been targeted to actually Undiscover SQL role at the Windows computer Class/SQL DB Engine level?

    Sorry for going on a bit - but just to make it clear that after your guidance I'm in a MUCH better place with getting back on track with my SQL Monitoring and just also to say I've always had 100% faith in all your previous guidance/blog posts and reguarded you as a bit of a guru on all things SCOM related - so your help here have been invaluable so many thanks!!


    • Edited by new2scom Thursday, November 3, 2016 11:07 AM
    • Marked as answer by new2scom Thursday, November 17, 2016 10:37 AM
    • Unmarked as answer by new2scom Thursday, November 17, 2016 10:37 AM
    Thursday, November 3, 2016 11:07 AM
  • Hi Kevin (again, sorry)...just a quick update before signing off for today...

    Think I'm finally out of the woods now - after running the tasks for enabling SSID and creating the SQL login as Sysadmin, with a bit of patience and trying to force discovery along, I'm finally at the point where everything in the 2008/2012 Seed and DB Engine nodes of your addendum pack are showing green/healthy (apart from my SCOM SQL servers which I deliberately didnt target).

    I went through the additional config for our Always On nodes to add the relevant replica node to local admin group of the other and vise versa.  So far my console is free of any MSSQL Discovery/monitoring alerts - only actual SQL related ones indicating potential issues which may be due to new monitoring capabilities of the latest pack - Horray!...

    I'll see what happens overnight when I know all the background discoveries/configs have been picked up, but if all stays the same by this time tomorrow then happy days, and you dont know what a relief that is for me!!.. :-)

    Just before I finish- more out of curiosity which I may have mentioned previously:

    1. I created healthservice login as SA, but if (the SQL Team) want me to try Low Priv instead, can I just run that task and it will overwrite the existing login's permission accordingly or must the account be manually removed/not exist first - and similar if I was to reapply SA permission to replace Low Priv.

    2. If I have agents with SQL roles already undiscovered through override, is it possible they could still show up in the seed/DB Engine nodes of your Addendum MP - just incase there are any agents that look as though I need to perform the config steps but are actually not required.

    3. Re the SCOM SQL servers I have ignored this config for them as they as self monitoring and not an issue - am I correct in that assumption?...

    Many thanks again for all your help with this!!...Cheers...

    Thursday, November 3, 2016 5:40 PM
  • What's your email addy?

    "Fear disturbs your concentration"

    Thursday, November 3, 2016 9:05 PM