Second SCOM Management Server SPN issue

  • Question

  • Hello, 

    We stood up a second management server two months ago, and up until our patching weekend (when it restarted) we had no issues with it joining the Resource Pool and running normally. After the server rebooted, it was no longer able to keep the Data Access Service running. Long story short, I believe this is because the SPNs for this server were not set correctly after configuration.

    Here is my question: if we run the setspn -a command and restart the SDK service, will we experience a service outage?

    Would it be best to run the setspn -a command only on the second management server? The reason I ask is that we are worried that correcting the SPN issue will break the entire SCOM service and leave us with a worse problem and a prolonged outage. By running it only on the second management server, which is already down, we are hoping to prevent that outage.

    Thanks for any advice or help!

    Monday, March 2, 2015 7:15 PM

Answers

  • If this is an SPN issue, refer to the post below to fix it:

    http://blogs.technet.com/b/kevinholman/archive/2011/08/08/opsmgr-2012-what-should-the-spn-s-look-like.aspx

    Also, is the SDK service account the same on both management servers? Verify that by checking Services.msc on both management servers.

    Also, in Services.msc, change the logon account for the SDK service to a local account, then change it back to the domain account and start/restart the service to see if it is stable.

    Also, try adding the SCOM Action account and the SDK account as local administrators on the second MS and check again.


    Gautam.75801

    • Proposed as answer by Yan Li_ Monday, March 9, 2015 2:49 AM
    • Marked as answer by Yan Li_ Thursday, March 12, 2015 9:13 AM
    Tuesday, March 3, 2015 5:08 PM
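
As a rough illustration of what the linked post describes, the sketch below builds the setspn command lines for the Data Access (SDK) service SPNs of a single management server. It assumes the usual `MSOMSdkSvc/<short name>` and `MSOMSdkSvc/<FQDN>` pair registered on the SDK service account. The server name `ms02`, the domain `contoso.local`, and the account `CONTOSO\svc-sdk` are hypothetical placeholders, and `-S` is used in place of `-a` because it checks for duplicates before adding. Treat it as a sketch under those assumptions and verify the expected SPNs against the post before running anything.

```python
def expected_sdk_spns(short_name: str, dns_domain: str) -> list[str]:
    """Return the two SDK service SPNs assumed for one management server."""
    fqdn = f"{short_name}.{dns_domain}"
    return [f"MSOMSdkSvc/{short_name}", f"MSOMSdkSvc/{fqdn}"]


def setspn_commands(short_name: str, dns_domain: str, sdk_account: str) -> list[str]:
    """Build setspn command lines that register the SPNs on the SDK account."""
    # -S adds the SPN only after checking that it is not registered elsewhere.
    return [
        f"setspn -S {spn} {sdk_account}"
        for spn in expected_sdk_spns(short_name, dns_domain)
    ]


if __name__ == "__main__":
    # Hypothetical server, domain, and account names; substitute your own.
    for cmd in setspn_commands("ms02", "contoso.local", r"CONTOSO\svc-sdk"):
        print(cmd)
```

The generated commands only add entries for the named server to the account in Active Directory; they do not restart any service themselves.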

All replies

  • Hi There,

    Just a question. How can you confirm this is an SPN issue? Was there a console alert or an event log entry pointing to the SPN issue? Like the one below?

    Also, please post the event logs here for analysis. This can happen for many reasons, e.g. a logon failure.

    Posting the errors / alerts / events here will help with the analysis.


    Gautam.75801

    Tuesday, March 3, 2015 5:03 AM
  • Thanks for the help! 

    Yes, we received Event ID 26371 immediately after the SCOM service came back online. We saw all the correct events in order, starting with 102, 105, 326, 26361, 2900, 21031, 2002, 7026, 7019, and then a bunch of 20021 as the agents came back up. I can't post the event log as it contains all of our FQDNs and what-not. After the 26371 event we are seeing 26380, 26338, 26339, 33333, and then it repeats with the SDK restarting (event ID 26361) and then 26371, 26338, 26331, 33333, 26339, and finally a 26380 killing the SDK.

    Since I ran setspn -l and saw that our SPNs are incorrect, I am assuming that I need to register the SPNs correctly. My worry is that if I do so, we will experience a service disruption even worse than what we have now, which is that everything is running fine on our first Management Server but we can't get the Data Access Service to run on our second Management Server. As I have been digging further into the event log, I am seeing that most of the errors after the 26371 indicate that our OperationsManager DB is read-only. This is another reason I believe it is the SPN issue, as our original Management Server is reading from and writing to the OperationsManager DB just fine.

    Tuesday, March 3, 2015 4:50 PM
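
The check described in the post above (listing SPNs with setspn -l and comparing them with what should be there) can be sketched as a small script. It assumes the listing has one unindented header line followed by one indented SPN per line, and it compares names case-insensitively. The sample listing, account, and server names are made up for illustration and are not taken from this environment.

```python
def parse_setspn_listing(output: str) -> set[str]:
    """Collect SPNs from `setspn -L` text: the indented lines after the header."""
    spns = set()
    for line in output.splitlines():
        # The header line is not indented; each SPN line is.
        if line[:1] in (" ", "\t") and line.strip():
            spns.add(line.strip().lower())
    return spns


def missing_spns(output: str, expected: list[str]) -> list[str]:
    """Return the expected SPNs that do not appear in the listing."""
    registered = parse_setspn_listing(output)
    return [spn for spn in expected if spn.lower() not in registered]


# Hypothetical listing for an SDK account that only has the first server's SPNs.
sample = (
    "Registered ServicePrincipalNames for CN=svc-sdk,OU=Service,DC=contoso,DC=local:\n"
    "\tMSOMSdkSvc/ms01\n"
    "\tMSOMSdkSvc/ms01.contoso.local\n"
)
expected = ["MSOMSdkSvc/ms02", "MSOMSdkSvc/ms02.contoso.local"]

if __name__ == "__main__":
    print(missing_spns(sample, expected))
```

Any SPN the script reports as missing is a candidate for registration; an empty result suggests looking at other causes, such as the service account or its permissions.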
  • We have checked the points that you noted and have found some issues with the environment. For that reason we will re-install the second Management Server and then correctly assign the SPNs to it. If it performs correctly, we will fail all of the agents over to it as their primary MS and then correct the SPN on our first Management Server. The reason is that the first management server is currently running the DAS as a local user. It is working correctly at this time, but in this configuration we can't seem to get the second management server to correctly join the DAS.

    I greatly appreciate your time and thoughts! I will update the thread after we re-install the second management server. 

    Thursday, March 5, 2015 3:01 PM
  • Hi There,

    No problem. Just fail the agents over to the other management server and go ahead with the activity.


    Gautam.75801

    Thursday, March 5, 2015 3:03 PM