Exchange 2010 Service Pack 1 bug with duplicate machine SIDs

    Question

  • Dear all,

    Some of us have cloned virtual machines on which we did not run sysprep (or NewSID). Yes, I know we shouldn't have, but I never had issues with it as long as I cloned them before renaming and domain-joining them. In my opinion, sysprep costs quite a lot of time to fix a non-existent problem. Well, I might have to revise that statement now that Exchange 2010 Service Pack 1 is out, although even Mark Russinovich (formerly of Sysinternals, now at Microsoft; creator of NewSID, PsTools and much more) is on my side.

    See e.g. the "Machine SID Duplication Myth" on http://blogs.technet.com/b/markrussinovich/archive/2009/11/03/3291024.aspx

    Anyway, I was setting up this nice little test environment with 2xDC and 3xEXCH servers. Before doing too much configuration on it, I decided to apply the just-released Exchange 2010 Service Pack 1. Now all of a sudden I can't open the EMC and get an error like this:

    Initialization failed

    The following error occurred when retrieving user information for 'DOMAIN\USER'

    The operation couldn't be performed because object 'S-1-5-21-401783670-1437582426-1158247984-500' couldn't be found on 'dc1.contoso.com'. It was running the command 'Get-LogonUser'.

    It seems that quite a few other people have this issue as well - see: http://msexchangeteam.com/archive/2010/09/01/456094.aspx

    I did some research on it and this is what I found out:

    It seems that the EMC is searching for the machine SID (and not the domain SID for that computer) - probably as a bug fix for the KB981033 issue (customers who put dots in their computer NetBIOS names - who would ever do that?!).

    In my case the machine SID was the same on all virtual servers:

    S-1-5-21-401783670-1437582426-1158247984

    (same result you get when running psgetsid without any parameters)

    All my virtual machines were of course cloned before they ever joined a domain. As I have always understood it, a machine gets a unique domain SID when it is joined to a domain; even when several machines with the same machine SID are joined to a domain, each of them always gets a unique domain SID. Hence when I run:

    psgetsid DOMAIN\MACHINEACCOUNT$
    The results are all different:
    DC1:
    S-1-5-21-401783670-1437582426-1158247984-1000
    DC2:
    S-1-5-21-401783670-1437582426-1158247984-1103
    MAIL1:
    S-1-5-21-401783670-1437582426-1158247984-1105
    MAIL2:
    S-1-5-21-401783670-1437582426-1158247984-1104
    MAIL3:
    S-1-5-21-401783670-1437582426-1158247984-1601

    When I go through adsiedit and check the "objectSid" attribute of all the computer accounts they are indeed the same unique values that psgetsid DOMAIN\COMPUTERNAME$ gave me from the command line.
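To make the structure of these values explicit, here is a minimal Python sketch that splits each computer-account SID from the list above into its issuing prefix and its trailing relative identifier (RID). `split_sid` is a hypothetical helper written for this illustration; it is not part of PsTools or Exchange. It only does string handling on the values already posted.

```python
def split_sid(sid: str) -> tuple[str, int]:
    """Split an account SID into (issuing prefix, RID).

    The RID is the last dash-separated component of the SID string.
    """
    prefix, _, rid = sid.rpartition("-")
    return prefix, int(rid)


# Value reported by psgetsid without parameters on every cloned server.
machine_sid = "S-1-5-21-401783670-1437582426-1158247984"

# Values reported by psgetsid DOMAIN\MACHINEACCOUNT$ (copied from the post).
computer_account_sids = {
    "DC1": "S-1-5-21-401783670-1437582426-1158247984-1000",
    "DC2": "S-1-5-21-401783670-1437582426-1158247984-1103",
    "MAIL1": "S-1-5-21-401783670-1437582426-1158247984-1105",
    "MAIL2": "S-1-5-21-401783670-1437582426-1158247984-1104",
    "MAIL3": "S-1-5-21-401783670-1437582426-1158247984-1601",
}

prefixes = {split_sid(sid)[0] for sid in computer_account_sids.values()}
rids = sorted(split_sid(sid)[1] for sid in computer_account_sids.values())

print("All accounts share the machine SID as prefix:", prefixes == {machine_sid})
print("RIDs are all different:", len(set(rids)) == len(rids))
```

In this data set the computer accounts are unique only through their RIDs; the prefix is the same string that every cloned machine reports as its own machine SID.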

    It clearly looks as if this is a huge bug in the EMC. The search that it does in the background should have been based on a full LDAP path, objectSid (the domain one), objectGUID or something other than the local machine SID.

    Mark Russinovich writes this on his blog regarding duplicate SID issues:

    The final case where SID duplication would be an issue is if a distributed application used machine SIDs to uniquely identify computers. No Microsoft software does so and using the machine SID in that way doesn’t work just for the fact that all DC’s have the same machine SID. Software that relies on unique computer identities either uses computer names or computer Domain SIDs (the SID of the computer accounts in the Domain).

    However it actually looks like a piece of Microsoft software (Exchange 2010 Service Pack 1) uses machine SIDs to identify computers.

    Sorry for the long explanation - but does anyone know if this is a known bug or a "by design feature" in the new service pack?

    In other words, does anybody know whether the Microsoft Exchange Server team will address this in a hotfix or rollup? Or will we just have to live with it and make it standard procedure to always sysprep images, and to check for duplicate machine SIDs (and reinstall any servers that have them) before applying Exchange Server 2010 Service Pack 1 to existing servers?

    If it's by design, it would have been nice if the prerequisites check in the installation wizard had detected the duplicated SIDs.
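The kind of duplicate check described above could look roughly like the following Python sketch. It assumes the machine SIDs have already been collected by hand (for example by running psgetsid without parameters on each server) and typed into a dictionary. The function name `find_duplicate_machine_sids`, the host NEWBOX and its SID are made up for illustration; this is not what the Exchange setup wizard does.

```python
from collections import defaultdict


def find_duplicate_machine_sids(inventory: dict[str, str]) -> dict[str, list[str]]:
    """Group hosts by machine SID and keep only SIDs used by two or more hosts."""
    hosts_by_sid: dict[str, list[str]] = defaultdict(list)
    for host, sid in inventory.items():
        hosts_by_sid[sid].append(host)
    return {
        sid: sorted(hosts)
        for sid, hosts in hosts_by_sid.items()
        if len(hosts) > 1
    }


# MAIL1-MAIL3 carry the machine SID quoted in this post.
# NEWBOX and its SID are hypothetical, standing in for a sysprep'ed server.
inventory = {
    "MAIL1": "S-1-5-21-401783670-1437582426-1158247984",
    "MAIL2": "S-1-5-21-401783670-1437582426-1158247984",
    "MAIL3": "S-1-5-21-401783670-1437582426-1158247984",
    "NEWBOX": "S-1-5-21-1111111111-2222222222-3333333333",
}

duplicates = find_duplicate_machine_sids(inventory)
for sid, hosts in duplicates.items():
    print(f"{sid} is shared by: {', '.join(hosts)}")
```

Any server listed in the output would be a candidate for a rebuild from a sysprep'ed image before SP1 is applied.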

    Sunday, September 26, 2010 1:02 PM

Answers

  • Update: I have now reinstalled all three Exchange Servers by:

    Shut down Exchange services -> copy database files -> shut down VMs -> reset computer accounts in AD -> install new VMs (sysprep'ed) -> join to domain with old server names -> copy databases to new VMs (same locations) -> run Exchange 2010 SP1 setup with "Setup /m:RecoverServer /InstallWindowsComponents".

    Exchange Management Console and everything else now works again.

    Keep in mind that the only thing that changed here is the local machine SIDs, as I reused the old computer accounts in the domain and they still have the old domain SIDs in the objectSid attribute.

    Monday, September 27, 2010 7:15 AM

All replies

  • Mark Russinovich writes this on his blog regarding duplicate SID issues:

    The final case where SID duplication would be an issue is if a distributed application used machine SIDs to uniquely identify computers. No Microsoft software does so and using the machine SID in that way doesn’t work just for the fact that all DC’s have the same machine SID. Software that relies on unique computer identities either uses computer names or computer Domain SIDs (the SID of the computer accounts in the Domain).

    However it actually looks like a piece of Microsoft software (Exchange 2010 Service Pack 1) uses machine SIDs to identify computers.

     


    To be fair, Mark's blog on this issue came out in November of 2009. :) http://blogs.technet.com/b/markrussinovich/archive/2009/11/03/3291024.aspx

    This is just my humble guess, but I don't think the product team expects servers to be deployed in this fashion. Normal provisioning tools would always end up running something like sysprep (even vSphere uses it for cloning). It may not be a situation that has ever been tested; I honestly don't know.

     

    The more interesting thing is that the SID in the error you posted appears to be the well-known Administrator SID for your domain and not a computer account SID. You posted:

    The following error occurred when retrieving user information for 'DOMAIN\USER'

    The operation couldn't be performed because object 'S-1-5-21-401783670-1437582426-1158247984-500' couldn't be found on 'dc1.contoso.com'. It was running the command 'Get-LogonUser'.

    If we look at the well-known SIDs in http://support.microsoft.com/kb/243330, we see that S-1-5-21-domain-500 = Administrator.
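As a rough illustration of that pattern, the following Python sketch tests whether a SID string has the shape S-1-5-21-domain-500. The helper `is_administrator_sid` is hypothetical, and it assumes the usual layout in which the domain part consists of three numeric components.

```python
ADMINISTRATOR_RID = "500"  # RID listed for Administrator in KB 243330


def is_administrator_sid(sid: str) -> bool:
    """Return True if the SID looks like S-1-5-21-<three numbers>-500.

    Assumption: the domain identifier has exactly three components,
    so the whole SID splits into eight dash-separated parts.
    """
    parts = sid.split("-")
    return (
        len(parts) == 8
        and parts[:4] == ["S", "1", "5", "21"]
        and all(p.isdigit() for p in parts[4:])
        and parts[-1] == ADMINISTRATOR_RID
    )


# SID from the error message in the original post.
print(is_administrator_sid("S-1-5-21-401783670-1437582426-1158247984-500"))
# SID of the MAIL1 computer account from the original post.
print(is_administrator_sid("S-1-5-21-401783670-1437582426-1158247984-1105"))
```

The SID from the error message matches the pattern, while the computer-account SIDs listed earlier in the thread do not.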

    I know this sounds silly, but would you happen to be logged into the machine with a local administrator account and not a domain account? Was the 'Administrator' account the one originally used to install Exchange 2010? If it wasn't, you may have to add it to the appropriate admin group for Exchange and enable it for remote PowerShell with Set-User <identity> -RemotePowerShellEnabled:$True .


    Microsoft Premier Field Engineer, Exchange
    MCSA 2000/2003, CCNA
    MCITP: Enterprise Messaging Administrator 2010
    Former Microsoft MVP, Exchange Server
    My posts are provided “AS IS” with no guarantees, no warranties, and they confer no rights.
    Sunday, September 26, 2010 2:00 PM
  • Hi Brian,

    I am logged on to the servers as the domain admin (DOMAIN\Administrator) and I also installed Exchange using the domain admin account. I didn't have this problem before SP1 and I get the same error on all three Exchange servers. All of them are brand new installs: clean Windows 2008 R2 image with newest Windows updates -> clean Exchange 2010 install (typical CAS/HUB/MBX install) + Windows/Exchange hotfixes. I installed them 1½ months ago and didn't touch them until yesterday, when I installed SP1 on all three. I had no problems opening the EMC on all three servers just minutes before installing SP1.

    The SID mentioned in the error is my domain admin account SID:

    "psgetsid S-1-5-21-401783670-1437582426-1158247984-500" results in:

    DOMAIN\Administrator

    "psgetsid DOMAIN\Administrator" results in:

    S-1-5-21-401783670-1437582426-1158247984-500

    "psgetsid MAIL1\Administrator" results in:

    S-1-5-21-401783670-1437582426-1158247984-1000

    So it seems my domain admin account SID is unique (well, at least with the -500 at the end).

    Others have solved the problem by: copy database files -> shut down VM -> delete computer account in AD -> install new VM -> run sysprep -> join to domain with old server's name -> copy databases to new VM (same location) -> run Exchange setup with "Setup /m:RecoverServer /InstallWindowsComponents". This leads me to think that the problem is related not to my domain admin SID but to my machine SID.

    "psgetsid DOMAIN" results in:

    S-1-5-21-401783670-1437582426-1158247984

    "psgetsid MAIL1" results in:

    S-1-5-21-401783670-1437582426-1158247984

    So what I suspect is that although the error says:

    The operation couldn't be performed because object 'S-1-5-21-401783670-1437582426-1158247984-500' couldn't be found on 'dc1.contoso.com'.

    it probably means:

    The operation couldn't be performed because object 'S-1-5-21-401783670-1437582426-1158247984-500' couldn't be found on 'S-1-5-21-401783670-1437582426-1158247984'.
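The ambiguity suspected here can be shown with plain string comparison. The Python sketch below takes the SID from the error message and lists every authority whose own SID matches its prefix. `possible_issuers` is a hypothetical helper, and this only illustrates why the SID string alone cannot say who issued it; it makes no claim about how the EMC or Get-LogonUser actually resolves SIDs.

```python
def possible_issuers(account_sid: str, authorities: dict[str, str]) -> list[str]:
    """Return the names of all authorities whose SID equals the account SID's prefix."""
    prefix = account_sid.rpartition("-")[0]
    return sorted(name for name, sid in authorities.items() if sid == prefix)


# Values copied from the psgetsid output quoted in this reply.
authorities = {
    "DOMAIN": "S-1-5-21-401783670-1437582426-1158247984",  # psgetsid DOMAIN
    "MAIL1": "S-1-5-21-401783670-1437582426-1158247984",   # psgetsid MAIL1
}

error_sid = "S-1-5-21-401783670-1437582426-1158247984-500"
issuers = possible_issuers(error_sid, authorities)
print(f"{error_sid} could have been issued by: {', '.join(issuers)}")
```

With duplicated SIDs both DOMAIN and MAIL1 are listed, whereas a sysprep'ed MAIL1 with its own machine SID would leave DOMAIN as the only match.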

    And yes, I shouldn't have deployed it this way. Luckily it's a test setup so I can just do a reinstall (with sysprep'ed image).

    However, I can very well imagine that quite a few customers have been running Exchange 2010 in production for a while with duplicated SIDs and no problems whatsoever until they apply SP1.

    Sunday, September 26, 2010 3:42 PM
  • I fear that you will run into all sorts of problems with SIDs if you create cloned systems and don't run Sysprep. At least, it's been my bitter experience that many problems occur with permissions when you use non-Sysprep'd cloned systems. For example, I've run into issues creating a FSW for a DAG or adding a mailbox server to a DAG. All these issues go away when you do a proper job and use Sysprep to prepare cloned machines - and this happened in Exchange 2010 RTM so it's not an SP1 issue. It may be that Exchange is becoming more particular as Microsoft tunes some of the newer features of Exchange 2010 in SP1 but the fundamental and underlying truth is that you can expect problems if you clone machines for Exchange 2010 and do not use Sysprep.

    - Tony

    (http://thoughtsofanidlemind.wordpress.com)

    Sunday, September 26, 2010 4:51 PM
    Some of us have cloned virtual machines on which we did not run sysprep (or NewSID). Yes, I know we shouldn't have, but I never had issues with it as long as I cloned them before renaming and domain-joining them. In my opinion, sysprep costs quite a lot of time to fix a non-existent problem. Well, I might have to revise that statement now that Exchange 2010 Service Pack 1 is out, although even Mark Russinovich (formerly of Sysinternals, now at Microsoft; creator of NewSID, PsTools and much more) is on my side.


    There are a lot of scenarios where duplicate SIDs cause issues that aren't documented in the blog post you cite, unfortunately. Bottom line is that you are doing yourself a HUGE disservice by not running sysprep and you really need to get out of the habit you're in. Sysprep cleans up a ton of identifying information unique to a machine. SIDs are just one small piece of it.

    I wouldn't expect Exchange or really any application that's highly dependent on AD or inter-machine communication to necessarily behave properly in the scenario you've created. Either way it's undoubtedly not a supported scenario which means there may be issues.


    My Book - Active Directory, 4th Edition
    My Blog - www.briandesmond.com
    Sunday, September 26, 2010 5:43 PM
  • I wouldn't expect Exchange or really any application that's highly dependent on AD or inter-machine communication to necessarily behave properly in the scenario you've created. Either way it's undoubtedly not a supported scenario which means there may be issues.
    Thanks - I did learn my lesson and will be running sysprep even on my test scenarios from now on. My point is that if it were truly AD-aware, it would be using domain SIDs (which are unique) and not machine SIDs. Machine SIDs should not be used for inter-machine communication in any case.
    Monday, September 27, 2010 3:17 AM