USN Rollback

    Question

  • I have a PDC showing the famous error 2103, "The Active Directory Domain Services database has been restored using an unsupported restoration procedure". This is a VM, but as far as I know no system state restore or VM rollback has been performed. The error started appearing after a maintenance window at the beginning of the month, during which the VM was gracefully shut down and started again.

    According to the article, I don't have much of a choice here but to demote and remove the PDC.

    This is a production multitenant environment with two DCs, PDC 01 and DC 07, so I'm trying to figure out the safest steps I can take to correct this without bringing down the whole environment.

    For starters, when I run repadmin /showutdvec I can see the GUID of a DC that doesn't exist. This must be the GUID of a DC that failed last year, which I demoted and removed from the network.

    01

    Caching GUIDs.
    ..
    Default-First-Site-Name\QUALTECHS1-01 @ USN   5636879 @ Time 2017-04-13 10:34:54

    67fd89e4-e119-4956-8858-91c46e5ae366 @ USN    209438 @ Time 2016-12-25 13:09:56
    Default-First-Site-Name\QUALTECHS1-07 @ USN   1478897 @ Time 2017-04-13 10:34:25

    07
    Caching GUIDs.
    ..
    Default-First-Site-Name\QUALTECHS1-01 @ USN   5636875 @ Time 2017-04-13 10:34:44

    67fd89e4-e119-4956-8858-91c46e5ae366 @ USN    209438 @ Time 2016-12-25 13:09:56
    Default-First-Site-Name\QUALTECHS1-07 @ USN   1478902 @ Time 2017-04-13 10:34:48

    I created a script that produced the results above and ran it on DC 07. So here is the second head-scratcher: as you can see, even though I ran the script on 07, the results differ between the two servers. According to the article above, PDC 01 is the one displaying the message, so it should be the only one showing a difference in the USN numbers. Am I doing something wrong here?

    I haven't noticed any problems so far besides the Netlogon service pausing, which is one of the symptoms of a USN rollback. Is it possible that this is all being caused by the GUID of the DC that no longer exists on the network?
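One way to read the two vectors side by side is sketched below in Python. It assumes the block labelled 01 is the vector reported by DC 01 and the block labelled 07 is the one reported by DC 07; the function names are hypothetical. The check applied is the one usually cited for USN rollback: a partner holding a higher USN for a DC than that DC reports for itself. A difference of a few USNs in the other direction is expected, since the two captures were taken seconds apart.

```python
import re

# Matches lines such as:
#   Default-First-Site-Name\QUALTECHS1-01 @ USN   5636879 @ Time 2017-04-13 10:34:54
LINE_RE = re.compile(
    r"^\s*(?P<dsa>\S+)\s+@\s+USN\s+(?P<usn>\d+)\s+@\s+Time\s+(?P<time>.+?)\s*$"
)


def parse_utdvec(text):
    """Return {dsa_name_or_guid: usn} parsed from `repadmin /showutdvec` output."""
    vector = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:  # header lines such as "Caching GUIDs." are skipped
            vector[match.group("dsa")] = int(match.group("usn"))
    return vector


def partner_is_ahead(own_vector, partner_vector, dsa):
    """True if the partner has recorded a higher USN for `dsa` than `dsa` reports itself."""
    return partner_vector[dsa] > own_vector[dsa]


# Output copied from the post; assumed to be DC 01's and DC 07's own views.
OUTPUT_01 = r"""
Caching GUIDs.
..
Default-First-Site-Name\QUALTECHS1-01 @ USN   5636879 @ Time 2017-04-13 10:34:54
67fd89e4-e119-4956-8858-91c46e5ae366 @ USN    209438 @ Time 2016-12-25 13:09:56
Default-First-Site-Name\QUALTECHS1-07 @ USN   1478897 @ Time 2017-04-13 10:34:25
"""

OUTPUT_07 = r"""
Caching GUIDs.
..
Default-First-Site-Name\QUALTECHS1-01 @ USN   5636875 @ Time 2017-04-13 10:34:44
67fd89e4-e119-4956-8858-91c46e5ae366 @ USN    209438 @ Time 2016-12-25 13:09:56
Default-First-Site-Name\QUALTECHS1-07 @ USN   1478902 @ Time 2017-04-13 10:34:48
"""

DC01 = r"Default-First-Site-Name\QUALTECHS1-01"
DC07 = r"Default-First-Site-Name\QUALTECHS1-07"

vec_01 = parse_utdvec(OUTPUT_01)
vec_07 = parse_utdvec(OUTPUT_07)

# Does DC 07 hold a higher USN for DC 01 than DC 01 reports for itself?
print("07 ahead of 01's own USN:", partner_is_ahead(vec_01, vec_07, DC01))
# And the reverse direction, for completeness.
print("01 ahead of 07's own USN:", partner_is_ahead(vec_07, vec_01, DC07))
```

With the numbers posted above, neither check is true: each DC's own USN is slightly higher than the value its partner has recorded, which is what ordinary replication lag looks like. Repeating the capture later and confirming that 07's recorded USN for 01 keeps advancing is one rough way to see that changes from 01 are still being consumed.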

    That DC was removed last year and the USN rollback message only started in the beginning of the month of April.

    If I have no choice but to remove PDC 01, the steps I'm planning to perform are:

    1. Add a new DC to the existing forest for scalability

    2. Seize the FSMO roles on DC 07

    3. Demote and remove PDC 01 from the network

    I'm hoping this is the safest way to get this fixed.

    Comments/suggestions welcome.

    Just one more thing. Is there a really good way to find out whether objects from PDC 01 are no longer replicating? For example, if I create a new OU or a new user on PDC 01, will it fail to replicate to DC 07?


    Tony


    • Edited by Tony Amaral Thursday, April 13, 2017 7:25 PM
    Thursday, April 13, 2017 7:19 PM

Answers

  • Make sure you have system state backups for your domain/forest in case a restore is needed.

    1. Open the registry and go to the path below

    2. HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NTDS\Parameters

    3. There will be a "DSA Not Writable" REG_DWORD, and it should show a value of 0x4

    4. Delete "DSA Not Writable" and reboot the domain controller

     

    If the problem continues, you may need to demote and promote this DC.
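Before deleting anything, the value can be inspected read-only. The following is a minimal Python sketch, assuming Python is available on the DC and the script runs with rights to read the NTDS key; the function names are hypothetical, and only the key path, value name and 0x4 value come from the steps above.

```python
import sys

# Key path and value name taken from the steps above.
NTDS_PARAMETERS = r"SYSTEM\CurrentControlSet\Services\NTDS\Parameters"
VALUE_NAME = "Dsa Not Writable"
ROLLBACK_VALUE = 0x4


def read_dsa_not_writable():
    """Return the DWORD stored under the value name, or None if it is absent.

    Windows only: winreg is imported here so the module still loads elsewhere.
    """
    import winreg

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, NTDS_PARAMETERS) as key:
        try:
            value, _value_type = winreg.QueryValueEx(key, VALUE_NAME)
        except FileNotFoundError:
            return None
    return value


def describe(value):
    """Turn the raw registry value into a short human-readable status."""
    if value is None:
        return "not set"
    if value == ROLLBACK_VALUE:
        return "set to 0x4"
    return "set to unexpected value 0x%x" % value


if __name__ == "__main__" and sys.platform == "win32":
    # Read-only check; nothing is modified or deleted.
    print(VALUE_NAME, "is", describe(read_dsa_not_writable()))
```

The sketch only reports the current state, so it can be run before and after the reboot to confirm whether the value has come back.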

    • Marked as answer by Tony Amaral Friday, April 14, 2017 3:28 PM
    Friday, April 14, 2017 2:47 PM

All replies

  • A couple of questions:

    1. When you say this DC was stopped and started, do you mean it was paused and resumed later, or do you mean it was shut down and started later? This could explain

    2. Which version of the ESX host and which version of Windows are you running? As vmgenid is available only on certain.

    When you demote the PDC, do it using the force option, and then clear the metadata manually. Don't let the PDC talk to the other domain controllers. That 

    Thursday, April 13, 2017 7:35 PM
  • We had to do maintenance on the compute node where these VMs run, so all the VMs were given a proper Windows shutdown. If we run the risk of an AD rollback every time we do a reboot or a proper shutdown, then that's a major Windows stability issue.

    We are running vSphere on the hardware host.


    Tony

    Thursday, April 13, 2017 9:37 PM
  • VM-Generation ID functionality requires the hypervisor vendor to create the virtual machine identifier and expose it to the guest. VMware has provided this functionality in the following releases of vSphere:

    • VMware vSphere 5.0 Update 2 (vCenter Server and ESXi must both be at 5.0 Update 2)
    • VMware vSphere 5.1 (ESXi must be at least 5.0 Update 2)

    from https://blogs.vmware.com/apps/2013/01/windows-server-2012-vm-generation-id-support-in-vsphere.html

    and

    https://support.microsoft.com/en-us/help/875495/how-to-detect-and-recover-from-a-usn-rollback-in-windows-server-2003,-windows-server-2008,-and-windows-server-2008-r2

    Thursday, April 13, 2017 10:08 PM
  • We definitely run vSphere 5, but I'm not sure exactly which version. I'll discuss this with our datacenter partner to find out exactly how things are set up on the back end. I really appreciate the information.

    Just an additional clarification on seizing the FSMO roles on another DC. I've found conflicting information: some say you should do it with the PDC still on, others say you should do it after demoting and shutting down the PDC. I believe I did this in my lab with the PDC still on, but it's been a while. Does it make any difference either way?

    Any idea how to remove the orphaned GUID? My guess is that it has no impact. It's been there since last year and I only noticed it after this issue came up, so I'm assuming it doesn't affect the normal operation of the DCs.


    Tony

    Thursday, April 13, 2017 10:29 PM
  • I have confirmed that we run vSphere 5.5 and that all of VMware's recommendations were in place, so "theoretically" the USN rollback should not have happened.

    Also, I've been monitoring the PDC and I haven't seen a 2103 error logged in the Directory Service log since yesterday. The last one appears right after I fixed error 613, "svchost (3932) The version store for this instance (0) has reached its maximum size of 2Mb".

    That's strange; I'd expect to see error 2103 logged at regular intervals, as it was on the previous days.

    Is there anyone who can shed some light on this? I'm not interested in going through the exercise of correcting a USN rollback issue unless there's definitely something wrong.

    Thanks


    Tony

    Friday, April 14, 2017 2:07 PM
  • Is this Windows Server 2008 R2 or Windows Server 2012 R2?
    Friday, April 14, 2017 2:36 PM
  • We only run Windows 2012 R2.

    Tony

    Friday, April 14, 2017 2:42 PM
  • I have system state backups, and I had seen the steps you mention above; I just wasn't sure they would be the best course of action. If I continue to not see the message or any major issues, that's probably what I'll do first before moving forward with the full replacement of the PDC.

    Thanks for all the great feedback, Narayanan. I really appreciate it.


    Tony

    Friday, April 14, 2017 3:28 PM