Snapshots and Domain Controllers - Are they ALWAYS bad???

    Question

  • One sees many posts against using snapshots on virtualized domain controllers. If the VM is running at the time the snapshot is taken, this makes a good amount of sense. But if the VM is off, how is this fundamentally different than any other means of backup???

    In a larger sense, if ALL of the machines in a domain are cleanly shut down and then snapshots taken, are there really any issues if the entire domain is restored back to the snapshots, even if significant (but less than tombstone) time/operations had occurred in the meantime???


    Microsoft MVP / ALM Ranger

    Sunday, August 12, 2012 3:03 PM

Answers

  • The 'official' guidance is to not do it. 

    For the same reason that you need to properly plan your AD deployment and frankly never-ever want all the DCs or BDCs to go down together.

    The primary reasoning behind the MSFT guidance is that, frankly, there are a great number of admins out there who don't understand how AD works (they just use it), or who will put AD in a VM, expect snapshots and backups to solve all their problems, and not properly design the AD infrastructure itself.

    By telling folks to not do it, you at least force the discussion of designing a proper AD infrastructure.  This is the root of the recommendation / guidance.

    Things happen, AD breaks, hardware dies, etc. It happens. But with a properly designed AD, you build a new controller, promote it, replicate, move on. Because some portion of the DCs still existed, AD was still alive and could be contacted.

    If anyone puts themselves into a situation where their entire AD infrastructure would need to be recovered from a backup, that would be painful. Entirely possible, but painful.

    In this case a running snapshot adds no more value than a powered-off snapshot, as any data in memory is (frankly) a moot point.

    Also, prior to Server 2012, snapshots were not merged live in the background. This means that if you take a running snapshot, back it up and delete it, you remain running in a snapshotted state (a differencing disk chain exists).

    This leads to a storage issue that many folks get themselves into, because they do this very thing and never power off the VM so the snapshot disks can merge.
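    To make the differencing-disk point concrete, here is a toy Python model, assuming a simplified block map rather than the real VHD/AVHD on-disk format (the class and function names are hypothetical). After a snapshot the parent disk is frozen, every new write lands in the child, reads fall through the chain, and nothing is folded back until a merge.

```python
from typing import Dict, Optional


class DiskLayer:
    """Toy virtual disk layer: block number -> block contents.

    Illustration only: real VHD/AVHD files use a block allocation table
    and sector bitmaps, not a Python dict.
    """

    def __init__(self, parent: Optional["DiskLayer"] = None) -> None:
        self.blocks: Dict[int, bytes] = {}
        self.parent = parent

    def read(self, block: int) -> bytes:
        # Check this layer first, then walk up the chain towards the base disk.
        layer: Optional[DiskLayer] = self
        while layer is not None:
            if block in layer.blocks:
                return layer.blocks[block]
            layer = layer.parent
        return b"\x00"  # a block never written anywhere reads as zeros

    def write(self, block: int, data: bytes) -> None:
        # Writes only touch this (top) layer; parent layers are never modified.
        self.blocks[block] = data


def take_snapshot(active: DiskLayer) -> DiskLayer:
    """Freeze the active layer and return a new child layer that takes writes."""
    return DiskLayer(parent=active)


def merge_into_parent(child: DiskLayer) -> DiskLayer:
    """Fold the child's blocks into its parent and return the parent."""
    if child.parent is None:
        raise ValueError("the base disk has no parent to merge into")
    child.parent.blocks.update(child.blocks)
    return child.parent


# Usage: the guest keeps writing after the snapshot, so the child keeps growing.
base = DiskLayer()
base.write(0, b"ntds-v1")
child = take_snapshot(base)
child.write(0, b"ntds-v2")
child.write(1, b"log")
print(base.read(0), child.read(0), len(child.blocks))  # b'ntds-v1' b'ntds-v2' 2
```

    In this model the merge Brian refers to corresponds to `merge_into_parent`; until it runs, the base stays frozen and the child holds every change made since the snapshot.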

    When you roll all this together, it is far safer to advise folks not to do it - than to explain all the ifs and what nots.

    It is not wrong, it is not misleading; it is, however, the safest path.

    Having been in this forum since Hyper-V was in its original beta, I have helped many folks through issues. And many peers I have spoken with agree that it is far easier to advise folks not to do something than to help them out of the problem because they didn't take the time to fully understand the implications. And, frankly, we can't always tell folks all of the implications all the time. We, too, learn those from experience.

    That is why I generally try to not give folks an answer in the forums, but rather describe things.  That way, if it was a jr. admin, he learned something, if it was a sr. admin she might already know, but her thinking is clarified.


    Brian Ehlert
    http://ITProctology.blogspot.com
    Learn. Apply. Repeat.
    Disclaimer: Attempting change is of your own free will.

    Monday, August 13, 2012 3:29 PM
    Moderator

All replies

  • More of a directory service question. I'd ask them here.

    http://social.technet.microsoft.com/Forums/en-US/winserverDS/threads

     http://technet.microsoft.com/en-us/library/dd363553(v=ws.10)



    Regards, Dave Patrick ....
    Microsoft Certified Professional
    Microsoft MVP [Windows]

    Disclaimer: This posting is provided "AS IS" with no warranties or guarantees , and confers no rights.

    Sunday, August 12, 2012 3:17 PM
  • It is to do with the update sequence number (USN) getting out of sync with other domain controllers, thus causing an inconsistent state. This particular 'issue' (although it isn't really an issue, it's by design) has been resolved in Server 2012, which now means you can snapshot or clone your DCs.
    Sunday, August 12, 2012 3:24 PM
  • > It is to do with the update sequence number (USN) getting out of sync with other domain controllers, thus causing an inconsistent state. This particular 'issue' (although it isn't really an issue, it's by design) has been resolved in Server 2012, which now means you can snapshot or clone your DCs.

    But this only works with a Server 2012 Hyper-V host and the Domain Controller option activated for the DC (in the VM configuration).
    Sunday, August 12, 2012 3:31 PM
  • @Dave Patrick - thanks for the links to the other sites, but the questions are really specific to differences between Hyper-V snapshots (taken while the machine is powered down) and other means of backing up and restoring domain controllers, not the "general" issues with restoring domain controllers...

    @StevenWH, Flose1984 - I am aware of the USN issue, but this happens anytime a DC is restored from a backup. What I am looking for (and can neither find nor conceive of) is any difference between a Windows Backup/Restore (which would use VSS to improve consistency at the file system level) and a snapshot *when the machine is OFF*.

    Also, (as posted originally), I am especially interested in the case where the entire domain (all DC's) is powered down (cleanly), then a series of snapshots taken. If at a later point the machines (All of them) are restored to the "synchronized snapshot" I do not see how the USN issue would exist at all.

    edit: Alas, 2012 is not an option at this point...


    Microsoft MVP / ALM Ranger


    Sunday, August 12, 2012 4:02 PM
  • If you search the forum, you will find many posts discussing similar issues. Please check the following posts and blog to see whether they are helpful.

    http://social.technet.microsoft.com/Forums/en-US/windowsserver2008r2virtualization/thread/2e913d5c-899b-46b2-88a2-a7bf7a0a584d
    http://social.technet.microsoft.com/forums/en-US/winserverhyperv/thread/e1dd6ab7-d8b9-4f7f-a760-72615490bcd1
    http://www.ronnipedersen.com/2011/08/building-a-virtual-lab-that-is-prepared-for-snapshots/
    Monday, August 13, 2012 1:49 AM
  • @Zephrhu, thanks for the time in posting the links, but everything I saw in them relates to general issues with restoring AD (or components dependent on AD) rather than specific information regarding snapshots taken while the machines are OFF, and then later restoring the system (not just one part) to a snapshotted condition.

    As far as I can tell, there is NO difference between this scenario and simply turning the machines off for an extended period, or restoring all of the machines from a set of Windows Server image backups. If my findings are correct, then I would say that there are no issues specific to snapshots (although there are many issues relating to a machine being turned off for an extended period, or restored from a backup); yet I still find information being posted that specifically recommends against using snapshots.


    Microsoft MVP / ALM Ranger

    Monday, August 13, 2012 2:04 AM
  • Hi,

    To understand the difference between recovery from snapshots and recovery from a general system backup, you need to understand USN rollback.

    Refer to these articles to understand USN and USN rollback:

    USN and USN Rollback
    http://technet.microsoft.com/en-us/library/d2cae85b-41ac-497f-8cd1-5fbaa6740ffe(v=ws.10)#usn_and_usn_rollback
    Tracking Updates
    http://technet.microsoft.com/en-us/library/cc961798.aspx
    A Guide to Active Directory Replication
    http://technet.microsoft.com/en-us/library/cc137781.aspx

    In Windows Server versions before 2012, AD uses the "invocationID" attribute to handle a recognized system recovery: when a DC is restored by a supported method, its invocationID changes and the follow-up steps take care of the USN change. There is no equivalent mechanism for recovery from snapshots, so the USN change cannot be handled in that scenario, and this causes USN rollback.
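    The mechanism can be illustrated with a heavily simplified Python model (the names are hypothetical; real AD replication also uses per-partner high-watermarks, attribute-level metadata and more). Each change is stamped with the originating DC's invocationID and USN, and a partner remembers the highest USN it has applied per invocationID. Reverting a DC without changing its invocationID makes it reuse USNs the partner has already seen, so the new changes are skipped; a restore that assigns a new invocationID avoids this.

```python
import copy
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# (originating invocationID, USN, description of the change)
Change = Tuple[str, int, str]


@dataclass
class DomainController:
    name: str
    invocation_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    usn: int = 0
    changes: List[Change] = field(default_factory=list)
    # Up-to-dateness: originating invocationID -> highest USN already applied.
    seen: Dict[str, int] = field(default_factory=dict)

    def originate(self, description: str) -> None:
        """Make a local change, stamped with the next local USN."""
        self.usn += 1
        self.changes.append((self.invocation_id, self.usn, description))

    def pull_from(self, source: "DomainController") -> List[str]:
        """Apply only changes newer than what we have seen per invocationID."""
        applied: List[str] = []
        for inv_id, usn, description in source.changes:
            if usn > self.seen.get(inv_id, 0):
                applied.append(description)
                self.seen[inv_id] = usn
        return applied


def simulate(reset_invocation_id: bool) -> List[str]:
    """Return what DC2 picks up from DC1 after DC1 is reverted to an old state."""
    dc1, dc2 = DomainController("DC1"), DomainController("DC2")
    dc1.originate("create user alice")        # USN 1
    snapshot = copy.deepcopy(dc1)             # the revert point
    dc1.originate("create user bob")          # USN 2
    dc1.originate("reset carol's password")   # USN 3
    dc2.pull_from(dc1)                        # DC2 has now seen USN 3 from DC1

    restored = copy.deepcopy(snapshot)        # DC1 "goes back in time"
    if reset_invocation_id:
        # What a supported restore does: DC1 gets a new replication identity.
        restored.invocation_id = str(uuid.uuid4())
    restored.originate("create user dave")    # reuses USN 2
    return dc2.pull_from(restored)


print(simulate(reset_invocation_id=False))  # [] -> dave never replicates
print(simulate(reset_invocation_id=True))   # ['create user dave']
```

    In the first case DC1 and DC2 end up disagreeing about which users exist, and in this model nothing in the replication traffic reveals it.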

    Windows Server 2012 adds a way for AD to detect a recovery from snapshots, so a Windows Server 2012 domain controller can be recovered from snapshots when it runs on a virtualization platform that supports this new feature.

    > In a larger sense, if ALL of the machines in a domain are cleanly shut down and then snapshots taken, are there
    > really any issues if the entire domain is restored back to the snapshots, even if significant (but less than
    > tombstone) time/operations had occurred in the meantime???

    By “shut down all machines in a domain”, I assume you mean a lab environment (how could all machines be shut down in a production environment?). Yes, if you can shut down all machines in a domain and recover them from snapshots created at the same point in time, I do not think it will cause USN rollback.

    If you mean shutting down all domain controllers and recovering them from snapshots created at the same point in time, I do not think that will cause USN rollback either. However, it may cause user or computer authentication issues, since user account objects and computer account objects may have been updated after you took the snapshots.
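    Lawrence's conditions (every DC restored, all from the same point in time, all cleanly shut down) can be written down as a pre-restore checklist. The sketch is hypothetical: `SnapshotRecord` and `check_restore_set` are illustrative names, the 30-minute skew window is an arbitrary assumption, and 180 days is the default tombstone lifetime for forests created on Windows Server 2003 SP1 or later (older forests may use 60 days). It does not cover the account-object concern, which also depends on the member machines.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Set


@dataclass(frozen=True)
class SnapshotRecord:
    vm_name: str
    taken_at: datetime
    powered_off: bool  # was the guest cleanly shut down when it was taken?


def check_restore_set(
    snapshots: Iterable[SnapshotRecord],
    all_dcs: Set[str],
    now: datetime,
    max_skew: timedelta = timedelta(minutes=30),         # assumption
    tombstone_lifetime: timedelta = timedelta(days=180),  # forest default
) -> List[str]:
    """Return a list of problems; an empty list means the set looks consistent."""
    snaps = list(snapshots)
    problems: List[str] = []

    missing = all_dcs - {s.vm_name for s in snaps}
    if missing:
        problems.append(f"no snapshot for: {sorted(missing)}")

    for s in snaps:
        if not s.powered_off:
            problems.append(f"{s.vm_name} was running when snapshotted")

    if snaps:
        times = [s.taken_at for s in snaps]
        if max(times) - min(times) > max_skew:
            problems.append("snapshots are not from the same point in time")
        if now - min(times) >= tombstone_lifetime:
            problems.append("oldest snapshot is beyond the tombstone lifetime")
    return problems


t0 = datetime(2012, 8, 12, 22, 0)
nightly = [SnapshotRecord("DC1", t0, True),
           SnapshotRecord("DC2", t0 + timedelta(minutes=5), True)]
print(check_restore_set(nightly, {"DC1", "DC2"}, now=t0 + timedelta(days=10)))  # []
```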

    You can test my opinion in a test environment; however, do not use domain controller snapshots in a production environment unless you are running Windows Server 2012.

    For more information please refer to following MS articles:

    Things to consider when you host Active Directory domain controllers in virtual hosting environments
    http://support.microsoft.com/kb/888794
    Running Domain Controllers in Hyper-V
    http://technet.microsoft.com/en-us/library/d2cae85b-41ac-497f-8cd1-5fbaa6740ffe(v=ws.10)
    Active Directory Domain Services (AD DS) Virtualization
    http://technet.microsoft.com/en-us/library/cc961798.aspx

    Hope this helps!

    TechNet Subscriber Support



    Lawrence

    TechNet Community Support


    Monday, August 13, 2012 4:13 AM
    Moderator
  • On operating systems before 2012, a snapshot restore will not update the invocation ID in the NTDS.DIT database, and therefore USN rollback is likely to happen, as neighbouring DCs will think that they have already processed the USNs for the changes made.


    Carl Smith MCITP-EA

    Monday, August 13, 2012 1:56 PM
  • Carl - Thanks for the response, but I do not see how "neighbour DC's will think that they have already processed the USN for changes made." if ALL of the DC's are restored to the same point in time. Even if only one DC is restored, I still do not see how this would be DIFFERENT than if the machine in question was restored from a Windows Server Backup.

    Once again, I am looking for snapshot-specific issues (with the restriction that the machine has been cleanly shut down and turned off when the snapshot is taken), and not issues that are common to other restoration techniques (such as image backups).


    Microsoft MVP / ALM Ranger

    Monday, August 13, 2012 2:03 PM
  • Please do not consider a snapshot a backup.  It is not the case with Hyper-V (or XenServer, or any hypervisor that is not ESX).

    The whole thing with snapshotting domain controllers is that in the background you have security accounts that are all time-sensitive, and they get renegotiated over time.

    DC's are not designed to go back in time (frankly) they are only designed to go forward.  Over time any machine joined to a domain will update its negotiated security key and if you return your domain to a state prior to the new key the domain membership is broken. 
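    Brian's point about negotiated keys can be sketched with machine account passwords, which Windows domain members change every 30 days by default. The matching rule below is a simplifying assumption of this sketch, not Netlogon's exact algorithm: each side keeps a current and a previous password, and the secure channel is assumed to work as long as the two sets still overlap. `PasswordPair` and `simulate_rollback` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class PasswordPair:
    current: str
    previous: Optional[str] = None

    def rotate(self, new: str) -> None:
        """Adopt a new password, keeping the old one as 'previous'."""
        self.previous, self.current = self.current, new

    def values(self) -> Set[str]:
        return {p for p in (self.current, self.previous) if p is not None}


def secure_channel_ok(member: PasswordPair, dc: PasswordPair) -> bool:
    # Simplifying assumption: it works if any password the member still
    # holds matches any password the DC still holds.
    return bool(member.values() & dc.values())


def simulate_rollback(rotations_after_snapshot: int) -> bool:
    """Snapshot the DC, let the member rotate N times, then revert only the DC."""
    member = PasswordPair("pw0")
    dc = PasswordPair("pw0")
    dc_snapshot = PasswordPair(dc.current, dc.previous)
    for i in range(1, rotations_after_snapshot + 1):
        member.rotate(f"pw{i}")  # the member picks a new password...
        dc.rotate(f"pw{i}")      # ...and the DC records it
    dc = dc_snapshot             # the DC is reverted; the member is not
    return secure_channel_ok(member, dc)


for n in range(3):
    print(n, simulate_rollback(n))  # 0 True, 1 True, 2 False
```

    Under these assumptions, reverting a DC across one password change still works and across two does not; at the 30-day default that can happen after little more than a month. Restoring the members from the same point in time as the DCs keeps both sides consistent.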

    This is the reason for all the warnings and whatnot. It has zero to do with whether the VM is running when the snapshot is taken. (Not running always gives a more consistent state, by the way.)

    The situation can be overcome, and in a small environment is not a big deal.  But in a corporate environment, it is a huge deal.

    If you have a small environment that is all VMs, snap the entire set at the same time, then return them all to the same point in time and all is well.

    Ben Armstrong has told the story really well over time (for the enterprise):

    http://blogs.msdn.com/b/virtual_pc_guy/archive/2008/11/24/the-domain-controller-dilemma.aspx

    http://blogs.msdn.com/b/virtual_pc_guy/archive/2009/11/20/hyper-v-and-domain-controllers-demo-tips-and-tricks.aspx

    http://blogs.msdn.com/b/virtual_pc_guy/archive/2011/03/23/simultaneous-snapshot-trick.aspx


    Brian Ehlert
    http://ITProctology.blogspot.com
    Learn. Apply. Repeat.
    Disclaimer: Attempting change is of your own free will.

    Monday, August 13, 2012 3:00 PM
    Moderator
  • @BrianEh, regarding "Please do not consider a snapshot a backup"...In and of itself, I agree. However if a snapshot is taken while the machine is OFF (ie there is no "state") then the (now immutable) VHD file can be easily backed up using any of a variety of techniques [not forgetting that the machine definition/configuration must also be backed up] even when the virtual machine is running.

    I completely agree with "DC's are not designed to go back in time (frankly) they are only designed to go forward", yet in any condition where the system (hardware or software) fails, it is necessary for them to go back in time, regardless of the means of restoring them...

    So if "you have a small environment that is all VMs, snap the entire set at the same time, then return them all to the same point in time and all is well" (where I believe small, means that the impact of snapshotting them all at the same time, either by them being off, or by Ben's "trick" is practical), then I have to conclude that the pervasive postings along the lines of "Never Snapshot your Domain Controllers" are simply wrong (or at the very least, misleading).


    Microsoft MVP / ALM Ranger

    Monday, August 13, 2012 3:12 PM
  • @BrianEh - thanks for the detailed post, and I agree with nearly everything (especially "there are a great number of admins out there that don't understand how AD works" and the ramifications thereof).

    I also concede that taking "the safest path" for guidance is the right way to go. [I happen to be one of the Microsoft ALM Rangers and am involved in the guidance for many TFS-related issues.] I would definitely not recommend snapshots in generally published material.

    That being said, I have spent many hours [well in excess of a man week] searching for specific conditions that applied to snapshots of domain controllers, which were unique - once all "general DC restore" issues and "general snapshot issues" (such as the "storage issue") which apply even to non-DC machines were eliminated.

    Based on everything I have read (including your valuable information), this appears to have been a complete waste of time, triggered by the phrasing of material found at multiple sites.


    Microsoft MVP / ALM Ranger

    Monday, August 13, 2012 4:21 PM
  • The best advice so far around this issue has been from the AD team itself and has finally been brought into one topic:

    http://technet.microsoft.com/en-us/library/dd363553(v=ws.10)

    There is an Operational considerations section.  But again, it is a list of "do nots" with some explanation in there.  (as you have become frustrated with and I don't blame you).

    Personally, I think this discussion will come back around after the GA of Server 2012. Since snapshots can merge while a VM continues to run, your scenario will be very (technically) viable, and a number of folks will want it.

    But frankly, there is the Replica feature which would be a better fit.  As it promotes the act of moving forward and never allowing going backward.

    So, as MSFT does really listen to customers, they respond with features.

    Since you mention ALM - Lab Manager has that nifty feature of snapping the entire environment at once and storing it that way. This kind of works around the issue by maintaining the integrity of time across all machines. But again, time can move on and these things become stale. Anything stale in RAM (due to the passage of time) will create a large number of issues, and the machines will be forced to reboot to clean up (at the least).

    And I make my generalization about the understanding of AD because I have run into far too many folks in IT who have no idea that AD objects have a lifetime beyond creation and deletion. And that is the real key concept here.

    Thanks for the discussion David.  I am sure that someone will turn this up in search and find it useful.


    Brian Ehlert
    http://ITProctology.blogspot.com
    Learn. Apply. Repeat.
    Disclaimer: Attempting change is of your own free will.


    Monday, August 13, 2012 4:38 PM
    Moderator
  • @David Corbin

    I have read this entire thread as this is the exact question that has been circling my head and I have been looking for an answer too.
    Your questions / comments are exactly what I wanted answering and it seems the posts here can't seem to give a straight answer.

    The USNs are incremented in any case or scenario, so a Windows backup is useless. I totally understand it may make sense if the server itself also runs a DHCP server, CA server and similar, but it makes no sense to back up AD alone, as you can simply switch it on in DSRM mode, change the registry key to recovery mode and that's it. It will replicate as usual. So the answer to your question, unless someone can provide more detail, is yes: you can use offline VHDs or offline snapshots.

    Can anyone explain otherwise?

    Sunday, October 07, 2012 4:38 AM
  • @dqnet - thanks for the response. While having domain controller(s) on dedicated machines certainly has many advantages, I have found that for many small/mid sized shops this is not always an option (usually due to licensing costs). I have seen all sorts of services active (including SqlServer) on the DC.

    Microsoft ALM Ranger

    Sunday, October 07, 2012 12:57 PM
  • @David

    I totally agree, and in those cases it makes perfect sense. I have one domain controller on a physical machine and the other virtualised, and have searched for countless hours on what to do if the virtualised domain controller were to fail (FSMO role holder). All articles point to restoring from a Windows Backup, and none 'really' explain why you shouldn't use a copy of a VHD you took a day before. Since our domain controller runs nothing but AD/DNS, I have decided to simply copy the VHD from time to time to a local NAS.

    If something were to happen to the virtualised domain controller, I am under the impression that I can just switch the copied one back on, boot straight into DSRM mode, flick the registry switch and boot it normally. The other 'physical' domain controller should then simply start replicating the latest data it has across to it. I know this may not be the correct method, but I really can't understand why not?
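    The "registry switch" in this plan is most likely the `Database restored from backup` DWORD under `HKLM\SYSTEM\CurrentControlSet\Services\NTDS\Parameters`, which Microsoft's "Running Domain Controllers in Hyper-V" guidance describes for restoring a DC from a VHD copy when no system state backup is available; set in DSRM, it makes AD DS take a new invocationID on the next normal start. Treat that as something to verify against the guide. The Python helper `dsrm_restore_commands` is a hypothetical name; it only assembles the command lines for the sequence dqnet describes and executes nothing.

```python
from typing import List

# Registry key holding AD DS (NTDS) parameters on a domain controller.
NTDS_PARAMS = r"HKLM\SYSTEM\CurrentControlSet\Services\NTDS\Parameters"


def dsrm_restore_commands(already_in_dsrm: bool = True) -> List[str]:
    """Command lines for flagging a copied DC as restored (hypothetical helper).

    If already_in_dsrm is False, the list spans two boots: the first two
    commands make the next start go to DSRM; log on with the DSRM
    administrator account before running the rest.
    """
    cmds: List[str] = []
    if not already_in_dsrm:
        cmds += ["bcdedit /set {default} safeboot dsrepair", "shutdown /r /t 0"]
    # Mark the AD database as restored so AD DS takes a new invocationID
    # on the next normal start and re-replicates from its partners.
    cmds.append(
        f'reg add "{NTDS_PARAMS}" /v "Database restored from backup" '
        "/t REG_DWORD /d 1 /f"
    )
    if not already_in_dsrm:
        # Undo the safeboot setting so the next start is a normal one.
        cmds.append("bcdedit /deletevalue {default} safeboot")
    cmds.append("shutdown /r /t 0")
    return cmds


for line in dsrm_restore_commands(already_in_dsrm=False):
    print(line)
```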

    Unless I am understanding wrong, this is the question you are ultimately asking?

    Sunday, October 07, 2012 4:56 PM
  • @David Corbin

    > I have read this entire thread as this is the exact question that has been circling my head and I have been looking for an answer too.
    > Your questions / comments are exactly what I wanted answering and it seems the posts here can't seem to give a straight answer.

    I dunno - let's take the DC out of the equation since it's, to me, mostly a red herring given this SPECIFIC scenario:  the virtual machine/guest OS is POWERED OFF when the snapshot is taken.

    At that point, how is the HyperV snapshot different from any other backup?  That's where I would like to understand.  I can't see any way that it's different.  If I have to restore a DC from backup tape, to me the same procedures for re-syncing AD should apply if I restore from an old snapshot.  I think this is what the original poster was driving at (if I missed there, please correct me!)

    If there is no difference, then the fact that the guest OS is also a domain controller is extra unnecessary information.  If there is a difference, then what exactly are the differences between other backup solutions and a snapshot of a VM that is powered down from the perspective of the guest VM?  And finally, coming back to the DC question - is there anything in those differences that would affect a domain controller that are snapshot specific and not just part of the normal DC restore process?

    I totally get where Brian Ehlert is coming from on best practices, official guidance, etc.  But I think the fact that we are in this thread having a (hopefully) low-level discussion would indicate that much of that guidance is overly conservative and unnecessary :o)

    Monday, October 08, 2012 7:34 PM
  • > I dunno - let's take the DC out of the equation since it's, to me, mostly a red herring given this SPECIFIC scenario:  the virtual machine/guest OS is POWERED OFF when the snapshot is taken.

    That is my question in a nutshell, is "Never Snapshot Domain Controllers" a red herring? Everything I have found relates to issues with restoration of "old data", which is independent of the mechanism.

    Now as to "At that point, how is the HyperV snapshot different from any other backup?", there is one difference (which is important to me), and that is the amount of time it takes to make the backup. Consider a DC with a moderate [100GB] size disk in use. Making a copy of this disk will take some time. But creating a snapshot is virtually instant, as it does not involve any copying of the data, and the virtual machine can then be restarted. Once the snapshot is made, the now "immutable" base material can be copied to another site.
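    The downtime argument can be put into rough numbers. The throughput and timings here are assumptions for illustration (100 MB/s sustained copy, 60 s to shut down, 60 s to boot, 5 s to take the snapshot); the 100 GB disk is David's example.

```python
def copy_seconds(disk_gb: float, throughput_mb_s: float) -> float:
    """Time to copy a disk at a sustained throughput (decimal GB and MB)."""
    return disk_gb * 1000 / throughput_mb_s


def downtime_minutes(shutdown_s: float, offline_work_s: float, boot_s: float) -> float:
    """The VM is unavailable from shutdown until it has booted again."""
    return (shutdown_s + offline_work_s + boot_s) / 60


full_copy = downtime_minutes(60, copy_seconds(100, 100), 60)  # copy while off
snapshot = downtime_minutes(60, 5, 60)                        # snapshot while off
print(f"offline copy: {full_copy:.1f} min, snapshot: {snapshot:.1f} min")
# offline copy: 18.7 min, snapshot: 2.1 min
```

    Under these assumptions the snapshot approach keeps the VM offline for about two minutes, in line with the figure David gives later in the thread; the frozen base VHD is then copied while the VM runs again.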


    Microsoft ALM Ranger

    Monday, October 08, 2012 8:20 PM
  • My understanding is that the backup process is not the issue; it's the restore process.

    When you restore using 'application aware' backup software (like Windows Backup), once a DC is restored the backup software 'tells' the server that it has been restored, and as a result other DCs will be used to update the newly restored (out-of-date) DC.

    If one simply restores a snapshot without telling AD that it has been restored, all DCs will believe they have the most up-to-date USN and refuse to replicate with each other.

    Additionally, I would suggest that one NEVER restores a DC when other working DCs exist; simply perform a fresh install and run DCPROMO.

    Friday, February 08, 2013 11:11 AM
  • George,

    thanks for the response [although the issue has been quiet for months].

    Unfortunately, it (like most of the others here) has not answered questions related to the specific scenarios I am discussing.

    SCENARIO #1... There are many machines, including multiple DCs, that are being backed up. As part of either a real disaster or a simulation, it is necessary to restore the ENTIRE environment. What are the actual differences between the following two conditions:

        1) The machines are physical. Nightly they are shut down, and then drive image backups are taken. During the DR, copies of these drives will be used.
        2) The machines are virtual. Nightly they are shut down, and then snapshots are taken. During the DR, copies of these snapshots will be used to restore the VMs.
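    Condition 2 can be sketched as a command plan. This assumes the Hyper-V PowerShell module that ships with Windows Server 2012 (`Stop-VM`, `Checkpoint-VM`, `Start-VM`); on 2008 R2 hosts the same steps need other tooling. The Python helper `nightly_snapshot_plan` is hypothetical and only builds the command strings. The design point is that every VM is stopped before any snapshot is taken, so the whole set represents one powered-off point in time.

```python
from typing import List, Sequence


def nightly_snapshot_plan(vm_names: Sequence[str], snapshot_name: str) -> List[str]:
    """PowerShell commands: stop all VMs, then snapshot all, then start all."""
    plan: List[str] = []
    # Stop-VM asks the guest OS to shut down cleanly (no -TurnOff switch).
    plan += [f'Stop-VM -Name "{vm}"' for vm in vm_names]
    # Only once every guest is off: one snapshot per VM, same name for the set.
    plan += [f'Checkpoint-VM -Name "{vm}" -SnapshotName "{snapshot_name}"'
             for vm in vm_names]
    plan += [f'Start-VM -Name "{vm}"' for vm in vm_names]
    return plan


for command in nightly_snapshot_plan(["DC1", "DC2", "APP1"], "nightly-2013-02-08"):
    print(command)
```

    Given Brian's warning about differencing disk chains, a real job would also copy the frozen parent VHDs elsewhere and remove old snapshots (for example with `Remove-VMSnapshot`) rather than let them accumulate.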


    Microsoft ALM Ranger

    Friday, February 08, 2013 11:33 AM
  • In short,

    I don't think anything is wrong with that if ALL DCs are restored using snapshots from the same timeframe.

    Issues arise when one restores individual DCs from snapshots.

    What I personally do is attach an additional VHD to each DC, run a (guest-level) system state Windows Backup to that VHD, and then run snapshot-capable backup software to back up that DC (including the separate VHD).

    That way I could even restore an individual DC using the process below (although I still recommend a fresh install and DCPROMO if possible):

    1. Restore the entire VM but do NOT boot it up normally
    2. Boot the DC in DSRM and restore the system state backup.
    3. Away you go!

    I'm not even sure it's an issue if snapshots are taken while the server is online, as I believe it's solely the restore that can cause issues, although it's probably best to shut them down to be extra safe, as you suggest.

    Sorry for posting on an old thread. It's just something I've looked into recently!


    Friday, February 08, 2013 12:14 PM
  • The key here is this: snapshots (with Hyper-V) are NOT equivalent to backups.

    With VMware they are treated this way, but not with Hyper-V. That is the big key point.

    They are fine for short durations but were never intended for long-duration use, backup, or DR situations.


    Brian Ehlert
    http://ITProctology.blogspot.com
    Learn. Apply. Repeat.
    Disclaimer: Attempting change is of your own free will.

    Friday, February 08, 2013 2:36 PM
    Moderator
  • I'm not sure that is the key for the OP?

    I don't think (hope) anyone is recommending using snapshots as a backup method; I think the question is more a theoretical 'Why can't I snapshot DCs?'. At least that was the question I was trying to answer.

    Anyway, when using VM backup software which isn't application aware (Trilead VM Explorer, for example), the restore process is essentially like restoring a snapshot, so the question has some validity beyond curiosity.

    I speak from the VMware camp rather than Hyper-V, but I expect the points are still somewhat valid.

    Friday, February 08, 2013 3:49 PM
  • > The key here is this: snapshots (with Hyper-V) are NOT equivalent to backups.
    >
    > With VMware they are treated this way, but not with Hyper-V. That is the big key point.
    >
    > They are fine for short durations but were never intended for long-duration use, backup, or DR situations.

    Although tangential to the question posted...

    There are two methods of backup: internal to the VM and from the host. Let's consider the latter. If one does an export of the VM and then copies the fileset to another location, it can indeed be considered a backup [ignoring the DC issues that are the focus of the discussion]. These exports may be restored to completely different machines at any arbitrary point in time. If one goes a step further and considers a true imaging backup solution [where the computer's OS is shut down and a complete image of the HD is made], there is another valid scenario.

    The "problem" with both of these is that they take significant time during which the machine in question is not running. Shutting down a [Hyper-V] VM, taking a snapshot and then looking at only the base VHD is equivalent. [This has been tested many times, and the VHDs created by this process are identical to those from both of the methods mentioned above.] The advantage is that the downtime of the machine is often under 2 minutes [compared with up to an hour for other approaches].

    Given that the first two methods ARE backups, and the third method produces EXACTLY the same artifacts, I fail to see how it can NOT be considered equally valid.
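    A claim like "the VHDs are identical" is easy to check mechanically. A minimal sketch, assuming both artifacts are plain VHD files reachable from one machine (the function names are illustrative): compare sizes first, then SHA-256 hashes computed in chunks so multi-gigabyte files never have to be loaded whole. Note that a VHD footer carries its own timestamp and unique ID, so VHD files created independently can differ byte-for-byte even when the guest data match; a straight byte comparison is most meaningful for copies of the same file.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in 1 MiB chunks so it is never loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_contents(a: Path, b: Path) -> bool:
    """True if both files are byte-for-byte identical (size check first)."""
    if a.stat().st_size != b.stat().st_size:
        return False
    return file_sha256(a) == file_sha256(b)
```

    For example, `same_contents(Path(r"D:\export\DC1.vhd"), Path(r"E:\base\DC1.vhd"))`, where both paths are hypothetical.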


    Microsoft ALM Ranger

    Friday, February 08, 2013 7:19 PM