Planning for Failed C: Drive

  • Question

  • In a typical Hyper-V and DPM environment, the primary array holds the C: drive and, in our case, a V: drive with the VHDs. The backup drives hold the DPM storage pool and a WSB backup of the C: drive.
     
    In theory, if the primary drives fail, all data can be restored from the backup arrays.
     
    What is not clear is how to recover the most recent data:
    1. Although the VMs are synchronized every 15 minutes, a recovery point is only created at the end of the day. Is it possible to force the creation of a recovery point using only the synchronized data if the VHD fails on the primary array in the middle of the day?
    2. Since the WSB backup only happens once a day in the middle of the night, its DPM database will not know about any recovery points created after it is taken. If the DPM Database has to be restored from that WSB Backup, can the most recent replicas still be used somehow to recover the most recent data?
    3. In the event the DPM database is completely lost, is there a way for DPM to scour through the storage pool to rebuild the recovery point history?
     
    Thanks,
    Bob.





    • Edited by BobH2 Monday, November 28, 2011 12:54 PM
    Monday, November 28, 2011 12:49 PM


All replies

  • Hi,

    See [MJ] inline comments.

    <snip>
    What is not clear is how to recover the most recent data:

    1. Although the VMs are synchronized every 15 minutes, a recovery point is only created at the end of the day. Is it possible to force the creation of a recovery point using only the synchronized data if the VHD fails on the primary array in the middle of the day?

      [MJ] You should be protecting the VMs using host level backups, not file system backups.  Host level backups are express full backups and are fully restorable, and you can take several a day if needed.  If you need to protect data inside a guest more often than that, just install an agent inside the guest and protect the data in a separate protection group.
    2. Since the WSB backup only happens once a day in the middle of the night, its DPM database will not know about any recovery points created after it is taken. If the DPM Database has to be restored from that WSB Backup, can the most recent replicas still be used somehow to recover the most recent data?

      [MJ]  You should protect the DPMDB as a SQL workload, and you can do several express full backups per day.  Should the DB get corrupted or you lose the C: drive, you can get the latest copy directly off the replica volume.  

    3. In the event the DPM database is completely lost, is there a way for DPM to scour through the storage pool to rebuild the recovery point history?
      [MJ] No - DPM needs a database to recover; otherwise it starts over completely from scratch. So take the advice of setting up a dedicated PG just for the DPMDB and make frequent recovery points.

     

    >snip<


    Regards, Mike J. [MSFT] This posting is provided "AS IS" with no warranties, and confers no rights.
    Monday, November 28, 2011 10:27 PM
    Moderator
  • Mike,

    Thanks for your response, but I must be missing something.

    VM Host Level Backup
    I haven't been able to find a proper definition of "host level" backup as it applies specifically to VM's although the term seems to be used all over the place. Could you please offer one?

    I assume PG is Protection Group.

    SQL and Exchange
    We already do explicit SQL and Exchange backups on each server where applicable, including the DPM host server.

    What's weird about this is that the Exchange and SQL backups create recovery points every 15 minutes, but the SQL DPMDB database is only backed up once per day. I assume that means it's tied to the "Application Recovery Point". Why's that? I guess that's why you suggested a separate PG running multiple Application Recovery Points several times per day just for DPMDB? Can't the DPMDB backups be made to work at the same frequency as other SQL loads without creating a different PG?

     

    DPMDB Recovery
    The problem remains though, even if we run multiple backups during the day with a separate PG, the DPM database is in the Storage Pool and we can't recover it until we have the DPM database recovered! So this leaves either a WSB backup or a dpmbackup.exe backup somewhere. So I guess a more specific question would be, how do we get the most recent DPM database out of the storage pool if we only have a storage pool backup and lose the C: drive?

    And does it matter? In other words, is it possible to somehow restore the last replica of the DPM database without having the latest DPM database available?

    It doesn't sound like it is possible, even with a separate PG. Sounds like we need to take a dpmbackup.exe backup every 15 minutes (to match the latest Exchange and SQL backups) and save it on a Storage Pool disk partition. Did I miss something?

     

    VM Backups
    Agents are installed in all VMs. We are using "Backup using Child Partition Snapshot". They seem to back up and restore VHDs with multiple partitions just fine.

    If by Host level, you mean adding each VM individually, as if it were just a machine on the network, instead of Child Partition Backups, we have tried these. To restore them, the VHDs have to be created manually, and mounted locally on the DPM/Hyper-V server to restore. This is far more complicated (have to track all VHD files, their sizes, partitions, partition cluster sizes and alignments, etc. manually). PLUS, DPM warns that a separate System State backup is required, which increases backup size.

    There isn't enough room on the tape to store more than one backup per VM. (Multiple tapes is not an option.)

    This doesn't seem to be a good option. I must be missing something.

    Thanks again,
    Bob.



    • Edited by BobH2 Wednesday, November 30, 2011 1:55 AM
    Tuesday, November 29, 2011 11:22 PM
  • Hi,

    I will do my best to answer these questions in order.

    Yes, PG = Protection Group. Sorry for not clarifying that, just a habit  8-). 

    VM Host Level Backup
    <snip>
    We are using "Backup using Child Partition Snapshot". They seem to backup and restore VHDs with multiple partitions just fine.
    >snip<

    [MJ] OK - these are referred to as "host level" backups because the DPM agent is installed on the Hyper-V host and is protecting the virtual machines.  I was confused by your statement that said:

    "Although the VMs are synchronized every 15 minutes, a recovery point is only created at the end of the day. Is it possible to force the creation of a recovery point using only the synchronized data if the VHD fails on the primary array in the middle of the day?"   

    [MJ] Hyper-V host level backups are always express full backups, so they do not get synchronized every 15 minutes.  So, to protect yourself, you may want to consider more frequent express full backups.

    SQL and Exchange
    <snip>
    What's weird about this is that the Exchange and SQL backups create recovery points every 15 minutes, but the SQL DPMDB database is only once per day. I assume that that means its tied to the "Application Recovery Point" Why's that? I guess that's why you suggested a separate PG running multiple Application Recovery Points several times per day just for DPMDB? Can't the DPMDB backups be made to work at the same frequency as other SQL loads without creating a different PG?
    >snip<

    [MJ] The DPM SQL DB is in simple recovery mode, meaning we cannot take incremental backups of it. Only express full (EF) backups are possible when a DB is in simple recovery mode.  This is why I recommended a separate PG, so you could take more frequent EF backups. 

    DPMDB Recovery
    Windows supports 512 application snapshots, and every EF backup of the DPMDB uses one snapshot.  This means you could take an EF every 15 minutes, use 96 shadows per day, and have 5 days of protection.  In my opinion that is overkill, but if you don't want to risk being unable to restore the latest Exchange data that is synchronized every 15 minutes, then that is what would be required.

    Now, the absolute last express full would be on the replica volume, like you stated. Should the C: drive fail, you can rebuild the O.S., import the dynamic disk(s), install DPM, install the DPM QFE, then manually assign a drive letter to the replica volume containing the DPMDB and copy the .mdf and .ldf files out to c:\temp.  Then run dpmsync -restoredb -dbloc=c:\temp\msdpm2010.mdf, followed by dpmsync -sync. That will get you back to where you were before the C: drive failure.

    The worst case is that the C: drive failed in the middle of an express full, and the DB on the replica is now inconsistent.  In that case, you would need to use diskshadow.exe to mount a previous shadow copy of the replica volume containing the DPMDB and copy the DPMDB files out of the shadow copy.
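The snapshot arithmetic in this reply can be checked with a few lines of Python. This is only a sketch of the calculation as described here (one shadow copy per express full, a budget of 512); the function names are made up for illustration and nothing in it talks to DPM.

```python
MINUTES_PER_DAY = 24 * 60


def snapshots_per_day(interval_minutes: int) -> int:
    """Number of express full backups taken per day at a fixed interval."""
    return MINUTES_PER_DAY // interval_minutes


def retention_days(snapshot_limit: int, interval_minutes: int) -> int:
    """Whole days of history that fit in the snapshot budget,
    assuming every express full consumes exactly one snapshot."""
    return snapshot_limit // snapshots_per_day(interval_minutes)


if __name__ == "__main__":
    # 512 is the snapshot limit quoted in the post; intervals are examples.
    for interval in (15, 30, 60):
        print(interval, snapshots_per_day(interval), retention_days(512, interval))
```

At a 15-minute interval this gives 96 snapshots per day and 5 whole days of history, which matches the figures in the post.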

    The only thing you need ahead of time is the volume GUID of the replica volume, so you know which volume to assign the drive letter to.  You can get the path from the "path to replica" link for that data source under the Protection tab, then run mountvol.exe and match the path to get the volume GUID.  Then use mountvol.exe to assign it a drive letter:  mountvol X: \\?\Volume{GUID}\
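Matching a replica path to its volume name can be scripted against a saved copy of the mountvol listing. The sketch below assumes the usual listing layout (a `\\?\Volume{GUID}\` line followed by indented mount points); the sample text, GUIDs, and replica folder name are invented for illustration, and the helper names are hypothetical.

```python
import re
from typing import Dict, List, Optional

# Matches a full volume name line such as \\?\Volume{...}\
VOLUME_RE = re.compile(r"^\\\\\?\\Volume\{[0-9A-Fa-f-]+\}\\$")

# Illustrative listing only: GUIDs and the replica folder are made up.
SAMPLE_OUTPUT = r"""
    \\?\Volume{11111111-1111-1111-1111-111111111111}\
        C:\

    \\?\Volume{22222222-2222-2222-2222-222222222222}\
        C:\Program Files\Microsoft DPM\DPM\Volumes\Replica\ExampleDataSource\

    \\?\Volume{33333333-3333-3333-3333-333333333333}\
        *** NO MOUNT POINTS ***
"""


def parse_mountvol(output: str) -> Dict[str, List[str]]:
    """Map each volume name to the mount points listed beneath it."""
    volumes: Dict[str, List[str]] = {}
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        if VOLUME_RE.match(line):
            current = line
            volumes[current] = []
        elif current is not None and line and not line.startswith("***"):
            volumes[current].append(line)
    return volumes


def find_volume(volumes: Dict[str, List[str]], mount_path: str) -> Optional[str]:
    """Return the volume name mounted at mount_path (case-insensitive)."""
    wanted = mount_path.rstrip("\\").lower()
    for name, mounts in volumes.items():
        if any(m.rstrip("\\").lower() == wanted for m in mounts):
            return name
    return None


if __name__ == "__main__":
    vols = parse_mountvol(SAMPLE_OUTPUT)
    path = r"C:\Program Files\Microsoft DPM\DPM\Volumes\Replica\ExampleDataSource"
    print(find_volume(vols, path))
```

Recording the result ahead of time, for example in a text file kept off the C: drive, is what makes the later drive-letter assignment possible.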

    OR - Like you stated, it might be just as easy to schedule a dpmbackup -db and then copy it to a network share someplace every 15 minutes, but that would take extra disk space if you want unique versions of the DB - maybe the last 8 hours' worth or something like that.   
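Keeping "the last 8 hours' worth" of database copies amounts to a timestamped copy followed by pruning. A minimal sketch, assuming the backup file has already been produced by some other step; the file name `DPMDB.bak`, the archive folder, and the function name are hypothetical, and the script does not run dpmbackup itself.

```python
import shutil
from datetime import datetime, timedelta
from pathlib import Path

STAMP_FORMAT = "%Y%m%d-%H%M%S"


def archive_db_backup(source: Path, archive_dir: Path, now: datetime,
                      keep: timedelta = timedelta(hours=8)) -> Path:
    """Copy source into archive_dir under a timestamped name, then delete
    archived copies whose timestamp is older than the keep window."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / f"{source.stem}-{now.strftime(STAMP_FORMAT)}{source.suffix}"
    shutil.copy2(source, target)

    cutoff = now - keep
    for old in archive_dir.glob(f"{source.stem}-*{source.suffix}"):
        stamp = old.stem[len(source.stem) + 1:]
        try:
            taken = datetime.strptime(stamp, STAMP_FORMAT)
        except ValueError:
            continue  # not one of our archive files, leave it alone
        if taken < cutoff:
            old.unlink()
    return target
```

With a 15-minute schedule and an 8-hour window this keeps roughly 32 copies, so the disk space needed is about 32 times the size of one database backup.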

    Hopefully I covered all your questions.


    Regards, Mike J. [MSFT] This posting is provided "AS IS" with no warranties, and confers no rights.
    Wednesday, November 30, 2011 12:57 PM
    Moderator
  • Mike,

    Thanks for the info. I also took a good look at the Modify Protection Group dialogs and reviewed customer systems.

    So I guess there are three frequencies at work here:

    1. Synchronizations that happen every 15 minutes. Exchange will also be triggered to create recovery points if ESE.DLL and ESEUTIL.exe are installed correctly on the DPM server. SQL can also create recovery points during synchronizations if we turn off simple recovery mode. (I found two customers with SQL 2008 R2 Express that I would have thought were installed exactly the same way. One is getting backups once a day, the other every 15 minutes. If you have a handy link for how to adjust that I’d appreciate it - otherwise I’ll search for it.)
    2. Express Full that occurs once per day, by default, and takes care of Microsoft database loads such as SQL and Exchange, and VM’s.
    3. File Recovery Points that occur three times per day, by default, and work on Volumes and Folders on both the Host and VM’s. 

    In terms of preparing for a failed C: drive and/or VHD partition, I think the simplest approach would be the following:

    1. Run a DPMDB backup to a known partition on the backup array as often as every 15 minutes in order to always be able to restore the very last Exchange and SQL backup. We can keep say, the last five backups. I’ll experiment with the performance of the machines and see if this is doable. (This is absolutely critical to being able to restore the latest recovery points in the event of a failure of the C:\ Drive unless the backup frequency of DPMDB is increased, either with more frequent Express Full backups or turning off simple recovery mode somehow.)
    2. It would be prudent to record the GUID of the DPMDB replica ahead of time just in case. I am going to practice recovering the DPM DB in this manner too.
    3. Consider adding critical VM folders to the File Backups, so they are backed up three times a day instead of one time per day.
    4. Experiment with running more frequent Express full backups

     

    Excellent! Thank you Mike I think I’ve got it now.

    Now back to one of my original questions - I'll try to be more specific: Since the Storage Pool has accumulated synchronizations every 15 minutes during the day, and since in theory, we can now restore the latest copy of the DPM database, is there any way to force the creation of a recovery point, even if the original VHD is lost or corrupted? One of the options in File Backups for example is to create a Recovery Point without Synchronization. If this can work for file synchronizations, might there be some way to impose this on a VHD?

    Thanks again,
    Bob.

     

    Thursday, December 1, 2011 1:12 AM
  • Hi,

    Here is a link on how to change a SQL database from the simple recovery model, which only supports express full backups, to the full recovery model, which supports incremental (sync) backups.

    Changing the Recovery Model of a Database
    http://technet.microsoft.com/en-us/library/bb808756.aspx

    <snip>
     Since the Storage Pool has accumulated synchronizations every 15 minutes during the day, and since in theory, we can now restore the latest copy of the DPM database, is there any way to force the creation of a recovery point, even if the original VHD is lost or corrupted?
    >snip<

    Remember, virtual machine backups using Child Partition Snapshot are always express full backups and therefore always have the latest data after one is completed.  No intermediary synchronizations occur.  You will notice that when you right-click a VM data source, your only option is to make an express full backup; there is no option to synchronize like there is for file data.

     


    Regards, Mike J. [MSFT] This posting is provided "AS IS" with no warranties, and confers no rights.
    Thursday, December 1, 2011 1:36 AM
    Moderator
  • Mike,

    I think I noticed at one point that VM’s are not synchronized during the day. I have etched that into my understanding now.

    I have also set up a batch file to run on customers’ systems to create a text file with a list of the mount points and to create a DPMDB backup. It seems to execute in a range of 15 seconds to 1 minute. I’ve skewed the start time to start 12 minutes after a sync to be sure the sync is finished before the backup is taken.  I’ll monitor the times a bit to maybe reduce the skew.
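The skewed schedule described here can be written down as a small calculation. This sketch only computes the run times (sync interval of 15 minutes, skew of 12 minutes, as in the post); the function name and the starting time are illustrative, and the actual scheduling mechanism is left out.

```python
from datetime import datetime, timedelta
from typing import List


def backup_times(first_sync: datetime, count: int,
                 sync_interval: timedelta = timedelta(minutes=15),
                 skew: timedelta = timedelta(minutes=12)) -> List[datetime]:
    """Return `count` backup start times, each `skew` after a sync start.
    The skew must be shorter than the interval so a backup never
    overlaps the start of the next sync."""
    if not timedelta(0) <= skew < sync_interval:
        raise ValueError("skew must be non-negative and shorter than the sync interval")
    return [first_sync + i * sync_interval + skew for i in range(count)]


if __name__ == "__main__":
    for t in backup_times(datetime(2011, 12, 1, 9, 0), count=4):
        print(t.strftime("%H:%M"))
```

With a backup that takes up to a minute, a 12-minute skew leaves about two minutes before the next sync begins, which is the margin to watch when reducing the skew.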

    The SQL Database was already coded correctly, according to your link. If I can’t get that going I will follow up with a separate post.

    I figured out how to use mountvol information to view the contents of a specific replica. It looks like the replicas actually contain exact, well, replicas!

    Question: If the DPM server crashes during synchronization, Express Full, etc., I would imagine that the replica would then be in a crash consistent state. That is, part of the replica would contain old data, and part would contain new data. So pulling files out of a replica in that state should only be used in a worst case scenario. For example if someone had finished an hour’s work on a file, the server crashed, and we had to attempt to find the file in the replica manually, the file may or may not be consistent.

    This could be true for volume or folder data, SQL and Exchange data, and Hyper-V data. Even running a Create Recovery Point without synchronizing would not help.

    However, if the DPMDB database is intact, it should be able to restore the last recovery point.

    Am I correct so far?

    So another original question again, and this time more specifically worded: If the very last DPMDB is not available, and the replicas were undergoing a synchronization or creation of a recovery point, can that last available DPMDB (that was backed up prior to the sync) restore at least the previous recovery point reliably? In other words, if a DPMDB database does not know about a subsequent sync or recovery point, can it (a previous DPMDB) still be used to recover data reliably from replicas that underwent syncs and recoveries that it doesn't know about?

    Thanks again,
    Bob.

    Thursday, December 1, 2011 6:11 PM
  • Hi,

    Yes, you are correct. If a synchronization (or even an express full) fails for any reason, the replica is left in an inconsistent state until a consistency check is run.  So, if the DPM server dies unexpectedly while data sources are being updated by any type of backup, you would not want to restore data directly from the replica, but from a shadow copy.

    Yes, you can recover using an older DPMDB, but as noted, the only recovery points available to restore from will be those that existed at the time the DPMDB was backed up.  So if you take a DPMDB backup today at 12:00 PM, the DPM server dies tomorrow at 10:00 AM, and you then rebuild the DPM server and restore the DPMDB from the 12:00 PM backup, all recovery points created after 12:00 PM today will not be available for recovery from the DPM UI. 

    See: http://technet.microsoft.com/en-us/library/ff399388.aspx
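The 12:00 PM example can be modelled as a simple filter on timestamps. This is a simplified illustration of the rule stated in this reply, not how DPM stores its catalog; the recovery point times are invented.

```python
from datetime import datetime
from typing import List


def visible_recovery_points(recovery_points: List[datetime],
                            db_backup_time: datetime) -> List[datetime]:
    """Recovery points a restored DPMDB would still know about:
    only those created at or before the time the DB was backed up."""
    return sorted(rp for rp in recovery_points if rp <= db_backup_time)


if __name__ == "__main__":
    points = [
        datetime(2011, 12, 1, 8, 0),   # before the DB backup
        datetime(2011, 12, 1, 20, 0),  # after the DB backup
        datetime(2011, 12, 2, 8, 0),   # after the DB backup
    ]
    db_backup = datetime(2011, 12, 1, 12, 0)
    for rp in visible_recovery_points(points, db_backup):
        print(rp)
```

In this model the two later recovery points drop out of the DPM UI after the restore, which is the reason for backing up the DPMDB as often as the most frequently protected workload.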

     


    Regards, Mike J. [MSFT] This posting is provided "AS IS" with no warranties, and confers no rights.
    Thursday, December 1, 2011 6:43 PM
    Moderator
  • Mike,

    I appreciate the detailed answers. I think I’m getting there.

    From some more reading about this, I gather that synchronization data is copied to somewhere other than the replica at first, but during the creation of a recovery point, it is merged into the replica, so the replica is a true replica. During the merge, the changed replica data from the previous good replica, is stored as a shadow copy somewhere.

    I reviewed the volumes listed by mountvol and I looked in the storage pool partitions via C:\Program Files\Microsoft DPM\DPM\Volumes. I did see at one point, some smaller copies of some of the protected data files. So my next question is, are DPM recovery points and their shadow copies a truly DPM thing, and is DPM the only entity that can pull shadow copy replicas out? In other words, in a worst case scenario, and we’re assuming a replica is inconsistent, is there a manual technique that could go around DPM and access its recovery point shadow copies?

    Thanks,
    Bob. 

    Friday, December 2, 2011 2:55 AM
  • Hi,

    There is no "temporary" storage for synchronizations, they get applied directly to the replica volume.

    DPM uses shadow copies for recovery points, and they are accessible in two ways outside of DPM.

    1) For file server data (volumes / shares), they are accessible by sharing out the replica volume and then accessing it via the network share; the "Previous Versions" tab will expose the shadow copies.

    2) For application data (SharePoint, SQL, Exchange, Hyper-V), the shadow copies can be mounted with the diskshadow.exe utility, using the shadow copy ID from the "vssadmin list shadows" output.
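Picking the right shadow copy ID out of a long "vssadmin list shadows" listing can be automated. The sketch below works on saved output and assumes the usual layout, where each "Shadow Copy ID:" line is followed by an "Original Volume:" line; the sample text and all GUIDs are invented, and the function name is hypothetical. It only extracts IDs and does not call diskshadow.

```python
import re
from typing import List

SHADOW_ID_RE = re.compile(r"Shadow Copy ID:\s*(\{[0-9A-Fa-f-]+\})")
VOLUME_NAME_RE = re.compile(r"\\\\\?\\Volume\{[0-9A-Fa-f-]+\}\\")

# Illustrative listing only: every GUID and path here is made up.
SAMPLE_OUTPUT = r"""
Contents of shadow copy set ID: {aaaaaaaa-0000-0000-0000-000000000001}
   Contained 1 shadow copies at creation time: 12/1/2011 8:00:00 PM
      Shadow Copy ID: {bbbbbbbb-0000-0000-0000-000000000001}
         Original Volume: (C:\Example\Replica\)\\?\Volume{22222222-2222-2222-2222-222222222222}\
         Shadow Copy Volume: \\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy1

Contents of shadow copy set ID: {aaaaaaaa-0000-0000-0000-000000000002}
   Contained 1 shadow copies at creation time: 12/1/2011 8:05:00 PM
      Shadow Copy ID: {bbbbbbbb-0000-0000-0000-000000000002}
         Original Volume: (C:)\\?\Volume{11111111-1111-1111-1111-111111111111}\
         Shadow Copy Volume: \\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy2

Contents of shadow copy set ID: {aaaaaaaa-0000-0000-0000-000000000003}
   Contained 1 shadow copies at creation time: 12/2/2011 8:00:00 PM
      Shadow Copy ID: {bbbbbbbb-0000-0000-0000-000000000003}
         Original Volume: (C:\Example\Replica\)\\?\Volume{22222222-2222-2222-2222-222222222222}\
         Shadow Copy Volume: \\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy3
"""


def shadow_ids_for_volume(output: str, volume_name: str) -> List[str]:
    """Return shadow copy IDs, in listing order, whose original volume
    matches volume_name (a \\?\Volume{GUID}\ name)."""
    wanted = volume_name.lower()
    ids: List[str] = []
    current_id = None
    for line in output.splitlines():
        id_match = SHADOW_ID_RE.search(line)
        if id_match:
            current_id = id_match.group(1)
            continue
        if "Original Volume:" in line and current_id is not None:
            names = VOLUME_NAME_RE.findall(line)
            # The volume name is the last \\?\Volume{...}\ on the line.
            if names and names[-1].lower() == wanted:
                ids.append(current_id)
            current_id = None
    return ids
```

The volume name passed in would be the replica volume's name as found with mountvol, so the two listings can be cross-referenced before anything is mounted.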


    Regards, Mike J. [MSFT] This posting is provided "AS IS" with no warranties, and confers no rights.
    Friday, December 2, 2011 3:08 AM
    Moderator
  • Mike,

    Ha! That’s beautiful.

    Ok, so to summarize:

    Even if we do not have the DPM database available, we can still get access to the replicas, which we cannot count on being consistent, but we might attempt to use to retrieve some types of critical files. We always have access to the shadow copies from previous recovery points and they will always be consistent.

    If we do have a current copy of the DPM database, we can use DPM to perform recoveries, which is just easier.

    So some more questions:

    The “Replica” volumes seem to be the only ones we need. Would we have any use for the “DiffArea” volumes in a manual recovery situation?

    Similarly, would there be any use for the “Incremental” folders in a manual recovery of Application Shadows?

    Thanks,
    Bob. 
    Friday, December 2, 2011 3:11 PM
  • <snip>
    Q1) The “Replica” volumes seem to be the only ones we need. Would we have any use for the “DiffArea” volumes in a manual recovery situation?

     

    A1) Both volumes are required to be available (online) in order to recover data from a shadow copy. The DiffArea volume holds the previous version data, and without it, the replica volume will be taken offline for shadow copy protection mode and you will not have access to even the replica data.

     

    Q2) Similarly, would there be any use for the “Incremental” folders in a manual recovery of Application Shadows?

     

    A2) The incremental folders are required if you need to bring a database up to its current state. Basically, you need the last express full backup of the database, plus all incremental data taken since then, applied to that DB to bring it to the latest state.
    >snip<
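The rule in A2 (last express full plus every incremental after it) can be expressed as a selection over a backup history. This is a conceptual sketch only; the `Backup` type and the `kind` labels are hypothetical and do not reflect how DPM names or stores these items.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class Backup:
    taken: datetime
    kind: str  # "express_full" or "incremental" (labels are illustrative)


def restore_chain(backups: List[Backup]) -> List[Backup]:
    """Return the latest express full followed by all later incrementals,
    in the order they would have to be applied."""
    ordered = sorted(backups, key=lambda b: b.taken)
    full_positions = [i for i, b in enumerate(ordered) if b.kind == "express_full"]
    if not full_positions:
        raise ValueError("no express full backup available to start from")
    return ordered[full_positions[-1]:]
```

Anything taken before the last express full is not needed for a restore to the latest state, while losing any incremental after it breaks the chain at that point.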

     


    Regards, Mike J. [MSFT] This posting is provided "AS IS" with no warranties, and confers no rights.
    Friday, December 2, 2011 6:08 PM
    Moderator