Storage pool in iSCSI SAN lost when synchronising one particular data set

  • Question

  • DPM is working fine and backs up many shares, disks and VMs. However, the backup of one particular disk of one particular virtual machine stopped a few weeks ago, when the iSCSI storage pool (Synology SAN) was disconnected. The SAN needed a "hard reset" before the iSCSI connection would work again. Everything then works fine until I run a consistency check of the disk in question. After 1 hour 35 minutes (just after the consistency check completes, as confirmed by the "resolved warning alert"), the SAN is disconnected again, and I need to hard-reset it, etc.

    I can repeat this as many times as I want, and the result is always the same. I tried removing the data set from the protection group and adding it back, and even migrated the storage pool to a new SAN (by moving the disks), but the result is always the same.

    Did anyone experience a similar problem and solved it? Other suggestions?


    • Edited by sewellia Saturday, June 23, 2012 10:42 PM
    Saturday, June 23, 2012 10:35 PM

Answers

  • It was a problem triggered by the replica of the disk in question. I deleted the replica and lost my recovery point along the way, but now that I have built a new replica, everything works fine again :-)

    • Marked as answer by sewellia Wednesday, June 27, 2012 10:59 PM
    Wednesday, June 27, 2012 10:59 PM

All replies

  • Hi,

    Trying to understand the configuration: can you confirm my belief that you are protecting a virtual machine by installing a DPM agent inside the guest, then protecting a volume inside the guest? That protected volume is an iSCSI-attached SAN disk, and after a successful synchronization or consistency check, the iSCSI-attached SAN disk gets disconnected from the guest, and the only way to get it reconnected is to reset the SAN?

    Are there any event messages inside the guest, in either the System or Application event log, detailing problems with the iSCSI-attached SAN disk?

    If you run vssadmin list providers from an administrative command prompt inside the guest, is more than one provider listed?


    Please remember to click “Mark as Answer” on the post that helps you, and to click “Unmark as Answer” if a marked post does not actually answer your question. This can be beneficial to other community members reading the thread. Regards, Mike J. [MSFT] This posting is provided "AS IS" with no warranties, and confers no rights.

    Sunday, June 24, 2012 4:04 PM
    Moderator
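Mike's vssadmin check can be scripted. A minimal sketch, assuming the English-language output format in which each registered provider appears on a line beginning with "Provider name:" followed by the quoted name. The function names are hypothetical, run_vssadmin only works on Windows from an elevated prompt, and more than one provider only tells you a hardware or third-party provider is registered, not that it is at fault.

```python
import re
import subprocess

# Matches lines such as: Provider name: 'Microsoft Software Shadow Copy provider 1.0'
# (assumes English-language vssadmin output)
PROVIDER_LINE = re.compile(r"^[ \t]*Provider name:[ \t]*'(?P<name>[^']+)'", re.MULTILINE)


def list_vss_providers(vssadmin_output: str) -> list[str]:
    """Return the provider names found in 'vssadmin list providers' text output."""
    return [m.group("name") for m in PROVIDER_LINE.finditer(vssadmin_output)]


def has_multiple_providers(vssadmin_output: str) -> bool:
    """True when more than one VSS provider is registered."""
    return len(list_vss_providers(vssadmin_output)) > 1


def run_vssadmin() -> str:
    """Run 'vssadmin list providers' (Windows, elevated prompt) and return stdout."""
    result = subprocess.run(["vssadmin", "list", "providers"],
                            capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    output = run_vssadmin()
    for name in list_vss_providers(output):
        print(name)
    print("more than one provider:", has_multiple_providers(output))
```

Parsing is kept separate from running the command so the parser can be checked against saved output from another machine.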
  • Hi Mike, 

    No, the situation is a bit different. 

    I use a Synology SAN connected (by iSCSI) to the DPM server as my storage pool.

    The virtual machine is protected by a DPM agent installed inside the guest system, which protects all 4 volumes inside the guest.

    Volumes C, F and G are protected, but volume H is not protected because it is in an inconsistent state. When I run the consistency check, the STORAGE POOL of the DPM server is lost as soon as the consistency check completes.

    The VM and all volumes remain active but the DPM server is effectively stopped (all volumes, all VMs) due to "Missing disk" status for the storage pool. 

    I hope this explains the situation a bit better. Thanks for your consideration.

    \jos

    Sunday, June 24, 2012 8:44 PM
  • Hi,

    OK, so the whole DPM storage pool dynamic disk goes missing. If you go into Windows Disk Management, is the dynamic disk present? What about in Device Manager: is the physical disk present? What about in the iSCSI Initiator: does the disk show as connected? Do you have a dedicated NIC for the iSCSI-attached SAN? Are there any events in the System or Application event logs on the DPM server detailing a disk disappearing?



    Sunday, June 24, 2012 11:58 PM
    Moderator
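The event-log question above can also be answered from a command prompt with the built-in wevtutil tool instead of Event Viewer. A minimal sketch that builds such a query, assuming the event source name iScsiPrt as it appears in the log jos posts in the next reply; the helper names, the default count of 50 and the choice of the System log are illustrative assumptions. The XPath filter has the same shape Event Viewer's "Filter Current Log" dialog produces.

```python
import subprocess


def build_event_query(providers: list[str], log: str = "System", count: int = 50) -> list[str]:
    """Build a wevtutil command line returning the newest `count` events in `log`
    raised by any of the given providers (event sources)."""
    if not providers:
        raise ValueError("at least one provider name is required")
    if any("'" in p for p in providers):
        raise ValueError("provider names must not contain quotes")
    names = " or ".join(f"@Name='{p}'" for p in providers)
    xpath = f"*[System[Provider[{names}]]]"
    # qe = query events; /rd:true returns newest first; /f:text gives readable output
    return ["wevtutil", "qe", log, f"/q:{xpath}", f"/c:{count}", "/rd:true", "/f:text"]


def query_events(providers: list[str], log: str = "System", count: int = 50) -> str:
    """Run the query on the local Windows machine and return its text output."""
    cmd = build_event_query(providers, log, count)
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


if __name__ == "__main__":
    print(" ".join(build_event_query(["iScsiPrt"])))
```

The same helper can be pointed at other event sources by passing a different list of names.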
  • Hi Mike

    Correct. The entire storage pool disk goes missing, from the DPM server console, from device manager and from disk management.

    The iSCSI Initiator shows "inactive", and when I try to connect, it returns "Failed".

    After a hard reset of the iSCSI disk, everything is OK again. All synchronizations, recovery points and consistency checks go as planned as long as I do not start a new consistency check of the particular volume that causes the problem.

    From the system event log:

    Error 24.06.2012 00:12:55 iScsiPrt 9 None Target did not respond in time for a SCSI request. The CDB is given in the dump data.
    Error 24.06.2012 00:12:55 iScsiPrt 49 None Target failed to respond in time to a Task Management request.
    Error 24.06.2012 00:12:35 iScsiPrt 39 None Initiator sent a task management command to reset the target. The target name is given in the dump data.
    Warning 24.06.2012 00:12:35 iScsiPrt 129 None "The description for Event ID 129 from source iScsiPrt cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer. If the event originated on another computer, the display information had to be saved with the event.
    The following information was included with the event: 
    \Device\RaidPort4

    the message resource is present but the message is not found in the string/message table"

    -jos

    Monday, June 25, 2012 6:54 AM
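Event Viewer lists the newest events first, so the entries above read backwards. A small sketch that parses lines in the pasted format (level, date, time, source, event ID, category, message) and lists the iScsiPrt events oldest first. It assumes the dd.mm.yyyy date format shown and single-line entries; the descriptions are copied from the messages in the pasted log itself, and 129 is left undescribed because the log did not include its text. The class and function names are hypothetical.

```python
import re
from dataclasses import dataclass
from datetime import datetime

LINE = re.compile(
    r"^(?P<level>Error|Warning|Information)\s+"
    r"(?P<stamp>\d{2}\.\d{2}\.\d{4}\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?P<source>\S+)\s+(?P<event_id>\d+)\s+(?P<category>\S+)\s+(?P<message>.*)$"
)

# Meanings taken from the pasted log's own message text.
ISCSIPRT_EVENTS = {
    9: "target did not respond in time for a SCSI request",
    39: "initiator sent a task management command to reset the target",
    49: "target failed to respond in time to a task management request",
    129: "(no description in the pasted log)",
}


@dataclass
class Event:
    level: str
    when: datetime
    source: str
    event_id: int
    message: str


def parse_events(text: str) -> list[Event]:
    """Parse pasted Event Viewer rows; lines that do not match are skipped."""
    events = []
    for raw in text.splitlines():
        m = LINE.match(raw.strip())
        if not m:
            continue  # e.g. continuation lines such as \Device\RaidPort4
        stamp = " ".join(m["stamp"].split())
        events.append(Event(
            level=m["level"],
            when=datetime.strptime(stamp, "%d.%m.%Y %H:%M:%S"),
            source=m["source"],
            event_id=int(m["event_id"]),
            message=m["message"],
        ))
    events.reverse()  # Event Viewer shows newest first; read oldest first instead
    return events


def iscsi_timeline(events: list[Event]) -> list[str]:
    """One line per iScsiPrt event: time, ID and meaning."""
    return [
        f"{e.when:%H:%M:%S} {e.event_id}: {ISCSIPRT_EVENTS.get(e.event_id, e.message)}"
        for e in events
        if e.source == "iScsiPrt"
    ]
```

Run on the four entries above, this reads as the initiator resetting the target at 00:12:35 and the target still not answering 20 seconds later, which fits the view that the SAN, not DPM, stopped responding.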
  • Hi,

    OK, DPM is the victim, not the cause, of the iSCSI timeout. You will need to work with the SAN vendor to take an iSCSI trace to determine why the device is not responding.



    Monday, June 25, 2012 6:34 PM
    Moderator
  • Hi,

     

    Thanks for the update, but I suspect you may run into the issue again down the road. If so, work with the storage vendor, because no disk I/O issued by any application should make storage disappear from the server.



    Wednesday, June 27, 2012 11:29 PM
    Moderator
  • Thanks very much for posting this; we appear to be facing exactly the same problem. Do you mind me asking what type of iSCSI NAS/SAN target you have? We are using a QNAP TS-879U-RP with firmware 3.6.1 Build 0302T. The NAS seems to go AWOL at this point; we can't even restart it remotely and have to physically power-cycle it. It had been working fine for weeks, and then this suddenly happened. A couple of protection groups could be suspect...
    Wednesday, August 1, 2012 7:16 AM
  • Apologies, I read a bit closer and see that you "use an Synology SAN connected (by iSCSI) to the DPM as my storage pool."
    Wednesday, August 1, 2012 8:09 PM
  • I have the same exact situation.
    I too use a Synology SAN, and any time the connection to the iSCSI SAN is lost, whether because the SAN rebooted, the server lost network connectivity, or the server was simply rebooted, it seems to cause corruption and fail all my protection groups.

    Recreating the replicas is not an option for me because of how much data I protect, so I would like to know more and see whether anyone has an actual solution rather than a workaround...

    I don't see how losing network connectivity would cause my protection groups to fail every time, as long as the iSCSI volumes are reconnected.

    Monday, November 18, 2013 4:56 PM
  • Hi Adam,

    There are a multitude of bad things that can happen when NTFS file systems are suddenly yanked away from a running operating system due to storage connectivity issues, especially if you have active snapshots, which will always be the case on DPM servers that have protection groups configured.

    1) Windows NTFS is a caching file system, and writes to NTFS volumes are cached for a short period of time. If the volume disappears before the cache is flushed, you get lost delayed writes, and that can lead to file or file system corruption.

    2) Windows VSS maintains snapshots by monitoring the volume that was snapshotted (the DPM replica) and performing copy-on-write (COW) to the recovery point volume as DPM does backups. If a COW cannot complete, VSS places the replica in shadow copy protection mode and prevents writes to the replica until the recovery point volume comes back online. In some circumstances, Windows cannot clear that state, and if VSS metadata gets corrupted (due to condition 1 above), you could lose all snapshots. Look for System event log messages from VOLSNAP detailing shadow copy protection events. If you find any, you may need to open a support case to help clear or reset the snapshot volume so new backups can be taken.

    3) Windows Server 2012 (and R2) has a code defect that can cause VSS to lose its storage volume association after a storage fault, which can also lead to lost snapshots. A hotfix is in the works; see this forum thread for details:

    http://social.technet.microsoft.com/Forums/en-US/a39af676-2c51-41e3-82de-aae8b73550d8/vss-error-id-12289-in-the-event-log?forum=dataprotectionmanager 




    Monday, November 18, 2013 6:26 PM
    Moderator
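Point 1 can be illustrated with a toy model of a write-back cache. This is a deliberately simplified, hypothetical sketch, not how NTFS or the Windows cache manager is implemented: writes are acknowledged as soon as they are in memory and only reach the backing storage on flush(), so if the storage disappears first, data the application believes was written never arrives.

```python
class WriteBackVolume:
    """Toy write-back cache in front of a volume (illustrative only, not NTFS)."""

    def __init__(self) -> None:
        self.disk: dict[int, bytes] = {}   # block -> data on the backing storage
        self.cache: dict[int, bytes] = {}  # dirty blocks acknowledged but not yet written
        self.online = True
        self.lost_writes: list[int] = []   # blocks whose delayed write failed

    def write(self, block: int, data: bytes) -> None:
        if not self.online:
            raise OSError("volume is offline")
        self.cache[block] = data  # the caller is told the write succeeded

    def flush(self) -> None:
        """Lazy-writer step: push dirty blocks to the backing storage."""
        if self.online:
            self.disk.update(self.cache)
        else:
            # Storage vanished: acknowledged data is discarded, loosely
            # analogous to a "delayed write failed" condition.
            self.lost_writes.extend(sorted(self.cache))
        self.cache.clear()

    def disconnect(self) -> None:
        """Simulate the iSCSI target dropping off before the next flush."""
        self.online = False


if __name__ == "__main__":
    vol = WriteBackVolume()
    vol.write(1, b"file data")
    vol.flush()                       # block 1 reaches the disk
    vol.write(2, b"metadata update")  # acknowledged, but only in the cache
    vol.disconnect()
    vol.flush()
    print(vol.disk)         # {1: b'file data'}
    print(vol.lost_writes)  # [2]
```

If block 1 and block 2 were meant to change together (data plus the metadata describing it), the disk is now left half-updated, which is the kind of inconsistency point 1 describes.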
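Point 2 can be sketched the same way. The class here is a hypothetical illustration, not the VSS implementation: the first time a replica block is overwritten after a snapshot, its old contents are copied into that snapshot's diff area (the recovery point volume), and once a copy-on-write fails, all further writes to the replica are refused, which loosely mirrors shadow copy protection mode. For simplicity it keeps a separate diff area per snapshot and clears protection as soon as the diff area returns, which, as Mike notes, a real server does not always manage.

```python
class CowSnapshotVolume:
    """Toy copy-on-write snapshot model (illustrative only, not VSS internals)."""

    def __init__(self, blocks: dict[int, bytes]) -> None:
        self.replica = dict(blocks)                   # live replica volume
        self.diff_areas: list[dict[int, bytes]] = []  # one per snapshot
        self.diff_area_online = True
        self.protected = False                        # set when a COW fails

    def snapshot(self) -> int:
        """Create a recovery point and return its index."""
        self.diff_areas.append({})
        return len(self.diff_areas) - 1

    def write(self, block: int, data: bytes) -> None:
        if self.protected:
            raise PermissionError("replica is write-protected after a failed copy-on-write")
        for diff in self.diff_areas:
            if block not in diff:  # first overwrite of this block since that snapshot
                if not self.diff_area_online:
                    self.protected = True
                    raise PermissionError("copy-on-write failed: diff area offline")
                diff[block] = self.replica.get(block, b"")
        self.replica[block] = data

    def read_snapshot(self, snap: int, block: int) -> bytes:
        """Saved old data if the block changed since the snapshot, else live data."""
        return self.diff_areas[snap].get(block, self.replica.get(block, b""))

    def disconnect_diff_area(self) -> None:
        self.diff_area_online = False

    def reconnect_diff_area(self) -> None:
        # Simplification: protection clears as soon as the diff area returns.
        self.diff_area_online = True
        self.protected = False


if __name__ == "__main__":
    v = CowSnapshotVolume({0: b"A", 1: b"B"})
    s = v.snapshot()
    v.write(0, b"A2")
    print(v.read_snapshot(s, 0), v.replica[0])  # b'A' b'A2'
    v.disconnect_diff_area()
    try:
        v.write(1, b"B2")
    except PermissionError as exc:
        print(exc)
```

Refusing the write keeps the snapshot valid at the cost of blocking backups, which is why a diff area that never comes back cleanly leaves the replica stuck.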
  • For the answer, please see the link in Mike's post. It worked perfectly for me and saved me a ton of time and effort.
    Tuesday, November 19, 2013 3:11 PM