Data Protection Manager 2019 (DPM) General Purpose File Server Cluster Protection Failure

  • Question

  • Hi,

    (See OUR PROBLEM, later below)

    OUR CONFIG:

    Our IT team has a Windows General Purpose File Server Cluster "ENGRFILER1" (active/passive) that is sharing a highly used volume (U: drive) which is 2 TB (915 GB currently in use).  The volume is NTFS with deduplication enabled at the file server level.  The file server cluster nodes are running Windows Server Core.

    File server cluster node config (using MSINFO32):

         OS Name    Microsoft Windows Server 2019 Datacenter
         Version    10.0.17763 Build 17763
         Other OS Description     Not Available
         OS Manufacturer    Microsoft Corporation
         System Name    ENGR-FILER1N4
         System Manufacturer    Microsoft Corporation
         System Model    Virtual Machine
         System Type    x64-based PC
         System SKU    Not Available
         Processor    Intel(R) Xeon(R) Gold 6144 CPU @ 3.50GHz, 3492 Mhz, 3 Core(s), 6 Logical Processor(s)
         BIOS Version/Date    Microsoft Corporation Hyper-V UEFI Release v4.0, 3/13/2019
         SMBIOS Version    3.1
         BIOS Mode    Not Available
         BaseBoard Manufacturer    Microsoft Corporation
         BaseBoard Product    Virtual Machine
         BaseBoard Version    Hyper-V UEFI Release v4.0
         Platform Role    Not Available
         Secure Boot State    Not Available
         PCR7 Configuration    Not Available
         Windows Directory    C:\Windows
         System Directory    C:\Windows\system32
         Boot Device    \Device\HarddiskVolume2
         Locale    United States
         Hardware Abstraction Layer    Version = "10.0.17763.831"
         User Name    Not Available
         Time Zone    Eastern Daylight Time
         Installed Physical Memory (RAM)    Not Available
         Total Physical Memory    24.0 GB
         Available Physical Memory    11.4 GB
         Total Virtual Memory    27.5 GB
         Available Virtual Memory    11.8 GB
         Page File Space    3.50 GB
         Page File    C:\pagefile.sys
         Kernel DMA Protection    Off
         Virtualization-based security    Not enabled
         Device Encryption Support    Not Available
         A hypervisor has been detected. Features required for Hyper-V will not be displayed.    

    ( Note:  The file server cluster nodes are virtualized, running hyper-converged within a Storage Spaces Direct SOFS cluster.  The file server cluster node NTFS U: drive mounted volume share is actually a VHDX file inside a ReFS CSV on the storage spaces pool.  Deduplication is NOT enabled at the CSV S2D level. )

    We have built a Microsoft Data Protection Manager 2019 server for backing up our file server volumes.

    The DPM version is:  2019 - UR1 (10.19.260.0)

    The DPM SQL Server version info:

        SQL Server Management Studio                        15.0.18330.0
        SQL Server Management Objects (SMO)                        16.100.37971.0
        Microsoft Analysis Services Client Tools                        15.0.19040.0
        Microsoft Data Access Components (MDAC)                        10.0.17763.1
        Microsoft MSXML                        3.0 6.0
        Microsoft .NET Framework                        4.0.30319.42000
        Operating System                        10.0.17763

    The server that DPM is running on is:  Windows Server 2019 Datacenter (1809, 17763.1131)

    DPM server config (MSINFO32):

        OS Name    Microsoft Windows Server 2019 Datacenter
        Version    10.0.17763 Build 17763
        Other OS Description     Not Available
        OS Manufacturer    Microsoft Corporation
        System Name    ENGR-BACKUP1
        System Manufacturer    Dell Inc.
        System Model    PowerEdge R610
        System Type    x64-based PC
        System SKU    Not Available
        Processor    Intel(R) Xeon(R) CPU           E5620  @ 2.40GHz, 2394 Mhz, 4 Core(s), 8 Logical Processor(s)
        Processor    Intel(R) Xeon(R) CPU           E5620  @ 2.40GHz, 2394 Mhz, 4 Core(s), 8 Logical Processor(s)
        BIOS Version/Date    Dell Inc. 6.1.0, 10/18/2011
        SMBIOS Version    2.6
        BIOS Mode    Not Available
        BaseBoard Manufacturer    Dell Inc.
        BaseBoard Product    0F0XJ6
        BaseBoard Version    A13
        Platform Role    Not Available
        Secure Boot State    Not Available
        PCR7 Configuration    Not Available
        Windows Directory    C:\Windows
        System Directory    C:\Windows\system32
        Boot Device    \Device\HarddiskVolume1
        Locale    United States
        Hardware Abstraction Layer    Version = "10.0.17763.1131"
        User Name    Not Available
        Time Zone    Eastern Daylight Time
        Installed Physical Memory (RAM)    Not Available
        Total Physical Memory    96.0 GB
        Available Physical Memory    90.2 GB
        Total Virtual Memory    110 GB
        Available Virtual Memory    103 GB
        Page File Space    14.0 GB
        Page File    C:\pagefile.sys
        Kernel DMA Protection    Off
        Virtualization-based security    Not enabled
        Device Encryption Support    Not Available
        Hyper-V - VM Monitor Mode Extensions    Yes
        Hyper-V - Second Level Address Translation Extensions    Yes
        Hyper-V - Virtualization Enabled in Firmware    No
        Hyper-V - Data Execution Protection    Yes

    The DPM server has all patches and is up to date as of:  4/21/2020

    The DPM server and file server cluster are members of our AD domain.

    The DPM storage pool is:

       * 50 TB (25 TB currently free) on a 10 Gbit iSCSI-connected SAN disk utilizing an SSD flash front-end tiered disk subsystem.

       * Formatted as "Modern Backup Storage" (MBS) ReFS without deduplication enabled.

    The connectivity of the DPM server to the file server cluster is 10 Gbit throughout.

    The above config is all bog-standard Microsoft DPM and storage configuration: out-of-the-box, vanilla stuff.

    ---------------

    OUR PROBLEM

    When we use DPM to back up the U: drive volume on our file server cluster, we see the exact same problem as outlined in the following thread:

        https://social.technet.microsoft.com/Forums/en-US/52a4a412-3e78-447e-968a-3f2b291fc15b/fileserver-protection?forum=dpmfilebackup

    The error message we have is:

    U:\ on ENGRFILER1FS4.EngrFiler1.domain.x.y
       C:\Program Files\Microsoft System Center\DPM\DPM\Volumes\Replica\f553adae-8a21-403e-b5d1-9a1a5e0b1a31\5564c443-64e0-45ff-b8bb-1a4adec0bba2\Full\
    DPM failed to clean up data of old incremental backups on the replica for Volume U:\ on ENGRFILER1FS4.EngrFiler1.domain.x.y.
    Synchronization will fail until the replica cleanup succeeds.
    (ID 30134 Details: Cannot create a file when that file already exists (0x800700B7))

    This error message occurs every time we back up the volume.
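
    For reference, the 0x800700B7 value in the error details can be split into the standard HRESULT fields. The sketch below is only an illustration of that decoding; the function name `decode_hresult` is made up for this example and is not part of DPM. It shows that the code is a wrapped Win32 error (facility 7) carrying error 183, which is the "Cannot create a file when that file already exists" condition quoted in the alert.

```python
# Split a 32-bit HRESULT into its standard fields.
# decode_hresult is a hypothetical helper written for this post, not a DPM API.

FACILITY_WIN32 = 7          # facility used for wrapped Win32 error codes
ERROR_ALREADY_EXISTS = 183  # Win32: "Cannot create a file when that file already exists."


def decode_hresult(hr: int) -> tuple[int, int, int]:
    """Return (severity, facility, code) for a 32-bit HRESULT."""
    severity = (hr >> 31) & 0x1    # 1 means failure
    facility = (hr >> 16) & 0x7FF  # 11-bit facility field
    code = hr & 0xFFFF             # low 16 bits: the underlying error code
    return severity, facility, code


severity, facility, code = decode_hresult(0x800700B7)
print(severity, facility, code)
print(facility == FACILITY_WIN32, code == ERROR_ALREADY_EXISTS)
```

    The decoding only confirms what kind of error is being reported; it does not say which file or object on the replica already exists.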

    For troubleshooting, we have rebuilt the entire DPM server from scratch and still get the same error.

    For troubleshooting, we have also disabled Windows Defender on the DPM server. We do not run any other antivirus products, nor do we have Defender enabled on our file server cluster nodes.

    We have not had trouble with DPM when backing up our other servers for our IT environment.

    We seem to only have trouble with the volumes exposed by our file server cluster.

    Running a "synchronization job with consistency check" seems to fix the problem for a short time, but the problem reappears within a few hours or a day.

    My guess is that this problem is somehow caused by the fact that the U: drive volume we are backing up is deduplicated, but I can't confirm this, as others have stated that turning off dedup didn't solve their problem.

    This page says a lot about running deduplication on the DPM storage pool, but it says very little (if anything at all that I can find) about whether there will be problems when the volume being backed up is itself deduplicated:

        https://docs.microsoft.com/en-us/system-center/dpm/dpm-support-issues?view=sc-dpm-2019

    The page needs more info for consumers if there are issues with backing up deduplicated volumes (on file server clusters or otherwise).

    Any help would be greatly appreciated!

    Thank you.

    Tuesday, April 21, 2020 4:41 PM

All replies

  • Hi,

    The configuration appears to be OK, so you have the DPM agent installed on each file server cluster node and you are basically protecting the cluster volume under the "cluster object"?

    Best regards,
    Leon


    Blog: https://thesystemcenterblog.com

    Tuesday, April 21, 2020 9:46 PM
  • Leon:  "The configuration appears to be OK, so you have the DPM agent installed on each file server cluster node and you are basically protecting the cluster volume under the "cluster object"? "

    Yes, that is correct.

    We have the latest DPM agent (10.19.260.0) installed on all cluster nodes without issue.
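
    As a quick sanity check on the agent versions, the comparison can be sketched as below. This is only an illustration: the helper names `parse_version` and `mismatched_agents` are made up for this post, and the version strings are typed in by hand from what is reported in this thread rather than queried from DPM.

```python
# Compare hand-entered agent versions against the DPM server version.
# parse_version and mismatched_agents are hypothetical helpers, not DPM cmdlets.

def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version string such as '10.19.260.0' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def mismatched_agents(server_version: str, agent_versions: dict[str, str]) -> list[str]:
    """Return the sorted node names whose agent version differs from the server's."""
    expected = parse_version(server_version)
    return sorted(
        node for node, version in agent_versions.items()
        if parse_version(version) != expected
    )


# Values as reported in this thread (entered manually, not read from the servers).
agents = {
    "Engr-Filer1N1": "10.19.260.0",
    "Engr-Filer1N2": "10.19.260.0",
    "Engr-Filer1N3": "10.19.260.0",
    "Engr-Filer1N4": "10.19.260.0",
}

print(mismatched_agents("10.19.260.0", agents))
```

    An empty list means every node reports the same build as the DPM server, which matches what we see.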

    For our "EngrFiler1" cluster volume, we have created a DPM protection group that drills down the member tree to the volume to be protected as follows:

       Domain --> EngrFiler1 (Cluster) --> EngrFiler1FS4 --> "All Volumes" --> "U:\"  [checked]

    Note:  Under the DPM "EngrFiler1" (Cluster) tree, the name "Cluster Group" doesn't expand (it never has).  Clicking on it waits for a second or two (spinning cursor), then does nothing, without error.  That's why we had to choose the cluster "Role" name "EngrFiler1FS4", which is the "Client Access Name" (under our Failover Cluster Manager) for our filer cluster.

    For reference:

    The cluster name is "EngrFiler1"

    The cluster nodes are "Engr-Filer1N1", "Engr-Filer1N2", "Engr-Filer1N3", "Engr-Filer1N4".

    The cluster roles are:  "EngrFiler1FS1", "EngrFiler1FS2", "EngrFiler1FS3", "EngrFiler1FS4".

    The cluster "Client Access Names" are named the same as the role names.

    We have several SMB share names off of the U:\ volume.  Some have "Continuous Availability" enabled, some don't.

    I should also note here that we have file server node VSS snapshots enabled on the U:\ volume so that our users can restore from previous versions.  The snapshots run about twice a day for 30 days.

    We did NOT need, or want the ability of our users to restore from our DPM disk based backup, so that was not configured during DPM setup.  Our DPM backup is for disaster recovery only.

    We have verified that, during the backup process, the cluster roles are NOT moving, either because of problems, or manually otherwise.

    I should note here that the replica creation seems to go along just fine and seems to complete (a lot of data has been moved over), but the problem appears when DPM tries to create the first recovery point.

    One more note: the DPM replica creation process starts out slow, then speeds up quite a bit near the end.

    Further help is appreciated, thanks.

    Wednesday, April 22, 2020 3:11 AM
  • Shares hosted on SOFS are not supported, but the volume itself should be.

    In this situation, I would start by checking the DPM log for any clues, let's see if we can find anything that could lead us to the root cause of the issue.

    You'll find the DPM log over here:

    DPM server

    • %ProgramFiles%\Microsoft System Center\DPM\DPM\Temp\DPMRACurr.errlog


    Protected server

    • %ProgramFiles%\Microsoft Data Protection Manager\DPM\Temp\DPMRACurr.errlog

    Please don't paste the log file here, upload it to a Microsoft OneDrive or Google Drive and share the link (remember to hide any sensitive information!)
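
    Before sharing, it can help to narrow the log down to the lines around the error. The following is a minimal sketch, not a DPM tool: the helper names are made up, the byte-order-mark check is an assumption about how the errlog might be encoded, and the sample lines are placeholders that do not reproduce the real errlog format.

```python
import re


def read_log_lines(path):
    """Read a log file as text, treating it as UTF-16 if it starts with a BOM (assumption)."""
    with open(path, "rb") as handle:
        raw = handle.read()
    if raw.startswith((b"\xff\xfe", b"\xfe\xff")):
        text = raw.decode("utf-16")
    else:
        text = raw.decode("utf-8", errors="replace")
    return text.splitlines()


def find_error_context(lines, needle="0x800700B7", before=2, after=2):
    """Return (1-based line number, surrounding lines) for each case-insensitive match."""
    pattern = re.compile(re.escape(needle), re.IGNORECASE)
    hits = []
    for index, line in enumerate(lines):
        if pattern.search(line):
            start = max(0, index - before)
            stop = min(len(lines), index + after + 1)
            hits.append((index + 1, lines[start:stop]))
    return hits


# Placeholder lines only; these do NOT reproduce the real errlog format.
sample = ["placeholder line A", "placeholder failure 0x800700b7", "placeholder line B"]
for number, context in find_error_context(sample, before=1, after=1):
    print(number, context)
```

    For a real file, pass the result of `read_log_lines(...)` with the path to your copy of DPMRACurr.errlog, and review the output for sensitive information before uploading it.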


    Blog: https://thesystemcenterblog.com


    • Edited by Leon Laude Wednesday, April 22, 2020 1:23 PM
    Wednesday, April 22, 2020 1:22 PM
  • We are not backing up shares (or volumes) on the SOFS.  As stated previously, we are backing up a volume that is SMB-shared by our "General Purpose File Server" (GPFS) cluster "ENGRFILER1", whose virtualized nodes run hyper-converged on top of a Storage Spaces Direct (S2D) cluster.

    It's a GPFS cluster on top of a SOFS S2D cluster, where the ENGRFILER1 file server cluster essentially sees its storage as a VSAN.  So technically, the shared U: space from the file server cluster is an NTFS volume that is back-ended by a VHDX (sitting in a CSV of the SOFS), and that should not cause any problems for DPM.  At least nothing I have read anywhere at Microsoft suggests that would be an issue.

    We could have just as easily set up the GPFS cluster on a real iSCSI SAN back-end, but we decided Storage Spaces Direct would offer more performance.  For file serving, it's been great, but we need backups.

    Some of the volumes we can back up without issue.  With other volumes we have the problem.

    I ran another synchronization and consistency check test on the protection group for the U: volume again.

    Here are the DPM server and DPM "Protected Server" logs that were generated during my test ...

    Server:  https://drive.google.com/file/d/1rwKx61PwykVm7Xrp977AWOypSlhQO4Hp/view?usp=sharing

    Protected Server:  https://drive.google.com/file/d/1rPg7g8utjozPwJQDQLkrakpGYtoprKjP/view?usp=sharing

    Your help is appreciated.  Thank you much.

    Wednesday, April 22, 2020 5:55 PM
  • Adding some extra notes to this post.

    After looking at the DPM monitoring "All jobs" filter, and scrolling down to the initial replica creation for the U: drive volume, I see the following job status:

        Type:    Replica creation
        Status:    Completed
        Description:    The job completed successfully with the following warning:
            An unexpected error occurred while the job was running. (ID 104 Details: Cannot create a file when that file already exists (0x800700B7))
        End time:    4/15/2020 2:40:50 PM
        Start time:    4/15/2020 12:19:09 PM
        Time elapsed:    02:21:41
        Data transferred:    880,707.00 MB
        Cluster node    ENGR-FILER1N4.domain.x.y
        Source details:    U:\
        Protection group:    ENGRFILER1FS4 U_Drive Sofware Src Sys System Windows
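
    As a side note on the job figures above, the average transfer rate of the replica creation can be worked out from the start time, end time, and data transferred. This is only a back-of-the-envelope sketch: the function name is made up, and it assumes 1 MB = 10^6 bytes and 1 Gbit = 10^9 bits, which may not match how DPM counts megabytes.

```python
from datetime import datetime


def average_throughput_mb_s(start, end, transferred_mb, fmt="%m/%d/%Y %I:%M:%S %p"):
    """Average MB/s between two timestamps written in the job-status format."""
    elapsed = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return transferred_mb / elapsed.total_seconds()


# Figures copied from the replica creation job status.
rate = average_throughput_mb_s(
    "4/15/2020 12:19:09 PM",
    "4/15/2020 2:40:50 PM",
    880_707.00,
)

# Convert MB/s to Gbit/s under the decimal-unit assumption stated above.
print(f"{rate:.1f} MB/s, about {rate * 8 / 1000:.2f} Gbit/s")
```

    That works out to roughly 104 MB/s on average, well below what the 10 Gbit path could carry, which fits the observation that replica creation is slow for most of its run.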


    This first error is odd because:

    1. The replica creation was a success.

    2. The description states that there's a warning.

    However, that warning is causing all subsequent synchronizations and recovery point creations to fail, as shown by the following Critical (Severity) message under DPM monitoring "All Alerts":

        Affected area:    U:\
        Occurred since:    4/22/2020 12:10:37 PM
        Description:    The replica of Volume U:\ on ENGRFILER1FS4.EngrFiler1.domain.x.y is inconsistent with the protected data source. All protection activities for data source will fail until the replica is synchronized with consistency check. You can recover data from existing recovery points, but new recovery points cannot be created until the replica is consistent.

            DPM failed to clean up data of old incremental backups on the replica for Volume U:\ on ENGRFILER1FS4.EngrFiler1.domain.x.y. Synchronization will fail until the replica cleanup succeeds. (ID 30134 Details: Cannot create a file when that file already exists (0x800700B7))

    Recommended action:    This may happen if the replica volume is being accessed from outside of DPM. Please see the detailed error, resolve the issue with respect to the replica volume path, and retry the job. Otherwise, DPM will synchronize with the changes during the next scheduled synchronization job.
        Synchronize with consistency check.
        Run a synchronization job with consistency check...


    So this whole problem begins with the initial replica creation, which works, but then causes all further jobs to fail because of some file that already exists.

    In the DPM server log "DPMRACurr.errlog" I can see the 0x800700B7 error, but I can't really make out what problem occurred, since I am not the developer of the software.

    I am adding an extra DPM server log here called "MSDPM2.errlog", which was written at about 9:00 PM on the evening of the initial replica creation (4/15) for the U: volume.  This is the portion of the log around the point where the 0x800700B7 error initially occurred:

        https://drive.google.com/open?id=1T5DIIVN9Z9zgH3gd5YjmN95a_TzAzCun

    I hope this can help.

    I would appreciate further support.  Thanks.


    Thursday, April 23, 2020 9:44 PM
  • Update on this post.

    I was able to contact Microsoft support and have a support incident created to help solve this problem.

    The problem turned out to be a bug in the DPM software related to backing up volumes that are being deduped.

    A private fix was issued to me to solve the problem.

    The private fix already existed, and apparently was to be incorporated into a future DPM update rollup patch.

    Microsoft gave no date for the next rollup patch release, and did not confirm whether the fix would be part of it.

    So if anyone is using DPM, and backing up a cluster volume that is also being deduped, you may need the private fix if you are having issues.  You will need to contact Microsoft and create a support incident to get the private fix, or wait until the update rollup is released with the fix in it.

    • Marked as answer by Rodney Dyer Tuesday, May 12, 2020 3:49 AM
    Tuesday, May 12, 2020 3:46 AM
  • Thanks for sharing.

    Blog: https://thesystemcenterblog.com

    Tuesday, May 12, 2020 7:53 AM