Hyper-V Cluster Backup problem: Unexpected failure. Error code: 48F@01000003

  • Question

  • We have several 2012 R2 Datacenter Hyper-V hosts with several CSVs on an HP 3PAR FC SAN, managed with SCVMM 2012 R2 and backed up with DPM 2012 R2. Everything has been running fine, with backups done by agents in each VM.

    All hosts are regularly updated with Cluster Aware Updating.

    Recently I decided to set up Hyper-V host-level backups, hoping they would make DPM easier to manage and use far less bandwidth and I/O. Initially, I selected about 15 VMs to back up from the cluster node in DPM. This crashed 2 hosts, and I've searched everywhere for a solution. One link suggested it is a 3PAR ODX issue, but I disabled ODX by following

    http://flemmingriis.com/dpm-2012-r2-windows-server-2012-r2-disable-odx/ and the issue persists.

    Specifically, the error is: Event 1 VDS Basic Provider, Unexpected failure. Error code: 48F@01000003.

    These occur several times, then the CSV goes offline and all the VMs crash. Needless to say, this is catastrophic and a bulletproof solution needs to be found.

    I'd appreciate any help :)

    Thursday, March 27, 2014 12:47 AM

All replies

  • Hi Eugene

    The Hosts are completely up-to-date. I've been through that link and also the cluster updates. One thing I haven't tried is installing the 3PAR hardware VSS driver on the hosts which is what I intend to do next. 

    Parallel backups are still at the default 3, but I'd be very surprised if that is the problem. I'll give it a go anyhow.

    Thanks for your help

    Thursday, March 27, 2014 8:49 AM
  • also look at
    Hyper-V: Update List for Windows Server 2012 R2

    Have a nice day !!!

    Thursday, March 27, 2014 10:48 AM
  • We have the same problem. It is an issue with the 3PAR OS when ODX is enabled: it zeroes blocks that should not be zeroed. Contact HP 3PAR SPS to get the update mentioned here, or disable ODX on every 3PAR-connected host. Every host needs to be rebooted to really disable ODX!
    Friday, April 4, 2014 8:44 AM
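
Since the advice is to disable ODX on every connected host, it can help to confirm what each node actually has configured. The following is a minimal sketch, assuming the FailoverClusters module and PowerShell remoting are available on the machine where it runs. It only reads the `FilterSupportedFeaturesMode` registry value that is used later in this thread; it does not change anything, and a configured value does not prove the host has been rebooted since the change.

```powershell
# Sketch: report the ODX registry setting on every node of the local cluster.
# Assumption: FailoverClusters module and PowerShell remoting are available.
Import-Module FailoverClusters

$path = 'HKLM:\SYSTEM\CurrentControlSet\Control\FileSystem'

Get-ClusterNode | ForEach-Object {
    $node = $_.Name

    # FilterSupportedFeaturesMode: 0 = ODX enabled, 1 = ODX disabled.
    # The result is empty if the value does not exist on that node.
    $value = Invoke-Command -ComputerName $node -ScriptBlock {
        (Get-ItemProperty -Path $using:path -Name FilterSupportedFeaturesMode -ErrorAction SilentlyContinue).FilterSupportedFeaturesMode
    }

    [pscustomobject]@{
        Node                        = $node
        FilterSupportedFeaturesMode = $value
    }
}
```

A node that reports anything other than 1 still has ODX enabled in its configuration.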
  • Hi Matthias

    I have already disabled ODX in registry, but haven't done the patch yet. My understanding is that if ODX is disabled, it is no longer an issue. Our problem remains the same.

    Have you applied the patch, and if so, did it fix your problem? 

    Friday, April 4, 2014 10:31 AM
  • Hi there

    We have exactly the same issue with our environment.
    We are using Hyper-V 2012 attached to an FC 3PAR. We have already installed Patch 30 on our 3PAR. ODX has been disabled since autumn last year.
    In addition to the "VDS Basic Provider" error, we encounter disk errors that say: "The IO operation at logical block address 353031e1 for Disk 1 was retried". This error is logged for multiple disks.
    Lastly, we also get CSV errors that randomly warn that the CSV is offline: "Cluster Shared Volume 'Volume3' ('Volume3') is no longer available on this node because of 'STATUS_IO_TIMEOUT(c00000b5)'. All I/O will temporarily be queued until a path to the volume is reestablished."

    Some VMs crash randomly or the cluster becomes unresponsive.

    Windows is up to date and all the recommended Hyper-V patches are installed.

    Any help would be greatly appreciated.

    Regards,
    Reto


    Monday, April 28, 2014 6:39 AM
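
When a CSV is reported as unavailable or paused on a node, the per-node access state shows whether I/O is direct or redirected at that moment. This is a small sketch, assuming it runs on a Windows Server 2012 R2 cluster node with the FailoverClusters module; it only reads state.

```powershell
# Sketch: show how each node is currently accessing each CSV.
# Assumption: run on a 2012 R2 cluster node with the FailoverClusters module.
Import-Module FailoverClusters

Get-ClusterSharedVolumeState |
    Sort-Object Name, Node |
    Select-Object Name, Node, StateInfo, FileSystemRedirectedIOReason, BlockRedirectedIOReason |
    Format-Table -AutoSize
```

A `StateInfo` other than `Direct` during a backup window is worth correlating with the timestamps of the errors quoted above.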
  • Try ...

    HP 3PAR and support for ODX

    Upgrade the HP 3PAR OS on the HP 3PAR StoreServ Storage to 3.1.2 MU2 or later if running a lower HP 3PAR OS version. Next apply the patch as follows:

    • For 3.1.2 MU2 and 3.1.2 EMU2, apply Patch 11 followed by Patch 36.

    • For 3.1.2 MU3, apply Patch 30.

    HP 3 PAR http://h20566.www2.hp.com/portal/site/hpsc/public/psi/advisoriesResults/?sp4ts.oid=5335712


    Have a nice day !!!

    Monday, April 28, 2014 6:54 AM
  • We already have 3PAR OS 3.1.2 MU3 Patch 30 installed.
    Monday, April 28, 2014 6:57 AM
  • We also have the same problem and we have IBM "Scale Out Fileservers" as storage.

    ODX is disabled.

    All servers are 2012 R2 with the latest updates.

    Wednesday, April 30, 2014 10:15 PM
  • Hello,

    Same issue here. 3PAR 7200 running OS 3.1.3, parallel backups at the default of 3. We also get the timeout error that Reto describes above, and complete node crashes. All the updates and hotfixes I could find are installed. Will disable ODX for now and see how that goes.

    Regards,

    Peter

    Tuesday, May 6, 2014 10:51 PM
  • We opened a case with Microsoft and they sent us a list of hotfixes to install.

    Do you already have some of them installed? And did they fix anything?

    http://support.microsoft.com/kb/2929869 - CSV snapshot file is corrupted when you create some files on the live volume in Windows

    http://support.microsoft.com/kb/2913695 - OffloadWrite is doing PrepareForCriticalIo for the whole VHD in a Windows Server 2012 or Windows Server 2012 R2 Hyper-V host

    http://support.microsoft.com/kb/2871085 - I/O failures occur when you create a snapshot for a large volume in Windows Server 2012 or Windows Server 2008 R2 SP1

    http://support.microsoft.com/kb/2842111 - "Delayed Write Failed" error when an I/O stress test runs against a Windows Server 2012 failover cluster

    http://support.microsoft.com/kb/2889784 - Windows RT, Windows 8, and Windows Server 2012 update rollup: November 2013

    http://support.microsoft.com/kb/2934016 - Windows 8, and Windows Server 2012 update rollup: April 2014

    Kind Regards,

    Reto Gobat

    Wednesday, May 7, 2014 6:12 AM
  • We have the same problem with CSVs on IBM FC storage. But the error in DPM appears for only one VM; the others are backed up successfully. The "Unexpected failure. Error code: 48F@01000003" error appears in the log many times.

    All hotfixes are installed.



    Thursday, May 15, 2014 9:02 AM
  • Hey Reto,

    Did installing the hotfixes resolve your issue?  We have similar issues on a FC CSV running EMC storage.

    For our customer, it looks like DPM is causing the lock on the VHD, and the VM then becomes unresponsive. Previously this was resolved by rebooting the host. However, we have found that re-running the DPM job clears the lock on the VM, and we're then able to start it and log on.

    Shane

    Thursday, June 5, 2014 5:26 AM
  • Hey Shane

    No. After installing all the hotfixes provided by Microsoft and also upgrading the firmware of our blade servers / FlexFabric components, we had another crash last week.

    All the affected CSVs are FC-attached.

    We don't have any locks on the VHD or VM; the CSV just goes offline and all VMs on that specific CSV crash. Afterwards, we can boot the VMs normally.

    Reto

    Thursday, June 5, 2014 6:02 AM
  • This seems to be an age-old issue with backing up CSVs. We have used 2008 R2, 2012, and 2012 R2 clusters with different storage and faced cluster crashes whenever we tried host-level backups. MS support still doesn't acknowledge that it's a problem.

    At the moment we are on R2. We still get CSV dropouts during backups, but it has been a bit more stable; we haven't had a crash in a couple of months.

    Tuesday, June 10, 2014 12:00 AM
  • We are getting the same error. Some VMs, usually the same ones, stick in the backing-up state when backed up by DPM 2012 R2. We can't cancel the backups. Hard powering off DPM sometimes resolves it. Today DPM hung while powering off and caused additional issues.

    Do we think this is a Windows Server VSS issue or a DPM issue?
    Tuesday, June 17, 2014 9:33 AM
  • Are you running FEP on your hosts?  FEP doesn't like any backup products.  We have an open incident with MS about this, but putting in an exclusion for DPMRA.exe seems to fix the issue for DPM.
    Tuesday, June 17, 2014 7:45 PM
  • Hi,

    We have the same issue here with IBM blade Hyper-V hosts running Windows 2012 R2 and DPM 2012 R2.

    We don't have any antivirus installed, and all the fixes mentioned above are installed on the Windows hosts.

    Any solution would be appreciated.

    Tuesday, June 24, 2014 3:34 PM
  • Hi all,

    Is there a solution? We use Veeam as our backup solution, and it has done a good job for months, but we are also getting this f... error! So I do not think it is a DPM-specific error!

    We have FC-attached NetApp storage in a 6-node cluster. Yesterday 2 nodes crashed and 1 VM was corrupted.

    After a reboot and a restore(!) of the VM, all is fine ;((

    Please let us know if someone has an idea!!??

    thx Stefan


    Wednesday, August 5, 2015 9:55 AM
  • Yeah, same here. I have Veeam, and it was fine, then all of a sudden this error message appeared and all my VMs crashed. I spoke with MS before; they thought one of the nodes was at fault and asked me to take it down and rebuild it. It was fine for 2 months, until yesterday.

    It really seems to have something to do with VSS inside Windows, but MS is not confirming it.

    Sunday, September 13, 2015 5:19 PM
  • Same issue here too. We've tried everything (BIOS, firmware, drivers, updates, hotfixes... voodoo), but the problem remains. We receive these events daily:

    Unexpected failure. Error code: 48F@01000003

    'STATUS_UNEXPECTED_NETWORK_ERROR(c00000c4)'

    'STATUS_NO_SUCH_DEVICE(c000000e)'

    Tuesday, October 20, 2015 11:34 AM
  • Same here, with all latest updates
    Tuesday, November 1, 2016 9:31 AM
  • Do you have ODX enabled? From what I see in support cases (3rd-party backup vendor here ;-)), if this isn't solved with patches, 9 times out of 10 it is solved by disabling ODX.

    Wednesday, November 2, 2016 8:44 AM
    Moderator
  • As per: http://www.aidanfinn.com/?p=14231

    I definitely have ODX disabled

    And this error shows in log over & over & over again

    Monday, January 23, 2017 10:47 AM
  • Hello, we are faced with a similar problem:

    We have four 2012 R2 Datacenter Hyper-V hosts with two CSVs on an HP MSA 2040 FC SAN, backed up with Backup Exec 2015 FP5 with all patches.

    All hosts have all updates, and ODX is disabled.

    When BE begins creating a snapshot on the Hyper-V host, error code 48F@01000003 appears in the host's event log and the CSV usually stops all I/O for a short interval. If the interval is not short, the System event log records error 5120: Cluster Shared Volume "CSV Name" has entered a paused state because of '(c00000b5)'. All I/O will temporarily be queued until a path to the volume is reestablished. This occurs several times, after which VMs may crash with a "no system disk" BSOD.


    Friday, February 10, 2017 10:08 AM
  • Hi!

    Similar problems here: a Hyper-V cluster of 3 x 2012 R2 hosts with CSVs on a PureStorage //m20 SAN connected via FC.

    I also get 48F@01000003 when backups are running, but even worse, the cluster goes crazy without any active backup jobs. I am getting these errors:

    • 5120 CSV entered paused state (c000020c)
    • 5142 CSV no longer accessible (1460)
    • ntfs - disk has been surprise removed
    • and subsequent 1069 cluster resource failed, error 0x3 ('The system cannot find the path specified.').

    After a cluster restart I need to restart several VMs, because after crashing and auto-restarting they came up without access to their disks on the CSVs.

    The cluster ran without trouble for ~4 months. The problems suddenly started 10 days ago after a host reboot (incl. updates). I rolled back the updates, but the problems remain.

    I just disabled ODX today after reading this thread. Need to reboot the hosts asap.

    Monday, February 27, 2017 11:52 AM
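
To line up the cluster events listed above with backup windows, the events can be pulled from one node's System log in a single pass. This is a sketch under a few assumptions: the cluster events are written by the `Microsoft-Windows-FailoverClustering` provider, the VDS entries land in the System log as event ID 1, and the 24-hour window is an arbitrary choice. The VDS entries are matched on message text rather than on provider name.

```powershell
# Sketch: collect the events named in this thread from the last 24 hours
# on the local node. The time window is an arbitrary assumption.
$since = (Get-Date).AddHours(-24)

# Failover Clustering: 5120 (CSV paused), 5142 (CSV no longer accessible),
# 1069 (cluster resource failed).
$clusterEvents = @(Get-WinEvent -FilterHashtable @{
    LogName      = 'System'
    ProviderName = 'Microsoft-Windows-FailoverClustering'
    Id           = 5120, 5142, 1069
    StartTime    = $since
} -ErrorAction SilentlyContinue)

# VDS errors carrying the code from the thread title, matched on message text.
$vdsEvents = @(Get-WinEvent -FilterHashtable @{
    LogName   = 'System'
    Id        = 1
    StartTime = $since
} -ErrorAction SilentlyContinue |
    Where-Object { $_.Message -like '*48F@01000003*' })

# Merge both sets into one timeline.
$clusterEvents + $vdsEvents |
    Sort-Object TimeCreated |
    Select-Object TimeCreated, Id, ProviderName, Message |
    Format-List
```

Running it on each node and comparing timestamps shows whether the 48F@01000003 entries precede the CSV pause events or appear independently of them.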
  • Hi!

    Update: Since disabling ODX no further problems - seems to have solved it for me.

    Regards,
    Stefan

    Monday, March 6, 2017 10:19 AM
  • I have disabled ODX on all my servers too

    # Show the current value (1 = ODX disabled)
    Get-ItemProperty -Name FilterSupportedFeaturesMode -Path HKLM:\system\currentcontrolset\control\filesystem
    # Set the value to 1 to disable ODX
    Set-ItemProperty -Name FilterSupportedFeaturesMode -Path HKLM:\system\currentcontrolset\control\filesystem -Value 1

    Waiting for next backup to happen.


    Regards, Ilkin

    Wednesday, March 8, 2017 11:02 AM
  • :-( Didn't help.

    Regards, Ilkin

    Thursday, March 9, 2017 8:55 AM
  • Not sure if this will help anyone, but we suffered weekly CSV outages and saw this among other errors. We use Veeam to back up VMs. The hosts themselves are not backed up; they are a "Template" in VMM, so a rebuild is not too difficult.

    What finally stopped our outages was grouping backup jobs into "application aware" and "non-application aware". All "application aware" VMs (i.e. SharePoint, SQL, AD, Exchange) were consolidated into a single job, and all other "non-application aware" VMs went into others. We actually have several "non-application aware" jobs (i.e. a job for web server VMs, a job for file server VMs, a job for admin app VMs, etc.), as they don't seem to cause an issue even when running at the same time.

    We also made sure not to mix any VMs in a single job that had "dissimilar storage", meaning one VM on CSV, one on local disk, for example.

    Not sure that was an issue, but I do know for sure that the "application aware" jobs were what caused the major crashes. With those jobs/VMs consolidated into one job, the VSS writers involved stay open and active while those VMs are backed up and are closed properly once the job completes. There is no chance of another "application aware" job trying to keep them open or closing them prematurely, which can send false notices to VSS and cause it to hang; that then trickles back through the VSS layers and can cause "paused I/O" on the CSV and finally a crash of the CSV. This configuration also helps avoid different VSS writer layers being active at the same time within a single job/session/stream, which can lead to cluster confusion. Application-aware backup uses more layers of VSS, each "proxying" for the next to reach the final destination (i.e. host, cluster, Hyper-V, VM, Hyper-V agent, application); if any part of that chain hiccups, it can lash back all the way to the host/cluster.

    If you have large "application aware" VMs or a large number of them, this may not be ideal or even feasible, but it is what worked for us...

    Anyway... Hope it helps...


    Sean

    Friday, March 10, 2017 1:32 AM
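
The grouping rule described in the post above can be sketched as a simple partition of a VM inventory. Everything in this example is hypothetical: the VM names, roles, storage labels, and job names are made up for illustration, and the snippet does not call Veeam or DPM. It only shows one way to derive job membership so that application-aware VMs share a job, other VMs are split by role, and no job mixes storage types.

```powershell
# Hypothetical inventory: names, roles and storage labels are illustrative only.
$vms = @(
    [pscustomobject]@{ Name = 'SQL01';  AppAware = $true;  Role = 'Database'; Storage = 'CSV' }
    [pscustomobject]@{ Name = 'EXCH01'; AppAware = $true;  Role = 'Mail';     Storage = 'CSV' }
    [pscustomobject]@{ Name = 'WEB01';  AppAware = $false; Role = 'Web';      Storage = 'CSV' }
    [pscustomobject]@{ Name = 'WEB02';  AppAware = $false; Role = 'Web';      Storage = 'Local' }
    [pscustomobject]@{ Name = 'FILE01'; AppAware = $false; Role = 'File';     Storage = 'CSV' }
)

# Job key: application-aware VMs on the same storage type share one job;
# all other VMs are split by role. The storage type is always part of the key,
# so no job mixes CSV and local-disk VMs.
$jobs = $vms | Group-Object -Property {
    if ($_.AppAware) { "AppAware-$($_.Storage)" } else { "$($_.Role)-$($_.Storage)" }
}

# List each proposed job with its members.
$jobs | ForEach-Object {
    [pscustomobject]@{ Job = $_.Name; VMs = ($_.Group.Name -join ', ') }
}
```

With this sample inventory the two application-aware VMs end up in one job, while WEB01 and WEB02 are kept apart because they sit on different storage.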
  • I have been using Veeam forever. I never have any crashes and backups always finish fine, yet the log is littered with

    Unexpected failure. Error code: 48F@01000003

    which, honestly, has stopped bothering me.

    Friday, November 10, 2017 10:56 AM