Hyper-V Replication Health Critical

  • Question

  • I have Hyper-V 2012 R2 on both the primary and replica servers, and everything was working fine: I copied a seed, replication started and finished successfully within a minute every five minutes, and the test failover also worked well.

    This morning I see that the Replication Health is Critical, with the message
    "Last successful replication for virtual machine "???" was more than 10 minutes ago, replication might be encountering problems", and more than 20% of replication cycles have been missed.
    The status is still increasing and says "Receiving changes (??%)", but it has been very slow for more than 3 hours as far as I know, whereas normally it takes about a minute.

    No big change happened on the replicated VM on the primary server last night; it was a normal day.
    I have Windows Backup running on the primary server (the host). It starts at 9 PM and finishes successfully at about 11 PM every day.
    My upload bandwidth is slow (10 Mbps download and 1 Mbps upload), but normally that has been fine.

    I clicked View Events (under Hyper-V-VMMS / Admin) and noticed the following:

    Error 32552: "Hyper-V could not replicate changes for virtual machine "???" because the Replica server refused the connection. This may be because there is a pending replication operation in the Replica server for the same virtual machine which is taking longer than expected or has an existing connection."

    Followed by
    Warning 32315: "Hyper-V failed to replicate changes for virtual machine "???". Hyper-V will retry replication after 5 minutes."

    The two appeared together four times last night (at 9:30 PM, 1:27 AM, 5:20 AM and 5:26 AM).
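
For reference, the same two events can probably be listed from PowerShell instead of Event Viewer. This is only a sketch: it assumes it is run on the primary host, that the Hyper-V-VMMS / Admin view corresponds to the `Microsoft-Windows-Hyper-V-VMMS-Admin` log, and that a 24-hour window is enough.

```powershell
# Assumption: run locally on the primary Hyper-V host.
# Assumption: a 24-hour look-back covers the failures of interest.
$since = (Get-Date).AddDays(-1)

# Pull only the two event IDs mentioned above (error 32552, warning 32315).
Get-WinEvent -FilterHashtable @{
    LogName   = 'Microsoft-Windows-Hyper-V-VMMS-Admin'
    Id        = 32552, 32315
    StartTime = $since
} -ErrorAction SilentlyContinue |
    Sort-Object TimeCreated |
    Format-Table TimeCreated, Id, LevelDisplayName -AutoSize
```

Sorting by time makes it easier to see whether the failures line up with something scheduled, such as the nightly backup window.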


    • Edited by Sam_Kar Thursday, April 3, 2014 4:09 PM
    Thursday, April 3, 2014 4:07 PM

Answers

  • Hi, Sam,

    Do you have Dell software installed in your environment? For instance, the Dell HIT Kit might break Hyper-V replication. If this is the case, you can keep the HIT Kit installed but unregister its storage provider on each Hyper-V host.

    Also, what number is set in the "Additional Recovery Points" option on the parent host? With additional snapshots in place, you might hit a situation where the replica server is busy with "snapshot merging" activity and does not allow the replication task to proceed.
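
One way to check that setting without opening each VM's settings dialog might be the following sketch. `REPLICA01` is a placeholder host name, and it assumes the replication object exposes the recovery point count as `RecoveryHistory`, where 0 would mean only the latest recovery point is kept.

```powershell
# Assumption: 'REPLICA01' is a placeholder; substitute your own host name.
# Assumption: RecoveryHistory holds the number of additional recovery points.
Get-VMReplication -ComputerName 'REPLICA01' |
    Select-Object VMName, State, Health, RecoveryHistory
```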

    In our own experience, we used to have similar problems, which we more or less worked around with a scheduled PowerShell script that resumed hung replication tasks. That way, we at least didn't have to babysit stuck replication and resume it manually:

    Get-VMReplication -VMName * -ComputerName "Name of your hosts" | Where-Object { $_.State -ne "Replicating" } | Resume-VMReplication
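
The one-liner can also be split into small functions, which makes it easier to save as a script and run from Task Scheduler. This is a sketch rather than the script we actually used: `Select-StalledReplication` and `Resume-StalledReplication` are hypothetical names, and the host name is a placeholder.

```powershell
# Hypothetical helper: passes through only the replication objects whose
# State is not "Replicating". Works on any object with a State property.
function Select-StalledReplication {
    param([Parameter(ValueFromPipeline = $true)] $Replication)
    process {
        if ("$($Replication.State)" -ne 'Replicating') { $Replication }
    }
}

# Hypothetical wrapper around the one-liner above.
# $HostName is a placeholder for one or more Hyper-V host names.
function Resume-StalledReplication {
    param([Parameter(Mandatory = $true)] [string[]] $HostName)
    Get-VMReplication -ComputerName $HostName |
        Select-StalledReplication |
        Resume-VMReplication
}

# Example call (commented out; requires the Hyper-V module and a real host):
# Resume-StalledReplication -HostName 'HOST01'
```

Keeping the filter separate means it can be tried on sample objects before pointing it at a real host. How often to schedule it is a judgment call; an interval close to the replication frequency seems reasonable.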

    However, we finally decided to implement Veeam as our primary backup and disaster recovery solution, and we are quite happy with it.

    Kind regards, Leonardo.

    • Edited by Leonardo Muller Friday, April 4, 2014 7:56 AM Typo
    • Marked as answer by Sam_Kar Friday, October 14, 2016 4:30 PM
    Friday, April 4, 2014 7:51 AM

All replies

  • Thank you for your assistance. My servers are Dell, but the only Dell application I had was Dell System Detect, and I removed it anyway.

    For the time being I still choose to maintain only the latest recovery point, with no additional recovery points, until I have checked the stability.

    I thought that running Windows Backup might be causing the issue, but I stopped it for a couple of days and the issue still exists.

    The point is that the issue is intermittent: it comes and goes. I'm watching the primary server's internet throughput right now.

    Thank you for the script. Does it run automatically when the status of the replicated VM on the primary server is Paused, and simply resume it?

    Monday, April 7, 2014 3:21 PM
  • Sorry for the delay; I have just decided to start revisiting my TechNet forum threads.

    Yes, the replication process has been getting better in general. It just sometimes needs to be resumed after losing the internet connection for a short period, restarting the host, failing over between cluster nodes, or running a backup. Sometimes it loses its certificate, which simply needs to be re-added from the VM's settings in Hyper-V.

    I have used your script command on some servers as well, and it all works fine.

    Thank you.

    Friday, October 14, 2016 4:36 PM