SQL Server Virtual Machine Live Migration Too Slow

  • Question

  • Hi, I have a two-node Hyper-V cluster with about a dozen VMs. Over time, migrating VMs between nodes has slowed. Specifically, a SQL Server VM with a disk of about 300 GB takes about four minutes to move from one node to the other, and during the process I see its status as "shutting down", making it offline for a noticeable while.

    The servers are Dell PowerEdge 530s with a low-latency Dell Compellent SC2020. I recently connected two 10 Gb NICs together peer to peer as a cluster-only network in hopes of reducing migration time. It did not seem to make any difference.

    Anyone have ideas on why the migration is slow and how to speed it up?

    Thanks,

    Ken


    Ken

    Monday, September 16, 2019 7:32 PM

All replies

  • Hi,

    Thanks for posting in our forum!

    Try to update your NIC firmware and drivers, and BIOS.

    During the live migration, make sure that no other application is running. I also noticed that your disk is 300 GB, so it's normal for it to take a little longer.

    Cheers,

    Daniel


    Please remember to mark the replies as answers if they help.
    If you have feedback for TechNet Subscriber Support, contact tnmff@microsoft.com.

    Tuesday, September 17, 2019 8:42 AM
    Moderator
  • Size of the VM's disks should not have any impact on the time it takes to perform a live migration unless you are including storage migration, but in a cluster, that is pretty meaningless unless you are trying to rebalance storage.  In that case, though, you could perform storage migration as a completely separate step.

    Memory volatility is a primary factor in determining how long a live migration takes. If memory is changing rapidly while the live migration is taking place, then it will take more passes to get all the changed memory across.  Memory size of the VM is the first factor for how long it takes.  Have you happened to increase the size of the SQL VM's memory?
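
    To make the "more passes" point concrete, here is a rough sketch of iterative pre-copy as a toy model. It is not Hyper-V's actual algorithm: the function name, the stop threshold, the pass limit, and all the example numbers are assumptions chosen only for illustration.

```python
def estimate_precopy(memory_gb, link_gbit_per_s, dirty_gb_per_s,
                     stop_threshold_gb=0.1, max_passes=10):
    """Toy model of iterative pre-copy (NOT Hyper-V's real algorithm).

    Each pass sends the memory that is still outstanding. While a pass
    runs, the guest dirties more memory, which must be sent in the next
    pass. Returns (passes, total_seconds, final_copy_gb).
    """
    link_gb_per_s = link_gbit_per_s / 8.0  # gigabits/s -> gigabytes/s
    remaining = float(memory_gb)
    total_seconds = 0.0
    passes = 0
    while remaining > stop_threshold_gb and passes < max_passes:
        pass_seconds = remaining / link_gb_per_s
        total_seconds += pass_seconds
        passes += 1
        # Memory dirtied during this pass, capped at the VM's memory size.
        remaining = min(float(memory_gb), dirty_gb_per_s * pass_seconds)
    # Final copy of whatever is left while the VM is briefly paused.
    total_seconds += remaining / link_gb_per_s
    return passes, total_seconds, remaining


# Hypothetical example: 6 GB of memory, 10 Gb link, 0.125 GB/s dirtied.
passes, seconds, final_gb = estimate_precopy(6, 10, 0.125)
print(passes, round(seconds, 3), round(final_gb, 3))
```

    In this model the passes shrink quickly when the link is much faster than the rate at which memory changes. When memory changes as fast as the link can carry it, the passes never shrink and the loop only ends at the pass limit.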

    Also, you state that the SQL VM goes offline.  That means it is not performing a live migration but a quick migration.  If it had been performing a live migration and is now performing a quick migration, something changed in your environment.  And, yes, a quick migration will likely take longer than a live migration because the VM's memory must be written to disk, the VM ownership moved to the other node, the memory content read into the new host, and the VM restarted.
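
    The quick migration steps listed above can be turned into a back-of-the-envelope estimate. This is only a sketch: the function name, the throughput figures, and the fixed hand-off time are assumptions, not measurements from any real cluster.

```python
def quick_migration_seconds(memory_gb, write_gb_per_s, read_gb_per_s,
                            handoff_seconds=5.0):
    """Rough estimate of quick migration time (illustrative model only).

    The VM is offline for the whole sequence: save memory to disk,
    move ownership to the other node, then read memory back in.
    """
    save_seconds = memory_gb / write_gb_per_s     # write memory to storage
    restore_seconds = memory_gb / read_gb_per_s   # read it on the new host
    return save_seconds + handoff_seconds + restore_seconds


# Hypothetical example: 8 GB of memory, 0.5 GB/s write, 1.0 GB/s read.
print(quick_migration_seconds(8, 0.5, 1.0))
```

    Unlike a live migration, every second in this estimate is downtime for the guest, which is why the difference is so noticeable to users.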


    tim

    Tuesday, September 17, 2019 1:26 PM
  • Hi Tim,

    I did increase the memory slightly a while ago. The amount of memory is small compared to what most people commit to their SQL servers: 6 GB (not dynamic).

    I can think of nothing that has changed in my environment that might have caused live migration to stop working. How can I check for and fix it?

    Thanks,

    Ken


    Ken

    Tuesday, September 17, 2019 1:52 PM
  • Hi Daniel, everything is up to date except the NIC drivers. A Dell tech told me not to update the NIC drivers because they were seeing cases where that broke the cluster in a way that's hard to fix. I'm not really happy with that, but I don't want to break my cluster.

    Thanks,

    Ken


    Ken

    Tuesday, September 17, 2019 1:59 PM
  • Do you have the VM CPU configured for compatibility mode?  I have seen instances where two hosts with the same CPU were considered 'different' because of different stepping levels.  Live migration would not occur until CPU compatibility was set.  As a general rule, I tend to set all VM CPUs to compatibility mode in a cluster to avoid something like this, and it rarely, if ever, impacts performance.

    tim

    Wednesday, September 18, 2019 1:08 PM
  • Hi Tim,

    I set the CPU to compatibility mode and retested live migration, with the same result. I was mistaken about it saying "shutting down"; it says "stopping". "Stopping" takes the majority of the time, then it says "starting", which doesn't take long.

    Ken


    Ken

    Thursday, September 19, 2019 2:23 PM
  • Yes, stopping a SQL instance does take the majority of time.  Need to figure out why it is shutting down.

    Off the top of my head, I don't know what would cause the VM to stop.  Since it is a SQL VM, you will find more SQL HA experts over in their forum - https://social.technet.microsoft.com/Forums/en-US/home?forum=sqldisasterrecovery.  Maybe they would have some better ideas.  If you post over there, be sure to include information about the versions of the OS and SQL in use.  Also mention any differences between the hosts in the cluster, e.g. whether they have the same CPUs, were purchased at the same time, etc.


    tim

    Friday, September 20, 2019 2:16 PM
  • Hi All,

    The issue was corrupt VM configuration files. The solution was to create a new VM and attach the old VM's disk to it.

     Thanks for your help,

    Ken


    Ken

    • Marked as answer by Ken Travis Monday, October 7, 2019 2:00 PM
    Monday, October 7, 2019 2:00 PM
  • Great Job!



    Tuesday, October 8, 2019 1:10 AM
    Moderator