Extremely slow BITS deployment of new VM from library using SCVMM 2008 R2.

    Question

  • Our configuration:

    • 2 x Windows 2008 R2 Storage Servers in a Windows Cluster with a clustered file server CIFS share. Both servers are connected to the network with 1Gb connections, and have 2 x 1Gb iSCSI connections to the back-end cluster storage. This CIFS share is used as the library, and performance to all CIFS folders on this clustered file server is pretty fast over the network.
    • 1 x Windows 2008 R2 server running SCVMM 2008 R2 SP1 (+ hot fixes) w/ SQL Express backend. Server is connected to the network with a 1Gb connection.
    • 7 x Windows 2008 R2 Core servers running in a Hyper-V cluster. Each server is connected to the network with: 1 dedicated 1Gb network connection for management; 1 dedicated 1Gb network connection for cluster, CSV, and Live Migration functions; 1 dedicated 1Gb network connection for guest VMs; and 2 shared 1Gb network connections for iSCSI access (shared means the hosts and guests have virtual NICs on them, and this works well and is fast).

    Currently, when we deploy a new virtual machine from a template in the SCVMM library with a 70GB fixed-size VHD, the BITS file transfer of the VHD takes over 2 hours (closer to 3) to complete. We disabled the encryption requirement for BITS, and that didn't seem to help.

    The network utilization on all NICs (host, iSCSI, cluster, whatever) across all systems (clustered storage servers and the Hyper-V cluster nodes) is 10% or less. Processors and memory on all systems are barely being used.
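
    As a rough sanity check on these numbers, the sketch below converts the reported 70GB transfer in 2 to 3 hours into an average throughput and compares it with the line rate of a 1 Gb/s link. It assumes decimal units (1 GB = 10^9 bytes) and ignores protocol overhead; the function names are illustrative only.

```python
GIGABIT_LINK_BYTES_PER_S = 1e9 / 8  # 1 Gb/s = 125 MB/s before protocol overhead


def effective_throughput_mb_s(size_gb: float, hours: float) -> float:
    """Average throughput in MB/s for a transfer of size_gb that takes `hours`."""
    return size_gb * 1e9 / (hours * 3600) / 1e6


def link_utilization(throughput_mb_s: float) -> float:
    """Fraction of a 1 Gb/s link consumed at the given throughput."""
    return throughput_mb_s * 1e6 / GIGABIT_LINK_BYTES_PER_S


for hours in (2.0, 3.0):
    rate = effective_throughput_mb_s(70, hours)
    print(f"{hours:.0f} h: {rate:.1f} MB/s, {link_utilization(rate):.1%} of line rate")
```

    Under these assumptions the transfer averages roughly 6.5 to 9.7 MB/s, about 5 to 8% of line rate, which is consistent with the utilization of 10% or less seen on the NICs.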

    We verified there are no GPO settings at all for BITS. We also verified there are no corrupted/stale BITS jobs on the target Hyper-V cluster nodes that are taking forever.

    We don't think it is a TCP offloading issue because everything else, including standard file copies, works at the normal expected speeds.

    We are stumped as to why BITS is taking hours to do something we can do in 1/10th the time with a standard file copy. We really need to shave some time off VM deployments as they seem to be taking longer every time we deploy one.

    Does anyone have any suggestions on how to kick non-encrypted BITS into high gear?

    Thanks in advance for the suggestions!

    Tuesday, 12 June 2012 22:04

All replies

  • SCVMM runs BITS as a background job to avoid flooding the network. BITS can run in the foreground, like a user file copy, but SCVMM does not set this because it can flood the NIC.

    A 70GB fixed disk will take a while.

    One option is to use a dynamic VHD in the Library and convert it to fixed after the VM is deployed (if you are using fixed to avoid SAN thrashing). This reduces the data copied over the wire, because a dynamic VHD only contains the blocks that have actually been written. It is a sneaky way to handle it, but it is very effective.
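
    To find library VHDs that would benefit from this, one could inspect each file's footer. The following is a minimal sketch, assuming the classic VHD format (not VHDX), in which the last 512 bytes of the file hold a footer that starts with the cookie `conectix` and stores the disk type as a big-endian 32-bit integer at offset 60 (2 = fixed, 3 = dynamic, 4 = differencing). The function name is hypothetical and not part of any SCVMM tooling.

```python
import os
import struct

VHD_FOOTER_SIZE = 512
DISK_TYPES = {2: "fixed", 3: "dynamic", 4: "differencing"}


def vhd_disk_type(path: str) -> str:
    """Return the disk type recorded in the footer of a classic .vhd file."""
    size = os.path.getsize(path)
    if size < VHD_FOOTER_SIZE:
        raise ValueError(f"{path} is too small to be a VHD")
    with open(path, "rb") as f:
        f.seek(size - VHD_FOOTER_SIZE)  # the footer occupies the final 512 bytes
        footer = f.read(VHD_FOOTER_SIZE)
    if footer[:8] != b"conectix":
        raise ValueError(f"{path} has no VHD footer cookie")
    # Disk type is a 4-byte big-endian field at offset 60 of the footer.
    (type_code,) = struct.unpack(">I", footer[60:64])
    return DISK_TYPES.get(type_code, f"unknown ({type_code})")
```

    Running `vhd_disk_type` over the library share and comparing the result with each file's size on disk gives a quick list of large fixed disks that are candidates for conversion to dynamic.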

    If your Hyper-V Server does not have a dedicated management NIC, you will also reduce your throughput when it comes to SCVMM template provisioning over 1Gb NICs. I have seen reductions of as much as 40% just because the management NIC is shared on an External Virtual Switch.

    Quite honestly, those are my two big tips. I think you will get the most bang out of suggestion one. The impact of suggestion two is greater the older the OS (as behavior has improved with newer releases).


    Brian Ehlert
    http://ITProctology.blogspot.com
    Learn. Apply. Repeat.
    Disclaimer: Attempting change is of your own free will.

    Wednesday, 13 June 2012 17:09
    Moderator
  • @Kristian Nese - PLEASE do not mark something as an answer on a forum post that someone else started 2 days ago. Please allow the originator of the post some time to respond and potentially mark something as an answer themselves. IMHO moderators should only mark something as an answer if it was a direct answer to the question AND the post seems abandoned, neither of which was the case here.

    @BrianEh - Understood, a 70GB VHD will take a little while. I'm thinking more like 30 minutes as opposed to 3 hours on a local LAN where all the connections are 1Gb or higher. Something appears to be wrong with the performance in our situation, and I am starting to suspect the Windows Storage Servers and their connectivity to the iSCSI volumes. The file copy test I mentioned earlier was to a local hard drive on the Storage Server, not to the iSCSI LUN, and the iSCSI performance on the Windows Storage Servers does not seem to be what we would expect for 2 x 1Gb iSCSI NICs in an MPIO configuration on a local LAN.

    We double checked all of the iSCSI NIC and switch port settings, and updated the NICs on the Windows Storage Servers to the latest drivers and utilities, to no avail. Our Windows Platforms DSE suggested we look at the following hotfix, which we are going to test out ASAP: http://support.microsoft.com/kb/2675785/

    The concept of using a dynamic VHD and expanding it later is a very slick idea! While that doesn't address the performance issues we are seeing, I do think it could speed things up even more (the file copy at a minimum) once the performance issue is worked out.

    Our Hyper-V server does have a 100% dedicated management NIC.

    Thanks for the 2 suggestions, and thank you for responding as your advice is well respected on these forums!

    I will post back here whatever conclusion we come to.

    Thursday, 14 June 2012 17:30
  • I spent some quality time on improving provisioning time when using SCVMM.

    The dynamic VHD in storage and expanding it after deployment solved the copy over the wire issue and met the requirement of reducing the potential for SAN thrashing (SAN specific, but well known).

    Beyond that, you can also parallelize your provisioning, as long as you do not deploy more than 2 VMs to any one host at the same time.

    So, if you have a cluster, you can safely deploy one VM to each host at the same time, knowing that your storage will then be the bottleneck. In this case the number of hosts becomes the factor by which you divide your total time to get the savings.
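
    The arithmetic behind this can be sketched as follows. This is only a planning model, not SCVMM's placement logic: it assumes every deployment takes the same time and that the storage can keep up, which is exactly where the bottleneck moves. The host names and the `per_host_limit` parameter are illustrative.

```python
import math
from itertools import cycle


def plan_deployments(vm_names, hosts, per_host_limit=1):
    """Group VM deployments into waves, never exceeding per_host_limit per host per wave."""
    if not hosts or per_host_limit < 1:
        raise ValueError("need at least one host and a positive per-host limit")
    slots_per_wave = len(hosts) * per_host_limit
    waves = []
    for start in range(0, len(vm_names), slots_per_wave):
        batch = vm_names[start:start + slots_per_wave]
        # Round-robin over the hosts so each wave is spread as evenly as possible.
        waves.append(list(zip(batch, cycle(hosts))))
    return waves


def estimated_hours(vm_count, host_count, hours_per_vm, per_host_limit=1):
    """Wall-clock estimate if every wave takes hours_per_vm (storage permitting)."""
    waves = math.ceil(vm_count / (host_count * per_host_limit))
    return waves * hours_per_vm
```

    For example, 14 VMs spread over 7 hosts at one deployment per host come out as 2 waves, so the estimate is about twice the single-deployment time rather than 14 times.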

    My scenario was large VDI deployments, and I used SCVMM 2012 which has a number of scalability enhancements - but the rules apply in all situations.


    Brian Ehlert
    http://ITProctology.blogspot.com
    Learn. Apply. Repeat.
    Disclaimer: Attempting change is of your own free will.

    Thursday, 14 June 2012 17:42
    Moderator
  • I went back and reviewed the last couple of virtual machine deployments through SCVMM 2008 R2, and confirmed they were taking 2.5+ hours to deploy a 70GB VHD (just the BITS task in a VM deployment job). On a local 1Gb LAN that is simply not acceptable.

    We did deploy http://support.microsoft.com/kb/2675785/ as per our Microsoft Premier Support Platforms DSE, and also http://support.microsoft.com/kb/2524478 since that is a recent addition to the Windows Cluster recommended hotfixes in http://support.microsoft.com/kb/2545685 (we had the others already applied to our Windows Storage Server cluster).

    This morning I went back to run another create VM test, and the same templated VM with a 70GB VHD took ~55 minutes to copy versus the 2.5+ hours we were seeing previously. Since those two hot fixes are the only things that changed since my last tests, where the BITS VHD copy process was taking 2.5+ hours, we are assuming the TCPIP.SYS update improved the iSCSI performance and possibly the CIFS performance of the server. Additionally, copying a 3GB ISO from the VMM Library share to my local desktop was averaging 10-11MB/second before the hot fixes, where now it's averaging 25-26MB/second. And just to be sure, I double checked with my network team and they didn't change anything since I ran my 10-11MB/second and 2.5+ hour tests.

    If memory serves me correctly, deploying the VM template from the VMM library when it was on local disk storage of the SCVMM server took ~45 minutes. So an additional 10 minutes when moving the library to an iSCSI volume front ended by a CIFS cluster isn't great, but it isn't horrible. It is definitely an improvement over the 2.5+ hours we were seeing, and we haven't deployed the TCPIP.SYS update on the Hyper-V servers yet (which might improve performance as well).

    I think we will still follow up on the dynamic size VHD template idea BrianEh had (thanks again for that idea), but we are back to provisioning virtual machines in about an hour, so we are much happier than before.

    I will try this again on Monday ~1PM when our user load is at its peak, just to verify the performance improvements stick and that today being Friday wasn't a fluke. Although I can't see how it being Friday could cause the performance to improve to the point where the time to deploy was reduced by 3/5ths.
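
    For reference, the before and after figures quoted in this post work out as follows. The sketch treats the earlier deployment time as 150 minutes, which is an assumption since the quoted figure is a lower bound, and uses the midpoints of the two ISO copy ranges.

```python
def fractional_reduction(before: float, after: float) -> float:
    """Fraction by which `after` is smaller than `before`."""
    return (before - after) / before


def speedup(before_rate: float, after_rate: float) -> float:
    """Ratio of the new rate to the old rate."""
    return after_rate / before_rate


deploy_cut = fractional_reduction(150, 55)  # BITS copy time in minutes
iso_gain = speedup(10.5, 25.5)              # midpoints of 10-11 and 25-26 MB/s

print(f"Deployment time reduced by {deploy_cut:.0%}")
print(f"ISO copy throughput improved {iso_gain:.1f}x")
```

    That is a reduction of roughly 63%, close to the 3/5ths mentioned above, alongside an ISO copy that is about 2.4 times faster.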

    Keeping my fingers crossed.


    Friday, 15 June 2012 15:16
  • I tried another 70GB VM deployment on Sunday, which should be the lowest point of utilization on everything, and I was getting a 4 hour deployment timeline which is just plain pathetic. I even tried switching the iSCSI NICs from Broadcom NICs to Intel NICs and I seem to get inconsistent performance there as well.

    I have to imagine the issue is with the VMM library and its lack of performance at this point in time. Since the Windows 2008 R2 Storage Edition cluster servers and iSCSI SAN are from the same vendor, I am going to open up a performance ticket with the vendor and will post back the results here.

    Monday, 25 June 2012 18:43
  • You wrote:

    >> We don't think it is a TCP offloading issue because everything else, including standard file copies, works at the normal expected speeds.

    I know that this post is quite old, but...

    Have you found out why your connection is so slow?

    I've faced the same problem:

    Dell Servers with Broadcom NICs.

    BITS over HTTP showed a maximum of 20% utilization of 1GigE, and this lasted until we disabled TOE, TCP Connection Offload (IPv4) and TCP Connection Offload (IPv6) on our NICs. After this change network utilization became 88-92%. Excellent!

    The problem was: BITS over HTTP is extremely slow while TOE is enabled (but TOE doesn't have such a huge influence on standard file copies).

     

    Sorry for my bad English :-)


    • Edited by B.Denis Friday, 09 November 2012 13:54 images are too small
    Friday, 09 November 2012 13:49
  • Thank you for reminding me to come back to this post. I hate leaving posts dead/stale without some sort of closure.

    I have been unable to improve the performance of our 2 x 1Gb iSCSI connection from our file server to the iSCSI SAN. I believe that is where the choke point is in our environment, because when I try to copy a 3GB file from our file server iSCSI LUN to another host, the network throughput sucks. However, if I copy that file to the local C: drive mirror first and then copy it to another host, performance is about what I expect it to be.

    I seem to remember disabling all offloading on the iSCSI NICs to make sure they weren't the source of the poor I/O performance, and I don't think that made a difference. I will ask my hardware guys to try it again though, and if I get different results I will post them back.

    A question for you on the NICs you tweaked: were those your iSCSI NICs or were they for something else?

    Thanks again for the follow up!

    Friday, 09 November 2012 15:43
  • Try installing this KB: http://support.microsoft.com/kb/2517329

    Monday, 12 November 2012 06:21
  • That hot fix is already installed on all of our Hyper-V servers.

    Monday, 12 November 2012 16:40
  • Just to back up a bit here and summarize the related points.

    The BITS copy runs in background mode, specifically so that it does not overwhelm the network cards. It can be forced into foreground mode, but only if you drive BITS directly, not through SCVMM.
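
    As an illustration of driving BITS directly, the sketch below builds a `bitsadmin /transfer` command line with foreground priority. It assumes `bitsadmin.exe` is available on the host and that the library share is reachable from it; the job name and both paths in the usage note are placeholders. This bypasses SCVMM entirely and may not use the same transport that SCVMM does, so treat it as a diagnostic aid rather than a replacement for a template deployment.

```python
import subprocess

VALID_PRIORITIES = {"FOREGROUND", "HIGH", "NORMAL", "LOW"}


def bitsadmin_transfer_command(job_name, remote_path, local_path, priority="FOREGROUND"):
    """Build the argument list for a one-shot BITS download at the given priority."""
    priority = priority.upper()
    if priority not in VALID_PRIORITIES:
        raise ValueError(f"unknown BITS priority: {priority}")
    return [
        "bitsadmin", "/transfer", job_name,
        "/download", "/priority", priority,
        remote_path, local_path,
    ]


def run_transfer(job_name, remote_path, local_path, priority="FOREGROUND"):
    """Run the transfer and wait for it to finish; only meaningful on a Windows host."""
    cmd = bitsadmin_transfer_command(job_name, remote_path, local_path, priority)
    return subprocess.run(cmd, check=True)
```

    Calling `run_transfer("vhdtest", r"\\library\share\template.vhd", r"C:\temp\template.vhd")` once with the default priority and once with `priority="NORMAL"` would show how much of the slowdown comes from background mode alone.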

    The size of the VHD that you copy has a great impact on the copy time. Always store dynamic VHDs in the SCVMM Library and convert them to fixed after deployment of the VM. This alone can have a significant impact.

    The Hyper-V Server should have a physical NIC dedicated as the management NIC. If the management NIC is a virtual NIC that is shared with an External Virtual Switch, the throughput will be noticeably reduced. This is best described as the management OS getting a lower priority than the VMs it is managing. All BITS traffic goes through the Hyper-V Server management NIC.

    Have no more than two concurrent deployments to a single Hyper-V Server at the same time (when on 1Gb).

    Disable TCP Checksum Offload, Large Send Offload and other related settings. These need to be set in a granular way. No idea what "Connection Offload" is, but it is not normally mentioned. This only needs to be set on the management NIC of the Hyper-V Server, and also on the NIC of the SCVMM Library Server.

    Make sure that your Hyper-V Server has only one management NIC (not multi-homed).


    Brian Ehlert
    http://ITProctology.blogspot.com
    Learn. Apply. Repeat.
    Disclaimer: Attempting change is of your own free will.

    Monday, 12 November 2012 16:50
    Moderator
  • To close the loop on this, we narrowed the primary issue down to a slow copy from our iSCSI-based SAN to the VM hosts. Using BITS seems to compound the slowness problem by making the copy a background process, which seems to double the amount of time required to perform the copy.

    We have upgraded our LeftHand SAN to the latest OS, including the updated HP DSM drivers, and through that effort and all of the other tweaks we think we have gotten file copies back to a reasonable speed of ~60MB/s. While this is by no means amazing speed over a 1Gb datacenter switch infrastructure (we will continue tweaking and testing), it is working for the purposes of VM deployment, as we now fully understand where the choke point was/is, and that BITS is only slowing things down more.

    I would recommend that anyone else dealing with slow VM deployment start by manually copying the VHD out of the VMM library, to determine whether it is a basic Windows file copy performance issue like ours was/is.
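
    A manual copy test like this can be timed with a few lines of script. The following is a minimal sketch, assuming Python is available on the machine doing the copy; the function name is made up for illustration, and the paths to pass in depend on your own library share and host storage.

```python
import shutil
import time
from pathlib import Path


def timed_copy(source: str, destination: str, buffer_mb: int = 16) -> float:
    """Copy source to destination and return the average throughput in MB/s."""
    src, dst = Path(source), Path(destination)
    size_bytes = src.stat().st_size
    start = time.perf_counter()
    with src.open("rb") as fin, dst.open("wb") as fout:
        shutil.copyfileobj(fin, fout, length=buffer_mb * 1024 * 1024)
    # Timing stops after the files are closed, so buffered writes are flushed to the OS.
    elapsed = time.perf_counter() - start
    return size_bytes / 1e6 / max(elapsed, 1e-9)
```

    Use a file of several GB, since OS write caching can inflate the result for small files. Comparing the figure for a copy from the library share with one for a copy from local disk on the same server helps separate a storage problem from a BITS problem.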

    We are going to take Brian Ehlert's excellent suggestion (which is why I am going to mark his post as one of the answers to our issues, to give him credit) of switching our VM template to a dynamic VHD, because it really doesn't make any sense to copy a very large and very empty file over the network.

    Thank you to all for the excellent suggestions and assistance, especially Brian Ehlert.

    Thursday, 31 January 2013 18:16