High Availability and Hyper-V issues
- I have set up a 2-node cluster in Server 2008. I am using Hyper-V and have 1
VM. The VM has 2 dynamically expanding VHDs associated with it.
The 2 physical nodes are connected to an iSCSI SAN. I have made available 2
LUNs on the SAN to the 2 nodes. They are presented to the nodes as the Q:
and F: drive. The Quorum resides on the Q: drive and the VHDs reside on the
F: drive.
When I failover the VM to another node, I get the following message in the
System Log:
Log Name: System
Source: vhdparser
Date: 8/8/2008 10:32:23 AM
Event ID: 1
Task Category: None
Level: Warning
Keywords: Classic
User: N/A
Computer: VS-NODE1.domain.lcl
Description:
The parent virtual hard disk appears to have been modified without using the
differencing virtual hard disk located at
"...\\VM_Disk2_8B00376C-FFEA-4B6A-9100-31AD08694698.avhd". Modifying the
parent virtual hard disk may result in data corruption. It is strongly
recommended that you lock the parent virtual hard disk to prevent this in the
future. If you recently changed time zones on your computer, you can safely
continue using this virtual hard disk.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
<Provider Name="vhdparser" />
<EventID Qualifiers="32773">1</EventID>
<Level>3</Level>
<Task>0</Task>
<Keywords>0x80000000000000</Keywords>
<TimeCreated SystemTime="2008-08-08T14:32:23.455Z" />
<EventRecordID>9251</EventRecordID>
<Channel>System</Channel>
<Computer>VS-NODE1.adamscounty.lcl</Computer>
<Security />
</System>
<EventData>
<Data>
</Data>
<Data>...\\DocIMG_Disk2_8B00376C-FFEA-4B6A-9100-31AD08694698.avhd</Data>
<Binary>00000000020030000000000001000580000000000000000000000000000000000000000000000000</Binary>
</EventData>
</Event>
Also, when I fail over the VM to another server, the VM is still listed in Hyper-V Manager as "Saved" on the server that does not host the VM. I get the following in the System Event Log:
Log Name: System
Source: Microsoft-Windows-Hyper-V-High-Availability
Date: 8/12/2008 9:34:04 AM
Event ID: 21502
Task Category: None
Level: Error
Keywords:
User: SYSTEM
Computer: VS-NODE2.domain.lcl
Description:
'Virtual Machine Configuration Server1_VM' failed to unregister the virtual machine with the Virtual Machine Management Service.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
<Provider Name="Microsoft-Windows-Hyper-V-High-Availability" Guid="{64e92abc-910c-4770-bd9c-c3c54699b8f9}" />
<EventID>21502</EventID>
<Version>0</Version>
<Level>2</Level>
<Task>0</Task>
<Opcode>0</Opcode>
<Keywords>0x2000000000000000</Keywords>
<TimeCreated SystemTime="2008-08-12T13:34:04.442Z" />
<EventRecordID>8243</EventRecordID>
<Correlation />
<Execution ProcessID="3432" ThreadID="3168" />
<Channel>System</Channel>
<Computer>VS-NODE2.domain.lcl</Computer>
<Security UserID="S-1-5-18" />
</System>
<EventData>
<Data Name="ResourceName">Virtual Machine Configuration Server1_VM</Data>
<Data Name="ResourceGroup">'Virtual Machine Configuration Server1_VM' failed to unregister the virtual machine with the Virtual Machine Management Service.</Data>
</EventData>
</Event>
All replies
- Here is the key.
The rule is one VM per host LUN.
This means that one storage LUN can only house one VM (configuration, VHD, snapshot files, etc.)
There is a Failover clustering patch that may modify this behavior that can be found here:
http://support.microsoft.com/?id=951308
Second:
Your entire VM must be housed on the LUN that is being failed over.
Hyper-V (by default) separates the snapshots and configuration files from the base VHD. You (as an admin) have to consciously tell Hyper-V to put all the bits of a VM together in one place for HA.
This can be done after VM creation:
http://itproctology.blogspot.com/2008/05/versatility-of-hyper-v-export-import.html
Third:
Make sure that you don't have any ISO files attached to your VMs.
Failover Clustering frequently barfs on this, as the vmguest.iso is held in the Host system root and frequently prevents failing over.
There are bunches of threads about Failover Clustering with Hyper-V.
The biggest thing to remember is the one VM per LUN rule, and that the LUN fails over with the VM workload. That is how MSFT Failover Clustering works.
The patch I mentioned above might modify this behavior but I have not explored it yet to verify.
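To make the "entire VM on the LUN that fails over" rule concrete, here is a minimal sketch in Python. It only compares path strings; it does not talk to Hyper-V. The file list, the file names, and the helper name `files_off_volume` are hypothetical and exist only for illustration.

```python
from pathlib import PureWindowsPath


def files_off_volume(vm_files, volume_root):
    """Return the VM files that do NOT live under volume_root."""
    root = PureWindowsPath(volume_root)
    stray = []
    for f in vm_files:
        p = PureWindowsPath(f)
        # PureWindowsPath comparisons ignore case, as Windows paths do.
        if root != p and root not in p.parents:
            stray.append(f)
    return stray


# Hypothetical file list for one VM (not taken from a real host).
vm_files = [
    r"F:\Server1_VM\disk1.vhd",
    r"F:\Server1_VM\disk2.vhd",
    r"C:\ProgramData\Microsoft\Windows\Hyper-V\Snapshots\disk2.avhd",
]

# Anything printed here would stay behind on the old node during a failover.
print(files_off_volume(vm_files, "F:\\"))
```

If the list that comes back is not empty, those are the pieces that need to be moved onto the shared LUN before the VM is made highly available.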
Also some other items to check out:
http://itproctology.blogspot.com/2008/05/hyper-v-plus-failover-clustering.html
http://itproctology.blogspot.com/2008/05/how-do-i-stop-failover-clustering-long.html
Brian Ehlert (hopefully you have found this useful)
- Thanks for the information. I do have it set up so there is one VM on 1 LUN. I created 2 LUNs and presented them to the host server as a 2GB Q:\ drive (Quorum) and a 500GB F:\ drive (VHD).
When I created the cluster in 2008, it assigned the 2GB Q:\ drive as the Quorum. The 500GB F:\ drive that the host server sees is used to store my 2003 VM. The guest 2003 VM has 2 VHDs attached to it (which are both stored on the F:\ drive on the host server).
It looks like this...
Dell MD3000i-----iscsi-------NODE2(Server2008 Hyper-V cluster)
| |
iscsi |
| |
| |
NODE1 (Server2008 Hyper-V cluster)----LUN1(Q:[Quorum]), LUN2(F:[VHDs])
|
|
|
Server 2003 R2 (C:\ [VHD1], D:\ [VHD2])
That is the correct configuration, right? Hope that makes sense.
- Yep, that looks right
(good ASCII art BTW)
I am guessing that your issue is that your snapshot files are not being stored with the VHD.
(The avhd files are on the Host system partition, hidden under %programdata%, and not with the VM itself.)
This is where the article I sent about using Export comes into play.
The Export process takes all the bits of a VM (no matter where they may reside now) and puts them all together in the same folder (this is the important thing for your case). Then it fixes all the bits up (differencing disks, config files, etc.).
In your case, power down your VM.
Export your VM to the root of F:\ (the Export process will create a folder using the VM's name).
Once the Export is done, delete your original VM (this removes the configuration file, but nothing else).
Then Import, selecting the folder that Export created.
After doing this your VM should fail back and forth just fine.
Brian Ehlert (hopefully you have found this useful)
- I changed the default snapshot path to F:\{VMname} before I took a snap. I then took a snap and it stored it under {VMname}\Snapshots on the F:\ drive.
I did notice that on NODE2 in Hyper-V Manager, the snapshot path was still the default, whereas on NODE1 it was showing the clustered disk path F:\{VMname}. I guess it was having issues with the snapshot file when it was failing over?
I'm applying that hotfix and seeing if everything is going to play nicely. If not, I'll do the export/import method you described above.
PS...I'm happy with the way the ASCII art came out, I have trouble drawing stick figures in real life :)
Edited by Daveyd123, August 12, 2008 18:42 (add stuff)
- Another question. Looking at the sketch above, NODE1 and NODE2 (both Dell 1950s) are connected to the MD3000i (iSCSI) with 4 Broadcom NICs, 2 in each 1950.
From my research, it looks like I should disable "Large Send Offload" on all physical Broadcom NICs. Is that correct? If so, should I also disable it on the Virtual Switch that the cluster created as well as the physical Intel NIC that the VM uses?
- Disabling TCP Offloading (not just large send; you will find many TCP Offloading options on some NICs) is strictly for VM performance on Virtual Switches.
In your case only do this on the physical NICs used by Virtual Switches.
You will not be creating virtual switches on your iSCSI NICs or your management NIC, only on those NICs used for virtual machine traffic.
Your iSCSI NICs will be dedicated to iSCSI traffic, and your management NIC will be dedicated to management (and heartbeat) traffic.
I am assuming that you know how to configure a Windows Server to isolate traffic over specific NICs ;-)
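As a rough illustration of the rule above (disable TCP Offloading only on physical NICs that back a virtual switch, leave it on elsewhere), here is a small Python sketch. It only builds a plan from a table of NIC roles; it does not change any adapter setting. The NIC names, the role labels, and the function name `offload_plan` are all hypothetical.

```python
def offload_plan(nics):
    """Map each NIC name to 'disable' or 'keep' for TCP offload.

    Only NICs whose role is 'vm_switch' (a physical NIC used by a
    virtual switch) get 'disable'; every other role keeps offload on.
    """
    return {
        name: ("disable" if role == "vm_switch" else "keep")
        for name, role in nics.items()
    }


# Hypothetical role assignment for one host with four NICs.
nics = {
    "NIC1": "iscsi",
    "NIC2": "iscsi",
    "NIC3": "vm_switch",
    "NIC4": "management",
}

for name, action in offload_plan(nics).items():
    print(f"{name}: {action} TCP offload")
```

Writing the roles down in one table like this is also a quick way to confirm that every NIC has exactly one job before touching any settings.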
Brian Ehlert (hopefully you have found this useful)
- I have my 4 iSCSI NICs (2 in each server) connected to a Cisco 3750. The MD3000i and the 2 Host servers' iSCSI NICs are the only things attached.
When I created the cluster, it removed all bindings on one of my Intel NICs and left only Virtual Machine Network Services and Microsoft Network Switch Protocol. It created another Network Connection, which I had to name during cluster creation. The properties of that connection contain all the IP settings, Client for Microsoft Networks, etc. I assume that is the Virtual Switch?
So, I would disable TCP Offloading just on the physical NIC and not the new one the cluster created, correct? I do have the option of disabling it on the NIC the cluster created.
Also, I am wondering if I should also disable it on the iSCSI NICs because I have a weird issue... When a host reboots, the "Favorite Targets" in the iSCSI initiator change. I have confirmed all iSCSI settings to the Target are correct before a reboot. The iSCSI NICs (192.168.130.x, 192.168.131.x) are on different subnets than the other physical NICs (192.168.0.x, 10.0.0.x). The MD3000i (192.168.130.x, 192.168.131.x) is on the same subnets as the iSCSI NICs. After a reboot the Favorite Targets change source IP addresses from the iSCSI NICs to the Heartbeat NIC (10.0.0.1) or the LAN NIC (192.168.0.9).
One more thing... Is it normal for the VM's network connection to be showing as 10.0 Gbps? I just found that kinda cool, even though I'm not on any 10 gig.
Edited by Daveyd123, August 13, 2008 15:42 (add stuff)
- Okay...
You have two NICs physically connected to a network. You intend to use both for iSCSI.
Then set up the iSCSI initiator and bind it to these two NICs.
Now, you will need to physically connect another NIC to dedicate to host management traffic.
You will need to connect yet a fourth NIC to dedicate to VM traffic.
All assuming your original configuration.
From the NIC configuration you describe, you already have a virtual switch (not created by failover clustering).
The NIC with only the Microsoft Network Switch Protocol is being used by a virtual switch. Do not modify its protocol bindings. However this _is_ the physical NIC where you would disable TCP Offloading (if you need to / desire to).
You manage virtual network switches under Virtual Network Manager which is available from within the Hyper-V manager console.
You will have to check your iSCSI setup and see what your failover is for the iSCSI connections.
And you will want TCP Offloading running on your iSCSI NICs.
Your heartbeat and management traffic can run around between any NICs that are commonly exposed (where the hosts can 'see' each other). This can happen in any multi-homed server.
The only way to prevent this is a combination of manual IP configuration and disabling the Host network adapter that is created for External virtual network switches.
I know, this is a ton of stuff to absorb, especially if it is all new.
Brian Ehlert (hopefully you have found this useful)
Marked as answer by Mike Sterling [MSFT], Owner, August 14, 2008 17:29
- Server 2008 clustering and Hyper-V is pretty new to me. I have had a 2-node Virtual Server 2005 Host cluster with 6 guests up and running for quite a while now. That's why I got a little confused about the Virtual Switch that Hyper-V creates. I don't have one in VS 2005.
Each physical host has 4 NICs: 2 embedded Broadcom and a dual-port Intel NIC. In each server, the 2 embedded Broadcom NICs are being used as iSCSI NICs. They are connected to a dedicated iSCSI Gigabit switch (Cisco 3750). One port of the Intel dual-port NIC is being used for the cluster Heartbeat. The other port is being used for LAN communication/Host Management.
Dell MD3000i
[Controller0], [Controller1]
| | | |
| | | |
(0,0) (0,1) (1,0) (1,1)
| | | |
| | | |
Cisco 3750(dedicated iSCSI)----------------------------------------------
| | | |
| | | |
| | | |
NODE1[Brdcom1], [Brdcom2], [Intel1], [Intel2] NODE2[Brdcom1], [Brdcom2], [Intel1], [Intel2]
| | | |
| |__________cluster heartbeat_________ |______|
| |
| |
| |
Cisco 3750 (Public LAN)--------------------------------------------------
So, I should leave TOE enabled for Brdcom1,2 on both servers and disable it from Intel1 on both servers?
BTW, exporting and then importing the VMs seemed to help with the vhdparser errors :)
Here is the link I was referring to about TOE... http://forums.technet.microsoft.com/en-US/winserverhyperv/thread/9e3a057b-34e2-4430-af96-9ba863cc4a71/
- Does anyone know how this was resolved, if it ever was? I have the same exact problems with almost the same exact hardware. I even have the funny iSCSI "weird issue" mentioned above.
- When creating a cluster of Hyper-V hosts, it is the responsibility of the administrator to create a virtual network on each host. Any / all virtual network switches must be named identically across all hosts.
Failover Clustering actually manages the failover of the VMs; however, Failover Clustering does not manage the virtual network switches. It simply looks for a virtual network of an identical name on the host that it is to fail to.
You will get a configuration error when attempting to use one of the failover clustering wizards if this is not the case.
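The naming rule can be sketched as a simple comparison. The Python below takes a per-host list of virtual network names, which you would have to collect yourself, and reports any name that exists on one host but not another. The host names, network names, and the function `missing_switches` are hypothetical, and the exact, case-sensitive string match is an assumption made to stay on the safe side.

```python
def missing_switches(host_switches):
    """Return {host: [names present on another host but missing here]}.

    Hosts that already have every name are left out of the result,
    so an empty dict means the names line up everywhere.
    """
    all_names = set().union(*host_switches.values())
    report = {}
    for host, names in host_switches.items():
        missing = sorted(all_names - set(names))
        if missing:
            report[host] = missing
    return report


# Hypothetical inventory: the names differ only by a capital letter.
host_switches = {
    "NODE1": ["VM Network"],
    "NODE2": ["VM network"],
}

print(missing_switches(host_switches))
```

An empty result is what you want to see before running the failover clustering wizards.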
In regards to the TOE on Broadcom NICs: drilling into it, it appears to be related to disabling large send offloading. Actually, disabling all TCP Offload functions might smooth things out (as it generally does) in situations of networking weirdness.
Exporting and then importing the VMs forces all of the VM components into a nice, neat, single folder that resides on a shared volume. It fixes many HA VM issues.
Brian Ehlert (hopefully you have found this useful)
- There is only 1 virtual network switch. It is named the same on both nodes.
I'll start disabling all offloading on the iscsi NICs once the VM problems are gone. The network weirdness is minor at present.
Why is this necessary for quick migration to work?
By the way, I do not use drive letters for my LUNs. Only GUID.
- Why is Failover Clustering necessary for Quick Migration? (and Live Migration in R2)
Hyper-V is just taking advantage of an existing feature of Windows Server (Failover Clustering) to enable a feature for Hyper-V. Re-use of an existing feature in a new way. (why rebuild code that exists)
Failover Clustering already has all of underlying smarts to keep applications Highly Available, why not VMs as well... Quick migration is just the name for invoking a failover of a HA vm from one node to another.
In turn Hyper-V gets the benefits. One big one is that the Hyper-V nodes can be geographically separate (they do not have to reside in the same datacenter; of course, proper planning and design is necessary).
In R2, Clustering comes into play again by enabling Cluster Shared Volumes and Live Migration.
Actually, Hyper-V takes advantage of many features from many other product groups, and is in turn helping those other product groups advance into other areas. It touches the Windows Networking, Windows Storage, and Clustering teams, to name just a few off the top of my head.
Brian Ehlert (hopefully you have found this useful)
- No, you misunderstand.
Why is it necessary for the customer to use the export/import feature to get this to work correctly for a product my company paid $ for? I don't see this documented as being necessary. In fact I followed a Microsoft document for HyperV and Clustering and setting it up and it made no mention of this. It made no mention of having to have everything in one location. I followed all the steps and it failed the planned failover. It shuts down the VM instead of saving state. I get the same errors as the person above.
- And did the export, then the import as mentioned. Then Failover Clustering couldn't start it, it failed. So I removed it from Failover Clustering. VM now gone and I can't reimport again. Another manual install of Windows now. Few more hours of my time here. If only this worked without having to do an export/import in the first place. But no of course not. It doesn't work out of the box. You have to do all kinds of "admin magic" to get 2 Microsoft products to work together. Heck make it 3 since the operating system is Microsoft as well. The Clustering, the VM software and the operating system all from the same company. This thing should be bulletproof. Nice intro to all this I'm having.
- It isn't required, it isn't necessary.
However, it has been discovered to be a good work around for situations when a VM is created and not all the tweaks are done to it that need to be done.
I do believe that the documentation does cover that the VHDs and the snapshots of the VM must reside on shared storage, and one LUN per VM.
Shutting down, instead of saving...
That sounds more like it is a function of the OS in the VM.
Also, a saved state can only be passed between similar processors (because the saved memory state must be restored).
There are behavior changes coming in the R2 release to address some of the specific things that you mention.
The first question we always ask is: Is the Hyper-V host fully patched? Were the Integration Components within the VM updated after patching the host?
Their revisions need to match for everything to be happy.
Brian Ehlert (hopefully you have found this useful)
- I cannot comment on your personal experiences.
However, Hyper-V is a v1 product. It still needs refinement.
Does it work? yes
Are there tricks? yes
Will there be fewer tricks moving forward? yes
I think back to all the tricks I used to do with WinFrame to get applications to run (Windows NT 3.5.1) many moons ago. Then Terminal Services in NT4. Today, folks don't think twice about having to do the things that we used to do back then.
Hyper-V is still evolving. And will continue to do so.
Where are you resetting yourself back to?
Rebuilding the hosts and the cluster?
Rebuilding the VM?
If rebuilding the VM, put the VHD on its shared LUN, before installing the OS, change the snapshot location to the same folder. (this is the big setting that most folks miss, that ends up causing problems later on).
Be sure the LUN that the VHD is on is being managed by Failover Clustering (since you already have a cluster).
SCVMM also fits here, as it enforces many of these little rules and tricks to give a better experience.
Brian Ehlert (hopefully you have found this useful)
- It looks like I need to rebuild the VM, as when I deleted it from Failover Clustering it removed the whole VM from the LUN on the SAN. And as mentioned, you can't reimport from the export again. So yes, this import/export has set me back another day.
It's been days troubleshooting this. We're not doing anything difficult here. We're trying to get quick migration (I would think a pretty important part of this whole thing) working. The machines are identical Dell 2900 servers using a Dell MD3000i SAN, configured very much like the original poster's. Both servers have Windows 2008 Enterprise. Both servers have been patched. The VM wasn't created until all patching was done. No other misc software is on the nodes. The machines pass the cluster verification tests.
Previously the VHD and snapshots were on the one LUN. Only thing that wasn't was the config I believe. Not sure if I go through all this again how I can prevent from having to do this again.
We have SCVMM but I'm not even close to bothering with that and all its requirements. Trying to get basic quick migration working first.
- I easily recreated the VM. Just overwrote the created VHD with the one I had. The config is stored on the same LUN. I have not created a snapshot yet. Therefore that shouldn't be an issue, even though I have snapshots set up, actually by default, to be stored in the same location as the VHD. I use GUID notation for the LUN and no drive letter when specifying a path.
- The only other issue that comes to mind is whether Hyper-V is at the RTM level (patch KB950050 is installed, or SP2 is installed).
Not installing this patch gives you the beta for Hyper-V, not the final release (yes, very confusing).
There are also some Failover Clustering patches specific to Hyper-V..
Here is a comprehensive list of patches:
http://technet.microsoft.com/en-us/library/dd430893(WS.10).aspx
And I assume that you have been all over this document as you have been pulling your hair out..
http://technet.microsoft.com/en-us/library/cc732181(WS.10).aspx
Brian Ehlert (hopefully you have found this useful)
- These machines were both installed with Windows 2008 Enterprise with SP2 already applied, downloaded directly from the MS eOpen site. This was downloaded in May.
- Everything works fine now. :) The difference I see this time is that there is only 1 config showing for the VM under Failover Clustering. Previously there was a config on top, then a VM, then another config with a (2) after it under the VM. Like 2 configs for the same VM. I thought that looked odd.
Here's what I did.
1. Recreated the VM via the wizard on one node. Same name as the previous one. Had it create a VHD with the same name as the previous VHD.
2. Copied the old VHD (the one from the import that I had put in a temp directory), overwriting the one created during the wizard in step 1. This saved doing the OS install again.
3. Went through the failover wizard to make the application highly available.
4. Brought the VM online in Failover Clustering. It was in saved state. (This was because the Hyper-V default is to save state on shutdown, and I had shut down a node to get the LUN associated with the other node where I had created the VM.)
5. Tested quick migration that now works in 15 seconds if that and properly saves state instead of shutting down (crashing) VM.
Will do further testing.
Looks like basically I just needed to recreate the VM.
- Glad you got it going.
Brian Ehlert (hopefully you have found this useful)
- If you could provide a link or expand further on traffic isolation that would be helpful. I think I have it right but I'm not sure.
Greg Rowley
- I am using 2008 R2. I have followed this config and have iSCSI working, and can perform live migrations. The only difference is that I have a dedicated private network for the cluster. Everything is working except I am randomly losing internet connectivity on my publicly accessible NICs. This is a major problem. I am currently testing this in a lab environment, but my organization wants to move forward with deployment, which is impossible until we get this issue resolved. I have no idea what is going on. Does anyone have any suggestions?
- Traffic isolation in the virtual world is no different than traffic isolation in the physical world.
I have been practicing it for years, but never found a good document about it. (Maybe Bill knows of a good source - he is pretty sharp on the networking)
In your case, I would check your gateway addresses if you think you have a routing issue.
Be sure that a machine (Host or VM) only has one default gateway. And that it resides on the network that you want its traffic to reside on.
In regards to strange packet loss problems...
Disable TCP Offload features (checksum, large send, etc.) on the physical NIC of the Hyper-V host (the one the Virtual Network is using).
Check that the VM is not multihomed (more than one interface with a default gateway).
This is mostly troublesome when you get two interfaces on a single subnet, but can also come into play when you have multiple NICs and multiple subnets.
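Both checks (only one default gateway, no two interfaces on one subnet) can be sketched with Python's standard `ipaddress` module. The interface list below is hypothetical: the names, the addresses, and the /24 prefix lengths are assumptions for illustration, and `routing_warnings` is just a name I picked.

```python
import ipaddress
from itertools import combinations


def routing_warnings(interfaces):
    """Flag multihoming problems in a list of interface settings.

    Each interface is a dict with 'name', 'address' (CIDR notation)
    and 'gateway' (None when no default gateway is configured).
    """
    warnings = []

    # More than one interface with a default gateway is the classic problem.
    with_gw = [i["name"] for i in interfaces if i.get("gateway")]
    if len(with_gw) > 1:
        warnings.append("multiple default gateways: " + ", ".join(with_gw))

    # Two interfaces whose networks overlap sit on the same subnet.
    for a, b in combinations(interfaces, 2):
        net_a = ipaddress.ip_interface(a["address"]).network
        net_b = ipaddress.ip_interface(b["address"]).network
        if net_a.overlaps(net_b):
            warnings.append(f"{a['name']} and {b['name']} share subnet {net_a}")

    return warnings


# Hypothetical settings for one machine.
interfaces = [
    {"name": "lan", "address": "192.168.0.9/24", "gateway": "192.168.0.1"},
    {"name": "vm", "address": "192.168.0.10/24", "gateway": "192.168.0.1"},
    {"name": "heartbeat", "address": "10.0.0.1/24", "gateway": None},
]

for line in routing_warnings(interfaces):
    print(line)
```

A machine that comes back with no warnings has a single default gateway and no overlapping subnets, which is the starting point for isolating traffic per NIC.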
Brian Ehlert (hopefully you have found this useful)
- I have 6 networks configured for my iSCSI SAN and Hyper-V with failover clustering. 3 of the networks are dedicated to the SAN: 2 for iSCSI and one for managing the SAN. I have 1 dedicated network for the cluster. These networks are private and only contain IPv4 addresses and subnets, no gateways, no DNS. The subnets are all separate. The 5th network is for my VMs and is on the same subnet as my 6th network, which is used for remote management. Is this possibly the problem?
If it is, how do I configure my VMs to be publicly accessible and still maintain remote access on one subnet?
Also, who is Bill?
- Bill is another MVP who frequents this forum.
You state:
The 5th network is for my VM's and is on the same subnet as my 6th network which is used for remote management. Is this possibly the problem?
If you are using R2:
For the 5th network - virtual network - uncheck the box that says the host should have a management interface on this network.
6th network - I am assuming no virtual network is attached - dedicated to Host.
Also, on the 5th Network - it is the host Physical NIC associated with this virtual network where I would disable TCPOffload settings.
Brian Ehlert (hopefully you have found this useful)
- I have done all of those things. I have just removed the Hyper-V role and destroyed the cluster to see if I get internet/remote connectivity back on the management interface. Then I'll just go one step at a time until I hit the point where it breaks. I welcome any other suggestions.
P.S. I am still having the problem with Hyper-V and clustering uninstalled. Obviously it is unrelated to this forum.
- Hi
This is Venkat from India. I have a problem with the GAL: how can I update the global address list for all users at once in MS Exchange Server 2007?
Please help me.
Thanks,
Venkat