2008 R2 Hyper-V iSCSI problems
I am in the process of testing a Hyper-V cluster setup with 2008 R2 RTM. I am having problems that seem related to the iSCSI connection to my SAN. I've been experimenting with the RC release and have really had no problems for quite a while, until now. My environment includes the following:
HW:
2 x Supermicro SuperServer 8045-3RB
4x Intel 7330 quad core Processors
16x 4GB 667MHz RAM = 64GB
2x 300GB 15k rpm SAS drives mirrored on a LSI MegaRAID controller
2x onboard Intel 82575eb NICs
2x Intel Pro 1000 PT quad port NIC low profile (one for iscsi, one for VMs)
All NICs running Intel ProSet 14.3 drivers
Windows Server 2008 R2
1 x Winchester SX2388R iSCSI SAN
This model has two controllers for redundancy each with 4x Gigabit Host ports
12x 7200rpm 1TB SAS drives
2x 4 drive RAID5 LUNS
1x 1GB (intended for quorum disk)
So at this point I've installed the OS, added Hyper-V, and installed Failover Clustering, MPIO, and SNMP. I established four connections to my SAN from one quad port NIC (two to each SAN controller trunk using MCS). I then set up my cluster using one Supermicro box and a high-end Intel workstation. I did this only very temporarily so that my production VMs would remain running on one Supermicro box using Hyper-V 1.0, with the VMs on local disk, until I could migrate them over to the cluster, then reformat that box with R2 and join it to the cluster. After creating the cluster and enabling CSV on one LUN, I moved 10 (but not the most important) of my VMs to the CSV and ran them for 4 days. My VMs are not "highly available" yet so that they would not fail over to my workstation. So today, I come in and see all my VMs with a status of Saved-Critical.
I go straight to my event viewer and find many iSCSIPrt errors and warnings. These include:
1. Target did not respond in time for a SCSI request. The CDB is given in the dump data.
2. Initiator sent a task management command to reset the target. The target name is given in the dump data.
3. The description for Event ID 129 from source iSCSIPrt cannot be found. Either the component that raises this event is not installed or is corrupt...The following information was included with the event: \device\RaidPort1
These are repeated over and over every minute for hours.
There are also a bunch of Failover Cluster errors reporting that a physical disk could not be brought online. These are all related to the dropped iSCSI connections.
I also had a few of these errors:
4. The computer has rebooted from a bugcheck. The bugcheck was: 0x0000009e (0xfffffa80327ad1c0, 0x00000000000004b0, 0x0000000000000000, 0x0000000000000000). A dump was saved in: C:\Windows\MEMORY.DMP. Report Id: 081909-50107-01.
So within the iSCSICPL I show connections to both my SAN controllers. I show devices associated with the connections. When I go to Disk Manager I only see my c: drive and my DVD drive, no iSCSI drives. No amount of iscsi service resets, reboots, or reconnects have made my disks reappear. I ran for a few days flawlessly and was ready to start my move to full production when this happened. The 10 VMs that I was running are very low load with the exception of an ORION npm monitoring VM and a System Center Essentials VM. Has anyone else had problems with iSCSI connections in R2? Could this be related to my temporary cluster setup that I was using to transition me to production? Any help here would be appreciated. I know there can't be too many people trying Hyper-V clustering with R2 RTM and iSCSI but I had so much success earlier that I was ready to jump right in anyhow.
As a final update while I was writing this, I reset my SAN and finally had my iSCSI drives show up. Does this point to a problem with my SAN, or a possible iSCSI initiator compatibility problem?
Thanks in advance,
EPLtech
Answers
Hi,
According to the error message, this appears to be a system crash issue, and we need to analyze the crash dump file to narrow down the root cause. Unfortunately, it is not effective for us to debug the crash dump file here in the forum. Therefore, I would like to suggest that you contact Microsoft Customer Service and Support (CSS) via telephone so that a dedicated Support Professional can assist with your request.
To obtain the phone numbers for a specific technology request, please take a look at the web site listed below:
http://support.microsoft.com/default.aspx?scid=fh;EN-US;OfferProPhone#faq607
Hope the issue will be resolved soon.
Vincent Hu
- Marked As Answer by Vincent Hu, MSFT, Moderator, Monday, August 24, 2009 2:52 AM
All Replies
- In my experience with Hyper-V, iSCSI has to be carefully thought out.
Any significant I/O load will quickly overwhelm budget gigabit switches, resulting in frequent volume disconnects and BSODs.
Using high-end gigabit switches (like the Cisco 3750) will cure the problem.
Good luck,
Vic
- Interesting, what type of switches have you seen iSCSI problems on? How much I/O load is likely to overwhelm them? The switches I'm using are probably "budget". I am using two dedicated HP ProCurve 1800-24Gs. The fact that I cannot find much in the way of specs on them probably indicates budget equipment. I will open a case with Microsoft if the problem occurs again so that I have a fresh dump file, but so far so good...
Hi,
There is only an ordinary home switch in my test environment and the cluster works fine. However, you may need a good switch in a production environment for performance reasons.
Vincent Hu
We have the same issue as the OP, with the same error messages in the event log. We have found that if 3 - 4 VMs are booted at the same time, the NIC used by the MS iSCSI initiator drops the link several times until the VMs reach the Windows logon screen. We actually went live and had to move all the VMs back to Hyper-V R1 because of these issues.
Hosts are connected directly to an Infortrend iSCSI SAN to eliminate switches and network problems. The issue is reproducible every time. The SAN vendor is unable to help as they have not yet certified on 2008 R2.
This definitely seems to point to the iSCSI initiator.
- Hmm, couldn't you delay the start of some of the VMs to help alleviate the problems? This option is under Automatic start action in each VM's settings. If you stagger them a minute apart, the disk I/O shouldn't overwhelm the iSCSI connections. It's painful to be among the first using a new product, but I think I'm nearly ready to move to production on R2. I would like to see some attention given to the new iSCSI initiator though. I have had many problems properly breaking connections on several types of equipment in R2. Even with disks offline I still cannot drop the iSCSI connection, even after reboots. This problem occurred with both my production Winchester SAN and a consumer ReadyNAS Pro.
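For anyone wanting to work out the stagger before typing values into each VM's settings, here is a minimal Python sketch of the idea. It only computes a delay per VM and does not touch Hyper-V; the function name, the VM names, and the 60-second default are assumptions made up for illustration.

```python
def staggered_start_delays(vm_names, interval_seconds=60):
    """Return {vm_name: delay_in_seconds}, spacing VM starts evenly.

    The first VM starts immediately; each later VM waits one more interval.
    """
    return {
        name: index * interval_seconds
        for index, name in enumerate(vm_names)
    }


# Hypothetical VM names, for illustration only.
vms = ["orion-npm", "sce", "file01", "print01"]
for name, delay in staggered_start_delays(vms).items():
    print(f"{name}: start after {delay} s")
```

Each resulting value would then be entered by hand as that VM's startup delay. Putting the heaviest VMs first or last is a judgment call that this sketch does not make.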
- I'm having a very similar problem with almost an identical setup, a SuperMicro server with 64GB of RAM and running Server 2008 R2. The onboard NICs are used for networking, and an additional Intel Pro 1000 PT dual port card is used for the SAN connection. Basically the system seems fine until I boot up a VM, then it drops the SAN connection. Have you found a fix for the issues you were having?
Beau
- We are also having a similar problem, with an identical setup on an HP ProLiant DL140 G3 server using an Intel 1000 PT NIC to the SAN.
The VMs boot fine, but the iSCSI connection gets dropped when VMs attempt to do backups using VSS, which places the iSCSI connection under heavier load. These backups are not set to run at the same time, and don't always cause the iSCSI connection to be dropped.
The same setup has worked fine under Windows 2008 SP1 with Hyper-V 1 for over 18 months with no issues. The issue has only started happening since reinstalling this server with Windows 2008 R2 with Hyper-V 2. Other servers connected to the same SAN using Windows 2008 SP1 under much heavier load are not having their connections dropped.
It would seem there are problems with iSCSI connections in Windows 2008 R2 that cause the iSCSI connection to drop under load.
- I started a post http://social.technet.microsoft.com/Forums/en-US/windowsserver2008r2general/thread/5f457f8b-174a-454d-b263-350222c9b3d4 about a similar issue before finding this. Wondered if anyone has any updates on this issue.
- In my case the problems went away completely when I removed McAfee from the Hyper-V host system. Everything seems fine now. I've also re-built the host systems with a core install.
Beau
- Hyper-V hosts should always be dedicated hypervisors without any other roles or software installed (with the exception of System Center agents). Windows 2008 R2 Core (Enterprise or Data Center) edition with the Hyper-V role only or Hyper-V Server 2008 R2 should be used in production environments.
Mark
- What version of firmware do you have on your array?
AFAIK the firmware version can be a problem, since a few models require a fw upgrade. For example, the A16G-2130-4 requires fw version 3.6x to support Windows 2008.

