Error, Validate Disk Failover
- I have a two-node Windows Server 2008 failover cluster attached to a SAN, and two weeks ago I discovered that Cluster Validation no longer passes. It stops on one of the shared disks (LUNs) with the error "Failed to offline cluster disk 0 from node servername.domain, failure reason: More data is available."
Only the validation test fails; moving the resource from one node to the other works without problems.
I have read "ValidateStorage.log", and the only error I can find is
"CprepDiskOnline: Failed to get volume name for device \\?\GLOBALROOT\Device\Harddisk4\Partition1\, status 3"
but I get that for all "Harddisks", even for disks where the test passes.
I have run chkdsk and it reported no errors.
Is the failover test itself the problem, or something else?
I appreciate any help I can get.
//NPan
Answers
- You have given us very little data to work with here.
But in cases like this you should involve your storage vendor to look at it.
I know it is a rather generic answer, sorry about that.
Regards,
Edwin.
- Marked as answer by Chuck Timon [MSFT], Moderator, Saturday, February 6, 2010 13:55
All replies
- Thanks for the reply anyway.
-The error message "Failed to offline cluster disk 0 from node servername.domain, failure reason: More data is available." makes me believe that I should be able to find more information somewhere. Can anybody point me to where to look for it?
-Has anybody else seen the message "CprepDiskOnline: Failed to get volume name for device \\?\GLOBALROOT\Device\Harddisk4\Partition1\, status 3" in the file "ValidateStorage.log" during a successful Cluster Validation?
//NPan
- Apply SP2 for W2K8 and re-run validation.
Chuck Timon, Senior Support Escalation Engineer (SEE), Microsoft Corporation
- I will do that and tell you how it goes.
Thanks
//NPan
- And?
Chuck Timon, Senior Support Escalation Engineer (SEE), Microsoft Corporation
- And what does the error mean, and what is in SP2 that fixes this?
I have the same problem and get the same errors, and I already have SP2.
Where it fails in the Validation Report:
=========================
Getting partition table for cluster disk 9 from node USSECAMPSQ2K801.us.na.ey.net
Arbitrating for cluster disk 9 from node USSECAMPSQ2K801.us.na.ey.net
Bringing online cluster disk 9 on node USSECAMPSQ2K801.us.na.ey.net
Failed to online cluster disk 9 from node USSECAMPSQ2K801.us.na.ey.net, failure reason: The system cannot find the file specified.
Failed to online cluster disk 9 from node USSECAMPSQ2K801.us.na.ey.net, failure reason: The system cannot find the file specified.
and
PhysicalDrive9 {49af0f87-6ddf-4e50-8705-f99b08932999} clustered. Disk partition style is GPT. Disk partition type is BASIC. Disk is Microsoft MPIO based disk
What I find in the storage log:
=====================
0000325c.00003398::23:26:22.782 CprepDiskOnline: Failed to get volume name for device \\?\GLOBALROOT\Device\Harddisk9\Partition2\, status 3
0000325c.00003398::23:26:23.796 CprepDiskOnline: Volume guid for partition \\?\GLOBALROOT\Device\Harddisk9\Partition2\ is \\?\Volume{aab383ec-3072-4c0b-bd4e-28529be15dc4}\
C:\Windows\system32>net helpmsg 3 - The system cannot find the path specified.
I see 2s and 3s around these in the cluster log.
C:\Windows\system32>net helpmsg 2 - The system cannot find the file specified.
Although it successfully finds the failing disk's signature in the cluster log and in the validation report.
David Morgan, Lowly Microsoft PFE. :-)
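The "status N" values and "failure reason" texts quoted in this thread are standard Win32 error codes and their message strings: 2 and 3 are what `net helpmsg` prints above, and "More data is available." is the text for code 234 (ERROR_MORE_DATA), so it names an error condition rather than pointing to additional details stored elsewhere. As a rough sketch, the following Python scans storage-log text for the "Failed to get volume name" lines and attaches the message for each status. The function name and the small lookup table are illustrative assumptions, not part of any Microsoft tool, and the pattern is based only on the two log lines quoted in this thread.

```python
import re

# Win32 error codes that appear in this thread (values from winerror.h).
WIN32_MESSAGES = {
    2: "The system cannot find the file specified.",   # ERROR_FILE_NOT_FOUND
    3: "The system cannot find the path specified.",   # ERROR_PATH_NOT_FOUND
    234: "More data is available.",                    # ERROR_MORE_DATA
}

# Pattern for lines of the form quoted above, e.g.
# "... CprepDiskOnline: Failed to get volume name for device
#  \\?\GLOBALROOT\Device\Harddisk9\Partition2\, status 3"
FAILED_VOLUME_NAME = re.compile(
    r"CprepDiskOnline: Failed to get volume name for device "
    r"\\\\\?\\GLOBALROOT\\Device\\Harddisk(?P<disk>\d+)"
    r"\\Partition(?P<partition>\d+)\\, status (?P<status>\d+)"
)


def find_volume_name_failures(log_text):
    """Return (disk, partition, status, message) for each matching line.

    Hypothetical helper for illustration; it only recognises the single
    message format shown in this thread.
    """
    results = []
    for match in FAILED_VOLUME_NAME.finditer(log_text):
        status = int(match.group("status"))
        message = WIN32_MESSAGES.get(status, "unknown status")
        results.append(
            (int(match.group("disk")), int(match.group("partition")), status, message)
        )
    return results
```

Running it over the whole ValidateStorage.log text and grouping the results by disk number would show whether the status 3 lines really occur for every disk, including the ones that pass, as NPan observed.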
- Hi NPan and David,
Have you spoken with your storage vendor about this? Which particular storage test fails in validation?
Gaurav Anand
- Yes. HP is all over it. They've escalated to their Tier 4, but everything they look at seems to be OK as far as hardware is concerned. They've been engaged and involved for about three weeks now. This has also been looked at by four Microsoft engineers and one Cluster Program Manager. They all concur this is a hardware issue, but if that's the case, a whole bunch of people can't locate it.
It fails during Validate Disk Failover. It passes disks 0 through 8 and fails on 9. I have seen it fail on other disk numbers, but rarely.
Do you know whether the numbers used in the test are the actual physical disk numbers, or does the test just report that it did the first disk (0), the second disk (1), etc.?
Do you know where the GUID that the test is looking for resides?
Failed to get volume name for device \\?\GLOBALROOT\Device\Harddisk9\Partition2\
- Hi David,
http://screencast.com/t/ZDI0YTg0NDY [for guid query]
For me they are the actual physical disk numbers in the validation test I just ran. You can correlate them by opening the MountedDevices key in the registry and comparing it with the cluster validation.log [list all disks].
Gaurav Anand
- Proposed as answer by Gaurav.Anand, Friday, January 22, 2010 11:10
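To make the MountedDevices correlation a little more concrete, here is a minimal Python sketch that decodes one binary value from that key once it has been exported. It rests on an assumption about the layout that should be checked against your own systems: a 12-byte value is taken to be a 4-byte MBR disk signature followed by an 8-byte partition offset, and a value starting with "DMIO:ID:" is taken to carry a 16-byte GPT partition GUID. The function name is hypothetical, and reading the registry itself is left out.

```python
import struct
import uuid

# Assumed prefix for GPT entries in MountedDevices values.
GPT_PREFIX = b"DMIO:ID:"


def decode_mounted_device(value):
    """Decode one MountedDevices binary value (layout is an assumption).

    Returns ("mbr", signature_hex, partition_offset),
            ("gpt", partition_guid) or
            ("unknown", raw_hex).
    """
    if len(value) == 12:
        # Little-endian: 4-byte disk signature, 8-byte partition byte offset.
        signature, offset = struct.unpack("<IQ", value)
        return ("mbr", f"{signature:08x}", offset)
    if value.startswith(GPT_PREFIX) and len(value) == len(GPT_PREFIX) + 16:
        # GUIDs are stored in Windows mixed-endian byte order.
        guid = uuid.UUID(bytes_le=value[len(GPT_PREFIX):])
        return ("gpt", str(guid))
    return ("unknown", value.hex())
```

For the GPT disks in this thread, the decoded GUID would be the one to compare between nodes; for MBR disks, the signature is what the validation report lists for each disk.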
- Of all those log entries, the second one does appear in my cluster logs right before the "GUID not found" errors that cause the validation to fail, and when the disks fail to come online on node 1. We know this isn't a PR issue. I know there is a copy of the GUID in the registry for this disk, as I've verified signatures and GUIDs several times over. I 'think' this GUID also lives on the physical disk, and I 'think' Partition Manager is the one looking for it on the disk. If I'm correct (I've not found anyone anywhere who knows the answer to that question), then Partition Manager is able to find the signature but not the GUID, which fails that disk resource.
Past troubleshooting actions included deleting the MountedDevices and the clusdisk signature records and allowing the OS to rebuild them on reboot and failover. I've verified that all the signatures are there and match the other node's records and also the signatures on the physical disks themselves.
So if Partition Manager is seeking those GUIDs and can't find them, what is preventing that? It's mighty confusing that PM can find the signatures, which tells us it can see the disk, but can't seem to see the GUID. Even more confusing is that we have self-healing in 2008, where if a 1034 event is generated the OS can write a new signature if it can find a unique disk ID and match it in the registry, and then it could bring the disk online. Well, we see the signature, so why does the GUID matter anyway if it's trying to find it on the disk?
Aiieeeeeee this is driving me nuts. :-)