Physical disks have status "In Maintenance Mode" on restart of node in S2D cluster

Question
-
Restarting nodes in an S2D cluster results in some of the physical disks having status "In Maintenance Mode". The only fix I have found so far is to retire the drives, remove them from the storage pool, and add them back in.
PS C:\Users\administrator.PAO2K> $pd=get-physicaldisk | where {$_.operationalstatus -eq 'in maintenance mode'}
PS C:\Users\administrator.PAO2K> $pd

FriendlyName     SerialNumber   CanPool OperationalStatus   HealthStatus Usage       Size
------------     ------------   ------- -----------------   ------------ -----       ----
PAOVIRT10_1I:3:3 PDNLH0BRH8N1M5 False   In Maintenance Mode Warning      Auto-Select 894 GB

PS C:\Users\administrator.PAO2K> $pd | set-physicaldisk -usage 'retired'
PS C:\Users\administrator.PAO2K> get-storagejob

Name   IsBackgroundTask ElapsedTime JobState  PercentComplete BytesProcessed BytesTotal
----   ---------------- ----------- --------  --------------- -------------- ----------
Repair False            00:13:24    Running   0
Repair False            00:00:00    Completed 100
Repair False            00:11:26    Running   0
Repair True             00:00:02    Suspended 0               0              228975443968
Repair True             00:00:06    Suspended 0               0              289910292480
Repair True             00:00:02    Suspended 0               0              252329328640
Repair True             00:05:30    Suspended 0               0              74088185856

PS C:\Users\administrator.PAO2K> get-storagejob

Name   IsBackgroundTask ElapsedTime JobState  PercentComplete BytesProcessed BytesTotal
----   ---------------- ----------- --------  --------------- -------------- ----------
Repair False            00:13:47    Running   5
Repair False            00:00:00    Completed 100
Repair False            00:11:47    Running   14
Repair True             00:00:05    Running   6               7257718784     115156713472
Repair True             00:00:09    Running   14              20581187584    144944660480
Repair True             00:00:18    Running   15              19383189504    126164664320
Repair True             00:05:46    Suspended 0               0              74088185856

PS C:\hp_scripts> get-virtualdisk | repair-virtualdisk
PS C:\hp_scripts> remove-physicaldisk -physicaldisks $pd -storagepoolfriendlyname s2dflash

Confirm
Are you sure you want to perform this action?
Removing a physical disk will cause problems with the fault tolerance capabilities of the following storage pool: "S2DFlash".
[Y] Yes  [A] Yes to All  [N] No  [L] No to All  [S] Suspend  [?] Help (default is "Y"): y

PS C:\hp_scripts> get-storagejob

Name               IsBackgroundTask ElapsedTime JobState  PercentComplete BytesProcessed BytesTotal
----               ---------------- ----------- --------  --------------- -------------- ----------
Repair             False            00:00:00    Completed 100
Repair             False            00:00:00    Completed 100
Repair             False            00:00:00    Completed 100
Repair             False            00:00:00    Completed 100
RemovePhysicalDisk False            00:00:00    Completed 100

PS C:\hp_scripts> get-storagejob
PS C:\hp_scripts> $pd=get-physicaldisk -canpool $true
PS C:\hp_scripts> $pd

FriendlyName      SerialNumber   CanPool OperationalStatus HealthStatus Usage       Size
------------      ------------   ------- ----------------- ------------ -----       ----
HP LOGICAL VOLUME PDNLH0BRH8N1M5 True    OK                Healthy      Auto-Select 894.22 GB

PS C:\hp_scripts> get-storagepool s2dflash | add-physicaldisk -physicaldisk $pd
PS C:\hp_scripts> $pd | set-physicaldisk -newfriendlyname "PAOVIRT10_1L:3:3"
PS C:\hp_scripts> get-storagejob

Name            IsBackgroundTask ElapsedTime JobState  PercentComplete BytesProcessed BytesTotal
----            ---------------- ----------- --------  --------------- -------------- ----------
AddPhysicalDisk False            00:00:00    Completed 100

PS C:\hp_scripts> start-scheduledtask "optimize storage pool"
PS C:\hp_scripts>
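In condensed form, the retire/remove/re-add workaround above is roughly the following sketch. The pool name "s2dflash" and the "Optimize Storage Pool" scheduled task come from this environment, so adjust them for yours:

$pd = Get-PhysicalDisk | Where-Object { $_.OperationalStatus -eq 'In Maintenance Mode' }
$pd | Set-PhysicalDisk -Usage Retired                    # stop new allocations to the affected disks
Get-VirtualDisk | Repair-VirtualDisk                     # rebuild data away from the retired disks
while (Get-StorageJob | Where-Object { $_.JobState -ne 'Completed' }) { Start-Sleep -Seconds 60 }
Remove-PhysicalDisk -PhysicalDisks $pd -StoragePoolFriendlyName s2dflash
$new = Get-PhysicalDisk -CanPool $true                   # the removed disks show up as poolable again
Get-StoragePool s2dflash | Add-PhysicalDisk -PhysicalDisks $new
Start-ScheduledTask "Optimize Storage Pool"              # rebalance the pool afterwards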
Tuesday, September 5, 2017 12:50 PM
Answers
-
See this KB article in reference to this issue:
Thanks!
Elden
- Proposed as answer by Anne He (Microsoft contingent staff) Wednesday, September 13, 2017 2:42 AM
- Edited by Elden Christensen (Microsoft employee) Sunday, October 1, 2017 4:52 PM
- Marked as answer by Elden Christensen (Microsoft employee) Sunday, October 1, 2017 4:52 PM
Wednesday, September 13, 2017 12:44 AM -
The problem and solution are documented here. Note that the KB article referenced does not seem to apply to Server Core installations (which is what I have).
https://bcthomas.com/2017/09/bug-when-applying-kb4038782-september-cu-to-storage-spaces-direct-clusters/
Thursday, September 21, 2017 3:25 PM
All replies
-
Hi jeffyb,
We are researching this issue now; if we make any progress, we will provide feedback as soon as possible.
Best Regards,
Anne
Please remember to mark the replies as answers if they help.
If you have feedback for TechNet Subscriber Support, contact tnmff@microsoft.com.
Wednesday, September 6, 2017 9:12 AM -
I also have this issue after a reboot of one of the nodes.
All disks on that node are "In Maintenance" and repair jobs are suspended.
This is a 2-node cluster, and I'm now unable to put this "failed" node into maintenance mode through VMM because of "A clustered space is in a degraded condition and the requested action cannot be completed at this time".
The urgent issue now is that this server needs to be stopped tomorrow because of a hardware error.
Any quick fix for this?
I have tried Optimize-StoragePool and Repair-VirtualDisk with no luck.
I found that there is a parameter to the command "Repair-ClusterStorageSpacesDirect" called "DisableStorageMaintenanceMode", but the explanation and impact of this are quite unclear.
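If you want to see what that parameter is supposed to do before running it, the built-in help is one place to look. A small sketch, assuming the FailoverClusters module is installed on the node you run it from:

# Inspect the cmdlet and the parameter before committing to it
Get-Command Repair-ClusterStorageSpacesDirect -Syntax
Get-Help Repair-ClusterStorageSpacesDirect -Parameter DisableStorageMaintenanceMode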
_____________________________________ /Michael R
Wednesday, September 6, 2017 2:12 PM -
Here's the impact:
FriendlyName                             SerialNumber          CanPool OperationalStatus
------------                             ------------          ------- -----------------
HP LOGICAL VOLUME                        PDNLH0BRH8P95V        False   {Stopping Maintenance Mode, In Maintenance Mode}
HP LOGICAL VOLUME                        PDNLH0BRH8P95V        False   {Stopping Maintenance Mode, In Maintenance Mode}
OCZ Z-Drive 6000 3200GB                  E83A_9710_0015_2401.  True    {Stopping Maintenance Mode, OK}
0XXXXX1190020740OCZ000Z63000004T00035000 E83A_9710_0014_7B01.  True    OK
PAOVIRT13_1I:3:3                         PDNLH0BRH3265C        False   OK
PAOVIRT10_1I:3:3                         PDNLH0BRH8N1M5        False   OK
PAOVIRT13_1I:3:2                         PDNLH0BRH3265C        False   OK
PAOVIRT11_NVMe                           CVF8543300381P6BGN-1  False   OK
PAOVIRT12_NVMe                           CVF8543300771P6BGN-1  False   {Stopping Maintenance Mode, OK}
PAOVIRT12_NVMe                           CVF8543300771P6BGN-2  False   {Stopping Maintenance Mode, OK}
PAOVIRT11_NVMe                           CVF8543300381P6BGN-2  False   OK
PAOVIRT10_NVMe                           CVF85474000Q1P6BGN-2  False   OK
PAOVIRT10_1I:3:1                         PDNLH0BRH8N1M5        False   OK
PAOVIRT12_1I:3:2                         PDNLH0BRH8P95V        False   {Stopping Maintenance Mode, OK}
PAOVIRT12_1I:3:4                         PDNLH0BRH8P95V        False   {Stopping Maintenance Mode, OK}
P
Neat, huh?
If i get it cleaned up i'll post how.
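For reference, a per-disk status listing like the one above can be pulled with something along these lines (plain Storage cmdlets, nothing cluster-specific; the sort order is just for readability):

Get-PhysicalDisk | Sort-Object FriendlyName |
    Format-Table FriendlyName, SerialNumber, CanPool, OperationalStatus -AutoSize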
Wednesday, September 6, 2017 6:48 PM -
Okay, odd. When I ran repair-clusters2d -disablestoragemaintenancemode, it just sort of hung, and after a few minutes (see above; only a subset of drives are shown, with the goofy status) I punted, killed the command, and restarted that node (paovirt12). It restarted and went through the repair, rebalance, and optimize storage jobs OK (as is normal on a node shutdown), but the drive status was still goofy. I waited for all the storage jobs to finish and then repeated the repair command, and voila, all is well, including the original two drives that I had trouble with (the two "HP LOGICAL VOLUME" drives). Beats retiring, removing, adding, and rebalancing. Maybe I didn't wait long enough before punting... Next time.
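If anyone wants to script the "wait for the storage jobs, then retry" part rather than eyeballing it, a rough sketch (the cmdlet below is the full-name form of repair-clusters2d; run it from one of the cluster nodes):

# Wait until no repair/rebalance jobs are left running or suspended, then retry the repair
while (Get-StorageJob | Where-Object { $_.JobState -ne 'Completed' }) {
    Start-Sleep -Seconds 60
}
Repair-ClusterStorageSpacesDirect -DisableStorageMaintenanceMode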
- Proposed as answer by Anne He (Microsoft contingent staff) Wednesday, September 20, 2017 2:16 AM
Wednesday, September 6, 2017 7:35 PM -
Hi, I found a "quick fix" myself that worked in our case.
As I mentioned before I was unable to put the node in maintenance mode using VMM or Pause Drain Roles using Failover Cluster Manager because of a degraded space.
What I ended up doing was "Pause - Do Not Drain Roles" through FCM.
As soon as the node was successfully paused I just resumed the node.
The physical disks were no longer "In Maintenance Mode" and the repair started.
Easy when you find the correct procedure, and hopefully this could help someone else in the same situation.
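For anyone who prefers PowerShell over FCM, the equivalent of "Pause - Do Not Drain Roles" followed by a resume should be roughly the following. As far as I can tell, Suspend-ClusterNode only drains roles when -Drain is specified, and the node name below is a placeholder:

$node = "NODE02"                  # hypothetical name - use the node whose disks are stuck
Suspend-ClusterNode -Name $node   # pause without draining roles
Resume-ClusterNode -Name $node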
/Michael R
- Edited by Michael Rosén Thursday, September 7, 2017 5:35 AM
- Proposed as answer by Anne He (Microsoft contingent staff) Thursday, September 7, 2017 5:54 AM
Thursday, September 7, 2017 5:34 AM -
Hi jeffyb,
1. When the issue occurs, please verify that the cluster nodes are in a normal state, and run the Cluster Validation Wizard to check whether the cluster passes all tests;
2. Please check whether the storage pool has enough capacity. You could also try evicting the affected cluster node and re-adding it to the cluster once the physical disk and virtual disk rebalance has completed (see the sketch below for both checks).
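A minimal PowerShell sketch of those two checks, for anyone who prefers it over the wizard (run from a cluster node with the FailoverClusters and Storage modules available):

Test-Cluster -Node (Get-ClusterNode).Name                          # full cluster validation
Get-StoragePool -IsPrimordial $false |
    Select-Object FriendlyName, HealthStatus, Size, AllocatedSize  # pool capacity vs. usage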
Best Regards,
Anne
Please remember to mark the replies as answers if they help.
If you have feedback for TechNet Subscriber Support, contact tnmff@microsoft.com.
- Proposed as answer by Anne He (Microsoft contingent staff) Tuesday, September 12, 2017 6:15 AM
- Unproposed as answer by jeffyb Tuesday, September 12, 2017 12:18 PM
Friday, September 8, 2017 7:51 AM -
Hi jeffyb,
Just checking whether the reply above has been of any help, and whether you have made any progress with your issue; any feedback is welcome.
Best Regards,
Anne
Please remember to mark the replies as answers if they help.
If you have feedback for TechNet Subscriber Support, contact tnmff@microsoft.com.
Tuesday, September 12, 2017 6:15 AM -
Cluster nodes are normal, and there is plenty of free pool space.
Wednesday, September 13, 2017 12:28 AM -
See this KB article in reference to this issue:
Thanks!
Elden
- Proposed as answer by Anne He (Microsoft contingent staff) Wednesday, September 13, 2017 2:42 AM
- Edited by Elden Christensen (Microsoft employee) Sunday, October 1, 2017 4:52 PM
- Marked as answer by Elden Christensen (Microsoft employee) Sunday, October 1, 2017 4:52 PM
Wednesday, September 13, 2017 12:44 AM -
We are having a similar issue: a 3-node S2D cluster, where I paused one node to install the monthly Windows update for September.
After installing the September update and rebooting, the disks for that server are in Maintenance Mode. One of the VDs is Unhealthy with operational status No Redundancy.
Because of the unhealthy VD we cannot Pause - Drain Roles.
It has been like this for 24 hours so this doesn't appear to be something that clears with time.
MichaelR's suggestion of Pause - Do Not Drain succeeded in pausing the server, but after resuming, the disks still show In Maintenance Mode.
Todd Hunter
- Edited by TodddHunter Thursday, September 14, 2017 9:45 AM
Thursday, September 14, 2017 9:41 AM -
I had disks stuck in Maintenance Mode after applying KB4038782.
This was only happening on one node, so I ran the following, as per the suggestions above:
Repair-ClusterStorageSpacesDirect -DisableStorageMaintenanceMode -Node <node1>
Now when I check Get-PhysicalDisk everything is healthy.
Get-StorageJob has also started to run all the jobs again.
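For a quick after-the-fact check, something like this should show whether any disks are still stuck and how the jobs are progressing:

Get-PhysicalDisk | Where-Object { $_.OperationalStatus -ne 'OK' }        # should return nothing when clean
Get-StorageJob | Format-Table Name, JobState, PercentComplete -AutoSize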
Pausing and resuming the node in PowerShell or Failover Cluster Manager didn't help resolve this.
Thursday, September 14, 2017 10:13 AM -
Running Repair-ClusterStorageSpacesDirect -DisableStorageMaintenanceMode did reset the Physical Disks to Healthy. Currently all the physical disks are Healthy.
1 of the 3 VDs is still Unhealthy with No Redundancy.
I ran Optimize-StoragePool and Repair-VirtualDisk but the VD still shows the same state.
The server has been rebooted.
No success getting the VD back to Healthy status yet.
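For completeness, roughly what was run here, plus a way to watch whether the repair actually kicks off. The pool friendly name "S2DFlash" is taken from earlier in this thread; substitute your own:

Optimize-StoragePool -FriendlyName "S2DFlash"
Get-VirtualDisk | Where-Object { $_.HealthStatus -ne 'Healthy' } | Repair-VirtualDisk
Get-StorageJob                                                               # repair jobs should appear here
Get-VirtualDisk | Select-Object FriendlyName, HealthStatus, OperationalStatus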
Todd Hunter
Thursday, September 14, 2017 11:20 AM -
This worked for me too.
Found an easy way to link physical disks to a node: put one node into maintenance mode, then run this:
Get-PhysicalDisk | ? OperationalStatus -eq "In Maintenance Mode" | Set-PhysicalDisk -Description "host name that is in maintenance mode"
Then resume that host, pause another, and repeat until you've added the host name to all the disk descriptions. Now when you run Get-PhysicalDisk you can easily see which host each disk belongs to.
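An untested alternative that might avoid the maintenance-mode round trip: ask each storage node for its physically connected disks and stamp the description from that. This assumes Get-PhysicalDisk accepts a storage node from the pipeline together with -PhysicallyConnected, which I believe it does on 2016:

Get-StorageNode | ForEach-Object {
    $node = $_
    $node | Get-PhysicalDisk -PhysicallyConnected |
        Set-PhysicalDisk -Description $node.Name    # record the owning node on each disk
}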
- Edited by Brian at Strathcona Tuesday, September 19, 2017 4:36 PM
Tuesday, September 19, 2017 4:18 PM -
Hi jeffyb,
Please mark the useful resolutions in this thread as answers, so that the helpful information can be highlighted.
Best Regards,
Anne
Please remember to mark the replies as answers if they help.
If you have feedback for TechNet Subscriber Support, contact tnmff@microsoft.com.
Wednesday, September 20, 2017 2:30 AM -
The problem and solution are documented here. Note that the KB article referenced does not seem to apply to Server Core installations (which is what I have).
https://bcthomas.com/2017/09/bug-when-applying-kb4038782-september-cu-to-storage-spaces-direct-clusters/
Thursday, September 21, 2017 3:25 PM -
See the KB article, and yes it does apply to Server Core
Thanks!
Elden
Sunday, October 1, 2017 5:09 PM