Physical disks have status "In Maintenance Mode" on restart of node in S2D cluster

  • Question

  • Restarting nodes in an S2D cluster results in some of the physical disks having the status "In Maintenance Mode". So far the only way I have been able to clear it is to retire the drives, remove them from the storage pool, and add them back in.

    PS C:\Users\administrator.PAO2K> $pd=get-physicaldisk | where {$_.operationalstatus -eq 'in maintenance mode'}
    PS C:\Users\administrator.PAO2K> $pd
    
    FriendlyName     SerialNumber   CanPool OperationalStatus   HealthStatus Usage         Size
    ------------     ------------   ------- -----------------   ------------ -----         ----
    PAOVIRT10_1I:3:3 PDNLH0BRH8N1M5 False   In Maintenance Mode Warning      Auto-Select 894 GB
    
    
    PS C:\Users\administrator.PAO2K> $pd | set-physicaldisk -usage 'retired'
    PS C:\Users\administrator.PAO2K> get-storagejob
    
    Name   IsBackgroundTask ElapsedTime JobState  PercentComplete BytesProcessed BytesTotal
    ----   ---------------- ----------- --------  --------------- -------------- ----------
    Repair False            00:13:24    Running   0
    Repair False            00:00:00    Completed 100
    Repair False            00:11:26    Running   0
    Repair True             00:00:02    Suspended 0               0              228975443968
    Repair True             00:00:06    Suspended 0               0              289910292480
    Repair True             00:00:02    Suspended 0               0              252329328640
    Repair True             00:05:30    Suspended 0               0              74088185856
    
    
    PS C:\Users\administrator.PAO2K> get-storagejob
    
    Name   IsBackgroundTask ElapsedTime JobState  PercentComplete BytesProcessed BytesTotal
    ----   ---------------- ----------- --------  --------------- -------------- ----------
    Repair False            00:13:47    Running   5
    Repair False            00:00:00    Completed 100
    Repair False            00:11:47    Running   14
    Repair True             00:00:05    Running   6               7257718784     115156713472
    Repair True             00:00:09    Running   14              20581187584    144944660480
    Repair True             00:00:18    Running   15              19383189504    126164664320
    Repair True             00:05:46    Suspended 0               0              74088185856
    PS C:\hp_scripts> get-virtualdisk | repair-virtualdisk
    PS C:\hp_scripts> remove-physicaldisk -physicaldisks $pd -storagepoolfriendlyname s2dflash
    
    Confirm
    Are you sure you want to perform this action?
    Removing a physical disk will cause problems with the fault tolerance capabilities of the following storage pool:
    "S2DFlash".
    [Y] Yes  [A] Yes to All  [N] No  [L] No to All  [S] Suspend  [?] Help (default is "Y"): y
    PS C:\hp_scripts> get-storagejob
    
    Name               IsBackgroundTask ElapsedTime JobState  PercentComplete BytesProcessed BytesTotal
    ----               ---------------- ----------- --------  --------------- -------------- ----------
    Repair             False            00:00:00    Completed 100
    Repair             False            00:00:00    Completed 100
    Repair             False            00:00:00    Completed 100
    Repair             False            00:00:00    Completed 100
    RemovePhysicalDisk False            00:00:00    Completed 100
    PS C:\hp_scripts> get-storagejob
    PS C:\hp_scripts>  $pd=get-physicaldisk -canpool $true
    PS C:\hp_scripts> $pd
    
    FriendlyName      SerialNumber   CanPool OperationalStatus HealthStatus Usage            Size
    ------------      ------------   ------- ----------------- ------------ -----            ----
    HP LOGICAL VOLUME PDNLH0BRH8N1M5 True    OK                Healthy      Auto-Select 894.22 GB
    
    
    PS C:\hp_scripts> get-storagepool s2dflash | add-physicaldisk -physicaldisk $pd
    PS C:\hp_scripts> $pd | set-physicaldisk -newfriendlyname "PAOVIRT10_1L:3:3"
    PS C:\hp_scripts> get-storagejob
    
    Name            IsBackgroundTask ElapsedTime JobState  PercentComplete BytesProcessed BytesTotal
    ----            ---------------- ----------- --------  --------------- -------------- ----------
    AddPhysicalDisk False            00:00:00    Completed 100
    
    
    PS C:\hp_scripts> start-scheduledtask "optimize storage pool"
    PS C:\hp_scripts>

    Tuesday, September 5, 2017 12:50 PM

Answers

All replies

  • Hi jeffyb,

    We are researching this issue now; if we make any progress, we'll follow up as soon as possible.

    Best Regards,

    Anne



    Wednesday, September 6, 2017 9:12 AM
  • I also have this issue after a reboot of one of the nodes.

    All disks on that node are "In Maintenance" and repair jobs are suspended.

    This is a 2-node cluster, and I'm now unable to put this "failed" node in maintenance mode through VMM because of "A clustered space is in a degraded condition and the requested action cannot be completed at this time".

    The urgent issue now is that this server needs to be stopped tomorrow because of a hardware error.

    Any quick fix for this?

    I have tried Optimize-StoragePool and Repair-VirtualDisk with no luck.

    I found that there is a parameter on the command "Repair-ClusterStorageSpacesDirect" called "DisableStorageMaintenanceMode", but the explanation and impact of this are quite unclear.
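
    In case it helps, here is a rough sketch of one way to dig into that switch's built-in documentation (this assumes the FailoverClusters module is available on a cluster node and that local help has been updated):

    # Show the description of the switch mentioned above, then the full cmdlet help
    Get-Help Repair-ClusterStorageSpacesDirect -Parameter DisableStorageMaintenanceMode
    Get-Help Repair-ClusterStorageSpacesDirect -Full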


    _____________________________________ /Michael R

    Wednesday, September 6, 2017 2:12 PM
  • Here's the impact:

    FriendlyName                             SerialNumber         CanPool OperationalStatus
    ------------                             ------------         ------- -----------------
    HP LOGICAL VOLUME                        PDNLH0BRH8P95V       False   {Stopping Maintenance Mode, In Maintenance Mode}
    HP LOGICAL VOLUME                        PDNLH0BRH8P95V       False   {Stopping Maintenance Mode, In Maintenance Mode}
    OCZ Z-Drive 6000 3200GB                  E83A_9710_0015_2401. True    {Stopping Maintenance Mode, OK}
    0XXXXX1190020740OCZ000Z63000004T00035000 E83A_9710_0014_7B01. True    OK
    PAOVIRT13_1I:3:3                         PDNLH0BRH3265C       False   OK
    PAOVIRT10_1I:3:3                         PDNLH0BRH8N1M5       False   OK
    PAOVIRT13_1I:3:2                         PDNLH0BRH3265C       False   OK
    PAOVIRT11_NVMe                           CVF8543300381P6BGN-1 False   OK
    PAOVIRT12_NVMe                           CVF8543300771P6BGN-1 False   {Stopping Maintenance Mode, OK}
    PAOVIRT12_NVMe                           CVF8543300771P6BGN-2 False   {Stopping Maintenance Mode, OK}
    PAOVIRT11_NVMe                           CVF8543300381P6BGN-2 False   OK
    PAOVIRT10_NVMe                           CVF85474000Q1P6BGN-2 False   OK
    PAOVIRT10_1I:3:1                         PDNLH0BRH8N1M5       False   OK
    PAOVIRT12_1I:3:2                         PDNLH0BRH8P95V       False   {Stopping Maintenance Mode, OK}
    PAOVIRT12_1I:3:4                         PDNLH0BRH8P95V       False   {Stopping Maintenance Mode, OK}
    P

    Neat, huh?

    If I get it cleaned up I'll post how.

    Wednesday, September 6, 2017 6:48 PM
    Okay, odd. When I ran repair-clusters2d -disablestoragemaintenancemode, it just sort of hung, and after a few minutes (see above; only a subset of drives are shown, with the goofy status) I punted, killed the command, and restarted that node (paovirt12). It restarted and went through the repair, rebalance, and optimize-storage jobs okay (as is normal on a node shutdown), but the drive status was still goofy. I waited for all the storage jobs to finish and then repeated the repair command, and voila, all is well, including the original two drives that I had trouble with (the two "HP LOGICAL VOLUME" drives). That beats retiring, removing, adding, and rebalancing. Maybe I didn't wait long enough before punting... Next time.
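
    For what it's worth, a rough sketch of that wait-then-repair sequence (run on one of the cluster nodes; Repair-ClusterStorageSpacesDirect is the full name of the repair-clusters2d command used above):

    # Poll until Get-StorageJob reports no unfinished (running or suspended) jobs,
    # then re-run the repair that clears storage maintenance mode.
    while (Get-StorageJob | Where-Object { $_.JobState -ne 'Completed' }) {
        Start-Sleep -Seconds 60
    }
    Repair-ClusterStorageSpacesDirect -DisableStorageMaintenanceMode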
    Wednesday, September 6, 2017 7:35 PM
  • Hi, I found a "quick fix" myself that worked in our case.

    As I mentioned before, I was unable to put the node in maintenance mode using VMM or "Pause - Drain Roles" in Failover Cluster Manager because of a degraded space.

    What I ended up doing was "Pause - Do Not Drain Roles" through FCM.
    As soon as the node was successfully paused I just resumed the node.

    The physical disks were no longer "In Maintenance Mode" and the repair started.

    It's easy when you find the correct procedure; hopefully this helps someone else in the same situation.
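
    For reference, a minimal PowerShell sketch of those same FCM steps ("<NodeName>" is a placeholder for the affected node; assumes the FailoverClusters module):

    Suspend-ClusterNode -Name "<NodeName>"   # Pause - Do Not Drain Roles
    Resume-ClusterNode -Name "<NodeName>"    # resume the node straight away
    # the disks should drop out of "In Maintenance Mode" and the repair jobs should start
    Get-PhysicalDisk | Where-Object OperationalStatus -eq 'In Maintenance Mode'
    Get-StorageJob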



    /Michael R


    Thursday, September 7, 2017 5:34 AM
  • Hi jeffyb,

    1. When the issue occurs, please verify that the cluster nodes are in a normal state; run the Cluster Validation Wizard to check whether it passes all tests.

    2. Please check that the storage pool has enough capacity. You may also try evicting the affected cluster node and re-adding it to the cluster once the physical disk and virtual disk rebalance has completed.
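
    A rough PowerShell sketch of those two checks ("<ClusterName>" is a placeholder; run from a node with the Failover Clustering and Storage cmdlets available):

    # 1. Run cluster validation and review the generated report
    Test-Cluster -Cluster "<ClusterName>"
    # 2. Compare allocated vs. total capacity for the S2D pool
    Get-StoragePool -IsPrimordial $false |
        Format-Table FriendlyName, Size, AllocatedSize, HealthStatus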

    Best Regards,

    Anne



    Friday, September 8, 2017 7:51 AM
  • Hi jeffyb,

    Just checking whether the above reply has been of help, and whether you have made any progress with your issue. Your feedback is welcome.

    Best Regards,

    Anne



    Tuesday, September 12, 2017 6:15 AM
    Cluster nodes are normal, and there is plenty of free pool space.

    Wednesday, September 13, 2017 12:28 AM
  • We are having a similar issue: a 3-node S2D cluster where we paused one node to install the monthly Windows update for September.

    After installing the September update and rebooting, the disks for that server are in Maintenance Mode. One of the VDs is Unhealthy with operational status "No Redundancy".

    Because of the unhealthy VD, we cannot use "Pause - Drain Roles".

    It has been like this for 24 hours, so this doesn't appear to be something that clears with time.

    MichaelR's suggestion of "Pause - Do Not Drain" succeeded in pausing the server, but after resuming, the disks still show In Maintenance Mode.


    Todd Hunter


    Thursday, September 14, 2017 9:41 AM
  • I had disks stuck in Maintenance Mode after applying KB4038782.

    This was only happening on one node, so I ran the following, as per the suggestions above:

    Repair-ClusterStorageSpacesDirect -DisableStorageMaintenanceMode -Node <node1>

    Now when I check Get-PhysicalDisk everything is healthy.

    Get-StorageJob has also started to run all the jobs again.

    Pausing and resuming the node in PowerShell or Failover Cluster Manager didn't resolve this.
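
    A quick verification sketch along the same lines (run on any node in the cluster):

    # Confirm no disks are left in maintenance mode and watch the repair jobs drain
    Get-PhysicalDisk | Group-Object OperationalStatus, HealthStatus -NoElement
    Get-StorageJob | Sort-Object JobState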

    Thursday, September 14, 2017 10:13 AM

  • Running Repair-ClusterStorageSpacesDirect -DisableStorageMaintenanceMode did reset the Physical Disks to Healthy. Currently all the physical disks are Healthy.

    1 of the 3 VDs is still Unhealthy with No Redundancy.

    I ran Optimize-StoragePool and Repair-VirtualDisk, but the VD still shows the same state.

    The server has been rebooted. 

    No success getting the VD back to Healthy status yet. 
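
    If it helps, a sketch of retrying the repair against just that virtual disk and watching its state ("<VDName>" is a placeholder for the VD reporting No Redundancy):

    # Kick off a repair for the single unhealthy virtual disk
    Get-VirtualDisk -FriendlyName "<VDName>" | Repair-VirtualDisk
    # Watch the repair job and re-check the virtual disk's status
    Get-StorageJob | Where-Object Name -eq 'Repair'
    Get-VirtualDisk -FriendlyName "<VDName>" |
        Format-Table FriendlyName, OperationalStatus, HealthStatus, DetachedReason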


    Todd Hunter

    Thursday, September 14, 2017 11:20 AM
  • This worked for me too.

    I found an easy way to link physical disks to a node: put one node into maintenance mode, then run this:

    Get-PhysicalDisk | ? OperationalStatus -eq "In Maintenance Mode" | Set-PhysicalDisk -Description "host name that is in maintenance mode"

    Then resume that host, pause another, and do this again until you've added the host name to all the disk descriptions. Now when you run Get-PhysicalDisk you can easily tell which host they belong to.
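
    And a small sketch of reading those tags back afterwards (only the standard storage cmdlets are assumed):

    # List disks with the per-host description set above
    Get-PhysicalDisk | Sort-Object Description |
        Format-Table FriendlyName, SerialNumber, Description, OperationalStatus, HealthStatus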

    Tuesday, September 19, 2017 4:18 PM
  • Hi jeffyb,

    Please mark the useful resolutions in this thread as answers so that the useful information can be highlighted.

    Best Regards,

    Anne



    Wednesday, September 20, 2017 2:30 AM
  • The problem and solution are documented here. Note that the KB article referenced does not seem to apply to Server Core installations (which is what I'm running).

    https://bcthomas.com/2017/09/bug-when-applying-kb4038782-september-cu-to-storage-spaces-direct-clusters/


    • Marked as answer by jeffyb Thursday, September 21, 2017 3:25 PM
    Thursday, September 21, 2017 3:25 PM