Storage Spaces Direct (S2D) - Poor write performance with 5 nodes, each with 24 Intel P3520 NVMe SSDs, over a 40Gb IB network

    Question

  • I need a little help with my S2D cluster, which is not performing as I had expected.

    Details:

    5 x Supermicro SSG-2028R-NR48N servers with 2 x Xeon E5-2643v4 CPUs and 96GB RAM

    Each node has 24 x Intel P3520 1.2TB NVME SSDs

    The servers are connected over an Infiniband 40Gb network, RDMA is enabled and working.

    All 120 SSDs are added to the S2D storage pool as data disks (no cache disks). There are two 30TB CSVs configured with hybrid tiering (3TB three-way mirror, 27TB parity).
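    For context, each CSV was created along these lines (the tier names are the defaults created on Server 2016; treat this as a sketch rather than my exact command):

    # Mirror-accelerated parity volume: 3TB three-way mirror tier + 27TB parity tier
    New-Volume -FriendlyName "Volume01" -FileSystem CSVFS_ReFS -StoragePoolFriendlyName "S2D*" `
        -StorageTierFriendlyNames Performance, Capacity -StorageTierSizes 3TB, 27TB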

    I know these are read-intensive SSDs and that parity write performance is generally pretty bad, but I was expecting slightly better numbers than I'm getting:

    Tested using CrystalDiskMark and diskspd.exe (a representative diskspd invocation is sketched after the numbers below):

    Multithreaded read speeds: < 4 GB/s (seq) / 150K IOPS (4K rand)

    Singlethreaded read speeds: < 600 MB/s (seq)

    Multithreaded write speeds: < 400 MB/s (seq)

    Singlethreaded write speeds: < 200 MB/s (seq) / 5K IOPS (4K rand)
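    To show how I measured, here is roughly the shape of the diskspd runs (file path, file size and queue depths are illustrative, not my exact parameters):

    # 4K random read, 8 threads, queue depth 32, software/hardware caching disabled
    diskspd.exe -b4K -d60 -t8 -o32 -r -w0 -Sh -L -c50G C:\ClusterStorage\Volume01\diskspd.dat

    # Single-threaded sequential write
    diskspd.exe -b512K -d60 -t1 -o8 -w100 -Sh -L -c50G C:\ClusterStorage\Volume01\diskspd.dat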

    I did manage to improve these numbers by configuring a 4GB CSV cache and forcing write-through on the CSVs:

    Max reads: 23 GB/s / 500K 4K IOPS; max writes: 2 GB/s / 150K 4K IOPS

    That high read performance is due to the CSV cache, which uses memory. Write performance is still pretty bad though. In fact it's only slightly better than the performance I would get from a single one of these NVMe drives. I was expecting much better performance from 120 of them!
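    For reference, the CSV cache change is just the cluster-wide block cache size (in MB); this is the sort of thing I ran:

    # Give each node a 4 GB in-memory CSV read cache
    (Get-Cluster).BlockCacheSize = 4096

    # Confirm the value
    (Get-Cluster).BlockCacheSize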

    I suspect that the issue here is that Storage Spaces is not recognising that these disks have PLP (power loss protection), which you can see here:

    Get-StoragePool "*S2D*" | Get-PhysicalDisk | Get-StorageAdvancedProperty
    
    FriendlyName          SerialNumber       IsPowerProtected IsDeviceCacheEnabled
    ------------          ------------       ---------------- --------------------                   
    NVMe INTEL SSDPE2MX01 CVPF7165003Y1P2NGN            False                     
    WARNING: Retrieving IsDeviceCacheEnabled failed with ErrorCode 1.
    NVMe INTEL SSDPE2MX01 CVPF717000JR1P2NGN            False                     
    WARNING: Retrieving IsDeviceCacheEnabled failed with ErrorCode 1.
    NVMe INTEL SSDPE2MX01 CVPF7254009B1P2NGN            False                     
    WARNING: Retrieving IsDeviceCacheEnabled failed with ErrorCode 1.
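    If the drives genuinely have PLP and Storage Spaces simply isn't detecting it, one workaround I have seen suggested (I have not verified it on this cluster) is to mark the pool itself as power protected so writes aren't forced through to media on every flush:

    # Only safe if every disk in the pool really has power loss protection
    Get-StoragePool "*S2D*" | Set-StoragePool -IsPowerProtected $true

    # Check the result
    Get-StoragePool "*S2D*" | Select-Object FriendlyName, IsPowerProtected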

    Any help with this issue would be appreciated.

    Thanks.

    Thursday, 12 July 2018 14:50

All replies

  • Hi,
    Based on the complexity and the specific situation, we need to do more research. If we have any updates or any thoughts about this issue, we will keep you posted as soon as possible. Your kind understanding is appreciated. If you have further information during this period, you could post it on the forum, which will help us understand and analyze this issue comprehensively.
    Sorry for the inconvenience, and thank you for your understanding and patience.
    Best Regards,

    Frank

    Please remember to mark the replies as answers if they help.
    If you have feedback for TechNet Subscriber Support, contact tnmff@microsoft.com

    Friday, 13 July 2018 07:59
    Moderator
  • Did you ever resolve this? We have struggled with poor S2D performance on SuperMicro servers, and I've just realized that all of the Intel DC SSDs, which do have PLP, are not recognized as such; the return from the query is the same as you are seeing.

    Thanks.

    Monday, 27 August 2018 19:38
  • Also checking in to see if this has been resolved yet, as I too am running into this issue with S2D not recognizing that my Intel DC S3610 drives have PLP. This is killing my lab deployment, which in turn is also killing my (would-be) prod deployment.

    Saturday, 27 October 2018 06:09
  • I also have the same symptoms. My virtual disk is not using cache (journal) devices, and the advanced properties query returns the same error.
    Tuesday, 20 November 2018 10:01
  • I have the same problem. Poor IO, and I'm seeing the same error:

    Get-PhysicalDisk |? FriendlyName -Like "*SAMSUNG*" | Get-StorageAdvancedProperty
    ...
    WARNING: Retrieving IsDeviceCacheEnabled failed with ErrorCode 1.

    Friday, 18 January 2019 23:04
  • Anyone make progress on this?
    Wednesday, 23 January 2019 20:50
  • I have some information for this thread. My app is heavily single-threaded sequential read with an 8 MB disk block size.

    We have been planning and testing an all-NVMe S2D solution for the past year, working through different bottlenecks: attending Microsoft and vendor conferences, bringing in vendor experts, and going onsite to vendors' solution centers for hardware testing.

    We are currently deploying our S2D system and here are our specs:

    3 x Dell R740xd per cluster

    2 x 12-core 3.0 GHz CPUs

    768 GB RAM

    12 x 7.68 TB Intel and 8 TB Micron NVMe drives

    2 x 25 Gb dual-port NICs

    Here is the issue I believe you're experiencing: you're designing a high-throughput environment, but you are hitting S2D bottlenecks. For my environment we found our main bottleneck was network IO, where pushing the 8 MB block size would max out a 100 Gb network pipe and cause too much overhead. The other issue with 100 Gb network cards is PCIe lanes: each 100 Gb card needs a full x16 slot for the bandwidth. Any solution with more devices than available lanes on your CPU goes through a PLX expander chip, so the 24 NVMe drives end up sharing 16 lanes per CPU. During our testing at the 8 MB block size we compared 4 nodes vs 3 nodes and immediately had the network as the bottleneck. This was causing a ~70% hit.
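    If you want to rule the fabric in or out, a quick sanity check (adapter and property names as on Server 2016; adjust for your environment) is to confirm SMB is actually using RDMA between the nodes:

    # RDMA enabled on the storage-facing NICs?
    Get-NetAdapterRdma | Format-Table Name, Enabled

    # Does the SMB client see the interfaces as RDMA capable?
    Get-SmbClientNetworkInterface | Format-Table FriendlyName, RdmaCapable, LinkSpeed

    # Are live SMB connections to the other nodes RDMA capable on both ends?
    Get-SmbMultichannelConnection | Format-Table ServerName, ClientRdmaCapable, ServerRdmaCapable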

    Our solution:

    Plan to run a minimum of 15 VMs per host with no oversubscription initially and run our production workload. Once we determine what percentage of the system is being used, increase the VM count per host. Deploy small clusters of 3 nodes. Set up 2 disk groups per host. Create VMs per disk group and set both the disk group and the VMs' preferred owner to the same node (a rough PowerShell sketch follows below).

    Advantage: all disk reads and writes for a VM stay local to the node. S2D will still send the mirror writes over the SMB network in the background, but the network is no longer the bottleneck. The cluster can survive one node failure/maintenance cycle. Our app needs high throughput over IOPS.
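    As a rough sketch of the ownership pinning (node, CSV and VM names below are placeholders, not our real ones):

    # Keep a disk group's CSV and its VMs on the same node
    Move-ClusterSharedVolume -Name "Cluster Virtual Disk (DiskGroup1)" -Node "NODE1"
    Move-ClusterVirtualMachineRole -Name "VM01" -Node "NODE1"

    # Make that node the VM role's preferred owner so it stays there
    Set-ClusterOwnerNode -Group "VM01" -Owners "NODE1"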

    Our results - 20 VMs per node running the test using the VMFleet software.

    4K IOPS, 100% read - at ~7-8 NVMe drives you will max out the CPU. Download the VMFleet framework (aka.ms/vmfleet) and run watch-cluster.ps1 -sets *. I get ~2.5M IOPS but extremely high CPU (logical total, very bottom section).

    https://1drv.ms/u/s!Ak1L4DUDEJ_5hO16ff_E8VtmxY632Q

    8 MB sequential read: as you can see, the IOPS go down, but our bandwidth is 96.5 GB/s (98,795 MB/s) against a theoretical max of 108 GB/s, and the CPU averages 24%, with each VM pushing ~1.6 GB/s.

    https://1drv.ms/u/s!Ak1L4DUDEJ_5hO15M4h0CRqLsnIjNQ

    These results are with default S2D settings. We have not done any tuning such as setting the interleave size to 8 MB instead of 256 KB. MSFT best practice is to try to match the interleave to the IO size of the workload on the virtual disk. In our testing that reduced our CPU even further because it lowers the SBL (the layer below the cache) IOPS.
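    If you do try interleave matching, it has to be set when the virtual disk is created; something along these lines (pool and disk names are placeholders):

    # Three-way mirror virtual disk with an 8 MB interleave to match an 8 MB IO workload
    Get-StoragePool "S2D*" | New-VirtualDisk -FriendlyName "BigBlock01" `
        -ResiliencySettingName Mirror -PhysicalDiskRedundancy 2 -Size 2TB -Interleave 8MB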


    • Edited by Ravinmiist, Wednesday, 6 February 2019 15:00
    Wednesday, 6 February 2019 09:16