Storage Spaces Direct TP5 - Write performance issue

  • Question

  • Hello,

    We're currently evaluating Storage Spaces Direct on Windows Server 2016 TP5.

    Here is the setup: 3 identical nodes, with the following per-node config:

    • Xeon E5 2.4 GHz
    • 128GB DRAM
    • 1x LSI 9300 8i
    • 2x 2 TB SSD Samsung Pro
    • 1x Mellanox ConnectX-3 56Gb/s RDMA NIC

    Vdisk settings: Resiliency = Mirror, ProvisioningType = Fixed, no tiering, 100% SSD.
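
    For reference, a fixed-provisioned, mirrored, non-tiered volume of this kind is typically created along the following lines; the pool name, volume name, and size below are placeholders, not Lionel's actual values:

        # Sketch only - pool/volume names and size are placeholders.
        # Creates a fixed-provisioned, mirrored, non-tiered volume on the S2D pool.
        New-Volume -StoragePoolFriendlyName "S2D*" `
                   -FriendlyName "TestVol01" `
                   -FileSystem CSVFS_ReFS `
                   -ResiliencySettingName Mirror `
                   -ProvisioningType Fixed `
                   -Size 500GB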

    Read performance is OK for 6 SSDs:

    - 4k 100% read, 0% random, QD 1 -> 7000 IOPS, 30 MB/s
    - 4k 100% read, 0% random, QD 128 -> 45000 IOPS, 180 MB/s

    - 4k 100% read, 100% random, QD 1 -> 7000 IOPS, 30 MB/s
    - 4k 100% read, 100% random, QD 128 -> 50000 IOPS, 200 MB/s

    But write performance is just poor:

    - 4k 100% write, 0% random, QD 1 -> 75 IOPS, 0.3 MB/s
    - 4k 100% write, 0% random, QD 128 -> 3200 IOPS, 13.3 MB/s

    - 4k 100% write, 100% random, QD 1 -> 75 IOPS, 0.3 MB/s
    - 4k 100% write, 100% random, QD 128 -> 4000 IOPS, 16.5 MB/s
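
    At queue depth 1 the IOPS figure is simply the reciprocal of the per-write latency, so these numbers imply roughly 13 ms per 4k write, against about 0.14 ms per 4k read. A quick sketch of the arithmetic:

        # QD 1: IOPS ~ 1 / latency, so latency (ms) ~ 1000 / IOPS
        1000 / 75      # ~13.3 ms per 4k write, as reported above
        1000 / 7000    # ~0.14 ms per 4k read at QD 1, for comparison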

    Could an expert give us some advice? What is the problem with writes?

    Thanks !

    Lionel


    Tuesday, September 20, 2016 2:49 PM

All replies

  • Hi Lionel,

    Thanks for your post.

    Windows Server 2016 has a dedicated flash buffer to acknowledge and aggregate incoming writes. This write buffer is called the "Software Storage Bus Cache".

    http://blogs.technet.com/b/clausjor/archive/2015/11/19/storage-spaces-direct-under-the-hood-with-the-software-storage-bus.aspx

    Writes are then moved to the mirror tier by the "lazy writer", and only after that may they end up on your parity tier.

    http://blogs.technet.com/b/clausjor/archive/2015/11/19/storage-spaces-direct-in-technical-preview-4.aspx

    http://blogs.technet.com/cfs-file.ashx/__key/communityserver-blogs-components-weblogfiles/00-00-00-45-66/MultiResilientVirtualDisk.png

    I'm not sure whether this is the cause of the slow writes. Since Server 2016 is still a preview version and hasn't been released officially, I'm afraid there isn't much detailed documentation on this, and due to resource limits I couldn't run the test myself. Perhaps we should wait for an official explanation from Microsoft about this unexpected behavior.
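
    One quick way to see whether the Software Storage Bus has claimed any devices as cache is to group the physical disks by usage; cache devices normally show up with Usage = Journal, while capacity devices stay Auto-Select. A minimal sketch, run on one of the cluster nodes:

        # List how the disks are being used; in an all-flash TP5 setup with a single
        # media type, no devices are expected to be claimed as cache (Journal).
        Get-PhysicalDisk | Group-Object Usage | Select-Object Name, Count
        Get-PhysicalDisk | Format-Table FriendlyName, MediaType, Usage, Size -AutoSize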

    Best Regards,

    Mary


    Please remember to mark the replies as answers if they help and unmark them if they provide no help.
    If you have feedback for TechNet Subscriber Support, contact tnmff@microsoft.com.

    Wednesday, September 21, 2016 7:48 AM
  • Hello Mary,

    I know it is a TP, but it is the last one, and Claus from Microsoft has published a lot of excellent benchmark results with millions of IOPS.

    Could Claus or any other expert take a look at my post? Is it normal to get only 75 IOPS with 4k blocks, 100% write, 0% random, at queue depth 1?

    If so, S2D is just not usable :(

    Best Regards,

    Lionel

    Wednesday, September 21, 2016 3:26 PM
  • Hi Lionel,

    >Is it normal to get only 75 IOPS with 4k blocks, 100% write, 0% random, at queue depth 1?

    It really is unexpected behavior. I'm sorry I can't offer a more helpful analysis of your issue based on my knowledge. It would also be appreciated if other members of our forum could share their experience with this scenario.

    You could also submit your idea at “To improve Windows Server I suggest you ...” below; you might get feedback from others in a similar scenario. And if it really is unexpected behavior in the product itself, it may get fixed when it is released officially.

    Appreciate your support and understanding.

    https://windowsserver.uservoice.com/forums/295047-general-feedback

    Best Regards,

    Mary



    Thursday, September 22, 2016 2:09 AM
  • Hello Mary,

    You seem to be the only person interested in this issue...

    Some news... After destroying the cluster and creating a local SSD pool on each node, I get the same poor write performance with Windows 2016 TP5.

    FYI, using Windows 2012 R2 and a storage pool, I get 9000 IOPS with 4k 100% write, 0% random, QD 1.
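
    For anyone repeating this non-clustered comparison, a local pool and volume can be built roughly as follows; the pool name, volume name, and size are placeholders, and the resiliency setting should match whatever was used on the S2D vdisk:

        # Sketch: build a plain local pool from all poolable disks and create a mirrored volume on it.
        $disks = Get-PhysicalDisk -CanPool $true
        New-StoragePool -FriendlyName "LocalPool" `
            -StorageSubSystemFriendlyName (Get-StorageSubSystem -FriendlyName "Windows Storage*").FriendlyName `
            -PhysicalDisks $disks
        New-Volume -StoragePoolFriendlyName "LocalPool" -FriendlyName "LocalVol" `
            -FileSystem NTFS -ResiliencySettingName Mirror -Size 500GB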

    Will someone from the Windows 2016 dev team give us feedback about this behaviour?

    Thanks,

    Lionel

    Tuesday, September 27, 2016 4:36 PM
  • Hi Lionel,

    Thanks for your reply and for continuing to test.

    Maybe you could also submit your findings to the link I posted before, so that more people can focus on this behavior.

    Best Regards,

    Mary



    Wednesday, September 28, 2016 2:11 AM
  • What tool are you using for perf measurement and with what parameters?

    When you say "local pool" I assume you mean non-clustered local storage pool/space?

    Do you have the same numbers for one of these devices outside any pool?
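
    For reference, the 4k write points at QD 1 and QD 128 are often expressed with Microsoft's DiskSpd tool roughly like this; the test file path, size, and duration below are placeholders:

        # 4k, 100% write, sequential, QD 1, one thread, software/hardware caching disabled, latency stats on
        .\diskspd.exe -b4K -o1 -t1 -w100 -d60 -Sh -L -c10G C:\ClusterStorage\Volume1\test.dat

        # Same pattern with 128 outstanding I/Os
        .\diskspd.exe -b4K -o128 -t1 -w100 -d60 -Sh -L -c10G C:\ClusterStorage\Volume1\test.dat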

    Regards,

    Tom Jolly

    ---

    Group Software Engineering Manager

    Windows Server High Availability & Storage team



    Monday, October 3, 2016 7:12 PM
  • Hello Tom,

    Happy to see someone with storage knowledge taking an interest in my issue here.

    I use Iometer for all of my storage perf testing (NetApp, EqualLogic, Nutanix, Compellent).

    Since my last post, I found that all SSDs were configured with the write-caching policy disabled in Device Manager. Is that normal (I didn't change it)? I then enabled write caching for each drive, and performance is now better, but still not what I expected.
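
    As a side note, on 2016 builds that ship the cmdlet, the per-device cache state that Device Manager exposes can also be read from PowerShell; a small sketch, assuming Get-StorageAdvancedProperty is available on the TP5 build in use:

        # Reports IsDeviceCacheEnabled and IsPowerProtected for each physical disk
        Get-PhysicalDisk | Get-StorageAdvancedProperty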

    Here are the results :

    - 4k 100% write, 0% random, Queue depth 1 -> 1220 IOPS, 5.01 MB/s
    - 4k 100% write, 0% random, Queue depth 128 -> 28,000 IOPS, 114 MB/s

    So, what should 4k write performance at queue depth 1 look like? Could you please give me an idea? It is easy to reach millions of IOPS with a lot of SSDs and RDMA adapters at large queue depths, but what I need is a minimum of 8k-10k IOPS at queue depth 1 with this kind of small block.
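
    That target translates directly into a per-write latency budget, which makes the gap concrete; a quick check of the arithmetic:

        # At QD 1, the required IOPS sets the time budget for each mirrored 4k write
        1e6 / 8000     # 125 microseconds per write for 8k IOPS
        1e6 / 10000    # 100 microseconds per write for 10k IOPS
        1e6 / 1220     # ~820 microseconds per write measured above after enabling the device cache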

    Storage Spaces Direct with Scale-Out File Server is a very good solution for a hoster like me, but write performance has to be sufficient, and in any case better than our old NetApp SAN over SMB3.

    Regards,

    Lionel

    Tuesday, October 4, 2016 8:24 PM
  • We wanted to reply to this post. We are working with Lionel to clarify the details of his configuration and identify where the performance bottlenecks are. We will get back to this thread when it's worked through.

    Steven Ekren

    Program Manager

    Microsoft.


    This posting is provided "AS IS" with no warranties, and confers no rights.

    Monday, October 10, 2016 8:01 PM
  • ??? Sorry Steven, but when did you come back to me, and how?

    Nobody from Microsoft has contacted me to locate or troubleshoot the problem.

    Could someone from the Microsoft storage engineering team contact me or reply to this thread? So far Mary has tried to copy/paste some information, and Tom just asked questions but never responded when I followed up.

    Also, to be sure it was not a hardware or configuration problem, I tried to contact all of the official manufacturers with Storage Spaces Direct-compliant hardware for a POC, but no one could share or let me try their product. The answer is always "we are not ready yet"... but Windows 2016 is now out!

    Please, help me prove that Storage Spaces Direct is a good solution...

    Regards,

    Lionel

    Wednesday, October 12, 2016 1:22 PM
  • Hi Lionel,

    Did you ever get an answer from Microsoft?

    The reason I ask is that I've just seen the same problem on our deployment of S2D, but I was able to fix it by uninstalling the "Data Center Bridging" feature from the cluster nodes!

    Strange, I know, since this should increase performance, but in our case the number of IOPS quadrupled as soon as the feature was removed from all nodes...

    I thought I'd let you know in case the solution might work for you too.
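
    For anyone wanting to try the same change, the feature can be inspected and removed roughly as follows; whether DCB should stay installed depends on whether priority flow control for RoCE is actually configured end to end, so treat this as a sketch rather than a recommendation:

        # Check whether Data Center Bridging is installed and what QoS/PFC settings are active
        Get-WindowsFeature Data-Center-Bridging
        Get-NetQosPolicy
        Get-NetQosFlowControl

        # Remove the feature on each node (a reboot may be required)
        Uninstall-WindowsFeature -Name Data-Center-Bridging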

    Cheers,


    Stephane

    Sunday, November 27, 2016 7:44 AM
  • Hello Lionel, I know the post is old, but I'll try anyway.

    I have the same problem as you: write speed of less than 30 MB/s on the S2D (Storage Spaces Direct) volume.

    On a local drive it goes up to 140 MB/s.

    This makes some application installations fail.

    Thanks for sharing your fix.

    I use 2x SAS drives on two nodes in the cluster for now (later I will add another one).

    Thanks for the help.

    M

    Thursday, December 20, 2018 11:30 PM