2019 - Parity Performance is terrible - Code issue?

  • Question

  • I have an issue with 2019 with both RAID 5 and Parity.

    tl;dr version - under Hyper-V I've shown that parity write performance on 2019 is a fraction of what previous server versions achieve on the same hardware.

    I have a test server with an Intel Core i5-8500, 32 GB RAM, a 500GB M.2 drive for the OS, and 4x 4TB drives.

    I pulled these drives from my old test server which was a HP Microserver N54L running 2012 R2. On this old server the performance running RAID 5 was not great but OK.

    On my new server, I installed 2019, and set up Storage Spaces with Parity. The performance was terrible. So I tried RAID 5, and the performance was still terrible although slightly better.

    After lots of troubleshooting with no luck, I then tested different scenarios under Hyper-V with some really interesting results. I believe testing under Hyper-V rules out a number of possible hardware and driver issues as the cause.

    Check out the results.

    Initial tests on 2019 Server, physical bare OS install:

    1. Single drive, no RAID – 110MB/s write
    2. Windows RAID 0 4x drive stripe – 420MB/s write
    3. Windows Storage Spaces 4x Drive Mirror – 400MB/s write

    The above three results indicate that there's no driver issue causing a bottleneck.

    Parity Testing:

    4. Storage Spaces 4x drive with single parity – 13MB/s write
    5. Windows RAID 5 4x drive – 23MB/s write

    Parity is rubbish, right? Well, only on Windows, it seems. I decided to run a Linux VM and map the drives as physical drives to rule out a Windows issue, and was blown away by the performance. Remember this is running in a VM, but mapped to the same physical drives.

    6. Running Linux as Hyper-V guest, Linux RAID 5 – 230MB/s

    At this stage, I decided to test 2012, 2016 and 2019 under Hyper-V. Each was configured as a Gen2 machine, 8GB RAM, a 127GB OS drive on the M.2 SSD.

    One VM at a time, I'd physically map the 4x 4TB drives to the VM, set up the drives, and run some quick performance tests by copying a single large 50GB file. I'd measure the speed after a couple of minutes to make sure any cache was full.
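
    For anyone who wants to repeat this kind of measurement without reading the speed off a file-copy dialog, here is a minimal Python sketch. It is not the method used for the numbers in this thread: the function name `measure_write_mb_per_s` and its parameters are hypothetical, and it only times sequential writes once an initial warm-up amount has been written, which mirrors the idea of waiting until any cache is full.

```python
import os
import tempfile
import time


def measure_write_mb_per_s(path, total_bytes, chunk_bytes=4 * 1024 * 1024, warmup_bytes=0):
    """Write total_bytes sequentially to path and return MB/s (decimal MB),
    timing only the data written after the first warmup_bytes."""
    if warmup_bytes >= total_bytes:
        raise ValueError("warmup_bytes must be smaller than total_bytes")
    chunk = os.urandom(chunk_bytes)  # incompressible data, reused for every write
    written = 0
    timed_bytes = 0
    start = None
    with open(path, "wb", buffering=0) as f:
        while written < total_bytes:
            if start is None and written >= warmup_bytes:
                start = time.perf_counter()  # warm-up finished, start timing
            n = f.write(chunk[: min(chunk_bytes, total_bytes - written)])
            written += n
            if start is not None:
                timed_bytes += n
        os.fsync(f.fileno())  # make sure the data has actually left the OS cache
    elapsed = time.perf_counter() - start
    return (timed_bytes / 1_000_000) / elapsed


if __name__ == "__main__":
    # Small demo against a temporary directory; point the path at the
    # volume under test and use a much larger size for a real measurement.
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "throughput_test.bin")
        speed = measure_write_mb_per_s(target, 8 * 1024 * 1024, warmup_bytes=2 * 1024 * 1024)
        print(f"{speed:.1f} MB/s")
```

    For a parity volume, a total size of tens of GB with a warm-up of a few GB would be closer to the 50GB copy described above; the right values depend on the write-back cache size, which is an assumption you would need to adjust.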

    Here are the results of that test:

    RAID 5:
    2012 – 100MB/s write, 150MB/s read
    2016 – 100MB/s write, 150MB/s read
    2019 – 25MB/s write, 150MB/s read

    Storage Spaces with parity:
    2016 – 40MB/s write, around 300MB/s read
    2019 – 13MB/s write (I forgot to test read!)
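
    To put the write numbers above in perspective, the slowdown factor can be worked out directly. This is just arithmetic on the figures quoted in this post; the helper name `slowdown_factor` is made up for illustration.

```python
def slowdown_factor(old_mb_s, new_mb_s):
    """Return how many times slower new_mb_s is compared with old_mb_s."""
    return old_mb_s / new_mb_s


# (2016 write MB/s, 2019 write MB/s), taken from the results in this post
write_results = {
    "RAID 5": (100, 25),
    "Storage Spaces parity": (40, 13),
}

for name, (v2016, v2019) in write_results.items():
    print(f"{name}: 2019 writes are {slowdown_factor(v2016, v2019):.1f}x slower than 2016")
# RAID 5: 2019 writes are 4.0x slower than 2016
# Storage Spaces parity: 2019 writes are 3.1x slower than 2016
```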

    The vDisk was configured as follows:

    New-VirtualDisk -StoragePoolFriendlyName "vPool" -FriendlyName "vDISK" -UseMaximumSize -ProvisioningType Fixed -ResiliencySettingName "Parity" -PhysicalDiskRedundancy 1 -NumberOfColumns 4

    I tried playing with the number of columns, cluster size, interleave, etc., but nothing made much difference.

    I can't see how this can be anything other than a 2019-specific issue. Any thoughts on where to start to better understand what the cause may be?


    Wednesday, June 26, 2019 11:56 AM

All replies

  • Hi,

    Thanks for posting in our forum!

    This is a quick note to let you know that I am currently performing research on this issue and will get back to you as soon as possible. I appreciate your patience.
    If you have any updates during this process, please feel free to let me know.

    BR,

    Daniel



    Thursday, June 27, 2019 7:18 AM
    Moderator
  • Thanks for looking into this.

    Here are some more interesting results. I took the same already-configured Storage Spaces pool and moved it from server to server (all on the same physical hardware).

    Testing write speeds only:

    2019 Physical – ~13MB/s
    2016 Hyper-V VM – ~120 MB/s
    2019 Hyper-V VM – 13 MB/s

    Saturday, June 29, 2019 4:23 AM
  • I've noticed this problem too. Memory usage also seems to be very high on 2019 while data is moving around. It's really, really disappointing. Finally software storage on Windows was getting somewhere, and now it's tanked again.

    Anyone have any ideas?

    Thursday, August 1, 2019 2:52 PM
  • We have found exactly the same issue: 2019 is way slower than 2016. Something has changed on the networking side from 2016 to 2019, and Server 2019 is unusable.
    Thursday, August 1, 2019 7:34 PM
  • I don't doubt that others are having a genuine issue here especially if you configure a stgpool in 2016 and then move it as-is to 2019, but I solved my issue.

    My issue was that 2019 behaves differently from 2016 when it chooses defaults for LogicalSectorSize on my disks. When I created my stgpool on 2016 it picked a LogicalSectorSizeDefault of 4096, whereas 2019 chose a LogicalSectorSizeDefault of 512. I also noticed that when I run "Get-PhysicalDisk | select FriendlyName,HealthStatus,LogicalSectorSize,PhysicalSectorSize,MediaType" in 2016, it correctly shows the SSDs in my pool with a PhysicalSectorSize of 4096, but when I run the same command in 2019 it thinks the SSDs have a PhysicalSectorSize of 512. Dunno why, but it could even be driver issues - some of the hardware I'm using for my testing is just consumer grade.

    So maybe a few other people with this issue might find some sector size tweaking will help them but it looks like this probably isn't the cause for most.

    Thursday, August 22, 2019 11:26 PM
  • Well, it turns out I didn't actually fix the performance issue just by correcting the sector size. It did help a little, and got my sequential write near where it's supposed to be, but in my previous post above I was in a rush and only tested a sequential large file copy - I didn't benchmark it. Now that I have, I see that reads and random writes are still affected.

    However, there's good news, but the good news is actually bad news.

    In the Settings app in Windows Server 2019, I turned off Windows Defender real-time protection after getting the idea from another forum. When I ran a quick benchmark performance jumped back up nearly to where it was under Server 2016. It's not quite as good but it's nearly there.

    So that's it. It's Windows Defender. Try turning off real-time protection (this will leave your system more vulnerable, but vulnerable is better than unusable). It's pretty inexcusable that the built-in anti-malware causes an SSD-accelerated tiered storage space to be so slow that its performance is less than a single decent hard drive, even in mirror mode - let alone parity. I call that pretty unusable.

    To reiterate, here are the steps I am taking that reproduce the problem. The SSDs for tiering aren't mandatory, but they do expose how bad the problem is by increasing the difference between expected and observed performance:

    1. Boot into Windows Server 2016 and create a storage pool using at least four HDDs and two SSDs, with the intention to use the HDDs for slow tier and SSDs for fast tier. When creating the storage pool be sure to observe correct logical sector size for your hardware, and if using tiering, set a reasonably large chunk of write-back cache such as 16GB or more.
    2. After creating the new storage pool, use PowerShell to set the correct number of columns for your storage layout.
    3. Use either the Server Manager GUI or PowerShell to create a tiered vdisk on the new storage pool. Be sure to allocate a chunk of SSD tier. The issue occurs no matter how much of the SSD tier you allocate, but the more you allocate the more obvious the problem becomes. I suggest at least 32GB.
    4. Create a volume on the new vdisk. You can format it as either NTFS or ReFS.
    5. Run a quick benchmark on the new tiered volume. Use an actual benchmark tool rather than just Windows file copy. CrystalDiskMark works just fine for this.
    6. Save or screenshot your benchmark results and shut down the server.
    7. Boot into Windows Server 2019, preferably on the exact same hardware but definitely with the exact same, already-configured storage space. I just had a spare SSD so I swapped it out for the one with 2016 on it and installed 2019 on that.
    8. Benchmark your unchanged storage space again. Save or screenshot your results and observe your gigantic performance loss by comparing results. Mine is about 90% loss. I go from 2450MB/sec seq read with queue depth of 32, to 238MB/sec. Sequential writes only take about a 30% hit by comparison. Random read and writes are down by at least 80%.
    9. Still in Windows Server 2019, open the Settings app and turn off Windows Defender real-time protection.
    10. Run one last benchmark. Cry as you discover that the inbuilt anti-malware is the cause of the crippling performance.
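
    For steps 8 and 10, the percentage loss between two benchmark runs can be computed as below. This is only arithmetic on the sequential read figures quoted in step 8; the function name `percent_loss` is hypothetical.

```python
def percent_loss(before_mb_s, after_mb_s):
    """Return the throughput lost going from before_mb_s to after_mb_s, in percent."""
    return 100.0 * (before_mb_s - after_mb_s) / before_mb_s


# Sequential read at queue depth 32: Server 2016 vs Server 2019 (from step 8)
print(f"{percent_loss(2450, 238):.1f}% loss")  # 90.3% loss
```

    Running the same calculation on each line of the benchmark output (sequential write, random read, random write) makes it easy to compare the 2016, 2019, and 2019-without-real-time-protection runs side by side.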

    There you go Microsoft, you have steps to reproduce. 

    Can this be fixed?


    • Edited by teh dave Wednesday, August 28, 2019 8:48 AM
    Wednesday, August 28, 2019 8:47 AM
  • False alarm, again. This does not "fix" it. After a while performance went back to where it was.
    Thursday, September 12, 2019 10:32 AM