Storage: RAID v JBOD

    Question

  • I recently watched Matt Gossage's excellent webcast 'Storage in Exchange 2010'
    http://www.microsoft.com/events/podcasts/default.aspx?topic=&audience=&view=&pageId=x4705&seriesID=Series-f326d2e3-0be8-485e-b521-12fe54cef4e7.xml
    But I still have some questions about Exchange 2010 storage.

    1. WHY would you choose JBOD over, say, RAID5?
    I understand that with 2010 you *can* use JBOD.  That's cool.
    And I understand that, with DAGs, you can fail over very quickly to a copy of the DB on another server.
    A RAID5 of large SATA drives would be only marginally more expensive (1 extra spindle per array) - but would add a valuable extra layer of protection.
    I feel like I would prefer to use RAID5 so that, if I have a single drive failure, no failover at all needs to take place.  Yeah, I understand the failovers are quick - but why even bother failing over when you could use RAID5?
    So even if I had 3 locations (London, Tokyo, Houston) all in a DAG, I'd be tempted to use RAID5 at each location. TINY additional expense - but a valuable first layer of protection.

    2. RAID10 or RAID5?
    One of the slides ('E2010 HA Storage Design Flexibility') shows that you could use 'DAS (SATA)' with RAID10, or 'JBOD (SATA)'.
    *If* one were to use the first option (DAS SATA), why would one choose RAID10?
    Given the huge reduction in i/o load for E2010, you probably don't need the performance of RAID10 (yeah, a lot of assumptions there, like: how many users? etc).
    But if JBOD provides enough performance, then surely RAID5 does too?

    Thanks!
    Friday, September 04, 2009 4:18 PM

Answers

  • I have done some reading and research into this, as well as considering the practical application to our environment, and I've come to some conclusions. I'm going to just throw them out there... :) **** These conclusions apply to an HA environment with DR required, two datacentres - primary datacentre and DR datacentre. I haven't tried to solve the problems of the world, just our problems.

    *** AND, remember we're talking Direct Attached Storage here. JBOD = Direct attached, right? SAN (FC or iSCSI) is excluded from the argument. SATA vs. SAS - ignored at this point. That's very implementation specific. I'm talking architecture here.

    Now the conclusions first.

    1) JBOD = BAD, for most of us. JBOD is theoretically possible, but operationally infeasible for many environments. Don't even think about it. The whole JBOD thing is ridiculous and talking about it at Tech Ed and promoting it ... the mind boggles. It's just out of touch with reality. My opinion. Read on.

    2) RAID10 = BAD. It's more expensive than we should need in this day and age. If IOs are supposed to be so reduced now, then we don't need RAID10, do we? But isn't this what we appear to have been left with - as a recommendation from Microsoft? (WTF??? WHAT ABOUT RAID5/6??? Read on.)

    (FYI Read This Page first if you like: "Mailbox Server Storage Design Recommendations" - http://technet.microsoft.com/en-us/library/dd346703(EXCHG.140).aspx - it's a long, incomplete page at this point - this is really early stuff from Microsoft and I congratulate them on putting it out there warts and all. e.g. "Section here that talks about storage platform - esrp link and explanations - choosing a validated exchange storage solution using esrp" - anyways Step 2: Design Storage Architecture based on IO and Capacity Requirements - Microsoft says: Exchange Mailbox Database File (EDB) Volume - HA: Supported Best Practices - All RAID types supported. JBOD/Raidless supported (3+ DB Copies); Best Practice = RAID1/10)

    Definition of JBOD

    Firstly, what is JBOD? When we are talking about JBOD we are talking about individual disks, with individual volumes on each one. Full stop. They are not striped (that would be RAID0). They are not part of a spanned volume set. They are simply individual disks. Think old school days before RAID. We may have lots of them on a single bus.

    FYI Microsoft recommend that each individual disk be split into two dedicated volumes. One dedicated to the log files. One dedicated to the data file. They don't elaborate but it seems likely that this is to avoid *file system* fragmentation. Database fragmentation is not supposed to happen any more. But file system fragmentation surely still can, especially with the data file tripping over log files being written and deleted all the time. How important to performance this is in the real world, I'm not sure. "Supported: When using JBOD, single disk broken up into two volumes (one for database, one for log stream). Best Practice: When using JBOD, single database/log per volume."

    Why JBOD is BAD

    (*** and if you *span* (not stripe) a volume across JBOD disks, the same issues apply. You just have bigger "disks" with 2/3/4/5 times the failure probability...)

    i) With individual volumes on individual disks - you will get IO pressure on certain disks. i.e. "hot" disks in terms of IOps. I.e. disks that can't cope with the IOps being demanded of them. It might not be easily resolvable. Without some "IO levelling" management strategy and processes to move mailboxes to level IOs across disks, this will happen. Could be automated??? But Scotty's Theorem applies: "The more complicated the plumbing, the easier it is to stop up the drain..."

    ii) With individual volumes on individual disks - you will get storage pressure on individual disks. i.e. some disks will become full sooner than others. Without some "Storage levelling" management strategy and processes to automatically move mailboxes to level storage across disks, this will happen (see the sketch after this list). Scotty's Theorem also applies. aka KISS principle.

    iii) When a disk goes down, you have to take some complex action. Microsoft will tell you these actions are scriptable. They are still more complex than a non-Exchange Admin type guy would like to undertake in the middle of the night. Scotty's Theorem/KISS/Murphy's Law - take your pick.

    iv) What about a DR scenario? I hope you have your DAGs replicated across THREE DIFFERENT SITES - you lose the site that had two copies and you are perilously close to losing your DAG (i.e. running on only ONE copy...). If not - you'll need 4 (FOUR!!!) copies.
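
    Purely to illustrate the kind of "levelling" process items i) and ii) imply you would have to build for yourself, here is a minimal greedy-placement sketch. All names and numbers are invented for illustration; none of this is an Exchange cmdlet or API.

```python
# Hypothetical "levelling" sketch: greedily place (or move) each mailbox onto
# whichever individual JBOD disk currently has the most IOPS *and* capacity
# headroom. Invented numbers; not an Exchange API.
import random
from dataclasses import dataclass

@dataclass
class Disk:
    name: str
    iops_capacity: float      # random IOPS the spindle can sustain
    gb_capacity: float
    iops_used: float = 0.0
    gb_used: float = 0.0

    def headroom(self):
        # The bottleneck resource (IOPS or space) decides how "full" a disk is.
        return min(1 - self.iops_used / self.iops_capacity,
                   1 - self.gb_used / self.gb_capacity)

def place_mailbox(disks, mbx_iops, mbx_gb):
    """Greedy levelling: put the mailbox on the disk with the most headroom."""
    target = max(disks, key=lambda d: d.headroom())
    target.iops_used += mbx_iops
    target.gb_used += mbx_gb
    return target

if __name__ == "__main__":
    random.seed(1)
    jbod = [Disk(f"disk{i}", iops_capacity=80, gb_capacity=1800) for i in range(4)]
    for _ in range(400):                      # 400 mailboxes of varying weight
        place_mailbox(jbod, mbx_iops=random.uniform(0.05, 0.5),
                      mbx_gb=random.uniform(0.5, 5.0))
    for d in jbod:
        print(f"{d.name}: {d.iops_used:.1f} IOPS, {d.gb_used:.0f} GB used")
```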


    Why RAID10 is Good

    Against 4 DAG replicas on 4 different servers (if you're stuck with 2 datacentres) - RAID10 starts to look attractive.

    i) You'll only need one server with a mirrored disk / stripe set to have redundancy. Not two servers with a disk / stripe set in each.

    ii) You get good write IOs. And you don't need to worry about whether you might suffer from performance penalties.

    iii) Your middle-of-the-night stress is low (pop the faulty disk, put in a new one)


    Why RAID5 is Better

    Same as RAID10 - ish - but

    i) RAID5 is cheaper. You get more storage utilisation out of your disks. They are still expensive in servers we buy!!! And we STILL might not be able to use SATA!

    (You'll get performance penalties, but you might not *suffer* from them...)

    **** OK OK OK. RAID5 has a BIG performance hit on writes. When the read/write ratio is less than 3:1 you are getting less performance per TB. Read/write with Exchange is most likely something like 1:3 (completely the opposite).

    BUT THERE IS A CHEAT! RAID5 doesn't HAVE to have that hit. IF THE OS/APPLICATION WRITES ONE WHOLE STRIPE at a time, then you should not need to incur a Read/Modify/Write performance hit that you get writing a partial stripe. IMHO E2010 is in the perfect position to be DESIGNED TO DEAL WITH RAID5. Total stripe size must be the same as the database block size, or an exact multiple of the database block size.

    It would be great if you could use a 32kb stripe size (matches the database block size) on RAID controllers. However that doesn't seem possible these days - stripes are sooo BIG. And 4kb disk sectors make it worse.
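
    To put rough numbers on the whole-stripe point, here is a minimal sketch. It assumes an 8-disk RAID5 set and a few hypothetical per-disk chunk sizes; even 32kb chunks give a 224kb full stripe, so a lone 32kb page write is always a partial-stripe (read/modify/write) update unless the controller coalesces writes.

```python
# Whole-stripe arithmetic: a RAID5 full stripe holds
# (per-disk chunk size) x (number of data disks) of data.
# The 8-disk array and chunk sizes are illustrative assumptions.

def full_stripe_kb(chunk_kb, total_disks, parity_disks=1):
    """Data held by one full RAID5/6 stripe, in KB."""
    return chunk_kb * (total_disks - parity_disks)

if __name__ == "__main__":
    db_page_kb = 32                      # Exchange 2010 database page size
    for chunk_kb in (32, 64, 256):       # hypothetical per-disk chunk sizes
        stripe = full_stripe_kb(chunk_kb, total_disks=8)
        partial = db_page_kb % stripe != 0
        kind = "partial-stripe (read/modify/write)" if partial else "full-stripe"
        print(f"{chunk_kb}KB chunks x 7 data disks = {stripe}KB full stripe; "
              f"a single {db_page_kb}KB page write is a {kind} update")
```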

    Why RAID6 is Best

    All the benefits of RAID5, plus:

    i) You don't need to suffer a performance-reducing, high-priority array rebuild on a single disk failure (the rebuild can run at low priority because the array is still redundant).

    ii) You don't risk total loss of the array to a second disk failure while you're rebuilding.

    iii) You get RAID6 "for free" on new servers (e.g. IBM x3650M2 with 256MB battery backed cache.) Try and save money by getting them to take it out... Good luck with that. And face it, you're probably going to buy new servers anyway for your Exchange upgrade (if you just bought them, you'll sweat them for a few more years rather than pay for ANOTHER upgrade project).


    Bottom line-

    WE WANT RAID5/6!!!!! WE LIKE IT!!!! We just wish it didn't have that blinkin write performance penalty! MICROSOFT - WHY JBOD? WHY NOT OPTIMISE E2010 FOR RAID5/6? You're sooooo close with this, I can feel it. RAID VENDORS - make it possible to work around the RAID5/6 write penalty!!!!

    Unless you ARE Microsoft, and you're running a cloud, and have developers on-tap. Then sure, use JBOD. :P

    Real world now - and it looks like we're stuck with RAID10...



    • Marked as answer by Drew-TX Tuesday, March 09, 2010 4:17 PM
    Sunday, November 01, 2009 11:39 PM

All replies

  • I haven't seen the webcast so I can't comment on the contents there, but I can offer a few thoughts on your questions nonetheless :)

    RAID5 is slow on writes, but fast on reads. This is due to the parity calculation that takes place on writes, and it's a pretty big performance hit. On reads you don't get the same hit, and you benefit compared to a single disk. So if you were to store, for instance, a large collection of MP3 files it would be OK, because you tend to read those files more than write them. For something disk intensive like a database server it would be horrible. While Exchange isn't the same thing as a database server, it does tend to do more writes than reads on the mailbox databases.
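
    As a rough rule of thumb, that write penalty can be put into numbers. The sketch below uses the textbook per-host-write physical I/O counts (JBOD 1, RAID10 2, RAID5 4, RAID6 6); the spindle count, per-disk IOPS and read/write mix are illustrative assumptions only, and a controller with write cache will do better in practice.

```python
# Back-of-the-envelope effect of the RAID write penalty on front-end IOPS.
# Penalty values are the textbook ones (e.g. a RAID5 small write = read data
# + read parity + write data + write parity = 4 physical I/Os).

WRITE_PENALTY = {"JBOD": 1, "RAID10": 2, "RAID5": 4, "RAID6": 6}

def host_iops(disks, iops_per_disk, read_fraction, raid_level):
    """Approximate front-end random IOPS the array can sustain."""
    backend = disks * iops_per_disk
    write_fraction = 1.0 - read_fraction
    # Each host read costs 1 back-end I/O; each host write costs the penalty.
    return backend / (read_fraction + write_fraction * WRITE_PENALTY[raid_level])

if __name__ == "__main__":
    # Illustrative: 8 x 7.2k SATA spindles at ~80 random IOPS each,
    # with a write-heavy 1:2 read:write mix.
    for level in WRITE_PENALTY:
        print(f"{level:>6}: ~{host_iops(8, 80, read_fraction=1/3, raid_level=level):.0f} host IOPS")
```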

    JBOD is, like the acronym spells out, "just a bunch of disks". Different RAID controllers handle it differently, but in its basic form data is written to disk 1, and when disk 1 is full you continue on disk 2. (You/Exchange just perceive it as the one disk of course.) No performance hit, and no performance gain basically. With 2TB SATA disks I personally don't often feel the need to "chain" more of them together to create a ginormous JBOD array though.

    RAID10 gives the redundancy of RAID1 combined with the performance gain of RAID0. It's the best of both worlds really, you gain both read and write performance. The cost is the number of disks, and whether this is worth it is an individual choice.

    Go with plain SATA, JBOD or RAID10, but not RAID5. Just my two cents really.
    Friday, September 04, 2009 4:57 PM
  • Thanks Andreas.

    I know that RAID5 isn't the greatest in terms of performance - but is it slower than JBOD?  And, if it is slower, is it significantly slower?
    The same slide ('E2010 HA Storage Design Flexibility') recommends DAS SAS RAID5 for Exchange 2007 databases.  If RAID5 is fast enough for Exchange 2007 databases, then it should be *fine* for E2010 (with the 70% reduction in disk i/o).

    Historically I always did DAS/RAID10 for E2003 and E2007 - with up to 6000 mailboxes on a single server. I 'get' that E2010 has a much lower i/o footprint.  So I don't see the need for RAID10.  In my mind, RAID 5 gave a degree of protection without having the expense of a full mirror - whereas RAID10 gave the protection AND good performance.  For E2010 I don't think we'd need the performance (and expense) of RAID10 - but RAID5 would offer a very inexpensive extra layer of protection over JBOD.

    I'm not anti-JBOD.  But for a very minimal additional cost you can do RAID5 and add a bit of *local* resilience.

    A basic configuration might have 3 servers all in a DAG: Houston1, Houston2, Boston - each with JBOD.
    Houston1 would be the 'primary' and if a JBOD drive failed then that DB would fail over to Houston2.  If a hurricane hit Houston, then everything could fail over to Boston.
    But it seems like you're using Houston2 to provide protection against a disk failure in Houston1; you have 3 servers and 3 complete JBOD farms (Houston x 2, plus Boston).

    An alternative would be to use RAID5.  If you lost a drive in Houston1, then the hot spare would kick in; no need to fail over to Houston2.
    You could then just have 2 servers and 2 RAID5 farms (Houston1 + Boston).

    The second scenario seems cheaper - with no added risk/exposure.
    Friday, September 04, 2009 6:17 PM
  • Actually, when I drilled down into the Appendix slides in the PPT, I think my questions are covered.

    There it states that if you have a 2 node DAG then you *should* use RAID - and for EDB it can be 5, 6, 10 or 1.
    For a 3 node DAG it says RAID is optional.

    It seems to me that 2 nodes with RAID5 are going to be cheaper than 3 nodes with JBOD - with no tangible loss in resilience. (assuming the 2 nodes are in separate locations)

    The second 'local' node (Houston2 in my example) only really exists BECAUSE JBOD has no fault tolerance; the "If we're using JBOD then we better make sure we have 3 copies" theory.  Versus the "Hey, why not use basic RAID, then maybe we only need 2 copies?" theory.
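
    Back-of-the-envelope spindle arithmetic for that claim, with invented numbers (one database sized to fit a single large SATA disk, and 6-data-disk RAID5 sets); it is only meant to show the shape of the comparison, not a real design.

```python
# Spindle count: 2 copies on RAID5 vs 3 copies on JBOD, for the same databases.
# Database count, copy counts and array width are illustrative assumptions.

def jbod_disks(databases, copies=3):
    """One disk per database copy (ignoring the database/log volume split)."""
    return databases * copies

def raid5_disks(databases, copies=2, data_disks_per_array=6):
    """RAID5 adds one parity disk per array of data_disks_per_array disks."""
    arrays_per_copy = -(-databases // data_disks_per_array)   # ceiling division
    return copies * (databases + arrays_per_copy)

if __name__ == "__main__":
    dbs = 12   # twelve databases, each sized to fit on one large SATA spindle
    print("3-copy JBOD :", jbod_disks(dbs), "spindles")
    print("2-copy RAID5:", raid5_disks(dbs), "spindles")
```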


    Friday, September 04, 2009 8:18 PM
  • RAID5 is a performance pig, I'd be very cautious of using it. If JBOD is satisfactory then RAID5 may not be; as Andreas mentions, you're adding read/write penalties into the mix that you didn't start with.

    RAID5 may be a small cost increase per disk, but the more spindles you add, the more you're also increasing power needs, BTU output for cooling, rack footprint, and complexity. Suddenly what may have fit in a single drive shelf now requires 2 or more per server and your footprint is growing very quickly. I've been doing some figures for our own install and if I'm able to get 3+ copies of data between two datacenters using JBOD it appears that I'll be able to cut down our physical footprint (of mailbox servers only) by 65% compared to an equivalent Exchange 2007 infrastructure.

    General discussion....

    RAID is and always has been a high availability feature and not data protection as some folks over the years have thought it to be. RAID doesn't care if it is writing good data or bad, it's going to still stripe it across the spindles and be happy about it.

    If you take that 3 (or more) spindle RAID5 and instead go to an extra mailbox server with an additional data copy, what have you done? You're accepting a small HA hit, but you've doubled your data protection by now having another server inspect & replay the logs before database commitment. It might not fit everyone's business model, but I kind of like the sound of it. With the CAS servers now being responsible for MAPI (except for PFs) via the RPC Client Access Service, the Outlook clients won't even realize they went offline while the fast active DB switchover takes place. Shadow redundancy at the hub transports greatly speeds up the redelivery of any mail which may have been in transit and lost before log copies happened, so that process helps with client transparency as well.

    Food for thought.
    Brian Day / MCSA / CCNA, Exchange/AD geek.
    Friday, September 04, 2009 11:14 PM
  • It would be great if you could use a 32kb stripe size (matches the database block size) on RAID controllers. However that doesn't seem possible these days - stripes are sooo BIG. And 4kb disk sectors make it worse.


    Microsoft recommends a stripe size of 256KB if you are going to use RAID with E14. This will ensure you retain as much of the I/O performance increases they have put into E14 as possible when using RAID. Using smaller stripes is only going to increase the I/O count every time you read and write from the array because the controller must make multiple I/O calls to the disks to find the smaller chunks.
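
    A quick way to see that effect is to count how many per-disk chunks a random 32KB database page read lands on for different stripe sizes. The offsets and sizes below are illustrative assumptions, not measurements.

```python
# How many per-disk chunks does a random 32KB read span for a given
# stripe (chunk) size? Offsets are random 4KB-aligned positions; all
# numbers are assumptions for illustration.

import random

def chunks_touched(offset_kb, size_kb, chunk_kb):
    """Number of per-disk chunks an I/O of size_kb starting at offset_kb spans."""
    first = offset_kb // chunk_kb
    last = (offset_kb + size_kb - 1) // chunk_kb
    return last - first + 1

def average_chunks(size_kb, chunk_kb, samples=100_000):
    random.seed(0)
    return sum(chunks_touched(random.randrange(0, 1 << 20) * 4, size_kb, chunk_kb)
               for _ in range(samples)) / samples

if __name__ == "__main__":
    io_kb = 32                                  # one Exchange 2010 database page
    for chunk_kb in (16, 32, 64, 256):
        print(f"{chunk_kb:>3}KB stripe: ~{average_chunks(io_kb, chunk_kb):.2f} "
              f"physical I/Os per {io_kb}KB read")
```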


    The biggest problem with RAID5/6 besides their normal performance penalties is performance during degraded states when a disk fails or is rebuilding.


    Brian Day / MCSA / CCNA, Exchange/AD geek.
    Monday, November 02, 2009 2:11 AM

  • RAID6 is purpose-designed to make rebuilds possible at a lower priority. I.e. have a disk fail, spin up a hot spare or physically replace it, while still retaining the R in RAID (redundancy) - that means the rebuild from a single-disk failure can be done *behind* other IO, and does not have to impact it significantly. There's no rush to rebuild the array - redundancy is still there...

    To avoid the RAID5 IO penalty (controller chewing up disk time with otherwise pointless physical disk reads of data to calculate parity bits from and then throw away) - we should write a whole stripe at a time, not crossing the stripe boundary.

    Exchange caching should be fine-tuned to do that, IMHO. I.e. cache blocks in memory, keep the message in the transport layer as undelivered, until a whole stripe can be written, or the age of the cached data trips a timer (to stop the transport layer getting clogged), or perhaps the transport layer can tell the storage to "hurry up I need to go...". The bigger the percentage of a contiguous stripe that we can write, the less we need to busy the disk reading in the missing bits...
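
    A toy model of that coalescing idea, purely as a sketch (this is not how Exchange or any RAID controller actually buffers writes): hold page-sized writes until a whole stripe's worth has accumulated, or until the oldest buffered data trips an age timer, then flush.

```python
# Toy write coalescer: flush full stripes when possible (no parity pre-read
# needed), fall back to a partial-stripe flush when data has waited too long.
# Stripe size and timings are invented for illustration.

import time

class StripeCoalescer:
    def __init__(self, stripe_kb, max_age_s, flush):
        self.stripe_kb = stripe_kb    # data held by one full stripe
        self.max_age_s = max_age_s    # how long we are willing to hold data
        self.flush = flush            # callback that performs the real write
        self.buffer_kb = 0
        self.oldest = None            # when the oldest unflushed data arrived

    def write(self, size_kb):
        if self.oldest is None:
            self.oldest = time.monotonic()
        self.buffer_kb += size_kb
        # A full stripe is buffered: write it as one whole-stripe operation.
        while self.buffer_kb >= self.stripe_kb:
            self.flush(self.stripe_kb, full_stripe=True)
            self.buffer_kb -= self.stripe_kb
            self.oldest = time.monotonic() if self.buffer_kb else None
        # Data has waited too long: accept a partial-stripe write.
        if self.oldest is not None and time.monotonic() - self.oldest >= self.max_age_s:
            self.flush(self.buffer_kb, full_stripe=False)
            self.buffer_kb, self.oldest = 0, None

if __name__ == "__main__":
    report = lambda kb, full_stripe: print(
        f"flushed {kb}KB ({'full' if full_stripe else 'partial'} stripe)")
    c = StripeCoalescer(stripe_kb=1792, max_age_s=0.05, flush=report)  # 7 x 256KB chunks
    for _ in range(60):            # sixty 32KB database page writes
        c.write(32)
    time.sleep(0.06)
    c.write(32)                    # a straggler page trips the age timer
```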

    So why do they recommend 256kb stripes? I know it's a nice even 8 x 32kb database blocks but what's the science? How does Exchange per se take advantage of that more than any other stripe size... is it because that size happens to be an "average" for Exchange write size? i.e. based on statistics? Or is it some other reason? (I can't see any other reason than a statistical one...)

    Obviously I'm interested in reading about RAID5/6 recommendations with E14 - can you post a link? Or just let us know the source. I think I only care to say all this because I haven't heard this from Microsoft. All I've heard is "you can use JBOD" (*yak*) :)

    P.S. I'm a fan of smart use of commodity hardware and DAS. Look at XIV, that's all commodity hardware and DAS effectively... RAID6 is commodity level these days, isn't it???
    Monday, November 09, 2009 4:22 AM
  • See my other comments - I still think JBOD is silly, for anyone. Unless you have some smart algorithms and software to analyse DAGs and move mailboxes to level the *storage* across each DAG and level the *IOs* across all DAGs. Of course, inventing that would be fun and who knows how it might advance computer science. But alternatively, why not just use RAID (and it should be RAID6 forever IMHO...)?
    The whole JBOD discussion seems like just wasted energy all over the world ...
    Monday, November 09, 2009 4:31 AM
  • Some additional thoughts.

    Keep in mind that "JBOD" is only an option if you have three or more DAG copies.  And even then, the JBOD option (IMO) is only for the DB/Log storage.  I plan on still RAID1 mirroring my OS/binaries, since I figure a lost OS drive would take down every database copy on that server.   This way a single disk failure on a single server at worst would only impact one database/log disk.

    Monday, November 09, 2009 2:28 PM
  • I think the biggest argument against RAID-5/RAID-6 vs JBOD is the cost of rebuild and the impact of a single drive failure. At least with the implementations I have worked with on the IBM x3650, rebuilds basically take down the volume group (I seem to remember 60-80% performance degradation).

    With JBOD and single database/log alignment, the impact of a failure will be localized to a single database (hopefully a passive copy, no less) and the reseed should not impact performance of the active databases too much, I assume.
    Wednesday, November 11, 2009 12:17 AM
  • So why do they recommend 256kb stripes? I know it's a nice even 8 x 32kb database blocks but what's the science? How does Exchange per se take advantage of that more than any other stripe size... is it because that size happens to be an "average" for Exchange write size? i.e. based on statistics? Or is it some other reason? (I can't see any other reason than a statistical one...)
    256KB stripe helps keep as much data on one spindle as possible to reduce overall read/write penalties behind the storage controller in a non-JBOD configuration. The product team has stated that you can destroy a lot of the I/O improvements they've made by not using a 256KB stripe, thus increasing the number of I/Os the controller must do.
    Brian Day: MCSA 2000/2003, CCNA, MCTS: Microsoft Exchange Server 2010 Configuration, Overall Exchange/AD Geek.
    Wednesday, November 11, 2009 12:31 AM
  • Is 256KB a special number or should the answer be use the biggest stripe your configuration supports (which is typically 256KB)?
    Wednesday, November 11, 2009 5:04 PM
  • Is 256KB a special number or should the answer be use the biggest stripe your configuration supports (which is typically 256KB)?

    That's a good question that I don't have the answer to.

    Obviously to some degree if your controller has copious amounts of write cache the I/O penalty for writes may not be that bad if you used <256KB, but you can't really cache reads so the penalty will still exist since it'll be making multiple read calls for smaller stripes.
    Brian Day: MCSA 2000/2003, CCNA, MCTS: Microsoft Exchange Server 2010 Configuration, Overall Exchange/AD Geek.
    Wednesday, November 11, 2009 5:22 PM
  • With JBOD and single database/log alignment, the impact of a failure will be localized to a single database (hopefully a passive copy, no less) and the reseed should not impact performance of the active databases too much, I assume.

    Just to add my two cents here.  Another advantage of the JBOD configuration is that you have the capability to reseed a database from a passive copy which would cause no penalty to the active database.  The requirement for this is that you have at minimum two copies for every active database, but if you were using JBOD, you should already have this to meet N+1 requirements :)
    Sean | http://seanv.wordpress.com
    Wednesday, November 11, 2009 7:38 PM
  • Question around terminology - are you referring to the Stripe size? Or are you in fact referring to the Block size?

    (a stripe being made up of multiple blocks, one block per disk/spindle)

    When you create a RAID5/6 array, normally you get to choose the block size. Not the stripe size. Stripe size is a function of the number of spindles, and controlling that is problematic - you may have to create a RAID5/6 array that contains a less-than-ideal number of disks, e.g. 9/10 disks, whereas you might prefer to use 12 or 14 to use all slots in your disk chassis for the one array to get the volume size you actually want.

    Wednesday, November 11, 2009 9:40 PM
  • The stripe size. Unfortunately RAID5 has been such a horrible idea for Exchange for so long that I've not worked much with it and can't remember if stripe/block terms get flipped around.

    For Exchange 2007 when configuring RAID1+0 volumes we use a 128KB stripe and for Exchange 2010 when not using JBOD we use 256KB.


    *Edit*

    I just ran through a couple of RAID controllers, one from HP and one from IBM, and I don't see block size being a configurable option for RAID5 at all. They did offer me stripe size as I expected. Are you thinking of the Allocation Size when you format the drive from within Windows?


    Brian Day: MCSA 2000/2003, CCNA, MCTS: Microsoft Exchange Server 2010 Configuration, Overall Exchange/AD Geek.
    Wednesday, November 11, 2009 10:16 PM
  • Hi Brian, RAID5 can be as fast, but it depends upon the controller. ...the importance of the battery-backed write cache (BBWC) on the RAID controller is a little-known fact.

    RAID 5 is NOT substantially slower than RAID 0 - the usual industry claim does not take into account the increased performance that is achieved when BBWC is present and enabled on the controller.

     
    Here is an interesting note from the HP site.
    DESCRIPTION

    HP Smart Array Controllers exhibit improved performance in RAID 5 configurations when a Battery-Backed Write Cache (BBWC) module is present and enabled. This notice describes how the BBWC improves performance.

    DETAILS

    When writing sequential data, the write cache enables the controller to store incoming data and combine it into a full stripe, compute the parity for that stripe, then write the full stripe to the array in a single operation. This can result in improved write performance on a RAID 5 array, achieving write performance close to that of a RAID 0 array.

    When reading data, RAID 5 and RAID 6 (RAID ADG) read performance is typically better than a RAID 1+0 configuration of equal capacity (using the same size, speed, and family of physical hard drives) because there are more drives in a RAID 5 or RAID 6 configuration reading data simultaneously. Additionally, RAID 5 and RAID 6 read performance is approximately equal to RAID 0 read performance in configurations of equal capacity.

    Although industry documentation generally maintains that write performance of RAID 5 is substantially slower than RAID 0, this does not take into account the increased performance that is achieved when BBWC is present and enabled on the controller. Additionally, a larger cache size provides greater write performance than a smaller cache size.

    Note: Write performance is also affected by the type of write operations (random or sequential), stripe depth or distribution factor, number of disks in the array, drive type (Ultra320, Ultra3, Ultra2, SATA, SATA II, SAS, etc.), and drive speed (10K, 15K, etc.)

    Note 2: Tested and confirmed this BBWC read/write performance with a number of controllers and HDDs... and seen the poor rebuild issue without BBWC. Get the BBWC and enable it, and don't dedicate a disk as a hot spare - just put it into the array and use the additional spindle for increased i/o.

    Dream: Microsoft SQL performance improving like the i/o improvements in Exchange 2010.



    Cheers,
    Out_theBack

    Thursday, November 12, 2009 1:17 AM
  • Hi Brian, RAID5 can be as fast, but it depends upon the controller. ...the importance of the battery-backed write cache (BBWC) on the RAID controller is a little-known fact.


    Well aware of BBWC, but as you said it isn't very useful for reads and personally I'd rather be reading from a couple disks with a large stripe than many disks with smaller stripes. :) Storage vendors love to say cache solves all of life's problems, but real life seems to show other trends with Exchange. :)

    Plus we cannot forget the detrimental performance during RAID5 rebuilds.


    Brian Day: MCSA 2000/2003, CCNA, MCTS: Microsoft Exchange Server 2010 Configuration, Overall Exchange/AD Geek.
    Thursday, November 12, 2009 1:45 AM
  • Just noticed the Exchange 2010 Mailbox Server Role Requirements Calculator has a note on RAID configuration on the Storage Design page:

    The recommended RAID stripe size (the unit of data distribution within a RAID set) should be configured to 256KB or greater. 
    Thursday, November 12, 2009 10:54 PM
  • Perhaps you have misread the article: it says that it IS very useful for reads also.

    " When reading data, RAID 5 and RAID 6 (RAID ADG) read performance is typically better than a RAID 1+0"
    AND
    "When writing sequential data, the write cache enables the controller to store incoming data and combine it into a full stripe........ achieving write performance close to that of a RAID 0 array"

    Our tests confirm the value and importance of the BBWC, and the misconception within the industry about the poor performance of RAID 5/6.

    Cheers
    Out_theBack
    Saturday, November 14, 2009 3:33 AM
  • Perhaps you have misread the article: it says that it IS very useful for reads also.

    " When reading data, RAID 5 and RAID 6 (RAID ADG) read performance is typically better than a RAID 1+0"
    AND
    "When writing sequential data, the write cache enables the controller to store incoming data and combine it into a full stripe........ achieving write performance close to that of a RAID 0 array"

    Our tests confirm the value and importance of the BBWC, and the misconception within the industry about the poor performance of RAID 5/6.

    Cheers
    Out_theBack

    I'm pretty sure I didn't, but I'll read it again. Where does it say specifically that BBWC itself makes reads faster?
    Brian Day, Overall Exchange & AD Geek
    MCSA 2000/2003, CCNA
    MCTS: Microsoft Exchange Server 2010 Configuration
    LMNOP
    Saturday, November 14, 2009 3:21 PM
  • Unrecoverable Bit Error rate on large storage seems like a reason not to use RAID 5.

    Interesting article titled "Why RAID 5 stops working in 2009"

    http://blogs.zdnet.com/storage/?p=162
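
    For anyone who wants the article's arithmetic spelled out, here is a back-of-the-envelope sketch. It assumes the commonly quoted 1-in-10^14-bits unrecoverable read error spec for consumer SATA drives; treat the rates and capacities as assumptions, not measured figures.

```python
# Chance of hitting at least one unrecoverable read error (URE) while reading
# every surviving disk during a RAID5 rebuild, per the article's argument.
# The URE rate and disk sizes are assumptions (typical SATA spec-sheet values).

def p_ure_during_rebuild(surviving_disks, disk_tb, ure_per_bits=1e14):
    bits_read = surviving_disks * disk_tb * 1e12 * 8        # decimal TB -> bits
    p_clean = (1 - 1 / ure_per_bits) ** bits_read           # no error on any bit
    return 1 - p_clean

if __name__ == "__main__":
    # A 7 x 2TB RAID5 set that has lost one disk, vs. the same with 500GB disks.
    for disks, tb in ((6, 2), (6, 0.5)):
        print(f"{disks} surviving x {tb}TB disks: "
              f"~{p_ure_during_rebuild(disks, tb):.0%} chance of a URE during rebuild")
```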
    Saturday, December 19, 2009 3:55 AM
  • I don't disagree with everything that person says, but some of the logic is flawed in my opinion. If you have 7 drives in a RAID 5 array and each drive has a 3% chance of failure like their example, you do not have a 21% chance of a drive failing. You still have a 3% chance.
    Brian Day, Overall Exchange & AD Geek
    MCSA 2000/2003, CCNA
    MCTS: Microsoft Exchange Server 2010 Configuration
    LMNOP
    Saturday, December 19, 2009 4:24 PM