How to Scale my SQL Server 2008 (by 5-fold)

    Question

  • Hi All

    My DB is SQL Server 2008 and I want to scale up the system by 5-fold (maybe more in the future).

    The system currently has 95% reads and 5% writes from the front end. On the back end, the actual data is bulk-loaded using ETL.

    I looked into options like

    1. Scaling with a shared database (using a SAN)
    2. SODA (a new caching mechanism)
    3. Clustering in SQL Server 2008 R2

    but all of these either allow only reads or duplicate the existing data.

    Can someone point me in the right direction? (Replication of data would be my last-but-one option.)

    Thanks
    JC

    Tuesday, April 03, 2012 8:01 AM

Answers

  • Good start, we need to be a little more granular. :-)

    500k users is probably from the application (a web app?) viewpoint. The actual number of concurrent database users is likely a very small fraction of that as there is likely some form of connection pooling/sharing mechanism for efficiency. It may be important to track both depending on your application design but for now, see if you can capture data from SQL Server to track how many users you have there.
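
    If it helps, one quick way to capture that on the SQL Server side is to query the session DMVs (a minimal sketch - filter further by login or host name for your particular application, and sample it periodically to see how the count tracks your peak hours):

        -- Rough sketch: current user sessions vs. sessions actively running a request
        SELECT
            (SELECT COUNT(*)
               FROM sys.dm_exec_sessions
              WHERE is_user_process = 1)                    AS user_sessions,
            (SELECT COUNT(DISTINCT r.session_id)
               FROM sys.dm_exec_requests AS r
               JOIN sys.dm_exec_sessions AS s
                 ON s.session_id = r.session_id
              WHERE s.is_user_process = 1)                  AS sessions_with_active_requests;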

    Next you need to figure out what your hardware resource pattern may look like. In other words, if at 100 concurrent users you consume a CPU, b RAM, c IOPS, d MB/s, etc., what will you consume at 500, 1,000, 5,000, 20,000, etc.? SQL Server maxes out at 32K concurrent connections, but most applications don't even get past the halfway mark even when there are a hundred thousand application users.
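
    To get the per-user numbers in the first place, you can sample the cumulative counters over a fixed window while a known number of users is active (a rough sketch - 'Batch Requests/sec' is stored as a cumulative value, so you have to diff two samples yourself; CPU, RAM and IOPS come more easily from PerfMon):

        -- Sketch: approximate batch requests per second over a 60-second window
        DECLARE @before BIGINT, @after BIGINT;

        SELECT @before = cntr_value
          FROM sys.dm_os_performance_counters
         WHERE counter_name = 'Batch Requests/sec';

        WAITFOR DELAY '00:01:00';

        SELECT @after = cntr_value
          FROM sys.dm_os_performance_counters
         WHERE counter_name = 'Batch Requests/sec';

        SELECT (@after - @before) / 60.0 AS batch_requests_per_sec;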

    Once you have those numbers, you can estimate what your requirements might be as you grow the number of SQL Server users and determine what server, architecture, etc. will meet your needs.

    Of course, don't forget the design, query and hardware optimizations suggested by others. In most environments, the queries or design will be the scale limiter.


    No great genius has ever existed without some touch of madness. - Aristotle

    Friday, April 06, 2012 3:23 AM

All replies

  • Define scale? Performance? Data? Add 10 more processors and 5 times the disk spindles :)

    Seriously, federations may be of interest... it really depends on what you need...

    http://msdn.microsoft.com/en-us/library/ms190381.aspx 
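
    For context, federation in SQL Server is typically built on distributed partitioned views: each member server owns one range of the data (enforced with CHECK constraints), and a view glues the ranges back together. A heavily simplified sketch with made-up server and table names:

        -- Simplified sketch of a distributed partitioned view; linked servers,
        -- CHECK constraints and identical schemas on every member are required
        CREATE VIEW dbo.Customers_All
        AS
        SELECT * FROM ServerA.SalesDB.dbo.Customers_Range1   -- e.g. CustomerID < 5000000
        UNION ALL
        SELECT * FROM ServerB.SalesDB.dbo.Customers_Range2;  -- e.g. CustomerID >= 5000000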


    Louis

    Tuesday, April 03, 2012 2:00 PM
    Moderator
  • Good morning,

    What exactly are you trying to accomplish? Do you want to shard data between multiple servers, or use multiple servers accessing the same data in order to reduce the workload per server?

    Speaking of data sharding/federations - that approach really depends on how you partition your data between servers and how many "shared" entities you need to support. If there are a lot of dependencies between the data (e.g. the possibility of cross-server joins and/or the need for constant data replication between servers), sharding in general would not be the best solution. Of course, even in that case, you can architect your system as SOA and solve some of the issues, but not every system would fit/benefit from that approach.

    If you want to reduce the workload per server, a lot depends on the system. A scalable shared database could be an option, although it would depend on a couple of factors. First, how the data is updated. If it happens in batches, that fits very well - you just disconnect all nodes but one during the update window, convert the DB to read/write mode and do the batch load. It's obviously not the best choice if data is constantly changing. Of course, you can have a separate server/database where you store the data that can be modified and build an app server that combines data from the multiple sources, but in the end that leads to quite a complex design. Another thing to keep in mind about a scalable shared database - you will need to build the "load balancer" yourself, which can also be non-trivial.
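
    As a rough illustration of that update window (the database name is made up, and the SAN volume dismount/remount steps on the reporting nodes are omitted):

        -- On the single writable node, during the update window:
        ALTER DATABASE SalesReporting SET READ_WRITE;
        -- ... run the ETL / batch load here ...
        ALTER DATABASE SalesReporting SET READ_ONLY;
        -- then re-present the read-only volume to the reporting servers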

    Other options depend on requirements. With SQL Server 2008 you can report against a snapshot of the mirrored database. That will require Enterprise Edition, as well as some code to manage snapshots/failovers. You can use log shipping and report against the instances where you ship the logs. You can use replication. Keep licensing in mind, though - you'll need to obtain licenses when you start using secondary servers for reporting. And with SQL Server 2012 you have other options.
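
    For the mirroring option, the reporting snapshot is created on the mirror server along these lines (database, logical file name and path are made up; the logical name must match the source database):

        -- Create a point-in-time, read-only snapshot of the mirror for reporting
        CREATE DATABASE SalesDB_Reporting_Snapshot
        ON ( NAME = SalesDB_Data,
             FILENAME = 'D:\Snapshots\SalesDB_Reporting_Snapshot.ss' )
        AS SNAPSHOT OF SalesDB;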

    Although, to be absolutely frank, if I were in your shoes, my first step would be system optimization and tuning. There is a very good chance that in the long run it would be a cheaper and better option than trying to scale the system across multiple servers (just consider the licensing, hardware, management and development costs). Don't get me wrong - there are some cases when scaling and data sharding make sense - especially if you're dealing with the cloud and the limitations it introduces (DB size in Azure, IO latency in AWS, etc.), but overall it's usually the exception rather than the rule.


    Thank you!

    My blog: http://aboutsqlserver.com


    Tuesday, April 03, 2012 2:46 PM
  • Thanks for the suggestions

    Replicating data for the read side will not fit my need, as the read operations come mostly from UI screens rather than from reports as such.

    As I don't have much flexibility in modifying the scripts at this point in time, I am going with some fine tuning and increasing the hardware (RAM, processors). I am also planning to separate the data into two servers based on regions, which should cut the load almost in half.

    Thanks...

    Wednesday, April 04, 2012 9:28 AM
  • Dmitri,

    First of all, congrats on becoming a SQL Server MVP. It was well deserved.

    Another way to scale out SQL Server is using peer-to-peer replication.
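
    For reference, the publication side of a peer-to-peer setup is essentially a transactional publication with the peer-to-peer option turned on - roughly like this (database and publication names are made up, and the rest of the topology configuration is omitted):

        -- Sketch only: enable a transactional publication for peer-to-peer
        USE SalesDB;
        EXEC sp_addpublication
             @publication = N'SalesPub',
             @enabled_for_p2p = N'true',
             @allow_initialize_from_backup = N'true';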

    Out of curiosity, why won't SQL Server let us scale out with something more like Oracle RAC (Real Application Clusters, in which multiple nodes can read and, most notably, write to the same shared disk)?

    I am a die-hard SQL Server fan (and the only SQL DBA) in a company that employs 16+ Oracle DBAs. Sometimes this troubles me... I am guessing there may be something in the Windows OS that imposes such restrictions on the SQL Server product team.



    • Edited by Chirag Shah Wednesday, April 04, 2012 3:17 PM
    Wednesday, April 04, 2012 3:12 PM
  • Thank you, Chirag!

    Peer-to-peer replication would work, although (like every other technique) it would introduce a lot of headaches :)

    When someone asks me "how do I scale out the system?", I usually answer "why do you need to scale out?" There are, of course, some cases when it makes sense, but usually performance tuning and tiered storage do the trick much more easily and cheaply.

    Based on what I understand, the author wants to copy/replicate data between multiple servers in order to reduce the workload per server. It's not federation/sharding; it's more of a load-balancing model (yes, Oracle RAC would be very nice here, unless IO contention is what the author is trying to solve). I could be wrong, but I'm unaware of any solution on the market that can load-balance clients between multiple SQL Servers. And creating that service is anything but a trivial task. Well, it's not very complicated to make something that just works - for example, selecting a SQL Server on a round-robin basis. But creating a product that works efficiently and takes the current activity/load of the SQL Servers into consideration is a completely different story. I don't want to be the one who has to architect that solution :)


    Thank you!

    My blog: http://aboutsqlserver.com

    Wednesday, April 04, 2012 7:58 PM
    We spend a LOT of energy making SQL Server scale very well as you increase the available hardware resources, and many of the largest database workloads in the world run on a single server (both for SQL Server and other databases, including UNIX platforms).

    In your case, can you define exactly what "scale up the system by 5-fold" means? What is your measure of 5x? Number of transactions per second? Database size? Concurrent users? A mix of a few things?

    A current shipping-class x64 server can hold 8 sockets with up to 80 cores and 2 TB of RAM. It also has over a dozen high-speed PCI-E slots for your HBAs to hook up some serious storage throughput. Not many workloads can make this beast break a sweat.

    If you can share more details about your workload, folks here can provide better guidance on what's appropriate. While there are lots of ways to scale, not all are equal in terms of effort and cost (up front and long term).


    No great genius has ever existed without some touch of madness. - Aristotle

    Wednesday, April 04, 2012 8:37 PM
    Scale-up in my context here means supporting 2.5 million users as opposed to the 500k currently, and more users will be coming in the future. Correspondingly, my concurrent users and the transactions per user are going to go up. But I have a limitation here: I can make only minimal changes to the scripts, as that involves a large amount of testing and development activity. The other thing is that I have the option to move my DB to our private cloud, so infrastructure/cost would not be much of a problem.

    Thursday, April 05, 2012 7:22 AM
  • The easiest way to scale up in the future is to run an optimized system today.

    DB & query optimization article:

    http://www.sqlusa.com/articles/query-optimization/

    Disk optimization:

    http://www.sqlusa.com/bestpractices/configureharddisk/

    You have to perform performance diagnostics (SQL Profiler and PerfMon) to make sure that you have sufficient (correctly configured) CPU, memory and disk resources to support high performance.
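
    On the diagnostics side, a quick way to see which cached queries do the most work (and are therefore the best tuning candidates) is something along these lines - a sketch only; order by whichever resource is your bottleneck:

        -- Top 10 cached statements by total logical reads
        SELECT TOP (10)
               qs.total_logical_reads,
               qs.total_worker_time,
               qs.execution_count,
               SUBSTRING(st.text, qs.statement_start_offset / 2 + 1,
                         (CASE qs.statement_end_offset
                               WHEN -1 THEN DATALENGTH(st.text)
                               ELSE qs.statement_end_offset
                          END - qs.statement_start_offset) / 2 + 1) AS statement_text
          FROM sys.dm_exec_query_stats AS qs
         CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
         ORDER BY qs.total_logical_reads DESC;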

    I have found that the main challenge on a busy consumer website DB system is to support peak usage.


    Kalman Toth SQL SERVER 2012 & BI TRAINING
    New Book: Beginner Database Design & SQL Programming Using Microsoft SQL Server 2012


    Thursday, April 05, 2012 4:52 PM
    Moderator