none
Aggregate Transform to return Standard Deviation

    Question

  • Is there a way to extend the Aggregate Transform to calculate the Standard Deviation along with the other aggregate functions?  Thanks.
    Sunday, July 21, 2013 9:21 PM

Answers

  • Here's how to do a custom aggregate calculation in an SSIS Script transformation.

    First read the instructions here Creating an Asynchronous Transformation with the Script Component.  Here "Asyncronous" means that the transform doesn't output rows for each input row.  "Unsyncronized" would probably be a better word.

    Anyway you configure the script transform's output to not be linked with the input:

    Then in the Script transform you need to aggregate the row values as they come through and then output a single row at the end, like this:

        double sum_val = 0;
        double count_val = 0;
        double sum_sqr_val = 0;
        public override void Input0_ProcessInputRow(Input0Buffer Row)
        {
            sum_val += Row.val;
            count_val += 1;
            sum_sqr_val += Row.val * Row.val;
        }
        public override void FinishOutputs()
        {
            this.Output0Buffer.AddRow();
            double avg_of_squares = sum_sqr_val / count_val;
            double avg = sum_val / count_val;
            double square_of_avg = avg * avg;
            this.Output0Buffer.stdev = Math.Sqrt(avg_of_squares - square_of_avg);
            
            base.FinishOutputs();
        }

    And, you'll find that SSIS is full of stuff like this.  The built-in transforms and tasks are handy, but not intended to do everything you will ever need.  Instead SSIS has really good and easy extensibility for you to implement exactly the behavior you need, when necessary.

    David


    David http://blogs.msdn.com/b/dbrowne/


    Thursday, July 25, 2013 8:07 PM
  • Staging the data in SQL Server is probably the easiest way.  You can do it in a data flow task, if you use this formula for Standard Deviation

    stdev(x) = sqrt( avg(x*x) - avg(x) * avg(x) )

    http://en.wikipedia.org/wiki/Standard_deviation

    So create a derived column with the for the squares and in your aggregate calculate the average and the average of the squares.  Then add a derived column having a formula like SQRT(avg_of_val_sqr - avg_of_val * avg_of_val).

    David


    David http://blogs.msdn.com/b/dbrowne/

    Thursday, July 25, 2013 6:46 PM

All replies

  • No need to transform. There is a function built-in: STDEV. But if you really like to do all the work manually, here is the formula.

    Sunday, July 21, 2013 11:26 PM
  • No, you cannot extent the aggregate transform.
    If you'd want to, you could create your own custom component.

    If your source is SQL Server, it is much easier and faster to do the calculation in the source query.
    I guess that's what Vedran is hinting at.


    MCSE SQL Server 2012 - Please mark posts as answered where appropriate.

    Monday, July 22, 2013 9:56 AM
  • Crap!!!  That really stinks as my source is a flat file.  What's up with the Aggregate returning Min, Max and Avg only? - how dumb is that?  Am I the only one that thinks the Aggregate Transform is a bit lacking in features or is there another method to calculate STDEV from a flat file source without first having pump into a db table?
    Thursday, July 25, 2013 3:10 PM
  • The Aggregate transform is indeed a bit lacking.

    Maybe you can load it into memory with a script component and use a .NET library to calculate stdev.


    MCSE SQL Server 2012 - Please mark posts as answered where appropriate.

    Thursday, July 25, 2013 4:09 PM
  • Koen, I appreciate the idea but I can't even conceptualize what that code would look like. In my case creating a SQL Server staging table and utilizing the SQL Server Aggregate functions would be quicker and cleaner.  Thanks for your time.
    Thursday, July 25, 2013 4:20 PM
  • Staging the data in SQL Server is probably the easiest way.  You can do it in a data flow task, if you use this formula for Standard Deviation

    stdev(x) = sqrt( avg(x*x) - avg(x) * avg(x) )

    http://en.wikipedia.org/wiki/Standard_deviation

    So create a derived column with the for the squares and in your aggregate calculate the average and the average of the squares.  Then add a derived column having a formula like SQRT(avg_of_val_sqr - avg_of_val * avg_of_val).

    David


    David http://blogs.msdn.com/b/dbrowne/

    Thursday, July 25, 2013 6:46 PM
  • Koen, I appreciate the idea but I can't even conceptualize what that code would look like. In my case creating a SQL Server staging table and utilizing the SQL Server Aggregate functions would be quicker and cleaner.  Thanks for your time.

    Yeah well, I have no idea either :)
    There probably will be some .NET function that can do the trick, but I'm no C# expert.

    Anyway, SQL Server has a built-in function waiting for you: STDEV


    MCSE SQL Server 2012 - Please mark posts as answered where appropriate.

    Thursday, July 25, 2013 7:33 PM
  • Here's how to do a custom aggregate calculation in an SSIS Script transformation.

    First read the instructions here Creating an Asynchronous Transformation with the Script Component.  Here "Asyncronous" means that the transform doesn't output rows for each input row.  "Unsyncronized" would probably be a better word.

    Anyway you configure the script transform's output to not be linked with the input:

    Then in the Script transform you need to aggregate the row values as they come through and then output a single row at the end, like this:

        double sum_val = 0;
        double count_val = 0;
        double sum_sqr_val = 0;
        public override void Input0_ProcessInputRow(Input0Buffer Row)
        {
            sum_val += Row.val;
            count_val += 1;
            sum_sqr_val += Row.val * Row.val;
        }
        public override void FinishOutputs()
        {
            this.Output0Buffer.AddRow();
            double avg_of_squares = sum_sqr_val / count_val;
            double avg = sum_val / count_val;
            double square_of_avg = avg * avg;
            this.Output0Buffer.stdev = Math.Sqrt(avg_of_squares - square_of_avg);
            
            base.FinishOutputs();
        }

    And, you'll find that SSIS is full of stuff like this.  The built-in transforms and tasks are handy, but not intended to do everything you will ever need.  Instead SSIS has really good and easy extensibility for you to implement exactly the behavior you need, when necessary.

    David


    David http://blogs.msdn.com/b/dbrowne/


    Thursday, July 25, 2013 8:07 PM