Answered by:
Aggregate Transform to return Standard Deviation
Question
Answers

Here's how to do a custom aggregate calculation in an SSIS Script transformation.
First read the instructions here Creating an Asynchronous Transformation with the Script Component. Here "Asyncronous" means that the transform doesn't output rows for each input row. "Unsyncronized" would probably be a better word.
Anyway you configure the script transform's output to not be linked with the input:
Then in the Script transform you need to aggregate the row values as they come through and then output a single row at the end, like this:
double sum_val = 0; double count_val = 0; double sum_sqr_val = 0; public override void Input0_ProcessInputRow(Input0Buffer Row) { sum_val += Row.val; count_val += 1; sum_sqr_val += Row.val * Row.val; } public override void FinishOutputs() { this.Output0Buffer.AddRow(); double avg_of_squares = sum_sqr_val / count_val; double avg = sum_val / count_val; double square_of_avg = avg * avg; this.Output0Buffer.stdev = Math.Sqrt(avg_of_squares  square_of_avg); base.FinishOutputs(); }
And, you'll find that SSIS is full of stuff like this. The builtin transforms and tasks are handy, but not intended to do everything you will ever need. Instead SSIS has really good and easy extensibility for you to implement exactly the behavior you need, when necessary.
David
David http://blogs.msdn.com/b/dbrowne/
 Edited by davidbaxterbrowneMicrosoft employee Thursday, July 25, 2013 8:08 PM
 Marked as answer by Mike YinMicrosoft contingent staff, Moderator Monday, July 29, 2013 1:07 AM

Staging the data in SQL Server is probably the easiest way. You can do it in a data flow task, if you use this formula for Standard Deviation
stdev(x) = sqrt( avg(x*x)  avg(x) * avg(x) )
http://en.wikipedia.org/wiki/Standard_deviation
So create a derived column with the for the squares and in your aggregate calculate the average and the average of the squares. Then add a derived column having a formula like SQRT(avg_of_val_sqr  avg_of_val * avg_of_val).
David
David http://blogs.msdn.com/b/dbrowne/
 Marked as answer by Mike YinMicrosoft contingent staff, Moderator Monday, July 29, 2013 1:07 AM
All replies

No need to transform. There is a function builtin: STDEV. But if you really like to do all the work manually, here is the formula.

No, you cannot extent the aggregate transform.
If you'd want to, you could create your own custom component.If your source is SQL Server, it is much easier and faster to do the calculation in the source query.
I guess that's what Vedran is hinting at.MCSE SQL Server 2012  Please mark posts as answered where appropriate.
 Proposed as answer by Mike YinMicrosoft contingent staff, Moderator Thursday, July 25, 2013 9:54 AM

Crap!!! That really stinks as my source is a flat file. What's up with the Aggregate returning Min, Max and Avg only?  how dumb is that? Am I the only one that thinks the Aggregate Transform is a bit lacking in features or is there another method to calculate STDEV from a flat file source without first having pump into a db table?



Staging the data in SQL Server is probably the easiest way. You can do it in a data flow task, if you use this formula for Standard Deviation
stdev(x) = sqrt( avg(x*x)  avg(x) * avg(x) )
http://en.wikipedia.org/wiki/Standard_deviation
So create a derived column with the for the squares and in your aggregate calculate the average and the average of the squares. Then add a derived column having a formula like SQRT(avg_of_val_sqr  avg_of_val * avg_of_val).
David
David http://blogs.msdn.com/b/dbrowne/
 Marked as answer by Mike YinMicrosoft contingent staff, Moderator Monday, July 29, 2013 1:07 AM

Koen, I appreciate the idea but I can't even conceptualize what that code would look like. In my case creating a SQL Server staging table and utilizing the SQL Server Aggregate functions would be quicker and cleaner. Thanks for your time.
Yeah well, I have no idea either :)
There probably will be some .NET function that can do the trick, but I'm no C# expert.Anyway, SQL Server has a builtin function waiting for you: STDEV
MCSE SQL Server 2012  Please mark posts as answered where appropriate.

Here's how to do a custom aggregate calculation in an SSIS Script transformation.
First read the instructions here Creating an Asynchronous Transformation with the Script Component. Here "Asyncronous" means that the transform doesn't output rows for each input row. "Unsyncronized" would probably be a better word.
Anyway you configure the script transform's output to not be linked with the input:
Then in the Script transform you need to aggregate the row values as they come through and then output a single row at the end, like this:
double sum_val = 0; double count_val = 0; double sum_sqr_val = 0; public override void Input0_ProcessInputRow(Input0Buffer Row) { sum_val += Row.val; count_val += 1; sum_sqr_val += Row.val * Row.val; } public override void FinishOutputs() { this.Output0Buffer.AddRow(); double avg_of_squares = sum_sqr_val / count_val; double avg = sum_val / count_val; double square_of_avg = avg * avg; this.Output0Buffer.stdev = Math.Sqrt(avg_of_squares  square_of_avg); base.FinishOutputs(); }
And, you'll find that SSIS is full of stuff like this. The builtin transforms and tasks are handy, but not intended to do everything you will ever need. Instead SSIS has really good and easy extensibility for you to implement exactly the behavior you need, when necessary.
David
David http://blogs.msdn.com/b/dbrowne/
 Edited by davidbaxterbrowneMicrosoft employee Thursday, July 25, 2013 8:08 PM
 Marked as answer by Mike YinMicrosoft contingent staff, Moderator Monday, July 29, 2013 1:07 AM