Integration Services uses transformations to manipulate data during an ETL dataflow. Transformations can be used in [[articles:SSIS Dataflow
Task|Data Flow Tasks]], between data source and destination components, or other transformations.
Sometimes the source data has large volumes, and during development, less rows would be more comfortable to keep the speed of the tests high. The Percentage Sampling transformation can be used for random sampling, to create a small but representative dataset.
A percentage of input rows can be selected using Percentage Sampling, defined in the SamplingValue property - which can be overridden with SSIS Expressions.
From the statistics point of view: the population (and the sampling frame) is the whole input, and the sampling method is simple random sampling - which is a probability sampling: all element of the population has an equal probability of selection. So note
that the count of the selected rows will not be exactly the product of the input row count and the specified percentage: for 100 input rows and 20 as SamplingValue it might mean sometimes 18, sometimes 25. Basically every row has a probability of being selected
as it is defined in the SamplingValue property.
If a specified number of the input rows should be selected by a transformation instead, check the Row Sampling Transformation.
This transformation is a row (synchronous) transformation, and supports one input and two outputs: the sample output and the unselected output. It does not support an error output since data values on the outputs do not depend on expression evaluation.
For more detailed information on this transformation, please refer to
this MSDN article.
For the complete list of SSIS transformations, check this Wiki article.