Adding Standard Deviation to LINQ

by Chris Bennett

Learn how to implement a simple Standard Deviation function for LINQ (Language Integrated Query).

If you have performed much in the way of basic data analysis using LINQ (Language Integrated Query), you probably have noticed that standard deviation is not included out of the box. Why Standard Deviation is missing is a mystery as LINQ does include the other basic aggregators such as min, max, avg, sum, and count. To construct a function to perform standard deviation over a set of numbers, you need to examine the math. Figure 1 shown below is the basic standard deviation function.

Figure 1: The math behind the basic stand deviation function.

In simpler terms, standard deviation is calculated as follows:

1. Calculate the average of the values.
2. Subtract each value from the average.
3. Take the result to the second power.
4. Sum up the result of steps 1-3 over all of the values.
5. Divides the sum by the number of values.
6. Take the square root of the result.

Create the Standard Deviation Function

Now that you are familiar with the math, you can create a function that performs this math over a set of values of type double, as shown below.

```private double CalculateStdDev(IEnumerable<double> values)
{
double ret = 0;

if (values.Count() > 0)
{
//Compute the Average
double avg = values.Average();

//Perform the Sum of (value-avg)^2
double sum = values.Sum(d => Math.Pow(d - avg, 2));

//Put it all together
ret = Math.Sqrt((sum) / values.Count()-1);
}
return ret;
}
```

LINQ Usage Example

This function performs the math shown above; it also confirms that the IEnumerable passed in contains elements. Before you create the LINQ statements to utilize this function, you will start with a basic query to calculate the average of the sub total and number of orders for each Customer ID, as shown below.

```var query = from rec in dc.SalesOrderHeaders
group new { rec } by new
{
rec.CustomerID
} into g
select new {
g.Key.CustomerID,
Avg = g.Average(p =>
Convert.ToDouble(p.rec.SubTotal)),
Count = g.Count()
};
```

Next, you add the statements to utilize the function you just created and return standard deviation, as shown below.

```var query = from rec in dc.SalesOrderHeaders
group new { rec } by new
{
rec.CustomerID
} into g
select new {
g.Key.CustomerID,
Avg = g.Average(p => Convert.ToDouble(p.rec.SubTotal)),
Count = g.Count(),
StdDev = CalculateStdDev(from o in dc.SalesOrderHeaders
where o.CustomerID == g.Key.CustomerID
select Convert.ToDouble(o.SubTotal))
};
```

This LINQ query will return the CustomerID, Order Sub Total Average, and the Number of Orders as in the first example. Additionally, it will now return the standard deviation. It does this through a call to the CalculateStdDev and through the use of a nested query.

Conclusion

You have created the standard deviation function as well as an example to demonstrate the results in LINQ. It is important to note that using this function with LINQ will increase the number of calls to the database. Essentially, LINQ will execute the outer query to retrieve the overall results, and then it will return the order subtotal for each customer, one at a time, and execute the call to CalculateStdDev. This method is primarily geared towards relatively small results sets. If you are working on large results instead, it would be more efficient to create a store procedure or a table-valued function and use LINQ to return the results.