I have a function that takes a DataTable as a parameter and returns an object of type NormalData for each column in the data table.
NormalData definition:
public class NormalData
{
    //AttributeName = ColumnName of DataTable
    public string AttributeName { get; set; }
    //each column will have its mean and standard deviation computed
    public double Mean { get; set; }
    public double StandardDeviation { get; set; }
    //a DataTable with three columns will create an IEnumerable<NormalData>
    //with a count of three
}
The following works, but I would like a second opinion on how I implemented it:
public static IEnumerable<NormalData> GetNormalDataByTableColumns(DataTable dt)
{
    //get list of column names to iterate over
    List<string> columnList = GetDataTableColumnNames(dt);
    List<NormalData> normalDataList = new List<NormalData>();
    for (int i = 0; i < columnList.Count; i++)
    {
        //creates a NormalData object for each column in the DataTable
        NormalData normalData = new NormalData();
        //find average
        normalData.Mean = GetColumnAverage(dt, columnList[i]);
        //find stDev
        normalData.StandardDeviation = GetColumnStDev(dt, columnList[i], normalData.Mean);
        normalData.AttributeName = columnList[i];
        //add to NormalDataList
        normalDataList.Add(normalData);
    }
    return normalDataList;
}
private static List<string> GetDataTableColumnNames(DataTable dt)
{
    return (from DataColumn dc in dt.Columns
            select dc.ColumnName).ToList();
}
private static double GetColumnAverage(DataTable dt, string columnName)
{
    return dt.AsEnumerable().Average(x => x.Field<double>(columnName));
}
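private static double GetColumnStDev(DataTable dt, string columnName, double average)
{
    //sum of squared deviations; the second factor needs its own parentheses,
    //otherwise this computes (x - average) * x - average
    var squaredDiffs = dt.AsEnumerable()
        .Sum(x => (x.Field<double>(columnName) - average) *
                  (x.Field<double>(columnName) - average));
    //population standard deviation (divides by N, not N - 1)
    return Math.Sqrt(squaredDiffs / dt.Rows.Count);
}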
What I feel is poor design is the parameter list that GetColumnAverage and GetColumnStDev are required to take. In reality, they should only need a list of numeric types (not necessarily double, but it's hardcoded at the moment) to compute their values. This is, however, the only way I've gotten this to work this morning. What are some rules that I'm breaking here in this design? How can I amend this so that the GetColumn.. functions only take the DataColumn that's being iterated over in the for loop of columnList?
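To make the goal concrete, here is a minimal sketch of the shape I have in mind, not a tested replacement. The statistics helpers take only a sequence of doubles, and a separate helper extracts the values for one DataColumn. The names ColumnStats, ColumnValues, Mean and StandardDeviation are hypothetical, and the sketch assumes every column stores non-null double values and the table has at least one row (Average throws on an empty sequence).

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical helper class; none of these names exist in the code above.
public static class ColumnStats
{
    // Extracts one column as a sequence of doubles.
    // Assumes the column's data type is double and contains no nulls.
    public static IEnumerable<double> ColumnValues(DataTable dt, DataColumn column)
    {
        return dt.AsEnumerable().Select(row => row.Field<double>(column));
    }

    // Knows nothing about DataTable: only needs the numbers.
    public static double Mean(IEnumerable<double> values)
    {
        return values.Average();
    }

    // Population standard deviation (divides by N, as in the code above).
    // Computes the mean itself, so callers do not have to pass it in.
    public static double StandardDeviation(IEnumerable<double> values)
    {
        List<double> list = values.ToList();
        double mean = list.Average();
        double sumOfSquares = list.Sum(v => (v - mean) * (v - mean));
        return Math.Sqrt(sumOfSquares / list.Count);
    }
}
```

With helpers like these, the loop could iterate over dt.Columns directly and call ColumnStats.Mean(ColumnStats.ColumnValues(dt, column)). The trade-off is that StandardDeviation recomputes the mean instead of reusing the one already calculated.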
EDIT: the average variable changes for each column and cannot be reused. Or is it possible that this is OK design, and I just need overloaded versions of these methods for the case where I don't need the standard deviation but only the average?