0

I am currently working on a project for my Data Structures Course that involves a Binary Search Tree built using a doubly linked list-type format (where each node has a left, right, and parent pointer). The class managing the tree also has a root pointer and a current pointer (like a cursor).

One of the functions I need to build calculates the standard deviation of the entire tree. Unfortunately, I'm getting stuck on how to approach this function without using a standard library container (It's heavily discouraged within the course). I understand how to get the sum of the entire tree, but being able to access each number individually to compute the standard deviation is what is confusing me (since I can not store the tree as an array and access the contents sequentially.

Please forgive me if this is a basic question or I am missing some obvious approach. I tend to struggle with data structure concepts and especially when it comes to recursion. I'm not requesting someone to write the function, but at least to point me in the right direction. Thank you!

Edit: Thank you trincot for your help, I was thinking of the method the wrong way but your explanation helped!

  • 1
    What is the formula you're using for standard deviation? The point being is that the formula presented in most statistics books is not friendly since you have to make multiple passes over the data. [See this](https://stackoverflow.com/questions/1174984/how-to-efficiently-calculate-a-running-standard-deviation) for alternatives. – PaulMcKenzie Apr 30 '21 at 02:14
  • *I understand how to get the sum of the entire tree* -- Then getting the variance (which the link I posted describes) in a single pass is no different. Once you have the variance, you simply take the square root of that total to get the standard deviation. This is not really a C++ issue, but a "choosing the appropriate formula" issue. – PaulMcKenzie Apr 30 '21 at 02:24

1 Answers1

0

I understand how to get the sum of the entire tree

So you know how to inspect the value of each node and do something like:

sum += current_node->value;

Then keep track of a count too (unless you already keep track of the count as a list property):

sum += current_node->value;
node_count++;

After you have visited all the nodes, you can calculate the mean:

mean = sum / node_count;

Then traverse all the nodes of the tree a second time, and calculate the sum of the squared deviations, starting from zero:

sum_sq_dev += (current_node->value - mean) * (current_node->value - mean);

After you have visited all the nodes for accumulating that sum of squared deviations, calculate the variance from it, and the standard deviation:

variance = sum_sq_dev / node_count;
std_dev = sqrt(variance);

See Wikipedia on sample variance

trincot
  • 211,288
  • 25
  • 175
  • 211