

Instructional Software for
Statistics & Experimental Psychology

RippleSoft Software

Concept Tutors

Sum of Squares for One-Way ANOVA
Sum of Squares: Calculation of Standard Deviation is Illustrated.
Degrees of Freedom: Thinking Differently about the Mean


Calculation of the Sums of Squares for ANOVA
(One-Way Completely Randomized)

Overview: Click here to open a page with a video of the process
Step 1: Generate and Plot the Data
Step 2: Calculate Total Sum of Squares
Step 3: Calculate Within Groups Sum of Squares
Step 4: Calculate Between Groups Sum of Squares

Step 1: Generate and plot the data

To generate the data set for the ANOVA, click "Generate". Choose whether the null hypothesis is to be true, false, or unknown (to the user).

Step 2: Calculate the Total Sum of Squares

Click on the "Total SS" button to calculate the deviations from the overall mean. The animation first shows a deviation graphically and puts the value in the "Deviations" field. After all deviations are calculated, the values are squared and accumulated in the ANOVA Source Table.
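The arithmetic behind this step can be sketched in a few lines of Python. This is only an illustration, not the tutor's own code: the data set and the function name `total_sum_of_squares` are made up for the example.

```python
# Hypothetical data: three groups of four observations each (not from the tutor).
groups = {
    "A": [4.0, 5.0, 6.0, 5.0],
    "B": [7.0, 8.0, 6.0, 7.0],
    "C": [9.0, 10.0, 8.0, 9.0],
}

def total_sum_of_squares(groups):
    """Sum of squared deviations of every observation from the overall mean."""
    observations = [x for values in groups.values() for x in values]
    overall_mean = sum(observations) / len(observations)
    # One deviation per observation, as in the "Deviations" field.
    deviations = [x - overall_mean for x in observations]
    # Square each deviation and accumulate the result.
    return sum(d * d for d in deviations)

ss_total = total_sum_of_squares(groups)
print(ss_total)  # 38.0 for this hypothetical data (overall mean is 7.0)
```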

Step 3: Calculate Within Groups Sum of Squares

Click the "Within SS" button to calculate the deviation of each observation from the appropriate group mean. Again the deviations are displayed, squared, and accumulated in the Source Table.
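A minimal sketch of the same calculation in Python follows. The data set and the name `within_sum_of_squares` are assumptions made for the example; the only change from the Total SS calculation is that each observation is compared with its own group mean rather than the overall mean.

```python
# Hypothetical data: three groups of four observations each (not from the tutor).
groups = {
    "A": [4.0, 5.0, 6.0, 5.0],
    "B": [7.0, 8.0, 6.0, 7.0],
    "C": [9.0, 10.0, 8.0, 9.0],
}

def within_sum_of_squares(groups):
    """Sum of squared deviations of each observation from its own group mean."""
    total = 0.0
    for values in groups.values():
        group_mean = sum(values) / len(values)
        # Deviations are taken from the group mean, then squared and accumulated.
        total += sum((x - group_mean) ** 2 for x in values)
    return total

ss_within = within_sum_of_squares(groups)
print(ss_within)  # 6.0 for this hypothetical data (each group contributes 2.0)
```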

Step 4: Calculate the Between Groups Sum of Squares

Click the "Bet SS" button to calculate the Between Groups Sum of Squares. Click on any field of the Source Table to display the value. The field on the far right of the "Total" row will display the status of the null hypothesis (True or False).
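The page does not say whether the tutor computes the Between Groups value directly or by subtraction. The sketch below, with made-up data and function names, computes it directly from the group means and then checks the standard identity Total SS = Within SS + Between SS for a one-way design.

```python
# Hypothetical data: three groups of four observations each (not from the tutor).
groups = {
    "A": [4.0, 5.0, 6.0, 5.0],
    "B": [7.0, 8.0, 6.0, 7.0],
    "C": [9.0, 10.0, 8.0, 9.0],
}

def mean(values):
    return sum(values) / len(values)

def sums_of_squares(groups):
    """Return (total, within, between) sums of squares for a one-way design."""
    observations = [x for values in groups.values() for x in values]
    overall_mean = mean(observations)
    ss_total = sum((x - overall_mean) ** 2 for x in observations)
    ss_within = sum(
        (x - mean(values)) ** 2 for values in groups.values() for x in values
    )
    # Each group mean's squared deviation from the overall mean,
    # weighted by the number of observations in that group.
    ss_between = sum(
        len(values) * (mean(values) - overall_mean) ** 2
        for values in groups.values()
    )
    return ss_total, ss_within, ss_between

ss_total, ss_within, ss_between = sums_of_squares(groups)
print(ss_total, ss_within, ss_between)  # 38.0 6.0 32.0 for this data
```

Because the three values are tied together by the identity, any one of them can be recovered from the other two.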

Calculation of the Sum of Squares

This tutor is a straightforward implementation of the defining formula of the standard deviation. Click on the red "n" to toggle between dividing the Sum of Squares by n and by (n - 1).
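The defining formula, including the choice of divisor, can be sketched as follows. The function name, the `divide_by_n` flag, and the scores are assumptions for illustration and are not taken from the tutor.

```python
import math

def standard_deviation(scores, divide_by_n=True):
    """Defining formula: square root of (Sum of Squares / divisor)."""
    n = len(scores)
    mean = sum(scores) / n
    sum_of_squares = sum((x - mean) ** 2 for x in scores)
    # The flag plays the role of the red "n" toggle: n or (n - 1).
    divisor = n if divide_by_n else n - 1
    return math.sqrt(sum_of_squares / divisor)

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores; mean 5, Sum of Squares 32
sd_n = standard_deviation(scores, divide_by_n=True)
sd_n_minus_1 = standard_deviation(scores, divide_by_n=False)
print(sd_n)          # 2.0
print(sd_n_minus_1)  # larger than 2.0, since the divisor is 7 instead of 8
```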

This tool asks students to reframe their thinking about the mean and standard deviation by showing that the calculation process starts with the deviation scores rather than with the mean. The contents of the "Explanation" tab of the "Info" stack are given below:

Explanation: Understanding Degrees of Freedom

The Deviation Score is the beginning.

The Mean is not the beginning.

The value of the Standard Deviation is based on Deviation scores.

  • A Deviation is (observation - "constant").
  • If the "constant" is the Mean, then the total of the deviations is 0.
  • In a manner of speaking, the Standard Deviation is the "Average Deviation Score". Strictly, it is the square root of the average of the squared deviation scores, but the slight misstatement is useful to remember.

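The definition of the Mean is "the number that yields a 'Sum of Deviation Scores' of 0". It can be shown that the sum of the observations divided by the number of observations meets this criterion: setting the sum of (observation - "constant") equal to 0 and solving for the "constant" gives the sum of the observations divided by the number of observations.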
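A quick numerical check of this definition, using made-up scores and a hypothetical helper `sum_of_deviations`: the deviations total 0 when the "constant" is the sum of the scores divided by their number, and they do not total 0 for another constant.

```python
def sum_of_deviations(scores, constant):
    """Total of (observation - constant) over all observations."""
    return sum(x - constant for x in scores)

scores = [3, 5, 8, 12]            # hypothetical scores
mean = sum(scores) / len(scores)  # 28 / 4 = 7.0

at_mean = sum_of_deviations(scores, mean)  # deviations -4, -2, 1, 5
at_six = sum_of_deviations(scores, 6)      # deviations -3, -1, 2, 6

print(at_mean)  # 0.0
print(at_six)   # 4
```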

Changing any observation in a data set changes the values of the deviation scores because a new constant is required to make the Sum of Deviations be 0.

The usual way of thinking puts the cart before the horse. We usually calculate the mean and then the deviation scores (so we can get the value of the standard deviation). In actuality, I first find the set of deviation scores that total 0; the "constant" I subtracted from each observation is given the name "Mean".

So, changing any score in a data set changes the mean because a new set of deviation scores must be specified. If I have 4 scores, any of the four can be changed.

Suppose instead that I am interested in describing how a change in one or more scores affects the standard deviation. To do that, it is important to hold the mean constant. If I don't hold the mean constant, the set of deviation scores is calculated from a new location, and I am confounding changes in the scattering of the scores with changes in where the scores are "centered."

If, instead, I were to lock the mean at a particular value, I could observe how changes in scores affect the scatter of the scores without a corresponding change in the "central" location of the data set.

Locking the mean implies the total of the scores is being held constant. I quickly discover I can't change all of the scores and keep the mean constant. I can, however, arbitrarily choose values for all the scores but one; that one score is forced to become the value that keeps the total of the scores and the mean constant.
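The forced score can be computed directly: it is the locked total (number of scores times the locked mean) minus the sum of the freely chosen scores. The sketch below uses made-up values and a hypothetical helper name, `forced_score`.

```python
def forced_score(free_scores, locked_mean):
    """The one remaining score that keeps the mean at locked_mean."""
    n = len(free_scores) + 1        # the free scores plus the forced one
    locked_total = n * locked_mean  # locking the mean locks the total
    return locked_total - sum(free_scores)

locked_mean = 7  # hypothetical locked mean for a set of four scores

# Three scores chosen arbitrarily; the fourth has no freedom left.
print(forced_score([3, 5, 8], locked_mean))    # 12
print(forced_score([0, 0, 100], locked_mean))  # -72
```

However the three free scores are chosen, appending the forced score gives a set of four whose mean is still 7.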

Notice that locking the mean does not also lock the value of the standard deviation. A set of scores with a particular mean can have any non-negative value for the standard deviation.
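As a small illustration with made-up scores, the two sets below share the same mean but have very different standard deviations. The sketch uses Python's standard `statistics` module, where `pstdev` divides the Sum of Squares by n.

```python
import statistics

# Two hypothetical sets of four scores, both with a mean of 7.
tight = [6, 7, 7, 8]     # Sum of Squares = 2
spread = [1, 4, 10, 13]  # Sum of Squares = 90

mean_tight = statistics.mean(tight)
mean_spread = statistics.mean(spread)

# pstdev divides by n; statistics.stdev would divide by (n - 1).
sd_tight = statistics.pstdev(tight)
sd_spread = statistics.pstdev(spread)

print(mean_tight, mean_spread)  # the same mean for both sets
print(sd_tight, sd_spread)      # clearly different standard deviations
```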

If I have a data set of four observations, three of them are free to vary in an arbitrary manner; one is not. This is the meaning of "Degrees of Freedom": how many scores in the data set are free to change in an arbitrary manner.

The general principle is that one degree of freedom is lost for every constraint put on the data set. In calculating the standard deviation, the constraint is that the sum of deviations must be 0.

© 2005 - 2008 by Burrton Woodruff. All Rights Reserved. Modified Mon, Dec 24, 2007