Lab_1_Descriptive_Statistics - The Evergreen State College

Download Report

Transcript Lab_1_Descriptive_Statistics - The Evergreen State College

Summary Statistics Excel Tutorial
Using Excel to calculate descriptive statistics
Core Quantitative concepts and skills
Statistics: Mean, Mode, Median and Standard
Deviation
Supporting Quantitative concepts and skills
Visual Display of Data – Bar Charts
Prepared for SSAC by
*David McAvity – The Evergreen State College*
© The Washington Center for Improving the Quality of Undergraduate Education. All rights reserved. *2007*
1
Number of Children in the Family
We’ll use this example about the number of children in the families of a class of 80 to
illustrate how to use Excel to calculate a variety of descriptive statistics. The data has been
grouped into a frequency table, where the values, x, are the number of children in a family,
and the frequency,f, is the number of people who have families of that size.
Create a worksheet like this one and enter the data as shown in yellow cells. Use the Sum()
function in the peach cell to add up the frequency column to find the total number of people
in the class. This is called the population size, N.
N f
A
1
2
= cell with a number in it
B
Children in the family
3 x (children)
4
1
5
2
6
3
7
4
8
5
9
6
10
7
11
8
12
9
13 Total
= cell with a formula in it
f ( frequency)
9
32
24
5
4
3
0
2
1
80
Hint: to sum a column click in the cell you want the
sum to be in (eg B13) and type =Sum(). Inside the
parentheses type the cell range (in this case B4:B12).
The entry would look like this:
=Sum(B4:B12)
Instead of typing a cell range you can also just select
the cells you want with the mouse.
2
Graphing the Solution
To get a visual impression of the data it is useful to plot it. You can create a bar chart or pie
chart readily in Excel.
Create the following bar chart using the chart wizard . Make sure you label all the axes.
A
Number of Children in the Family
B
Children in the family
35
3 x (children)
4
1
5
2
6
3
7
4
8
5
9
6
10
7
11
8
12
9
13 Total
f ( frequency)
9
32
24
5
4
3
0
2
1
80
30
Frequency
1
2
25
20
15
10
5
0
1
2
3
4
5
6
7
8
9
Number of Children
To create a bar chart select the frequency column and click on the chart
wizard icon.
Choose the column chart type then click next. In the next
dialogue box click the series tab and then fill in the correct range of values
in the category (X) axis labels. Click next and then enter axis and title labels
3
Measures of Central Tendency
Now we use Excel to assist with calculating the three measures of central tendency, the mode,
median and mean. The mode is the most common value, the median is the middle value of ranked
data and the mean is the sum of the values divided by the total number of values:
x f
x
N
Write Excel formulas in the peach colored cells in order to help you evaluate the mean, mode
and median. Write down or calculate each of those statistics in separate cells of your
worksheet. ( You should get a mean of 2.85)
A
1
2
B
C
D
Children in the family
3 x (children)
4
1
5
2
6
3
7
4
8
5
9
6
10
7
11
8
12
9
13 Total
f ( frequency)
9
32
24
5
4
3
0
2
1
80
f
xf
9
41
65
70
74
77
77
79
80
9
64
72
20
20
18
0
16
9
228
This column is the cumulative sum of
frequencies (ie a running total). Each entry is
the sum of all previous frequencies. You need
this to find the median value.
Hint: Need help with entering formulas?
Here is an example: If you have a number in
Cell B2 and you want to calculate the square of
it in Cell C2 then in C2 you type =B2^2.
B
2
C
4 =B2^2
If you have another value in B3 then if you copy
cell C2 to C3 the formula will update to
reference C3. This is called a relative reference.
If you want the formula to always refer to a
particular cell (eg B2) use an absolute
reference: type =$B$2^2 instead.
B
2
C
4 =$B$2^2
4
Standard Deviation
The standard deviation is the measure of the spread of data. It is a measure of the average
distance of values from the mean value. More specifically it is the square root of the mean
squared deviations from the mean. Its formula is:

 (x  x)
2
f
N
Use Excel formulas to calculate the last column below. It will be helpful to use an absolute
reference to the cell containing the mean value in your formula, so that when you copy the
formula to new rows the reference to the mean does not change. Now calculate the
standard deviation in a separate cell. (You should get 1.58).
A
1
2
B
C
D
f ( frequency)
f
xf
E
Children in the family
x (children)
3
4
1
5
2
6
3
7
4
8
5
9
6
10
7
11
8
12
9
13 Total
9
32
24
5
4
3
0
2
1
80
9
41
65
70
74
77
77
79
80
(x  x)2 f
9
64
72
20
20
18
0
16
9
228
30.8025
23.12
0.54
6.6125
18.49
29.7675
0
53.045
37.8225
200.2
5
Lab Assignment
You have used Excel to calculate descriptive statistics from data that is arranged in a frequency table. If the data is not in a frequency table, but is
entered as raw data in a column or row, Excel has function that can calculate the mean, mode, median and standard deviation directly, without
having to use the mathematical equations explicitly. In the following exercises I want you to calculate the descriptive statistics using the method in
this tutorial.
1.
After the class has completed the following on-line Number Games question set*, upload the data file called
Number_Games_Data.csv from our program space on the MASU server. Organize the data from each question
into frequency tables (count up the number of people giving each response), create a bar chart, and than find the
mean, mode, median and standard deviation for each question using the method you learned in this tutorial.
2.
After calculating the statistics reflect on the answers a bit, did they turn out as expected? In Question 1 does the
distribution look random – are people choosing number randomly as asked? In Question 2, how successful were
people at choosing the same number as everyone else? If you were asked this question again, what number
would you choose? In Question 3, how successful were people at not choosing the same number as everyone
else? Were you successful? In Question 4, is the mean value less than in Question 1?. What is 2/3 of the value of
the mean in Question 4. Did you guess right?
3.
Now answer the Number Games question set again. Pay particular attention to question 4. If your goal was to
choose a number equal to the mean, then the mean you calculated from question 1 might be a good guess.
However, question 1 asks you to guess the number closest to 2/3 of the mean, which is less. But if everyone
chooses a number smaller than the mean value in question 1, then the mean value in question 4 be smaller. Hmm
…
3.
Write a one page mini-paper discussing the results, using the statistics as supporting evidence. Your paper
should included a comparison of means and standard deviations you calculated for each response, and offer a
suggestion about why or why not you see a difference. Your paper should include an introductory paragraph
introducing what you think the game theoretical idea behind the four questions might be, a main body where you
use statistics to illustrate or support some point you are making, and a concluding paragraph where you state your
main “findings”. You should include graphs in your paper to make direct comparisons, but do not included too
many. This is a short paper. I am mainly interested in finding out how well you understand what the numbers you
calculated mean. The paper is due Wednesday morning of week 2
*here is the full html address in case the above link is not accessible.
http://spreadsheets.google.com/viewform?formkey=dGtQRlRlR2xkc1lLMGR6VUNva0FOOFE6MA..
6