Class 2 - ellenmduffy

Class 2
Submit Homework
Questions
Quiz
Introduction to Web CT
Descriptive Statistics
Introduction to Excel
Web CT
• Address is http://webct.liu.edu (No www)
• Or you can click on it from my web pages
Logging Into Web CT
• Your I.D. is first name.last name
• Use all lower case & no spaces
• Password is your soc. sec. #, no spaces
Descriptive Statistics
• Organize
• Summarize
• Clarify
• Present Data
Measures of Central Tendency
Mean
Group of friends ages: 29, 27, 22, 31, 32, 27
Number of individuals = n = 6
Each age = X, i.e. X1, X2, X3, X4…
General term is Xi
Sum of the ages = Σ Xi
Mean = Σ Xi/n
Mean
Add the ages = 168
Divide by 6
Mean (Average) = 28
Is this a good representation of the ages?
29, 27, 22, 31, 32, 27
Second Example of Mean
One friend is replaced by a great-grandpa
Ages: 29, 27, 22, 31, 32, 90
Mean = 231/6 = 38.5
Hey!!!
The answer is 38.5
Older than 5 of the 6 people there!
Doesn’t look so representative of the data to
me. What d’ ya think?
Property of the Mean
The mean gives a lot of weight to extreme
values.
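The two examples above can be checked with a minimal Python sketch. The function name `mean` and the variable names are illustrative choices, not part of the course materials.

```python
def mean(values):
    """Mean = (sum of all Xi) / n."""
    return sum(values) / len(values)

friends = [29, 27, 22, 31, 32, 27]
with_great_grandpa = [29, 27, 22, 31, 32, 90]

print(mean(friends))             # 168 / 6 = 28.0
print(mean(with_great_grandpa))  # 231 / 6 = 38.5

# Count how many people are younger than the second mean.
younger = sum(1 for age in with_great_grandpa if age < mean(with_great_grandpa))
print(younger, "of", len(with_great_grandpa))  # 5 of 6
```

One extreme value (90) is enough to pull the mean above five of the six ages.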
The Median
Middle Value of an Array of Values
An array????
The values in ascending or descending
order.
How do we get the Median?
We have to pick one of the values.
Which one?
Pick it by its location in the array.
Formula for the location.
Location = (n+1)/2
Same Two Groups
29, 27, 22, 31, 32, 27
29, 27, 22, 31, 32, 90
There are 6 values
Location = (6+1)/2 = 7/2 = 3.5
Halfway between the 3rd & 4th values
The Median Value
Halfway between 3rd & 4th value
29, 27, 22, 31, 32, 27
29, 27, 22, 31, 32, 90
OOOPS! What 3rd and 4th value???
Do the arrays first
22, 27, 27, 29, 31, 32
Median = 28
22, 27, 29, 31, 32, 90
Median = 30
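The location formula and the two medians above can be reproduced with a short Python sketch. The helper names `median_location` and `median` are made up for this illustration.

```python
def median_location(n):
    """1-based location of the median in the sorted array: (n + 1) / 2."""
    return (n + 1) / 2

def median(values):
    array = sorted(values)              # do the array first
    loc = median_location(len(array))
    lower = array[int(loc) - 1]         # value at the whole-number part of loc
    if loc == int(loc):
        return lower                    # odd n: the location is a whole number
    upper = array[int(loc)]             # even n: take the next value as well
    return (lower + upper) / 2          # halfway between the two

print(median_location(6))                 # 3.5
print(median([29, 27, 22, 31, 32, 27]))   # 28.0
print(median([29, 27, 22, 31, 32, 90]))   # 30.0
```

Replacing 27 with 90 moved the mean from 28 to 38.5 but moved the median only from 28 to 30.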
Commentary
The medians are 28 and 30.
Do these represent the data any better than
the mean did?
Depends on your opinion and on what you
want to convey and the circumstances,
etc.
More about the location
• Another example: 7, 9, 22, 27, 33, 34, 80
• How many values?
• 7
• What is (n+1)/2?
• 4
• The median is the 4th value.
• It's easier to find when there are an odd
number of values.
• What is the median?
• 27
Property of the Median
Completely ignores the extremes
Two more examples
24, 27, 29, 30, 34, 36, 39
2, 8, 10, 30, 40, 66, 90
Median 1 = 30
Median 2 = 30
Mode
Number that occurs most frequently
29, 27, 22, 31, 32, 27
What’s the mode?
27
29, 26, 22, 31, 32, 27
Mode?
There is none.
Is the Mode Useful?
• Sometimes there is no mode
• 24, 24, 27, 29, 30, 30, 80
• The mode?
• There are 2 modes.
• So the usefulness of the mode is limited
Is the Mode Representative of the
Data?
29, 27, 22, 31, 32, 27
What’s the mode?
27
29, 27, 22, 32, 80, 80
What’s the mode?
80
Advantage of the Mode
• Can be used for qualitative data
• Family picnic group
3 children, 1 single adult, 1 widow,
2 married
What’s the mode?
• children
• Patients’ Diseases in Dr. Jones’ practice
10 diabetes, 14 coronary artery disease,
3 asthma
What’s the mode?
• Coronary artery disease
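The mode examples above (one mode, no mode, two modes, and qualitative data) can be sketched in Python. The function name `modes` is hypothetical, and treating "every value occurs once" as "no mode" follows the slides' convention rather than any library default.

```python
from collections import Counter

def modes(values):
    """Return every most-frequent value; an empty list means there is no mode."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []                       # every value occurs once: no mode
    return [value for value, count in counts.items() if count == top]

print(modes([29, 27, 22, 31, 32, 27]))        # [27]
print(modes([29, 26, 22, 31, 32, 27]))        # []
print(modes([24, 24, 27, 29, 30, 30, 80]))    # [24, 30]

# Qualitative data works the same way, because only counts are compared.
picnic = ["child"] * 3 + ["single adult", "widow"] + ["married"] * 2
print(modes(picnic))                          # ['child']
```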
What you should know about
Measures of Central Tendency
• How to determine each of them.
• Properties, advantages, disadvantages
of each.
How Good is our Measure?
• Whenever we summarize data, for
example showing central tendency, we
always show an approximation.
• We are always underestimating or
overestimating or even just throwing away
some data.
• One method of judging our estimate of
central tendency is to look at how closely
the individual values are clustered around
it.
Measures of Dispersion
• Indicate whether the values are
compressed or widely spread.
• 3 patients, ages 23, 24, 27
Mean = 24.67
• 3 patients, ages 1, 33, 40
Mean = 24.67
Measures of Dispersion
RANGE
Highest Value minus Lowest Value
Patients’ ages: 23, 24, 27. X̄ = 24.67
Patients’ ages: 1, 33, 40. X̄ = 24.67
Ranges?
27-23 = 4
40-1 = 39
Properties of the Range
Easy to use – a quick indication.
Immediately showed the difference in our
two sets of patients.
BUT…
• Ignores most of the values
• Uses only the two extremes
Weakness of the Range
7, 20, 33, 48, 60, 70, 80
What’s the Range?
80 – 7 = 73
But look, a very different group
7, 25, 25, 26, 28, 29, 80
Range?
80 – 7 = 73
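A small Python sketch of the range, using the groups from the slides above. The name `value_range` is an arbitrary choice for this illustration.

```python
def value_range(values):
    """Range = highest value minus lowest value."""
    return max(values) - min(values)

print(value_range([23, 24, 27]))   # 4
print(value_range([1, 33, 40]))    # 39

spread_out = [7, 20, 33, 48, 60, 70, 80]
clustered = [7, 25, 25, 26, 28, 29, 80]
print(value_range(spread_out), value_range(clustered))   # 73 73
```

The last two groups get the same range because only the two extremes enter the calculation.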
Interquartile Range
• IQR = Quartile 3 – Quartile 1
• Find quartiles
• Data has to be in an array
• Divide data into 4 equal parts
• Dividing value between the top ¼ and the
bottom ¾ of the values = Q3 = 75th percentile
• Between the top ¾ and the bottom ¼ = Q1
= 25th percentile
Finding the Quartiles
• Have to find locations for Q3 and Q1
• For Q3 loc, find n*3/4
• If it’s not an integer, take the next higher
whole number.
• Find it in the array. That is Q3
• For Q1 loc, find n/4. If it’s not an integer,
take the next higher whole number.
• Find it in the array. That is Q1
Interquartile Range Example
• 29, 27, 22, 31, 32, 27, 31, 43
• What first?
• Array 22, 27, 27, 29, 31, 31, 32, 43
• Q1 = 27
Q3 = 31
• IQR = 31 – 27 = 4
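The quartile rule above can be sketched in Python. The function name `quartile` is hypothetical. The sketch assumes that when the location comes out as a whole number, the value at that location is used as-is, which matches the worked example. Other conventions, including the one in spreadsheet quartile functions, interpolate between values and may give slightly different answers.

```python
import math

def quartile(values, fraction):
    """Value at location n * fraction, rounded up to the next whole number."""
    array = sorted(values)                    # data has to be in an array
    loc = math.ceil(len(array) * fraction)    # 1-based location in the array
    return array[loc - 1]

ages = [29, 27, 22, 31, 32, 27, 31, 43]
q1 = quartile(ages, 1 / 4)    # location 8/4 = 2
q3 = quartile(ages, 3 / 4)    # location 8*3/4 = 6
print(q1, q3, q3 - q1)        # 27 31 4
```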
Spread around the Mean
• Suppose we look at how far each
individual value is from the mean
• And take a sum of Xi – X̄
• Would give a good idea of the spread
• Could even make it into an average
spread by dividing by n
Variance
The Spread Around the Mean
7, 20, 30, 33, 50, 60, 80
The mean is 40
Variance
Difference between each value and the mean
Squared
Totaled
Divided by n-1
Calculate the Variance
7, 20, 30, 33, 50, 60, 80
Mean is 40
(7-40) + (20-40) + (30-40) + (33-40)
+ (50-40) + (60-40) + (80-40)
Finish Calculating the Variance
• Square the Differences
• Take the sum of the Squares
• Divide by n – 1.
• Why not divide by n? Discuss later.
n – 1 = degrees of freedom
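The variance steps above can be followed one at a time in a minimal Python sketch. The variable names are illustrative. The last line uses `statistics.variance` from the Python standard library, which also divides by n – 1, as a cross-check.

```python
import statistics

ages = [7, 20, 30, 33, 50, 60, 80]
n = len(ages)
mean = sum(ages) / n

differences = [x - mean for x in ages]     # each value minus the mean
squares = [d ** 2 for d in differences]    # square the differences
variance = sum(squares) / (n - 1)          # total, then divide by n - 1

print(mean)               # 40.0
print(sum(differences))   # 0.0 (the raw differences always cancel out)
print(sum(squares))       # 3738.0
print(variance)           # 623.0
print(statistics.variance(ages) == variance)   # True
```

Because the unsquared differences sum to zero, they cannot measure spread on their own; squaring removes the signs.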
A Change of Pace
We have more to do on Descriptive
Statistics but let’s do some calculations
using Excel
Excel & Descriptive Stats
Turn on Computers
• Go to Start (lower left)
• All Programs
• Office
• Excel
Excel
• Use as a calculator
• Use statistical package
requires an “add-in”
Getting the Statistical Pkge
• Open Excel, click Tools, look for “Data Analysis”
• If it’s not listed, click Add-Ins
(if you don't see Add-Ins, click on the arrows at
the bottom of the Tools menu)
• Click the box next to Analysis ToolPak and then
click OK
• (You may be told that you have to insert the CD
with the Excel program on it. If so, insert it.)
• You should notice the computer working to
install the add-in
• Click on Tools again and you should see Data
Analysis in the list.
Using Excel as a Calculator
1. List the ages in a column
2. Click on cell after the last age
3. Click Σ on the toolbar
4. Write n in next column & 6 under it.
5. Write Mean in another column
6. In cell under Mean, type =, click on the
cell containing total, type /, click on the
cell containing 6.
To do an Array
So-o-o Easy
• Type ages in a column
• Copy and paste into the next column
• Highlight the values in the copy
• On the toolbar, click the A→Z sort button
To locate the median value
• Remember the formula for location is
Location = (n+1)/2
• In Excel, start all formulas with the equal
sign
• Type in “Location”. In the next cell,
type =(6+1)/2, Enter
To Calculate the Variance
• Put ages in a column & get the sum
• Make a labeled cell for n
• Make a labeled cell for n-1
• Put label “Mean” & under it put
=, click on the cell with sum, then /, click
on cell with value for n, Enter.
Very Important Hint for Excel
• I will be putting models for calculation on
the web site.
• You will see the values, not the formulas
• BUT, you can change to view the formulas
• Hold down control and hit the accent key,
upper left above the tab and under esc.
Copying Formulas
• To copy to the next cells, highlight the cell
and drag the little box in the lower right of
the cell.
• Remember that the cells to be used in the
formula will be adjusted
• To keep a formula pointing at the same cell
when it is copied, use a $ before the column
letter or before the row number or before each.
Formatting Excel
• To make a column wide enough for the
label or values, double-click on the right
border
• To highlight cells adjacent to each other,
click on one, hold down shift key and
move to the other cells.
• To underline a cell, not just the letters in it,
use the little box on the toolbar
• To merge several cells and center the
label, use the little box with 2 arrows
Excel Doing All the Work
• You now know the summation sign on the
toolbar. Click on the little arrow next to it.
We will find lots of “functions” there.
• Find the Mean. Click on an empty cell,
probably the one just under the mean you
already calculated. Click on the functions
arrow.
• Click Average & highlight the column of
ages, Enter.
• You should have the same answer.
Variance
• Click cell under the variance you
calculated
• Click Autofunction (the little arrow by the
summation sign)
• Click More Functions & under “select a
category”, choose Statistical
• Go down to VAR
• Next to the Number 1 box, click on the
little graph, highlight the ages data, Enter,
Enter.
• Should have same answer.
Standard Deviation
• May be the most widely used measure of
dispersion
• Related to the Variance
• It’s just the square root of the variance.
Root-Mean-Square
• Descriptive name for the St Dev
• s = √( ∑(Xi – X̄)² / (n – 1) )
• Go back to Excel and get the standard
deviation by taking square root and by
letting excel do it directly from data
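Both routes to the standard deviation can also be sketched in Python. Reusing the seven values from the variance slides is an assumption made for illustration. `statistics.stdev` is the standard-library function that works directly from the data and divides by n – 1.

```python
import math
import statistics

ages = [7, 20, 30, 33, 50, 60, 80]
mean = sum(ages) / len(ages)
variance = sum((x - mean) ** 2 for x in ages) / (len(ages) - 1)

s = math.sqrt(variance)                    # square root of the variance
print(round(s, 2))                         # 24.96
print(round(statistics.stdev(ages), 2))    # 24.96, directly from the data
```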
To Compare s from 2 Samples
• Use Coefficient of Variation
• s/X̄ times 100. Answer is a percentage.
• Why use CV?
CV
• Removes differences due to units of
measurement
• We might want to know whether serum
cholesterol levels, measured in mg/100ml,
are more variable than body weight,
measured in pounds
Another Advantage of CV
• Variance or st.dev. if applied to 2 groups
with very different means may give
misleading idea of variability
• Wts of 11-year-old boys compared to
weights of 25-year-olds
11’s, X̄ = 80, s = 10
25’s, X̄ = 145, s = 10
Are they equally variable?
• CV’s = 12.5% & 6.9%
• Young boys’ wts are more variable
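The two CVs above can be verified with a few lines of Python. The function name `coefficient_of_variation` is chosen for this sketch only.

```python
def coefficient_of_variation(s, mean):
    """CV = s / mean * 100, expressed as a percentage."""
    return s / mean * 100

cv_11 = coefficient_of_variation(10, 80)     # 11-year-olds
cv_25 = coefficient_of_variation(10, 145)    # 25-year-olds
print(round(cv_11, 1), round(cv_25, 1))      # 12.5 6.9
```

The standard deviations are identical, but relative to the mean the younger group varies almost twice as much.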
End of Ungrouped Descriptives
• Continue these topics using Grouped Data
Grouped Data
We do this all the time
Change in my pocket
3 dimes, 2 nickels, 4 quarters, 6 pennies
(3X10) + (2X5) + (4X25) + (6X1)
Sum = $1.46
Group Mean or Weighted Mean
Average Age of Dr. Jones’ Patients
12, 14, 15, 14, 8, 7, 8, 9, 14, 12
n = 10
∑ = (2X12) + (3X14) + (2X8) + 7 + 9 + 15
X̄ = (24 + 42 + 16 + 31)/10
= 113/10
= 11.3
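The grouped (weighted) mean above can be sketched in Python by counting how often each age occurs and multiplying each value by its count. The variable names are illustrative.

```python
from collections import Counter

ages = [12, 14, 15, 14, 8, 7, 8, 9, 14, 12]
frequency = Counter(ages)          # value -> number of times it occurs

total = sum(value * count for value, count in frequency.items())
n = sum(frequency.values())
print(total, n, total / n)         # 113 10 11.3

# The pocket-change sum works the same way: coin value in cents times count.
coins = {10: 3, 5: 2, 25: 4, 1: 6}
print(sum(value * count for value, count in coins.items()))   # 146
```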
Frequency Distribution
• Group values within intervals
• All intervals of equal size
• Intervals not overlapping
Midpoint of Intervals
• Use midpoint as though it were the actual
value for the individuals in the interval
• Calculate midpoint: (LL + UL)/2
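A frequency distribution with midpoints can be sketched in Python as follows. The ages are Dr. Jones’ patients from the weighted-mean example. The interval limits (5-9, 10-14, 15-19) are an assumption made for this sketch, chosen only so that the intervals are of equal size and do not overlap.

```python
ages = [12, 14, 15, 14, 8, 7, 8, 9, 14, 12]
intervals = [(5, 9), (10, 14), (15, 19)]   # assumed limits, not from the slides

table = []
for lower, upper in intervals:
    count = sum(1 for age in ages if lower <= age <= upper)
    midpoint = (lower + upper) / 2          # (LL + UL) / 2
    table.append((lower, upper, midpoint, count))

for row in table:
    print(row)        # (lower limit, upper limit, midpoint, frequency)

# Use each midpoint as though it were the actual value in its interval.
n = sum(count for _, _, _, count in table)
grouped_mean = sum(midpoint * count for _, _, midpoint, count in table) / n
print(grouped_mean)   # 10.5
```

The grouped estimate (10.5) is close to, but not the same as, the mean of the raw values (11.3), because the midpoints stand in for the individual ages.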
The End for now . . .
• Let’s just try a frequency table in Excel.
On Course Outline, go to
Excel & Descriptive Stats