Transcript RMTD 404

RMTD 404
Lecture 2
Summation Notation
• We need a succinct way to describe the computations that occur in a
statistical analysis
• We use summation notation: $\sum_{i=1}^{N} X_i$
Σ - stands for “sum”
X - stands for the variable we sum
i - the subscripting index; it identifies the individual
values of X
N - the highest value of i we sum to (usually the
number of cases). N could be replaced by a number, but we
usually use a letter like N to indicate that we’re summing
across all values of X (i.e., there are N values of the X
variable).
Summation Notation
• Examples
• $\sum_{i=1}^{N} X_i$ is read as the sum of the values of X ranging from the 1st (the first
unit/person) to the Nth (the last unit/person)
• Say X is a vector of {1,2,3,4,5}
• Using the above summation notation we get 1+2+3+4+5 = 15
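The notation maps directly onto a loop. The following is a minimal Python sketch (the lecture itself works in R and SPSS); the helper name `summation` and its 1-based `lower`/`upper` arguments are illustrative assumptions, not part of the lecture.

```python
def summation(x, lower, upper):
    """Sum x_i for i = lower..upper, using 1-based indices as in the notation."""
    total = 0
    for i in range(lower, upper + 1):
        total += x[i - 1]  # Python lists are 0-based, so X_i is x[i - 1]
    return total


X = [1, 2, 3, 4, 5]
N = len(X)  # number of cases

full_sum = summation(X, 1, N)
print(full_sum)  # 15
```

Replacing the upper limit N with a smaller number sums only the first few values of X.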
Summation Notation
• We can be more specific: $\sum_{i=1}^{4} X_i$
• In this case we are only interested in summing the first 4
integers: 1+2+3+4 = 10
• What do you think about these ones?
Summation Notation
• X = {11,9,8,15,3}
• If i = 2, Xi = 9
• If i = N, Xi = 3 (the Nth case value; N = 5)
• What do we think about this?
Summation Notation
• X = {11,9,8,15,3}
• If i = 2, Xi = 9
• If i = N, Xi = 3 (the Nth case value; N = 5)
• What do we think about these?
• Pay attention to the parentheses – solve those first, then
exponentiate
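The expressions shown on this slide are not in the transcript. Assuming they contrast the sum of squared values, $\sum X_i^2$, with the squared sum, $(\sum X_i)^2$, this Python sketch shows why the parentheses matter. The variable names are illustrative.

```python
X = [11, 9, 8, 15, 3]
N = len(X)

x_2 = X[2 - 1]  # X_i for i = 2 (lists are 0-based, so subtract 1)
x_n = X[N - 1]  # X_i for i = N

sum_of_squares = sum(x ** 2 for x in X)  # square each value, then sum
square_of_sum = sum(X) ** 2              # sum inside the parentheses first, then square

print(x_2, x_n)        # 9 3
print(sum_of_squares)  # 500
print(square_of_sum)   # 2116
```

The two results differ because squaring before summing and summing before squaring are different operations.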
Summation Notation
• Some rules
• Adding a constant
• Multiplying a constant
• Multiplying matched pairs (two vectors)
• Difference between two vectors
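The formulas for these rules are not in the transcript. The Python sketch below checks the standard versions numerically. The second vector `Y` and the constant `c` are hypothetical values chosen for illustration.

```python
X = [11, 9, 8, 15, 3]
Y = [2, 4, 6, 8, 10]  # hypothetical second vector, same length as X
c = 3                 # hypothetical constant
N = len(X)

# Adding a constant: sum(X_i + c) = sum(X_i) + N * c
add_lhs = sum(x + c for x in X)
add_rhs = sum(X) + N * c

# Multiplying by a constant: sum(c * X_i) = c * sum(X_i)
mul_lhs = sum(c * x for x in X)
mul_rhs = c * sum(X)

# Matched pairs: multiply X_i by Y_i first, then sum the products
sum_xy = sum(x * y for x, y in zip(X, Y))

# Difference between two vectors: sum(X_i - Y_i) = sum(X_i) - sum(Y_i)
diff_lhs = sum(x - y for x, y in zip(X, Y))
diff_rhs = sum(X) - sum(Y)

print(add_lhs, add_rhs)         # 61 61
print(mul_lhs, mul_rhs)         # 138 138
print(sum_xy, sum(X) * sum(Y))  # 256 1380
print(diff_lhs, diff_rhs)       # 16 16
```

Note that the sum of matched products (256) is not the product of the two sums (1380).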
Summation Notation
• Don’t let summation notation scare you
• All we’re doing here is summing across a
vector of rows (I) and a vector of columns (J)
         Column 1   Column 2   Column 3   Total
Row 1    5 (X11)    7 (X12)    1 (X13)    13
Row 2    3 (X21)    9 (X22)    0 (X23)    12
Row 3    1 (X31)    2 (X32)    18 (X33)   21
Total    9          18         19         46
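The row, column, and grand totals in the table can be reproduced with a nested sum. This is a small Python sketch; the name `table` and the list-of-rows layout are illustrative choices.

```python
# Each inner list is one row; table[i][j] holds X_(i+1)(j+1)
table = [
    [5, 7, 1],
    [3, 9, 0],
    [1, 2, 18],
]

row_totals = [sum(row) for row in table]        # sum over columns j within each row i
col_totals = [sum(col) for col in zip(*table)]  # sum over rows i within each column j
grand_total = sum(sum(row) for row in table)    # double sum over i and j

print(row_totals)   # [13, 12, 21]
print(col_totals)   # [9, 18, 19]
print(grand_total)  # 46
```

Summing the row totals or the column totals gives the same grand total, so the order of the two sums does not matter.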
Measures of Central Tendency
• To get at the “location” of the distributions we
use measures of central tendency
• We look at location shifts
Measures of Central Tendency
• Mean
• Median
• Mode
X = {5,3,2,9,3,4,9,8,2}
Using R…
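The R output from this slide is not in the transcript. As a stand-in, this Python sketch computes the three measures for the same X using the standard-library `statistics` module.

```python
import statistics

X = [5, 3, 2, 9, 3, 4, 9, 8, 2]

mean = statistics.mean(X)        # sum of the values divided by N
median = statistics.median(X)    # middle value of the sorted data
modes = statistics.multimode(X)  # every value tied for the highest frequency

print(mean)    # 5
print(median)  # 4
print(modes)   # [3, 2, 9]
```

In this dataset 2, 3, and 9 each occur twice, so there is no single mode.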
Distributions: Modality
• Compare the following two graphics
• The left graph shows evidence of a bimodal
distribution (two distinct peaks)
[Figure: two distributions, with the mean, median, and mode marked]
Distributions: Shape
• When talking about shape, we are talking
about kurtosis – the concentration of the data
in the center, shoulders, and tails
[Figure: leptokurtic, mesokurtic, and platykurtic curves, with the center, shoulders, and tails labeled]
Distribution: Skewness
• The left is negatively skewed while the right is
positively skewed
• When skewness is present, our measures of
central tendency aren’t as obvious
[Figure: skewed distributions with the mode, median, and mean marked]
Measures of Variability
• Range – the difference between the two most
extreme points (maximum minus minimum)
• Interquartile Range – the difference between
the 75th and 25th percentiles
• Variance – the average squared deviation from
the mean
• Standard deviation – the square root of the
variance; roughly the typical deviation from the mean
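The four measures can be computed as in the Python sketch below, reusing the X vector from the central tendency slide. Two assumptions are built in: the variance and standard deviation use the sample formulas (dividing by N − 1), and the quartiles use the default method of `statistics.quantiles`. Other software may define percentiles slightly differently and report a different interquartile range.

```python
import statistics

X = [5, 3, 2, 9, 3, 4, 9, 8, 2]

data_range = max(X) - min(X)  # most extreme points: 9 - 2

q1, q2, q3 = statistics.quantiles(X, n=4)  # 25th, 50th, 75th percentiles
iqr = q3 - q1

variance = statistics.variance(X)  # sum of squared deviations / (N - 1)
sd = statistics.stdev(X)           # square root of the variance

print(data_range)  # 7
print(iqr)         # 6.0
print(variance)    # 8.5
print(sd)
```

Dividing by N instead (`statistics.pvariance` and `statistics.pstdev`) gives the population versions.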
Measures of Variability
• Coefficient of Variation - An index that
rescales the standard deviation by the mean
(SD divided by the mean) so that groups measured
on the same scale but with very different means
can be compared (useful for comparing group variability).
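The slide's formula is not in the transcript. This Python sketch assumes the common definition, the standard deviation divided by the mean (sometimes multiplied by 100 to give a percentage). Both groups are hypothetical data.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


group_a = [8, 10, 12]     # hypothetical scores with mean 10
group_b = [98, 100, 102]  # hypothetical scores with mean 100

cv_a = coefficient_of_variation(group_a)
cv_b = coefficient_of_variation(group_b)

print(statistics.stdev(group_a), statistics.stdev(group_b))  # same spread in raw units
print(cv_a, cv_b)  # group_a varies more relative to its mean
```

Both groups have the same standard deviation, but relative to its mean group A is ten times as variable as group B.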
SPSS & R
• Using the NELS student data we can get the
following output for the base-year math
scores
• Using SPSS

Descriptive Statistics
                                     N     Minimum   Maximum   Mean       Std. Deviation
Base-year Math standardized score    270   30.282    71.222    51.71431   10.083413
Valid N (listwise)                   270

• Using R

summary(bytxmstd)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
  30.28   43.17   51.45   51.71   59.66   71.22   30.00
Transformation
• There are some solutions to skewed
distributions
– Linear transformations
• Adding a constant to each case in the
dataset shifts the mean of the distribution
by that value
• We can similarly multiply or divide each
case by some constant
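The effect of both kinds of linear transformation can be checked numerically. This Python sketch reuses the X vector from the central tendency slide; the constants 10 and 2 are arbitrary illustrative choices. It also reports the standard deviation, which the bullets above do not mention.

```python
import statistics

X = [5, 3, 2, 9, 3, 4, 9, 8, 2]

shifted = [x + 10 for x in X]  # add a constant to every case
scaled = [x * 2 for x in X]    # multiply every case by a constant

print(statistics.mean(X), statistics.stdev(X))
print(statistics.mean(shifted), statistics.stdev(shifted))  # mean moves by 10, sd unchanged
print(statistics.mean(scaled), statistics.stdev(scaled))    # mean and sd both double
```

Adding a constant moves the location of the distribution without changing its spread, while multiplying by a constant changes both.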
Transformation
• Standardization is a very common method
• Z-scores express raw scores in standard
deviation units (the resulting scores have a mean of 0 and sd of 1)
• For example, if someone has a GRE score of
620, and the mean is 500, and sd is 100 then…
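The GRE example can be worked through with the usual z-score formula, z = (X − mean) / sd. This is a short Python sketch; the function name `z_score` is an illustrative choice.

```python
def z_score(raw, mean, sd):
    """Distance of a raw score from the mean, in standard deviation units."""
    return (raw - mean) / sd


z = z_score(raw=620, mean=500, sd=100)
print(z)  # 1.2
```

A score of 620 sits 1.2 standard deviations above the mean.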
Transformation
• You can use the following formula to
transform scores to have a mean and
standard deviation of your choosing
• X’ is the transformed score, sx’ is the desired
sd, and Xbar’ is the desired mean
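The formula itself is not in the transcript. This Python sketch assumes the usual form, $X' = \frac{s_{X'}}{s_X}(X - \bar{X}) + \bar{X}'$, which standardizes each score and then rescales it. The function name `rescale` and the target values 50 and 10 are illustrative assumptions.

```python
import statistics


def rescale(scores, new_mean, new_sd):
    """Linearly transform scores to have the requested mean and sample sd."""
    old_mean = statistics.mean(scores)
    old_sd = statistics.stdev(scores)
    # (x - old_mean) / old_sd is the z-score; multiply by new_sd, then add new_mean
    return [new_sd * (x - old_mean) / old_sd + new_mean for x in scores]


X = [5, 3, 2, 9, 3, 4, 9, 8, 2]
T = rescale(X, new_mean=50, new_sd=10)

print(statistics.mean(T))   # approximately 50
print(statistics.stdev(T))  # approximately 10
```

Setting the desired mean to 0 and the desired sd to 1 reduces this to ordinary z-scores.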
Parameters and Statistics (Quick Notation)
            Sample    Population
Mean        X̄         μ
Variance    s_X²      σ²
SD          s_X       σ
Some Important Properties
Sufficiency
Statistic uses all of the information in the sample –
think of the mean, median, and mode…
Unbiasedness
The average of the statistic across all possible samples
equals the parameter of interest –
the expected value is equal to the parameter
Efficiency
Across a large number of samples, the statistic varies less
than another (related) statistic
Resistance
The statistic is not heavily influenced by outliers
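Two of these properties are easy to check on small examples. The Python sketch below uses hypothetical data throughout. For unbiasedness, it lists every sample of size 2 drawn with replacement from a three-value population and averages the sample variance under both divisors. For resistance, it compares the mean and median before and after one extreme value is introduced.

```python
from fractions import Fraction
from itertools import product
import statistics

# Unbiasedness: average the sample variance over all possible samples
population = [1, 2, 3]  # hypothetical population
n = 2                   # sample size
mu = Fraction(sum(population), len(population))
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)  # population variance


def sample_var(sample, divisor):
    """Sum of squared deviations from the sample mean, divided by `divisor`."""
    m = Fraction(sum(sample), len(sample))
    return sum((x - m) ** 2 for x in sample) / divisor


samples = list(product(population, repeat=n))  # all 9 samples, drawn with replacement
avg_unbiased = sum(sample_var(s, n - 1) for s in samples) / len(samples)
avg_biased = sum(sample_var(s, n) for s in samples) / len(samples)

print(sigma2, avg_unbiased, avg_biased)  # 2/3 2/3 1/3

# Resistance: replace one score with an outlier
scores = [2, 3, 4, 5, 6]         # hypothetical scores
with_outlier = [2, 3, 4, 5, 60]  # last score replaced by an extreme value

print(statistics.mean(scores), statistics.median(scores))              # 4 4
print(statistics.mean(with_outlier), statistics.median(with_outlier))  # 14.8 4
```

Dividing by n − 1 recovers the population variance on average, while dividing by n underestimates it. The median does not move when the outlier is introduced, but the mean does.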
Introduction to R
• Basic commands
• Creating variables
• Graphics
• Importing data
Introduction to SPSS
• Descriptives
• Transformations
• Graphics