Transcript EXAM

1. Homework #2
2. Inferential Statistics
3. Review for Exam
HOMEWORK #2: Part A

Sanitation Eng.: Z = .53; area = .2019 + .5000 = .7019
F.C.: Z = .67; area = .2486 + .5000 = .7486

5 GPAs: which are in the top 10%?

GPAs of 3.0 and 3.20 are not:
Z = (3.0-2.78)/.33 =.67
 Area beyond = .2514 (25.14%)
 Z=(3.20-2.78)/.33=1.27
 Corresponds to .8980 (.3980+.5000)
 Area beyond = .1020 (10.2%)
By contrast, for 3.21…
 Z=(3.21-2.78)/.33=1.30
 Corresponds to .9032 (.4032+.5000)
 Area beyond = .0968 (9.68%), so 3.21 is in the top 10%
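
The GPA check above can be reproduced in a few lines. This is a minimal sketch, assuming the mean (2.78) and standard deviation (.33) from the solution; the function names are illustrative, and Python's NormalDist stands in for the printed Z-score table, so the last digit can differ slightly from table values.

```python
from statistics import NormalDist

MEAN_GPA = 2.78  # mean used in the homework solution
SD_GPA = 0.33    # standard deviation used in the homework solution

def z_score(x, mean, sd):
    """Z = (score - mean) / standard deviation, rounded to 2 places like the table."""
    return round((x - mean) / sd, 2)

def area_beyond(z):
    """Proportion of the normal curve above z (the table's 'area beyond')."""
    return 1 - NormalDist().cdf(z)

for gpa in (3.0, 3.20, 3.21):
    z = z_score(gpa, MEAN_GPA, SD_GPA)
    beyond = area_beyond(z)
    # A GPA is in the top 10% when no more than 10% of the curve lies above it.
    print(f"GPA {gpa:.2f}: Z = {z:.2f}, area beyond = {beyond:.4f}, top 10%: {beyond <= 0.10}")
```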


HOMEWORK #2: Part B

Question 1

a. Mean=18.87; median=15; mode=4

b. The mean is higher than the median because the distribution is positively
skewed (several large cities with high percentages)

c. When you remove NYC, the mean drops to 16.43 and the median
goes from 15 to 14.5. Removing NYC’s high value from the
distribution reduces the skew.
 The mean decreases more than the median because the value
of the mean is influenced by outlying values; the median is
not (it only moves one case over).
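
To see why an outlier pulls the mean more than the median, here is a small sketch. The numbers are hypothetical (they are NOT the homework data); the last value plays the role of a high outlier such as NYC.

```python
from statistics import mean, median

# Hypothetical percentages, not the homework data; 60 is the outlier.
values = [4, 4, 9, 12, 14, 15, 16, 18, 22, 60]
trimmed = values[:-1]  # same list with the outlier removed

mean_drop = mean(values) - mean(trimmed)
median_drop = median(values) - median(trimmed)

print(f"mean:   {mean(values):.2f} -> {mean(trimmed):.2f} (drop of {mean_drop:.2f})")
print(f"median: {median(values):.2f} -> {median(trimmed):.2f} (drop of {median_drop:.2f})")
```

The mean uses the size of every score, so it shifts by several points; the median only slides half a step between neighboring cases.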

Question 2
 For this problem, there are two measures of central tendency
(indicating the “typical” score).


The mean per student expenditure was almost $2,000 higher in
2003 ($9,009) than in 1993 ($7,050).
The median also increased, but not nearly as much (from $7,215
to $7,516).

The spread of the scores, as indicated by the standard deviation,
was more than twice as large in 2003 (1,960) as it was in 1993 (804).

Shape

For 1993, the distribution of scores has a slight negative skew;
this distribution is essentially normal (bell-shaped) as the mean
($7,050) and median ($7,215) are similar. By contrast, for 2003,
the mean is much greater than the median; this distribution has a
strong positive skew.
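
The mean-versus-median rule used above can be sketched as follows. The function name is illustrative; the figures are the ones reported for this question. The rule gives direction only, so the size of the gap is printed to judge how strong the skew is.

```python
def skew_direction(mean, median):
    """Mean above median suggests positive skew; mean below suggests negative skew."""
    if mean > median:
        return "positive"
    if mean < median:
        return "negative"
    return "none"

# (mean, median) per-student expenditure, as reported in the homework
spending = {1993: (7050, 7215), 2003: (9009, 7516)}

for year, (mean, median) in spending.items():
    gap = mean - median
    print(f"{year}: mean - median = {gap:+,} -> {skew_direction(mean, median)} skew")
```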

Q3






a. 53.28%
 Opposite sides of mean, add 2 areas together
b. 6.38%
 Both scores on right side of mean, subtract areas
c. 10.56%
 “Column C” area for Z=1.25 is .1056
d. 69.15%
 “Column B” area for Z= -0.5 is .1915 + .5000 (for other half
of normal curve)
e. 99.38%
 Z=2.5; Column B (for area between 2.5 & 0) = .4938 +
.5000 (for other half of normal curve)
f. 6.68%
 Z = -1.5; Column C for area beyond -1.5 =.0668
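
The "Column B" and "Column C" lookups for parts c through f can be checked with a short sketch. The helper names column_b and column_c are illustrative stand-ins for the table columns, and NormalDist replaces the printed table, so values are rounded to 4 places to match it.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard normal curve: mean 0, standard deviation 1

def column_b(z):
    """Area between the mean and z (same for +z and -z)."""
    return abs(_STD.cdf(z) - 0.5)

def column_c(z):
    """Area beyond z, in the tail on z's side of the mean."""
    return 0.5 - column_b(z)

print(f"c. {column_c(1.25):.4f}")        # area beyond Z = 1.25
print(f"d. {column_b(-0.5) + 0.5:.4f}")  # Column B for Z = -0.5 plus the other half
print(f"e. {column_b(2.5) + 0.5:.4f}")   # Column B for Z = 2.5 plus the other half
print(f"f. {column_c(-1.5):.4f}")        # area beyond Z = -1.5
```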

Q4






a. .9953
 Column B area (.4953) + .5000 (for other half of normal
curve)
b. .5000
 50% of area on either side of mean (47)
c. .6826
 “Column B” for both – .3413 + .3413
d. .9997
 Column B area (.4997) + .5000 (for other half of normal
curve)
e. .0548
 “Column C” area for Z=1.6
f. .3811
 Scores on opposite sides of mean → add “Col. B” areas
HOMEWORK #2: Part C

SPSS:
 All the info needed to answer these questions is contained in this output:

Statistics: HOURS PER DAY WATCHING TV
N               Valid       1426
                Missing      618
Mean                        3.03
Median                      2.00
Mode                           2
Std. Deviation             2.766
Percentiles     10          1.00
                20          1.00
                25          1.00
                30          2.00
                40          2.00
                50          2.00
                60          3.00
                70          3.00
                75          4.00
                80          4.00
                90          6.00
[Figure: Distribution (Histogram) for TV Hours]
[Figure: Sibs Distribution]
[Figure: College Science Credits]
Sampling Terminology


Element: the unit of which a population is
comprised and which is selected in the sample
Population: the theoretically specified
aggregation of the elements in the study (i.e., all
elements)

Parameter: Description of a variable in the population


σ = standard deviation, µ = mean
Sample: The aggregate of all elements taken
from the population

Statistic: Description of a variable in the sample (estimate of
parameter)

X̄ = mean, s = standard deviation
Non-probability Sampling

Elements have unknown odds of selection

Examples


Snowballing, available subjects…
Limits/problems


Cannot generalize to the population of interest (the sample doesn’t
adequately represent the population, i.e., bias)
Have no idea how biased your sample is, or how close
you are to the population of interest
Probability Sampling

Definition:


Elements in the population have a known (usually
equal) probability of selection
Benefits of Probability Sampling

Avoid bias



Both conscious and unconscious
More representative of population
Use probability theory to:


Estimate sampling error
Calculate confidence intervals
Sampling Distributions

Link between sample and population

DEFINITION 1


IF a large (infinite) number of independent,
random samples are drawn from a population,
and a statistic is plotted from each sample….
DEFINITION 2

The theoretical, probabilistic distribution of a
statistic for all possible samples of a certain
size (N)
The Central Limit Theorem I

IF REPEATED random samples are drawn from
the population, the sampling distribution of sample means will
be approximately normally distributed


As long as N is sufficiently large (>100)
The mean of the sampling distribution will equal
the mean of the population

WHY? Because the most common sample mean will
be the population mean


Other common sample means will cluster around the
population mean (near misses) and so forth
Some “weird” sample findings will occur, though rarely
The Central Limit Theorem II

Again, WITH REPEATED RANDOM SAMPLES,
the standard deviation of the sampling
distribution = σ / √N

This Critter (the population standard deviation
divided by the square root of N) is “The Standard
Error”

How far the “typical” sample statistic falls from the
true population parameter
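
Both parts of the Central Limit Theorem can be checked with a small simulation. This is a sketch under stated assumptions, not course material: the population is an assumed exponential distribution (strongly skewed, with µ = 1 and σ = 1), and 2,000 samples stand in for the "infinite" number of samples.

```python
import random
from math import sqrt
from statistics import mean, pstdev

rng = random.Random(42)  # fixed seed so the run is repeatable
N = 100                  # cases per sample
NUM_SAMPLES = 2000       # finite stand-in for "infinite" repeated samples

# Assumed population: exponential with rate 1 (positively skewed), so mu = 1 and sigma = 1.
MU, SIGMA = 1.0, 1.0

sample_means = []
for _ in range(NUM_SAMPLES):
    sample = [rng.expovariate(1.0) for _ in range(N)]
    sample_means.append(sum(sample) / N)

standard_error = SIGMA / sqrt(N)  # sigma / sqrt(N) = 0.1 here

print(f"mean of sample means: {mean(sample_means):.3f} (population mean = {MU})")
print(f"SD of sample means:   {pstdev(sample_means):.3f} (sigma / sqrt(N) = {standard_error})")
```

Even though the population is skewed, the sample means pile up around µ, and their spread comes out close to σ / √N.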
The KICKER

Because the sampling distribution is normally
distributed…. Probability theory dictates the
percentage of sample statistics that will fall
within a given number of standard errors of the population parameter



1 standard error on one side = 34%, or +/- 1 standard error = 68%
+/- 1.96 standard errors = 95%
+/- 2.58 standard errors = 99%
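
These percentages come straight from the normal curve. A quick sketch (the function name share_within is illustrative) recovers them with Python's NormalDist:

```python
from statistics import NormalDist

def share_within(k):
    """Share of a normal distribution within +/- k standard errors of its center."""
    std = NormalDist()
    return std.cdf(k) - std.cdf(-k)

for k in (1, 1.96, 2.58):
    print(f"+/- {k} standard errors: {share_within(k):.2%}")

# One side only: between the center and 1 standard error above it
print(f"one side, 1 standard error: {share_within(1) / 2:.2%}")
```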
The REAL KICKER

From what happens (probability theory) with an
infinite # of samples…

To making a judgment about the accuracy of statistics
generated from a single sample

Any statistic generated from a single random sample has a
68% chance of falling within one standard error of the
population parameter

OR roughly a 95% CHANCE OF FALLING WITHIN 2 STANDARD
ERRORS
EXAM

Closed book

BRING CALCULATOR

You will have the full class period to complete it

Format:
 Output interpretation
 Z-score calculation problems
 Memorize Z formula
 Z-score area table provided
 Short Answer/Scenarios
 Multiple choice
Review for Exam

Variables vs. values/attributes/scores

variable – trait that can change values from case to case



example: GPA
score (attribute)– an individual case’s value for a given
variable
Concepts → Operationalize → Variables

Short-answer questions, examples:
 What is a strength of the standard deviation over other
measures of dispersion?

Multiple choice question examples:
 Professor Pinhead has an ordinal measure of a variable called
“religiousness.” He wants to describe how the typical survey
respondent scored on this variable. He should report the ____.





a. median
b. mean
c. mode
d. standard deviation
On all normal curves the area between the mean and +/- 2
standard deviations will be




a. about 50% of the total area
b. about 68% of the total area
c. about 95% of the total area
d. more than 99% of the total area
EXAM

Covers chapters 1 through (part of) 6:

Chapter 1

Levels of measurement (nominal, ordinal, I-R)



Any I-R variable could be transformed into an ordinal or
nominal-level variable
Don’t worry about discrete-continuous distinction
Chapter 2

Percentages, proportions, rates & ratios

Review HW’s to make sure you’re comfortable interpreting
tables

Chapter 3: Central tendency


ID-ing the “typical” case in a distribution
Mean, median, mode




Appropriate for which levels of measurement?
Identifying skew/direction of skew
Skew vs. outliers
Chapter 4: Spread of a distribution



R & Q (range and interquartile range)
s² – variance (mean of squared deviations)
s – standard deviation



Uses every score in the distribution
Gives the typical deviation of the scores
DON’T need to know IQV (section 4.2)
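
As a quick refresher on these measures of spread, here is a minimal sketch. The scores are hypothetical, and the variance follows the wording above (the mean of the squared deviations, so it divides by N).

```python
from math import sqrt

scores = [2, 4, 4, 5, 7, 8, 10, 12]  # hypothetical scores, for illustration only

n = len(scores)
mean = sum(scores) / n

score_range = max(scores) - min(scores)             # R: high score minus low score
variance = sum((x - mean) ** 2 for x in scores) / n  # mean of squared deviations
s = sqrt(variance)                                   # standard deviation

print(f"mean = {mean}, range = {score_range}, variance = {variance}, s = {s:.3f}")
```

Every score enters the variance and s, which is why they use more information than the range.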
Keep in mind…

All measures of central tendency try to
describe the “typical case”

Preference is given to statistics that use the most
information



For interval-ratio variables, unless you have a highly
skewed distribution, mean is the most appropriate
For ordinal, the median is preferred
If mean is not appropriate, neither is “s”

s = how far cases typically fall from the mean
EXAM

Chapter 5

Characteristics of the normal curve


KNOW Z score formula



Know areas under the curve (Figure 5.3)
Be able to apply Z scores
 Finding areas under curve
Z scores & probability
Frequency tables & probability

Chapter 6






Reasons for sampling
Advantages of probability sampling
What does it mean for a sample to be representative?
Definition of probability (random) sampling
Sampling error
Plus…

Types of nonprobability sampling
Interpret
1. Number of cases used to calculate mean?
2. Most common IQ score?
3. Distribution skewed? Direction?
4. Q?
5. Range?
6. Is standard deviation appropriate to use here?

Statistics: Total IQ Score
N               Valid       1826
                Missing     9092
Mean                       88.98
Median                     91.00
Mode                          94
Std. Deviation            20.063
Minimum                        0
Maximum                      160
Percentiles     10         63.00
                20         74.00
                25         78.00
                30         80.00
                40         86.00
                50         91.00
                60         95.00
                70        100.00
                75        103.00
                80        105.00
                90        112.00
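
One way to practice reading this output is to compute the first five answers from it. The sketch below assumes the labels and numbers in the Total IQ Score output pair up in order (Valid = 1826, Missing = 9092, and so on) and that Q means the interquartile range (75th minus 25th percentile); the dictionary keys are illustrative.

```python
# Values read off the Total IQ Score output (pairing of labels and numbers is assumed)
iq = {
    "n_valid": 1826, "n_missing": 9092,
    "mean": 88.98, "median": 91.00, "mode": 94,
    "minimum": 0, "maximum": 160,
    "p25": 78.00, "p75": 103.00,
}

cases_used = iq["n_valid"]                 # 1. only valid cases enter the mean
most_common = iq["mode"]                   # 2. the mode is the most common score
skew = ("negative" if iq["mean"] < iq["median"]
        else "positive" if iq["mean"] > iq["median"] else "none")  # 3. mean vs. median
q = iq["p75"] - iq["p25"]                  # 4. interquartile range
score_range = iq["maximum"] - iq["minimum"]  # 5. high score minus low score

print(cases_used, most_common, skew, q, score_range)
```

Question 6 is a judgment call about skew and level of measurement, so it is left to the reader.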
Scenario

Professor Scully believes income is a good
predictor of the size of a person’s house




IV?
DV?
Operationalize DV so that it is measured at all
three levels (nominal, ordinal, IR)
Repeat for IV
Express the answer in the
proper format




Percent
Proportion
Ratio
Probability
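
As an illustration of the four formats, the sketch below reuses the valid and missing case counts from the TV hours output in Homework #2, Part C (1426 valid, 618 missing). The choice of example and the variable names are assumptions made for illustration.

```python
valid, missing = 1426, 618  # case counts from the TV hours output
total = valid + missing

proportion = valid / total   # part divided by whole, between 0 and 1
percent = proportion * 100   # proportion times 100
ratio = valid / missing      # one category compared to another
probability = valid / total  # chance a randomly selected case is valid

print(f"Proportion:  {proportion:.4f}")
print(f"Percent:     {percent:.2f}%")
print(f"Ratio:       {ratio:.2f} valid cases for every missing case")
print(f"Probability: {probability:.4f}")
```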