Preliminary Chapter

Statistics: The science of learning from data
Chapters 1-4: Tools and strategies for organizing, describing, and analyzing data
Chapter 5: How to produce data
Chapters 6-9: Probability: the study of chance behavior
Chapters 10-15: Testing claims and computing estimates
Ask the following questions:
1. What individuals do the data describe? How many individuals are there?
2. How many variables are there? What are their definitions? What units are used (lbs? kilos?)
3. Why were the data gathered? Do they describe a sample or the whole population?
Every set of data comes with background information to help us understand the data!
Data Production
Where can we find good data?
Library
Internet
www.nces.ed.gov (National Center for Education Statistics website)
www.fedstats.gov (good source for projects)
Statistical offices of foreign countries (www.statcan.ca, www.inegi.gob.mx)
Is this good data?

Suppose you want to find out whether your classmates prefer cheeseburgers from McDonald’s or Burger King. You decide to ask 50 people under the age of 20 which fast-food restaurant they prefer. To save time and energy, you conduct your survey at the McDonald’s closest to campus. Is there a problem with this?
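A quick simulation makes the problem concrete. This is a minimal sketch built on made-up assumptions, not on anything in the slide: a hypothetical campus of 1,000 classmates split exactly 50/50 between the two chains, and a hypothetical 90% of the people found inside McDonald’s preferring McDonald’s.

```python
import random

def share_mcdonalds(sample):
    """Fraction of a sample that prefers McDonald's."""
    return sum(1 for choice in sample if choice == "McDonald's") / len(sample)

def compare_samples(trials=200, n=50, seed=1):
    """Average McDonald's share in a simple random sample of the (hypothetical)
    campus versus a convenience sample taken inside McDonald's."""
    rng = random.Random(seed)
    # Hypothetical campus: exactly half of the classmates prefer each chain.
    campus = ["McDonald's"] * 500 + ["Burger King"] * 500
    srs_shares, convenience_shares = [], []
    for _ in range(trials):
        # Simple random sample: every classmate is equally likely to be chosen.
        srs = rng.sample(campus, n)
        # Convenience sample: hypothetically, 90% of people at McDonald's prefer it.
        at_mcdonalds = ["McDonald's" if rng.random() < 0.9 else "Burger King"
                        for _ in range(n)]
        srs_shares.append(share_mcdonalds(srs))
        convenience_shares.append(share_mcdonalds(at_mcdonalds))
    return sum(srs_shares) / trials, sum(convenience_shares) / trials

srs_avg, convenience_avg = compare_samples()
print(f"Random sample of campus: {srs_avg:.2f} prefer McDonald's")
print(f"Survey at McDonald's:    {convenience_avg:.2f} prefer McDonald's")
```

Under these assumptions the random sample stays near the true 50%, while the survey taken at McDonald’s lands near 90%. Asking more people at the same location would not fix this: the bias comes from where the sample is taken, not from its size.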
Does drinking at least five carbonated sodas a week improve a student’s GPA?

Observation: Compare the GPAs of a sample of students who drink at least five sodas a week with those of students who drink fewer.
Experiment: From a random group of students, require some (chosen at random) to drink at least five sodas per week, and require the rest to drink fewer. After a couple of years, compare their GPAs.
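The step that makes the experiment different from the observation is that the investigator, not the student, decides who drinks the sodas. A minimal sketch of random assignment, assuming a hypothetical list of 20 student IDs (the slide does not say how many students take part) and an even split into two groups:

```python
import random

def randomly_assign(students, seed=None):
    """Shuffle the students and split them into two groups of equal size
    (if the count is odd, the second group gets the extra student)."""
    rng = random.Random(seed)
    shuffled = list(students)  # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical student IDs, for illustration only.
students = [f"student_{i:02d}" for i in range(1, 21)]
soda_group, fewer_sodas_group = randomly_assign(students, seed=2004)
print("At least five sodas/week:", soda_group)
print("Fewer than five sodas/week:", fewer_sodas_group)
```

Because chance alone decides each student’s group, habits such as study time should be spread roughly evenly across both groups, so a difference in GPAs is easier to attribute to the sodas.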
Good and bad survey results
In 1976, Shere Hite published The Hite Report on Female Sexuality (later edition: Seven Stories Press, New York, NY, 2004). The conclusions reported in her book were based on 3,000 surveys returned out of 100,000 distributed by women’s groups. The results showed that women were highly critical of men. In what way might the author’s findings have been biased?
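The arithmetic behind the worry is short: only 3,000 of 100,000 surveys came back. The second half of this sketch shows how voluntary response can distort results, using entirely hypothetical numbers (40% of recipients critical of men, returning the survey at 6% versus 1% for everyone else); none of these rates come from the book.

```python
distributed = 100_000
returned = 3_000
response_rate = returned / distributed
print(f"Response rate: {response_rate:.0%}")  # prints "Response rate: 3%"

def returned_share_critical(recipients, share_critical, rate_critical, rate_other):
    """Return (surveys returned, share of returned surveys that are critical)
    when critical and non-critical recipients respond at different rates."""
    critical = recipients * share_critical
    other = recipients - critical
    returned_critical = critical * rate_critical
    returned_other = other * rate_other
    total = returned_critical + returned_other
    return total, returned_critical / total

# Hypothetical: 40% of recipients are critical; they respond at 6%, others at 1%.
total, share = returned_share_critical(distributed, 0.40, 0.06, 0.01)
print(f"Returned: {total:.0f}, critical among returns: {share:.0%}")
```

With these invented rates the mailing still yields 3,000 returns, but 80% of them are critical even though only 40% of recipients are. People with strong opinions are the most likely to answer, so the 97% who did not respond can matter more than the 3% who did.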
W-6H: Who, What, Why, How, Where, When, by Whom?
Who – is being studied
What – are the variables
Why – was the data gathered
How – was the data produced
Where – was the data gathered
When – was the data produced
By Whom – who directed it, can we trust it?
Describes public education in the USA

State  Region  Pop     SAT verbal  SAT math  % taking  % No HS  Teacher pay ($1000)
CA     PAC     35894   499         519       54        18.9     54.3
CO     MTN     4601    551         553       27        11.3     40.7
CT     NE      3504    512         514       84        12.5     53.6
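The first two background questions can be answered mechanically once the table is stored as data. A minimal sketch using only the three rows shown on the slide, and assuming (our convention, not the slide’s) that the State column is the label identifying each individual rather than a variable:

```python
# Rows copied from the public-education table (only the three states shown).
education = [
    {"State": "CA", "Region": "PAC", "Pop": 35894, "SAT verbal": 499,
     "SAT math": 519, "% taking": 54, "% No HS": 18.9, "Teacher pay ($1000)": 54.3},
    {"State": "CO", "Region": "MTN", "Pop": 4601, "SAT verbal": 551,
     "SAT math": 553, "% taking": 27, "% No HS": 11.3, "Teacher pay ($1000)": 40.7},
    {"State": "CT", "Region": "NE", "Pop": 3504, "SAT verbal": 512,
     "SAT math": 514, "% taking": 84, "% No HS": 12.5, "Teacher pay ($1000)": 53.6},
]

# Who: each row is one individual (a state), identified by its "State" label.
individuals = [row["State"] for row in education]
# What: every other column is a variable recorded for each state.
variables = [name for name in education[0] if name != "State"]

print(f"{len(individuals)} individuals: {individuals}")
print(f"{len(variables)} variables: {variables}")
```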
More definitions!


Exploratory Data Analysis: Examining data in order to describe their main features. (What do you see?)
Two steps:
1) Examine the variables.
2) Graph them.
Distribution of a variable: what values the variable takes on and how often it takes these values.
The pattern of variation of a variable is its distribution.
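For a categorical variable, "what values and how often" is just a table of counts and percents. A minimal sketch, with a hypothetical set of eight answers invented for illustration:

```python
from collections import Counter

def distribution(values):
    """Map each value the variable takes to (count, percent of all observations)."""
    counts = Counter(values)
    n = len(values)
    return {value: (count, 100 * count / n) for value, count in sorted(counts.items())}

# Hypothetical answers from eight people to "Do you wear your seat belt?"
answers = ["yes", "yes", "no", "yes", "sometimes", "yes", "no", "yes"]
for value, (count, pct) in distribution(answers).items():
    print(f"{value:<10} {count}  {pct:.1f}%")
```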
Do you wear your seat belt?
Region   % wearing belts 2003   % wearing belts 1998
NE       74                     66.4
MW       75                     63.6
South    80                     78.9
West     84                     80.8
Compare
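One way to compare the two years is the change in percentage points for each region. A short sketch using the values from the seat-belt slide:

```python
# Percent wearing seat belts by region: (2003, 1998), from the slide.
belt_use = {
    "NE": (74, 66.4),
    "MW": (75, 63.6),
    "South": (80, 78.9),
    "West": (84, 80.8),
}

# Change in percentage points from 1998 to 2003, rounded to one decimal place
# to hide floating-point noise (74 - 66.4 is not exactly 7.6 in binary).
change = {region: round(y2003 - y1998, 1) for region, (y2003, y1998) in belt_use.items()}

for region, points in sorted(change.items(), key=lambda item: item[1], reverse=True):
    print(f"{region:<6} +{points} points")
```

Every region improved; the Midwest gained the most (11.4 points) and the South the least (1.1 points), even though the South and West still had the highest belt use in 2003.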
Dotplots
The numbers of goals scored by the US women’s soccer team in the 34 games it played in the 2004 season are:
3, 0, 2, 7, 8, 2, 4, 3, 5, 1, 1, 4, 5, 3, 1, 1, 3, 3, 3, 2, 1, 2, 2, 2, 4, 3, 5, 6, 1, 5, 5, 1, 1, 5
What does this tell us about the performance of the US women’s team in 2004?
TI-83: 1-Var Stats L1
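Without a calculator, both the dotplot and the 1-Var Stats summary can be reproduced in a few lines. This is a sketch, not the TI-83’s own algorithm: the dotplot is drawn sideways (one row per value), the dictionary keys only loosely follow the TI-83 labels, and we assume Q1 and Q3 are the medians of the lower and upper halves of the sorted data, leaving out the overall median when n is odd.

```python
import statistics
from collections import Counter

# Goals scored by the US women's team in each of its 34 games in 2004 (from the slide).
goals = [3, 0, 2, 7, 8, 2, 4, 3, 5, 1, 1, 4, 5, 3, 1, 1, 3, 3, 3, 2, 1, 2,
         2, 2, 4, 3, 5, 6, 1, 5, 5, 1, 1, 5]

def text_dotplot(data):
    """Sideways dotplot: one row per value, one dot per game."""
    counts = Counter(data)
    return "\n".join(f"{value:>2} | " + "." * counts[value]
                     for value in range(min(data), max(data) + 1))

def one_var_stats(data):
    """Summary statistics similar to the TI-83's 1-Var Stats output.
    Assumption: quartiles are medians of the lower and upper halves."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower, upper = xs[:half], xs[half + n % 2:]
    return {
        "mean": statistics.mean(xs),
        "sum": sum(xs),
        "sum_sq": sum(x * x for x in xs),
        "Sx": statistics.stdev(xs),        # sample standard deviation (divides by n - 1)
        "sigma_x": statistics.pstdev(xs),  # population standard deviation (divides by n)
        "n": n,
        "minX": xs[0],
        "Q1": statistics.median(lower),
        "Med": statistics.median(xs),
        "Q3": statistics.median(upper),
        "maxX": xs[-1],
    }

print(text_dotplot(goals))
stats = one_var_stats(goals)
for label, value in stats.items():
    shown = f"{value:.4g}" if isinstance(value, float) else str(value)
    print(f"{label:>7} = {shown}")
```

For these 34 games the team scored 104 goals, about 3.06 per game; the median is 3, the quartiles are 1 and 5 under the half-split convention, and the most common score was 1 goal (8 games). The 7- and 8-goal games stand out at the high end of the dotplot.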
Exploring Relationships between variables
Air travelers would like their flights to arrive on time. Airlines collect data about on-time arrivals and report them to the Department of Transportation. Here’s one month’s data for flights from several western cities for two airlines:

              On time   Delayed
Alaska Air    3274      501
America West  6438      787
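Comparing raw counts is misleading because America West flew almost twice as many flights, so a fair comparison uses the percent delayed. A short sketch with the counts from the slide:

```python
# One month of flights from several western cities (counts from the slide).
flights = {
    "Alaska Air": {"on_time": 3274, "delayed": 501},
    "America West": {"on_time": 6438, "delayed": 787},
}

def percent_delayed(counts):
    """Delayed flights as a percentage of all flights."""
    total = counts["on_time"] + counts["delayed"]
    return 100 * counts["delayed"] / total

for airline, counts in flights.items():
    print(f"{airline:<13} {percent_delayed(counts):.1f}% delayed")
```

Combined over all cities, America West has the lower delay rate (about 10.9% versus 13.3%). The slide gives only these totals; the city-by-city counts, which are where the comparison can turn around, are not reproduced here.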
Simpson’s Paradox
An association or comparison that holds for all of several groups can reverse direction when the data are combined to form a single group.
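To see how a reversal can happen, here is a sketch with entirely made-up counts for two hypothetical airlines, X and Y, in two hypothetical cities. These are not the airline data above; they are chosen only so that the reversal is easy to check by hand.

```python
# Made-up counts for illustration: (delayed flights, total flights).
toy = {
    "City 1": {"X": (1, 100), "Y": (20, 1000)},     # an easy city: few delays
    "City 2": {"X": (200, 1000), "Y": (25, 100)},   # a hard city: many delays
}

def delay_rate(delayed, total):
    return delayed / total

# Within each city, compare the two airlines separately.
for city, airlines in toy.items():
    x, y = delay_rate(*airlines["X"]), delay_rate(*airlines["Y"])
    print(f"{city}: X {x:.1%} vs Y {y:.1%} ->", "X better" if x < y else "Y better")

# Combine the cities into a single group and compare again.
combined = {
    airline: delay_rate(sum(toy[c][airline][0] for c in toy),
                        sum(toy[c][airline][1] for c in toy))
    for airline in ("X", "Y")
}
print(f"Combined: X {combined['X']:.1%} vs Y {combined['Y']:.1%}")
```

X has the lower delay rate in both cities, yet Y looks better once the cities are combined, because most of X’s flights go to the hard city and most of Y’s go to the easy one. The lurking variable here is the city.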
Probability
Statistical Inference



Population values (parameters) are fixed. Sample values (statistics) vary from sample to sample.
A sample value will not give us precise information about a population parameter (but if the sample is properly collected, it will give us reasonable bounds on that parameter).
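A simulation shows the contrast between a fixed parameter and a varying statistic. This sketch assumes a hypothetical population proportion p = 0.6 and draws each observation independently, which models sampling from a very large population:

```python
import random
import statistics

def sample_proportions(p=0.6, n=100, repeats=1000, seed=7):
    """Draw `repeats` random samples of size n from a population whose true
    proportion p never changes, and return each sample's proportion."""
    rng = random.Random(seed)
    proportions = []
    for _ in range(repeats):
        successes = sum(rng.random() < p for _ in range(n))
        proportions.append(successes / n)
    return proportions

props = sample_proportions()
print("Parameter (fixed): 0.60")
print(f"Statistics range from {min(props):.2f} to {max(props):.2f}")
print(f"Average statistic: {statistics.mean(props):.3f}")
```

No single sample proportion is guaranteed to equal 0.6, but they cluster around it, which is what makes it possible to put reasonable bounds on the parameter.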
How unlikely must an event be before we conclude that it isn’t due to chance? 25%? 10%? 1%? 0.01?
Our willingness to declare an event “unlikely” is usually based on…
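To make the question concrete, this sketch computes the exact chance of a hypothetical event, 9 or more heads in 10 tosses of a coin assumed to be fair, and checks it against the cutoffs listed above. The coin scenario is our illustration, not part of the slide.

```python
from math import comb

def prob_at_least(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): the chance of k or more successes."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical event: a coin we believe is fair lands heads 9 or more times in 10 tosses.
p_event = prob_at_least(9, 10)
print(f"P(9 or more heads in 10 tosses) = {p_event:.4f}")

# Compare the probability with each cutoff asked about on the slide.
for cutoff in (0.25, 0.10, 0.01):
    verdict = "unlikely enough" if p_event < cutoff else "not unlikely enough"
    print(f"cutoff {cutoff:.0%}: {verdict}")
```

The probability is 11/1024, about 1.1%: rare enough to reject chance at a 25% or 10% cutoff, but not at a 1% cutoff. The same result can lead to different conclusions depending on the cutoff chosen in advance.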
Communication in Statistics: IMPORTANT
Case Closed!
P. 26 (groups)