Introductory Statistics:
A Problem-Solving Approach
by Stephen Kokoska
Chapter 1
An Introduction to Statistics and
Statistical Inference
Copyright 2011 by W. H. Freeman
and Company. All rights reserved.
Statistics
1. The science of collecting and interpreting data.
   (Newer names and directions: data science, analytics, data mining, machine learning.)
2. Examples of statistics in everyday life:
   • Online shopping, Netflix, and other recommender systems
   • National election polling
   • Marketing research; actuarial work
   • Clinical trials for medicine
3. Statistics is used to make decisions, assess risk, and draw conclusions.
Descriptive Statistics
Graphical and numerical methods used
to describe, organize, and summarize
data.
Inferential Statistics
Definition: Techniques and methods used to analyze a small, specific set of
data in order to draw a conclusion about a large, more general
collection of data.
Questions:
T/F: Descriptive statistics are used to indicate how the data are collected.
T/F: Inferential statistics are used to draw a conclusion about a
population.
Fill in the blank:
• The entire collection of objects being studied is called the ----.
• A small subset from the set of all 2013 minivans is called a ----.
• Consider the amount of sugar in breakfast cereals. This characteristic
of breakfast cereal is called a ----.

Population, Sample, Variable
• A population is the entire collection of individuals or objects to be considered or
studied.
• A sample is a subset of the entire population, a small selection of individuals or
objects taken from the entire collection.
• A variable is a characteristic of an individual or object in a population of interest.
• Example (marketing and consumer behavior): Magazines, newspapers, and books
have become more readily available in digital format. In addition, the quality of
e-readers, for example, the Kindle, Nook, and iPad, has increased. A recent study
suggests that 21% of adults in the US read an ebook within the past year. Suppose
a subset of 500 adults in the US is obtained. Describe the population and the
sample in this problem. Write a probability question and a statistics question
involving the population and the sample.
Probability vs. Statistics
• In order to solve a probability problem,
certain characteristics of a population are
assumed known. We then answer questions
concerning a sample from that population.
• In a statistics problem, we assume very little
about a population. We use the information
about a sample to answer questions
concerning the population.
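The two directions can be sketched with the ebook example. This is a minimal illustration in Python (not the R used elsewhere in these slides). It assumes the number of ebook readers in a sample of 500 can be modeled as Binomial(500, p), which is reasonable when the population is much larger than the sample. The cutoff of 90 readers and the observed count of 120 are made-up values chosen only for illustration.

```python
from math import comb


def binomial_cdf(k: int, n: int, p: float) -> float:
    """Return P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


n = 500  # sample size from the ebook example

# Probability question: the population proportion is assumed known (21%).
# How likely is it that at most 90 of the 500 sampled adults read an ebook?
p_known = 0.21
prob_at_most_90 = binomial_cdf(90, n, p_known)

# Statistics question: the population proportion is unknown.
# Suppose (hypothetically) 120 of the 500 sampled adults read an ebook;
# the sample proportion is then used to estimate the population proportion.
observed_readers = 120  # hypothetical observation, not from the study
p_hat = observed_readers / n

print(f"P(X <= 90 | p = 0.21) = {prob_at_most_90:.4f}")
print(f"Sample proportion p_hat = {p_hat:.2f}")
```

The first calculation goes from a known population to a sample; the second goes from an observed sample back to the population.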
Probability vs. Statistics
[Figure: illustration of probability vs. statistics]
Observational Study
In an observational study, we observe
the response for a specific variable for
each individual or object.
Examples: polls; the ebook example above;
case-control retrospective studies of
smoking habits and health risks
More on sampling
• If a sample is not random, it is biased.
• (In a Toronto mayoral election, if we collect samples only from
Etobicoke, what will happen?)
• Non-response bias
• Self-selection bias
• If the population is infinite, the number of possible simple random
samples is also infinite.
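The effect of sampling from only one district, as in the election question above, can be sketched with a small simulation in Python. Everything here is hypothetical: the two districts, their sizes, and their support rates are invented for illustration and are not real election data.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population of (district, supports_candidate) pairs.
population = (
    [("A", True)] * 14_000 + [("A", False)] * 6_000      # district A: 70% support
    + [("B", True)] * 32_000 + [("B", False)] * 48_000   # district B: 40% support
)


def support_rate(voters):
    """Fraction of voters in the list who support the candidate."""
    return sum(vote for _, vote in voters) / len(voters)


true_rate = support_rate(population)  # 46,000 of 100,000 voters

# Biased sample: 1000 voters drawn from district A only.
district_a_only = [voter for voter in population if voter[0] == "A"]
biased_estimate = support_rate(random.sample(district_a_only, 1000))

# Simple random sample: 1000 voters drawn from the whole population.
srs_estimate = support_rate(random.sample(population, 1000))

print(f"true rate       = {true_rate:.3f}")
print(f"district A only = {biased_estimate:.3f}")
print(f"SRS             = {srs_estimate:.3f}")
```

In this made-up setting the district-only estimate stays near that district's own support rate, while the SRS estimate stays near the population rate.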
Experimental Study
In an experimental study, we
investigate the effects of certain
conditions on individuals or objects in
the sample.
Examples: clinical trial; agricultural
experiments
Simple Random Sample (SRS)
A (simple) random sample (SRS) of size n is a sample selected
in such a way that every possible sample of size n has the
same chance of being selected.
Question: Suppose we have a population of 5 individuals: A, B, C, D, E.
If we draw an SRS of size 2, what is the probability that the subset
{A, B} is chosen? What is the probability that individual A
is included in the sample?
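One way to check an answer to this question is to list every possible sample of size 2 and count. The sketch below does this in Python (not the R used elsewhere in these slides) and relies only on the SRS definition that every possible sample is equally likely.

```python
from fractions import Fraction
from itertools import combinations

population = ["A", "B", "C", "D", "E"]

# Every possible sample of size 2; under an SRS each is equally likely.
samples = list(combinations(population, 2))

# Probability that the sample is exactly {A, B}.
p_ab = Fraction(sum(1 for s in samples if set(s) == {"A", "B"}), len(samples))

# Probability that individual A appears in the sample.
p_a = Fraction(sum(1 for s in samples if "A" in s), len(samples))

print(len(samples), p_ab, p_a)  # 10 1/10 2/5
```

There are 10 equally likely samples, exactly one of which is {A, B}, and 4 of which contain A.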
Question: How do we implement an SRS? Method 1: use a random number
table.
How to select an SRS of size 5?
• Suppose the population subjects are labelled 00, 01, …, 99.
• Start at any location and move in any direction on the table. Assume
we start from the first row.
• We record two digits at a time, discarding repeated pairs, until we
have 5 unique two-digit labels: 11, 74, 26, 93, 81.
This gives the sample {11, 74, 26, 93, 81}.
Method 2: Use a computer: www.r-fiddle.org
x <- sample(0:99, 5)  # draw 5 distinct labels from 0 to 99, without replacement
Try it a few times. Do you get a different SRS each time?
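The random number table procedure (Method 1) can also be sketched in code. This is a minimal Python illustration. The digit row below is hypothetical: it was made up so that it reproduces the labels 11, 74, 26, 93, 81 and includes one repeated pair, and it is not the actual table row from the slides. The function name `srs_from_digit_row` is likewise an illustrative choice.

```python
def srs_from_digit_row(digits: str, size: int) -> list[int]:
    """Read two digits at a time, skip repeated labels, stop at `size` labels."""
    labels: list[int] = []
    for i in range(0, len(digits) - 1, 2):
        label = int(digits[i:i + 2])
        if label not in labels:  # discard a pair we have already recorded
            labels.append(label)
        if len(labels) == size:
            return labels
    raise ValueError("digit row too short to produce the requested sample size")


# Hypothetical row of table digits (spaces only group the digits for reading).
row = "11742 61193 8174".replace(" ", "")

sample = srs_from_digit_row(row, 5)
print(sample)  # [11, 74, 26, 93, 81]
```

Here the fourth pair read is 11 again, so it is discarded and reading continues with 93.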
Statistical Inference Procedure
The process of checking a claim can be divided into
four parts:
• Claim
• Experiment
• Likelihood
• Conclusion
Statistical Inference Procedure

Claim
This is a statement of what we assume
to be true.
Statistical Inference Procedure

Experiment
In order to check the claim, we conduct
a relevant experiment.
Statistical Inference Procedure

Likelihood
Consider the likelihood of occurrence of the
observed experimental outcome, assuming
the claim is true. We will use many
techniques to determine whether the
experimental outcome is a reasonable
observation (subject to reasonable variability)
or whether it is a rare occurrence.
Statistical Inference Procedure

Conclusion
There are only two possible conclusions:
1. If the outcome is reasonable, then we cannot doubt the
claim. We usually write, “There is no evidence to suggest
that the claim is false.”
2. If the outcome is rare, we disregard the lucky alternative,
and question the claim. A rare outcome is a contradiction.
It shouldn't happen (often) if the claim is true. In this case
we write, “There is evidence to suggest that the claim is
false.”
Example

The Wireless Emporium ships a box
containing 1000 cell phone chargers
and claims that 999 are in perfect condition
and only 1 is defective. Upon receipt of
the shipment, a quality control
inspector reaches into the box,
mixes the chargers around a bit, and selects
one at random. It is defective!
Example
• Claim: There are 999 good cell phone chargers and 1
defective charger in the box.
• Experiment: Randomly select one charger; it is defective.
• Likelihood: If the claim is true, the probability of selecting the
defective charger is 1/1000 = 0.001.
• Conclusion: The experimental outcome is extremely rare and
unlikely. We disregard the lucky alternative and question the
claim. There is evidence to suggest that the claim is false.
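The four steps of this example can be written out as a short calculation. The Python sketch below is illustrative only. In particular, the cutoff for calling an outcome "rare" (`RARE_THRESHOLD`, set to 1/20) is an assumption made here; the slides do not specify a numerical threshold.

```python
from fractions import Fraction

# Claim: the box holds 999 good chargers and 1 defective charger.
good, defective = 999, 1

# Experiment: one charger is selected at random and it is defective.
# Likelihood: probability of that outcome if the claim is true.
likelihood = Fraction(defective, good + defective)  # 1/1000

# Assumed cutoff for a "rare" outcome (not given in the slides).
RARE_THRESHOLD = Fraction(1, 20)

# Conclusion: one of the two possible statements.
if likelihood < RARE_THRESHOLD:
    conclusion = "There is evidence to suggest that the claim is false."
else:
    conclusion = "There is no evidence to suggest that the claim is false."

print(float(likelihood), conclusion)
```

Any reasonable cutoff gives the same conclusion here, because an outcome with probability 0.001 is rare by almost any standard.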