Introduction and Data Collection.

Elementary Statistics
Professor K. Leppel
Introduction and Data Collection
Definitions
Population: All observations of interest in a
given context
Sample: A subset of a population
Example 1
Suppose you are the president of Widener University.
Population: All Widener students.
Sample: All Widener students taking classes in
the School of Business Administration.
Example 2
Suppose you are the head of the Economics Dept.
Population: All Widener students taking Economics
classes.
Sample: All Widener students taking Professor
Leppel’s classes.
More Definitions
Population parameter (or simply parameter):
a numerical characteristic of a population
Sample statistic (or simply statistic):
a numerical characteristic of a sample
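To make the parameter/statistic distinction concrete, here is a minimal Python sketch. The population of student ages is invented for illustration, and the names population, mu, and x_bar are hypothetical. The mean of the whole population plays the role of a parameter; the mean of a random sample drawn from it plays the role of a statistic.

```python
import random
import statistics

# Hypothetical population: ages of 1,000 students (invented data, not real).
random.seed(1)
population = [random.randint(18, 30) for _ in range(1000)]

# A parameter describes the whole population.
mu = statistics.mean(population)

# A statistic describes a sample drawn from that population.
sample = random.sample(population, 50)
x_bar = statistics.mean(sample)

print(f"population mean (parameter): {mu:.2f}")
print(f"sample mean (statistic):     {x_bar:.2f}")
```

The statistic x_bar will usually be close to, but not exactly equal to, the parameter mu; drawing a different sample gives a different statistic, while the parameter stays fixed.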
Deductive vs. Inductive Reasoning
Deductive: from population (general) to sample (specific)
This is the domain of probability.
Example: Suppose you
have a bowl with 2 red
marbles & 3 green ones.
If you pick one, what is
the probability that the
marble is green?
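A minimal Python sketch of the deductive answer: because the whole population (the bowl) is known, the probability follows from counting, 3 green out of 5 marbles. The simulation at the end is only an illustrative check; the names bowl and p_green are hypothetical.

```python
from fractions import Fraction
import random

# Bowl from the example: 2 red marbles and 3 green ones.
bowl = ["red"] * 2 + ["green"] * 3

# Deductive reasoning: the population is fully known, so count.
p_green = Fraction(bowl.count("green"), len(bowl))
print(p_green)  # 3/5

# Illustrative check: draw one marble many times, putting it back each time.
random.seed(0)
trials = 100_000
hits = sum(random.choice(bowl) == "green" for _ in range(trials))
print(hits / trials)  # close to 0.6
```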
Deductive vs. Inductive Reasoning
Inductive: from sample (specific) to population (general)
This is the domain of statistics.
Example: If you take a poll & note the voting preferences of
the sample, you will be able to draw some conclusions about
the votes of the population.
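The inductive direction can be sketched in Python as follows. The electorate is entirely hypothetical (100,000 voters, 52% preferring candidate A); in a real poll that share is unknown, and the sample proportion p_hat is our estimate of it.

```python
import random

# Hypothetical electorate: 52% prefer A (invented numbers for illustration).
random.seed(42)
population = ["A"] * 52_000 + ["B"] * 48_000

# Inductive step: observe a random sample and generalize to the population.
sample = random.sample(population, 1_000)
p_hat = sample.count("A") / len(sample)

print(f"share preferring A in sample: {p_hat:.3f}")
```

Because the sample is random, p_hat typically lands within a few percentage points of the true 52%, which is what lets us draw conclusions about the population from the sample.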
Sampling with & without Replacement
Sampling without replacement: once an
element of a population has been selected as
part of a sample, it cannot be selected
again.
Sampling with replacement: an element of a
population that has been selected as part of
a sample can be selected again.
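Python's standard library happens to offer one function for each scheme: random.sample draws without replacement and random.choices draws with replacement. The five names below are hypothetical.

```python
import random

population = ["Ann", "Bob", "Cal", "Dee", "Eve"]  # hypothetical names
random.seed(7)

# Without replacement: once chosen, an element cannot be chosen again.
without_repl = random.sample(population, 5)

# With replacement: each draw is from the full population, so repeats can occur.
with_repl = random.choices(population, k=5)

print("without replacement:", without_repl)
print("with replacement:   ", with_repl)
```

Drawing all 5 elements without replacement always returns every name exactly once, in some order; drawing 5 with replacement will often repeat a name and leave another out.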
Random Sampling vs. Non-random Sampling
Random sampling or probability sampling:
sampling in which the probability of inclusion of
each element in the population is known.
Non-random sampling or judgment sampling:
sampling in which judgment is exercised in
deciding which elements of the population to
include in the sample.
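As one illustration of a known inclusion probability: when n elements are drawn at random without replacement from a population of N, each element is included with probability n/N. The Python sketch below assumes hypothetical sizes N = 20 and n = 5 and checks that value by simulation.

```python
import random

N, n = 20, 5                 # hypothetical population and sample sizes
population = list(range(N))

# Known inclusion probability for random sampling without replacement.
p_inclusion = n / N          # 0.25 here

# Empirical check: how often does element 0 end up in the sample?
random.seed(3)
reps = 50_000
count = sum(0 in random.sample(population, n) for _ in range(reps))
print(p_inclusion, count / reps)
```

In judgment sampling, by contrast, there is no such formula: the inclusion probability depends on the person choosing.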
Simple Random Sample
A sample of n elements is a simple random
sample if sampling is performed such that
every combination of n elements has an
equal chance of being the sample selected.
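The definition can be checked on a tiny hypothetical population of four elements. With n = 2 there are C(4, 2) = 6 possible samples, so a simple random sample should produce each one about 1/6 of the time. This Python sketch assumes random.sample as the sampling mechanism.

```python
import itertools
import math
import random
from collections import Counter

population = ["A", "B", "C", "D"]   # hypothetical population
n = 2

# All possible samples of size n (order does not matter).
combos = list(itertools.combinations(population, n))
print(math.comb(len(population), n))  # 6

# Draw many samples; sort each so that (B, A) and (A, B) count as one sample.
random.seed(11)
reps = 60_000
counts = Counter(tuple(sorted(random.sample(population, n))) for _ in range(reps))

for combo in combos:
    print(combo, round(counts[combo] / reps, 3))  # each near 1/6
```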
Two Types of Studies
1. Observational or comparative studies
The analyst examines historical relationships
among variables of interest.
Problem: Deriving cause & effect
relationships from historical data is difficult
because important environmental factors are
generally not controlled & not stable.
2. Direct experimentation
or controlled studies
The investigator directly
manipulates factors that
affect a variable of interest.
Control Group
To understand the effect of a “treatment,” we
need to compare a group that received a
treatment with a group that received no
treatment. The “no-treatment” group is the
control group.
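A minimal Python sketch of a controlled study with a control group. Everything here is hypothetical: 40 subjects are randomly split into a treatment group and a control group, and outcomes are simulated from an invented model (baseline around 70, treatment adds 5 points). The effect of the treatment is estimated by comparing the two group means.

```python
import random
import statistics

random.seed(5)

# Randomly assign 40 hypothetical subjects to two groups of 20.
subjects = list(range(40))
random.shuffle(subjects)
treatment, control = subjects[:20], subjects[20:]

# Invented outcome model for illustration only.
def outcome(treated: bool) -> float:
    return random.gauss(70, 8) + (5 if treated else 0)

treated_scores = [outcome(True) for _ in treatment]
control_scores = [outcome(False) for _ in control]

# Compare the treated group with the no-treatment (control) group.
effect = statistics.mean(treated_scores) - statistics.mean(control_scores)
print(f"estimated treatment effect: {effect:.2f}")
```

Without the control group there would be nothing to compare against: a treated-group average of, say, 75 says little unless we know what the untreated group scored.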
Two Types of Errors
1. Systematic errors or bias:
These errors cause measurements to be incorrect
in some systematic way.
They are caused by inaccuracies or deficiencies
in the measuring instrument.
Systematic errors persist even when the sample
size is increased.
2. Random errors or sampling errors:
These errors arise from a large number of
uncontrolled factors, that is, from chance.
Random errors decrease on average as the
sample size is increased.
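The contrast between the two kinds of error can be simulated. In this Python sketch the true value (10), the instrument's bias (0.5), and the noise level (standard deviation 2) are all hypothetical. For each sample size n, avg_error averages the absolute error of the sample mean over 200 repeated samples.

```python
import random
import statistics

TRUE_VALUE = 10.0   # hypothetical true quantity
BIAS = 0.5          # hypothetical systematic error of a miscalibrated instrument
NOISE_SD = 2.0      # hypothetical spread of the random error

def measure(biased: bool) -> float:
    """One measurement: random noise, plus a fixed offset if the instrument is biased."""
    return TRUE_VALUE + random.gauss(0, NOISE_SD) + (BIAS if biased else 0.0)

def avg_error(n: int, biased: bool, reps: int = 200) -> float:
    """Average absolute error of the sample mean over many samples of size n."""
    errors = []
    for _ in range(reps):
        xbar = statistics.mean(measure(biased) for _ in range(n))
        errors.append(abs(xbar - TRUE_VALUE))
    return statistics.mean(errors)

random.seed(2)
for n in (10, 100, 1000):
    print(n, round(avg_error(n, biased=False), 3), round(avg_error(n, biased=True), 3))
```

With the unbiased instrument the error keeps shrinking as n grows; with the biased instrument it levels off near 0.5, because a larger sample averages away chance but not the instrument's systematic offset.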
Some of the variables with which you will work
are qualitative and some are quantitative.
Qualitative variables are categorical and can be
subdivided into nominal and ordinal measures.
Quantitative variables are numerical and can be
subdivided into interval and ratio measures.
Qualitative (categorical) variables that are
nominal have no order to them.
Example 1: U.S. citizenship (yes, no)
Example 2: On what continent were you born? (N. America,
S. America, Africa, Antarctica, Asia, Australia, Europe)
Sex (male, female) is sometimes treated as a nominal variable.
However, if you take into consideration intersex individuals, who
can have any of a variety of anatomical conditions that don’t fit
the typical definitions of female or male, you no longer have a
simple nominal measure.
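For nominal data, the natural summaries are category counts and the mode; because the categories have no order, a "median continent" would be meaningless. The responses in this Python sketch are hypothetical.

```python
from collections import Counter
import statistics

# Hypothetical answers to "On what continent were you born?"
responses = ["N. America", "Asia", "Europe", "N. America",
             "Africa", "N. America", "Asia"]

# Counting and finding the most common category make sense for nominal data.
counts = Counter(responses)
print(counts.most_common())
print("mode:", statistics.mode(responses))  # N. America
```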
Qualitative (categorical) variables that are
ordinal have an implied ranking of a
characteristic.
Example 1: student class (freshman, sophomore, junior, senior)
Example 2: customer service satisfaction
(very dissatisfied, somewhat dissatisfied, neither satisfied nor
dissatisfied, somewhat satisfied, very satisfied)
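For ordinal data the order of the categories carries information, so ranking and the median are meaningful. This Python sketch encodes the satisfaction scale from Example 2 as ranks; the survey responses are hypothetical. Because the gaps between adjacent categories are not guaranteed to be equal, it reports a median category rather than an average rank.

```python
import statistics

# The satisfaction scale from Example 2, in order.
SCALE = ["very dissatisfied", "somewhat dissatisfied",
         "neither satisfied nor dissatisfied",
         "somewhat satisfied", "very satisfied"]
RANK = {label: i for i, label in enumerate(SCALE)}

# Hypothetical survey responses.
responses = ["very satisfied", "somewhat dissatisfied", "somewhat satisfied",
             "very satisfied", "neither satisfied nor dissatisfied"]

# median_low always returns an actual rank, which maps back to a category.
ranks = sorted(RANK[r] for r in responses)
median_label = SCALE[statistics.median_low(ranks)]
print("median response:", median_label)  # somewhat satisfied
```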
Turning to quantitative (numerical) variables:
interval variables are a bit tricky.
They are measured on an ordered scale in which the
difference between measurements is meaningful.
However, there is no true zero point, that is, no value that indicates a
complete absence of the characteristic.
Also, if the measure is twice as large, that does not imply
that there is twice as much of the characteristic.
Example 1: Intelligence
A person with an IQ of 150 is much more intelligent than a
person with an IQ of 100, while a person with an IQ of 140 is
somewhat more intelligent than a person with an IQ of 125.
However, what would an IQ of 0 mean? And a person with an
IQ of 200 is not twice as smart as one with an IQ of 100.
Example 2: SAT scores
Quantitative (numerical) variables that are ratio
variables have a true zero point, and ratios are
meaningful: twice the value means twice as much.
Example 1: Income
A person with zero income has no earnings or other source of
money. And someone who has income of $100,000 has twice
as much money coming in as someone who has income of
$50,000.
Example 2: Age
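A short Python sketch of the interval/ratio difference, reusing the numbers from the income and IQ examples. Dividing works arithmetically in both cases, but only on the ratio scale does the quotient mean "twice as much".

```python
# Ratio scale (true zero): ratios are meaningful.
income_a, income_b = 100_000, 50_000
print(income_a / income_b)  # 2.0 -> twice as much income

# Interval scale (no true zero): differences are meaningful, ratios are not.
iq_a, iq_b = 200, 100
print(iq_a - iq_b)          # 100-point difference: meaningful
print(iq_a / iq_b)          # 2.0, but NOT "twice as intelligent"
```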
