Statistics 02

Normal distribution
Also called the normal curve or the Gaussian
curve
Any variable whose value comes about as the
result of summing the values of several
independent, or almost independent,
components can be modeled successfully as
a normal distribution.
Normal distribution
• Three features of the distribution of sample means
• 1. the histogram of the sample means is symmetrical.
• 2. the mean of the sample means is very close to that of
the original population.
• 3. the standard deviation of the set of sample
means will be very close to the original population
standard deviation divided by the square root of
the sample size, n.
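Features 2 and 3 above can be checked with a small simulation. This is only a sketch: the population mean (50), the population standard deviation (10), the sample size (25) and the number of samples (2000) are assumed values chosen for illustration and do not come from the slides.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# ASSUMED values for illustration only (not from the slides).
POP_MEAN, POP_SD = 50.0, 10.0
SAMPLE_SIZE = 25      # n
NUM_SAMPLES = 2000    # how many samples we draw

# Draw many samples and record the mean of each one.
sample_means = []
for _ in range(NUM_SAMPLES):
    sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(SAMPLE_SIZE)]
    sample_means.append(statistics.fmean(sample))

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
expected_sd = POP_SD / SAMPLE_SIZE ** 0.5  # sigma / sqrt(n)

print(f"mean of sample means: {mean_of_means:.2f} (population mean {POP_MEAN})")
print(f"sd of sample means:   {sd_of_means:.2f} (sigma/sqrt(n) = {expected_sd:.2f})")
```

Both printed values should land close to the figures in brackets; a larger number of samples tightens the agreement.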
Z score
• A raw score converted on the basis of the
standard deviation. We convert a raw score
to a z score to determine how many standard
deviation units that raw score is above or
below the mean.
• Z=(X-M)/s, where X is the raw score, M the
mean and s the standard deviation
Application of Z score
• Comparison of two scores from two tests
• Conversion to standardized score (T score):
T=50+10Z
• Determining the proportion below a
particular raw score: X < Score
• Statistical inference: range (interval) estimation
Case
• Student A takes 2 tests with the following
data:
• Test 1: Raw score=67, Mean=63, Standard
deviation=3
• Test 2: Raw score=56, Mean=51, Standard
deviation=4
• Question: What possible information can
we obtain?
Case
• Two students take two different tests of
English.
• Student A: RS=67, M=63, s=3
• Student B: RS=56, M=51, s=4
• Question 1: Which student is better at
English?
• Question 2: What are their T scores?
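The two questions can be worked through with the formulas Z=(X-M)/s and T=50+10Z. The sketch below is one way to do the arithmetic; the function and variable names are illustrative and not part of the slides.

```python
def z_score(raw, mean, sd):
    """Number of standard deviation units the raw score lies from the mean."""
    return (raw - mean) / sd


def t_score(z):
    """Standardized T score as defined in the slides: T = 50 + 10Z."""
    return 50 + 10 * z


# (raw score, mean, standard deviation) for each student, from the case.
students = {"A": (67, 63, 3), "B": (56, 51, 4)}

results = {}
for name, (raw, mean, sd) in students.items():
    z = z_score(raw, mean, sd)
    results[name] = (z, t_score(z))
    print(f"Student {name}: Z = {z:.2f}, T = {t_score(z):.1f}")
```

Student A comes out at Z of about 1.33 (T about 63.3) and Student B at Z = 1.25 (T = 62.5), so A stands slightly higher relative to the group that took the same test. Whether that makes A "better at English" depends on how comparable the two tests and groups are.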
Table of Normal Distribution
• Relation between Z score and Proportion
Case
• When we select a score at random from the
population, what is the probability that this
score is below or above a certain score?
• That is: the probability that this score (X) is
below a certain score (say: 60)
• X<60
Case
• Z<?
• Z=(X-M)/s
• Therefore, the inequality X<60 becomes:
• X-M < 60-M
• (X-M)/s < (60-M)/s
• Z<-1
• P=0.1587
• The chance that we randomly select a score that is
below 60 is about 16%.
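The table lookup for Z<-1 can be reproduced with Python's standard library. In this sketch the helper `prob_below` and the mean of 70 and standard deviation of 10 used to call it are hypothetical; the slides only give the resulting Z of -1.

```python
from statistics import NormalDist

# Proportion of the standard normal distribution lying below Z = -1.
z = -1.0
p_below = NormalDist().cdf(z)
print(f"P(Z < {z}) = {p_below:.4f}")


def prob_below(score, mean, sd):
    """P(X < score) for a normally distributed X (hypothetical helper)."""
    return NormalDist().cdf((score - mean) / sd)


# HYPOTHETICAL mean and sd, chosen only so that (60 - M)/s = -1.
print(f"P(X < 60) = {prob_below(60, 70, 10):.4f}")
```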
Case
• Xiamen University wants to give its freshmen a
placement test upon admission and place them
into 5 levels of English learning. Work out a plan
for this test and, before the test, inform the students
of the scores required for each level.
• Total of freshmen: 5000
• Classes for each level:
• B0: 4
• B1: remaining
• B2: 20
• B3: 8
• B4: 4
• Normal class size: 35
Level     Classes   Number   Proportion   Z       Cut-off
Sub (0)   4         140      0.028        -1.90   44.6
1         109       3810     0.762        0.80    60.8
2         20        700      0.14         1.5     65
3         6         210      0.042        1.90    67.4
4         4         140      0.028
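The cut-offs can be recomputed from the class counts. This is a sketch under stated assumptions: the class counts are those used in the table (6 classes at level 3), Z and Cut-off are read as the upper boundary of each level, and the test mean of 56 and standard deviation of 6 are not stated on the slides but are the values that reproduce the tabled cut-offs.

```python
from statistics import NormalDist

TOTAL = 5000
CLASS_SIZE = 35
# Classes per level as used in the table; level 1 takes the remaining students.
CLASSES = {0: 4, 2: 20, 3: 6, 4: 4}

# ASSUMPTION: test mean and sd are not given on the slides; 56 and 6 are
# the values that reproduce the tabled cut-offs.
MEAN, SD = 56.0, 6.0

numbers = {level: n * CLASS_SIZE for level, n in CLASSES.items()}
numbers[1] = TOTAL - sum(numbers.values())

cutoffs = {}
cumulative = 0.0
for level in range(4):  # level 4 has no upper cut-off
    cumulative += numbers[level] / TOTAL
    z = NormalDist().inv_cdf(cumulative)   # Z below which this proportion lies
    cutoffs[level] = MEAN + z * SD         # convert Z back to a raw score
    print(f"Level {level}: n={numbers[level]}, Z={z:.2f}, "
          f"scores below {cutoffs[level]:.1f}")
```

The results differ slightly from the table because the slide rounds each Z value before converting it to a raw score.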
Statistical inference
• Use a collection of observed values to make
inferences about a larger set of potential values.
• Classical problem of statistical inference: how to
infer from the properties of a part the likely
properties of the whole.
• Because of the way in which samples are selected,
it is often impossible to generalize beyond the
samples.
Population
• The largest class to which we can generalize the
results of an investigation based on a subclass, in
other words, the set of all possible values of a
variable.
• A population, for statistical purposes, is a set of
values.
• We need to be sure that the values that constitute
the sample somehow reflect the target statistical
population.
Sampling
• Random sampling gives us reasonable confidence
that our inferences from sample values to
population values are valid.
• The most common type of sampling frame is a list
(actual or notional) of all the subjects in the group
to which generalization is intended.
• What the techniques of statistics offer is a
common ground, a common measuring stick by
which experimenters can measure and compare
the strength of evidence for one hypothesis or
another that can be obtained from a sample of
subjects.
Sampling
• Careful considerations are needed to ensure the sample
represents the population.
e.g. The gravity of errors in written English as
perceived by two different groups: native English-speaking
teachers of English and Greek teachers of
English. Both samples contained individuals from
different institutions to avoid institutional attitude bias.
• Researchers have an inescapable duty to describe
carefully how their experimental material -- including
subjects -- was actually obtained. It is also good
practice to attempt to foresee some of the objections
that might be made about the quality of the material
and either attempt to forestall criticism or admit openly
to any serious defects.
Case Study
• Study the population and sample for the
following investigations:
• Vocabulary size
• Listening input and listening
comprehension
• Social backgrounds and learning strategy
Random Sampling
• Use the Table of Random Numbers
• Other methods
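One of the "other methods" is to let a pseudo-random number generator pick subjects from the sampling frame. The sketch below is illustrative only: the student IDs, the frame size of 5000 and the sample size of 50 are made-up values.

```python
import random

rng = random.Random(42)  # seeded generator so the draw can be repeated

# HYPOTHETICAL sampling frame: a list of all subjects in the target group.
sampling_frame = [f"S{i:04d}" for i in range(1, 5001)]

# Simple random sample without replacement: every subject has the same
# chance of being selected, and nobody is selected twice.
sample = rng.sample(sampling_frame, k=50)

print(sample[:5])
```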
Statistical Parameters
• Population parameters
• Mean: μ (mu [mjuː], English
equivalent: m)
• Standard deviation: σ (sigma [ˈsɪɡmə],
English equivalent: s)
• Sample statistics
• Mean: M
• Standard deviation: s
Other Greek Letters
• Σ: capital sigma, symbol of summation, English
equivalent: S
• ε: epsilon, symbol of error, English
equivalent: e
• α: alpha
• χ: chi [kai], English equivalent: x
Parameter Estimation
• Point estimator: a single number
calculated from a sample and used to
estimate a population parameter.
• Interval estimator: a likely
range within which the population value
may lie.
Standard error of the sample means
• If we repeatedly draw samples from the
population and calculate the mean of each
sample, these means will be approximately normally
distributed. The variability of these means around
the population mean is called the standard error of the
sample means, and is calculated as follows:
• Standard error σx = σ/√n
• When the population standard deviation σ is
unknown, we often use the sample standard
deviation s instead: σx = s/√n
Case
• If the following data are obtained from a
test
• N=132
• M=67
• S=6.5
• What is the standard error of the sample
means?
Case
• σx = s/√n
• =6.5/√132
• =6.5/11.49
• =0.566
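The same calculation as a short sketch; the function name `standard_error` is illustrative.

```python
import math


def standard_error(sd, n):
    """Standard error of the sample means, s / sqrt(n)."""
    return sd / math.sqrt(n)


se = standard_error(6.5, 132)
print(f"standard error = {se:.3f}")
```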
Confidence
• The probability with which we are confident
that the value falls within a given range, usually 95% or 99%.
• Procedure: calculate the Z score.
• Look up in the Normal Distribution Table
the critical value Zα/2, the Z score that cuts off a
tail probability of α/2.
• Compare Z and Zα/2.
Case
• N=132, M=67, S=6.5
• μ=? (α=0.05)
Case
• Z=(X-M)/s
• =(X-μ)/σx
• =(67-μ)/0.566
• -Zα/2 ≤ Z ≤ Zα/2
• -1.96 ≤ (67-μ)/0.566 ≤ 1.96
• -1.10936 ≤ 67-μ ≤ 1.10936
• -68.10936 ≤ -μ ≤ -65.89064
• 65.89064 ≤ μ ≤ 68.10936
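The interval can be recomputed without a printed table. A minimal sketch, assuming a two-sided interval at α=0.05 and using the standard library's `NormalDist` for the critical value:

```python
import math
from statistics import NormalDist

n, mean, sd = 132, 67.0, 6.5   # data from the case
alpha = 0.05

se = sd / math.sqrt(n)                       # standard error of the mean
z_crit = NormalDist().inv_cdf(1 - alpha / 2) # two-sided critical value, about 1.96

lower = mean - z_crit * se
upper = mean + z_crit * se
print(f"{1 - alpha:.0%} confidence interval: {lower:.2f} <= mu <= {upper:.2f}")
```

The bounds agree with the hand calculation to two decimal places; the small difference further out comes from rounding the standard error to 0.566 and the critical value to 1.96.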
t Distribution
• When the sample size is less than 30,
the standardized sample mean follows a t distribution
rather than the normal distribution.
• The t distribution is a family of curves, one for
each number of degrees of freedom.
• Degrees of freedom: the number
of values that are free to vary. In the t
distribution, df=n-1
Case
• Sample mean=63.16
• Sample standard deviation=7.25
• N=19
• μ=? (α=0.05)
Case
• Standard error=s/√19=7.25/4.36=1.66
• t=(X-M)/s
• =(X-μ)/σx
• =(63.16-μ)/1.66
• t0.05/2(18)=2.101
• -2.101 ≤ (63.16-μ)/1.66 ≤ 2.101
• -3.48766 ≤ 63.16-μ ≤ 3.48766
• -66.64766 ≤ -μ ≤ -59.67234
• 59.7 ≤ μ ≤ 66.6
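The small-sample interval follows the same steps with a t critical value. In this sketch the critical value 2.101 for df=18 is copied from the slide rather than computed, because Python's standard library has no t distribution; everything else is plain arithmetic.

```python
import math

n, mean, sd = 19, 63.16, 7.25   # data from the case
df = n - 1                      # degrees of freedom

# Critical value t(0.05/2, df=18), taken from the slide's table lookup.
T_CRIT = 2.101

se = sd / math.sqrt(n)          # standard error of the mean
margin = T_CRIT * se

lower = mean - margin
upper = mean + margin
print(f"df = {df}, standard error = {se:.2f}")
print(f"95% confidence interval: {lower:.2f} <= mu <= {upper:.2f}")
```

Without rounding the standard error to 1.66 first, the bounds come out at roughly 59.67 and 66.65, in line with the hand-calculated interval.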