Lecture 3
Chapter 2. Studying Normal Populations
It is often the case that collected data have
a distribution with the characteristic shape
of the Normal distribution.
Let’s have a look at an example…
Example – Female Haematocrit
Haematocrit measures the percentage of
blood volume occupied by packed red
blood cells.
Measurements taken from 126 female
medical students are as follows…
Female haematocrit measurements
42.0
46.0
49.0
44.0
55.0
44.0
48.0
44.0
41.0
40.0
44.0
36.0
42.0
45.0
42.0
38.0
45.0
41.0
42.0
40.0
41.0
43.0
40.0
46.0
40.0
38.0
35.0
40.5
44.0
46.0
39.0
44.0
41.0
34.5
38.5
44.5
45.0
40.0
46.0
42.0
44.0
49.0
45.0
41.0
45.0
42.0
40.0
48.0
42.0
46.0
44.0
38.0
43.0
44.0
41.0
44.0
36.0
42.0
40.0
45.5
43.0
45.0
37.0
35.0
42.0
44.0
42.0
43.0
39.0
39.0
44.0
36.0
43.0
43.0
42.0
42.0
32.5
43.0
49.0
36.0
41.5
39.0
43.0
42.0
42.0
42.0
43.0
42.0
46.0
40.5
38.0
46.0
40.0
40.0
41.0
41.0
40.0
42.0
44.0
40.0
39.0
45.0
42.0
39.0
42.0
44.0
36.0
45.0
45.0
49.0
43.0
48.0
46.0
44.0
43.0
35.0
41.0
41.0
40.0
43.0
41.5
42.0
40.0
41.0
46.0
42.0
Let’s look at the shape of the distribution of these data
using a histogram…
Example – Female Haematocrit
[Figure: Histogram of Haematocrit (F). Horizontal axis: Haematocrit (F), 32 to 56. Vertical axis: Frequency, 0 to 35.]
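As a rough sketch of how a histogram like this is built, the Python snippet below counts how many of the 126 measurements listed above fall into each class interval and prints a simple text histogram. The bin start (32) and bin width (2) are assumptions made for illustration; the slide does not say which bins were used, so the bar heights need not match the figure exactly.

```python
from collections import Counter

# The 126 female haematocrit measurements (%) listed above.
HAEMATOCRIT = [
    42.0, 46.0, 49.0, 44.0, 55.0, 44.0, 48.0, 44.0, 41.0,
    40.0, 44.0, 36.0, 42.0, 45.0, 42.0, 38.0, 45.0, 41.0,
    42.0, 40.0, 41.0, 43.0, 40.0, 46.0, 40.0, 38.0, 35.0,
    40.5, 44.0, 46.0, 39.0, 44.0, 41.0, 34.5, 38.5, 44.5,
    45.0, 40.0, 46.0, 42.0, 44.0, 49.0, 45.0, 41.0, 45.0,
    42.0, 40.0, 48.0, 42.0, 46.0, 44.0, 38.0, 43.0, 44.0,
    41.0, 44.0, 36.0, 42.0, 40.0, 45.5, 43.0, 45.0, 37.0,
    35.0, 42.0, 44.0, 42.0, 43.0, 39.0, 39.0, 44.0, 36.0,
    43.0, 43.0, 42.0, 42.0, 32.5, 43.0, 49.0, 36.0, 41.5,
    39.0, 43.0, 42.0, 42.0, 42.0, 43.0, 42.0, 46.0, 40.5,
    38.0, 46.0, 40.0, 40.0, 41.0, 41.0, 40.0, 42.0, 44.0,
    40.0, 39.0, 45.0, 42.0, 39.0, 42.0, 44.0, 36.0, 45.0,
    45.0, 49.0, 43.0, 48.0, 46.0, 44.0, 43.0, 35.0, 41.0,
    41.0, 40.0, 43.0, 41.5, 42.0, 40.0, 41.0, 46.0, 42.0,
]

BIN_START = 32.0  # assumed left edge of the first bin
BIN_WIDTH = 2.0   # assumed bin width


def bin_counts(values, start=BIN_START, width=BIN_WIDTH):
    """Count how many values fall in each bin [left, left + width)."""
    counts = Counter(int((v - start) // width) for v in values)
    # Include empty bins so that gaps in the data remain visible.
    return {start + i * width: counts[i] for i in range(max(counts) + 1)}


counts = bin_counts(HAEMATOCRIT)
for left, n in counts.items():
    print(f"{left:4.0f} to {left + BIN_WIDTH:4.0f} | {'#' * n} {n}")
```

The bars should be tallest for the bins in the low 40s and shrink towards both ends, which is the bell shape discussed on the slide.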
These data show the characteristic shape of the
Normal Distribution.
It is characterised by the symmetrical “bell
shape”, which corresponds to values near the
mean being more common, while values further
away “tail off” in terms of their frequencies.
A perfect Normal distribution curve looks like…
[Figure: Normal Distribution Frequency Curve. Horizontal axis: Standard Deviations from Mean, -4 to 4. Vertical axis: Frequency Curve, 0.0 to 0.4.]
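The slide shows the curve without giving its formula. As a sketch, assuming the usual standard Normal density exp(-z²/2) / √(2π), where z is the number of standard deviations from the mean, the Python snippet below tabulates the height of the curve at the points marked on the horizontal axis. The function name is our own choice for illustration.

```python
import math


def standard_normal_density(z):
    """Height of the standard Normal curve z standard deviations from the mean."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)


# Tabulate the curve at the tick marks -4, -3, ..., 4.
for z in range(-4, 5):
    print(f"{z:+d}  {standard_normal_density(z):.4f}")
```

The height at 0 is just under 0.4, matching the top of the vertical axis, and the values at -z and +z are equal, which is the symmetry of the bell shape.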
In order to understand what it really means for
data to be Normally distributed, we first need to
consider the idea of probability…
Probability
Probability is used to measure the likelihood of
an event occurring.
Definition
Suppose we were to repeat a particular experiment over
and over again.
Then the probability of a particular outcome A is
defined as the proportion of the total number of repeats
in which A would actually occur, if we were to keep on
repeating the experiment.
We denote this probability by Pr(A).
Probability Examples
1. Rolling a fair die
We roll a standard six-sided die. Let event A be that
the die lands with three spots face up.
Then the probability of the event A is:
Pr(A) = 1/6 ≈ 0.167
because in the long run, the proportion of times that A
happens will be 1/6.
Note that in this experiment there are six equally likely
outcomes, all with probability 1/6.
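To illustrate the "long run" idea in the definition, here is a minimal simulation sketch in Python. It uses the standard random module to stand in for a fair die, so the results are pseudo-random rather than real rolls; the function name and the seed are our own choices.

```python
import random


def long_run_proportion(repeats, seed=0):
    """Proportion of simulated die rolls that show three spots."""
    rng = random.Random(seed)  # fixed seed so the run can be repeated
    hits = sum(1 for _ in range(repeats) if rng.randint(1, 6) == 3)
    return hits / repeats


# The proportion should settle near 1/6 as the number of repeats grows.
for repeats in (60, 6_000, 600_000):
    print(f"{repeats:>7} rolls: {long_run_proportion(repeats):.4f}")
```

With only 60 rolls the proportion can be some way from 1/6, but with many more rolls it should settle close to 0.167.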
Probability Examples
2. Tossing a fair coin
You toss a fair coin once. Let event A be that the coin lands
heads up.
Then the probability of the event A is:
Pr(A) = 1/2 = 0.5
because in the long run, the proportion of times that A
happens will be 1/2. This time there are two possible
outcomes with equal probability.
Note that the probability scale runs between 0 and 1
inclusive. The higher the number, the more likely the
event.
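When all outcomes are equally likely, as for the die and the coin, the probability of an event can be found by counting: the number of outcomes in the event divided by the total number of outcomes. The Python sketch below does this with exact fractions. The helper name and the "impossible" and "certain" events are our own additions, included only to show the two ends of the probability scale.

```python
from fractions import Fraction


def probability(event, outcomes):
    """Pr(event) when every outcome in `outcomes` is equally likely."""
    favourable = [o for o in outcomes if event(o)]
    return Fraction(len(favourable), len(outcomes))


die = [1, 2, 3, 4, 5, 6]
coin = ["H", "T"]

p_three = probability(lambda o: o == 3, die)       # three spots face up
p_heads = probability(lambda o: o == "H", coin)    # coin lands heads up
p_impossible = probability(lambda o: o == 7, die)  # cannot happen
p_certain = probability(lambda o: o <= 6, die)     # always happens

for name, p in [("three", p_three), ("heads", p_heads),
                ("impossible", p_impossible), ("certain", p_certain)]:
    print(f"{name:>10}: {p} = {float(p):.3f}")
```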
Probability Examples
3. Buying a ticket for the UK National Lotto
You buy a single ticket for one draw of the UK National
Lotto. The event A is that your six numbers exactly
match the six main numbers drawn from 1, … , 49, so
that you win a share of the jackpot.
Then the probability of the event A is:
Pr(A) = 1 / 13,983,816 ≈ 0.0000000715
because there are 13,983,816 equally likely outcomes
for the six main numbers.
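The figure 13,983,816 is the number of ways of choosing 6 numbers from 49 when the order does not matter. As a quick check, the Python sketch below computes it with math.comb from the standard library (available in Python 3.8 and later); the variable names are our own.

```python
import math

NUMBERS_IN_DRAW = 49  # main numbers run from 1 to 49
NUMBERS_CHOSEN = 6    # six main numbers on a ticket

# Number of equally likely sets of six main numbers.
combinations = math.comb(NUMBERS_IN_DRAW, NUMBERS_CHOSEN)
jackpot_probability = 1 / combinations

print(combinations)  # 13983816
print(f"{jackpot_probability:.3g}")
```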
Assigning a probability to each individual outcome
only really makes sense for discrete outcomes, i.e. when
we can make a list of all the possible outcomes.
When the measurements are on a continuous
scale, such as the haematocrit measurements, then
there are infinitely many possible outcomes, and
it is not possible to list them.
The distribution of haematocrit outcomes has
roughly the Normal distribution shape:
[Figure: Fitted Normal Distribution for Female Haematocrit. Horizontal axis: Haematocrit (%), 30 to 55. Vertical axis: Probability Density, 0.00 to 0.12.]
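The slides do not say how the curve was fitted. One common approach, assumed here, is to use the Normal distribution whose mean and standard deviation equal the sample mean and sample standard deviation. The Python sketch below does this with statistics.NormalDist (Python 3.8 and later). To keep it short it uses only the first 18 measurements from the listing, so the printed values are for that subset and will differ from the curve in the figure, which is based on all 126.

```python
from statistics import NormalDist

# First 18 of the 126 measurements (%): a subset used only for brevity.
SUBSET = [
    42.0, 46.0, 49.0, 44.0, 55.0, 44.0, 48.0, 44.0, 41.0,
    40.0, 44.0, 36.0, 42.0, 45.0, 42.0, 38.0, 45.0, 41.0,
]

# Normal distribution with the sample mean and sample standard deviation.
fitted = NormalDist.from_samples(SUBSET)
print(f"mean = {fitted.mean:.2f}, sd = {fitted.stdev:.2f}")

# Height of the fitted density curve at the ticks on the horizontal axis.
for x in range(30, 60, 5):
    print(f"{x}  {fitted.pdf(x):.4f}")
```

The density is highest near the sample mean and falls away on either side, as in the fitted curve on the slide.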