Transcript Lecture 14
Physics 114: Lecture 14
Mean of Means
Dale E. Gary
NJIT Physics Department
The Goal of Measurement
When we make measurements of a quantity, we are mainly after two
things: (1) the value of the quantity (the mean), and (2) a sense of how
well we know this value (for which we need the spread of the distribution,
or standard deviation).
Remember that the goal of measurement is to obtain these two bits of
information. The mean is of no use without the standard deviation as well.
We have seen that repeated measurement of a quantity can be used to
improve the estimate of the mean. Let’s take a closer look at what is going
on.
Say we create a random set of 100 measurements of a quantity whose
parent distribution has a mean of 5 and standard deviation of 1:
x = randn(1,100)+5;
Create a histogram of that set of measurements:
[y, z] = hist(x,0:0.5:10);
Here, y is the histogram (frequency of points in each bin), and z is the bin
centers. Now plot it:
plot(z,y,'.')
If you prefer bars, use stairs(z,y).
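As a quick check of the steps above, the following sketch draws one sample, bins it, and confirms that the histogram accounts for every measurement. The names m and s are ours, chosen only for illustration.

```matlab
% Sketch: one sample of 100 measurements and its histogram.
x = randn(1,100) + 5;            % parent distribution: mean 5, standard deviation 1
[y, z] = hist(x, 0:0.5:10);      % y = counts per bin, z = the 21 bin centers
m = mean(x);                     % sample estimate of the parent mean
s = std(x);                      % sample estimate of the parent standard deviation
fprintf('bins = %d, total counts = %d\n', numel(z), sum(y));
fprintf('sample mean = %.3f, sample std = %.3f\n', m, s);
```

The bin count and total (21 and 100) are fixed by the construction; the sample mean and standard deviation change from run to run, which is the point of the slides that follow.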
Mar 22, 2010
Comparing Measurements
If you make repeated sets of 100 measurements, you will obtain different
samples from the parent distribution, whose averages are approximations
to the mean of the parent distribution.
Let’s make 100 sets of 100 measurements:
y = zeros(100,21);
for i = 1:100; x = randn(1,100)+5; [y(i,:) z] = hist(x,0:0.5:10); end
Plotting some of the histograms,
for i = 1:16; subplot(4,4,i); stairs(z-0.25,y(i,:)); axis([2,8,0,25]); end
you can see that the sample means vary from one set of 100
measurements to another.
[Figure: 4×4 grid of stair-step histograms for the first 16 sets of 100 measurements; horizontal axes run from 2 to 8, vertical axes from 0 to 25.]
Comparing Measurements
Now the mean can be determined from the original values (x_i) of the last set, which are still held in x:
mean(x)
or from the histograms themselves:
mean(y(100,:).*z)/mean(y(100,:))
Make sure you understand why these two should be the same (nearly),
and why they might be slightly different.
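One way to explore the comparison is the sketch below. It assumes the difference comes from binning: the histogram only remembers which 0.5-wide bin each value fell into, so computing the mean from the histogram amounts to replacing each value by its bin center. The names m_raw, m_hist, and m_binned are ours.

```matlab
% Sketch: raw mean versus mean computed from the histogram.
x = randn(1,100) + 5;
[y, z] = hist(x, 0:0.5:10);

m_raw  = mean(x);                     % uses the exact values
m_hist = sum(y .* z) / sum(y);        % equals mean(y.*z)/mean(y); the factor 1/21 cancels

% Replace each value by its nearest bin center (clipped to the end bins, as hist does).
x_binned = min(max(round(x / 0.5) * 0.5, 0), 10);
m_binned = mean(x_binned);            % should reproduce m_hist

fprintf('raw: %.4f  histogram: %.4f  binned values: %.4f\n', m_raw, m_hist, m_binned);
```

Because each value moves by at most half a bin width (0.25), the two means can differ, but the shifts largely average out across 100 values.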
Since we have saved the histograms, let’s print out the means of these 16
sample distributions:
for i=1:16; a = mean(y(i,:).*z)/mean(y(i,:)); fprintf('%f\n',a); end
Here is one realization of those means:
5.015 4.915 5.000 4.940
4.995 4.980 4.975 4.960
4.990 4.920 4.965 5.040
5.010 4.870 5.135 4.890
We might surmise that the mean of these means might be a better
estimate of the mean of the parent distribution, and we would be right!
Distribution of Means
Let’s now calculate the 100 means and plot them:
a = zeros(1,100); for i=1:100; a(1,i) = mean(y(i,:).*z)/mean(y(i,:)); end
subplot(1,1,1)
plot(a,'.')
This is the distribution of means.
[Figure: Distribution of Means. Mean Value (vertical axis, 4.7 to 5.4) versus Mean # (horizontal axis, 0 to 100).]
The mean of this distribution is 4.998, clearly very close to the mean of the parent distribution (5).
Mean of Means
We can think about this distribution in two different, but equivalent ways.
If we simply sum all of the histograms, we obtain a much better estimate of
the parent population:
mom = sum(y);
stairs(z-0.25,mom)
[Figure: the summed histogram of all 100 sets (horizontal axis 0 to 10, vertical axis 0 to 2000), shown over the 4×4 grid of individual histograms.]
mean(mom.*z)/mean(mom)
gives: 4.998
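The sketch below illustrates why the mean of the summed histogram matches the mean of the 100 individual means. It assumes, as in the lecture, that every set has the same number of measurements; with unequal set sizes the two would generally differ. The names m_sum and m_of_means are ours.

```matlab
% Sketch: mean of the summed histogram versus mean of the individual means.
y = zeros(100,21);
for i = 1:100
    x = randn(1,100) + 5;
    [y(i,:), z] = hist(x, 0:0.5:10);
end

a = (y * z') ./ sum(y, 2);            % 100x1 vector: mean of each histogram
mom = sum(y);                         % summed histogram (1x21)

m_sum      = sum(mom .* z) / sum(mom);   % mean from the summed histogram
m_of_means = mean(a);                    % mean of the 100 means

fprintf('summed histogram: %.4f   mean of means: %.4f\n', m_sum, m_of_means);
```

The two numbers agree to rounding error because each set contributes exactly 100 counts, so every set carries equal weight in both calculations.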
Mean of Means
Alternatively, we can think of these different estimates of the mean of the
original population as being drawn from a NEW parent population, one
representing the distribution of means.
This NEW parent population has a different (smaller) standard deviation
than the original parent population.
[Figure: two panels. Left: Distribution of Means, Mean Value versus Mean #. Right: Histogram of Means, Number of Occurrences of Mean versus Mean Value.]
std(a) is 0.0976
Calculation of Error in the Mean
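Recall in Lectures 11 and 12 we introduced the general formula for
propagation of errors of a combination of two measurements u and v as:
$$\sigma_x^2 \approx \sigma_u^2\left(\frac{\partial x}{\partial u}\right)^2 + \sigma_v^2\left(\frac{\partial x}{\partial v}\right)^2 + 2\sigma_{uv}^2\left(\frac{\partial x}{\partial u}\right)\left(\frac{\partial x}{\partial v}\right).$$
Generalizing further for our case of N measurements of the mean $\mu'$, and
ignoring correlations between measurements (i.e. setting the cross-terms to
zero), we have
$$\sigma_\mu^2 = \sum_i \left[\sigma_i^2\left(\frac{\partial\mu}{\partial x_i}\right)^2\right].$$
We can make the assumption that all of the $\sigma_i$ are equal (this is just saying
that the samples are all of equal size and drawn from the same parent
population). Also
$$\frac{\partial\mu}{\partial x_i} = \frac{\partial}{\partial x_i}\left(\frac{1}{N}\sum_j x_j\right) = \frac{1}{N},$$
so
$$\sigma_\mu^2 = \sum_i \left[\sigma^2\left(\frac{1}{N}\right)^2\right] = \frac{\sigma^2}{N}.$$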
Calculation of Error in the Mean
This is no surprise, and it says what we already knew, that the error in the
mean gets smaller as the inverse square root of the number of means
averaged:
$$\sigma_\mu = \frac{\sigma}{\sqrt{N}}.$$
Again, this is the case when all of the errors in the means used are equal.
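For the case in this lecture (parent standard deviation 1, N = 100 measurements per set) the formula predicts a spread of 1/sqrt(100) = 0.1 in the means, close to the std(a) of 0.0976 found earlier. The sketch below repeats that check by simulation. Note that it averages the raw values rather than the histograms, and the names nsets, spread_measured, and spread_expected are ours.

```matlab
% Sketch: compare the measured spread of the means with sigma/sqrt(N).
nsets = 100;                          % number of sets (number of means)
N     = 100;                          % measurements per set
sigma = 1;                            % parent standard deviation

x = sigma * randn(nsets, N) + 5;      % each row is one set of N measurements
a = mean(x, 2);                       % one mean per set (nsets x 1)

spread_measured = std(a);             % spread of the distribution of means
spread_expected = sigma / sqrt(N);    % prediction from the propagation of errors

fprintf('measured: %.4f   expected: %.4f\n', spread_measured, spread_expected);
```

With only 100 means, the measured spread itself fluctuates by several percent from run to run, so agreement to two significant figures is about as good as can be expected.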
What would we do if, say, some of the means were determined by
averaging different numbers of observations instead of 100 each time?
In this case, we can do what is called weighting the data. If we know
the different values of $\sigma_i$, then the weighted average of the data points is
$$\mu = \frac{\sum_i \left(x_i/\sigma_i^2\right)}{\sum_i \left(1/\sigma_i^2\right)}.$$
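A minimal sketch of the weighted average follows. The three estimates and their uncertainties are made-up numbers for illustration only, and wmean is a hypothetical helper, not a built-in function.

```matlab
% Sketch: weighted average of estimates with known, unequal uncertainties.
wmean = @(x, s) sum(x ./ s.^2) / sum(1 ./ s.^2);   % hypothetical helper

x = [5.02 4.95 5.10];      % hypothetical estimates of the mean
s = [0.10 0.10 0.20];      % hypothetical uncertainties (sigma_i) of each estimate

mu_w     = wmean(x, s);                 % the less certain third value counts for less
mu_equal = wmean(x, [0.1 0.1 0.1]);     % equal sigma_i: reduces to the ordinary mean

fprintf('weighted: %.4f   unweighted: %.4f\n', mu_w, mean(x));
```

When all of the uncertainties are equal the weights cancel, and the result is the same as mean(x).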
Error in the Weighted Mean
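In this case, if we want to combine a number of such weighted means to
get the error in the weighted mean, we still have to calculate the
propagation of errors:
$$\sigma_\mu^2 = \sum_i\left[\sigma_i^2\left(\frac{\partial\mu}{\partial x_i}\right)^2\right],$$
but now the $\sigma_i$ are not all the same, and also we use the weighted mean to
get the gradient of the mean:
$$\frac{\partial\mu}{\partial x_i} = \frac{\partial}{\partial x_i}\left[\frac{\sum_j \left(x_j/\sigma_j^2\right)}{\sum_j \left(1/\sigma_j^2\right)}\right] = \frac{1/\sigma_i^2}{\sum_j \left(1/\sigma_j^2\right)}.$$
Inserting that into the above equation, we have
$$\sigma_\mu^2 = \frac{1}{\sum_i \left(1/\sigma_i^2\right)}.$$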
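The result can be tried numerically with the sketch below. The uncertainties are made-up values for illustration, and werr is a hypothetical helper name.

```matlab
% Sketch: uncertainty of the weighted mean from the individual sigma_i.
werr = @(s) sqrt(1 / sum(1 ./ s.^2));   % hypothetical helper: sigma_mu

s = [0.10 0.10 0.20];                   % hypothetical uncertainties of three estimates
sigma_mu = werr(s);                     % smaller than the smallest sigma_i

% Equal uncertainties should recover sigma/sqrt(N): here 0.1/sqrt(4) = 0.05.
sigma_mu_equal = werr([0.1 0.1 0.1 0.1]);

fprintf('unequal: %.4f   equal: %.4f\n', sigma_mu, sigma_mu_equal);
```

The second call is a consistency check: with all $\sigma_i$ equal, the weighted formula reduces to the unweighted result $\sigma/\sqrt{N}$.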
Relative Uncertainties
In some cases, we do not necessarily know the uncertainties of each
measurement, but we do know the relative values of $\sigma_i$. That is, we may
know that some of our estimates of the mean used 100 measurements, and
some used only 25. In that case, we can guess that the latter
measurements have errors twice as large (since the standard deviations are
inversely proportional to the square-root of the number of measurements).
So, say $k w_i = 1/\sigma_i^2$. Then
$$\mu = \frac{\sum_i \left(x_i/\sigma_i^2\right)}{\sum_i \left(1/\sigma_i^2\right)} = \frac{\sum_i k w_i x_i}{\sum_i k w_i} = \frac{\sum_i w_i x_i}{\sum_i w_i}.$$
In other words, because of the nature of the ratio, the proportionality
constant cancels and we need only the relative weights.
To get the overall variance in this case, we must appeal to an average
variance of the data:
$$\sigma^2 = \frac{\sum_i w_i (x_i - \mu)^2}{\sum_i w_i}\cdot\frac{N}{N-1}.$$
So the standard deviation of the mean is found using this, as
$$\sigma_\mu = \frac{\sigma}{\sqrt{N}}.$$
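The sketch below applies these formulas to the 100-versus-25 example from the text. The three estimates are made-up numbers, and the variable names are ours. Since $1/\sigma_i^2$ is proportional to the number of measurements, the counts themselves can serve as relative weights.

```matlab
% Sketch: weighted mean and its uncertainty from relative weights only.
x = [5.02 4.95 5.10];            % hypothetical estimates of the mean
n = [100 100 25];                % number of measurements behind each estimate
w = n / min(n);                  % relative weights [4 4 1]; the overall scale is arbitrary

mu      = sum(w .* x) / sum(w);              % weighted mean
mu_alt  = sum(7*w .* x) / sum(7*w);          % rescaled weights give the same mean

N        = numel(x);
var_w    = (sum(w .* (x - mu).^2) / sum(w)) * N / (N - 1);   % average variance of the data
sigma_mu = sqrt(var_w) / sqrt(N);                            % uncertainty of the mean

fprintf('mu = %.4f, sigma_mu = %.4f\n', mu, sigma_mu);
```

Multiplying all of the weights by 7 is an arbitrary choice that stands in for the unknown constant k; it changes neither the mean nor the variance.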