Random Errors
Survey – extra credits (1.5pt)!
• Study investigating general patterns of college students’
understanding of astronomical topics
• There will be 3 to 4 surveys this semester.
• Anonymous survey (the accuracy of your responses will
not affect your course grade). But, be accurate, please!
• Your participation is entirely voluntary.
• SPARK: Assessments > Survey2
• The second survey is due: 11:59pm, March 27th (Sun.)
• Questions? - Hyunju Lee ([email protected]) or
Stephen Schneider ([email protected])
Funded by Hubble Space Telescope Education & Public Outreach
grant
Uncertainty
• Ultimately, all measurements of physical quantities are subject to
uncertainties.
• Variability in the results of repeated measurements arises because
variables that can affect the measurement result are impossible to
hold constant.
• Even if the circumstances could be precisely controlled, measurements
would still have an error associated with them, because measuring
apparatuses can only be manufactured with a finite level of quality (an
infinitely accurate instrument is only a theoretical abstraction)
• Steps can be taken to limit the amount of uncertainty, but it will
always be there, no matter how refined (and expensive) the technology
may be.
• So, the real goal of an experiment is to reduce the uncertainty in the
measurements to the degree needed to prove or disprove a theory
How to keep uncertainty at bay
• Even if errors cannot be eliminated, they can be
understood, controlled and minimized. For example:
• Systematic errors: need to be thoroughly investigated to
understand how they affect the experiment, and then
minimized (often this is the most difficult thing to do)
• Random Errors: if they “randomly” add to or subtract
from the true value, then a very effective way to
minimize them is to take repeated measures (under the
same conditions) and then take the average.
• When taking the average of a large number of measurements, the
measurements that fall below the true value compensate for those that
fall above it, and the net result is a much better estimate of the true value
The Average
• Suppose you have N repeated measurements of the
same physical quantity:
• x1, x2, x3, …, xN
• Their average is their sum divided by their number, namely:
• A = (x1 + x2 + … + xN) / N
• The average is a much better estimator of the true
value than any individual measurement
• The more numerous the measurements, the more
accurately their average estimates the true value
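As a quick illustration (my own sketch, not from the slides), this is how the average above is computed in Python; the sample values are hypothetical:

```python
# Average of N repeated measurements of the same physical quantity.
measurements = [2.94, 2.87, 2.97, 2.91, 2.85]  # hypothetical pendulum periods

N = len(measurements)
average = sum(measurements) / N  # A = (x1 + x2 + ... + xN) / N

print(f"N = {N}, average = {average:.3f}")
```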
Why does the average work? (I)
• Of course, this is no magic!
• Think of each measurement xi as the sum of the True
Value (Tv) plus the Random Error (ε):
• xi = Tv + εi
• In each measurement, the Random Error
unpredictably adds to (positive ε) or subtracts from
(negative ε) the True Value Tv
• The magnitude of ε, too, varies at random from
measurement to measurement. Sometimes ε is big, sometimes small
Why does the average work? (II)
• Examples of ε’s:
• -0.1, +0.15, -1.5, +1.3, -0.01, +0.7, +0.9, -0.005, +1.0, … you get the idea!
• When we add measures together to take the average, some of the
negative ε’s compensate for the positive ε’s:
• If the sum of all the random errors (the ε’s) were zero, the average
would be exactly equal to the True Value!
• But in the average of only a few measurements, the compensation is
almost certainly crude:
• For example, the sum of the first two ε’s is -0.1+0.15 = +0.05
• The more measurements we average together, the better the
compensation (i.e. the closer the sum of all the errors is to zero)
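This compensation can be checked directly with a short simulation (a sketch of my own; the ε's are drawn from a Gaussian with a typical size of 0.5, an arbitrary illustrative choice):

```python
import random

random.seed(0)  # fixed seed so the illustration is reproducible

# Simulated random errors, symmetric around zero, typical magnitude 0.5
for n in [10, 100, 10_000]:
    errors = [random.gauss(0.0, 0.5) for _ in range(n)]
    mean_error = sum(errors) / n
    # The average of the eps's shrinks toward zero as n grows
    print(f"n = {n:6d}: average of the errors = {mean_error:+.4f}")
```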
Why many repeated measurements?
• In a large sample of measurements the likelihood of finding pairs of
ε’s, one positive and the other negative, that have nearly exactly the
same absolute value is high:
• E.g.: -0.301 and +0.299; -0.001 and +0.0009
• In other words, large numbers of measurements explore more
thoroughly all the possible realizations (i.e. all the possible values) of
the random errors.
• As long as the errors are symmetrically distributed relative to zero,
i.e. there are as many positive ε's as negative ones with the
same absolute values, the compensation will be nearly perfect.
• Larger numbers of measurements have a larger information
content than smaller ones because they contain more realizations of
the random errors.
• The important thing to know: the distribution of the errors
The probability distribution:
how many times is a measure with a given value observed?
•Measures closer to the true
value are more likely to occur
•In other words, measures with
smaller random errors are more
likely than those with larger
errors (blue histogram: peaky
probability distribution).
•Greatly deviant measures can be
found too, only more rarely
•Uniformly distributed random
numbers, by contrast, do not cluster
around a peak: all values are equally
likely (red histogram: flat probability
distribution). E.g.: tossing a die
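Both kinds of distribution are easy to simulate (a sketch under my own assumptions: Gaussian errors with a spread of 0.5 for the peaky case, a fair six-sided die for the flat case):

```python
import random
from collections import Counter

random.seed(1)

# Flat distribution: a fair die -- every face is (roughly) equally populated
die_counts = Counter(random.randint(1, 6) for _ in range(6000))
print("die faces:", dict(sorted(die_counts.items())))

# Peaky distribution: simulated measurements cluster around the true value
true_value = 1.0
measures = [true_value + random.gauss(0.0, 0.5) for _ in range(6000)]
binned = Counter(round(m, 1) for m in measures)
peak_bin = max(binned, key=binned.get)
print(f"most populated bin: {peak_bin} (close to the true value {true_value})")
```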
How big can errors be?
The dispersion
• But how large an error can one encounter?
• That depends on how good the measures are.
• This is reflected in the width of the distribution function
• The larger the width of the distribution, the larger the probability to have big random
errors
• The width (a.k.a. dispersion) is estimated by the standard deviation:
• σ = √( [ (x1 − A)² + (x2 − A)² + … + (xN − A)² ] / N )
• …here A is the average of the measurements
• In other words, one takes the average of the errors squared!
• Why squared? Because squared numbers are positive, so that positive and negative
errors do not cancel in the sum. One wants to know the average error magnitude!
• The standard deviation is basically the average of the errors' absolute value (the final
square root compensates for having taken the square of the errors)
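Computed exactly as described above (a minimal sketch; the data values are hypothetical):

```python
import math

measurements = [2.94, 2.87, 2.97, 2.91, 2.85]  # hypothetical values
N = len(measurements)
A = sum(measurements) / N  # the average

# Average of the squared deviations from A, then the square root
sigma = math.sqrt(sum((x - A) ** 2 for x in measurements) / N)

print(f"average A = {A:.3f}, standard deviation = {sigma:.3f}")
```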
The class’ measurements of the period of the pendulum
3.00, 2.72, 2.94, 2.87, 2.69, 2.97, 2.94, 2.72, 2.94,
2.91, 2.87, 2.97, 2.85, 2.60, 2.50, 2.60, 3.00, ……
The average of many measures is
a much more accurate estimator
of the true value than the
individual measures.
The standard deviation is an
estimate of the typical error
The importance of repeated
measures
• Let’s do a computer simulation
• i.e. let’s simulate measurements of the period of the pendulum.
• The true value is 1.000 sec, and the typical random error
of the measure is 0.5 sec.
• (…but let’s pretend we do not know this).
• Then let’s take three sets of data: one set with 100
measurements, one with 1,000, and one with 10,000
• Then let’s compare the results by calculating the
average and standard deviation, and plotting the distribution
of the measurements (see the sketch below)
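A sketch of this simulation (the true value and typical error are those stated above; the seed and the implementation details are my own choices):

```python
import math
import random

random.seed(42)  # reproducible illustration

TRUE_VALUE = 1.000    # seconds (known only to the simulation)
TYPICAL_ERROR = 0.5   # seconds

for n in [100, 1_000, 10_000]:
    data = [TRUE_VALUE + random.gauss(0.0, TYPICAL_ERROR) for _ in range(n)]
    avg = sum(data) / n
    sigma = math.sqrt(sum((x - avg) ** 2 for x in data) / n)
    print(f"{n:6d} measurements: average = {avg:.4f} s, std dev = {sigma:.4f} s")
```

As the sample grows, the average should land progressively closer to 1.000 s (its error shrinks like σ/√N), while the standard deviation stays near 0.5 s, the typical error of a single measurement.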
The larger the sample of measurements,
the more accurate the average
[Figure: histograms of the three simulated samples: 100, 1,000, and 10,000 measurements]
10,000 measurements contain much more information
than 1,000 measurements, which contain much more
information than 100 measurements, which contain
much more information than 1 measurement!
That is why large samples are important
Distributions in Astronomy
the distribution of galaxy luminosity
The Distribution of Galaxy
Luminosity
• There are many more faint
galaxies than bright ones
• In fact, the number of very bright
ones is uncertain, because they
are rare
• (step the telescope a bit to the
right, and you lose the two very
bright ones)
• The distribution of galaxy
luminosity (and colors) contains
key information on how these
systems formed and developed
[Figure: the galaxy luminosity distribution, from the bright end to the faint end]
More on distributions
• When tossing one die, what is the distribution of
probability of each value?
• A: all values are equally likely
• B: some values are more likely than others
• When tossing two dice, what is the distribution
of probability of the sum of the two values?
• A: all sums are equally likely
• B: some sums are more likely than others
• Let’s try…
The answer:
can you explain why?
•Each die face has equal probability
•However, the probability to get 2
or 12 is low, because these can only
be made with 1+1 or 6+6.
•Much more likely are sums that
can be made by more combinations,
such as 7, which can be made as 1+6,
2+5, 3+4 (and the reversed orders).
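The counting argument can be verified with a few lines of Python (a sketch; it enumerates all 36 equally likely ordered pairs of faces):

```python
from collections import Counter

# Tally how many of the 36 equally likely (face1, face2) pairs give each sum
sums = Counter(a + b for a in range(1, 7) for b in range(1, 7))

for total in sorted(sums):
    print(f"sum {total:2d}: {sums[total]} combinations out of 36")
```

Sum 7 comes out with 6 combinations (the most), while 2 and 12 have only 1 each, matching the argument above.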
Information content of
distributions
• The shape of the distribution tells us which values are
more likely to be found, and which are less likely
• In a Gaussian (bell-shaped) distribution, the most likely
value of all is the true value
• The least likely values are those with big deviations from
the true value
• Unfortunately, medium-size deviations are likely, too.
• This is why the standard deviation is a good indicator of
the random error
• The average is a powerful way to “average out” these
deviations