Lecture 1 - University of Cape Town

Lecture 3
• Good vs evil programming practices.
• Types of data.
• Noise & random variables.
– Good books:
• “Data reduction and Error Analysis for the Physical
Sciences”, Bevington, 1969
• “Practical Statistics for Astronomers”, Wall &
Jenkins.
NASSP Masters 5003F - Computational Astronomy - 2009
Good vs. EVIL programming habits.
• The good:
– Make your code as unspecialized as possible.
– Chunk tasks into functions and modules.
– Use the ‘need to know’ principle.
– Bytes are cheap – use plenty!!
• The bad:
– GOTO – Python has abolished this statement.
This is a good thing – goto is exceptionally
evil!
– Equality testing against reals. Eg
if (myvar == 7.2):
• Only do this against zero.
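A minimal sketch of why equality tests against reals are risky, and one common workaround. It assumes Python 3.5 or later (for `math.isclose`); the values and the tolerance are chosen purely for illustration.

```python
import math

# 0.1 + 0.2 is not exactly 0.3 in binary floating point.
myvar = 0.1 + 0.2
exact_match = (myvar == 0.3)

# Compare within a tolerance instead (tolerance chosen for illustration).
close_match = math.isclose(myvar, 0.3, rel_tol=1e-9)

print(exact_match, close_match)
```

The exact test fails even though the two numbers agree to about 16 significant figures; the tolerance test succeeds.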
Good vs. EVIL programming habits.
• The bad continued:
– Changing function arguments. Eg
def hobbit_state(bilbo, sam):
    sam = 'hungry'
– This is actually only ugly in Python...BAD is
doing it with lists. Try the following:
>>> def moonshine(list_of_moons):
...     list_of_moons.append('half')
...
>>> mondlicht = ['new', 'blue']
>>> moonshine(mondlicht)
>>> print(mondlicht)
['new', 'blue', 'half']
– This is what is known as a side effect.
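One way to avoid the side effect is to build and return a new list rather than modifying the argument. This is only a sketch; the name `moonshine_pure` is made up for the illustration.

```python
def moonshine_pure(list_of_moons):
    """Return a new list with 'half' appended; the input is left untouched."""
    return list_of_moons + ['half']


mondlicht = ['new', 'blue']
more_moons = moonshine_pure(mondlicht)

print(mondlicht)    # the caller's list is unchanged
print(more_moons)   # the extended copy
```

The caller now decides explicitly whether to keep the old list, the new one, or both.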
Good vs. EVIL programming habits.
• The bad continued:
– Using the names of built-ins for variable names
(this hides the built-in for the rest of the program).
>>> open = 4
– Leaving some values untested, eg:
if (nuddervar < 5.9):
    pass  # do something
elif (nuddervar > 5.9):
    pass  # do something else
# next statement is after the 'if' block; nuddervar == 5.9 was never handled.
• The ugly:
– Too-short variable names for important
objects. Eg:
>>> a = [num_stars, output_file, 'galactic']
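Returning to the 'untested values' item on this slide: a sketch of the same test with a final `else`, so that every input lands in some branch. The function name `classify` and its labels are hypothetical.

```python
def classify(nuddervar, threshold=5.9):
    """Return a label for every possible input, including the boundary."""
    if nuddervar < threshold:
        return 'below'
    elif nuddervar > threshold:
        return 'above'
    else:
        # Reached when nuddervar == threshold, and also for NaN,
        # since every comparison with NaN is False.
        return 'neither'


print(classify(5.0), classify(6.0), classify(5.9))
```

Whether the boundary case deserves its own branch or belongs with one of the others depends on the problem; the point is that the choice is made deliberately.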
Good vs. EVIL programming habits.
• The ugly continued:
– ‘Hard-wired’ values. If you do this (and it is,
like many ‘evil’ practices, sometimes
convenient) then try NEVER to do it inside a
function – but only in the top-level program.
numIterations = 1000
– Ambiguous imports:
>>> from pyfits import *
>>> open('myfile')
– Does it use the pyfits open or the standard
Python open? Better to always do something
like:
>>> import pyfits as pf
>>> pf.open('myfile')
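The same trap can be shown without installing anything, using the standard-library `os` module as a stand-in for pyfits (an assumption made only so the sketch is self-contained). The star import is run in a scratch dictionary so that it cannot disturb the surrounding program.

```python
import builtins
import os

# Run the star import in a scratch namespace so it cannot affect this module.
scratch = {}
exec("from os import *", scratch)

# After the star import, the name 'open' refers to os.open, not the built-in.
shadowed = scratch['open'] is os.open
print(shadowed)

# With a plain 'import os' the two functions stay distinct and unambiguous.
print(os.open is builtins.open)
```

After `from os import *`, a call such as `open('myfile')` would reach the low-level `os.open`, which expects different arguments from the built-in.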
Programming aims – conflicting criteria:
• Runs fast
• Low memory use
• Maintainable
• Portable
• Easy to build
• Low bug count
• Man-hours to write it
• User-friendly UI
• Compromise is ALWAYS necessary;
• Different situations call for different ordering of the priorities.
Recommended aims for you:
• Quality results.
– Make your code robust.
• this means, don’t make it sensitive to external changes.
– generic
– flexible (no hard-wiring)
– Make it easy to debug.
• Compartmentalisation (= cut into independent chunks)
– ‘need-to-know’
– no side-effects.
• Saving your time.
– Write maintainable code:
• self-documenting.
• avoid the ‘sin of pride’ - no dense, incomprehensible code
Astronomy data – binned:
• 1-d:
– Time series (aka light curves)
– Spectra
• vs frequency…
• wavelength…
• energy…
• recession velocity… etc
• 2-d:
– Modern images (CCD imposes binning)
• 3-d:
– Cubes!
Bins
• Usually a regular sequence – all bins the
same size.
• Have to consider bin widths (equivalently,
locations of bin edges).
• Can only store a finite number of bins…
– hence there must be a first and a last bin,
– thus also there may be data which fall outside
the sequence of bins.
• Need to know how to go from the bin
number or index to the ‘world coordinates’
of the data it contains.
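For a regular sequence of bins, the mapping between bin index and world coordinate can be sketched as below. The function names, the 0-based indexing and the half-open bin convention (lower edge included, upper edge excluded) are assumptions made for the illustration.

```python
import math


def bin_centre(index, first_edge, bin_width):
    """World coordinate of the centre of bin number `index` (0-based)."""
    return first_edge + (index + 0.5) * bin_width


def bin_index(value, first_edge, bin_width, num_bins):
    """Bin number containing `value`, or None if it falls outside the bins."""
    index = math.floor((value - first_edge) / bin_width)
    if 0 <= index < num_bins:
        return index
    return None


# Ten bins of width 10 covering 100 to 200 (arbitrary units).
print(bin_centre(0, 100.0, 10.0))
print(bin_index(137.0, 100.0, 10.0, 10))
print(bin_index(250.0, 100.0, 10.0, 10))
```

Returning `None` for out-of-range data makes the 'falls outside the sequence of bins' case explicit instead of silently wrapping to a negative index.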
Astronomy data – unbinned
• Lists of sources, spectral lines or other
objects.
• A sequence of observations (= an
unbinned or asynchronous time series).
• A list of photons – “time-tagged events”.
– Actually there is usually binning at some level.
– But, if there is on average <<1 event per bin,
can often treat them as unbinned. There will
be examples of this during the course.
• Photographic plates.
Astronomy data continued
• One conceptual way to divide up data:
– Signal
– Background
– Noise
• Gaussian or ‘white’ noise (thermal)
• Poisson (quantum)
• 1/f or ‘red’ noise (fractal Nature)
• Other filtered noise
• Note: difference between signal and
background is often an ‘academic
question’.
Astronomy data continued
• Another way to think of
things:
– Truth
– Measurement
• = truth × instrumental
effects + noise
– Estimated truth –
sometimes called the
model.
• Usually expressed in
terms of a small number
of parameters.
• The aim of much data
analysis is to determine
best-fit parameters.
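A toy version of this picture, with a single parameter. The numbers are invented, the 'instrumental effect' is reduced to a known multiplicative gain, and the noise is taken to be Gaussian; all of these are simplifying assumptions.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

true_flux = 5.0    # the 'truth' (hypothetical value)
gain = 2.0         # a stand-in for the instrumental effects
noise_sigma = 1.0  # standard deviation of the added noise

# measurement = truth x instrumental effects + noise
measurements = [true_flux * gain + random.gauss(0.0, noise_sigma)
                for _ in range(10000)]

# One-parameter model: measurement = flux * gain. For equal Gaussian errors
# the least-squares best-fit flux is the sample mean divided by the gain.
estimated_flux = sum(measurements) / len(measurements) / gain
print(estimated_flux)
```

The estimate is close to, but not exactly equal to, the true value; how close it should be expected to come is what the rest of the lecture's material on noise addresses.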
The probability density function, aka the PDF, or
the probability distribution.
[Figure: p(x) plotted against x.]
Units: probability per unit x.
Area under the curve ought to be equal to 1.
Average μ: $\mu = \langle x \rangle = \int dx\, x\, p(x)$
Variance σ²: $\sigma^2 = \int dx\, (x - \mu)^2\, p(x)$
Standard deviation σ is sqrt(variance).
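The three integrals on this slide (area, average, variance) can be checked numerically for any well-behaved p(x). The sketch below uses a Gaussian with μ = 1 and σ = 2 as an arbitrary test case and a simple midpoint rule; the function names and the integration range are choices made for the illustration.

```python
import math


def gaussian_pdf(x, mu=1.0, sigma=2.0):
    """Gaussian probability density, used here only as a test case."""
    norm = sigma * math.sqrt(2.0 * math.pi)
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / norm


def integrate(func, lo, hi, num_steps=20000):
    """Midpoint-rule approximation to the integral of func from lo to hi."""
    step = (hi - lo) / num_steps
    return step * sum(func(lo + (i + 0.5) * step) for i in range(num_steps))


lo, hi = -30.0, 30.0  # wide enough that the neglected tails are negligible
area = integrate(gaussian_pdf, lo, hi)
mean = integrate(lambda x: x * gaussian_pdf(x), lo, hi)
variance = integrate(lambda x: (x - mean) ** 2 * gaussian_pdf(x), lo, hi)

print(area, mean, variance, math.sqrt(variance))
```

The results should come out very close to 1, 1, 4 and 2 respectively.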
Random variables
• Important properties of a random variable X:
– The probability density p(x). Contains all the information.
– The mean or average: $\mu = \langle x \rangle = \int dx\, x\, p(x)$
– The variance: $\sigma^2 = \int dx\, (x - \mu)^2\, p(x)$
These are not guaranteed to exist!
• It is important to distinguish between the ideal values of
these and estimates of them which one can calculate
from a sample [x1,x2,...xN] of X.
• Although the ideal values are formally unattainable, in
practice good estimates of them may be available. Eg:
– There may be formulae which predict p, μ and (often most
importantly) σ² (see eg radio astronomy).
– Long-term calibration measurements can provide good
estimates (true for most scientific instruments).
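The distinction between ideal values and sample estimates can be seen by drawing a sample from a distribution whose μ and σ are known in advance. This sketch assumes Gaussian-distributed values and uses the standard-library `statistics` module; the numbers are arbitrary.

```python
import random
import statistics

random.seed(2)  # fixed seed so the sketch is repeatable

ideal_mean, ideal_sigma = 10.0, 3.0  # the 'ideal' values (known here by construction)
sample = [random.gauss(ideal_mean, ideal_sigma) for _ in range(5000)]

# Estimates calculated from the sample [x1, x2, ... xN].
mean_estimate = statistics.mean(sample)
variance_estimate = statistics.variance(sample)  # divides by N - 1

print(mean_estimate, variance_estimate)
```

The estimates scatter around 10 and 9 without ever being guaranteed to equal them; a different seed gives slightly different numbers.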
The ‘survival function.’
• Also often of interest is an integral over the
probability density:
$P(x > x_0) = \int_{x_0}^{\infty} dx\, p(x)$
[Figure: p(x) plotted against x, with 0 and x0 marked on the x axis.]
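For a Gaussian p(x) the survival function has a closed form in terms of the complementary error function. A minimal sketch, assuming a Gaussian with mean `mu` and standard deviation `sigma` (the function name is made up):

```python
import math


def gaussian_survival(x0, mu=0.0, sigma=1.0):
    """P(x > x0) for a Gaussian with mean mu and standard deviation sigma."""
    return 0.5 * math.erfc((x0 - mu) / (sigma * math.sqrt(2.0)))


# Probability of exceeding the mean, and of exceeding it by 1, 2 and 3 sigma.
for num_sigma in (0.0, 1.0, 2.0, 3.0):
    print(num_sigma, gaussian_survival(num_sigma))
```

This is the quantity usually quoted when asking how likely it is that noise alone would produce a value at least as large as the one observed.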
Example of a survival function
Joint and conditional probabilities
• Where there are M>1 random variables one
can talk of their joint probability
distribution, which is a function in M
dimensions.
– Eg the JPD for variables X and Y is written
p(x,y).
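– Only if X and Y are uncorrelated (= their
covariance $\sigma^2_{X,Y} = 0$) can their JPD be
decomposed to p(x)p(y). Zero covariance is necessary
but not sufficient: the decomposition holds exactly
when X and Y are statistically independent.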
Joint and conditional probabilities
• To obtain p(x) from p(x,y) you have to
integrate or marginalize:
p x    dy p x, y 
• Another way to write this is
p  x    dy p x y  p  y 
• where p(x|y) is the conditional probability
density of x given a particular value of y.
– You can think of it as a slice through p(x,y) at y.
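A discrete analogue makes the two ways of writing p(x) concrete: the integrals become sums over a small table. The joint probabilities below are invented for the illustration.

```python
# Hypothetical joint probabilities p(x, y); rows are x values, columns are y values.
joint = [[0.10, 0.20, 0.10],
         [0.30, 0.10, 0.20]]

num_x = len(joint)
num_y = len(joint[0])

# Marginalize: p(x) = sum over y of p(x, y), and likewise for p(y).
p_x = [sum(joint[i][j] for j in range(num_y)) for i in range(num_x)]
p_y = [sum(joint[i][j] for i in range(num_x)) for j in range(num_y)]

# Conditional probability: p(x|y) = p(x, y) / p(y), a normalised slice at fixed y.
p_x_given_y = [[joint[i][j] / p_y[j] for j in range(num_y)] for i in range(num_x)]

# The second form: p(x) = sum over y of p(x|y) p(y).
p_x_again = [sum(p_x_given_y[i][j] * p_y[j] for j in range(num_y))
             for i in range(num_x)]

print(p_x, p_x_again)
```

Both routes give the same marginal, up to rounding error.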
The two most important distributions:
1. Gaussian
$p(x, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{x^2}{2\sigma^2}\right)$
[Figure: plot of p(x).]
The two most important distributions:
2. Poisson
…but note Central Limit theorem.
(= i)
Examples: Gauss vs Poisson noise
The ‘hat’ means
‘estimate’.
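A rough way to generate the two kinds of noise and compare estimates of their mean and variance. The Poisson sampler below is Knuth's multiplication method, written out because the standard library has no Poisson generator; it is only suitable for small mean counts. The means and sample sizes are arbitrary.

```python
import math
import random
import statistics

random.seed(3)  # fixed seed so the sketch is repeatable


def poisson_sample(mean_count):
    """Draw one Poisson variate (Knuth's method; fine for small means)."""
    limit = math.exp(-mean_count)
    count, product = 0, random.random()
    while product > limit:
        count += 1
        product *= random.random()
    return count


num_samples = 5000
poisson_noise = [poisson_sample(4.0) for _ in range(num_samples)]
gauss_noise = [random.gauss(4.0, 1.0) for _ in range(num_samples)]

# The 'hatted' quantities: estimates of the mean and variance from each sample.
poisson_mean_hat = statistics.mean(poisson_noise)
poisson_var_hat = statistics.variance(poisson_noise)
gauss_mean_hat = statistics.mean(gauss_noise)
gauss_var_hat = statistics.variance(gauss_noise)

print(poisson_mean_hat, poisson_var_hat)
print(gauss_mean_hat, gauss_var_hat)
```

For Poisson noise the variance estimate tracks the mean estimate, whereas for Gaussian noise the two are independent parameters; here the Gaussian variance stays near 1 while its mean is near 4.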