f(t) - AAVSO
Basic Time Series
Analyzing variable star data
for the amateur astronomer
What Is Time Series?
Single variable x that changes over time t
can be multiple variables, but we won't worry about that
Light curve: x = brightness (magnitude)
Each observation consists of two numbers
Time t is considered perfectly precise
The data value x (the observation/measurement/estimate) is not perfectly precise
Two meanings of “Time Series”
TS is a process, how the variable
changes over time
TS is an execution of the process, often
called a realization of the TS process
A realization (observed TS) consists of
pairs of numbers (tn,xn), one such pair for
each observation
Goals of TS analysis
Use the process to define the behavior of
its realizations
Use a realization, i.e. observed data (tn,xn),
to discover the process
This is our main goal
Special Needs
Astronomical data creates special
circumstances for time series analysis
Mainly because the data are irregularly
spaced in time – uneven sampling
Sometimes the time spacing (the
“sampling”) is even pathological – with big
gaps that have periods all their own
Analysis Step 1
Plot the data and look at the graph!
(visual inspection)
Eye+brain combination is the world’s best
pattern-recognition system
BUT – also the most easily fooled
“pictures in the clouds”
Use visual inspection to get ideas
Confirm them with numerical analysis
Data = Signal + Noise
True brightness is a function of time f(t)
it’s probably smooth (or nearly so)
There’s some measurement error ε
it’s random
It’s almost certainly not smooth
Additive model: data xn at time tn is sum of
signal f(tn) and noise εn
xn = f(tn) + εn
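For concreteness, a minimal Python sketch of this additive model; the 50-day period, 8th-magnitude baseline, noise level, and observation times below are invented purely for illustration:

import numpy as np

# Simulate the additive model x_n = f(t_n) + eps_n with invented numbers.
rng = np.random.default_rng(42)

t = np.sort(rng.uniform(0.0, 300.0, size=150))   # uneven observation times (days)
f = 8.0 + 0.5 * np.sin(2 * np.pi * t / 50.0)     # a smooth "true brightness" f(t), in magnitudes
eps = rng.normal(0.0, 0.2, size=t.size)          # random measurement error, sigma = 0.2 mag
x = f + eps                                      # the observed data: signal plus noise

# Each observation is a pair (t_n, x_n): t is taken as exact, x is not.
for tn, xn in zip(t[:3], x[:3]):
    print(f"t = {tn:7.2f}   x = {xn:6.3f}")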
Noise is Random
That’s its definition!
Deterministic part = signal
Random part = noise
Usually – the true brightness is
deterministic, therefore it’s the signal
Usually – the noise is measurement error
Achieve the Goal
Means we have to figure out how the
signal behaves and how the noise
behaves
For light curves, we usually just assume
how the noise behaves
But we still should determine its
parameters
What Determines Random?
Probability distribution (pdf or pmf)
pdf: probability that the value falls in a
small range of width dε, centered on ε is
Probability = P(ε) dε
pmf: probability that the value is ε is P(ε)
pdf/pmf has some mean value μ
pdf/pmf has some standard deviation σ
Most Common Noise Model
i.i.d. = “independent identically distributed”
Each noise value is independent of others
P12(x1,x2) = P1(x1)P2(x2)
They’re all identically distributed
P1(x1) = P2(x2)
What is the Distribution?
Most common is Gaussian (a.k.a. Normal)
P(ε) = [1 / √(2πσ²)] · e^(−(ε − μ)² / (2σ²))
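A quick sanity check of that formula in Python, comparing the hand-coded pdf with scipy's built-in normal distribution (the values of μ, σ, and ε below are arbitrary examples):

import numpy as np
from scipy.stats import norm

# Evaluate P(eps) = exp(-(eps - mu)^2 / (2 sigma^2)) / sqrt(2 pi sigma^2)
mu, sigma = 0.0, 0.2                      # example: unbiased noise, 0.2 mag scatter
eps = np.linspace(-1.0, 1.0, 5)

p_manual = np.exp(-(eps - mu) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)
p_scipy = norm.pdf(eps, loc=mu, scale=sigma)

print(np.allclose(p_manual, p_scipy))     # True: both give the same pdf values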
Noise Parameters
μ = mean = <ε>
Usually assumed zero (i.e., data unbiased)
σ² = variance = <(ε − μ)²>
σ = √(σ²) = standard deviation
Typical value is 0.2 mag. for visual data
Smaller for CCD/photoelectric (we hope!)
Note: don't disparage visual data; what they
lack in individual precision they make up for
by the power of sheer numbers
Is the default noise model right?
No! We know it’s wrong
Bias: μ values not zero
NOT identically distributed – different
observers have different μ, σ values
Sometimes not even independent
(autocorrelated noise)
BUT – i.i.d. Gaussian is still a useful
working hypothesis, so we won't worry about that
Even if …
Even if we know the form of the noise …
We still have to figure out its parameters
Is it unbiased (i.e. centered at zero so μ = 0)?
How big does it tend to be (what's σ)?
And …
We still have to separate the signal from
the noise
And of course figure out the form of the
signal, i.e.,
Figure out the process which determines
the signal
Whew!
Simplest Possible Signal
None at all!
f(t) = constant = β₀
This is the null hypothesis for many tests
But we can’t be sure f(t) is constant …
… that’s only a model of the signal
Separate Signal from Noise
We already said
data = signal + noise
Therefore
data – signal = noise
Approximate signal by model
Approximate noise by residuals
data – model = residuals
xn – yn = Rn (where yn is the model value at time tn)
If model is correct, residuals are all noise
Estimate Noise Parameters
Use residuals Rn to estimate noise
parameters
Estimate the mean μ by the average of the residuals:
R̄ = (1/N) Σ Rⱼ  (sum over j = 1, …, N)
Estimate the standard deviation σ by the sample standard deviation:
s = √[ Σ (Rⱼ − R̄)² / (N − 1) ]
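A small Python sketch of these two estimates; the handful of magnitudes and the constant model below are invented for illustration:

import numpy as np

def noise_parameters(x, y):
    """Estimate noise parameters from residuals R = data - model."""
    R = np.asarray(x) - np.asarray(y)
    R_bar = R.mean()          # estimate of the noise mean mu
    s = R.std(ddof=1)         # sample standard deviation (divisor N - 1)
    return R_bar, s

x = np.array([8.12, 7.95, 8.31, 8.02, 7.88, 8.20])   # invented magnitudes
y = np.full_like(x, x.mean())                        # simplest model: a constant
print(noise_parameters(x, y))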
Averages
When we average i.i.d. noise we expect to
get the mean
Standard deviation of the average
(usually called the standard error) is less
than standard deviation of the data
σ(ave) = "s.e." = σ(raw) / √N
Confidence Interval
95% confidence interval is the range in
which we expect the average to lie, 95% of
the time
About 2 standard errors above or below
the expected value
95% C.I. ≈ x̄ ± 2 σ(ave) = x̄ ± 2 σ(raw) / √N
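Putting the standard error and the confidence interval together in a Python sketch (the data here are simulated constant-plus-noise, not real observations):

import numpy as np

rng = np.random.default_rng(0)
x = 8.0 + rng.normal(0.0, 0.2, size=100)   # invented data: constant signal + 0.2 mag noise

mean = x.mean()
se = x.std(ddof=1) / np.sqrt(x.size)       # sigma(ave) = sigma(raw) / sqrt(N)
lo, hi = mean - 2 * se, mean + 2 * se      # about 2 standard errors either side

print(f"average = {mean:.3f}, s.e. = {se:.3f}, 95% C.I. = ({lo:.3f}, {hi:.3f})")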
Does average change?
Divide time into bins
Usually of equal time width (often 10 days)
Sometimes of equal number of data N
Compute average and standard deviation within
each bin
IF signal is constant AND noise is consistent,
THEN expected value of data average will be
constant
So: do the “bin averages” show more variation
than is expected from noise?
ANOVA test
Compare variance of averages to variance
of data (ANalysis Of VAriance = ANOVA)
In other words… compare variance
between bins to variance within bins
“F-test” gives a “p-value,” probability of
getting that result IF the data are just noise
Low p-value → probably NOT just noise
Either we haven’t found all the signal
Or the noise isn’t the simple kind
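A rough Python sketch of the bin-and-test idea using scipy's one-way ANOVA (scipy.stats.f_oneway); the bin width, simulated times, and noise level are illustrative assumptions, and because the fake data are pure noise a large p-value is expected:

import numpy as np
from scipy.stats import f_oneway

def bin_anova(t, x, width=10.0):
    """Split data into time bins of fixed width (days), then F-test the bin contents."""
    edges = np.arange(t.min(), t.max() + width, width)
    idx = np.digitize(t, edges)
    groups = [x[idx == k] for k in np.unique(idx) if np.sum(idx == k) > 1]
    return f_oneway(*groups)    # returns (F statistic, p-value)

rng = np.random.default_rng(1)
t = np.sort(rng.uniform(0.0, 150.0, size=150))
x = 8.0 + rng.normal(0.0, 0.2, size=t.size)   # constant signal + noise only

fstat, p = bin_anova(t, x, width=10.0)
print(f"F = {fstat:.3f}, p = {p:.4f}")        # large p: consistent with "just noise"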
ANOVA test
50-day averages: Fstat = 0.315563, df.between = 2, df.within = 147, p = 0.729871 → NOT significant
10-day averages: Fstat = 0.728138, df.between = 14, df.within = 135, p = 0.743133 → NOT significant
ANOVA test
50-day averages: Fstat = 13.25758, df.between = 2, df.within = 147, p = 5e-06 → IS significant
10-day averages: Fstat = 2.546476, df.between = 14, df.within = 135, p = 0.002879 → IS significant
Averages Rule!
Excellent way to reduce the noise
because σ(ave) = σ(raw) / √N
Excellent way to measure the noise
Very little change to signal
unless signal changes faster than averaging
time
So in most cases averages smooth the
data, i.e., reduce noise but not signal
Decompose the Signal
Additive model: sum of component signals
Non-periodic part
sometimes called trend
sometimes called secular variation
Repeating (periodic) part
or almost-periodic (pseudoperiodic) part
can be multiple periodic parts (multiperiodic)
f(t) = S(t) + P(t)
Periodic Signal
Discover that it’s periodic!
Find the period P
Or frequency ν, where Pν = 1, i.e. ν = 1/P and P = 1/ν
Find amplitude A = size of variation
Often use A to denote the semi-amplitude,
which is half the full amplitude
Find waveform (i.e., cycle shape)
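As a rough illustration, a Python sketch that folds a simulated light curve on a known period and crudely gauges the semi-amplitude; the 50-day period, 0.5 mag semi-amplitude, and percentile-based estimate are illustrative choices, not a method from this talk:

import numpy as np

def fold(t, x, period):
    """Fold a light curve: phase = fractional part of t / P."""
    phase = np.mod(t / period, 1.0)
    order = np.argsort(phase)
    return phase[order], x[order]

rng = np.random.default_rng(2)
t = np.sort(rng.uniform(0.0, 300.0, size=200))
x = 8.0 + 0.5 * np.sin(2 * np.pi * t / 50.0) + rng.normal(0.0, 0.2, size=t.size)

phase, xf = fold(t, x, period=50.0)
semi_amp = 0.5 * (np.percentile(xf, 97.5) - np.percentile(xf, 2.5))   # crude semi-amplitude
print(f"rough semi-amplitude ~ {semi_amp:.2f} mag")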
Periodogram
Searches for periodic behavior
Test many frequencies (i.e., many periods)
For each frequency, compute a power
Higher power → more likely it's periodic with that frequency (that period)
Plot of power vs frequency is a
periodogram, a.k.a. power spectrum
Periodograms
Fourier analysis → Fourier periodogram
Don’t use DFT or FFT because of uneven
time sampling
Use Lomb-Scargle modified periodogram OR
DCDFT (date-compensated discrete Fourier
transform)
Folded light curve → AoV periodogram
Many more … these are the most common
(Example plots: DCDFT periodogram, AoV periodogram, and lots more …)
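As one concrete option, astropy's LombScargle class handles unevenly sampled data directly; a minimal sketch on simulated data (astropy does not implement the DCDFT, so this shows only the Lomb-Scargle periodogram):

import numpy as np
from astropy.timeseries import LombScargle

rng = np.random.default_rng(3)
t = np.sort(rng.uniform(0.0, 300.0, size=200))                        # uneven sampling (days)
x = 8.0 + 0.5 * np.sin(2 * np.pi * t / 50.0) + rng.normal(0.0, 0.2, size=t.size)

frequency, power = LombScargle(t, x).autopower()   # power vs frequency = the periodogram
best = frequency[np.argmax(power)]
print(f"peak at frequency {best:.4f} cycles/day  ->  period {1.0 / best:.1f} days")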
Non-periodic signals
Periodic but not perfectly periodic
(parameters are changing)
What if the noise is something “different”?
Come to the next workshop!
Enjoy observing variables
See your own data used in real scientific
study (AJ, ApJ, MNRAS, A&A, PASP, …)
Participate in monitoring and observing
programs
Assist in space science and astronomy
Make your own discoveries!
http://www.aavso.org/