Example-based analysis of binary lens events (PV)
Pierre Vermaak
UCT
An attempt to automate the discovery of
initial solution candidates.
Example-based learning
Why?
◦ Track record on difficult problems
◦ Very different to χ²-based approaches; complementary
◦ Neat
In this talk, I’ll give a practical perspective
What is it?
◦ Very broad field
◦ Data mining
◦ Machine learning
Well-known algorithms
◦ Neural Networks
◦ Tree inducers (J48, M5P)
◦ Support Vector Machines
◦ Nearest Neighbour
Same idea throughout ...
Attempt to map input to output
◦ e.g. binary lens light curve -> model parameters
Uses example data set: “training set”
◦ e.g. many simulated curves and their model
parameters
Adjusts the learning model's parameters to best fit the training data: “training”
◦ Usually some sort of iteration
◦ Algorithm dependent
Evaluation
◦ Usually performance measured on unseen data set:
“test set”.
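To make the workflow concrete, here is a minimal Java/WEKA sketch of the train-then-evaluate cycle. The file names and the choice of J48 are placeholders for illustration, not the setup used for the lensing work.

    import weka.classifiers.Evaluation;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class TrainTestDemo {
        public static void main(String[] args) throws Exception {
            // Load training and test sets (ARFF file names are hypothetical).
            Instances train = DataSource.read("train.arff");
            Instances test = DataSource.read("test.arff");
            // The last column is the output to be predicted.
            train.setClassIndex(train.numAttributes() - 1);
            test.setClassIndex(test.numAttributes() - 1);

            // "Training": the algorithm adjusts its internal parameters to fit the examples.
            J48 learner = new J48();
            learner.buildClassifier(train);

            // Evaluation: performance is measured on the unseen test set.
            Evaluation eval = new Evaluation(train);
            eval.evaluateModel(learner, test);
            System.out.println(eval.toSummaryString());
        }
    }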
Famous data set by Fisher.
Want to classify irises into three categories
based on petal and sepal width and length.
150 examples
Data Snippet

sepallength (cm)  sepalwidth (cm)  petallength (cm)  petalwidth (cm)  class
5                 2                3.5               1                Iris-versicolor
6                 2.2              4                 1                Iris-versicolor
6.2               2.2              4.5               1.5              Iris-versicolor
6                 2.2              5                 1.5              Iris-virginica
4.5               2.3              1.3               0.3              Iris-setosa
“OneR”
◦ Deduces a rule based on one input column
(“attribute”)
Rule
Results
“Multi-layer Perceptron”
◦ simple Neural Network
Network
Result
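Both classifiers can also be driven from WEKA's Java API rather than the GUI; a small sketch on the iris data (10-fold cross-validation, iris.arff as shipped with WEKA):

    import java.util.Random;

    import weka.classifiers.Evaluation;
    import weka.classifiers.functions.MultilayerPerceptron;
    import weka.classifiers.rules.OneR;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class IrisDemo {
        public static void main(String[] args) throws Exception {
            Instances iris = DataSource.read("iris.arff"); // 150 examples, 4 attributes + class
            iris.setClassIndex(iris.numAttributes() - 1);

            // OneR: a single rule on a single attribute.
            Evaluation oneR = new Evaluation(iris);
            oneR.crossValidateModel(new OneR(), iris, 10, new Random(1));
            System.out.println("OneR accuracy: " + oneR.pctCorrect() + "%");

            // Multi-layer perceptron: a simple neural network.
            Evaluation mlp = new Evaluation(iris);
            mlp.crossValidateModel(new MultilayerPerceptron(), iris, 10, new Random(1));
            System.out.println("MLP accuracy:  " + mlp.pctCorrect() + "%");
        }
    }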
Neat examples ...
Can it be used on the real problem?
Issues
◦ Many ambiguities of binary model
◦ No uniform input - not uniformly sampled
◦ Noise
◦ Complexity
Success/failure with a variety of approaches
The approach I’d like to take
DIY tools for the job
◦ Do try this at home.
“Raw” light curves are unsuitable
◦ Require uniform inputs for training
Interpolation – non-trivial
◦ The same scheme needs to be applied to subsequent unseen curves
◦ Which scheme? What biases are introduced?
Smoothing – non-trivial
◦ Required for interpolation anyway
◦ Also for derived features (extrema, slope)
Centering/Scaling – non-trivial
◦ Algorithms performed much better with normalized light curves
◦ What to centre on? Peak? Which one?
◦ What baseline? Real curves are truncated
How many example curves to use?
Ranges of binary lens model parameters in
training set.
Noise model for example curves.
Choice of learning algorithm
Pre-processing parameters
etc.
Normalized Curves
◦ Using truncation/centering/scaling and smoothing
Derived Features
◦ Attempt to extract properties of a light curve
◦ PCA
◦ polynomial fits
◦ extrema
◦ etc.
Various schemes attempted
Most successful
◦ Find time corresponding to peak brightness
◦ Translate the curve in time to this value
◦ Discard all data fainter (by magnitude) than 20% of
the total magnitude range
◦ Normalize the time axis (-0.5 to 0.5)
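A rough Java sketch of that scheme, written for illustration only (the exact interpretation of the 20% cut is mine):

    /**
     * Centre a light curve on its brightest point, cut the faint wings,
     * and rescale time to [-0.5, 0.5]. Illustrative only.
     * times[i], mags[i]: observation epochs and magnitudes (smaller = brighter).
     */
    static double[][] normalise(double[] times, double[] mags) {
        // 1. Find the epoch of peak brightness (minimum magnitude) and the magnitude range.
        int peak = 0;
        double minMag = mags[0], maxMag = mags[0];
        for (int i = 1; i < mags.length; i++) {
            if (mags[i] < mags[peak]) peak = i;
            minMag = Math.min(minMag, mags[i]);
            maxMag = Math.max(maxMag, mags[i]);
        }
        // 2. Discard points fainter than 20% of the total magnitude range below the peak.
        double cut = minMag + 0.2 * (maxMag - minMag);
        java.util.List<double[]> kept = new java.util.ArrayList<>();
        for (int i = 0; i < mags.length; i++) {
            if (mags[i] <= cut) kept.add(new double[] { times[i] - times[peak], mags[i] });
        }
        // 3. Rescale the peak-centred time axis to [-0.5, 0.5].
        double tMin = Double.MAX_VALUE, tMax = -Double.MAX_VALUE;
        for (double[] p : kept) {
            tMin = Math.min(tMin, p[0]);
            tMax = Math.max(tMax, p[0]);
        }
        double span = Math.max(tMax - tMin, 1e-12);
        double[][] out = new double[kept.size()][2];
        for (int i = 0; i < kept.size(); i++) {
            out[i][0] = (kept.get(i)[0] - tMin) / span - 0.5;
            out[i][1] = kept.get(i)[1];
        }
        return out;
    }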
Required for interpolation of equally-spaced
data points on the curve
Too much smoothing destroys features
Too little smoothing turns noise into features
Final scheme was a fitted B-spline iteration.
◦ Fit a B-spline
◦ Count extrema
◦ Repeat until number of extrema in suitable range
◦ Worked out to be surprisingly robust
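Roughly, the iteration looks like the sketch below. fitSpline() is a hypothetical stand-in for whatever B-spline fitter is used (the thesis code is not reproduced here), and the target extrema band is illustrative.

    /**
     * Smooth a curve by fitting a B-spline and tightening/loosening the fit
     * until the number of extrema falls in an acceptable band.
     * fitSpline(x, y, smoothing) is a hypothetical helper returning smoothed
     * y values; smaller "smoothing" follows the data more closely.
     */
    static double[] smoothByExtrema(double[] x, double[] y,
                                    int minExtrema, int maxExtrema) {
        double smoothing = 1.0;                              // illustrative starting value
        for (int iter = 0; iter < 50; iter++) {
            double[] smooth = fitSpline(x, y, smoothing);
            int extrema = countExtrema(smooth);
            if (extrema > maxExtrema) smoothing *= 2.0;      // too wiggly: smooth harder
            else if (extrema < minExtrema) smoothing *= 0.5; // too flat: relax
            else return smooth;                              // extrema count in suitable range
        }
        return fitSpline(x, y, smoothing);                   // give up gracefully
    }

    static int countExtrema(double[] y) {
        int count = 0;
        for (int i = 1; i < y.length - 1; i++) {
            if ((y[i] - y[i - 1]) * (y[i + 1] - y[i]) < 0) count++; // slope sign change
        }
        return count;
    }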
Truncation
◦ Slope-based ... Numerical derivatives too noisy
◦ Fitting a simpler model (Gaussian, single-lens)
◦ Brightness exceeds 3 standard deviations of wing
brightness
Smoothing
◦ Moving window averaging – destroys small features
◦ Savitzky-Golay – only works on evenly-spaced
points
Chebyshev Polynomials
PCA
Single lens fits
Moments
Derivatives
Smoothed Curves
Time and Magnitude of extrema
Features are then selected for usefulness
using selection algos (brute-force,
information-based, etc.)
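The information-based route maps directly onto WEKA's attribute-selection API; a minimal sketch (the feature file is a placeholder):

    import weka.attributeSelection.AttributeSelection;
    import weka.attributeSelection.InfoGainAttributeEval;
    import weka.attributeSelection.Ranker;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class FeatureRanking {
        public static void main(String[] args) throws Exception {
            // Placeholder file containing one row of derived features per curve.
            Instances data = DataSource.read("features.arff");
            data.setClassIndex(data.numAttributes() - 1);

            // Rank features by information gain with respect to the class.
            AttributeSelection selector = new AttributeSelection();
            selector.setEvaluator(new InfoGainAttributeEval());
            selector.setSearch(new Ranker());
            selector.SelectAttributes(data);

            // Print the ranked attribute names (the class attribute appears last).
            for (int index : selector.selectedAttributes()) {
                System.out.println(data.attribute(index).name());
            }
        }
    }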
Using simulated curves
The pre-processed curves themselves
performed slightly better than derived
features.
A simple learning algorithm performed best
(nearest neighbour)
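Nearest neighbour is IBk in WEKA; wiring it up to predict a numeric model parameter from a pre-processed curve looks roughly like this (the file name, class column and k are placeholders):

    import weka.classifiers.lazy.IBk;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class NearestNeighbourDemo {
        public static void main(String[] args) throws Exception {
            // One row per simulated curve; last column holds one binary-lens parameter.
            Instances curves = DataSource.read("curves.arff");
            curves.setClassIndex(curves.numAttributes() - 1);

            IBk knn = new IBk(3);                 // k = 3 neighbours, purely illustrative
            knn.buildClassifier(curves);          // "training" is just storing the examples
            double prediction = knn.classifyInstance(curves.instance(0));
            System.out.println("Predicted parameter: " + prediction);
        }
    }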
It sort of works on real events, but not at production strength and still with intervention.
Still required Genetic Algo fine-tuning.
Not good at finding multiple solutions
Automation: Mimic a human expert
Categorize curves instantly
Use categorization to come up with joint
likelihood distribution in model parameter
space.
I want multiple solutions and large regions of
exclusion.
Still believe in feature selection
Eliminate dodgy pre-processing
◦ Smoothing
◦ Interpolation
Use fast fits of “basis” functions
◦ Possibly use binary curves themselves for
comparison, but with a robust distance metric.
◦ Use the quality of fits as main feature
◦ Fit a single lens and characterize residuals
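A sketch of that last idea: evaluate the standard point-source point-lens (Paczyński) magnification model over a coarse grid and hand back a couple of residual statistics as features. The grid ranges, step sizes and the two statistics are my own illustrative choices.

    /** Point-source point-lens (Paczynski) magnification at time t. */
    static double pspl(double t, double t0, double u0, double tE) {
        double tau = (t - t0) / tE;
        double u = Math.sqrt(u0 * u0 + tau * tau);
        return (u * u + 2.0) / (u * Math.sqrt(u * u + 4.0));
    }

    /**
     * Brute-force single-lens fit on a coarse grid; return two residual
     * features: the best sum of squared residuals and the largest single
     * residual at that best fit. amp[i] is the observed magnification.
     */
    static double[] singleLensResidualFeatures(double[] times, double[] amp) {
        double bestSum = Double.MAX_VALUE, bestMax = 0.0;
        for (double t0 = times[0]; t0 <= times[times.length - 1]; t0 += 1.0) {
            for (double u0 = 0.01; u0 <= 1.0; u0 += 0.01) {
                for (double tE = 1.0; tE <= 100.0; tE += 1.0) {
                    double sum = 0.0, max = 0.0;
                    for (int i = 0; i < times.length; i++) {
                        double r = amp[i] - pspl(times[i], t0, u0, tE);
                        sum += r * r;
                        max = Math.max(max, Math.abs(r));
                    }
                    if (sum < bestSum) { bestSum = sum; bestMax = max; }
                }
            }
        }
        // These become input features for the learner, alongside the fit parameters.
        return new double[] { bestSum, bestMax };
    }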
These algorithms are very powerful
But no algorithm is any good against
impossible odds.
So, alternative parameterizations, etc. are
extremely important to this approach, just
like to traditional fitting.
Java
◦ 60%-100% as fast as C++ nowadays
◦ Cross-platform
◦ Plugs into and out of everything (Python, legacy
COM, Matlab, etc.)
◦ Oh, the tools! – Parallelisation, IDEs, just everything.
“javalens” – my rather humble new Java code
◦ Asada’s method
◦ Lots of abstraction, more like a framework
◦ Open Source
◦ Search “javalens” on Google Code
R
◦ Awesome, free and open source statistics environment
◦ Can be called from Java
WEKA
◦ Great data mining app, used extensively in my thesis
◦ Dangerous! Can spend years playing with it.
◦ Make sure you concentrate on the sensibility of your data
◦ NOT the large variety of fitting algorithms
Netbeans
◦ Just a great free, open source Java IDE
◦ Code completion
◦ Automatic refactoring tools
VI
◦ No comment