Math Models for Learning
and Discovery
Kristin P. Bennett
Mathematical Sciences Department
Rensselaer Polytechnic Institute
1/14/03
The Learning Problem
The problem of understanding intelligence
is said to be the greatest problem in
science today and “the” problem for
this century – as deciphering the genetic
code was for the second half of the last
one…the problem of learning represents a
gateway to understanding intelligence in
man and machines.
-- Tomaso Poggio and Steve Smale, 2003
What do these problems
have in common?
Design and Discovery of Pharmaceuticals
Target Marketing in Business
Diagnosis of Breast Cancer
Discovery of Novel Superconductors
Detection of Anthrax using TZ spectroscopy
Modeling and predicting global trade
RNA Transcription
DRUG TRIVIA
• In the USA, $25B/yr is spent on pharmaceutical R&D (33% on clinical trials)
• Worth their weight in gold
• 10-15 years from conception to market for a drug
• Development cost: $0.5B/drug
• First-year sales > $1B/drug
• 1 drug approved per 5,000 compounds tested
• 1 out of 100 drugs makes it to market
• 19 Alzheimer’s drugs in development
• 20,000,000 Americans with Alzheimer’s by 2050
TOWARDS TREATING THE HIV EPIDEMIC
HIV Reverse-Transcriptase Inhibition modeling:
We have a few molecules that have been tested:
[Figure: chemical structure diagrams of the tested molecules, with substituent positions labeled R, R1, R2, and X and TBDMSO/OTBDMS groups]
Can we predict whether a new molecule will inhibit HIV?
What do we know?
The bioactivities of a small set of molecules
Many possible descriptors for each molecule:
Molecular Weight
Electrostatic Potential
Ionization Potential
Can we predict a molecule's bioactivity?
Database Marketing
A bank has a $1.7 billion portfolio
of home mortgages.
When a customer refinances,
the bank may lose that customer.
Question: will a customer
refinance?
If so, offer that customer a
good deal on refinancing.
What do we know?
For many customers, we know if they
refinanced or not.
We know attributes of customer:
Income
Age
Residential Area
Payment History
Can we predict behavior of future customers?
Breast Cancer Diagnosis
Fine needle aspirate of breast tumor.
Is tumor benign or malignant?
What do we know?
For patients in initial study, we know
whether tumor was benign or malignant.
Have a digital image of tumor aspirate.
Know characteristics doctors look at:
Uniformity of cell shape
Uniformity of cell size
Cell Mitosis
Superconductivity
Superconductivity is the ability of a
material to conduct current with no
resistance and extremely low loss.
A few high temperature
superconductors have been found.
What other compounds are
superconductors?
Applications of
Superconductivity:
Magnetic Resonance Imaging
Applications of
Superconductivity
Maglev Trains
Applications of
Superconductivity
Very small and efficient motors
Better power transmission cables
Better cellular phone service
Find a cheap high-temperature superconductor
and you will get the NOBEL PRIZE.
What do we know?
Many compounds have been tested to see
if they are superconductors.
Many descriptors exist for these
compounds based on molecular
properties.
What do all these problems
have in common?
Each problem
Can be posed as a “yes” or “no” question.
Has examples known to be of the “yes”
type or the “no” type.
Each example has an associated set of
descriptors.
Learn a classification function!
Data Mining
Each problem has data.
Our job is to “mine” information from this
data.
Information depends on the question
asked.
In this case we must produce a predictive
yes/no model (a.k.a. a classification
model) based on the data.
Mathematical Model
Have data $(x_1, y_1), \ldots, (x_m, y_m)$
Construct predictive function $f(x) \approx y$
Solve mathematical model to find $f$:
$$\min_f \sum_{i=1}^{m} \left( f(x_i) - y_i \right)^2 + \lambda \, \|f\|_K^2$$
Want f to generalize well on future data
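As a rough illustration of this model, the sketch below fits a regularized least-squares function on a tiny made-up data set. It assumes f is a kernel expansion f(x) = sum_i alpha_i K(x, x_i), in which case the minimizer can be found by solving (K + lam * I) alpha = y. The Gaussian kernel, the names `lam`, `gamma`, `fit`, and `predict`, and the data are all assumptions chosen for the example, not part of the course material.

```python
import math

def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel between two points given as lists of floats."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)

def solve_linear_system(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(M[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (M[r][n] - tail) / M[r][r]
    return z

def fit(X, y, lam=0.1, gamma=1.0):
    """Return coefficients alpha solving (K + lam * I) alpha = y."""
    m = len(X)
    K = [[rbf_kernel(X[i], X[j], gamma) for j in range(m)] for i in range(m)]
    for i in range(m):
        K[i][i] += lam  # the regularization term shifts the diagonal
    return solve_linear_system(K, y)

def predict(X, alpha, x_new, gamma=1.0):
    """Evaluate f(x_new) = sum_i alpha_i * K(x_new, x_i)."""
    return sum(a * rbf_kernel(x_new, xi, gamma) for a, xi in zip(alpha, X))

# Hypothetical one-descriptor data with labels -1 ("no") and +1 ("yes").
X_train = [[0.0], [1.0], [2.0], [3.0]]
y_train = [-1.0, -1.0, 1.0, 1.0]
alpha = fit(X_train, y_train, lam=0.1)
predictions = [predict(X_train, alpha, x) for x in X_train]
```

A larger `lam` penalizes the norm of f more heavily, trading training accuracy for a smoother function, which is one way to pursue the generalization goal stated above.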
Types of Learning Problems
Classification: $y_i = 1$ or $-1$
Regression: $y_i \in \mathbb{R}$
Clustering: $y_i$ unknown
Ranking: $y_1 \geq y_2,\ y_k \geq y_j,\ \ldots$
Data Mining
Classification = yes/no models
Start with examples of yes and no.
Associate a set of descriptors with each
example. Descriptors must be
appropriate for the question you are
asking.
Construct a model to split the two sets
Use the model to predict new examples.
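The steps above can be sketched in a few lines. The example below uses a nearest-centroid rule as a stand-in classifier, purely for illustration; the slides do not prescribe this method, and the descriptor values and function names are hypothetical.

```python
def centroid(rows):
    """Component-wise mean of a list of descriptor vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def train(examples, labels):
    """Build a model that splits the two sets: one centroid per class."""
    yes = [x for x, lab in zip(examples, labels) if lab == 1]
    no = [x for x, lab in zip(examples, labels) if lab == -1]
    return centroid(yes), centroid(no)

def classify(model, x):
    """Predict +1 ("yes") if x is closer to the "yes" centroid, else -1."""
    yes_c, no_c = model
    return 1 if squared_distance(x, yes_c) < squared_distance(x, no_c) else -1

# Step 1: examples of "yes" (+1) and "no" (-1).
# Step 2: each example carries a set of descriptors (two made-up values here).
examples = [[1.0, 1.2], [0.8, 1.0], [3.0, 3.1], [3.2, 2.9]]
labels = [-1, -1, 1, 1]

# Step 3: construct the model. Step 4: predict a new example.
model = train(examples, labels)
new_label = classify(model, [2.9, 3.0])
```

The quality of the prediction depends heavily on step 2: if the descriptors carry no information about the question being asked, no model can separate the two sets.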
Learning Model
What kind of learning task is it?
What sort of f should we use?
Kernel function: $f(x) = \sum_i \alpha_i K(x, x_i)$
What loss function to use?
What regularization function?
How can we solve this learning model?
How well will the model predict new points?
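To make the loss-function question concrete, the sketch below compares three common choices on a single example with label +1. The slides do not name specific losses; squared, hinge, and 0/1 loss are picked here as familiar candidates, and the prediction values are made up.

```python
def squared_loss(prediction, label):
    """Penalizes any deviation from the label, even on the correct side."""
    return (prediction - label) ** 2

def hinge_loss(prediction, label):
    """Zero once the prediction is on the correct side with margin >= 1."""
    return max(0.0, 1.0 - label * prediction)

def zero_one_loss(prediction, label):
    """Counts a mistake whenever the sign of the prediction is wrong."""
    return 0.0 if prediction * label > 0 else 1.0

# (prediction f(x), true label y) pairs for a hypothetical classifier.
cases = [(2.0, 1), (0.5, 1), (-0.5, 1)]
table = [
    (p, squared_loss(p, y), hinge_loss(p, y), zero_one_loss(p, y))
    for p, y in cases
]
for row in table:
    print(row)
```

Note that squared loss charges the confident, correct prediction 2.0 a penalty of 1.0, while hinge loss charges it nothing. Differences of this kind are one reason the choice of loss depends on the learning task.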
Class information
See course web page
http://www.rpi.edu/~bennek/class/mmld/index.htm