CSC 599: Computational Scientific Discovery
Lecture 8: Lagramge (sp) and Inductive Process Modeling
Outline
Review
Processes
BACON
Other equation finders
The act of equation finding
Lagramge (sp)
Inductive Process Modeling
Review: Processes
Deal with changes over time
Range
(start, finish), (start, duration)
Rates of change
dx/dt
Previous history
attribute[event[t]] = f(event[1], event[2], ..., event[t-1])
Attributes of processes (when time is included)
Quantity of “always on” forces as a function of time
(e.g. gravity, electro-magnetism)
Maximum limit of “homeostatic” forces
(e.g. friction, normal force)
Misc. changes during process
Review: BACON
BACON was driven by the data and domain
knowledge:
Distinguishing between independent and dependent attributes
Discovery of intrinsic properties of conceptual
values
BACON 3
BACON 4
Preference for symmetric equations given
knowledge that suggests one could exist
BACON 5
Equation Finders, 1990s
[From Ljupco Todorovski's Homepage:]
http://www-ai.ijs.si/~ljupco/
COPER [Kokar 1986]
Uses information about the dimension units of the
system variables to restrict the space of possible
equation structures.
BACON [Langley 1987]
Pioneer among equation discovery systems. It uses a
set of data-driven heuristics for finding regularities
(constancies and trends) in data and for formulating
hypotheses based on them.
Equation Finders, 1990s (2)
Fahrenheit/EF [Langley and Zytkow 1989], [Zembowicz and Zytkow 1992]
Used as an equation discovery subsystem of the scientific discovery system. Discovers bivariate equations only, with the user able to specify the set of operators and functions to be used within equations.
ABACUS [Falkenhainer and Michalski 1990]
Experiments with different search strategies through the space of equation structures. Also allows discovery of piecewise equations, using clustering for identifying the limits between pieces.
IDS [Nordhausen and Langley 1990]
ARC [Moulet 1992]
Equation Finders, 1990s (3)
E* [Schaffer 1993]
Discovers bivariate equations using a small set of predefined equation structures.
LAGRANGE [Dzeroski and Todorovski 1995]
Handled differential equations.
GOLDHORN [Krizman et al. 1995]
Extended LAGRANGE towards discovery from noisy data.
SDS [Washio and Motoda 1997]
Used information about scale types of the dimension units of the system variables to restrict the space of possible equations.
LAGRAMGE [Todorovski and Dzeroski 1997]
Allowed the user to specify the space of possible equations with a context free grammar.
Issues with equation finders
1. How to incorporate time
Esp. derivatives
2. How to separate:
Heuristics used to search eqn space
Grading fnc used to determine how well eqn fits
Compare:
True function = x^2
Current function = -x^2
Large “error” in numeric space
Small “error” in symbolic (eqn) space (example below)
3. How best to incorporate domain knowledge?
BACON programs did so, but in an ad hoc way
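To make the x^2 vs. -x^2 point concrete, here is a minimal Python sketch (the sample points are an assumption for illustration): the numeric error is huge while the symbolic difference is a single negation node.

import numpy as np

x = np.linspace(-10, 10, 101)
true_y = x**2
candidate_y = -x**2

# Numeric error: sum of squared differences -- huge.
numeric_error = np.sum((true_y - candidate_y)**2)

# Symbolic error: the parse trees of x^2 and -(x^2) differ by
# one negation node -- an edit distance of 1.
symbolic_error = 1

print(f"numeric SSE = {numeric_error:.3g}, symbolic edit distance = {symbolic_error}")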
Lagrange:
The Person
Italian/French
Mathematician and physicist
1736-1813
The System
Mid-1990s equation finder
Found differential equations
Dzeroski and Todorovski 1995
Lagramge (sp)
Lagramge = Lagrange (system) + grammar
Handling the issues:
1. How to account for time?
Each vector of variable values can represent the state
of the system at a particular time
Throw both attr and d[attr]/dt into the variable set (see the sketch below)
2. Separating
a) Heuristics for searching equation space
Explicit user-defined function for judging fnc similarity
b) Gauging equation fit
Sum of squared errors
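A minimal Python sketch of point 1, assuming a synthetic time series: estimate d[attr]/dt by finite differences and add it to the variable set alongside attr.

import numpy as np

t = np.linspace(0.0, 10.0, 201)    # measurement times
x = np.exp(0.3 * t)                # some measured attribute (assumed data)

dx_dt = np.gradient(x, t)          # finite-difference estimate of dx/dt

# The equation finder then searches over this extended variable set;
# an equation that uses dx_dt is a differential equation in disguise.
variables = {"x": x, "dx_dt": dx_dt}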
Lagramge, top algorithm
void lagramge(Grammar g, int maxComplexity, int maxBeamWidth,
              fnc fittingFnc(), fnc eqnPreferFnc(), fnc stoppingCriteriaMet())
{
  // maxComplexity bounds the height of refined trees (see refinement below)
  T0 = init symbol
  Q1 = {T0}                        // beam initially holds only the start symbol
  do {
    Q = Q1
    R = Q.generateRefinements()    // expand each tree in the beam
    foreach r in R {
      r.calcBestFitConstants(fittingFnc)          // fill in numeric parameters
      r.symbolicCloseness = eqnPreferFnc.calc(r)  // grade the instantiated eqn
    }
    Q1 = union(Q, R)
    Q1.keepBest(maxBeamWidth, symbolicCloseness)  // beam search: prune to best
  } while ( (Q != Q1) && !stoppingCriteriaMet() ); // stop when beam is stable
}
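A hedged Python rendering of the loop above, with the grammar-specific pieces abstracted into caller-supplied functions. The toy instantiation (polynomial degrees as hypotheses, data from y = 3x^2 + 1, and the 0.1-per-degree complexity penalty) is an assumption for illustration, not Lagramge's actual refinement operator.

import numpy as np

def lagramge_beam(start, refine, fit_constants, prefer, max_beam, max_iters=50):
    beam = [start]
    for _ in range(max_iters):
        old_beam = list(beam)
        candidates = [r for q in beam for r in refine(q)]
        for c in candidates:
            fit_constants(c)        # fill in the numeric parameters
        pool = old_beam + candidates
        pool.sort(key=prefer)       # lower score = preferred equation
        beam = pool[:max_beam]      # prune to the beam width
        if beam == old_beam:        # beam unchanged -> converged
            break
    return beam[0]

# Toy instantiation: hypotheses are polynomial degrees 0..4 (assumed).
x = np.linspace(-2, 2, 50)
y = 3 * x**2 + 1

class Hypothesis:
    def __init__(self, degree):
        self.degree, self.coeffs = degree, None

def refine(h):                      # refinement = allow one higher degree
    return [Hypothesis(h.degree + 1)] if h.degree < 4 else []

def fit_constants(h):               # least-squares fit of the constants
    h.coeffs = np.polyfit(x, y, h.degree)

def prefer(h):                      # SSE plus a crude complexity penalty
    sse = float(np.sum((np.polyval(h.coeffs, x) - y) ** 2))
    return sse + 0.1 * h.degree

root = Hypothesis(0)
fit_constants(root)
best = lagramge_beam(root, refine, fit_constants, prefer, max_beam=3)
print(best.degree, np.round(best.coeffs, 3))   # degree 2, coeffs ~ [3, 0, 1]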
Heuristic functions:
fnc fittingFnc()
Job = given this equation parse tree, find best fitting
parameters
attrlearn = c1*attr1 + c2*attr2 + c3*attr1*attr2 + c4 (see the sketch below)
Distinguish between:
space of equation parse trees (countable)
space of all the floating point parameters (in principle,
uncountable)
fnc eqnPreferFnc()
Job = given this instantiated equation, how well does
it fit the data?
Traditional least squares, or some variant
Noise tolerant
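Because the template above is linear in its constants, fittingFnc can be plain least squares. A sketch with synthetic data; the true constants 2, -1, 0.5, 3 are assumptions for illustration.

import numpy as np

rng = np.random.default_rng(0)
attr1 = rng.uniform(-1, 1, 200)
attr2 = rng.uniform(-1, 1, 200)
attrlearn = 2*attr1 - 1*attr2 + 0.5*attr1*attr2 + 3   # assumed data

# Design matrix: one column per constant c1..c4.
A = np.column_stack([attr1, attr2, attr1*attr2, np.ones_like(attr1)])
c, *_ = np.linalg.lstsq(A, attrlearn, rcond=None)
print(c)   # approx [2, -1, 0.5, 3]

# eqnPreferFnc then grades the instantiated equation, e.g. by SSE:
sse = float(np.sum((A @ c - attrlearn) ** 2))

Structures that are nonlinear in their constants need an iterative nonlinear fitter instead of this one-shot solve.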
Lagramge Handling Issues (2)
3. Incorporation of domain knowledge
Use a grammar!
Inputs:
1. Context free grammar
(More on next slide)
2. Input data D = (V,vd,M)
V = set of variables
vd = variable to predict in V
M = measurements of all vars:
time  v1    v2    ...  vn
t0    v1,0  v2,0  ...  vn,0
t1    v1,1  v2,1  ...  vn,1
...
Outputs:
Ordinary or differential equation
Grammar
Context Free Grammar =
<nonTerminalSet,
terminalSet,
productionSet,
startSym>
Ex:
double monod(double c, double v)
{ return(v/(v+c)); }
N = {E, F, M, v}
T = {+, const, *, monod, (“,”), N, P, Z}
P = { E -> const | const*F | E + const*F
F -> v | M | v*M
M -> monod(const,v)
v -> N | P | Z }
S=E
Tree's “Height” as its complexity
The deeper the tree, the more complex the
expression
y = f1(x),
y = f1(f2(x)),
y = f1(f2(f3(x)))
Symbol height = the shallowest height among that symbol's productions:
For a production p = A -> A1, A2, ..., Al:
h(p) = 1 + max_i { h(A_i) }
h(A) = min over productions q of A { h(q) } if A is a nonterminal,
0 if A is a terminal
height 1: E -> const; v -> N; v -> P; v -> Z
height 2: M -> monod(const,v); F -> v
height 3: F -> M; F -> v*M; E -> const*F; E -> E + const*F
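A short Python sketch that implements the two height equations and reproduces the table. The dictionary encoding of the monod grammar is an assumption (right-hand sides as symbol lists; anything not a key is a terminal with height 0).

import math

productions = {
    "E": [["const"], ["const", "*", "F"], ["E", "+", "const", "*", "F"]],
    "F": [["v"], ["M"], ["v", "*", "M"]],
    "M": [["monod", "(", "const", ",", "v", ")"]],
    "v": [["N"], ["P"], ["Z"]],
}

# h(A) = 0 for terminals; for nonterminals, the min over A's productions
# of h(p), where h(p) = 1 + max over the RHS symbols. Iterate to a
# fixpoint because the grammar is recursive (E -> E + const*F).
h = {A: math.inf for A in productions}
changed = True
while changed:
    changed = False
    for A, rhss in productions.items():
        best = min(1 + max(h.get(s, 0) for s in rhs) for rhs in rhss)
        if best < h[A]:
            h[A], changed = best, True

for A, rhss in productions.items():
    for rhs in rhss:
        print(f"h({A} -> {' '.join(rhs)}) = {1 + max(h.get(s, 0) for s in rhs)}")
# Matches the table: E->const and v->N/P/Z have height 1,
# M->monod(const,v) and F->v height 2, the rest height 3.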
Lagramge Refinement
void refinement(Tree t, int maxHeight)
{
  choose A, a nonterminal node in t
  p[A,i] = the production currently applied at A   // the i-th production for A
  l = path length from the root of t to A
  delete the subtrees of A
  // try A's next alternative production, if the result stays within the height limit
  if ( (i+1 <= A.numProductions()) && (l + height(p[A,i+1]) <= maxHeight) ) {
    p[A,i+1] = A -> A1, A2, ..., Al
    replace p[A,i] with p[A,i+1]
    expand A with successors A1, A2, ..., Al
    // complete the tree: expand every remaining nonterminal leaf
    do {
      choose a nonterminal leaf B in t
      p[B,1] = B -> B1, B2, ..., Bm    // B's first production
      expand B with successors B1, B2, ..., Bm
    } while ( t.hasNonTerminals() );
  }
}
Refinement Example
Lagramge Findings (1):
Aquatic Ecosystems:
N = concentration of nutrient
P = concentration of phytoplankton
Z = concentration of zooplankton
dN/dt = -N*P/(kN + N)
dP/dt = N*P/(kN + N) - rP*P - P*Z/(kP + P)
dZ/dt = P*Z/(kP + P) - rZ*Z
Lagramge Findings (2):
Two poles on a cart (inverted pendulum)
Lagramge Discussion
Problems it solves
Handling noisy data
Handling of time and derivatives
Derivatives are just another variable
Nothing special to do on user's part
Distinguishes between error in symbol space and error in numeric value space
Heuristic fnc for the 1st; grading fnc (sum of squared errors) for the 2nd
Incorporation of domain knowledge
Symbol space heuristic fnc
Grammar
Lagramge Discussion (2)
Hold up one second! Do you believe this?
Incorporation of domain knowledge
Grammar
Grammar?!?
Computer scientists think in terms of grammars
FSM/regular expression
PDA/context free
Turing Machines/context sensitive
Do scientists think in terms of grammars?
(Besides linguists, of course)
Lagramge Discussion (3)
Remember BACON 5 heuristic:
If we have structural knowledge that suggests a symmetric equation, then look for one first
Can we do that with Lagramge?
Grammar for symmetry(?)
A -> BCB (not quite: the two B's derive independently, so they need not be identical)
A -> c*B + c*D (symmetric in form, but not that powerful)
A -> (B) (ditto)
Preference for symmetry
Jury-rig eqnPreferFnc() as needed
Are you happy with that?
Inductive Process Modeling (IPM)
We want it all!
Processes
Explicitly recognized objects (in the software engineering sense)
Distinguish between instances and classes
(classes = higher-level constructs with uninstantiated equations)
Simulation equations constructed from process ones
Automatically deals with time
Exhaustive search
To some maximal complexity
Better fitting function for time series data
IPM Generic Processes
Library pred_prey
generic process logistic_growth;
variables S{species};
parameters gr[0,3], ic[0,0.1];
equations d[S,t,1] = gr * S * (1-ic*S)
generic process exponential_growth;
variables S{species};
parameters gr[0,3];
equations d[S,t,1] = gr * S;
generic process exponential_decay;
variables S{species};
parameters dr[0,2];
equations d[S,t,1] = -1*dr * S;
generic process holling_1;
variables S1{prey}, S2{predator};
parameters ar[0.01,10], ef[0.001,0.8];
equations d[S1,t,1] = -1 * ar * S1 * S2;
d[S2,t,1] = ef * ar * S1 * S2
IPM Quantitative Processes:
Fills in constants from equation templates:
Model PredatorPrey:
vars: aurelia{prey}, nasutum{predator};
observable: aurelia, nasutum;
process aurelia_growth;
equations d[aurelia,t,1] = 1.81 * aurelia * (1-0.0003*aurelia);
process nasutum_decay;
equations d[nasutum,t,1] = -1 * 1.04 * nasutum;
process predation_holling_t:
equations d[aurelia,t,1] = -1 * 0.03 * aurelia * nasutum;
d[nasutum,t,1] = 0.30 * 0.03 * aurelia * nasutum;
Combined simulation equations:
d[aurelia]/dt = 1.81 * aurelia * (1-0.0003*aurelia) - 0.03 * aurelia * nasutum
d[nasutum]/dt = -1.04 * nasutum + 0.009 * aurelia * nasutum
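A sketch that simulates the assembled model with SciPy; the initial concentrations and time span are assumptions for illustration.

from scipy.integrate import solve_ivp

def model(t, y):
    aurelia, nasutum = y
    d_aurelia = 1.81 * aurelia * (1 - 0.0003 * aurelia) - 0.03 * aurelia * nasutum
    d_nasutum = -1.04 * nasutum + 0.009 * aurelia * nasutum   # 0.009 = 0.30 * 0.03
    return [d_aurelia, d_nasutum]

# Assumed initial concentrations and time span.
sol = solve_ivp(model, (0.0, 30.0), [100.0, 10.0], max_step=0.1)
print(sol.y[:, -1])   # final concentrations of aurelia and nasutum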
IPM Algorithm
1. Find all permissible instantiations of generic
processes with specified variables:
logistic_growth: S -> aurelia
logistic_growth: S -> nasutum
exponential_decay: S -> aurelia
exponential_decay: S -> nasutum
holling_1: S1 -> aurelia, S2 -> nasutum
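A Python sketch of the enumeration in step 1, mirroring the list above; the type table and its set encoding are assumptions for illustration (species subsumes prey and predator).

from itertools import product

entities = {"aurelia": "prey", "nasutum": "predator"}
compatible = {"species": {"prey", "predator"},
              "prey": {"prey"}, "predator": {"predator"}}

generic_processes = {
    "logistic_growth": [("S", "species")],
    "exponential_decay": [("S", "species")],
    "holling_1": [("S1", "prey"), ("S2", "predator")],
}

for name, slots in generic_processes.items():
    # For each variable slot, list the entities whose type is compatible.
    choices = [[e for e, t in entities.items() if t in compatible[typ]]
               for _, typ in slots]
    for binding in product(*choices):
        if len(set(binding)) == len(binding):   # no two slots share an entity
            pairs = ", ".join(f"{var} -> {e}" for (var, _), e in zip(slots, binding))
            print(f"{name}: {pairs}")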
IPM Algorithm (2)
2. Go from partially instantiated processes to
generic models by carrying out exhaustive
search of model structures, up to some
complexity limit
process logistic_growth;
parameters gr[0,3], ic[0,0.1];
equations d[aurelia,t,1] = gr * aurelia * (1-ic*aurelia)
process exponential_decay;
parameters dr[0,2];
equations d[nasutum,t,1] = -1 * dr * nasutum
process holling_1:
parameters ar[0.01, 10], ef[0.001, 0.8]
equations d[aurelia,t,1] = -1 * ar * aurelia * nasutum
d[nasutum,t,1] = ef * ar * aurelia * nasutum
IPM Algorithm (3)
3. Find best fitting parameters with least-squares fitting that does 2nd-order gradient descent through parameter space
Full simulation
Applies standard tricks for avoiding local minima
/or/, when all variables are observable:
Does teacher forcing
No full simulation
Given attribute values at time t, minimize error at t+1 (sketch below)
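A minimal Python sketch contrasting the two fitting modes on a toy system dx/dt = -r*x; the data and the Euler integration scheme are assumptions for illustration. Teacher forcing predicts one step from each observed state; full simulation rolls the model forward from the initial state only.

import numpy as np

t = np.linspace(0.0, 5.0, 51)
dt = t[1] - t[0]
x_obs = 2.0 * np.exp(-0.7 * t)     # assumed observations of the true system

def one_step_error(r):
    # Teacher forcing: from each *observed* state, predict only the next
    # point (one Euler step) -- no accumulation of simulation error.
    x_pred = x_obs[:-1] + dt * (-r * x_obs[:-1])
    return np.sum((x_pred - x_obs[1:]) ** 2)

def full_sim_error(r):
    # Full simulation: integrate the whole trajectory from x_obs[0].
    x = np.empty_like(x_obs)
    x[0] = x_obs[0]
    for i in range(len(t) - 1):
        x[i + 1] = x[i] + dt * (-r * x[i])
    return np.sum((x - x_obs) ** 2)

rs = np.linspace(0.1, 1.5, 141)
print("teacher forcing best r:", rs[np.argmin([one_step_error(r) for r in rs])])
print("full simulation best r:", rs[np.argmin([full_sim_error(r) for r in rs])])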
IPM Discussion
Finds sophisticated predator/prey relationships
in synthetic and actual data
Multiple predators and prey species
Successes
Applies domain knowledge in scientist-friendly
fashion
Groups equations and equation fragments together in
clumps that a scientist would find intuitive
Improvement?
It's an algorithm, not an architecture
Algorithm could be incorporated in a larger arch.