Computational Discovery of Communicable Knowledge

Download Report

Transcript Computational Discovery of Communicable Knowledge

The Computational Discovery of
Communicable Knowledge
Pat Langley
Computational Learning Laboratory
Center for the Study of Language and Information
Stanford University, Stanford, CA 94304
http://hypatia.stanford.edu/cll/
[email protected]
Also affiliated with the DaimlerChrysler Research & Technology Center and
the Institute for the Study of Learning and Expertise.
The Problem and the Potential
Our society is collecting increasing amounts of data in
commercial and scientific domains.
These include complex spatial/temporal data sets like:
 traces of traffic behavior from GPS and cell phones
 prices of stocks and currencies from exchanges
 measurements of climate and ecosystem variables
Computational techniques should let us find relations in
these data that are useful for business and society.
Drawbacks of Current Approaches
The fields of machine learning and data mining have
developed methods to find regularities in data.
Despite many successful applications, most techniques:
 assume attribute-value representations that cannot
handle time or space
 cannot tell interesting discoveries from mundane ones
 state the discovered knowledge in some opaque form
This indicates the need for alternative methods that can
address these issues.
Paradigms for Machine Learning
decision-tree
induction
induction of
logical rules
case-based
learning
neural
networks
probabilistic
induction
Paradigms for Scientific Discovery
taxonomy
formation
qualitative law
discovery
equation
discovery
structural model
construction
process model
formation
Discovering Numeric Laws
Statement of the task:
• Given: Quantitative measurements about objects or events in
the world.
• Find: Numeric relations that hold among variables that
describe these items and that predict future behavior.
Historical examples:
• Kepler’s three laws of planetary motion
• Archimedes’ principle of displacement in water
• Black’s law relating specific heat, mass, and temperature
• Proust’s and Gay-Lussac’s laws of definite proportions
BACON on Kepler’s Third Law
BACON carries out heuristic search through a space of numeric
terms, looking for constant values and linear relations.
moon
d
p
d/p
d2/p
d3/p2
A
B
C
D
5.67
8.67
14.00
24.67
1.77
3.57
7.16
16.69
3.20
2.43
1.96
1.48
18.15
21.04
27.40
36.46
58.15
51.06
53.61
53.89
This example shows the system’s progression from primitive
variables (distance and period of Jupiter’s moons) to a complex
term that has a nearly constant value.
Some Laws Discovered by BACON
Basic numeric relations:
• Ideal gas law
PV = aNT + bN
• Kepler’s third law
D3 = [(A - k) / t]2 = j
• Coulomb’s law
FD2 / Q1Q2 = c
• Ohm’s law
TD2 / (LI - rI) = r
Relations with intrinsic properties:
• Snell’s law of refraction
sin I / sin R = n1 / n2
• Archimedes’ law
C =V + i
• Momentum conservation
m1V1 = m2V2
• Black’s specific heat law
c1m1T1 + c2m2T2 = (c1m1+ c2m2 ) Tf
Temporal Laws of Ecological Behavior
(Todorovski & Dzeroski, 1997)
Input:
time
phyt
zoo
phosp
temp
time 1
time 2
phyt 1
phyt 2
zoo 1
zoo 2
phosp 1
phosp 2
temp 1
temp 2
..
..
..
time m
phyt m
zoo m
..
phosp m
..
temp m
Input:
a context-free grammar of domain constraints
Output:
phosp
phyt = c1 • phyt •
– c3 • phyt
c2 + phosp
•
Formulating Structural Models
Statement of the task:
• Given: Qualitative or numeric empirical laws that describe
observed phenomena.
• Find: Explanatory models of these phenomena in terms of
component objects and their relations.
Historical examples:
• Dalton’s and Avogadro’s molecular models of chemicals
• Mendel’s genetic model of inherited traits
• Quark models of elementary particles
• Structural models of planets, comets, and stars
DALTON on Chemical Reactions
Initial state:
(reacts in {hydrogen oxygen} out {water})
(reacts in {hydrogen nitrogen} out {ammonia})
(reacts in {oxygen nitrogen} out {nitrous oxide}) . . .
Final state:
2 hydrogen + 1 oxygen  2 water
3 hydrogen + 1 nitrogen  2 ammonia
2 oxygen + 1 nitrogen  2 nitrous oxide
hydrogen  {h h} water  {h h o}
oxygen  {h h}
ammonia  {h h h n}
nitrogen  {h h}
nitrous oxide  {n o o} . . .
DALTON finds these structural models through a depth-first
search process constrained by conservation assumptions.
Constructing Process Models
Statement of the task:
• Given: Qualitative or numeric empirical laws that describe
temporal phenomena.
• Find: Explanatory models of these phenomena in terms of
processes among component objects.
Historical examples:
• Caloric and kinetic theories of heat phenomena
• Reaction pathways in chemistry and nucleosynthesis
• Models of continental drift and plate tectonics
• Process models of stellar evolution and destruction
ASTRA on Nucleosynthesis
Inputs:
- quantum properties for elements and isotopes
- conservation relations among these properties
- an element to be explained (e.g., O or C)
- elements to be assumed (e.g., H or He)
Outputs:
- elementary reactions that obey conservation laws
- reaction pathways that explain the element’s evolution
ASTRA uses depth-first search to find reaction pathways for:
- proton and neutron captures
- neutron and deuteron production
- generation of helium (He) from hydrogen (H)
- generation of carbon (C) and oxygen (O)
Three Pathways for Carbon Synthesis
Standard pathway:
4He + 4He  8Be
4He + 8Be  12C
Alternative pathways:
4He + D  6Li
3He + 6Li  9Be
4He + 9Be  12C + n
+ D  6Li
4He + 6Li  10Be
4He + 10Be  12C + D
4He
ASTRA generates many pathways novel to astrophysics, some
of which have viable reaction rates.
Proposed Research
We plan to develop and evaluate discovery methods that:
 are designed to process temporal and structured data
 use techniques from computational scientific discovery
 describe new knowledge in a communicable form
Likely notations for the discovered knowledge include:
 structural models of relations among entities
 process models of change over time
 sets of simultaneous differential equations
We will apply our methods to domains that benefit from
such communicable representations.
Benefits of the Approach
Unlike most previous work on data mining and knowledge
discovery, our methods will:
 support discoveries in domains that involve complex
spatial, temporal, or relational data
 use domain knowledge to filter only discoveries that
are interesting and novel to the domain user
 present the new knowledge in some understandable
notation that can be communicated among humans
Such techniques will improve the way we manipulate and
understand complex data.