History and Philosophy of Science

Causal Models
Lecture 12
Causal Models: Introduction
Causal models highlight interactions among variables,
often without specifying mechanisms for those
relationships.
Although causal models may be deterministic, they often
use probability theory to address uncertainty.
This lecture discusses three approaches:
• structural equation models,
• Bayesian networks, and
• qualitative approaches based on model-checking.
The first two formalisms have been used for decades and
are supported by multiple informatics tools.
Support for the scientific use of qualitative approaches is in
its early stages.
Causal Models: Historical Use
Structural Equation Models
Structural equation models express linear, causal
relationships among variables and include
• observed, or manifest, variables that may be measured
and that are generally associated with data;
• unobserved, or latent, variables that typically cannot be
measured directly;
• causal links among the variables; and
• error terms that account for intrinsic randomness or
unknown causal factors.
As a methodology, structural equation models let scientists
connect causal models to observational data.
Structural Equation Models
A structural equation model is a system of linear equations
with error terms.
The equations are often shown as graphs to highlight the
causal relationships among the variables.
x1 = 0.56x4 + 0.90x2 + N(0, 1.40)
x2 = N(0, 1.11)
x3 = 1.39x1 + N(0, 1.22)
x4 = -0.52x2 + N(0, 1.07)
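The four equations above can be simulated directly by evaluating each variable after its causes. The following is a minimal sketch, not TETRAD's implementation. It assumes that N(0, v) denotes a normal distribution with mean 0 and variance v (the slide does not say whether the second number is a variance or a standard deviation), and the function names simulate_sem and slope are hypothetical.

```python
import random
import statistics


def simulate_sem(n, seed=0):
    """Draw n samples from the four-variable linear model above.

    Assumption: N(0, v) is read as mean 0 and variance v.
    """
    rng = random.Random(seed)

    def noise(variance):
        # random.gauss expects a standard deviation, so take the root.
        return rng.gauss(0.0, variance ** 0.5)

    samples = []
    for _ in range(n):
        # Evaluate in causal order: x2 has no parents, then x4, x1, x3.
        x2 = noise(1.11)
        x4 = -0.52 * x2 + noise(1.07)
        x1 = 0.56 * x4 + 0.90 * x2 + noise(1.40)
        x3 = 1.39 * x1 + noise(1.22)
        samples.append({"x1": x1, "x2": x2, "x3": x3, "x4": x4})
    return samples


def slope(samples, cause, effect):
    """Least-squares slope of effect regressed on a single cause."""
    xs = [s[cause] for s in samples]
    ys = [s[effect] for s in samples]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var


data = simulate_sem(20000, seed=1)
print(slope(data, "x2", "x4"))  # should be close to the coefficient -0.52
print(slope(data, "x1", "x3"))  # should be close to the coefficient 1.39
```

Regressing x4 on x2 and x3 on x1 recovers the coefficients approximately because each of those equations has a single cause and an independent error term. Recovering the two coefficients in the x1 equation would need a regression on x2 and x4 together.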
TETRAD: Creating Models
TETRAD is an informatics environment that supports
structural equation modeling.
In addition to creating the structure, researchers can either
provide parameters or generate them probabilistically.
Specifying the structure
Specifying parameters
TETRAD: Interacting with Models
TETRAD also lets researchers simulate their models, plot
the results, and compare model structures to each other.
Simulation results
Histogram of a
variable’s values
TETRAD is a showcase for structural and parametric
search and does not support data analysis or comparison.
http://www.phil.cmu.edu/projects/tetrad/
Bayesian Networks
Bayesian networks replace the equations of structural
equation models with conditional probability tables.
Note the conditional probability
table for the coma node.
Possible values for the node are
presented in rows.
Possible states for the node’s
parents appear in columns.
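The table layout described above (node values in rows, joint parent states in columns) can be sketched as a small data structure. Everything here is an illustrative assumption: the parent names, the state labels, and all probabilities are hypothetical and are not taken from the network shown in the lecture.

```python
# Hypothetical conditional probability table for a binary "coma" node.
# Each column is one joint state of the node's parents.
parent_states = [
    ("calcium_high", "tumor_present"),
    ("calcium_high", "tumor_absent"),
    ("calcium_normal", "tumor_present"),
    ("calcium_normal", "tumor_absent"),
]

# Each row is one value of the node; entries line up with parent_states.
cpt = {
    "present": [0.80, 0.80, 0.80, 0.05],
    "absent":  [0.20, 0.20, 0.20, 0.95],
}


def column(cpt, index):
    """Distribution over node values for one joint parent state."""
    return {value: row[index] for value, row in cpt.items()}


def check_cpt(cpt, n_columns, tol=1e-9):
    """Raise ValueError unless every column sums to 1."""
    for i in range(n_columns):
        total = sum(row[i] for row in cpt.values())
        if abs(total - 1.0) > tol:
            raise ValueError(f"column {i} sums to {total}, expected 1")


check_cpt(cpt, len(parent_states))
index = parent_states.index(("calcium_normal", "tumor_absent"))
print(column(cpt, index))  # distribution of coma given that parent state
```

Each column is a complete probability distribution over the node's values, which is why the check sums down columns and not across rows.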
GeNIe
GeNIe is an informatics environment that supports building,
running, and learning Bayesian networks.
GeNIe: Creating and Interacting with Models
Creating models in GeNIe is similar to working in TETRAD,
except users fill in conditional probability tables.
For a given Bayesian network, GeNIe can calculate the
probabilities of each node state.
Researchers can also input evidence (set the value of one
or more nodes) and see the effect on probabilities.
Belief state before
entering any evidence.
Belief state after asserting
that a coma is present.
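Entering evidence amounts to conditioning the joint distribution on the observed node values. The sketch below does this by brute-force enumeration over a hypothetical three-node network (tumor and calcium as independent parents of coma). The structure, the priors, and the conditional probabilities are assumptions made for illustration, and enumeration is not necessarily the algorithm GeNIe uses.

```python
from itertools import product

# Hypothetical prior probability that each parent is True.
priors = {"tumor": 0.10, "calcium": 0.20}


def p_coma(tumor, calcium):
    """Hypothetical P(coma = True | tumor, calcium)."""
    return 0.80 if (tumor or calcium) else 0.05


def joint(tumor, calcium, coma):
    """Probability of one complete assignment to all three nodes."""
    p = priors["tumor"] if tumor else 1 - priors["tumor"]
    p *= priors["calcium"] if calcium else 1 - priors["calcium"]
    pc = p_coma(tumor, calcium)
    return p * (pc if coma else 1 - pc)


def belief(query, evidence=None):
    """P(query = True | evidence), summing the joint over all worlds."""
    evidence = evidence or {}
    names = ("tumor", "calcium", "coma")
    numer = denom = 0.0
    for values in product([True, False], repeat=len(names)):
        world = dict(zip(names, values))
        # Skip worlds that contradict the asserted evidence.
        if any(world[k] != v for k, v in evidence.items()):
            continue
        p = joint(**world)
        denom += p
        if world[query]:
            numer += p
    return numer / denom


print(belief("tumor"))                  # belief before any evidence
print(belief("tumor", {"coma": True}))  # belief after asserting a coma
```

With these assumed numbers, asserting that a coma is present raises the belief in a tumor above its prior, which mirrors the before and after belief states in the captions.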
Uses of Structural Equation Models and Bayesian Networks
Scientists in several disciplines use structural equation
models and Bayesian networks to explain observations.
Moreover, Bayesian networks provide the foundation for
informatics tools where they
• diagnose lymph node diseases in patients (e.g.,
Pathfinder: Heckerman, 1990); and
• monitor and direct attention to details in the space
shuttle propulsion system (e.g., Vista: Horvitz, 1992).
Qualitative Causal Models
Qualitative causal models represent relationships between
variables as positive or negative influences.
In some cases, these influences come from a richer
relational ontology.
Each environment must handle the distinction between
the qualitative, abstract relationships and quantitative
data.
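One simple way to picture positive and negative influences, and their contact with quantitative data, is a signed graph whose edge signs multiply along a path. This is a minimal sketch under that assumption. The variable names a through d, the edges, and the measurements are hypothetical, and the environments discussed in this lecture may represent influences differently.

```python
# Hypothetical signed influence graph: +1 is a positive influence
# (activation), -1 is a negative influence (inhibition).
influences = {
    ("a", "b"): +1,
    ("b", "c"): -1,
    ("c", "d"): -1,
}


def path_sign(path, influences):
    """Net qualitative effect of the first variable on the last."""
    sign = 1
    for source, target in zip(path, path[1:]):
        sign *= influences[(source, target)]
    return sign


def agrees_with_data(sign, before, after):
    """True when the measured change has the predicted direction."""
    return (after - before) * sign > 0


# Two inhibitions in a row cancel, so a has a net positive effect on d.
print(path_sign(["a", "b", "c", "d"], influences))

# Predicted effect of raising a on c is negative; compare it with
# hypothetical measurements of c before and after raising a.
predicted = path_sign(["a", "b", "c"], influences)
print(agrees_with_data(predicted, before=5.0, after=3.2))
```

The second function is where the qualitative model meets quantitative data: only the direction of the measured change is compared, not its size.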
GenePath: Model Construction
GenePath is an interactive modeling system for qualitative
model construction.
Knowledge of relationships in
the genetic network.
The network reflects how
particular genes affect
aggregation in D. discoideum.
This organism transitions from
uni- to multi-cellular when
hungry by aggregating.
Graphical representation of
the genetic network.
Red lines are inhibition.
Green lines are activation.
Numbers indicate confidence.
GenePath: Incorporating Data
Adding data, experimental or observational, to GenePath
results in an automatic revision of the qualitative model.
The program integrates knowledge and data to find a
network that reflects current knowledge.
An updated model of D. discoideum aggregation that
includes new links supported by various data sets.
Hybrow
Hybrow is designed to evaluate hypotheses against a
knowledge base and data sets.
The hypotheses are actually qualitative models.
The model contains spatial
relationships (Gal4p is in the
nucleus).
The model also includes
biological operators (binds,
transports, etc.).
Hybrow: Model (Hypothesis) Text
Models consist of events:
ev1 = Gal4p binds Gal80p in nuc(leus) in w(ild)t(ype)
ev2 = Mig1p repress gal4 in nuc in wt in presence_of glucose
ev3 = Mig1p not repress gal4 in nuc in wt in absence_of glucose
hy1 = ev1 + ev2 + ev3
Submitting this model to Hybrow results in a collection of
confirmatory and contradictory evidence.
ev2:: Ontology: Agent b has to be gene for repress
Data: Mig1p repress gal4 wt nuc (PubMed link)
ev3:: Ontology: Agent b has to be gene for repress
Data: Mig1p repress gal4 wt nuc (PubMed link)
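The ontology line in the output above states a type constraint: the second agent of a repress event has to be a gene. A check of that kind can be sketched as follows. This is not Hybrow's code or input grammar. The parser, the Event record, and the entity-type table are hypothetical, and the type labels assume the common yeast convention that names ending in "p" denote proteins.

```python
from dataclasses import dataclass


@dataclass
class Event:
    agent_a: str
    operator: str
    agent_b: str
    negated: bool
    context: tuple


def parse_event(text):
    """Split 'A [not] operator B ...' into a structured record."""
    tokens = text.split()
    agent_a = tokens[0]
    negated = tokens[1] == "not"
    rest = tokens[2:] if negated else tokens[1:]
    operator, agent_b = rest[0], rest[1]
    return Event(agent_a, operator, agent_b, negated, tuple(rest[2:]))


# Hypothetical type table and operator constraint for illustration.
ENTITY_TYPES = {
    "Gal4p": "protein",
    "Gal80p": "protein",
    "Mig1p": "protein",
    "gal4": "gene",
}
REQUIRED_TARGET_TYPE = {"repress": "gene"}


def satisfies_ontology(event):
    """True when agent_b has the type the operator requires, if any."""
    required = REQUIRED_TARGET_TYPE.get(event.operator)
    return required is None or ENTITY_TYPES.get(event.agent_b) == required


ev2 = parse_event("Mig1p repress gal4 in nuc in wt in presence_of glucose")
ev3 = parse_event("Mig1p not repress gal4 in nuc in wt in absence_of glucose")
print(satisfies_ontology(ev2), ev3.negated)
```

Under these assumptions, an event such as "Mig1p repress Gal4p" would fail the check because its second agent is typed as a protein.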
Causal Modeling: Summary
The software presented in this lecture shared several
common features:
• all the systems let scientists specify models;
• apart from Hybrow, all systems used a simplified
representation for variable interaction; and
• although not discussed here, all the systems incorporate
discovery components.
Curiously, the quantitative systems lacked support for
evaluating models against data.
In contrast, the qualitative systems have minimal utility
without their interaction with external data and knowledge.