Transcript slides

Innovation:
Large Scale Regulatory Networks and
Niche Construction
Manfred D. Laubichler
Arizona State University
Santa Fe Institute
Marine Biological Laboratory
KLI
Max Planck Institute for the History of Science
Challenges
Understanding Innovation in Large Scale
Networks
The Patterns and Dynamics of Innovation in
Science
The Role of Collaboration in Enabling
Scientific Innovation
Extended Evolution:
From Genomes to Technologies
— Integrating regulatory network and niche
construction perspectives;
— Integrating mechanisms related to the origin of
variation (novelty) with evolutionary dynamics;
— developing an adequate conception of historical
transformation for evolutionary and historical processes;
— developing a unified conception of extended
evolution from molecular to cultural and knowledge
evolution
The Concept of Gene Regulatory Networks
After Eric Davidson
Niche Construction as Another Recent Development in
Evolutionary Theory
The Necessary Integration of These Two Perspectives
=> Both of these developments within evolutionary theory are
based on an underlying conception of complex networks
=> In both cases these networks expand hierarchically to
include additional elements.
=> This leads to two central processes of Extended
Evolution as the transformation dynamics of networks
=> Internalization
=> Externalization
Extended Evolution
Regulatory Networks and Niche Construction
from Genomes to Knowledge Systems
=> Complex Networks and Graphs
=> Invention and Innovation
=> Hierarchical Expansion of Causal Regulatory Networks,
including Social Networks
=> Causal Networks Involving Many Different Kinds of
Elements
=> Contextual Meaning
=> Developmental Evolutionary Dynamics
The Common Theoretical Core also Enables the Use of Shared
Methodologies
=> COMPUTATIONAL HISTORY OF SCIENCE
[Figure: three layers of analysis (1. historical settings & relationships; 2. topology of research literature; 3. conceptual relationships) shown changing over time, 1920 to 1980]
• Understanding and detecting innovation has
implications for understanding fundamentals of
knowledge, seeding and designing for discovery, and
science policy
• Advances in computational techniques provide
opportunities to study innovation at scale
• A fundamentally interdisciplinary problem, requiring
– Rigorous understanding of the historical precedent:
a case study approach
– Advanced algorithms
Outline
• Truth: Innovation in the Field of Developmental
Biology
• Graph Analysis for Collaboration Networks
• Preliminary Results: Leveraging Truth
• Open Questions and Next Steps
An Alternative Trajectory in the History of Evolutionary Biology
Darwin
Common Descent, Natural Selection, Gradualism, Open Question of Inheritance,
Developmental Considerations about the Origin of Variation
Boveri, Cell Biology &
Entwicklungsmechanik
Role of the Nucleus in Development and Heredity,
Experimental Approaches, Speculative Ideas about the
Hereditary Material as a Structured System
governing Development
Kühn, Goldschmidt & Developmental Physiological Genetics
Physiological Gene Action, Macroevolution, Gene Pathways
Regulatory Evolution,
GRNs & Synthetic
Experimental Evolution
A known innovation/disruption in
Evolutionary and Developmental Biology
“Britten & Davidson 1969”
The Britten-Davidson Model Citations
Direct citations
• Clear impact on the field in
1970s
• Second wave of influence
in 2000s
Secondary citations
• Clear second wave of
impact in 2000s
Citation networks used to
generate truth co-author
networks
Source of data: Web of Science
The Britten-Davidson Model (1969)
The Beginning of a new trajectory in Evolutionary
and Developmental Biology
A regulatory network controlling gene expression
Computational Analysis of Eric Davidson’s Investigative Pathway
A bibliographic-coupling network built from a sub-set of the
Davidson bibliographic dataset. Each node is a scientific
publication, and edges indicate which nodes share multiple
bibliographic references. The colored nodes indicate the
distribution of a “topic” from an LDA-generated topic model.
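A minimal sketch of how a bibliographic-coupling network like the one described above could be assembled. The paper IDs and reference lists are hypothetical, and reading "multiple" as "at least two shared references" (`min_shared=2`) is an assumption, not something the slide specifies.

```python
from itertools import combinations

def bibliographic_coupling(references, min_shared=2):
    """Return {(paper_a, paper_b): n_shared} for pairs sharing enough references.

    references: dict mapping a paper ID to the list of works it cites.
    min_shared: assumed threshold for "multiple" shared references.
    """
    edges = {}
    for a, b in combinations(sorted(references), 2):
        shared = len(set(references[a]) & set(references[b]))
        if shared >= min_shared:
            edges[(a, b)] = shared
    return edges

# Hypothetical toy data: three papers and the works they cite.
refs = {
    "paper_1": ["r1", "r2", "r3"],
    "paper_2": ["r2", "r3", "r4"],
    "paper_3": ["r4", "r5"],
}
coupling = bibliographic_coupling(refs)
```

Here only paper_1 and paper_2 are linked, because they share two references; paper_2 and paper_3 share just one and stay unconnected.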
Outline
• Truth: Innovation in the Field of Developmental
Biology
• Graph Analysis for Collaboration Networks
• Preliminary Results: Leveraging Truth
• Open Questions and Next Steps
Graphs: Formal Definition
[Figure: example graph with 8 vertices, labeled 1 to 8]
Graph G
G = (V, E)
• V = vertices (entities)
• E = edges (relationships)
Adjacency Matrix A
     1  2  3  4  5  6  7  8
 1   0  1  0  0  0  0  1  1
 2   1  0  1  0  0  0  0  1
 3   0  1  0  1  0  0  0  1
 4   0  0  1  0  1  0  0  0
 5   0  0  0  1  0  1  1  1
 6   0  0  0  0  1  0  1  1
 7   1  0  0  0  1  1  0  1
 8   1  1  1  0  1  1  1  0
A(i,j) ≠ 0 if
• Edge exists between
vertex i and vertex j
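As an illustration of the definition, the sketch below rebuilds the slide's 8-vertex adjacency matrix from an edge list. The edge list is read off the matrix on the slide; the function and variable names are illustrative only.

```python
# Undirected edges of the 8-vertex example, read off the slide's matrix.
EDGES = [(1, 2), (1, 7), (1, 8), (2, 3), (2, 8), (3, 4), (3, 8),
         (4, 5), (5, 6), (5, 7), (5, 8), (6, 7), (6, 8), (7, 8)]

def adjacency_matrix(n_vertices, edges):
    """Build a symmetric 0/1 adjacency matrix from 1-based vertex pairs."""
    A = [[0] * n_vertices for _ in range(n_vertices)]
    for i, j in edges:
        A[i - 1][j - 1] = 1  # A(i,j) != 0 when an edge joins i and j
        A[j - 1][i - 1] = 1  # undirected graph, so A is symmetric
    return A

A = adjacency_matrix(8, EDGES)
degrees = [sum(row) for row in A]  # number of neighbours of each vertex
```

Row sums of A give vertex degrees; vertex 8 is the best connected, with 6 neighbours.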
Graphs: Representing Collaboration Networks
[Figure: example graph with 8 vertices, labeled 1 to 8]
Graph G
G = (V, E)
• V = authors
• E = co-author relationship
Adjacency Matrix A
     1  2  3  4  5  6  7  8
 1   0  1  0  0  0  0  1  1
 2   1  0  1  0  0  0  0  1
 3   0  1  0  1  0  0  0  1
 4   0  0  1  0  1  0  0  0
 5   0  0  0  1  0  1  1  1
 6   0  0  0  0  1  0  1  1
 7   1  0  0  0  1  1  0  1
 8   1  1  1  0  1  1  1  0
A(i,j) ≠ 0 if
• Author i and j have
published together
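A small sketch of how co-author relationships might be extracted from publication records. The author names and papers are hypothetical, and counting joint papers as edge weights is an assumption; the slide only requires A(i,j) to be nonzero.

```python
from collections import Counter
from itertools import combinations

def coauthor_weights(papers):
    """Count joint publications for every pair of authors.

    papers: iterable of author lists, one list per publication.
    Returns a Counter keyed by alphabetically ordered author pairs.
    """
    weights = Counter()
    for authors in papers:
        for a, b in combinations(sorted(set(authors)), 2):
            weights[(a, b)] += 1
    return weights

# Hypothetical records: each inner list is the author list of one paper.
papers = [["Author A", "Author B"],
          ["Author A", "Author B", "Author C"],
          ["Author C"]]
weights = coauthor_weights(papers)
```

Single-author papers add no edges, and any pair with a nonzero count corresponds to a nonzero entry A(i,j).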
Brief History of Graphs
[Figure: timeline of example graph data sets from the Classical Era (1700s to 1800s) to the Modern Era (1970 to 2011), plotted against graph size]
Data set                     Vertices      Edges
Seven Bridges of Konigsberg  4             7
Hamilton's Game              20            30
Zachary's Karate Club        34            78
AIDS Sex Partners            40            42
Les Miserables Characters    77            254
Western U.S. Power Grid      4,941         6,594
Internet Topology            88,107        99,664
Enron E-mail Corpus          93,526        344,264
Proxy Logs                   209,848       7,594,393
Citation Data                42,155,602    736,037,092
Years marked on the timeline: 1735, 1856, 1977, 1985, 1993, 1998, 1999, 2005, 2011, 2011
Signal Detection
• The signal detection problem: Given an observation x,
determine whether H0 or H1 is true, where
H0: x was drawn from the noise distribution
H1: x also includes a signal
Topics covered in Kay, 1998 (signal type vs. noise type)
Signal types:
• Deterministic, known
• Deterministic, unknown
• Random, known PDF
• Random, unknown PDF
Noise types:
• Gaussian, known PDF
• Gaussian, unknown PDF
• Non-Gaussian, known PDF
• Non-Gaussian, unknown PDF
• Using random signals in non-Gaussian noise is a much more
specialized area than typical signal detection
• Working outside of vector spaces significantly complicates analysis
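To make the H0/H1 test concrete, here is a sketch of the simplest textbook case: a known deterministic signal in white Gaussian noise with known variance, decided with a matched filter. This is the easy vector-space setting, not the graph setting the talk addresses; the signal, noise level and false-alarm rate below are made-up values.

```python
import math
from statistics import NormalDist

def matched_filter_detect(x, s, sigma, p_fa=0.01):
    """Decide between H0 (noise only) and H1 (signal present).

    x: observation; s: known signal; sigma: noise standard deviation;
    p_fa: tolerated false-alarm probability (all illustrative inputs).
    """
    statistic = sum(xi * si for xi, si in zip(x, s))   # correlate x with s
    energy = sum(si * si for si in s)
    # Under H0 the statistic is Gaussian with mean 0, variance sigma^2 * energy.
    threshold = sigma * math.sqrt(energy) * NormalDist().inv_cdf(1 - p_fa)
    return statistic > threshold, statistic, threshold

signal = [1.0, 1.0, 1.0, 1.0]
decide_h1, stat, thr = matched_filter_detect(signal, signal, sigma=0.5)
decide_h0, _, _ = matched_filter_detect([0.0] * 4, signal, sigma=0.5)
```

A noise-free copy of the signal exceeds the threshold (H1), while an all-zero observation does not (H0).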
Regression on Graphs
Linear Regression
Graph Regression
Graph Analysis Framework
Input
• Graph
• No cue
Stages (as labeled in the diagram)
• Model fitting
• Integration
• Dimensionality reduction: matrix decomposition, component selection
• Anomaly detection
• Identification
Output
• Statistically anomalous
subgraph(s)
Outline
• Truth: Innovation in the Field of Developmental
Biology
• Graph Analysis for Collaboration Networks
• Preliminary Results: Leveraging Truth
• Open Questions and Next Steps
Dimensionality Reduction
[Figure: graph analysis framework diagram; vertices 1 to N plotted on axes Dimension 1 and Dimension 2]
Project the data into a 2-dimensional space and compute the test statistic
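One way the projection step could look, as a sketch only. It assumes the model-fitting stage yields a modularity-style residual matrix B = A - k kᵀ/(2m) and that the 2-dimensional space is spanned by the two leading eigenvectors of B. The chi-squared statistic on quadrant counts is an assumed choice; the slide does not name the statistic. The example matrix is the 8-vertex graph from the earlier slide.

```python
import numpy as np

def project_2d(A):
    """Project vertices onto the two leading eigenvectors of the residual matrix."""
    A = np.asarray(A, dtype=float)
    k = A.sum(axis=1)                      # vertex degrees
    B = A - np.outer(k, k) / k.sum()       # observed minus expected-degree model
    vals, vecs = np.linalg.eigh(B)         # B is symmetric
    top = np.argsort(vals)[::-1][:2]       # indices of the two largest eigenvalues
    return vecs[:, top]                    # one row of 2-D coordinates per vertex

def quadrant_chi2(X):
    """Chi-squared statistic of the 2x2 table of coordinate signs (assumed statistic)."""
    pos = X >= 0
    table = np.array([[np.sum(~pos[:, 0] & ~pos[:, 1]), np.sum(~pos[:, 0] & pos[:, 1])],
                      [np.sum(pos[:, 0] & ~pos[:, 1]), np.sum(pos[:, 0] & pos[:, 1])]],
                     dtype=float)
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / table.sum()
    ok = expected > 0                      # skip empty rows or columns
    return float(((table - expected)[ok] ** 2 / expected[ok]).sum())

A = np.array([[0, 1, 0, 0, 0, 0, 1, 1], [1, 0, 1, 0, 0, 0, 0, 1],
              [0, 1, 0, 1, 0, 0, 0, 1], [0, 0, 1, 0, 1, 0, 0, 0],
              [0, 0, 0, 1, 0, 1, 1, 1], [0, 0, 0, 0, 1, 0, 1, 1],
              [1, 0, 0, 0, 1, 1, 0, 1], [1, 1, 1, 0, 1, 1, 1, 0]])
X = project_2d(A)
stat = quadrant_chi2(X)
```

A large value of the statistic would indicate vertices clustering in particular quadrants of the projection, which is what a detector of this kind would flag.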
Evolution of the Field
Evolution of the field can be
observed in 2D projection of the
co-author network
Observations as field matures:
1970: field is diverse
1979: coherence around single-gene research
Temporal Integration
[Figure: graph analysis framework diagram (model fitting, integration, matrix decomposition, component selection, anomaly detection, identification)]
Time Series = A1 A2 A3 A4 A5 A6 A7 A8 A9
Filter = w1 w2 w3 w4 w5 w6 w7 w8 w9
A = Σ(wi*Ai)
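The weighted sum A = Σ(wi*Ai) can be sketched as below. The linearly increasing, normalized weights are one assumed form of a ramp filter, and the three 2-author snapshots are toy data.

```python
import numpy as np

def ramp_weights(n_steps):
    """Linearly increasing weights that sum to 1 (assumed ramp shape)."""
    w = np.arange(1, n_steps + 1, dtype=float)
    return w / w.sum()

def integrate(adjacency_series, weights):
    """Return A = sum_i w_i * A_i for a time series of adjacency matrices."""
    stack = np.asarray(adjacency_series, dtype=float)   # shape (T, N, N)
    w = np.asarray(weights, dtype=float)                # shape (T,)
    if stack.shape[0] != w.shape[0]:
        raise ValueError("need exactly one weight per time step")
    return np.tensordot(w, stack, axes=1)               # contracts the time axis

# Toy series: two authors who publish together in steps 1 and 3 only.
together = np.array([[0, 1], [1, 0]])
series = [together, np.zeros((2, 2)), together]
A_int = integrate(series, ramp_weights(3))
```

With weights 1/6, 2/6 and 3/6, the integrated edge weight is 1/6 + 3/6 = 2/3, so the more recent collaboration counts for more.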
Dynamic Integration
Monroy, A
Studied regulation
of gene expression
A = Σ(wi*Ai)
• Ramp filter
• No separation of known truth
A = Σ(wi*Ai)
• Max Eigenvalue of truth filter
• Clear separation of a key
individual
Dynamic Integration
Monroy, A
Studied regulation
of gene expression
A = Σ(wi*Ai)
1975 Subgraphs Truth
• Subgraph degree << background degree
• Dynamics allow detection
• Max Eigenvalue of truth filter
• Clear separation of a key
individual
Collaboration Model
• Agents
– Interests
– Social Awareness
• # of starting agents
• # of incoming agents per iteration
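A toy agent-based sketch built from the ingredients listed above. The slides do not give the model's rules, so everything here is an assumption: agents have a single interest (a topic index), collaborate when interests match, and otherwise collaborate with probability `social_awareness` if they already share a co-author. All parameter names and defaults are hypothetical.

```python
import random

def simulate(n_start=5, n_incoming=2, iterations=10, n_topics=4,
             social_awareness=0.5, seed=0):
    """Grow a co-author network; returns (interests, coauthors)."""
    rng = random.Random(seed)
    interests = [rng.randrange(n_topics) for _ in range(n_start)]
    coauthors = {i: set() for i in range(n_start)}
    for _ in range(iterations):
        # New agents enter the field at every iteration.
        for _ in range(n_incoming):
            coauthors[len(interests)] = set()
            interests.append(rng.randrange(n_topics))
        # Each agent considers one randomly chosen potential collaborator.
        for a in range(len(interests)):
            b = rng.randrange(len(interests))
            if a == b or b in coauthors[a]:
                continue
            same_interest = interests[a] == interests[b]
            mutual = bool(coauthors[a] & coauthors[b])   # shared co-author
            if same_interest or (mutual and rng.random() < social_awareness):
                coauthors[a].add(b)
                coauthors[b].add(a)
    return interests, coauthors

interests, coauthors = simulate()
```

Networks generated this way could then be compared with real co-author networks, for example by looking at second-order neighbourhoods.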
Model Generated Networks
David Haig
Real World Networks vs. Model
David Haig, Second Order Neighbors
Model Generated
So What? Science Policy.
• Rossini and Porter, 1979 – “involves a
shared, over-arching theoretical
framework which welds components into a
unit.”
• Increased interest in interdisciplinary,
international, and academia/industry
collaborations
• Focus on Innovation
Conclusions
• In application domains, from cyber security to biology, graphs
provide an effective way of studying entities and their
relationships
• Classical graph algorithms have poor scaling / efficiency
properties, while early numerical techniques are difficult to tie
to applications
• A statistical signal processing approach offers a novel route to
large-scale graph analysis
• Applying this novel algorithmic framework helps elucidate the
emergence of innovation in science and the history of science
Conclusions
1. Innovation/Inventions in complex adaptive systems (CAS) are the product of a
complex interplay between internal and external
conditions (regulatory networks and niche construction)
2. The origin of variation (phenotypic or scientific) is a
consequence of changes to the (extended) complex
regulatory networks that govern CAS
3. These isomorphic properties enable a transfer of
both concepts and methods between different fields
concerned with innovation
Acknowledgments
For intellectual discussions/collaborations:
Eric Davidson
Günter Wagner
Jane Maienschein
Robert Page
Bert Hölldobler
Jürgen Renn
Doug Erwin
Colin Allen
Hans-Jörg Rheinberger
Horst Bredekamp
Olof Leimar
Sander van der Leeuw
Nadya Bliss
Graduate Students and Post-docs:
Erick Peirson
Kate MacCord
Guido Caniglia
Yawen Zhou
Lijing Jiang
Deryc Painter
Steve Elliott
Julia Damerow
Mark Ulett
Ken Aiello
For Financial Support:
National Science Foundation
Stiftung Mercator
Volkswagen Foundation
Smart Family Foundation
Max Planck Society
Wissenschaftskolleg zu Berlin
Arizona State University