An Introduction to Monte Carlo Methods


C-DEM: A Multi-Modal Query System
for Drosophila Embryo Databases
Fan Guo, Lei Li, Eric Xing, Christos Faloutsos
Carnegie Mellon University
{fanguo, leili, epxing, [email protected]}
http://www.db.cs.cmu.edu:8080/cdem/demo.html
Background
• Fruit-fly development in genetic study:
– Genes controlling the body plan and organ patterning are
similar to those in higher animals, including humans.
• Objective: a framework for applying data mining
techniques to assist biological research.
The Graph Representation
[Figure: graph with three node types (images, genes, keywords); example keyword node: embryonic hindgut]
• Image-layer edges: nearest neighbors in feature space
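A minimal sketch of how image-layer edges could be built from nearest neighbors in feature space. The slides do not say which features, distance, or neighbor count are used, so Euclidean distance and the parameter `k` are assumptions, and `knn_edges` is a hypothetical helper name.

```python
import numpy as np

def knn_edges(features, k):
    """Return directed edges (i, j) where j is one of the k nearest neighbors of i.

    Assumption: Euclidean distance on feature vectors; the slides do not
    specify the actual distance or the value of k.
    """
    features = np.asarray(features, dtype=float)
    n = features.shape[0]
    # Pairwise squared Euclidean distances between all feature vectors.
    diff = features[:, None, :] - features[None, :, :]
    dist = np.sum(diff ** 2, axis=2)
    # A node is never its own neighbor.
    np.fill_diagonal(dist, np.inf)
    # Indices of the k closest nodes for every row.
    neighbors = np.argsort(dist, axis=1)[:, :k]
    return [(i, int(j)) for i in range(n) for j in neighbors[i]]

# Toy usage: two tight pairs of points in a 2-D feature space.
points = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
edges = knn_edges(points, k=1)
```

The brute-force distance matrix is only practical for small collections; a spatial index would be the usual substitute at scale.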
Proximity Measure
• Random Walk with Restart
– Starting from a node s;
– Randomly walk to a neighbor, with probability 1-c;
– Restart at s, with probability c;
– Compute the steady-state probability vector.
– Complexity: O(E), but faster methods exist (Tong et al., ICDM’06)
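The steps above can be sketched as a plain power iteration. This is an illustrative sketch rather than the system's implementation: the function name, the default restart probability `c=0.15`, the tolerance, and the column normalization of the adjacency matrix are all assumptions.

```python
import numpy as np

def random_walk_with_restart(adj, start, c=0.15, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities for a walk that restarts at `start`.

    Assumptions: `adj` is a dense adjacency matrix, c=0.15 is an arbitrary
    default, and transition probabilities come from column normalization.
    """
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0          # avoid dividing by zero for isolated nodes
    W = adj / col_sums                     # W[i, j] = probability of stepping j -> i
    e = np.zeros(n)
    e[start] = 1.0                         # restart vector: all mass on the start node
    p = e.copy()
    for _ in range(max_iter):
        # Walk with probability 1-c, restart with probability c.
        p_next = (1 - c) * (W @ p) + c * e
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p

# Toy usage: a three-node path graph 0 - 1 - 2, starting from node 0.
path = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]
scores = random_walk_with_restart(path, start=0)
```

With a sparse matrix each iteration touches every edge once, which is where a cost proportional to E comes from.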
Proximity Measure
• Computing the Steady-State Probability
[Equation figure; labeled terms: desired probability vector, adjacency matrix, vector with non-zero entries for restart nodes]
Complexity: O(E), but faster methods exist (Tong et al., ICDM’06)
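The equation itself did not survive in this transcript. A standard form of the random-walk-with-restart fixed point, consistent with the three labeled terms and with the walk probability 1-c and restart probability c stated earlier, would be the following. The symbols are assumptions: \vec{p} is the desired probability vector, \tilde{A} the (column-normalized) adjacency matrix, and \vec{e} the vector with non-zero entries for the restart nodes.

```latex
\vec{p} = (1 - c)\,\tilde{A}\,\vec{p} + c\,\vec{e}
\quad\Longrightarrow\quad
\vec{p} = c\,\bigl(I - (1 - c)\,\tilde{A}\bigr)^{-1}\vec{e}
```

Iterating the left-hand form until convergence avoids the explicit matrix inverse on the right.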
Multi-Modal Query Results
• 2D expression images
• Annotation terms
• Genes
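One way such per-modality result lists could be assembled from a single proximity vector is sketched below. The slides do not describe this step, so the helper `top_k_by_modality`, the type labels, and the toy scores are all hypothetical.

```python
def top_k_by_modality(scores, node_types, k=3):
    """Group node indices by type, keeping the k highest-scoring nodes per type.

    Assumption: `scores[i]` is the proximity of node i to the query and
    `node_types[i]` is a label such as "image", "gene", or "keyword".
    """
    ranked = {}
    # Visit nodes from highest to lowest score so each bucket fills in rank order.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    for idx in order:
        bucket = ranked.setdefault(node_types[idx], [])
        if len(bucket) < k:
            bucket.append(idx)
    return ranked

# Toy usage with made-up scores for five nodes.
toy_scores = [0.40, 0.05, 0.20, 0.25, 0.10]
toy_types = ["image", "image", "gene", "keyword", "keyword"]
result = top_k_by_modality(toy_scores, toy_types, k=1)
```

In practice the query node itself would probably be dropped from its own result list; that filtering is left out here for brevity.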
More Mining Tasks
• Image Auto-Caption
• Gene function identification
Related Work
• Berkeley Drosophila Genome Project (www.fruitfly.org)
• FlyExpress (www.flyexpress.net)
• Berkeley Drosophila Transcription Network Project (bdtnp.lbl.gov)
System Architecture
• Browser-based UI
– HTTP: queries, result pages
• Tomcat Web Server (JSP application)
– RMI: remote function calls, results
• Computing Engine