M01 Presentation: Introduction File


Genomic Data Manipulation
Thinking about data visually
Curtis Huttenhower
Xochitl Morgan
http://huttenhower.sph.harvard.edu/bio508
Harvard School of Public Health
Department of Biostatistics
The usual suspects
Small changes, big differences
Fig. 3. Comparison of Sargasso Sea scaffolds to Crenarchaeal clone 4B7.
J C Venter et al. Science 2004;304:66-74
Published by AAAS
Only one of many ways to think about DNA sequence data...
Fig. 7. Phylogenetic tree of rhodopsinlike genes in the Sargasso Sea data along with all homologs of these genes in GenBank.
(Almost) everything can be clustered into a tree, even DNA sequences
J C Venter et al. Science 2004;304:66-74
Published by AAAS
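The claim that even DNA sequences can be clustered into a tree can be made concrete with a small sketch. The code below is an illustration only, not the method used for the figure in Venter et al.: it assumes equal-length, already aligned sequences, uses Hamming distance, and merges clusters by average linkage (UPGMA-style). The four toy sequences and the names `hamming`, `upgma`, and `TOY_SEQS` are hypothetical.

```python
from itertools import combinations

# Hypothetical toy sequences (equal length, assumed already aligned).
TOY_SEQS = {
    "A": "ACGTACGT",
    "B": "ACGTACGA",
    "C": "TTGTACCA",
    "D": "TTGTACCT",
}


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(c1 != c2 for c1, c2 in zip(a, b))


def upgma(seqs):
    """Average-linkage agglomerative clustering; returns a nested-tuple tree."""
    # Each cluster is (subtree, list of member names).
    clusters = [(name, [name]) for name in seqs]

    def dist(c1, c2):
        # Average pairwise Hamming distance between the members of two clusters.
        pairs = [(a, b) for a in c1[1] for b in c2[1]]
        return sum(hamming(seqs[a], seqs[b]) for a, b in pairs) / len(pairs)

    while len(clusters) > 1:
        # Find the closest pair of clusters and merge them.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = (
            (clusters[i][0], clusters[j][0]),
            clusters[i][1] + clusters[j][1],
        )
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)

    return clusters[0][0]


print(upgma(TOY_SEQS))
```

For the toy input, A and B differ at one position, as do C and D, so the two pairs are joined first and the resulting tree is `(('A', 'B'), ('C', 'D'))`. Real phylogenetic analyses typically rely on alignment and evolutionary distance models rather than raw Hamming distance.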
Aerobic, microaerobic and anaerobic communities
But not every tree is a clustering
Model of microbial biomarkers
Why are networks so popular in biology?
Wordles
Don’t be afraid to get creative when representing data!
Chart labels: Godzilla; Lego Movie; Captain America; Spider-Man 2; X-Men: DFP; Guardians of the Galaxy; Transformers: AE; Mockingjay
http://xach.com/moviecharts/2014.html
Anscombe's quartet
Four 11-pair datasets with the same X mean, X standard deviation, Y mean, Y standard deviation, correlation, and regression coefficients:
μ(x) = 9
σ(x) ≈ 3.32 (variance σ²(x) = 11)
μ(y) ≈ 7.5
σ(y) ≈ 2.03 (variance σ²(y) ≈ 4.1)
ρ ≈ 0.816
y = 3 + 0.5x
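The shared summary statistics of Anscombe's quartet can be checked directly. The sketch below uses the published values from Anscombe (1973) and only the Python standard library; the names `QUARTET` and `summarize` are illustrative. It reports sample variances (dividing by n - 1), so the values 11 and about 4.1 appear as variances, which correspond to standard deviations of about 3.32 and 2.03.

```python
from math import sqrt

# Anscombe's quartet: datasets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                  6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                   6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
            5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Means, sample variances, Pearson correlation, and least-squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return {
        "mean_x": mean_x,
        "var_x": sxx / (n - 1),
        "mean_y": mean_y,
        "var_y": syy / (n - 1),
        "corr": sxy / sqrt(sxx * syy),
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
    }


for name, (x, y) in QUARTET.items():
    stats = summarize(x, y)
    print(name, {key: round(value, 3) for key, value in stats.items()})
```

The four printed rows agree to roughly two decimal places, even though scatter plots of the four datasets look entirely different. That is the reason to plot data before trusting summary statistics.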
Looking at data: it’s not just fun, it’s important, too!