M01 Presentation: Introduction File

Genomic Data Manipulation
Thinking about data visually
Curtis Huttenhower
http://huttenhower.sph.harvard.edu/bio508
Harvard School of Public Health
Department of Biostatistics
01-27-14
The usual suspects
Small changes, big differences
Fig. 3. Comparison of Sargasso Sea scaffolds to Crenarchaeal clone 4B7.
J C Venter et al. Science 2004;304:66-74
Published by AAAS
Only one of many ways to think about DNA sequence data...
Fig. 7. Phylogenetic tree of rhodopsinlike genes in the Sargasso Sea data along with all homologs of these genes in GenBank.
(Almost) everything can be clustered into a tree, even DNA sequences
J C Venter et al. Science 2004;304:66-74
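To make the point that (almost) anything with a distance measure can be turned into a tree, here is a minimal sketch of average-linkage (UPGMA) hierarchical clustering on DNA sequences, using Hamming distance (the number of mismatched positions). This is a generic illustration, not the method used to build the tree in Fig. 7. The function names `hamming` and `upgma` and the four toy sequences are hypothetical, and the sketch assumes the sequences are already aligned to equal length.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two aligned, equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))


def upgma(seqs):
    """Average-linkage (UPGMA) clustering of {name: sequence}.

    Returns the tree as nested tuples of sequence names.
    """
    # Map each cluster (a name or a nested tuple) to the names it contains.
    clusters = {name: [name] for name in seqs}

    def dist(c1, c2):
        # UPGMA linkage: mean of all pairwise distances between members.
        m1, m2 = clusters[c1], clusters[c2]
        total = sum(hamming(seqs[a], seqs[b]) for a in m1 for b in m2)
        return total / (len(m1) * len(m2))

    while len(clusters) > 1:
        # Merge the closest pair of clusters into a new internal node.
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: dist(*pair))
        clusters[(c1, c2)] = clusters.pop(c1) + clusters.pop(c2)
    return next(iter(clusters))


# Hypothetical toy sequences, already aligned.
SEQS = {
    "A": "ACGTACGT",
    "B": "ACGTACGA",
    "C": "TCGAACCT",
    "D": "TCGAAGCA",
}

if __name__ == "__main__":
    print(upgma(SEQS))  # → (('A', 'B'), ('C', 'D'))
```

A and B differ at one position and C and D at two, so those pairs merge first. Swapping in a different distance function is all it takes to cluster other kinds of data into a tree the same way.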
Aerobic, microaerobic and anaerobic communities
But not every tree is a clustering
Model of microbial biomarkers
Why are networks so popular in biology?
Don’t be afraid to get creative when representing data!
(Figure: movie chart labeled with Fast and Furious 6 (!?!), Iron Man 3, Man of Steel, Thor, and Hunger Games)
http://xach.com/moviecharts/2013.html
Wordles
Anscombe's quartet
Four 11-pair datasets with the same X mean, X variance, Y mean, Y variance, correlation, and regression coefficients:
μ(x)=9
σ²(x)=11
μ(y)=7.5
σ²(y)≈4.1
ρ=0.816
y=3+0.5x
Looking at data – it’s not just fun, it’s important, too!
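The shared statistics of Anscombe's quartet can be checked directly. This minimal sketch uses the quartet's values as commonly reproduced from Anscombe (1973) and computes each dataset's means, sample variances, Pearson correlation, and least-squares line with Python's standard `statistics` module plus the textbook formulas. Note that 11 and roughly 4.1 are the sample variances of x and y; the corresponding standard deviations are about 3.32 and 2.03. The names `QUARTET` and `summarize` are our own.

```python
import math
from statistics import mean, variance

# Anscombe's quartet: four datasets of 11 (x, y) pairs.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # shared by datasets I-III
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Summary statistics that all four datasets (nearly) share."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx               # least-squares slope
    intercept = my - slope * mx     # least-squares intercept
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation
    return {
        "mean_x": mx, "var_x": variance(xs),  # sample variance (n - 1)
        "mean_y": my, "var_y": variance(ys),
        "r": r, "slope": slope, "intercept": intercept,
    }


if __name__ == "__main__":
    for name, (xs, ys) in QUARTET.items():
        stats = summarize(xs, ys)
        print(f"{name:>3}: " + ", ".join(f"{k}={v:.3f}" for k, v in stats.items()))
```

All four rows print nearly identical numbers, yet plotted, the datasets look like a noisy line, a smooth curve, a tight line with one outlier, and a vertical stack of points plus one extreme point. Only looking at the data reveals the difference.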