ENCODE 2012
• The Human Genome Project sequenced “the
human genome”
• “the human genome” that we have labeled as
such doesn’t actually exist
• What we call the human genome sequence is
really just a reference
• Furthermore, the current reference genome
sequence is haploid
Whose genome did Celera sequence?
Supposedly:
African-American
Asian-Chinese
Hispanic-Mexican
Caucasian
Caucasian
Actually:
Celera’s genome is Craig Venter’s
Science v. 291, pp 1304-1351
• Every time an individual cell divides, new
mutations arise; no two cells even within any
individual have the identical sequence.
ENCODE
• The Encyclopedia of DNA Elements (ENCODE)
is a public research consortium initiated by
the US National Human Genome Research
Institute (NHGRI) in September 2003.
• The goal is to find all functional elements in
the human genome.
• All data generated in the course of the project
will be released “rapidly” into public
databases.
• Pilot phase – 2003-2007 – method evaluation
– 1% of genome
• Production phase 2007-2012
– September 2012 – 30 papers published
– 442 scientists
– 31 labs
– 147 different types of cells with 24 types of experiments
– 1,642 experiments
– Data released
• Identification and quantification of RNA
species in cells and subcellular compartments
• Mapping of noncoding and protein-coding
genes
• Delineation of chromatin and DNA
accessibility
• Mapping of histone modifications and
transcription factor-binding sites
• Measurement of DNA methylation
Credits: Darryl Leja (NHGRI), Ian Dunham (EBI)
What did they find?
• Controversy!
• Assigned biochemical functions to over 80% of
the genome.
• Junk DNA or no?
• What is a biochemical function?
• “a reproducible biochemical signature”
• “millions of switches”
• The vast majority (80.4%) of the human genome participates in at
least one biochemical RNA- and/or chromatin-associated event in at
least one cell type.
• Primate-specific elements as well as elements without detectable
mammalian constraint show, in aggregate, evidence of negative
selection; thus, some of them are expected to be functional.
• Classifying the genome into seven chromatin states indicates an
initial set of 399,124 regions with enhancer-like features and 70,292
regions with promoter-like features, as well as hundreds of
thousands of quiescent regions.
• It is possible to correlate quantitatively RNA sequence production
and processing with both chromatin marks and transcription factor
binding at promoters, indicating that promoter functionality can
explain most of the variation in RNA expression.
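A coverage figure like the 80.4% above is, in spirit, the fraction of bases touched by at least one annotated interval across all assays and cell types. The sketch below shows that arithmetic on toy data. The track names, coordinates, and genome length are hypothetical, and this is not the consortium's actual pipeline.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def covered_fraction(tracks, genome_length):
    """Fraction of bases covered by at least one interval in any track."""
    all_intervals = [iv for track in tracks for iv in track]
    covered = sum(end - start for start, end in merge_intervals(all_intervals))
    return covered / genome_length


# Toy example (hypothetical): two assay tracks on a 100-base "genome".
rna_track = [(0, 30), (50, 70)]
chromatin_track = [(20, 40), (65, 90)]
print(covered_fraction([rna_track, chromatin_track], 100))  # 0.8
```

Because the union is taken over every track, each added assay or cell type can only raise the covered fraction. That is part of why the definition of "function" matters for the headline number.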
• Many non-coding variants in individual genome
sequences lie in ENCODE-annotated functional
regions; this number is at least as large as those
that lie in protein-coding genes.
• Single nucleotide polymorphisms (SNPs)
associated with disease by GWAS are enriched
within non-coding functional elements, with a
majority residing in or near ENCODE-defined
regions that are outside of protein-coding genes.
In many cases, the disease phenotypes can be
associated with a specific cell type or
transcription factor.
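To make "enriched" concrete: one simple way to quantify it is to compare the share of SNPs that land in annotated regions with the share of the genome those regions occupy. The sketch below does only that, on invented coordinates. Real GWAS enrichment analyses also control for linkage disequilibrium, allele frequency, and matched background SNPs, none of which is modelled here.

```python
from bisect import bisect_right


def count_in_regions(positions, regions):
    """Count positions inside sorted, non-overlapping half-open regions."""
    starts = [start for start, _ in regions]
    hits = 0
    for pos in positions:
        # Index of the last region whose start is <= pos, if any.
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos < regions[i][1]:
            hits += 1
    return hits


def fold_enrichment(positions, regions, genome_length):
    """Observed fraction of positions in regions over the fraction expected by size."""
    observed = count_in_regions(positions, regions) / len(positions)
    expected = sum(end - start for start, end in regions) / genome_length
    return observed / expected


# Toy example (hypothetical): regions cover 20% of a 1000-base genome,
# but 3 of the 4 SNPs fall inside them.
regions = [(100, 200), (500, 600)]
snps = [120, 150, 550, 800]
print(count_in_regions(snps, regions))  # 3
print(fold_enrichment(snps, regions, 1000))  # about 3.75
```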
Changing how we view a gene?
• Genes should be defined by transcripts.
• Transcripts are the basic unit that’s affected by
mutation and selection.
• A “gene” then becomes a collection of
transcripts, united by some common factor.
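The transcript-centric view can be sketched as a data structure: transcripts are the primary records, and a "gene" is just a grouping of them by some shared property. All class and field names below are illustrative assumptions, and the grouping key (a hypothetical locus label) stands in for whatever "common factor" is chosen.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Transcript:
    transcript_id: str
    chrom: str
    strand: str
    exons: tuple  # tuple of half-open (start, end) pairs
    locus: str    # hypothetical label used here as the "common factor"


@dataclass
class Gene:
    name: str
    transcripts: list = field(default_factory=list)

    def span(self):
        """Smallest (start, end) window containing every exon of every transcript."""
        starts = [s for t in self.transcripts for s, _ in t.exons]
        ends = [e for t in self.transcripts for _, e in t.exons]
        return min(starts), max(ends)


def group_transcripts(transcripts, key):
    """Build genes by grouping transcripts on an arbitrary shared key."""
    groups = defaultdict(list)
    for t in transcripts:
        groups[key(t)].append(t)
    return [Gene(name=str(k), transcripts=v) for k, v in groups.items()]


# Toy example (hypothetical transcripts).
transcripts = [
    Transcript("T1", "chr1", "+", ((100, 200), (300, 400)), "locusA"),
    Transcript("T2", "chr1", "+", ((100, 200), (500, 600)), "locusA"),
    Transcript("T3", "chr2", "-", ((50, 80),), "locusB"),
]
genes = group_transcripts(transcripts, key=lambda t: t.locus)
for g in genes:
    print(g.name, len(g.transcripts), g.span())
```

Swapping the `key` function (shared promoter, shared exon, shared protein product) changes which transcripts count as one gene, which is the point of the definition above.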
• Another related challenge is understanding the genome’s three-dimensional
shape. Far from being arranged in a line, chromosomes
are folded in fantastically complicated fractal patterns, and these
topographies appear to shape network interaction.
• “Every gene is surrounded by an ocean of regulatory elements.
They’re everywhere. There are only 25,000 genes, and probably
more than 1 million regulatory elements,” said Job Dekker, a
molecular biophysicist at the University of Massachusetts Medical
School who worked on ENCODE’s structural descriptions of the
genome.
• He continued, “It’s not just one gene touching one regulator. It can
touch and interact with a whole collection of them. It must involve
a very complicated three-dimensional structure. At this scale,
chromosome topography turns out to be incredibly dynamic,
complex and cell type-specific.”
• http://selab.janelia.org/people/eddys/blog/?p=683
• http://arstechnica.com/staff/2012/09/most-of-what-you-read-was-wrong-how-press-releases-rewrote-scientific-history/
• http://blogs.discovermagazine.com/notrocketscience/2012/09/05/encode-the-rough-guide-to-the-human-genome/#ENCODEgene
• http://www.nature.com/news/encode-the-human-encyclopaedia-1.11312
• http://www.nature.com/nature/journal/v489/n7414/full/nature11247.html