Introduction to the Tsinghua University ENCODE Journal Club
Monica C. Sleumer (苏漠)
2012-09-24
Tsinghua ENCODE Journal Club Objectives
• Read and discuss all 31 ENCODE papers
• Discuss the 13 “Threads” in the ENCODE explorer
• Discuss the overall meaning of the ENCODE project
– Media reactions
• Understand how to apply ENCODE findings to our own research
• Generate a long-term repository for our findings on our journal club website: bioinfo.au.tsinghua.edu.cn/encode/
Human Genome
• 3,101,804,739 base pairs
• 22 chromosomes plus X and Y
• 21,224 protein-coding genes
• 15,952 ncRNA genes
• 3–8% of bases are under selection
– From comparative genomic studies
• Question: What is the genome doing?
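To put the 3–8% figure in absolute terms, a minimal sketch can multiply the genome size quoted above by the two bounds. The function and variable names are illustrative and do not come from the slides.

```python
# Genome size quoted on the slide, in base pairs.
GENOME_SIZE_BP = 3_101_804_739


def bases_under_selection(fraction: float, genome_size: int = GENOME_SIZE_BP) -> int:
    """Return the approximate number of bases that a given fraction of the genome represents."""
    return round(genome_size * fraction)


# Lower and upper bounds from the slide: 3% and 8% of all bases.
low = bases_under_selection(0.03)
high = bases_under_selection(0.08)
print(f"Roughly {low / 1e6:.0f} to {high / 1e6:.0f} Mb are under selection")
```

Even the upper bound leaves more than 90% of the genome unexplained by selection alone, which is what motivates the question on this slide.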
ENCODE Project Objectives
• Find all functional elements
– Bound by specific proteins
– Transcribed
– Histone modifications
– DNA methylation
• Use this information to annotate functional regions
– Genes (coding and non-coding)
– Promoters
– Enhancers
– Specific transcription factor binding sites
– Silencers
– Insulators
– Chromatin states
• Cross-reference data from other studies
– Comparative genomics
– 1000 Genomes Project
– Genome-wide association studies (GWAS)
Different combination in each cell type
ENCODE projects
• ENCODE pilot project: 1% of the genome, 2003-2007
• modENCODE: Drosophila and C. elegans
• Mouse ENCODE in progress?
• ENCODE main project, 2007-2012
– 1649 dataset-generating experiments
– 147 cell types
– 235 antibodies and assay protocols
– 450 authors
– 32 institutes
• 31 publications on 2012-09-06
– 6 in Nature, all discussed on 2012-09-19
– 18 in Genome Research
– 6 in Genome Biology, one of these discussed today
– 1 in BMC Genetics
www.nature.com/encode/category/research-papers
Materials
• 147 types of human cell lines, 3 priority levels
• Tier 1 cell lines: top priority for all experiments
Name | Description | Lineage | Tissue | Karyotype
GM12878 | B-lymphocyte, lymphoblastoid, Epstein-Barr Virus, 1000 Genomes Project | mesoderm | blood | normal
H1-hESC | embryonic stem cells | inner cell mass | embryonic stem cell | normal
K562 | leukemia, 53-year-old female with chronic myelogenous leukemia | mesoderm | blood | cancer
• Tier 2 cell lines to be done after Tier 1 (next slide)
• Tier 3: any other cell lines
Tier 2 Cell Lines
Name | Description | Lineage | Tissue | Karyotype
A549 | lung carcinoma epithelium, 58-year-old caucasian male | endoderm | epithelium | cancer
CD20+ | donor B cells: RO01778 and RO01794 | mesoderm | blood | normal
CD20+_RO01778 | B cells, caucasian | mesoderm | blood | normal
CD20+_RO01794 | B cells, African American | mesoderm | blood | normal
H1-neurons | neurons derived from H1 embryonic stem cells | ectoderm | neurons | normal
HeLa-S3 | cervical carcinoma | ectoderm | cervix | cancer
HepG2 | hepatocellular carcinoma | endoderm | liver | cancer
HUVEC | umbilical vein endothelial cells | mesoderm | blood vessel | normal
IMR90 | fetal lung fibroblasts | endoderm | lung | normal
LHCN-M2 | skeletal myoblasts from pectoralis major muscle, 41 year old caucasian | mesoderm | skeletal muscle myoblast |
MCF-7 | mammary gland, adenocarcinoma | ectoderm | breast | cancer
Monocytes-CD14+ | Monocytes-CD14+, leukapheresis from RO 01746 and RO 01826 | mesoderm | monocytes | normal
SK-N-SH | neuroblastoma, 4 year old | ectoderm | brain | cancer
http://encodeproject.org/ENCODE/cellTypes.html
Methods
RNA-Seq: Different fractions of RNA -> sequencing
CAGE: Sequencing of 5' capped RNA
RNA-PET: Sequencing of the 5' cap plus the poly-A tail
ChIP-seq: Chromatin immunoprecipitation of a DNA-binding protein -> sequencing
DNase-seq: Cut exposed DNA with DNase I -> sequencing
FAIRE-seq: Nucleosome-depleted DNA -> sequencing
RRBS: Bisulphite treatment: unmethylated C->U -> sequencing
3C, 5C, ChIA-PET: Chromatin interactions -> sequencing
Wu Dingming, 2012-09-19
Ma Xiaopeng, 2012-09-19
Guo Weilong
He Chao, 2012-09-19
Li Yanjian, 2012-09-19
• All methods (DNA or RNA sequencing) can be traced back to a genomic location
• Findings vary between cell types
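The RRBS entry above (unmethylated C->U, then sequencing) can be illustrated with a toy simulation. This is only a sketch of the principle on a single strand, not the RRBS protocol or any ENCODE pipeline; it relies on the fact that U is read as T after amplification, and all function names and the example sequence are made up for illustration.

```python
def bisulfite_convert(seq: str, methylated_positions: set[int]) -> str:
    """Simulate bisulfite treatment plus sequencing of one strand.

    Unmethylated C is converted to U, which the sequencer reports as T.
    Methylated C is protected and is still reported as C.
    """
    converted = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated_positions:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


def call_methylation(reference: str, read: str) -> dict[int, bool]:
    """For every C in the reference, report True if the read still shows C there."""
    return {
        i: read[i] == "C"
        for i, base in enumerate(reference.upper())
        if base == "C"
    }


# Hypothetical 8-base reference with a methylated C at index 1 only.
reference = "ACGTCCGA"
read = bisulfite_convert(reference, methylated_positions={1})
calls = call_methylation(reference, read)
print(read, calls)
```

Comparing the converted read with the reference at each C is what lets the read be traced back to a genomic location and scored as methylated or not.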
Primary Findings
• 80.4% of the human genome is doing at least one of the
following:
– Bound by a transcription factor
– Transcribed
– Associated with a modified histone
• 99% is within 1.7 kb of at least one of the biochemical events
• 95% within 8 kb of a DNA–protein interaction or DNase I
footprint
• 7 chromatin states:
– 399,124 enhancer-like regions
– 70,292 promoter-like regions
• Correlation between transcription, chromatin marks, and TF
binding
• Functional regions contain lots of SNPs
– Disease-associated SNPs in non-coding regions tend to be in
functional elements
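A figure such as 80.4% is a union: a base counts once if any of the listed events covers it. The sketch below shows that calculation on made-up intervals for a hypothetical 100-base genome; the track names and coordinates are assumptions for illustration, not ENCODE data.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open intervals given as (start, end)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extend the previous interval instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_fraction(tracks, genome_length):
    """Fraction of the genome covered by at least one interval in any track."""
    all_intervals = [interval for track in tracks for interval in track]
    covered = sum(end - start for start, end in merge_intervals(all_intervals))
    return covered / genome_length


# Toy tracks (hypothetical coordinates on a 100-base genome).
tf_binding = [(10, 30), (50, 60)]
transcribed = [(20, 45)]
histone_marks = [(55, 80)]

union = merge_intervals(tf_binding + transcribed + histone_marks)
fraction = covered_fraction([tf_binding, transcribed, histone_marks], genome_length=100)
print(union, fraction)
```

Merging before summing avoids double-counting bases that are, for example, both transcribed and bound by a transcription factor.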
Applications
• Visible as genome tracks in UCSC
• Gene or pathway of interest
• Mutation from
– Cancer sequencing
– Genome-wide association studies
– Find out what that part of the genome is doing
• Compare with your cancer data (RNA-seq)
• Comparative genome analysis
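The "find out what that part of the genome is doing" step amounts to an interval lookup: given a mutation position, find the annotated region that contains it. A minimal sketch follows, assuming a sorted, non-overlapping list of regions on one chromosome; the coordinates and labels are hypothetical, and in practice the annotations would come from the ENCODE tracks in the UCSC browser.

```python
from bisect import bisect_right

# Hypothetical annotations: (start, end, label), half-open, sorted, non-overlapping.
ANNOTATIONS = [
    (1_000, 1_500, "promoter-like"),
    (4_000, 4_800, "enhancer-like"),
    (9_000, 9_200, "TF binding site"),
]
STARTS = [start for start, _, _ in ANNOTATIONS]


def annotate_position(pos: int):
    """Return the label of the region containing pos, or None if it is unannotated."""
    # Index of the last region whose start is <= pos.
    i = bisect_right(STARTS, pos) - 1
    if i >= 0:
        start, end, label = ANNOTATIONS[i]
        if start <= pos < end:
            return label
    return None


# Example: a mutation from a cancer sequencing study or GWAS at position 4,200.
print(annotate_position(4_200))
```

Binary search keeps each lookup fast even with hundreds of thousands of regions, which matters when annotating a full list of GWAS or cancer mutations.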
Online Resources
• Interactive app on Nature ENCODE main page
www.nature.com/encode/
• Journal club website: bioinfo.au.tsinghua.edu.cn/encode/
Next ENCODE Journal Club Meeting
Suggested meeting day:
Thursday (周四) 2012-10-11
LIANG Zhengyu?
One more volunteer speaker needed