Failures in biological interpretation (pptx)

Failures in interpretation of sequencing results
v1.0
Laura Biggins
Interpretation
Technical
Tracking
Interpreting results
Library
QC and visualisation still important
Contamination
Biological
Interpretation
Easy to draw wrong conclusions from the data
Interpretation
RNA-seq data
Interpretation exercise
Interpretation
Sample 1 genome view
Interpretation
Sample 2 genome view
Interpretation
Genes upregulated in sample 1
Interpretation
Genes downregulated in sample 1
Interpretation
Interpretation exercise
Interpretation
Functional Enrichment Analysis
• Gene ontology analysis
• Pathways analysis
• Any set of predefined functional categories with genes assigned to the categories
• Enrichment test: gene list vs background
• Useful and powerful, but easy to produce false positives
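The enrichment test in the list above asks whether a category holds more genes from the hit list than the background would predict. A minimal sketch, assuming a one-sided hypergeometric test (one common choice; the slides do not name a specific test). The function name and all counts are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(hits_in_category, hits_total,
                           background_in_category, background_total):
    """One-sided P(X >= hits_in_category) under the hypergeometric distribution."""
    max_overlap = min(hits_total, background_in_category)
    # Number of ways to draw the hit list from the background
    denominator = comb(background_total, hits_total)
    # Sum the probability of every overlap at least as large as the observed one
    tail = sum(
        comb(background_in_category, k)
        * comb(background_total - background_in_category, hits_total - k)
        for k in range(hits_in_category, max_overlap + 1)
    )
    return tail / denominator

# Hypothetical numbers: 200 hit genes, 12 of them in a category that
# contains 300 of the 20000 background genes (about 3 expected by chance).
p_value = hypergeom_enrichment_p(12, 200, 300, 20000)
print(f"enrichment p-value: {p_value:.3g}")
```

The choice of background matters as much as the test itself: using all annotated genes rather than only the genes that could have been detected is one way such a test produces false positives.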
Interpretation
Location bias
GO analysis of all genes on chromosome (mouse)
Chr     Category                                                   BH adj p-value
Chr 1   GO:0050662~coenzyme binding                                1.61E-02
Chr 2   GO:0007608~sensory perception of smell                     1.54E-64
Chr 3   GO:0005509~calcium ion binding                             2.32E-01
Chr 4   GO:0009615~response to virus                               2.58E-07
Chr 5   GO:0001730~2'-5'-oligoadenylate synthetase activity        2.44E-04
Chr 6   GO:0005529~sugar binding                                   1.54E-25
Chr 7   GO:0004984~olfactory receptor activity                     1.08E-30
Chr 8   GO:0042742~defense response to bacterium                   2.36E-16
Chr 9   GO:0007608~sensory perception of smell                     2.68E-12
Chr 10  GO:0008227~amine receptor activity                         3.24E-05
Chr 11  GO:0045111~intermediate filament cytoskeleton              1.79E-23
Chr 12  GO:0034097~response to cytokine stimulus                   2.59E-04
Chr 13  GO:0000786~nucleosome                                      2.47E-17
Chr 14  GO:0004522~pancreatic ribonuclease activity                2.09E-22
Chr 15  GO:0045095~keratin filament                                1.28E-17
Chr 16  GO:0004869~cysteine-type endopeptidase inhibitor activity  8.62E-08
Chr 17  GO:0042611~MHC protein complex                             5.39E-19
Chr 18  GO:0007156~homophilic cell adhesion                        1.01E-26
Chr 19  GO:0005506~iron ion binding                                2.12E-07
Chr X   GO:0045449~regulation of transcription                     6.88E-04
Michael Reik
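The table reports Benjamini-Hochberg (BH) adjusted p-values. A minimal sketch of the standard BH step-up adjustment follows, for readers who want to see what that column means. The raw p-values are made up for illustration and are not taken from the slides.

```python
def bh_adjust(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(p_values)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotonic
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical raw p-values for four categories
raw = [0.01, 0.04, 0.03, 0.20]
adj = bh_adjust(raw)
print(adj)
```

A small adjusted p-value only says the category is over-represented in the gene list. As the per-chromosome table shows, that can reflect where genes sit in the genome and not the biology of the experiment.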
Interpretation
Mapping to genome
multi-mapping
GO enrichment for “differentially expressed” genes
- ribosomal categories (p < 1E-20)
- histones & chromatin assembly (p < 1E-7)
Michael Reik
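One way to check whether multi-mapping is behind an enrichment like this is to separate uniquely mapped reads from multi-mapped ones before counting. A minimal sketch, assuming alignments are available as SAM text lines and that the aligner writes the optional NH:i tag (number of reported alignments for the read). Not all aligners do. The function name and the example records are hypothetical.

```python
def split_by_multimapping(sam_lines):
    """Split read names into uniquely mapped and multi-mapped using the NH tag."""
    unique, multi = [], []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        fields = line.rstrip("\n").split("\t")
        # Assumption: a read without an NH tag is treated as uniquely mapped
        hits = 1
        for tag in fields[11:]:  # optional fields start after the 11 mandatory ones
            if tag.startswith("NH:i:"):
                hits = int(tag[5:])
        (multi if hits > 1 else unique).append(fields[0])
    return unique, multi

# Hypothetical SAM records for illustration only
example = [
    "@HD\tVN:1.6",
    "read1\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:1",
    "read2\t0\tchr2\t200\t1\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:4",
]
unique_reads, multi_reads = split_by_multimapping(example)
print(unique_reads, multi_reads)
```

If the ribosomal and histone hits disappear when only uniquely mapped reads are counted, the "differential expression" was probably a mapping artefact.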
Interpretation
GC content
Interpretation
GC content
GOrilla analysis using mouse genes with GC content > 60% (~170 genes)
Interpretation
GC content
GOrilla analysis using mouse genes with GC content < 35% (~200 genes)
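Gene lists like the two used above can be built by computing GC content per gene and applying the same thresholds (above 60%, below 35%). A minimal sketch, assuming gene sequences are already loaded into a dictionary. The gene names and sequences are hypothetical.

```python
def gc_fraction(sequence):
    """Fraction of G and C among the A, C, G, T bases of a sequence."""
    seq = sequence.upper()
    called = sum(seq.count(base) for base in "ACGT")  # ignore N and other symbols
    if called == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / called

def select_by_gc(sequences, low=0.35, high=0.60):
    """Return (low_gc_genes, high_gc_genes) using the thresholds from the slides."""
    low_gc = sorted(name for name, seq in sequences.items() if gc_fraction(seq) < low)
    high_gc = sorted(name for name, seq in sequences.items() if gc_fraction(seq) > high)
    return low_gc, high_gc

# Hypothetical toy sequences for illustration only
genes = {
    "geneA": "GGCCGGCCAT",  # 80% GC
    "geneB": "ATATATATGC",  # 20% GC
    "geneC": "ACGTACGTAC",  # 50% GC
}
low_gc_genes, high_gc_genes = select_by_gc(genes)
print(low_gc_genes, high_gc_genes)
```

Running an enrichment tool on lists selected purely by GC content, as the slides do with GOrilla, shows which categories can come up as "significant" from a technical bias alone.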
Interpretation
Public RNA-seq
• Public RNA-seq data from a range of mouse tissues
• Called differentially expressed genes between replicates within datasets
• Gene lists enriched in functional categories:
  – Ribosome
  – Extracellular
  – Secreted
  – Glycoprotein
  – Myofibril, cytoskeleton
Interpretation
Public RNA-seq
GO analysis of genes that appeared in > 5 datasets
Extracellular and glycoprotein categories absent (large, diverse categories)
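Genes that are called as differentially expressed between replicates in many unrelated datasets are candidates for a list of recurrent false positives. A minimal sketch of the "appeared in > 5 datasets" filter, assuming one hit list per dataset. The function name and the gene lists are hypothetical.

```python
from collections import Counter

def recurrent_genes(hit_lists, more_than=5):
    """Genes that appear in more than `more_than` of the per-dataset hit lists."""
    # Count each gene at most once per dataset
    counts = Counter(gene for hits in hit_lists for gene in set(hits))
    return sorted(gene for gene, n in counts.items() if n > more_than)

# Hypothetical hit lists from seven replicate-vs-replicate comparisons
datasets = [
    ["Rpl3", "Ttn", "Actb"],
    ["Rpl3", "Ttn"],
    ["Rpl3", "Col1a1"],
    ["Rpl3", "Ttn"],
    ["Rpl3", "Ttn", "Ttn"],
    ["Rpl3", "Ttn"],
    ["Rpl3"],
]
suspects = recurrent_genes(datasets)
print(suspects)
```

A list built this way can be compared against the hits from a real experiment before any biological interpretation is attempted.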
Interpretation
Membrane associated transcripts
Re-sequencing library = same result
Remaking library from tissue = changes gone
Interpretation
Misbehaving genes
Sfi1 – spindle associated transcript
Interpretation
Misbehaving genes
• Titin, USH2A (big!)
• Mucin, Mid1, Sfi1 (duplication events)
• Olfactory receptors (big families)
• Poorly annotated (RIKEN, EST, Gm123, RP11 etc.)
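A hit list can be screened for these misbehaving genes before interpretation. A minimal sketch, assuming mouse gene symbols. The symbol set and the regular expressions are illustrative assumptions based on the examples in the list above, not a curated blacklist.

```python
import re

# Assumed mouse symbols for the named examples (titin, Ush2a, Sfi1, Mid1)
SUSPECT_SYMBOLS = {"Ttn", "Ush2a", "Sfi1", "Mid1"}

# Assumed naming patterns for gene families and poorly annotated genes
SUSPECT_PATTERNS = [
    re.compile(r"^Gm\d+$"),     # predicted genes such as Gm123
    re.compile(r"^\d+.*Rik$"),  # RIKEN clone names
    re.compile(r"^RP11-"),      # clone-based names
    re.compile(r"^Olfr\d+$"),   # olfactory receptors (older mouse nomenclature)
    re.compile(r"^Muc\d+"),     # mucins
]

def flag_suspect_genes(symbols):
    """Return the symbols that match the suspect set or any suspect pattern."""
    return [
        s for s in symbols
        if s in SUSPECT_SYMBOLS or any(p.search(s) for p in SUSPECT_PATTERNS)
    ]

# Hypothetical hit list for illustration only
hits = ["Actb", "Gm123", "Ttn", "Olfr56", "4930402H24Rik", "Gapdh"]
flagged = flag_suspect_genes(hits)
print(flagged)
```

Flagged genes are not necessarily wrong, but a hit list dominated by them deserves a look at the raw data in a genome browser.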
Interpretation
Power
• RNA-seq
  – Gene/transcript length
  – Expression level
• Bisulphite-seq
  – CpG content
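The link between gene length, expression level and power in RNA-seq can be illustrated with a rough calculation. A minimal sketch, assuming read counts are Poisson distributed and proportional to length times expression, and ignoring biological variation and replicates. The functions and numbers are hypothetical and only show the direction of the effect.

```python
from math import sqrt

def expected_reads(length_bp, reads_per_kb):
    """Expected read count, assumed proportional to length times expression."""
    return length_bp / 1000 * reads_per_kb

def approx_z_for_fold_change(mean_count, fold_change=2.0):
    """Normal approximation to the difference between two Poisson counts."""
    low = mean_count
    high = mean_count * fold_change
    # Variance of a Poisson count equals its mean, so the variances add
    return (high - low) / sqrt(low + high)

# Two hypothetical genes with the same expression per kb and the same 2-fold change
short_gene = expected_reads(500, 10)    # 5 reads
long_gene = expected_reads(50000, 10)   # 500 reads
z_short = approx_z_for_fold_change(short_gene)
z_long = approx_z_for_fold_change(long_gene)
print(f"short gene z = {z_short:.2f}, long gene z = {z_long:.2f}")
```

Under these assumptions the same fold change gives a much stronger signal for the long gene, so long or highly expressed genes are over-represented in hit lists, and so are the functional categories they belong to.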
Interpretation
Confounding factors