Class analysis - Processing of Prokaryotic Genome and

RNA-seq analysis for Prokaryotes
Anne de Jong
2015
Measuring gene expression
• What can we do with RNA-seq analysis?
• Transcription start sites (TSS)
• Transcription termination (TT)
• Operon structures (Transcription Active Regions (TARs))
• tRNAs
• rRNAs
• Discovery of ncRNAs
• Gene expression
• Here we focus on the last item: “Gene Expression”
Measuring gene expression
• What to do
• Grow cells and freeze them (liquid nitrogen) at time point X
• Isolate total RNA
• Optional rRNA depletion
• Library prep (cDNA)
• Sequencing (Illumina, IonProton)
• Filter, trim and map the sequence reads to a reference genome
• Gene expression calling
All steps above can be standardized; just follow the protocols
Gene expression values
Starting point: Excel file with gene expression values (RPKM/FPKM/TPM/counts)
Rows are the features (genes)
Columns are the experiments (samples)
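As a rough illustration of what such expression values represent, the sketch below computes RPKM and TPM from raw read counts. The gene names, counts and gene lengths are hypothetical, and the total number of mapped reads is assumed to be the sum of the counts shown; how the example data set was actually generated is not stated on the slides.

```python
def rpkm(counts, lengths_bp):
    """Reads Per Kilobase of gene per Million mapped reads."""
    # Assumption: total mapped reads = sum of the per-gene counts given here.
    total_reads = sum(counts.values())
    return {
        gene: count * 1e9 / (lengths_bp[gene] * total_reads)
        for gene, count in counts.items()
    }


def tpm(counts, lengths_bp):
    """Transcripts Per Million: normalize by length first, then scale to 1e6."""
    rate = {gene: count / lengths_bp[gene] for gene, count in counts.items()}
    scale = sum(rate.values())
    return {gene: r * 1e6 / scale for gene, r in rate.items()}


# Hypothetical counts and gene lengths (bp) for one sample.
counts = {"geneA": 500, "geneB": 1500, "geneC": 8000}
lengths_bp = {"geneA": 1000, "geneB": 3000, "geneC": 2000}

rpkm_values = rpkm(counts, lengths_bp)
tpm_values = tpm(counts, lengths_bp)
```

In this toy sample geneA and geneB get the same RPKM, because geneB has three times the reads but is also three times as long; TPM values always sum to one million within a sample.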
Tutorial
Step 1:
Go to http://genome2d.molgenrug.nl
In the menu RNA-seq analysis, download the “example data set”
Open the file RPKM.txt in Excel
What do the numbers represent?
key         WT1    WT2  F71Y1  F71Y2  R61K1  R61K2  R61H1  R61H2  null1  null2
BSU00010   6376   8682   7756   9785   4676   8684   4161   9004   4384   6756
BSU00020   4033   6400   5470   6470   3006   6108   2598   5902   2659   4535
BSU00030   1075   1816   1561   1662    750   1688    793   1808    757   1459
BSU00040   5400   8661   7443   7906   4159   8573   4379   8247   4412   7312
BSU00050    641   1045    814   1117    425    970    437    969    422    891
BSU00060  11391  12344  11106  11956   8576  11860   8368  12462   8715  10157
BSU00070  17374  23051  20981  22922  13153  21162  12294  22524  12245  17628
BSU00080     12     20     14     24     10     26      7     20     17     15
BSU00090  32990  42404  29072  31289  14675  28162  10290  18064   7936  10371
BSU00100  13208  28177  22074  25142  10919  23351   9161  24268   9958  20054
BSU00110   9353   9554  12377  12362   9881  13691   9320  13896  10307  12115
BSU00120   5885   7064   7990   9209   5836   9562   5496   9534   6200   8691
BSU00130   5107   7408   6571   6810   3728   6851   3449   7262   3900   5950
BSU00140    992   1941   1682   1766    812   1679    752   1709    712   1342
BSU00150   1282   2180   1624   1805    916   1897    839   1721    831   1439
..
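For readers who prefer a script over Excel, the following is a minimal sketch of loading such a table in Python. It assumes the file is tab-delimited with a header line whose first column is `key`; the inline text holds only the first three genes and four of the ten samples from the example, and `read_expression` is a hypothetical helper name.

```python
import csv
import io

# First three rows of the example table (four of the ten sample columns).
rpkm_text = (
    "key\tWT1\tWT2\tF71Y1\tF71Y2\n"
    "BSU00010\t6376\t8682\t7756\t9785\n"
    "BSU00020\t4033\t6400\t5470\t6470\n"
    "BSU00030\t1075\t1816\t1561\t1662\n"
)


def read_expression(handle):
    """Return (samples, {gene: {sample: value}}) from a tab-delimited table."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]          # every column after the key is one experiment
    table = {}
    for row in reader:
        if not row:
            continue              # skip empty lines
        table[row[0]] = {s: float(v) for s, v in zip(samples, row[1:])}
    return samples, table


samples, expression = read_expression(io.StringIO(rpkm_text))
```

To read the downloaded file instead of the inline text, the same function could be called as `read_expression(open("RPKM.txt", newline=""))`.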
The factors
• The factors describe the experiment
• What are the replicates
• What is the biological meaning
• Multiple factors possible
experiment  strain (Factor-1)  type (Factor-2)
WT1         WT                 WT
WT2         WT                 WT
F71Y1       A_F71Y             Mutant
F71Y2       A_F71Y             Mutant
R61K1       B_R61K             Mutant
R61K2       B_R61K             Mutant
R61H1       C_R61H             Mutant
R61H2       C_R61H             Mutant
null1       Null               knock-out
null2       Null               knock-out
Tutorial
Step 2:
In this example we only use Factor-1. Open Factors.txt in Excel
What do these Factors mean?
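One way to see what a factor does is to use it to group replicates. The sketch below averages the two replicates of each strain (Factor-1) for gene BSU00010, using the values from the example table. It only illustrates the grouping idea and is not the pipeline's own method; `group_means` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean

# Factor-1 (strain) for each experiment, as in the Factors file.
strain_of = {
    "WT1": "WT", "WT2": "WT",
    "F71Y1": "A_F71Y", "F71Y2": "A_F71Y",
    "R61K1": "B_R61K", "R61K2": "B_R61K",
    "R61H1": "C_R61H", "R61H2": "C_R61H",
    "null1": "Null", "null2": "Null",
}

# Expression values of BSU00010 in the ten experiments (example table).
bsu00010 = {
    "WT1": 6376, "WT2": 8682, "F71Y1": 7756, "F71Y2": 9785,
    "R61K1": 4676, "R61K2": 8684, "R61H1": 4161, "R61H2": 9004,
    "null1": 4384, "null2": 6756,
}


def group_means(values, factor):
    """Average the values of all experiments that share a factor level."""
    groups = defaultdict(list)
    for experiment, value in values.items():
        groups[factor[experiment]].append(value)
    return {level: mean(vals) for level, vals in groups.items()}


means = group_means(bsu00010, strain_of)
```

Swapping `strain_of` for a mapping built from Factor-2 (type) would pool all six mutant samples into one group.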
Contrasts
• The factors describe the data; the next step is to ask questions
• Which genes are differentially expressed between WT and one or more mutants?
• Is there a global effect?
• Which mutants are highly correlated?
• To answer these questions, the contrasts need to be defined
A_F71Y-WT
B_R61K-WT
C_R61H-WT
null-WT
• In this example all samples are compared to the WT
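A contrast such as `A_F71Y-WT` reads as "A_F71Y compared to WT". The sketch below turns each contrast into a log2 fold change for gene BSU00010, starting from the replicate means of the example table. This is a simplification for illustration only: the statistical model used by the pipeline is not described on the slides, and the case-insensitive matching of `null` and `Null` is an assumption.

```python
import math

# Mean expression of BSU00010 per strain (replicates of the example averaged).
strain_means = {
    "WT": 7529.0, "A_F71Y": 8770.5, "B_R61K": 6680.0,
    "C_R61H": 6582.5, "Null": 5570.0,
}
contrasts = ["A_F71Y-WT", "B_R61K-WT", "C_R61H-WT", "null-WT"]


def log2_fold_change(contrast, means):
    """log2(target / reference) for a contrast written as 'target-reference'."""
    target, reference = contrast.rsplit("-", 1)
    # Assumption: levels are matched case-insensitively, because the slides
    # write "Null" in the Factors file but "null" in the contrast.
    lookup = {level.lower(): value for level, value in means.items()}
    return math.log2(lookup[target.lower()] / lookup[reference.lower()])


log2fc = {c: log2_fold_change(c, strain_means) for c in contrasts}
```

A positive value means the gene is higher in the mutant than in the WT, a negative value means it is lower.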
Tutorial
Step 3:
Open the file Contrasts.txt in Excel
Make a Contrasts file if you use Factor-2 (type) instead of Factor-1 [see previous slide]
Factors file
experiment  strain  type
WT1         WT      WT
WT2         WT      WT
F71Y1       A_F71Y  Mutant
F71Y2       A_F71Y  Mutant
R61K1       B_R61K  Mutant
R61K2       B_R61K  Mutant
R61H1       C_R61H  Mutant
R61H2       C_R61H  Mutant
null1       Null    knock-out
null2       Null    knock-out
Classes
• Adding literature data to the analyses
• One way is to define groups of genes/proteins that have a biological relation
• Metabolic pathway; KEGG
• Related protein domains; e.g. ABC transporters
• Regulons
• Related processes; e.g. sporulation
• Any defined group of genes is possible
• These groups of genes are called Classes
Tutorial
Step 4:
Open the file Classes.txt in Excel
Define your own class for at least 20 genes,
e.g. the best hits found by Brinsmade
Class file
BSU00490  green  CodY
BSU01650  green  CodY
BSU01660  green  CodY
BSU01670  green  CodY
BSU01680  green  CodY
etc…      …      …
BSU03981  red    CcpA
BSU03982  red    CcpA
BSU03990  red    CcpA
BSU04160  red    CcpA
BSU04470  red    CcpA
etc…      ..     ..
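To make the idea of a Class file concrete, here is a minimal sketch that groups locus tags by class. It assumes three tab-separated columns per line (locus tag, color, class name) and that green belongs to CodY and red to CcpA; both assumptions are read off the flattened slide and should be checked against the real Classes.txt. `read_classes` is a hypothetical helper.

```python
from collections import defaultdict

# A few rows in the assumed layout: locus tag, color, class name.
class_text = (
    "BSU00490\tgreen\tCodY\n"
    "BSU01650\tgreen\tCodY\n"
    "BSU01660\tgreen\tCodY\n"
    "BSU03981\tred\tCcpA\n"
    "BSU03982\tred\tCcpA\n"
)


def read_classes(text):
    """Return ({class: [locus tags]}, {class: color}) from class-file text."""
    members = defaultdict(list)
    colors = {}
    for line in text.splitlines():
        if not line.strip():
            continue                      # skip empty lines
        locus_tag, color, class_name = line.split("\t")
        members[class_name].append(locus_tag)
        colors[class_name] = color        # one color per class is assumed
    return dict(members), colors


members, colors = read_classes(class_text)
```

With the membership in hand, any per-gene result (expression value, fold change) can be summarized per class.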
Overview
• Now we have 4 files
Gene expression file: RPKM.txt (rows are the genes BSU00010, BSU00020, ..; columns are the experiments WT1 to null2)
Factors file: Factors.txt (columns experiment, strain and type)
Contrasts file: Contrasts.txt (A_F71Y-WT, B_R61K-WT, C_R61H-WT, null-WT)
Class file: Classes.txt (locus tags such as BSU00490, colors green and red, class names CodY and CcpA)
Tutorial
Step 5:
Open the file Classes.txt in Excel
Define two or more classes for at least 10 genes in total
Flow chart of the Analysis
User input
• RPKMs, Factors, Contrasts, Class
• Project name
RNA-seq Analysis Pipeline (Genome2D webserver or R-script)
RESULTS
Global Analysis
• Normalization
• Library Sizes
• PCA/MDS
Contrasts
• Differential Expression
• Volcano Plots
• MA Plots
• Heatmaps of Top Hits, Signals, Class Groups
• K-means Clustering
Experiment Analysis
• Correlation Matrix
• Heatmap of Experiments
• K-means Clustering
Class Analysis
• Correlation Matrices
• Mean Signal Plots
• Heatmaps
Tables
• Tab delimited
• Html formatted
Downstream Analysis
• Functional Analysis on the Genome2D webserver
• TIGR Multi Experiment Viewer
• Etc.
Performing an RNA-seq analysis
• The pipeline is available as an R-script or as a webserver
• The R-script allows modification of settings and parameters
• The webserver is parameter free
• Parameters are predefined, or are calculated or estimated on the fly
Tutorial
Step 6:
1. Open the webserver http://genome2d.molgenrug.nl
2. Go to RNA-seq analysis and download the example data set
3. Subsequently, upload these four files for analysis
4. Give the project a logical (short) name
5. Press start run and wait 1-2 min for the results
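Before uploading, it can help to check that the input files agree with each other. The sketch below is a hypothetical pre-flight check, not part of the Genome2D pipeline: it verifies that every sample has a factor level and that every contrast only uses levels that occur in the Factors file. Matching is exact and case-sensitive here by assumption; whether the webserver is equally strict is not stated on the slides.

```python
def check_inputs(samples, factors, contrasts):
    """Return a list of problems; an empty list means the inputs are consistent."""
    problems = []
    missing = [s for s in samples if s not in factors]
    if missing:
        problems.append(f"samples without a factor level: {missing}")
    levels = set(factors.values())
    for contrast in contrasts:
        target, reference = contrast.rsplit("-", 1)
        for level in (target, reference):
            if level not in levels:
                problems.append(
                    f"contrast {contrast!r} uses unknown level {level!r}"
                )
    return problems


samples = ["WT1", "WT2", "F71Y1", "F71Y2", "R61K1",
           "R61K2", "R61H1", "R61H2", "null1", "null2"]
factors = {
    "WT1": "WT", "WT2": "WT", "F71Y1": "A_F71Y", "F71Y2": "A_F71Y",
    "R61K1": "B_R61K", "R61K2": "B_R61K", "R61H1": "C_R61H",
    "R61H2": "C_R61H", "null1": "Null", "null2": "Null",
}
contrasts = ["A_F71Y-WT", "B_R61K-WT", "C_R61H-WT", "null-WT"]

problems = check_inputs(samples, factors, contrasts)
for problem in problems:
    print(problem)
```

With the values as written on the slides, the check flags the contrast `null-WT`, because the Factors file spells the level `Null`.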
Mining the results
• The results are divided in 6 sections
• Global analysis
• Contrasts analysis
• Experiment analysis
• Class analysis
• Data tables
• Functional analysis
Tutorial
Step 7: Global analysis
1. For this RNA-seq experiment we asked for at least 4M (million) reads per experiment. Did all samples pass this criterion?
2. Which sample duplicates showed the lowest dispersion?
Tutorial
Step 8: Contrasts analysis
1. Which CodY mutant showed the lowest number of significantly changed genes?
2. What is the highest fold change of a gene when the wild type was compared to the knock-out?
3. Volcano plots are used to visualize fold changes and their cognate p-values. Open a volcano plot and write a good legend for this figure.
4. On the left side of Heatmaps of TopHits, you see a dendrogram. What is the meaning of the length of the lines in a dendrogram?
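As background for the volcano plot question, the sketch below computes the two plot coordinates per gene: log2 fold change on the x-axis and -log10 of the p-value on the y-axis. The genes, values and the cutoffs (|log2FC| >= 1, p <= 0.05) are hypothetical; the cutoffs used by the pipeline are not given on the slides.

```python
import math


def volcano_points(results, fc_cutoff=1.0, p_cutoff=0.05):
    """Return (gene, x, y, significant) with x = log2FC and y = -log10(p)."""
    points = []
    for gene, log2fc, p_value in results:
        significant = abs(log2fc) >= fc_cutoff and p_value <= p_cutoff
        points.append((gene, log2fc, -math.log10(p_value), significant))
    return points


# Hypothetical differential expression results: (gene, log2FC, p-value).
results = [
    ("geneA", 2.5, 0.0001),   # strongly up, very small p-value
    ("geneB", -0.3, 0.40),    # hardly changed
    ("geneC", -1.8, 0.01),    # down, small p-value
    ("geneD", 1.5, 0.20),     # large change, but not significant
]

points = volcano_points(results)
```

Genes that are both strongly changed and significant end up in the upper left and upper right corners of the plot, which is what gives it the volcano shape.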
Tutorial
Step 9: Experiment analysis
1. The correlation matrix of experiments is a visualization method to show the overall Pearson’s correlation between experiments. Write a legend for this figure and include a description of what the shades of blue represent.
2. K-means clustering groups genes that correlate well over multiple experiments. The threshold for separating groups is always arbitrary. Which k-means groups could optionally be merged into one group?
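The correlation matrix of experiments can be sketched in a few lines. The example below computes Pearson's correlation between three experiments, using only the first five genes (BSU00010 to BSU00050) of the example table; the real matrix uses all genes, and any transformation applied by the pipeline beforehand is not described on the slides.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient of two equally long value lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# First five genes of three experiments from the example table.
data = {
    "WT1": [6376, 4033, 1075, 5400, 641],
    "WT2": [8682, 6400, 1816, 8661, 1045],
    "null1": [4384, 2659, 757, 4412, 422],
}

names = list(data)
# Square matrix: one row and one column per experiment.
matrix = {a: {b: pearson(data[a], data[b]) for b in names} for a in names}
```

The diagonal is 1 by definition and the matrix is symmetric, so in a colored plot only the off-diagonal shades carry information.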
Tutorial
Step 10: Class analysis
1. The ‘Correlation matrix of Classes’ gives a quick view of the behavior of Class members (genes) over the various experiments. What do the colors in these matrices mean?
Tutorial
Step 11: Data tables
1. The data produced and used by the pipeline to draw graphs can be used for further analysis, for example in the popular freeware programs TMEV and Cytoscape.
2. The file ‘Edge list for a gene network of Contrasts’ is compatible with Cytoscape but will not be
discussed further.
3. Save the file ‘TIGR MEV TopHits log2FC’ for TMEV
4. Download MeV: http://www.tm4.org/mev.html
MeV: Multi Experiment Viewer
Tutorial
Step 11: Using MeV
1. Start MeV and load the file ‘TIGR MEV TopHits log2FC’ as dual channel data (because this is ratio data)
2. Deselect “Load Annotation”
3. Press load and now the data is imported and ready to analyze using MeV
4. Optional: try a k-means clustering; here you have to estimate the number of clusters yourself
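To show why the number of clusters has to be chosen up front, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. It is not MeV's implementation: the centroids simply start at the first k profiles to keep the result reproducible, and the log2 fold-change profiles are hypothetical.

```python
def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; centroids start at the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical log2FC profiles: one row per gene, one column per contrast.
profiles = [
    (2.1, 1.9, 2.3),     # up in all contrasts
    (-1.8, -2.2, -2.0),  # down in all contrasts
    (2.4, 2.0, 1.8),
    (-2.1, -1.7, -2.4),
    (1.9, 2.2, 2.0),
]

labels, centroids = kmeans(profiles, k=2)
```

With k=2 the up- and down-regulated genes separate cleanly; asking for more clusters would split these groups further, which is why neighbouring k-means groups can sometimes be merged afterwards.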
Functional Analysis
Tutorial
Step 12: Perform a functional analysis on the TopHits of one or multiple Contrasts
1. Change the ‘Current active genome’ to your genome of interest
2. Upload a list of locus tags to analyze
3. Examine the results and briefly describe your findings/conclusions