Exercise 1: Importing Illumina data
➢ Using the Import tool
• File / Import folder. Select the folder IlluminaTeratospermiaHuman6v1_BS1.
• In the Import files window, choose the action "Use import tool" and click OK.
• Click the Mark title row button and click on the title row of the data file. Click Next.
• Click the Identifier button and click on the TargetID column.
• Click the Sample button and click on the AVG column.
• Click Finish.
➢ Alternative: Importing a whole BeadStudio data file directly
• File / Import files. Select the file IlluminaForLumiHuman6v1_BS1.tsv.
• In the Import files window, choose the action "Import directly" and click OK. This way the file is imported as it is.
Exercise 2: Normalizing Illumina data
➢ Using the IlluminaTeratospermiaHuman6v1_BS1 dataset (separate files)
• In the workflow view, double-click the box "13 files" to select all of them.
• In the analysis tool section, choose Normalization and Illumina.
• Click Show parameters and set the chiptype to Human-6v1.
• Click Run.
• Repeat the run using the same chiptype, but with normalize.chips set to none.
➢ Using the file IlluminaForLumiHuman6v1_BS1.tsv (one whole BS file)
• Select the file IlluminaForLumiHuman6v1_BS1.tsv.
• Choose Normalization and Illumina – lumi pipeline.
• Click Show parameters and set the chiptype to Human-6v1.
• Click Run.
• Repeat the run using the same chiptype, but with normalize.chips set to none.
Exercise 3: Describe the experiment
➢ Using the IlluminaTeratospermiaHuman6v1_BS1 dataset (separate files)
• Double-click the phenodata file.
• In the phenodata editor, enter 1 in the group column for the control samples and 2 for the affected samples.
➢ Using the file IlluminaForLumiHuman6v1_BS1.tsv (one whole BS file)
• Double-click the phenodata file.
• In the phenodata editor, click on the original name column to sort the samples. In the group column, give replicates the same number (1, 2 or 3).
Exercise 4: Illumina quality control
➢ Using the IlluminaTeratospermiaHuman6v1_BS1 dataset
• Run the tools Statistics / NMDS and Visualization / Dendrogram for both the normalized and the "mock-normalized" data files.
• View the result files side by side (use the Detach button).
➢ Using the IlluminaForLumiHuman6v1_BS1.tsv dataset
• As above.
Exercise 5: Filtering
➢ Select the normalized data and try different filters
• Preprocessing / Filter by SD
• Preprocessing / Filter by CV
• Preprocessing / Filter by IQR
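
To make the three filters concrete, the following minimal Python sketch shows what the per-gene variability measures compute. It is an illustration only, not the tool's implementation: the function names, the toy expression values and the cutoff are hypothetical, and the actual tools select genes according to their own parameters.

```python
import statistics

def sd(values):
    """Sample standard deviation of one gene's expression values."""
    return statistics.stdev(values)

def cv(values):
    """Coefficient of variation: standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)

def iqr(values):
    """Interquartile range: third quartile minus first quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

def filter_genes(expression, measure, cutoff):
    """Keep the genes whose variability measure is at least `cutoff`."""
    return {
        gene: values
        for gene, values in expression.items()
        if measure(values) >= cutoff
    }

# Hypothetical toy data: gene name -> expression values across four samples.
expression = {
    "geneA": [8.0, 8.1, 7.9, 8.0],
    "geneB": [6.0, 9.0, 5.5, 9.5],
    "geneC": [10.0, 10.2, 9.8, 10.1],
}

# Only the gene that varies strongly between samples passes the SD cutoff.
kept_by_sd = filter_genes(expression, sd, cutoff=1.0)
```

Swapping `sd` for `cv` or `iqr` in the last call shows how the choice of measure changes which genes are kept. CV scales the spread by the mean, and IQR is less sensitive to single outlying samples.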
Exercise 6: Statistical testing
➢ t-test
• Select the sd-filter.tsv of the teratospermia dataset.
• Run Statistics / Two group test using the method t-test.
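
As a rough illustration of what a two-group t-test compares for a single gene, the sketch below computes Welch's t statistic and its approximate degrees of freedom. Which t-test variant the tool uses is not stated here, so the choice of Welch's form is an assumption, and the function name and sample values are hypothetical. Turning the statistic into a P-value needs the t distribution, which is left out to keep the sketch dependency-free.

```python
import math
import statistics

def welch_t(group1, group2):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    n1, n2 = len(group1), len(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    # Squared standard error of the difference between the group means.
    se2 = v1 / n1 + v2 / n2
    t = (statistics.mean(group1) - statistics.mean(group2)) / math.sqrt(se2)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df

# Hypothetical expression values of one gene in group 1 and group 2.
control = [7.9, 8.1, 8.0, 8.2]
affected = [9.0, 9.3, 8.8, 9.1]
t_stat, dof = welch_t(control, affected)
```

A large absolute t value means the difference between the group means is large relative to the variation within the groups.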
➢ Empirical Bayes
• Select the normalized.tsv of the teratospermia dataset.
• Run Statistics / Two group test using the method empirical Bayes and turning the P-value adjustment off.
• Run Preprocessing / Filter by SD on the result file two-group.tsv.
• Run Statistics / Adjust P-values on the result file sd-filter.tsv (you have to specify the P-value column in the parameters).
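
The steps above adjust the P-values only after filtering. The sketch below illustrates one common adjustment, the Benjamini-Hochberg procedure. Which method the Adjust P-values tool applies is not stated in this exercise, so Benjamini-Hochberg is an assumption, and the function name and example P-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the original order."""
    m = len(pvalues)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw P-values for four genes.
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(raw)
```

Each raw P-value is scaled by the number of tests `m`, so the adjusted values depend on how many genes remain when the adjustment is run.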
➢ Compare the results using the Venn diagram
➢ Save the analysis session
• File / Save session
Exercise 7: Linear modelling - taking several covariates into account at the same time
➢ Use a kidney cancer dataset of 17 samples
• Start a new session.
• File / Import folder, select the folder AffyNormalized and choose Import directly.
• Right-click the normalized.tsv and link it to the phenodata.tsv. Check which columns the phenodata contains.
➢ Linear modelling
• Select the normalized.tsv and run Statistics / Linear modelling. Set group, kidney side and gender as the three main effects. Set donor as the pairing information.
• Select the result file pvalues.tsv and run the tool Utilities / Extract genes using a P-value once for each main effect P-value column (three times in total).
➢ Save the session
Exercise 8: Clustering
➢ Open your Illumina session
➢ Hierarchical clustering
• Select the adjust-pvalues.tsv.
• Run Clustering / Hierarchical with default parameters.
• Repeat the run using bootstrapping: set the resampling parameter to bootstrap and the number of replicates to 10.
• How reliable are the branches?
➢ K-means clustering
• Select the adjust-pvalues.tsv.
• Run the tool "K-means – estimate K".
• Run K-means clustering, setting the parameter number of clusters according to your estimated K.
• View the clusters using the visualization method Expression profiles.
• Extract the genes from cluster 1 using Utilities / Extract genes from clustering.
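
For readers who want to see what K-means does with expression profiles, here is a minimal sketch of the generic Lloyd's algorithm. It does not reproduce the tool's implementation: the function name, the naive choice of the first k profiles as starting centroids, and the toy profiles are all assumptions made for illustration.

```python
import math

def kmeans(profiles, k, iterations=20):
    """Cluster profiles with Lloyd's algorithm.

    The first k profiles serve as starting centroids (a naive choice).
    Returns the cluster label of each profile and the final centroids.
    """
    centroids = [list(p) for p in profiles[:k]]
    labels = [0] * len(profiles)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in profiles
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical profiles: two genes rise across samples, two genes fall.
profiles = [
    [1.0, 2.0, 3.0],
    [3.0, 2.0, 1.0],
    [1.1, 2.1, 3.1],
    [2.9, 2.0, 1.1],
]
labels, centroids = kmeans(profiles, k=2)
```

The result depends on the number of clusters given as `k`, which is why the exercise estimates K before running the clustering.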
Exercise 9: Annotation
➢ Annotate genes
• Select the file adjust-pvalues.tsv.
• Run Annotation / Illumina gene list.
• Open the result file annotations.html and click the links in the gene and pathway columns to read more about one of the genes.
• Open the result file annotations.tsv and sort it by the pathway column. Slide the pathway column next to the description column and make it wider.
Exercise 10: Pathway analysis
➢ Gene enrichment analysis
• Select the file adjust-pvalues.tsv.
• Run Pathways / Hypergeometric test for KEGG.
• Are any KEGG pathways enriched in your list of differentially expressed genes?
• Using the file annotations.tsv, work out which genes contributed to the top pathway.
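
The idea behind a hypergeometric enrichment test can be sketched in a few lines of Python. The sketch asks how likely it is to see at least the observed number of pathway genes in a gene list if the list had been drawn at random from all genes. The function name, parameter names and example counts are hypothetical, and the tool may define its gene universe and report its results differently.

```python
from math import comb

def hypergeom_enrichment_p(population, pathway_genes, selected, overlap):
    """Probability of at least `overlap` pathway genes among `selected` genes.

    population    -- number of genes considered in total
    pathway_genes -- how many of those belong to the pathway
    selected      -- size of the gene list being tested
    overlap       -- pathway genes observed in the gene list
    """
    total = comb(population, selected)
    upper = min(pathway_genes, selected)
    # Sum the upper tail of the hypergeometric distribution.
    tail = sum(
        comb(pathway_genes, k) * comb(population - pathway_genes, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

# Hypothetical counts: 1000 genes, 50 in the pathway, 100 in the list, 15 shared.
p_value = hypergeom_enrichment_p(1000, 50, 100, 15)
```

With these counts a random list of 100 genes would be expected to contain about 5 pathway genes, so an overlap of 15 gives a small P-value.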
➢ Gene set test
• Select the file normalized.tsv.
• Run Pathways / Gene set test and set the parameter pathways.or.genelist to KEGG.
Exercise 11: Promoter analysis
➢ Pattern discovery: do the promoters of similarly expressed genes share a sequence motif?
• Select the file extract.tsv containing the genes from cluster 1.
• Run Promoter analysis / Weeder. What is the most interesting motif? Check in the matrix (Best occs) which positions are most conserved.
• Run Promoter analysis / Cosmo. Judging by the sequence logo, do you find similar motifs?
Exercise 12: Saving and running a workflow
➢ Save a workflow
• Prune your teratospermia dataset workflow if necessary.
• Select the file normalized.tsv and choose Workflow / Save starting from selected. Give your workflow a meaningful name and save it.
➢ Run a workflow
• Open the session called sessionIlluminaTeratospermia.cs.
• Select the file normalized.tsv and choose Workflow / Run recent.