Talk by Dr Paul Lewis - 17/12/03

Dr Paul Lewis
• Lecturer in Bioinformatics
• Cardiff University
• Biostatistics & Bioinformatics Unit
Biostatistics & Bioinformatics Unit (BBU)
• Bioinformatics resource for Institutions across Wales
• Backing of the Higher Education Funding Council for Wales
- £1.5 million grant through the Research Capacity Development Fund
• 13 new posts in statistics & bioinformatics
• UWCM, Cardiff University, Aberystwyth
• MSc/Postgraduate Diploma/Postgraduate Certificate:
  – Bioinformatics
  – Genetic Epidemiology and Bioinformatics
• Brief Overview of Microarray Bioinformatics
• Introduce My Microarray Research Interests
• My Microarray Analysis Software
Bioinformatics in Microarray Experiment
[Diagram labels: Experimental Design, Hybridisation, Data Normalisation, Differential Gene Expression, Pattern Discovery, Class Prediction, Annotation]
Normalisation
Remove non-biological influences on data (systematic variation)
3 categories of normalisation:
• Normalisation – transform data so it more closely follows a normal distribution
  (log, lowess, linlog)
• Standardisation – expand or contract the distribution so data from different experiments can be compared
  (calculate Z-scores)
• Centralisation – move the distribution so it is centred around the expected mean
  (mean / median / trimmed-mean centring)
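The three categories above can be illustrated with a minimal Python sketch. The function names and the example intensities are hypothetical, chosen only for illustration; lowess and linlog are not shown, and this is not the code behind the software described in this talk.

```python
import math
import statistics


def log2_transform(values):
    """Normalisation: log2-transform positive intensities."""
    return [math.log2(v) for v in values]


def z_scores(values):
    """Standardisation: rescale to mean 0 and sample standard deviation 1."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def median_centre(values):
    """Centralisation: shift the distribution so its median is 0."""
    med = statistics.median(values)
    return [v - med for v in values]


# Hypothetical raw spot intensities for one array.
intensities = [100.0, 200.0, 400.0, 800.0, 1600.0]
logged = log2_transform(intensities)
standardised = z_scores(logged)
centred = median_centre(logged)
```

After the log transform the doubling steps in the raw intensities become evenly spaced, which is why the log step is normally applied before standardising or centring.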
Find Differentially Expressed Genes
Is fold change significant?
With Replicates
Parametric tests
• t-test (ANOVA) – J. Comput. Biol. 2000 7: 817-838
• Bayesian t-test – Bioinformatics 2001 17: 509-519
• Mixture modelling & bootstrapping (SAM) – P.N.A.S. 2001 98: 5116-5121
• Regression modelling – Genome Res. 2001 11: 1227-1236
All give similar results but SAM reduces false positives
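As a rough illustration of the simplest parametric option, the sketch below computes a two-sample t statistic for one gene with replicates. It assumes the unequal-variance (Welch) form, which the slides do not specify; the function name and the replicate values are hypothetical. Turning the statistic into a p-value needs the t distribution, which is left out here.

```python
import math
import statistics


def welch_t(group_a, group_b):
    """Return (t statistic, approximate degrees of freedom) for two groups."""
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    n_a, n_b = len(group_a), len(group_b)
    # Squared standard error of the difference in means.
    se_sq = var_a / n_a + var_b / n_b
    t = (mean_a - mean_b) / math.sqrt(se_sq)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = se_sq ** 2 / (
        (var_a / n_a) ** 2 / (n_a - 1) + (var_b / n_b) ** 2 / (n_b - 1)
    )
    return t, df


# Hypothetical log2 expression values for one gene, four replicates each.
control = [7.9, 8.1, 8.0, 8.2]
treated = [9.0, 9.3, 9.1, 9.2]
t_stat, df = welch_t(treated, control)
log2_fold_change = statistics.mean(treated) - statistics.mean(control)
```

On log2 data the difference in group means is the log2 fold change, so the t statistic is the fold change scaled by how variable the replicates are.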
Non-parametric tests
• Wilcoxon rank sum test – Bioinformatics 2002 18: 1454-1461
• Non-parametric t-test – Bioinformatics 2002 18: 1454-1461
• Ideal discriminator method – Bioinformatics 2002 18: 1454-1461
Low false positive rate but less power
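The rank sum test can be sketched in a few lines of Python. This version is an assumption-laden illustration: it uses the normal approximation for the p-value, with no tie or continuity correction, so it is only indicative for the small replicate numbers typical of microarray experiments. The function name and data are hypothetical.

```python
import math


def rank_sum_test(group_a, group_b):
    """Return (W, z, two-sided p) where W is the rank sum of group_a."""
    pooled = sorted(
        (value, group)
        for group, values in enumerate((group_a, group_b))
        for value in values
    )
    # Assign 1-based ranks, giving tied values their average rank.
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    w = sum(rank for rank, (_, group) in zip(ranks, pooled) if group == 0)
    n_a, n_b = len(group_a), len(group_b)
    mean_w = n_a * (n_a + n_b + 1) / 2
    sd_w = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (w - mean_w) / sd_w
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return w, z, p_two_sided


# Hypothetical log2 expression values for one gene.
control = [7.9, 8.1, 8.0, 8.2]
treated = [9.0, 9.3, 9.1, 9.2]
w, z, p = rank_sum_test(control, treated)
```

Because only the ranks are used, the two groups here are as separated as four-versus-four replicates can be, yet the p-value stays near 0.02. That is the loss of power the slide refers to.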
Pattern Discovery & Class Prediction
Explore how genes or samples group: Clustering
• Hierarchy: Hierarchical Cluster Analysis
• Partition: K-Means, Self Organising Maps (SOM), Fuzzy ART
• Reduction: Principal Components Analysis (PCA), Multidimensional Scaling (MDS), Correspondence Analysis (CoA)
Assign genes to known groupings: Classification
• logistic regression
• neural networks
• linear discriminant analysis
Hierarchical Cluster Analysis
Partitioning Clustering Methods
K-Means & SOM
• The number of clusters must be specified in advance
• Genes are partitioned into clusters
• What are the relationships between clusters?
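The points above can be seen in a minimal K-Means sketch (Lloyd's algorithm). The function name, the explicit `initial_centroids` argument and the toy profiles are assumptions made for illustration; real tools usually pick starting centroids at random.

```python
import math


def k_means(points, initial_centroids, max_iter=100):
    """Partition points into len(initial_centroids) clusters."""
    centroids = [list(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Hypothetical expression profiles (two conditions per gene).
profiles = [(0.1, 0.2), (0.0, 0.3), (0.2, 0.1), (2.0, 2.1), (2.2, 1.9), (1.9, 2.0)]
labels, centroids = k_means(profiles, [profiles[0], profiles[3]])
```

The number of clusters is fixed by how many starting centroids are supplied, and the output is only a label per gene. Nothing in the result says how the clusters relate to one another.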
2D & 3D Mapping Methods
Data Projected onto 2 or 3 Dimensions
[Figure panels: CoA, MDS, PCA]
But what are the cluster boundaries?
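As an illustration of projecting data onto two dimensions, here is a small PCA sketch in plain Python. It assumes power iteration with deflation to find the leading components of the covariance matrix, which is a simple stand-in rather than what production software uses (normally a singular value decomposition). All function names and the toy data are hypothetical.

```python
import math


def centre_columns(rows):
    """Subtract each column mean so every variable has mean 0."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def covariance(centred):
    """Sample covariance matrix of already centred rows."""
    n, d = len(centred), len(centred[0])
    return [
        [sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def multiply(matrix, vec):
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def leading_eigenvector(matrix, iterations=500):
    """Power iteration; assumes the start vector is not orthogonal to the answer."""
    vec = [float(i + 1) for i in range(len(matrix))]
    norm = math.sqrt(sum(v * v for v in vec))
    vec = [v / norm for v in vec]
    for _ in range(iterations):
        nxt = multiply(matrix, vec)
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0.0:
            break
        vec = [v / norm for v in nxt]
    eigenvalue = sum(v * w for v, w in zip(vec, multiply(matrix, vec)))
    return eigenvalue, vec


def pca_scores(rows, n_components=2):
    """Project each row onto the first n_components principal components."""
    centred = centre_columns(rows)
    cov = covariance(centred)
    components = []
    for _ in range(n_components):
        eigenvalue, vec = leading_eigenvector(cov)
        components.append(vec)
        # Deflation: remove the component just found before finding the next.
        cov = [
            [cov[i][j] - eigenvalue * vec[i] * vec[j] for j in range(len(vec))]
            for i in range(len(vec))
        ]
    return [[sum(v * c for v, c in zip(row, comp)) for comp in components]
            for row in centred]


# Hypothetical data: four genes measured under three conditions.
profiles = [[1.0, 1.0, 0.0], [2.0, 2.0, 0.1], [3.0, 3.0, -0.1], [4.0, 4.0, 0.0]]
scores = pca_scores(profiles, 2)
```

Each gene ends up as a pair of coordinates that can be plotted, but the projection itself assigns no cluster membership, which is the question the slide raises.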
Annotation
Online Tools:
• ARROGANT – http://lethargy.swmed.edu/
• DAVID – http://apps1.niaid.nih.gov/david/
• DRAGON – http://207.123.190.10/dragon.htm
• EASE – http://apps1.niaid.nih.gov/david/
• FANTOM – http://www.gsc.riken.go.jp/e/FANTOM/
• GoMiner – http://discover.nci.nih.gov/gominer/
• MatchMiner – http://discover.nci.nih.gov/matchminer/
• Onto-Express – http://vortex.cs.wayne.edu/Projects.html
• RESOURCERER – http://pga.tigr.org/tigr-scripts/magic/r1.pl
• Affymetrix GO – http://www.affymetrix.com
Databases:
• Gene Ontology – http://www.geneontology.org/
• OMIM – http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM
• LocusLink – http://www.ncbi.nlm.nih.gov/LocusLink/
• UniGene
My Research Interests
Pattern Discovery
• Take 2D & 3D mapping methods
• Define cluster boundaries
• Make fuzzy
Algorithm Development
2D & 3D Visualisation Tools
EAS-I
Biologist-Friendly Software Tools
Cluster Boundaries
[Figure panels: MDS, CoA, PCA]
Fuzzy Clustering
• Differs from standard clustering by assigning each gene a membership value for every cluster
• Shows how strongly each gene is associated with a cluster
• Can calculate the number of clusters in partitioning methods (Fuzzy ART)
• Helps to combine clusters
• Helps to resolve ambiguity
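To make the idea of membership in every cluster concrete, the sketch below computes memberships using the standard fuzzy c-means formula. This is an assumption about what such a calculation could look like, not the algorithm used in the software described in this talk; the function name, the `fuzziness` parameter and the centroids are hypothetical.

```python
import math


def fuzzy_memberships(point, centroids, fuzziness=2.0):
    """Return one membership value per cluster; the values sum to 1."""
    distances = [math.dist(point, c) for c in centroids]
    # A point lying exactly on a centroid belongs fully to that cluster.
    if any(d == 0.0 for d in distances):
        hits = [1.0 if d == 0.0 else 0.0 for d in distances]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (fuzziness - 1.0)
    # Standard fuzzy c-means rule: closer centroids get larger memberships.
    return [
        1.0 / sum((d_i / d_k) ** exponent for d_k in distances)
        for d_i in distances
    ]


# Hypothetical cluster centres and two genes.
centres = [(0.0, 0.0), (2.0, 2.0)]
clear_gene = fuzzy_memberships((0.5, 0.5), centres)
ambiguous_gene = fuzzy_memberships((1.0, 1.0), centres)
```

A gene close to one centre gets a membership near 1 for that cluster, while a gene midway between two centres gets 0.5 for each, which flags it as ambiguous instead of forcing it into one cluster.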
Fuzzy Mapping
Add Membership values of each gene to clusters
Fuzzy Partitioning
K-Means & SOM
Microarray Pattern Discovery
• Need for Comprehensive Pattern Discovery Software Suite
• Fuzzy Data Analysis Suite
• Visualisation Tools to explore data
• Easy to use
• Free
BBUnit
• Web based version
• Service by BBU
• Increase traffic to BBU web site
• Establish BBU for microarray
• Cross platform
INTERFACE
Normalisation
• Log
• Normalise
• Mean centre
• Median centre
Differential Gene Expression
• T test
• ANOVA
• Regression
Pattern Discovery
• Hierarchical Cluster Analysis
• SOM
• K-Means
• Fuzzy ART
• PCA
• MDS
• CoA
• Fuzzy C-Means
Utilities
Contact
[email protected]
http://bbu.uwcm.ac.uk
Acknowledgements
• Pete Kille
• Alan Clarke
• Gareth Hughes
• Karen Reed
• Lesley Jones
• BBU
(EASI team)
(Data)
(Data, & EASI Collaborator)