Microarray Data Analysis Day 2


Microarray Data Process/Outline
1. Experimental Design
2. Image Analysis – scan to intensity measures (raw data)
3. Normalization – “clean” data
4. More “low level” analysis – fold change, ANOVA, (Z-score) – data filtering
5. Data mining – how to interpret > 6000 measures
– Databases
– Software
– Techniques-clustering, pattern recognition etc.
– Comparing to prior studies, across platforms?
6. Validation
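As a rough illustration of the Z-score mentioned in step 4, the sketch below log2-transforms a handful of intensities and expresses each one as a number of standard deviations from the array mean. The gene names and intensity values are made up for illustration, and this is not the procedure Spotfire applies.

```python
import math
import statistics

def z_scores(values):
    """Return (x - mean) / stdev for each value, using the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(x - mean) / sd for x in values]

# Hypothetical raw intensities for five genes on one array (not real data).
intensities = {
    "geneA": 120.0,
    "geneB": 950.0,
    "geneC": 310.0,
    "geneD": 4800.0,
    "geneE": 75.0,
}

# Log2-transform first so a few very bright spots do not dominate the scale.
log_values = [math.log2(v) for v in intensities.values()]

for gene, z in zip(intensities, z_scores(log_values)):
    print(f"{gene}\t{z:+.2f}")
```

Genes with a large positive or negative Z-score stand out from the bulk of the array, which is one simple basis for data filtering.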
Today we will be using Spotfire software to filter and
search your data.
10928 records in Spotfire
-5999 S. pombe specific
-166 Affy controls
5763 S. cerevisiae specific
6603
4377
1407
819
The Affy detection oligonucleotide sequences are frozen at the time
of synthesis. How does this impact downstream data analysis?
Biology and Data Mining
Subcellular localization provides a simple goal for
genome-scale functional prediction
Determine how many of the ~6000 yeast
proteins go into each compartment
Subcellular Localization,
a standardized aspect of function
Cytoplasm
Nucleus
Membrane
ER
Extracellular [secreted]
Golgi
Mitochondria
"Traditionally" subcellular localization is
"predicted" by sequence patterns
Nucleus – NLS
Cytoplasm
Membrane – TM-helix
ER – HDEL
Extracellular [secreted] – Sig. Seq.
Golgi
Mitochondria – Import Sig.
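Two of the sequence patterns above are simple enough to sketch in a few lines: the C-terminal HDEL tetrapeptide and a hydrophobic stretch standing in for a TM-helix. The hydrophobic alphabet, the window length, and the threshold below are assumptions chosen for illustration, and the example sequences are invented. Patterns such as the NLS, signal sequences, and mitochondrial import signals need more elaborate models and are left out.

```python
# Assumed set of hydrophobic residues for this sketch (one-letter codes).
HYDROPHOBIC = set("AILMFVWC")

def has_hdel(seq):
    """ER hint: the protein ends with the C-terminal tetrapeptide HDEL."""
    return seq.upper().endswith("HDEL")

def has_tm_like_stretch(seq, window=19, min_hydrophobic=15):
    """Crude TM-helix proxy: some window of residues is mostly hydrophobic."""
    seq = seq.upper()
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        if sum(1 for aa in segment if aa in HYDROPHOBIC) >= min_hydrophobic:
            return True
    return False

def predict_compartments(seq):
    """Return the compartments hinted at by the patterns found in seq."""
    hints = []
    if has_hdel(seq):
        hints.append("ER")
    if has_tm_like_stretch(seq):
        hints.append("Membrane")
    return hints or ["no pattern found"]

# Invented sequences, for illustration only.
print(predict_compartments("MKTAYHDEL"))
print(predict_compartments("M" + "L" * 20 + "KR"))
print(predict_compartments("MKKK"))
```

A pattern match is only a hint: a protein can carry a motif without being localized accordingly, which is why the prediction is in quotation marks on the slide.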
Subcellular localization is associated with the
level of gene expression
[Figure: expression level in copies/cell for each compartment: Cytoplasm, Nucleus, Membrane, ER, Extracellular [secreted], Golgi, Mitochondria]
Combine Expression Information & Sequence
Patterns to Predict Localization
[Figure: expression level in copies/cell shown together with the sequence pattern for each compartment: Nucleus (NLS), Cytoplasm, Membrane (TM-helix), ER (HDEL), Extracellular [secreted] (Sig. Seq.), Golgi, Mitochondria (Import Sig.)]
Major Objective: Discover a comprehensive theory of life’s organization at the molecular level
– The major actors of molecular biology: the nucleic acids, deoxyribonucleic acid (DNA) and ribonucleic acid (RNA)
– The central dogma of molecular biology???
Epigenetics
RNA editing
Post-translational modification
Proteins are very complicated molecules built from 20 different amino acids.
Translational regulation
Biology Application Domain
Validation
Data Analysis
Microarray Experiment
Experiment Design and Hypothesis
Image Analysis
Data Mining
Data Warehouse
Artificial Intelligence (AI)
Statistics
Knowledge discovery in databases (KDD)
Higher Level Microarray Data Analysis
• Clustering and pattern detection
• Data mining and visualization
• Linkage between gene expression data and gene sequence/function/metabolic pathway databases
• Discovery of common sequences in coregulated genes
• Meta-studies using data from multiple experiments
Scatter plot of all genes in a simple comparison of two controls (A) and two
treatments (B: high vs. low glucose), showing changes in expression greater
than 2.2-fold and 3-fold.
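The fold-change cutoffs in the scatter plot can be sketched as a simple filter: average the replicates, take the treatment/control ratio, and keep genes whose change exceeds the threshold in either direction. The gene names and intensities below are hypothetical, and the signed fold-change convention is one common choice rather than necessarily the one used for the plot. Control and treatment means are assumed to be greater than zero.

```python
from statistics import mean

def fold_change(control, treatment):
    """Signed fold change: +3.0 means 3x higher in treatment, -3.0 means 3x lower."""
    ratio = mean(treatment) / mean(control)
    return ratio if ratio >= 1 else -1 / ratio

def filter_by_fold(data, threshold):
    """Keep genes whose absolute fold change is greater than the threshold."""
    hits = {}
    for gene, (control, treatment) in data.items():
        fc = fold_change(control, treatment)
        if abs(fc) > threshold:
            hits[gene] = fc
    return hits

# Hypothetical intensities: (two control replicates A, two treatment replicates B).
data = {
    "gene1": ([100.0, 110.0], [330.0, 350.0]),
    "gene2": ([200.0, 220.0], [500.0, 510.0]),
    "gene3": ([400.0, 380.0], [120.0, 110.0]),
    "gene4": ([150.0, 160.0], [170.0, 150.0]),
}

for threshold in (2.2, 3.0):
    hits = filter_by_fold(data, threshold)
    print(threshold, {gene: round(fc, 2) for gene, fc in hits.items()})
```

Raising the threshold from 2.2 to 3 shrinks the list of genes that pass, which is the trade-off between sensitivity and stringency that the two cutoffs illustrate.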
Types of Clustering
• Hierarchical
– Link similar genes, build up to a tree of all
• Self Organizing Maps (SOM)
– Split all genes into similar sub-groups
– Finds its own groups (machine learning)
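The hierarchical idea of linking similar genes and building up to a tree of all can be sketched as bottom-up merging. The version below is a minimal sketch that assumes Euclidean distance and average linkage, both chosen here for simplicity. The expression profiles are invented log ratios, and real tools use faster algorithms.

```python
import math
from itertools import combinations

def distance(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(members1, members2, profiles):
    """Mean pairwise distance between the genes of two clusters."""
    pairs = [(g1, g2) for g1 in members1 for g2 in members2]
    return sum(distance(profiles[g1], profiles[g2]) for g1, g2 in pairs) / len(pairs)

def hierarchical_cluster(profiles):
    """Repeatedly merge the two closest clusters; return the tree as nested tuples."""
    # Each cluster is stored as (tree, list of member genes).
    clusters = [(gene, [gene]) for gene in profiles]
    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_linkage(
                clusters[ij[0]][1], clusters[ij[1]][1], profiles
            ),
        )
        tree_i, members_i = clusters[i]
        tree_j, members_j = clusters[j]
        merged = ((tree_i, tree_j), members_i + members_j)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]

# Invented log ratios across three conditions, for illustration only.
profiles = {
    "g1": [2.0, 2.1, 1.9],
    "g2": [2.2, 2.0, 2.0],
    "g3": [-1.5, -1.4, -1.6],
    "g4": [-1.7, -1.5, -1.4],
}

print(hierarchical_cluster(profiles))
```

The nested tuples mirror the dendrogram: the innermost pairs are the genes that were linked first, and the outermost pair is the final join at the root of the tree.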
Cluster by color/expression difference
Self Organizing Maps
Public Databases
• Gene expression data is an essential aspect of annotating the genome
• Publication and data exchange for microarray experiments
• Data mining/meta-studies
• Common data format – XML
• MIAME (Minimum Information About a Microarray Experiment)
The 3 Gene Ontologies
• Molecular Function = elemental activity/task
– the tasks performed by individual gene products; examples are carbohydrate binding and ATPase activity
• Biological Process = biological goal or objective
– broad biological goals, such as mitosis or purine metabolism, that are accomplished by ordered assemblies of molecular functions
• Cellular Component = location or complex
– subcellular structures, locations, and macromolecular complexes; examples include nucleus, telomere, and RNA polymerase II holoenzyme
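One way to picture the three ontologies is as three independent lists of terms attached to each gene product. The sketch below uses a hypothetical `GeneAnnotation` record and made-up gene names. The terms are the examples from the definitions above, not real annotations, and this is not the Gene Ontology file format.

```python
from dataclasses import dataclass, field

@dataclass
class GeneAnnotation:
    """Hypothetical record: one list of terms per ontology."""
    gene: str
    molecular_function: list = field(default_factory=list)
    biological_process: list = field(default_factory=list)
    cellular_component: list = field(default_factory=list)

def genes_with_term(annotations, ontology, term):
    """Return the genes annotated with `term` in the named ontology."""
    return [a.gene for a in annotations if term in getattr(a, ontology)]

# Made-up genes annotated with the example terms from the slide.
annotations = [
    GeneAnnotation(
        gene="geneX",
        molecular_function=["ATPase activity"],
        biological_process=["mitosis"],
        cellular_component=["nucleus"],
    ),
    GeneAnnotation(
        gene="geneY",
        molecular_function=["carbohydrate binding"],
        biological_process=["purine metabolism"],
        cellular_component=["nucleus", "telomere"],
    ),
]

print(genes_with_term(annotations, "cellular_component", "nucleus"))
print(genes_with_term(annotations, "biological_process", "mitosis"))
```

Keeping the three aspects separate means a list of genes from a cluster can be queried by what the products do, what process they take part in, or where they are found.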
One Last Note
• Microarrays are “cutting edge” technology
• You now have experience doing a technique that most Ph.D.s have never done
• Looks great on a resume…