Using Gene Ontology Annotations to Interpret DNA Array Data

Stefan Pierrou PhD, AstraZeneca
Spotfire Users Conference 2001-05-03
© 2001, AstraZeneca, Inc. - All Rights Reserved.
COPD Genomics
In collaboration with Southampton University
AstraZeneca-Southampton Collaboration
AZ R&D Lund and AZ R&D Charnwood
Stephen Holgate, Donna Davies & Ratko Djukanovic et al.,
Univ. of Southampton, U.K.
Analysis of Epithelial Gene Expression in COPD
Hypotheses:
• COPD - caused by smoke and exacerbated by infections
• COPD - characterised by altered epithelial genotypes and phenotypes
• In the absence of epithelial activation there is no development of chronic bronchitis, and no progression of airways remodeling which ultimately leads to irreversible obstruction
• Epithelial responses to stress (smoke) determine the pathological and clinical presentations of COPD
Molecular Sciences R&D Lund
Stefan Pierrou
Objective
To identify candidate genes associated with disease to
provide opportunities for development of novel
treatments of COPD.
Cycles of Tissue Damage in Pathogenesis of COPD
[Diagram: a cycle of tissue damage. Labelled elements: chronic irritation (smoking, infections, etc.); genetic predisposition?; epithelial activation, injury & remodeling; mucociliary dysfunction; mucus hypersecretion; bacterial colonization; bacterial products; inflammatory response (proteases, chemokines, cytokines, oxidants); disease progression]
COPD Pathology
Chronic airflow obstruction due to
chronic bronchitis and/or emphysema
Analysis of Epithelial Gene Expression in COPD
“Critical Path”
• Subjects: smokers with/without COPD; non-smokers
• Tissue source: brushings, bronchial biopsies, lung resection
• Primary cell-based model
• Stress related to COPD (smoke, oxidants, GFs): define the biochemical pathways initiated by COPD-related stresses
• Microarrays: identify differentially expressed genes
• Bioinformatics
• Tissue expression pattern: RT-PCR, IHC/in situ
• Functional assays: cytokine production, differentiation, proliferation, secretion, motility
• Candidate targets
Analysis of Epithelial Gene Expression in COPD
Study Design
Day 0: Clinical assessment
Day 14: Reversibility testing (salbutamol)
Day 21: Sputum induction to characterize inflammatory cells & mediators
Day 28: Bronchoscopy to obtain samples of bronchial epithelium
Day 70: Bronchoscopy to obtain bronchial biopsies
Analysis of Epithelial Gene Expression in COPD
General Exclusion Criteria
(1) Atopy (positive skin prick tests and history)
(2) Asthma (reversibility to salbutamol <12%)
(3) Respiratory diseases other than COPD
(4) Other conditions which might compromise bronchoscopy
(5) Recent respiratory or other infections (6 weeks)
(6) Recent treatment with oral or inhaled corticosteroids
Tests Performed
Clinical screening
MRC scale
St. George's questionnaire
Allergy testing
Histamine challenge
Diary card and peak flow
Serum, DNA
Sputum induction
Blood gases
Full lung function
Salbutamol reversibility
CT Thorax
Bronchoscopies
Brushings
Biopsies-IHC, ISH
Data Analysis
Sort and Select
p-value
P call
E-lab
Excel Results Sheet
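The sort-and-select step above can be sketched roughly as follows. This is only an illustration, assuming that each row of the results sheet carries a p-value and a detection call, and that "P call" denotes a "present" call. The field names, the 0.05 cutoff, and the example rows are hypothetical and do not come from the study.

```python
from typing import NamedTuple

class ProbeResult(NamedTuple):
    gene: str        # hypothetical gene label
    p_value: float   # p-value from the group comparison
    call: str        # detection call, assumed "P" (present) or "A" (absent)

def sort_and_select(rows, max_p=0.05, keep_calls=("P",)):
    """Keep rows with an accepted call and p-value <= max_p, sorted by p-value."""
    selected = [r for r in rows if r.call in keep_calls and r.p_value <= max_p]
    return sorted(selected, key=lambda r: r.p_value)

# Made-up example rows, not data from the presentation.
rows = [
    ProbeResult("geneA", 0.030, "P"),
    ProbeResult("geneB", 0.001, "P"),
    ProbeResult("geneC", 0.002, "A"),   # significant but called absent
    ProbeResult("geneD", 0.400, "P"),   # present but not significant
]
selected = sort_and_select(rows)
print([r.gene for r in selected])
```

Filtering before sorting keeps the sheet short, so only genes that pass both criteria reach the later clustering and annotation steps.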
Brushing + control + stimulated samples (Sammon mapping), incl. ALI
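A Sammon mapping places samples in a low-dimensional plot so that the pairwise distances between their expression profiles are preserved as well as possible. The quantity it minimises is Sammon's stress. The sketch below only evaluates that stress for a given layout; it does not perform the optimisation, and the coordinates are made-up values, not data from the study.

```python
import math
from itertools import combinations

def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def sammon_stress(high_dim, low_dim):
    """Sammon's stress between original points and their low-dimensional images."""
    total = 0.0   # sum of original pairwise distances (normalising constant)
    error = 0.0   # squared distance mismatches, weighted by 1 / original distance
    for i, j in combinations(range(len(high_dim)), 2):
        d_orig = euclidean(high_dim[i], high_dim[j])
        if d_orig == 0.0:
            continue  # identical profiles have no defined term; skip them
        d_map = euclidean(low_dim[i], low_dim[j])
        total += d_orig
        error += (d_orig - d_map) ** 2 / d_orig
    return error / total if total else 0.0

# Hypothetical three-sample example.
profiles = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (6.0, 8.0, 0.0)]
faithful = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]    # preserves every distance
collapsed = [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0)]   # loses every distance
print(sammon_stress(profiles, faithful), sammon_stress(profiles, collapsed))
```

A stress near zero means the plot reflects the original distances faithfully; larger values mean the two-dimensional picture should be read with more caution.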
Clinical Data Overlay
GOAC - The Gene Ontology
Annotation Campaign
- some background
Analysis and clustering of gene expression data
most often generates lists of incomprehensible gene names
Clustering of gene expression data according to protein function classification
• Classification is currently done manually
• Need for automation
• Gene Ontology is a tool to make this happen
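To illustrate what automating this classification could look like, the sketch below groups the genes of one expression cluster by their annotated function terms. It is a minimal sketch under stated assumptions: the gene names and the annotation table are hypothetical, and the term names are used only as plain labels.

```python
from collections import defaultdict

def group_by_term(cluster_genes, annotations):
    """Map each annotation term to the cluster genes that carry it."""
    groups = defaultdict(list)
    for gene in cluster_genes:
        # Genes without any annotation are collected under a placeholder label.
        for term in annotations.get(gene, ["unannotated"]):
            groups[term].append(gene)
    return dict(groups)

# Hypothetical annotation table: gene -> list of function terms.
annotations = {
    "gene1": ["transcription factor"],
    "gene2": ["DNA helicase"],
    "gene3": ["transcription factor", "DNA helicase"],
}
cluster = ["gene1", "gene2", "gene3", "gene4"]
groups = group_by_term(cluster, annotations)
for term, genes in sorted(groups.items()):
    print(term, len(genes), genes)
```

The per-term counts turn a list of gene names into a short functional summary of the cluster, which is the kind of data reduction the manual classification provides today.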
History - why GO?
• Need for data reduction based on biological information
• Data overlay tool discussions with Spotfire
• Spotfire created a plug-in for GO
• GO has since started to become the de facto standard for annotation of genes.
GO Consortium - www.geneontology.org
• Drosophila (fruitfly) - FlyBase
• Saccharomyces (budding yeast) - Saccharomyces Genome Database (SGD)
• Mus (mouse) - Mouse Genome Database (MGD) & Gene Expression Database (GXD)
• Arabidopsis (brassica or mustard family) - The Arabidopsis Information Resource (TAIR)
• Caenorhabditis (nematode) - WormBase
GOC Collaborators
• Academic
  • SwissProt - annotations ongoing
  • InterPro - annotations ongoing - currently half done
• Corporate
  • Celera - uses GO for Drosophila
  • Incyte - sponsor to Stanford group
• GOC Sponsor - AstraZeneca
The Gene Ontology
• Molecular function describes the tasks performed by individual gene products; examples are transcription factor and DNA helicase.
• Biological process describes broad biological goals, such as mitosis or purine metabolism, that are accomplished by ordered assemblies of molecular functions.
• Cellular component encompasses subcellular structures, locations, and macromolecular complexes; examples include nucleus, telomere, and origin recognition complex.
GOC Annotation status - as of April 29 2001

                                 SGD      FlyBase   MGI
Biological Process               5,684    1,306     3,461
Molecular Function               5,780    5,290     4,574
Cellular Component               2,350    1,347     3,545
Total gene products annotated    6,373    5,628     5,603
The GOAC development project
How to make the use of Gene Ontology annotations a reality
Starting point for the GOAC development project
• Overlay expression data & visualise
• Gene ontology DB & browser
GOAC components
• Excel support file
• Gene ontology DB & browser
• GO MySQL DB
• Annotation database with GO terms
• Oracle DB
• Overlay expression data & visualise
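The idea of overlaying expression data on the ontology can be sketched as follows: each gene's value is attached to its annotated terms and to all of their ancestors, and every term then receives a summary value. This is a rough sketch and not the GOAC or Spotfire implementation. The toy hierarchy, the gene names, the expression values, and the choice of the mean as the summary are all assumptions made for illustration.

```python
from statistics import mean

def ancestors(term, parents):
    """Return the term itself plus every ancestor reachable via parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen

def overlay(expression, annotations, parents):
    """Average each gene's expression value over its terms and their ancestors."""
    per_term = {}
    for gene, value in expression.items():
        covered = set()
        for term in annotations.get(gene, ()):
            covered |= ancestors(term, parents)
        for term in covered:   # each gene counts once per term
            per_term.setdefault(term, []).append(value)
    return {term: mean(values) for term, values in per_term.items()}

# Toy hierarchy (child -> parents); not the real GO structure.
parents = {
    "DNA helicase": ["helicase"],
    "helicase": ["molecular function"],
    "transcription factor": ["molecular function"],
}
annotations = {
    "geneA": ["DNA helicase"],
    "geneB": ["DNA helicase"],
    "geneC": ["transcription factor"],
}
expression = {"geneA": 2.0, "geneB": 1.0, "geneC": -3.0}  # made-up log ratios
summary = overlay(expression, annotations, parents)
print(summary)
```

Propagating values to ancestor terms lets a browser show a signal at any level of the hierarchy, from a specific function up to a whole branch.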
Summary
• DNA microarray data can be analysed for a selected set of genes or for complete profiles
• We suggest the use of a controlled vocabulary such as GO for profile analysis
• The possibility of overlaying expression data on a structure such as GO will be essential
• The Spotfire plug-in developed in Göteborg will fill this need
Acknowledgement
• AstraZeneca
  • Bo Servenius
  • Robert Virtala
  • Krzysztof Pawlowski
  • Jacob Sjöberg
  • Dan Gustavsson
• Spotfire
  • Tobias Fändriks