Dr. Barry Flinn's talk (March 11, 2003)

Potato Genomics In Fredericton
Dr. Barry Flinn
Co-Lead Investigator - Genome Atlantic CPGP
Research Director - Solanum Genomics International Inc.
Economic Importance Of The Potato
• Integral part of the diet of a large
proportion of the world’s population
• Supplies at least 12 essential vitamins
and minerals
• Still much unknown regarding the
control of potato development and
processing/quality traits
(e.g., disease resistance, stress tolerance, carbohydrate metabolism, tuber shape)
What Does Genomics Mean?
• “Genomics” is a science that studies the genetic material
of a species at the molecular level
• A scientific approach that seeks to identify and define
the function of genes, as well as uncover when and how
genes work together to produce traits
• “Structural Genomics” approaches (mapping) generally
focus on traits controlled by one or a few genes, and
often only provide information regarding the location
of a gene or genes
• We can examine the interrelationships and interactions
between thousands of genes
How do we do this?
Genome Organization
[Figure: leaf and tuber tissue, chromosome, DNA]
Genome Organization
[Figure: DNA carrying Gene 1, Gene 2, etc.]
....TATACAGCAAAATAGAAAGATCTAGTGTCCCATGGCGATGAGTCGTGTAGCTTCT….
Promoter (“Switch”)
Coding ORF (“Message”)
cDNA Collections (Libraries)
• Various tissues are collected from the plant,
and messages are extracted from each of these
[Figure: leaf messages and tuber messages]
cDNA Collections (Libraries)
• The messages are “copied” to form double-stranded DNA copies (cDNA) of each message
Leaf cDNA
Tuber cDNA
• Each copy is “glued” into a piece of bacterial DNA
for easier storage, handling and propagation, resulting
in a collection or “library” of cDNAs for each tissue
cDNA Collections (Libraries)
• The cDNAs are then read or “sequenced”, to give the
order of A’s, C’s, G’s or T’s for each
• We are left with the sequence of each gene that is
active (expressed) in each cell, tissue or organ studied
• These are “Expressed Sequence Tags” or ESTs
• Using complex computer resources, these ESTs can
be analyzed and compared with known sequences
and proteins
• Look for messages associated with specific organs or
characteristics/traits
Take Home Points
• Messages from various genes are important, as they
dictate which proteins are produced
• Promoters are also important, as they dictate where
a specific message and protein is produced
• “Genomics” involves the study of all of the
messages produced by the various plant cells
• A lot of information which must be organized
and analyzed
Project Description
Identification Of A Differential Gene Expression Pattern
And Genes Related To Resistance In Potato Late Blight
• One of the most devastating diseases of potato worldwide
• If left unmanaged, complete destruction of crops can occur
• Attacks leaves and tubers; large necrotic lesions on leaves
and dry rot that spreads through tubers; secondary bacteria and
fungi often infect through late blight lesions
Late Blight Project
• Collaborative effort with AAFC Potato Research Centre
• Near-isogenic population of blight-sensitive and
blight-resistant plants
• cDNA libraries made from leaves of a blight-sensitive and
a blight-resistant plant
• 2500 messages were sequenced from each library
(5000 total ESTs)
• Different ESTs to be profiled for expression
• The tremendous amounts of data generated will need to be
managed efficiently
Bioinformatics
• Intranet Website
• Database
• Analysis Tools
SGII Intranet Website
• Links (IBM Patent, NCBI, PubMed, etc.)
• BLAST Search (on site)
• Sequence Manipulation Suite
• ClustalW
• Modifying Sequences
• Database Access
Database
• Contains all the EST sequences
• Contains useful annotations
– Blast Searches
– Contig Assemblies
– Transmembrane Spanning Regions
– Gel Pictures
– EST Information
Database
Database - Sequence Info
Data Analysis
• Tens of thousands of ESTs available for study
• Most methods to study message distributions are
low-throughput and time-consuming
• “Genomics” necessitates the large scale study of
gene expression
How can we do this?
Microarray Analysis
Microarray Analysis - Processing
• Image Processing
• Data Normalization
• Analysis
– Differential Gene Expression
– Cluster Analysis
– Pathway Analysis
[Figure: Intensity Dependence Comparison. Log(R/G) plotted against 0.5*(Log(G) + Log(R)) for Slide3 and Slide70, each with a polynomial trend line; R² = 0.6185 and R² = 0.2014 as printed on the plot.]
Microarray Analysis - Processing
[Figure: microarray spot image with signal and background regions labelled]
Microarray Analysis - Processing
• Irregular size or shape
• Irregular placement
• Low intensity
• Saturation
• Spot variance
• Background variance
[Figure: example problem spots, labelled indistinguishable, saturated, bad print, misalignment, artifact]
Microarray Analysis - Processing
• Calculate numeric characteristics of each spot
• Throw out spots that do not meet minimum
requirements for each characteristic
• Throw out spots that do not have minimum overall
combined quality
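The two-stage filter described above can be sketched in Python. This is only an illustration: the `Spot` fields, every threshold, and the way the combined quality score is averaged are assumptions made for the example, not values from the talk.

```python
from dataclasses import dataclass


@dataclass
class Spot:
    clone_id: str
    diameter_px: float         # measured spot diameter in pixels
    signal: float              # mean foreground intensity
    background: float          # mean local background intensity
    saturated_fraction: float  # fraction of pixels at the scanner maximum


# Hypothetical thresholds, chosen only for illustration.
MIN_DIAMETER, MAX_DIAMETER, TARGET_DIAMETER = 8.0, 20.0, 14.0
MIN_SIGNAL_TO_BACKGROUND = 2.0
MAX_SATURATED_FRACTION = 0.10
MIN_COMBINED_QUALITY = 0.6


def passes_minimums(spot):
    """Stage 1: every individual characteristic must meet its minimum."""
    size_ok = MIN_DIAMETER <= spot.diameter_px <= MAX_DIAMETER
    intensity_ok = spot.signal >= MIN_SIGNAL_TO_BACKGROUND * spot.background
    saturation_ok = spot.saturated_fraction <= MAX_SATURATED_FRACTION
    return size_ok and intensity_ok and saturation_ok


def combined_quality(spot):
    """Stage 2: average of three scores, each scaled to the range 0..1."""
    size_score = max(0.0, 1.0 - abs(spot.diameter_px - TARGET_DIAMETER) / TARGET_DIAMETER)
    ratio = spot.signal / spot.background if spot.background > 0 else 10.0
    intensity_score = min(1.0, ratio / 10.0)
    saturation_score = 1.0 - spot.saturated_fraction
    return (size_score + intensity_score + saturation_score) / 3.0


def keep_spot(spot):
    """Keep a spot only if it passes both stages."""
    return passes_minimums(spot) and combined_quality(spot) >= MIN_COMBINED_QUALITY


good = Spot("clone-good", 12.0, 5000.0, 500.0, 0.0)
faint = Spot("clone-faint", 12.0, 600.0, 500.0, 0.0)
marginal = Spot("clone-marginal", 8.0, 1000.0, 500.0, 0.10)
kept = [s.clone_id for s in (good, faint, marginal) if keep_spot(s)]
```

Here the faint spot fails a single minimum, while the marginal spot passes every individual minimum but is still discarded because its combined quality is too low.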
Microarray Analysis - Data Normalization
• Normalize data to correct for variances
– Dye bias
– Location bias
– Intensity bias
– Pin bias
– Slide bias
• Control vs. non-control spots
Microarray Analysis - Data Normalization
• Assumptions
– Overall mean ratio should be 1
– Most genes are not differentially expressed
– Total intensities of the two dyes are equivalent
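Under these assumptions, the simplest correction is a global shift that centres the log ratios on zero (a ratio of 1). The sketch below is a minimal illustration in Python. The use of base-2 logarithms and of the median rather than the mean as the centre are assumptions made for the example; the median is less affected by the few genes that really are differentially expressed.

```python
import math
import statistics


def log_ratio_and_intensity(red, green):
    """Return (M, A): M = log2(R/G) and A = 0.5 * (log2(R) + log2(G))."""
    m = math.log2(red / green)
    a = 0.5 * (math.log2(red) + math.log2(green))
    return m, a


def global_median_normalize(red_values, green_values):
    """Shift all log ratios so that their median becomes 0 (ratio of 1)."""
    log_ratios = [math.log2(r / g) for r, g in zip(red_values, green_values)]
    shift = statistics.median(log_ratios)
    return [m - shift for m in log_ratios]


# Made-up intensities: the red channel reads twice as bright on every
# spot (a dye bias), and only the last spot differs beyond that bias.
red = [200.0, 400.0, 800.0, 1600.0, 6400.0]
green = [100.0, 200.0, 400.0, 800.0, 800.0]
normalized = global_median_normalize(red, green)
```

After the shift, the first four spots have a log ratio of 0 and only the last spot stands out.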
Microarray Analysis - Data Normalization (LOWESS)
[Figure: Intensity Dependence Comparison. Log(R/G) plotted against 0.5*(Log(G) + Log(R)) for Slide3 and Slide70, each with a polynomial trend line; R² = 0.6185 and R² = 0.2014 as printed on the plot.]
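An intensity-dependent correction of this kind can be sketched as follows. This is a simplified stand-in rather than the procedure used in the project: it fits one pass of tricube-weighted local linear regression of Log(R/G) on 0.5*(Log(G) + Log(R)) and subtracts the fitted trend, without the robustness iterations of full LOWESS. The function names and the `frac` default are assumptions made for the example.

```python
import math


def local_linear_trend(a_values, m_values, frac=0.5):
    """Fit a tricube-weighted straight line around each point.

    a_values: average log intensity per spot, 0.5 * (log G + log R)
    m_values: log ratio per spot, log(R / G)
    frac:     fraction of the spots used as each point's neighbourhood
    """
    n = len(a_values)
    k = min(n, max(3, math.ceil(frac * n)))
    trend = []
    for a0 in a_values:
        distances = sorted(abs(a - a0) for a in a_values)
        bandwidth = distances[k - 1] or 1.0  # guard against zero width
        sw = swx = swy = swxx = swxy = 0.0
        for a, m in zip(a_values, m_values):
            u = abs(a - a0) / bandwidth
            if u >= 1.0:
                continue  # outside the neighbourhood
            w = (1.0 - u ** 3) ** 3  # tricube weight
            sw += w
            swx += w * a
            swy += w * m
            swxx += w * a * a
            swxy += w * a * m
        denom = sw * swxx - swx * swx
        if abs(denom) < 1e-12:
            trend.append(swy / sw)  # fall back to the weighted mean
        else:
            slope = (sw * swxy - swx * swy) / denom
            intercept = (swy - slope * swx) / sw
            trend.append(intercept + slope * a0)
    return trend


def intensity_normalize(a_values, m_values, frac=0.5):
    """Subtract the intensity-dependent trend from each log ratio."""
    trend = local_linear_trend(a_values, m_values, frac)
    return [m - t for m, t in zip(m_values, trend)]


# Made-up data with a purely intensity-dependent bias, M = 0.5 * A - 3.
a_demo = [float(a) for a in range(6, 16)]
m_demo = [0.5 * a - 3.0 for a in a_demo]
residuals = intensity_normalize(a_demo, m_demo)
```

With a bias that depends only on intensity, the corrected log ratios all come out at approximately zero.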
Microarray Analysis - Data Normalization
Differential Gene Expression:
• n-fold change
– n typically ≥ 2
– May hold no biological relevance
– Often too restrictive
• 2σ expression
– Calculate standard deviation σ
– Genes with expression more than 2σ away from the mean are
differentially expressed
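Both selection rules can be written down in a few lines. The sketch below assumes normalized base-2 log ratios as input and uses the sample standard deviation; the data and function names are made up for illustration.

```python
import math
import statistics


def fold_change_calls(log2_ratios, n_fold=2.0):
    """Flag genes whose ratio changes at least n-fold in either direction."""
    cutoff = math.log2(n_fold)
    return [abs(m) >= cutoff for m in log2_ratios]


def two_sigma_calls(log2_ratios):
    """Flag genes more than two standard deviations from the mean log ratio."""
    mean = statistics.mean(log2_ratios)
    sigma = statistics.stdev(log2_ratios)
    return [abs(m - mean) > 2.0 * sigma for m in log2_ratios]


# Made-up data: twenty genes close to a ratio of 1, and one gene with a
# 16-fold change (log2 ratio of 4).
demo_ratios = [0.1, -0.1] * 10 + [4.0]
by_fold = fold_change_calls(demo_ratios)
by_sigma = two_sigma_calls(demo_ratios)
```

The fixed n-fold cutoff ignores how noisy a given slide is, whereas the cutoff based on the standard deviation scales with the spread of the data.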
Microarray Analysis - Clustering
• Cluster genes based on expression profiles
– Gene expression across several treatments
• Hypothesis: Genes with similar function have similar
expression profiles
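One way to group genes by the similarity of their expression profiles is sketched below. It is a deliberately simple stand-in for a real clustering method: similarity is measured by Pearson correlation, and each gene joins the first cluster whose seed profile it matches above a threshold. The gene names, the profiles and the 0.9 threshold are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0  # a flat profile carries no shape information
    return cov / (norm_x * norm_y)


def cluster_profiles(profiles, min_correlation=0.9):
    """Greedy clustering: the first gene placed in a cluster is its seed."""
    clusters = []
    for gene, profile in profiles.items():
        for cluster in clusters:
            seed_profile = profiles[cluster[0]]
            if pearson(profile, seed_profile) >= min_correlation:
                cluster.append(gene)
                break
        else:
            clusters.append([gene])  # no match: start a new cluster
    return clusters


# Hypothetical expression levels across four treatments.
demo_profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.5],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [8.0, 6.0, 4.5, 2.0],
}
demo_clusters = cluster_profiles(demo_profiles)
```

The rising profiles (geneA, geneB) end up in one cluster and the falling ones (geneC, geneD) in another, regardless of their absolute expression levels.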
Expression Profile Clustering
Microarray Analysis - Data Management
[Diagram: Project, Database, Engine]
Late Blight Project
cDNA Microarray Using SGII Clones
• hybridized with Cy3 (resistant) + Cy5 (susceptible) probes
(reciprocal labelling experiments)
ANDLBRLF02345HTF.01 - Class II chitinase
ANDLBRLF01256HTF.01 - Pathogenesis-related protein P23 precursor
ANDLBRLF02041HTF.01 - Unknown protein
Late Blight Project
cDNA Microarray Using SGII Clones
RT-PCR Using PR1-1 Primers
[Figure: RT-PCR gel with lanes MW (molecular weight marker), S (susceptible), R (resistant)]
Top 5 Expression Profiles
Clone ID   Ratio Of Resistant/Susceptible Expression   BLAST Homology
384        21.8                                        Pathogenesis-related protein PR-1
1256       19.9                                        Osmotin-like protein
857        11.3                                        Hypothetical protein
922        10.0                                        Unknown
2345       8.1                                         Class II chitinase
What Use Is All Of This Information?
• Transgenics:
- Enhance tuber quality, processing traits, disease
resistance, stress tolerance more rapidly than breeding
• Expression Assisted Selection:
- Obtain expression profiles for thousands of genes
associated with specific traits or characteristics
- Use these profiles as a baseline to compare with
the expression profiles of unknown clones and crosses
• New Protein Products:
- Identify genes encoding secreted proteins/ligands
- Test these for growth-promoting/other effects
- Express genes in batch cultures and purify proteins
Example Of Gene Use
GA-20 oxidase in potato:
GFP expression in tobacco cells
• GA-20 oxidase knockouts with enhanced tuber production
• GA-20 oxidase knockouts with reduced tuber sprouting
Information Processing and Handling
• Assembly and annotation of genomic data
• EST analysis and databases
• Cluster analysis of microarray data
• Comparisons of various transcriptomic methods
• Integration of sequence, transcriptomic, proteomic,
metabolomic, transgenic data