Bioinformatics for Microarray Studies
Bioinformatics for
Microarray Studies at IBS
Pei-Ing Hwang, Ph.D.
Mar. 24, 2005
3/24/2005
TIGP
1
Different aspects
for life science research
genomics
transcriptomics
proteomics
Building blocks for DNA or RNA
DNA: A, T, G, C
RNA: A, U, G, C
DNA: deoxyribonucleic acid
Double stranded
Antiparallel
Why microarray?
Gene Expression
To simultaneously study multiple genes
To obtain an overview of gene expression at
transcriptional level under specific experimental
conditions
To study gene interaction network from the
transcriptional aspect
Genome
SNP detection
To locate recombination sites in the chromosome/genome
Ideally, to discover the gene responsible for a genetic disease
Outline
Introduction to Microarray experiments
Experiences at IBS for the cDNA arrays
Data generated with microarray
DNA annotation
Data Analysis
Data Management
About Microarray Technology-1
Up to hundreds of thousands of spots in a
fixed area on a glass slide or a membrane
One species of DNA molecule per spot
A spot is also called a “feature”
DNA fixed on the chip or membrane is also called a “probe”
The sequence and/or function of each DNA species on a spot is known.
About Microarray Technology-2
Making use of “hybridization method”
A : T, U
G : C
Image processing
Data analysis
Result interpretation from a biological perspective
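The base-pairing rule behind hybridization (A pairs with T or U, G pairs with C) can be sketched in a few lines. This is an illustration only: the function names `reverse_complement` and `can_hybridize` are hypothetical, and only perfect complementarity is tested. Real hybridization also depends on temperature, salt concentration and mismatches, none of which is modeled here.

```python
# Watson-Crick pairing for DNA; an RNA strand is handled by treating U as T.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq):
    """Return the antiparallel complementary strand of a DNA/RNA sequence."""
    dna = seq.upper().replace("U", "T")
    return "".join(PAIR[base] for base in reversed(dna))

def can_hybridize(probe, target):
    """True if the target is perfectly complementary to the probe."""
    return reverse_complement(target) == probe.upper().replace("U", "T")

print(reverse_complement("ATGC"))     # GCAT
print(can_hybridize("GCAT", "AUGC"))  # True
```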
Types of Microarray
Types of DNA immobilized on the solid support
cDNA vs. oligonucleotides
Manufacturing methods
Printing vs. photolithography
Solid support
Glass slides
Membrane
Nucleotide labeling (slide scanning condition)
One color vs. two colors
GeneChip® Array Manufacturing
Figure 1. Affymetrix uses a unique combination of photolithography and
combinatorial chemistry to manufacture GeneChip® Arrays.
Microarray printing machine
http://arrayit.com/Products/MicroarrayI/NanoPrint/Nano-Print-new-600.jpg
Procedure for one-channel array
Experimental Procedure for 2-channel Microarray
Data Analyses
Image analyses:
Feature intensity acquisition
To identify differentially expressed genes
Normalization (global, local, print-tip, between-array, etc.)
Clustering or classification
Analyses from biology aspect
Significant genes
Transcriptional regulation study
Cellular pathway or network finding
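Of the normalization methods listed, the global one is the simplest to illustrate. The sketch below assumes two-channel data with positive, background-corrected intensities and centers the log2 ratios on their median. The function name `global_median_normalize` and the intensity values are hypothetical, and the local, print-tip and between-array methods are not covered.

```python
import math
from statistics import median

def global_median_normalize(red, green):
    """Return log2(red/green) ratios shifted so that their median is zero."""
    log_ratios = [math.log2(r / g) for r, g in zip(red, green)]
    center = median(log_ratios)
    return [m - center for m in log_ratios]

# Made-up feature intensities for three spots.
red = [200.0, 400.0, 800.0]
green = [100.0, 100.0, 100.0]
print(global_median_normalize(red, green))  # [-1.0, 0.0, 1.0]
```

The underlying assumption is that most genes are not differentially expressed, so the typical log ratio should be zero.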
Experiences at IBS for the
cDNA arrays
About IBS tomato arrays
~13000 spots/features per chip
1 clone per spot
cDNA clones from about a dozen cDNA libraries
At least two different protocols were followed and six different vectors were used
More than ten technicians involved
Bioinformatics for Microarray at IBS
(cont’d)
IBS tomato EST database construction
Installation, management and
maintenance of data analyses software
Reference information searching
Batch Submission of EST sequences
Bioinformatics Needs for Microarray
Studies at IBS
Pre-arraying data management
cDNA info collection, vector trimming, sequence annotation, EST submission, etc.
Array information management
Gene set characterization, data storage, data retrieval
Post-hybridization data analysis and management
Array data analyses, storage of the scanning result, biology-oriented bioinformatics analyses
Bioinformatics Service Work for
Microarray studies at IBS
Data pre-processing for the cDNAs
Clone id assignment
Sequence trimming
Gene annotation
Function classification
Data sheet preparation for commercial software to analyze microarray data
GAL file preparation for GenePix Pro
Master Gene List preparation for GeneSpring
[Workflow diagram. Labels: cDNA clones, sequencing, vector trimming, assembly, function annotation, database, PCR; GenePix (feature intensities, normalization); data analysis (normalization, variance, clustering, gene-gene interaction); biological meaning (pathway analysis, transcription network); Spotfire, GeneSpring]
Pre-array Bioinformatics
1. Clone id generation
2. Vector Trimming
3. Sequence assembly
4. Seq annotation (BLAST)
5. EST submission to NCBI
6. Database construction
[Diagram labels: clones from labs, sequencing, raw EST seq, data processing and management]
Clone id generation
Data centralization following sequencing
Rules for re-arraying
96-well plate to/from 384-well plate
PCR from 96-well plates and spotting from 384-well plates
Order of A1, A2, B1, B2
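A re-arraying rule of this kind can be written as a small mapping function. The sketch below assumes the common interleaved layout in which four 96-well plates fill the four quadrants of one 384-well plate in the order A1, A2, B1, B2. The slides do not spell out the exact rule used at IBS, so the layout, the function name `well_96_to_384` and the plate numbering 1 to 4 are assumptions.

```python
ROWS_96 = "ABCDEFGH"
ROWS_384 = "ABCDEFGHIJKLMNOP"
# Assumed quadrant offsets (row, column) in the order A1, A2, B1, B2.
QUADRANT_OFFSETS = [(0, 0), (0, 1), (1, 0), (1, 1)]

def well_96_to_384(plate_index, well):
    """Map a well of 96-well plate 1..4 to its position on the 384-well plate."""
    row_offset, col_offset = QUADRANT_OFFSETS[plate_index - 1]
    row = ROWS_96.index(well[0].upper())
    col = int(well[1:]) - 1
    return f"{ROWS_384[2 * row + row_offset]}{2 * col + col_offset + 1}"

print(well_96_to_384(1, "A1"))   # A1
print(well_96_to_384(2, "A1"))   # A2
print(well_96_to_384(3, "A1"))   # B1
print(well_96_to_384(4, "H12"))  # P24
```

Keeping the rule in one function means the clone id of every spotted well can be traced back to its source plate without manual bookkeeping.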
[Diagram: plate formats in the workflow. Labels: cDNA clones, sequencing, PCR; 96 or 384 well, 96 well, 96 well, 384 well]
96-well to 384-well plates
[Diagram: quadrant positions A1, A2, B1, B2]
Data collection
Raw sequencing data obtained from the sequencing company
Both ABI and text files organized and stored by lab and by date
Clone info confirmed with each sequence contributor
Clone ids matched with raw sequences
Processing the sequencing data
cDNA library procedures confirmed with each lab
Vector/linker/primer trimming (Seqclean)
Function annotation
BLAST against different databases
Gene Ontology annotation
Sequence Assembly (Phrap)
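The idea behind the trimming step can be shown with a deliberately simplified sketch. It is not how Seqclean works: it uses exact string matching on two known flanking sequences, whereas a real trimmer aligns reads against a vector database and tolerates sequencing errors. The function name `trim_vector` and the example sequences are hypothetical.

```python
def trim_vector(read, left_vector, right_vector):
    """Keep only the insert between exact matches of the flanking vector ends."""
    seq = read.upper()
    start = seq.find(left_vector.upper())
    if start != -1:
        # Drop everything up to and including the 5' vector sequence.
        seq = seq[start + len(left_vector):]
    end = seq.find(right_vector.upper())
    if end != -1:
        # Drop the 3' vector sequence and everything after it.
        seq = seq[:end]
    return seq

raw = "GGCCGAATTCATGGCTTCAGGAAGCTTCCGG"
print(trim_vector(raw, "GAATTC", "AAGCTT"))  # ATGGCTTCAGG
```

Because several vectors and protocols were used, the flanking sequences would have to be chosen per library, which is why the procedures were confirmed with each lab first.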
Procedure to generate cDNA clones
IBS tomato EST Database
Cloning information
Sequencing data
Vector/adaptor trimming information
EST assembly
Function annotation
Cross Reference
The Tomato Database
Entity-Relationship model
Untrimmed Sequence
1. Seq id
2. Trimmed Sequence
Trimmed Sequence
1. Seq id
2. Trimmed Sequence
3. Method
4. Trim set
Assembly Information
1. Contig_id
2. Contig Sequence
3. BLAST Result
4. Position
5. Component seq id
TAIR Result
1. Seq id
2. At number
3. E-Value
4. Description
5. Identity
6. Other result
NCBI BLAST Result
1. Seq id
2. NCBI_id
3. E-Value
4. Description
5. Identity
6. Other result
TIGR Result
1. Seq id
2. TC number
3. E-Value
4. Description
5. Identity
6. Other result
Lab info
1. Seq id
2. Comment
3. Primer
4. Biotech
5. Sender
6. Collect From
ID MAP
1. Seq id
2. Clone_id
3. Contig id
4. Lab_id#1
5. Lab_id#2
6. NCBI_sbmt_id93
7. NCBI_sbmt_id94
8. dbEST_accn_no
9. note
cDNA Library Information
1. Clone_id(3)(4)
2. Name
3. Date made
4. Developmental stage
5. Cloning sites
6. Description
7. Library
8. Host
9. Species
10. Vector
11. Antibiotic
12. Authors
13. Tissue
14. Primer
Gene Ontology
1. TC number
2. EC number
3. Process
-GO_id
-Description
4. Function
-GO_id
-Description
5. Component
-GO_id
-Description
[Relationship labels in the diagram: Seq_id, Clone_id, TC number; cardinalities 1 and n; TOM 3, TOM 4]
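Part of this model can be sketched as relational tables linked by the sequence id. The example below uses SQLite only so that it runs standalone; the slides do not say which database engine held the tomato EST data. The table and column names are adapted from the entity lists, while the key constraints and all inserted values are made-up placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE id_map (
    seq_id TEXT PRIMARY KEY,
    clone_id TEXT,
    contig_id TEXT
);
CREATE TABLE trimmed_sequence (
    seq_id TEXT REFERENCES id_map(seq_id),
    trimmed_sequence TEXT,
    method TEXT
);
CREATE TABLE tair_result (
    seq_id TEXT REFERENCES id_map(seq_id),
    at_number TEXT,
    e_value REAL,
    description TEXT
);
""")

# Placeholder rows, not real data.
conn.execute("INSERT INTO id_map VALUES ('S001', 'C001', 'CTG1')")
conn.execute("INSERT INTO trimmed_sequence VALUES ('S001', 'ATGGCT', 'Seqclean')")
conn.execute("INSERT INTO tair_result VALUES ('S001', 'AT0G00000', 1e-50, 'example hit')")

# Follow the Seq_id links from the ID map to the sequence and its annotation.
rows = conn.execute("""
    SELECT m.clone_id, t.trimmed_sequence, r.at_number
    FROM id_map AS m
    JOIN trimmed_sequence AS t ON t.seq_id = m.seq_id
    JOIN tair_result AS r ON r.seq_id = m.seq_id
""").fetchall()
print(rows)  # [('C001', 'ATGGCT', 'AT0G00000')]
```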
Information to be further analyzed
Gene set characterization
Number of unique genes on the array
Number of known/unknown genes
Coordinates of each spotted sequence
Statistics about spotted cDNA
grouped by function/pathway
grouped by sequence similarity
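The counts listed above can be derived from the assembly and annotation results. The sketch below assumes that spots sharing a contig represent the same gene and that a gene counts as "known" when its best hit has a description. Both are simplifying assumptions, and the record layout, the spot data and the function name `characterize` are hypothetical.

```python
# Hypothetical spot records: (spot id, contig id, best-hit description or None).
spots = [
    ("spot1", "CTG1", "heat shock protein"),
    ("spot2", "CTG1", "heat shock protein"),
    ("spot3", "CTG2", None),
    ("spot4", "CTG3", "kinase"),
]

def characterize(spots):
    """Count unique contigs and split them into annotated and unannotated."""
    annotation = {contig: desc for _, contig, desc in spots}
    known = sum(1 for desc in annotation.values() if desc is not None)
    return {
        "spots": len(spots),
        "unique": len(annotation),
        "known": known,
        "unknown": len(annotation) - known,
    }

summary = characterize(spots)
print(summary)  # {'spots': 4, 'unique': 3, 'known': 2, 'unknown': 1}
```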
Post-hybridization data
analysis and management
Post-hybridization data analysis
Software for Microarray Analysis At IBS
GenePix Pro 5.0 – image processing
GeneSpring – microarray data analysis
Spotfire – microarray data analysis and data storage
TransPath – pathway searching
Image Processing
GenePix Pro 5.0
GAL (GenePix Array List) file
From multi-well plate to microarray
GAL online
GeneSpring at IBS
for microarray data analyses
standalone software
providing statistical methods for data analysis
Some bioinformatics
providing visualization
licensed annually
rigid format requirement for input data
requiring installation of a master gene list
(master table) prior to data analysis
Master table for GeneSpring
Master table contains information on
Id
Source of DNA
Gene name
Gene function annotation (from BLAST results)
GO annotation
Each array needs its own master table
Format of master table may vary with different versions of the software.
To generate master table for
GeneSpring
Batch BLAST against three sequence databases
Parsing BLAST results
Incorporating EC number, GO number and other related data from the best BLAST matches
Integrating all required data from various files and generating the master table
Checking
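The parsing and integration steps can be sketched as follows. The example assumes BLAST hits have already been reduced to a four-column tab-delimited table (query id, subject id, e-value, description) and keeps the hit with the smallest e-value per query. This column layout and the output columns are hypothetical; they are neither the native BLAST report format nor the master table format required by GeneSpring, and the hits are made up.

```python
import csv
import io

# Hypothetical tab-delimited hits: query id, subject id, e-value, description.
BLAST_HITS = """\
S001\tsubjA\t1e-30\tkinase
S001\tsubjB\t1e-80\theat shock protein
S002\tsubjC\t1e-5\tunknown protein
"""

def best_hits(handle):
    """Keep, for each query, the hit with the smallest e-value."""
    best = {}
    for query, subject, e_value, description in csv.reader(handle, delimiter="\t"):
        hit = (float(e_value), subject, description)
        # Tuples compare by e-value first, so min() picks the better hit.
        best[query] = min(best.get(query, hit), hit)
    return best

def master_table(best):
    """Render one tab-delimited row per sequence id, sorted by id."""
    lines = ["id\tbest_hit\te_value\tannotation"]
    for query in sorted(best):
        e_value, subject, description = best[query]
        lines.append(f"{query}\t{subject}\t{e_value:g}\t{description}")
    return "\n".join(lines)

table = master_table(best_hits(io.StringIO(BLAST_HITS)))
print(table)
```

EC and GO numbers would be joined onto each row in the same way, keyed on the best-matching subject id.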
Spotfire
for microarray data analyses
server-client software
linked to Oracle database for data storage
providing various statistical methods for data analysis
capable of establishing links to more bioinformatics tools
can record analysis procedure
more flexible format requirement for input data
One color array for Arabidopsis
Affymetrix ATH1 chip
Annotation information provided by the company and available online
Bioinformatics support at
Affymetrix
Projects for now and the near future
Infrastructure build-up
Microarray data management system
Platform for Bioinformatics analyses
Plant Signaling Pathway Database
Team
Thank you!