Bioinformatics for Microarray Studies

Bioinformatics for Microarray Studies at IBS
Pei-Ing Hwang, Ph.D.
Mar. 24, 2005
3/24/2005
TIGP
1
Different aspects of life science research
genomics
transcriptomics
proteomics
Building blocks for DNA or RNA
DNA: A, T, G, C
RNA: A, U, G, C

DNA: deoxyribonucleic acid
Double stranded
Antiparallel
Why microarray?

Gene Expression
  To study multiple genes simultaneously
  To obtain an overview of gene expression at the transcriptional level under specific experimental conditions
  To study gene interaction networks from the transcriptional aspect
Genome
  SNP detection
  To find recombination sites in the chromosome/genome
  Hopefully, to discover the gene responsible for a genetic disease
Outline
Introduction to Microarray experiments
Experiences at IBS for the cDNA arrays
  Data generated with microarray
  DNA annotation
  Data Analysis
  Data Management
About Microarray Technology-1
Up to hundreds of thousands of spots in a fixed area on a glass slide or a membrane
One species of DNA molecule per spot
  A spot is also called a “feature”
  DNA fixed on the chip or membrane is also called a “probe”
The sequence and/or function of each DNA species on a spot is known.
About Microarray Technology-2

Making use of the “hybridization method”
  A pairs with T or U
  G pairs with C
Image processing
Data analysis
Result interpretation from the biological aspect
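
The pairing rule behind hybridization can be illustrated with a minimal Python sketch. The function names `reverse_complement` and `hybridizes` and the example sequences are hypothetical, and the check assumes perfect complementarity over the whole probe, which real hybridization does not require.

```python
# Watson-Crick pairing for DNA: A pairs with T, G pairs with C.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (strands are antiparallel)."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(seq.upper()))


def hybridizes(probe: str, target: str) -> bool:
    """True if the target contains a stretch perfectly complementary to the probe.

    RNA targets are handled by reading U as T, since A also pairs with U.
    """
    target_as_dna = target.upper().replace("U", "T")
    return reverse_complement(probe) in target_as_dna


probe = "ATGCCG"                  # made-up DNA probe fixed on the array
labeled_target = "UUACGGCAUAA"    # made-up labeled RNA from the sample
print(reverse_complement(probe))
print(hybridizes(probe, labeled_target))
```

A probe on the array only captures labeled molecules that carry its complementary sequence, which is why the identity of each spot must be known in advance.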

Types of Microarray

Types of DNA immobilized on the solid support
  cDNA vs. oligonucleotides
Manufacturing methods
  Printing vs. photolithography
Solid support
  Glass slides
  Membrane
Nucleotide labeling (slide scanning condition)
  One color vs. two colors
GeneChip® Array Manufacturing
Figure 1. Affymetrix uses a unique combination of photolithography and
combinatorial chemistry to manufacture GeneChip® Arrays.
Microarray printing machine
http://arrayit.com/Products/MicroarrayI/NanoPrint/Nano-Print-new-600.jpg
Procedure for one-channel array
Experimental Procedure for 2-channel Microarray
Data Analyses


Image analyses:
  Feature intensity acquisition
To identify differentially expressed genes
  Normalization (global, local, print-tip, between-array, etc.)
  Clustering or Classification
Analyses from biology aspect
  Significant genes
  Transcriptional regulation study
  Cellular pathway or network finding
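
As an illustration of the "global" normalization mentioned above, here is a minimal sketch for a two-channel array. It assumes that global normalization means centering the log2 ratios on their median, which is one common choice and not necessarily the one used at IBS. The names `log_ratios` and `global_median_normalize` and the intensity values are made up for the example.

```python
import math
import statistics


def log_ratios(channel1, channel2):
    """Return log2(channel1 / channel2) for each feature on the array."""
    return [math.log2(a / b) for a, b in zip(channel1, channel2)]


def global_median_normalize(ratios):
    """Shift all log ratios so that their median becomes zero."""
    center = statistics.median(ratios)
    return [m - center for m in ratios]


# Made-up feature intensities for the two labeled samples.
channel1 = [200.0, 400.0, 800.0, 1600.0, 400.0]
channel2 = [100.0, 200.0, 400.0, 200.0, 400.0]

raw = log_ratios(channel1, channel2)
normalized = global_median_normalize(raw)
print(raw)
print(normalized)
```

The sketch rests on the assumption that most genes are not differentially expressed, so a systematic shift in the ratios is treated as a labeling or scanning bias and removed before clustering or classification.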

Experiences at IBS for the cDNA arrays
About IBS tomato arrays
~13,000 spots/features per chip
1 clone per spot
cDNA clones from about a dozen different cDNA libraries
At least two different protocols were followed and six different vectors were used
More than ten technicians involved

Bioinformatics for Microarray at IBS (cont’d)
IBS tomato EST database construction
Installation, management and maintenance of data analysis software
Reference information searching
Batch submission of EST sequences

Bioinformatics Needs for Microarray Studies at IBS
Pre-arraying
  cDNA info collection, vector trimming, sequence annotation, EST submission, etc.
Array data and information management
  Gene set characterization, data storage, data retrieval
Post-hybridization data analysis and management
  Array data analyses, storage of the scanning result, biology-oriented bioinformatics analyses
Bioinformatics Service Work for Microarray Studies at IBS

Data pre-processing for the cDNAs
  Clone id assignment
  Sequence trimming
  Gene annotation
  Function classification

Data sheet preparation for commercial software to analyze microarray data
  GAL file preparation for GenePix Pro
  Master Gene List preparation for GeneSpring
[Workflow diagram: cDNA clones, sequencing, vector trimming, assembly, function annotation and database; PCR; GenePix (feature intensities, normalization); data analysis (normalization, variance, clustering, gene-gene interaction); biological meaning (Spotfire, GeneSpring, pathway analysis, transcription network)]
Pre-array Bioinformatics
1. Clone id generation
2. Vector trimming
3. Sequence assembly
4. Sequence annotation (BLAST)
5. EST submission to NCBI
6. Database construction
[Diagram labels: clones from labs, sequencing, raw EST seq, data processing and management]
Clone id generation
Data centralization following sequencing
Rules for re-arraying
  96-well plate to/from 384-well plate
  PCR from 96-well plates and spotting from 384-well plates
  Order of A1, A2, B1, B2
[Diagram: cDNA clones, sequencing and PCR steps across 96-well and 384-well plates]
96-well to 384-well plates
[Figure: positions A1, A2, B1 and B2]
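
The re-arraying rule can be sketched in Python. This is a minimal sketch that assumes the common interleaved layout, in which four 96-well plates fill the A1, A2, B1 and B2 positions of each 2 x 2 block on the 384-well plate. The slide only names the order A1, A2, B1, B2, so the exact offsets and the function name `to_384` are assumptions.

```python
import string

# Assumed (row, column) offset of each 96-well plate inside a 2 x 2 block.
QUADRANT_OFFSETS = {"A1": (0, 0), "A2": (0, 1), "B1": (1, 0), "B2": (1, 1)}
ROWS_384 = string.ascii_uppercase[:16]  # rows A to P


def to_384(quadrant: str, well96: str) -> str:
    """Map a 96-well position (e.g. 'B3') to its 384-well position."""
    row_offset, col_offset = QUADRANT_OFFSETS[quadrant]
    row96 = ord(well96[0].upper()) - ord("A")  # 0..7 for rows A..H
    col96 = int(well96[1:]) - 1                # 0..11 for columns 1..12
    if not (0 <= row96 < 8 and 0 <= col96 < 12):
        raise ValueError(f"not a 96-well position: {well96}")
    row384 = 2 * row96 + row_offset
    col384 = 2 * col96 + col_offset
    return f"{ROWS_384[row384]}{col384 + 1}"


for quadrant in ("A1", "A2", "B1", "B2"):
    print(quadrant, to_384(quadrant, "A1"), to_384(quadrant, "H12"))
```

Keeping this mapping in one function makes it possible to trace a spotted clone back to its original 96-well plate, which is what clone id generation depends on.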
Data collection

Raw sequencing data obtained from the sequencing company
Both ABI and text files organized and stored by lab and by date
Clone info confirmed with each sequence contributor
Clone ids matched with raw sequences
Processing the sequencing data

cDNA library procedures confirmed with each individual lab
Vector/linker/primer trimming (Seqclean)
Function annotation
  BLAST against different databases
  Gene Ontology annotation
Sequence assembly (Phrap)
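
To show what the trimming step produces, here is a deliberately simplified Python sketch. It removes a known linker from each end by exact matching only. Tools such as Seqclean work by aligning reads against vector sequences, so this is an illustration of the idea and not their method. The function name `trim_ends` and all sequences are made up.

```python
def trim_ends(read: str, five_prime: str, three_prime: str):
    """Strip an exact-match linker from each end of a read.

    Returns the trimmed sequence plus the start and end coordinates of the
    retained region, so the trimming can be recorded alongside the sequence.
    """
    seq = read.upper()
    start, end = 0, len(seq)
    if five_prime and seq.startswith(five_prime.upper()):
        start = len(five_prime)
    if three_prime and seq.endswith(three_prime.upper()):
        end -= len(three_prime)
    end = max(end, start)  # guard against linkers that overlap in short reads
    return seq[start:end], start, end


raw_read = "GAATTCGGATGCCGTAACTCGAG"  # made-up read with linkers at both ends
trimmed, start, end = trim_ends(raw_read, "GAATTC", "CTCGAG")
print(trimmed, start, end)
```

Returning the coordinates as well as the sequence matters here because the libraries were built with different vectors, so each read needs a record of which trimming was applied.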
Procedure to generate cDNA clones
IBS tomato EST Database
Cloning information
Sequencing data
Vector/adaptor trimming information
EST assembly
Function annotation
Cross reference

The Tomato Database
Entity-Relationship model
[Entity-relationship diagram. Entities and attributes:]
Untrimmed Sequence: 1. Seq id; 2. Trimmed Sequence
Trimmed Sequence: 1. Seq id; 2. Trimmed Sequence; 3. Method; 4. Trim set
Assembly Information: 1. Contig_id; 2. Contig Sequence; 3. BLAST Result; 4. Position; 5. Component seq id
TAIR Result: 1. Seq id; 2. At number; 3. E-Value; 4. Description; 5. Identity; 6. Other result
NCBI BLAST Result: 1. Seq id; 2. NCBI_id; 3. E-Value; 4. Description; 5. Identity; 6. Other result
TIGR Result: 1. Seq id; 2. TC number; 3. E-Value; 4. Description; 5. Identity; 6. Other result
Lab info: 1. Seq id; 2. Comment; 3. Primer; 4. Biotech; 5. Sender; 6. Collect From
ID MAP: 1. Seq id; 2. Clone_id; 3. Contig id; 4. Lab_id#1; 5. Lab_id#2; 6. NCBI_sbmt_id93; 7. NCBI_sbmt_id94; 8. dbEST_accn_no; 9. note
cDNA Library Information: 1. Clone_id(3)(4); 2. Name; 3. Date made; 4. Developmental stage; 5. Cloning sites; 6. Description; 7. Library; 8. Host; 9. Species; 10. Vector; 11. Antibiotic; 12. Authors; 13. Tissue; 14. Primer
Gene Ontology: 1. TC number; 2. EC number; 3. Process (GO_id, Description); 4. Function (GO_id, Description); 5. Component (GO_id, Description)
Relationship keys: Seq_id, Clone_id, TC number (1-to-n relationships)
Other diagram labels: TOM 3, TOM 4
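
A small part of such an entity-relationship model can be sketched with Python's built-in `sqlite3` module. This is a hedged illustration only: the table and column names are simplified from the slide, the 1-to-n link from clone to sequences is an assumption about the diagram, and all rows are made up. The slides do not say which database system the tomato EST database actually uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE cdna_library (
        clone_id TEXT PRIMARY KEY,
        name     TEXT,
        vector   TEXT,
        tissue   TEXT
    );
    -- One clone may have several sequences (assumed 1-to-n relationship).
    CREATE TABLE id_map (
        seq_id        TEXT PRIMARY KEY,
        clone_id      TEXT NOT NULL REFERENCES cdna_library(clone_id),
        contig_id     TEXT,
        dbest_accn_no TEXT
    );
    CREATE TABLE trimmed_sequence (
        seq_id           TEXT PRIMARY KEY REFERENCES id_map(seq_id),
        trimmed_sequence TEXT,
        method           TEXT
    );
    """
)

# Made-up example rows.
conn.execute(
    "INSERT INTO cdna_library VALUES (?, ?, ?, ?)",
    ("C0001", "example library", "example vector", "fruit"),
)
conn.executemany(
    "INSERT INTO id_map VALUES (?, ?, ?, ?)",
    [("S0001", "C0001", "CTG1", None), ("S0002", "C0001", "CTG1", None)],
)

rows = conn.execute(
    """
    SELECT l.clone_id, COUNT(m.seq_id)
    FROM cdna_library AS l
    JOIN id_map AS m ON m.clone_id = l.clone_id
    GROUP BY l.clone_id
    """
).fetchall()
print(rows)
```

An ID map table of this kind is what lets lab ids, clone ids, contig ids and submission ids be cross-referenced without duplicating sequence data.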
Information to be further analyzed

Gene set characterization
  Number of unique genes on the array
  Number of known/unknown genes
Coordinates of each spotted sequence
Statistics about spotted cDNA
  grouped by function/pathway
  grouped by sequence similarity
Post-hybridization data analysis and management
Post-hybridization data analysis

Software for Microarray Analysis at IBS
  GenePix Pro 5.0 – image processing
  GeneSpring – microarray data analysis
  Spotfire – microarray data analysis and data storage
  TransPath – pathway searching
Image Processing
GenePix Pro 5.0
GAL (GenePix Array List) file

From multi-well plate to microarray
GAL online
GeneSpring at IBS








for microarray data analyses
standalone software
providing statistical methods for data analysis
providing some bioinformatics tools
providing visualization
licensed annually
rigid format requirement for input data
requiring installation of a master gene list (master table) prior to data analysis
Master table for GeneSpring

Master table contains information on
  Id
  Source of DNA
  Gene name
  Gene function annotation (from BLAST results)
  GO annotation
Each array needs its own master table
Format of master table may vary with different versions of the software.

To generate master table for GeneSpring
Batch BLAST against three sequence databases
Parsing BLAST results
Incorporating EC number, GO number and other related data from the best BLAST matched results
Integrating all required data from various files and generating the master table
Checking
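
The parsing and integration steps above can be sketched in Python. This is a minimal sketch assuming BLAST results in the standard 12-column tabular format and assuming "best match" means the lowest E-value, with ties broken by bit score. The function name `best_hits`, the `annotations` lookup, and all ids and values are hypothetical, and the real GeneSpring master table columns are not reproduced.

```python
import csv
import io

# Column order of the standard BLAST tabular output.
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_hits(handle):
    """Return the best hit per query: lowest E-value, then highest bit score."""
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(BLAST_COLUMNS, row))
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        current = best.get(hit["qseqid"])
        if current is None or (hit["evalue"], -hit["bitscore"]) < (
            current["evalue"], -current["bitscore"]
        ):
            best[hit["qseqid"]] = hit
    return best


# Made-up BLAST rows for two spotted sequences.
rows = [
    ["S0001", "subjA", "90.0", "100", "10", "0", "1", "100", "1", "100", "1e-20", "80.0"],
    ["S0001", "subjB", "98.0", "100", "2", "0", "1", "100", "1", "100", "1e-50", "180.0"],
    ["S0002", "subjC", "85.0", "100", "15", "0", "1", "100", "1", "100", "1e-10", "60.0"],
]
blast_text = "\n".join("\t".join(r) for r in rows)

# Hypothetical annotation lookup keyed by the matched database entry.
annotations = {"subjB": ("example gene B", "GO_placeholder_1")}

best = best_hits(io.StringIO(blast_text))
master_table = []
for seq_id in sorted(best):
    hit = best[seq_id]
    gene_name, go_term = annotations.get(hit["sseqid"], ("unknown", ""))
    master_table.append((seq_id, hit["sseqid"], gene_name, go_term))

for line in master_table:
    print("\t".join(line))
```

Sequences without an annotated match are kept with an "unknown" gene name so that every spot on the array still has a row, and the final checking step can then look for missing or duplicated ids.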

Spotfire







for microarray data analyses
client-server software
linked to Oracle database for data storage
providing various statistical methods for data analysis
able to establish links to more bioinformatics tools
can record analysis procedure
more flexible format requirement for input data
One color array for Arabidopsis
Affymetrix ATH1 chip
Annotation information provided by the company and available on the internet

Bioinformatics support at Affymetrix
Projects for now and the near future
Infrastructure build-up
Microarray data management system
Platform for Bioinformatics analyses
Plant Signaling Pathway Database

Team
Thank you!