milova_032405_glass

Microarray experiments:
Database and Analysis Tools.
Kate Milova
cDNA Microarray Facility
March 24, 2005
Kate Milova
MolGen retreat
March 24, 2005
Outline.
• Microarray platforms and services at AECOM:
  - cDNA
  - Long Oligo
  - Affymetrix
• Database (cDNA & Long Oligo) structure and content:
  - Printing information
  - Chip layout
  - Annotation
• Annotation algorithms and data mining
• On-line Analysis Tools:
  - Normalization
  - Signal filtering
  - Data sets comparison
• Statistical packages and Analysis software
• Summary
Microarray Platforms at AECOM.
How to choose a microarray platform.
Before starting your microarray experiment.
cDNA Microarray Facility. Home page.
Standard & Custom Arrays. Description & Prices
Hybridization, labeling, bioinformatics, workshops
Database for cDNA & Long Oligo Arrays.
Analysis Pipeline
AECOM cDNA microarray facility. Supported publications
Useful links to analysis tools
Database for Analysis of Microarrays at AECOM. Contents.

Printing Information
  - Chip name
  - Species
  - Number of spots
  - Number of controls
  - Number of pen domains
  - Number of slides
  - Printing pattern
  - Distance between spots
  - Number of rows
  - Number of columns
  - Printing date
  - Master chip

Chip layout
  - Chip name
  - Spot information (Accession or clone ID or bacterial control)
  - Spot location
  - Library name
  - Clone location on 384 plate
  - Clone location on 96 plate

Gene Annotation
  - Accession
  - Clone ID
  - Clone end
  - Vector name
  - Clone name
  - UniGene cluster ID
  - Best blast hit
  - Main blast parameters (score, E-value, % identity, blast date, etc.)
  - Gene ID
  - Gene symbol
  - Gene synonyms
  - Chromosome
  - Map location
  - GO IDs
  - GO Annotation
Annotation sources: NCBI.
[Diagram: annotation sources at NCBI]
• UniGene: UniGene ID, obtained by accession or by Blast against UniGene clusters
• Entrez Gene: UniGene ID → Gene ID → GO ID
• Blast Software: Blast search against RefSeq & NT databases → Annotation
Annotation sources: NCBI.
• NCBI → UniGene → UniGene ID:
  - UniGene ID for cDNA arrays is obtained from the UniGene source file for each particular accession number of the clone.
• NCBI → UniGene → Blast:
  - UniGene ID for Long Oligo arrays is obtained from blast results.
  - Blast search was done with the set of oligo sequences against UniGene clusters with cutoff 99% for sequence identity and 90% for overlapping.
  - UniGene ID for an oligo hitting multiple UniGene clusters is marked as an “Ambiguous cluster ID”.
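
The cutoff rule for Long Oligo arrays can be illustrated with a minimal sketch. It is not the facility's actual code: the function name, the input layout, the cluster IDs, and the definition of overlap as alignment length divided by oligo length are all assumptions made for illustration. Only the 99% and 90% cutoffs and the “Ambiguous cluster ID” label come from the slide.

```python
from collections import defaultdict

IDENTITY_CUTOFF = 99.0  # percent sequence identity (from the slide)
OVERLAP_CUTOFF = 90.0   # percent overlap (from the slide)
AMBIGUOUS = "Ambiguous cluster ID"


def assign_unigene_ids(hits, oligo_lengths):
    """Map each oligo to one UniGene cluster, or flag it as ambiguous.

    hits: iterable of (oligo_id, cluster_id, percent_identity, alignment_length)
    oligo_lengths: dict of oligo_id -> oligo length in bases
    Overlap is assumed to mean alignment length / oligo length.
    """
    passing = defaultdict(set)
    for oligo_id, cluster_id, identity, aln_len in hits:
        overlap = 100.0 * aln_len / oligo_lengths[oligo_id]
        if identity >= IDENTITY_CUTOFF and overlap >= OVERLAP_CUTOFF:
            passing[oligo_id].add(cluster_id)

    assigned = {}
    for oligo_id in oligo_lengths:
        clusters = passing.get(oligo_id, set())
        if len(clusters) == 1:
            assigned[oligo_id] = next(iter(clusters))
        elif len(clusters) > 1:
            assigned[oligo_id] = AMBIGUOUS
        else:
            assigned[oligo_id] = None  # no cluster passed both cutoffs
    return assigned


# Hypothetical blast hits for four 70-mer oligos.
hits = [
    ("oligo1", "Mm.1001", 100.0, 70),
    ("oligo2", "Mm.2002", 99.5, 68),
    ("oligo2", "Mm.3003", 100.0, 70),
    ("oligo3", "Mm.4004", 97.0, 70),   # fails the identity cutoff
    ("oligo4", "Mm.5005", 100.0, 50),  # fails the overlap cutoff
]
lengths = {"oligo1": 70, "oligo2": 70, "oligo3": 70, "oligo4": 70}
result = assign_unigene_ids(hits, lengths)
```

In this example `oligo2` passes both cutoffs against two different clusters, so it receives the ambiguous label instead of either cluster ID.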
Annotation sources: NCBI.
• UniGene ID → Gene ID:
  - All information retrieved from the ‘Entrez Gene’ project is based on the UniGene cluster ID and the corresponding Gene ID.
  - The ‘Gene ID’ to ‘UniGene cluster ID’ connection can be ambiguous.
  - A parsing filter was used to eliminate ambiguous Gene IDs.
• Gene ID → GO ID:
  - For each Gene ID, the corresponding Gene Ontology IDs were retrieved from the Entrez Gene source file.
  - A Gene ID may have anywhere from a few to more than 10 different GO IDs. All of them are collected.
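
A rough sketch of the two mapping steps follows. The slide does not say how ambiguity is decided, so the rule used here (drop any UniGene cluster linked to more than one Gene ID) is an assumption, as are the function names and the example IDs.

```python
from collections import defaultdict


def unambiguous_gene_ids(unigene_gene_pairs):
    """Return cluster_id -> gene_id for clusters linked to exactly one Gene ID.

    Assumption: a cluster linked to several Gene IDs counts as ambiguous
    and is dropped entirely.
    """
    genes = defaultdict(set)
    for cluster_id, gene_id in unigene_gene_pairs:
        genes[cluster_id].add(gene_id)
    return {c: next(iter(g)) for c, g in genes.items() if len(g) == 1}


def collect_go_ids(gene_go_pairs):
    """Return gene_id -> list of all distinct GO IDs, in first-seen order."""
    go = defaultdict(list)
    for gene_id, go_id in gene_go_pairs:
        if go_id not in go[gene_id]:
            go[gene_id].append(go_id)
    return dict(go)


# Hypothetical cluster and gene IDs; the GO IDs only illustrate the format.
pairs = [("Mm.1001", 111), ("Mm.2002", 222), ("Mm.2002", 333)]
gene_map = unambiguous_gene_ids(pairs)

go_pairs = [(111, "GO:0004713"), (111, "GO:0006468"), (111, "GO:0004713")]
go_map = collect_go_ids(go_pairs)

# Chain the two steps: cluster -> gene -> all GO IDs for that gene.
annotation = {c: go_map.get(g, []) for c, g in gene_map.items()}
```

Keeping every GO ID per gene, rather than picking one, matches the slide's statement that all of them are collected.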
Annotation sources: NCBI.
• The Blast software package is installed on the microarray server.
  - This software lets us format databases and run batch homology searches for any combination of custom databases and query sequences.
• RefSeq & NT databases → Annotation
  - Loaded, formatted and periodically updated on the microarray server.
  - When the databases are updated, we run a blast search of the cDNA and Long Oligo sequences.
  - Blast results are parsed using our algorithm for annotation extraction.
Annotation Extraction Algorithm
[Flow diagram: Raw data (sequences) → Formatted data (database of cDNA & Long Oligo sequences) → Homology search against RefSeq & NT → Alignment quality check (90%, 80%)]
Annotation Extraction Algorithm
[Flow diagram: the 1st RefSeq hit and the 1st NT hit are checked for Overlapping (< 90% or > 90%) and Identity (< 80% or > 80%), with an OUT branch. A Linguistic Filter stage is illustrated by the letter strings EDMUSDFMUSKULUSDETRIKENGLLCLONEJF and FPROTEINRFTYROSINEMNWZMKINASEJHMIW.]
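
The overlap and identity checks on this slide could be applied roughly as in the sketch below. It is an illustration under stated assumptions: the slide does not say which source wins when both hits pass (RefSeq is tried first here), how values exactly at the cutoff are treated, or how the linguistic filter works, so that last stage is left out. The `Hit` type and the function names are hypothetical.

```python
from typing import NamedTuple, Optional, Tuple


class Hit(NamedTuple):
    description: str
    overlap: float   # percent of the query covered by the alignment
    identity: float  # percent sequence identity


MIN_OVERLAP = 90.0   # from the slide
MIN_IDENTITY = 80.0  # from the slide


def passes(hit: Optional[Hit]) -> bool:
    """True if the hit exists and meets both cutoffs (boundary included by assumption)."""
    return (
        hit is not None
        and hit.overlap >= MIN_OVERLAP
        and hit.identity >= MIN_IDENTITY
    )


def extract_annotation(
    first_refseq: Optional[Hit], first_nt: Optional[Hit]
) -> Optional[Tuple[str, str]]:
    """Return (source, description) for the first hit that passes, else None.

    Assumption: the 1st RefSeq hit is preferred over the 1st NT hit.
    """
    for source, hit in (("RefSeq", first_refseq), ("NT", first_nt)):
        if passes(hit):
            return source, hit.description
    return None  # both hits are OUT


good_refseq = Hit("protein tyrosine kinase", 95.0, 85.0)
short_refseq = Hit("protein tyrosine kinase", 85.0, 99.0)  # overlap too low
good_nt = Hit("Mus musculus clone", 98.0, 92.0)

from_refseq = extract_annotation(good_refseq, good_nt)
from_nt = extract_annotation(short_refseq, good_nt)
rejected = extract_annotation(short_refseq, None)
```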
Annotation sources: Gene Ontology.
[Diagram: Gene Ontology → Biological process, Molecular function, Cellular compartment]
• Gene Ontology.
  - Multiple GO IDs for each Gene ID are retrieved in the previous step from Entrez Gene (if available).
  - Gene Ontology annotation for all GO IDs is kept in three different information fields: biological process, molecular function and cellular compartment. For each field, all available annotation was filtered with a redundancy check and concatenated.
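
As a small illustration of the three-field layout, the sketch below groups the GO terms of one gene by category, drops duplicates, and joins what remains. The field names, the `"; "` separator, and the reading of the redundancy check as exact-duplicate removal are assumptions; the slide does not specify them.

```python
FIELDS = ("biological_process", "molecular_function", "cellular_compartment")


def concatenate_go_annotation(terms, sep="; "):
    """Build one concatenated string per GO category for a single Gene ID.

    terms: iterable of (category, term_name) pairs, category being one of FIELDS.
    Exact duplicate term names within a category are kept only once.
    """
    collected = {field: [] for field in FIELDS}
    for category, name in terms:
        if name not in collected[category]:
            collected[category].append(name)
    return {field: sep.join(names) for field, names in collected.items()}


# Illustrative terms for one gene, including a repeated entry.
terms = [
    ("molecular_function", "protein tyrosine kinase activity"),
    ("biological_process", "protein phosphorylation"),
    ("molecular_function", "ATP binding"),
    ("molecular_function", "protein tyrosine kinase activity"),
]
fields = concatenate_go_annotation(terms)
```

A category with no terms ends up as an empty string, which keeps the three fields present for every gene.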
cDNA Microarray Facility. Database.
Database Search.
• Database Annotation. Search with:
  - Accession
  - Gene annotation
  - Gene symbol & synonyms
  - UniGene cluster ID
  - Chromosome number
  - Gene ID
  - GO ID
  - Function
  - Cellular compartment
Microarray Data Analysis Pipeline.
Pipeline: LOWESS Normalization.
Pipeline: LOWESS Normalization.
Pipeline: Filtering.
Pipeline: Data set Comparison.
Statistical packages and Analysis software.
• Microarray Analysis Software:
  - GeneTraffic: client-server system for microarray data analysis. Iobion
  - GeneSpring: cutting-edge tools for expression analysis. Agilent Technologies
  - GeneSifter. GeneSifter
  - BASE. Lund University
• Data Mining:
  - PathwayAssist: Interaction Explore Software. Stratagene
  - Pathways Analysis. Ingenuity
• Tools for Statistical Analysis:
  - SAM: Significance Analysis of Microarrays. Stanford
  - R statistical package
  - S-PLUS. Insightful
Summary
• Multiple microarray platforms are available at AECOM:
  - Affymetrix
  - cDNA arrays
  - Long Oligo
  - Custom arrays
• Data analysis and annotation
  - The Database for Analysis of Microarrays contains all information about our arrays, cDNA and oligo sets.
  - Sequence annotation is updated and integrated into the database.
  - The web interface of the database makes it easy to search for a particular gene, synonyms, map location, function, etc.
  - Easy-to-use web-based analysis pipeline: get your results in just 5 minutes. List of ‘Up’ and ‘Down’ regulated genes with full gene annotation.
• We are here for help and consultation!
BACKUPS
cDNA Microarray Facility. Services.
cDNA Microarray Facility. Arrays.
cDNA Microarray Facility. Publications.
Gene Correspondence Tables.
Gene Correspondence Tables.