Microarray experiments:
Database and Analysis Tools.
Kate Milova
cDNA Microarray Facility
March 24, 2005
Kate Milova
MolGen retreat
March 24, 2005
Outline.
Microarray platforms and services at AECOM:
cDNA
Long Oligo
Affymetrix
Database (cDNA & Long Oligo) structure and content:
Printing information
Chip layout
Annotation
Annotation algorithms and data mining
On-line Analysis Tools:
Normalization
Signal filtering
Data sets comparison
Statistical packages and Analysis software
Summary
Microarray Platforms at AECOM.
How to choose a microarray platform.
Before starting your microarray experiment.
cDNA Microarray Facility. Home page.
Standard & Custom Arrays. Description & Prices
Hybridization, labeling, bioinformatics, workshops
Database for cDNA & Long Oligo Arrays.
Analysis Pipeline
AECOM cDNA microarray facility. Supported publications
Useful links to analysis tools
Database for Analysis of Microarrays at AECOM. Contents.
Printing Information:
Chip name
Species
Number of spots
Number of controls
Number of pen domains
Number of slides
Printing pattern
Distance between spots
Number of rows
Number of columns
Printing date
Master chip
Chip layout:
Chip name
Spot information (Accession or clone id or bacterial control)
Spot location
Library name
Clone location on 384 plate
Clone location on 96 plate
Gene Annotation:
Accession
Clone ID
Clone end
Vector name
Clone name
UniGene cluster ID
Best blast hit
Main blast parameters (score, E-value, % identity, blast date, etc.)
Gene ID
Gene symbol
Gene synonyms
Chromosome
Map location
GO IDs
GO Annotation
Annotation sources: NCBI.
[Diagram: annotation flow from NCBI sources. UniGene: Accession → UniGene ID, or UniGene ID by Blast against UniGene clusters. Entrez Gene: UniGene ID → Gene ID → GO ID. Blast Software: Blast Search against RefSeq & NT databases → Annotation.]
Annotation sources: NCBI.
NCBI UniGene, UniGene ID:
The UniGene ID for cDNA arrays is obtained from the UniGene source file, using the accession number of each clone.
NCBI UniGene, Blast:
The UniGene ID for Long Oligo arrays is obtained from blast results.
The blast search was run with the set of oligo sequences against UniGene clusters, with cutoffs of 99% sequence identity and 90% overlap.
An oligo that hits multiple UniGene clusters is marked with an “Ambiguous cluster ID”.
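The Long Oligo rule on this slide can be sketched in a few lines of Python. This is only an illustration, not the facility's code: the `ClusterHit` record, the function name, and the definition of overlap as aligned length divided by oligo length are assumptions; the 99% and 90% cutoffs and the “Ambiguous cluster ID” label come from the slide.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

AMBIGUOUS = "Ambiguous cluster ID"  # label used on the slide


@dataclass(frozen=True)
class ClusterHit:
    """One blast hit of an oligo against a UniGene cluster (hypothetical record)."""
    cluster_id: str      # UniGene cluster ID of the subject sequence
    identity_pct: float  # percent identity over the aligned region
    align_len: int       # alignment length in bases


def assign_unigene_id(oligo_len: int,
                      hits: Iterable[ClusterHit],
                      min_identity: float = 99.0,
                      min_overlap: float = 90.0) -> Optional[str]:
    """Return the cluster ID, the ambiguity label, or None if nothing passes."""
    passing = set()
    for hit in hits:
        # Assumption: overlap is the aligned fraction of the oligo, in percent.
        overlap_pct = 100.0 * hit.align_len / oligo_len
        if hit.identity_pct >= min_identity and overlap_pct >= min_overlap:
            passing.add(hit.cluster_id)
    if not passing:
        return None
    if len(passing) > 1:
        return AMBIGUOUS
    return next(iter(passing))
```

Several hits to the same cluster count once, so only hits to different clusters trigger the ambiguity label.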
Annotation sources: NCBI.
[Diagram: NCBI Entrez Gene: UniGene ID → Gene ID → GO ID]
UniGene ID → Gene ID:
All information retrieved from the ‘Entrez Gene’ project is based on the UniGene cluster ID and the corresponding Gene ID.
The ‘Gene ID’ to ‘UniGene cluster ID’ mapping can be ambiguous.
A parsing filter was used to eliminate ambiguous Gene IDs.
Gene ID → GO ID:
For each Gene ID, the corresponding Gene Ontology IDs were retrieved from the Entrez Gene source file.
A Gene ID may have only a few GO IDs or more than 10. All of them are collected.
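The two mapping steps above could look roughly like the sketch below. The slide does not say how the parsing filter decides that a Gene ID is ambiguous, so the rule used here (keep only one-to-one cluster and gene pairs) is an assumption, as are the function names and the input format of plain ID pairs.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def unambiguous_gene_ids(pairs: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map UniGene cluster ID -> Gene ID, keeping only one-to-one links.

    Assumed rule: a link is dropped when the cluster points to several
    Gene IDs or the Gene ID points to several clusters.
    """
    genes_by_cluster = defaultdict(set)
    clusters_by_gene = defaultdict(set)
    for cluster_id, gene_id in pairs:
        genes_by_cluster[cluster_id].add(gene_id)
        clusters_by_gene[gene_id].add(cluster_id)

    kept = {}
    for cluster_id, genes in genes_by_cluster.items():
        if len(genes) != 1:
            continue
        gene_id = next(iter(genes))
        if len(clusters_by_gene[gene_id]) == 1:
            kept[cluster_id] = gene_id
    return kept


def collect_go_ids(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Collect every GO ID listed for each Gene ID, without repeats."""
    go_by_gene: Dict[str, List[str]] = defaultdict(list)
    for gene_id, go_id in pairs:
        if go_id not in go_by_gene[gene_id]:
            go_by_gene[gene_id].append(go_id)
    return dict(go_by_gene)
```

No cap is placed on the number of GO IDs per gene, matching the slide's statement that all of them are collected.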
Annotation sources: NCBI.
[Diagram: Blast Software → Blast Search against RefSeq & NT databases → Annotation]
The Blast software package is installed on the microarray server.
It lets us format databases and run batch homology searches for any combination of custom databases and query sequences.
The RefSeq & NT databases from NCBI are loaded, formatted, and periodically updated on the microarray server.
When the databases are updated, we run a blast search of the cDNA and Long Oligo sequences.
Blast results are parsed with our algorithm for annotation extraction.
Annotation Extraction Algorithm
[Diagram: Raw Data (Sequences) → Formatted Data (Database of cDNA & Long Oligo sequences) → Homology search against RefSeq & NT → Alignment quality check (90%, 80%)]
Annotation Extraction Algorithm
[Diagram: an Overlapping check (< 90% / > 90%, with an OUT branch) and an Identity check (< 80% / > 80%) applied to the 1st RefSeq hit and the 1st NT hit, followed by a Linguistic Filter illustrated with the letter strings below.]
EDMUSDFMUSKULUSDETRIKENGLLCLONEJF
FPROTEINRFTYROSINEMNWZMKINASEJHMIW
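One possible reading of the quality check in this diagram is sketched below in Python. The 90% overlap and 80% identity thresholds are from the slides; everything else is an assumption made for illustration: the `Hit` record, the function names, overlap computed as aligned length over query length, and the choice to try the first RefSeq hit before the first NT hit. The Linguistic Filter is not described in enough detail to sketch and is left out.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Hit:
    """One parsed blast hit (hypothetical record, not a real parser's type)."""
    subject_id: str      # identifier of the database sequence that was hit
    description: str     # description line of the hit
    identity_pct: float  # percent identity over the aligned region
    align_len: int       # alignment length in bases


def passes_quality(hit: Hit, query_len: int,
                   min_overlap: float = 90.0,
                   min_identity: float = 80.0) -> bool:
    """Apply the overlap and identity thresholds named on the slides."""
    overlap_pct = 100.0 * hit.align_len / query_len  # assumed definition
    return overlap_pct >= min_overlap and hit.identity_pct >= min_identity


def pick_annotation(query_len: int,
                    refseq_hits: Sequence[Hit],
                    nt_hits: Sequence[Hit]) -> Optional[Hit]:
    """Return the first RefSeq hit if it passes, else the first NT hit, else None.

    The RefSeq-before-NT order is an assumption about the diagram.
    """
    for hits in (refseq_hits, nt_hits):
        if hits and passes_quality(hits[0], query_len):
            return hits[0]
    return None
```

Only the top hit from each database is examined, since the diagram refers to the 1st RefSeq hit and the 1st NT hit.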
Annotation sources: Gene Ontology.
[Diagram: Gene Ontology branches into Biological process, Molecular function, and Cellular compartment]
Gene Ontology.
Multiple GO IDs for each Gene ID are retrieved in the previous step from Entrez Gene (if available).
Gene Ontology annotation for all GO IDs is kept in three separate information fields: biological process, molecular function, and cellular compartment. For each field, all available annotation is checked for redundancy and then concatenated.
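The redundancy check and concatenation described above might be implemented along these lines. The three field names follow the slide; the function name, the input format of (category, text) pairs, and the "; " separator are assumptions for the sake of the sketch.

```python
from typing import Dict, Iterable, List, Tuple

GO_FIELDS = ("biological process", "molecular function", "cellular compartment")


def concatenate_go_annotation(terms: Iterable[Tuple[str, str]],
                              sep: str = "; ") -> Dict[str, str]:
    """Build one text field per GO category from (category, annotation) pairs.

    Repeated annotation text within a category is kept only once, in
    first-seen order, and the remaining texts are joined with `sep`.
    """
    buckets: Dict[str, List[str]] = {name: [] for name in GO_FIELDS}
    for category, text in terms:
        bucket = buckets[category]
        if text not in bucket:  # redundancy check
            bucket.append(text)
    return {name: sep.join(texts) for name, texts in buckets.items()}
```

A category with no annotation yields an empty string, which corresponds to the "if available" case on the slide.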
cDNA Microarray Facility. Database.
Database Search.
Database Annotation
Search with:
Accession
Gene annotation
Gene symbol & synonyms
UniGene cluster ID
Chromosome number
Gene ID
GO ID
Function
Cellular compartment
Microarray Data Analysis Pipeline.
Pipeline: LOWESS Normalization.
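The screenshot for this slide is not preserved. As a rough illustration of what LOWESS normalization of two-channel data usually involves, the Python sketch below fits a local linear trend of M = log2(R/G) against A = 0.5 * log2(R*G) and subtracts it. It is not the pipeline's code: the function names, the `frac` default, and the omission of robustness iterations are all assumptions.

```python
import math
from typing import List, Sequence


def _tricube(u: float) -> float:
    """Tricube kernel weight for a scaled distance u."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def lowess_fit(x: Sequence[float], y: Sequence[float], frac: float = 0.4) -> List[float]:
    """Locally weighted linear fit of y on x, evaluated at each x (no robust passes)."""
    n = len(x)
    k = min(n, max(3, math.ceil(frac * n)))  # neighbours used per local fit
    fitted = []
    for xi in x:
        dists = sorted(abs(xj - xi) for xj in x)
        h = dists[k - 1] or 1e-12  # bandwidth: distance to the k-th neighbour
        w = [_tricube((xj - xi) / h) for xj in x]
        sw = sum(w)
        mx = sum(wi * xj for wi, xj in zip(w, x)) / sw
        my = sum(wi * yj for wi, yj in zip(w, y)) / sw
        sxx = sum(wi * (xj - mx) ** 2 for wi, xj in zip(w, x))
        sxy = sum(wi * (xj - mx) * (yj - my) for wi, xj, yj in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (xi - mx))
    return fitted


def lowess_normalize(red: Sequence[float], green: Sequence[float],
                     frac: float = 0.4) -> List[float]:
    """Return log2 ratios with the intensity-dependent trend removed."""
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    trend = lowess_fit(a, m, frac)
    return [mi - ti for mi, ti in zip(m, trend)]
```

Intensities are assumed to be positive and already background-corrected; spots with zero or negative signal would have to be filtered out before the logarithms are taken.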
Pipeline: Filtering.
Pipeline: Data set Comparison.
Statistical packages and Analysis software.
Microarray Analysis Software:
GeneTraffic: client-server systems for microarray data analysis. Iobion
GeneSpring: cutting-edge tools for expression analysis. Agilent Technologies
GeneSifter. GeneSifter
BASE. Lund University
Data Mining:
PathwayAssist: Interaction Explore Software. Stratagene
Pathways Analysis. Ingenuity
Tools for Statistical Analysis:
SAM: Significance Analysis of Microarrays. Stanford
R statistical package
S-PLUS. Insightful
Summary
Multiple microarray platforms are available at AECOM:
Affymetrix
cDNA arrays
Long Oligo
Custom arrays
Data analysis and annotation
The Database for Analysis of Microarrays contains all information about our arrays and our cDNA and oligo sets
Sequence annotation is updated and integrated into the database
The web interface of the database makes it easy to search for a particular gene, synonyms, map location, function, etc.
Easy-to-use web-based analysis pipeline: get your results in just 5 minutes, as lists of ‘Up’ and ‘Down’ regulated genes with full gene annotation.
We are here for help and consultation!
BACKUPS
cDNA Microarray Facility. Services.
cDNA Microarray Facility. Arrays.
cDNA Microarray Facility. Publications.
Gene Correspondence Tables.