Transcript slides

Probe selection for Microarrays

Considerations and pitfalls
Swiss Institute of Bioinformatics
Institut Suisse de Bioinformatique
LF-2001.11
Probe selection wish list
  Probe selection strategy should ensure
    Biologically meaningful results (The truth...)
    Coverage, Sensitivity (... The whole truth...)
    Specificity (... And nothing but the truth)
    Annotation
    Reproducibility
Technology
  Probe immobilization
    Oligonucleotide coupling
      Synthesis with linker, covalent coupling to surface
    Oligonucleotide photolithography
    ds-cDNA coupling
      cDNA generated by PCR, nonspecific binding to surface
    ss-cDNA coupling
      PCR with one modified primer, covalent coupling, 2nd strand removal
  Spotting
    With contact (pin-based systems)
    Without contact (ink jet technology)
Technology-specific requirements
  General
    Not too short (sensitivity, selectivity)
    Not too long (viscosity, surface properties)
    Not too heterogeneous (robustness)
    Degree of importance depends on method
  Single strand methods (Oligos, ss-cDNA)
    Orientation must be known
    ss-cDNA methods are not perfect
    ds-cDNA methods don’t care
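As a rough illustration of the general requirements above, the following sketch screens candidate probes by length and by how far their GC content deviates from the median of the set. The function names and the thresholds (`min_len`, `max_len`, `max_gc_spread`) are hypothetical and do not come from the slides; sensible values depend on the immobilization method.

```python
from statistics import median


def gc_fraction(seq):
    """Fraction of G and C bases in a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def filter_probes(probes, min_len=200, max_len=1500, max_gc_spread=0.10):
    """Keep probes inside a length window and close to the median GC content.

    probes: dict mapping probe name -> sequence.
    All thresholds are illustrative assumptions, not recommended values.
    """
    # "Not too short" and "not too long".
    sized = {name: seq for name, seq in probes.items()
             if min_len <= len(seq) <= max_len}
    if not sized:
        return {}
    # "Not too heterogeneous": drop GC outliers relative to the set median.
    mid = median(gc_fraction(s) for s in sized.values())
    return {name: seq for name, seq in sized.items()
            if abs(gc_fraction(seq) - mid) <= max_gc_spread}
```

GC content is used here only as a simple stand-in for heterogeneity; other measures, such as length spread, could be added in the same way.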
Probe selection approaches
[Figure: selection approaches arranged between two axes, Accuracy and Throughput. Labels: Selected Gene Regions, Selected Genes, Cluster Representatives, ESTs, Anonymous]
Non-Selective Approaches
  Anonymous (blind) spotting
    Using clones from a library without prior sequencing
    Only clones with an interesting expression pattern are sequenced
    Normalization of the library highly recommended
    Typical uses:
      HT-arrays of ‘exotic’ organisms or tissues
      Large-scale verification of Differential Display clones
  EST spotting
    Using clones from a library after sequencing
    Little justification, since sequence availability allows selection
Spotting of cluster representatives
  Sequence Clustering
    For human/mouse/rat EST clones: public cluster libraries
      UniGene (NCBI)
      THC (TIGR)
    For custom sequence: clustering tools
      STACK_PACK (SANBI)
      JESAM (HGMP)
      PCP (Paracel, commercial)
A benign clustering situation
In the absence of 5'-3' links
Two clusters corresponding to one gene
Overlap too short
Three clusters corresponding to one gene
Chimeric ESTs
One cluster corresponding to two genes
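The clustering situations on these slides can be reproduced with a toy single-linkage clustering, sketched below under simplifying assumptions: overlaps between ESTs are given as a precomputed list of pairs, and any overlap joins two ESTs into the same cluster. The function name and the EST identifiers are hypothetical; real clustering tools apply more careful criteria.

```python
def cluster_ests(est_ids, overlaps):
    """Single-linkage clustering: ESTs joined by any overlap share a cluster.

    est_ids: iterable of EST identifiers.
    overlaps: iterable of (est_a, est_b) pairs with a sufficient overlap.
    Returns a list of sets, one per cluster.
    """
    est_ids = list(est_ids)
    parent = {e: e for e in est_ids}

    def find(x):
        # Follow parent links to the cluster root, shortening the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in overlaps:
        parent[find(a)] = find(b)

    clusters = {}
    for e in est_ids:
        clusters.setdefault(find(e), set()).add(e)
    return sorted(clusters.values(), key=lambda c: sorted(c))


# Two genes, two ESTs each: two clusters, as expected.
ests = ["a1", "a2", "b1", "b2"]
clean = cluster_ests(ests, [("a1", "a2"), ("b1", "b2")])

# One chimeric EST overlapping both genes merges everything into one cluster.
merged = cluster_ests(ests + ["chim"],
                      [("a1", "a2"), ("b1", "b2"),
                       ("a2", "chim"), ("chim", "b1")])
```

The opposite failure follows from the same logic: if no pair links the 5' ESTs of a gene to its 3' ESTs, the gene ends up split across two clusters.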
Chimeric ESTs ... continued
  Chimeric ESTs are quite common
  Chimeric ESTs are a major nuisance for array probe selection
  One of the fusion partners is usually a highly expressed mRNA
  Double-picking of chimeric ESTs can fool even cautious clustering programs
  UniGene contains several chimeric clusters
  The annotation of chimeric clusters is erratic
  Chimeric ESTs can be detected by genome comparison

  There is one particularly bad class of chimeric sequences that will be the subject of the exercises.
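Detection by genome comparison can be sketched as a consistency check on where the aligned segments of one EST land in the genome. The input format (one tuple per aligned segment) and the `max_gap` threshold are assumptions made for this illustration; in practice the segments would come from an alignment program, and the threshold would need tuning to the expected intron sizes.

```python
def looks_chimeric(blocks, max_gap=500_000):
    """Return True if an EST's aligned segments do not fit a single locus.

    blocks: list of (est_start, chrom, strand, genome_start) tuples,
            one per aligned segment of the EST (hypothetical format).
    max_gap: largest genomic distance between consecutive segments that is
             still accepted as one gene (illustrative value).
    """
    ordered = sorted(blocks)  # order the segments along the EST
    for (_, chrom1, strand1, pos1), (_, chrom2, strand2, pos2) in zip(ordered, ordered[1:]):
        if chrom1 != chrom2 or strand1 != strand2:
            return True  # neighbouring segments on different chromosomes or strands
        if abs(pos2 - pos1) > max_gap:
            return True  # same chromosome, but implausibly far apart
    return False
```

A check like this flags candidates for closer inspection; it cannot by itself distinguish a chimeric clone from a genome assembly error.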
How to select a cluster representative
  If possible, pick a clone with completely known sequence
  Avoid problematic regions
    Alu repeats, B1, B2 and other SINEs
    LINEs
    Endogenous retroviruses
    Microsatellite repeats
  Avoid regions with high similarity to non-identical sequences
  In many clusters, orientation and position relative to ORF are unknown and cannot be selected for
  Test selected clone for sequence correctness
  Test selected clone for chimerism
  Some commercial providers offer sequence-verified UniGene cluster representatives
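Of the problematic regions listed above, microsatellites are simple enough to find with a regular expression; the sketch below masks them with N. The function name and the `min_copies` threshold are hypothetical. Interspersed repeats such as SINEs, LINEs and endogenous retroviruses cannot be found this way and need a repeat library with a dedicated masking tool.

```python
import re


def mask_microsatellites(seq, min_copies=6):
    """Replace tandem repeats of 1-6 bp units with N.

    min_copies: minimum number of consecutive unit copies that counts as a
                microsatellite (illustrative threshold).
    """
    # A unit of 1-6 bases followed by at least (min_copies - 1) further copies.
    pattern = re.compile(r"([ACGT]{1,6}?)\1{%d,}" % (min_copies - 1))
    return pattern.sub(lambda m: "N" * len(m.group(0)), seq.upper())


# A (CA)10 repeat embedded between two unique flanks.
example = "ACGTTGAC" + "CA" * 10 + "GGATCCTA"
masked = mask_microsatellites(example)
```

Masking with N keeps the sequence coordinates unchanged, so masked positions can still be related back to the original clone.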
Selection of genes
  If possible, use all of them
  Biased selection
    Selection by tissue
    Selection by topic
    Selection by visibility
    Selection by known expression properties
  Selection from unbiased pre-screen
  Use sources of expression information
    EST frequency
    Published array studies
    SAGE data
Selection of gene regions
[Figure: transcript regions 5' UTR, ORF, 3' UTR]
Alternative polyadenylation
  Constitutive polyA heterogeneity
    3'-fragments: reduced sensitivity
    No impact on expression ratio
  Regulated polyA heterogeneity
    Fragment choice influences expression ratio
    Multiple fragments necessary
  Detection of cryptic polyA signals
    Prediction (AATAAA)
    Polyadenylated ESTs
    SAGE tags
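The prediction route can be sketched as a plain search for the AATAAA hexamer in a 3' UTR. The function name is hypothetical, and the example sequence is made up for illustration. A hit is only a candidate: the hexamer also occurs by chance, so predictions are best cross-checked against polyadenylated ESTs or SAGE tags.

```python
def find_polya_signals(utr, signal="AATAAA"):
    """Return the 0-based start position of every occurrence of the hexamer."""
    utr = utr.upper()
    positions = []
    start = utr.find(signal)
    while start != -1:
        positions.append(start)
        # Continue one base further so overlapping hits are not missed.
        start = utr.find(signal, start + 1)
    return positions


# Made-up 3' UTR with two signals; more than one hit hints at alternative polyA sites.
utr = "GCTAGC" + "AATAAA" + "CCGGTT" * 3 + "AATAAA" + "GCA"
hits = find_polya_signals(utr)  # [6, 30]
```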
Alternative splicing
  Constitutive splice form heterogeneity
    Fragment in alternative exon: reduced sensitivity
    No impact on expression ratio
  Regulated splice form heterogeneity
    Fragment choice influences expression ratio
    Multiple fragments necessary
  Detection of alternative splicing events
    Hard/Impossible to predict
    EST analysis (beware of pre-mRNA)
    Literature
Alternative promoter usage
  What is the desired readout?
    If promoter activity matters most: multiple fragments
    If overall mRNA level matters most: downstream fragment
  Detection of alternative promoter usage
    Prediction difficult (possible?)
    EST analysis
    Literature
UDP-Glucuronosyltransferases
[Figure labels: UGT1A8, UGT1A7]
Selection of gene regions
  Coding region (ORF)
    Annotation relatively safe
    No problems with alternative polyA sites
    No repetitive elements or other funny sequences
    Danger of close isoforms
    Danger of alternative splicing
    Might be missing in short RT products
  3' untranslated region
    Annotation less safe
    Danger of alternative polyA sites
    Danger of repetitive elements
    Less likely to cross-hybridize with isoforms
    Little danger of alternative splicing
  5' untranslated region
    Close linkage to promoter
    Frequently not available
A checklist
  Pick a gene
  Try to get a complete cDNA sequence
  Verify sequence architecture (e.g. cross-species comparison)
  Mask repetitive elements (and vector!)
  If possible, discard 3'-UTR beyond first polyA signal
  Look for alternative splice events
  Use remaining region of interest for similarity searches
  Mask regions that could cross-hybridize
  Use the remaining region for probe amplification or EST selection
  When working with ESTs, use sequence-verified clones
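Two of the checklist steps lend themselves to a short sketch: discarding the 3'-UTR beyond the first polyA signal, and picking the longest stretch that survives masking. The function names, the `downstream` margin and the assumption that masked bases are written as N or X (hard masking rather than lowercase) are all choices made for this illustration.

```python
import re


def trim_at_first_polya(cdna, utr_start, signal="AATAAA", downstream=30):
    """Discard 3'-UTR sequence beyond the first polyA signal.

    utr_start: 0-based index where the 3' UTR begins.
    downstream: bases kept after the signal (illustrative value).
    Returns the sequence unchanged if no signal is found in the UTR.
    """
    hit = cdna.upper().find(signal, utr_start)
    if hit == -1:
        return cdna
    return cdna[:hit + len(signal) + downstream]


def longest_unmasked(masked_seq):
    """Return (start, end) of the longest stretch free of masked bases.

    Assumes hard masking: masked positions are N or X, everything else is ACGT.
    """
    best = (0, 0)
    for m in re.finditer(r"[ACGT]+", masked_seq.upper()):
        if m.end() - m.start() > best[1] - best[0]:
            best = (m.start(), m.end())
    return best


# Made-up cDNA: short ORF, stop codon, then a 3' UTR with one polyA signal.
cdna = "ATG" + "C" * 20 + "TAA" + "GGG" + "AATAAA" + "T" * 50
trimmed = trim_at_first_polya(cdna, utr_start=26)
region = longest_unmasked("ACGTNNNNACGTACGTNNAC")  # (8, 16)
```

The returned coordinates are half-open, so `masked_seq[start:end]` gives the candidate region for probe amplification or EST selection.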