Transcript slides

Probe Selection for Microarrays
Considerations and Pitfalls
Kay Hofmann
MEMOREC Stoffel GmbH
Cologne/Germany
Probe selection wish list
Probe selection strategy should ensure
 Biologically meaningful results (The truth...)
 Coverage, Sensitivity (... The whole truth...)
 Specificity (... And nothing but the truth)
 Annotation
 Reproducibility
Technology
Probe immobilization
 Oligonucleotide coupling
Synthesis with linker, covalent coupling to surface
 Oligonucleotide photolithography
 ds-cDNA coupling
cDNA generated by PCR, nonspecific binding to surface
 ss-cDNA coupling
PCR with one modified primer, covalent coupling, 2nd strand removal
Spotting
 With contact (pin-based systems)
 Without contact (ink jet technology)
Technology-specific requirements
General
 Not too short (sensitivity, selectivity)
 Not too long (viscosity, surface properties)
 Not too heterogeneous (robustness)
 Degree of importance depends on method
Single strand methods (Oligos, ss-cDNA)
 Orientation must be known
 ss-cDNA methods are not perfect
 ds-cDNA methods don’t care
Probe selection approaches
[Figure: selection approaches plotted against Accuracy and Throughput; labels: Selected Genes, Selected Gene Regions, ESTs, Cluster Representatives, Anonymous]
Non-Selective Approaches
Anonymous (blind) spotting
 Using clones from a library without prior sequencing
 Only clones with interesting expression pattern are sequenced
 Normalization of library highly recommended
 Typical uses:
 HT-arrays of ‘exotic’ organisms or tissues
 Large-scale verification of DD clones
EST spotting
 Using clones from a library after sequencing
 Little justification, since sequence availability allows selection
Spotting of cluster representatives
Sequence Clustering
 For human / mouse / rat EST clones: public cluster libraries
 Unigene (NCBI)
 THC (TIGR)
 For custom sequences: clustering tools
 STACK_PACK (SANBI)
 JESAM (HGMP)
 PCP (Paracel, commercial)
A benign clustering situation
In the absence of 5’-3’ links
Two clusters corresponding to one gene
Overlap too short
Three clusters corresponding to one gene
Chimeric ESTs
One cluster corresponding to two genes
Chimeric ESTs .. continued
 Chimeric ESTs are quite common
 Chimeric ESTs are a major nuisance for array probe selection
 One of the fusion partners is usually a highly expressed mRNA
 Double-picking of chimeric ESTs can fool even cautious clustering
programs.
 Unigene contains several chimeric clusters
 The annotation of chimeric clusters is erratic
 Chimeric ESTs can be detected by genome comparison
 There is one particularly bad class of chimeric sequences that will
be the subject of the exercises.
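The clustering situations above can be reproduced with a toy example. The following is a minimal sketch, not the algorithm used by Unigene, THC or any of the tools listed: ESTs are modelled as hypothetical (gene, start, end) segments, and two ESTs are linked when they share at least `min_overlap` bases on the same gene (single-linkage clustering). The function name `cluster`, the `min_overlap` threshold and all coordinates are assumptions made for illustration.

```python
from collections import defaultdict


def cluster(ests, min_overlap=40):
    """Single-linkage clustering of toy ESTs.

    `ests` maps an EST name to a list of (gene, start, end) segments.
    A normal EST has one segment; a chimeric EST has segments from
    two different genes. Returns a sorted list of sorted name lists.
    """
    parent = {name: name for name in ests}

    def find(x):
        # Union-find lookup with path compression.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    names = sorted(ests)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            # Two ESTs are linked if any pair of segments lies on the
            # same gene and overlaps by at least min_overlap bases.
            linked = any(
                ga == gb and min(ea, eb) - max(sa, sb) >= min_overlap
                for ga, sa, ea in ests[a]
                for gb, sb, eb in ests[b]
            )
            if linked:
                parent[find(a)] = find(b)

    groups = defaultdict(set)
    for name in names:
        groups[find(name)].add(name)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical data: two ESTs per gene, then one chimeric EST added.
normal = {
    "est1": [("geneA", 0, 300)],
    "est2": [("geneA", 200, 500)],
    "est3": [("geneB", 0, 300)],
    "est4": [("geneB", 250, 550)],
}
with_chimera = dict(normal, chim=[("geneA", 400, 500), ("geneB", 0, 200)])

print(cluster(normal))
print(cluster(with_chimera))
```

In this toy model a single chimeric EST is enough to merge the clusters of two unrelated genes, and two ESTs of the same gene that overlap by less than the threshold stay in separate clusters.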
How to select a cluster representative
 If possible, pick a clone with completely known sequence
 Avoid problematic regions
 Alu-repeats, B1, B2 and other SINEs
 LINEs
 Endogenous retroviruses
 Microsatellite repeats
 Avoid regions with high similarity to non-identical sequences
 In many clusters, orientation and position relative to ORF are
unknown and cannot be selected for.
 Test selected clone for sequence correctness
 Test selected clone for chimerism
 Some commercial providers offer sequence-verified Unigene
cluster representatives
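Of the problematic regions listed above, microsatellite repeats are the one class simple enough to mask with a few lines of code. A minimal sketch, assuming that a unit of 1 to 6 bases occurring at least five times in tandem counts as a microsatellite; both thresholds and the function name `mask_microsatellites` are arbitrary illustrations. SINEs, LINEs and endogenous retroviruses need a dedicated repeat library and are not covered here.

```python
import re

# Assumption: unit of 1-6 bases, followed by at least 4 further copies
# (5 or more copies in total). These thresholds are illustrative only.
_MICROSATELLITE = re.compile(r"([ACGT]{1,6}?)\1{4,}")


def mask_microsatellites(seq, mask_char="N"):
    """Replace tandem repeats of short units with mask_char."""
    seq = seq.upper()
    return _MICROSATELLITE.sub(lambda m: mask_char * len(m.group(0)), seq)


# Hypothetical clone sequence containing a (CA)8 repeat.
example = "GGTCTAGT" + "CA" * 8 + "TTGACCGT"
print(mask_microsatellites(example))
```

The masked sequence keeps its original length, so coordinates of the remaining regions are unchanged.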
Selection of genes
 If possible, use all of them
 Biased selection
 Selection by tissue
 Selection by topic
 Selection by visibility
 Selection by known expression properties
 Selection from unbiased pre-screen
 Use sources of expression information
 EST frequency
 Published array studies
 SAGE data
Selection of gene regions
3’ UTR
ORF
5’ UTR
Alternative polyadenylation
 Constitutive polyA heterogeneity
 3’-Fragments: reduced sensitivity
 no impact on expression ratio
 Regulated polyA heterogeneity
 Fragment choice influences expression ratio
 Multiple fragments necessary
 Detection of cryptic polyA signals
 Prediction (AATAAA)
 Polyadenylated ESTs
 SAGE tags
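The prediction step can be sketched as a plain search for the AATAAA hexamer. This is a minimal illustration only: the function name `find_polya_signals` is hypothetical, the search looks at the given strand only, and a hit is a candidate signal, not proof that the site is used. Polyadenylated ESTs or SAGE tags are still needed to confirm it.

```python
def find_polya_signals(seq, hexamer="AATAAA"):
    """Return 0-based start positions of every hexamer occurrence.

    Accepts DNA or RNA letters; U is treated as T. Overlapping
    occurrences are reported as well.
    """
    seq = seq.upper().replace("U", "T")
    positions = []
    start = seq.find(hexamer)
    while start != -1:
        positions.append(start)
        start = seq.find(hexamer, start + 1)
    return positions


# Hypothetical 3'-UTR fragment with two candidate signals.
utr = "GGC" + "AATAAA" + "CCTTGG" + "AATAAA" + "GC"
print(find_polya_signals(utr))
```

More than one hit in a 3’-UTR suggests that fragments upstream and downstream of the first signal may behave differently on the array.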
Alternative splicing
 Constitutive splice form heterogeneity
 Fragment in alternative exon: reduced sensitivity
 No impact on expression ratio
 Regulated splice form heterogeneity
 Fragment choice influences expression ratio
 Multiple fragments necessary
 Detection of alternative splicing events
 Hard/Impossible to predict
 EST analysis (beware of pre-mRNA)
 Literature
Alternative promoter usage
 What is the desired readout?
 If promoter activity matters most: multiple fragments
 If overall mRNA level matters most: downstream fragment
 Detection of alternative promoter usage
 Prediction difficult (possible?)
 EST analysis
 Literature
UDP-Glucuronosyltransferases
[Figure: UGT1A8, UGT1A7]
Selection of gene regions
 Coding region (ORF)
 Annotation relatively safe
 No problems with alternative polyA sites
 No repetitive elements or other funny sequences
 danger of close isoforms
 danger of alternative splicing
 might be missing in short RT products
 3’ untranslated region
 Annotation less safe
 danger of alternative polyA sites
 danger of repetitive elements
 less likely to cross-hybridize with isoforms
 little danger of alternative splicing
 5’ untranslated region
 close linkage to promoter
 frequently not available
A checklist
 Pick a gene
 Try to get a complete cDNA sequence
 Verify sequence architecture (e.g. cross-species comparison)
 Mask repetitive elements (and vector!)
 If possible, discard 3’-UTR beyond first polyA signal
 Look for alternative splice events
 Use remaining region of interest for similarity searches
 Mask regions that could cross-hybridize
 Use the remaining region for probe amplification or EST selection
 When working with ESTs, use sequence-verified clones
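Two of the checklist steps lend themselves to a short sketch: discarding the 3’-UTR beyond the first polyA signal, and picking the region that remains after masking. The code below is an illustration under stated assumptions, not a complete pipeline: masked bases are assumed to be written as N, the cut is made directly after the AATAAA hexamer, the 200-base minimum is a hypothetical threshold, and the function names are invented for this example.

```python
import re


def trim_after_first_polya(utr3, hexamer="AATAAA"):
    """Keep the 3'-UTR up to and including the first polyA signal.

    Returns the sequence unchanged if no signal is found.
    """
    pos = utr3.upper().find(hexamer)
    if pos == -1:
        return utr3
    return utr3[:pos + len(hexamer)]


def longest_unmasked(seq, min_len=200):
    """Return (start, end) of the longest stretch without N/n.

    Returns None if the sequence has no unmasked stretch of at least
    min_len bases. `end` is exclusive, as in Python slicing.
    """
    runs = [(m.start(), m.end()) for m in re.finditer(r"[^Nn]+", seq)]
    if not runs:
        return None
    start, end = max(runs, key=lambda run: run[1] - run[0])
    return (start, end) if end - start >= min_len else None


# Hypothetical masked cDNA: 240 clean bases, a masked repeat, 80 clean bases.
masked = "ACGT" * 60 + "N" * 30 + "ACGT" * 20
print(longest_unmasked(masked))
print(trim_after_first_polya("CCGG" + "AATAAA" + "TTTT" + "AATAAA"))
```

A result of None from `longest_unmasked` means that no candidate region of the required size survives masking, so a different clone or gene region has to be considered.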
Exercises
1) Assume that you are interested in the p53-homolog p63, also known as
Ket (TrEMBL: Q9UE10). What kind of fragment(s) would you use for
expression analysis? Why?
2) The cytochrome P450 family is very important for toxicological
microarray analysis since most isoforms respond to different toxic
compounds. Is it possible to design a cDNA fragment (minimal size 200 bp)
that would be able to separate CYP2A6 and CYP2A7? What is the
situation with CYP1A1 and CYP1A2? What region should be used?
3) Check whether probes for p53 (Swissprot: P53_HUMAN), p63 and p73
(P73_HUMAN) are available on the Affymetrix human 35K chip or the
mouse 12K chip. Check whether there are sequence-verified clones
available from Research Genetics.
4) Two (hypothetical) papers using different types of microarrays report
very different results for the regulation of the thyroid receptor alpha-2
(Swissprot: THA2_HUMAN). Can you think of a possible explanation?
What could you do to resolve this issue?
Tools for Exercises
1) Literature search with Pubmed:
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed
2) Sequence search & retrieval (SwissProt, Entrez)
http://www.expasy.ch/sprot/
http://www.ncbi.nlm.nih.gov:80/entrez/query.fcgi?db=Nucleotide
3) BLAST searches at SIB
http://www.ch.embnet.org/software/aBLAST.html
Use specific subdatabase! Mind the ‘repsim’ filter
4) Two-way sequence alignment
http://www.ch.embnet.org/software/LALIGN_form.html