Transcript Lecture 5
Microarrays
Pauliina Munne
09.10.2014
Biomedicum Functional Genomics Unit
FuGU
• Established in 2006 as a center supporting functional genomics
research nationally and internationally
• Comprehensive, state-of-the-art functional genomics technology
services (nonprofit)
• Services include e.g. next-generation sequencing, microarrays,
recombinant virus services and genome-scale reagents for gene
knockdown
Microarrays & Next Generation Sequencing
• NGS
– Illumina MiSeq & HiSeq
– NextSeq
• Microarrays
– Affymetrix
– Illumina
– Agilent
Recombinant Virus Services
• Recombinant Viral Particles
• for gene expression and knock-down studies (shRNA)
• virus titering and biosafety analyses
• BSL II facilities
• Genome Scale TRC1 shRNA Libraries for RNAi
• Q-RT-PCR Services for knock-down efficiency validation
• LightCycler®480 Instrument II
• Universal ProbeLibrary (UPL) probes (Roche)
Microarray Services
• Experimental planning and selection of the most suitable technology
platform (based on project size, organism, number of samples and
genes)
• Fast and high quality service including full data analysis
Plan & design experiment
Perform experiment + QA
Analysis of the results
Biological interpretation
Applications of Microarrays
- gene, exon, miRNA, epigenetics, aCGH etc.
• Affymetrix:
– HTA, Exon
– Gene
– 3’ IVT
– miRNA
– CytoScan
• Agilent:
– Expression
– Exon
– CGH + SNP
– miRNA
• Illumina:
– Gene
Microarray Pipeline
Design and perform experiment
Process and normalise data
Statistical analysis
Differentially expressed genes
Biological interpretation
Experimental Design & Replicates
• Biological replicates: how many?
• At least 3 per condition group
• having more replicates increases sensitivity in detecting
differential expression
=> Needed replicate number depends on:
• Strength of the studied effect
• Within group variation
• Level of technical noise
• Technical replicates:
– not often used nowadays (except if comparing experiments
between chips in Agilent and Illumina)
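How the three factors above drive the replicate number can be illustrated with a rough calculation. The sketch below uses the standard normal-approximation formula for comparing two group means; it is not part of the lecture, and the function name, the default alpha and power, and the example numbers are assumptions chosen for illustration. Within-group variation and technical noise are lumped into a single standard deviation on the log2 scale.

```python
import math
from statistics import NormalDist


def replicates_per_group(effect_log2fc, within_group_sd, alpha=0.05, power=0.80):
    """Approximate biological replicates per group for a two-group comparison.

    Normal approximation: n = 2 * (z_(1-alpha/2) + z_power)^2 * (sd / effect)^2
    """
    if effect_log2fc <= 0 or within_group_sd <= 0:
        raise ValueError("effect and standard deviation must be positive")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_power = z.inv_cdf(power)          # quantile for the desired power
    n = 2 * (z_alpha + z_power) ** 2 * (within_group_sd / effect_log2fc) ** 2
    return math.ceil(n)


# Strong effect, low variation (hypothetical values on the log2 scale).
print(replicates_per_group(effect_log2fc=1.0, within_group_sd=0.5))
# Same effect, but twice the variation (biological + technical noise).
print(replicates_per_group(effect_log2fc=1.0, within_group_sd=1.0))
```

With these example inputs the estimate rises from 4 to 16 replicates per group when the standard deviation doubles. The approximation is optimistic for very small groups and ignores multiple testing over thousands of genes, so it should be read as a lower bound rather than a recommendation.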
Experimental Design & Replicates
[Figure: Treatment A and Treatment B (Treatment 1 vs. Treatment 2) are compared, with 3 biological replicates per treatment; 1 sample = 1 array]
Experimental Design & Replicates
What kind of samples can be compared?
• Do not try to compare apples and oranges:
– If the samples are too different – all genes will be differentially
expressed
=> no useful information can be gained
• Two different tissues are usually too different to be compared
directly
• If several tissue samples (meant to represent the same tissue)
contain varying amounts of different cell types this can also be a
problem
Experimental Design & Replicates
Other Important Issues:
• RNA sample quality
• Standardize conditions for all samples in the experiment set
(e.g. age, gender, RNA extraction method etc.)
• Choose the correct time point
• Only pool samples when sample material is scarce
• Be prepared to validate your microarray results with another
technique such as RT-qPCR
• Data analysis issues should always be considered when designing
the experiment
• An experienced data analyst / bioinformatician should be consulted
cDNA microarray
Oligonucleotide microarrays
cDNA microarray
(Agilent)
• RNA from two different tissues or cell populations is used to synthesize single-stranded cDNA
in the presence of nucleotides labeled with two different fluorescent dyes (for
example, green Cy3 for sample A and red Cy5 for sample B)
• Both samples are mixed in hybridization buffer and hybridized to the array surface
=> competitive binding of differentially labeled cDNAs to the
corresponding array elements
=> High-resolution confocal fluorescence scanning of the array with two
different wavelengths corresponding to the dyes used provides relative
signal intensities and ratios of mRNA abundance for the genes represented
on the array.
• Green spots indicate genes that are upregulated in sample A (relative to sample B).
• Red spots indicate genes that are down-regulated in sample A (relative to sample B).
• Yellow spots indicate equal expression of those genes in sample
A and sample B.
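The colour interpretation above corresponds to the log2 ratio of the two channels. The following is a minimal sketch, assuming Cy3 (green) labels sample A and Cy5 (red) labels sample B as in the example on this slide; the function name, the threshold of a two-fold change, and the intensities are hypothetical.

```python
import math


def classify_spot(cy3_sample_a, cy5_sample_b, log2_threshold=1.0):
    """Return (log2 ratio Cy5/Cy3, spot colour) for one two-colour array spot."""
    m = math.log2(cy5_sample_b / cy3_sample_a)  # log2(red / green)
    if m <= -log2_threshold:
        return m, "green"   # more Cy3 signal: higher expression in sample A
    if m >= log2_threshold:
        return m, "red"     # more Cy5 signal: lower expression in sample A
    return m, "yellow"      # similar signal in both channels


# Hypothetical background-corrected intensities (Cy3 = sample A, Cy5 = sample B).
spots = {
    "gene1": (8000.0, 1000.0),
    "gene2": (500.0, 4000.0),
    "gene3": (1500.0, 1600.0),
}
for gene, (cy3, cy5) in spots.items():
    m, colour = classify_spot(cy3, cy5)
    print(f"{gene}: log2(Cy5/Cy3) = {m:+.2f} -> {colour}")
```

Working with the log2 ratio makes up- and down-regulation symmetric around zero, which is why it is commonly preferred over the raw ratio.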
Agilent: two-color gene expression analysis
=> Not recommended any more
Oligonucleotide Microarrays
(Illumina, Affymetrix)
• RNA from different tissues or cell populations is used to generate double-stranded
cDNA carrying a promoter (transcriptional start site) for T7 RNA polymerase
• Biotin-labeled nucleotides are incorporated into the complementary RNA (cRNA)
transcribed from this cDNA. The probe oligonucleotides on the array are in the sense
orientation, so the labeled target has to be antisense RNA, i.e. cRNA
• Each target sample is hybridized to a separate probe array
• The arrays are stained with a streptavidin-phycoerythrin conjugate that binds to the
biotin tags and emits fluorescent light when excited with a laser
• Automated image analysis software measures fluorescence by calculating signal
intensity units at each discrete probe site or feature on the array
• Signal intensities of probe array element sets on different arrays are used to
calculate relative mRNA abundance for the genes represented on the array
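The last step above, turning probe set intensities from separate arrays into a relative abundance, can be sketched as follows. This is a deliberately simplified illustration, not the summarization method of any particular vendor software: each probe set is summarized as the median of its log2 probe intensities, and the two arrays are assumed to be already normalized. All names and numbers are hypothetical.

```python
import math
from statistics import median


def probe_set_signal(probe_intensities):
    """Summarize one probe set as the median of its log2 probe intensities."""
    return median(math.log2(x) for x in probe_intensities)


# Hypothetical intensities of the same probe set on two separate arrays.
array_control = [210.0, 250.0, 190.0, 230.0, 2200.0]  # last probe is an outlier
array_treated = [820.0, 990.0, 760.0, 1010.0, 900.0]

log2_control = probe_set_signal(array_control)
log2_treated = probe_set_signal(array_treated)
log2_fold_change = log2_treated - log2_control
print(f"log2 fold change (treated vs control): {log2_fold_change:.2f}")
```

The median is used here because it is robust against a single misbehaving probe, such as the outlier in the control array.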
Oligonucleotide Microarray
Affymetrix Microarrays
• Photolithographic synthesis of oligonucleotides on microarrays
• Affymetrix: 25-mers are synthesized in situ on a glass wafer,
nucleotide by nucleotide, using photolithography
• Probe = an oligonucleotide on the array, 25 bases long
• Target = fluorescently labeled sample mRNA (RNA fragments with fluorescent tags)
• Millions of DNA strands are built up in each cell
• 500 thousand cells in each array
(Source: www.affymetrix.com)
Principle of Microarray Hybridization
• Probes are printed to the array base by base in a process that employs a
combination of chemistry and photolithography
Affymetrix Microarray Formats
Probes per feature (median):
• 11 oligomers at the 3’ end
• 21 oligomers along the gene
• 4 oligomers per exon
[Figure: probe positions from the 5’ end to the 3’ end, shown for 3 different transcripts]
Illumina Expression BeadChips
Probes are bound to silica beads that are randomly distributed across the arrays
• 6 – 12 samples on one chip
• 15 – 30 replicate beads per array target on average
• Most genes are represented by a single probe, some by
two probes for different isoforms of the gene
Extracting information from the image
Raw data file
Feature identifiers
Sample columns
Intensity measurements
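The layout of such a raw data file (feature identifiers in the first column, one intensity column per sample) can be read with a few lines of code. The sketch below assumes a tab-separated export; the column names, probe identifiers, and values are made up for illustration, and real vendor files usually carry extra header lines and annotation columns.

```python
import csv
import io

# Hypothetical tab-separated raw data export: the first column holds the
# feature identifiers, every further column holds one sample's intensities.
RAW_TEXT = """feature_id\tsample_1\tsample_2\tsample_3
probe_0001\t152.3\t140.9\t161.0
probe_0002\t2033.5\t1987.2\t2110.8
probe_0003\t48.1\t52.6\t45.9
"""


def read_raw_table(handle):
    """Return (sample_names, {feature_id: [intensity per sample]})."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    sample_names = header[1:]
    intensities = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        intensities[row[0]] = [float(value) for value in row[1:]]
    return sample_names, intensities


samples, table = read_raw_table(io.StringIO(RAW_TEXT))
print(samples)
print(table["probe_0002"])
```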
Future?
Illumina
• New versions of each array type are released roughly every other
year
=> old arrays are not available for very long
=> This may be a problem for large studies spanning
several years
=> impossible to add samples to an old sample series
Agilent
• Older platform; Agilent will focus more on other areas
Affymetrix
• New array versions are released infrequently
=> Complete support is provided for all old arrays
=> Most widely used platform
NGS will most likely replace microarrays in the future, but for
now its prices are still quite high
Spotted Microarrays
• Oligonucleotides, cDNA or small fragments of PCR products
corresponding to specific genes are spotted on the chip
• The spotting is normally done by a robotic spotter, and one or more probes can be
used for each gene
• In contrast to oligonucleotide arrays, spotted arrays are "customizable"; the
user can choose the probes to be spotted according to specific
experimental needs
• These kinds of arrays are usually hybridized with labeled mRNA, cDNA or
cRNA because both strands are used as probes on the microarray
General Outline of Expression Data Analysis
Design and perform experiment
Process and normalise data
Statistical analysis
Differentially expressed genes
Biological interpretation
Analysis software:
• R/Bioconductor (free)
• GeneSpring (commercial)
• Lots of other free &
commercial tools
Normalization & Pre-processing
• Quantile normalization is typically used to correct between-chip bias
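The idea of quantile normalization is to force every chip to have the same intensity distribution: the values on each chip are ranked, and each value is replaced by the average of all chips at that rank. A minimal sketch in plain Python follows; the function name and the toy data are assumptions, and ties are not handled specially as a production implementation would.

```python
def quantile_normalize(columns):
    """Quantile-normalize equally long arrays (one list of values per chip).

    Ties are not treated specially in this sketch.
    """
    n_features = len(columns[0])
    if any(len(col) != n_features for col in columns):
        raise ValueError("all chips must contain the same number of features")

    # 1. Sort each chip and average across chips at every rank.
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [
        sum(col[rank] for col in sorted_cols) / len(columns)
        for rank in range(n_features)
    ]

    # 2. Replace each value by the mean that belongs to its rank.
    normalized = []
    for col in columns:
        order = sorted(range(n_features), key=lambda i: col[i])  # indices by rank
        new_col = [0.0] * n_features
        for rank, index in enumerate(order):
            new_col[index] = rank_means[rank]
        normalized.append(new_col)
    return normalized


# Hypothetical log2 intensities: four features measured on three chips.
chips = [
    [5.0, 2.0, 3.0, 4.0],  # chip 1
    [4.0, 1.0, 4.5, 2.0],  # chip 2
    [3.0, 4.0, 6.0, 8.0],  # chip 3
]
normalized = quantile_normalize(chips)
for chip in normalized:
    print(chip)
```

After normalization all chips contain exactly the same set of values, only in a different order, so differences in overall brightness between chips are removed while the ranking of features within each chip is preserved.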
Normalization & Pre-processing
• Quality inspection (for raw + normalized data)
• Quality control tools and quality plots reveal outlier chips, which can
then be easily detected
• Removal of such arrays can vastly improve the results of statistical
testing
[Figure: Colored Dendrogram (2 groups) of the sample arrays (labels beginning with GSM516), colored by group: Jurkat and Jurkat_Dox]
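A dendrogram like the one above is built from pairwise similarities between arrays. As a simpler stand-in for hierarchical clustering, the sketch below flags an array whose median Pearson correlation with the other arrays is low. This is a swapped-in technique for illustration only; the threshold, the function names, and the data are assumptions.

```python
import math
from statistics import median


def pearson(x, y):
    """Pearson correlation between two equally long intensity vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def flag_outlier_arrays(arrays, min_median_correlation=0.8):
    """Return names of arrays whose median correlation to the others is low."""
    flagged = []
    for name, values in arrays.items():
        others = [pearson(values, v) for other, v in arrays.items() if other != name]
        if median(others) < min_median_correlation:
            flagged.append(name)
    return flagged


# Hypothetical normalized log2 intensities for five features on four arrays.
arrays = {
    "array_1": [7.1, 9.8, 5.2, 11.0, 8.3],
    "array_2": [7.0, 9.9, 5.4, 10.8, 8.1],
    "array_3": [7.3, 9.6, 5.1, 11.2, 8.4],
    "array_4": [9.9, 6.1, 8.8, 7.0, 9.5],  # behaves differently from the rest
}
print(flag_outlier_arrays(arrays))
```

The median is used instead of the mean so that one bad array does not drag down the score of the good ones. Whether a flagged array is a technical outlier or simply belongs to a different sample group still has to be judged against the experimental design.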
Statistical Analysis
• Running statistical tests (t-test)
• p-values and false discovery rates for the reliability of the change
• fold-change (FC) for the size of the change in gene expression
• Filtering differentially expressed (DE) genes
• Genes that have similar behavior within each
sample group but whose group means clearly
differ from each other
=> produces a reasonably sized list of the
most differentially expressed genes
• Visualising the results
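The quantities named above (test statistic, false discovery rate, fold-change, filtering) can be sketched in plain Python. This is an illustrative outline under stated assumptions, not the analysis pipeline used in the lecture: intensities are assumed to be log2 values, Welch's variant of the t-test is used, the p-values in the second example are assumed to have been computed elsewhere, and the cutoffs and all data are hypothetical. The false discovery rate is estimated with the Benjamini-Hochberg procedure.

```python
import math
from statistics import mean, variance


def log2_fold_change(group_a, group_b):
    """Difference of group means; inputs are assumed to be log2 intensities."""
    return mean(group_b) - mean(group_a)


def welch_t(group_a, group_b):
    """Welch's t statistic (unequal variances) for two sample groups."""
    se = math.sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_b) - mean(group_a)) / se


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (FDR), in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


def filter_de_genes(results, max_fdr=0.05, min_abs_log2fc=1.0):
    """Keep genes passing both the FDR and the fold-change cutoff."""
    genes = list(results)
    fdr = benjamini_hochberg([results[g]["p"] for g in genes])
    return [
        g for g, q in zip(genes, fdr)
        if q <= max_fdr and abs(results[g]["log2fc"]) >= min_abs_log2fc
    ]


# Hypothetical log2 intensities for one gene, 3 biological replicates per group.
control, treated = [7.9, 8.1, 8.0], [9.6, 9.9, 9.7]
print(round(log2_fold_change(control, treated), 2), round(welch_t(control, treated), 2))

# Hypothetical per-gene results; the p-values are assumed to come from a t-test.
results = {
    "geneA": {"p": 0.0004, "log2fc": 1.73},
    "geneB": {"p": 0.0300, "log2fc": 0.40},
    "geneC": {"p": 0.0100, "log2fc": -1.20},
    "geneD": {"p": 0.6000, "log2fc": 0.05},
}
print(filter_de_genes(results))
```

Combining a significance cutoff with a fold-change cutoff is one common way to arrive at a list of manageable size: geneB in the example is statistically significant but changes too little to pass the filter.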
Functional Analysis
• Carrying out gene functional analysis
– Focus on pathways or other functional categorizations rather than
individual genes
– Different approaches exist for this:
• Detect functional enrichment in the DE target list
• Detect functional enrichment towards the top of the list when
all array targets have been ranked according to the evidence
for being differentially expressed
• Run the statistical test between sample groups without
assuming independence between array targets (as is usually done),
instead taking into account the dependence between genes that
belong to the same functional category
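The first approach above, detecting functional enrichment in the DE target list, is commonly tested with a hypergeometric (one-sided Fisher) test: given how many genes on the array belong to a category, how surprising is the number of category members found in the DE list? The sketch below is a minimal illustration with hypothetical counts and function name; it covers neither the ranked-list approach nor the dependence-aware tests.

```python
from math import comb


def enrichment_p_value(n_universe, n_category, n_de, n_overlap):
    """One-sided hypergeometric p-value: P(overlap >= n_overlap).

    n_universe: all genes measured on the array
    n_category: genes in the functional category (e.g. one pathway)
    n_de:       genes in the DE list
    n_overlap:  DE genes that belong to the category
    """
    upper = min(n_category, n_de)
    total = comb(n_universe, n_de)
    tail = sum(
        comb(n_category, k) * comb(n_universe - n_category, n_de - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 20000 genes on the array, 100 in the pathway,
# 200 DE genes, 8 of which are pathway members (1 expected by chance).
p = enrichment_p_value(n_universe=20000, n_category=100, n_de=200, n_overlap=8)
print(f"enrichment p-value: {p:.2e}")
```

Because many categories are tested at once, the resulting p-values need a multiple-testing correction before they are interpreted.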
Functional Analysis
• http://www.geneontology.org
• Classifies genes into a hierarchy, placing gene products with similar
functions together
Three main categories:
Biological process (BP)
Molecular function (MF)
Cellular component (CC)
Functional Analysis
• KEGG: the Kyoto Encyclopedia of Genes and Genomes
• http://www.genome.jp/kegg/
• Provides searchable pathways for molecular interaction and reaction
networks for metabolism, various cellular processes and human
diseases
• Manually curated from published literature
Functional Analysis
• Tools for functional analysis
– DAVID
• http://david.abcc.ncifcrf.gov/home.jsp
– Pathway-Express
• http://vortex.cs.wayne.edu/projects.htm#Pathway-Express
– GSEA
• http://www.broad.mit.edu/gsea/
– GOrilla
• http://cbl-gorilla.cs.technion.ac.il/
– GenMAPP
• http://www.genmapp.org/
– Cytoscape
• http://www.cytoscape.org/
Publishing Microarray Data
• GEO (Gene Expression Omnibus)
– www.ncbi.nlm.nih.gov/geo/
• ArrayExpress
– http://www.ebi.ac.uk/microarray-as/ae/
• Most journals require the expression data to be submitted to
a public repository
– some even before they will send the manuscript to referees for
evaluation
• The data can be kept hidden from everyone except the authors and the
referees until the article is officially published