Per_CAD_2011(2)

Computational Laboratory:
aCGH Data Analysis
Feb. 4, 2011
Per Chia-Chin Wu
Today’s Topics
• Review: aCGH and its data analysis
• Homework: aCGH data analysis using tools in Genboree and Ruby
Chromosomal Aberrations
REF: Albertson et al
Array CGH
1. Label patient DNA with Cy3
2. Label control DNA with Cy5
3. Hybridize DNA to genomic clone microarray
4. Analyze Cy3/Cy5 fluorescence ratio of patient to control (log of Cy3/Cy5)
Workflow of aCGH Analysis
Finished chips
  (scanner)
Raw image data (experiment info)
  (image processing software)
Probe-level raw intensity data
  (background adjustment, normalization, transformation)
Raw copy number (CN) data [log ratio of tumor/normal intensities]
  (segmentation and boundary determination)
Estimation of CN
Characterizing individual genomic profiles
Normalization
• Background Adjustment/Correction
Reduces unevenness of a single chip
Eliminates non-specific hybridization signal
[Figure: before adjustment vs. after adjustment]
Corrected Intensity (S’) = Observed Intensity (S) – Background Intensity (B)
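
The subtraction above can be sketched in a few lines of Ruby. This is only an illustration: the method name `background_correct` and the sample numbers are made up, and the `floor` argument is an added assumption (not on the slide) so that a later log transformation never receives zero or a negative value.

```ruby
# Hypothetical helper: S' = S - B for each spot on one chip.
# The floor is an assumption added here, not part of the slide's formula.
def background_correct(observed, background, floor: 1.0)
  observed.zip(background).map do |s, b|
    [s - b, floor].max
  end
end

observed   = [1200.0, 850.0, 95.0]   # made-up observed intensities (S)
background = [100.0, 90.0, 110.0]    # made-up background intensities (B)

corrected = background_correct(observed, background)
puts corrected.inspect
```

The third spot has a background larger than its signal, so it is clamped to the floor instead of going negative.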
• Normalization
Reduces technical variation between chips
[Figure: before vs. after normalization]
S’ = (S – Mean of S) / (STD of S)
S’ ~ N(0, 1)
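
As a rough sketch of this standardization, the Ruby below centers one chip's intensities on their mean and divides by their standard deviation. The method name `standardize` and the data are hypothetical, and dividing by n (population standard deviation) rather than n - 1 is an assumption; the slide does not say which is intended.

```ruby
# Hypothetical helper: S' = (S - mean of S) / (STD of S) for one chip.
def standardize(values)
  n    = values.size.to_f
  mean = values.sum / n
  # Population standard deviation (divide by n); an assumption.
  std  = Math.sqrt(values.sum { |v| (v - mean)**2 } / n)
  raise ArgumentError, "all values are identical" if std.zero?

  values.map { |v| (v - mean) / std }
end

chip = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up intensities
z = standardize(chip)
puts z.inspect
```

After this step each chip has mean 0 and standard deviation 1, which is what makes values from different chips comparable.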
• Log Transformation
S: probe raw intensity; S’: log-transformed intensity, S’ = log2(S)
CN = S’tumor - S’normal = log2(Stumor/Snormal)
[Figure: intensity distribution before log transformation (S) and after log transformation (Log(S))]
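
A minimal Ruby sketch of the raw CN calculation, assuming tumor and normal intensities are already background-corrected, strictly positive, and listed in the same probe order. The method name `log_ratios` and the numbers are illustrative only.

```ruby
# Hypothetical helper: CN = log2(S_tumor) - log2(S_normal) per probe.
def log_ratios(tumor, normal)
  tumor.zip(normal).map { |t, n| Math.log2(t) - Math.log2(n) }
end

tumor  = [2048.0, 1024.0, 512.0]   # made-up tumor intensities
normal = [1024.0, 1024.0, 1024.0]  # made-up normal intensities

cn = log_ratios(tumor, normal)
puts cn.inspect
```

A doubling of signal relative to normal gives +1, equal signal gives 0, and a halving gives -1, so gains and losses are symmetric around zero on the log scale.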
Segmentation/Smoothing
[Figures: CN (y-axis) plotted by clone/chromosome (x-axis)]
• Goal: To partition the clones into sets with the same copy
number and to characterize the genomic segments.
  - Noise reduction
  - Detection of Loss, Normal, Gain, Amplification
  - Breakpoint analysis
• Biological model: genomic rearrangements lead to
gains or losses of sizable contiguous parts of the
genome. Recurrent (over tumors) aberrations may
indicate an oncogene or a tumor suppressor gene
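
Once a segment has a mean log ratio, detecting Loss, Normal, Gain, or Amplification can be sketched as a simple threshold lookup. The cut-off values and the names `THRESHOLDS` and `call_state` below are assumptions for illustration; appropriate thresholds depend on the platform and the noise level of the data.

```ruby
# Hypothetical cut-offs on the segment mean log2 ratio (assumptions).
THRESHOLDS = { loss: -0.3, gain: 0.3, amplification: 1.0 }.freeze

# Maps one segment mean to a copy-number state.
def call_state(segment_mean, t = THRESHOLDS)
  return :amplification if segment_mean >= t[:amplification]
  return :gain          if segment_mean >= t[:gain]
  return :loss          if segment_mean <= t[:loss]

  :normal
end

segment_means = [-0.8, 0.05, 0.45, 1.6]  # made-up segment means
states = segment_means.map { |m| call_state(m) }
puts states.inspect
```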
Segmentation Methods
• AWS - Adaptive Weights Smoothing
• CBS - Circular Binary Segmentation
• HMM - Hidden Markov Model partitioning
• Many more
All existing methods amount to unsupervised, location-specific partitioning and operate on individual
chromosomes.
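
To make the idea of unsupervised partitioning along one chromosome concrete, here is a deliberately simplified Ruby sketch. It is not AWS, CBS, or an HMM: it only finds the single breakpoint that minimizes the within-segment sum of squares, with no significance test and no recursion. All names and values are hypothetical.

```ruby
def mean(xs)
  xs.sum / xs.size.to_f
end

# Sum of squared deviations from the segment mean.
def sum_sq(xs)
  m = mean(xs)
  xs.sum { |x| (x - m)**2 }
end

# Returns the index i such that splitting into values[0...i] and
# values[i..-1] gives the smallest total within-segment sum of squares.
# Needs at least two values; returns nil otherwise.
def best_breakpoint(values)
  (1...values.size).min_by do |i|
    sum_sq(values[0...i]) + sum_sq(values[i..-1])
  end
end

# Made-up log ratios for clones ordered along one chromosome.
chrom = [0.02, -0.05, 0.04, 0.01, 0.61, 0.55, 0.58, 0.64]
i = best_breakpoint(chrom)
puts "breakpoint before clone index #{i}"
puts "left mean:  #{mean(chrom[0...i])}"
puts "right mean: #{mean(chrom[i..-1])}"
```

Real methods add what this sketch leaves out, such as deciding whether a split is statistically justified and finding more than one breakpoint per chromosome.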
Homework: Analyze TCGA Data
The Cancer Genome Atlas Project (TCGA)
• Goal: find genomic alterations that cause cancer
(mutations, CNA, methylation, …)
• Pilot project
1. brain (glioblastoma multiforme): 186 pairs of tumor
and normal samples
2. lung (squamous)
3. ovarian (serous cystadenocarcinoma)
Flowchart of Data Analysis
Raw copy number (CN) data [log ratio of tumor/normal intensities]
Segmentation and
boundary determination
Estimation of CN
Characterizing individual genomic profiles
Annotation
Identify Recurrent Genes
Ruby: Mapping Probes
LFF format
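
A hedged Ruby sketch of the probe-mapping step that writes LFF records. The column order used here (class, name, type, subtype, chromosome, start, stop, strand, phase, score, tab-delimited) is an assumption to verify against the Genboree LFF documentation. The probe IDs, coordinates, and the class/type/subtype labels are all made up.

```ruby
# Hypothetical probe annotation: probe ID => [chromosome, start, stop].
PROBE_MAP = {
  "probe_1" => ["chr1", 1000, 1059],
  "probe_2" => ["chr1", 5000, 5059]
}.freeze

# Builds one tab-delimited LFF record for a probe and its log ratio.
# Assumed column order: class, name, type, subtype, chromosome,
# start, stop, strand, phase, score.
def lff_line(probe_id, log_ratio, probe_map,
             klass: "aCGH", type: "TCGA", subtype: "sample1")
  chrom, start, stop = probe_map.fetch(probe_id)
  [klass, probe_id, type, subtype, chrom, start, stop, "+", ".", log_ratio].join("\t")
end

raw = { "probe_1" => 0.42, "probe_2" => -0.87 }  # made-up log ratios
lines = raw.map { |id, ratio| lff_line(id, ratio, PROBE_MAP) }
puts lines
```

Using `fetch` makes the script fail loudly on a probe that has no mapping, which is usually preferable to silently writing an incomplete record.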
Upload Data
Data Analysis: Segmentation
Data Analysis: Combine Tracks
Data Analysis: Annotation Selector
Data Analysis: Mapping Genes
Data Analysis: Recurrent Genes
Overview of Data Analysis
Raw copy number (CN) data [log ratio of tumor/normal intensities]
Data Preprocessing (Ruby) and uploading data to Genboree
Segmentation (Segmentation Tool)
Characterizing individual genomic profiles
Combining data
Annotation (Annotation Selector; Attribute Lifter)
Identify Recurrent Genes (Ruby)
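
The final step, identifying recurrent genes, might look like the Ruby sketch below. The input layout is an assumption: one "sample, tab, gene" row for every gene that overlaps an aberrant segment in that sample. The actual table exported from Genboree may be laid out differently, and the `min_samples` cut-off is hypothetical.

```ruby
require "set"

# Hypothetical input table: sample<TAB>gene, one row per aberrant gene.
TABLE = <<~ROWS
  s1\tEGFR
  s2\tEGFR
  s2\tEGFR
  s3\tEGFR
  s1\tCDKN2A
  s3\tCDKN2A
  s2\tTP53
ROWS

# Counts distinct samples per gene and keeps genes seen in at least
# min_samples samples, sorted by descending count and then by name.
def recurrent_genes(text, min_samples: 2)
  samples_by_gene = Hash.new { |h, k| h[k] = Set.new }
  text.each_line do |line|
    sample, gene = line.chomp.split("\t")
    next if gene.nil?  # skip blank or malformed rows

    samples_by_gene[gene] << sample
  end
  samples_by_gene
    .map { |gene, samples| [gene, samples.size] }
    .select { |_, n| n >= min_samples }
    .sort_by { |gene, n| [-n, gene] }
end

result = recurrent_genes(TABLE)
result.each { |gene, n| puts "#{gene}\t#{n}" }
```

Collecting samples in a `Set` means a gene hit by several segments in the same tumor is still counted once for that tumor, and the printed result is a two-column gene and count listing.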
You Need To Submit
1. Ruby script from step 1 that creates your LFF file
2. Ruby script from step 5 that parses your table
3. Two-column final output from step 5