Transcript Slide 1

Analysis of Drug-Gene
Interaction Data
Florian Ganglberger
Sebastian Nijman Lab
Nijman Lab
• Working on specialised target-oriented cancer
therapies
• Cancer = cell mutation
[Diagram: drugs paired with mutations]
Motivation
• Testing various drugs on various mutated cells
• 100 drugs vs 100 mutations
= 10,000 interactions
• Analyse the generated data to find new
treatments
Overview
• Background
– Biological Background
– Technical Procedure
– Initial State
– Special Aspects
– Previous Approach
• Analysis
– Explorative Data Analysis
– Drug Noisiness
Data generation
Overview
• Hit detection
– Statistical Methods
– Filtering Methods
– The Algorithm
– Evaluation of the result
Biological Background
• Idea behind cancer treatment
– Kill cancer cells while leaving normal cells alive
• Common chemotherapies
– Kill cells with higher division rate
– Problem: mouth, throat and bowel mucosa and hair
cells also divide quickly
– Patients feel sick, lose their hair, etc.
Biological Background
• Synthetic lethality approach
– Some biochemical processes which are necessary for
cell growth are redundant
– e.g. DNA repair
– Biochemical processes are chained
= “protein pathway”
Protein pathways
[Diagram labels: Protein A, Drug, Protein B, Protein C, Cell growth, Gene]
Synthetic lethality
• Choose a cancer which has a mutation of a
gene in one of these pathways
• Find a drug which inhibits the other pathway
Synthetic lethality
• Produce cells with mutations which are
normally present in cancer
• Find drug
• Possible that this will work in real cancer
– Tumours have more than one mutation → mutations can
influence each other
Technical Procedure
• Standard dataset consists of 38,400
interactions
• 96 drugs × 100 mutations × 4 replicates
• Testing each interaction separately would be inefficient
Technical Procedure
• Idea: Testing different cell lines in one well
• → 384 wells
Before the experiment
After the experiment
• Copy the barcodes of the cells by a polymerase
chain reaction (PCR) → amplifies the signal
• Add a vitamin to the barcode that can
bind to a dye-containing protein
• Amount of barcode correlates with the amount
of remaining cells
Allocation
• Red and infrared emitted light → barcode →
mutation
• Green reflected light → cell amount
– Arbitrary unit which correlates with the cell amount
– Called “Reporter”
• Drug → known from the well used
Initial state
• Because drugs are dissolved in a solvent, we
can use wells without drugs → use as control
Back to statistics....
Special Aspects
• Biological and technical factors cause noisy
and not directly usable data
→ Inter- and intraindividual variability
Interindividual Variability
• Variability between observation units
• Cells with the same mutation = one
observation unit = “one virtual cancer patient”
• Variation among different mutated cells
• Reasons
– Mutations can be toxic themselves
– Characteristics of the technical process
Interindividual Variability
• Average amount of remaining mutations
Variability of Technical Procedure
• Limited precision
– Precision of drug dosing
– Precision of cell amount
– Quality of the measurement equipment
• Decreased sensitivity to a lower signal
– Detection limit
– Killed cells don’t get a zero signal → background
noise with different variability
Variability of Technical Procedure
• Amplification problems
– Copying the barcodes by PCR needs material
– If some cell lines are completely killed → more
material for other cell lines → higher amplification
of surviving cells
Amplification Problems
Previous Approach
• Visual method, based on scatter plots
• Identify outliers visually
Previous Approach
1. Calculating the effect
   1. Median normalization of drugs
   2. Calculate a relative ratio
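
The two steps could be sketched roughly as follows. This is a minimal illustration, not the lab's actual code: the data layout (one dict of reporter signals per drug), the function names and the definition of the ratio as normalized drug signal divided by normalized control signal are all assumptions.

```python
from statistics import median

def median_normalize(signals):
    """Divide every reading of one drug (or the control) by its median."""
    m = median(signals.values())
    return {mutation: value / m for mutation, value in signals.items()}

def relative_ratios(drug_signals, control_signals):
    """Normalized drug signal divided by normalized control signal.

    Assumed reading: a ratio well below 1 suggests sensitivity,
    a ratio well above 1 suggests resistance.
    """
    drug = median_normalize(drug_signals)
    control = median_normalize(control_signals)
    return {mutation: drug[mutation] / control[mutation] for mutation in drug}

# Hypothetical reporter signals (arbitrary units) for five mutations.
control = {"mutA": 1000.0, "mutB": 2000.0, "mutC": 3000.0,
           "mutD": 4000.0, "mutE": 5000.0}
# The drug halves every signal, except mutA, which is almost wiped out.
drug = {"mutA": 50.0, "mutB": 1000.0, "mutC": 1500.0,
        "mutD": 2000.0, "mutE": 2500.0}

ratios = relative_ratios(drug, control)
for mutation, ratio in ratios.items():
    print(mutation, round(ratio, 2))
```

In this made-up example the general dosing effect cancels out after normalization, so only mutA stands out with a low ratio.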
Previous Approach
• Plotting the ratio against the median of a
mutation
There are some problems....
• If two lines overlap, hits can be obscured
• No comparable value that estimates the
significance of outliers
• Intraindividual variability (between replicates)
is ignored
• Human errors → outlier detection is subjective
• Slow method that cannot be automated
Overview
• Background
– Biological Background
– Technical Procedure
– Initial State
– Special Aspects
– Previous Approach
• Analysis
– Explorative Data Analysis
– Drug Noisiness
Explorative Data Analysis
• Necessary for hit detection
• Analysis of the behaviour of the data
• Closer look at
– Distribution of mutations
– Variability of mutations and replicates
– Skewness of mutations
– Noisiness of Drugs
Distribution of Mutations
• Choosing the right statistical test
• Test will be applied on mutations to see which
drug works best
• Effect is point of interest → Matrix of relative
ratios
Variability of Mutations
• Decreased sensitivity to lower signal
• Maybe a detection limit
• Spread vs Level plot
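
The numbers behind a spread vs level plot can be computed per mutation as sketched below. The choice of median as level and median absolute deviation (MAD) as spread is an assumption; the slides do not say which measures were used, and the signal values are invented.

```python
from statistics import median

def spread_and_level(values):
    """Return (level, spread) as (median, median absolute deviation)."""
    level = median(values)
    spread = median(abs(v - level) for v in values)
    return level, spread

# Hypothetical reporter signals of two mutations across several drugs.
signals = {
    "low_signal_mutation": [100, 120, 90, 300, 110],
    "high_signal_mutation": [2000, 2100, 1900, 2050, 1950],
}

for name, values in signals.items():
    level, spread = spread_and_level(values)
    # A relative spread that grows as the level drops hints at a detection limit.
    print(name, level, spread, round(spread / level, 3))
```

Plotting spread against level for all mutations then shows whether variability changes below some signal level.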
Replicate Variability
• An important factor is the repeated testing of cells
with the same drug.
• Indicator for accuracy and reproducibility
of the technical procedure.
Skewness of Mutations
• Another indicator for different behaviour
below the threshold
• Right skewed distributions because of
background noise in lower signal
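
As an illustration, the skewness of one mutation's values can be computed with the standard moment-based formula. The slides do not state which skewness estimator was used, so this particular formula and the example values are assumptions.

```python
def skewness(values):
    """Moment-based sample skewness: third central moment / variance^1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical values: a symmetric sample and one with a long right tail.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 1, 10]

print(skewness(symmetric))     # zero for a symmetric sample
print(skewness(right_skewed))  # positive for a right-skewed sample
```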
Drug Noisiness
• Nothing to do with background noise
• Caused by technical procedure
– Overdosing of cells or drugs
– Toxicity (“Dosis facit venenum”: the dose makes the poison)
• Different effect
– Strong resistance
– Strong sensitivity
Amplification Problems
Strong Noisiness
• Easy to identify
• Dedicated outliers
• High amount of false positive hits
• Idea: Noisiness causes weak correlation to the
control
Weak Noisiness
• Also numerous differences in sensitivity or
resistance
• Contrast to normal drugs is not well defined
• Visual methods failed
• Also a lot of false positive hits
Strong Noisiness vs Weak Noisiness
Overview
• Hit detection
– Statistical Methods
– Filtering Methods
– The Algorithm
– Evaluation of the result
Hit detection
• Definition of a Hit
– Indicate synthetic lethality
– Resistance is also interesting from a biological
point of view
– Not noisy
• 2 Stages:
1. Finding potential hits
2. Filtering false-positive hits and incomparable data
Statistical Test
• Mutations not normally distributed
• Compare the 4 replicates to their mutation
• Mann-Whitney U test
– Compares two medians
– Needs approximately identical distribution form of
random variables X and Y
– No symmetry or normal distribution needed
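
A minimal sketch of the test, assuming the 4 replicates of one interaction are compared with the remaining values of the same mutation. It uses the normal approximation without tie or continuity correction, which is only rough for a group of 4; a statistics package would normally be used instead. All function names and data are illustrative.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test via the normal approximation.

    Returns (U statistic of x, approximate p-value).
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u, p

# Hypothetical relative ratios: 4 replicates of one drug vs the rest.
replicates = [0.20, 0.25, 0.22, 0.30]
other_values = [0.9 + 0.01 * i for i in range(20)]

u, p = mann_whitney_u(replicates, other_values)
print(u, p)
```

Because only ranks enter the calculation, making the replicates even smaller would not change the result, which is the disadvantage discussed on the next slide.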
Statistical Test
• Disadvantages
– Rank-sum tests are based on the order, not on the
magnitudes
– Weak outlying interactions get the same p-values
as strong outliers
– P-values are not comparable between individuals, but
the significance still serves as an indicator.
– Strongly noisy drugs are usually extreme outliers →
reduce the significance
Multiple testing
• Multiple testing of interactions against their
mutations
• Increases the probability of false-positive results
• 100 different interactions
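
To illustrate how the error grows, the probability of at least one false positive among many tests can be computed as below. The significance level of 0.05 and the independence of the tests are assumptions made for this sketch; the formula from the original slide is not preserved in the transcript.

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one false positive among independent tests."""
    return 1 - (1 - alpha) ** n_tests

# 100 interactions tested at an assumed significance level of 0.05.
fwer = familywise_error_rate(0.05, 100)
print(round(fwer, 3))  # about 0.994
```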
Multiple testing
• Bonferroni correction needed
• How to achieve significant results?
– Calculate the median of replicates
– Testing just the upper and lower 10% of the data
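
The effect of testing fewer interactions can be seen in a small sketch of the Bonferroni correction. The significance level of 0.05, the function name and the p-values are assumptions chosen for illustration.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Return indices of p-values that stay significant after correction."""
    threshold = alpha / len(p_values)  # each test must pass alpha / m
    return [i for i, p in enumerate(p_values) if p <= threshold]

# One hypothetical interaction with p = 0.002 among unremarkable ones.
tested_all = [0.002] + [0.5] * 99      # all 100 interactions tested
tested_extremes = [0.002] + [0.5] * 19  # only upper and lower 10% tested

print(bonferroni_significant(tested_all))       # threshold 0.0005
print(bonferroni_significant(tested_extremes))  # threshold 0.0025
```

With 100 tests the interaction is lost; with 20 tests the same p-value survives the correction.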
Filtering Drugs
• Filtering strong noisy drugs by correlation
coefficient
• Filter before the test to increase the
significance
• Note: Drugs shouldn’t be filtered
automatically, just identified. Whether a drug is
toxic is the decision of a biologist
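
A sketch of this identification step, assuming each drug is described by its signals across the mutations and compared with the control by the Pearson correlation coefficient. The cut-off of 0.8, the function names and the data are hypothetical; the slides do not give the value that was used.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def flag_noisy_drugs(drug_profiles, control_profile, min_correlation=0.8):
    """Return names of drugs that correlate weakly with the control.

    Drugs are only flagged here, never removed: the final decision
    is left to a biologist.
    """
    return sorted(
        name for name, profile in drug_profiles.items()
        if pearson(profile, control_profile) < min_correlation
    )

# Hypothetical signals across five mutations.
control = [1000, 2000, 3000, 4000, 5000]
drugs = {
    "drugA": [900, 2100, 2900, 4200, 4800],  # follows the control closely
    "drugB": [4000, 500, 3500, 300, 2000],   # erratic profile
}

print(flag_noisy_drugs(drugs, control))
```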
Filtering strong noisy drugs
Filtering weak noisy drugs
• Much harder to identify
• Idea: Weakly noisy drugs produce many false-positive hits with high significance
– Calculating p-value
– Order by significance
– Frequency of drugs in the top hits is an indicator
for weak noisiness
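
The counting step could look like the sketch below, assuming hits are available as (drug, mutation, p-value) tuples. The tuple layout, the function name and the values are illustrative only.

```python
from collections import Counter

def drug_frequency_in_top_hits(hits, top_n):
    """Count how often each drug appears among the top_n most significant hits."""
    top = sorted(hits, key=lambda hit: hit[2])[:top_n]  # smallest p-values first
    return Counter(drug for drug, _mutation, _p in top)

# Hypothetical hits: (drug, mutation, p-value).
hits = [
    ("drugA", "mut1", 0.0001),
    ("drugB", "mut2", 0.02),
    ("drugA", "mut3", 0.0003),
    ("drugC", "mut1", 0.001),
    ("drugA", "mut4", 0.0002),
]

counts = drug_frequency_in_top_hits(hits, top_n=3)
print(counts.most_common())
```

A drug that dominates the top of the list, like drugA here, would be a candidate for weak noisiness and worth a closer look.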
Top Drugs
Filter Mutations
Filter data below a detection limit
Ideas
• Filter by threshold: 30% of the data
– Just one dataset → no universal validity of the
threshold of about 250
• Filter by skewness: 17% of the data
• Filter by coefficient of variation: 12% of the data
Threshold Estimation
• Idea: Modification of skewness filter method
• Outliers of skewness are below the threshold
• Last non-outlier above the skewness outliers
are normal data
• Threshold should be approximately in the
middle of these points
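
One possible reading of these steps is sketched below. It assumes each mutation has a signal level (for example its median) and a skewness value, and it marks skewness outliers with a median plus k times MAD rule. That outlier rule, the value k = 3, the function name and the data are assumptions; the slides do not specify them.

```python
from statistics import median

def estimate_threshold(levels, skews, k=3.0):
    """Estimate a detection threshold from skewness outliers.

    Returns the midpoint between the highest level that is a skewness
    outlier and the lowest non-outlier level above it, or None if no
    such pair exists.
    """
    center = median(skews)
    mad = median(abs(s - center) for s in skews)
    fence = center + k * mad  # assumed outlier rule

    outlier_levels = [lv for lv, s in zip(levels, skews) if s > fence]
    if not outlier_levels:
        return None
    highest_outlier = max(outlier_levels)

    normal_above = [lv for lv, s in zip(levels, skews)
                    if s <= fence and lv > highest_outlier]
    if not normal_above:
        return None
    return (highest_outlier + min(normal_above)) / 2

# Hypothetical mutations: signal level and skewness of each.
levels = [100, 150, 200, 400, 800, 1200, 2000, 3000, 5000, 8000]
skews = [3.0, 2.8, 3.2, 0.2, 0.1, 0.3, 0.0, 0.2, 0.1, 0.15]

threshold = estimate_threshold(levels, skews)
print(threshold)
```

In this invented example the three low-signal mutations are strongly right-skewed, so the estimate falls between the levels 200 and 400.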
The Algorithm
• R-Demo
Results