Data Mining in the Pharmaceutical Industry
By Jerry Swartz
Introduction
• Since I am a remote student, if there are
questions, feel free to e-mail
[email protected]
Pharmaceutical Development
• Four Stages of Drug Development
– Research finds new drugs
– Development tests and predicts drug behavior
– Clinical trials test the drug in humans
– Commercialization takes drug and sells it to
likely consumers (doctors and patients)
• I’ll show an example for the Research,
Development, and Clinical Trials stages
Research Stage
• Huge user of data mining tools and
techniques
• Scientists run experiments to determine
activity of potential drugs
• Uses high-speed screening to test tens,
hundreds, or thousands of drugs very
quickly – this generates microarray data
Research Stage
• “Bioinformatics” is a general term for the
information processing activities on data
generated in Research Stage, especially
microarray data
• General goal is to find activity on relevant
genes or to find drug compounds that
have desirable characteristics (whatever
those may be)
Research Stage
• Data mining techniques used
– Clustering
– Classification
– Neural networks
Research Stage Example 1
• Goal: Determine compounds with similar activity
• Why: Compounds with similar activity may behave similarly
• When:
– Have known compound and are looking for
something better
– Don’t have known compound but have desired
activity and want to find compound that exhibits
this activity
Research Stage Example 1
• Sample data
Structure\Activity   Alpha   Beta   Delta   Gamma
CO2                  0.07    0.88   0.62    0.09
H2O                  0.80    0.54   0.32    0.79
H2O2                 0.34    0.91   0.44    0.40
Research Stage Example 1
• Cluster compounds that have similar
activity
• We like behavior of H2O and want to see
what compounds have similar activity
• Example derived from Application of
Nearest-Neighbor and Cluster Analyses in
Pharmaceutical Lead Discovery
• Clustering takes place based on similar
activity using Euclidean “distance.”
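As a minimal sketch of the Euclidean distance mentioned above, the following Python computes pairwise distances between the activity profiles from the sample data slide. The dictionary layout and the helper names `euclidean` and `nearest` are illustrative assumptions, not taken from the cited paper.

```python
import math
from itertools import combinations

# Activity profiles (Alpha, Beta, Delta, Gamma) copied from the sample data slide.
activity = {
    "CO2":  [0.07, 0.88, 0.62, 0.09],
    "H2O":  [0.80, 0.54, 0.32, 0.79],
    "H2O2": [0.34, 0.91, 0.44, 0.40],
}

def euclidean(a, b):
    """Straight-line distance between two equal-length activity vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Pairwise distances, keyed by an alphabetically sorted pair of names.
distances = {
    tuple(sorted(pair)): euclidean(activity[pair[0]], activity[pair[1]])
    for pair in combinations(activity, 2)
}

def nearest(name):
    """Return the compound whose activity profile is closest to `name`."""
    others = [c for c in activity if c != name]
    return min(others, key=lambda c: euclidean(activity[name], activity[c]))

for pair, d in sorted(distances.items()):
    print(pair, round(d, 3))
print("Nearest to H2O:", nearest("H2O"))
```

With all four activities included, H2O2 comes out as the nearest neighbor of H2O.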
Research Stage Example 1
• For simplicity, distance in example is simply
difference between Beta and Delta values, not
Euclidean
• Distances:
        CO2    H2O    H2O2
CO2     0.00   0.65   0.16
H2O     0.65   0.00   0.49
H2O2    0.16   0.49   0.00
Research Stage Example 1
• Dendrogram
[Figure: dendrogram with leaves H2O2, CO2, and H2O; distance axis marked at 0.00, 0.16, and 0.49]
Research Stage Example 1
• Conclusion:
– H2O2 and CO2 are most alike, but
– H2O2 behaves more like H2O than CO2
behaves like H2O
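The clustering behind this conclusion can be sketched as follows. This is a small agglomerative clustering routine run on the distance values from the slides. Single linkage (cluster distance is the distance between the closest members) is an assumption here, since the slides do not name a linkage rule, and the function name `single_linkage` is hypothetical.

```python
# Distance matrix copied from the slide (simplified, non-Euclidean distances).
dist = {
    frozenset(["CO2", "H2O"]): 0.65,
    frozenset(["CO2", "H2O2"]): 0.16,
    frozenset(["H2O", "H2O2"]): 0.49,
}

def single_linkage(labels, dist):
    """Agglomerative clustering; returns merges as (cluster_a, cluster_b, height)."""
    clusters = [frozenset([label]) for label in labels]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: cluster distance is the closest pair of members.
                d = min(dist[frozenset([a, b])]
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((sorted(clusters[i]), sorted(clusters[j]), d))
        merged = clusters[i] | clusters[j]
        # Drop the two merged clusters and add their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges

merges = single_linkage(["CO2", "H2O", "H2O2"], dist)
for a, b, height in merges:
    print(a, "+", b, "at", height)
```

Under this assumption CO2 and H2O2 merge first at 0.16, and H2O joins them at 0.49, the same heights that appear in the dendrogram.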
Research Stage Example 1
• Variations
– Example clustering performed on activity
– Clustering could have been performed on
structure (i.e. find chemically similar
compounds)
– Clustering could have been performed on
both structure and activity (called SAR –
Structure Activity Relationship, see next slide)
Research Stage Example 1
[Figure: diagram relating Structure and Activity (SAR)]
Development Stage
• Company thinks drug might have some
benefit
• Undergoes testing in animals, human
tissue to observe effect; maybe limited
human tests
• Determine how much drug to consume for
desired effect
• How dangerous is drug?
Development Stage
• Data mining techniques used
– Classification
– Neural networks
Development Stage Example 2
• Goal: Predict if treatment will aid patients
• Why: If drug will not aid patients, what purpose does drug serve?
• When:
– Have data supporting use of drug
– Have training data that shows effects of drug
(positive or negative)
– Want to be able to predict which patients will
benefit
Development Stage Example 2
• Will treatment help sickle cell anemia patients?
• We have information like gender, body weight,
disease state, etc.
• Feed these into neural network and predict
whether patient will benefit from drug.
• Example derived from Prediction of Sickle Cell
Anemia Patient’s Response to Hydroxyurea
Treatment Using ARTMAP Network
Development Stage Example 2
• Uses ARTMAP network, a type of neural
network based on adaptive resonance theory
• Instead of activation function, uses choice
function which compares two values
• Basically matches input to “template” and
generates output
• If input is similar enough to “template” it
generates the corresponding output
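The template matching described above can be sketched in Python. This borrows the choice and match functions of fuzzy ART as an assumption, since the slides do not say which ARTMAP variant the cited study used. The templates, the feature scaling, the `alpha` and `vigilance` values, and all function names are hypothetical and for illustration only.

```python
def complement_code(features):
    """Encode features in [0, 1] as (x1..xn, 1-x1..1-xn), as fuzzy ART commonly does."""
    return list(features) + [1 - x for x in features]

def fuzzy_and(a, b):
    """Component-wise minimum, the fuzzy AND used by fuzzy ART."""
    return [min(x, y) for x, y in zip(a, b)]

def choice(inp, template, alpha=0.001):
    """Choice function: how strongly a template responds to the input."""
    return sum(fuzzy_and(inp, template)) / (alpha + sum(template))

def match(inp, template):
    """Match score: fraction of the input that the template accounts for."""
    return sum(fuzzy_and(inp, template)) / sum(inp)

def predict(inp, templates, vigilance=0.75):
    """Return the label of the best-ranked template that passes the vigilance test."""
    ranked = sorted(templates, key=lambda t: choice(inp, t["w"]), reverse=True)
    for t in ranked:
        if match(inp, t["w"]) >= vigilance:
            return t["label"]
    return None  # no template is similar enough to the input

# Hypothetical templates over two features scaled to [0, 1] (made-up values).
templates = [
    {"w": complement_code([0.8, 0.7]), "label": "Yes"},
    {"w": complement_code([0.2, 0.3]), "label": "No"},
]
patient = complement_code([0.75, 0.6])
print(predict(patient, templates))
```

Raising `vigilance` makes the match stricter; if it is set high enough that no template qualifies, `predict` returns `None` rather than forcing an answer.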
Development Stage Example 2
• Imagine training data has one of two
classifications (Yes and No)
• Network is trained for the Yes classifications and
a snapshot is taken of the neural network.
• Network then trained for the No classifications
and another snapshot is taken.
• Output is Yes or No, depending on whether the
inputs are more similar to the “Yes” or the “No”
training data.
Development Stage Example 2
• ARTMAP
[Figure: ARTMAP diagram. Inputs: Weight, Height, Gender, Blood Pressure. Output: Patient Benefits? Notes: imagine array of weights, one for each “template”; template closest to input chosen; path of “least resistance” chosen for output.]
Clinical Trials Stage
• Company tests drugs in actual patients on
larger scale
• Must keep track of data about patient
progress
• Government wants to protect health of
citizens, so many rules govern clinical trials
• In USA, Food and Drug Administration
oversees trials.
Clinical Trials Stage
• Data mining techniques used
– Neural networks
Clinical Trials Stage
• Data is collected by pharmaceutical company but
undergoes statistical analysis to determine
success of trial
• Data reported to FDA inspected closely. Too
many negative reactions might indicate drug is
too dangerous – these are “adverse events”
• Adverse event might be medicine causing
drowsiness
• Data mining performed by FDA, not as much by
pharmaceutical companies
Clinical Trials Stage Example 3
• Goal: Detect when too many adverse events
occur or detect link between drug and
adverse event
• Why: Too many adverse events linked to a
drug might indicate drug is too dangerous or
health of patient is at risk
• When:
– As adverse events are reported to FDA
– Or when link is suspected
Clinical Trials Stage Example 3
• Is a drug causing “too many” adverse events?
• We have number of reports of adverse events
pertaining to drugs.
• Feed these into neural network and let network
lead us to what is “too many.”
• Example derived from Data mining in the US
Vaccine Adverse Event Reporting System
(VAERS): early detection of intussusception and
other events after rotavirus vaccination
Clinical Trials Stage Example 3
• Sample data – cells contain number of reports
linking drug and adverse event
Adverse Event\Drug   Tylenol   Motrin   Rotavirus
Coughing             1         2        1
Fever                4         5        2
Intussusception      1         3        5
Clinical Trials Stage Example 3
• Uses Bayesian neural network
• Prior probability is probability that any report
contains reference to adverse event
• Posterior probability is probability that report has
link between drug and adverse event
• Determines “strength” of link between adverse
event and drug (called Information Component
or IC)
• More complicated than appears: patient may
consume multiple drugs – which one caused
adverse event?
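As a rough illustration of the Information Component, the following Python computes the plain point estimate IC = log2(P(event, drug) / (P(event) · P(drug))) from the sample counts. It assumes each count is a separate report naming exactly one drug and one adverse event, which sidesteps the multiple-drug problem noted above. It also omits the Bayesian priors that a real Bayesian neural network would use to temper estimates based on few reports. The function name `information_component` is hypothetical.

```python
import math

# Report counts copied from the sample data slide: counts[event][drug].
counts = {
    "Coughing":        {"Tylenol": 1, "Motrin": 2, "Rotavirus": 1},
    "Fever":           {"Tylenol": 4, "Motrin": 5, "Rotavirus": 2},
    "Intussusception": {"Tylenol": 1, "Motrin": 3, "Rotavirus": 5},
}

def information_component(event, drug):
    """Point estimate IC = log2( P(event, drug) / (P(event) * P(drug)) )."""
    total = sum(sum(row.values()) for row in counts.values())
    p_event = sum(counts[event].values()) / total            # share of reports with event
    p_drug = sum(row[drug] for row in counts.values()) / total  # share of reports with drug
    p_joint = counts[event][drug] / total                     # share with both
    return math.log2(p_joint / (p_event * p_drug))

for event in counts:
    for drug in counts[event]:
        print(event, drug, round(information_component(event, drug), 2))
```

A positive IC means the pair is reported together more often than independence would predict; a negative IC means less often. With so few reports per cell, these raw values are only illustrative.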
Clinical Trials Stage Example 3
• Bayesian Neural Network
[Figure: Bayesian neural network diagram. Inputs: Adverse Event, Drug. Output: Strength of link between adverse event and drug.]
Clinical Trials Stage Example 3
• Could be solved using Bayes Theorem and
correlation techniques
• Number of possible drug/adverse event
combinations is very, very large
• Training data is from FDA, WHO databases
• Neural network hides statistical complexity
• Unfortunately details of NN like activation
function and hidden nodes are unknown
Data Mining Benefits
• Research Stage – instead of trial and error, data
mining can help find drugs that have desirable
activity
• Development Stage – data mining can help
predict who will benefit from drug
• Clinical Trials Stage – data mining protects
patients and helps regulate drug testing
• Commercialization Stage – data mining can
optimize use of sales resources like manpower,
advertising