BDL Case Studies
Predictive Analytics Case Studies
Summer 2014
Case Studies
1. OpenFDA – new API source for “adverse events”, how to use it, how to analyze the data
2. A/B test of Top 10 digital newspapers ad targeting strategies
OpenFDA Case: Necessity
“Given that those who will experience adverse events
sometimes number in the thousands, automated algorithms
to reliably identify such individuals and generate alerts to
patients, payers & prescribers may be the only feasible means
to conduct proactive safety enforcement on the necessary
scale.”
“Given the wide application of expert systems in other public
health and safety contexts, it seems likely that safety analysis
would gain in effectiveness by adopting automated
procedures in analyzing and reporting their data.”
Data Fusion to Know a Patient
OpenFDA Queries
End point:
https://api.fda.gov/drug/event.json?
search=patient.drug.openfda.pharm_class_epc:"nonsteroidal+antiinflammatory+drug"
Search for records where openfda.pharm_class_epc (pharmacologic class) contains nonsteroidal anti-inflammatory drug.
&count=patient.reaction.reactionmeddrapt.exact
Count the field patient.reaction.reactionmeddrapt (patient reactions).
Full query:
https://api.fda.gov/drug/event.json?search=patient.drug.openfda.pharm_class_epc:%22nonsteroidal+antiinflammatory+drug%22&count=patient.reaction.reactionmeddrapt.exact
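The query above can be assembled in code rather than by hand. This is a minimal sketch, not part of the original deck: the helper name build_query is hypothetical, and it only builds the URL string (no request is sent).

```python
from urllib.parse import quote

BASE_URL = "https://api.fda.gov/drug/event.json"

def build_query(field, phrase, count_field=None):
    """Return an openFDA drug-event URL that searches `field` for an exact phrase."""
    # Wrap the phrase in double quotes, percent-encode it, and use '+' for spaces.
    encoded = quote('"' + phrase + '"', safe="").replace("%20", "+")
    url = BASE_URL + "?search=" + field + ":" + encoded
    if count_field:
        url += "&count=" + count_field
    return url

url = build_query(
    "patient.drug.openfda.pharm_class_epc",
    "nonsteroidal antiinflammatory drug",
    count_field="patient.reaction.reactionmeddrapt.exact",
)
print(url)
```

Leaving out count_field returns the matching records themselves instead of a tally of one field.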
Important OpenFDA data types
What the drug is supposed to fix:
Established Pharmacologic Class (EPC) - pharm_class_epc
How the drug works:
Mechanism of Action (MOA) - pharm_class_moa
What the drug affects:
Physiologic Effect (PE) - pharm_class_pe
What is in the drug:
Chemical Structure (CS) - pharm_class_cs
https://api.fda.gov/drug/event.json?search=patient.drug.openfda.pharm_class_epc:%22Serotonin+and+Norepinephrine+Reuptake+Inhibitor%22
Parts of a returned record: Safety Report ID, Adverse Reactions, Biographical Data, Drug Information
More OpenFDA data types
How serious is the reaction: serious (1 for Yes, 2 for No)
"serious": "1",
"seriousnesscongenitalanomali": "1",
"seriousnessdeath": "1",
"seriousnessdisabling": "1",
"seriousnesshospitalization": "1",
"seriousnesslifethreatening": "1",
"seriousnessother": "1"
What is the drug indicated for: patient.drug.drugindication
Circumstances for taking drug: patient.drug.drugadditional
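To show how the seriousness flags read in practice, here is a small sketch that lists which criteria are set on one report. The function name and the sample report are made up for illustration; only the field names come from the slide.

```python
# Field name -> plain-language label (labels are this sketch's own wording)
SERIOUSNESS_FIELDS = {
    "seriousnesscongenitalanomali": "congenital anomaly",
    "seriousnessdeath": "death",
    "seriousnessdisabling": "disability",
    "seriousnesshospitalization": "hospitalization",
    "seriousnesslifethreatening": "life threatening",
    "seriousnessother": "other",
}

def seriousness_reasons(report):
    """Return the labels of every seriousness flag set to "1" on a report."""
    if report.get("serious") != "1":
        return []  # "2" (or a missing field) means the report is not marked serious
    return [label for field, label in SERIOUSNESS_FIELDS.items()
            if report.get(field) == "1"]

# Hypothetical report fragment, not real FDA data
sample = {"serious": "1", "seriousnesshospitalization": "1", "seriousnessother": "1"}
print(seriousness_reasons(sample))
```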
Who Died?
https://api.fda.gov/drug/event.json?search=patient.drug.openfda.pharm_class_epc:%22Serotonin+and+Norepinephrine+Reuptake+Inhibitor%22&count=patient.reaction.reactionoutcome
What is the result of the reaction:
patient.reaction.reactionoutcome
628 people have died taking anti-depressants / anti-anxiety drugs in the last 10 years.
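A count on patient.reaction.reactionoutcome comes back as numeric codes, where 5 means fatal. The sketch below turns such a response into labelled totals. The function name and the sample payload are illustrative assumptions, and the counts in the sample are invented, not FDA figures.

```python
# openFDA reactionoutcome codes
OUTCOMES = {
    1: "recovered/resolved",
    2: "recovering/resolving",
    3: "not recovered/not resolved",
    4: "recovered/resolved with sequelae",
    5: "fatal",
    6: "unknown",
}

def outcome_counts(payload):
    """Map a count response to {outcome label: number of reports}."""
    counts = {}
    for result in payload.get("results", []):
        # Assumes each term is a numeric code; anything unmapped becomes "other".
        label = OUTCOMES.get(int(result["term"]), "other")
        counts[label] = counts.get(label, 0) + result["count"]
    return counts

# Invented sample in the shape of a count response
sample_payload = {"results": [{"term": 6, "count": 40},
                              {"term": 1, "count": 25},
                              {"term": 5, "count": 3}]}
print(outcome_counts(sample_payload).get("fatal", 0))
```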
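A little Python to find out who died
import http.client
import json

domain = "api.fda.gov"
api_key = "YOUR_API_KEY"  # placeholder: supply your own openFDA key

# lines: one manufacturer name per line, with '+' in place of spaces
for line in lines:
    line = line.strip()
    company = line.replace("+", " ")
    score_serious = 0
    # openFDA search syntax is field:"term"; %22 is an encoded double quote
    uri = ('/drug/event.json?api_key=' + api_key +
           '&search=patient.drug.openfda.manufacturer_name:' +
           '%22' + line + '%22' + '&count=serious')
    try:
        conn = http.client.HTTPSConnection(domain)
        conn.request("GET", uri)
        res = conn.getresponse()
        if res.status == 200:
            json_res = json.loads(res.read())
            results = json_res.get("results")
            if results is not None:
                for result in results:
                    term = result.get("term")
                    if str(term) == "1":  # 1 = serious
                        score_serious = result.get("count")
    except (http.client.HTTPException, OSError, ValueError):
        pass  # skip this manufacturer if the request or the parse fails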
Repeat for all levels of reaction (seriousnessdeath, seriousnesshospitalization, and so on).
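One way to repeat the query for every level is to loop over the seriousness fields and collect one row of counts per manufacturer. This is a sketch under assumptions: the column names are borrowed from the model formula later in the deck, their mapping to openFDA fields is a guess, and severity_row, count_uri, flagged_count and fetch_json are hypothetical helpers.

```python
import http.client
import json
from urllib.parse import quote

# Column name -> openFDA field (this mapping is an assumption)
LEVELS = {
    "anomaly": "seriousnesscongenitalanomali",
    "disability": "seriousnessdisabling",
    "hospitalization": "seriousnesshospitalization",
    "lifethreat": "seriousnesslifethreatening",
    "death": "seriousnessdeath",
}

def count_uri(manufacturer, field):
    """Build the request path that counts `field` for one manufacturer."""
    name = quote('"' + manufacturer + '"', safe="").replace("%20", "+")
    return ("/drug/event.json?search=patient.drug.openfda.manufacturer_name:"
            + name + "&count=" + field)

def flagged_count(payload):
    """Return the count reported for term 1 in a count response, else 0."""
    for result in payload.get("results") or []:
        if str(result.get("term")) == "1":
            return result.get("count", 0)
    return 0

def fetch_json(uri, domain="api.fda.gov"):
    """GET the path and decode the JSON body; return {} on a non-200 status."""
    conn = http.client.HTTPSConnection(domain)
    try:
        conn.request("GET", uri)
        res = conn.getresponse()
        return json.loads(res.read()) if res.status == 200 else {}
    finally:
        conn.close()

def severity_row(manufacturer, fetch=fetch_json):
    """Return {column: count} across all seriousness levels for one manufacturer."""
    return {col: flagged_count(fetch(count_uri(manufacturer, field)))
            for col, field in LEVELS.items()}
```

Passing fetch as a parameter lets the row logic be checked with a stub before any live call is made, for example severity_row("Sandoz", fetch=lambda uri: {"results": [{"term": 1, "count": 7}]}).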
Use Clustering to find Groups
data = read.csv('/Users/jbaker/Documents/BDL/data/Medicare Data//modeldata/DeathbyMaker2.csv',header=TRUE,sep=',',row.names='Companies')
# Ward Hierarchical Clustering
d <- dist(data, method = "euclidean") # distance matrix
# note: "ward.D2" is the hclust method that applies Ward's criterion to unsquared distances
fit <- hclust(d, method="ward.D")
# Plot results
plot(fit, cex = 0.3) # display dendrogram
plot(as.dendrogram(fit),horiz=TRUE)
plot(as.dendrogram(fit),cex = 0.2,type="triangle")
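For readers who want to see what the clustering step does, here is a plain-Python sketch of Ward-style agglomerative clustering. It is not the deck's R code: it applies Ward's criterion to squared Euclidean distances through the Lance-Williams update (closer to R's "ward.D2" than "ward.D"), and the function names and toy rows are made up.

```python
from itertools import combinations

def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def ward_clusters(rows, k):
    """Merge the two closest clusters repeatedly until k clusters remain."""
    clusters = {i: [i] for i in range(len(rows))}
    dist = {frozenset((i, j)): squared_euclidean(rows[i], rows[j])
            for i, j in combinations(clusters, 2)}
    next_id = len(rows)
    while len(clusters) > k:
        pair = min(dist, key=dist.get)      # closest pair of clusters
        i, j = tuple(pair)
        d_ij = dist.pop(pair)
        ni, nj = len(clusters[i]), len(clusters[j])
        merged = clusters.pop(i) + clusters.pop(j)
        # Lance-Williams update for Ward's criterion on squared distances
        for m, members in clusters.items():
            nm = len(members)
            d_mi = dist.pop(frozenset((m, i)))
            d_mj = dist.pop(frozenset((m, j)))
            dist[frozenset((m, next_id))] = (
                (ni + nm) * d_mi + (nj + nm) * d_mj - nm * d_ij
            ) / (ni + nj + nm)
        clusters[next_id] = merged
        next_id += 1
    return sorted(sorted(c) for c in clusters.values())

# Toy rows (invented): two tight pairs and one outlier
rows = [[0, 0], [0, 1], [10, 10], [10, 11], [50, 50]]
print(ward_clusters(rows, 3))
```

The returned lists hold row indices, so they can be mapped back to company names in the same order as the input table.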
Plot to View
Add some groupings
# draw dendrogram with red borders around the 5 clusters
plot(fit, cex = 0.3) # rect.hclust draws on top of the hclust plot
rect.hclust(fit, k=5, border="red")
library(ape) # provides as.phylo
plot(as.phylo(fit), cex = 0.9, label.offset = 1)
plot(as.phylo(fit), type="cladogram", cex = 0.9, label.offset = 1)
plot(as.phylo(fit), type = "unrooted")
Save Clustering for further modeling
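# assumed: the group columns come from cutting the tree at 2 to 5 clusters
twogroups <- cutree(fit, k=2)
threegroups <- cutree(fit, k=3)
fourgroups <- cutree(fit, k=4)
fivegroups <- cutree(fit, k=5)
outcome = cbind(data,twogroups,threegroups,fourgroups,fivegroups)
# cluster numbers are arbitrary; check each cluster's profile before naming it
outcome$class[outcome$fivegroups=="5"] = "danger"
outcome$class[outcome$fivegroups=="4"] = "warning"
outcome$class[outcome$fivegroups=="3"] = "margin"
outcome$class[outcome$fivegroups=="2"] = "LowRisk"
outcome$class[outcome$fivegroups=="1"] = "NoRisk"
company = row.names(outcome)
outcome = cbind(company, outcome)
write.csv(outcome, file = '/Users/jbaker/Documents/BDL/data/Medicare Data//modeldata/outcome.csv')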
New Data Looks like this
Do some double checking
Get a measure of fit using “confusion-matrix”
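library(MASS) # provides lda
# data must include the class column saved in outcome.csv
lda_fit <- lda(class~anomaly+disability+hospitalization+lifethreat+death, data=data, CV=TRUE) # CV=TRUE: leave-one-out predictions
data.table = table(data$class, lda_fit$class) # confusion matrix: actual vs. predicted
data.summary = sum(diag(prop.table(data.table))) # share classified correctly
data.summary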
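The fit measure above is the share of cases on the diagonal of the confusion matrix. A small Python sketch of the same arithmetic, using invented labels rather than the deck's data:

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) pair of labels occurs."""
    return Counter(zip(actual, predicted))

def accuracy(actual, predicted):
    """Share of cases on the diagonal, i.e. predicted label equals actual label."""
    matrix = confusion_matrix(actual, predicted)
    correct = sum(n for (a, p), n in matrix.items() if a == p)
    return correct / len(actual)

# Invented labels for illustration only
actual = ["danger", "danger", "NoRisk", "warning"]
predicted = ["danger", "warning", "NoRisk", "warning"]
print(confusion_matrix(actual, predicted))
print(accuracy(actual, predicted))
```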
Use D3.js for better graphs
https://github.com/mbostock/d3/wiki/Gallery
Dendrogram for Clusters
Start by building tree (JSON) of results
Then follow D3 examples for this
Dendrogram for Packed Circles
Start by building tree (JSON) of results
{"name": "Adverse Reaction Severity",
 "children": [{"name": "problem",
  "children": [{"name": "more problem",
   "children": [{"name": "danger",
    "children": [{"name": "Sandoz", "size": 363843},
                 {"name": "Mylan Pharmaceuticals Inc", "size": 483546},
                 {"name": "Qualitest", "size": 379489},
Then follow D3 examples for this
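Building the tree by hand gets tedious, so here is a sketch that generates the name/children/size JSON from rows of results. Assumptions: build_tree is a hypothetical helper, the rows reuse the three companies and sizes shown above, and the output is a simpler one-level grouping than the nested tree on the slide.

```python
import json

def build_tree(root_name, rows):
    """Group (group, company, size) rows into a name/children/size hierarchy."""
    groups = {}
    for group, company, size in rows:
        groups.setdefault(group, []).append({"name": company, "size": size})
    return {
        "name": root_name,
        "children": [{"name": group, "children": leaves}
                     for group, leaves in groups.items()],
    }

rows = [
    ("danger", "Sandoz", 363843),
    ("danger", "Mylan Pharmaceuticals Inc", 483546),
    ("danger", "Qualitest", 379489),
]
tree = build_tree("Adverse Reaction Severity", rows)
print(json.dumps(tree, indent=1))
```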
EndPoint Reference
https://open.fda.gov/drug/event/reference/
Still needs better documentation
Data Fusion for Prescription Drugs
Product
Medicare Beneficiary Files
NDC/NDA
OpenFDA Adverse Drug Events
Medicare Drug Events Files
Behavior
NIH Structured Product Label
Places to Use NLP
Doctors' notes in the EHR (provide watch words, prognosis)
Structured Product Labeling - “the label”, “package insert” (the only place for detailed indications, contraindications, warnings, etc.)
Both are always written as long-form text
OpenFDA Use Cases
Adverse Effects Risk Model
Active Ingredients - Effectiveness Model
Advanced Drug-Drug Interaction Model
Importation Risk Model
Adherence Model
Gaps in Care Model
Case: A/B Test of Top 10 Digital News
Next Time
Big Data Design Principles
1. Catalog as many API sources of data as you can
2. Pick sources with common “folding points”
3. Research what is novel – e.g. pick your outcome
4. Sample small to test model(s)
5. Plan for Scale
6. Dress for Success (e.g. great graphics + good GUIs)
About Big Data Lens
Products & Custom Algorithms for Big Data problems
Specializing in Natural Language Processing & Machine Learning
Creates API-based big data indices to be used in decision making
content
data
model
prediction
API
Thank You
Brooke Aker