BDL Case Studies


Predictive Analytics Case Studies
Summer 2014
Case Studies
1. OpenFDA – new API source for “adverse events”, how to use it, how to analyze the data
2. A/B test of Top 10 digital newspapers ad targeting strategies
OpenFDA Case: Necessity
 “Given that those who will experience adverse events
sometimes number in the thousands, automated algorithms
to reliably identify such individuals and generate alerts to
patients, payers & prescribers may be the only feasible means
to conduct proactive safety enforcement on the necessary
scale.”
 “Given the wide application of expert systems in other public
health and safety contexts, it seems likely that safety analysis
would gain in effectiveness by adopting automated
procedures in analyzing and reporting their data.”
Data Fusion to Know a Patient
OpenFDA Queries
 End Point: https://api.fda.gov/drug/event.json?
 search=patient.drug.openfda.pharm_class_epc:"nonsteroidal+antiinflammatory+drug"
   Search for records where openfda.pharm_class_epc (pharmacologic class) contains nonsteroidal anti-inflammatory drug.
 &count=patient.reaction.reactionmeddrapt.exact
   Count the field patient.reaction.reactionmeddrapt (patient reactions).
 Full query:
https://api.fda.gov/drug/event.json?search=patient.drug.openfda.pharm_class_epc:%22nonsteroidal+antiinflammatory+drug%22&count=patient.reaction.reactionmeddrapt.exact
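The query above can be assembled programmatically rather than by hand. The following is a minimal sketch using only the Python standard library; the function name `build_event_query` is hypothetical and not part of any openFDA client.

```python
from urllib.parse import quote_plus

BASE_URL = "https://api.fda.gov/drug/event.json"


def build_event_query(search_field, phrase, count_field=None):
    """Build an openFDA drug event URL (hypothetical helper for illustration)."""
    # Wrap the phrase in double quotes for an exact phrase match;
    # quote_plus turns '"' into %22 and spaces into '+'.
    query = "search={}:{}".format(search_field, quote_plus('"' + phrase + '"'))
    if count_field:
        query += "&count=" + count_field
    return BASE_URL + "?" + query


url = build_event_query(
    "patient.drug.openfda.pharm_class_epc",
    "nonsteroidal antiinflammatory drug",
    "patient.reaction.reactionmeddrapt.exact",
)
print(url)
```

Leaving out `count_field` returns matching reports instead of counts.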
Important OpenFDA data types
 What the drug is supposed to fix:
 Pharmacologic Class (EPC) - pharm_class_epc
 How the drug works:
 Mechanism of Action (MOA) - pharm_class_moa
 What the drug affects:
 Physiologic Effect (PE) - pharm_class_pe
 What is in the drug:
 Chemical Structure (CS) - pharm_class_cs
https://api.fda.gov/drug/event.json?search=patient.drug.openfda.pharm_class_epc:%22Serotonin+and+Norepinephrine+Reuptake+Inhibitor%22
[Figure: sample query result annotated with Safety Report ID, Adverse Reactions, Biographical Data, Drug Information]
More OpenFDA data types
 How serious is the reaction: serious (1 for Yes, 2 for No)
 "serious": "1",
 "seriousnesscongenitalanomali": "1",
 "seriousnessdeath": "1",
 "seriousnessdisabling": "1",
 "seriousnesshospitalization": "1",
 "seriousnesslifethreatening": "1",
 "seriousnessother": "1"
 What is the drug indicated for: patient.drug.drugindication
 Whether the reaction abated when the drug was stopped or the dose reduced (dechallenge): patient.drug.drugadditional
Who Died?
 https://api.fda.gov/drug/event.json?search=patient.drug.openfda.pharm_class_epc:%22Serotonin+and+Norepinephrine+Reuptake+Inhibitor%22&count=patient.reaction.reactionoutcome
 What is the result of the reaction: patient.reaction.reactionoutcome
 628 people have died taking anti-depressants / anti-anxiety drugs in the last 10 years.
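A count query on reactionoutcome returns one row per outcome code. The sketch below shows one way to turn such a response into labeled totals. The code-to-label mapping follows the openFDA field reference as I understand it, the function name is hypothetical, and the sample response is illustrative: only the 628 figure comes from the slide, and the other count is made up.

```python
# reactionoutcome codes (assumed from the openFDA field reference)
OUTCOME_LABELS = {
    1: "recovered/resolved",
    2: "recovering/resolving",
    3: "not recovered/not resolved",
    4: "recovered/resolved with sequelae",
    5: "fatal",
    6: "unknown",
}


def outcome_counts(response):
    """Map a parsed count response to {label: count} (hypothetical helper)."""
    counts = {}
    for row in response.get("results") or []:
        label = OUTCOME_LABELS.get(int(row["term"]), "other")
        counts[label] = counts.get(label, 0) + row["count"]
    return counts


# Illustrative response, not real API output.
sample = {"results": [{"term": 5, "count": 628}, {"term": 6, "count": 10}]}
print(outcome_counts(sample).get("fatal", 0))
```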
Little Python to find out who died
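import http.client  # Python 3 name for httplib
import json

domain = "api.fda.gov"
api_key = "YOUR_API_KEY"  # placeholder: supply your own openFDA key

# `lines` holds manufacturer names, one per line, with "+" in place of spaces.
# The file name is hypothetical; the slide does not show where `lines` comes from.
with open("manufacturers.txt") as f:
    lines = f.readlines()

for line in lines:
    line = line.strip()
    company = line.replace("+", " ")
    score_serious = 0
    # openFDA search syntax separates field and value with ":".
    uri = ('/drug/event.json?api_key=' + api_key +
           '&search=patient.drug.openfda.manufacturer_name:' +
           '%22' + line + '%22' + '&count=serious')
    try:
        conn = http.client.HTTPSConnection(domain)
        conn.request("GET", uri)
        res = conn.getresponse()
        if res.status == 200:
            json_res = json.loads(res.read())
            results = json_res.get("results")
            if results is not None:
                for result in results:
                    term = result.get("term")
                    # "serious" is coded 1 for yes, 2 for no
                    if str(term) == "1":
                        score_serious = result.get("count")
        conn.close()
    except (OSError, http.client.HTTPException, ValueError) as err:
        print("Request failed for " + company + ": " + str(err))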
Repeat for all levels of reaction
.
.
.
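Repeating the query for every level of reaction amounts to swapping the count field. A minimal sketch, assuming the seriousness field names listed earlier in the deck; the helper name `count_uris` is hypothetical.

```python
from urllib.parse import quote

# Seriousness fields as they appear in openFDA adverse event records.
SERIOUSNESS_FIELDS = [
    "serious",
    "seriousnesscongenitalanomali",
    "seriousnessdeath",
    "seriousnessdisabling",
    "seriousnesshospitalization",
    "seriousnesslifethreatening",
    "seriousnessother",
]


def count_uris(company, api_key=None):
    """Return {field: request path} for one manufacturer (hypothetical helper)."""
    # Exact phrase match on the manufacturer name, percent-encoded.
    search = "patient.drug.openfda.manufacturer_name:" + quote('"' + company + '"')
    uris = {}
    for field in SERIOUSNESS_FIELDS:
        uri = "/drug/event.json?search=" + search + "&count=" + field
        if api_key:
            uri += "&api_key=" + api_key
        uris[field] = uri
    return uris


for field, uri in count_uris("Sandoz").items():
    print(field, uri)
```

Each path can then be fetched the same way as the single `count=serious` request, giving one row of counts per company.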
Use Clustering to find Groups
# Load per-company counts; company names become the row names
data = read.csv('/Users/jbaker/Documents/BDL/data/Medicare Data//modeldata/DeathbyMaker2.csv', header=TRUE, sep=',', row.names='Companies')
# Ward Hierarchical Clustering
d <- dist(data, method = "euclidean") # distance matrix
fit <- hclust(d, method="ward.D")
# Plot Results
plot(fit, cex = 0.3) # display dendrogram
plot(as.dendrogram(fit), horiz=T)
plot(as.dendrogram(fit), cex = 0.2, type="triangle")
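For readers unfamiliar with the distance matrix that feeds the clustering, here is a small Python sketch of pairwise Euclidean distance between rows. The company names and counts are hypothetical, and this shows only the distance step, not Ward clustering itself.

```python
import math


def euclidean_distance_matrix(rows):
    """Pairwise Euclidean distances between named rows of equal length."""
    names = list(rows)
    return {a: {b: math.dist(rows[a], rows[b]) for b in names} for a in names}


# Hypothetical severity counts per company (not real data).
rows = {
    "Company A": [3, 4, 0],
    "Company B": [0, 0, 0],
    "Company C": [3, 4, 12],
}
matrix = euclidean_distance_matrix(rows)
print(matrix["Company A"]["Company B"])
```

Companies with similar severity profiles end up close together, which is what the hierarchical clustering then merges first.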
Plot to View
Add some groupings
library(ape) # provides as.phylo
# draw dendrogram with red borders around the 5 clusters
rect.hclust(fit, k=5, border="red")
plot(as.phylo(fit), cex = 0.9, label.offset = 1)
plot(as.phylo(fit), type="cladogram", cex = 0.9, label.offset = 1)
plot(as.phylo(fit), type = "unrooted")
Save Clustering for further modeling
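# Cut the tree into 2 to 5 groups (assumed step; not shown on the slide)
twogroups <- cutree(fit, k=2)
threegroups <- cutree(fit, k=3)
fourgroups <- cutree(fit, k=4)
fivegroups <- cutree(fit, k=5)
outcome = cbind(data,twogroups,threegroups,fourgroups,fivegroups)
# Label each of the five clusters with a risk class
outcome$class[outcome$fivegroups=="5"] = "danger"
outcome$class[outcome$fivegroups=="4"] = "warning"
outcome$class[outcome$fivegroups=="3"] = "margin"
outcome$class[outcome$fivegroups=="2"] = "LowRisk"
outcome$class[outcome$fivegroups=="1"] = "NoRisk"
company = row.names(outcome)
outcome = cbind(company, outcome)
write.csv(outcome, file = '/Users/jbaker/Documents/BDL/data/Medicare Data//modeldata/outcome.csv')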
New Data Looks like this
.
.
.
Do some double checking
 Get a measure of fit using “confusion-matrix”
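library(MASS) # provides lda
# `data` is assumed to be the saved outcome table, which includes the class column
lda_fit <- lda(class~anomaly+disability+hospitalization+lifethreat+death, data=data, CV=TRUE)
data.table = table(data$class, lda_fit$class) # confusion matrix: actual vs. predicted
data.summary = sum(diag(prop.table(data.table))) # share of rows on the diagonal (accuracy)
data.summary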
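The fit measure above is the share of cases on the diagonal of the confusion matrix. A minimal Python sketch of the same arithmetic follows; the labels are hypothetical and the LDA step itself is not reproduced here.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count (actual, predicted) label pairs."""
    return Counter(zip(actual, predicted))


def accuracy(matrix):
    """Fraction of cases where the predicted label equals the actual label."""
    total = sum(matrix.values())
    correct = sum(n for (a, p), n in matrix.items() if a == p)
    return correct / total if total else 0.0


# Hypothetical cluster labels vs. cross-validated predictions.
actual = ["danger", "danger", "warning", "NoRisk"]
predicted = ["danger", "warning", "warning", "NoRisk"]
matrix = confusion_matrix(actual, predicted)
print(accuracy(matrix))
```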
Use D3js for better graphs
 https://github.com/mbostock/d3/wiki/Gallery
Dendrogram for Clusters
 Start by building tree (JSON) of results
 Then follow D3 examples for this
Dendrogram for Packed Circles
 Start by building tree (JSON) of results
{"name": "Adverse Reaction Severity",
 "children": [{"name": "problem",
  "children": [{"name": "more problem",
   "children": [{"name": "danger",
    "children": [{"name": "Sandoz", "size": 363843},
                 {"name": "Mylan Pharmaceuticals Inc", "size": 483546},
                 {"name": "Qualitest", "size": 379489},
 Then follow D3 examples for this
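A tree in the name/children/size shape used by the D3 examples can be generated from the clustering output instead of being typed by hand. The sketch below is one possible approach with a hypothetical helper name. It builds a single level of groups, whereas the slide nests groups more deeply. The three companies and sizes are the ones shown on the slide.

```python
import json


def build_tree(root_name, groups):
    """Build a name/children/size tree from {group: [(company, size), ...]}."""
    return {
        "name": root_name,
        "children": [
            {
                "name": group,
                "children": [{"name": c, "size": s} for c, s in members],
            }
            for group, members in groups.items()
        ],
    }


groups = {
    "danger": [
        ("Sandoz", 363843),
        ("Mylan Pharmaceuticals Inc", 483546),
        ("Qualitest", 379489),
    ],
}
tree = build_tree("Adverse Reaction Severity", groups)
print(json.dumps(tree, indent=2))
```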
EndPoint Reference
 https://open.fda.gov/drug/event/reference/
 Still needs better documentation
Data Fusion for Prescription Drugs
[Diagram: data sources fused around prescription drugs: Product; Medicare Beneficiary Files; NDC/NDA; OpenFDA Adverse Drug Events; Medicare Drug Events Files; Behavior; NIH Structured Product Label]
Places to Use NLP
 Doctors' notes in the EHR (provide watch words, prognosis)
 Structured Product Labeling: “the label”, “package insert” (the only place for detailed indications, contra-indications, warnings, etc.)
 Both are always written as long-form text
OpenFDA Use Cases
 Adverse Effects Risk Model
 Active Ingredients - Effectiveness Model
 Advanced Drug-Drug Interaction Model
 Importation Risk Model
 Adherence Model
 Gaps in Care Model
Case: A/B Test of Top 10 Digital News
 Next Time
Big Data Design Principles
1. Catalog as many API sources of data as you can
2. Pick sources with common “folding points”
3. Research what is novel – e.g. pick your outcome
4. Sample small to test model(s)
5. Plan for Scale
6. Dress for Success (e.g. great graphics + good GUIs)
About Big Data Lens
 Products & Custom Algorithms for Big Data problems
 Specializing in Natural Language Processing & Machine
Learning
 Creates API based big data indices to be used in decision
making
[Diagram: content, data, model, prediction, API]
Thank You
Brooke Aker