Data Mining on the Farm

Accelerating the search for a better pesticide
John B. Kinney, Senior Research Associate
DuPont Biosolutions Enterprise
© 2001, DuPont, Inc. - All Rights Reserved.
Spotfire User’s Conference, May 3-4 2001
DuPont Biosolutions Enterprise
Crop Protection Products*
Control weeds, insects, and plant diseases
Pioneer Hi-Bred
High performance seeds
Protein Technologies
Soy protein isolates used in the food industry
Qualicon
Food safety
* Focus of today’s talk
CPP Goals
Control pests
Efficaciously
Safely
Environmentally
Cost effectively
CPP Research & Data Styles
in vitro
in vivo
Field
in vivo CPP Research
“Treating the bed to cure the patient”
Plants in pots
Length of test is a factor
“Extra” data
Herbicide Test Unit
Control
Test Substance
MOG
BGC
FTI
VEL
BYG
PWX
CRL
Field Tests
Same as in vivo, but with less control!
“Extra” data
Degradation and movement in the environment are major issues
Data Issues
Biological variability
(Highly) Multivariate data
EC50 results are uncommon
Historical data is valuable
Successful Applications of Data Visualization
Sourcing: Preformatted data sets for sample acquisition analysis
Hit Followup: R-group visualization and analysis
Lead Optimization: Color-coded reports for rapid, high-dimensional comparisons
Browsing Acquisition Analysis Data
Challenge: Characterize and evaluate offerings from compound brokers and collaborators
Solution: External system to characterize offerings and build tables for browsing in Spotfire
Minimal interface...
User selection from existing “evaluation tables”
Spotfire for browsing
Parallel Synthesis Hit Followup
Visualization and analysis of combinatorial library
Row and Column layout useful, but not chemically relevant!
Merging synthetic schemes combined with biology
Hansch-style characterization often helpful for identifying trends and features
[Structure diagram; surviving labels: R1, R2, N]
R1 == methyl, ethyl, propyl, etc
R2 == -F, -Cl, -Br, -I
Fragment properties and whole molecule data can provide insights
Plate layout vs. Fragment Data
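The contrast between plate layout and fragment data can be sketched in a few lines. This is a minimal illustration, not the system described in the talk. It assumes a hypothetical plate where rows carry the R1 group and columns carry the R2 group, and it attaches a fragment descriptor to each well so the table can be browsed by chemistry instead of by position. The R1 and R2 lists come from the slide. The plate orientation, the function name `fragment_table`, and the hydrophobicity numbers are assumptions. The numbers are placeholders roughly in line with published Hansch pi constants and should be replaced from a vetted source before any real use.

```python
from itertools import product

# R-group lists are from the slide; mapping rows to R1 and columns to R2
# is an assumption made for this sketch.
R1_BY_ROW = {"A": "methyl", "B": "ethyl", "C": "propyl"}
R2_BY_COLUMN = {1: "F", 2: "Cl", 3: "Br", 4: "I"}

# Placeholder hydrophobicity values (approximate Hansch pi constants),
# included for illustration only.
HANSCH_PI = {
    "methyl": 0.56, "ethyl": 1.02, "propyl": 1.55,
    "F": 0.14, "Cl": 0.71, "Br": 0.86, "I": 1.12,
}


def fragment_table():
    """Return one record per well, annotated with fragment descriptors."""
    records = []
    for row, col in product(R1_BY_ROW, R2_BY_COLUMN):
        r1 = R1_BY_ROW[row]
        r2 = R2_BY_COLUMN[col]
        records.append({
            "well": f"{row}{col}",   # plate position (not chemically relevant)
            "R1": r1,                # fragment identity (chemically relevant)
            "R2": r2,
            "pi_R1": HANSCH_PI[r1],
            "pi_R2": HANSCH_PI[r2],
            "pi_sum": round(HANSCH_PI[r1] + HANSCH_PI[r2], 2),
        })
    return records


if __name__ == "__main__":
    for record in fragment_table():
        print(record)
```

Once each well carries its fragment columns, biological results keyed by well can be joined on and plotted against `pi_R1` or `pi_R2` instead of row and column.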
Lead Optimization
Numerous test and characterization values for each compound
History of complex, printed data reports
PRIMARY PLANT RESPONSE (WEEDS)
INCODE = CPD1   DEPT = 8   DATE = 891127   SUBMITTER =
N.B = 056898   N.B.PAGE = 021   AMT = .21G   % = 100   FORM =   LEAD AREA =
[Structure box] INCODE = CPD1   /MOLNM   /Info = CHEMICAL NAME AVAILABLE UPON REQUEST
Test columns: YY/MM/DD, TYPE, RATE, UNITS
90/01/02   POST   1.0 KG/HA
90/01/02   PRE    2.0 KG/HA
Responses by species (values in extraction order; alignment was lost where one group spans several species):
MORN GLORY: 90H (POST), 0 (PRE)
COCKL BUR: 70H, 10H
VELV LEAF, PIG WEED: 70H, 30H
CRAB GRASS: 10C, 0
GIANT FOXTL, FOXTL MILLT, B Y GRASS, CHEAT GRASS, DOWNY BROME, WILD OATS: 10C, 0, 40G, 0, 20H, 0, 0, 0
SOR GUM: 30C, 30C
COMMENT: HERBICIDE
HeLo
Project Overview w/Heat Maps
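One way to get from printed ratings such as 90H or 10C to a color-coded overview is sketched below. This is an illustration under stated assumptions, not the method used in the talk. It assumes the leading number is a response on a 0 to 100 scale and the trailing letter is a symptom code; the slides do not define either. The function names `parse_rating` and `color_bin` and the color thresholds are hypothetical.

```python
import re

# A rating is 1-3 digits, optionally followed by one capital letter.
# Interpreting the letter as a symptom code is an assumption.
RATING_PATTERN = re.compile(r"(\d{1,3})([A-Z]?)")


def parse_rating(code):
    """Split a rating such as '90H' into (90, 'H'); '0' becomes (0, None)."""
    match = RATING_PATTERN.fullmatch(code.strip())
    if match is None:
        raise ValueError(f"unrecognized rating: {code!r}")
    value = int(match.group(1))
    if value > 100:
        raise ValueError(f"rating out of range: {code!r}")
    return value, (match.group(2) or None)


def color_bin(value):
    """Map a numeric response to a heat-map color (hypothetical thresholds)."""
    if value >= 70:
        return "red"
    if value >= 30:
        return "yellow"
    return "green"


if __name__ == "__main__":
    # Rating strings taken from the printed report shown earlier.
    for code in ["90H", "0", "70H", "10C", "40G", "30C"]:
        value, symptom = parse_rating(code)
        print(code, value, symptom, color_bin(value))
```

Keeping the number and the letter in separate columns lets the number drive cell color while the letter stays available as a label or filter.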
Future Challenges
Better data extraction/formatting techniques
Expanding data warehouse to include non-traditional data sources
Computer screen real estate!
Acknowledgements
At the risk of missing someone...
Kevin Kranis (retired)
Laurie Christianson
Dan Kleier
The entire Discovery Organization
-- They generated the data!