Data Mining on the Farm
Accelerating the search for a better pesticide
John B. Kinney, Senior Research Associate
DuPont Biosolutions Enterprise
© 2001, DuPont, Inc. - All Rights Reserved.
Spotfire User’s Conference, May 3-4, 2001
DuPont Biosolutions Enterprise
Crop Protection Products*
Control weeds, insects, and plant diseases
Pioneer Hi-Bred
High performance seeds
Protein Technologies
Soy protein isolates used in the food industry
Qualicon
Food safety
* Focus of today’s talk
CPP Goals
Control pests
Efficaciously
Safely
Environmentally
Cost effectively
CPP Research & Data Styles
in vitro
in vivo
Field
in vivo CPP Research
“Treating the bed to cure the patient”
Plants in pots
Length of test is a factor
“Extra” data
Herbicide Test Unit
Control
Test Substance
MOG, BGC, FTI, VEL, BYG, PWX, CRL
Field Tests
Same as in vivo, but with less control!
“Extra” data
Degradation and movement in the environment are major issues
Data Issues
Biological variability
(Highly) Multivariate data
EC50 results are uncommon
Historical data is valuable
Successful Applications of Data Visualization
Sourcing: Preformatted data sets for sample acquisition analysis
Hit Followup: R-group visualization and analysis
Lead Optimization: Color-coded reports for rapid, high-dimensional comparisons
Browsing Acquisition Analysis Data
Challenge: Characterize and evaluate offerings from compound brokers and collaborators
Solution: External system to characterize offerings and build tables for browsing in Spotfire
Minimal interface...
User selection from existing “evaluation tables”
Spotfire for browsing
Parallel Synthesis Hit Followup
Visualization and analysis of combinatorial library
Row and Column layout useful, but not chemically relevant!
Merging synthetic schemes combined with biology
Hansch-style characterization often helpful for identifying trends and features
[Structure diagram with substituent positions R1 and R2]
R1 = methyl, ethyl, propyl, etc.
R2 = -F, -Cl, -Br, -I
Fragment properties and whole molecule data can provide insights
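The idea of replacing the plate's row and column layout with a chemically meaningful one can be sketched as follows. This is only an illustration, not the system described in the talk: the plate contents and activity numbers are invented, the function names are hypothetical, and atomic number stands in for whatever fragment property (for example a Hansch-style substituent constant) would be used in practice.

```python
from collections import defaultdict

# Chemically meaningful orderings for the two substituent positions on the slide.
R1_ORDER = ["methyl", "ethyl", "propyl"]
R2_ORDER = ["-F", "-Cl", "-Br", "-I"]

# Stand-in fragment property: atomic number of each halogen.
# A real analysis would likely use a substituent constant instead (assumption).
R2_ATOMIC_NUMBER = {"-F": 9, "-Cl": 17, "-Br": 35, "-I": 53}

# Invented example data: plate well -> (R1, R2, percent control). Not from the talk.
PLATE = {
    "A1": ("propyl", "-Br", 80),
    "A2": ("methyl", "-F", 10),
    "A3": ("ethyl", "-I", 95),
    "B1": ("methyl", "-Br", 60),
    "B2": ("propyl", "-F", 20),
    "B3": ("ethyl", "-Cl", 40),
}


def fragment_grid(plate):
    """Re-index wells by (R1, R2) and lay them out in chemical order.

    Rows follow R1_ORDER, columns follow R2_ORDER; untested combinations are None.
    """
    by_fragment = {(r1, r2): activity for r1, r2, activity in plate.values()}
    return [[by_fragment.get((r1, r2)) for r2 in R2_ORDER] for r1 in R1_ORDER]


def mean_activity_by_r2(plate):
    """Average activity per R2 fragment, paired with the fragment property."""
    groups = defaultdict(list)
    for _r1, r2, activity in plate.values():
        groups[r2].append(activity)
    return [
        (r2, R2_ATOMIC_NUMBER[r2], sum(groups[r2]) / len(groups[r2]))
        for r2 in R2_ORDER
        if r2 in groups
    ]
```

Calling `fragment_grid(PLATE)` gives one row per R1 group and one column per R2 group, so trends along a homologous series become visible in a way the well positions hide. `mean_activity_by_r2(PLATE)` pairs each fragment's property with its average activity, which is the kind of table one might plot to look for a trend.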
Plate layout vs. Fragment Data
Lead Optimization
Numerous test and characterization values for each compound
History of complex, printed data reports
PRIMARY PLANT RESPONSE (WEEDS)
INCODE = CPD1   DEPT = 8   DATE = 891127   SUBMITTER =
N.B = 056898   N.B.PAGE = 021   AMT = .21G   % = 100   FORM =   LEAD AREA =
MOLNM =   Info = CHEMICAL NAME AVAILABLE UPON REQUEST
Treatments: 90/01/02 POST at 1.0 KG/HA; 90/01/02 PRE at 2.0 KG/HA
Species columns: MORN GLORY, COCKL BUR, VELV LEAF, PIG WEED, CRAB GRASS, GIANT FOXTL, FOXTL MILLT, B Y GRASS, CHEAT GRASS, DOWNY BROME, WILD OATS, SOR GUM
Responses are coded values such as 90H, 70H, 30H, 10C, 40G, and 0
COMMENT: HERBICIDE
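Before values like 90H or 10C from a printed report of this kind can be browsed or color-coded, they have to be split into a number and a code. The sketch below shows one way to do that. It assumes, without confirmation from the talk, that each entry is an integer from 0 to 100 optionally followed by a single uppercase letter; the meaning of the letters is not given in the slides, so the code only passes them through. The function name is hypothetical.

```python
import re

# Assumed entry format: an integer 0-100, optionally followed by one uppercase letter.
_RESPONSE = re.compile(r"(\d{1,3})([A-Z]?)")


def parse_response(code):
    """Split a coded response such as '90H' into (number, letter or None)."""
    match = _RESPONSE.fullmatch(code.strip())
    if match is None:
        raise ValueError(f"unrecognized response code: {code!r}")
    value = int(match.group(1))
    if value > 100:
        raise ValueError(f"response out of range: {code!r}")
    # The letter's meaning is not defined in the source; return it unchanged.
    return value, (match.group(2) or None)
```

For example, `parse_response("90H")` returns `(90, "H")` and `parse_response("0")` returns `(0, None)`. Rejecting anything that does not fit the assumed pattern is deliberate: historical reports often contain surprises, and a loud failure is easier to audit than a silent misread.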
HeLo
Project Overview with Heat Maps
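A heat map of this kind amounts to binning each numeric result into a color so that many tests per compound can be compared at a glance. A minimal sketch follows, with bin edges and color names chosen purely for illustration (they are not taken from the talk) and hypothetical function names.

```python
# Hypothetical bins: (lower bound, color), checked from highest to lowest.
BINS = [(80, "red"), (50, "orange"), (20, "yellow"), (0, "white")]


def color_for(value):
    """Map a 0-100 response to the color of the first bin it reaches."""
    if not 0 <= value <= 100:
        raise ValueError(f"value out of range: {value!r}")
    for lower, color in BINS:
        if value >= lower:
            return color


def heat_map(results):
    """Convert {compound: {test: value}} into {compound: {test: color}}."""
    return {
        compound: {test: color_for(value) for test, value in tests.items()}
        for compound, tests in results.items()
    }
```

With an input such as `{"CPD1": {"MORN GLORY": 90, "CRAB GRASS": 10}}`, `heat_map` returns `{"CPD1": {"MORN GLORY": "red", "CRAB GRASS": "white"}}`. In practice the binning would be done by the visualization tool itself; the point here is only that a small, fixed set of colors turns a wide table into something scannable.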
Future Challenges
Better data extraction/formatting techniques
Expanding data warehouse to include non-traditional data sources
Computer screen real estate!
Acknowledgements
At the risk of missing someone...
Kevin Kranis (retired)
Laurie Christianson
Dan Kleier
The entire Discovery Organization
-- They generated the data!