Department of Biomedical Informatics
Nanoinformatics: Advancing in silico Cancer Research
David E. Jones
John D. Morgan Award
Research partially supported by NLM
Training Grant #T15LM007124
What is Nanotechnology?
• The study of controlling and manipulating matter
at the atomic or molecular level
• Focuses on the development of materials,
devices, and other structures at the nanoscale
• Very diverse field that bridges multiple sciences
– Molecular Biology
– Organic Chemistry
– Molecular Physics
– Material Science
http://www.nanoinstitute.utah.edu/
Nanomedicine Defined
• The medical application of nanotechnology
used in the diagnosis, treatment, and prevention
of diseases in the clinical setting
Science-to-Informatics
Clinical Informatics
Bioinformatics
?
Nanoinformatics
• Defined in 2007 by the United States
National Science Foundation
– Improve research in the field of nanotechnology by
using informatics techniques and tools on
nanoparticle data and information
http://www.nsf.gov/
Background: Nanoinformatics
• National Nanotechnology Initiative
– Enhance quality and availability of data
• Data acquisition, analysis, and sharing
– Expand theory, modeling, and simulation
• Structural and predictive models
– Informatics infrastructure
• Semantic search and sharing of data/models
• Web-enabled tools for collaboration
http://www.nano.gov/node/681
Nanomedicine Areas of Focus
In Vitro Detection
Theranostics
Nanocarriers
http://www.nanotech-now.com
http://www.universityofcalifornia.edu
http://www.wikipedia.org/
Why are Nanocarriers so Important?
• Nanomedicine delivery devices are important to
the future of cancer treatment
– Promising due to their properties
• Suitable size, high solubility, and ability to change design
Tanner P, et al. Polymeric Vesicles: From Drug Carriers to Nanoreactors and Artificial Organelles. 2011.
Why are Nanocarriers so Important?
• Enhanced permeability and retention (EPR)
effect
Park K. Polysaccharide-based near-infrared fluorescence nanoprobes for cancer diagnosis. 2012.
http://krauthammerlab.med.yale.edu/imagefinder/Figure.external?sp=443431&state:Home=BrO0ABXcTAAAAAQAADHNlYXJjaFN0cmluZ3QAEG1pUiogYnJhaW4gaGVhcnQ%3D
Types of Nanocarriers
Cho K, et al. Therapeutic Nanoparticles for Drug Delivery in Cancer. 2008.
Poly(amido amine) Dendrimers
• PAMAM dendrimers are particularly promising
– Have potential for oral delivery
– Cancer drugs can bind to the surface and interior of
the molecule
– The molecule's surface can easily be modified
http://www.dendritech.com
Design Challenges for Nanocarriers
http://bioserv.rpbs.univ-paris-diderot.fr/services/FAF-Drugs/admetox.html
For Small Molecule Pharmaceutics
• Well-known in silico approaches exist
• Quantitative Structure Activity Relationships
(QSAR)
– Analyze the structures and functions of
pharmaceutical and chemical compounds
• Used for many different bioactive molecules in the fields of
medicinal chemistry and cheminformatics
• This method has seen limited use for empirically
calculating biochemical properties of nanoparticles
Nanoinformatics Challenges
• These approaches have not been applied to
nanocarriers for several reasons
– Availability of nanoparticle data
– Actual atomic size of the nanoparticle structures
– Computational capability and algorithms
http://www.nanoinstitute.utah.edu/
Ultimate Goal of this Research
• Demonstrate that in silico aided design of
nanocarriers is possible by developing and
adapting advanced informatics techniques
• Utilize state-of-the-art data mining and machine
learning techniques to develop a model linking
PAMAM dendrimer cytotoxicity to molecular
descriptors and structure of the nanoparticle
Where Do We Start?
• Availability of Nanoparticle Data
– Databases containing information relevant to
biomedical nanoparticles are critical for secondary
uses such as data mining and predictive modeling
caNanoLab
• Database containing information relevant to
nanomedicine on nanoparticles and their
properties
• Developed by the National Cancer Institute for
sharing nanoparticle information
https://cananolab.nci.nih.gov/caNanoLab/
caNanoLab
• Issues
– Limited number of nanoparticles (not all inclusive or
current)
– Incomplete information regarding the chemical and
physical properties of nanoparticles
– No simple way to download the data to apply
machine learning or statistical analyses
– The system cannot be queried, and no data
model exists for comparing the properties of a
molecule with its biochemical activity
Data Not Easily Accessible
• Availability of nanoparticle data
– To our knowledge, there is no authoritative, up-to-date database
– Manual extraction is not feasible
Natural Language Processing (NLP)
• Information extraction method
– Used to automatically extract information from an
unstructured (free-text) document
– Shown to be successful in extracting information from
related biomedical fields
http://www.conversational-technologies.com/nldemos/nlDemos.html
Nano-NLP
• Garcia-Remesal, Maojo, and colleagues
– Text classification method
– Identified:
• Nanoparticle names
• Routes of exposure
• Toxic effects
• Particle targets
– Successful, but qualitative rather than quantitative
Our Approach
• Two-Step process
– Text Classification
– Text Extraction
Text Extraction Purpose
• Extract numeric values associated with PAMAM
dendrimer properties from the cancer
nanomedicine literature
– NanoSifter
• 10 properties taken from the NanoParticle Ontology (NPO)
• Hydrodynamic diameter, particle diameter, molecular weight,
zeta potential, cytotoxicity, IC50, cell viability, encapsulation
efficiency, loading efficiency, and transfection efficiency
Jones DE, Igo S, Hurdle J, Facelli JC. Automatic Extraction of Nanoparticle Properties Using Natural
Language Processing: NanoSifter an Application to Acquire PAMAM Dendrimer Properties. PloS one.
2014;9(1):e83932. Epub 2014/01/07.
Properties to be Extracted
Variable: Definition
• Hydrodynamic Diameter: The hydrodynamic size which is the diameter of a particle or molecule (approximated as a sphere) in an aqueous solution.
• Particle Diameter: Diameter which inheres in a particle.
• Molecular Weight: The sum of the relative atomic masses of the constituent atoms of a molecule.
• Zeta Potential: The potential difference between the bulk dispersion medium (liquid) and the stationary layer of liquid near the surface of the dispersed particulate.
• Cytotoxicity: Toxicity that impairs or damages cells, and it is a desired property for the killing of growing tumor cells.
• IC50: A measure of toxicity which is the concentration of a drug or inhibitor that is required to inhibit a biological process or a participant's activity in that process by half.
• Cell Viability: Viability of a cell to proliferate, grow, divide, or repair damaged cell components.
• Encapsulation Efficiency: The efficiency inhering in a nanomaterial or supramolecular structure by virtue of its capacity to encapsulate an amount of molecular entity, isotope or nanomaterial.
• Loading Efficiency: A quality inhering in a material entity by virtue of it having the capacity to carry an amount of another material entity.
• Transfection Efficiency: The efficiency inhering in a bearer's ability to facilitate transfection.
NanoSifter Extraction Pipeline
NanoSifter Performance
Nanoparticle Property Term   TP    FP   FN   Recall   Precision   F-measure
Encapsulation Efficiency     1     0    0    1.00     1.00        1.00
Hydrodynamic Diameter        8     0    0    1.00     1.00        1.00
Loading Efficiency           5     0    0    1.00     1.00        1.00
Zeta Potential               41    0    1    0.98     1.00        0.99
Cytotoxicity                 124   18   1    0.99     0.87        0.93
Molecular Weight             143   23   2    0.99     0.86        0.92
Particle Diameter            211   39   1    1.00     0.84        0.91
IC50                         47    8    1    0.98     0.85        0.91
Cell Viability               78    31   0    1.00     0.72        0.83
Transfection Efficiency      19    13   1    0.95     0.59        0.73
NanoSifter Performance
Type of Average   Recall   Precision   F-measure
Macro             0.99     0.87        0.92
Micro             0.99     0.84        0.91
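The macro and micro averages can be reproduced from the per-property counts. The sketch below is illustrative only and is not the NanoSifter code; the function names are hypothetical, and it assumes the macro F-measure is the mean of the per-property F-measures, which is the convention that matches the reported 0.92.

```python
# (TP, FP, FN) per property term, copied from the NanoSifter performance table.
COUNTS = {
    "Encapsulation Efficiency": (1, 0, 0),
    "Hydrodynamic Diameter": (8, 0, 0),
    "Loading Efficiency": (5, 0, 0),
    "Zeta Potential": (41, 0, 1),
    "Cytotoxicity": (124, 18, 1),
    "Molecular Weight": (143, 23, 2),
    "Particle Diameter": (211, 39, 1),
    "IC50": (47, 8, 1),
    "Cell Viability": (78, 31, 0),
    "Transfection Efficiency": (19, 13, 1),
}


def prf(tp, fp, fn):
    """Return (recall, precision, F-measure) for one set of counts."""
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    total = recall + precision
    f_measure = 2 * recall * precision / total if total else 0.0
    return recall, precision, f_measure


def macro_average(counts):
    """Average the per-property scores; every property weighs the same."""
    rows = [prf(*c) for c in counts.values()]
    return tuple(sum(column) / len(rows) for column in zip(*rows))


def micro_average(counts):
    """Pool the counts first; frequent properties dominate the result."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return prf(tp, fp, fn)


print([round(v, 2) for v in macro_average(COUNTS)])  # [0.99, 0.87, 0.92]
print([round(v, 2) for v in micro_average(COUNTS)])  # [0.99, 0.84, 0.91]
```

Micro precision is lower than macro precision because the properties with the most mentions (particle diameter, molecular weight, cytotoxicity) also contribute most of the false positives.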
NanoSifter Observations
• Recall vs. precision
– We favor high recall because it means we
capture most instances (i.e., miss very few in
the literature)
– The tradeoff is that the number of false positives
increases, which in turn reduces precision
NanoSifter Limitations
• Data extracted by our method are not always
directly associated with a dendrimer
nanoparticle
• The system pairs a nanoparticle property term only
with a single numeric value annotation before and
after it (a co-reference resolution limitation)
• Cannot extract data from tables and figures
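The pairing rule described above, and why it limits precision, can be illustrated with a small sketch. This is not the NanoSifter implementation: the regular expressions, the function name, and the example sentence are assumptions made for illustration, and the term list is taken from the ten properties named earlier.

```python
import re

# The ten property terms targeted by the extraction step.
PROPERTY_TERMS = [
    "hydrodynamic diameter", "particle diameter", "molecular weight",
    "zeta potential", "cytotoxicity", "ic50", "cell viability",
    "encapsulation efficiency", "loading efficiency",
    "transfection efficiency",
]

# Longest terms first so multi-word terms win over shorter alternatives.
TERM_RE = re.compile(
    "|".join(re.escape(t) for t in sorted(PROPERTY_TERMS, key=len, reverse=True)),
    re.IGNORECASE,
)

# A number that is not glued to a preceding letter or digit,
# so the "50" in "IC50" and the "4" in "G4" are skipped.
NUMBER_RE = re.compile(r"(?<![\w.])[-+]?\d+(?:\.\d+)?")


def pair_terms_with_numbers(sentence):
    """Pair each property term with the nearest number before and after it."""
    numbers = [(m.start(), m.end(), m.group()) for m in NUMBER_RE.finditer(sentence)]
    pairs = []
    for term in TERM_RE.finditer(sentence):
        before = [n for n in numbers if n[1] <= term.start()]
        after = [n for n in numbers if n[0] >= term.end()]
        pairs.append({
            "term": term.group().lower(),
            "before": before[-1][2] if before else None,
            "after": after[0][2] if after else None,
        })
    return pairs


sentence = ("The G4 PAMAM dendrimer had a zeta potential of -25.3 mV "
            "and a hydrodynamic diameter of 4.5 nm.")
for pair in pair_terms_with_numbers(sentence):
    print(pair)
```

In this example the value "-25.3" is picked up twice: correctly as the number after "zeta potential", and spuriously as the number before "hydrodynamic diameter". Purely positional pairing of this kind is one way false positives arise.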
NanoSifter Discussion
• Next steps
– Continue work on text classification methods to
improve the precision of the system
– Expand the property terms and numeric values that
the system targets
– Annotate and extract information from other
subclasses of nanoparticles
– Add negation analysis to
our system
Text Classification Purpose
• Identify and annotate entities in the unstructured
nanomedicine literature
– Augment the text extraction method
– Improve the precision of extracted property data
Text Classification Pipeline
Now Have the Necessary Data…
• Data mining and predictive modeling
– Previous studies
• Liu et al. analyzed a number of attributes of a variety of
nanoparticles in order to predict post-fertilization mortality in
zebrafish
• Horev-Azaria and colleagues used predictive modeling to
explore the effect of cobalt-ferrite nanoparticles on the
viability of seven different cell lines
– This approach has not yet been applied to
predicting the cytotoxicity of PAMAM
dendrimers
In Silico Platform
Jones DE, Ghandehari H, Facelli JC. Data Mining in Nanomedicine: Predicting Toxicity of
PAMAM Dendrimers by Molecular Descriptors and Structure. Submitted 2014.
PAMAM Dendrimers
G4
G3
PAMAM Dendrimers
G5
Molecular Descriptors
Sample Name   Molecular Weight (g/mol)   Aliphatic Atom Count   Refractivity
G3 PAMAM      6908.8403                  484                    1847.28
G4 PAMAM      14214.1651                 996                    3798.47
G5 PAMAM      28824.8147                 2020                   7700.85
Classification Analysis
• Initial analysis
Classifier                      Precision   Recall   F-Measure   Accuracy
J48                             0.838       0.835    0.836       83.5%
Bagging                         0.836       0.835    0.835       83.5%
Filtered Classifier             0.789       0.748    0.750       74.8%
LWL                             0.775       0.738    0.741       73.8%
SMO                             0.738       0.738    0.725       73.8%
Classification via Regression   0.724       0.728    0.723       72.8%
DTNB                            0.691       0.670    0.674       67.0%
NBTree                          0.681       0.670    0.673       67.0%
Decision Table                  0.678       0.660    0.664       66.0%
Naïve Bayes                     0.621       0.602    0.607       60.2%
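In every row of the initial analysis, recall equals accuracy. That is what class-weighted averaging produces, so the sketch below assumes (the slides do not state it) that the reported precision, recall, and F-measure are averages weighted by class frequency. The confusion matrix is invented for illustration and is not data from the study.

```python
def weighted_metrics(confusion):
    """Class-weighted precision, recall, F-measure, plus overall accuracy.

    confusion[i][j] is the number of class-i instances predicted as class j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    accuracy = sum(confusion[i][i] for i in range(n_classes)) / total

    precision = recall = f_measure = 0.0
    for i in range(n_classes):
        tp = confusion[i][i]
        actual = sum(confusion[i])                                  # row total
        predicted = sum(confusion[r][i] for r in range(n_classes))  # column total
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        weight = actual / total  # share of instances that belong to class i
        precision += weight * p
        recall += weight * r
        f_measure += weight * f
    return precision, recall, f_measure, accuracy


# Hypothetical two-class confusion matrix (rows: actual, columns: predicted).
example = [[40, 10],
           [5, 45]]
p, r, f, acc = weighted_metrics(example)
print(f"precision={p:.3f} recall={r:.3f} f={f:.3f} accuracy={acc:.3f}")
```

Weighted recall reduces to the sum of correct predictions divided by the number of instances, which is the definition of accuracy, so the two columns carry the same information under this assumption.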
Classification Analysis
• Feature selection analysis
Classifier            Precision   Recall   F-Measure   ROC Area   Accuracy
J48                   0.888       0.883    0.884       0.844      88.3%
Filtered Classifier   0.736       0.718    0.722       0.800      71.8%
LWL                   0.819       0.767    0.769       0.834      76.7%
J48 Decision Tree
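J48 is Weka's implementation of the C4.5 decision tree learner, which ranks candidate splits by gain ratio. The following is a simplified sketch of that criterion for a single numeric threshold; it omits pruning and C4.5's other heuristics, and the concentration values and class labels are hypothetical, not data from the study.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """Gain ratio of the binary split `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # the split does not separate anything
    w_left = len(left) / len(labels)
    w_right = len(right) / len(labels)
    info_gain = entropy(labels) - (w_left * entropy(left) + w_right * entropy(right))
    split_info = -(w_left * log2(w_left) + w_right * log2(w_right))
    return info_gain / split_info


# Hypothetical dendrimer concentrations and toxicity labels.
concentration = [0.01, 0.1, 0.1, 1.0, 1.0, 10.0]
label = ["nontoxic", "nontoxic", "nontoxic", "toxic", "toxic", "toxic"]

for t in (0.01, 0.1, 1.0):
    print(t, round(gain_ratio(concentration, label, t), 3))
```

On this toy data the threshold 0.1 separates the two classes perfectly and therefore scores highest; a tree learner would place that test at the node and recurse on each branch.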
Regression Analysis
Prediction of Cell Viability
[Figure: scatter plot of predicted (y-axis) versus actual (x-axis) cell viability]
• Fitted line: y = 0.4174x + 55.086
• RMS Error = 14.21%
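The two numbers reported for the regression, the fitted line and the RMS error, can be computed as in the sketch below. The data points are hypothetical and chosen only to make the arithmetic easy to follow; they are not values from the study.

```python
from math import sqrt


def rms_error(actual, predicted):
    """Root-mean-square difference between actual and predicted values."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared) / len(squared))


def fit_line(x, y):
    """Ordinary least-squares slope and intercept of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical cell viability values (%), actual versus model prediction.
actual = [40.0, 60.0, 80.0, 100.0]
predicted = [70.0, 80.0, 90.0, 100.0]

slope, intercept = fit_line(actual, predicted)
print(f"predicted = {slope:.4f} * actual + {intercept:.3f}")
print(f"RMS error = {rms_error(actual, predicted):.2f}")
```

A slope of 1 with an intercept of 0 would indicate perfect agreement between predicted and actual values. A slope below 1 with a positive intercept, as in the reported fit, means the predictions span a narrower range than the measurements and overestimate low viabilities.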
Discussion
• The highest prediction accuracies were achieved
after supplementing the expert-selected features
with experimental conditions
• The properties presented in the decision tree
diagram represent the more general properties
of charge, size, and concentration
• Experimentally, these properties have been
hypothesized to be primary causes of
cytotoxicity
Conclusion
• The results indicate that data mining and
machine learning can be used to predict
cytotoxicity and cell viability of PAMAM
dendrimers on Caco-2 cells (up to 88.3%
classification accuracy with J48 after feature
selection; 14.21% RMS error for regression)
• Nanoinformatics methods could be implemented
to significantly reduce the search space
necessary to create suitable PAMAM
dendrimers that exhibit less cytotoxicity
References
1. Jain K. The Handbook of Nanomedicine. 1st ed. Totowa, New Jersey: Humana; 2008.
2. Staggers N, McCasky T, Brazelton N, Kennedy R. Nanotechnology: the coming revolution and its implications for consumers, clinicians, and informatics. Nursing outlook.
2008;56(5):268-74. Epub 2008/10/17.
3. de la Iglesia D, Maojo V, Chiesa S, Martin-Sanchez F, Kern J, Potamias G, et al. International efforts in nanoinformatics research applied to nanomedicine. Methods of
information in medicine. 2011;50(1):84-95. Epub 2010/11/19.
4. Thomas DG, Pappu RV, Baker NA. NanoParticle Ontology for cancer nanotechnology research. J Biomed Inform. 2011;44(1):59-74. Epub 2010/03/10.
5. National Cancer Institute. caNanoLab. 2011 [cited 2011]. Available from: https://cananolab.nci.nih.gov/caNanoLab/.
6. Hunter L, Lu Z, Firby J, Baumgartner WA, Jr., Johnson HL, Ogren PV, et al. OpenDMAP: an open source, ontology-driven concept analysis engine, with applications to
capturing knowledge regarding protein transport, protein interactions and cell-type-specific gene expression. BMC bioinformatics. 2008;9:78. Epub 2008/02/02.
7. Garcia-Remesal M, Garcia-Ruiz A, Perez-Rey D, de la Iglesia D, Maojo V. Using nanoinformatics methods for automatically identifying relevant nanotoxicology entities from
the literature. BioMed research international. 2013;2013:410294. Epub 2013/03/20.
8. Cunningham H, et al. Text Processing with GATE: University of Sheffield Department of Computer Science; 2011.
9. Yang Y. An Evaluation of Statistical Approaches to Text Categorization. Information Retrieval. 1999;1(1-2):69-90.
10. Tropsha A, Golbraikh A. Predictive QSAR modeling workflow, model applicability domains, and virtual screening. Current pharmaceutical design. 2007;13(34):3494-504.
Epub 2008/01/29.
11. Liu X, Tang K, Harper S, Harper B, Steevens JA, Xu R. Predictive modeling of nanomaterial exposure effects in biological systems. International journal of nanomedicine.
2013;8 Suppl 1:31-43. Epub 2013/10/08.
12. Horev-Azaria L, Baldi G, Beno D, Bonacchi D, Golla-Schindler U, Kirkpatrick JC, et al. Predictive toxicology of cobalt ferrite nanoparticles: comparative in-vitro study of
different cellular models using methods of knowledge discovery from data. Particle and fibre toxicology. 2013;10:32. Epub 2013/07/31.
13. ChemAxon, Berry I, Ruyts B. Future-proofing Cheminformatics Platforms2012 10/31/2013:[1-16 pp.]. Available from: http://www.chemaxon.com/wpcontent/uploads/2012/04/Future_proofing_cheminformatics_platforms.pdf.
14. ChemAxon Ltd. Marvin. 2013.
15. Witten I, Frank E, Hall M. Data Mining: Practical Machine Learning Tools and Techniques. 3 ed: Morgan Kaufmann Publishers; 2011. 629 p.
16. Vasumathi V, Maiti PK. Complexation of siRNA with Dendrimer: A Molecular Modeling Approach. Macromolecules. 2010;43:8264-74.
17. Karatasos K, Posocco P, Laurini E, Pricl S. Poly(amidoamine)-based dendrimer/siRNA complexation studied by computer simulations: effects of pH and generation on
dendrimer structure and siRNA binding. Macromolecular bioscience. 2012;12(2):225-40. Epub 2011/12/08.
Acknowledgements
• Morgan Family
• National Library of Medicine Training Grant
• Department of Biomedical Informatics at the University
of Utah
• Ph.D. Committee
– Julio C. Facelli, Ph.D.
– Hamidreza S. Ghandehari, Ph.D.
– John F. Hurdle, M.D., Ph.D.
– Karen Eilbeck, Ph.D.
– Bruce E. Bray, M.D.
Questions