Transcript BioE_CIT
Biological Signal Detection for Protein Function Prediction
Investigators: Yang Dai
Prime Grant Support: NSF
[Figure: prediction pipeline. Labels: Sequences (MASVQLY ... …HKEPGV); Text File of Protein description; Coding Vectors; Machine Learner; specific subcellular and subnuclear localization]
Problem Statement and Motivation
• High-throughput experiments generate new protein sequences with unknown function
• In silico protein function prediction is needed
• Protein subcellular localization is a key element in understanding function
• Such a prediction can be made from protein sequences with machine learners
• Feature extraction and scalability of the learner are key
Technical Approach
• Use the Fast Fourier Transform to capture long-range correlation in protein sequences
• Design a class of new kernels to capture subtle similarity between sequences
• Use domains and motifs of proteins as coding vectors
• Use a multi-classification system based on a deterministic machine learning approach, such as the support vector machine
• Use a Bayesian probabilistic model
Key Achievements and Future Goals
• Developed highly sophisticated sequence coding methods
• Developed an integrated multi-classification system for protein subcellular localization
• Developed a preliminary multi-classification system for subnuclear localization
• Will incorporate knowledge from other databases into the current framework
• Will design an integrative system for protein function prediction based on protein localization, gene expression, and protein-protein interactions
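The Fast Fourier Transform item in this slide's Technical Approach can be illustrated with a small sketch. The poster does not say how residues are turned into numbers or which spectral quantities are kept, so everything below is an assumption: residues are mapped to Kyte-Doolittle hydropathy values, the signal is zero-padded to a fixed length, and the power spectrum is used as a fixed-length feature vector. The function names and the toy sequence are hypothetical and are not taken from the investigators' software.

```python
import cmath

# Kyte-Doolittle hydropathy scale, used here only as one possible numeric
# encoding of residues (an assumption; the poster does not name an encoding).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def fft(values):
    """Recursive radix-2 Cooley-Tukey FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return [complex(values[0])]
    even = fft(values[0::2])
    odd = fft(values[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def power_spectrum_features(sequence, n_points=64):
    """Encode a protein sequence numerically and return its power spectrum.

    n_points must be a power of two. Sequences longer than n_points are
    truncated, shorter ones are zero-padded.
    """
    signal = [HYDROPATHY[res] for res in sequence.upper()]
    mean = sum(signal) / len(signal)
    signal = [x - mean for x in signal]               # remove the constant offset
    signal = (signal + [0.0] * n_points)[:n_points]   # zero-pad or truncate
    spectrum = fft(signal)
    # Keep the non-redundant half of the spectrum of a real-valued signal.
    return [abs(c) ** 2 / n_points for c in spectrum[: n_points // 2 + 1]]


# Hypothetical toy sequence, not the one shown in the poster figure.
features = power_spectrum_features("MKTLLVAGAVLLSA")
```

Each entry of `features` is the power at one frequency, so periodic patterns that span many residues show up as peaks regardless of where they start in the sequence. A vector of this kind could then be passed to a classifier such as a support vector machine. Residues outside the 20 standard amino acids would raise a `KeyError` in this sketch.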
Computational Protein Topographics for Health Improvement
Jie Liang, Ph.D. Bioengineering
Prime Grant Support: National Science Foundation Career Award, National Institutes of Health R01,
Office of Naval Research, and the Whitaker Foundation.
[Figure: Protein surface matching]
Problem Statement and Motivation
• The structures of proteins provide rich information about how cells work. With the success of structural genomics, soon we will have all human proteins mapped to structures.
• However, we need to develop computational tools to extract information from these structures to understand how cells work and how new diseases can be treated.
• Therefore, the development of computational tools for surface matching and for function prediction will open the door to many new developments in health improvement.
[Figure: Evolution of function]
Technical Approach
• We use geometric models and fast algorithms to characterize surface properties of over thirty protein structures.
• We develop evolutionary models to understand how proteins overall evolve to acquire different functions using different combinations of surface textures.
• Efficient search methods and statistical models allow us to identify very similar surfaces on totally different proteins.
• Probabilistic models and sampling techniques help us to understand how proteins work to perform their functions.
Key Achievements and Future Goals
• We have developed a web server, CASTP (cast.engr.uic.edu), that identifies and measures protein surfaces. It has been used by thousands of scientists worldwide.
• We have built a protein surface library for >10,000 proteins, and have developed models to characterize cross reactivities of enzymes.
• We also developed methods for designing phage libraries for discovery of peptide drugs.
• We have developed methods for predicting structures of beta-barrel membrane proteins.
• Future: Understand how proteins fold and assemble, and design methods for engineering better proteins and drugs.
Structural Bioinformatics Study of Protein Interaction Network
Investigators: Hui Lu, Bioengineering
Prime Grant Support: NIH, DOL
[Figure: Protein-DNA complex: gene regulation, DNA repair, cancer treatment, drug design, gene therapy]
Problem Statement and Motivation
• Proteins interact with other biomolecules to perform their functions: DNA/RNA, ligands, drugs, membranes, and other proteins.
• High-accuracy prediction of the protein interaction network will provide a global understanding of gene regulation, protein function annotation, and the signaling process.
• The understanding and computation of protein-ligand binding have a direct impact on drug design.
Technical Approach
• Data mining protein structures
• Molecular Dynamics and Monte Carlo simulations
• Machine learning
• Phylogenetic analysis of interaction networks
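The poster lists Monte Carlo simulations without further detail. As a generic illustration only, the sketch below shows Metropolis Monte Carlo sampling on a one-dimensional toy energy function. The `metropolis` and `double_well` names, the step size, and the temperature are all hypothetical; a real protein simulation would use a molecular force field and far more degrees of freedom.

```python
import math
import random


def metropolis(energy, x0, n_steps=10000, step=0.5, kT=1.0, seed=0):
    """Sample states with probability proportional to exp(-energy(x) / kT)."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    samples = []
    for _ in range(n_steps):
        x_new = x + rng.uniform(-step, step)   # symmetric trial move
        e_new = energy(x_new)
        # Always accept downhill moves; accept uphill moves with
        # the Boltzmann probability exp(-(e_new - e) / kT).
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / kT):
            x, e = x_new, e_new
        samples.append(x)                      # a rejected move repeats the old state
    return samples


def double_well(x):
    """Toy energy with minima at x = -1 and x = +1 (not a real force field)."""
    return (x * x - 1.0) ** 2


samples = metropolis(double_well, x0=1.0)
mean_energy = sum(double_well(x) for x in samples) / len(samples)
```

Averages over `samples`, such as `mean_energy`, approximate thermal averages at the chosen `kT`. The fixed `seed` makes the run reproducible.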
Key Achievements and Future Goals
• Developed the DNA binding protein and binding site
prediction protocols that have the best accuracy
available.
• Developed transcription factor binding site prediction.
• Gene expression data analysis using clustering
• Developed the only protocol that predicts protein-membrane binding behavior.
• Binding affinity calculation using statistical physics
• Will work on drug design based on structural binding.
• Will work on the signaling protein binding mechanism.
• Will build a complete protein-DNA interaction prediction package and a Web server.