Molecular Biomedical Informatics Laboratory
Machine Learning and Bioinformatics
PPI
Protein-Protein Interaction
http://www.biomol.de/details/RL/Akt-Signaling-Pathway.jpg
Notes on the Akt signaling pathway
Akt is a kinase
A kinase acts on specific molecules (usually other proteins) by phosphorylating them
– a type of enzyme, thus a type of protein
An enzyme catalyzes a reaction but is not itself changed by it (it is neither reactant nor product)
– like a molecular machine/tool
– a type of protein
A cytokine carries signals between cells
– a type of protein
Proteins are a class of molecules with a specific chemical structure
– the same naming strategy is used for other classes, such as carbohydrates and lipids
Various PPIs
By contact type
– physical interaction (complex, transient touch, …)
– genetic association (co-functional, co-expressed, …)
By role
– work together (cooperate)
– work individually (mutually redundant)
– regulate (activate, repress, …)
– act on (catalyze, inhibit, …)
– participate in the same pathway (downstream, upstream, …)
What’s the difference from a gene?
http://www.uic.edu/classes/bios/bios100/lectures/geneticsignaling.jpg
Notes on gene expression
DNA → RNA → protein
– DNA is the blueprint: hard to damage and thus hard to manipulate
– RNA is the transcript: very similar to DNA but more active
– protein is the final product
A gene is a DNA sequence that performs a specific function
– it usually becomes functional after being transcribed and translated into protein
These terms are sometimes used interchangeably
– some PPIs are defined by the interactions among the corresponding DNAs/RNAs
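The DNA → RNA → protein flow in the notes above can be sketched in a few lines. This is only an illustration: the sequence is made up, the codon table is deliberately limited to the codons in the example, and the function names `transcribe` and `translate` are hypothetical helpers, not part of any library.

```python
# Partial codon table: only the codons used in this example (an assumption for brevity).
CODON_TABLE = {
    "AUG": "M",  # start codon, methionine
    "GCU": "A",  # alanine
    "AAA": "K",  # lysine
    "UGG": "W",  # tryptophan
    "UAA": "*",  # stop codon
}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate mRNA codon by codon until a stop codon or the end of the sequence."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "ATGGCTAAATGGTAA"      # made-up gene: the blueprint
mrna = transcribe(dna)       # the transcript
protein = translate(mrna)    # the final product
print(mrna, protein)
```

The same gene can therefore be referred to by its DNA, its transcript, or its protein, which is one reason the terms get mixed in practice.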
Experimental techniques
since there are various PPIs…
Shoemaker and Panchenko, 2007
Notes on experimental techniques
(A) yeast two-hybrid (Y2H) detects interactions between proteins X and Y, where X is linked to the DNA-binding domain (BD), which binds to the upstream activating sequence (UAS) of a promoter
(B) mass spectrometry (MS) identifies polypeptide sequences
(C) tandem affinity purification (TAP) purifies protein complexes and removes contaminant molecules
(D) gene co-expression analysis produces a correlation matrix, where the dark areas show high correlation between the expression levels of the corresponding genes
(E) protein microarrays (protein chips) detect interactions between actual proteins rather than genes: target proteins immobilized on a solid support are probed with a fluorescently labeled protein
(F) synthetic lethality describes a genetic interaction in which two individually nonlethal mutations are lethal when combined (a-b-)
(all of these are high-throughput)
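As a rough illustration of item (D), a co-expression correlation matrix can be computed from expression levels measured across several conditions. The sketch below uses the Pearson correlation coefficient, which is one common choice and an assumption here; the gene names and expression values are invented for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equally long value lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical expression levels of three genes across four conditions.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],  # rises together with geneA
    "geneC": [4.0, 3.0, 2.0, 1.0],  # falls as geneA rises
}

genes = sorted(expression)
# Correlation matrix stored as a dict keyed by (gene, gene) pairs.
corr = {
    (g, h): pearson(expression[g], expression[h])
    for g in genes
    for h in genes
}

for g in genes:
    print(g, [round(corr[(g, h)], 2) for h in genes])
```

Pairs with a correlation near 1 (the "dark areas" of the figure) are candidates for a functional association. A strong negative correlation, as between geneA and geneC here, may also be informative.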
We can “see” the interaction
http://www.informaworld.com/ampp/image?path=/713599661/793610806/tfac_a_300921_o_f0001g.png
Computational approaches
what we can do
Shoemaker and Panchenko, 2007
Notes on computational approaches
(A) gene cluster and gene neighborhood methods, different boxes showing
different genes
(B) phylogenetic profile method, showing the presence/absence of four
proteins in three genomes
(C) Rosetta Stone method
(D) sequence co-evolution method looking for the similarity between two
phylogenetic trees/distance matrices
(E) classification methods shown with the example of random forest
decision (RFD) method, where five different features/domains are used and
each interacting protein pair is encoded as a string of 0, 1 and 2
– the decision trees are built from a training set of interacting protein pairs and then decide whether the proteins in question interact (“yes” for interacting, “no” for non-interacting)
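The phylogenetic profile method in item (B) can be sketched as follows, using four proteins and three genomes as in the figure. The profile values are invented, and using the Hamming distance with a threshold is an assumption made for illustration. Published methods may score profile similarity differently.

```python
from itertools import combinations

# Hypothetical presence (1) / absence (0) of four proteins in three genomes.
profiles = {
    "P1": (1, 0, 1),
    "P2": (1, 1, 0),
    "P3": (1, 0, 1),
    "P4": (0, 1, 1),
}


def hamming(a, b):
    """Number of genomes in which the two profiles disagree."""
    return sum(x != y for x, y in zip(a, b))


def linked_pairs(profiles, max_distance=0):
    """Return protein pairs whose profiles differ in at most max_distance genomes."""
    return [
        (p, q)
        for p, q in combinations(sorted(profiles), 2)
        if hamming(profiles[p], profiles[q]) <= max_distance
    ]


candidates = linked_pairs(profiles)
print(candidates)
```

Here only P1 and P3 share an identical profile, so they would be the predicted functionally linked pair. Relaxing `max_distance` trades precision for coverage.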
Classification approaches
Also called machine learning-based approaches
– classification is a form of “supervised learning”
The most critical step is
– to encode a protein pair as a vector
– (to extract appropriate features)
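A minimal sketch of encoding a protein pair as a vector is shown below. It follows the 0/1/2 string mentioned for the random forest example, under the assumption that each position counts how many proteins of the pair carry a given domain (0 = neither, 1 = one, 2 = both). The domain vocabulary and the helper `encode_pair` are hypothetical.

```python
# Hypothetical domain vocabulary; its order fixes the vector positions.
DOMAINS = ["SH2", "SH3", "PH", "kinase", "PDZ"]


def encode_pair(domains_a, domains_b, vocabulary=DOMAINS):
    """Encode a protein pair: 0 = domain in neither protein, 1 = in one, 2 = in both."""
    a, b = set(domains_a), set(domains_b)
    return [int(d in a) + int(d in b) for d in vocabulary]


# Two made-up proteins described by the domains they contain.
vector = encode_pair({"SH2", "kinase"}, {"SH3", "kinase"})
print("".join(map(str, vector)))
```

The encoding is symmetric in the two proteins, so the order of a pair does not change its vector. Each vector, together with a known "interacting" or "non-interacting" label, would form one training example for a classifier.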
How do you tell a man from a woman?
http://www.sagennext.com/wp-content/uploads/2010/02/Business-Man-and-Woman1.jpg
Notes on feature encoding
Know the problem (domain knowledge)
You may not know which feature is important (e.g. hair length vs. eyesight)
You may not have the key feature
– e.g. no height when given only mug shots
– e.g. measuring body fat is much more difficult
– carefully define the problem and what materials are available
You (usually) may not know the key feature
– e.g. suppose that the sex chromosomes are unknown
– explaining the mechanism is much more important than just predicting
The key features may change (e.g. hair length)
There are always exceptions (e.g. intersex individuals)
Materials that we can provide
Biological process
Orthologous
Cellular compartment
Pathway
DNA sequence
Protein sequence
Domain
TATA box
Expression
Transcription ends
Genomic location
TF binding
No. of references
TFBS
Molecular function
TF knockout expression