An Analysis of “Coronavirus 3CLpro proteinase cleavage sites: Possible relevance to SARS virus pathology”
Connie Wu
Article Resources
BMC Bioinformatics 2004, 5:72
Published on Jun 6, 2004
Article URL
http://www.biomedcentral.com/1471-2105/5/72
NetCorona URL
http://www.cbs.dtu.dk/services/NetCorona
Outline
SARS outbreak in 2003
Introduction to SARS virus
Experimental database used
Pattern Recognition Method
Neural Network Method
Biological Significance of NetCorona
SARS Outbreak in 2003
A Chinese man was found
to have caught the
infectious respiratory
disease in Hong Kong,
the first case to emerge
from the general population
since July 2003.
Infected more than
8,000 people in close to
30 nations and killed
more than 750.
SARS Virus
Belongs to the coronavirus family; human coronaviruses
normally cause mild cold symptoms in humans.
The proteolytic cleavage of host proteins by
viral proteinases is found in the pathology of
other virus families such as picornaviruses.
Virus proliferation can be arrested using
specific proteinase inhibitors.
Experimental database
Seven full-length coronavirus genomes
retrieved from the GenBank database.
Each sequence contained eleven 3CLpro
proteinase cleavage sites, giving a total of 77
identifiable sites.
The 3CLpro cleavage sites (P1 positions) in the
polyproteins were identified using alignment without gaps.
P1 = residue immediately N-terminal to the cleavage site
P1’ = residue immediately C-terminal to the cleavage site
Consensus Pattern Recognition
Glutamine (Q) at position P1, together with a
strong preference for leucine (L) at position
P2, is found at coronavirus proteinase cleavage sites.
‘LQ’ consensus pattern prediction
60/77 true positives (78%)
196 additional false positives from random
occurrences of this pair of amino acids
‘LQ[S/A]’ consensus pattern prediction
48/77 true positives (62%)
36 additional false positives
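The two consensus searches above can be sketched as plain regular-expression scans. This is a minimal illustration, not the authors' code: the function name `predict_sites` and the toy sequence are hypothetical, and the slide's ‘LQ[S/A]’ is written as the regex `LQ[SA]`.

```python
import re

def predict_sites(sequence, pattern):
    """Return 1-based positions of the P1 glutamine for every pattern match.

    Both patterns start with 'LQ', so the glutamine is the second
    character of each match (0-based start + 1, i.e. 1-based start + 2).
    """
    return [m.start() + 2 for m in re.finditer(pattern, sequence)]

# Toy sequence invented for illustration; it is not taken from the dataset.
TOY_SEQUENCE = "MSAVLQSGFRKMTLQAKLQD"

simple_hits = predict_sites(TOY_SEQUENCE, r"LQ")       # simple 'LQ' pattern
strict_hits = predict_sites(TOY_SEQUENCE, r"LQ[SA]")   # stricter 'LQ[S/A]' pattern

print("LQ hits:     ", simple_hits)
print("LQ[S/A] hits:", strict_hits)
```

On the toy sequence the stricter pattern drops the final ‘LQD’ match, which mirrors the trade-off on the slide: fewer false positives at the cost of missing some true sites.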
Limitations of Pattern Recognition
Simple consensus pattern recognition
(e.g. ‘LQ’)
low specificity
high sensitivity
Sophisticated consensus pattern
recognition (e.g. ‘LQ[S/A]’)
high specificity
low sensitivity
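The trade-off can be made concrete with the counts reported for the two patterns. The sketch below computes sensitivity and, as a stand-in for specificity, precision (the share of predictions that are real sites); precision is used here only because the slides do not give true-negative counts for the pattern methods. The function names are hypothetical.

```python
TOTAL_SITES = 77  # known 3CLpro cleavage sites in the dataset

def sensitivity(true_positives, total_sites=TOTAL_SITES):
    """Fraction of known cleavage sites that the pattern recovers."""
    return true_positives / total_sites

def precision(true_positives, false_positives):
    """Fraction of all pattern matches that are real cleavage sites."""
    return true_positives / (true_positives + false_positives)

# Counts as reported on the consensus pattern slide.
patterns = {
    "LQ": {"tp": 60, "fp": 196},
    "LQ[S/A]": {"tp": 48, "fp": 36},
}

for name, counts in patterns.items():
    sens = sensitivity(counts["tp"])
    prec = precision(counts["tp"], counts["fp"])
    print(f"{name:8s} sensitivity={sens:.3f} precision={prec:.3f}")
```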
Neural Network
A sequence window of 9 amino acids centered
on the glutamine in the P1 position
A score between 0 and 1 is assigned to every
glutamine that is present
Score
> 0.8 = most likely to be cleaved
0.5 ~ 0.8 = possibly cleaved
< 0.5 = likely not cleaved
67/77 true positives (87.0%)
1358/1372 true negatives (99.0%)
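The windowing and score bands described above might look like the following sketch. It is an illustration under stated assumptions: the padding character `X` for windows that run off the ends of the sequence, the handling of scores exactly at 0.5 and 0.8, and both function names are choices made here, not details from the slides.

```python
def windows_around_glutamines(sequence, flank=4, pad="X"):
    """Return (1-based position, window) for every glutamine in the sequence.

    Each window holds `flank` residues on either side of the glutamine,
    so the default gives 9 residues with Q in the middle (the P1 position).
    Windows near the sequence ends are padded with `pad` (an assumption).
    """
    padded = pad * flank + sequence + pad * flank
    result = []
    for i, residue in enumerate(sequence):
        if residue == "Q":
            # sequence[i] sits at padded[i + flank], the window center
            result.append((i + 1, padded[i:i + 2 * flank + 1]))
    return result

def classify(score):
    """Map a network score in [0, 1] to the three bands from the slide."""
    if score > 0.8:
        return "most likely cleaved"
    if score >= 0.5:
        return "possibly cleaved"
    return "likely not cleaved"

for position, window in windows_around_glutamines("AVLQSGFRK"):
    print(position, window)

for score in (0.874, 0.65, 0.12):
    print(score, classify(score))
```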
Neural Network
Three-layered
neural network
Two hidden
neurons
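A three-layer network with two hidden neurons is small enough to write out by hand. The sketch below shows only a forward pass with random, untrained weights; the one-hot encoding over 20 amino acids, the sigmoid activations, and all names are assumptions made for illustration and are not taken from the slides.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def one_hot(window):
    """Encode each residue as 20 inputs; unknown residues become all zeros."""
    vector = []
    for residue in window:
        vector.extend(1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS)
    return vector

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, w_hidden, b_hidden, w_out, b_out):
    """Input layer -> two hidden neurons -> one output score in (0, 1)."""
    hidden = [
        sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(w_hidden, b_hidden)
    ]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)

# Random placeholder weights: an untrained network, for shape only.
rng = random.Random(0)
n_inputs = 9 * len(AMINO_ACIDS)  # 9-residue window, 20 inputs per residue
w_hidden = [[rng.uniform(-0.1, 0.1) for _ in range(n_inputs)] for _ in range(2)]
b_hidden = [0.0, 0.0]
w_out = [rng.uniform(-0.1, 0.1) for _ in range(2)]
b_out = 0.0

score = forward(one_hot("SAVLQSGFR"), w_hidden, b_hidden, w_out, b_out)
print(f"untrained score: {score:.3f}")
```

With trained weights the same forward pass would produce the per-glutamine score that the thresholds are applied to.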
Neural Network Training
Training was done with three-fold
cross-validation, and Matthews
correlation coefficients were calculated
by summing values over all combinations of
training and test sets.
The average of the scores from all
three networks arising from the three-fold
cross-validation was used for prediction.
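Both steps above can be written down briefly. The sketch computes a Matthews correlation coefficient from a confusion matrix and averages the scores of three networks. The confusion-matrix counts are the ones reported on the neural network slide (67/77 true positives, 1358/1372 true negatives); treating them as the summed cross-validation counts is an assumption, and the three example scores are made up.

```python
import math

def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 if any margin is empty."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator

def ensemble_score(scores):
    """Average the outputs of the cross-validation networks for one site."""
    return sum(scores) / len(scores)

# Counts derived from the reported 67/77 and 1358/1372 figures.
tp, fn = 67, 77 - 67
tn, fp = 1358, 1372 - 1358
print(f"MCC: {matthews_cc(tp, tn, fp, fn):.3f}")

# Hypothetical outputs of the three networks for a single glutamine.
print(f"ensemble score: {ensemble_score([0.91, 0.84, 0.79]):.3f}")
```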
Neural Network on Host Cell Proteins
Cystic fibrosis transmembrane
conductance regulator (CFTR), an ATP-dependent
chloride channel, is predicted
to be cleaved at Gln762 with a high
score of 0.842.
Transcription factor OCT-1 is predicted
to be cleaved at Gln62 by the 3CLpro
proteinase with a high confidence score
of 0.874.
Limitations of NetCorona
High specificity
Low sensitivity
Not accurate in predicting sites with
relatively low cleavage efficiency in vivo.
High-scoring cleavage sites that are
inaccessible to the proteinase need to
be disregarded.
Significance of NetCorona
Can be used by researchers who suspect
viral proteinase cleavage of a protein.
Useful when studying coronavirus
function.
May facilitate proteinase inhibitor drug
discovery.
Possible future strategy for drug
development