katres - University College London

Download Report

Transcript katres - University College London

Overview
A Quantum Computation Simulation Language
Anomaly Detection in the Windows Registry
Detecting Splice Sites in Genes
Rotationally Invariant Face Detection
-HSK
A Quantum Programming
Language and Compiler
Katherine Heller, Krysta Svore, Maryam Kamvar
(Al Aho)
What is
-HSK?
Quantum Computation Simulation Language
Quantum Compiler
Q-HSK enables simplified programming of
quantum algorithms with built-in graphics
Many Worlds Interpretation
One formulation of quantum theory
Each universe has a corresponding
amplitude (i.e. complex number)
|amplitude|2 = probability of existence
x
u3
u1
u2
u4
Qubits
Quantum analogue of a classical bit
Takes on values 0, 1, or superposition of states:
›
›
›
where |α|2 + |β|2 = 1
|ω = α|0 + β |1
›
›
›
| ω = cos(θ / 2) | 0 + eiφ sin(θ / 2) | 1
Quantum Gates
Reversible – all unitary operators (U† U=I)
Universal quantum gates – {U2,XOR}, Toffoli
Some common gates – Hadamard, QFT, CNOT
›
|1
H
H
› + | 1›)
1/√2 ( | 0
›
|0
Key Features of
the Q-HSK Compiler
Familiar C-style syntax
Matrix operations via CBLAS
Complex and real data types
A quantum type qreg
A graphical view of quantum algorithms
Lucid representation of quantum qubits, registers, and gates
Interactive user options (start, stop, pause, change
animation rate)
Detailed text output to trace algorithm
A Simple Example
int main( )
{
int a, i;
qreg *q;
q=create(5);
i = 0;
q
while (i < 5)
{
q[i] = (0.0, 0.0);
i = i + 1;
}
q = computeHadamard(q);
a = Measure(q);
printf(“This is the measure:
return 0;
}
0
0
0
0
0
H
%d”, a);
M
Shor’s Algorithm
Factors large numbers
n - number to factorize
x – random number
a – ranges from 0 to q-1
n2<=q<=2n2
r – period of xa (mod n) – exp. classically
one factor of n is gcd(xr/2-1,n) – fast classically
Graphical Interface
Architecture of Q-HSK Compiler
lex.yy.c
Program.q
Lexical Analyzer
Program.cpp
y.tab.c
Syntax Analyzer
translate.c
Semantic Analyzer
g++
Executable
Java
javac
Graphics
Translator
One Class Support Vector Machines
for Detecting Anomalous Windows
Registry Accesses
Collaborators: Krysta Svore, Angelos Keromytis, Sal Stolfo
Host Based Intrusion Detection
Systems
Microsoft Windows – most often attacked
Current method to combat attacks
Virus Scanners and Security Patches
Problem: These do not combat unknown attacks
so frequent updates are needed
Host based IDS
Monitor system accesses to detect intrusions
Application of data mining techniques
The Windows Registry and RAD
Windows Registry
Stores configuration settings for system
parameters – security information, programs, etc.
Programs query the registry for information
Process: EXPLORER.EXE
Query: OpenKey
Key: HKCR\CKSUD\{B41DB860-8EE4-11D2-9906-EA9FADC173CA}\shellex\MayChangeDefaultMenu
Response: SUCCESS
ResultValue: NOTFOUND
Registry Anomaly Detection
audit sensor
model generator
anomaly detector
Probabilistic Anomaly Detection
Algorithm
Computes 25 consistency checks:
P(Xi) and P(Xi|Xj)
Multinomial with Hierarchical Prior
For observed elements i:
P(X = i) = C*(Ni + α)/(k0α+N)
where N - total number of observations
Ni - number of observations of symbol I
α – “pseudo count” for each observed symbol
k0 – number of observed symbols
L – number of possible symbols
For unobserved elements i:
P(X = i) = (1-C)*1/(L-k0)
C= N/(N+L-k0 )
One Class SVMs
Analogous to two class SVM where all data lies in the first class
and the origin is sole member of second class
Solve optimization problem to find rule f with maximal margin
f(x)=‹w,x›+b
Equivalent to solving the dual quadratic programming problem:
minα (1/2) ∑I,j αiαjK(xi,xj)
s.t.
0≤αi≤1/(νl) , ∑i αi = 0
Kernel function projects input vectors into a feature space allowing
for non-linear decision boundaries
Φ: X → RN
K(xi,xj) = ‹Φ(xi), Φ(xj)›
Experiments
Kernels:
Linear: K(x,y) = (x·y)
Polynomial: K(x,y) = (x·y+1)d
Gaussian: K(x,y) = e -║x-y║2/(2σ2)
Feature Vectors:
Binary
Frequency-based
Results
Sequence Information for the
Splicing of Human Pre-mRNA
Identified by Support Vector
Machine Classification
Collaborators: Xiang Zhang, Ilana Hefter, Christina Leslie, Larry Chasin
What Is Splicing?
Donor
Branch
Acceptor
DNA
Exon1
Intron
Exon1
Exon2
Exon2
mRNA
Exon1
Exon2
Pseudo Exons
Consensus Sequences
Donor Site: MAG|gtragt (M=A/C, r=a/g)
Acceptor Site: (y)10ncag|G (y=c/t, n=a/c/g/t)
Donor and acceptor sites scored based on
closeness to consensus
Identifying Pseudo Exons
Intronic segments
Have high scoring “donor” and “acceptor” sites
We look for discriminative signals in intronic
regions near real and pseudo exons
String Kernels
Feature map: number of times each k-length
(contiguous) string occurs in sequence
Dimension of feature space is Nk
Example:
k=2
0
Sequence = ACCTGGTG
1
0
0
0
1
0
1
0
0
1
1
0
0
2
0
AA AC AG AT CA CC CG CT GA GC GG GT TA TC TG TT
Splice Kernels
Hypothesis: False splice sites are
intrinsically defective due to bad internal nt
combinations
All possible size k internal nt combinations
are features
Example (k=2): If the internal combination
(3g,5a) occurs, that feature value is 1,
otherwise it is 0
Recursive Feature Selection
Normal vector to the hyperplane:
w=∑i=1..m yiαixi
If |wj| large in absolute value, the jth feature is
important for SVM discrimination
Approximation due to degree 2 polynomial
kernel – calculate wup and wdown separately, then
eliminate bottom 50% of features for each
Stop when ROC score drops below 90% of
original value on untouched test set
Results
Flanks
US
Splice Sites
DS
3’
Exon
Body
ROC
Specificitya
5’
CVb
Splice
Sites
Flanks
Exon
Bodies
True positives detected
32/37
35/37
37/37
0.609
0.484
-
-
-
1225
1225
1225
-
+
-
164
259
668
-
-
+
108
232
383
+
-
+
58
111
180
+
+
+
19
53
90
+
–
–
–
–
0.791
0.638
–
+
–
–
–
0.784
0.618
+
+
–
–
–
0.855
0.695
–
–
+
–
–
0.823
0.672
–
–
–
+
–
0.837
0.698
–
–
+
+
–
0.907
0.777
+
+
+
+
–
0.932
0.825
–
–
–
–
+
0.946
0.841
+
+
–
–
+
0.984
0.956
–
–
+
+
+
0.987
0.964
+
+
+
+
+
0.991
0.976
Rotationally Invariant Face
Detection Using Multi-Resolution
Histograms
Collaborators: Shikher Bisaria, Tony Jebara
Face Detection
Given a picture with faces, how do we
determine where the faces are in the
image? Which pixels are face pixels?
We would like to determine this with a
system that:
Runs in real time
Recognizes rotations of faces
(e.g. when someone tilts their head to one side)
Gaussian Blurring
Face images are greyscale (.pgms)
Successive levels of blur are obtained by
reconvolving previous level of blur images with a
2 dimensional gaussian function
Mathematically equivalent to two passes of a
one dimensional gaussian function
g(i,j) = 1/(2πσ2) ∑m∑n e -(m2+n2)/(2σ2) · f(i-m,j-n)
= 1/(2πσ2) ∑m e -m2/(2σ2) · ∑n e -n2/(2σ2) · f(i-m,j-n)
Multi-Resolution Histograms
Histogram equalize the image
Concatenate histograms of image together after
successive levels of gaussian blurring
Average Histograms
Compute average face and non-face
multi-resolution histograms from training set
Average Non-Face Histogram
Average Face Histogram
Optimization Problem
C(α) = minα ║H
FAVG
Where
– h ║2 + ║H
F
NFAVG
– h ║2
NF
h = (1/∑i αi) ∑i αihi
h = (1/∑i (1- αi)) ∑i (1-αi)hi
0≤ αi ≤ 1 , ∑i αi = 1
F
NF
such that
Let βi = (1- αi)
Q = ‹hi,hj›
cα = ‹hi,H › · constant
cβ = ‹hi,H › · constant
FAVG
NFAVG
= minα,β αTQα + 1/(N-1)2 βTQβ – 2cαTα – 2/(N-1)cβTβ
Solve Using SMO
αiNEW = [ 1/(N-1)2 Qii - 1/(N-1)2 ∑k≠i,jαk Qjj + (1- ∑k≠i,jαk ) Qjj
- (1- ∑k≠i,jαk ) Qij + 1/(N-1)2 ∑k≠i,jαk Qij - 1/(N-1)2 Qij - cαi
+ cβi + cαj - cβj + ∑k≠i,j(αk Qik) - ∑k≠i,j(αk Qjk)
- 1/(N-1)2 ∑k≠i,j(αk Qik) + 1/(N-1)2 ∑k≠i,j(αk Qjk)] / [Qii + Qjj
- 2Qij + 1/(N-1)2 Qii + 1/(N-1)2 Qjj - 2/(N-1)2 Qij]
Bounds for αiNEW :
L=0
H = 1 - ∑k≠i,jαk
αjNEW = (1 - ∑k≠i,jαk ) - αiNEW
Results