2016-01-13__VALSE_XiongHongkaix

Transcript

VALSE-Webinar Lecture
Structured Modeling and Learning in
Generalized Data Compression and Processing
Hongkai Xiong (熊红凯)
http://ivm.sjtu.edu.cn
Department of Electronic Engineering
Shanghai Jiao Tong University
13 Jan. 2016
Sparse Representation

Sparse representation: $x = \Psi\theta$, where $x \in \mathbb{R}^{N\times 1}$, $\Psi \in \mathbb{R}^{N\times L}$, $\theta \in \mathbb{R}^{L\times 1}$, and $\|\theta\|_0 = K$ with $K \ll N \le L$.

[Figure: the N-dimensional signal x written as the product of the N×L dictionary Ψ and the K-sparse coefficient vector θ.]
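As a rough illustration of this model (not part of the original slides), the sketch below builds a K-sparse signal x = Ψθ with a random Gaussian dictionary and recovers the support of θ with orthogonal matching pursuit; the dimensions N, L, K are arbitrary example values.

    import numpy as np

    def omp(Psi, x, K):
        """Greedy orthogonal matching pursuit: recover a K-sparse theta with x ~ Psi @ theta."""
        N, L = Psi.shape
        residual = x.copy()
        support = []
        theta = np.zeros(L)
        for _ in range(K):
            # pick the atom most correlated with the current residual
            j = int(np.argmax(np.abs(Psi.T @ residual)))
            if j not in support:
                support.append(j)
            # least-squares fit on the current support, then update the residual
            coef, *_ = np.linalg.lstsq(Psi[:, support], x, rcond=None)
            residual = x - Psi[:, support] @ coef
        theta[support] = coef
        return theta

    rng = np.random.default_rng(0)
    N, L, K = 64, 128, 5                      # example sizes: K << N <= L
    Psi = rng.standard_normal((N, L)) / np.sqrt(N)
    theta_true = np.zeros(L)
    theta_true[rng.choice(L, K, replace=False)] = rng.standard_normal(K)
    x = Psi @ theta_true
    theta_hat = omp(Psi, x, K)
    print("support recovered:", sorted(np.flatnonzero(theta_hat)) == sorted(np.flatnonzero(theta_true)))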
Multimedia Communication

[Figure: sources ranging from 1D signals (audio) and 2D signals (images) to 3D signals (video) and higher dimensions; scalable video coding (SVC) and distributed video coding (DVC) of multiple views, with independent encoding of each view and joint decoding.]
Video coding advances toward higher dimensions and higher resolutions, aiming at better R-D behavior and a greater compression ratio.
Networks develop toward multiple data streams within heterogeneous network structures, aiming at higher throughput and transmission reliability.
[Figure: network topologies — unicast (one-to-one), multicast (one-to-many), and many-to-many; a wireless sensor network plane and the field-of-view plane of the video cameras.]
Generalized Context Modeling
in Signal Processing
 Wenrui Dai, Hongkai Xiong, J. Wang, S. Cheng, Y. F. Zheng, "Generalized Context Modeling with Multi-Directional Structuring and MDL-based Model Selection for Heterogeneous Data Compression," IEEE Transactions on Signal Processing, 2015.
 Wenrui Dai, Hongkai Xiong, J. Wang, et al., "Discriminative structured set prediction modeling with max-margin Markov network for lossless image coding," IEEE Transactions on Image Processing, 2014.
 Wenrui Dai, Hongkai Xiong, X. Jiang, et al., "Structured set intra prediction with discriminative learning in max-margin Markov network for High Efficiency Video Coding," IEEE Transactions on Circuits and Systems for Video Technology, vol. 23, no. 11, pp. 1941-1956, 2013.
Heterogeneous Data Compression: Data

 Heterogeneous data are generated by multiple interlacing sources that comply with different, incompletely known distribution statistics.
 Image & video: spatial correlations are characterized by piecewise-smooth regions with local oscillatory patterns such as multiscale edges and textures.
 Genome sequences: repeatable patterns of nucleotides in various regions.
 Executable files: multiple interlaced data streams, e.g. opcodes, displacements, and immediate data.
Heterogeneous Data Compression: Framework

[Diagram: data to be predicted → context model → estimated probability → coder → encoded bitstream (01011101……).]

Structured probabilistic model
 Captures regular patterns
 Optimized for specific data
 Context-based set prediction

Classic context model
 Variable order
 Sequential prediction
 Weighted estimation
Background
 In classical context modeling, a context is a suffix of the data symbols.
 Define $C(s)$ as the set of subsequences whose suffix is $s$.
 Disjoint property: no string in the context set is a suffix of any other string in this set, i.e. for $s \neq s'$, $C(s) \cap C(s') = \emptyset$.
 Exhaustive property: every subsequence of data symbols can find its suffix in the context set, i.e. $\bigcup_{s \in S} C(s) = \mathcal{A}^D$.
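To make the suffix-context idea concrete, here is a minimal sketch (not from the slides) of an order-k suffix-context model that estimates P(x_t | x_{t-k}^{t-1}) from counts with Laplace smoothing; the alphabet and order are illustrative choices.

    from collections import defaultdict

    class SuffixContextModel:
        """Order-k context model: the context of x_t is the suffix of the k preceding symbols."""
        def __init__(self, alphabet, k):
            self.alphabet, self.k = list(alphabet), k
            self.counts = defaultdict(lambda: defaultdict(int))   # context -> symbol -> count

        def update(self, seq):
            for t in range(self.k, len(seq)):
                ctx = tuple(seq[t - self.k:t])                     # suffix of length k
                self.counts[ctx][seq[t]] += 1

        def prob(self, ctx, symbol):
            ctx = tuple(ctx)
            total = sum(self.counts[ctx].values())
            # Laplace smoothing so unseen symbols keep a nonzero probability
            return (self.counts[ctx][symbol] + 1) / (total + len(self.alphabet))

    model = SuffixContextModel("ab", k=2)
    model.update("abababba")
    print(model.prob("ab", "a"), model.prob("ab", "b"))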
Foundation: Structured Prediction Model
 Graphical probabilistic model
 A structured prediction model can be represented in the form of a graph.
 Each node stands for a random variable to predict, and each edge for the interdependencies among nodes.
 The model estimates the joint or conditional distribution of its nodes.
 Learning methods
 Markov random field (MRF)
 Max-margin Markov network (M3N): MRF + SVM
 Reasoning algorithms
 Belief propagation (BP)
 Expectation propagation (EP)
Structured Probabilistic Model: Motivation
 Complex data structure
 Cannot be represented analytically
 Adaptively capture features with learning-based algorithms
 Incomplete distribution
 Parameters of the actual distribution cannot be estimated exactly
 Context-based predictive model using learning-based algorithms
 Structural coherence
 Isolated prediction cannot guarantee the structural coherence of the prediction task
 Structured probabilistic model constrains the prediction task with structural coherence
Motivation
Perceptual Intuition
Intuition 1: Structural coherence for heterogeneous data
An example: images generated with the same local distribution.
• A natural image is not merely a 2D array of pixels generated from a probabilistic distribution; structural coherence must be maintained to keep the image meaningful.
• The structured prediction model is proposed to maintain such coherence.

Generalized Context Modeling
Motivation
Intuition 2: Complex structure for heterogeneous data
The statistics of heterogeneous data are not sequential with a uniform distribution, but flexible with interlaced, complex distributions.
[Figure: prediction based on sequential contexts with a uniform distribution vs. prediction based on flexibly constructed contexts with interlaced, complex distributions.]
Problem Definition
[Figure: pixel-wise prediction impairs the structure, whereas parallel (set) prediction keeps the structure; the two yield similar PDFs.]
Challenge:
 Linear prediction without high-order nonlinearity
 Independent prediction without inter-dependency
Contribution:
 Structure consistency → structured set prediction

Structured Probabilistic Model: Example
[Figure: 4x4 block at coordinate (401,113) in LENA — least squares MSE: 82.06; structured probabilistic model MSE: 68.75.]
Motivation
Theoretical support: Sequential Source Coding
 Viswanathan & Berger, 2000: Given random variables $X_1$ and $X_2$, under arbitrary distortions $D_1$ and $D_2$, the rate for jointly describing them is no greater than the rate to describe them separately:
$$R(D_1, D_2) \le \min_{D \le D_1} \left[ R_{X_1}(D) + R_{X_2 \mid \hat{X}_1}(D_2) \right]$$
[Figure: $X_1$ and $X_2$ pass through Encoder1/Encoder2 and Decoder1/Decoder2 to produce $\hat{X}_1$ and $\hat{X}_2$.]
Contribution
 GCM for heterogeneous data compression
 Structured probabilistic model for genome compression: MRF for the dependency between side information, optimized with BP
 Structured probabilistic model for lossless image coding: M3N for joint spatial statistics and structural coherence
 Structured probabilistic model for intra-frame video coding: M3N optimized with EP
Contribution
[Diagram: GCM maps heterogeneous data (executables, genome, image, video) from the input space to a feature space, combining universal coding with learning of data dependency and of syntax & semantics.]
Background: Heterogeneous Data
 Heterogeneous data
 Genomic data: long repeats, with the exceptions of insertion, deletion, and substitution.
 Image and video: spatially spanned along structures, e.g. edges and textures.
 Executables: interlaced data streams, e.g. opcodes and immediate data.
Definition: Heterogeneous Data
Heterogeneous data $x_1^N$ is generated by interlacing $M$ data streams with a time-varying random process $\Pi(t)$.
 The $j$-th data stream $\{x_n^{(j)}\}$ is emitted from a stationary Markov source with order $d_j$.
 Symbol $x_t$ is obtained from the $j_t$-th data stream by
  1. $j_t = \Pi(t)$;
  2. $x_t = x_{n_{j_t}}^{(j_t)}$;
  3. $n_{j_t} = n_{j_t} + 1$.
 $N = \sum_{j=1}^{M} N_j$ and $D = \max_{1 \le j \le M} d_j$.
 $x_1^N$ is neither stationary nor wide-sense stationary.
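As an illustration only (not from the slides), the sketch below generates such data by interlacing M first-order Markov streams under a random switching process Π(t); the transition matrices and stream count are arbitrary example choices.

    import numpy as np

    def interlace_markov_streams(T, transitions, rng):
        """Emit T symbols by switching among M stationary first-order Markov sources."""
        M = len(transitions)
        states = [rng.integers(len(P)) for P in transitions]   # current state of each stream
        out, origin = [], []
        for t in range(T):
            j = rng.integers(M)                                 # j_t = Pi(t): pick the active stream
            P = transitions[j]
            states[j] = rng.choice(len(P), p=P[states[j]])      # advance only the chosen stream
            out.append(states[j])                               # x_t = next symbol of stream j_t
            origin.append(j)
        return np.array(out), np.array(origin)

    rng = np.random.default_rng(1)
    transitions = [np.array([[0.9, 0.1], [0.2, 0.8]]),          # stream 1: strongly persistent
                   np.array([[0.1, 0.9], [0.9, 0.1]])]          # stream 2: strongly alternating
    x, origin = interlace_markov_streams(200, transitions, rng)
    print(x[:20], origin[:20])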

Clue
In Memory of Ben Taskar (1977-2013)
A rising star in machine learning, computational linguistics, and computer vision
The founder of the max-margin Markov network
Generalized Context Modeling
Scenario: Coding Based on Context Modeling
Predictive models for heterogeneous data compression
[Diagram: symbols to be predicted → context model → estimated probability → coding engine → encoded bits (01011101……).]
Structured prediction model
 Captures the intrinsic structure of complex data
 Adapts to specific data
 Optimal set prediction based on observed contexts
Generalized Context Modeling
Topology Illustration
Generalized context modeling (GCM) with combinatorial structuring & multi-dimensional extension.
[Figure: for the current symbol $x_t$, the sequential context is $(x_{t-1}, x_{t-2}, \ldots, x_{t-D})$; combinatorial structuring selects a subset $(c_1, c_2, \ldots, c_j, \ldots, c_M)$ of these preceding symbols as an extended context; multi-dimensional extension generalizes the context to a 2-D neighborhood $\{x_{i-m+k,\, j-n+l}\}$ and further to M-D contexts $(\mathbf{x}_1, \ldots, \mathbf{x}_i, \ldots, \mathbf{x}_M)$.]
Generalized Context Modeling
Graphical Model for Prediction
[Figure: graphical model linking the symbols $x_1, \ldots, x_D$ to be predicted with their contexts $c_i^{(1)}, c_i^{(2)}, \ldots, c_i^{(N)}$ in each direction.]
• Symbols to be predicted are correlated with their neighboring symbols.
• The graphical model represents GCM with a D-order, M-directional context.
• The component of the context in each direction serves as an observation for the prediction.
• A conditional random field represents the dependencies among the symbols to be predicted and the context-based correlations.
Definition: Context Set
 Given context $s$, the set of subsequences containing $s$ is
$$C(s) = \left\{ x_a^b \,\middle|\, I(s) \subseteq I(x_a^b) \right\}$$
where $I(s)$ is the index set of $s$.
 $S$ is a valid generalized context model if it satisfies, in each of its directions:
 Exhaustive property: for any subsequence $x_{m_j}^{n_j} \in \mathcal{A}^*$ in the $j$-th direction, there exists $s$ in $S$ such that $\bigcup_{s \in S} C\big(s^{(j)}\big) = (\mathcal{A}^*)^{n_j}$.
 Disjoint property: for any subsequence $x_{m_j}^{n_j} \in \mathcal{A}^*$ in the $j$-th direction and arbitrary $s$ and $s'$, $C\big(s^{(j)}\big) \cap C\big(s'^{(j)}\big) = \emptyset$.
Modeling & Prediction: Model Graph
 Trellis-like graph rooted from the M-ary vector $(\emptyset, \cdots, \emptyset)$.
 Each node corresponds to an index set for a finite-order combination of predicted symbols. Given node $\gamma = \{\gamma^{(j)}\}_{j=1}^{M}$,
 its succeeding node $\gamma'$ satisfies
  i) $I(\gamma) \subset I(\gamma')$ and $l(\gamma') = l(\gamma) + 1$;
  ii) $i_l(\gamma^{(j)}) < i_l(\gamma'^{(j)}) \le D$ for $\gamma^{(j)}$ with $l(\gamma^{(j)}) < D$;
 its preceding node $\gamma''$ satisfies
  i) $I(\gamma'') \subset I(\gamma)$ and $l(\gamma'') = l(\gamma) - 1$;
  ii) $I(\gamma''^{(j)}) = \emptyset$ or $i_l(\gamma''^{(j)}) < i_l(\gamma^{(j)})$ for non-empty $\gamma^{(j)}$.
 There are $2^{DM} - 1$ possible context structures located in the $DM + 1$ vertical slices of a GCM with given $M$ and $D$.
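A small sanity check (an illustration, not from the slides): treating each context structure as a non-empty subset of the D×M candidate symbol positions, the sketch below enumerates the structures and groups them by order, recovering the 2^(DM) − 1 count and the DM + 1 slices (orders 0 through DM).

    from itertools import combinations

    def context_structures(D, M):
        """Group the index subsets of the D*M candidate positions by their order l."""
        positions = [(j, d) for j in range(M) for d in range(1, D + 1)]   # (direction, lag)
        return {l: list(combinations(positions, l)) for l in range(len(positions) + 1)}

    D, M = 3, 2
    slices = context_structures(D, M)
    total_nonempty = sum(len(v) for l, v in slices.items() if l > 0)
    print("vertical slices (orders 0..DM):", len(slices))        # D*M + 1 = 7
    print("non-empty context structures:", total_nonempty)        # 2**(D*M) - 1 = 63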
Generalized Context Modeling
Model Tree Example
[Figure: model trees for a sequential context with D = 4, M = 1 and for a two-directional context with D = 3, M = 2.]
Representation & Prediction: Model Graph
[Figure: model graph with depth D = 3 and M = 2 directions; nodes are M-ary vectors of index sets, ranging from degenerating contexts with m < M directions to contexts with all M directions. The solid (red) and dashed (blue) paths share some common nodes.]
Generalized Context Modeling
Problem Statement
 A model tree represents the generalized context models and their successive relationships.
 The minimum description length (MDL) principle selects the optimal class of context models.
 Normalized maximum likelihood (NML) determines the optimal weighted probability for prediction.

Generalized Context Modeling
Model Tree
 For contexts with maximum order D and M directions, the model tree elaborates all possible combinations of finite-order predicted symbols and their successive relationships.
 Its root is the M-directional empty vector $(\emptyset, \ldots, \emptyset)$, and each of its nodes corresponds to the index set of one combination of predicted symbols.
 There are $2^{DM} - 1$ nodes in the $2^{M(D-1)}$ paths from the root to the leaf nodes, which constrain the context selection.
Model Selection: Separable Context Modeling
 Prediction based on contexts with multi-directional structuring can be made separately in each of its directions. Given $R = (L+1)^D - 1$,
$$\begin{pmatrix} \Pr\big(x_1^N \mid s = c_1\big) \\ \vdots \\ \Pr\big(x_1^N \mid s = c_U\big) \end{pmatrix} = H \cdot \begin{pmatrix} \Pr\big(x_1^N \mid s^{(1)} = c_1^{(1)}\big) \\ \vdots \\ \Pr\big(x_1^N \mid s^{(M)} = c_V^{(M)}\big) \end{pmatrix}$$
where $H \in \mathbb{R}^{U \times V}$, $U = R^M$, $V = RM$, and its elements are
$$h_{uv} = \begin{cases} \Pr\big(s^{(j)} = c_{n^{(j)}}\big) & u \bmod R = \lceil v / R \rceil \\ 0 & \text{otherwise} \end{cases}$$
 The size of the model class grows linearly with $M$.
Generalized Context Modeling
Model Selection
 NML function with the MDL principle for model selection:
$$-\ln f_{NML}\big(y^n \mid p\big) = \underbrace{-\ln f_p\big(y^n \mid \hat{\theta}_n\big)}_{\text{code assignment function}} + \underbrace{\frac{p}{2}\ln\frac{n}{2\pi} + \log\int_{\theta} \sqrt{\|J(\theta)\|}\, d\theta + o(1)}_{\text{model complexity}}$$
 Contexts in each direction are compared with the NML function to find the optimal context for predicting the current symbol.
 For M-interlaced autoregressive sources, the model complexity is constant with respect to the data size N.
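As a loose illustration of MDL-style selection (not the paper's exact NML computation), the sketch below scores candidate context orders for a binary sequence with a two-part criterion, negative maximum log-likelihood plus a (k/2) log n complexity penalty, and picks the order with the smallest description length.

    import math
    from collections import Counter

    def mdl_score(seq, k):
        """Two-part MDL score for an order-k context model over a binary string."""
        counts = Counter((seq[t - k:t], seq[t]) for t in range(k, len(seq)))
        ctx_totals = Counter(seq[t - k:t] for t in range(k, len(seq)))
        neg_loglik = -sum(c * math.log2(c / ctx_totals[ctx]) for (ctx, _), c in counts.items())
        n_params = 2 ** k                      # one free parameter per binary context
        penalty = 0.5 * n_params * math.log2(max(len(seq) - k, 2))
        return neg_loglik + penalty

    seq = "0101101101101101101101"             # toy data with a short repeating pattern
    best = min(range(4), key=lambda k: mdl_score(seq, k))
    print({k: round(mdl_score(seq, k), 2) for k in range(4)}, "-> selected order", best)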
Generalized Context Modeling
Weighted Probability for Prediction
 Estimated probability for generalized context modeling:
$$P_w\big(x_1^N\big) = \sum_{S \in \mathcal{M}} w(S) \prod_{s \in S} \Pr\big(x_1^N \mid s\big)$$
 In a sequential way, for each symbol:
$$P_w(x_t) = \sum_{s} w(s)\, P_e(x_t \mid s)$$
 For each context $s$, its weight is
$$w(s) = \frac{\eta^{l(s)}}{C_{MD,\eta}} = \frac{\eta^{l(s)}}{(1+\eta)^{MD} - 1}, \qquad \text{where } l(s) = \sum_{i=1}^{M} l\big(s^{(i)}\big), \quad C_{MD,\eta} = \sum_{k=1}^{MD} \binom{MD}{k}\eta^{k}.$$
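A tiny numerical check (illustrative only, based on the reconstruction above): the weight of a context depends only on its total order l(s), and since there are C(MD, l) contexts of order l, the weights of all 2^(MD) − 1 non-empty contexts sum to one.

    from math import comb

    def context_weight(l_s, M, D, eta=1.0):
        """w(s) = eta^l(s) / ((1 + eta)^(M*D) - 1); the weight depends only on the order l(s)."""
        return eta ** l_s / ((1 + eta) ** (M * D) - 1)

    def mix(estimates, M, D, eta=1.0):
        """P_w(x_t) ~ sum_s w(s) P_e(x_t | s); estimates is a list of (l(s), P_e(x_t | s)) pairs."""
        return sum(context_weight(l_s, M, D, eta) * p for l_s, p in estimates)

    M, D, eta = 2, 3, 1.0
    total = sum(comb(M * D, l) * context_weight(l, M, D, eta) for l in range(1, M * D + 1))
    print("sum of weights over all non-empty contexts:", total)          # 1.0
    # hypothetical per-context estimates of the next symbol, one context per order for brevity
    print("mixed estimate:", mix([(1, 0.60), (2, 0.72), (3, 0.80)], M, D, eta))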
Generalized Context Modeling
Model Redundancy
 Given a generalized model class $\mathcal{M}$ with maximum order $D$ and $M$ directions, the model redundancy led by the multi-directional extension is
$$\rho_{ME} = -\log \frac{P_w\big(x_1^N\big)}{\prod_{s \in S_a} P_e\big(x_1^N \mid s\big)} \;\le\; -L^{MD} \log \frac{\eta^{MD}}{(1+\eta)^{MD} - 1}$$
where $L$ is the size of the alphabet and $\eta$ is the compensation for various contexts.
 The model redundancy led by the multi-directional extension depends only on the maximum order $D$ and the number of directions $M$; it is independent of the data size $N$.
Generalized Context Modeling
Experimental Results
 On the Calgary corpus, GCM outperforms CTW by 7%-12% on executable files and seismic data.
 In executable file compression, GCM outperforms PPMd and PPMonstr by 10% and 4%, respectively. GCM is comparable to the best compressor, PAQ8, with less computational complexity.
 ML-based estimation does not fully exploit the statistics in heterogeneous data. As an alternative, learning of a structured prediction model is proposed.
Model Redundancy
 Given a generalized model class $\mathcal{M}$ with maximum order $D$ and $M$ directions, the model redundancy led by combinatorial structuring is
$$\rho_{CS} = -\log \frac{P_w\big(x_1^N\big)}{\prod_{s \in S_a} P_e\big(x_1^N \mid s\big)} \;\le\; -L^{D} \log \frac{\eta^{D}}{(1+\eta)^{D} - 1}$$
where $L$ is the size of the alphabet and $\eta$ is the compensation for various contexts.
 The model redundancy led by combinatorial structuring depends only on the maximum order $D$ and is independent of the data size $N$.
Conceptual Diagram: Image
• Discriminative prediction distinguishes the actual pixel values from other possible estimations by a maximum margin based on contexts, but cannot utilize the structure for prediction.
• A Markov network maintains the structural coherence in the regions to be predicted, but cannot optimize the context-based prediction.
• Joint optimization by the max-margin Markov network combines both.
Diagram
Flow diagram of the structured set prediction model:
[Diagram: at the encoder, sampling → context-based prediction for each pixel → imaging constraints for the set of pixels (structural coherence) → encoding; at the decoder, decoding → context-based prediction for each pixel → imaging constraints for the set of pixels (structural coherence) → reconstruction.]
Prediction
 Given $y$, the block of pixels being encoded, and $x$, the reconstructed pixels serving as contexts, the prediction is derived in a concurrent form.
 Local spatial statistics are represented by a linear combination of a class of feature functions with trained model parameters.

Training
 The trained model parameters are obtained by jointly optimizing the feature functions, the loss function, and the structural coherence.
 The model parameter $w$ is trained over the collection of training data $S = \{x_i, y_i\}$.
 The feature functions $\{f_i\}$ establish the conditional probabilistic model for prediction based on the various contexts derived from the supposed predictive direction.
 The loss function $L(\hat{y}, y)$ evaluates the prediction and adjusts the model parameter $w$.
Loss Function
 The M-ary estimated output $\hat{y}(i)$ for the block of pixels $y$ is measured over the generated graphical model.
 A log-Gaussian function is used for the node cliques and a Dirac function for the edge cliques, with prediction error $\epsilon_i = \hat{y}(i) - y(i)$ and variance $\sigma^2$ over the errors.
Solution
 Standard quadratic programming (QP) for solving the min-max formulation suffers high computational cost for problems with a large alphabet size. As an alternative, its dual is solved:
$$\max_{\alpha} \;\; \sum_{i, y} \alpha_i(y)\, L_i(y_i, y) \;-\; \frac{1}{2} \sum_{i} \Big\| \sum_{y} \alpha_i(y) \big(f_i(y_i) - f_i(y)\big) \Big\|^2$$
$$\text{s.t.} \;\; \sum_{y} \alpha_i(y) = C, \quad \alpha_i(y) \ge 0, \;\; \forall i,$$
where $\alpha_i(y)$ is the marginal distribution for the $i$-th clique.
 Sequential minimal optimization (SMO) breaks the dual problem into a series of small QP problems over cliques and takes an ascent step that modifies the least number of variables. SMO iteratively chooses pairs of $y$ with respect to the KKT conditions.
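To make the min-max objective concrete, here is a toy sketch (an illustration, not the SMO solver): for a problem small enough to enumerate all labelings y, it computes the loss-augmented margin violation max_y [L(y_i, y) + w·f(x_i, y) − w·f(x_i, y_i)] that max-margin training drives to zero; the feature map and loss below are invented.

    import itertools
    import numpy as np

    def margin_violation(w, x, y_true, labels, feat, loss):
        """max over y of [ loss(y_true, y) + w.f(x, y) - w.f(x, y_true) ] (loss-augmented inference)."""
        best = -np.inf
        for y in itertools.product(labels, repeat=len(y_true)):
            gap = loss(y_true, y) + w @ feat(x, y) - w @ feat(x, y_true)
            best = max(best, gap)
        return best

    # toy 3-pixel "block": unary feature = sum of x where the label is 1, pairwise = #equal neighbors
    feat = lambda x, y: np.array([sum(xi for xi, yi in zip(x, y) if yi == 1),
                                  sum(y[i] == y[i + 1] for i in range(len(y) - 1))], dtype=float)
    loss = lambda y, yhat: sum(a != b for a, b in zip(y, yhat))     # Hamming loss
    x, y_true = (0.9, 0.8, 0.1), (1, 1, 0)
    print(margin_violation(np.array([1.0, 0.5]), x, y_true, labels=(0, 1), feat=feat, loss=loss))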
Solution
• A junction tree is built for the loopy Markov network; each junction is generated by adding edges to link cliques.
• The junction tree is unique.
• Each clique is predicted along the junction tree.
• Belief propagation (BP) serves as the message-passing algorithm for inference, updating the potential of each clique.
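For illustration only (not the paper's implementation), the sketch below runs sum-product belief propagation on a small chain of discrete variables, the same message-passing pattern used along the junction tree to update clique beliefs; the potentials are arbitrary example numbers.

    import numpy as np

    def chain_bp(node_pot, edge_pot):
        """Sum-product BP on a chain; returns the marginal belief of each node."""
        n = len(node_pot)
        fwd = [np.ones_like(node_pot[0]) for _ in range(n)]   # messages passed left-to-right
        bwd = [np.ones_like(node_pot[0]) for _ in range(n)]   # messages passed right-to-left
        for i in range(1, n):
            m = edge_pot[i - 1].T @ (node_pot[i - 1] * fwd[i - 1])
            fwd[i] = m / m.sum()
        for i in range(n - 2, -1, -1):
            m = edge_pot[i] @ (node_pot[i + 1] * bwd[i + 1])
            bwd[i] = m / m.sum()
        beliefs = [node_pot[i] * fwd[i] * bwd[i] for i in range(n)]
        return [b / b.sum() for b in beliefs]

    node_pot = [np.array([0.7, 0.3]), np.array([0.5, 0.5]), np.array([0.2, 0.8])]
    edge_pot = [np.array([[0.9, 0.1], [0.1, 0.9]])] * 2       # neighbors prefer equal states
    for b in chain_bp(node_pot, edge_pot):
        print(np.round(b, 3))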
Upper Bound of Prediction Error
Theoretical upper bound for the prediction error.
Theorem: Given the trained weighting vector $w$ and an arbitrary constant $\eta > 0$, the prediction error is asymptotically equivalent to the one obtained over the training data with probability at least $1 - e^{-\eta}$:
$$\underbrace{E_X\, L(w \cdot f, y)}_{\text{average prediction error}} \;\le\; \underbrace{E_S\, L_{\gamma}(w \cdot f, y)}_{\gamma\text{-relaxed training error}} \;+\; \underbrace{\sqrt{\frac{32}{N}\left(\ln 4\,\mathcal{N}_1(L, \gamma_i, S) + \ln\frac{1}{p_i \eta}\right)}}_{\text{additional term, converges to zero as } N \text{ grows}}$$
Remark: The prediction error is upper-bounded by the well-tuned training error. The theorem ensures the predictive performance of the structured prediction model.
Upper Bound of Prediction Error
 In terms of probability: given the trained weighting vector $w$ and an arbitrary constant $\eta > 0$, with sufficient sampling there exists $\varepsilon(L, \gamma, N, \eta) \to 0$ satisfying
$$P\left[\sup \big(E_X\, L(w \cdot f, y) - E_S\, L_{\gamma}(w \cdot f, y)\big) \le \varepsilon\right] > 1 - \eta.$$
 The prediction error is upper-bounded by the well-tuned training error. The theorem ensures the predictive performance of the structured set prediction model.
Implementation
• Combined with a variance-based predictor for smooth regions, structured set prediction serves as an alternative mode.
• The coding costs of the two alternative modes are compared to select the optimal one.
• A log-Gaussian loss function is used to obtain optimal coding of the residual based on the assumed Gaussian distribution.
Experimental Results
Lossless bit rates (bits per pixel):

Image      Proposed  MRP    BMF    TMW    CALIC  JPEG-LS  JPEG 2000  HD Photo
Airplane   3.536     3.591  3.602  3.601  3.743  3.817    4.013      4.247
Baboon     5.635     5.663  5.714  5.738  5.666  6.037    6.107      6.149
Balloon    2.548     2.579  2.649  2.649  2.825  2.904    3.031      3.320
Barb       3.764     3.815  3.959  4.084  4.413  4.691    4.600      4.836
Barb2      4.175     4.216  4.276  4.378  4.530  4.686    4.789      5.024
Camera     3.901     3.949  4.060  4.098  4.190  4.314    4.535      4.959
Couple     3.323     3.388  3.448  3.446  3.609  3.699    3.915      4.318
Goldhill   4.173     4.207  4.238  4.266  4.394  4.477    4.603      4.746
Lena       3.877     3.889  3.929  3.908  4.102  4.238    4.303      4.477
Peppers    4.163     4.199  4.241  4.251  4.246  4.513    4.629      4.850

• Performance exceeds JPEG-LS by 10% and JPEG 2000 lossless mode by 14% on average, in bits per pixel.
• Performance exceeds the minimum rate predictor (MRP, the optimal predictor) by 1.35% on average, in bits per pixel.
Conceptual Diagram: Video
Conceptual description of the structured prediction model: the trained model parameters are obtained by jointly optimizing the feature functions, the loss function, and the structural coherence.
• Optimal joint prediction by the max-margin Markov network.
• Max-margin estimation conditioned directly on the predicted pixels for context-based prediction.
• A Markov network maintains the structural coherence in the regions to be predicted.
Loss Function
 Laplacian loss function for the M-ary estimated error:
$$L(\hat{y}, y) = \sum_{i} \ell_i\big(\hat{y}(i) - y(i)\big) + \sum_{i} \sum_{j \in ne(i)} I\big(\hat{y}(i), y(i)\big)\, I\big(\hat{y}(j), y(j)\big)$$
 Laplacian errors are derived for each node, and the state transitions of the neighboring nodes for each edge. For each node, with error $\epsilon_i = \hat{y}(i) - y(i)$ and variance $\sigma^2$ over the errors, the prediction error is
$$\ell_i(\epsilon_i) = \begin{cases} -\log_2\left(1 - e^{-\frac{1}{\sqrt{2}\,\sigma}}\right) & \epsilon_i = 0 \\[4pt] 1 - \log_2\left(e^{-\frac{|\epsilon_i| - 0.5}{\sigma/\sqrt{2}}} - e^{-\frac{|\epsilon_i| + 0.5}{\sigma/\sqrt{2}}}\right) & 0 < |\epsilon_i| < 255 \\[4pt] 1 - \log_2\left(e^{-\frac{|\epsilon_i| - 0.5}{\sigma/\sqrt{2}}}\right) & |\epsilon_i| = 255 \end{cases}$$
 The Laplacian loss function matches the DCT transform; the structured prediction model optimizes it for minimal coding length under the HEVC framework.
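The sketch below (an illustration under the reconstruction above, not reference code) evaluates this per-node cost as the code length in bits of a quantized Laplacian residual with scale σ/√2; the test values of ϵ and σ are arbitrary.

    import math

    def laplacian_bits(eps, sigma):
        """Approximate code length (bits) of an integer residual under a Laplacian with scale sigma/sqrt(2)."""
        b = sigma / math.sqrt(2.0)
        e = abs(int(eps))
        if e == 0:
            # probability mass of the quantization bin around zero
            return -math.log2(1.0 - math.exp(-0.5 / b))
        if e < 255:
            p = 0.5 * (math.exp(-(e - 0.5) / b) - math.exp(-(e + 0.5) / b))
        else:
            p = 0.5 * math.exp(-(e - 0.5) / b)       # tail mass beyond the last bin
        return -math.log2(p)

    sigma = 4.0
    for eps in (0, 1, 5, 20):
        print(eps, round(laplacian_bits(eps, sigma), 3))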
Expectation Propagation for Message Passing
 SMO is used to solve the standard quadratic program (QP) of the max-margin Markov network. A junction tree is then generated, and a message-passing algorithm is conducted along it to find the most probable state of each pixel.
 Lossy intra video coding does not require propagating the actual states along the junction tree; with plain belief propagation, statistics such as means and variances cannot be selected and propagated for robust, convergent message passing.
 Expectation propagation (EP) utilizes such statistics, approximating the actual distribution with an exponential family. The approximation metric can be varied according to the video data.
 Prediction based on EP is proven to converge to an upper bound.
Implementation
• The structured prediction model is integrated as an alternative mode, MODE_STRUCT, into the current HEVC framework without additional syntax elements.
• Mode decision is performed by rate-distortion optimization.
• The Laplacian-based loss function on the residual yields the best coding performance under the DCT transform.
Experimental Results
• Performance exceeds the HEVC common test model by 2.26% in BD-rate; the gain in BD-PSNR is up to 0.38 dB.
• Performance exceeds HEVC with combined intra coding (CIP) by 1.31% in BD-rate.
[Figure: R-D curves for Foreman (352×288), BlowingBubbles (416×240), BQMall (832×480), and Cactus (1920×1080).]
Conceptual Diagram: Genome Sequence
• The central dogma of molecular biology: a framework for understanding the normal flow of sequence information between sequential information carriers, DNA (genotype) → RNA → protein → phenotype.
• Proteins are constructed according to DNA.
Purine bases: Adenine (A); Guanine (G)
Pyrimidine bases: Thymine (T); Cytosine (C)
Background: Compressive Structures in Genomic Data
• A DNA sequence contains repeated patterns of nucleotides, namely 'A', 'T', 'G', and 'C'.
• Approximate repeats: exact repeat, insertion, deletion, and substitution.
• Reversible palindrome: substitute 'A' with 'T' and 'G' with 'C', and vice versa.
[Figure: for the fragment TGTCTGCAGCAGCCGCT — insertion of 'G' gives TGTCTGCAGGCAGCCGCT, deletion of 'G' gives TGTCTGCACAGCCGCT, substitution of 'G' gives TGTCTGCAACAGCCGCT, and the reversible palindrome is ACAGACGTGTCGGCGA.]
Background
 Reference-based methods
 RLZ: relative LZ compression against related reference sequences indicated by their self-index; cannot handle alphabets other than {A, T, G, C, N}.
 GRS: a general genome re-sequencing tool; considers the percentage of varied sequence per chromosome.
 GReEn: a copy model for matching exact repeats in the reference, with a statistical model for estimating the matching probabilities.
Motivation
 Main concerns
 Approximate or exact repeats of nucleotides
 Variable repeat sizes & offsets of repeats
 Exceptions of insertion, deletion, and substitution of nucleotides in repeats
 Motivations
 Differences between the target and reference sequences are not uniformly distributed, but sparse, and hence cheap to code.
 Side information, e.g. sizes and offsets, is correlated and can be predicted with a structured prediction model.
Framework
• Under the hierarchical prediction structure, the reference is selected according to the loss function:
$$\left(\hat{F}_i^{(j)}, \hat{M}_i^{(j)}\right) = \arg\min_{\tilde{F}_i^{(j)},\, M_i^{(j)}} L_j\left(F_i^{(j)}, \tilde{F}_i^{(j)}, M_i^{(j)}\right)$$
• The difference sequence is a zero sequence in which non-zero symbols emerge at a low frequency, which makes it suitable for a wavelet transform and subsequent bit-plane coding.
• A Markov random field is established over the correlated side information, which is predicted and updated with the BP algorithm.
Hierarchical Prediction Structure
[Example: reference CAAATcttAcccCGCC, target CAAATCTTACCCCGCC; OFFSET = 0, SIZE = 16; difference 0x0000000000E0E0E000E0E0E0000000.]
 Example of the hierarchical prediction structure: a (sub-)fragment of 16 nucleotides is predicted from a (sub-)fragment of 16 nucleotides in the reference sequence.
 Its offset is 0.
 The difference (sub-)fragment is obtained by subtracting the selected reference from the target.
Hierarchical Prediction Structure
[Example: reference TGTCTGCAMCAGCCGCT, target TGTCTGCACAGCCGCT; OFFSET = 0 and OFFSET = 1, SIZE = 8 and SIZE = 8; difference 0x000000000000000000000000000000.]
 Example of the hierarchical prediction structure: a (sub-)fragment of 16 nucleotides is predicted from two sub-fragments of 8 nucleotides in the reference sequence.
 Their offsets are 0 and 1, respectively.
 The difference (sub-)fragment is obtained by subtracting the selected reference from the target.
Loss Function
 A Hamming-like weighted loss function estimates the coding cost. The distance (loss) between the target and reference sub-fragments $F_i^{(j)}$ and $\tilde{F}_i^{(j)}$ with size $m$ is
$$L\left(F_i^{(j)}, \tilde{F}_i^{(j)}\right) = \sum_{n=N_1}^{N_2} \ell_H\big(x_n, \tilde{x}_{\tilde{n}}\big)$$
 Three kinds of weighted losses for differences between target and reference nucleotides are proposed:
$$\ell_H\big(x_n, \tilde{x}_{\tilde{n}}\big) = \begin{cases} 0 & x_n - \tilde{x}_{\tilde{n}} = \text{0x00} \\ 1 & |x_n - \tilde{x}_{\tilde{n}}| = \text{0x20} \\ C & \text{otherwise} \end{cases}$$
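A minimal sketch of this loss (illustrative; the constant C and the candidate offsets are example choices): exact matches cost 0, case-only differences (0x20 apart in ASCII, e.g. 'c' vs 'C') cost 1, other mismatches cost C, and the best reference offset is the one minimizing the summed loss.

    def hamming_like_loss(target, reference, C=10):
        """Weighted Hamming-like loss between equal-length fragments."""
        total = 0
        for a, b in zip(target, reference):
            d = abs(ord(a) - ord(b))
            total += 0 if d == 0 else (1 if d == 0x20 else C)
        return total

    def best_offset(target_sub, reference, base, offsets=(0, 1, 2), C=10):
        """Pick the offset (relative to the aligned position `base`) with the smallest loss."""
        size = len(target_sub)
        return min(offsets,
                   key=lambda o: hamming_like_loss(target_sub, reference[base + o: base + o + size], C))

    reference = "TGTCTGCAMCAGCCGCT"       # 17 symbols; an extra 'M' after position 8
    target = "TGTCTGCACAGCCGCT"           # 16 symbols
    print(best_offset(target[:8], reference, base=0))   # first half matches at offset 0
    print(best_offset(target[8:], reference, base=8))   # second half matches at offset 1 (skips 'M')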
Side Information Prediction
 The side information $\{M_i^{(j)}\}$ of fragment $F_i^{(j)}$ is predicted simultaneously.
 A Markov chain is established as the structured prediction model: the current sub-fragment is predicted and updated based on its neighboring ones.
[Figure: neighboring sub-fragments $F_1$, $F_2$ with sizes $S_1$, $S_2$, offsets $O_1$, $O_2$, and predictions $\hat{F}_1$, $\hat{F}_2$; sub-fragments with $S_1 = S_2$ and $O_1 = O_2$ combine into a fragment $F$ of size $S_1 + S_2$ with prediction $\hat{F}$.]
Side Information Prediction
 A Markov chain is established to represent the interdependencies among the states of the side information of neighboring sub-fragments.
 The BP algorithm propagates the most probable states of the side information, and calculates and updates their marginal distributions:
$$\hat{p}\left(M_i^{(j)}\right) = \max_{M_i^{(j)}} \left( \mu_{f_{j+1} \to M_i^{(j)}} + \mu_{f_{j-1} \to M_i^{(j)}} \right)$$
$$\hat{M}_i^{(j)} = \arg\max_{M_i^{(j)}} \left( \mu_{f_{j+1} \to M_i^{(j)}} + \mu_{f_{j-1} \to M_i^{(j)}} \right)$$
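As a toy illustration (not the paper's code) of this update, the sketch below combines log-domain messages arriving from the left and right neighbors of each node in the chain and picks the most probable side-information state, mirroring the arg-max rule above; the message values are invented.

    import numpy as np

    def decode_side_information(mu_from_left, mu_from_right):
        """Combine incoming log-domain messages and pick the most probable state per node."""
        combined = [l + r for l, r in zip(mu_from_left, mu_from_right)]   # mu_{f_{j-1}->M} + mu_{f_{j+1}->M}
        return [int(np.argmax(c)) for c in combined], [float(np.max(c)) for c in combined]

    # hypothetical log-messages over 3 candidate (offset, size) states for two sub-fragments
    mu_from_left  = [np.array([-1.2, -0.3, -2.0]), np.array([-0.8, -1.5, -0.2])]
    mu_from_right = [np.array([-0.9, -0.4, -1.1]), np.array([-1.0, -0.1, -1.3])]
    states, scores = decode_side_information(mu_from_left, mu_from_right)
    print(states, scores)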
Experimental Results
[Figure: compression ratio and run time (seconds) per chromosome 1-22, M, X, Y for the proposed method, GReEn, and GRS.]
• KOREF20090224 and KOREF20090131 are genomes from the same human individual.
• The compression ratio is about 495 times, improving on the compression performance of GReEn and GRS by 150% and 200%, respectively.
• The run time relative to GReEn is 219.7%, and is comparable to that of GRS.
Experimental Results
[Figure: compression ratio and run time (seconds) per chromosome 1-22, M, X, Y for the proposed method and GReEn.]
• YH and KOREF20090224 are genomes from different human individuals.
• The compression ratio is about 232 times, which is about 150% of the compression performance of GReEn.
• GRS cannot compress most of the chromosomes.
• The run time relative to GReEn is 252.99%.
Acknowledgement
Faculty: Prof. Hongkai Xiong
Postdoctoral Fellows: Dr. Wenrui Dai, Dr. Chenglin Li
PhD Students: Botao Wang, Xiaopeng Zhang, Yong Li, Yuchen Zhang, Xing Gao, Shuo Chen, Kexin Tang, Yangmei Shen, Yuehan Xiong
MS Students: Kuanyu Ju, Can Xu, Saijie Ni, Han Cheng, Wenjing Fan
Collaborations: EPFL, UCSD

Thanks!