Our generative model


David Andrzejewski, Univ. of Wisconsin-Madison (USA)
David G. Stork, Ricoh Innovations, Inc. and Stanford Univ. (USA)
Xiaojin Zhu, Univ. of Wisconsin-Madison (USA)
Ron Spronk, Queen's Univ. (Canada)

Visual arts
• Digital authentication of Bruegel, Perugino (Lyu et al., 2004)
• Jackson Pollock (Taylor, 1999; Irfan and Stork, 2009)

Writings
• Authorship of the Federalist Papers (Mosteller and Wallace, 1964)
• Ronald Reagan’s radio addresses (Airoldi et al., 2007)
http://www.artchive.com
Haags Gemeentemuseum, The Hague

Better understand compositional style
1. Develop a formal representation of the paintings
2. Extract these representations from paintings
3. Train a generative model
4. Learn relative visual weights of colors
5. Classify true Mondrians versus
   a. “fakes” created by the generative model in step 3
   b. “earlier states” of the Transatlantic paintings
• Vertical/horizontal lines
  • locations
  • extents
• Rectangles
  • locations
  • sizes
  • colors
  • can span multiple lines


Hypothesize an underlying probabilistic model that generates the observed data
Many uses in machine learning
• Make predictions (Naïve Bayes)
• Generate new examples (Markov model)
• Interpret parameter values (linear regression)

Given data, learn/train model parameters
• Our approach: maximum likelihood estimation (MLE)
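
As a minimal sketch of what maximum likelihood estimation amounts to for two simple families: for a Poisson model the MLE of the rate is the sample mean, and for a multinomial model the MLE is the vector of relative frequencies. The function names and the toy data below are hypothetical and are not taken from the talk.

```python
from collections import Counter

def poisson_mle(counts):
    """MLE of a Poisson rate: the sample mean of the observed counts."""
    return sum(counts) / len(counts)

def multinomial_mle(labels):
    """MLE of category probabilities: the relative frequency of each label."""
    tally = Counter(labels)
    total = len(labels)
    return {label: n / total for label, n in tally.items()}

# Toy data (hypothetical, for illustration only).
line_counts = [2, 3, 4, 3]
colors = ["white", "white", "red", "white", "blue", "white", "white", "yellow"]

rate = poisson_mle(line_counts)      # 12 / 4 = 3.0
probs = multinomial_mle(colors)      # white: 5/8 = 0.625, the others 1/8 each
```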
Canvas aspect ratios (kernel density estimator)
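
A kernel density estimator can be sketched as an average of small bumps centred on the observed values. The version below assumes a Gaussian kernel and a hand-picked bandwidth, neither of which is stated on the slide, and the aspect ratios are made-up numbers rather than measurements of real paintings.

```python
import math
import random

def gaussian_kde(samples, bandwidth):
    """Return a density function: the average of Gaussian bumps centred on each sample."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density

def sample_from_kde(samples, bandwidth, rng):
    """Draw from the KDE: pick a stored sample, then add Gaussian noise to it."""
    return rng.gauss(rng.choice(samples), bandwidth)

# Hypothetical width/height ratios (assumption, for illustration only).
aspect_ratios = [0.95, 1.0, 1.0, 1.05, 1.3]
density = gaussian_kde(aspect_ratios, bandwidth=0.05)
new_ratio = sample_from_kde(aspect_ratios, 0.05, random.Random(0))
```

Sampling by "pick a data point, then jitter it" is what makes a KDE convenient inside a generative model: no parametric shape has to be assumed for the aspect-ratio distribution.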
Number of horizontal/vertical lines (Poisson)
Horizontal/vertical line spacing (Dirichlet)
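
One way to read these two components together, sketched under stated assumptions: draw the number of lines along an axis from a Poisson distribution, then draw the gaps between them from a symmetric Dirichlet so that the gaps sum to the canvas length. The Poisson rate of 3.0 is a hypothetical value, and treating the spacing parameter as a single symmetric concentration is an assumption about how the model is parameterized.

```python
import math
import random

def sample_poisson(rate, rng):
    """Poisson draw by Knuth's multiplication method (adequate for small rates)."""
    threshold = math.exp(-rate)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k

def sample_dirichlet(concentration, size, rng):
    """Symmetric Dirichlet draw: normalise independent Gamma variates."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]

def sample_line_positions(rate, concentration, rng):
    """Positions in (0, 1) of the lines along one axis of a unit-length canvas."""
    n_lines = sample_poisson(rate, rng)
    gaps = sample_dirichlet(concentration, n_lines + 1, rng)
    positions, cursor = [], 0.0
    for gap in gaps[:-1]:            # the final gap runs to the canvas edge
        cursor += gap
        positions.append(cursor)
    return positions

rng = random.Random(0)
vertical = sample_line_positions(rate=3.0, concentration=1.80, rng=rng)
```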
Segments are deleted / invisible / left alone (Pólya)
Rectangle colors (Multinomial)
Don’t allow unrealistic “hanging” lines
Require ≥ 1 vertical line
Rectangle color    Multinomial probability
White              0.754
Red                0.085
Yellow             0.062
Blue               0.065
Black              0.034

Line type          Spacing Dirichlet
Vertical           1.80
Horizontal         1.61
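
With the color probabilities listed above, assigning colors to the rectangles of a synthetic painting could be sketched as independent draws from the multinomial. The function name and the number of rectangles are hypothetical; only the five probabilities come from the slide.

```python
import random

# Multinomial probabilities as reported on the slide.
COLOR_PROBS = {
    "white": 0.754,
    "red": 0.085,
    "yellow": 0.062,
    "blue": 0.065,
    "black": 0.034,
}

def sample_colors(n_rectangles, rng):
    """Draw one color per rectangle, independently, from COLOR_PROBS."""
    colors = list(COLOR_PROBS)
    weights = list(COLOR_PROBS.values())
    return rng.choices(colors, weights=weights, k=n_rectangles)

rng = random.Random(0)
palette = sample_colors(10000, rng)
white_share = palette.count("white") / len(palette)   # close to 0.754 for large samples
```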



Calculate visual “center of mass”
Assume true Mondrians centered at [0.5,0.5]
Learn color weights via linear programming
Red      Yellow   Blue     Black
0.237    0.143    0.227    0.392
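
Given color weights like those above, a visual center of mass could be computed roughly as follows. This sketch assumes that each rectangle contributes a mass equal to its area times its color weight, that white carries no weight, and that the lines themselves are ignored; none of these details is spelled out on the slide, and the linear program that learns the weights is not shown.

```python
# Color weights as reported on the slide; the zero weight for white is an assumption.
COLOR_WEIGHTS = {"red": 0.237, "yellow": 0.143, "blue": 0.227, "black": 0.392, "white": 0.0}

def center_of_mass(rectangles):
    """Weighted centroid of rectangles on a unit canvas.

    Each rectangle is (x, y, width, height, color), with (x, y) its lower-left corner.
    """
    total = cx = cy = 0.0
    for x, y, w, h, color in rectangles:
        mass = w * h * COLOR_WEIGHTS[color]
        total += mass
        cx += mass * (x + w / 2.0)
        cy += mass * (y + h / 2.0)
    if total == 0.0:
        return (0.5, 0.5)            # an all-white canvas is treated as balanced
    return (cx / total, cy / total)

# Hypothetical composition: two equal black squares in opposite corners.
composition = [
    (0.0, 0.0, 0.2, 0.2, "black"),
    (0.8, 0.8, 0.2, 0.2, "black"),
    (0.2, 0.2, 0.6, 0.6, "white"),
]
center = center_of_mass(composition)  # approximately (0.5, 0.5) by symmetry
```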


The Transatlantic paintings were completed in Europe, but then altered after Mondrian’s arrival in the United States
A variety of techniques (x-ray, UV, etc.) were used to recover the earlier states (Cooper & Spronk, 2001)
Composition with Red, Blue, and Yellow (1937-1942)
Composition with Red, Yellow, and Blue (1935-1942)
No. 9 (1939-1942)






Decision trees: a very popular technique in machine learning
At each iteration, choose a rule to “split” on
Resulting partitions should be more “pure” with respect to the target classification (true Mondrian or computer-generated fake?)
Key feature: resulting trees are easy to interpret
Estimate accuracy with leave-one-out cross-validation
Control over-fitting with pruning
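
The split-selection and evaluation steps can be illustrated with a deliberately small sketch: a one-split tree (a "stump") chosen by minimizing impurity, scored with leave-one-out cross-validation. Gini impurity is an assumption, since the slides do not name the purity measure, and a full tree with pruning is not shown. The two features and all data values are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0 for a pure group, larger for mixed groups."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return (score, feature, threshold) minimising the size-weighted impurity."""
    best = None
    for f in range(len(rows[0])):
        values = sorted({row[f] for row in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for threshold in [(a + b) / 2.0 for a, b in zip(values, values[1:])]:
            left = [y for row, y in zip(rows, labels) if row[f] < threshold]
            right = [y for row, y in zip(rows, labels) if row[f] >= threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, threshold)
    return best

def train_stump(rows, labels):
    """One-split tree: predict the majority label on each side of the best rule."""
    majority = Counter(labels).most_common(1)[0][0]
    split = best_split(rows, labels)
    if split is None:                # no feature varies: fall back to the majority
        return lambda row: majority
    _, f, threshold = split
    left = Counter(y for row, y in zip(rows, labels) if row[f] < threshold)
    right = Counter(y for row, y in zip(rows, labels) if row[f] >= threshold)
    left_label = left.most_common(1)[0][0]
    right_label = right.most_common(1)[0][0]
    return lambda row: left_label if row[f] < threshold else right_label

def leave_one_out_accuracy(rows, labels):
    """Train on all examples but one, test on the held-out one, and average."""
    hits = 0
    for i in range(len(rows)):
        model = train_stump(rows[:i] + rows[i + 1:], labels[:i] + labels[i + 1:])
        hits += model(rows[i]) == labels[i]
    return hits / len(rows)

# Hypothetical features: (fraction of blue pixels, horizontal/vertical line ratio).
rows = [(0.00, 0.5), (0.01, 0.7), (0.02, 0.6), (0.10, 1.2), (0.12, 1.0), (0.15, 1.4)]
labels = ["fake", "fake", "fake", "true", "true", "true"]
accuracy = leave_one_out_accuracy(rows, labels)
```

Because each learned rule is a single threshold on a named feature, the resulting model can be read directly, which is the interpretability property noted above.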


45 true Mondrians versus 45 generated “fakes”
Classifier                      Accuracy
Majority baseline               50%
Decision tree (no pruning)      70%
Decision tree (with pruning)    68%

45 true Mondrians versus 11 “earlier states”
Classifier                      Accuracy
Majority baseline               81%
Decision tree (no pruning)      72%
Decision tree (with pruning)    75%

Analysis of results

Transatlantic dataset
• IF < 1% of pixels are blue
• AND (# horizontal lines) / (# vertical lines) < 0.9
• AND low visual “density”
• THEN Transatlantic
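
Read as a single conjunctive rule, the learned path above could be written as a small predicate. The blue-pixel and line-ratio thresholds come from the slide; the function name, the `density` input, and its `density_threshold` cut-off are hypothetical, since the slide only says "low" and does not define how density is measured.

```python
def looks_transatlantic(blue_fraction, n_horizontal, n_vertical, density,
                        density_threshold=0.5):
    """Return True only if all three conditions of the rule hold.

    density_threshold is a hypothetical cut-off standing in for "low".
    """
    if n_vertical == 0:
        return False                 # the line ratio is undefined without vertical lines
    return (blue_fraction < 0.01
            and n_horizontal / n_vertical < 0.9
            and density < density_threshold)

# Hypothetical feature values for two paintings.
early_state = looks_transatlantic(0.0, 2, 4, 0.2)    # all conditions hold
finished = looks_transatlantic(0.05, 4, 4, 0.7)      # fails every condition
```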


Formal representation and feature extraction
Generative model
• Fitting simple statistics of Mondrians cannot create realistic synthetic paintings

Color weights align well with our intuitions
Classification
• Can reliably discriminate true Mondrians vs. computer-generated
• Cannot do so for true Mondrians vs. Transatlantic “earlier states”
  ▪ Underlying images were “nearly complete” (!)