Transcript of lecture 4

Selected Topics in Particle Physics
Avner Soffer
Spring 2007
Lecture 4
Simplest variable combination: diagonal cut
Combining variables
• Many variables that weakly separate signal from background
• Often correlated distributions
• Complicated to deal with or to use in a fit
• Easiest to combine into one simple variable
– Fisher discriminant
– Neural networks
Input variables for neural net
[Figures: distributions of the neural-net input variables (Legendre Fisher, log(Δz), cos θT, log(K-D DOCA), and lepton tagging: BtgElectronTag & BtgMuonTag) for signal MC, BB background MC, and continuum (qq: cc + uds) MC]
Uncorrelated, (approximately) Gaussian-distributed variables
• “Gaussian-distributed” means the distribution of v is
p(v) ∝ exp[ –(v – <v>)² / (2σ²) ]
• How to combine the information?
[Figure: signal and background distributions in the (v1, v2) plane]
• Option 1: V = v1 + v2
• Option 2: V = v1 – v2
• Option 3: V = a1 v1 + a2 v2
• What are the best weights a_i?
• How about a_i = <v_i^s> – <v_i^b>, the difference between the signal and background means?
Incorporating spreads in v_i
• <v_1^s> – <v_1^b> > <v_2^s> – <v_2^b>, but v_2 has smaller spreads and more actual separation between S and B
• a_i = (<v_i^s> – <v_i^b>) / ((σ_i^s)² + (σ_i^b)²),
where σ_i^s, defined by (σ_i^s)² = <(v_i^s – <v_i^s>)²> = Σ_e (v_ie^s – <v_i^s>)² / N,
is the RMS spread in the v_i distribution of a pure signal sample (similarly defined for σ_i^b)
• You may be familiar with the form
<(v – <v>)²> = <v²> + <v>² – 2<v><v> = <v²> – <v>²
[Figure: signal and background distributions in the (v1, v2) plane; v2 has the smaller mean difference but the better actual separation]
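As a concrete illustration of these weights, here is a minimal Python sketch (the toy Gaussian samples and variable names are invented for the example, not taken from the lecture): it computes a_i = (<v_i^s> – <v_i^b>) / ((σ_i^s)² + (σ_i^b)²) from pure signal and background MC arrays and forms V = Σ_i a_i v_i event by event.

```python
import numpy as np

# Minimal sketch (toy data): per-variable weights
#   a_i = (<v_i^s> - <v_i^b>) / ((sigma_i^s)^2 + (sigma_i^b)^2)
# for uncorrelated inputs, combined into V = sum_i a_i v_i per event.
def spread_weights(v_sig, v_bkg):
    """v_sig, v_bkg: arrays of shape (n_events, n_variables) from pure MC samples."""
    mean_diff = v_sig.mean(axis=0) - v_bkg.mean(axis=0)   # <v_i^s> - <v_i^b>
    spread2 = v_sig.var(axis=0) + v_bkg.var(axis=0)       # (sigma_i^s)^2 + (sigma_i^b)^2
    return mean_diff / spread2

rng = np.random.default_rng(1)
# Toy example: v1 has the larger mean difference, v2 the smaller spread.
v_sig = rng.normal(loc=[1.0, 0.3], scale=[0.5, 0.1], size=(10_000, 2))
v_bkg = rng.normal(loc=[0.0, 0.0], scale=[0.5, 0.1], size=(10_000, 2))
a = spread_weights(v_sig, v_bkg)
V_sig = v_sig @ a          # combined variable, event by event
V_bkg = v_bkg @ a
```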
Linearly correlated, Gaussian-distributed variables
• Linear correlation:
– <v_1> = <v_1>_0 + c v_2
– (σ_1)² independent of v_2
• a_i = (<v_i^s> – <v_i^b>) / ((σ_i^s)² + (σ_i^b)²) doesn't account for the correlation
• Recall (σ_i^s)² = <(v_i^s – <v_i^s>)²>
• Replace it with the covariance matrix C_ij^s = <(v_i^s – <v_i^s>) (v_j^s – <v_j^s>)>
• a_i = Σ_j (C^s + C^b)⁻¹_ij (<v_j^s> – <v_j^b>), i.e. using the inverse of the sum of the S and B covariance matrices
• Fisher discriminant: F = Σ_i a_i v_i
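A minimal sketch of the same construction with the covariance matrices (toy correlated Gaussian samples, invented for illustration): the coefficient vector is (C^s + C^b)⁻¹ applied to the mean-difference vector, and F = Σ_i a_i v_i for each event.

```python
import numpy as np

# Minimal sketch (toy samples): Fisher coefficients
#   a = (C^s + C^b)^-1 (<v^s> - <v^b>),   F = sum_i a_i v_i
def fisher_coefficients(v_sig, v_bkg):
    """v_sig, v_bkg: arrays of shape (n_events, n_variables) from MC samples."""
    mean_diff = v_sig.mean(axis=0) - v_bkg.mean(axis=0)
    cov_sum = np.cov(v_sig, rowvar=False) + np.cov(v_bkg, rowvar=False)
    return np.linalg.solve(cov_sum, mean_diff)   # solves (C^s + C^b) a = mean_diff

rng = np.random.default_rng(2)
cov = [[1.0, 0.6], [0.6, 1.0]]                   # linearly correlated inputs
v_sig = rng.multivariate_normal([1.0, 0.5], cov, size=10_000)
v_bkg = rng.multivariate_normal([0.0, 0.0], cov, size=10_000)
a = fisher_coefficients(v_sig, v_bkg)
F_sig = v_sig @ a
F_bkg = v_bkg @ a
```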
Fisher discriminant properties
• Best S-B separation for a linearly correlated set of Gaussian-distributed
variables
• Non-Gaussian-ness of v is usually not a problem…
• There must be a mean difference <v_i^s> – <v_i^b> ≠ 0 (otherwise one can sometimes transform the variable, e.g. take its absolute value)
• Need to calculate the a_i coefficients using (correctly simulated) Monte Carlo (MC) signal and background samples
• Should validate using control samples (true for any discriminant)
More properties
• F is more Gaussian than its inputs
• (virtual calorimeter example)
• Central limit theorem:
– If x_j (j = 1, …, n) are independent random variables with means <x_j> and variances σ_j², then for large n the sum Σ_j x_j is a Gaussian-distributed variable with mean Σ_j <x_j> and variance Σ_j σ_j²
• F can usually be fit with 2 Gaussians or a bifurcated Gaussian
• A cut on F corresponds to an (n–1)-dimensional plane cut through the n-dimensional variable space
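A quick numerical check of the central-limit-theorem statement above (uniform inputs chosen purely for illustration): the sum of n independent variables has mean Σ_j <x_j> and variance Σ_j σ_j², and its shape becomes increasingly Gaussian as n grows.

```python
import numpy as np

# Sum of n independent uniform variables: mean = n*0.5, variance = n*(1/12),
# and the shape approaches a Gaussian for large n (this is why F, a weighted
# sum of many inputs, tends to be more Gaussian than the inputs themselves).
rng = np.random.default_rng(3)
n, n_events = 12, 100_000
x = rng.uniform(0.0, 1.0, size=(n_events, n))   # each x_j: mean 0.5, variance 1/12
s = x.sum(axis=1)

print(s.mean(), n * 0.5)           # ~ sum of the means
print(s.var(), n * (1.0 / 12.0))   # ~ sum of the variances
```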
Nonlinear correlations
• Linear methods (Fisher) are not optimal for such cases
• May fail altogether if there is no S-B mean difference
Artificial neural networks
• “Complex nonlinearity”
• Each neuron
– takes many inputs
– outputs a response function value
• The output of each
neuron serves as input
for the others
• Neurons divided among
layers for efficiency
• The weight w_ij^l between neuron i in layer l and neuron j in layer l+1 is calculated using an MC “training sample”
Response functions
• Neuron output = r (inputs, weights) = a(k(inputs, weights))
• Common usage:
– a = linear in the output layer
– a = tanh in the hidden layer
– k = sum in the hidden & output layers
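A minimal sketch of a single neuron's response under the common usage above (function and input values invented for the example): the synapse function k is a weighted sum of the inputs, and the activation a is tanh for a hidden neuron or the identity for the linear output neuron.

```python
import numpy as np

# One neuron: response = a(k(inputs, weights)), with k = weighted sum.
def neuron_response(inputs, weights, activation=np.tanh):
    k = np.dot(weights, inputs)     # k(inputs, weights): weighted sum
    return activation(k)            # a(k): the neuron's output

y_hidden = neuron_response([0.2, -1.3, 0.7], [0.5, 0.1, -0.4])            # tanh hidden neuron
y_output = neuron_response([y_hidden], [1.2], activation=lambda k: k)     # linear output neuron
```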
Training (calculating weights)
• Event a (a = 1, …, N) has input-variable vector x_a = (x_1, …, x_nvar)
• For each event, calculate the deviation from the desired value (0 for
background, 1 for signal)
• Calculate the error function for random values w of the weights
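A standard choice of error function, assumed in the training sketch below, is the quadratic deviation summed over the N training events:
E(w) = ½ Σ_a [ y(x_a; w) – d_a ]²,
where y(x_a; w) is the network output for event a and d_a is the desired value (0 for background, 1 for signal).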
… Training
• Change the weights so as to cause the steepest decline in E (gradient descent)
• “Online learning”: drop the sum over events and update the weights after each event
– Requires a randomized training sample
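A minimal sketch of this procedure for a one-hidden-layer network with tanh hidden neurons and a linear output (network size, learning rate, and toy samples are all invented for the example, and the quadratic error above is assumed): each event triggers one steepest-descent step on its own error.

```python
import numpy as np

# Minimal sketch: 'online' steepest-descent training of a one-hidden-layer
# perceptron (tanh hidden neurons, linear output, per-event quadratic error
# E_a = 0.5 * (y(x_a) - d_a)^2). All sizes and samples are illustrative.
rng = np.random.default_rng(0)
n_var, n_hidden, eta = 4, 8, 0.01                      # inputs, hidden neurons, learning rate
W1 = rng.normal(scale=0.5, size=(n_hidden, n_var))     # weights: input layer -> hidden layer
W2 = rng.normal(scale=0.5, size=n_hidden)              # weights: hidden layer -> output neuron

def forward(x):
    h = np.tanh(W1 @ x)              # hidden-layer responses
    return h, W2 @ h                 # linear output

def train_event(x, d):
    """One online-learning step: update the weights after a single event."""
    global W1, W2
    h, y = forward(x)
    delta = y - d                                          # dE_a/dy
    grad_W2 = delta * h                                    # dE_a/dW2
    grad_W1 = np.outer(delta * W2 * (1.0 - h**2), x)       # chain rule through tanh
    W2 -= eta * grad_W2                                    # step opposite the gradient
    W1 -= eta * grad_W1

# Toy training sample: signal (d = 1) and background (d = 0), randomly shuffled.
n_events = 5_000
X = np.vstack([rng.normal(+0.5, 1.0, size=(n_events, n_var)),
               rng.normal(-0.5, 1.0, size=(n_events, n_var))])
d = np.concatenate([np.ones(n_events), np.zeros(n_events)])
for i in rng.permutation(len(d)):                          # randomized training sample
    train_event(X[i], d[i])
```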
What architecture to use?
• Weierstrass theorem: for a multilayer perceptron, 1 hidden layer is
sufficient to approximate a continuous correlation function to any
precision, if the number of neurons in the layer is high enough
• Alternatively: several hidden layers with fewer neurons may converge faster and be more stable
• Instability problems:
– output distribution changes with different samples
What variables to use?
• Improvement with added variables:
• Importance of variable i:
More info
• A cut on a NN output = non-linear slice through n-dimensional space
• NN output shape can be (approximately) Gaussianized:
q  q’ = tanh1[(q – ½ (qmax+qmin) / ½(qmax – qmin)]