Statistics in ROOT
René Brun, Anna Kreshuk, Lorenzo Moneta
PH/SFT group, CERN
http://root.cern.ch
ftp://root.cern.ch/root/phystat05.ppt
15th September 2005
PHYSTAT 05, Oxford
Contents
User interface
Data storage and access
Analysis
Visualization
New Math libraries
Future plans
ROOT’s user interface
C++ in batch mode
root -b -q myMacro.C > myMacro.log
C++ interpreted code with CINT – the C++ interpreter
in the command line:
root[0] for (int i=0; i<10; i++) cout<<"hello "<<i<<endl;
loading a macro:
root[1] .L mySmallMacro.C
root[2] myFunction(1, 2, 3);
C++ compiled code via CINT
root[] .L myScript.C+
Creating shared library /home/…/myScript_C.so
Python:
Access to ROOT from Python
>>> from ROOT import TLorentzVector
>>> l = TLorentzVector()
Access to Python from ROOT
root [0] TPython::LoadMacro("MyPyClass.py");
root [1] MyPyClass mpc;
ROOT and external libraries
Using external libraries from ROOT:
rootcint – utility to link compiled C/C++ objects with the CINT C/C++ interpreter
Example:
In the Makefile of MyLibrary, rootcint generates the dictionary for MyClass
Load and use MyLibrary in a ROOT session:
root[] .L MyLibrary.so
root[] MyClass *mc = new MyClass();
Data storage and access
• Allows analysis of terabytes of data
• Can select entries from different physical locations and collect them into the analysis dataset
• Branches of a TTree are read independently, so the variables not needed for the analysis are not loaded into memory
[Figure: TTree1, TTree2, …, TTreeN collected into the dataset to analyze; branches V1, V2, …, V23, …, V99]
Histograms
1-, 2- and 3-dimensional histograms
Errors for each bin can be computed:
Default: as sqrt(bin content)
As sqrt(sum of squares of weights of the bin)
1- and 2-dimensional profile histograms
Mean value of Y and its standard deviation for each bin in X
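The two bin-error conventions listed above can be illustrated with a small standalone sketch. It does not use ROOT: `WeightedHist` and its members are hypothetical names chosen for this example, and entries outside the axis range are simply dropped.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical minimal 1-D histogram (not ROOT's TH1), used only to
// illustrate the two error definitions.
struct WeightedHist {
    double lo, hi;
    std::vector<double> sumw;   // sum of weights per bin (the bin content)
    std::vector<double> sumw2;  // sum of squared weights per bin

    WeightedHist(std::size_t nbins, double lo_, double hi_)
        : lo(lo_), hi(hi_), sumw(nbins, 0.0), sumw2(nbins, 0.0) {}

    // Add one entry; values outside [lo, hi) are ignored in this sketch.
    void fill(double x, double w = 1.0) {
        if (x < lo || x >= hi) return;
        std::size_t bin =
            static_cast<std::size_t>((x - lo) / (hi - lo) * sumw.size());
        if (bin >= sumw.size()) bin = sumw.size() - 1;  // guard against rounding
        sumw[bin] += w;
        sumw2[bin] += w * w;
    }

    // Default convention: sqrt(bin content).
    double errorDefault(std::size_t bin) const {
        return std::sqrt(std::fabs(sumw[bin]));
    }

    // Weighted convention: sqrt(sum of squares of the weights in the bin).
    double errorSumw2(std::size_t bin) const { return std::sqrt(sumw2[bin]); }
};
```

For unit weights the two definitions coincide; they differ as soon as entries carry weights other than 1.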
Analysis of TTrees
TTree::Draw method and TTreeViewer - an easy way to examine the tree:
Producing histograms of user-defined expressions in up to 4 dimensions
Expressions – C++ formulas
Selections – expressions, user-defined macros or graphical cuts
Examples:
Tree.Draw("sqrt(x):y", "x>0 && y<1");
Tree.Draw("2*TMath::Log(x)", cut1 || cut2);
Fitting - interface
Minimization packages: Minuit and Fumili
Fitting can be done:
Directly in those packages with a user-defined function to minimize
Through the general interface of
TH1::Fit (binned data) – Chisquare and Loglikelihood methods
TGraph::Fit (unbinned data)
TGraphErrors::Fit (data with errors)
TGraphAsymmErrors::Fit (taking into account asymmetry of errors)
TTree::Fit and TTree::UnbinnedFit
RooFit package for object-oriented data modeling. Distributed with ROOT starting from version 5.02-00
Linear Fitting (1)
New class TLinearFitter
Used to fit functions linear in the parameters
10-15 times faster than Minuit, depending on the fitting function
Simple to use in a multidimensional case
Example:
lfitter.SetFormula("1 ++ x0 ++ sqrt(x1) ++ exp(x2) ++ x3 ++ x4");
Expressions with such syntax can be used in all the Fit interface functions
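A model that is linear in its parameters can be fitted without iteration, which is where the speed advantage over a general minimizer comes from. The following is a minimal standalone sketch of that idea using the normal equations. It is not TLinearFitter's implementation; `BasisFn` and `linearLeastSquares` are hypothetical names, and each basis function plays the role of one term separated by `++` in the formula syntax above.

```cpp
#include <cmath>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

using BasisFn = std::function<double(double)>;

// Fit y = c0*f0(x) + c1*f1(x) + ... by solving the normal equations
// (A^T A) c = A^T y. Assumes the basis functions are linearly independent
// on the given x values (otherwise the system is singular).
std::vector<double> linearLeastSquares(const std::vector<BasisFn>& basis,
                                       const std::vector<double>& x,
                                       const std::vector<double>& y) {
    const std::size_t m = basis.size();
    // Augmented matrix [A^T A | A^T y].
    std::vector<std::vector<double>> M(m, std::vector<double>(m + 1, 0.0));
    for (std::size_t i = 0; i < x.size(); ++i) {
        std::vector<double> f(m);
        for (std::size_t j = 0; j < m; ++j) f[j] = basis[j](x[i]);
        for (std::size_t j = 0; j < m; ++j) {
            for (std::size_t k = 0; k < m; ++k) M[j][k] += f[j] * f[k];
            M[j][m] += f[j] * y[i];
        }
    }
    // Gaussian elimination with partial pivoting.
    for (std::size_t col = 0; col < m; ++col) {
        std::size_t piv = col;
        for (std::size_t r = col + 1; r < m; ++r)
            if (std::fabs(M[r][col]) > std::fabs(M[piv][col])) piv = r;
        std::swap(M[col], M[piv]);
        for (std::size_t r = col + 1; r < m; ++r) {
            const double factor = M[r][col] / M[col][col];
            for (std::size_t k = col; k <= m; ++k) M[r][k] -= factor * M[col][k];
        }
    }
    // Back substitution.
    std::vector<double> c(m, 0.0);
    for (std::size_t j = m; j-- > 0;) {
        double s = M[j][m];
        for (std::size_t k = j + 1; k < m; ++k) s -= M[j][k] * c[k];
        c[j] = s / M[j][j];
    }
    return c;
}
```

Passing the basis `{1, x, sqrt(x)}` corresponds to a formula of the form `1 ++ x ++ sqrt(x)`; the returned vector holds one coefficient per term.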
Linear Fitting (2)
Robust least trimmed squares fitting
Based on the subset of h cases (out of n) whose least squares fit possesses the smallest sum of squared residuals
High breakdown point – smallest proportion of outliers that can cause the estimator to produce values arbitrarily far from the true parameters
Graph.Fit("pol3", "rob=0.75", "", -2, 2);
2nd parameter – fraction h of the good points
Smoothing and peak finding
TSpectrum class:
1 and 2-dim background estimation
smoothing
deconvolution
peak search and fitting
Graph smoothers:
Kernel smoother
Lowess
“Super smoother”
Splines – cubic and quintic
Multivariate methods (1)
Minimum Covariance Determinant Estimator – a highly robust estimator of multivariate location and scatter
Class TRobustEstimator
High breakdown point
Algorithm similar to Least Trimmed Squares regression
Multivariate methods (2)
TPrincipal - principal components analysis
TMultiDimFit – approximates a multidimensional function with monomials, Chebyshev or Legendre polynomials
TMultiLayerPerceptron – a neural network class
All multivariate methods can take input data from a TTree
Confidence intervals
TLimit – computes 95% C.L. limits using the likelihood ratio semi-Bayesian method
TRolke – computes confidence intervals for the rate of a Poisson process in the presence of background and efficiency, with a fully frequentist treatment of uncertainties
TFeldmanCousins – calculates the C.L. upper limit using the Feldman-Cousins method
Small useful algorithms
In the namespace TMath:
Most probability distribution functions, their densities and inverses
Special functions
Mean and Median – also for weighted datasets, Variance and K-th order statistic
Kolmogorov-Smirnov test
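As an illustration of the kind of small algorithm meant here, the sketch below computes a k-th order statistic, a median and a weighted mean with the C++ standard library only. The function names are hypothetical and are not the TMath signatures.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// k-th smallest element (k = 0 is the minimum). Requires k < v.size().
// Works on a copy, so the caller's data keep their order.
double kthOrderStatistic(std::vector<double> v, std::size_t k) {
    std::nth_element(v.begin(), v.begin() + static_cast<std::ptrdiff_t>(k),
                     v.end());
    return v[k];
}

// Median: the middle element, or the mean of the two middle elements
// when the size is even. Requires a non-empty vector.
double medianOf(const std::vector<double>& v) {
    const std::size_t n = v.size();
    if (n % 2 == 1) return kthOrderStatistic(v, n / 2);
    return 0.5 * (kthOrderStatistic(v, n / 2 - 1) + kthOrderStatistic(v, n / 2));
}

// Weighted mean: sum(w_i * x_i) / sum(w_i).
// Requires x.size() == w.size() and a non-zero sum of weights.
double weightedMean(const std::vector<double>& x, const std::vector<double>& w) {
    double sw = 0.0, swx = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sw += w[i];
        swx += w[i] * x[i];
    }
    return swx / sw;
}
```

`std::nth_element` runs in linear time on average, so an order statistic does not require a full sort.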
Linear algebra and quadratic programming
Linear algebra package:
General, symmetric and sparse matrices
Matrix decompositions
Eigenvalue analysis
Quadratic programming library:
Dense and sparse data
Gondzio and Mehrotra solving methods
Graphs
1-d:
TGraph
TGraphErrors
TGraphAsymmErrors
TMultiGraph – a collection of graphs
2-d:
TGraph2D
TGraph2DErrors
ROOT Math Packages
MathCore
Library with the basic Math functionality
buildable as a standalone library
no dependency on other ROOT packages
no external dependency
Main content of MathCore:
Basic and commonly used mathematical functions
Special and statistics (pdf, cdf) functions
Interfaces to function and algorithm classes
Basic implementation of some numerical algorithms
3D and LorentzVectors
Random numbers
MathMore
Library with extra mathematical functionality
Current content:
C++ interface to functions and algorithms from the GNU Scientific Library (GSL)
Mathematical functions implemented using GSL
Algorithms currently present:
adaptive numerical integration, differentiation, root finders, interpolation, 1D minimization
repository for needed and useful extra Math functionality
could include other useful math libraries
Summary and Future plans
First versions of MathCore and MathMore libraries are being released
Next addition will be a new random number package
Improvement of the fitting interface
Transition phase, over in 2-3 months
Statistical algorithms to add:
sPlot
Loess - locally weighted polynomial regression
Cluster analysis
Boxplot and spiderplot
Interface with R?
Mathematical Functions
Special functions
use proposed C++ standard interface:
double cyl_bessel_i (double nu, double x);
Statistical functions
Probability density functions (pdf)
Cumulative dist. (lower tail and upper tail)
Inverse of cumulative distributions
Coherent naming scheme (also proposed to C++ standard)
chisquared_pdf, chisquared_prob, chisquared_quant, chisquared_prob_inv, chisquared_quant_inv
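To make the pdf entry of this scheme concrete, here is a minimal standalone sketch of a chi-squared density. It is not the MathCore implementation: the name `chisquaredPdfSketch` and the parameter order (x first, then the degrees of freedom r) are assumptions made for this example.

```cpp
#include <cmath>

// Chi-squared density with r degrees of freedom:
//   f(x; r) = x^(r/2 - 1) * exp(-x/2) / (2^(r/2) * Gamma(r/2)),  x > 0.
// Evaluated in log space so that large r does not overflow Gamma(r/2).
// This sketch returns 0 for x <= 0 or r <= 0 (the true density at exactly
// x = 0 is non-zero for r <= 2; that edge case is ignored here).
double chisquaredPdfSketch(double x, double r) {
    if (x <= 0.0 || r <= 0.0) return 0.0;
    const double half = 0.5 * r;
    const double logPdf = (half - 1.0) * std::log(x) - 0.5 * x -
                          half * std::log(2.0) - std::lgamma(half);
    return std::exp(logPdf);
}
```

For r = 2 the expression reduces to 0.5 * exp(-x/2), which gives a quick sanity check.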
Mathematical Functions (cont)
New functions with better precision than the old ones in ROOT
Extensive tests of numerical accuracy
Comparison with other libraries (NAG, Mathematica)
Numerical Algorithm
New C++ classes and interfaces for describing algorithms and functions
Integrator classes
Implementation based on GSL (QGS) for definite and indefinite integration
Move of functionality currently in ROOT TF1 inside new classes in MathCore
Easier to use for all clients
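To show what an adaptive integrator does in principle, the sketch below uses adaptive Simpson quadrature. This is a different and much simpler technique than the GSL routines referred to above, and `integrate` with its tolerance and depth parameters is a hypothetical interface, not the MathCore or MathMore one.

```cpp
#include <cmath>
#include <functional>

namespace {

// Simpson's rule on [a, b] from the three function values.
double simpson(double fa, double fm, double fb, double a, double b) {
    return (b - a) / 6.0 * (fa + 4.0 * fm + fb);
}

// Recursively bisect until the two half-interval estimates agree with the
// whole-interval estimate to within the tolerance, or the depth runs out.
double adaptiveStep(const std::function<double(double)>& f, double a, double b,
                    double fa, double fm, double fb, double whole, double tol,
                    int depth) {
    const double m = 0.5 * (a + b);
    const double flm = f(0.5 * (a + m));
    const double frm = f(0.5 * (m + b));
    const double left = simpson(fa, flm, fm, a, m);
    const double right = simpson(fm, frm, fb, m, b);
    const double delta = left + right - whole;
    if (depth <= 0 || std::fabs(delta) <= 15.0 * tol)
        return left + right + delta / 15.0;  // Richardson correction
    return adaptiveStep(f, a, m, fa, flm, fm, left, 0.5 * tol, depth - 1) +
           adaptiveStep(f, m, b, fm, frm, fb, right, 0.5 * tol, depth - 1);
}

}  // namespace

// Definite integral of f over [a, b]; the interval is refined only where
// the integrand needs it.
double integrate(const std::function<double(double)>& f, double a, double b,
                 double tol = 1e-10, int maxDepth = 30) {
    const double fa = f(a), fb = f(b), fm = f(0.5 * (a + b));
    return adaptiveStep(f, a, b, fa, fm, fb, simpson(fa, fm, fb, a, b), tol,
                        maxDepth);
}
```

The depth limit bounds the work for badly behaved integrands; a production integrator would also report an error estimate.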
Physics and Geometry Vectors
Classes for 3D Vectors and LorentzVectors with their operations and transformations
New classes with cleaner interfaces, generic on the scalar type and the coordinate system (cartesian, polar, cylindrical, etc.)
Classes for 3D rotations and Lorentz transformations
Merge old ROOT and CLHEP
Also have rotations based on quaternions
Work done in collaboration with the Fermilab group
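A quaternion-based rotation can be sketched in a few lines of standalone C++. The types `Vec3` and `Quaternion` and the functions below are hypothetical names for illustration, not the classes of the new vector package.

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };
struct Quaternion { double w, x, y, z; };

// Unit quaternion for a rotation by `angle` (radians) about `axis`.
// The axis must be non-zero; it is normalised here.
Quaternion fromAxisAngle(const Vec3& axis, double angle) {
    const double n =
        std::sqrt(axis.x * axis.x + axis.y * axis.y + axis.z * axis.z);
    const double s = std::sin(0.5 * angle) / n;
    return {std::cos(0.5 * angle), axis.x * s, axis.y * s, axis.z * s};
}

// Rotate v by the unit quaternion q using
//   t  = 2 * (u x v),   v' = v + w * t + u x t,   with u = (q.x, q.y, q.z).
Vec3 rotateVector(const Quaternion& q, const Vec3& v) {
    const double tx = 2.0 * (q.y * v.z - q.z * v.y);
    const double ty = 2.0 * (q.z * v.x - q.x * v.z);
    const double tz = 2.0 * (q.x * v.y - q.y * v.x);
    return {v.x + q.w * tx + (q.y * tz - q.z * ty),
            v.y + q.w * ty + (q.z * tx - q.x * tz),
            v.z + q.w * tz + (q.x * ty - q.y * tx)};
}
```

Compared with a 3x3 matrix, a quaternion stores four numbers instead of nine and is cheap to renormalise after many compositions.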
Minimization
New C++ version of Minuit being introduced in ROOT
Same algorithms translated to C++ plus some added functionality
Fumili minimizer, single-sided bounds
Undergoing extensive validation tests
[Figure: before and after]