acceptance fluctuations, 57 activation function, 314 AdaBoost, 325
AMISE, see asymptotic mean integrated square error
ancillary statistic, 185 Anderson–Darling statistic, 359 Anderson–Darling test, 266 angular distribution, 53
generation, 113
ANN, see artificial neural network approximation of functions, see function
approximation
artificial neural network, see neural network asymptotic mean integrated square error
of histogram approximation, 330 attributes, 289
averaging measurements, 205
B-splines, 301
back propagation of ANN, 316 bagging, 325
Bayes factor, 282, 374 Bayes’ postulate, 5 Bayes’ probability, 5
Bayes’ theorem, 11, 133, 135 for probability densities, 43
Bayesian statistics, 3 Bernoulli distribution, 56 bias, 187
of estimate, 345, 358 of measurement, 103 binomial distribution, 55
Poisson limit, 60 weighted observations, 57
boosting, 324 bootstrap, 277, 336
confidence limits, 339 estimation of variance, 337 jackknife, 340
precision, 339
two-sample test, 340 breakdown point, 377 Breit-Wigner distribution, 74
generation, 113 Brownian motion, 30
categorical variables, 309 Cauchy distribution, 74
generation, 113
central limit theorem, 66, 73, 344 characteristic function, 33
of binomial distribution, 56 of Cauchy distribution, 74
of exponential distribution, 37
of extremal value distribution, 78 of normal distribution, 66
of Poisson distribution, 36 of uniform distribution, 65
Chebyshev inequality, 343 chi-square, 151, 196
of histograms, 153
of histogram of weighted events, 364
of histograms of weighted events, 362 chi-square distribution, 41, 70 chi-square probability, 253
chi-square test, 254 binning, 257
composite hypothesis, 260 generalized, 259
small samples, 261 two-sample, 276
CL, see confidence level classification, 290, 309 decision tree, 322
k-nearest neighbors, 319 kernel methods, 319 support vector machines, 320 weighting, 319
classifiers
training and testing, 340 combining measurements, 151, 205
conditional probability, 11 conditionality principle, 185 confidence belt, 354 confidence interval, 104, 202
classical, 353
unphysical parameter values, 219 upper limits, 214
confidence level, 353 confidence region, 354 consistency
of estimate, 345, 358 of test, 248
constraints, 163, 368 convolution function, 222 convolution integral, 48, 222 convolution matrix, 226 correlation, 45, 52
coefficient, 45, 96 covariance, 45 covariance matrix, 97 coverage probability, 353
Cramer–Rao inequality, 346 Cramer–von Mises test, 265 Cramer–von Mises statistic, 359 credibility interval, 202
critical region, 247 cross validation, 311 cumulants, 35
curse of dimensionality, 290
decision tree, 290, 322, 327 boosted, 324
deconvolution, 221
binning of histograms, 230 binning-free, 234
by matrix inversion, 224 error estimation, 241 iterative, 231
migration method, 236 of histograms, 227 regularization, 226, 229
regularization of transfer matrix, 232 degree of belief, 3
degrees of freedom, 71, 72, 255 diffusion, 30
digital measurements, 31 direct probability, 138 discriminant analysis, 311 distribution
angular, 53 continuous, 16 discrete, 16 multivariate, 51 sample width, 71
distribution function, 16
EDF, see empirical distribution function
EDF statistics, 359 efficiency
of estimators, 346 efficiency fluctuations, 57 efficiency of estimate, 358
empirical distribution function, 264 empirical moments, 87
energy test, 270
distance function, 270, 272 two-sample, 277
entropy regularization, 229 Epanechnikov kernel, 334 equivalent number of events, 63 error, 81, 201
declaration of, 82 definition, 83 definition of, 204 determination of, 85 of a product, 211
of a sum, 211, 212 of average, 98
of correlated measurements, 98 of empirical variance, 87
of error, 87 of ratio, 207
of weighted sum, 102 one-sided, 214 parabolic, 203
propagation of, 94, 205, 210 relative, 82
several variables, 100 statistical, 84 systematic, 88, 90 types of, 84
unphysical parameter values, 219 verification of, 82
error ellipsoid, 97 error interval, 202 error matrix, 97
error of the first kind, 247 error of the second kind, 247
error propagation, 94, 205, 210 estimate, 3
estimator
minimum variance bound, 347 event, 2, 9
excess, 26 expected value, 20
definition, 21 exponential distribution, 69
generation, 112
generation from uniform distribution, 42 extended likelihood, 154
extreme value distribution generation, 113
extreme value distributions, 77
extremum search, 364
method of steepest descent, 366 Monte Carlo methods, 365 parabola method, 366
simplex algorithm, 365 stochastic, 367
f.w.h.m., see full width at half maximum factor analysis, 303
Fisher information, 346
Fisher’s spherical distribution, 55 Fisher–Tippett distribution, 78 frequentist confidence intervals, 353 frequentist statistics, 3
full width at half maximum, 28 function approximation, 291
adapted functions, 302 Gaussian weighting, 293 k-nearest neighbors, 292 orthogonal functions, 294 polynomial, 295, 369 splines, 300
wavelets, 298 weighting methods, 292
gamma distribution, 72 Gauss distribution, 65 Gauss–Markov theorem, 199 Gini-index, 323
GOF test, see goodness-of-fit test goodness-of-fit test, 250, 363 Gram–Charlier series, 297 Gram–Schmidt method, 296 Gumbel distribution, 78
Haar wavelet, 299 Hermite polynomial, 295
histogram, comparison of, 361 hypothesis
composite, 246 simple, 246
hypothesis test, 245 multivariate, 268
i.i.d. variables, see independent, identically distributed variables
importance sampling, 115 incompatible measurements, 209 independence, 52
independence of variates, 46 independent, identically distributed
variables, 52 information, 186 input vector, 289
integrated square error, 329 interval estimation, 201, 355 inverse probability, 137
ISE, see integrated square error iterative deconvolution, 231
jackknife, 340
k-nearest neighbor test, 278 two-sample, 270
k-nearest neighbors, 292, 319 kernel method, 326
kernel methods, 290, 333 classification, 319
kernel trick, 373 kinematical fit, 165
Kolmogorov–Smirnov test, 263, 277 Kuiper test, 265
kurtosis, 26 coefficient of, 26
L2 test, 268
Laguerre polynomial, 295 law of large numbers, 73, 343 learning, 289
least median of squares, 378 least square fit, 195
truncated, 376
least square method, 195 counter example, 196 least trimmed squares, 378 Legendre polynomial, 295
lifetime distribution moments of, 28
Monte Carlo adjustment, 158 likelihood, 137
definition, 137 extended, 154
histogram of weighted events, 364 histograms, 152
histograms with background, 153 map, 155
likelihood function, 137 approximation, 208 asymptotic form, 349 parametrization, 208 transformation invariance, 142
likelihood principle, 186 likelihood ratio, 137, 140
examples, 140
likelihood ratio test, 261, 281 for histograms, 262, 363 two-samples, 276
linear distribution generation, 112
linear regression, 198, 292 with constraints, 368
literature, 6
LMS, see least median of squares
loadings, 308
location parameter, 27 log-likelihood, 138
log-normal distribution, 75, 211 log-Weibull distribution, 78
generation, 113 look-elsewhere effect, 279, 286 Lorentz distribution, 74
generation, 113 loss function
decision tree, 324
LP, see likelihood principle LS, see least squares
LST, see least squares truncated LTS, see least trimmed squares
machine learning, 289 Mahalanobis distance, 269 marginal distribution, 43 marginal likelihood, 374 Markov chain Monte Carlo, 120
maximum likelihood estimate, 142 bias of, 187
consistency, 347 efficiency, 348
small sample properties, 350 maximum likelihood method, 142
examples, 144 recipe, 143
several parameters, 148 signal with background, 150
MCMC, see Markov chain Monte Carlo mean integrated square error, 330, 333
of histogram approximation, 330 of linear spline approximation, 333
mean value, 22 measurement, 2 average, 205
bias, 103
combination of correlated results, 98 combining, 98, 151, 202, 205
measurement error, see error measurement uncertainty, see error median, 27, 377
method of steepest descent, 366 Mexican hat wavelet, 299 minimal sufficient statistic, 183 minimum search, 364
minimum variance bound estimate, 350 minimum variance estimate, 350
MISE, see mean integrated square error MLE, see maximum likelihood estimate mode, 27
moments, 32
exponential distribution, 38 higher-dimensional distributions, 44
of Poisson distribution, 36 Monte Carlo integration, 123
accuracy, 57 advantages of, 129 expected values, 128
importance sampling, 126 selection method, 123 stratified sampling, 128 subtraction method, 127 weighting method, 127
with improved selection, 125 Monte Carlo search, 365 Monte Carlo simulation, 107
additive distributions, 118
by variate transformation, 110 discrete distributions, 114 generation of distributions, 109 histogram distributions, 114 importance sampling, 115 Markov chain Monte Carlo, 120 Metropolis algorithm, 120 parameter inference, 155, 157 Planck distribution, 117 selection method, 115
with weights, 119 Morlet wavelet, 299
multinomial distribution, 58 multivariate distributions
correlation, 52 correlation matrix, 52 covariance matrix, 52 expected values, 52 independence, 52 transformation, 52
MV estimate, see minimum variance estimate
MVB estimate, see minimum variance bound estimate
neural network, 290, 312, 326 activation function, 314 loss function, 315
testing, 316 training, 315
Neyman’s smooth test, 266 normal distribution, 65
generation, 113
generation from uniform p.d.f., 51 in polar coordinates, 47 two-dimensional, 66 two-dimensional rotation, 68
nuisance parameter, 174 dependence on, 181 elimination, 174
elimination by factorization, 176 elimination by integration, 181
elimination by restructuring, 177 profile likelihood, 179
null hypothesis, 246
number of degrees of freedom, 71, 72, 255
observation, 2 Ockham’s razor, 374
optimal variable method, 171 orthogonal functions, 294
p-value, 248, 252 combination of, 254
p.d.f., see probability density function parameter inference, 131
approximated likelihood estimator, 171 least square method, 195
moments method, 191 Monte Carlo simulation, 155 optimal variable method, 171
reduction of number of variates, 168 weighted Monte Carlo, 157
with constraints, 163 with given prior, 133
PCA, see principal component analysis PDE, see probability density estimation Pearson test, 257
Peelle’s pertinent puzzle, 213 PIT, 266, 359
Planck distribution generation, 117
point spread function, 222 Poisson distribution, 58
weighted observations, 61 Poisson numbers
weighted, 63
polynomial approximation, 295 population, 3
power law distribution generation, 112
principal component analysis, 290, 303 principal components, 306
prior probability, 134, 136 for particle mass, 5
probability, 3 assignment of, 4 axioms, 10 conditional, 11 independent, 11
probability density conditional, 43 two-dimensional, 42
probability density estimation, 268, 329 by Gram–Charlier series, 297
fixed volume, 333
histogram approximation, 330 k-nearest neighbors, 333
kernel methods, 333
linear spline approximation, 332 probability density function, 16 probability integral transformation, 266,
359
probability of causes, 137 profile likelihood, 179 propagation of errors, 94
linear, 94
several variables, 95 pseudo random number, 109
quantile, 28
random event, 2, 9 random forest, 325 random number, 109 random variable, 10 random walk, 30
reduction of number of variables, 47 regression, 195
regression analysis, 292 regularization, 226, 229, 241
minimize curvature, 228 of the transfer matrix, 232
regularization function, 228 resampling techniques, 336 response, 289
robust fitting methods, 375 breakdown point, 377
least median of squares, 378 least trimmed squares, 378 M-estimator, 377
sample median, 377 truncated least square fit, 376
sample, 1 sample mean, 22
sample width, 25, 71 relation to variance, 25
scale parameter, 27 shape parameter, 27 sigmoid function, 315 signal test, 246
multi-channel, 285 signal with background, 61 significance, 63 significance level, 247 significance test, 245
small signals, 279 simplex, 365
singular value decomposition, 308 skewness, 26
coefficient of, 26
soft margin classifier, 372 solid angle, 55
spline approximation, 300 spline functions, 370
cubic, 371 linear, 370 normalized, 301 quadratic, 370
stability, 37
standard deviation, 23 statistic, 144
ancillary, 185
minimal sufficient, 183 sufficient, 183
statistical error definition, 91
statistical learning, 289 statistics
Bayesian, 3 frequentist, 3 goal of, 1
simulated annealing, 367 stopping rule paradox, 190 stopping rules, 190 straight line fit, 179, 197 Student’s t distribution, 75 sufficiency, 145, 183 sufficiency principle, 183 sufficient statistic, 183 support vector, 322
support vector machine, 291, 320, 371 SVD, see singular value decomposition SVM, see support vector machine systematic error, 88, 90
definition, 91 detection of, 92 examples, 91
test, 245 bias, 248
comparison, 273 consistency, 248 distribution-free, 251 goodness-of-fit, 250, 363 power, 247
significance, 245 size, 247
uniformly most powerful, 247 test statistic, 246
training sample, 289 transfer function, 222 transfer matrix, 226
transformation of variables, 38 multivariate, 46 transformation function, 50
truncated least square fit, 376 two-point distribution, 56 two-sample test, 246, 275
chi-square test, 276 energy test, 277
k-nearest neighbor test, 278 Kolmogorov–Smirnov test, 277 likelihood ratio, 276
UMP test, see test, uniformly most powerful unfolding, see deconvolution
uniform distribution, 31, 65 upper limit, 214
Poisson statistics with background, 216 Poisson statistics, 215
v. Mises distribution, 53 variables
independent, identically distributed, 52 variance, 23
estimation by bootstrap, 337 of a sum, 23
of a sum of distributions, 25 of sample mean, 24
variate, 10 transformation, 41
Venn diagram, 10, 134
Watson statistic, 359 Watson test, 266 wavelets, 298
Weibull distribution, 78 weight matrix, 69 weighted observations, 61
statistics of, 61 width of sample, 25
relation to variance, 25
List of Examples
Chapter 1
1. Uniform prior for a particle mass
Chapter 2
2. Card game, independent events
3. Random coincidences, measuring the efficiency of a counter
4. Bayes’ theorem, fraction of women among students
5. Bayes’ theorem, beauty filter
Chapter 3
6. Discrete probability distribution (dice)
7. Probability density of an exponential distribution
8. Probability density of the normal distribution
9. Relation between the expected values of the track momentum and of its curvature
10. Variance of the convolution of two distributions
11. Expected values, dice
12. Expected values, lifetime distribution
13. Mean value of the volume of a sphere with a normally distributed radius
14. Playing poker until the bitter end
15. Diffusion
16. Mean kinetic energy of a gas molecule
17. Reading accuracy of a digital clock
18. Efficiency fluctuations of a detector
19. Characteristic function of the Poisson distribution
20. Distribution of a sum of independent, Poisson distributed variates
21. Characteristic function and moments of the exponential distribution
22. Calculation of the p.d.f. for the volume of a sphere from the p.d.f. of the radius
23. Distribution of the quadratic deviation
24. Distribution of kinetic energy in the one-dimensional ideal gas
25. Generation of an exponential distribution starting from a uniform distribution
26. Superposition of two two-dimensional normal distributions
27. Correlated variates
28. Dependent variates with correlation coefficient zero
29. Transformation of a normal distribution from cartesian into polar coordinates
30. Distribution of the difference of two digitally measured times
31. Distribution of the transverse momentum squared of particle tracks
32. Quotient of two normally distributed variates
33. Generation of a two-dimensional normal distribution starting from uniform distributions
34. The v. Mises distribution
35. Fisher’s spherical distribution
36. Efficiency fluctuations of a Geiger counter
37. Accuracy of a Monte Carlo integration
38. Acceptance fluctuations for weighted events
39. Poisson limit of the binomial distribution
40. Fluctuation of a counting rate minus background
41. Distribution of weighted, Poisson distributed observations
42. Distribution of the mean value of decay times
Chapter 4
43. Scaling error
44. Low decay rate
45. Poisson distributed rate
46. Digital measurement (uniform distribution)
47. Efficiency of a detector (binomial distribution)
48. Calorimetric energy measurement (normal distribution)
49. Average from 5 measurements
50. Average of measurements with common off-set error
51. Average outside the range defined by the individual measurements
52. Error propagation: velocity of a sprinter
53. Error propagation: area of a rectangular table
54. Straight line through two measured points
55. Error of a sum of weighted measurements
56. Bias in averaging measurements
57. Confidence levels for the mean of normally distributed measurements
Chapter 5
58. Area of a circle of diameter d
59. Volume of the intersection of a cone and a torus
60. Correction of decay times
61. Efficiency of particle detection
62. Measurement of a cross section in a collider experiment
63. Reaction rates of gas mixtures
64. Importance sampling
65. Generation of the Planck distribution
66. Generation of an exponential distribution with constant background
67. Mean distance of gas molecules
68. Photon-yield for a particle crossing a scintillating fiber
69. Determination of π
Chapter 6
70. Bayes’ theorem: pion or kaon decay?
71. Time of a decay with exponential prior
72. Likelihood ratio: V + A or V − A reaction?
73. Likelihood ratio of Poisson frequencies
74. Likelihood ratio of normal distributions
75. Likelihood ratio for two decay time distributions
76. MLE of the mean life of an unstable particle
77. MLE of the mean value of a normal distribution with known width (case Ia)
78. MLE of the width of a normal distribution with given mean (case Ib)
79. MLE of the mean of a normal distribution with unknown width (case IIa)
80. MLE of the width of a normal distribution with unknown mean (case IIb)
81. MLEs of the mean value and the width of a normal distribution
82. Determination of the axis of a given distribution of directions
83. Likelihood analysis for a signal with background
84. Adjustment of a linear distribution to a histogram
85. Fit of the slope of a linear distribution with Monte Carlo correction
86. Fit of a lifetime with Monte Carlo correction
87. Signal over background with background reference sample
88. Fit with constraint: two pieces of a rope
89. Fit of the particle composition of an event sample
90. Kinematical fit with constraints: eliminating parameters
91. Example 88 continued
92. Example 90 continued
93. Example 88 continued
94. Reduction of the variate space
95. Approximated likelihood estimator: lifetime fit from a distorted distribution
96. Approximated likelihood estimator: linear and quadratic distributions
97. Nuisance parameter: decay distribution with background
98. Nuisance parameter: measurement of a Poisson rate with a digital clock
99. Elimination of a nuisance parameter by factorization of a two-dimensional normal distribution
100. Elimination of a nuisance parameter by restructuring: absorption measurement
101. Eliminating a nuisance parameter by restructuring: slope of a straight line with the y-axis intercept as nuisance parameter
Chapter 7
102. Sufficient statistic and expected value of a normal distribution
103. Sufficient statistic for mean value and width of a normal distribution
104. Conditionality
105. Likelihood principle, dice
106. Likelihood principle, V − A
107. Bias of the estimate of a decay parameter
108. Bias of the estimate of a Poisson rate with observation zero
109. Bias of the measurement of the width of a uniform distribution
110. Stopping rule: four decays in a time interval
111. Moments method: mean and variance of the normal distribution
112. Moments method: asymmetry of an angular distribution
113. Counter example to the least square method: gauging a digital clock
114. Least square method: fit of a straight line
Chapter 8
115. Error of a lifetime measurement
116. Averaging lifetime measurements
117. Averaging ratios of Poisson distributed numbers
118. Distribution of a product of measurements
119. Sum of weighted Poisson numbers
120. Average of correlated cross section measurements, Peelle’s pertinent puzzle