7.2 Further Methods of Parameter Inference
distribution,

$$f(y_1, \dots, y_N|\theta) \propto \exp\left[-\frac{1}{2}\sum_{i,j=1}^{N}(y_i - t_i)\,V_{ij}\,(y_j - t_j)\right],$$
see Sect. 6.5.6. Maximizing the likelihood is again equivalent to minimizing χ² if the errors are normally distributed.
The sum χ² is not invariant under a non-linear variable transformation y′(y). The least square method is also used when the error distribution is unknown; in that situation no better method is available.
Example 114. Least square method: fit of a straight line
We fit the parameters a, b of the straight line

$$y(x) = ax + b \qquad (7.8)$$

to a sample of points (x_i, y_i) with uncertainties δ_i of the ordinates. We minimize χ²:
$$\chi^2 = \sum_i \frac{(y_i - a\,x_i - b)^2}{\delta_i^2}\,,$$

$$\frac{\partial \chi^2}{\partial a} = \sum_i \frac{2(-y_i + a\,x_i + b)\,x_i}{\delta_i^2}\,,$$

$$\frac{\partial \chi^2}{\partial b} = \sum_i \frac{2(-y_i + a\,x_i + b)}{\delta_i^2}\,.$$
We set the derivatives to zero and introduce the following abbreviations (in parentheses we give the expressions for the special case where all uncertainties are equal, δ_i = δ):
$$\overline{x} = \sum_i \frac{x_i}{\delta_i^2} \Big/ \sum_i \frac{1}{\delta_i^2} \quad \Big(= \sum_i x_i/N\Big)\,,$$

$$\overline{y} = \sum_i \frac{y_i}{\delta_i^2} \Big/ \sum_i \frac{1}{\delta_i^2} \quad \Big(= \sum_i y_i/N\Big)\,,$$

$$\overline{x^2} = \sum_i \frac{x_i^2}{\delta_i^2} \Big/ \sum_i \frac{1}{\delta_i^2} \quad \Big(= \sum_i x_i^2/N\Big)\,,$$

$$\overline{xy} = \sum_i \frac{x_i y_i}{\delta_i^2} \Big/ \sum_i \frac{1}{\delta_i^2} \quad \Big(= \sum_i x_i y_i/N\Big)\,.$$
We obtain

$$\hat{b} = \overline{y} - \hat{a}\,\overline{x}$$

and

$$\overline{xy} - \hat{a}\,\overline{x^2} - \hat{b}\,\overline{x} = 0\,,$$

with the solution

$$\hat{a} = \frac{\overline{xy} - \overline{x}\,\overline{y}}{\overline{x^2} - \overline{x}^2}\,, \qquad \hat{b} = \frac{\overline{x^2}\,\overline{y} - \overline{x}\,\overline{xy}}{\overline{x^2} - \overline{x}^2}\,.$$
The problem is simplified when we put the origin of the abscissa at the center of gravity $\overline{x}$:

$$x' = x - \overline{x}\,,$$

$$\hat{a}' = \frac{\overline{x'y}}{\overline{x'^2}}\,, \qquad \hat{b}' = \overline{y}\,.$$
Now the equation of the straight line reads

$$y = \hat{a}'(x - \overline{x}) + \hat{b}'\,. \qquad (7.9)$$
We gain an additional advantage: the errors of the estimated parameters are no longer correlated.
$$\delta^2(\hat{a}') = 1\Big/\sum_i \frac{x_i'^2}{\delta_i^2}\,, \qquad \delta^2(\hat{b}') = 1\Big/\sum_i \frac{1}{\delta_i^2}\,.$$
We recommend always using the form (7.9) instead of (7.8).
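
To illustrate the procedure, the following minimal Python sketch (not part of the original text; names and data are invented) implements the fit in the form (7.9), using the weighted averages defined above:

```python
import numpy as np

def fit_line(x, y, dy):
    """Weighted least squares fit of a straight line, returned in the
    uncorrelated form (7.9): y = a_hat*(x - xbar) + b_hat."""
    w = 1.0 / dy**2                        # weights 1/delta_i^2
    sw = w.sum()
    xbar = (w * x).sum() / sw              # weighted mean of x
    xp = x - xbar                          # shift origin to center of gravity
    a_hat = (w * xp * y).sum() / (w * xp**2).sum()
    b_hat = (w * y).sum() / sw             # b_hat' = weighted mean of y
    da = 1.0 / np.sqrt((w * xp**2).sum())  # delta(a_hat')
    db = 1.0 / np.sqrt(sw)                 # delta(b_hat')
    return a_hat, b_hat, xbar, da, db

# usage with invented data points
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])
dy = np.array([0.2, 0.2, 0.3, 0.3])
a, b, x0, da, db = fit_line(x, y, dy)
print(f"y = {a:.3f}*(x - {x0:.3f}) + {b:.3f}  (+-{da:.3f}, +-{db:.3f})")
```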
7.2.3 Linear Regression
If the prediction depends only linearly on the parameters, we can compute the parameters which minimize χ² analytically. We put
$$y_t(\theta) = a + T\theta\,. \qquad (7.10)$$
Here θ is the P-dimensional parameter vector, a is a given N-dimensional vector, and y_t is the N-dimensional vector of predictions. T, also called the design matrix, is a rectangular matrix of given elements with P columns and N rows.
The straight line fit discussed in Example 114 is a special case of (7.10) with
$$y_t = \theta_1 x + \theta_2\,, \quad a = 0\,, \quad\text{and}\quad T = \begin{pmatrix} x_1 & \cdots & x_N \\ 1 & \cdots & 1 \end{pmatrix}^{\!T}.$$
We have to find the minimum of

$$\chi^2 = (y - a - T\theta)^T\, V\, (y - a - T\theta)\,,$$

where, as usual, V is the weight matrix, the inverse of the covariance matrix: V = C⁻¹. In our case it is a diagonal N × N matrix with elements 1/δ_i². To simplify the notation we transform the observations,

$$y' = y - a\,,$$

derive

$$\chi^2 = (y' - T\theta)^T\, V\, (y' - T\theta)$$

with respect to the parameters θ and set the derivatives equal to zero:
$$\frac{1}{2}\,\frac{\partial \chi^2}{\partial \theta}\bigg|_{\hat\theta} = 0 = -T^T V (y' - T\hat\theta)\,. \qquad (7.11)$$
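
Solving (7.11) for θ̂ yields the normal equations TᵀVT θ̂ = TᵀV y′, i.e. θ̂ = (TᵀVT)⁻¹TᵀV y′. A minimal numerical sketch of this solution (names and data invented; uncorrelated errors assumed so that V is diagonal):

```python
import numpy as np

# Solve the normal equations from (7.11):
#   T^T V (y' - T theta) = 0  =>  theta_hat = (T^T V T)^{-1} T^T V y'.
xs = np.array([1.0, 2.0, 3.0, 4.0])   # abscissa values
yp = np.array([2.1, 3.9, 6.2, 7.8])   # transformed observations y' = y - a
dy = np.array([0.2, 0.2, 0.3, 0.3])   # uncorrelated errors

T = np.column_stack([xs, np.ones_like(xs)])  # design matrix, P = 2 columns
V = np.diag(1.0 / dy**2)                     # weight matrix C^{-1}

lhs = T.T @ V @ T                            # P x P matrix
theta_hat = np.linalg.solve(lhs, T.T @ V @ yp)
cov = np.linalg.inv(lhs)                     # covariance matrix of theta_hat
print(theta_hat, np.sqrt(np.diag(cov)))
```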
Table 7.1. Virtues and caveats of different methods of parameter estimation.

|                         | moments                | χ²            | max. likelihood  |
| simplicity              | ++                     | +             | −                |
| precision               | −                      | +             | ++               |
| individual observations | +                      | −             | +                |
| measured points         | −                      | +             | −                |
| histograms              | +                      | +             | +                |
| upper and lower limits  | −                      | −             | +                |
| external constraints    | −                      | +             | +                |
| background included     | +                      | +             | −                |
| error assignment        | from error propagation | χ²_min + 1    | ln L_max − 0.5   |
| requirement             | full p.d.f.            | only variance | full p.d.f.      |
In the parabolic approximation, the weight matrix V of the parameter estimate is given by the curvature of the log-likelihood at its maximum,

$$V_{ij} = -\,\frac{\partial^2 \ln L}{\partial \theta_i \partial \theta_j}\bigg|_{\hat\theta}\,,$$

and the covariance or error matrix from its inverse, C = V⁻¹.
If we are interested only in part of the parameters, we can eliminate the remaining nuisance parameters by simply discarding the rows and columns of the error matrix that contain the corresponding elements. This is a consequence of the considerations of Sect. 6.9.
In most cases the likelihood function is not known analytically. Usually, we have a computer program which delivers the likelihood function for arbitrary values of the parameters. Once we have determined the maximum, we can estimate the second derivatives and the weight matrix V by computing the likelihood function at parameter points close to the MLE. To check that the parabolic approximation is valid, we should increase the distance of the points and verify that the result remains consistent.
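
A possible sketch of this numerical procedure for a single parameter; the function log_l, the toy decay-time sample and the step sizes below are invented for illustration:

```python
import numpy as np

def curvature_error(log_l, theta_hat, h):
    """Estimate V = -d^2 ln L / d theta^2 at the MLE by a symmetric
    finite difference and return the standard error 1/sqrt(V)."""
    d2 = (log_l(theta_hat + h) - 2.0 * log_l(theta_hat)
          + log_l(theta_hat - h)) / h**2
    return 1.0 / np.sqrt(-d2)

# toy example: exponential decay times, the MLE is the sample mean
t = np.array([0.2, 0.7, 0.4, 1.1, 0.3, 0.6, 0.5, 0.9, 0.2, 0.1])
log_l = lambda tau: -len(t) * np.log(tau) - t.sum() / tau
tau_hat = t.mean()

# vary the step and check that the parabolic approximation holds
for h in (0.01, 0.02, 0.04):
    print(h, curvature_error(log_l, tau_hat, h))
```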
In the literature we frequently find statements like “The measurement excludes the theoretical prediction by four standard deviations.” Such statements have to be interpreted with caution. Their validity relies on the assumption that the log-likelihood is parabolic over a very wide parameter range; neglecting the tails can lead to completely wrong conclusions. We also have to remember that for a given number of standard deviations the probability content decreases with the number of dimensions (see Tab. 4.2 in Sect. 4.5).
In the following section we address more problematic situations, which usually occur with small data samples where the asymptotic solutions are not appropriate. Fortunately, they are rather the exception. We keep in mind that a relatively rough estimate of the error is often sufficient, so that approximate methods are justified in most cases.
8.2.2 General Situation
As above, we again use the likelihood ratio to define the error limits, which are now usually asymmetric. In the one-dimensional case the two errors δ− and δ+ satisfy
$$\ln L(\hat\theta) - \ln L(\hat\theta - \delta_-) = \ln L(\hat\theta) - \ln L(\hat\theta + \delta_+) = 1/2\,. \qquad (8.2)$$
If the log-likelihood function deviates considerably from a parabola, it makes sense to supplement the one standard deviation limits (Δ ln L = −1/2) with the two standard deviation limits (Δ ln L = −2) to document the shape of the likelihood function better. This complication can be avoided if an approximately parabolic likelihood function can be obtained by a suitable parameter transformation. In some situations it is useful to document, in addition to the mode of the likelihood function and the asymmetric errors, also the mean and the standard deviation, which are relevant, for instance, in some cases of error propagation which we will discuss below.
Example 115. Error of a lifetime measurement
To determine the mean lifetime τ of a particle from a sample of observed decay times, we use the likelihood function
$$L_\tau = \prod_{i=1}^{N} \frac{1}{\tau}\, e^{-t_i/\tau} = \frac{1}{\tau^N}\, e^{-N\overline{t}/\tau}\,. \qquad (8.3)$$
The corresponding likelihood for the decay rate is
$$L_\lambda = \prod_{i=1}^{N} \lambda\, e^{-\lambda t_i} = \lambda^N e^{-N\overline{t}\lambda}\,.$$
The values of the functions are equal at equivalent values of the two parameters τ and λ, i.e. for λ = 1/τ:

$$L_\lambda(\lambda) = L_\tau(\tau)\,.$$
Fig. 8.1 shows the two log-likelihoods for a small sample of ten events with mean value t̄ = 0.5. The lower curves, for the parameter τ, are strongly asymmetric. This is also visible in the limits for changes of the log-likelihood by 0.5 or 2 units, which are indicated in the cut-outs on the right hand side. The likelihood with the decay rate as parameter (upper plots) is much more symmetric than that of the mean life. This means that the decay rate is the more appropriate parameter to document the shape of the likelihood function, to average different measurements and to perform error propagation, see below. On the other hand, we can of course transform the maximum likelihood estimates and errors of the two parameters into each other without knowing the likelihood function itself.
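
The following sketch (an invented ten-event sample with mean t̄ = 0.5, mimicking the situation of the figure) scans both log-likelihoods and reads off the asymmetric errors from the points where ln L has dropped by 1/2 from its maximum:

```python
import numpy as np

t = np.array([0.1, 0.2, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.9, 1.1])  # mean 0.5
N, tbar = len(t), t.mean()

def asym_errors(log_l, grid):
    """Scan ln L on a grid; return the MLE and the asymmetric errors
    delta-, delta+ where ln L has dropped by 1/2 (assumes unimodality)."""
    ll = log_l(grid)
    imax = ll.argmax()
    inside = ll >= ll[imax] - 0.5
    lo, hi = grid[inside][0], grid[inside][-1]
    return grid[imax], grid[imax] - lo, hi - grid[imax]

ll_tau = lambda tau: -N * np.log(tau) - N * tbar / tau   # from (8.3)
ll_lam = lambda lam: N * np.log(lam) - N * tbar * lam

print("tau:   ", asym_errors(ll_tau, np.linspace(0.2, 1.5, 2001)))
print("lambda:", asym_errors(ll_lam, np.linspace(0.5, 5.0, 2001)))
```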
Generally, it does not matter whether we use one or the other parameter to present the result, but for further applications it is always simpler and more precise to work with approximately symmetric limits. For this reason, usually 1/p (p being the absolute value of the momentum) instead of p is used as the parameter when charged particle trajectories are fitted to the measured hits in a magnetic spectrometer.
In the general case we satisfy conditions 4 to 7 of our wish list, but the first three are only approximately valid. We can neither associate an exact probability content to the intervals, nor do the limits correspond to moments of a p.d.f.
8.3 Error Propagation
In many situations we have to evaluate a quantity which depends on one or several measurements with individual uncertainties. We thus have a problem of point estimation and of interval estimation. We look for the parameter which is best supported by the different measurements and for its uncertainty. Ideally, we are able to construct the likelihood function. In most cases this is not necessary and approximate procedures are adequate.
8.3.1 Averaging Measurements
In Chap. 4 we have shown that the mean of measurements with Gaussian errors δ_i which are independent of the measured values is given by the weighted sum of the individual measurements (4.7), with weights proportional to the inverse errors squared, 1/δ_i².
[Fig. 8.1. Likelihood functions for the parameters decay rate (top) and lifetime (bottom). The standard deviation limits are shown in the cut-outs on the right hand side.]
In case the errors are correlated with the measurements, which occurs frequently with small event numbers, this procedure introduces a bias (see Example 56 in Chap. 4). From (6.6) we conclude that the exact method is to add the log-likelihoods of the individual measurements. Adding the log-likelihoods is equivalent to combining the raw data as if they were obtained in a single experiment. There is no loss of information and the method is not restricted to specific error conditions.
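
A minimal sketch of this weighted average, with invented measurement values:

```python
import numpy as np

# weighted mean of independent measurements x_i +- d_i with Gaussian,
# measurement-independent errors; weights proportional to 1/d_i^2
x = np.array([1.12, 1.05, 1.21])
d = np.array([0.05, 0.08, 0.10])

w = 1.0 / d**2
mean = (w * x).sum() / w.sum()
err = 1.0 / np.sqrt(w.sum())
print(f"{mean:.3f} +- {err:.3f}")
```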
Example 116. Averaging lifetime measurements
N experiments quote lifetimes τ̂_i ± δ_i of the same unstable particle. The estimates and their errors are computed from the individual measurements t_ij of the i-th experiment according to

$$\hat\tau_i = \sum_{j=1}^{n_i} t_{ij}/n_i\,, \quad\text{respectively}\quad \delta_i = \hat\tau_i/\sqrt{n_i}\,,$$

where n_i is the number of observed decays. We can reconstruct the individual log-likelihood functions and their sum ln L, with n = Σ_{i=1}^{N} n_i the overall event number:

$$\ln L(\tau) = \sum_{i=1}^{N}\left(-n_i \ln\tau - \frac{n_i \hat\tau_i}{\tau}\right) = -n\ln\tau - \frac{1}{\tau}\sum_{i=1}^{N} n_i\hat\tau_i\,.$$
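
Assuming, as stated, that each experiment's log-likelihood is determined by n_i and τ̂_i alone, here is a small sketch of the combination (invented numbers); the summed log-likelihood peaks at the event-weighted mean of the τ̂_i:

```python
import numpy as np

# quoted results of N experiments: lifetime tau_i from n_i observed decays
tau_hat = np.array([1.30, 1.10, 1.25])
n = np.array([50, 120, 80])

# sum of the individual exponential log-likelihoods,
# ln L_i(tau) = -n_i*ln(tau) - n_i*tau_hat_i/tau
def log_l(tau):
    return np.sum(-n * np.log(tau) - n * tau_hat / tau)

# the summed likelihood is maximal at the event-weighted mean
tau_comb = (n * tau_hat).sum() / n.sum()
err = tau_comb / np.sqrt(n.sum())          # delta = tau/sqrt(n), as above
print(f"combined: {tau_comb:.3f} +- {err:.3f}")
```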