Inclusion of Constraints 167

Example 92. Example 90 continued











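The modified χ2 expression is

$$\chi^2 = \sum_{i}\left(\frac{r_i-\rho_i}{\delta_{r_i}}\right)^2 + \sum_{i,j=1}^{3}(p_{ai}-\pi_{ai})V_{aij}(p_{aj}-\pi_{aj}) + \sum_{i,j=1}^{3}(p_{bi}-\pi_{bi})V_{bij}(p_{bj}-\pi_{bj}) + \sum_{i=1}^{3}\left(\frac{\pi_i}{\delta_\pi}\right)^2 + \left(\frac{\varepsilon}{\delta_\varepsilon}\right)^2$$

with tiny values for δπ and δε.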

This direct inclusion of the constraint through a penalty term in the fit is technically very simple and efficient.

As demonstrated in example 88, the term "sufficiently small" means that the uncertainty δhk of a constraint k, as derived from the fitted values of the parameters and their errors, is large compared to δk:

$$(\delta h_k)^2 = \sum_{i,j}\frac{\partial h_k}{\partial\theta_i}\frac{\partial h_k}{\partial\theta_j}\,\delta\theta_i\,\delta\theta_j \gg \delta_k^2 . \qquad (6.24)$$

The quantities δθi in (6.24) are not known precisely before the fit is performed but can be estimated sufficiently well beforehand. The precise choice of the constraint precision δk is not at all critical: variations by many orders of magnitude make no difference, but too small a value of δk could lead to numerical problems.
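The insensitivity to the choice of δk can be checked on the rope example (two measured lengths l1, l2 with common uncertainty δ and known total length L). The following is a minimal sketch, not the authors' code: it assumes the penalty form χ2 = (l1 − λ1)²/δ² + (l2 − λ2)²/δ² + (λ1 + λ2 − L)²/δk² and solves the two normal equations in closed form. The function name `penalty_fit` and the numbers are illustrative assumptions.

```python
def penalty_fit(l1, l2, L, delta, delta_k):
    """Minimise the penalised chi-square of the rope example in closed form.

    Assumed chi2: ((l1-lam1)/delta)^2 + ((l2-lam2)/delta)^2
                  + ((lam1+lam2-L)/delta_k)^2
    """
    w = 1.0 / delta**2      # weight of each length measurement
    wk = 1.0 / delta_k**2   # weight of the constraint penalty
    # Solution of the 2x2 normal equations d(chi2)/d(lam1) = d(chi2)/d(lam2) = 0
    lam1 = (w * l1 + wk * (L + l1 - l2)) / (w + 2.0 * wk)
    lam2 = (w * l2 + wk * (L + l2 - l1)) / (w + 2.0 * wk)
    return lam1, lam2


# Illustrative (made-up) measurements: the result barely changes with delta_k.
for delta_k in (1e-2, 1e-3, 1e-6):
    print(delta_k, penalty_fit(3.1, 2.2, 5.0, 0.1, delta_k))
```

As δk shrinks, the fitted values approach the exactly constrained solution λ1 = (L + l1 − l2)/2, λ2 = (L + l2 − l1)/2. A general-purpose minimizer may, however, run into rounding problems when the penalty weight 1/δk² exceeds the measurement weights by too many orders of magnitude.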

6.6.4 The Method of Lagrange Multipliers

This time we choose the likelihood presentation of the problem. The likelihood function is extended to

$$\ln L = \sum_{i=1}^{N}\ln f(x_i|\theta) + \sum_{k}\alpha_k h_k(\theta) . \qquad (6.25)$$

We have appended an expression that in the end should be equal to zero: the constraint functions multiplied by the so-called Lagrange multipliers. The MLE as obtained by setting ∂ ln L/∂θj = 0 yields parameters that depend on the Lagrange multipliers α. We can now use the free parameters αk to fulfil the constraints, or in other words, we use the constraints to eliminate the Lagrange multiplier dependence of the MLE.

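Example 93. Example 88 continued

Our full likelihood function is now

$$\ln L = -\frac{(l_1-\lambda_1)^2}{2\delta^2} - \frac{(l_2-\lambda_2)^2}{2\delta^2} + \alpha(\lambda_1+\lambda_2-L)$$

with the MLE $\hat\lambda_{1,2} = l_{1,2} + \delta^2\alpha$. Using $\hat\lambda_1 + \hat\lambda_2 = L$ we find $\delta^2\alpha = (L - l_1 - l_2)/2$ and, as before, $\hat\lambda_1 = (L + l_1 - l_2)/2$, $\hat\lambda_2 = (L + l_2 - l_1)/2$.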





168 6 Parameter Inference I

Of course, the general situation is much more complicated than that of our trivial example. An analytic solution will hardly be possible. Instead we can set the derivative of the log-likelihood not only with respect to the parameters θ but also with respect to the multipliers αk equal to zero, ∂ ln L/∂αk = 0, which automatically implies, see (6.25), that the constraints are satisfied. Unfortunately, the zero of the derivative corresponds to a saddle point and cannot be found by a maximum searching routine. More subtle numerical methods have to be applied.
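The saddle-point character of the solution can be made explicit for the rope example. The sketch below assumes the sign convention ln L = −(l1 − λ1)²/(2δ²) − (l2 − λ2)²/(2δ²) + α(λ1 + λ2 − L); the function names and the numerical values are illustrative assumptions. It evaluates the gradient at the stationary point and shows that the quadratic form of the Hessian takes both signs.

```python
def grad(lam1, lam2, alpha, l1, l2, L, delta):
    """Gradient of ln L with respect to (lam1, lam2, alpha)."""
    d1 = (l1 - lam1) / delta**2 + alpha
    d2 = (l2 - lam2) / delta**2 + alpha
    da = lam1 + lam2 - L          # vanishes when the constraint is satisfied
    return d1, d2, da


def stationary_point(l1, l2, L, delta):
    """Analytic zero of the gradient for the rope example."""
    alpha = (L - l1 - l2) / (2.0 * delta**2)
    return l1 + delta**2 * alpha, l2 + delta**2 * alpha, alpha


def quad_form(v, delta):
    """v^T H v, where H is the (constant) Hessian of ln L."""
    a, b, c = v
    return -(a * a + b * b) / delta**2 + 2.0 * c * (a + b)


lam1, lam2, alpha = stationary_point(3.1, 2.2, 5.0, 0.1)
print(lam1, lam2, alpha, grad(lam1, lam2, alpha, 3.1, 2.2, 5.0, 0.1))
# Negative along lam1, positive along a mixed direction: a saddle point.
print(quad_form((1.0, 0.0, 0.0), 0.1), quad_form((0.01, 0.01, 1.0), 0.1))
```

Because the Hessian is indefinite, a routine that only climbs towards a maximum in (λ1, λ2, α) will not converge to this point, which is the difficulty described above.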

Most methods avoid this complication and limit themselves to linear regression models which require a linear dependence of the observations on the parameters and linear constraint relations. Non-linear problems are then solved iteratively. The solution then is obtained by a simple matrix calculus.

Linear regression will be sketched in Sect. 7.2.3 and the inclusion of constraints in Appendix 13.10.

6.6.5 Conclusion

By far the simplest method is the one where the constraint is directly included and approximated by a narrow Gaussian. With conventional minimizing programs the full error matrix is produced automatically.

The approach using a reduced parameter set is especially interesting when we are primarily interested in the parameters of the reduced set. Due to the reduced dimension of the parameter space, it is faster than the other methods. The determination of the errors of the original parameters through error propagation is sometimes tedious.

It is recommended to either eliminate redundant parameters or to use the simple method where we represent constraints by narrow Gaussians. The application of Lagrange multipliers is unnecessarily complicated and the linear approximation requires additional assumptions and iterations.

6.7 Reduction of the Number of Variates

6.7.1 The Problem

A statistical analysis of a univariate sample is obviously much simpler than that of a multidimensional one. This is true not only for the qualitative comparison of a sample with a parameter dependent p.d.f. but also for the quantitative parameter inference. Especially when the p.d.f. is distorted by the measurement process and a Monte Carlo simulation is required, the direct ML method cannot be applied, as we have seen above. The parameter inference then proceeds by comparing histograms, with the problem that in multidimensional spaces the number of entries can be quite small in some bins. Therefore, we are interested in reducing the dimensionality of the variable space by appropriate transformations, if possible without loss of information. However, it is not always easy to find out which variable or which variable combination is especially important for the parameter estimation.

6.7.2 Two Variables and a Single Linear Parameter

A p.d.f. f(x, y|θ) of two variates with a linear parameter dependence can always be written in the form

f(x, y|θ) = v(x, y)[1 + u(x, y)θ] .

The log-likelihood for observations u = u(x, y), v = v(x, y),

ln L(θ) = ln v + ln(1 + uθ) ,

is essentially a function of only one significant variate u(x, y) because ln v does not depend on θ and can be omitted. The MLE of θ for a given sample {(x1, y1), . . . , (xN , yN )} with

$$\ln L(\theta) = \sum_{i=1}^{N}\ln(1+u_i\theta) , \qquad (6.26)$$

and ui = u(xi, yi) depends only on the observations {u1, . . . , uN }. The analysis can be based on the individual quantities or a histogram.

The simple form of the relation (6.26) suggests that the analytic form g(u|θ) of the p.d.f. of u is not needed for the parameter inference. Only the experimental observations ui enter into the likelihood function.
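As an illustration, the MLE of θ can be obtained from the values ui alone. The sketch below assumes |ui| ≤ 1, so that 1 + uiθ stays positive for |θ| < 1, and finds the zero of the derivative of the log-likelihood Σ ln(1 + uiθ) by bisection. The function names and the search interval are assumptions made for this example.

```python
import math


def log_likelihood(theta, u):
    """ln L(theta) = sum_i ln(1 + u_i * theta)."""
    return sum(math.log(1.0 + ui * theta) for ui in u)


def score(theta, u):
    """Derivative of ln L with respect to theta."""
    return sum(ui / (1.0 + ui * theta) for ui in u)


def mle_theta(u, lo=-0.99, hi=0.99, tol=1e-10):
    """Bisection on the score, which decreases monotonically in theta.

    Assumes |u_i| <= 1 so that 1 + u_i * theta > 0 on [lo, hi].
    """
    if score(lo, u) <= 0.0:
        return lo            # maximum at the lower edge of the interval
    if score(hi, u) >= 0.0:
        return hi            # maximum at the upper edge of the interval
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid, u) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


u_values = [0.8, -0.3, 0.5, 0.9, -0.6, 0.2, 0.7, -0.1]  # made-up observations
print(mle_theta(u_values))
```

Bisection is used here only because the score is monotonic; any one-dimensional root finder or maximizer would serve equally well.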

Unfortunately this nice property is lost when acceptance and resolution effects are present, and this is usually the case. In this situation, the linearity in θ is lost because we are forced to renormalize the p.d.f. Nevertheless we gain by the reduction to one variate u. If the detector effects are not too large, the distribution of u still contains almost the complete information relative to θ.

The analytic variable transformation and reduction is possible only in rare cases, but it is not necessary because it is performed implicitly by the Monte Carlo simulation. We generate according to f(x, y|θ) and for each observation xi, yi we calculate the corresponding quantity ui = u(xi, yi). The parameter θ is determined by a comparison of the experimental sample with the Monte Carlo distribution of u by means of a likelihood or a χ2 fit. The distribution of u also lends itself to a goodness-of-fit test (see Chap. 10).

6.7.3 Generalization to Several Variables and Parameters

The generalization to N variates which we combine to a vector x is trivial:

f(x|θ) = v(x) [1 + u(x)θ] .

Again we can reduce the variate space to a single significant variate u without losing relevant information. If simultaneously P parameters have to be determined, we usually will need also P new variates u1, . . . , uP :

$$f(x|\theta) = v(x)\left[1 + \sum_{p=1}^{P}u_p(x)\theta_p\right] .$$


Our procedure thus makes sense only if the number of parameters is smaller than the dimension of the variate space.



Fig. 6.13. Simulated p.d.f.s of the reduced variable u for the values θ = −1 and θ = 1 of the parameter.

Example 94. Reduction of the variate space

We consider the p.d.f.

$$f(x,y,z|\theta) = \frac{1}{\pi}\left[(x^2+y^2+z^2)^{1/2} + (x+y^3)\theta\right] , \quad x^2+y^2+z^2 \le 1 , \qquad (6.27)$$

which depends on three variates and one parameter. For a given sample of observations in the three dimensional cartesian space we want to determine the parameter θ. The substitutions

$$u = \frac{x+y^3}{(x^2+y^2+z^2)^{1/2}} , \quad |u| \le 2 ,$$

$$v = (x^2+y^2+z^2)^{1/2} , \quad 0 \le v \le 1 ,$$

$$z = z$$

lead to the new p.d.f. g(u, v, z)

$$g(u,v,z|\theta) = \frac{v}{\pi}\,[1+u\theta]\left|\frac{\partial(x,y,z)}{\partial(u,v,z)}\right|$$

which after integrating out v and z yields the p.d.f. g(u|θ):

$$g(u|\theta) = \int dz\int dv\; g(u,v,z|\theta) .$$

This operation is not possible analytically but we do not need to compute g explicitly. We are able to determine the MLE and its error from the simple log-likelihood function of θ

$$\ln L(\theta) = \sum_{i}\ln(1+u_i\theta) .$$


In case we have to account for acceptance effects, we have to simulate the u distribution. For a Monte Carlo simulation of (6.27) we compute for each observation xi, yi, zi the value of ui and histogram it. The simulated histograms $g_+$ and $g_-$ of u for the two parameter values θ = ±1 are shown in Fig. 6.13. (The figure does not include experimental effects. This is irrelevant for the illustration of the method.) The superposition $t_i = (1-\theta)g_{-i} + (1+\theta)g_{+i}$ then has to be inserted into the likelihood function (6.15).

6.7.4 Non-linear Parameters

The example which we just investigated is especially simple because the p.d.f. depends linearly on a single parameter. Linear dependencies are quite frequent because distributions often consist of a superposition of several processes, and the interesting parameters are the relative weights of those.

For the general, non-linear case we restrict ourselves to a single parameter to simplify the notation. We expand the p.d.f. into a Taylor series at a first rough estimate θ0:

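$$f(x|\theta) = f(x|\theta_0) + \left.\frac{\partial f}{\partial\theta}\right|_{\theta_0}\Delta\theta + \frac{1}{2}\left.\frac{\partial^2 f}{\partial\theta^2}\right|_{\theta_0}\Delta\theta^2 + \cdots$$

$$= V\left[1 + u_1\Delta\theta + u_2\Delta\theta^2 + \cdots\right] .$$

As before, we choose the coefficients ui as new variates. Neglecting quadratic and higher terms, the estimate $\hat\theta = \theta_0 + \widehat{\Delta\theta}$ depends only on the new variate

$$u_1(x) = \frac{\left.\partial f(x|\theta)/\partial\theta\,\right|_{\theta_0}}{f(x|\theta_0)} ,$$

which is a simple function of x.

If the linear approximation is insufficient, a second variate u2 should be added. Alternatively, the solution can be iterated. The generalization to several parameters is straightforward.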

A more detailed description of the method with application to a physics process can be found in Refs. [30, 31]. The corresponding choice of the variate is also known under the name optimal variable method [42].
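The variate u1 can be evaluated numerically for any p.d.f. that is available as a function. The following minimal sketch approximates the derivative with respect to the parameter by a central difference. The helper name `optimal_variable` and the exponential test p.d.f. are assumptions chosen for illustration; for f(t|γ) = γ exp(−γt) the exact result is u1 = 1/γ0 − t.

```python
import math


def optimal_variable(pdf, x, theta0, h=1e-5):
    """u1(x) = (df/dtheta at theta0) / f(x|theta0), by central difference."""
    dfdtheta = (pdf(x, theta0 + h) - pdf(x, theta0 - h)) / (2.0 * h)
    return dfdtheta / pdf(x, theta0)


def exp_pdf(t, gamma):
    """Exponential decay-time distribution, used here only as a test case."""
    return gamma * math.exp(-gamma * t)


# Compare the numerical value with the analytic expression 1/gamma0 - t.
for t in (0.5, 1.0, 2.0):
    print(t, optimal_variable(exp_pdf, t, 0.5), 1.0 / 0.5 - t)
```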

6.8 Method of Approximated Likelihood Estimator

As in the previous section we investigate the situation where we have to estimate parameters in the presence of acceptance and resolution effects. The idea of the method is the following: We try to find a statistic $\hat\theta'$ of the distorted data sample which summarizes the information relative to the parameter of interest. Then we perform a Monte Carlo simulation to infer the relation $\theta(\hat\theta')$ between the parameter of interest θ and the observed quantity $\hat\theta'$. Ideally, we can find an approximately sufficient statistic⁹. If we are not successful, we can use the likelihood estimate which we obtain when we insert the data into the undistorted p.d.f. In both cases we find the relation between the experimental statistic and the estimate of the parameter by a Monte Carlo simulation. The method should become clear in the following example.

⁹ A sufficient statistic is a function of the sample and replaces it for the parameter estimation without loss in precision. We will define sufficiency in Chap. 7.

Fig. 6.14. Observed lifetime distribution. The insert indicates the transformation of the observed lifetime to the corrected one.

Example 95. Approximated likelihood estimator: Lifetime fit from a distorted distribution

The sample mean $\bar t$ of a sample of N undistorted exponentially distributed lifetimes ti is a sufficient estimator: It contains the full information related to the parameter τ, the mean lifetime (see Sect. 7.1.1). In case the distribution is distorted by resolution and acceptance effects (Fig. 6.14), the mean value

$$\bar t' = \sum_i t'_i/N$$

of the distorted sample $t'_i$ will usually still contain almost the full information relative to the mean life τ. The relation $\tau(\bar t')$ between τ and its approximation $\bar t'$ (see insert of Fig. 6.14) is generated by a Monte Carlo simulation. The uncertainty δτ is obtained by error propagation from the uncertainty $\delta\bar t'$ of $\bar t'$,

$$(\delta\bar t')^2 = \frac{\overline{t'^2} - \bar t'^2}{N-1} \quad\text{with}\quad \overline{t'^2} = \frac{1}{N}\sum_i t_i'^2 ,$$

using the Monte Carlo relation $\tau(\bar t')$.
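The procedure of Example 95 can be sketched in a few lines. The detector model below (Gaussian smearing with width `sigma` and an acceptance window `t_min` < t′ < `t_max`) is a hypothetical stand-in for the real distortions, and all names and numbers are assumptions. The code simulates the mean of the distorted lifetimes for several values of τ, inverts the relation by linear interpolation, and computes the mean and its uncertainty for a sample.

```python
import math
import random


def simulate_mean(tau, n, rng, sigma=0.2, t_min=0.1, t_max=5.0):
    """Mean of n accepted, smeared lifetimes (hypothetical detector model)."""
    total, accepted = 0.0, 0
    while accepted < n:
        t_obs = rng.expovariate(1.0 / tau) + rng.gauss(0.0, sigma)
        if t_min < t_obs < t_max:       # acceptance cut on the observed time
            total += t_obs
            accepted += 1
    return total / n


def calibration_curve(taus, n, seed=1):
    """List of (simulated distorted mean, true tau) pairs."""
    rng = random.Random(seed)
    return [(simulate_mean(tau, n, rng), tau) for tau in taus]


def tau_from_mean(mean_obs, curve):
    """Invert the Monte Carlo relation tau(mean) by linear interpolation."""
    pts = sorted(curve)
    for (m0, t0), (m1, t1) in zip(pts, pts[1:]):
        if m0 <= mean_obs <= m1:
            return t0 + (t1 - t0) * (mean_obs - m0) / (m1 - m0)
    raise ValueError("observed mean outside the simulated range")


def mean_and_error(ts):
    """Sample mean and its uncertainty sqrt((<t^2> - <t>^2) / (N - 1))."""
    n = len(ts)
    m = sum(ts) / n
    m2 = sum(t * t for t in ts) / n
    return m, math.sqrt((m2 - m * m) / (n - 1))


curve = calibration_curve([0.6, 0.8, 1.0, 1.2, 1.4], 20000)
print(curve)
```

The uncertainty of τ then follows from the uncertainty of the mean multiplied by the local slope of the interpolated curve.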

This approach has several advantages:

• We do not need to histogram the observations.

• Problems due to small event numbers for bins in a multivariate space are avoided.

• It is robust, simple and requires little computing time.

For these reasons the method is especially suited for online applications, provided that we find an efficient estimator.

If the distortions are not too large, we can use the likelihood estimator $\hat\lambda'$ extracted from the observed sample $\{x'_1, \ldots, x'_N\}$ and the undistorted distribution f(x|λ):

$$L(\lambda) = \prod_i f(x'_i|\lambda) ,$$

$$\left.\frac{dL}{d\lambda}\right|_{\hat\lambda'} = 0 . \qquad (6.29)$$

This means concretely that we perform the usual likelihood analysis where we ignore the distortion. We obtain $\hat\lambda'$. Then we correct the bias by a Monte Carlo simulation which provides the relation $\hat\lambda(\hat\lambda')$.

It may happen in rare cases where the experimental resolution is very bad that f(x|λ) is undefined for some extremely distorted observations. This problem can be cured by scaling $\hat\lambda'$ or by eliminating particular observations.


Acceptance losses α(x) alone without resolution effects do not necessarily entail a reduction in the precision of our approach. For example, as has been shown in Sect. 6.5.2, when an exponential distribution is cut at some maximum value of the variate, the mean value of the observations is still a sufficient statistic. But there are cases where sizable acceptance losses have the consequence that our method deteriorates. In these cases we have to take the losses into account. We only sketch a suitable method. The acceptance corrected p.d.f. f′(x|λ) for the variate x is

$$f'(x|\lambda) = \frac{\alpha(x)f(x|\lambda)}{\int\alpha(x)f(x|\lambda)\,dx} ,$$

where the denominator is the global acceptance and provides the correct normalization. We abbreviate it by A(λ). The log-likelihood of N observations is

$$\ln L(\lambda) = \sum_i\ln\alpha(x_i) + \sum_i\ln f(x_i|\lambda) - N\ln A(\lambda) .$$

The first term can be omitted. The acceptance A(λ) can be determined by a Monte Carlo simulation. Again a rough estimation is sufficient: at most it reduces the precision but does not introduce a bias, since all approximations are automatically corrected with the transformation $\hat\lambda(\hat\lambda')$.

Frequently, the relation (6.29) can only be solved numerically, i.e. we find the maximum of the likelihood function in the usual manner. We are also allowed to approximate this relation such that an analytic solution is possible. The resulting error is compensated in the simulation.

Example 96. Approximated likelihood estimator: linear and quadratic distributions

A sample of events xi is distributed linearly inside the interval [−1, 1], i.e. the p.d.f. is f(x|b) = 0.5 + bx. The slope b, |b| < 1/2, is to be fitted. It is located in the vicinity of b0. We expand the likelihood function

$$\ln L = \sum_i\ln(0.5 + bx_i)$$

at b0 with

$$b = b_0 + \beta$$

and derive it with respect to β to find the value $\hat\beta$ at the maximum:

$$\sum_i\frac{x_i}{0.5 + (b_0+\hat\beta)x_i} = 0 .$$

Neglecting quadratic and higher order terms in $\hat\beta$ we can solve this equation for $\hat\beta$ and obtain

$$\hat\beta \approx \frac{\sum_i x_i/f_{0i}}{\sum_i x_i^2/f_{0i}^2} ,$$

where we have set f0i = f(xi|b0).

If we allow also for a quadratic term

$$f(x|a,b) = a + bx + (1.5 - 3a)x^2 ,$$

we write, in obvious notation,

$$f(x|a,b) = f_0 + \alpha(1-3x^2) + \beta x$$

and get, after deriving ln L with respect to α and β and linearizing, two linear equations for $\hat\alpha$ and $\hat\beta$:

$$\hat\alpha\sum_i A_i^2 + \hat\beta\sum_i A_iB_i = \sum_i A_i ,$$

$$\hat\alpha\sum_i A_iB_i + \hat\beta\sum_i B_i^2 = \sum_i B_i , \qquad (6.31)$$

with the abbreviations $A_i = (1-3x_i^2)/f_{0i}$, $B_i = x_i/f_{0i}$.

From the observed data using (6.31) we get $\hat\beta'(x')$, $\hat\alpha'(x')$, and the simulation provides the parameter estimates $\hat b(\hat\beta')$, $\hat a(\hat\alpha')$ and their uncertainties.

The calculation is much faster than a numerical minimum search and almost as precise. If $\hat\alpha$, $\hat\beta$ are large we have to iterate.
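For the linear distribution of Example 96, the linearized estimate of the slope correction is the ratio of two sums, Σ xi/f0i divided by Σ xi²/f0i² with f0i = 0.5 + b0 xi. The sketch below implements this ratio and also shows the iteration mentioned at the end of the example. The function name and the data values are made up for illustration, and the Monte Carlo bias correction is not included.

```python
def beta_hat(xs, b0):
    """First-order correction to b0 for f(x|b) = 0.5 + b*x on [-1, 1]."""
    f0 = [0.5 + b0 * x for x in xs]                  # p.d.f. values at b0
    num = sum(x / f for x, f in zip(xs, f0))
    den = sum((x / f) ** 2 for x, f in zip(xs, f0))
    return num / den


# Made-up observations inside [-1, 1].
xs = [-0.9, -0.5, -0.2, 0.1, 0.3, 0.4, 0.6, 0.7, 0.8, 0.95]

b = 0.0
for step in range(5):
    b += beta_hat(xs, b)     # iterate: re-expand around the updated slope
    print(step, b)
```

Each iteration is a Newton step on the derivative of ln L, so a single step is already close to the maximum when the starting value b0 is reasonable.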

6.9 Nuisance Parameters

Frequently a p.d.f. f(x|θ, ν) contains several parameters of which only some, namely θ, are of interest, whereas the other parameters ν are unwanted but influence the estimate of the former. Those are called nuisance parameters. A typical example is the following.

Example 97. Nuisance parameter: decay distribution with background

We want to infer the decay rate γ of a certain particle from the decay times ti of a sample of M events. Unfortunately, the sample contains an unknown amount of background. Let the decay rate γb of the background particles be known. The nuisance parameter is the number of background events N. For a fraction of background events of N/M, the p.d.f. for a single event with lifetime t is

$$f(t|\gamma,N) = \left(1-\frac{N}{M}\right)\gamma e^{-\gamma t} + \frac{N}{M}\,\gamma_b e^{-\gamma_b t} ,$$

from which we derive the likelihood for the sample:

$$L(\gamma,N) = \prod_{i=1}^{M}\left[\left(1-\frac{N}{M}\right)\gamma e^{-\gamma t_i} + \frac{N}{M}\,\gamma_b e^{-\gamma_b t_i}\right] .$$


A contour plot of the log-likelihood of a specific data sample of 20 events and γb = 0.2 is depicted in Fig. 6.15. The two parameters γ and N are correlated. The question is then: What do we learn about γ, what is a sensible point estimate of γ and how should we determine its uncertainty?
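A log-likelihood of this kind is straightforward to evaluate on a grid of (γ, N). The sketch below assumes the mixture of a signal exponential with rate γ and a background exponential with known rate γb, weighted by the background fraction N/M. The function name and the decay times are illustrative assumptions, not the sample behind Fig. 6.15.

```python
import math


def log_likelihood(gamma, n_bkg, times, gamma_b=0.2):
    """ln L(gamma, N) for a signal + background mixture of exponentials."""
    frac = n_bkg / len(times)              # background fraction N/M
    return sum(
        math.log((1.0 - frac) * gamma * math.exp(-gamma * t)
                 + frac * gamma_b * math.exp(-gamma_b * t))
        for t in times
    )


# Made-up decay times; scan a few values of gamma and N.
times = [0.2, 0.5, 0.9, 1.4, 2.3, 3.8, 6.0]
for n_bkg in (0, 2, 4):
    print(n_bkg, [round(log_likelihood(g, n_bkg, times), 3) for g in (0.5, 1.0, 2.0)])
```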

We will re-discuss this example in the next subsection and present in the following some approaches which allow us to eliminate the nuisance parameters. First we will investigate exact methods and then we will turn to the more problematic part where we have to apply approximations.

6.9.1 Nuisance Parameters with Given Prior

If we know the p.d.f. π(ν) of a nuisance parameter vector ν, the prior of ν, then we can eliminate ν simply by integrating it out, thereby weighting ν with its probability π(ν) to occur:

$$f_\theta(x|\theta) = \int f(x|\theta,\nu)\pi(\nu)\,d\nu .$$

In this way we obtain a p.d.f. depending solely on the parameters of interest θ. The corresponding likelihood function of θ is

$$L_\theta(\theta|x) = \int L(\theta,\nu|x)\pi(\nu)\,d\nu = \int f(x|\theta,\nu)\pi(\nu)\,d\nu . \qquad (6.32)$$

Example 98. Nuisance parameter: measurement of a Poisson rate with a digital clock

An automatic monitoring device measures a Poisson rate θ with a digital clock with a least count of Δ. For n observed reactions within a time interval ν the p.d.f. is given by the Poisson distribution P(n|θν). If we consider both the rate parameter θ and the length of the time interval ν as unknown parameters, the corresponding likelihood function is

$$L(\theta,\nu) = \frac{e^{-\theta\nu}[\theta\nu]^n}{n!} .$$

For a clock reading t0, the true measurement time is contained in the time interval t0 ± Δ/2. We can assume that all times ν within that interval are equally probable and thus the prior of ν is π(ν) = 1/Δ for ν in the interval [t0 − Δ/2 , t0 + Δ/2] and equal to zero elsewhere. We eliminate constant factors, and, integrating over ν,

$$L_\theta(\theta) = \int_{t_0-\Delta/2}^{t_0+\Delta/2} e^{-\theta\nu}[\theta\nu]^n\,d\nu ,$$

we get rid of the nuisance parameter. The integral can be evaluated numerically.
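One simple way to carry out this numerical integration is Simpson's rule. The sketch below integrates exp(−θν)(θν)^n over the interval [t0 − Δ/2, t0 + Δ/2], with the constant factors 1/n! and 1/Δ dropped as in the text. The function name, the number of steps and the numerical values are assumptions made for the example.

```python
import math


def marginal_likelihood(theta, n, t0, delta, steps=200):
    """Simpson integration of exp(-theta*nu) * (theta*nu)**n over the clock interval.

    `steps` must be even; constant factors 1/n! and 1/delta are omitted.
    """
    a, b = t0 - 0.5 * delta, t0 + 0.5 * delta
    h = (b - a) / steps

    def g(nu):
        return math.exp(-theta * nu) * (theta * nu) ** n

    s = g(a) + g(b)
    for k in range(1, steps):
        s += (4 if k % 2 else 2) * g(a + k * h)   # Simpson weights 4, 2, 4, ...
    return s * h / 3.0


# Illustrative scan of the rate for n = 12 counts, t0 = 10 and delta = 1.
for theta in (0.8, 1.0, 1.2, 1.4):
    print(theta, marginal_likelihood(theta, 12, 10.0, 1.0))
```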



Fig. 6.15. Log-likelihood contour as a function of decay rate and number of background events. For better visualization the discrete values of the event numbers are connected. (Contour labels: lnL = 0.5 and lnL = 2; axis label: number of background events.)

Let us resume the problem discussed in the introduction. We now assume that we have prior information on the amount of background: The background expectation had been determined in an independent experiment to be 10 with sufficient precision to neglect its uncertainty. The actual number of background events follows a Poisson distribution. The likelihood function is

$$L(\gamma) = \sum_{N=0}^{20}\frac{e^{-10}10^N}{N!}\prod_{i=1}^{20}\left[\left(1-\frac{N}{20}\right)\gamma e^{-\gamma t_i} + \frac{N}{20}\,0.2\,e^{-0.2t_i}\right] .$$

Since our nuisance parameter is discrete we have replaced the integration in (6.32) by a sum.
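The sum over the discrete nuisance parameter can be written down directly. The sketch below weights the likelihood for each possible number of background events N with its Poisson probability and adds the terms. It assumes that N runs from 0 to the sample size; the function name and the decay times are illustrative.

```python
import math


def poisson_marginal_likelihood(gamma, times, mean_bkg=10.0, gamma_b=0.2):
    """L(gamma) = sum over N of Poisson(N | mean_bkg) * L(gamma, N)."""
    m = len(times)
    total = 0.0
    for n_bkg in range(m + 1):
        # Poisson prior weight for n_bkg background events
        weight = math.exp(-mean_bkg) * mean_bkg ** n_bkg / math.factorial(n_bkg)
        frac = n_bkg / m
        prod = 1.0
        for t in times:
            prod *= ((1.0 - frac) * gamma * math.exp(-gamma * t)
                     + frac * gamma_b * math.exp(-gamma_b * t))
        total += weight * prod
    return total


# Twenty made-up decay times, scanned over a few values of gamma.
times = [0.1 * k + 0.05 for k in range(20)]
for gamma in (0.5, 1.0, 2.0):
    print(gamma, poisson_marginal_likelihood(gamma, times))
```

For larger samples the product may underflow. In that case the terms can be accumulated as logarithms and combined with a log-sum-exp step.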

6.9.2 Factorizing the Likelihood Function

The elimination of the nuisance parameter is very easy if the p.d.f. is of the form

$$f(x|\theta,\nu) = f_\theta(x|\theta)f_\nu(x|\nu) ,$$

i.e. only the first term fθ depends on θ. Then we can write the likelihood as a product

$$L(\theta,\nu) = L_\theta(\theta)L_\nu(\nu)$$

with

$$L_\theta = \prod_i f_\theta(x_i|\theta) ,$$

independent of the nuisance parameter ν.

Example 99. Elimination of a nuisance parameter by factorization of a two-dimensional normal distribution
