
8.3 Error Propagation


$$\ln L(\tau) = \sum_{i=1}^{N}\left[-n_i\left(\ln\tau + \hat\tau_i/\tau\right)\right] = -n\ln\tau - \frac{\sum_i n_i\hat\tau_i}{\tau}$$

with n = Σ n_i, the maximum at

$$\hat\tau = \frac{\sum_i n_i\hat\tau_i}{n}$$

and its error

$$\delta_\tau = \frac{\hat\tau}{\sqrt{n}} .$$

The individual measurements are weighted by their event numbers instead of weights proportional to 1/δ_i². As the errors are correlated with the measurements, the standard weighted mean (4.7) with weights proportional to 1/δ_i² would be biased. In our specific example the correlation of the errors and the parameter values is known, and we could use weights proportional to (τ̂_i/δ_i)².
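A short Monte Carlo check makes the bias visible. The sketch below (event numbers, lifetime and number of toy experiments are invented for illustration) compares the standard 1/δ_i² weighting with the event-number weighting derived above:

```python
import numpy as np

rng = np.random.default_rng(1)
tau_true = 1.0
n_events = np.array([5, 10, 20])   # events per experiment (invented)
n_toys = 50_000

naive, weighted_by_n = [], []
for _ in range(n_toys):
    # mean life estimate of each experiment and its error delta_i = tau_hat_i / sqrt(n_i)
    tau_hat = np.array([rng.exponential(tau_true, n).mean() for n in n_events])
    delta = tau_hat / np.sqrt(n_events)
    w = 1.0 / delta**2                                   # standard weights 1/delta_i^2
    naive.append(np.sum(w * tau_hat) / np.sum(w))
    weighted_by_n.append(np.sum(n_events * tau_hat) / np.sum(n_events))

print("1/delta^2 weights   :", np.mean(naive))           # biased towards low values
print("event-number weights:", np.mean(weighted_by_n))   # close to tau_true
```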

Example 117. Averaging ratios of Poisson distributed numbers

In absorption measurements and many other situations we are interested in a parameter which is the ratio of two numbers which follow the Poisson distribution. Averaging these ratios θ̂_i = m_i/n_i naively using the weighted mean (4.7) can lead to strongly biased results. Instead we add the log-likelihood functions which we have derived in Sect. 6.9.3,

$$\ln L = \sum_i\left[-m_i\ln(1 + 1/\theta) - n_i\ln(1 + \theta)\right] = -m\ln(1 + 1/\theta) - n\ln(1 + \theta)$$

with m = Σ m_i and n = Σ n_i. The MLE is θ̂ = m/n and the error limits have to be computed numerically in the usual way or, for not too small n, m, by linear error propagation, (δ_θ/θ̂)² = 1/n + 1/m.
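A minimal numerical sketch of this comparison, with invented counts m_i, n_i, might look as follows; it contrasts the naive weighted mean of the individual ratios with the combined MLE θ̂ = m/n:

```python
import numpy as np

# Hypothetical counts (m_i, n_i) for three ratio measurements
m = np.array([ 4,  9,  2])
n = np.array([10, 20,  5])

# Naive approach: weighted mean of the individual ratios theta_i = m_i/n_i
theta_i = m / n
delta_i = theta_i * np.sqrt(1/m + 1/n)        # linear error propagation per measurement
w = 1.0 / delta_i**2
theta_naive = np.sum(w * theta_i) / np.sum(w)

# Combined likelihood: theta_hat = sum(m)/sum(n), (delta/theta)^2 = 1/sum(m) + 1/sum(n)
theta_mle = m.sum() / n.sum()
delta_mle = theta_mle * np.sqrt(1/m.sum() + 1/n.sum())

print(f"naive weighted mean : {theta_naive:.3f}")
print(f"combined MLE        : {theta_mle:.3f} +- {delta_mle:.3f}")
```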

In the standard situation where we do not know the full likelihood function but only the MLE and the error limits, we have to be content with an approximate procedure. If the likelihood functions which have been used to extract the error limits are parabolic, then the standard weighted mean (4.7) is exactly equal to the result which we obtain when we add the log-likelihood functions and then extract the estimate and the error.

Proof: A sum of terms of the form (8.1) can be written in the following way:

$$\frac{1}{2}\sum_i V_i(\theta - \theta_i)^2 = \frac{1}{2}\tilde V(\theta - \tilde\theta)^2 + \text{const.}$$

Since the right hand side is the most general form of a polynomial of second order, a comparison of the coefficients of θ² and θ yields

$$\tilde V = \sum_i V_i , \qquad \tilde V\tilde\theta = \sum_i V_i\theta_i ,$$


that is just the weighted mean including its error. Consequently, we should aim at approximately parabolic log-likelihood functions when we present experimental results. Sometimes this is possible by a suitable choice of the parameter. For example, we are free to quote either the estimate of the mass or of the mass squared.

8.3.2 Approximating the Likelihood Function

We also need a method to average statistical data with asymmetric errors, a method which works without knowing the exact shape of the likelihood function. To this end we try to reconstruct the log-likelihood functions approximately, add them, and extract the parameter value that maximizes the sum, together with the likelihood ratio errors. The approximation has to satisfy the constraints that the derivative at the MLE is zero and that the error relation (8.2) holds.

The simplest parametrization uses two different parabola branches

$$-\ln L(\theta) = \frac{1}{2}\,\frac{(\theta - \hat\theta)^2}{\delta_\pm^2} \qquad (8.4)$$

with

$$\delta_\pm = \frac{1}{2}\,\delta_+\left[1 + \operatorname{sgn}(\theta - \hat\theta)\right] + \frac{1}{2}\,\delta_-\left[1 - \operatorname{sgn}(\theta - \hat\theta)\right] ,$$

i.e. the parabolas meet at the maximum and obey (8.2). Adding functions of this type produces again a piecewise parabolic function which fixes the mean value and its asymmetric errors. The solution for both the mean value and the limits is unique.
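As an illustration, the following sketch combines two hypothetical measurements with asymmetric errors by adding their two-parabola approximations (8.4) and reading off the minimum and the Δ(−ln L) = 1/2 interval on a grid; the numbers and the grid search are arbitrary choices, not part of the text:

```python
import numpy as np

def neg_log_l(theta, theta_hat, d_plus, d_minus):
    """Two-parabola approximation (8.4): width delta_+ above, delta_- below the MLE."""
    delta = np.where(theta >= theta_hat, d_plus, d_minus)
    return 0.5 * ((theta - theta_hat) / delta) ** 2

# Hypothetical measurements: value, upper error, lower error
measurements = [(1.20, 0.30, 0.15), (1.55, 0.20, 0.25)]

theta = np.linspace(0.5, 2.5, 20001)
total = sum(neg_log_l(theta, t, dp, dm) for t, dp, dm in measurements)
total -= total.min()                      # shift so the minimum of -ln L is zero

best = theta[np.argmin(total)]
inside = theta[total <= 0.5]              # Delta(-ln L) <= 1/2 interval
print(f"combined estimate: {best:.3f}  +{inside.max() - best:.3f} -{best - inside.min():.3f}")
```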

Parameterizations [43] varying the width σ of a parabola linearly or quadratically with the parameter are usually superior to the simple two branch approximation. We set

$$-\ln L(\theta) = \frac{1}{2}\left[\frac{\theta - \hat\theta}{\sigma(\theta)}\right]^2$$

with

$$\sigma(\theta) = \frac{2\,\delta_+\delta_-}{\delta_+ + \delta_-} + \frac{\delta_+ - \delta_-}{\delta_+ + \delta_-}\,(\theta - \hat\theta) \qquad (8.5)$$

or

$$\left(\sigma(\theta)\right)^2 = \delta_+\delta_- + (\delta_+ - \delta_-)(\theta - \hat\theta) , \qquad (8.6)$$

respectively. The log-likelihood function has poles at locations of θ where the width becomes zero, σ(θ) = 0. Thus our approximations are justified only in the range of θ which excludes the corresponding parameter values.

In Fig. 8.2 we present four typical examples of asymmetric likelihood functions. The log-likelihood function of the mean life of four exponentially distributed times is shown in Fig. 8.2 a. Fig. 8.2 b is the corresponding log-likelihood function of the decay rate⁴. Figs. 8.2 c, d have been derived by a parameter transformation from normally distributed observations, where the new parameter is one over the mean in one case and the square root of the mean⁵ in the other case. A method which is optimal for all cases does not exist. All three approximations fit very well inside the one standard deviation limits. Outside, the two parameterizations (8.5) and (8.6) are superior to the two-parabola approximation.

We propose to use one of the two parametrizations (8.5), (8.6), but to be careful if σ(θ) becomes small.

⁴ The likelihood function of a Poisson mean has the same shape.

⁵ An example of such a situation is a fit of a particle mass from normally distributed mass squared observations.


[Fig. 8.2. Asymmetric likelihood functions and parametrizations. Each of the four panels (a–d) shows ln L versus the parameter and compares the exact curve with the σ(x)-linear (8.5), σ²(x)-linear (8.6) and two-Gaussian approximations.]

8.3.3 Incompatible Measurements

Before we rely on a mean value computed from the results of different experiments, we should make sure that the various input data are statistically compatible. What we mean by compatible is not obvious at this point. It will become clearer in Chap. 10, where we discuss significance tests which lead to the following plausible procedure that has proven to be quite useful in particle physics [24].

We compute the weighted mean value θ̃ of the N results and form the sum of the quadratic deviations of the individual measurements from their average, normalized to their expected errors squared:

$$\chi^2 = \sum_i (\theta_i - \tilde\theta)^2/\delta_i^2 .$$

The expectation value of this quantity is N − 1 if the deviations are normally distributed with variances δ_i². If χ² is sizably (e.g. by 50 %) higher than N − 1, then we can suspect that at least one of the experiments has published a wrong value or, what is more likely, has underestimated the error, for instance when systematic errors have not been detected. Under the premise that none of the experiments can be discarded a priori, we scale up all declared errors by a common scaling factor


$$S = \sqrt{\chi^2/(N-1)}$$

and publish this factor together with the mean value and the scaled error. Large scaling factors indicate problems in one or several experiments.
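A possible implementation of this averaging procedure is sketched below; the convention of applying the scale factor only when S > 1 is an assumption borrowed from common practice, not stated explicitly here:

```python
import numpy as np

def combine_with_scale_factor(values, errors):
    """Weighted mean, its error, and the error scale factor S = sqrt(chi^2/(N-1))."""
    values, errors = np.asarray(values, float), np.asarray(errors, float)
    w = 1.0 / errors**2
    mean = np.sum(w * values) / np.sum(w)
    err = 1.0 / np.sqrt(np.sum(w))
    chi2 = np.sum((values - mean)**2 / errors**2)
    s = max(1.0, np.sqrt(chi2 / (len(values) - 1)))   # assumed convention: only scale up
    return mean, err, s, err * s

# Hypothetical measurements of the same quantity
mean, err, s, scaled_err = combine_with_scale_factor([1.02, 0.98, 1.15], [0.03, 0.04, 0.03])
print(f"mean = {mean:.3f} +- {err:.3f}, S = {s:.2f}, scaled error = {scaled_err:.3f}")
```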

A similar procedure is applied if the errors are asymmetric even though the condition of normality then obviously is violated. We form

$$\chi^2 = \sum_i (\theta_i - \tilde\theta)^2/\delta_{i\pm}^2 ,$$

where δ_{i+} and δ_{i−}, respectively, are valid for θ_i < θ̃ and θ_i > θ̃.

8.3.4 Error Propagation for a Scalar Function of a Single Parameter

If we have to propagate the MLE and its error limits of a parameter θ to another parameter θ′ = θ′(θ), we should apply the direct functional relation, which is equivalent to a transformation of the likelihood function:

 

$$\hat\theta' = \theta'(\hat\theta) , \qquad \hat\theta' + \delta'_+ = \theta'(\hat\theta + \delta_+) , \qquad \hat\theta' - \delta'_- = \theta'(\hat\theta - \delta_-) .$$

Here we have assumed that θ′(θ) is monotonically increasing. If it is decreasing, the arguments of θ′ have to be interchanged.

The errors of the output quantity are asymmetric either because the input errors are asymmetric or because the functional dependence is non-linear. For instance, an angular measurement α = 87° ± 1° would transform into sin α = 0.9986^{+0.0008}_{−0.0010}.

8.3.5 Error Propagation for a Function of Several Parameters

A difficult problem is the determination of the error of a scalar quantity θ(µ) which depends on several measured input parameters µ with asymmetric errors. We have to eliminate nuisance parameters.

If the complete likelihood function ln L(µ) of the input parameters is available, we derive the error limits from the profile likelihood function of θ as proposed in Sect. 6.5.1.

The MLE of θ is simply θ̂ = θ(µ̂). The profile likelihood of θ has to fulfil the relation Δ ln L(θ) = ln L(θ̂) − ln L(θ) = ln L(µ̂) − ln L(µ). To find the two values of θ for the given Δ ln L, we have to find the maximum and the minimum of θ fulfilling the constraint. The one standard deviation limits are the two extreme values of θ located on the ln L(µ̂) − ln L(µ) = 1/2 surface⁶.

There are various numerical methods to compute these limits. Constrained problems are usually solved with the help of Lagrange multipliers. A simpler method is the one which has been followed when we discussed constrained fits (see Sect. 6.6): with an extremum-finding program, we minimize

$$\theta(\mu) + c\left[\ln L(\hat\mu) - \ln L(\mu) - 1/2\right]^2$$

⁶ Since we assumed likelihood functions with a single maximum, this is a closed surface, in two dimensions a line of constant height.


where c is a number which has to be large compared to the absolute change of θ within the Δ ln L = 1/2 region. We get µ_low and θ_low = θ(µ_low), and maximizing

$$\theta(\mu) - c\left[\ln L(\hat\mu) - \ln L(\mu) - 1/2\right]^2$$

we get θ_up.
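A sketch of this penalty method for a toy problem; the two-parameter Gaussian likelihood, the derived quantity θ(µ) = µ₁µ₂ and the penalty constant are all invented for illustration:

```python
import numpy as np
from scipy.optimize import minimize

# Toy log-likelihood: two correlated Gaussian parameters (invented example)
mu_hat = np.array([2.0, 1.5])
cov = np.array([[0.04, 0.01], [0.01, 0.09]])
cov_inv = np.linalg.inv(cov)

def log_l(mu):
    d = mu - mu_hat
    return -0.5 * d @ cov_inv @ d           # ln L(mu_hat) = 0 by construction

def theta(mu):                               # derived scalar quantity of interest
    return mu[0] * mu[1]

c = 1.0e4                                    # large compared to the variation of theta
def objective(mu, sign):
    # sign = +1: pushes theta down -> lower limit; sign = -1: pushes theta up -> upper limit
    return sign * theta(mu) + c * (log_l(mu_hat) - log_l(mu) - 0.5) ** 2

low = minimize(objective, mu_hat + 0.1, args=(+1,)).x
up  = minimize(objective, mu_hat + 0.1, args=(-1,)).x
print(f"theta_hat = {theta(mu_hat):.3f}, interval [{theta(low):.3f}, {theta(up):.3f}]")
```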

If the likelihood functions are not known, the only practical way is to resort to a Bayesian treatment, i.e. to make assumptions about the p.d.f.s of the input parameters. In many cases part of the input parameters have systematic uncertainties; then, anyway, the p.d.f.s of those parameters have to be constructed. Once we have established the complete p.d.f. f(µ), we can also determine the distribution of θ. The analytic parameter transformation and reduction described in Chap. 3 will fail in most cases, and we will adopt the simple Monte Carlo solution where we generate a sample of events distributed according to f(µ) and where θ(µ) provides the θ distribution in the form of a histogram, together with the uncertainty of this parameter. To remain consistent with our previously adopted definitions, we would then interpret this p.d.f. of θ as a likelihood function and derive from it the MLE θ̂ and the likelihood ratio error limits.

We will not discuss the general scheme in more detail but add a few remarks related to special situations and discuss two simple examples.

Sum of Many Measurements

If the output parameter θ = Σ ξ_i is a sum of “many” input quantities ξ_i with variances σ_i² of similar size, and their mean values and variances are known, then due to the central limit theorem we have

$$\hat\theta = \langle\theta\rangle = \sum \langle\xi_i\rangle , \qquad \delta_\theta^2 = \sigma_\theta^2 \approx \sum \sigma_i^2$$

independent of the shape of the distributions of the input parameters and the error of θ is normally distributed. This situation occurs in experiments where many systematic uncertainties of similar magnitude enter in a measurement.

Product of Many Measurements

If the output parameter θ = Π ξ_i is a product of “many” positive input quantities ξ_i with relative uncertainties σ_i/ξ_i of similar size, then due to the central limit theorem

$$\langle\ln\theta\rangle = \sum \langle\ln\xi_i\rangle , \qquad \sigma_{\ln\theta} \approx \sqrt{\sum \sigma^2_{\ln\xi_i}}$$

independent of the shape of the distributions of the input parameters, and the error of ln θ is normally distributed, which means that θ follows a log-normal distribution (see Sect. 3.6.10). Such a situation may be realized if several multiplicative efficiencies with similar uncertainties enter into a measurement.


[Fig. 8.3. Distribution of the product of 10 variates with mean 1 and standard deviation 0.2; histograms of the product x (left) and of ln(x) (right).]

The distribution of θ is fully specified only once we know the quantities ⟨ln ξ_i⟩ and σ_{ln ξ_i}. The latter condition will usually not be fulfilled, and ⟨ln ξ_i⟩, σ_{ln ξ_i} have to be set by some educated guess. In most cases, however, the approximations

$$\langle\theta\rangle = \prod \langle\xi_i\rangle , \qquad \frac{\delta_\theta^2}{\langle\theta\rangle^2} = \sum \frac{\delta_i^2}{\langle\xi_i\rangle^2}$$

may be adequate. These two quantities fix the log-normal distribution from which we can derive the maximum and the asymmetric errors. If the relative errors are sufficiently small, the log-normal distribution approaches a normal distribution and we can simply use the standard linear error propagation with symmetric errors. As always, it is useful to check approximations by a simulation.

Example 118. Distribution of a product of measurements

We simulate the distribution of θ = Π ξ_i of 10 measured quantities with mean equal to 1 and standard deviation of 0.2, all normally distributed. The result is very different from a Gaussian and is well described by a log-normal distribution, as is shown in Fig. 8.3. The mean is compatible with ⟨θ⟩ = 1 and the standard deviation is 0.69, slightly larger than the prediction from simple error propagation of 0.63. These results remain the same when we replace the Gaussian errors by uniform ones with the same standard deviation. Thus details of the distributions of the input parameters are not important.
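Example 118 can be reproduced with a few lines of Monte Carlo; a possible sketch (seed and sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.normal(1.0, 0.2, size=(1_000_000, 10)).prod(axis=1)

print(f"mean  = {sample.mean():.3f}")   # close to 1
print(f"sigma = {sample.std():.3f}")    # about 0.69, vs. sqrt(10)*0.2 = 0.63 from linear propagation

# If ln(theta) is approximately Gaussian, theta itself follows a log-normal distribution.
pos = sample[sample > 0]                # guard against the extremely rare negative products
print(f"std of ln(theta) = {np.log(pos).std():.3f}")
```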

Sum of Weighted Poisson Numbers

If θ = Σ w_i η_i is a sum of Poisson numbers η_i weighted with w_i, then we can apply the simple linear error propagation rule:

$$\hat\theta = \sum w_i\eta_i , \qquad \delta_\theta^2 = \sum w_i^2\eta_i .$$

The reason for this simple relation is founded on the fact that a sum of weighted Poisson numbers can be approximated by the Poisson distribution of the equivalent number of events (see Sect. 3.6.3). A condition for the validity of this approximation

+10
−8

8.3 Error Propagation

213

is that the number of equivalent events is large enough to use symmetric errors. If this number is low we derive the limits from the Poisson distribution of the equivalent number of events which then will be asymmetric.

Example 119. Sum of weighted Poisson numbers

Particles are detected in three detectors with efficiencies ε₁ = 0.7, ε₂ = 0.5, ε₃ = 0.9. The observed event counts are n₁ = 10, n₂ = 12, n₃ = 8.

A background contribution is estimated in a separate counting experiment as b = 9 with a reduction factor of r = 2. The estimate n̂ for the total number of particles which traverse the detectors is n̂ = Σ n_i/ε_i − b/r = 43. From linear error propagation we obtain the uncertainty δ_n = 9. A more precise calculation based on the Poisson distribution of the equivalent number of events would yield asymmetric errors, n̂ = 43^{+10}_{−8}.
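The numbers of Example 119 follow from the linear propagation rule above; in the sketch below the background term is assumed to enter as a further weighted Poisson number with weight 1/r:

```python
import numpy as np

# Numbers from Example 119
eff = np.array([0.7, 0.5, 0.9])
n   = np.array([10, 12, 8])
b, r = 9, 2                      # background count and its reduction factor

n_hat = np.sum(n / eff) - b / r
# Each term n_i/eff_i is a weighted Poisson number with weight 1/eff_i;
# the background is assumed to contribute with weight 1/r:
delta2 = np.sum(n / eff**2) + b / r**2
print(f"n_hat = {n_hat:.1f} +- {np.sqrt(delta2):.1f}")   # about 43 +- 9
```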

Averaging Correlated Measurements

The following example is a warning that naive linear error propagation may lead to false results.

Example 120. Average of correlated cross section measurements, Peelle’s pertinent puzzle

The result of a cross section measurement is ξ₁, with uncertainties due to the event count, δ₁₀, and to the beam flux. The latter leads to an error δ_f ξ which is proportional to the cross section ξ. The two contributions are independent and thus the estimated error squared in the Gaussian approximation is δ₁² = δ₁₀² + δ_f²ξ₁². A second measurement ξ₂ with different statistics but the same uncertainty on the flux has an uncertainty δ₂² = δ₂₀² + δ_f²ξ₂². Combining the two measurements we have to take into account the correlation of the errors. In the literature [45] the following covariance matrix is discussed:

 

$$\mathbf{C} = \begin{pmatrix} \delta_{10}^2 + \delta_f^2\xi_1^2 & \delta_f^2\xi_1\xi_2 \\ \delta_f^2\xi_1\xi_2 & \delta_{20}^2 + \delta_f^2\xi_2^2 \end{pmatrix} .$$

It can lead to the strange result that the least square estimate ξ̂ of the two cross sections is located outside the range defined by the individual results [46], e.g. ξ̂ < ξ₁, ξ₂. This anomaly is known as Peelle's Pertinent Puzzle [44]. Its reason is that the normalization error is proportional to the true cross section and not to the observed one and thus has to be the same for the two measurements, i.e. in first approximation proportional to the estimate ξ̂ of the true cross section. The correct covariance matrix is

$$\mathbf{C} = \begin{pmatrix} \delta_{10}^2 + \delta_f^2\hat\xi^2 & \delta_f^2\hat\xi^2 \\ \delta_f^2\hat\xi^2 & \delta_{20}^2 + \delta_f^2\hat\xi^2 \end{pmatrix} . \qquad (8.7)$$

Since the best estimate of ξ cannot depend on the common scaling error it is given by the weighted mean

$$\hat\xi = \frac{\delta_{10}^{-2}\xi_1 + \delta_{20}^{-2}\xi_2}{\delta_{10}^{-2} + \delta_{20}^{-2}} . \qquad (8.8)$$


The error δ is obtained by the usual linear error propagation,

$$\delta^2 = \frac{1}{\delta_{10}^{-2} + \delta_{20}^{-2}} + \delta_f^2\hat\xi^2 . \qquad (8.9)$$

Proof: The weighted mean ξ̂ is defined as the combination

$$\hat\xi = w_1\xi_1 + w_2\xi_2$$

which, under the condition w₁ + w₂ = 1, has minimal variance (see Sect. 4.3):

$$\operatorname{var}(\hat\xi) = w_1^2 C_{11} + w_2^2 C_{22} + 2w_1w_2 C_{12} = \text{min} .$$

Using the correct C (8.7), this can be written as

$$\operatorname{var}(\hat\xi) = w_1^2\delta_{10}^2 + (1 - w_1)^2\delta_{20}^2 + \delta_f^2\hat\xi^2 .$$

Setting the derivative with respect to w₁ to zero, we get the usual result

$$w_1 = \frac{\delta_{10}^{-2}}{\delta_{10}^{-2} + \delta_{20}^{-2}} , \qquad w_2 = 1 - w_1 ,$$

$$\delta^2 = \min[\operatorname{var}(\hat\xi)] = \frac{\delta_{10}^{-2}}{(\delta_{10}^{-2} + \delta_{20}^{-2})^2} + \frac{\delta_{20}^{-2}}{(\delta_{10}^{-2} + \delta_{20}^{-2})^2} + \delta_f^2\hat\xi^2 ,$$

proving the above relations (8.8), (8.9).
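The puzzle is easy to reproduce numerically. The sketch below uses invented values (ξ₁ = 1.0, ξ₂ = 1.5, statistical errors 0.10 and 0.15, a 20 % normalization error) and contrasts the least square average built with the observed-value covariance matrix with the prescription (8.8), (8.9):

```python
import numpy as np

x = np.array([1.0, 1.5])          # two cross section results (invented numbers)
d_stat = np.array([0.10, 0.15])   # statistical errors delta_10, delta_20
d_f = 0.20                        # common relative normalization error

def gls_average(x, cov):
    """Generalized least-squares average of two measurements of the same quantity."""
    w = np.linalg.inv(cov) @ np.ones(2)
    err = 1.0 / np.sqrt(w.sum())
    return (w / w.sum()) @ x, err

# "Wrong" covariance: normalization error taken proportional to the observed values
c_wrong = np.diag(d_stat**2) + d_f**2 * np.outer(x, x)
mean_w, err_w = gls_average(x, c_wrong)
print(f"observed-value matrix: {mean_w:.3f} +- {err_w:.3f}   (below both inputs)")

# Prescription (8.8), (8.9): statistical weights only, common scale error added afterwards
w = d_stat**-2 / np.sum(d_stat**-2)
xi_hat = w @ x
delta = np.sqrt(1.0 / np.sum(d_stat**-2) + d_f**2 * xi_hat**2)
print(f"prescription (8.8/8.9): {xi_hat:.3f} +- {delta:.3f}")
```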

8.4 One-sided Confidence Limits

8.4.1 General Case

Frequently, we cannot achieve the precision which is necessary to resolve a small physical quantity. If we do not obtain a value which is significantly different from zero, we usually present an upper limit. A typical example is the measurement of the lifetime of a very short-lived particle which cannot be resolved by the measurement. The result of such a measurement is then quoted by a phrase like “The lifetime of the particle is smaller than ... with 90 % confidence.” Upper limits are often quoted for rates of rare reactions if no reaction has been observed or the observation is compatible with background. For masses of hypothetical particles postulated by theory but not observed with the limited energy of present accelerators, experiments provide lower limits.

In this situation we are interested in probabilities. Thus we have to introduce prior densities or to remain with likelihood ratio limits. The latter are not very popular. As a standard, we fix the prior to be constant in order to achieve a uniform procedure allowing to compare and to combine measurements from different experiments. This means that a priori all values of the parameter are considered as equally likely. As a consequence, the results of such a procedure depend on the choice of the variable. For instance, lower limits of a mass u_m and of a mass squared u_{m²}, respectively, would not obey the relation u_{m²} = (u_m)². Unfortunately we cannot avoid this unattractive property when we want to present probabilities. Knowing that a


uniform prior has been applied, the reader of a publication can interpret the limit as a sensible parametrization of the experimental result and draw his own conclusions. Of course, it is also useful to present the likelihood function which fully documents the result.

To obtain the p.d.f. of the parameter of interest, we just have to normalize the likelihood function⁷ to the allowed range of the parameter θ. The probability P{θ < θ₀} computed from this density is the confidence level C for the upper limit θ₀:

 

$$C(\theta_0) = \frac{\int_{-\infty}^{\theta_0} L(\theta)\,d\theta}{\int_{-\infty}^{\infty} L(\theta)\,d\theta} . \qquad (8.10)$$

Lower limits are computed in an analogous way:

 

$$C_{\text{low}}(\theta_0) = \frac{\int_{\theta_0}^{\infty} L(\theta)\,d\theta}{\int_{-\infty}^{\infty} L(\theta)\,d\theta} . \qquad (8.11)$$

Here the confidence level C is given and the relations (8.10), (8.11) have to be solved for θ₀.

8.4.2 Upper Poisson Limits, Simple Case

When, in an experimental search for a certain reaction, we do not find the corresponding events, we quote an upper limit for its existence. Similarly in some cases where an experiment records one or two candidate events but where strong theoretical reasons speak against accepting those as real, it is common practice not to quote a rate but rather an upper limit. The result is then expressed in the following way: The rate for the reaction x is less than λ0 with 90 % confidence.

The upper limit is again obtained as above by integration of the normalized likelihood function.

For k observed events, we want to determine an upper limit µ₀ with C = 90 % confidence for the expectation value of the Poisson rate. The normalization integral over the parameter µ of the Poisson distribution P(k|µ) = e^{−µ}µ^k/k! is equal to one. Thus we obtain:

$$C = \int_0^{\mu_0} P(k|\mu)\,d\mu = \int_0^{\mu_0} \frac{e^{-\mu}\mu^k}{k!}\,d\mu . \qquad (8.12)$$

The integral is solved by partial integration,

$$C = 1 - \sum_{i=0}^{k} \frac{e^{-\mu_0}\mu_0^i}{i!} = 1 - \sum_{i=0}^{k} P(i|\mu_0) .$$

⁷ In case the likelihood function cannot be normalized, we have to forgo producing a p.d.f. and present only the likelihood function.


However, the resulting equation cannot be solved analytically for µ₀. It has to be solved numerically, or (8.12) is evaluated with the help of tables of the incomplete gamma function.

The case k = 0, i.e. when no event has been observed, plays a special role. The integral simplifies to:

$$C = 1 - e^{-\mu_0} , \qquad \mu_0 = -\ln(1 - C) .$$

For C = 0.9 this relation is fulfilled for µ₀ ≈ 2.3.
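For general k the equation C = 1 − Σ_{i=0}^{k} P(i|µ₀) is easily solved numerically; a possible sketch using SciPy (the root-finding bracket is an arbitrary choice):

```python
from scipy.optimize import brentq
from scipy.stats import poisson

def poisson_upper_limit(k, cl=0.9):
    """Bayesian upper limit (uniform prior) for a Poisson mean after observing k events."""
    # poisson.cdf(k, mu0) is the sum of P(i|mu0) for i = 0..k
    return brentq(lambda mu0: 1.0 - poisson.cdf(k, mu0) - cl, 1e-9, 100.0)

print(f"{poisson_upper_limit(0):.2f}")   # about 2.30
print(f"{poisson_upper_limit(2):.2f}")   # about 5.32
```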

Note that for Poisson limits of rates without background, frequentist statistics (see Appendix 13.5) and Bayesian statistics with a uniform prior give the same results. For the following more general situations this does not hold anymore.

8.4.3 Poisson Limit for Data with Background

When we find in an experiment events which can be explained by a background reaction with expected mean number b, we have to modify (8.12) correspondingly. The expectation value of k is then µ + b and the confidence C is

$$C = \frac{\int_0^{\mu_0} P(k|\mu + b)\,d\mu}{\int_0^{\infty} P(k|\mu + b)\,d\mu} .$$

Again the integrals can be replaced by sums:

$$C = 1 - \frac{\sum_{i=0}^{k} P(i|\mu_0 + b)}{\sum_{i=0}^{k} P(i|b)} .$$

Example 121. Upper limit for a Poisson rate with background

Two background events are expected and two events are also observed; thus the mean signal rate µ is certainly small. We obtain an upper limit µ₀ for the signal with 90 % confidence by solving numerically the equation

$$0.9 = 1 - \frac{\sum_{i=0}^{2} P(i|\mu_0 + 2)}{\sum_{i=0}^{2} P(i|2)} .$$

We find µ₀ = 3.88. The Bayesian probability that the mean rate µ is larger than 3.88 is 10 %. Fig. 8.4 shows the likelihood functions for the two cases b = 2 and b = 0, together with the limits. For comparison, the likelihood ratio limits are also given; they correspond to a decrease from the maximum by e^{−2}. (For a normal distribution this would be equivalent to two standard deviations.)
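The same numerical approach reproduces the limit quoted in Example 121 (again the bracket for the root search is an arbitrary choice):

```python
from scipy.optimize import brentq
from scipy.stats import poisson

def upper_limit_with_background(k, b, cl=0.9):
    """Bayesian upper limit on the signal mean for k observed events and expected background b."""
    f = lambda mu0: 1.0 - poisson.cdf(k, mu0 + b) / poisson.cdf(k, b) - cl
    return brentq(f, 1e-9, 100.0)

print(f"{upper_limit_with_background(2, 2):.2f}")   # about 3.88
print(f"{upper_limit_with_background(0, 0):.2f}")   # 2.30, the no-background case
```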

We now investigate the more general case that both the acceptance ε and the background are not perfectly known, and that the p.d.f.s of the background and the acceptance fb, fε are given. For a mean Poisson signal µ the probability to observe k events is
