
ln L_umax = −(a + b) + a ln a + b ln b .

Our test statistic is V_AB, the logarithm of the likelihood ratio, now summed over all bins:

V_AB = ln L_cmax − ln L_umax
     = Σ_i [ (a_i + b_i) ln((a_i + b_i)/(1 + r)) − a_i ln a_i − b_i ln b_i + b_i ln r ] .

Note that V_AB(r) = V_BA(1/r), as it should.
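
For illustration, the statistic can be computed directly from the two arrays of bin contents. The following Python sketch is our addition, not part of the text; the function name and the convention 0 · ln 0 = 0 for empty bins are ours:

```python
import numpy as np

def v_ab(a, b):
    """Binned two-sample LR statistic V_AB from bin contents a, b.

    a, b: bin contents of samples A (sum M) and B (sum N); r = N/M.
    The convention 0*ln(0) = 0 is used for empty bins.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    r = b.sum() / a.sum()

    def xlnx(x):                       # x*ln(x) with 0*ln(0) = 0
        return np.where(x > 0, x * np.log(np.where(x > 0, x, 1.0)), 0.0)

    return np.sum(xlnx(a + b) - (a + b) * np.log(1.0 + r)
                  - xlnx(a) - xlnx(b) + b * np.log(r))
```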

Now we need a method to determine the expected distribution of the test statistic V_AB under the assumption that both samples originate from the same population.

To generate a distribution from a sample, the so-called bootstrap method [64] (see Chap. 12.2) has been developed. In our situation a variant of it, a simple permutation method, is appropriate.

We combine the two samples into a new sample with M + N elements and form new pairs of samples, the bootstrap samples, with M and N elements by permutation: we draw M elements at random from the combined sample and assign them to A, the remaining elements to B. Computationally this is easier than systematically using all individual partitions. For each generated pair i we determine the statistic V_i. This procedure is repeated many times and the values V_i form the reference distribution. Our experimental p-value is equal to the fraction of generated V_i which are larger than V_AB:

p = (Number of permutations with V_i > V_AB) / (Total number of permutations) .
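
A minimal sketch of this permutation procedure in Python (our illustration; the function names are hypothetical, and any two-sample statistic, e.g. V_AB above, can be plugged in):

```python
import numpy as np

def permutation_pvalue(sample_a, sample_b, statistic, n_perm=10000, seed=0):
    """Reference distribution of a two-sample statistic by permutation.

    statistic(a, b): function returning the test statistic; the p-value
    counts permutations with V_i > V_AB, following the formula above.
    """
    rng = np.random.default_rng(seed)
    v_obs = statistic(sample_a, sample_b)
    combined = np.concatenate([sample_a, sample_b])
    m = len(sample_a)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(combined)              # random split into M and N elements
        if statistic(combined[:m], combined[m:]) > v_obs:
            exceed += 1
    return exceed / n_perm
```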

10.4.4 The Kolmogorov–Smirnov Test

The Kolmogorov–Smirnov test can also easily be adapted to a comparison of two samples. We construct the test statistic in an analogous way as above. The test statistic is D* = D √N_eff, where D is the maximum difference between the two empirical distribution functions S_A, S_B, and N_eff is the effective or equivalent number of events, which is computed from the relation:

1/N_eff = 1/N + 1/M .
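
As an illustration (our sketch, using scipy's two-sample Kolmogorov–Smirnov routine for D):

```python
import numpy as np
from scipy import stats

def ks_two_sample_scaled(a, b):
    """Scaled KS statistic D* = D * sqrt(N_eff), with 1/N_eff = 1/N + 1/M."""
    d = stats.ks_2samp(a, b).statistic     # maximum difference of the two EDFs
    n_eff = len(a) * len(b) / (len(a) + len(b))
    return d * np.sqrt(n_eff)
```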

Other EDF tests and the multi-dimensional tests which we have discussed above can be adjusted in a similar way.

10.4.5 The Energy Test

For a binning-free comparison of two samples A and B with M and N observations, we can again use the energy test [63], which in the multi-dimensional case has only a few competitors.

We compute the energy φ_AB in the same way as above, replacing the Monte Carlo sample by one of the experimental samples. The expected distribution of the test statistic φ_AB is computed in the same way as for the likelihood ratio test from


Fig. 10.15. Two-sample test. Left hand: the samples which are to be compared. Right hand: distribution of test statistic and actual value.

the combined sample using the bootstrap permutation technique. Our experimental p-value is equal to the fraction of the generated bootstrap values φ_i which are larger than φ_AB:

p = (Number of permutations with φ_i > φ_AB) / (Total number of permutations) .
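
A sketch of the two-sample energy statistic with a Gaussian distance function R(d) = exp(−d²/(2s²)), the variant with constant width mentioned in Sect. 10.4.6 below; the normalization of the three terms and the width s are illustrative assumptions of ours, not prescriptions from the text:

```python
import numpy as np
from scipy.spatial.distance import cdist, pdist

def energy_ab(a, b, s=0.1):
    """Two-sample energy statistic with a Gaussian distance function.

    a: (M, dim) array, b: (N, dim) array; R(d) = exp(-d^2/(2 s^2)).
    """
    m, n = len(a), len(b)
    R = lambda d: np.exp(-d**2 / (2.0 * s**2))
    phi_aa = R(pdist(a)).sum() / (m * m)       # pairs within A (i < j)
    phi_bb = R(pdist(b)).sum() / (n * n)       # pairs within B (i < j)
    phi_ab = -R(cdist(a, b)).sum() / (m * n)   # interaction between A and B
    return phi_aa + phi_bb + phi_ab
```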

Example 132. Comparison of two samples

We compare two two-dimensional samples with 15 and 30 observations with the energy test. The two samples are depicted in a scatter plot on the left hand side of Fig. 10.15. The energy of the system is φ_AB = −1.480 (the negative value arises because we have omitted the term φ_3). From the mixed sample 10000 sample combinations have been selected at random. Their energy distribution is shown as a histogram in the figure. The arrow indicates the location of φ_AB. It corresponds to a p-value of 0.06. We can estimate the error of the p-value p by computing it from many permutation sets, each with a smaller number of permutations. From the variation of p over 100 sets of 100 permutations we find δp = 0.02. The p-value is small, indicating that the samples belong to different populations. Indeed they have been drawn from different distributions, a uniform distribution, −1.5 < x, y < 1.5, and a normal distribution with standard deviations σ_x = σ_y = 1.

10.4.6 The k-Nearest Neighbor Test

The k-nearest neighbor test is by construction a two-sample test. The distribution of the test statistic is obtained in exactly the same way as in the two-sample energy test which we have discussed in the previous section.

The performance of the k-nearest neighbor test is similar to that of the energy test. The energy test (and the L2 test which is automatically included in the former) is more flexible than the k-nearest neighbor test and includes all observations of the


sample in the continuous distance function. The k-nearest neighbor test, on the other hand, is less sensitive to variations of the density, which are problematic for the energy test with a Gaussian distance function of constant width.

10.5 Significance of Signals

10.5.1 Introduction

Tests for signals are closely related to goodness-of-fit tests, but their aim is different. We are not interested in verifying that H0 is compatible with a sample; rather, we intend to quantify the evidence for signals which are possibly present in a sample that consists mainly of uninteresting background. Here not only the distribution of the background has to be known, but in addition we must be able to parameterize the alternative which we search for. The null hypothesis H0 corresponds to the absence of deviations from the background. The alternative Hs is not fully specified, otherwise it would be sufficient to compute the simple likelihood ratio which we have discussed in Chap. 6.

Signal tests are applied when we search for rare decays or reactions like neutrino oscillations. Another frequently occurring problem is that we want to interpret a line in a spectrum as an indication of a resonance or a new particle. To establish the evidence of a signal, we usually require a very significant deviation from the null hypothesis, i.e. the sum of background and signal has to describe the data much better than the background alone, because particle physicists look in hundreds of histograms for more or less wide lines and thus always find candidates9 which in most cases are just background fluctuations. For this reason, signals are only accepted by the community if they have a significance of at least four or five standard deviations. In cases where we search more specifically for a certain phenomenon, a smaller significance may be sufficient. A high significance for a signal corresponds to a low p-value of the null hypothesis.

Quoting the p-value instead of the significance as expressed by the number of standard deviations by which the signal exceeds the background expectation is preferable, because the p-value is a measure which is independent of the form of the distribution. However, the standard deviation scale is better suited to illustrate the significance than the p-value scale, where very small values dominate. For this reason it has become customary to transform the p-value p into the number of Gaussian standard deviations s_G, which are related through

p = (1/√(2π)) ∫_{s_G}^{∞} exp(−x²/2) dx          (10.23)
  = [1 − erf(s_G/√2)] / 2 .                       (10.24)

The function s_G(p) is given in Fig. 10.16. Relations (10.23), (10.24) refer to one-sided tests. For two-sided tests, p has to be multiplied by a factor of two.
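
For practical work the transformation in both directions is available from the Gaussian tail functions; a small Python sketch (our illustration, using scipy):

```python
from scipy import stats

def sigma_to_p(s_g):
    """One-sided p-value of Eq. (10.23): integral of N(0,1) from s_G to infinity."""
    return stats.norm.sf(s_g)

def p_to_sigma(p):
    """Inverse transformation, the function s_G(p) of Fig. 10.16."""
    return stats.norm.isf(p)

# sigma_to_p(5.0) ~ 2.9e-7 ;  p_to_sigma(1e-4) ~ 3.72
```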

When we require very low p-values for H0 to establish signals, we have to be especially careful in modeling the distribution of the test statistic. Often the distribution corresponding to H0 is approximated by a polynomial and/or a signal by a Gaussian

9 This is the so-called look-elsewhere effect.


Fig. 10.16. Transformation of p-values to one-sided number of standard deviations.

with some uncertainties in the parameters and assumptions which are difficult to implement in the test procedure. We then have to be especially conservative. It is better to underestimate the significance of a signal than to present evidence for a new phenomenon based on a doubtful number.

To illustrate this problem, we return to our standard example where we search for a line in a one-dimensional spectrum. Usually, the background under an observed bump is estimated from the number of events outside but near the bump, in the so-called side bands. If the side bands are chosen too close to the signal, they are affected by the tails of the signal; if they are chosen too far away, the extrapolation into the signal region is sensitive to the assumed shape of the background distribution, which often is approximated by a linear or quadratic function. This makes it difficult to estimate the size and the uncertainty of the expected background with sufficient accuracy to establish the p-value for a large (> 4 st. dev.) signal.

As a numerical example, let us consider an expectation of 1000 background events which is estimated by the experimenter too low by 2%, i.e. equal to 980. Then a 4.3 st. dev. excess would be claimed by him as a 5 st. dev. effect, and he would find too low a p-value by a factor of 28. We also have to be careful with numerical approximations, for instance when we approximate a Poisson distribution by a Gaussian.
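
The size of this error can be checked with the one-sided Gaussian tail probability (our sketch; the exact factor depends on the rounding of the significances, and the text quotes 28):

```python
from scipy import stats

p_true = stats.norm.sf(4.3)      # the actual 4.3 st. dev. excess
p_claimed = stats.norm.sf(5.0)   # the claimed 5 st. dev. effect
print(p_true, p_claimed, p_true / p_claimed)   # ratio ~ 30
```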

Usually, the likelihood ratio, i.e. the ratio of the maximized likelihood under Hs and the maximum likelihood for H0, is the most powerful test statistic. In some situations a relevant parameter which characterizes the signal strength is more informative.

10.5.2 The Likelihood Ratio Test

Definition

An obvious candidate for the test statistic is the likelihood ratio (LR) which we have introduced and used in Sect. 10.3 to test goodness-of-fit of histograms, and in Sect. 10.4 as a two-sample test. We repeat here its general definition:

λ = sup[L_0(θ_0|x)] / sup[L_s(θ_s|x)] ,

ln λ = ln sup[L_0(θ_0|x)] − ln sup[L_s(θ_s|x)] ,

where L_0, L_s are the likelihoods under the null hypothesis and the signal hypothesis, respectively. The supremum is to be evaluated relative to the parameters, i.e. the likelihoods are to be taken at the MLEs of the parameters. The vector x represents the sample of the N observations x_1, . . . , x_N of a one-dimensional geometric space. The extension to a multi-dimensional space is trivial but complicates the writing of the formulas. The parameter space of H0 is assumed to be a subset of that of Hs. Therefore λ will be less than or equal to one.

For example, we may want to find out whether a background distribution is described significantly better by a cubic than by a linear distribution:

f_0 = α_0 + α_1 x ,                               (10.25)
f_s = α_0 + α_1 x + α_2 x² + α_3 x³ .

 

We would fit separately the parameters of the two functions to the observed data and then take the ratio of the corresponding maximized likelihoods.
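
A toy sketch of this procedure (our illustration, anticipating the χ² approximation (10.27) derived just below; the generated data and the weight approximation 1/y_i ≈ 1/t_i are assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 50)          # bin centers (hypothetical data)
y = rng.poisson(100 + 40 * x)          # toy background, linear truth

def min_chi2(y, x, degree):
    """Weighted least-squares polynomial fit; returns min chi^2."""
    coef = np.polyfit(x, y, degree, w=1.0 / np.sqrt(np.maximum(y, 1.0)))
    t = np.polyval(coef, x)
    return np.sum((y - t) ** 2 / np.maximum(y, 1.0))

delta_chi2 = min_chi2(y, x, 1) - min_chi2(y, x, 3)   # f_0 linear vs f_s cubic
ln_lambda = -0.5 * delta_chi2                        # Eq. (10.27)
```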

Frequently the data sample is so large that we better analyze it in the form of a histogram. Then the distribution of the number of events y_i in bin i, i = 1, . . . , B, can be approximated by normal distributions around the parameter dependent predictions t_i(θ). As we have seen in Chap. 6, Sect. 6.5.6, we then get the log-likelihood

ln L = −(1/2) Σ_{i=1}^{B} (y_i − t_i)²/t_i + const ,

which is equivalent to the χ² statistic, χ² ≈ −2 ln L. In this limit the likelihood ratio statistic is equivalent to the χ² difference, Δχ² = min χ²_0 − min χ²_s, of the χ² deviations, min χ²_0 with the parameters adjusted to the null hypothesis H0, and min χ²_s with its parameters adjusted to the alternative hypothesis Hs, background plus signal:


 

 

 

ln λ = ln sup[L_0(θ_0|y)] − ln sup[L_s(θ_s|y)]          (10.26)
     ≈ −(1/2) (min χ²_0 − min χ²_s) .                   (10.27)

The p-value derived from the LR statistic does not take into account that a simple hypothesis is a priori more attractive than a composite one which contains free parameters. Another point of criticism is that the LR is evaluated only at the parameters that maximize the likelihood, while the parameters suffer from uncertainties. Thus conclusions should not be based on the p-value alone.

A Bayesian approach applies so-called Bayes factors to correct for the mentioned effects, but it is not very popular because it has other caveats. Its essentials are presented in Appendix 13.14.

Distribution of the Test Statistic

The distribution of λ under H0 is in the general case not known analytically; however, if the approximation (10.27) is justified, the distribution of −2 ln λ will, under certain additional regularity conditions and the conditions mentioned at the end of Sect. 10.3.3, be described by a χ² distribution. In the example corresponding to relations (10.25) this would be a χ² distribution with 2 degrees of freedom, since f_s has 2 additional free parameters compared to f_0. Knowing the distribution of the test statistic reduces the computational effort required for the numerical evaluation of p-values considerably.
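
If the χ² approximation applies, the p-value follows directly from the χ² survival function; a one-line sketch (our addition):

```python
from scipy import stats

def p_value_wilks(delta_chi2, n_extra_params):
    """p-value of Delta chi^2 when it follows a chi^2 distribution.

    n_extra_params: additional free parameters of H_s, e.g. 2 for (10.25).
    """
    return stats.chi2.sf(delta_chi2, df=n_extra_params)
```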

Let us look at a specific problem: We want to check whether an observed bump above a continuous background can be described by a fluctuation or whether it corresponds to a resonance. The two hypotheses may be described by the distributions

f_0 = α_0 + α_1 x + α_2 x² ,                              (10.28)
f_s = α_0 + α_1 x + α_2 x² + α_3 N(x|µ, σ) ,

 

and we can again use ln λ or Δχ² as test statistic. Since we have to define the test before looking at the data, µ and σ will be free parameters in the fit of f_s to the data. Unfortunately, now Δχ² no longer follows a χ² distribution with 3 degrees of freedom and has a significantly larger expectation value than expected from the χ² distribution. The reason for this dilemma is that for α_3 = 0, which corresponds to H0, the other parameters µ and σ are undefined, and thus part of the χ² fluctuation in the fit to f_s is unrelated to the difference between f_s and f_0.

More generally, Δχ² follows in the large number limit a χ² distribution, with the number of degrees of freedom given by the difference of the numbers of free parameters of the null and the alternative hypotheses, only if the following conditions are satisfied:

1. The distribution f_0 of H0 has to be a special realization of the distribution f_s of Hs.

2. The fitted parameters have to be inside the region allowed by the hypotheses, i.e. off the boundary. For example, the MLE of the location of a Gaussian should not be outside the range covered by the data.

3. All parameters of Hs have to be defined under H0.


 


Fig. 10.17. Distributions of the test statistic under H0 and p-value as a function of the test statistic.

If one of these conditions is not satisfied, the distribution of the test statistic has to be obtained via a Monte Carlo simulation. This means that we generate many fictive experiments of H0 and count how many of those have values of the test statistic that exceed the one which has actually been observed. The corresponding fraction is the p-value for H0. This is a fairly involved procedure because each simulation includes fitting of the free parameters of the two hypotheses. In Ref. [65] it is shown that the asymptotic behavior of the distribution can be described by an analytical function. In this way the amount of simulation can be reduced.
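
Schematically, such a simulation could look as follows (our sketch; fit_statistic stands for the user's fitting code and is hypothetical, with H0 taken as uniform on [0, 1] as in the example below):

```python
import numpy as np

def mc_pvalue(t_observed, fit_statistic, n_events, n_toys=10000, seed=2):
    """p-value from toy experiments generated under H0.

    fit_statistic(sample): fits both hypotheses to a toy sample and
    returns the test statistic, e.g. -ln(lambda); it is the costly step.
    """
    rng = np.random.default_rng(seed)
    exceed = sum(
        fit_statistic(rng.uniform(0.0, 1.0, n_events)) > t_observed
        for _ in range(n_toys)
    )
    return exceed / n_toys
```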

Example 133. Distribution of the likelihood ratio statistic

We consider a uniform distribution (H0) of 1000 events in the interval [0, 1] and, as alternative, a resonance with Gaussian width σ = 0.05 and arbitrary location µ in the range 0.2 ≤ µ ≤ 0.8, superposed on a uniform distribution. The free parameters are ε, the fraction of resonance events, and µ. The logarithm of the likelihood ratio statistic is

ln λ = ln sup[L_0(θ_0|x)] − ln sup[L_s(θ_s|x)]
     = ln(1) − Σ_{i=1}^{1000} ln[ 1 − ε̂ + (ε̂/(√(2π)σ)) exp(−(x_i − µ̂)²/(2σ²)) ]
     = − Σ_{i=1}^{1000} ln[ 1 − ε̂ + (ε̂/(√(2π)σ)) exp(−(x_i − µ̂)²/(2σ²)) ] ,

essentially the negative logarithm of the likelihood of the MLE. Fig. 10.17 shows the results from a million simulated experiments. The distribution of − ln λ under H0 has a mean value of 1.502, which corresponds to ⟨Δχ²⟩ = 3.004. The p-value as a function of − ln λ follows asymptotically an exponential, as is illustrated in the right hand plot of Fig. 10.17. Thus it is possible to extrapolate the function to smaller p-values, which is necessary to claim large effects.
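
A sketch of how −ln λ could be computed for one such experiment (our illustration; it neglects the small leakage of the Gaussian outside [0, 1], does not treat multiple local minima in µ, and the starting values are arbitrary):

```python
import numpy as np
from scipy.optimize import minimize

def neg_ln_lambda(x, sigma=0.05):
    """-ln(lambda) for uniform H0 vs. uniform plus Gaussian on [0, 1].

    ln L0 = 0 for the uniform distribution, so -ln(lambda) equals the
    maximized signal log-likelihood.
    """
    def nll(par):                      # negative log-likelihood of f_s
        eps, mu = par
        gauss = np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)
        return -np.sum(np.log(np.maximum(1.0 - eps + eps * gauss, 1e-300)))

    fit = minimize(nll, x0=[0.05, 0.5], bounds=[(0.0, 1.0), (0.2, 0.8)])
    return -fit.fun                    # = ln L_s(MLE) - ln L_0
```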


Fig. 10.18. Histogram of event sample used for the likelihood ratio test. The curve is an unbinned likelihood fit to the data.

Figure 10.18 displays the result of an experiment where a likelihood fit finds a resonance at the energy 0.257. It contains a fraction of 0.0653 of the events. The logarithm of the likelihood ratio is 9.277. The corresponding p-value for H0 is p_LR = 1.8 · 10⁻⁴. Hence it is likely that the observed bump is a resonance. In fact, it had been generated as a 7 % contribution of a Gaussian distribution N(x|0.25, 0.05) to a uniform distribution.

We have to remember, though, that the p-value is not the probability that H0 is true; it is the probability that H0 simulates a resonance of the type seen in the data. In a Bayesian treatment, see Appendix 13.14, we find betting odds in favor of H0 of about 2%, which is much less impressive. The two numbers refer to different issues, but nonetheless we have to face the fact that the two different statistical approaches lead to different conclusions about how evident the existence of a bump really is.

In experiments with a large number of events, the computation of the p-value distribution based on the unbinned likelihood ratio becomes excessively slow, and we have to turn to histograms and compute the likelihood ratio of H0 and Hs from the histogram. Figure 10.19 displays some results from the simulation of 10⁶ experiments of the same type as above, but with 10000 events distributed over 100 bins.

The figure also shows the distribution of the signal fraction under H0 and for experiments with a 1.5% resonance added. The large spread of the signal distributions reflects the fact that identical experiments may by chance observe a very significant signal or only a slight indication of a resonance.



Fig. 10.19. Distributions of the test statistic under H0 and p-value as a function of the test statistic. In the upper graph also the distribution for experiments with a resonance contribution is shown.

General Multi-Channel Case

It is easy to extend the likelihood ratio test to the multi-channel case. We assume that the observations x_k of the channels k = 1, . . . , K are independent of each other. The overall likelihood is the product of the individual likelihoods. For the log-likelihood ratio we then have to replace (10.26) by

ln λ = Σ_{k=1}^{K} { ln sup[L_0k(θ_0k|x_k)] − ln sup[L_sk(θ_sk|x_k)] } .

As an example, we consider an experiment where we observe bumps at the same mass in K different decay channels, bumps which are associated to the same phenomenon, i.e. a particle decaying into different secondaries.


When we denote the decay contribution into channel k by ε_k, the p.d.f. of the decay distribution by f_k(x_k|θ_k) and the corresponding background distributions by f_0k(x_k|θ_0k), the distribution under H0 is

f_0(x_1, . . . , x_K |θ_01, . . . , θ_0K) = Π_{k=1}^{K} f_0k(x_k|θ_0k)

and the alternative signal distribution is

f_s(x_1, . . . , x_K |θ_01, . . . , θ_0K ; θ_1, . . . , θ_K ; ε_1, . . . , ε_K) = Π_{k=1}^{K} [ (1 − ε_k) f_0k(x_k|θ_0k) + ε_k f_k(x_k|θ_k) ] .

The likelihood ratio is then

 

 

ln λ = Σ_{k=1}^{K} { ln f_0k(x_k|θ̂_0k) − ln[ (1 − ε̂_k) f_0k(x_k|θ̂′_0k) + ε̂_k f_k(x_k|θ̂_k) ] } .

Note that the MLEs of the parameters θ_0k depend on the hypothesis. They are different for the null and the signal hypotheses and, for this reason, have been marked by an apostrophe in the latter.
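
Since the channels are independent, the combination is a plain sum of the per-channel log-likelihood ratios; schematically (our sketch, with hypothetical bookkeeping):

```python
def ln_lambda_combined(channels):
    """Combine independent channels: log-likelihood ratios are additive.

    channels: iterable of (ln_L0_max, ln_Ls_max) pairs, one per channel,
    each maximized over its own parameters as in the formula above.
    """
    return sum(l0 - ls for l0, ls in channels)
```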

10.5.3 Tests Based on the Signal Strength

Instead of using the LR statistic, it is often preferable to use a parameter of Hs as test statistic. In the simple example of (10.25) the test statistic t = α_3 would be a sensible choice. When we want to estimate the significance of a line in a background distribution, the number of events which we associate to the line (or the parameter α_3 in our example (10.28)) is a reasonable test statistic instead of the likelihood ratio. Compared to the LR statistic it has the advantage of representing a physical parameter, but usually the corresponding test is less powerful.

Example 134. Example 133 continued

Using the fitted fraction of resonance events as test statistic, the p-value for H0 is p_f = 2.2 · 10⁻⁴, slightly less stringent than that obtained from the LR. Often physicists compare the number of observed events directly to the prediction from H0. In our example we have 243 events within two standard deviations around the fitted energy of the resonance, compared to the expectation of 200 from a uniform distribution. The probability to observe ≥ 243 events for a Poisson distribution with mean 200 is p_p = 7.3 · 10⁻⁴. This number cannot be compared directly with p_LR and p_f, because the latter two values include the look-elsewhere effect, i.e. the fact that the simulated resonance may be located at an arbitrary energy. A lower number for p_p is obtained if the background is estimated from the side bands, but then the computation becomes more involved because the error on the expectation has to be included. Primitive methods are only useful for a first crude estimate.

We learn from this example that the LR statistic provides the most powerful test among the considered alternatives. It takes into account not only the excess of events of a signal but also its expected shape. For this reason p_LR is smaller than p_f.
