
Using and Understanding Medical Statistics, Matthews and Farewell, 2007. Chapter 11: Binary Logistic Regression


to marrow cell dose in units of $10^8$ cells per kilogram of body weight, patient age in decades, extent of prior blood units transfused, transplant year (minus 1970) and preceding androgen treatment. The prior blood unit variable was zero if the patient received less than 10 whole blood units prior to transplantation, and one otherwise. The androgen variable was zero if the patient had not received androgen previously, and one otherwise. Let us call these five variables $X_1$, $X_2$, $X_3$, $X_4$, and $X_5$. A convenient shorthand notation for the set of variables $\{X_1, X_2, X_3, X_4, X_5\}$ is $\tilde{X}$. Remember, also, that we denote a particular value for a variable by the corresponding lower case letter, viz. $\tilde{x} = \{x_1, x_2, x_3, x_4, x_5\}$.

11.2. Logistic Regression

Since Y can assume only two possible values, it would be unrealistic to entertain a linear regression model such as

$$Y = a + b_1X_1 + \dots + b_5X_5 = a + \sum_{i=1}^{5} b_i X_i.$$

Theoretically, the right-hand side of this equation can take any value between minus infinity ($-\infty$) and plus infinity ($+\infty$) unless we restrict the values of $a$ and the regression coefficients $b_1, \dots, b_5$.

In a linear regression model, the expression $a + \sum b_i X_i$ is assumed to be the expected value of a normal distribution. The expected value of a binary variable such as Y turns out to be the probability that Y = 1. Thus, it is more reasonable to consider a regression model which involves the probability of graft rejection, i.e., Pr(Y = 1). A probability lies between zero and one, and this is still too narrow a range of values for the expression $a + \sum b_i X_i$. However, if a probability, say p, is between zero and one, then p/(1 – p) belongs to the interval $(0, \infty)$ and log{p/(1 – p)} belongs to the interval $(-\infty, \infty)$. This is the same range of values to which the expression $a + \sum b_i X_i$ belongs.
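The change of scale described above can be checked numerically. The following is a minimal sketch; the function names `logit` and `inverse_logit` are illustrative and do not come from the text.

```python
import math


def logit(p):
    """Log odds log{p / (1 - p)} for a probability strictly between 0 and 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def inverse_logit(z):
    """Map any real number z back to the probability exp(z) / {1 + exp(z)}."""
    # Two branches so that exp() is only ever called on a non-positive number.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


for p in (0.001, 0.25, 0.5, 0.75, 0.999):
    z = logit(p)
    print(f"p = {p:5.3f}  odds = {p / (1 - p):9.3f}  "
          f"log odds = {z:7.3f}  back to p = {inverse_logit(z):5.3f}")
```

Probabilities near zero give large negative log odds and probabilities near one give large positive log odds, so the log odds can match any value of the linear expression.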

If we represent the probability of graft rejection, Y = 1, by $\Pr(Y = 1 \mid \tilde{x})$ for an individual with covariate values $\tilde{x}$, then a binary logistic regression model for Y is specified by the equation

$$\log\left\{\frac{\Pr(Y=1 \mid \tilde{x})}{1-\Pr(Y=1 \mid \tilde{x})}\right\} = \log\left\{\frac{\Pr(Y=1 \mid \tilde{x})}{\Pr(Y=0 \mid \tilde{x})}\right\} = a + \sum_{i=1}^{5} b_i x_i.$$

An equivalent way of specifying the model is via the equation

$$\Pr(Y=1 \mid \tilde{x}) = \frac{\exp\left(a + \sum_{i=1}^{5} b_i x_i\right)}{1+\exp\left(a + \sum_{i=1}^{5} b_i x_i\right)},$$


Table 11.2. Graft rejection status and marrow cell dose data for 68 aplastic anemia patients

| Graft rejection | Marrow cell dose < 3.0 ($10^8$ cells/kg) | Marrow cell dose ≥ 3.0 ($10^8$ cells/kg) | Total |
|---|---|---|---|
| Yes | 17 | 4 | 21 |
| No | 19 | 28 | 47 |
| Total | 36 | 32 | 68 |

which reveals that the model links the linear expression $a + \sum_{i=1}^{5} b_i x_i$ to the probability of graft rejection. Now, as in linear regression, if $b_i$ is zero, then the factor represented by $X_i$ is not associated with graft rejection. As in the case of linear regression analysis (see chapter 10), a suitable statistic for testing the hypothesis that the regression coefficient, $b_i$, equals zero is

$$T = \frac{|\hat{b}_i|}{\text{est. standard error}(\hat{b}_i)}.$$

Occasionally, the results of an analysis may be presented in terms of the ratio

$$\frac{\hat{b}_i}{\text{est. standard error}(\hat{b}_i)},$$

which is equal to T, apart from the sign. This is the situation in table 11.1. In either case, the conclusion regarding the covariate represented by Xi is the same.

In table 11.1, the largest ratio is associated with marrow cell dose (p = 0.004). Transplant year has a significant effect on graft rejection, with a p-value of 0.02. The other covariates are not significant at the 5% level, although the associated p-values are all less than 0.10. Remember that a test of the hypothesis that a certain regression coefficient is zero is a test for the importance of the corresponding covariate, having adjusted for all the other variables in the regression model. For example, the effect of transplant year cannot be attributed to a change in marrow cell dose values with time, since marrow cell dose is included in the model when we test the hypothesis $b_4 = 0$, i.e., the covariate representing transplant year is not associated with graft rejection.

Details of the calculations that are involved in estimating a binary logistic regression model, and that are known as maximum likelihood estimation, are beyond the scope of this book. To actually use this methodology to analyze a particular set of data, it would be necessary to consult a statistician. However, we hope our brief discussion of the binary logistic regression model has been informative, and will permit readers to appraise the use of this technique in published papers critically.

Table 11.3. A logistic regression analysis of graft rejection and marrow cell dose in 68 aplastic anemia patients

| Regression coefficient | Estimate | Estimated standard error | Test statistic |
|---|---|---|---|
| a | -1.95 | 0.53 | |
| b | 1.83 | 0.63 | 2.90 (p = 0.004) |

There are many aspects which are common to the use of quite different regression models. For this reason, we have minimized our discussion of binary logistic regression; much of the discussion in chapters 12–15 is equally relevant to logistic regression. Nevertheless, as another illustration of this methodology, and as a means of addressing a topic which we have thus far neglected, we discuss the application of logistic regression to 2 × 2 tables in the next section.

11.3. Estimation in 2 × 2 Tables

The discussion of 2 × 2 tables in chapters 2 through 5 concentrates on the concept of a significance test. This emphasis was adopted for pedagogical purposes, and we now turn to the equally important problem of estimation in 2 × 2 tables. This can be done within the framework of the binary logistic regression model.

Consider the data presented in table 11.2 concerning graft rejection in 68 aplastic anemia patients; each marrow cell dose is recorded as being one of two types, namely either less than or at least $3.0 \times 10^8$ cells/kg. Let Y represent graft rejection as in §§11.1, 11.2 and let X be a binary covariate, where X = 1 corresponds to a low marrow cell dose and X = 0 indicates a higher dose. A binary logistic regression model for graft rejection and marrow cell dose is specified by the equation

$$\Pr(Y=1 \mid x) = \frac{\exp(a+bx)}{1+\exp(a+bx)}. \tag{11.1}$$

Therefore, the probability of graft rejection for a high marrow cell dose (X = 0) is exp(a)/{1 + exp(a)}; the corresponding probability for the lower dose (X = 1) is exp(a + b)/{1 + exp(a + b)}.
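As a quick numerical illustration of these two expressions, the sketch below plugs in the estimates of a and b reported in table 11.3 and compares the result with the observed proportions in table 11.2. The function name `rejection_probability` is an illustrative choice, not part of the text.

```python
import math


def rejection_probability(a, b, x):
    """Pr(Y = 1 | x) = exp(a + b x) / {1 + exp(a + b x)}, as in equation (11.1)."""
    z = a + b * x
    return math.exp(z) / (1.0 + math.exp(z))


a_hat, b_hat = -1.95, 1.83  # estimates as reported in table 11.3

p_high_dose = rejection_probability(a_hat, b_hat, x=0)  # X = 0, higher dose
p_low_dose = rejection_probability(a_hat, b_hat, x=1)   # X = 1, low dose

# Observed proportions of graft rejection from table 11.2, for comparison.
print(f"higher dose: model {p_high_dose:.3f}, observed {4 / 32:.3f}")
print(f"low dose:    model {p_low_dose:.3f}, observed {17 / 36:.3f}")
```

With a single binary covariate the model has as many parameters as there are dose groups, so the fitted probabilities agree with the observed proportions up to the rounding of the published estimates.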


The estimation of b is of major importance in studying the influence of marrow cell dose on graft rejection. Table 11.3 presents the estimation of model (11.1) for the data in table 11.2. A test of the hypothesis that b, the regression coefficient, equals zero is based on the observed value of the test statistic $T = |\hat{b}|/\{\text{est. standard error}(\hat{b})\}$, which equals 2.90. Since this observed value exceeds the 5% critical point given in table 8.1, there is evidence to contradict the hypothesis that marrow cell dose does not influence graft rejection, i.e., the hypothesis that b equals zero.
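The test can be reproduced from the two numbers in table 11.3. The sketch below assumes, as the use of the 1.96 multiplier elsewhere in this section suggests, that T is referred to a standard normal distribution; the function name `wald_test` is illustrative.

```python
import math


def wald_test(estimate, std_error):
    """Return T = |estimate| / std_error and its two-sided normal p-value."""
    t = abs(estimate) / std_error
    # For a standard normal Z, Pr(|Z| > t) = erfc(t / sqrt(2)).
    p_value = math.erfc(t / math.sqrt(2.0))
    return t, p_value


t, p_value = wald_test(1.83, 0.63)  # b-hat and its standard error, table 11.3
print(f"T = {t:.2f}, two-sided p = {p_value:.3f}")
print("exceeds the 5% critical point 1.96:", t > 1.96)
```

The values agree with the entry 2.90 (p = 0.004) in table 11.3.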

The larger $\hat{b}$ is, the larger is the estimated effect of a low marrow cell dose on graft rejection. As we saw in chapter 8, $\hat{b}$ is only a single number, and if we wish to estimate b, a confidence interval should also be calculated. A 95% confidence interval for b is defined to be

$$\hat{b} \pm 1.96\{\text{est. standard error}(\hat{b})\},$$

and can be represented by the interval $(b_L, b_H)$.

The simplest way to think about b is in terms of an odds ratio. If p represents the probability of graft rejection, then p/(1 – p) is called the odds in favor of rejection. Now, let p1 represent the probability of rejection for the higher marrow cell dose and p2 the corresponding probability for the lower dose; then the ratio

$$\frac{p_2/(1-p_2)}{p_1/(1-p_1)}$$

is called the odds ratio (OR). If equation (11.1) is used to define p1 and p2, then it turns out that

$$\mathrm{OR} = e^{b}.$$

Since a 95% confidence interval for b is $(b_L, b_H)$, the corresponding interval for the odds ratio, OR, is

$$(e^{b_L}, e^{b_H}),$$

with an estimate of OR being $\widehat{\mathrm{OR}} = \exp(\hat{b})$.
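For the marrow cell dose example, these intervals can be computed directly from the estimate and standard error in table 11.3. This is a small sketch; the function name `odds_ratio_interval` is an illustrative choice.

```python
import math


def odds_ratio_interval(b_hat, std_error, z=1.96):
    """95% interval (b_L, b_H) for b and the matching interval for OR = exp(b)."""
    b_low = b_hat - z * std_error
    b_high = b_hat + z * std_error
    return (b_low, b_high), (math.exp(b_low), math.exp(b_high))


b_hat, std_error = 1.83, 0.63  # values reported in table 11.3
(b_low, b_high), (or_low, or_high) = odds_ratio_interval(b_hat, std_error)

print(f"b:  estimate {b_hat:.2f}, 95% interval ({b_low:.2f}, {b_high:.2f})")
print(f"OR: estimate {math.exp(b_hat):.2f}, "
      f"95% interval ({or_low:.2f}, {or_high:.2f})")
```

The interval for the odds ratio is not symmetric about its estimate, because it is obtained by exponentiating the end points of a symmetric interval for b.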

 

 

 

 

 

 

 

For a simple 2 × 2 table, the formula for $\hat{b}$ can be stated explicitly. If the table is of the form shown in table 11.4, then

$$\hat{b} = \log\left(\frac{ps}{qr}\right) \quad\text{and}\quad \widehat{\mathrm{OR}} = \frac{ps}{qr}.$$

In addition, an approximation to the estimated standard error of $\hat{b}$, based on the logistic regression model, is

$$\left(\frac{1}{p} + \frac{1}{q} + \frac{1}{r} + \frac{1}{s}\right)^{1/2}.$$

Table 11.4. The format of a 2 × 2 table in which the estimate of the odds ratio is ps/qr; the symbols – and + indicate absence and presence, respectively

| Factor 2 | Factor 1: – | Factor 1: + |
|---|---|---|
| – | p | q |
| + | r | s |

This approximate standard error can be somewhat too small, but it is convenient for quick calculation. In our example, the approximate value is 0.63, which agrees with the estimate of 0.63 from the logistic regression analysis.
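The explicit formulas are easy to check against table 11.3. In the sketch below the cells of table 11.2 are assigned to the layout of table 11.4 as p = 17, q = 4, r = 19, s = 28; that assignment is an assumption made for the illustration, although ps/qr is unchanged if both the rows and the columns are interchanged.

```python
import math


def two_by_two_estimates(p, q, r, s):
    """b-hat = log(ps/qr) and the approximate standard error for a 2 x 2 table."""
    b_hat = math.log((p * s) / (q * r))
    std_error = math.sqrt(1 / p + 1 / q + 1 / r + 1 / s)
    return b_hat, std_error


# Table 11.2: rejection yes (17 low dose, 4 higher dose), no (19, 28).
b_hat, std_error = two_by_two_estimates(p=17, q=4, r=19, s=28)
print(f"b-hat = {b_hat:.2f}, OR-hat = {math.exp(b_hat):.2f}, "
      f"approx. standard error = {std_error:.2f}")
```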

In chapter 5, we discussed the use of stratification to adjust the test for no association in a 2 × 2 table for possible heterogeneity in the study population. In general, regression models are designed to make such adjustments more efficiently, but the stratification approach can be viewed as a special case of a regression model.

In the regression model for graft rejection, represented by the equation

$$\Pr(Y=1 \mid x) = \frac{\exp(a+bx)}{1+\exp(a+bx)},$$

the parameter a determines the probability of graft rejection in individuals with X = 0, while b measures the change in this probability if X = 1. Other factors relevant to graft rejection would alter the overall probability of rejection in subgroups of the data. For example, table 11.2 can be subdivided into two 2 × 2 tables by stratifying according to transplant year (After 1972 – No or Yes). This is done in table 11.5. If these two tables are numbered 1 and 2, then we would define the two logistic regression models

$$\Pr(Y=1 \mid x) = \begin{cases} \dfrac{\exp(a_1+bx)}{1+\exp(a_1+bx)} & \text{for table 1,} \\[2ex] \dfrac{\exp(a_2+bx)}{1+\exp(a_2+bx)} & \text{for table 2.} \end{cases} \tag{11.2}$$

In these models, the probability of rejection changes from table 1 to table 2 because the parameter a varies; however, the odds ratio parameter b, which measures the association between marrow cell dose and graft rejection, is assumed not to change. This type of model underlies the approach to combining 2 × 2 tables which we described in chapter 5. For any subclassifications of the population, including matched pairs, we can specify logistic regression models for each subgroup by using different a parameters and the same b parameter. Based on these models, b is estimated in order to study the association of interest.

Table 11.5. Graft rejection and marrow cell dose data in 68 aplastic anemia patients stratified by year of transplant (marrow cell dose in $10^8$ cells/kg; table 1: transplant year not after 1972; table 2: transplant year after 1972)

| Graft rejection | Table 1: < 3.0 | Table 1: ≥ 3.0 | Table 1: total | Table 2: < 3.0 | Table 2: ≥ 3.0 | Table 2: total |
|---|---|---|---|---|---|---|
| Yes | 4 | 2 | 6 | 13 | 2 | 15 |
| No | 9 | 16 | 25 | 10 | 12 | 22 |
| Total | 13 | 18 | 31 | 23 | 14 | 37 |

Table 11.6. A logistic regression analysis of graft rejection and marrow cell dose, stratified by year of transplant

| Regression coefficient | Estimate | Estimated standard error | Test statistic |
|---|---|---|---|
| a1 | -2.37 | 0.65 | |
| a2 | -1.54 | 0.59 | |
| b | 1.72 | 0.64 | 2.69 (p = 0.007) |

Table 11.6 presents the estimation of model (11.2). The estimate of b is 1.72 and the observed value of the statistic used to test for no association is 1.72/0.64 = 2.69. This result is consistent with the unstratified analysis which we discussed earlier.
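For readers curious about how estimates such as those in table 11.6 can be obtained, the following is a minimal sketch of maximum likelihood fitting of model (11.2) by Newton-Raphson iteration, applied to the four dose-by-stratum cells of table 11.5. It is an illustration under stated assumptions, not the procedure the authors used: the choice of Newton-Raphson, the zero starting values, the fixed number of iterations and all function names are assumptions made here.

```python
import math

# One entry per cell of table 11.5:
# (design row [table 1 indicator, table 2 indicator, x], patients, rejections),
# where x = 1 for the low marrow cell dose and x = 0 for the higher dose.
CELLS = [
    ([1.0, 0.0, 1.0], 13, 4),
    ([1.0, 0.0, 0.0], 18, 2),
    ([0.0, 1.0, 1.0], 23, 13),
    ([0.0, 1.0, 0.0], 14, 2),
]


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def score_and_information(cells, beta):
    """Gradient of the log-likelihood and the information matrix at beta."""
    k = len(beta)
    score = [0.0] * k
    info = [[0.0] * k for _ in range(k)]
    for row, n, y in cells:
        z = sum(b * v for b, v in zip(beta, row))
        prob = 1.0 / (1.0 + math.exp(-z))
        for i in range(k):
            score[i] += row[i] * (y - n * prob)
            for j in range(k):
                info[i][j] += row[i] * row[j] * n * prob * (1.0 - prob)
    return score, info


def fit_logistic(cells, iterations=25):
    """Newton-Raphson iterations starting from all coefficients equal to zero."""
    k = len(cells[0][0])
    beta = [0.0] * k
    for _ in range(iterations):
        score, info = score_and_information(cells, beta)
        beta = [b + s for b, s in zip(beta, solve(info, score))]
    _, info = score_and_information(cells, beta)
    # Standard errors: square roots of the diagonal of the inverse information.
    std_errors = []
    for i in range(k):
        unit = [1.0 if j == i else 0.0 for j in range(k)]
        std_errors.append(math.sqrt(solve(info, unit)[i]))
    return beta, std_errors


estimates, std_errors = fit_logistic(CELLS)
for name, est, se in zip(("a1", "a2", "b"), estimates, std_errors):
    print(f"{name}: estimate = {est:6.2f}, est. standard error = {se:4.2f}")
```

Each stratum gets its own indicator column, so a1 and a2 are estimated separately while the single x column forces a common b, which is exactly the structure of model (11.2).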

Table 11.6 also records estimates of $a_1$ and $a_2$. As the number of subgroups becomes large, problems do arise in estimating the a parameters. Specialized methodology for estimating b, alone, does exist and should be used in these situations; however, the details of this methodology are beyond the scope of this book. For our purposes, the nature of the logistic regression model, and the use of $\hat{b}$ to test for association, are more important.

Finally, we note that it is possible to test the assumption that the odds ratio, exp(b), is the same in the stratified 2 × 2 tables. If there are only a few tables, we can calculate separate interval estimates of b for each table and see if they overlap. More formal tests are rather complicated, but should be carried out if the assumption that b is constant is thought to be questionable in the least. A statistician should be consulted concerning these tests, and the appropriate way to proceed with the analysis, if the assumption that b is the same in the stratified 2 × 2 tables is not supported by the data.

Table 11.7. Data from a study of fetal mortality and prenatal care (L = Less, M = More) in two clinics

| Prenatal care | Clinic 1: L | Clinic 1: M | Clinic 2: L | Clinic 2: M |
|---|---|---|---|---|
| Died | 12 | 16 | 34 | 4 |
| Survived | 176 | 293 | 197 | 23 |

Comment:

Readers who dip into the epidemiological literature will observe that logistic regression is frequently being used to analyze case-control studies. In this literature, it is common to see exp(b) referred to as a relative risk. For the model defined in equation (11.1), the relative risk associated with a low marrow cell dose would be Pr(Y = 1 | X = 1)/Pr(Y = 1 | X = 0), or p2/p1 in our later notation. For a rare disease, the type usually investigated via a case-control study, p1 and p2 are small and therefore 1 – p1 and 1 – p2 are close to 1. In this situation, there is little difference between the odds ratio {p2/(1 – p2)}/{p1/(1 – p1)} and the relative risk p2/p1. Therefore, epidemiologists frequently ignore the approximation which is involved, and refer to estimates of odds ratios from a case-control study as estimates of relative risks. It is also worth noting that the application of logistic regression to case-control studies involves certain arguments which go beyond the scope of this book. However, the nature of the conclusions arising from such an approach will be as we have presented them in this chapter.
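The quality of this approximation is easy to examine numerically. In the sketch below, the rare-disease probabilities 0.01 and 0.02 are hypothetical values chosen only for illustration, while the second pair uses the observed rejection proportions from table 11.2, where the outcome is far from rare.

```python
def relative_risk(p1, p2):
    """Relative risk p2 / p1."""
    return p2 / p1


def odds_ratio(p1, p2):
    """Odds ratio {p2 / (1 - p2)} / {p1 / (1 - p1)}."""
    return (p2 / (1.0 - p2)) / (p1 / (1.0 - p1))


examples = {
    "rare outcome (hypothetical)": (0.01, 0.02),
    "graft rejection (table 11.2)": (4 / 32, 17 / 36),
}
for label, (p1, p2) in examples.items():
    print(f"{label}: relative risk = {relative_risk(p1, p2):.2f}, "
          f"odds ratio = {odds_ratio(p1, p2):.2f}")
```

For the rare outcome the two measures nearly coincide, whereas for graft rejection the odds ratio is considerably larger than the relative risk, so the two terms should not be interchanged there.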

11.4. Reanalysis of a Previous Example

In chapter 5, we discussed data concerning fetal mortality and prenatal care (L = less, M = more) in two clinics. The data are summarized in the 2 × 2 tables shown in table 11.7. Logistic regression analyses of these data, based on the unstratified model (see equation 11.1) and the stratified model (see equation 11.2), are given in table 11.8. As we found in chapter 5, the unstratified model indicates there is a significant association (p = 0.017), whereas the more appropriate stratified analysis suggests that there is no association (p = 0.65) between fetal mortality and the amount of prenatal care received.

Table 11.8. Two logistic regression analyses of fetal mortality and prenatal care

| Model | Regression coefficient | Estimate | Estimated standard error | Test statistic |
|---|---|---|---|---|
| Unstratified | a | -2.76 | 0.23 | |
| Unstratified | b | 0.67 | 0.28 | 2.39 (p = 0.017) |
| Stratified | a1 | -2.88 | 0.24 | |
| Stratified | a2 | -1.89 | 0.35 | |
| Stratified | b | 0.15 | 0.33 | 0.45 (p = 0.65) |

Tables 11.3, 11.6 and 11.8 illustrate that a stratified analysis, although generally appropriate, may or may not lead to different conclusions than an unstratified analysis. It is the potential for different conclusions that makes the adjustment for heterogeneity in a population important.
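The informal check mentioned in §11.3, calculating a separate interval estimate of b for each table and seeing whether the intervals overlap, can be sketched for the two clinics of table 11.7 using the explicit 2 × 2 formulas. This is not the stratified fit reported in table 11.8. The assignment of cells to p, q, r, s (died/L, died/M, survived/L, survived/M) and the function name are assumptions made for the illustration.

```python
import math


def log_odds_ratio_interval(p, q, r, s, z=1.96):
    """b-hat = log(ps/qr) with its approximate 95% confidence interval."""
    b_hat = math.log((p * s) / (q * r))
    std_error = math.sqrt(1 / p + 1 / q + 1 / r + 1 / s)
    return b_hat, (b_hat - z * std_error, b_hat + z * std_error)


# Table 11.7 cells: (died L, died M, survived L, survived M) for each clinic.
clinics = {"clinic 1": (12, 16, 176, 293), "clinic 2": (34, 4, 197, 23)}

estimates, intervals = {}, {}
for name, cells in clinics.items():
    estimates[name], intervals[name] = log_odds_ratio_interval(*cells)
    low, high = intervals[name]
    print(f"{name}: b-hat = {estimates[name]:5.2f}, "
          f"95% interval ({low:5.2f}, {high:5.2f})")

(low_1, high_1), (low_2, high_2) = intervals["clinic 1"], intervals["clinic 2"]
overlap = max(low_1, low_2) <= min(high_1, high_2)
print("intervals overlap:", overlap)
```

Both clinic-specific estimates are close to zero and their intervals overlap, which is in line with the small stratified estimate of b in table 11.8 and gives no informal sign that b differs between the clinics.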

11.5. The Analysis of Dose-Response Data

As we have already seen in previous sections of this chapter, the binary logistic regression model is ideal for analyzing the dependence of a binary response variable on a set of explanatory variables, or covariates. Therefore, we wish to emphasize that the method of analysis which is discussed in this section is simply a special case of binary logistic regression. However, it represents a situation that is common in clinical studies. Moreover, the clinical example which we intend to discuss involves certain aspects of regression models which have not arisen in any of the examples we have previously considered.

Duncan et al. [20] report the results of a study which was initiated to investigate the effect of premedication on the dose requirement in children of the anaesthetic thiopentone. The study involved observations on 490 children aged 1–12 years. These patients were divided into four groups, three of which received different types of premedication. No premedication of any kind was administered to the fourth group of patients. All the children subsequently received an injection of 2.0–8.5 mg/kg of thiopentone in steps of 0.5 mg/kg. The anaesthetic was administered to each patient over a 10-second interval, and the eyelash reflex was tested 20 seconds after the end of the thiopentone injection. If the eyelash reflex was abolished, the patient was deemed to have responded to the anaesthetic.

Fig. 11.1. A graph showing how the probability of responding might depend on the logarithm of the concentration in a dose-response study. (Vertical axis: probability of response, from 0 to 1.0; horizontal axis: concentration in mg/kg on a logarithmic scale, from 0.2 to 10.0.)

The investigation described above is typical of a class of clinical studies involving a binary response variable. Clearly, the purpose of the research is to assess the dependence of the response on a continuous variable which is under the control of the researcher. Investigations of this type are generally referred to as dose-response studies, because the clinician administers a measured concentration of a particular substance to each subject in a sample and then observes whether or not the subject exhibits the designated response. A principal assumption on which dose-response studies are based is the notion that the probability of responding depends in a simple, smooth way on the concentration. Figure 11.1 shows an example of this smooth relationship. In addition to estimating this dependence, researchers are usually interested in questions which concern differences in the dependence on concentration among well-defined subgroups of the population.

The method of binary logistic regression, which we introduced in §§11.1 and 11.2, is ideally suited to the analysis of dose-response data. In the use of this regression model to analyze data from such a study, it is common practice to choose the logarithm of the measured concentration as the explanatory variable rather than the actual concentration. Since use of the logarithmic value tends to improve the fit of the model to the data, we will follow this convention and represent the log concentration, or dose, of the administered substance by the letter d.

Table 11.9. Maximum likelihood fit of a binary logistic regression model to data on 137 children premedicated with TDP and atropine and then anaesthetized with thiopentone

| Regression coefficient | Estimate | Estimated standard error | Test statistic |
|---|---|---|---|
| a | -1.92 | 0.82 | |
| b | 2.78 | 0.72 | 3.86 (p = 0.0001) |

Let the binary variable Y denote the observed response, with Y = 1 indicating occurrence of the event of interest and Y = 0 its absence. Then Pr(Y = 1 | d) represents the probability of observing the designated response in a subject who receives dose d. As we saw in §11.2, a binary logistic regression model for Y is specified by the equation

$$\log\left\{\frac{\Pr(Y=1 \mid d)}{1-\Pr(Y=1 \mid d)}\right\} = \log\left\{\frac{\Pr(Y=1 \mid d)}{\Pr(Y=0 \mid d)}\right\} = a + bd,$$

which is equivalent to requiring that

$$\Pr(Y=1 \mid d) = \frac{e^{a+bd}}{1+e^{a+bd}}.$$

If b, the regression coefficient for dose, is zero, then dose, and hence concentration, is not associated with the probability of a response. In §11.2 we indicated that a suitable statistic for testing the hypothesis that b equals zero is

 

$$T = \frac{|\hat{b}|}{\text{est. standard error}(\hat{b})}.$$
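To see what a fitted dose-response model of this form implies, the sketch below evaluates the response probability over a range of concentrations using the estimates of a and b reported in table 11.9. The text does not state which base of logarithm defines d; the natural logarithm is assumed here, and the printed probabilities depend on that assumption. The function name is illustrative.

```python
import math


def response_probability(concentration, a, b):
    """Pr(Y = 1 | d) = exp(a + b d) / {1 + exp(a + b d)} with d = log(concentration).

    ASSUMPTION: d is the natural logarithm of the concentration in mg/kg.
    """
    d = math.log(concentration)
    return 1.0 / (1.0 + math.exp(-(a + b * d)))


a_hat, b_hat = -1.92, 2.78  # estimates as reported in table 11.9

# Log dose at which the fitted probability of response is one half: a + b d = 0.
d_half = -a_hat / b_hat
print(f"log dose with a 50% chance of response: d = {d_half:.2f}")

probabilities = []
for concentration in (2.0, 3.0, 4.0, 5.0, 6.0, 8.5):  # mg/kg, within the study range
    prob = response_probability(concentration, a_hat, b_hat)
    probabilities.append(prob)
    print(f"{concentration:4.1f} mg/kg: Pr(response) = {prob:.2f}")
```

Because the estimate of b is positive, the fitted probability of response increases smoothly with concentration, in the manner sketched in figure 11.1.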

Table 11.9 presents an analysis of a subset of the dose-response data which were collected for the study described by Duncan et al. [20]. The data were made available by Mr. B. Newman. The results summarized in the table pertain solely to the group of 137 patients who were premedicated orally with TDP (trimeprazine, droperidol and physeptone) and atropine. As we might expect, the regression analysis shows that dose is strongly associated with the probability of responding (T = 3.86, p = 0.0001). The estimated relationship is specified by the equation

$$\Pr(Y=1 \mid d) = \frac{e^{-1.92+2.78d}}{1+e^{-1.92+2.78d}},$$
