
Advanced Segmentation Techniques


The maximum a posteriori (MAP) parameter estimation involves finding the x that maximizes p(x|y). By Bayes' rule,

p(x|y) = p(y|x) p(x) / p(y).    (9.4)

Since the denominator of Eq. 9.4 does not affect the optimization, the MAP estimate can be obtained, equivalently, by maximizing the numerator of Eq. 9.4 or its natural logarithm; that is, we need to find the x that maximizes the following criterion:

L(x|y) = ln p(y|x) + ln p(x).    (9.5)

The first term in Eq. 9.5 is the likelihood due to the low-level process and the second term is due to the high-level process. Based on the models of the high-level and low-level processes, the MAP estimate can be obtained.

In order to carry out the MAP parameter estimation in Eq. 9.5, one needs to specify the parameters of the two processes. A popular model for the high-level process is the Gibbs Markov model. In the following sections we introduce a new accurate model for the low-level process. In this model we will assume that each class ω_i consists of a mixture of normal distributions, as given by the following equation:

p(y|ω_i) = Σ_{l=1}^{n_i} π_l p(y|C_l),    for i = 1, 2, . . . , m,    (9.6)

 

 

where n_i is the number of normal components forming class ω_i, π_l is the mixing proportion of component l, and {C_l}_{l=1}^{n_i} are the Gaussian components that form class ω_i. So the overall model for the low-level process can be expressed as follows:

p_es(y) = Σ_{i=1}^{m} p(ω_i) p(y|ω_i).    (9.7)

 

In our proposed algorithm the prior probability p(ω_i) is included in the mixing proportion for each class.
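To make the two-level mixture of Eqs. 9.6 and 9.7 concrete, the following sketch evaluates such a density over the gray-level range. The class priors, component weights, means, and standard deviations are hypothetical placeholders chosen for illustration, not values estimated in the chapter.

```python
import numpy as np
from scipy.stats import norm

def class_density(y, weights, means, sigmas):
    """Eq. 9.6: p(y | class i) as a mixture of n_i Gaussian components."""
    y = np.asarray(y, dtype=float)
    return sum(w * norm.pdf(y, loc=mu, scale=sd)
               for w, mu, sd in zip(weights, means, sigmas))

def overall_density(y, classes):
    """Eq. 9.7: p_es(y) = sum_i p(class i) * p(y | class i).

    `classes` is a list of dicts with keys 'prior', 'weights', 'means', 'sigmas'
    (a structure chosen for this sketch, not prescribed by the chapter)."""
    return sum(c['prior'] * class_density(y, c['weights'], c['means'], c['sigmas'])
               for c in classes)

# Hypothetical two-class example (e.g., lung vs. other tissues).
classes = [
    {'prior': 0.25, 'weights': [0.6, 0.4], 'means': [50.0, 70.0], 'sigmas': [10.0, 15.0]},
    {'prior': 0.75, 'weights': [1.0],      'means': [140.0],      'sigmas': [20.0]},
]
y_axis = np.arange(256)
p_es = overall_density(y_axis, classes)   # estimated low-level density over gray levels
```

In the lung application of Section 9.3.1, m = 2: one class for lung tissue and one for the other tissues.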

9.2.5 Parameter Estimation for Low-Level Process

In order to estimate the parameters of the low-level process, we need to estimate the number of Gaussian components that form the distribution of each class, together with their means, variances, and mixing proportions. To estimate the distribution of each class, we use the expectation maximization (EM) algorithm. The first step is to estimate the dominant Gaussian components in the given empirical distribution.

9.2.5.1 Dominant Gaussian Components Extracting Algorithm

1. Assume the number of Gaussian components that represent each class ω_i, i = 1, . . . , m. Initialize the parameters of each distribution randomly.

2. The E-step: Compute δ_it, which represents the responsibility that the given pixel value is drawn from a certain distribution, as

δ_it^k = π_i^k p(y_t | θ_i^k, ω_i) / Σ_{l=1}^{m} π_l^k p(y_t | θ_l^k, ω_l),    for t = 1 to N²,    (9.8)

 

 

where y_t is the gray level at location t in the given image, π_i^k is the mixing proportion of Gaussian component i at step k, and θ_i^k is the estimated parameter vector of Gaussian component i at step k.

3. The M-step: Compute the new mean, the new variance, and the new mixing proportion from the following equations:

 

 

 

 

 

 

π_i^(k+1) = (1/N²) Σ_{t=1}^{N²} δ_it^k,    (9.9)

µ_i^(k+1) = Σ_{t=1}^{N²} δ_it^k y_t / Σ_{t=1}^{N²} δ_it^k,    (9.10)

(σ_i^(k+1))² = Σ_{t=1}^{N²} δ_it^k (y_t − µ_i^k)² / Σ_{t=1}^{N²} δ_it^k.    (9.11)

 

 

4. Repeat steps 2 and 3 until the relative differences between successive values of Eqs. 9.9, 9.10, and 9.11 are sufficiently small (a sketch of this iteration is given below).
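A minimal NumPy sketch of this EM iteration (Eqs. 9.8-9.11) is given below, assuming the image has been flattened into a vector y of N² gray levels. The initialization, iteration cap, and stopping tolerance are illustrative choices, and the variance update uses the freshly updated mean (a common EM variant), so it is not a verbatim transcription of the chapter's equations.

```python
import numpy as np

def em_dominant_components(y, m, n_iter=100, tol=1e-6, seed=0):
    """EM estimation of the m dominant Gaussian components (cf. Eqs. 9.8-9.11)."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float).ravel()          # the N^2 gray levels of the image
    N2 = y.size
    # Step 1: initialize mixing proportions, means, and variances.
    pi = np.full(m, 1.0 / m)
    mu = rng.choice(y, size=m)
    var = np.full(m, y.var())

    prev = None
    for _ in range(n_iter):
        # E-step (Eq. 9.8): responsibility of component i for pixel t.
        dens = np.exp(-(y[None, :] - mu[:, None]) ** 2 / (2 * var[:, None])) \
               / np.sqrt(2 * np.pi * var[:, None])
        num = pi[:, None] * dens
        delta = num / num.sum(axis=0, keepdims=True)

        # M-step (Eqs. 9.9-9.11): update proportions, means, and variances.
        Nk = delta.sum(axis=1)
        pi = Nk / N2
        mu = (delta @ y) / Nk
        var = (delta * (y[None, :] - mu[:, None]) ** 2).sum(axis=1) / Nk

        # Step 4: stop when the relative change of all parameters is small.
        params = np.concatenate([pi, mu, var])
        if prev is not None and np.max(np.abs(params - prev) / (np.abs(prev) + 1e-12)) < tol:
            break
        prev = params
    return pi, mu, var
```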

Let p_I1(y), p_I2(y), . . . , p_Im(y) be the dominant Gaussian components estimated by the above algorithm. Then the initial estimated density p_I(y) for the given image can be defined as follows:

p_I(y) = π_1 p_I1(y) + π_2 p_I2(y) + · · · + π_m p_Im(y).    (9.12)

Because the empirical data do not exactly follow a mixture of normal distributions, there will be an error between p_I(y) and p_em(y). So we suggest the following model for the empirical data:

p_em(y) = p_I(y) + ζ(y),    (9.13)

where ζ(y) represents the error between p_em(y) and p_I(y). From Eq. 9.13, ζ(y) can be rewritten as

ζ(y) = |p_em(y) − p_I(y)| sign(p_em(y) − p_I(y)).    (9.14)

We assume that the absolute value of ζ(y) is another density consisting of a mixture of normal distributions, and we use the following EM algorithm to estimate the number of Gaussian components in ζ(y) together with their means, variances, and mixing proportions.

9.2.5.2 Sequential EM Algorithm

1. Assume the number of Gaussian components (n) in ζ(y) is 2.

2. The E-step: Given the current number of Gaussian components in ζ(y), compute δ_it as

δ_it^k = π_i^k p(y_t | θ_i^k) / Σ_{l=1}^{n} π_l^k p(y_t | θ_l^k),    for i = 1 to n and t = 1 to N².    (9.15)

 

3. The M-step: Compute the new mean, the new variance, and the new mixing proportion from the following equations:

 

 

 

 

 

 

π_i^(k+1) = (1/N²) Σ_{t=1}^{N²} δ_it^k,    (9.16)

µ_i^(k+1) = Σ_{t=1}^{N²} δ_it^k y_t / Σ_{t=1}^{N²} δ_it^k,    (9.17)

(σ_i^(k+1))² = Σ_{t=1}^{N²} δ_it^k (y_t − µ_i^k)² / Σ_{t=1}^{N²} δ_it^k.    (9.18)

 

 

4. Repeat steps 2 and 3 until the relative differences between successive values of Eqs. 9.16, 9.17, and 9.18 are sufficiently small.

5. Compute the conditional expectation and the error between |ζ(y)| and the estimated density p_ζ(y) for |ζ(y)| from the following equations:

Q(n) = Σ_{t=1}^{N²} Σ_{i=1}^{n} δ_it ln p_ζ(y_t | θ_i),    (9.19)

ε(n) = |ζ(y)| − Σ_{i=1}^{n} π_i p_ζi(y).    (9.20)


6. Repeat steps 2, 3, 4, and 5, increasing the number of Gaussian components n by 1, as long as the conditional expectation Q(n) is still increasing and ε(n) is still decreasing; otherwise stop and select the parameters that correspond to the maximum Q(n) and minimum ε(n).

Since the EM algorithm can be trapped in a local minimum, we run the above algorithm several times and select the number of Gaussian components, and their parameters, that give the maximum Q(n) and minimum ε(n).
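The outer model-selection loop of this sequential EM algorithm might look like the sketch below. It treats the scaled |ζ(y)| as a discrete distribution over gray levels, draws pseudo-samples from it so that a standard EM fitter (here scikit-learn's GaussianMixture) can stand in for Eqs. 9.15-9.18, and uses the total log-likelihood and a density discrepancy as proxies for Q(n) and ε(n) in Eqs. 9.19 and 9.20; all of these substitutions are assumptions of the sketch, not the chapter's exact procedure.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def select_n_components(y_axis, zeta_abs, n_max=15, n_samples=50_000, seed=0):
    """Grow n while Q(n) increases and eps(n) decreases (sequential EM, steps 1-6).

    y_axis   : gray-level axis, e.g. np.arange(256)
    zeta_abs : |zeta(y)| evaluated on that axis (non-negative)."""
    rng = np.random.default_rng(seed)
    w = zeta_abs / zeta_abs.sum()
    # Draw pseudo-samples from |zeta(y)| so that a standard EM fitter can be reused;
    # this sampling step is a convenience of the sketch, not part of the chapter.
    samples = rng.choice(y_axis, size=n_samples, p=w).astype(float).reshape(-1, 1)

    best, prev_Q, prev_eps = None, -np.inf, np.inf
    for n in range(2, n_max + 1):                    # step 1: start from n = 2
        gmm = GaussianMixture(n_components=n, random_state=seed).fit(samples)
        mix = np.exp(gmm.score_samples(y_axis.astype(float).reshape(-1, 1)))
        Q = gmm.score(samples) * len(samples)        # total log-likelihood, proxy for Eq. 9.19
        eps = np.abs(w - mix / mix.sum()).sum()      # density discrepancy, proxy for Eq. 9.20
        if Q <= prev_Q or eps >= prev_eps:
            break                                    # Q stopped increasing or eps stopped decreasing
        best, prev_Q, prev_eps = gmm, Q, eps
    return best
```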

After determining the number of Gaussian components that form |ζ(y)|, we need to determine which components belong to class 1, which belong to class 2, and so on. In this model we classify these components by minimizing a risk function under the 0-1 loss. In order to minimize the risk function, we can use the following algorithm. Note that the algorithm is written for two classes, but it is easy to generalize to n classes.

9.2.5.3 Components Classification Algorithm

1. All Gaussian components that have a mean less than the estimated mean of p_I1(y) belong to the first class.

2. All Gaussian components that have a mean greater than the estimated mean of p_I2(y) belong to the second class.

3. For the components with a mean greater than the estimated mean of p_I1(y) and less than the estimated mean of p_I2(y), do the following:

(a) Assume that the first component belongs to the first class and the other components belong to the second class. Compute the risk value from the following equation:

R(Th) = ∫_{Th}^{∞} p(y|ω_1) dy + ∫_{−∞}^{Th} p(y|ω_2) dy,    (9.21)

where Th is the threshold that separates class 1 from class 2. The above integration can be done using a second-order spline.

(b) Assume that the first and second components belong to the first class and the other components belong to the second class, and compute R(Th) from Eq. 9.21. Continue this process as long as R(Th) decreases, and stop when R(Th) starts to increase (see the sketch below).
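A sketch of this components classification step is given below. It evaluates the risk of Eq. 9.21 with Gaussian CDFs (scipy.stats.norm) rather than the second-order spline mentioned in the text, assumes the two boundary classes are already non-empty from steps 1 and 2, and places the threshold Th midway between the nearest means of the two groups; the threshold rule and the (weight, mean, sigma) component representation are assumptions of the sketch.

```python
import numpy as np
from scipy.stats import norm

def risk(th, comps1, comps2):
    """Eq. 9.21 with Gaussian tails: R(Th) = P(y > Th | class 1) + P(y < Th | class 2).

    comps1, comps2: lists of (weight, mean, sigma); weights are renormalized
    within each class so that each class-conditional density integrates to one."""
    w1 = sum(w for w, _, _ in comps1)
    w2 = sum(w for w, _, _ in comps2)
    r1 = sum(w / w1 * norm.sf(th, loc=mu, scale=sd) for w, mu, sd in comps1)
    r2 = sum(w / w2 * norm.cdf(th, loc=mu, scale=sd) for w, mu, sd in comps2)
    return r1 + r2

def assign_middle_components(middle, class1, class2):
    """Steps (a)-(b): move the middle components (sorted by mean) into class 1
    one at a time while R(Th) keeps decreasing. class1 and class2 must already
    contain the components assigned in steps 1 and 2. Th is taken as the
    midpoint between the two groups' nearest means (an assumption of the sketch)."""
    middle = sorted(middle, key=lambda c: c[1])
    best_risk, best_split = np.inf, 0
    for j in range(1, len(middle) + 1):
        c1 = class1 + middle[:j]
        c2 = middle[j:] + class2
        th = 0.5 * (max(mu for _, mu, _ in c1) + min(mu for _, mu, _ in c2))
        r = risk(th, c1, c2)
        if r >= best_risk:
            break                      # R(Th) started to increase: stop (step (b))
        best_risk, best_split = r, j
    return class1 + middle[:best_split], middle[best_split:] + class2
```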


Finally, to demonstrate the convergence of the proposed model, we show experimentally that, when this model is used, the Levy distance between the estimated distribution P_es(y) and the empirical distribution P_em(y) decreases. The Levy distance ρ(P_em, P_es) is defined as

ρ(P_em, P_es) = inf{ξ > 0 : ∀y, P_em(y − ξ) − ξ ≤ P_es(y) ≤ P_em(y + ξ) + ξ}.    (9.22)

As ρ(P_em, P_es) approaches zero, P_es(y) converges weakly to P_em(y).
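The Levy distance of Eq. 9.22 can be approximated numerically. The sketch below scans candidate values of ξ on a uniform grid and returns the smallest one satisfying the defining inequalities for CDFs sampled on a discrete gray-level axis; the grid resolution is a choice of the sketch.

```python
import numpy as np

def levy_distance(y, F_em, F_es, n_steps=1000):
    """Approximate the Levy distance of Eq. 9.22 between two CDFs sampled on grid y.

    F_em, F_es : cumulative distribution values at the points in y.
    Returns the smallest xi on the grid such that
        F_em(y - xi) - xi <= F_es(y) <= F_em(y + xi) + xi   for all y."""
    y = np.asarray(y, dtype=float)
    for xi in np.linspace(0.0, y.max() - y.min(), n_steps)[1:]:
        lower = np.interp(y - xi, y, F_em, left=0.0, right=1.0) - xi
        upper = np.interp(y + xi, y, F_em, left=0.0, right=1.0) + xi
        if np.all(lower <= F_es) and np.all(F_es <= upper):
            return xi
    return np.inf

# Example: CDFs obtained by cumulatively summing the discrete densities.
# y_axis = np.arange(256); F_em = np.cumsum(p_em); F_es = np.cumsum(p_es)
# rho = levy_distance(y_axis, F_em, F_es)
```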

9.2.6 Parameter Estimation for High-Level Process

In order to carry out the MAP parameter estimation in Eq. 9.5, one needs to specify the parameters of the high-level process. A popular model for the high-level process is the Gibbs Markov model, which follows Eq. 9.2. In order to estimate the parameters of the GMRF, we find the parameters that maximize Eq. 9.2, using the Metropolis algorithm and a genetic algorithm (GA).

The Metropolis algorithm is a relaxation algorithm for finding a global maximum. The algorithm assumes that the classes of all neighbors of y are known. The high-level process is assumed to be formed of m independent processes; each of the m processes is modeled by a Gibbs Markov random field that follows Eq. 9.2. Then y can be classified using the fact that p(x_t|y) is proportional to p(y|x_t) p(x_t|η_s), where η_s is the neighborhood of site s belonging to class x_t, p(x_t|η_s) is computed from Eq. 9.2, and p(y|x_t) is computed from the estimated density for each class.

Using the Bayes classifier, we obtain an initial labeled image. In order to run the Metropolis algorithm, we must first know the coefficients of the potential function E(x), so we use the GA to estimate the coefficients of E(x) and evaluate them through the fitness function.
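Since Eq. 9.2 and the clique potentials are not reproduced in this excerpt, the following sketch of a Metropolis-style relaxation sweep uses a simple Potts-like interaction with a single coefficient beta as a stand-in for the Gibbs potential E(x); the 4-neighborhood, sweep schedule, and the class_logpdf callback are assumptions of the sketch, not the chapter's exact formulation.

```python
import numpy as np

def metropolis_relabel(labels, y, class_logpdf, beta=1.0, n_sweeps=5, seed=0):
    """Metropolis-style relaxation of an initial label image.

    labels       : (H, W) integer label image from the Bayes classifier
    y            : (H, W) gray-level image
    class_logpdf : callable (gray_value, label) -> log p(y | class), taken from
                   the estimated low-level densities
    beta         : coefficient of the Potts-like interaction standing in for E(x)
    """
    rng = np.random.default_rng(seed)
    labels = labels.copy()
    H, W = labels.shape
    n_classes = int(labels.max()) + 1

    def local_logp(r, c, lab):
        # log p(y|x) plus the neighbor interaction (4-neighborhood).
        nbrs = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
        same = sum(1 for rr, cc in nbrs
                   if 0 <= rr < H and 0 <= cc < W and labels[rr, cc] == lab)
        return class_logpdf(y[r, c], lab) + beta * same

    for _ in range(n_sweeps):
        for r in range(H):
            for c in range(W):
                cur = labels[r, c]
                prop = rng.integers(n_classes)
                if prop == cur:
                    continue
                dlogp = local_logp(r, c, prop) - local_logp(r, c, cur)
                if dlogp >= 0 or rng.random() < np.exp(dlogp):
                    labels[r, c] = prop
    return labels
```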

9.2.6.1 Maximization Using Genetic Algorithm

To build the genetic algorithm, we define the following parameters:

Chromosome: A chromosome is represented in binary digits and consists of representations of the model order and the clique coefficients. Each chromosome has 41 bits. The first bit represents the order of the system (we use the digit "0" for a first-order and the digit "1" for a second-order GMRF). The remaining bits represent the clique coefficients, where each clique coefficient is represented by 4 bits (note that for a first-order system we estimate only five parameters, and the remaining clique coefficients are zero, while for a second-order system we estimate ten parameters).
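To make the chromosome layout concrete, the sketch below packs and decodes the 41-bit string described above (1 order bit followed by ten 4-bit clique-coefficient fields); the mapping of each 4-bit field to a real-valued coefficient range is an assumption of the sketch, since the quantization used in the chapter is not specified.

```python
import numpy as np

N_COEFFS, BITS_PER_COEFF = 10, 4        # 1 + 10 * 4 = 41 bits, as described above
COEFF_MIN, COEFF_MAX = -2.0, 2.0        # assumed decoding range for each coefficient

def decode(chromosome):
    """Decode a 41-bit chromosome into (model order, clique coefficients)."""
    bits = np.asarray(chromosome, dtype=int)
    assert bits.size == 1 + N_COEFFS * BITS_PER_COEFF
    order = 1 if bits[0] == 0 else 2     # "0" -> first-order, "1" -> second-order GMRF
    weights = 2 ** np.arange(BITS_PER_COEFF - 1, -1, -1)
    coeffs = np.zeros(N_COEFFS)
    for i in range(N_COEFFS):
        field = bits[1 + i * BITS_PER_COEFF : 1 + (i + 1) * BITS_PER_COEFF]
        level = int(field @ weights)                     # integer in 0..15
        coeffs[i] = COEFF_MIN + (COEFF_MAX - COEFF_MIN) * level / (2 ** BITS_PER_COEFF - 1)
    if order == 1:
        coeffs[5:] = 0.0                 # first-order system: only five coefficients are used
    return order, coeffs

def random_population(size=30, seed=0):
    """Step 1 of the estimation algorithm: an initial generation of 30 chromosomes."""
    rng = np.random.default_rng(seed)
    return rng.integers(0, 2, size=(size, 1 + N_COEFFS * BITS_PER_COEFF))
```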

Fitness Function: Since our goal is to select the high-level process X that maximizes Eq. 9.5, we can use Eq. 9.5 as the fitness function.

High-level parameter estimation algorithm:

1. Generate the first generation, which consists of 30 chromosomes.

2. Apply the Metropolis algorithm for each chromosome on each image and then compute the fitness function as shown in Eq. 9.5.

3. If the fitness values of all chromosomes do not change from one population to the next, stop and select the chromosome that gives the maximum fitness value. (If two chromosomes give the same fitness value, we select the chromosome that represents the lower order system.) Otherwise go to step 2.

Using the results obtained by this algorithm, we repeat the estimation of the low-level and high-level processes. We stop when the difference between the current and previous parameters is small.

9.3 Applications

Lung cancer remains the leading cause of cancer mortality. In 1999, there were approximately 170,000 new cases of lung cancer [21]. The 5-year survival rate from the disease is 14% and has increased only slightly since the early 1970s, despite extensive and expensive research efforts to find effective therapy. The disparity in survival between early- and late-stage lung cancer is substantial, with a 5-year survival rate of approximately 70% in stage IA disease compared to less than 5% in stage IV disease, according to the recently revised lung cancer staging criteria [21]. The disproportionately high prevalence of, and mortality from, lung cancer has encouraged attempts to detect early lung cancer with screening programs aimed at smokers. Smokers have an incidence rate of lung cancer that is ten times that of nonsmokers and account for greater than 80% of lung cancer cases in the United States [21].

One in every 18 women and one in every 12 men develop lung cancer, making it the leading cause of cancer deaths. Early detection of lung tumors (visible on the chest film as nodules) may increase the patient's chance of survival. For this reason the Jewish Hospital designed a program for early detection. A number of lung cancer screening trials have been conducted in the United States, Japan, and Europe for the purpose of developing an automatic approach to tumor detection [21].

At the University of Louisville CVIP Lab, a long-term effort has been under way to develop a comprehensive image analysis system to detect and recognize lung nodules in low-dose chest CT (LDCT) scans. The LDCT scanning was performed with the following parameters: slice thickness of 8 mm reconstructed every 4 mm and a scanning pitch of 1.5. In the following section we highlight our approach for automatic detection and recognition of lung nodules; further details can be found in [22].

9.3.1 Lung Extraction

The goal of lung extraction is to separate the voxels corresponding to lung tissue from those belonging to the surrounding anatomical structures. We assume that each slice consists of two types of pixels: lung and other tissues (e.g., chest, ribs, and liver). The problem in lung segmentation is that some tissues in the lung, such as arteries, veins, bronchi, and bronchioles, have gray levels close to the gray level of the chest. Therefore, if we depend only on the gray level in this application, we lose some of the lung tissue during the segmentation process. Our proposed model, which estimates parameters for two processes (the high-level and low-level processes), is suitable for this application because it not only depends on the gray level but also takes into account the spatial clustering of pixels into regions.

We will apply the approach described in Section 9.2.4 to lung CT. Figure 9.4 shows a typical CT slice of the chest. We assume that each slice consists of two types of tissue: lung and other tissues (e.g., chest, ribs, and liver). As discussed above, we need to estimate parameters for both the low-level and high-level processes. Table 9.1 presents the results of applying the dominant Gaussian components extracting algorithm described in Section 9.2.5.1.


 

Table 9.1: Parameters estimated using the dominant Gaussian components extracting algorithm

Parameter    µ_I1     µ_I2      σ²_I1     σ²_I2     π_I1    π_I2
Value        59.29    139.97    177.15    344.29    0.25    0.758

 

 

 

 

 

 

 

 

 

 

Figure 9.5 shows the empirical density for the CT slice shown in Fig. 9.4 and the initial estimated density (which represents the two dominant Gaussian components in the given CT). The Levy distance between the two distribution functions corresponding to the densities shown in Fig. 9.5 is 0.09. This value is large, which means there is a mismatch between the empirical p_em(y) and p_I(y). Figure 9.6 shows the error and the absolute error between p_em(y) and p_I(y).

After we apply the sequential EM algorithm to |ζ(y)|, we find that the number of normal components representing |ζ(y)| is 10, as shown in Fig. 9.7. Figure 9.8 shows the estimated density for |ζ(y)|.

Figure 9.4: A typical slice from a chest spiral CT scan.


Figure 9.5: Empirical density for the given CT slice and the initial estimated density.

 

 

Figure 9.6: Error and absolute error between p_em(Y = y) and p_I(Y = y).


 

 

 

 

 

 

 

 

 

 

 

 

 

 

Figure 9.7: Conditional expectation Q(n) and the error function ε(n) versus the number of Gaussians approximating the scaled absolute deviation in Fig. 9.6.

Figure 9.8: Estimated density for |ζ(Y = y)|.