
Table 11.9: Mean percentage improvement in segmentation obtained when the mammogram is enhanced using the predicted enhancement method from each strategy, compared with segmenting the unenhanced image, for all breast types

| Strategy | Mean TP | Mean SUBTP | Total |
|---|---|---|---|
| Target expert | 1.00 | 2.00 | 1.20 |
| FUZZY expert | 0.11 | 4.25 | 0.79 |
| DNM | 0.08 | 1.16 | 0.24 |
| (A) BPM FBP26 | 0.20 | 3.75 | 0.78 |
| (B) BPM FBP316 | 0.13 | 3.66 | 0.72 |
| Types 1–3 (A); Type 4 (B) | 0.29 | 3.88 | 0.88 |
of the target optimal values from Table 11.4. Additionally, the table shows the result obtained by applying the FUZZY method to all images (given in Table 11.5, part (c)) over all four breast types. The last row of Table 11.9 shows the result of using the prediction from the BPM strategy with feature set FBP26 on breast types 1–3 and feature set FBP316 on type 4. From these results the following key observations are made:
1. Utility of contrast enhancement: Of the complete dataset of mammograms, 75% showed improved sensitivity following application of the expert contrast enhancement, compared with the unenhanced original images.
2. Target experts: Figure 11.5 highlighted that, given a set of contrast enhancement methods, different methods can be identified as target enhancement experts for different mammograms. This observation is the motivation for learning the optimal expert.
3. Characterizing a mammogram: Reviewing the results in Table 11.9, it can be seen that the DNM strategy performs poorly because it relies on characterizing a mammogram by a suspicious ROI. In contrast, the BPM strategy utilizes an image feature vector extracted from the breast, comprising an extensive set of features, and performs better.
4. The superior BPM approach: The modified BPM strategy based on breast type yields greater performance than simply using the FUZZY method. The result remains inferior to the target contrast enhancement baseline, indicating that learning the expert enhancement is a nontrivial problem. In implementing the modified BPM strategy, a mechanism for predicting the breast type is required.
5. Use of mammogram grouping knowledge: The BPM approach has been developed to utilize a priori knowledge describing the mammogram grouping, which indicates the mammographic breast density type. This knowledge determines the feature extraction method to be used, either FBP26 for breast types 1–3 or FBP316 for type 4. In the experimental results presented above, the target breast type was used.
11.4 Image Segmentation Layer
The image segmentation layer applies a number of image segmentation schemes and then adopts a mixture-of-experts model. In other words, on a per-pixel basis, a number of segmentation experts make classification decisions that are fused together. The decisions can be fused either with standard combination rules or with an adaptive scheme that determines appropriate combination weights from image properties. Our approach is based on the use of parametric models of image segmentation.
Gaussian mixture models (GMMs) have recently gained considerable prominence in the image segmentation literature, since a vast range of training data is available from which a priori information can be gathered. One of their key strengths is that such statistical models are underpinned by well-founded probability and information theory. Such approaches can be used in supervised or unsupervised modes. In addition, the output of such models is an a posteriori probability estimate, which can be used to tune the model to operate at a given point on the ROC curve. By expressing the result as an a posteriori probability, the outputs of several experts can also be combined within a unified framework. Finally, postprocessing of images is cheaper with statistical methods, since only those regions that contain suspicious pixels need further examination, as opposed to a region-based approach where all regions must be considered.
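As an illustration of operating-point selection, the posterior for the abnormal class can be thresholded instead of taking the arg max; sweeping the threshold traces out the ROC curve. This is a minimal sketch under our own naming (`posterior_abnormal`, the threshold value), not code from the chapter:

```python
import numpy as np

def segment_at_operating_point(posterior_abnormal: np.ndarray,
                               threshold: float = 0.3) -> np.ndarray:
    """Label a pixel abnormal when p(abnormal | x_n) exceeds the threshold.

    posterior_abnormal: H x W array of a posteriori probabilities from a GMM.
    Lowering the threshold raises sensitivity at the cost of false positives.
    """
    return (posterior_abnormal >= threshold).astype(np.uint8)
```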
The GMM approach does not consider the spatial arrangement of class labels in an image, which can be usefully exploited, for example through relaxation labeling [28]. Markov random fields (MRFs) have been shown to be a powerful class of techniques [29–31] for modeling the spatial arrangement of class labels. MRFs can be expressed within a probabilistic framework and combined with a statistical observation model of the mammogram. An MRF can increase the homogeneity of the formed regions, which leads to a reduction in false positives.
In this study we propose a weighted Gaussian mixture model (WGMM) for both supervised (WGMM_S) and unsupervised (WGMM_U) data analysis. A set of GMMs is constructed, each modeling a particular class distribution, and the set can be combined into a single unconditional density. We combine the WGMM with a hidden MRF model and propose two further approaches for the supervised (WGMMMRF_S) and unsupervised (WGMMMRF_U) modes. The four models or experts (WGMM_S, WGMM_U, WGMMMRF_S, and WGMMMRF_U) each produce a label for the test pixel. We use a number of different features, each forming the basis of a different expert and relying on one of the above four models for segmentation. The expert outputs can be combined using well-known expert combination methods. In this chapter we propose an adaptive weighted model (AWM) for the combination of the four experts and show that this new method of combination outperforms other popular methods.
11.4.1 Weighted Gaussian Mixture Models
A gray-scale image is represented as a 1-D array $X = \{x_1, x_2, \ldots, x_N\}$, where $x_n$ is an input feature for pixel $n$ and $N$ is the total number of pixels in the image. The input feature $x_n$ may be a $D$-dimensional vector or simply the gray-scale value of pixel $n$. Let the underlying true segmentation of the image be denoted as $Y = \{y_1, y_2, \ldots, y_N\}$. It is assumed that the number of classes is predetermined as a set of known class labels $\omega_l$, $l \in \{1, \ldots, L\}$, and therefore the class label of pixel $n$ is indicated as $y_n \in \{\omega_l\}_{l=1}^{L}$. A common assumption in modeling a density with a GMM for image segmentation is that each component $m$, $m \in \{1, \ldots, M\}$, models the pdf of one class, so that $M = L$. Let $\hat{y}_n$ represent the estimate of the segmentation. Each component is weighted by a coefficient $\gamma_{mn}$ that indicates the relationship of pixel $x_n$ to the class label $\omega_l$ modeled by component $m$. To ensure that the parameters of each component density are learnt correctly, the weight $\gamma_{mn}$ is set to indicate the class to which
data point $x_n$ belongs:

$$\gamma_{mn} = \begin{cases} 1 & \text{if } y_n = \omega_m \\ 0 & \text{otherwise} \end{cases}$$
If $\gamma_{mn} = 1$, then data point $x_n$ is considered only when setting the parameters of the class $\omega_l$ modeled by component $m$. Using the labelled training data, a maximum likelihood (ML) estimate of all component parameters and mixing coefficients can be found.
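Concretely, with the hard 0/1 weights above, the ML estimates reduce to per-class sample statistics. The following numpy sketch (our variable names, one Gaussian component per class) illustrates the supervised fit:

```python
import numpy as np

def fit_supervised_components(X, y, n_classes):
    """ML fit when gamma_mn is a 0/1 indicator: component m is estimated
    only from the training pixels labelled with class m."""
    N = len(X)
    priors, means, covs = [], [], []
    for m in range(n_classes):
        Xm = X[y == m]                         # pixels with gamma_mn = 1
        priors.append(len(Xm) / N)             # mixing coefficient
        means.append(Xm.mean(axis=0))          # component mean
        covs.append(np.cov(Xm, rowvar=False))  # component covariance
    return np.array(priors), np.array(means), np.array(covs)
```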
We first describe the two modes of test image segmentation, supervised and unsupervised, in section 11.4.2. We then detail our weighted GMM/MRF models in section 11.4.3.
11.4.2 Supervised and Unsupervised Test Image Segmentation
A test image to be segmented is represented in the same way as the training image, by a 1-D array $X$. For the test image, a 1-D array $\hat{Y} = \{\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_N\}$ is the estimate of the segmentation. We can now adopt one of two strategies for test image segmentation.
1. Supervised segmentation with GMM: Using the ML estimates of the parameter values obtained from the training images, a segmentation of the test image is performed. This is achieved by substituting the model parameters $\theta$ learnt during training when performing testing. The image is segmented by setting the class label estimate $\hat{y}_n$ of pixel $x_n$ to the one with the maximum estimate of the component-conditional probability:

$$\hat{y}_n = \arg\max_{m=1}^{M}\; p(y_n = m \mid x_n, \theta_m)$$
2. Unsupervised segmentation with GMM: This alternative approach assumes no a priori knowledge except for the number of classes in the image, which corresponds to the number of components in the GMM, $L = M$. Therefore the weight $\gamma_{mn} = 1$ for all pixels, indicating that every sample is considered as being generated from the distribution. Using the GMM-EM algorithm, an ML estimate of the parameter values is found. The segmentation can then be estimated from the GMM by extracting the component-conditional probabilities using the Bayes rule (see the sketch below).
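Both strategies can be sketched in a few lines; the supervised path applies the Bayes rule with parameters learnt from labelled training pixels, while the unsupervised path runs EM on the test image itself. This is an illustrative sketch using scipy and scikit-learn, not the authors' implementation:

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

def segment_supervised(X_test, priors, means, covs):
    """Strategy 1: Bayes rule with parameters fixed from training."""
    post = np.stack([p * multivariate_normal.pdf(X_test, mean=mu, cov=cov)
                     for p, mu, cov in zip(priors, means, covs)], axis=1)
    return post.argmax(axis=1)     # y_hat_n = most probable component

def segment_unsupervised(X_test, n_classes):
    """Strategy 2: EM on the test image; only L = M is assumed known."""
    gmm = GaussianMixture(n_components=n_classes).fit(X_test)
    return gmm.predict(X_test)     # arg max of component posteriors
```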
11.4.3 A Weighted GMM/MRF Model of Segmentation
A finite mixture model (FMM) [23, 27, 32] is defined as a linear combination of $M$ component-conditional densities $f(x \mid m, \theta_m)$, $m \in \{1, \ldots, M\}$, and $M$ mixing coefficients $f(m)$, of the form

$$f(x) = \sum_{m=1}^{M} f(m)\, f(x \mid m, \theta_m) \qquad (11.14)$$
such that the mixing coefficients $f(m)$ satisfy the constraints $\sum_{m=1}^{M} f(m) = 1$ and $0 \le f(m) \le 1$.
The WGMM framework comprises $L$ class densities, $l \in \{1, \ldots, L\}$, each modeled independently using a GMM of the form given in Eq. (11.14), and a set of mixing coefficients $p(\omega_l)$:

$$p(x) = \sum_{l=1}^{L} p(\omega_l)\, p(x \mid \omega_l, \Theta_l) \qquad (11.15)$$

The $l$th GMM estimates the class-conditional pdf $p(x \mid \omega_l, \Theta_l)$, itself another mixture model, for each data point for each class $\{\omega_l\}_{l=1}^{L}$. The vector $\Theta_l$ collects the $M$ component Gaussian parameters of the $l$th GMM, $\Theta_l = \{P_l(m), \mu_{lm}, \Sigma_{lm}\}$, $m \in \{1, \ldots, M\}$. Each estimate of the class-conditional pdf is mixed to model the overall unconditional density $p(x)$ using a mixing coefficient $p(\omega_l)$, identifying the contribution of the $l$th class density to the unconditional pdf.
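Equation (11.15) is thus a mixture of mixtures. A direct evaluation of the unconditional density, written to mirror the notation, is sketched below (`class_gmms` is our hypothetical container of per-class parameters, not a structure from the chapter):

```python
from scipy.stats import multivariate_normal

def wgmm_density(x, class_priors, class_gmms):
    """p(x) = sum_l p(omega_l) * sum_m P_l(m) * N(x; mu_lm, Sigma_lm)."""
    total = 0.0
    for p_wl, (weights, means, covs) in zip(class_priors, class_gmms):
        class_pdf = sum(w * multivariate_normal.pdf(x, mean=mu, cov=cov)
                        for w, mu, cov in zip(weights, means, covs))
        total += p_wl * class_pdf   # mix the class-conditional pdfs
    return total
```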
If it is assumed that the complete dataset $X \equiv \{x_1, \ldots, x_N\}$ of points $x_n$ is drawn independently from the distribution $f(x \mid \theta)$, then the joint occurrence of the whole dataset can be conveniently expressed as the log-likelihood:
$$\log \mathcal{L}(\Theta) = \sum_{n=1}^{N} \log p(x_n \mid \Theta) = \sum_{n=1}^{N} \log \sum_{l=1}^{L} \gamma_{nl}\, p(\omega_l)\, p(x_n \mid \omega_l, \Theta_l) \qquad (11.16)$$
Using a modified version of the expectation-maximization (EM) algorithm, as described below, we derive an ML estimate of the parameter values of each of the $L$ GMMs $\{\Theta_l\}_{l=1}^{L}$.
The general framework for parameter estimation in a GMM can be used to learn the parameters of the WGMM. Here the component-conditional densities appearing in Eq. (11.15) are themselves mixture models. In the EM algorithm, the update equations for the mixing coefficients do not depend on the functional particulars of the component densities. Hence, the mixing coefficients of the WGMM are updated according to
$$P^{\text{new}}(\omega_l) = \frac{1}{N} \sum_{n=1}^{N} p^{\text{old}}(\omega_l \mid x_n, \Theta_l^{\text{old}}) \qquad (11.17)$$
The M-step involves maximizing the auxiliary function with respect to the parameters $\{\Theta_l\}_{l=1}^{L}$. The auxiliary function can be written as
$$Q(\Theta^{\text{new}}, \Theta^{\text{old}}) = \sum_{n=1}^{N} \sum_{l=1}^{L} p^{\text{old}}(\omega_l \mid x_n, \Theta_l^{\text{old}}) \log\left[P^{\text{new}}(\omega_l)\, p^{\text{new}}(x_n \mid \omega_l, \Theta_l^{\text{new}})\right] \qquad (11.18)$$

where

$$p^{\text{new}}(x_n \mid \omega_l, \Theta_l^{\text{new}}) = \sum_{m=1}^{M} p^{\text{new}}(m_l)\, p^{\text{new}}(x_n \mid m_l, \theta_{ml}^{\text{new}}) \qquad (11.19)$$
Writing $\gamma_{nl} = p^{\text{old}}(\omega_l \mid x_n, \Theta_l^{\text{old}})$, the auxiliary function can be written as the sum of $L$ auxiliary functions, one for each mixture model:
$$Q(\Theta^{\text{new}}, \Theta^{\text{old}}) = \sum_{n=1}^{N} \sum_{l=1}^{L} \gamma_{nl} \log\left[P^{\text{new}}(\omega_l)\, p^{\text{new}}(x_n \mid \omega_l, \Theta_l^{\text{new}})\right] \qquad (11.20)$$

$$Q(\Theta^{\text{new}}, \Theta^{\text{old}}) = \sum_{l=1}^{L} \hat{Q}_l(\Theta_l^{\text{new}}, \Theta_l^{\text{old}}) \qquad (11.21)$$

where

$$\gamma_{nl} = p(\omega_l \mid x_n, \Theta_l) = \frac{p(x_n \mid \omega_l, \Theta_l)\, P(\omega_l)}{\sum_{j=1}^{L} p(x_n \mid \omega_j, \Theta_j)\, P(\omega_j)} \qquad (11.22)$$

and

$$\hat{Q}_l(\Theta_l^{\text{new}}, \Theta_l^{\text{old}}) = \sum_{n=1}^{N} \gamma_{nl} \log\left[P^{\text{new}}(\omega_l)\, p^{\text{new}}(x_n \mid \omega_l, \Theta_l^{\text{new}})\right] \qquad (11.23)$$
The procedure for maximizing the overall likelihood of a WGMM is outlined in Algorithm 1. It consists of an outer EM loop in which $L$ inner EM loops are nested. Each time the outer loop is traversed, the mixing weights $p(\omega_l)$ are updated according to Eq. (11.17), and the $L$ inner loops are iterated to update the mixing weights $P_l(m)$, means $\mu_{lm}$, and covariances $\Sigma_{lm}$ of each of the components. It should be noted that it is not necessary to iterate the inner loops to convergence on each outer EM step, since it is only necessary to increase the auxiliary function at each step to guarantee an increase in the overall likelihood.
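The nested loop structure can be summarized in pseudocode-style Python. The helpers `e_step` (Eq. 11.22) and `update_class_gmm` (inner EM update of $P_l(m)$, $\mu_{lm}$, $\Sigma_{lm}$) are hypothetical placeholders; this sketch fixes only the loop structure and is not the chapter's Algorithm 1 verbatim:

```python
def wgmm_em(X, class_priors, class_gmms, n_outer=20, n_inner=1):
    """Outer EM loop updates p(omega_l); L nested inner EM loops refine
    each class's GMM. One inner iteration per outer step suffices
    (a generalized EM step): Q need only increase, not be maximized."""
    for _ in range(n_outer):
        gamma = e_step(X, class_priors, class_gmms)  # gamma_nl, Eq. (11.22)
        class_priors = gamma.mean(axis=0)            # Eq. (11.17)
        for l in range(len(class_gmms)):             # the L inner loops
            for _ in range(n_inner):
                class_gmms[l] = update_class_gmm(X, gamma[:, l],
                                                 class_gmms[l])
    return class_priors, class_gmms
```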
The update equations for the means and covariances in the GMM-EM algorithm remain unchanged. The MRF-MAP estimate is combined in the conditional density function $p^{\text{old}}(\omega_l \mid x_n, \Theta_l^{\text{old}})$ as
$$\gamma_{nl} = p(\omega_l \mid x_n, \Theta_l) = \frac{p(x_n \mid \omega_l, \Theta_l)\, p(y_n = \omega_l \mid \mathcal{N}_n)}{\sum_{j=1}^{M} p(x_n \mid \omega_j, \Theta_j)\, p(y_n = \omega_j \mid \mathcal{N}_n)} \qquad (11.25)$$

where $\mathcal{N}_n$ denotes the neighborhood of pixel $n$.
The WGMMMRF-EM algorithm is used to determine the ML estimates of the parameter values by iterating the WGMM-EM algorithm while constraining the density estimation with the hidden MRF model. For supervised learning, the labelled training data is used to initialize the WGMM and WGMMMRF models, giving WGMM_S and WGMMMRF_S; no training data is used in the unsupervised case, giving WGMM_U and WGMMMRF_U.
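As one concrete realization of the spatial prior in Eq. (11.25), a Potts-style model can score each label by how many 4-connected neighbours currently carry it. The sketch below is our construction (the smoothness parameter `beta` is illustrative), not necessarily the clique potentials used by the authors:

```python
import numpy as np
from scipy.ndimage import convolve

def mrf_label_prior(labels, n_classes, beta=1.5):
    """Potts-style p(y_n = omega_l | neighbourhood), proportional to
    exp(beta * count of 4-connected neighbours labelled l)."""
    kernel = np.array([[0, 1, 0],
                       [1, 0, 1],
                       [0, 1, 0]], dtype=float)
    counts = np.stack([convolve((labels == l).astype(float), kernel,
                                mode="constant") for l in range(n_classes)],
                      axis=-1)
    prior = np.exp(beta * counts)
    return prior / prior.sum(axis=-1, keepdims=True)  # normalize per pixel

# In Eq. (11.25) this prior multiplies the class-conditional likelihoods,
# and the products are renormalized per pixel to give gamma_nl.
```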
11.4.4 Combination of Image Segmentation Experts
In the previous section we developed four new models of image segmentation and mentioned the use of different experts based on different texture features that rely on them. It is beneficial to fuse the decisions of the different experts on a per-pixel basis. In this section we first detail the conventional strategy of classifier decision combination, the "ensemble-based combination rules," and then propose a novel strategy for combining expert outputs, the "adaptive weighted model (AWM)." We begin by describing a generic framework of combination, and then discuss the combination strategies within that framework.
11.4.4.1 Expert Combination Framework and Nomenclature
The image to be segmented can be represented as a 1-D array $X = \{x_1, \ldots, x_N\}$, where $x_n$ is an input feature for pixel $n$ and $N$ is the total number of pixels in the image. Let the estimate of the segmentation be denoted by the array $\hat{Y} = \{\hat{y}_1, \ldots, \hat{y}_N\}$. It is assumed that the number of classes is predetermined from a set of known class labels $\omega_l$, $l \in \{1, \ldots, L\}$, and therefore the estimated class label of pixel $n$ is indicated as $\hat{y}_n = \omega_l$.
We assume that there are $R$ image segmentation experts, where the $r$th expert provides a segmentation decision for a given pixel feature $x_n$ from a set of learnt parameter vectors $\theta_r$. Using a WGMM expert, the parameter vector $\theta_r$ of each expert is defined as a set of component mixing coefficients $P_l(m)$, means $\mu_{lm}$, and covariances $\Sigma_{lm}$.
Majority voting (MV):

$$p_{\text{MV}}(\hat{y}_n = \omega_l \mid x_n, \theta_1, \ldots, \theta_R) = \frac{1}{R} \sum_{r=1}^{R} \Delta_{lr}$$

where

$$\Delta_{lr} = \begin{cases} 1 & \text{if } p(\hat{y}_n = \omega_l \mid x_n, \theta_r) = \max_{j=1,\ldots,L} p(\hat{y}_n = \omega_j \mid x_n, \theta_r) \\ 0 & \text{otherwise} \end{cases}$$

that is, expert $r$ casts a vote $\Delta_{lr} = 1$ for the class it finds most probable at pixel $n$.
The above combination rules have been used in several studies and form the basis of our baseline comparison.
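For reference, with the expert posteriors stacked into an array of shape R × N × L (experts × pixels × classes), the baseline rules reduce to a few array operations; the following sketch assumes that layout (our convention, not the chapter's):

```python
import numpy as np

def mean_rule(posteriors):
    """Sum/mean rule: average the R expert posteriors, then arg max."""
    return posteriors.mean(axis=0).argmax(axis=-1)

def majority_vote(posteriors):
    """Each expert votes for its most probable class per pixel."""
    votes = posteriors.argmax(axis=-1)            # R x N expert decisions
    n_classes = posteriors.shape[-1]
    counts = np.stack([(votes == l).sum(axis=0)   # votes per class
                       for l in range(n_classes)], axis=-1)
    return counts.argmax(axis=-1)                 # ties -> lowest label
```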
11.4.4.3 Adaptive Weighted Model (AWM) Classifier Combination
In our proposed approach, the expert decisions are modeled as a probability density function. In a linear opinion pool of $R$ experts, assume that the $r$th segmentation expert provides an estimate of the a posteriori probability:
$$p(\hat{y}_n \mid r, x_n) = p(\hat{y}_n \mid x_n, \theta_r), \quad n = 1, \ldots, N \qquad (11.28)$$
We assume that accompanying this pdf is a linear weight or mixing coefficient $p(r)$, indicating the contribution of the $r$th expert to the joint pdf $p(\hat{y} \mid x, \Lambda)$ resulting from the combination of experts. The vector $\Lambda$ is the complete set of parameters describing the combined pdf. Hence, following the expert combination, the complete pdf can be written as
$$p(\hat{y}_n \mid x_n, \Lambda) = \sum_{r=1}^{R} p(r)\, p(\hat{y}_n \mid r, x_n) \qquad (11.29)$$
given that the mixing coefficients satisfy the constraints $\sum_{r=1}^{R} p(r) = 1$ and $0 \le p(r) \le 1$. If we treat the weighted contribution of each expert to the unconditional distribution as a probability, then statistical models such as the mixture of experts (MOE) framework [34] can be trained to learn the individual classifier and weight contribution distributions. For this we propose using a GMM trained with the EM algorithm. We now present a method for identifying the weights in a probabilistic manner, motivated by the MOE framework. Our proposed approach differs from the conventional MOE method in two ways: (i) first, the a posteriori pdf from each segmentation expert remains fixed, having been generated during segmentation; (ii) second, the mixing coefficients for