Kluwer Handbook of Biomedical Image Analysis, Vol. 2
A Knowledge-Based Scheme for Digital Mammography
each expert, p(r), are determined in an unsupervised manner through statistical methods.
11.4.4.3.1 Maximum Likelihood Solution. The mixing coefficient parameter values for each expert can be determined using the ML principle by forming a likelihood function. Assume that we have the complete dataset, $\psi$, of combined decisions from segmentation experts for each data point, where $\psi = \{\hat{y}_1, \ldots, \hat{y}_N\}$, and that it is drawn independently from the complete distribution $p(\hat{y} \mid x, \Theta)$. Then the joint occurrence of the whole dataset is given as
$$p(\psi \mid \Theta) = \prod_{n=1}^{N} \sum_{r=1}^{R} p(r)\, p(\hat{y}_n \mid r, x_n) \equiv \zeta(\Theta) \tag{11.30}$$
For simplicity, the above likelihood function can be rewritten and expressed as a log likelihood as follows:
$$\log \zeta(\Theta) = \sum_{n=1}^{N} \log p(\hat{y}_n \mid \Theta) \equiv \sum_{n=1}^{N} \log \sum_{r=1}^{R} p(r)\, p(\hat{y}_n \mid r, x_n) \tag{11.31}$$
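As a minimal sketch of how the log likelihood of Eq. (11.31) might be evaluated numerically, the function below assumes the expert outputs $p(\hat{y}_n \mid r, x_n)$ are stored in an N-by-R array. The names `awm_log_likelihood`, `expert_probs`, and `weights` are illustrative and do not come from the text.

```python
import numpy as np

def awm_log_likelihood(expert_probs, weights):
    """Evaluate the log likelihood of Eq. (11.31).

    expert_probs[n, r] holds p(y_n | r, x_n) (assumed strictly positive);
    weights[r] holds the mixing coefficient p(r).
    """
    expert_probs = np.asarray(expert_probs, dtype=float)
    weights = np.asarray(weights, dtype=float)
    mixture = expert_probs @ weights        # inner sum over experts r, one value per n
    return float(np.sum(np.log(mixture)))   # outer sum over data points n

# Two data points, two experts, equal weights: log(0.75) + log(0.3)
value = awm_log_likelihood([[0.9, 0.6], [0.2, 0.4]], [0.5, 0.5])
```

Working in the log domain keeps the product over N data points in Eq. (11.30) from underflowing when N is large.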

For the above equation, it is not possible to find the ML estimate of the parameter values directly, because $\partial \log \zeta(\Theta) / \partial \Theta = 0$ cannot be solved [23]. Our approach to maximising the likelihood $\log \zeta(\Theta)$ is based on the EM algorithm proposed in the context of missing data estimation [35].
11.4.4.3.2 AWM Parameter Estimation Using EM Algorithm. The EM algorithm attempts to maximize an estimate of the log likelihood that expresses the expected value of the complete data log likelihood conditional on the data points. By evaluating an auxiliary function, Q, in the E-step, an estimate of the log likelihood can be iteratively maximized using a set of update equations in the M-step. Using the AWM likelihood function from Eq. (11.30), the auxiliary function for the AWM is defined as
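$$Q(\Theta^{\mathrm{new}}, \Theta^{\mathrm{old}}) = \sum_{n=1}^{N} \sum_{r=1}^{R} p^{\mathrm{old}}(r \mid \hat{y}_n) \log\bigl( p^{\mathrm{new}}(r)\, p(\hat{y}_n \mid r, x_n) \bigr) \tag{11.32}$$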
It should be noted that the a posteriori estimate $p(\hat{y}_n \mid r, x_n)$ for the nth data point from the rth segmentation expert remains fixed. The conditional density function $p^{\mathrm{old}}(r \mid \hat{y}_n)$ is computed using the Bayes rule as
$$p^{\mathrm{old}}(r \mid \hat{y}_n) = \frac{p(\hat{y}_n \mid r, x_n)\, p(r)}{\sum_{j=1}^{R} p(\hat{y}_n \mid j, x_n)\, p(j)} \tag{11.33}$$
Singh and Bovis 
In order to maximize the estimate of the likelihood function given by the auxiliary function, update equations are required for the mixing coefficients. These can be obtained by differentiating Q with respect to the parameters and setting the result equal to zero. For the AWM, the update equations are taken from [27]. For the rth segmentation expert
$$p^{\mathrm{new}}(r) = \frac{1}{N} \sum_{n=1}^{N} p^{\mathrm{old}}(r \mid \hat{y}_n) \tag{11.34}$$
The complete AWM algorithm is shown below.
Algorithm 2: AWM ALGORITHM
1. Initialise: Set p(r) = 1/R.
2. Iterate: Perform the E-step and M-step until the change in the Q function, Eq. (11.32), between iterations is less than some convergence threshold AVMconverge = 25.
3. EM E-step:
   (a) Compute $p^{\mathrm{old}}(r \mid \hat{y}_n)$ using Eq. (11.33).
   (b) Evaluate the Q function, the expectation of the log likelihood of the complete training data samples given the observation, $x_n$, and the current estimate of the parameters, using Eq. (11.32).
4. EM M-step: This consists of maximising Q with respect to each parameter in turn:
   (a) The new estimate of the segmentation expert weightings for the rth component, $p^{\mathrm{new}}(r)$, is given by Eq. (11.34).
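The steps of Algorithm 2 can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the authors' implementation: the function name `awm_em`, the array layout, the tolerance `tol`, the iteration cap `max_iter`, and the guard value `eps` are all assumptions made here, and the tolerance is not the threshold quoted in the algorithm.

```python
import numpy as np

def awm_em(expert_probs, tol=1e-6, max_iter=500, eps=1e-12):
    """Estimate the mixing coefficients p(r) by EM.

    expert_probs[n, r] holds the fixed estimate p(y_n | r, x_n).
    tol, max_iter and eps are illustrative choices, not values from the text.
    """
    expert_probs = np.asarray(expert_probs, dtype=float)
    n_points, n_experts = expert_probs.shape
    weights = np.full(n_experts, 1.0 / n_experts)   # step 1: p(r) = 1/R
    previous_q = -np.inf
    for _ in range(max_iter):
        # E-step: posterior p_old(r | y_n) by Bayes rule
        joint = expert_probs * weights
        resp = joint / (joint.sum(axis=1, keepdims=True) + eps)
        # M-step: new weight is the mean posterior of each expert
        weights = resp.mean(axis=0)
        # Auxiliary function Q evaluated at the updated weights
        q = float(np.sum(resp * np.log(expert_probs * weights + eps)))
        if abs(q - previous_q) < tol:               # step 2: stop on small change in Q
            break
        previous_q = q
    return weights

# Expert 0 assigns higher probability to every observed decision,
# so its weight should grow at the expense of expert 1.
w = awm_em([[0.9, 0.1], [0.8, 0.2], [0.7, 0.3]])
```

Because the expert outputs stay fixed, each iteration only re-weights the experts; no expert is retrained inside the loop.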
11.4.4.3.3 Estimating the A Posteriori Probability. Using the AWM combination strategy in mammographic CAD, a posteriori estimates are required for each data point following the experts' combination (one for the normal and one for the suspicious class). To determine these estimates, the AWM model is computed for the first class, thereby obtaining the a posteriori estimate $p(\hat{y}_n = \omega_1 \mid x_n, \Theta)$. From this, the estimate of the second class is determined as $p(\hat{y}_n = \omega_2 \mid x_n, \Theta) = 1 - p(\hat{y}_n = \omega_1 \mid x_n, \Theta)$. We now proceed to the results section to evaluate our novel contributions of weighted GMM segmentation experts and the novel AWM combination strategy.
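A short sketch of the two-class estimate follows. It assumes, in line with the mixture form of Eq. (11.30), that the combined estimate for the first class is the weighted sum of the experts' estimates. The function and argument names are hypothetical.

```python
import numpy as np

def awm_class_posteriors(expert_posteriors_class1, weights):
    """Combine expert estimates for class 1 and derive class 2 as the complement.

    expert_posteriors_class1[n, r]: estimate for class 1 from expert r at point n.
    weights[r]: mixing coefficient p(r), assumed to sum to 1.
    """
    experts = np.asarray(expert_posteriors_class1, dtype=float)
    p_class1 = experts @ np.asarray(weights, dtype=float)
    p_class2 = 1.0 - p_class1
    return p_class1, p_class2

# One data point, two experts: 0.25 * 0.8 + 0.75 * 0.6 = 0.65
p1, p2 = awm_class_posteriors([[0.8, 0.6]], [0.25, 0.75])
```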
11.4.5 Results of Applying Image Segmentation Expert Combination
The aims of our experiments were (i) to compare the four proposed models of image segmentation, and (ii) to compare the performance of the AWM combination strategy against the ensemble combination rules. The baseline comparison with a simple GMM-based image segmentation and an MRF model in [18] shows that our proposed models outperform the baseline models. Section 11.4.5.1 compares the four models on the two databases, and Section 11.4.5.2 compares the AWM approach with the ensemble combination rules approach on the two databases.
Our segmentation performance evaluation is performed on 400 mammograms selected from the DDSM. The first 200 mammograms contain lesions and the remaining 200 mammograms are normal (used only for training purposes). Each of these mammograms has been categorized into one of the four groups representing different breast density, such that each category has 100 mammograms. The partitioning of the mammograms has been performed manually on the basis of the target breast density according to the DDSM ground truth. The results will be reported in terms of the A_z value, which represents the area under the ROC curve, as well as sensitivity (the segmentation evaluation for testing is based on ground-truth information as given in DDSM).
The grouping of mammograms by breast density is applicable only to the supervised approaches. Supervised approaches segmenting a mammogram with a specific breast density type use a trained observed intensity model constructed with only training samples from that breast type. Thus, each trained observed intensity model will be specialized in the segmentation of a mammogram with a specific breast type. We adopt a five-fold cross-validation strategy. Using this procedure, a total of five training and testing trials are conducted, and each time the data appearing in training do not appear in testing. For each of the five folds, equal numbers of normal and suspicious pixels are used to represent training examples from their respective classes. These sample pixels are randomly sampled from the training images. In the unsupervised case, there is no concept of training and testing and each image is treated individually.

Table 11.10: Mean A_z for each breast type and segmentation strategy.

Breast type   WGMM^S   WGMM^S_MRF   WGMM^U   WGMM^U_MRF
1             0.68     0.70         0.66     0.59
2             0.66     0.66         0.66     0.60
3             0.72     0.80         0.75     0.75
4             0.66     0.76         0.68     0.74
Mean          0.68     0.73         0.68     0.67

Winning strategies are given in bold.
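The five-fold protocol with balanced pixel sampling described in this section might look as follows. This sketch assumes the folds are formed over images and that pixels are drawn without replacement within a class; neither detail is stated in the text, and all function names are illustrative.

```python
import numpy as np

def five_fold_splits(n_images, n_folds=5, seed=0):
    """Yield (train, test) index arrays so that no image is in both."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_images)
    folds = np.array_split(order, n_folds)
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        yield train, test

def balanced_pixel_sample(normal_pixels, suspicious_pixels, rng):
    """Draw equal numbers of normal and suspicious pixels at random."""
    n = min(len(normal_pixels), len(suspicious_pixels))
    normal = rng.choice(normal_pixels, size=n, replace=False)
    suspicious = rng.choice(suspicious_pixels, size=n, replace=False)
    return normal, suspicious

splits = list(five_fold_splits(50))
rng = np.random.default_rng(1)
normal, suspicious = balanced_pixel_sample(np.arange(100), np.arange(30), rng)
```

Balancing the two classes keeps the much larger pool of normal pixels from dominating the trained intensity model.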
11.4.5.1 Comparison of the Four Models
(WGMM^S, WGMM^U, WGMM^S_MRF, and WGMM^U_MRF)
A cross-validation approach is used to determine the optimal number of component Gaussians, m, for each breast type. The determined value of m is then used for all training folds comprising each breast type. To determine the optimal value of m, models with a different number of components are trained and evaluated with a WGMM^S strategy, using an independent validation set. Model fitness is quantified by examining the log likelihood resulting from the validation set. Training files are created by taking 200 samples randomly drawn with replacement from each normal and abnormal image for each breast type. For training we use 50 training images per breast type (n = 25 normal, n = 25 abnormal), giving a training size of 10,000 samples per breast type. Repeating the procedure for the 50 remaining validation images per breast type, we get 10,000 samples for validation.
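The sampling arithmetic above (50 images times 200 samples gives 10,000) can be illustrated with a small sketch. The function name and the representation of each image as a 1-D array of pixel values are assumptions made for illustration only.

```python
import numpy as np

def draw_training_samples(images, samples_per_image=200, seed=0):
    """Draw samples_per_image pixels with replacement from every image."""
    rng = np.random.default_rng(seed)
    drawn = [
        rng.choice(np.asarray(image), size=samples_per_image, replace=True)
        for image in images
    ]
    return np.concatenate(drawn)

# 50 synthetic "images" of 30 pixels each stand in for one breast type.
synthetic_images = [np.arange(30) + i for i in range(50)]
samples = draw_training_samples(synthetic_images)
```

Sampling with replacement means an image with fewer than 200 usable pixels still contributes the full 200 samples.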
In our evaluation procedure the aim is to determine the number of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN) in order to plot the ROC curve. How each segmented region is assigned to one of these classes is detailed in [18]. The results are shown in Table 11.10, grouped on the basis of breast density. The supervised strategy with MRF achieves the highest mean A_z (0.73) and is the best or joint-best strategy for every breast type. Interestingly, the performance of this method is superior for denser images compared to fatty ones. A simple explanation for this phenomenon could be based on the model order selection where m = 1 for the abnormal class of the fatty breast types. A more sophisticated approach to determining model order might improve the segmentation of these breast types. Without the hidden MRF model, the supervised strategy is inferior to the unsupervised approach on the denser breasts.
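To make the link between TP/FP/TN/FN counts and A_z concrete, the sketch below builds an ROC curve by sweeping a threshold over scores and integrates it with the trapezoidal rule. It is a generic illustration that assumes per-item scores and binary labels with both classes present; the region-based counting rules of [18] are not reproduced here, and the function names are hypothetical.

```python
import numpy as np

def roc_points(scores, labels):
    """Return (fpr, tpr) arrays from thresholding scores at each unique value."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    thresholds = np.sort(np.unique(scores))[::-1]
    fpr, tpr = [0.0], [0.0]
    for t in thresholds:
        predicted = scores >= t
        tp = np.sum(predicted & labels)
        fp = np.sum(predicted & ~labels)
        fn = np.sum(~predicted & labels)
        tn = np.sum(~predicted & ~labels)
        tpr.append(tp / (tp + fn))   # sensitivity
        fpr.append(fp / (fp + tn))   # 1 - specificity
    return np.array(fpr), np.array(tpr)

def area_under_curve(fpr, tpr):
    """Trapezoidal estimate of the area under the ROC curve (A_z)."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

fpr, tpr = roc_points([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
a_z = area_under_curve(fpr, tpr)
```

An A_z of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which gives a scale for reading Table 11.10.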
11.4.5.2 Comparison of Combination Strategies: Ensemble Combination Rules vs. AWM
In order to develop a number of experts that can be combined, we extract different grayscale and texture data per pixel in the images. The grayscale values of the pixels are intensity values, and texture features are extracted from the pixel neighborhood. The following table shows the experts used in our analysis and the feature space on which each is based. Each expert can be implemented with one of the four segmentation models described earlier.
Expert   Description of pixel feature space                            Dimensionality
gray     Original gray scale                                           1
enh      Contrast enhanced gray scales                                 1
dwt1     Wavelet coefficients from {D^1_LH, D^1_HH, D^1_HL, S^1_LL}    4
dwt2     Wavelet coefficients from {D^2_LH, D^2_HH, D^2_HL, S^2_LL}    4
dwt3     Wavelet coefficients from {D^3_LH, D^3_HH, D^3_HL, S^3_LL}    4
laws1    Laws coefficients from E5 impulse response matrix             5
laws2    Laws coefficients from L5 impulse response matrix             5
laws3    Laws coefficients from R5 impulse response matrix             5
laws4    Laws coefficients from W5 impulse response matrix             5
laws5    Laws coefficients from S5 impulse response matrix             5
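One plausible reading of the five-dimensional Laws experts is that each pairs one 1-D Laws vector with all five vectors to form five 5 × 5 masks. The sketch below builds the masks under that assumption, which the text does not confirm. The 1-D vectors are the standard Laws kernels; the function name is hypothetical.

```python
import numpy as np

# Standard 1-D Laws kernels: level, edge, spot, wave, ripple.
LAWS_VECTORS = {
    "L5": np.array([1, 4, 6, 4, 1]),
    "E5": np.array([-1, -2, 0, 2, 1]),
    "S5": np.array([-1, 0, 2, 0, -1]),
    "W5": np.array([-1, 2, 0, -2, 1]),
    "R5": np.array([1, -4, 6, -4, 1]),
}

def laws_masks_for(first):
    """Build the five 5x5 masks that pair `first` with each Laws vector."""
    column = LAWS_VECTORS[first]
    return {
        first + name: np.outer(column, row)
        for name, row in LAWS_VECTORS.items()
    }

e5_masks = laws_masks_for("E5")   # keys: E5L5, E5E5, E5S5, E5W5, E5R5
```

Convolving an image with each of the five masks would then give one five-dimensional feature vector per pixel, matching the dimensionality listed in the table.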



We now present the results on 200 test mammograms that contain lesions. The details of the training and testing scheme are the same as detailed in Section 11.4.2. As we mentioned earlier, each breast is classified as one of the four types (1, predominantly fat; 2, fat with fibroglandular tissue; 3, heterogeneously dense; and 4, extremely dense) and the results are presented for data from each type. Table 11.11 shows the test results on sensitivity of the
[Figure 11.7 flow diagram, one parallel flow per breast type 1 to 4: segmented images -> suspicious regions in image of breast type -> region prefiltering (area threshold) -> feature extraction -> PCA -> trained classifier -> final image with false positives removed.]
Figure 11.7: Schematic overview of false-positive reduction strategy within the adaptive knowledge-based model.
11.5 A Framework for the Reduction of False-Positive Regions
This section describes the approach used within the adaptive knowledge-based model for the reduction of false-positive regions. Figure 11.7 shows a schematic overview of the approach adopted. Using the breast type grouping predicted by the breast classification component, a segmented mammogram is directed to one of four process flows. Each process flow, shown in Fig. 11.7, comprises the same functionality. This is discussed in more detail in the following subsections.
11.5.1 Postprocessing Steps for Filtering Out False Positives
11.5.1.1 Region Prefiltering
Feature extraction is computationally expensive. A common strategy [6, 7, 36] for reducing the number of regions considered for false-positive reduction is to apply a size test. By eliminating suspicious regions smaller than a predefined threshold T_area, the number of false-positive regions can be reduced. For the expert radiologist interpreting a film mammogram during screening, it is common to disregard any suspicious ROI less than 8 mm in diameter [37]. In mammographic CAD with computer automation, the size threshold is reduced and a common value for T_area is the number of pixels corresponding to an area of 16 mm² [6, 7, 36]. In the adaptive knowledge-based model, the area threshold is set at 19.5 mm², corresponding to a region diameter of 5 mm, for all breast type groupings. The DDSM mammograms used in this evaluation are digitized such that each pixel is 50 µm. Following subsampling by a factor of four, an area threshold of 19.5 mm² is equivalent to T_area = 122 pixels; thus any suspicious region following segmentation with an area less than this value is marked as normal.
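The conversion from a physical area to a pixel count, and the size test itself, can be sketched as below. The effective pixel spacing is left as a parameter because the text does not state it directly. The value of 0.4 mm used in the example is an assumption chosen because it reproduces the quoted T_area = 122; a spacing of 0.2 mm (50 µm times four) would give roughly four times as many pixels, so the spacing should be checked against the subsampling actually used. Function names are illustrative.

```python
def area_threshold_pixels(area_mm2, pixel_spacing_mm):
    """Number of pixels that cover area_mm2 at the given pixel spacing."""
    return round(area_mm2 / (pixel_spacing_mm ** 2))

def prefilter_regions(region_areas_px, t_area_px):
    """Keep-mask for regions: False marks a region as normal (too small)."""
    return [area >= t_area_px for area in region_areas_px]

# Assumed effective spacing of 0.4 mm per pixel after subsampling.
t_area = area_threshold_pixels(19.5, 0.4)
keep = prefilter_regions([50, 122, 300], t_area)
```

Only the regions that pass this test go on to feature extraction, which is where the computational saving comes from.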
11.5.1.2 Feature Extraction
Features are extracted to characterise a segmented region in the mammogram. Feature vectors from masses are assumed to differ from those of normal tissue, and based on a collection of examples from several subjects, a system can be trained to differentiate between them. The main aim is that features should be sensitive and accurate for reducing false positives. Typically a set or vector of features is extracted for a given segmented region.
From the pixels that comprise each suspicious ROI passing the prefiltering size test described above, a subset of gray-scale, textural, and morphological features used in previous mammographic studies is extracted. The features extracted are summarized in Table 11.14.
Table 11.14: Summary of features extracted by feature grouping, giving 316 features in total

Grouping        Type        Description                                                      Number
Gray scale      Histogram   Mean, variance, skewness, kurtosis, and entropy.                 5
Textural        SGLD        From SGLD matrices constructed in 5 different directions and     15 × 15
                            3 different distances, 15 features [38, 39] are extracted.
                Laws        Texture energy [6] extracted from 25 mask convolutions.          5 × 5
                DWT         From DWT coefficients of 4 subbands at 3 scales the following    4 × 12
                            statistical features are extracted: mean, standard deviation,
                            skewness, kurtosis.
                Fourier     Spectral energy from 10 Fourier rings.                           10
                Fractal     Fractal dimension feature.                                       1
Morphological   Region      Circularity [4], area.                                           2
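As an illustration of the simplest row of Table 11.14, the five gray-scale histogram features might be computed as follows. The exact definitions are not given in the text, so the choices here are assumptions: population moments, non-excess kurtosis, and Shannon entropy in bits over a 256-bin histogram. The function name is hypothetical.

```python
import numpy as np

def histogram_features(pixels, n_bins=256):
    """Mean, variance, skewness, kurtosis and entropy of a region's gray levels."""
    x = np.asarray(pixels, dtype=float).ravel()
    mean = x.mean()
    variance = x.var()
    std = np.sqrt(variance)
    if std > 0:
        z = (x - mean) / std
        skewness = float(np.mean(z ** 3))
        kurtosis = float(np.mean(z ** 4))   # non-excess kurtosis
    else:
        skewness = kurtosis = 0.0           # constant region
    counts, _ = np.histogram(x, bins=n_bins)
    p = counts[counts > 0] / x.size
    entropy = float(-np.sum(p * np.log2(p)))
    return {
        "mean": float(mean),
        "variance": float(variance),
        "skewness": skewness,
        "kurtosis": kurtosis,
        "entropy": entropy,
    }

features = histogram_features([1, 1, 3, 3])
```

The group sizes in the table sum to the stated total: 5 + 225 + 25 + 48 + 10 + 1 + 2 = 316.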




11.5.1.3 Principal Component Analysis
The result of feature extraction is a 316-dimensional feature vector describing various gray-scale histogram, textural, and morphological characteristics of each region. The curse of dimensionality [27] is a serious constraint in many pattern recognition problems, and to maintain classification performance, the dimensionality of the input feature space must be kept to a minimum. This is especially important when using an ANN classifier, to maintain a desired level of generalization [32]. Principal component analysis (PCA) is a technique to map data from a high-dimensional space into a lower one and is used here for such a purpose.
To use PCA in the adaptive knowledge-based model in an unbiased way, the PCA coefficients, comprising eigenvalues and eigenvectors, are determined from an independent training set. In mapping to a lower dimensionality, only eigenvalues ≥ 1.0 are considered, and the eigenvectors from training are applied to a testing pattern. Testing and training folds are formed using 10-fold cross-validation [32] such that an unbiased PCA transformation can be obtained for each testing sample.
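The train-then-apply use of PCA described above can be sketched as follows. The sketch assumes the features are standardized with training statistics before the eigendecomposition, since the eigenvalue ≥ 1.0 rule is usually applied to a correlation matrix; the text does not say whether this was done. Function names are illustrative.

```python
import numpy as np

def fit_pca(train, min_eigenvalue=1.0):
    """Fit PCA on training data only; keep components with eigenvalue >= threshold."""
    train = np.asarray(train, dtype=float)
    mean = train.mean(axis=0)
    std = train.std(axis=0, ddof=1)
    std[std == 0] = 1.0                          # guard constant features
    z = (train - mean) / std                     # assumed standardization
    eigvals, eigvecs = np.linalg.eigh(np.cov(z, rowvar=False))
    order = np.argsort(eigvals)[::-1]            # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals >= min_eigenvalue
    return mean, std, eigvecs[:, keep], eigvals[keep]

def apply_pca(samples, mean, std, components):
    """Project samples using statistics and eigenvectors from training."""
    return ((np.asarray(samples, dtype=float) - mean) / std) @ components

# Synthetic, strongly correlated 5-D data stands in for the feature vectors.
rng = np.random.default_rng(0)
base = rng.normal(size=(120, 1))
data = base @ np.ones((1, 5)) + 0.1 * rng.normal(size=(120, 5))
mean, std, components, eigvals = fit_pca(data[:100])
projected = apply_pca(data[100:], mean, std, components)
```

Fitting the mean, scale, and eigenvectors on the training fold alone is what keeps the transformation unbiased with respect to each testing sample.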