Добавил:

fench Опубликованный материал нарушает ваши авторские права? Сообщите нам.

Вуз:

Сумский государственный университет

Предмет:

Биомеханика

Файл:

Kluwer - Handbook of Biomedical Image Analysis Vol

.2.pdf

Скачиваний:

107

Добавлен:

10.08.2013

Размер:

25.84 Mб

Скачать

☆

<<< < Предыдущая 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 7576 / 8476 77 78 79 80 81 82 83 84 > Следующая >>>

734	Kallergi, Heine, and Tembey

Table 13.2: Feature set selected from the shape analysis of the segmented individual calciﬁcations and clusters and demographic dataa

Feature No.	Feature	Nature of feature

1	Age of the patient	Demographic feature;
		describes the patient
Individual calciﬁcation characteristics
2	Mean—Area of calciﬁcation	Describes the morphology (shape)
3	Mean—Compactness
4	Mean—Moments
5	Mean—Fourier Descriptor (FD)	Describes the margins
6	Mean—Eccentricity
7	Mean—Spread (S)
8	Number of calciﬁcations in cluster	Regional descriptor;
	(median of range)	describes distribution
Cluster characteristics
9	SD—Area	Describes the morphology
10	SD—Compactness
11	SD—Moments
12	SD—Fourier Descriptor	Describes the margins
13	SD—Eccentricity
14	SD—Spread

a Features are limited to morphological and distributional characteristics (with the exception of “age”) in order to reproduce the visual analysis system and indirectly use the classiﬁcation as a measure of segmentation.

13.5 Classiﬁcation Algorithm

Classiﬁcation was done with a three-layer, feed-forward artiﬁcial neural network (ANN) consisting of an input layer, one hidden layer, and an output layer. The NevProp4 backpropagation software was used in this study. NevProp4 is a general backpropagation algorithm developed by Philip H. Goodman at the University of Nevada, Reno [60]. Figure 13.15 shows a diagram of the network structure.

The feature vector of the input layer consisted of 14 elements (features) deﬁned in the previous stage (Table 13.2) and a bias element [60]. The hidden layer consisted of 12 nodes and the output layer had one node. For each cluster, the network was given the set of shape features at its input layer, merged these inputs internally using the hidden and output layers, and assigned a value in the range of 0–1, where 0 was the target output for the benign cases and 1 was the

Computer-Aided Diagnosis of Mammographic Calciﬁcation Clusters					735
F1	I1
		H1
F2	I2
		H2
F3	I3			Percent
		H3	O	likelihood of
F4	I4	:		malignancy
F4	I4	:	Output layer
	:	:	Output layer
		:
		:
	:	:
	:

: H12

F14 I14

Input layer

Hidden layer

Figure 13.15: Diagram of the NevProp4 artiﬁcial neural network (ANN) used for cluster classiﬁcation. This is a standard three-layer, feed-forward ANN where F1–F14 are the input features, I1–I14 are the input units, H1–H12 are the hidden units, and O is the output layer [20, 59, 60].

target output for the cancer cases. This value could be interpreted as a percent likelihood for a cluster to be malignant.

The generalization error of the ANN classiﬁer was estimated by the “leave- one-out” resampling method [61, 62]. Leave-one-out is a method generally recommended for the validation of pattern recognition algorithms using small datasets. The use of this approach usually leads to a more realistic index of performance and eliminates database problems such as small size and not fully representative contents and problems associated with the mixing of training and testing datasets [61, 63]. In the leave-one-out validation process, the network was trained on all but one of the cases in the set for a ﬁxed number of iterations and then tested on the one excluded case. The excluded case was then replaced, the network weights were reinitialized, and the training was repeated by excluding a different case until every case had been excluded once. For N cases, each exclusion of one case resulted in N–1 training cases, 1 testing case and a unique set of network weights. As the process was repeated over all N, there were N(N−1) training outputs and N testing outputs from which the training and testing mean square error (MSE) was, respectively, determined.

In addition to the leave-one-out method, other resampling approaches have been proposed for CADiagnosis algorithm training that could yield unbiased

736	Kallergi, Heine, and Tembey

results and provide meaningful and realistic estimates of performance. A preference toward the bootstrap technique is found in the literature although this is strongly dependent on the application and availability of resources [64]. There is considerable work reported in the ﬁeld and we will not elaborate more in this chapter. The reader, however, should be aware of the bias issues associated with large feature sets and small sample sizes and the possible methods of training and testing an algorithm. An approach should generally be selected that yields no overestimates of performance and avoids training bias.

The clinical value of CADiagnosis methods is usually assessed in two stages: First, computer performance is evaluated based on truth ﬁles deﬁned by the experts and biopsy information using computer generated receiver operating characteristic (ROC) curves [65, 66]. Computer ROC is implemented in the evaluation of classiﬁcation algorithms where sensitivity and speciﬁcity indices are generated by adjusting the algorithms’ parameters. Classiﬁcation algorithms differentiate usually between benign vs. cancer lesions, disease vs. not disease, disease type 1 vs. disease type 2, etc. The pairs of sensitivity and speciﬁcity generated by these algorithms can be plotted as a true positive fraction (TPF) vs. false positive fraction (FPF) to form an ROC curve [65]. Publicly available software tools, e.g., the ROCKIT from Metz at the University of Chicago [67], may be used to ﬁt the data and estimate performance parameters such as the area under the curve, AZ , its standard error (SE), conﬁdence intervals, and statistical signiﬁcance.

Following the laboratory evaluation, a true ROC experiment is usually performed that involves relatively large number of cases and human observers [68]. The cost and time requirements of an observer ROC study are signiﬁcant impediments in its implementation and such analysis is usually reserved for fully optimized techniques, namely for techniques that have been through rigorous computer ROC evaluation. Computer ROC evaluation poses speciﬁc requirements on database size and contents and the criteria used for the estimation of TPF and FPF values at the detection or the classiﬁcation level. We will not labor on these issues in this chapter; guidelines may be found in several publications in the ﬁeld of CAD and elsewhere [65, 69, 70]. We will only mention that a sufﬁciently large set should be selected for CADiagnosis validation to meet the requirements of the classiﬁcation scheme, while the contents of the dataset should be such as to address the speciﬁc clinical goals of the methodology. In addition, performance criteria should follow clinical guidelines and

Computer-Aided Diagnosis of Mammographic Calciﬁcation Clusters

737

be consistently and uniformly applied throughout the validation process. In the CADiagnosis algorithm applications presented below, equal numbers of benign and malignant cases with calciﬁcation clusters were used and almost all cluster shapes described in the BIRADS Lexicon [71] were represented in the sets. Performance parameters such as number of TP and FP clusters at the segmentation output or TPF and FPF at the classiﬁcation output were estimated based on well-deﬁned criteria that were consistently applied to all experiments.

13.6 CADiagnosis Algorithm Applications

Several applications of the algorithm described here have been reported in the literature [20, 28, 29]. In this chapter, we summarize the most important ones, report on new experiments that are linked to segmentation issues and reveal some of the open questions remaining in this area.

13.6.1Mammographic Cluster Classiﬁcation—Single View Application

A set of 260 single-view mammograms with calciﬁcation clusters was ﬁrst used for the validation of the CADiagnosis algorithm described previously. The set included 138 calciﬁcation clusters associated with benign disease that are commonly referred to as benign calciﬁcations or clusters and 122 calciﬁcation clusters associated with cancer that are commonly referred to as malignant calciﬁcations or clusters. All mammograms were selected from the patient ﬁles of the H. Lee Mofﬁtt Cancer Center & Research Institute at the University of South Florida. Original mammograms were acquired on two different mammography systems, both accredited by the American College of Radiology (ACR) and having similar performances. A DuPont Microvision ﬁlm combined with a Kodak Min-R (one-sided) screen was used for all mammograms. Films were digitized with a DBA (DBA Inc., Melbourne, FL) ImagClear R3000 CCD-based ﬁlm digitizer with a pixel size of 30 m, a pixel depth of 16 bits, and a nonlinear response to optical density [72]. Full images were resized to 60 m by mathematical interpolation keeping the pixel depth the same. For this application, 512 × 512 pixel ROIs were processed. ROIs were selected from the full 60 m images to contain the calciﬁcation cluster of interest.

738	Kallergi, Heine, and Tembey

Mammographic views were either cranio-caudal (CC) or medio-lateral oblique (MLO) views of the right of left breast. Two hundred two (202) views from this set were images of the same cluster that is they were CC and MLO views of the same breast and the same patient. For this application, however, they were considered as independent samples. This is common practice in the ﬁeld not only because of the rarity of the data but also because most CADetection schemes today are applied to single views only and do not usually consider the full mammogram or the “other” breast view in the process. A bias is certainly expected when views from the same patient and of the same cluster are treated as independent samples and this bias could affect performance. This is investigated in the following application example described in section 13.6.2.

The CADiagnosis scheme of Fig. 13.3 was applied to the set of 260 ROIs ﬁrst using the symmlet wavelet ﬁlter and then using the donut ﬁlter in the detection/segmentation stage. The computer ROC curves obtained from the two classiﬁcation experiments are shown in Fig. 13.16. The corresponding AZ performance index values and standard errors (SE) are included on the ﬁgure. The difference between the two curves was statistically signiﬁcant indicat-

1.0

0.8

				Symmletl t Waveletl t
				Az=0..86;; SE=0..02
0.6				Donutt Filterilt
0.6				Az=0..89;; SE=0..02
TPF
0.4
0.2
0.0

	0	0.2	0.4	0.6	0.8	1
				FPF

Figure 13.16: Computer ROC plots of the TPF and FPF pairs obtained from the classiﬁcation of 260 clusters. The dashed curve corresponds to the results obtained with the symmlet wavelet ﬁlter and the solid curve corresponds to the results obtained with the donut ﬁlter. The estimated area indices AZ and corresponding SE values are included in the insert.

Computer-Aided Diagnosis of Mammographic Calciﬁcation Clusters

739

ing that cluster classiﬁcation using the donut ﬁlter in the detection and segmentation stage was signiﬁcantly better than classiﬁcation using the symmlet wavelet.

13.6.2Mammographic Cluster Classiﬁcation—Two-View Application

In this application of the algorithm, the two views of the same cluster were combined for the selection of the classiﬁcation features [59]. A total of 101 paired clusters were available for this test. The 14 features of Table 13.2 were ﬁrst determined on the 101 CC and 101 MLO views of the cluster and then averaged. The set of average feature values were then used as input to the classiﬁcation stage of the algorithm (Fig. 13.3). The computer generated ROC curves of the classiﬁcation performance of the algorithm obtained with the symmlet wavelet and the donut ﬁlter are shown in Fig. 13.17. Similar to the previous experiment, the classiﬁer with the donut ﬁlter outperformed the classiﬁer with the

1
0.8
0.6
TPF
0.4
			Symmlet Wavelet
			Az=0.93; SE=0.02
0.2			Donut Filter
			Az=0.96; SE=0.02
0
0	0.2	0.4	0.6	0.8	1

FPF

Figure 13.17: Computer ROC plots of the TPF and FPF pairs obtained from the classiﬁcation of 101 clusters using two-view information for feature estimation. The dashed curve corresponds to the results obtained with the symmlet wavelet ﬁlter and the solid curve corresponds to the results obtained with the donut ﬁlter. The estimated area indices AZ and corresponding SE values are included in the insert.

740	Kallergi, Heine, and Tembey

symmlet wavelet. And both outperformed their respective performances on the single-view application. The results suggest that the combination of views for feature estimation seems beneﬁcial to the classiﬁcation process.

Two views could lead to the deﬁnition of more robust features improving classiﬁcation performance independent of the segmentation method used in the process. But, is averaging the best approach to feature selection from the two mammographic views? Our results seem to indicate that averaging is promising. However, they are somewhat counterintuitive since averaging carries the risk of introducing a fuzziness to an otherwise good descriptor, i.e., a feature that was a good descriptor in one view but poor in the other may lose its robustness once averaged. So, should we average or possibly combine features from the two views for the generation of a larger feature set? The answer to this question is not clear and more work is needed to determine the best feature combination from two mammographic views.

13.6.3Impact of False Positive Segmentations on Classiﬁcation

In this third experiment, we examined the impact of the FP segmentations on the performance of the classiﬁer. As seen in Figs. 13.13 and 13.14, several signals remain in the segmentation output that are not true calciﬁcations, individual, or clusters. These signals enter the classiﬁcation stage of the algorithm and are likely to affect performance. To determine the degree of this effect, we ﬁrst estimated the number of TP and FP segmented clusters. This was done be comparing the segmentation output to manual outlines of the clusters and their major calciﬁcations generated by expert mammographers. The guidelines and conventions described elsewhere [70] were followed for these estimates. Speciﬁcally, a segmented group of calciﬁcations was considered a TP when it contained at least three segmented true calciﬁcations [71]. A FP cluster was one that consisted of at least three segmented objects outside the area of the true cluster within a distance of ≤1 cm from each other. Following the above guidelines, we determined that for a 100% TP rate, an average of 2.8 FP clusters were segmented per image with the symmlet wavelet and an average of 2.0 FP clusters were segmented with the donut ﬁlter. A reduction in either FP rate was always followed by a reduction in the TP rate to levels that were not acceptable

Computer-Aided Diagnosis of Mammographic Calciﬁcation Clusters

741

by the classiﬁcation stage that, in our case, is heavily based on morphology and distribution characteristics of the individual calciﬁcations and their clusters.

To study the impact of the FP signals on performance but without losing TP information, we did the following experiments:

(a)The 512 × 512 pixel ROIs of the 260 clusters were automatically reduced to 200 × 200 pixels. As a result, several of the edge effects and associated false signals were eliminated concentrating the analysis on the center of the region where the signal of interest (cluster) should normally be present. Both algorithm versions were applied to the reduced-size ROIs. Results suggested that the classiﬁcation of both the benign and malignant cases might be improved by up to 15% for the algorithm using the symmlet wavelet ﬁlter and up to 10% for the algorithm using the donut ﬁlter. The smaller improvement in the latter case was expected because the donut ﬁlter did not show major edge effects as the symmlet wavelet did in the original ROIs (see Figs. 13.13 and 13.14). This seemed to be an easy and fast remedy to the problem of FP signals with one downside. Namely, if the clusters were off-center in the initial ROI either due to their physical location in the breast (e.g., close to the chest wall or the skin area) or due to the initial ROI selection, then important information was lost and classiﬁcation could not be done successfully.

(b)In a second experiment, all FP clusters and all single, isolated false signals that were outside the boundaries of the true cluster were manually eliminated from the 512 × 512 pixel ROIs. This manual elimination was done for a subset of 30 cases that contained small calciﬁcation clusters (3–10 calciﬁcations per cluster). The original and FP-free ROIs were then used for feature estimation and classiﬁcation. The elimination of the FP signals improved the classiﬁcation of both benign and malignant cases. Signiﬁcant classiﬁcation improvement was observed for both benign and malignant calciﬁcation clusters and both algorithm versions. Classiﬁcation errors were reduced up to 30% for the benign cases and up to 50% for the malignant cases. Further analysis of these results revealed that the presence of very small false objects in the segmentation output degraded classiﬁcation performance more than large false objects such as those originating from the edge artifacts.

742	Kallergi, Heine, and Tembey

13.7 Conclusions

CADiagnosis is an area that merges the ﬁelds of signal processing and pattern recognition for the creation of tools that can have a signiﬁcant impact in health care delivery and patient management. CADiagnosis algorithms usually involve several modules that need to be separately optimized and validated for an overall optimum performance. In this chapter, we presented a CADiagnosis methodology for the differentiation between benign and malignant breast calciﬁcation clusters in mammograms. We speciﬁcally looked into one of the aspects of the algorithm, namely the impact of segmentation in the overall classiﬁcation process, and the role of multiresolution analysis in the segmentation process.

Our classiﬁcation approach was based primarily on morphological and distributional features of mammographic calciﬁcations and, hence, the role of segmentation was particularly important in the overall implementation and performance. Knowing the limitations of image segmentation techniques that were further exaggerated by our additional challenge to preserve morphology and distribution, we developed two multiresolution ﬁlters that were able to yield successful and clinically promising results. Although far from perfect segmentations, the symmlet wavelet and the donut ﬁlter adequately preserved the characteristics of the calciﬁcations as required by the overall algorithm’s design. A new ﬁlter, labeled as the “donut ﬁlter,” was introduced for mammogram processing that seems to offer a robust solution to the problems associated with the detection and segmentation of mammographic images. The new ﬁlter was not utilized to its full potential and several implementation pathways remain to be explored. Its initial testing, however, yielded promising results and its usefulness could go beyond mammography to other medical imaging applications.

An important question at the end of the experiments presented here is whether similar classiﬁcation performance can be achieved, either with the symmlet wavelet or the donut ﬁlter, for images generated from various sources. For example, for images generated by different ﬁlm digitizers (laserbased vs. charge-couple-device-based systems), or by different imaging systems (screen/ﬁlm vs. direct digital systems), or with different resolution characteristics (pixel size and bit depth). Preliminary work with different data types suggests that similar classiﬁcation results may be obtained if a standardization process is applied to the images prior to processing. As long as pixel size and depth are within acceptable ranges for CADiagnosis applications in mammography,

Computer-Aided Diagnosis of Mammographic Calciﬁcation Clusters

743

a standardization algorithm can easily convert the characteristics of any set of data to those for which the CADiagnosis system was initially trained and optimized keeping performance consistent [20, 73].

An interesting spin-off application of our initial development originated from the FP impact observation on classiﬁcation performance. We found that classiﬁcation results could be used as an indirect measure of segmentation quality particularly when the classiﬁcation scheme is based solely on morphological and distributional characteristics like the one described here. Segmentation evaluation is one of the most challenging issues in medical image processing. It usually requires objective and accurate “ground truth” or “gold standard” information that is often unattainable in medical imaging where the human observer is commonly the only source of “ground truth” information. Using the classiﬁer’s output for indirect segmentation validation may offer an advantage over more traditional techniques that use absolute measures of shape and size and require exact ground truth information. After all, it is the clinical outcome that is important in these applications.

Finally, the described CADiagnosis scheme seems to be amendable to a variety of applications beyond breast cancer screening and early diagnosis. The input feature set and classiﬁcation output could be modiﬁed and expanded to address problems associated with the diagnostic patient and speciﬁc breast disease types involving calciﬁcations, e.g., ductal carcinoma in-situ, for the development of computer tools that go beyond detection and diagnosis into the domains of prognosis, patient management, and follow-up.

13.8 Acknowledgments

The authors acknowledge the valuable assistance of Angela Salem in the generation of the image databases used for algorithm development and testing, and of Joseph Murphy in the processing of the data.

Questions

1.Role of CADetection and CADiagnosis techniques in breast cancer and mammography. What is the relationship of the two systems?

<<< < Предыдущая 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 7576 / 8476 77 78 79 80 81 82 83 84 > Следующая >>>

Соседние файлы в предмете Биомеханика

#
10.08.201320.85 Mб108Intermediate Physics for Medicine and Biology - Russell K. Hobbie & Bradley J. Roth.pdf
#
10.08.2013966.8 Кб83Intermediate Probability Theory for Biomedical Engineers - JohnD. Enderle.pdf
#
10.08.201335.92 Mб67Introduction to Biomedical Engineering - John D. Enderle et al.pdf
#
10.08.20131.72 Mб130Introduction to Statistics for Biomedical Engineers - Kristina M. Ropella.pdf
#
10.08.201310.58 Mб110Kluwer - Handbook of Biomedical Image Analysis Vol.1.pdf
#
10.08.201325.84 Mб107Kluwer - Handbook of Biomedical Image Analysis Vol.2.pdf
#
10.08.201316.35 Mб116Kluwer - Handbook of Biomedical Image Analysis Vol.3.pdf
#
10.08.20137.87 Mб1851Laser-Tissue Interactions Fundamentals and Applications - Markolf H. Niemz.pdf
#
10.08.20132.76 Mб125Mathematics for Life Sciences and Medicine - Takeuchi Iwasa and Sato.pdf
#
10.08.20131.85 Mб65Metabolic Engineering - T. Scheper and Jens Nielsen.pdf
#
10.08.201310.41 Mб68Micro-Nano Technology for Genomics and Proteomics BioMEMs - Ozkan.pdf