Mammogram Classification using
Nonsubsampled Contourlet Transform
and Gray-Level Co-occurrence Matrix
for Diagnosis of Breast Cancer
Khaddouj Taifi, TIAD Laboratory, Sultan Moulay Slimane University, Beni Mellal, Morocco
Naima Taifi, EREIM Laboratory, Sultan Moulay Slimane University, Beni Mellal, Morocco
Mohamed Fakir, TIAD Laboratory, Sultan Moulay Slimane University, Beni Mellal, Morocco
Said Safi, LIMATI Laboratory, Sultan Moulay Slimane University, Beni Mellal, Morocco
ABSTRACT
Mammography is a well-established method for the detection of breast cancer. It is an
essential step in computer-aided diagnosis (CAD) systems, which can help radiologists detect and
diagnose abnormalities earlier and faster than traditional screening programs. Many researchers in this
area have developed algorithms to analyze these images and to assist doctors in making decisions.
This paper evaluates the performance of feature extraction using the Gray-Level Co-occurrence Matrix
(GLCM) applied to all the detail coefficients of the Discrete Wavelet Transform (DWT) and the
Nonsubsampled Contourlet Transform (NSCT) of the region of interest (ROI) of a mammogram. Both
transforms are used to decompose an ROI into several scales. The detection of masses is more difficult
than the detection of microcalcifications because of the similarity between masses and background tissue,
which is classified as F (Fatty), G (Fatty-glandular) or D (Dense-glandular). We investigate the application
of multiresolution texture features to reduce false positive detections in a computerized mass detection
program. We also evaluate the robustness of the classification model by measuring its accuracy on various
training/test feature sets. To validate the efficacy of the suggested scheme, simulations were carried out
using the MIAS database.
A classifier system based on K-Nearest Neighbors (KNN) and Support Vector Machine (SVM) is used. For
normal versus abnormal classification on the MIAS database, the accuracy is 94.12% with SVM and 88.89%
with KNN using the Nonsubsampled Contourlet Transform, but only 76% with SVM and KNN using the
Discrete Wavelet Transform. For Fatty tissues, the best results with NSCT and GLCM are always obtained
for θ = 0 and GLCM distance d = 1. The comparison between NSCT and DWT shows that NSCT gives
better results than DWT for all orientations.
Keywords: Mammogram, NSCT, DWT, GLCM, Mass, SVM, KNN, MIAS, Accuracy, Texture analysis
INTRODUCTION
Currently, breast cancer is the most common cancer among women worldwide, and its incidence is increasing.
The search for image analysis systems that aid breast diagnosis has therefore attracted the attention of
many researchers. Several medical imaging techniques are currently used for breast cancer diagnosis:
ultrasound imaging, magnetic resonance imaging (MRI) and mammography. Various studies have confirmed
that detecting breast cancer at an early stage may improve prognosis. Mammography remains the essential
breast examination and the most efficient technique for monitoring and early detection of breast cancer.
It highlights potential radiological signs, such as suspicious opacities, that may indicate malignant
lesions. However, despite significant progress in equipment, all radiologists recognize the difficulty of
interpreting mammograms, which is further increased by the type of breast tissue examined. Mammographic
images show a contrast between the two main constituents of the breast: fatty tissue and the
connective-fibrous matrix. In general, it is extremely difficult to define normality in mammographic
images, because the appearance of the mammary gland varies greatly with the patient's age and the period
during which the mammogram is taken.
Many researchers have proposed algorithms for mass detection. (S. Beura et al., 2015) presented an approach
for mammogram classification using the two-dimensional discrete wavelet transform and the gray-level
co-occurrence matrix for the detection of breast cancer. (Yu. Zhang et al., 2010) presented a novel
segmentation method for identifying mass regions in mammograms: for each ROI, an enhancement function was
applied, followed by filtering, and energy features based on the co-occurrence matrix of pixels were then
computed. (P. Rahmati et al., 2009) presented a region-based active contour approach to segment masses
in digital mammograms. The algorithm used a Maximum Likelihood approach based on the
statistics of the inner and outer regions. (M.M. Eltoukhy et al., 2010) presented an
approach for breast cancer diagnosis in digital mammograms using the curvelet transform. After decomposing
the mammogram images in the curvelet basis, a special set of the largest coefficients is extracted as the
feature vector.
The literature survey reveals a number of existing classification schemes for digital mammogram images.
However, most of them do not provide good accuracy. In this paper, we propose an effective
feature extraction algorithm that uses multiresolution analysis based on the Nonsubsampled Contourlet
Transform and the Discrete Wavelet Transform, together with the gray-level co-occurrence matrix (GLCM), to
compute texture features for mammographic images. Using these significant features, SVM and KNN
classifiers predict whether a mammogram is normal or abnormal. In addition, for abnormal cases, the
severity (malignant or benign) is also estimated. The flow chart of the proposed
extraction and classification scheme is shown in Figure 1. The rest of this paper is organized as follows:
Section 2 deals with the proposed scheme, where the extraction of features and the classification are
discussed in detail. Section 3 describes the experimental results and analysis. Section 4 gives the
concluding remarks.
Figure1. Block diagram of the proposed scheme for classification of mammograms using SVM and KNN
Extraction of region of interest (ROI)
Mammographic images are often affected by different types of noise caused by acquisition parameters, such
as the exposure time and the compression force applied to the breast, and by artifacts in their
background. The object area also contains the pectoral muscle. The human visual system easily
ignores these artifacts during interpretation, but an automated system does not, and the artifacts
may interfere with its interpretation process. Recent work on the extraction of the breast area and the
removal of artifacts in mammography (M. Wirth et al., 2005; L. Belkhodja et al., 2009; J. Nagi et al., 2011)
has proven effective for developing automatic diagnostic aids in mammography.
All these areas are unwanted for texture analysis, which makes the full mammographic image
unsuitable for feature extraction and subsequent classification. Therefore, a cropping operation is
applied to the mammogram images to extract the regions of interest (ROIs) that contain the abnormalities,
excluding the unwanted portions of the image.
We use images from the MIAS database, available at “http://peipa.essex.ac.uk/ipa/pix/mias/”. Information
on the nature and location of the abnormalities present is available at
“http://peipa.essex.ac.uk/info/mias.html”. The latter gives the center of the abnormal area, which is taken
as the center of the ROI (see Figure 2). The ROI is then extracted around this center. The original
images are of size 1024 × 1024, and the regions of interest can be 256 × 256, 128 × 128 or 64 × 64,
as required.
Figure2. Cropping of ROI from mammographic image referring the center of the abnormal area
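The cropping step can be sketched as follows. This is a minimal illustration in Python, not the authors' MATLAB code: the function name `crop_roi`, the nested-list image representation and the choice to shift the window inward at the image border are all assumptions. The row flip assumes that the annotation coordinates have their origin at the bottom-left corner of the image, which should be checked against the database documentation.

```python
def crop_roi(image, cx, cy, size=256, origin_bottom_left=True):
    """Crop a size x size ROI centred on (cx, cy) from a 2-D image (list of rows)."""
    height, width = len(image), len(image[0])
    # Convert the y coordinate to a row index when the origin is at the bottom-left.
    row_c = (height - 1 - cy) if origin_bottom_left else cy
    half = size // 2
    # Assumption: near a border the window is shifted inward rather than padded.
    top = min(max(row_c - half, 0), height - size)
    left = min(max(cx - half, 0), width - size)
    return [row[left:left + size] for row in image[top:top + size]]


# Toy 8 x 8 image whose pixel value encodes its position as 10 * row + column.
image = [[10 * r + c for c in range(8)] for r in range(8)]
roi = crop_roi(image, cx=3, cy=4, size=4)
corner = crop_roi(image, cx=0, cy=0, size=4)
```

For a 1024 × 1024 mammogram, the same call with `size=256`, `size=128` or `size=64` would give the ROI sizes mentioned above.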
To extract normal ROIs, the same cropping procedure is performed on normal mammographic
images at randomly selected locations. After this phase, the extracted ROIs are free of
background information and noise. Figure 3, Figure 4 and Figure 5 show some extracted ROIs containing
different classes of abnormality for the different tissue types present in mammograms.
Figure3. Mammographic ROIs of the MIAS database for Fatty tissues: (a) normal, (b) malignant and
(c) benign classes
Figure4. Mammographic ROIs of the MIAS database for Fatty-glandular tissues: (a) normal, (b) malignant and
(c) benign classes
Figure5. Mammographic ROIs of the MIAS database for Dense-glandular tissues: (a) normal, (b) malignant and
(c) benign classes
Discrete Wavelet Transform
Wavelet analysis is one of the multiresolution analysis tools most widely used in image processing.
With Mallat's pyramidal algorithm, an image can be decomposed
into detail sub-bands at different resolution levels. The decomposition filters the image
with a pair of low-pass (G) and high-pass (H) filters, each followed by downsampling by a factor of 2,
first along rows and then along columns (see Figure 6). This decomposition is known as the two-dimensional
(2-D) separable discrete wavelet transform (DWT). Our detection method decomposes the original image into
four sub-bands: low-low approximation (LL), low-high vertical (LH), high-low horizontal (HL) and high-high
diagonal (HH) (see Figure 6). In the overall system, the LL sub-band is further decomposed into another four
sub-bands. Three stages of decomposition are necessary. The lowest-frequency sub-band that is
generated is set to zero, since the other sub-bands contain the high-frequency information associated with
microcalcifications and masses. After this decomposition stage, we obtain an image that contains only
the high-frequency information (S. G. Mallat, 1989; M. Vetterli et al., 1992).
Figure6. Filter bank implementation of 2-D wavelet transform
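One level of the separable decomposition can be sketched in a few lines of Python. The paper does not state which wavelet is used, so the Haar filter pair (pairwise average and pairwise difference) is an assumption made here for brevity, as are the function name `haar_dwt2` and the averaging normalization. Sub-band naming conventions also differ between references; here the first letter refers to the filter applied along rows and the second to the filter applied along columns.

```python
def haar_dwt2(image):
    """One level of a separable 2-D Haar DWT on a list-of-rows image with even sides."""
    def analyze(signal):
        # Low-pass (pairwise average) and high-pass (pairwise difference),
        # each implicitly downsampled by a factor of 2.
        low = [(signal[k] + signal[k + 1]) / 2 for k in range(0, len(signal), 2)]
        high = [(signal[k] - signal[k + 1]) / 2 for k in range(0, len(signal), 2)]
        return low, high

    def transpose(matrix):
        return [list(col) for col in zip(*matrix)]

    def filter_columns(half_image):
        low, high = zip(*(analyze(col) for col in transpose(half_image)))
        return transpose(low), transpose(high)

    # Step 1: filter along rows. Step 2: filter each half-width image along columns.
    row_low, row_high = zip(*(analyze(row) for row in image))
    ll, lh = filter_columns(row_low)
    hl, hh = filter_columns(row_high)
    return ll, lh, hl, hh


image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
ll, lh, hl, hh = haar_dwt2(image)
```

In a multi-level scheme such as the one described above, `haar_dwt2` would be applied again to `ll`, and only the detail sub-bands would be kept for feature extraction.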
Nonsubsampled Contourlet Transformation
(MN. Do et al., 2005) proposed the contourlet transform as a directional multiresolution image
representation that can efficiently capture and represent smooth object boundaries in natural images. The
contourlet transform is constructed as a combination of the Laplacian pyramid (PJ. Burt et al., 1983) and
directional filter banks (DFB) (RH. Bamberger et al., 1992); a variant with sharper frequency localization
was later proposed by (Lu. Yue et al., 2006). The contourlet transform can efficiently capture
intrinsic geometric structures such as contours in an image and can represent images better
than the wavelet transform. Moreover, it is easily adjustable for detecting fine details in any orientation
along curvatures, which gives it more potential for effective image analysis.
However, the contourlet transform lacks shift-invariance because of its downsampling and upsampling
operations. In 2006, Cunha et al. proposed the nonsubsampled contourlet transform (NSCT),
a fully shift-invariant, multiscale and multidirectional expansion that has better
directional frequency localization and a fast implementation. The NSCT consists of two filter banks, the
nonsubsampled pyramid filter bank (NSPFB) and the nonsubsampled directional filter bank (NSDFB), as
shown in Figure 7.a, which split the 2-D frequency plane into the sub-bands illustrated in Figure
7.b. The NSPFB provides a nonsubsampled multiscale decomposition and captures point
discontinuities. The NSDFB provides a nonsubsampled directional decomposition and links point
discontinuities into linear structures.
Figure7. Nonsubsampled contourlet transform. (a) NSFB structure that implements the NSCT; (b)
Idealized frequency partitioning
Gray-Level Co-occurrence Matrix
Haralick proposed the gray-level co-occurrence matrix method. This approach explores the spatial
dependency of texture by constructing a co-occurrence matrix for a given orientation and a given distance
between the pixels of the image. Information is then extracted from the matrix through the Haralick
parameters: contrast, entropy, homogeneity, variance, sum variance, difference variance, sum average,
correlation, energy (uniformity), sum entropy, difference entropy and the information measures of
correlation. The success of this method depends on the choice of two parameters:
the orientation and the distance between two neighboring pixels.
A co-occurrence matrix measures the probability of occurrence of pixel pairs located at a certain distance
in the image. It is based on the probability P(i, j, d, θ) that two pixels separated by a distance d in
the direction θ have gray levels i and j.
The angular directions used in the calculation of the GLCM are θ = 0, 45, 90 and 135 degrees, with
distance d = 1, 2, 3, ... (see Figure 8).
Figure8. Directionality used in the gray level co-occurrence matrix
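The construction of P(i, j, d, θ) can be illustrated with a short Python sketch. It assumes an image of small non-negative integer gray levels stored as a list of rows, counts ordered pixel pairs (a non-symmetric matrix) and normalizes the counts to probabilities; the function name `glcm` and these choices are assumptions, since the paper does not specify them.

```python
def glcm(image, d=1, theta=0, levels=None):
    """Normalized gray-level co-occurrence matrix P(i, j, d, theta) of an integer image."""
    # (row step, column step) for the four standard directions; rows grow downward.
    offsets = {0: (0, d), 45: (-d, d), 90: (-d, 0), 135: (-d, -d)}
    dr, dc = offsets[theta]
    if levels is None:
        levels = max(max(row) for row in image) + 1
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            # Count the pair only if the neighbour lies inside the image.
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(map(sum, counts))
    return [[n / total for n in row] for row in counts]


# Small 4-level test image.
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 2, 2, 2],
         [2, 2, 3, 3]]
p = glcm(image, d=1, theta=0)
```

With d = 1 and θ = 0 this image has 12 horizontal pixel pairs, three of which are (2, 2), so `p[2][2]` equals 3/12.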
The texture descriptors derived from GLCM are cluster shade, contrast, energy, Homogeneity and
Correlation.

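Energy
\[ E = \sum_{i=0}^{n} \sum_{j=0}^{n} P_{d,\theta}(i,j)^{2} \tag{1} \]

Homogeneity
\[ H = \sum_{i=0}^{n} \sum_{j=0}^{n} \frac{1}{1+(i-j)^{2}} \, P_{d,\theta}(i,j) \tag{2} \]

Contrast
\[ C = \sum_{i=0}^{n} \sum_{j=0}^{n} (i-j)^{2} \, P_{d,\theta}(i,j) \tag{3} \]

Correlation
\[ Cor = \frac{\sum_{i=0}^{n} \sum_{j=0}^{n} i \, j \, P_{d,\theta}(i,j) - \mu_i \mu_j}{\sigma_i \sigma_j} \tag{4} \]

With:
\[ \mu_i = \sum_{i} \sum_{j} i \, P_{d,\theta}(i,j) \tag{5} \]
\[ \mu_j = \sum_{i} \sum_{j} j \, P_{d,\theta}(i,j) \tag{6} \]
\[ \sigma_i^{2} = \sum_{i} \sum_{j} (i-\mu_i)^{2} \, P_{d,\theta}(i,j) \tag{7} \]
\[ \sigma_j^{2} = \sum_{i} \sum_{j} (j-\mu_j)^{2} \, P_{d,\theta}(i,j) \tag{8} \]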
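The four descriptors can be computed directly from a normalized co-occurrence matrix, as in the following Python sketch. It takes the matrix as a list of rows that sums to 1; the function name `glcm_features` and the dictionary output are illustrative choices, and the correlation is undefined (division by zero) when the matrix has zero variance along either axis.

```python
from math import sqrt


def glcm_features(p):
    """Energy, homogeneity, contrast and correlation of a normalized GLCM p."""
    n = range(len(p))
    energy = sum(p[i][j] ** 2 for i in n for j in n)
    homogeneity = sum(p[i][j] / (1 + (i - j) ** 2) for i in n for j in n)
    contrast = sum((i - j) ** 2 * p[i][j] for i in n for j in n)
    # Means and variances of the row index i and the column index j.
    mu_i = sum(i * p[i][j] for i in n for j in n)
    mu_j = sum(j * p[i][j] for i in n for j in n)
    var_i = sum((i - mu_i) ** 2 * p[i][j] for i in n for j in n)
    var_j = sum((j - mu_j) ** 2 * p[i][j] for i in n for j in n)
    cross = sum(i * j * p[i][j] for i in n for j in n)
    correlation = (cross - mu_i * mu_j) / sqrt(var_i * var_j)
    return {"energy": energy, "homogeneity": homogeneity,
            "contrast": contrast, "correlation": correlation}


# A two-level matrix in which neighbouring pixels always share the same gray level.
features = glcm_features([[0.5, 0.0], [0.0, 0.5]])
```

For this diagonal matrix the contrast is 0 and the homogeneity and correlation are both 1, as expected for a texture whose neighbouring pixels are always identical.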
Classification
In the classification phase, we use the K-Nearest Neighbors classifier (KNN) (O. Boiman et al., 2008) and
the Support Vector Machine (SVM) (C. J. C. Burges, 1998; B. Schölkopf et al., 2002; U. Kreßel,
1999).
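As an illustration of the KNN step, the sketch below labels a feature vector by majority vote among its k nearest training vectors. The Euclidean distance, the value k = 3, the function name `knn_predict` and the two-dimensional feature values are all assumptions made up for the example; the paper does not report these settings.

```python
from collections import Counter
from math import dist


def knn_predict(train_x, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training vectors."""
    # Sort the training samples by Euclidean distance to the query and keep k of them.
    nearest = sorted(zip(train_x, train_y), key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical (energy, contrast) feature vectors, not values from the paper.
train_x = [(0.10, 0.90), (0.20, 0.80), (0.90, 0.10), (0.80, 0.30)]
train_y = ["normal", "normal", "abnormal", "abnormal"]
label_a = knn_predict(train_x, train_y, (0.15, 0.85))
label_b = knn_predict(train_x, train_y, (0.85, 0.20))
```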
EXPERIMENTAL RESULTS AND ANALYSIS
To validate the proposed feature extraction and mammogram classification scheme, simulations were
carried out in the MATLAB environment. The mammographic images
are taken from the Mammographic Image Analysis Society (MIAS) database:
http://peipa.essex.ac.uk/ipa/pix/mias/.
The MIAS database provides information on the type of background tissue and the class of
abnormality present in each mammogram. Images are either normal or abnormal, and
the abnormal class is divided into two sub-classes, benign and malignant. The MIAS database
contains 322 images, categorized into three tissue types: fatty, fatty-glandular
and dense-glandular. Of the 322 images, 207 are normal and 115 are abnormal; among
the abnormal images, 64 are benign and 51 are malignant. Each
mammographic ROI is of size 256×256 and is used in the feature extraction phase to compute several
types of features.
Table1. Number of training and testing samples
Type of tissues                      Abnormal   Normal   Training   Testing
Fatty                                24         24       31         17
Fatty-glandular and dense tissues    26         20       28         18
In this paper, the Nonsubsampled Contourlet Transform is used because it is easily adjustable for
detecting fine details in any orientation, as shown in Figure 9, Figure 10 and Figure 11. The GLCM is then
computed from the output image of the Nonsubsampled Contourlet Transform with the distance parameter D,
which takes the values 1 and 2. From each GLCM, four feature descriptors (contrast, correlation,
energy and homogeneity) are computed to form a feature descriptor matrix.
Figure9. Mammographic ROIs of the MIAS database enhanced by the Nonsubsampled Contourlet Transform,
for Fatty tissues: (a) normal, (b) malignant and (c) benign classes
Figure10. Mammographic ROIs of the MIAS database enhanced by the Nonsubsampled Contourlet Transform,
for Fatty-glandular tissues: (a) normal, (b) malignant and (c) benign classes
Figure11. Mammographic ROIs of the MIAS database enhanced by the Nonsubsampled Contourlet Transform,
for Dense-glandular tissues: (a) normal, (b) malignant and (c) benign classes
Measures for Performance Evaluation
To measure the accuracy of a medical test, we test a group of people for the presence of a disease. Some
of them have the disease and test positive: these are the true positives (TP). Some have the
disease but test negative: these are the false negatives (FN). Some do not have the disease
and test negative: these are the true negatives (TN). Finally, some healthy people
test positive: these are the false positives (FP). The true positives, false negatives, true negatives and
false positives therefore add up to 100% of the population, as shown in Table 2. Several measures are
commonly used to evaluate the performance of the proposed method. Among them, the
classification accuracy (Ac) (R. Nithya et al., 2011) is calculated from the confusion matrix, which
relates the actual and predicted classes of the proposed method:
\[ Ac = \frac{TP + TN}{TP + TN + FP + FN} \tag{9} \]
Table2. Confusion matrix
Actual \ Predicted   Positive              Negative
Positive             TP (True Positive)    FN (False Negative)
Negative             FP (False Positive)   TN (True Negative)
TP: abnormal ROI correctly classified as abnormal.
FP: normal ROI incorrectly classified as abnormal.
TN: normal ROI correctly classified as normal.
FN: abnormal ROI incorrectly classified as normal.
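The accuracy measure of Eq. (9) is simple to compute, as the Python sketch below shows. The confusion-matrix counts used here are hypothetical: the paper does not report them. They were chosen only because 16 correct decisions out of 17 test samples give 94.12%, and 16 out of 18 give 88.89%, which agrees with the reported accuracies and with the test-set sizes of Table 1.

```python
def accuracy(tp, tn, fp, fn):
    """Classification accuracy computed from the four confusion-matrix counts."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical counts (not reported in the paper): 16 correct out of 17 samples.
acc_17 = round(100 * accuracy(tp=8, tn=8, fp=1, fn=0), 2)
# Hypothetical counts (not reported in the paper): 16 correct out of 18 samples.
acc_18 = round(100 * accuracy(tp=8, tn=8, fp=1, fn=1), 2)
```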
The results are obtained by feeding the NSCT and GLCM features to an SVM with polynomial, RBF,
linear and sigmoid kernel functions, and then to a KNN classifier, in order to select the best
configuration based on the detection rate. According to Table 3, the SVM classifier with NSCT
and GLCM features reaches the same detection rate of 94.12% for the 'rbf' and 'linear' kernel functions,
whereas with DWT the detection rate drops to 76%.
Table3. Accuracy of SVM classification using NSCT and DWT features with distance D=1, for Fatty and
Dense-glandular tissues
Tissues           Kernel      NSCT                            DWT
                              θ=0    θ=45   θ=90   θ=135      θ=0    θ=45   θ=90   θ=135
Fatty             rbf         0.94   0.94   0.94   0.88       0.64   0.70   0.76   0.76
Fatty             linear      0.88   0.94   0.88   0.88       0.76   0.70   0.64   0.76
Fatty             quadratic   0.70   0.70   0.70   0.76       0.76   0.58   0.70   0.58
Dense-glandular   rbf         0.72   0.66   0.66   0.66       0.61   0.77   0.77   0.77
Dense-glandular   linear      0.6    0.50   0.44   0.61       0.66   0.55   0.66   0.66
Dense-glandular   quadratic   0.38   0.50   0.38   0.44       0.61   0.72   0.72   0.66
Figure 12 shows that, for Fatty tissues, the best results with NSCT and GLCM features are always
obtained for θ=0.
Figure12. Accuracy of the KNN classifier for various feature descriptors at θ=0, 45, 90 and 135 with
distance d=1 for Fatty tissues
Figure 13 and Figure 14 show that, for Fatty tissues and Dense-glandular tissues and for
θ=0, 45, 90 and 135, the best results with NSCT and GLCM features are always obtained for d=1.
Figure13. Accuracy of the KNN classifier for various feature descriptors at θ=0, 45, 90 and 135 with
distances d=1 and d=2 for Fatty tissues
Figure14. Accuracy of the KNN classifier for various feature descriptors at θ=0, 45, 90 and 135 with
distance d=1 for Fatty tissues with DWT and NSCT
Figure15. Accuracy of the KNN classifier for various feature descriptors at θ=0, 45, 90 and 135 with
distance d=1 for Fatty-glandular tissues and Dense-glandular tissues with DWT and NSCT
Figure 14 and Figure 15 show that, for Fatty tissues and Dense-glandular tissues and for
θ=0, 45, 90 and 135, the best results with both transforms (NSCT and DWT) are always obtained for d=1,
and that NSCT gives the best accuracy.
CONCLUSION
In this paper, we propose an efficient mammogram classification scheme to support the decisions of
radiologists. The scheme uses NSCT or DWT followed by GLCM to derive a feature matrix from
mammograms. To validate the efficacy of the suggested scheme, simulations were carried out using the
MIAS database. The results demonstrate that NSCT and GLCM with θ=0 and d=1
give the best results for Fatty tissues, and the comparison between NSCT and DWT shows
that NSCT gives better results than DWT for all orientations. For normal versus abnormal classification
on the MIAS database, KNN reaches an accuracy of 88.89%, while SVM with the 'rbf' and 'linear' kernel
functions reaches the best accuracy, 94.12%.
REFERENCES
Bamberger, RH., & Smith, MJT. (1992). A filter bank for the directional decomposition of images:
theory and design. IEEE Trans. Signal Proc. 40(4): 882-893.
Beura, S., Majhi, B., & Dash, R. (2015). Mammogram classification using two dimensional
discrete wavelet transform and gray-level co-occurrence matrix for detection of breast cancer.
Neurocomputing, 154, 1-14.
Belkhodja, L., & Benamrane, N. (2009). Approche d'extraction de la région globale d'intérêt et
suppression des artefacts radiopaques dans une image mammographique. IMAGE'09, Biskra.
Boiman, O., Shechtman, E., & Irani, M. (2008). In defense of nearest neighbor based image
classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition (CVPR).
Burt, PJ., & Adelson, EH. (1983). The Laplacian pyramid as a compact image code. IEEE Trans.
Comm. 31: 532-540.
Burges, C. J. C. (1998). A tutorial on support vector machines for pattern recognition. Data Mining
and Knowledge Discovery, 2(2), 121-167.
Do, MN., & Vetterli, M. (2005). The contourlet transform: an efficient directional multiresolution
image representation. IEEE Trans.Image Proc. 14(12).
Eltoukhy, M.M., Faye, I., & Samir, B.B. (2010). Breast cancer diagnosis in digital mammogram
using multiscale curvelet transform. Comput. Med. Imag.Graphics, 34.
Kreßel, U. (1999). Pairwise classification and support vector machines. In Advances in Kernel
Methods: Support Vector Learning, MIT Press, Cambridge, MA, 255-268.
Mallat, S.G. (1989). A theory for multiresolution signal decomposition: The wavelet
representation. IEEE Trans. Pattern Anal. Mach. Intell. 11, 674–693.
Nagi, J., Sameem, A., Nagi, F., & Syed, K. (2011). Automated breast profile segmentation for ROI
detection using digital mammograms. IEEE Biomedical Engineering and Sciences. IECBES
Nithya, R., & Santhi, B. (2011). Comparative study on feature extraction method for breast cancer
classification, 33(2).
Schölkopf, B., & Smola, A. J. (2002). Learning with Kernels. MIT Press.
Rahmati, P., & Ayatollahi. (2009). Maximum Likelihood Active Contours Specialized for
Mammography Segmentation. Biomedical Engineering and Informatics. BMEI '09. 2nd
International Conference.
Vetterli, M., & Herley, C. (1992). Wavelets and filter banks: Theory and design. IEEE Trans.
Sig. Process, 40, 2207–2232.
Wirth, M., & Nikitenko, D. (2005). Suppression of stripe artifacts in mammograms using weighted
median filtering. Springer Link. doi:10.1007/11559573-117.
Lu, Y., & Do, M. N. (2006). A new contourlet transform with sharp frequency localization. Proc.
of IEEE International Conference on Image Processing.
Zhang, Yu., Tomuro, N., Furst, J., & Stan Raicu, D. (2010). A Contour-based Mass Segmentation
in Mammograms. Scientific Commons.