Mammogram Classification using Nonsubsampled Contourlet Transform and Gray-Level Co-occurrence Matrix for Diagnosis of Breast Cancer

Khaddouj Taifi, TIAD Laboratory, Sultan Moulay Slimane University, Beni Mellal, Morocco
Naima Taifi, EREIM Laboratory, Sultan Moulay Slimane University, Beni Mellal, Morocco
Mohamed Fakir, TIAD Laboratory, Sultan Moulay Slimane University, Beni Mellal, Morocco
Said Safi, LIMATI Laboratory, Sultan Moulay Slimane University, Beni Mellal, Morocco

ABSTRACT

Mammography is a well-established method for the detection of breast cancer. It is an essential step in computer-aided diagnosis (CAD) systems and can help radiologists detect and diagnose abnormalities earlier and faster than traditional screening programs. Many researchers have developed algorithms to analyze these images and to assist doctors in making decisions. This paper evaluates feature extraction based on the Gray-Level Co-occurrence Matrix (GLCM) applied to the detail coefficients of the Discrete Wavelet Transform (DWT) and of the Nonsubsampled Contourlet Transform (NSCT), both of which are used to decompose a region of interest (ROI) of a mammogram into several scales. The detection of masses is more difficult than the detection of microcalcifications because masses resemble the background tissue, which is classified as F (fatty), G (fatty-glandular) or D (dense-glandular). We investigate the application of multiresolution texture features to reduce false positive detections in a computerized mass detection program. We also evaluate the robustness of the classification model by measuring its accuracy on various training/test feature sets, and we validate the proposed scheme by simulation on the MIAS database. K-Nearest Neighbors (KNN) and Support Vector Machine (SVM) classifiers are used.
Accuracy is computed for the normal versus abnormal classification on the MIAS database: with the Nonsubsampled Contourlet Transform it reaches 94.12% with SVM and 88.89% with KNN, whereas with the Discrete Wavelet Transform it reaches 76% with both SVM and KNN. With NSCT and GLCM, the best results for fatty tissues are always obtained for θ = 0 and a GLCM distance d = 1, and the comparison between NSCT and DWT shows that NSCT gives better results than DWT for all orientations.

Keywords: Mammogram, NSCT, DWT, GLCM, Mass, SVM, KNN, MIAS, Accuracy, Texture analysis

INTRODUCTION

Breast cancer is currently the most common cancer among women worldwide, and its incidence is increasing. The search for image analysis systems that aid breast diagnosis has therefore attracted the attention of many researchers. Several medical imaging techniques are currently used for breast cancer diagnosis: ultrasound, magnetic resonance imaging (MRI) and mammography. Various studies have confirmed that detecting breast cancer at an early stage may improve prognosis. Mammography remains the essential breast examination and the most efficient technique for the monitoring and early detection of breast cancer. It helps to highlight potential radiological signs, such as suspicious opacities, that can indicate malignant lesions. However, despite significant progress in equipment, all radiologists recognize the difficulty of interpreting mammograms, which is further increased by the type of breast tissue examined. Mammographic images show a contrast between the two main constituents of the breast: fatty tissue and the connective-fibrous matrix. In general, it is extremely difficult to define normality in mammographic images, because the appearance of the mammary gland varies greatly with the patient’s age and the period during which the mammogram is taken.
Many researchers have proposed algorithms for mass detection. (S. Beura et al., 2015) presented an approach for mammogram classification using the two-dimensional discrete wavelet transform and the gray-level co-occurrence matrix for detection of breast cancer. (Yu. Zhang et al., 2010) presented a novel segmentation method for identifying mass regions in mammograms. For each ROI, an enhancement function and a set of filters were applied. Next, energy features based on the co-occurrence matrix of pixels were computed. (P. Rahmati et al., 2009) presented a region-based active contour approach to segment masses in digital mammograms. The algorithm used a Maximum Likelihood approach based on the statistics of the inner and outer regions. (M.M. Eltoukhy et al., 2010) presented an approach for breast cancer diagnosis in digital mammograms using the curvelet transform. After decomposing the mammogram images in a curvelet basis, a set of the largest coefficients is extracted as the feature vector.

This literature survey covers existing classification schemes for digital mammogram images; however, most of them do not achieve good accuracy. In this paper, we propose an effective feature extraction algorithm that uses multiresolution analysis based on the Nonsubsampled Contourlet Transform and the Discrete Wavelet Transform, together with the gray-level co-occurrence matrix (GLCM), to compute texture features of mammographic images. Using these features, SVM and KNN classifiers predict whether a mammogram is normal or abnormal. In abnormal cases, the severity (malignant or benign) is also estimated. The flow chart of the proposed extraction and classification scheme is shown in Figure 1.

The rest of this paper is organized as follows: Section 2 deals with the proposed scheme, where feature extraction and classification are discussed in detail.
Section 3 describes the experimental results and analysis. Section 4 gives the concluding remarks.

Figure 1. Block diagram of the proposed scheme for classification of mammograms using SVM and KNN

Extraction of region of interest (ROI)

Mammography images are often affected by different types of noise due to acquisition parameters, such as the exposure time and the strength of compression of the breast, and by artifacts in their background. The object area also contains the pectoral muscle. The human visual system can easily ignore these artifacts during interpretation, but an automated system cannot, and the artifacts may interfere with its interpretation process. Recent work on the extraction of the breast area and the removal of artifacts in mammography (M. Wirth et al., 2005; L. Belkhodja et al., 2009; J. Nagi et al., 2011) has proven effective for the development of automatic diagnostic aids. All these areas are unwanted for texture analysis, which makes the full mammographic image unsuitable for feature extraction and subsequent classification. Therefore, a cropping operation is applied to the mammogram images to extract the regions of interest (ROIs) that contain the abnormalities, excluding the unwanted portions of the image.

Our work uses the MIAS image database, available at “http://peipa.essex.ac.uk/ipa/pix/mias/”; information on the nature and location of the abnormality present in each image is given at “http://peipa.essex.ac.uk/info/mias.html”. The latter link gives the center of the abnormal area, which is used as the center of the ROI (see Figure 2). The regions of interest are extracted around this center. The original images are of size 1024 × 1024, and the regions of interest can be 256 × 256, 128 × 128 or 64 × 64.

Figure 2.
Cropping of ROI from a mammographic image with reference to the center of the abnormal area

For the extraction of normal ROIs, the same cropping procedure is performed on normal mammographic images with a randomly selected location. Thus, in this phase, the extracted ROIs are free from background information and noise. Figure 3, Figure 4 and Figure 5 show some extracted ROIs containing different classes of abnormality for the different tissue types present in mammograms.

Figure 3. Mammographic ROIs of the MIAS database for fatty tissues: (a) normal, (b) malignant and (c) benign

Figure 4. Mammographic ROIs of the MIAS database for fatty-glandular tissues: (a) normal, (b) malignant and (c) benign

Figure 5. Mammographic ROIs of the MIAS database for dense-glandular tissues: (a) normal, (b) malignant and (c) benign

Discrete Wavelet Transform

Wavelet analysis is one of the multiresolution analysis tools most widely used in image processing. Using Mallat's pyramidal algorithm, an image can be decomposed into detail sub-bands at different levels of resolution. The decomposition is done by filtering the image with a pair of low-pass (G) and high-pass (H) filters, followed by downsampling by a factor of 2, first along the rows and then along the columns (see Figure 6). This decomposition is known as the two-dimensional (2D) separable discrete wavelet transform (DWT). Our detection method decomposes the original image into the low-low approximation (LL), low-high vertical (LH), high-low horizontal (HL) and high-high diagonal (HH) sub-bands (see Figure 6). In the overall system, the LL sub-band is further decomposed into another four sub-bands. Three stages of decomposition are used.
The lowest-frequency sub-band that is generated is set to zero, since the other sub-bands contain the high-frequency information associated with microcalcifications and masses. After this decomposition stage, we obtain an image that contains only the high-frequency information (S. G. Mallat, 1989; M. Vetterli et al., 1992).

Figure 6. Filter bank implementation of the 2-D wavelet transform

Nonsubsampled Contourlet Transformation

(M. N. Do et al., 2005) proposed the contourlet transform as a directional multiresolution image representation that can efficiently capture and represent smooth object boundaries in natural images. The contourlet transform (see also Y. Lu et al., 2006) is constructed as a combination of the Laplacian pyramid (P. J. Burt et al., 1983) and directional filter banks (DFB) (R. H. Bamberger et al., 1992). It can efficiently capture intrinsic geometric structures such as contours in an image and can represent the image better than the wavelet transform. Moreover, it is easily adjustable for detecting fine details in any orientation along curvatures, which gives it more potential for effective image analysis. However, the contourlet transform lacks shift-invariance because of its downsampling and upsampling steps. In 2006, Cunha et al. proposed the nonsubsampled contourlet transform (NSCT), a fully shift-invariant, multiscale and multidirectional expansion that has better directional frequency localization and a fast implementation. NSCT consists of two filter banks, the nonsubsampled pyramid filter bank (NSPFB) and the nonsubsampled directional filter bank (NSDFB), as shown in Figure 7(a), which split the 2-D frequency plane into the sub-bands illustrated in Figure 7(b). The NSPFB provides nonsubsampled multiscale decomposition and captures the point discontinuities. The NSDFB provides nonsubsampled directional decomposition and links point discontinuities into linear structures.

Figure 7.
Nonsubsampled contourlet transform. (a) NSFB structure that implements the NSCT; (b) Idealized frequency partitioning

Gray-Level Co-occurrence Matrix

Haralick proposed the gray-level co-occurrence matrix method. This approach explores the spatial dependency of texture by constructing a co-occurrence matrix for a given orientation and a given distance between pixels of the image. Information is then extracted through the Haralick parameters: contrast, entropy, homogeneity, variance, sum variance, difference variance, sum average, correlation, energy, uniformity, sum entropy, difference entropy and the information measures of correlation. The success of this method depends on the choice of its parameters: the orientation and the distance between two neighboring pixels. A co-occurrence matrix measures the probability of occurrence of pixel pairs located at a given distance in the image; it is based on the probability P(i, j, d, θ) that gray levels i and j occur at two pixels separated by distance d in direction θ. The angular directions used in the calculation of the GLCM are θ = 0, 45, 90 and 135 degrees, with distance d = 1, 2, 3, ... (see Figure 8).

Figure 8. Directionality used in the gray-level co-occurrence matrix

The texture descriptors derived from the GLCM include cluster shade, contrast, energy, homogeneity and correlation; the four used in this work are defined as follows.

Energy

$$E = \sum_{i=0}^{n} \sum_{j=0}^{n} P_{d,\theta}(i,j)^{2} \tag{1}$$

Homogeneity

$$H = \sum_{i=0}^{n} \sum_{j=0}^{n} \frac{1}{1 + (i-j)^{2}} \, P_{d,\theta}(i,j) \tag{2}$$

Contrast

$$C = \sum_{i=0}^{n} \sum_{j=0}^{n} (i-j)^{2} \, P_{d,\theta}(i,j) \tag{3}$$

Correlation

$$Cor = \frac{\sum_{i=0}^{n} \sum_{j=0}^{n} i \, j \, P_{d,\theta}(i,j) - \mu_i \mu_j}{\sigma_i \sigma_j} \tag{4}$$

With:

$$\mu_i = \sum_{i} \sum_{j} i \, P_{d,\theta}(i,j) \tag{5}$$

$$\mu_j = \sum_{i} \sum_{j} j \, P_{d,\theta}(i,j) \tag{6}$$

$$\sigma_i^{2} = \sum_{i} \sum_{j} (i - \mu_i)^{2} \, P_{d,\theta}(i,j) \tag{7}$$

$$\sigma_j^{2} = \sum_{i} \sum_{j} (j - \mu_j)^{2} \, P_{d,\theta}(i,j) \tag{8}$$

Classification

In the classification phase, we use the K-Nearest Neighbor (KNN) classifier (O. Boiman et al., 2008) and the Support Vector Machine (SVM) (C. J. C. Burges, 1998; B. Schölkopf et al., 2002; U. Kreßel, 1999).
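As a minimal sketch of how the descriptors of equations (1) to (8) might be computed, the following pure-Python code builds a normalized, non-symmetric co-occurrence matrix for a given distance d and orientation θ and then evaluates energy, homogeneity, contrast and correlation. The function names, the offset convention for θ (row indices increasing downward) and the prior quantization of the image into `levels` integer gray levels are assumptions made for illustration; the experiments in this paper were run in MATLAB and the exact implementation is not given.

```python
import math

# Offsets (row, column) of the neighbor pixel for each orientation, at d = 1.
# Assumption: row indices grow downward, so 45 degrees is "up and to the right".
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def glcm(image, levels, d=1, theta=0):
    """Normalized, non-symmetric co-occurrence matrix P_{d,theta}(i, j).

    `image` is a list of rows holding integer gray levels in range(levels).
    """
    dr, dc = (d * step for step in OFFSETS[theta])
    rows, cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            # Count the pair only if the neighbor lies inside the image.
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(map(sum, counts))
    if total == 0:
        raise ValueError("image is too small for this distance")
    return [[n / total for n in row] for row in counts]


def glcm_features(p):
    """Energy, homogeneity, contrast and correlation of a normalized GLCM."""
    cells = [(i, j, v) for i, row in enumerate(p) for j, v in enumerate(row)]
    mu_i = sum(i * v for i, _, v in cells)                  # equation (5)
    mu_j = sum(j * v for _, j, v in cells)                  # equation (6)
    var_i = sum((i - mu_i) ** 2 * v for i, _, v in cells)   # equation (7)
    var_j = sum((j - mu_j) ** 2 * v for _, j, v in cells)   # equation (8)
    sigma = math.sqrt(var_i * var_j)
    cross = sum(i * j * v for i, j, v in cells)
    return {
        "energy": sum(v * v for _, _, v in cells),
        "homogeneity": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        # Correlation is undefined for a constant image; returning 0.0 in
        # that case is a convention chosen for this sketch.
        "correlation": (cross - mu_i * mu_j) / sigma if sigma > 0 else 0.0,
    }
```

For one ROI and one orientation, a call such as `glcm_features(glcm(roi, levels=8, d=1, theta=0))` would return the four descriptors; the value `levels=8` is only a placeholder, since the number of gray levels used in the experiments is not stated.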
EXPERIMENTAL RESULTS AND ANALYSIS

To validate the proposed feature extraction and mammogram classification scheme, simulations have been carried out in the MATLAB environment. The mammographic images are taken from the Mammographic Image Analysis Society (MIAS) database: http://peipa.essex.ac.uk/ipa/pix/mias/. The MIAS database provides information on the type of background tissue and on the class of abnormality present in each mammogram. The images are either normal or abnormal, and the abnormal class is divided into two sub-classes, benign and malignant. The MIAS database contains 322 images, categorized into three tissue types: fatty, fatty-glandular and dense-glandular. Of the 322 images, 207 are normal and 115 are abnormal; among the abnormal images, 64 are benign and 51 are malignant. Each mammographic ROI used in the feature extraction phase is of size 256 × 256.

Table 1. Number of training and testing samples

| Type of tissue | Fatty | Fatty-glandular and dense tissues |
|----------------|-------|-----------------------------------|
| Abnormal       | 24    | 26                                |
| Normal         | 24    | 20                                |
| Training       | 31    | 28                                |
| Testing        | 17    | 18                                |

In this paper, the Nonsubsampled Contourlet Transform is used because it is easily adjustable for detecting fine details in any orientation, as illustrated in Figure 9, Figure 10 and Figure 11. The GLCM is then computed from the output image of the Nonsubsampled Contourlet Transform for a given distance parameter D; the values D = 1 and D = 2 are used. From each GLCM, four feature descriptors (contrast, correlation, energy and homogeneity) are computed to form a feature descriptor matrix.

Figure 9. Enhancement of mammographic ROIs of the MIAS database by the Nonsubsampled Contourlet Transform for fatty tissues: (a) normal, (b) malignant and (c) benign

Figure 10.
Enhancement of mammographic ROIs of the MIAS database by the Nonsubsampled Contourlet Transform for fatty-glandular tissues: (a) normal, (b) malignant and (c) benign

Figure 11. Enhancement of mammographic ROIs of the MIAS database by the Nonsubsampled Contourlet Transform for dense-glandular tissues: (a) normal, (b) malignant and (c) benign

Measures for Performance Evaluation

To measure the accuracy of a medical test, we test a group of people for the presence of a disease. Some of these people have the disease and test positive: they are the true positives (TP). Some have the disease but the test says they do not: they are the false negatives (FN). Some do not have the disease and the test says they do not: they are the true negatives (TN). Finally, some healthy people may test positive: they are the false positives (FP). Thus, the numbers of true positives, false negatives, true negatives and false positives add up to 100% of the population, as shown in Table 2. A number of different measures are commonly used to evaluate the performance of a classification method. Among them, the classification accuracy (Ac) (R. Nithya et al., 2011) is calculated from the confusion matrix, which relates the actual and predicted classes:

$$Ac = \frac{TP + TN}{TP + TN + FP + FN} \tag{9}$$

Table 2. Confusion matrix

| Actual \ Predicted | Positive            | Negative            |
|--------------------|---------------------|---------------------|
| Positive           | TP (True Positive)  | FN (False Negative) |
| Negative           | FP (False Positive) | TN (True Negative)  |

TP: abnormal ROI correctly classified as abnormal.
FP: normal ROI incorrectly classified as abnormal.
TN: normal ROI correctly classified as normal.
FN: abnormal ROI incorrectly classified as normal.

The results are obtained by feeding the NSCT and GLCM features to an SVM with polynomial, RBF, linear and sigmoid kernel functions, and to a KNN classifier, in order to choose the best one based on the detection rate. The results are reported in Table 3.
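The accuracy measure of equation (9) and the confusion matrix of Table 2 can be illustrated with a short sketch. The function names and the class labels "abnormal" and "normal" are assumptions made for this example, and the counts in the usage note are hypothetical; the per-class counts behind the reported accuracies are not given in this paper.

```python
def confusion_counts(actual, predicted, positive="abnormal"):
    """Return (TP, TN, FP, FN), treating `positive` as the positive class."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1          # abnormal ROI classified as abnormal
        elif a == positive:
            fn += 1          # abnormal ROI classified as normal
        elif p == positive:
            fp += 1          # normal ROI classified as abnormal
        else:
            tn += 1          # normal ROI classified as normal
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Classification accuracy Ac = (TP + TN) / (TP + TN + FP + FN)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total
```

With hypothetical counts in which 16 of 17 test ROIs are classified correctly, for example `accuracy(9, 7, 1, 0)`, the accuracy is 16/17, which rounds to 94.12%.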
For the SVM classifier, NSCT with GLCM features gives a detection rate of 94.12% for both the 'rbf' and 'linear' kernel functions, whereas for DWT the detection rate decreases to 76%.

Table 3. Accuracy of NSCT and DWT with the SVM classifier at distance D = 1 for fatty and dense-glandular tissues

| Tissue          | Kernel    | NSCT θ=0 | NSCT θ=45 | NSCT θ=90 | NSCT θ=135 | DWT θ=0 | DWT θ=45 | DWT θ=90 | DWT θ=135 |
|-----------------|-----------|----------|-----------|-----------|------------|---------|----------|----------|-----------|
| Fatty           | rbf       | 0.94     | 0.94      | 0.94      | 0.88       | 0.64    | 0.70     | 0.76     | 0.76      |
| Fatty           | linear    | 0.88     | 0.94      | 0.88      | 0.88       | 0.76    | 0.70     | 0.64     | 0.76      |
| Fatty           | quadratic | 0.70     | 0.70      | 0.70      | 0.76       | 0.76    | 0.58     | 0.70     | 0.58      |
| Dense-glandular | rbf       | 0.72     | 0.66      | 0.66      | 0.66       | 0.61    | 0.77     | 0.77     | 0.77      |
| Dense-glandular | linear    | 0.60     | 0.50      | 0.44      | 0.61       | 0.66    | 0.55     | 0.66     | 0.66      |
| Dense-glandular | quadratic | 0.38     | 0.50      | 0.38      | 0.44       | 0.61    | 0.72     | 0.72     | 0.66      |

Figure 12 shows that, with NSCT and GLCM, the best results for fatty tissues are always obtained for θ = 0.

Figure 12. Accuracy of the KNN classifier for the various feature descriptors at θ = 0, 45, 90 and 135 with distance d = 1 for fatty tissues

Figure 13 and Figure 14 show that, with NSCT and GLCM, the best results for fatty tissues and dense-glandular tissues at θ = 0, 45, 90 and 135 are always obtained for d = 1.

Figure 13. Accuracy of the KNN classifier for the various feature descriptors at θ = 0, 45, 90 and 135 with distances d = 1 and d = 2 for fatty tissues

Figure 14. Accuracy of the KNN classifier for the various feature descriptors at θ = 0, 45, 90 and 135 with distance d = 1 for fatty tissues with DWT and NSCT

Figure 15. Accuracy of the KNN classifier for the various feature descriptors at θ = 0, 45, 90 and 135 with distance d = 1 for fatty-glandular tissues and dense-glandular tissues with DWT and NSCT

Figure 14 and Figure 15 show that, for both transforms (NSCT and DWT), the best results for fatty tissues and dense-glandular tissues at θ = 0, 45, 90 and 135 are always obtained for d = 1, and that NSCT gives the best accuracy.

CONCLUSION

In this paper, we propose an efficient mammogram classification scheme to support the decisions of radiologists.
The scheme uses NSCT, DWT and GLCM in succession to derive a feature matrix from mammograms. To validate the efficacy of the suggested scheme, simulations were carried out on the MIAS database. The results demonstrate that NSCT and GLCM with θ = 0 and d = 1 give the best results for fatty tissues, and the comparison between NSCT and DWT shows that NSCT gives better results than DWT for all orientations. For the normal versus abnormal classification on the MIAS database, KNN reaches an accuracy of 88.89%, and SVM with the 'rbf' and 'linear' kernel functions reaches the best accuracy of 94.12%.

REFERENCES

Bamberger, R. H., & Smith, M. J. T. (1992). A filter bank for the directional decomposition of images: Theory and design. IEEE Trans. Signal Proc., 40(4), 882-893.
Belkhodja, L., & Benamrane, N. (2009). Approche d'extraction de la région globale d’intérêt et suppression des artefacts radiopaques dans une image mammographique. IMAGE'09, Biskra.
Beura, S., Majhi, B., & Dash, R. (2015). Mammogram classification using two dimensional discrete wavelet transform and gray-level co-occurrence matrix for detection of breast cancer. Neurocomputing, 154, 1-14.
Boiman, O., Shechtman, E., & Irani, M. (2008). In defense of nearest neighbor based image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Burges, C. J. C. (1998). A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2(2), 121-167.
Burt, P. J., & Adelson, E. H. (1983). The Laplacian pyramid as a compact image code. IEEE Trans. Comm., 31, 532-540.
Do, M. N., & Vetterli, M. (2005). The contourlet transform: An efficient directional multiresolution image representation. IEEE Trans. Image Proc., 14(12).
Eltoukhy, M. M., Faye, I., & Samir, B. B. (2010). Breast cancer diagnosis in digital mammogram using multiscale curvelet transform. Comput. Med. Imag. Graphics, 34.
Kreßel, U. (1999).
Pairwise classification and support vector machines. In Advances in Kernel Methods: Support Vector Learning, MIT Press, Cambridge, MA, 255-268.
Lu, Y., & Do, M. N. (2006). A new contourlet transform with sharp frequency localization. Proc. of IEEE International Conference on Image Processing.
Mallat, S. G. (1989). A theory for multiresolution signal decomposition: The wavelet representation. IEEE Trans. Pattern Anal. Mach. Intell., 11, 674-693.
Nagi, J., Sameem, A., Nagi, F., & Syed, K. (2011). Automated breast profile segmentation for ROI detection using digital mammograms. IEEE Biomedical Engineering and Sciences (IECBES).
Nithya, R., & Santhi, B. (2011). Comparative study on feature extraction method for breast cancer classification, 33(2).
Rahmati, P., & Ayatollahi. (2009). Maximum Likelihood Active Contours Specialized for Mammography Segmentation. Biomedical Engineering and Informatics, BMEI '09, 2nd International Conference.
Schölkopf, B., & Smola, A. J. (2002). Learning with Kernels. MIT Press.
Vetterli, M., & Herley, C. (1992). Wavelets and filter banks: Theory and design. IEEE Trans. Sig. Process., 40, 2207-2232.
Wirth, M., & Nikitenko, D. (2005). Suppression of stripe artifacts in mammograms using weighted median filtering. Springer Link. doi:10.1007/11559573-117.
Zhang, Yu., Tomuro, N., Furst, J., & Stan Raicu, D. (2010). A Contour-based Mass Segmentation in Mammograms. Scientific Commons.