
Academic Editor: Christos Bouras
Received: 3 October 2025
Revised: 28 October 2025
Accepted: 29 October 2025
Published: 31 October 2025
Citation: Ali, M.D.; Iqbal, M.A.; Lee, S.; Duan, X.; Kim, S.K. Explainable AI Based Multi Class Skin Cancer Detection Enhanced by Meta Learning with Generative DDPM Data Augmentation. Appl. Sci. 2025, 15, 11689. https://doi.org/10.3390/app152111689
Copyright: © 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Article
Explainable AI Based Multi Class Skin Cancer Detection
Enhanced by Meta Learning with Generative DDPM
Data Augmentation
Muhammad Danish Ali 1, Muhammad Ali Iqbal 2, Sejong Lee 3, Xiaoyun Duan 4 and Soo Kyun Kim 2,*
1 Department of Electronic Engineering, Jeju National University, Jeju 63243, Republic of Korea;
2 Department of Computer Engineering, Jeju National University, Jeju 63243, Republic of Korea;
3 School of Computer Science and Engineering, Yeungnam University, 280 Daehak-ro, Gyeongsan 38541, Republic of Korea; [email protected]
4 School of Software, Anyang Normal University, Anyang 455002, China; [email protected]
* Correspondence: [email protected]
Abstract
Despite the widespread success of convolutional deep learning frameworks in computer
vision, significant limitations persist in medical image analysis. These include low image
quality caused by noise and artifacts, limited data availability compromising robustness
on unseen data, class imbalance leading to biased predictions, and insufficient feature
representation, as conventional CNNs often fail to capture subtle patterns and complex
dependencies. To address these challenges, we propose DAME (Diffusion-Augmented
Meta-Learning Ensemble), a unified architecture that integrates hybrid modeling with
generative learning using the Denoising Diffusion Probabilistic Model (DDPM). The DDPM
component improves resolution, augments scarce data, and mitigates class imbalance. A
hybrid backbone combining CNN, Vision Transformer (ViT), and CBAM captures both local
dependencies and long-range spatial relationships, while CBAM further enhances feature
representation by adaptively emphasizing informative regions. Predictions from multiple
hybrids are aggregated, and a logistic regression meta classifier learns from these outputs
to produce robust decisions. The framework is evaluated on the HAM10000 dataset, a
benchmark for multi-class skin cancer classification. Explainable AI is incorporated through
Grad CAM, providing visual insights into the decision-making process. This synergy
mitigates CNN limitations and demonstrates superior generalizability, achieving 98.6%
accuracy, 0.986 precision, 0.986 recall, and a 0.986 F1-score, significantly outperforming
existing approaches. Overall, the proposed framework enables accurate, interpretable, and
reliable medical image diagnosis through the joint optimization of contextual modeling,
feature discrimination, and data generation.
Keywords: skin cancer; convolutional neural networks (CNN); deep learning; meta learning; Convolutional Block Attention Module (CBAM); data augmentation with diffusion models (DDPM)
1. Introduction
Skin cancer is one of the most common and aggressive cancers worldwide, leading to
significant health deterioration or even loss of life. In the United States alone, it is estimated
that over 9500 individuals are diagnosed with skin cancer every day, while more than
Appl. Sci. 2025, 15, 11689 https://doi.org/10.3390/app152111689
two individuals lose their lives due to this disease [1,2]. Unfortunately, skin cancer is
not limited to developed nations, as recent research from Asian countries also reveals its
growing incidence and severity as a public health and clinical concern. According to the
World Health Organization, reported skin cancer cases result in approximately 853 deaths
per year. India faces a similar challenge, with an estimated 1.5 million new instances
identified annually. In China, there has been a significant rise in various types of skin
cancer, particularly in urban regions. Overall, among all types of cancer affecting Asian
countries, skin cancer accounts for approximately 2 to 4 percent of cases, highlighting the
significant burden of this disease in the region [3–5]. Skin cancer is generally classified into
several categories, as illustrated in Figure 1.
Skin cancer includes a wide range of malignant pathologies, including dermatofibromas, melanoma, vascular lesions, actinic keratosis, basal cell carcinomas, melanocytic nevi,
and benign keratoses. Identifying and preventing these types of skin cancer at an early
stage is critical for preserving life. Many people face challenges in scheduling regular check-ups due to limited availability of and access to healthcare, as well as individual circumstances. Moreover, initially underestimating skin irregularities can allow them to advance into critical, life-threatening stages [6–8].
However, the diagnosis of skin cancer continues to be an essential but challenging task.
The advancement of computer-aided techniques for diagnosing skin lesions has become
a top priority in recent research. The ABCD rule, which assesses asymmetry, border irregularity, color variation, and differential dermoscopic structures, is one of the most frequently employed approaches and is widely used by dermatologists to diagnose skin cancer.
Nevertheless, it may be challenging to differentiate between malignant and non-cancerous
images due to factors such as noise, poor contrast, and uneven boundaries.
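As context, the ABCD rule is often quantified through the Total Dermoscopy Score (TDS). The sketch below uses the commonly reported Stolz weights and cutoffs; these values are illustrative background and are not taken from this paper:

```python
def total_dermoscopy_score(asymmetry, border, colors, structures):
    """Total Dermoscopy Score (TDS) for the ABCD rule.

    asymmetry:  0-2 axes of asymmetry
    border:     0-8 segments with abrupt cutoff
    colors:     1-6 distinct colors present
    structures: 1-5 differential dermoscopic structures present
    """
    return 1.3 * asymmetry + 0.1 * border + 0.5 * colors + 0.5 * structures

def interpret_tds(tds):
    # Commonly cited cutoffs: <4.75 benign, 4.75-5.45 suspicious, >5.45 likely malignant.
    if tds < 4.75:
        return "benign"
    if tds <= 5.45:
        return "suspicious"
    return "likely malignant"

# Worked example: 1.3*2 + 0.1*4 + 0.5*3 + 0.5*4 = 6.5, above the malignancy cutoff.
score = total_dermoscopy_score(asymmetry=2, border=4, colors=3, structures=4)
```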
Figure 1. Seven types of skin cancer are included in the HAM10000 dataset.
1.1. Role of Machine Learning and Deep Learning in the Diagnosis of Skin Cancer
Accurate diagnosis of skin cancer is a crucial area of research, and machine learning in particular offers the potential for significant improvements [9–11].
The key to successful treatment and increased chances of survival lies in early detection.
While conventional diagnostic approaches have long been the standard, the introduction
of advanced technologies such as deep learning and transfer learning has opened up
new opportunities. These innovative techniques are enhancing both the accuracy and
speed of skin cancer diagnosis, representing a significant advancement in medical research
and a beacon of hope for the future. Deep learning, a branch of machine learning, uses artificial neural networks to identify feature patterns specific to different kinds of skin lesions [12–14].
For instance, a neural network can be trained on a large dataset of skin cancer im-
ages. Then, when presented with a new image, it can quickly and accurately identify
potential cancerous lesions. Skin cancer can be diagnosed using convolutional neural
networks (CNNs), a particular type of neural network that has demonstrated remarkable
performance in image-based diagnosis [15,16]. CNNs can speed up the diagnostic process
and improve accuracy by identifying essential features such as texture, color, and pattern.
Additionally, CNNs can be enhanced to better extract relevant information in medical
images by incorporating attention mechanisms [17].
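To illustrate the attention idea, the following is a minimal numpy sketch of CBAM-style channel attention, in which a shared two-layer MLP scores both the average-pooled and max-pooled channel descriptors; the weights and dimensions here are illustrative, not the paper's implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(fmap, w1, w2):
    """CBAM-style channel attention on a (C, H, W) feature map.

    A shared two-layer MLP (w1: C -> C//r, w2: C//r -> C) scores the
    average-pooled and max-pooled channel descriptors; the summed scores
    pass through a sigmoid to give per-channel weights in (0, 1).
    """
    avg = fmap.mean(axis=(1, 2))          # (C,) average-pooled descriptor
    mx = fmap.max(axis=(1, 2))            # (C,) max-pooled descriptor
    score = w2 @ np.maximum(w1 @ avg, 0) + w2 @ np.maximum(w1 @ mx, 0)
    weights = sigmoid(score)              # (C,) channel weights
    return fmap * weights[:, None, None]  # rescale each channel map

rng = np.random.default_rng(0)
C, r = 8, 2                               # channels and reduction ratio
fmap = rng.standard_normal((C, 16, 16))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
out = channel_attention(fmap, w1, w2)
```

Because the sigmoid weights lie in (0, 1), the module can only down-weight uninformative channels, never amplify them; CBAM follows this with an analogous spatial attention step, omitted here for brevity.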
Although there have been advancements in deep neural network architectures and
attention mechanisms, skin cancer diagnosis remains a challenging task due to inter-class and intra-class variations. Due to factors such as growth stage, patient demographics, or environmental conditions, lesions of the same type, such as melanoma, can differ significantly in size, color, shape, and texture. This variation makes it difficult for
models to generalize across cases within the same class. Another challenge is inter-class similarity: benign and malignant lesions can appear visually alike, making them hard to distinguish for deep learning models and medical professionals alike [18,19].
This similarity raises the chance of misclassification, especially in borderline cases. Class imbalance is another problem in medical imaging: in skin cancer datasets, some classes are far more heavily represented than others. Training and generalization are further hampered because most available datasets are small and do not capture the full variety of skin types and lesion manifestations [20,21].
Noise is another source of error in dermoscopy images: hair, lighting differences, and imaging artifacts can obscure important lesion features and lower diagnostic accuracy. Moreover, more sophisticated models that use transformers or attention mechanisms tend to be more accurate but are computationally complex, making them impractical for real-time clinical use, especially in resource-constrained environments. Lastly, when trained on small or imbalanced datasets, deep learning models are prone to overfitting, resulting in poor generalization to new and unseen cases.
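One remedy for such data scarcity is generative augmentation with denoising diffusion probabilistic models. As background, the DDPM forward (noising) process can be sampled in closed form at any step; below is a minimal numpy sketch with an illustrative linear noise schedule (the schedule and the stand-in image are assumptions for the demo, not values from this paper):

```python
import numpy as np

def ddpm_forward(x0, t, betas, rng):
    """Sample x_t ~ q(x_t | x_0) in closed form for a DDPM.

    q(x_t | x_0) = N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I),
    where alpha_bar_t is the cumulative product of (1 - beta_s) up to step t.
    """
    alpha_bar = np.cumprod(1.0 - betas)[t]
    eps = rng.standard_normal(x0.shape)     # Gaussian noise sample
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)       # common linear schedule
x0 = rng.standard_normal((32, 32))          # stand-in for an image
x_early = ddpm_forward(x0, t=10, betas=betas, rng=rng)    # mild corruption
x_late = ddpm_forward(x0, t=999, betas=betas, rng=rng)    # near pure noise
```

Generation runs this process in reverse: a network trained to predict the added noise denoises step by step from pure Gaussian noise, yielding synthetic samples that can rebalance under-represented lesion classes.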
To address these challenges, we propose the DAME (Diffusion-Augmented Meta-Learning Ensemble) framework, shown in Figure 2, which integrates the local feature extraction capabilities of ResNet50 and VGG19 with the global context modeling of Vision Transformers for explainable medical image classification.
Figure 2. The proposed innovative Diffusion-Augmented Meta-Learning Ensemble Framework.
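The final stacking stage of such a framework, a logistic-regression meta classifier fitted on the base hybrids' class probabilities, can be sketched as follows. The toy data, dimensions, and gradient-descent fit are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_meta_classifier(base_probs, labels, n_classes, lr=0.5, steps=500):
    """Fit a multinomial logistic-regression meta classifier.

    base_probs: (N, F) concatenated class probabilities from the base
                models (F = n_models * n_classes); labels: (N,) ints.
    Returns a weight matrix W of shape (F + 1, n_classes).
    """
    X = np.hstack([base_probs, np.ones((len(base_probs), 1))])  # bias column
    Y = np.eye(n_classes)[labels]                               # one-hot targets
    W = np.zeros((X.shape[1], n_classes))
    for _ in range(steps):
        P = softmax(X @ W)
        W -= lr * X.T @ (P - Y) / len(X)    # cross-entropy gradient step
    return W

def meta_predict(base_probs, W):
    X = np.hstack([base_probs, np.ones((len(base_probs), 1))])
    return softmax(X @ W).argmax(axis=1)

# Toy demo: two simulated base models whose probabilities carry a noisy
# signal about the true class.
rng = np.random.default_rng(1)
n, k = 300, 3
y = rng.integers(0, k, size=n)
def noisy_probs():
    return 0.5 * np.eye(k)[y] + 0.5 * softmax(rng.standard_normal((n, k)))
base = np.hstack([noisy_probs(), noisy_probs()])
W = fit_meta_classifier(base, y, n_classes=k)
acc = (meta_predict(base, W) == y).mean()
```

The meta classifier learns which base model to trust for which class, rather than simply averaging their outputs; in practice it is fitted on held-out predictions to avoid leaking training labels.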
1.2. Research Contribution
The main contributions of this research are as follows:
1. Our research proposes the DAME (Diffusion-Augmented Meta-Learning Ensemble) framework, a unified multi-architecture deep learning model that synergistically combines convolutional backbones (ResNet50 and VGG19) with a Vision Transformer (ViT) module.
2. To enhance local and global feature representation, the architecture incorporates the Convolutional Block Attention Module (CBAM), which adaptively refines spatial and channel-wise information.
3. The proposed model is enhanced through the integration of generative modeling, specifically the Denoising Diffusion Probabilistic Model (DDPM), which facilitates robust feature learning under data scarcity, class imbalance, and noise.
4. We incorporated a meta classifier trained on hybrid model predictions to refine the decision boundary further, enabling accurate and generalizable detection of skin cancer metastases.
5. This research introduces a novel approach to enhance black-box model explainability by applying Grad-CAM to visualize and highlight regions that impact classification outcomes.
The organization of the paper in the subsequent sections is as follows:
Section 2 reviews related work, Section 3 covers problem formulation and the research objective, and Section 4 covers the proposed methodology. Section 5 covers the experimental setting, Section 6 covers the results and evaluation, and Section 7 covers the discussion. Section 8 covers the limitations and future work, and Section 9 covers the conclusions.
2. Related Work
With the widespread application of complex neural network technologies across
biomedical domains, healthcare imaging analysis has emerged as a fundamental technique
for supporting clinical decision-making. It has also become a primary research focus at the
intersection of visual computing and healthcare intelligence. Nevertheless, the complex
multidimensional characteristics of clinical images, insufficient data availability, and difficulties in annotation continue to pose persistent challenges to training efficiency and
model generalizability. To address these issues, scholars have thoroughly investigated data
augmentation strategies, representation learning techniques, and the development of novel
classification frameworks. Table 1 provides a summary of recent studies in skin cancer classification and the identified research gaps.
2.1. Medical Image Feature Extraction and Classification
Huang et al. [22–24] studied the application of multispectral imaging technology for
the identification and classification of skin cancer. They specifically focused on seborrheic
keratosis (SK), squamous cell carcinoma (SCC), and basal cell carcinoma (BCC). The experimental results of the HSI-based system demonstrate a performance enhancement of 7.5% over the conventional RGB-based method. This improvement is primarily attributed to the larger dataset employed for training the convolutional neural networks; given the computational demands of image processing tasks, larger and more heterogeneous datasets are needed to develop and test CNNs thoroughly. Dataset size is therefore a key avenue for further gains, and future research should emphasize dataset augmentation and precision enhancement before extending to other architectural aspects.
Yang et al. [25] proposed a multipurpose convolutional neural network model for multi-class categorization of seven types of skin lesions. Despite its promise, the segmentation results were weak with respect to their relevance to the work. Moreover, classification accuracy varied across lesion categories, with only two classes achieving satisfactory predictive performance. The validation dataset comprises 7.5% of the total data, which remains reliable because representative samples were ensured during the stratified splitting process. Although the proposed multipurpose deep neural network was promising for binary cancer detection and lesion segmentation, it is not suitable for complex multi-class classification. These challenges underscore why continued research and development in skin cancer classification is important.
Priyadharshini et al. [26,27] introduced an Extreme Learning Machine (ELM) framework using the
Teaching–Learning-Based Optimization (TLBO) approach. The ELM functions as an efficient and precise one-hidden-layer, unidirectional neural network that extracts texture features for skin cancer categorization, while the TLBO algorithm tunes the model parameters to improve performance. This combination aims to categorize skin lesions as benign or malignant.
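For reference, the ELM core is a single hidden layer with random, frozen input weights whose output weights are solved in closed form by a least-squares pseudoinverse, with no iterative training. Below is a minimal numpy sketch on synthetic stand-in features (the TLBO parameter tuning described above is omitted, and all data here is illustrative):

```python
import numpy as np

def train_elm(X, Y, n_hidden, rng):
    """Extreme Learning Machine: random hidden layer, closed-form output.

    Input weights W and biases b are drawn at random and frozen; only the
    output weights beta are learned, via the Moore-Penrose pseudoinverse
    of the hidden activations.
    """
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)                  # random hidden activations
    beta = np.linalg.pinv(H) @ Y            # least-squares output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Toy binary demo on a nonlinear target (features are synthetic stand-ins
# for texture descriptors).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 8))
y = (X[:, 0] * X[:, 1] > 0).astype(int)     # XOR-like, not linearly separable
Y = np.eye(2)[y]                            # one-hot targets
W, b, beta = train_elm(X, Y, n_hidden=200, rng=rng)
pred = elm_predict(X, W, b, beta).argmax(axis=1)
acc = (pred == y).mean()
```

The appeal of the ELM is that the only "training" is one pseudoinverse solve, which is why optimizers such as TLBO are then layered on top to tune the remaining hyperparameters.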
In Abhiram et al. [28], an image classification framework named Deskinned was proposed for the identification of skin lesions. Their model was optimized and assessed using
the HAM10000 dataset, and its results were compared against three widely recognized
pre-trained frameworks: Inception V3, VGG16, and AlexNet. With a substantially greater
Avez-vous trouvé des erreurs dans l'interface ou les textes ? Ou savez-vous comment améliorer l'interface utilisateur de StudyLib ? N'hésitez pas à envoyer vos suggestions. C'est très important pour nous!