
XGBoost for IDS on WSN Cyber Attacks with
Imbalanced Data
Aji Gautama Putrada
Advanced and Creative Networks
Research Center
Telkom University
Bandung, Indonesia
ajigps@telkomuniversity.ac.id
Nur Alamsyah
Advanced and Creative Networks
Research Center
Telkom University
Bandung, Indonesia
Syafrial Fachri Pane
Advanced and Creative Networks
Research Center
Telkom University
Bandung, Indonesia
Mohamad Nurkamal Fauzan
Advanced and Creative Networks
Research Center
Telkom University
Bandung, Indonesia
Abstract—A wireless sensor network (WSN), like other systems
connected to a computer network, is vulnerable to cyber attacks,
which makes the intrusion detection system (IDS) for WSN an
interesting research topic. However, IDS datasets are usually
imbalanced because attacks occur at low frequency. This study
proposes applying XGBoost to an IDS for WSN cyber attacks with
imbalanced data. We obtained the WSN attack dataset from
Kaggle; it contains data on blackhole, grayhole, flooding, and
scheduling attacks. We use decision tree and naïve Bayes to
benchmark the performance of our proposed method. We then
evaluate our IDS model with precision, recall, the receiver
operating characteristic (ROC) curve, and the area under the
curve (AUC). The test results show that three classes are
moderately imbalanced, while one class, the flooding attack
class, is severely imbalanced. Compared with the two benchmark
methods, XGBoost has the best AUC for the scheduling, normal,
grayhole, flooding, and blackhole classes, with values of 0.987,
0.9963, 0.9994, 0.9997, and 0.9999, respectively.
Index Terms—intrusion detection system, wireless sensor
network, extreme gradient boosting, data imbalance
I. INTRODUCTION
Wireless sensor networks (WSN) are an emerging topic. As the
name suggests, a WSN is a set of sensors that are spread out and
connected to a computer network to monitor certain values in
the environment where it is deployed [1]. WSN research covers
optimization of network topology [2], cluster head selection [3],
and routing [4]. WSN application areas include agriculture [5]
and gas and fire detection [6]. Because it connects to a computer
network, a WSN is also vulnerable to cyber attacks, so the
intrusion detection system (IDS) for WSN is also a concern [7].
Thank you to the Directorate of Research and Community Service (PPM)
Telkom University for funding this research.
An IDS can use several machine learning methods for detection
in WSN. Gite et al. [8] implement a decision tree on WSN to
detect blackhole, wormhole, grayhole, and distributed denial of
service (DDoS) attacks with an accuracy of 70%. Mehmood et
al. [9] built an IDS that detects DDoS flooding on WSN with
naïve Bayes. However, IDS datasets are usually imbalanced
because attacks usually occur at low frequency [10].
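As an illustration of how class imbalance in an IDS dataset can be quantified, the following minimal Python sketch computes each class's share of the samples and assigns it an imbalance degree. The label counts are toy values, not the distribution of the Kaggle dataset, and the thresholds `moderate_below` and `severe_below` are illustrative assumptions rather than definitions taken from this paper.

```python
from collections import Counter


def class_shares(labels):
    """Return each class's share of the dataset as a fraction in [0, 1]."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


def imbalance_degree(share, moderate_below=0.20, severe_below=0.01):
    """Label a class share with an imbalance degree.

    Both thresholds are illustrative assumptions, not values from the paper.
    """
    if share < severe_below:
        return "severe"
    if share < moderate_below:
        return "moderate"
    return "none"


# Toy label counts (hypothetical, not the real dataset distribution).
toy_labels = (
    ["normal"] * 900
    + ["grayhole"] * 50
    + ["blackhole"] * 30
    + ["scheduling"] * 15
    + ["flooding"] * 5
)

shares = class_shares(toy_labels)
for cls, share in sorted(shares.items(), key=lambda kv: kv[1]):
    print(f"{cls:<10} {share:6.1%}  {imbalance_degree(share)}")
```

With these toy counts, the rarest attack class falls below the assumed 1% threshold, while the other attack classes fall in the assumed moderate range.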
Several studies use extreme gradient boosting (XGBoost)
as the detection method on imbalanced data [11]. Qiu et
al. [12] apply XGBoost to credit card fraud detection and
show that XGBoost performs better than other methods on
imbalanced data. Applying XGBoost to imbalanced data in
IDS for WSN is therefore a research opportunity.
This study proposes applying XGBoost to an IDS for WSN
cyber attacks with imbalanced data. We obtain the WSN attack
dataset from Kaggle; it contains data on blackhole, grayhole,
flooding, and scheduling attacks. We use decision tree and
naïve Bayes to benchmark the performance of our proposed
method. We use precision, recall, the receiver operating
characteristic (ROC) curve, and the area under the curve (AUC)
to evaluate our IDS model.
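To make these evaluation metrics concrete for a multiclass IDS, the sketch below computes precision, recall, and AUC for one class in a one-vs-rest fashion using only the Python standard library. It relies on the standard identity that AUC equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The function names, labels, and scores are hypothetical toy values for illustration and are not outputs of the models in this paper.

```python
def precision_recall(y_true, y_pred, positive):
    """One-vs-rest precision and recall for the class `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def one_vs_rest_auc(y_true, scores, positive):
    """AUC via pairwise comparison of positive and negative scores.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical): true labels and a model's "flooding" scores.
y_true = ["normal", "normal", "normal", "flooding", "normal", "flooding"]
flooding_score = [0.1, 0.2, 0.7, 0.8, 0.3, 0.6]

# Hard predictions from an assumed 0.5 decision threshold.
y_pred = ["flooding" if s >= 0.5 else "normal" for s in flooding_score]

precision, recall = precision_recall(y_true, y_pred, "flooding")
auc = one_vs_rest_auc(y_true, flooding_score, "flooding")
print(f"precision={precision:.3f} recall={recall:.3f} auc={auc:.3f}")
```

Because AUC depends only on how scores rank positives against negatives, it does not require choosing a decision threshold, whereas precision and recall do.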
To the best of our knowledge, no previous study has applied
XGBoost to IDS for WSN cyber attacks with imbalanced data.
Our research contributions are as follows:
1) a fast IDS for WSN with an optimized prediction model
2) a novel IDS concept using edge computing
3) a model that gives the best results for scheduling attack
detection
The remainder of this paper is organized as follows:
Section II discusses related works. Section III presents the
design of our proposal. Section IV reports the test results and
discusses them against state-of-the-art papers.
979-8-3503-9660-7/22/$31.00 ©2022 IEEE