Jouini Wassim WSR2010

See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/220063716
On decision making for dynamic conﬁguration adaptation problem in
cognitive radio equipments: a multi-armed bandit based approach
Article · March 2010
3 authors:
Wassim Jouini
Christophe Moy
École Supérieure d'Electricité
Université de Rennes 1
Jacques Palicot
École Supérieure d'Electricité
All content following this page was uploaded by Christophe Moy on 26 June 2014.
On decision making for dynamic configuration
adaptation problem in cognitive radio
equipments: a multi-armed bandit based
approach.
Wassim Jouini, Christophe Moy, Jacques Palicot,
SUPELEC/IETR,
France
{wassim.jouini, christophe.moy, jacques.palicot}@supelec.fr
Abstract— In this paper we introduce the notion of “design space” as a conceptual object that defines a set of cognitive radio decision making problems by their constraints rather than by their degrees of freedom. Our analysis identifies three dimensions of constraints: those related to the environment, to the equipment and to the user. Moreover, we define and use the notion of a priori knowledge to show that the challenges tackled by the radio community to solve configuration adaptation decision making problems often share the same design space but differ in the a priori knowledge they assume available. Consequently, we suggest the “a priori knowledge” as a classification criterion to discriminate between the main techniques proposed in the literature to solve configuration adaptation decision making problems. In the rest of the paper we further study a particular decision making framework where no (or limited) a priori information is provided to the cognitive radio equipment. An approach based on tools borrowed from the multi-armed bandit community is discussed. Finally, our simulation results highlight that, by customizing algorithms developed for solving the multi-armed bandit problem, efficient engineering solutions to some problems met in cognitive radio can indeed be built.
Index Terms— Cognitive radio, decision making problems, dynamic configuration adaptation, multi-armed bandit, Upper Confidence Bound, design space, a priori knowledge, survey.
I. INTRODUCTION
Recent hardware advances have made it possible to design software solutions to problems that previously required hardwired signal processing devices. With this added software layer, equipment based on this technology, referred to as software defined radio (SDR), is able to control a large set of parameters to operate with great flexibility and efficiency (e.g., change the bandwidth of the device, switch from one communication protocol to another, minimize the energy consumption of a device, etc.). Soon after the emergence of the SDR field, several scientists studied how best to control these parameters, leading to the emergence of a new research field named Cognitive Radio [1].
The concept of Cognitive Radio (CR) presents itself as a technology that has the autonomy and the cognitive abilities to become aware of its environment as well as of its own operational abilities. The purpose of this new concept is to meet the user’s expectations, i.e., maximizing his profit without compromising the efficiency of the network. It thus presupposes the capacity to collect information from the surrounding environment (perception), to digest it (learning, decision making and prediction problems) and to act in the best possible way given several constraints and the available information. It is therefore a new paradigm of wireless communication whose purpose is to combine Software Defined Radio technologies and cognitive abilities in order to obtain Cognitive Radio equipment.
Sensing [2][3] and reconfiguration [4][5] have been quite intensively investigated in the community and are out of the scope of this paper. On the decision making side, however, only a few methods have been suggested by the community and most of them are still in their infancy. Ultimately, the promises of this new technology are as high as the challenges it sets.
The purpose of this paper is twofold. On the one hand, we aim at presenting a quick survey of the several decision making challenges the CR community
has been dealing with during the last 10 years, as well as the main solutions and tools suggested by the CR literature to deal with these challenges. This survey focuses on decision making and learning challenges at the level of the CR equipment. On the other hand, we complete this survey by tackling a particular online decision making issue where the CR equipment operates in an unknown environment [6].
The rest of this paper is organized as follows. In Section II we introduce and define a conceptual object referred to as the design space. The main purpose of this object is to suggest that the cognitive radio design problem is defined by a set of constraints rather than by its degrees of freedom. Our analysis identifies three dimensions of constraints: those related to the environment, to the equipment and to the user. In Section III, we define and use the notion of a priori knowledge to show that the challenges tackled by the radio community to solve configuration adaptation decision making problems often share the same design space but differ in the a priori knowledge they assume available. Consequently, also in Section III, we suggest the “a priori knowledge” as a classification criterion to discriminate between the main techniques proposed in the literature to solve configuration adaptation decision making problems. In Section IV, we further detail one particular decision making tool borrowed from the machine learning community. We suggest using it in a cognitive radio context when dealing with environments where almost no a priori knowledge is available and where the performance evaluation is uncertain. Section V presents several simulations to validate our approach on an academic dynamic configuration adaptation problem. Finally, Section VI concludes.
II. COGNITIVE RADIO DESIGN SPACE
A. Cognitive radio design related constraints
A Cognitive Radio (CR) equipment can be defined as a communication system that is aware of its environment as well as of its operational abilities and is capable of using them intelligently. Consequently, it is assumed that the device can collect information through its sensors and use that information to adapt itself to its surrounding environment, as described in Figure 1. That presupposes cognitive abilities enabling the CR equipment to deal with all the collected information in order to make appropriate decisions [1][7].
Fig. 1. Cognitive radio decision making context.
When designing such CR equipment, the main challenge is to find an appropriate way to correctly dimension its cognitive abilities according to its environment as well as to its purpose (i.e., providing a certain service to the user). Several papers in the literature have already addressed this matter; however, their description of the problem usually remained fuzzy (e.g., [1][8][9]). We summarize their analysis by defining three sets of “constraints” on which the design of a CR equipment depends: first, the constraints imposed by the surrounding environment, then the constraints related to the user’s expectations and finally, the constraints inherent to the equipment. These constraints help to dimension the CR decision making engine. Consequently, an a priori formulation of these elements helps the designer to implement the right tools in order to obtain a flexible and adequate cognitive radio.
1) The environment constraints: since a cognitive radio is a wireless device that operates in a surrounding communicating environment, it shall respect its rules (e.g., allocated frequency bands, tolerated interference, etc.). Thus the behavior of cognitive radio equipment is strongly conditioned by the constraints imposed by the environment. As a matter of fact, if the environment allows no degrees of freedom to the equipment, the latter has no choice but to obey and thus loses all cognitive behavior. On the other hand, if no constraints are imposed by the environment, the cognitive radio will still be constrained by its own operational abilities and the expectations of the user.
2) User’s expectations: when using his wireless device for a particular application (voice communication, data, streaming and so on), the user expects a certain quality of service. Depending on the expected quality of service, the cognitive radio can identify several criteria to optimize, such as minimizing the bit error rate, minimizing energy consumption, maximizing spectral efficiency, etc. If the user is too greedy and imposes too many objectives, the design problem to solve might become intractable because of the constraints imposed by the surrounding environment and the platform of the cognitive radio. However, if the user expects nothing, then again there is no need for a flexible cognitive radio. Usually it is assumed that the user is reasonable in the sense that he will accept the best he can get at a minimum cost as long as the quality of service provided is above a certain level.
3) Equipment’s operational abilities: these limitations are perhaps the most obvious since one cannot ask the cognitive radio equipment to adapt itself beyond what it can perform (sense and/or act). It is usually assumed in the cognitive radio literature that the equipment is an ideal software defined radio and thus that it has all the flexibility needed for the designed framework. In a real application, the efficiency of cognitive radio equipment depends of course on the degrees of freedom (or equivalently the constraints) inherent to the wireless platform used to communicate. Commonly analyzed degrees of freedom include modulation, pulse shape, symbol rate, transmit power, etc.
B. Design space
We denote by cognitive radio design space an abstract three-dimensional space that characterizes the CR decision making engine, as shown in Figure 2. It is abstract in that it has no rigorous mathematical meaning; it is only used to visually and conceptually illustrate the dependencies of the CR decision making engine on the “design dimensions”: environment, parameters (usually referred to as knobs) and objectives (or criteria defined from the user’s expectations).
In Figure 2, we represent two sub-spaces referred to as the actual design space and the virtual design space. On the one hand, the virtual design space refers to the upper bound support of the design space where every dimension is considered independently from the others. Its volume can be interpreted as the largest space of decision problems one could define from the three dimensions. On the other hand, the actual design space is included in the virtual design space. It results from the reduction of the design space when taking into account the correlation between the different constraints imposed by every dimension of the design space. For instance, some constraints on the environment, such as “imposed fixed waveform”, might disable some objectives such as “find a waveform that maximizes the spectral efficiency”.
Fig. 2. Cognitive radio decision making design space.
C. Dynamic Configuration Adaptation-DCA
As an illustrative example that we will use for the rest of the paper, we define the design space of the so-called dynamic configuration adaptation (DCA) problem. Within this framework, we assume that the environment constrains the cognitive radio by allowing only $K$ possible configurations. This condition characterizes the environment and the equipment. Moreover, we assume that there exist $M \geq 1$ objectives that evaluate how well the equipment performs in meeting the user’s expectations.
To conclude, we usually observe in the literature that these characterizations are made implicitly; final assumptions are then made to define the decision making framework. These assumptions concern what we refer to as the “a priori model knowledge”. In the next section, we introduce and explain the notion of a priori knowledge and we present a brief state of the art on decision making for cognitive radio configuration adaptation using the particular DCA design space. We show that although the design space is the same, depending on the a priori model knowledge, different approaches are suggested by the community to tackle the defined decision making problems.
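The distinction drawn in this section between the virtual and the actual design space can be illustrated with a small sketch. It is only an illustration under stated assumptions: the option names along each dimension and the `is_feasible` rule are hypothetical, except for the "imposed fixed waveform" example, which is taken from the text. The virtual space is built as the product of the three dimensions considered independently, and the actual space keeps only the combinations that survive the cross-dimension constraints.

```python
from itertools import product

# Hypothetical options along the three design dimensions (illustrative only).
environments = ["free_waveform", "imposed_fixed_waveform"]
knobs = ["adapt_modulation", "adapt_power"]
objectives = ["max_spectral_efficiency", "min_energy"]


def is_feasible(env, knob, objective):
    """Cross-dimension constraints that shrink the virtual design space."""
    if env == "imposed_fixed_waveform":
        # Example from the text: a fixed waveform disables the objective of
        # finding a more spectrally efficient waveform.
        if objective == "max_spectral_efficiency":
            return False
        # Assumption of this sketch: it also freezes the modulation knob.
        if knob == "adapt_modulation":
            return False
    return True


# Virtual design space: every dimension considered independently.
virtual_space = list(product(environments, knobs, objectives))
# Actual design space: the subset left once constraints are correlated.
actual_space = [point for point in virtual_space if is_feasible(*point)]

print(len(virtual_space), len(actual_space))
```

In this toy setting the actual design space is a strict subset of the virtual one, which is the only property the sketch is meant to convey.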
III. DYNAMIC CONFIGURATION ADAPTATION PROBLEM: CHALLENGES AND SUGGESTED APPROACHES
The a priori knowledge is a set of assumptions made by the designer on the amount and representation of the information available to the decision making engine when it first deals with the environment. As a matter of fact, “knowledge” is defined by the Oxford English Dictionary as: (i) expertise and skills acquired by a person through experience or education; the theoretical or practical understanding of a subject, (ii) what is known in a particular field or in total; facts and information or (iii) awareness or familiarity gained by experience of a fact or situation. Consequently, within the cognitive radio framework, we can define the a priori knowledge as the set of theoretical or practical assumptions provided by the designer to the CR decision making engine. These assumptions, if they are accurate, provide the CR with valuable information on the problem to deal with. These remarks lead us to suggest that the decision making problems the cognitive radio will have to deal with are defined by the pair {design space, a priori knowledge}. The more accurate the a priori knowledge, the more efficient the cognitive radio can be.
In the next subsections we briefly describe the different approaches provided by the community, depending on the a priori knowledge assumed relevant to tackle the environments the CR might face during its lifetime. Figure 3 suggests a classification of these techniques according to the a priori knowledge provided to the cognitive decision making engine.
Fig. 3. Suggested decision making techniques depending on the assumed a priori knowledge.
A. Expert approach
The expert approach relies on the large amount of knowledge collected by telecommunication engineers and researchers. This knowledge is based on theoretical considerations and practical measurements of the environment and of radio communication parameters. It was first suggested by Mitola in his Ph.D. dissertation on cognitive radio [1]. Through intensive off-line simulations, expert systems are provided with a set of inference rules. These rules are then used on-line to adapt the equipment depending on the context it faces. Thus, the more knowledge is available, the better the equipment can adapt itself to its surrounding dynamic environment. However, this knowledge is useful only if the cognitive radio can represent it in a way that enables it to exploit the knowledge and to react to the environment by adequately adapting its operating configuration.
For that purpose, Mitola suggested representing the knowledge of cognitive radio equipment using a new language dedicated to radio communication: the “Radio Knowledge Representation Language” (RKRL) [1]. This representation of knowledge uses Semantic Web technologies such as XML (eXtensible Markup Language), RDF (Resource Description Framework) and OWL (Web Ontology Language). The expert knowledge based approach had a large success, especially due to the XG (neXt Generation) project supported by DARPA (e.g., [10], and for spectrum sharing [11]). As a matter of fact, if the knowledge is well represented and provided to the equipment as a set of rules, the decision making process becomes very simple. However, this approach has a few drawbacks:
• The behavior of the designed system is not adapted to a particular user but to all users and to a set of probable environments. Moreover, in order to acquaint the CR decision making engine with a large amount of valuable knowledge, a significant effort is needed from the designer.
• Expert knowledge is mainly based on models. Thus the system might behave poorly when it faces unexpected dynamics in the environment.
The techniques based on expert systems can, however, be supported by several other tools that help them acquire new knowledge on the environment or avoid conflicts between different configuration adaptation rules.
B. Exploration based decision making: Genetic Algorithms
In some contexts, one can consider that a priori knowledge is available on the complex relationships existing between the metrics observed, the parameters to adapt and the criteria to satisfy, as described in Figure 4. In this case the problem appears to be a multi-criteria optimization problem. Within this framework, the CR decision making engine aims at finding the best parameters to meet the user’s expectations by solving a set of equations (as shown in Table II, Figure 4). This problem is known to be complex for several reasons:
• There exists no universal definition of optimality in this case. Thus the solutions of this problem are satisfactory (or not) with respect to a certain function, usually named fitness, that evaluates how well the criteria are satisfied.
• Consequently, a large space of possible “good” configurations is usually available.
• The criteria are correlated and can be in conflict (e.g., Figure 4).
If we assume that the previously mentioned off-line expert rule extraction phase has not been accomplished (or only partially), an exploration of the space of possible configurations is needed.
This cognitive radio decision making framework was first analyzed by Christian James Rieser and Thomas W. Rondeau. They suggested the use of Genetic Algorithms (GA) to tackle this framework [8][12]. Genetic algorithms were first designed to mimic Darwin’s evolutionary theory and are well known for their capacity to adapt themselves to a changing environment. Their work showed that, under this design space and with the described a priori knowledge, genetic algorithms provide cognitive radios with an efficient and flexible decision making engine.
Fig. 4. Multi-criteria optimization problem [12].
C. Learning approaches: exploration and exploitation
As we argued in the previous subsections and as several other authors [13][9] have noticed, “Many CR proposals, such as [12][13][14], rely on a priori characterization of these performance metrics which are often derived from analytical models. Unfortunately, [...], this approach is not always practical due to e.g., limiting modeling assumption, non-ideal behaviors in real-life scenarios, and poor scalability” [13]. To avoid these limitations and in order to tackle more realistic scenarios, many methods based on learning techniques were suggested: Artificial Neural Networks (ANN), Evolving Connectionist Systems (ECS), statistical learning, regression models and so on. All of these approaches have their pros and cons; however, they all have in common that they mainly rely on the real environment to try and infer from it decision making rules for CR equipment. Since these learning tools aim at representing the functional relationship between the environment (through the sensed metrics), the system’s parameters and the criteria to satisfy, they need a direct interaction with the environment in order to build a posteriori knowledge of their environment. In this paper we sub-classify these methods depending on the way they learn and exploit their rules. On the one hand (i), we find a set of techniques that separate the exploration and exploitation phases. On the other hand (ii), we find other, more flexible techniques that combine both processes.
In the first case (i) we find several tools, such as Artificial Neural Networks or statistical learning, already used and exploited in other domains requiring some cognitive abilities (robotics, video games, etc.). These methods have two phases: a phase of pure “exploration”, where the CR decision making engine learns and infers decision making rules (explicitly or implicitly), followed by a second phase in which it uses this a posteriori knowledge to make decisions. Since these learning techniques rely on a first learning phase, a large amount of data and computational power is needed in order to extract reliable knowledge. This difficulty is already known for ANN, for instance, and remains true for statistical learning. As noticed by Weingart in his paper [15], the provided techniques are still computationally prohibitive and not yet ready to be used in real equipment. However, if the first phase is well achieved, the second phase is usually very simple and doesn’t require much time or energy [13].
In the second case (ii), we find promising techniques that were recently introduced to the community and still need to be further investigated [9][6]. These techniques try to provide the CR with a flexible and incremental learning decision making engine. In the case of the ECS based decision making engine, Colson suggested the use of an evolving neural network [16][17]. Unlike the usual ANN, the ECS-NN can change its structure without “forgetting” already learned knowledge. Thus new rules can be learned by adding new neurons to the neural structure. In order to be efficient, the architecture proposed in [9] needs some expert advice (a priori knowledge) on the several available configurations. This added information ranks the different configurations based on some criteria (robustness, spectral efficiency, etc.) but without knowing a priori which one is more adequate when facing a certain environment. The tools suggested in [6], however, assume that no a priori knowledge is provided and that the performance of the equipment can only be estimated by trying a specific configuration. These tools are based on the so-called Multi-Armed Bandit (MAB) framework and will be further detailed in Section IV.
To conclude this first part of the paper, we would like to emphasize that the classification proposed in this paper shows that a CR equipment cannot depend on only one core decision making tool but rather on a pool of techniques. Every time it faces an environment, the equipment needs an estimate of its a priori knowledge and of its reliability. To tackle a particular context, the general process can be summarized through three questions: What can’t I do (design space)? What do I already know (a priori knowledge)? And what technique should I select to solve the decision making problem?
In the next section we further detail a particular case of partial monitoring under uncertainty known as the multi-armed bandit framework. Within this framework we assume that we only have very limited a priori knowledge on the environment and on the CR itself, which makes sense within a CR framework. The purpose of the suggested method is to offer a balance between the exploration and exploitation phases without interrupting the communication process, i.e., while providing a certain service to the user.
IV. DYNAMIC CONFIGURATION ADAPTATION PROBLEM
A. General Framework
The general framework tackled in this section is described in Figure 6. A particular case of this problem was introduced to the CR community in a previous paper [6]. In this section we extend the framework to a more realistic scenario. Within this framework (as for the one previously analyzed in [6]) the problem appears as a particular instance of the well-known multi-armed bandit problem.
Fig. 5. Slot representation for a radio equipment controlled by a cognitive decision making engine. A slot is divided into 4 periods. During the first period, the cognitive decision making engine senses the environment and chooses the next configuration. If the new configuration is different from the current one, a reconfiguration is carried out during the second period before communicating. If a reconfiguration is not needed, the CR equipment keeps the current configuration to communicate. At the end of every slot, the cognitive decision making engine computes a reward that evaluates its performance during the communication process. It is assumed here that $\tau_1 + \tau_2 + \tau_4$ is small with respect to $\tau_3$.
1-CR equipment:
• $K$ possible configurations $C_k$, $k \in \{1, \ldots, K\}$, verifying the operational constraints but with unknown performances.
• A cognitive decision making engine: can learn and make decisions to help the CR equipment improve its behavior.
2-Time representation:
• Time divided into slots $t = 0, 1, 2, \ldots$ (Figure 5)
• At the beginning of every slot $t$, the cognitive decision making engine decides whether or not to reconfigure the CR equipment.
3-Environment and performance evaluation:
• Typical observations: SNR, BER, network load, throughput, spectrum bands, etc.
• A numerical signal is computed at the end of every slot $t$ and informs the cognitive decision making engine of the performance of the CR equipment. The numerical signal obtained when using configuration $C_k$ is a function of the observations and the configurations.
• The numerical results computed with a configuration $C_k$ are assumed to be i.i.d. and drawn from an unknown stochastic distribution $\theta_k$.
Fig. 6. Description of the Dynamic Configuration Adaptation problem.
A multi-armed bandit is a simple machine learning problem based on an analogy with the traditional slot machine (one-armed bandit) but with more than one lever. When pulled at a time $t = 0, 1, 2, \ldots$, each lever (or machine) $k \in \{1, \ldots, K\}$ provides a reward $r_t$ drawn from a distribution $\theta_k$ associated with that specific lever. The objective of the gambler is to maximize the sum of collected rewards through iterative pulls. It is classically assumed that the gambler has no initial knowledge about the levers. However, it is important to understand that many CR applications may provide some information that could be used to design better policies. For the sake of generality, and in order to cope with the worst situations, we deliberately ignore some of that information. The crucial tradeoff the gambler faces at each trial is between “exploitation” of the lever that has the highest expected payoff and “exploration” to get more information about the expected payoffs of the other levers. In this paper we assume that the different payoffs drawn from a machine are independent and identically distributed (i.i.d.) and that the rewards are independent across machines. However, the reward distributions $\{\theta_1, \theta_2, \ldots, \theta_K\}$ of the different machines are not supposed to be the same. We invite the reader to refer to the previously mentioned paper [6] for more details about the equivalence between this CR decision making problem and the MAB framework.
In the case of CR problems, these distributions $\theta_k$ depend on external parameters that the environment reveals at the beginning of every slot (for instance the SNR in Section V). Thus the dynamics of the problem are the following: first, the equipment senses the context of the environment (e.g., the current SNR), then, depending on the outcome of this sensing, it chooses a configuration to try. At the end of the transmitted slot, the CR can compute a signal that evaluates its performance during that specific slot. Finally, the CR decision making engine takes into account the newly collected information to update its configuration selection policy.
B. Suggested approach
The framework tackled in [6] corresponds to the problem described herein with a fixed context (e.g., a fixed SNR value for all slots). However, when the context changes we cannot assume a priori that the acquired knowledge is still valid in the new context. Thus there are two possible solutions. On the one hand, we could use statistical learning to try and infer a relationship between the performance of one configuration in one context and the performance of the same configuration in a different context. These methods can be efficient; however, it is often at the cost of a large overhead in terms of computation time and memory. On the other hand, we could assume, if possible, that for two “close” contexts (e.g., $SNR_1 = 9$ dB and $SNR_2 = 9.05$ dB) the performance of a configuration doesn’t change much. We can then group several contexts to form a cluster. That would enable us to divide the context space into several clusters (e.g., we can represent a large interval of SNRs by several clusters: $[0, 20] = [0, 1] \cup [1, 2] \cup \ldots \cup [19, 20]$), and finally to address the learning problem locally in every cluster as one MAB problem on a fixed context. Consequently, we can duplicate the tools already used in the case of one MAB problem to deal with the case where we have several MAB problems, one in every cluster. Within this framework, every cluster shall have its own learning algorithm to estimate the best configuration on average over the cluster (which depends on the cluster size).
In this paper we prefer the latter approach, which is very intuitive and doesn’t consume much of the already limited computational resources of a CR equipment. The learning tools used are the same as those presented in [6]. In order to make this paper as self-contained as possible, we present the main ideas of the so-called Upper Confidence Bound (UCB) indexes in the next paragraph.
At every instant $t$, an upper confidence bound index is computed for every machine $k$. This index, denoted by $B_{k,t,T_k(t)}$, is computed from the information $i_t$ gathered until slot number $t$ and gives an optimistic estimate of the expected reward of machine $k$.
Let $B_{k,t,T_k(t)}$ denote the index of the policies we are dealing with:
$$B_{k,t,T_k(t)} = \bar{X}_{k,T_k(t)} + A_{k,t,T_k(t)} \tag{1}$$
where $\bar{X}_{k,T_k(t)}$ is the sample mean of machine $k$ after being played $T_k(t)$ times at step $t$, and $A_{k,t,T_k(t)}$ is an upper confidence bias added to the sample mean.
A policy $\pi$ computes these indexes from $i_t$ and deduces from them an action $a_t$ as follows:
$$a_t = \pi(i_t) = \arg\max_k \left( B_{k,t,T_k(t)} \right) \tag{2}$$
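The clustering of contexts suggested in this subsection can be sketched as follows. This is a minimal illustration, assuming the 1 dB clusters over [0, 20] dB given as an example in the text; the names `cluster_index`, `ClusterStats` and `record`, as well as the choice to clip out-of-range SNRs to the nearest cluster, are assumptions of this sketch and not part of the paper.

```python
import math

# Context interval [0, 20] dB split into 1 dB clusters, as in the text's example.
SNR_MIN_DB, SNR_MAX_DB, CLUSTER_WIDTH_DB = 0.0, 20.0, 1.0
N_CLUSTERS = int((SNR_MAX_DB - SNR_MIN_DB) / CLUSTER_WIDTH_DB)


def cluster_index(snr_db):
    """Map a sensed SNR (in dB) to the index of the cluster that contains it."""
    # Assumption: SNRs outside [0, 20] dB are clipped to the nearest cluster.
    clipped = min(max(snr_db, SNR_MIN_DB), SNR_MAX_DB)
    idx = int(math.floor((clipped - SNR_MIN_DB) / CLUSTER_WIDTH_DB))
    return min(idx, N_CLUSTERS - 1)  # the upper edge belongs to the last cluster


class ClusterStats:
    """Independent bandit statistics kept for one cluster."""

    def __init__(self, n_configs):
        self.pulls = [0] * n_configs          # times each configuration was tried
        self.reward_sums = [0.0] * n_configs  # sum of rewards per configuration

    def update(self, config, reward):
        self.pulls[config] += 1
        self.reward_sums[config] += reward

    def sample_mean(self, config):
        return self.reward_sums[config] / self.pulls[config]


stats = {}  # cluster index -> ClusterStats, created on first use


def record(snr_db, config, reward, n_configs=5):
    """Store the reward of a slot in the statistics of the sensed cluster."""
    c = cluster_index(snr_db)
    stats.setdefault(c, ClusterStats(n_configs)).update(config, reward)
    return c


# Two "close" contexts (9 dB and 9.05 dB) feed the same local MAB problem.
record(9.0, config=2, reward=1.0)
record(9.05, config=2, reward=0.0)
print(stats[9].sample_mean(2))
```

Each cluster keeps its own counters, so any index policy can be run locally on `stats[c]` without sharing information between clusters.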
Parameters: $K$, exploration coefficient $\alpha$
Input: $i_t$
Output: $a_t$
Algorithm:
If $t < K$: return $a_t = t + 1$
Else:
• $T_k(t) \leftarrow \sum_{m=0}^{t-1} \mathbb{1}_{\{I_m = k\}}, \ \forall k$
• $A_{k,t,T_k(t)} \leftarrow \sqrt{\frac{\alpha \ln(t)}{T_k(t)}}, \ \forall k$
• $B_{k,t,T_k(t)} \leftarrow \frac{\sum_{m=0}^{t-1} r_m \mathbb{1}_{\{I_m = k\}}}{T_k(t)} + A_{k,t,T_k(t)}, \ \forall k$
• return $a_t = \arg\max_k \left( B_{k,t,T_k(t)} \right)$
Fig. 7. A tabular version of the $UCB_1$ algorithm for selecting the next configuration $a_t$.
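The procedure of Fig. 7 can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' implementation: the class name `UCB1Policy`, the 0-based configuration indices, the default `alpha=2.0` and the Bernoulli rewards of the toy run are all assumptions. The bias follows Equation (3); with a reward bound of 1 it reduces to the expression used in Fig. 7.

```python
import math
import random


class UCB1Policy:
    """UCB1 index policy over K configurations (0-based indices in this sketch)."""

    def __init__(self, n_configs, alpha=2.0, reward_bound=1.0):
        self.alpha = alpha                    # exploration coefficient
        self.b = reward_bound                 # upper bound b on the rewards
        self.pulls = [0] * n_configs          # T_k(t)
        self.reward_sums = [0.0] * n_configs  # sum of rewards per configuration
        self.t = 0                            # number of slots played so far

    def select(self):
        # Initialization: play every configuration once, in order.
        # Assumes update() is called with the selected configuration each slot.
        if self.t < len(self.pulls):
            return self.t
        best_k, best_index = 0, -math.inf
        for k, n in enumerate(self.pulls):
            mean = self.reward_sums[k] / n
            bias = math.sqrt(self.b ** 2 * self.alpha * math.log(self.t) / n)
            if mean + bias > best_index:
                best_k, best_index = k, mean + bias
        return best_k

    def update(self, config, reward):
        self.pulls[config] += 1
        self.reward_sums[config] += reward
        self.t += 1


# Toy run with hypothetical Bernoulli rewards (means chosen arbitrarily).
rng = random.Random(0)
true_means = [0.2, 0.5, 0.8]
policy = UCB1Policy(n_configs=len(true_means))
for _ in range(5000):
    k = policy.select()
    reward = 1.0 if rng.random() < true_means[k] else 0.0
    policy.update(k, reward)
print(policy.pulls)
```

Keeping running sums instead of the full history $i_t$ gives the same indexes as Fig. 7 while using constant memory per configuration.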
Fig. 8. Performance of the different configurations depending
on the SNR.
We describe hereafter two specific upper confidence biases $A_{k,t,T_k(t)}$ that will be used in our simulations. Assuming that the rewards are upper bounded by a positive real $b > 0$ we find:
1) $UCB_1$: is defined by $A_{k,t,T_k(t)}$ such that [18]:
$$A_{k,t,T_k(t)} = \sqrt{\frac{b^2 \alpha \ln(t)}{T_k(t)}} \tag{3}$$
2) $UCB_V$: is defined by $A_{k,t,T_k(t)}$ such that [19]:
$$A_{k,t,T_k(t)} = \sqrt{\frac{2 \xi V_k(t) \ln(t)}{T_k(t)}} + \frac{3 c b \xi \ln(t)}{T_k(t)} \tag{4}$$
where $V_k$ refers to the empirical variance of configuration $k$ in the particular cluster considered.
Finally, both of these indexes are very simple to compute, as can be seen in Figure 7, where a tabular version of the UCB algorithm is proposed for the case of the $UCB_1$ index. The case of the $UCB_V$ index is strictly equivalent.
C. Performance Evaluation.
To evaluate the performance of these policies, it is convenient to use the notion of “regret”. The general idea behind the “regret” can be summarized as follows: if the gambler knew a priori which arm was the best, he would only pull that one and hence maximize the expectation of the collected rewards. However, since he lacks that essential information, he will suffer an unavoidable loss due to suboptimal pulls. In a similar way, if it is possible to find an adequate clustering of the environment’s context such that for every cluster there is one and only one “best candidate”, then the gambler could always play this best candidate. However, since we usually do not have this optimal division of the context space, we suffer a second loss due to the gap existing between the optimal division of the context space and the actual division of the context space.
For our application in a cognitive radio context, we adapt the expression of the regret and suggest the form in Equation (5). Let $I_m$ denote the configuration selected at slot number $m$; then the regret of a policy $\pi \in \Pi$ at time $t$ (after $t$ decisions) is defined as follows:
$$E[R_t^{\pi}] = \sum_{m=0}^{t-1} \left( \mu^*(SNR_m) - \mu_{I_m}(SNR_m) \right) \tag{5}$$
where $\mu_k(SNR_m)$ is the expected performance of configuration $k$ at slot number $m$ under a context $SNR_m$, and $\mu^*(SNR_m) = \max_k \{\mu_k(SNR_m)\}$.
In the next section, we exploit the algorithms described herein within the DCA problem, implement the proposed approach and discuss the parameters. Then, through several simulations, we show that the implemented approach empirically has a logarithmic regret.
V. SIMULATIONS
A. Experimental protocol
For the simulations, we used 5 different configurations denoted by {C1, C2, ..., C5}. The curves that appear in Figure 8 represent the throughputs $T_{C_k}$ of the different configurations $C_k$, k ∈ {1, 2, ..., 5}, as a function of the SNR. Their expressions are inspired from real radio communication problems; however, for the sake of generality, we only use them as tools for the simulations.
Usually, radio equipment is dimensioned to provide a service within a certain interval of SNRs. This leads to a worst case analysis. Thus, if we expect the designed system to provide the user with the highest throughput for a low SNR (around 6 dB in this case), then C1 would be the chosen configuration. However, in a Cognitive Radio context, we aim at finding a way to "jump" from one curve to another depending on the SNR, in order to stay on the curve that maximizes the performance of the equipment for all SNRs.
Let us define j = [454, 940, 454/2, 454/2, 940], where j(1) is a parameter associated with C1, j(2) with the configuration C2, and so on. As for j, let M = [4, 8, 8, 16, 16], and let $n_p = 20$. Then the performance criterion used (i.e., throughput in our case) has the following form:
$$T_{C_k}(SNR) = \frac{j(k)\,\log_2(M(k))}{j(k) + n_p}\left[1 - \left(1 - \frac{1}{\sqrt{M(k)}}\right)\mathrm{erfc}\left(\sqrt{\frac{3\,SNR\,\log_2(M(k))}{2\,(M(k) - 1)}}\right)\right]$$
During the rest of this paper, we consider that the estimations of the throughput received by the CR decision making engine are drawn from Bernoulli distributions $\theta_k$ such that for all SNR we verify
$$T_{C_k}(SNR) = E[\theta_k(SNR)] \tag{6}$$
Moreover, we consider that the DCA problem exists only for a bounded $SNR \in [SNR_{min}, SNR_{max}]$ and that the SNR is a random variable. In this case, the variable $SNR_{dB} = 10 \log_{10}(SNR)$ is assumed to follow a Gaussian distribution with a mean of 10 dB and a standard deviation of 4 dB, with $SNR_{dB,min}$ = 6 dB and $SNR_{dB,max}$ = 14 dB. The SNR interval was divided into 24 equal clusters in order to have a good learning resolution.
The parameters used for the UCB algorithms were chosen to make sure that these algorithms have logarithmic regret. As a matter of fact, theoretical analyses are provided in [18][19], where it is shown that parameters α ≤ 1 (case of UCB1) and {ξ ≤ 1 and/or c ≤ 1/3} (case of UCBV) are risky and could lead to a bad learning behavior of the algorithm. Thus we implemented our work using the critical values α = 1 and {ξ = 1 and/or c = 1/3}. Moreover, we chose b = 4 as an upper bound of the possible rewards (cf. Figure 8). As a matter of fact, in this case b is larger than any possible outcome of the transmission process.
Fig. 9. Average cumulated regret when using UCB algorithms to tackle DCA problems.
B. Results
Figure 9 shows the evolution of the average cumulated regret for the different UCB policies. For the two policies, the cumulated regret first increases rather rapidly with the slot number and then more and more slowly. This shows that the CR decision making engines based on UCB policies are able to process the past information in an appropriate way, such that configurations leading to high rewards are favored with time. Moreover, by choosing a cluster size small enough to have a good local approximation of the configuration performances, yet not too small (otherwise the algorithm would spend most of its time exploring), we see that the proposed algorithms have logarithmic regrets, which is known to be order optimal for the classic MAB problem. Figure 10 shows that although the designed system is facing a randomly changing environment with a high variance, it manages to learn the different optimal configurations depending on the context. In both figures, UCBV has a more satisfactory behavior than UCB1. This is probably due to the fact that UCBV takes advantage of the variance to orient its learning behavior. However, several questions still remain regarding the dependency of the designed algorithm on its parameters, such as the cluster size. As a matter of fact, the system needs to adapt its clusters to optimize its behavior. Moreover, it does not exploit the sparse information on the environment by communicating with the other clusters. All these questions and several others are currently under investigation. The results of these investigations will be presented to the community in future work.
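To make the simulation setting of Section V-A and the regret of Equation (5) concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: all names are hypothetical, the bounded Gaussian SNR is obtained by redrawing samples that fall outside [6, 14] dB (the text states the bounds but not how they are enforced), and the expected performance is passed in as a callable mu(k, snr_db) rather than tied to a particular throughput expression.

```python
import random

SNR_DB_MIN, SNR_DB_MAX, N_CLUSTERS = 6.0, 14.0, 24


def sample_snr_db(rng, mean=10.0, std=4.0):
    """Draw SNR_dB from a Gaussian and redraw until it lies in the bounded interval.

    Rejection sampling is an assumption of this sketch.
    """
    while True:
        x = rng.gauss(mean, std)
        if SNR_DB_MIN <= x <= SNR_DB_MAX:
            return x


def cluster_index(snr_db):
    """Map an SNR in dB to one of the 24 equal-width clusters (0-based)."""
    width = (SNR_DB_MAX - SNR_DB_MIN) / N_CLUSTERS
    idx = int((snr_db - SNR_DB_MIN) / width)
    # Clamp so that the upper bound falls into the last cluster.
    return min(N_CLUSTERS - 1, max(0, idx))


def cumulated_regret(mu, n_configs, snr_db_seq, choices):
    """Equation (5): sum over slots m of mu*(SNR_m) - mu_{I_m}(SNR_m).

    mu(k, snr_db) is a hypothetical callable returning the expected
    performance of configuration k (0-based) under the given context.
    """
    regret = 0.0
    for snr_db, i_m in zip(snr_db_seq, choices):
        values = [mu(k, snr_db) for k in range(n_configs)]
        regret += max(values) - values[i_m]
    return regret
```

A learning engine would call cluster_index on each observed SNR to pick the set of statistics to update, while cumulated_regret is only available to the experimenter, who knows the expected performances.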
Fig. 10. Percentage of times a UCB-based policy selects the optimal configuration.
VI. CONCLUSIONS
In this paper, we presented a quick yet original overview of the state of the art on the different configuration adaptation challenges faced by the cognitive radio decision making community. We suggested that most of these challenges share the same constraints; however, they differ in the a priori knowledge they assume to be available. Consequently, we suggested the "a priori knowledge" as a classification criterion to discriminate between the main techniques proposed in the literature to solve configuration adaptation decision making problems. Moreover, we tackled the configuration adaptation decision making problem when no a priori (or very limited) information is provided to the CR equipment. We argued that this problem is a particular instance of the well-known multi-armed bandit paradigm and can be efficiently addressed through UCB algorithms. Our simulation results have highlighted that by customizing algorithms developed for solving the multi-armed bandit problem, efficient engineering solutions to some problems met in cognitive radio can indeed be built.
ACKNOWLEDGMENTS
This work was supported by the European Commission in the framework of the FP7 Network of Excellence in Wireless COMmunications NEWCOM++ (contract n. 216715).
REFERENCES
[1] J. Mitola. Cognitive radio: An integrated agent architecture for software defined radio. PhD Thesis, Royal Inst. of Technology (KTH), 2000.
[2] R. Hachemani, J. Palicot, and C. Moy. A new standard recognition sensor for cognitive radio terminal. EUSIPCO'07, Poznan, Poland, 3-7 September 2007.
[3] C. Moy, A. Bisiaux, and S. Paquelet. An ultra-wide band umbilical cord for cognitive radio systems. PIMRC'05, Berlin, September 2005.
[4] A. Kountouris and C. Moy. Reconfiguration in software radio systems. Second Karlsruhe Workshop on Software Radios, Karlsruhe, Germany, 20-21 March 2002.
[5] J.P. Delahaye, P. Leray, C. Moy, and J. Palicot. Managing Dynamic Partial Reconfiguration on Heterogeneous SDR Platforms. SDR Forum Technical Conference'05, Anaheim (USA), November 2005.
[6] W. Jouini, D. Ernst, C. Moy, and J. Palicot. Multi-armed bandit based policies for cognitive radio's decision making issues. In Proceedings of the 3rd international conference on Signals, Circuits and Systems (SCS), November 2009.
[7] S. Haykin. Cognitive radio: brain-empowered wireless communications. IEEE Journal on Selected Areas in Communications, 23, no. 2:201–220, Feb 2005.
[8] C.J. Rieser. Biologically Inspired Cognitive Radio Engine Model Utilizing Distributed Genetic Algorithms for Secure and Robust Wireless Communications and Networking. PhD thesis, Virginia Tech, 2004.
[9] N. Colson, A. Kountouris, A. Wautier, and L. Husson. Cognitive decision making process supervising the radio dynamic reconfiguration. In Proceedings of Cognitive Radio Oriented Wireless Networks and Communications, page 7, 2008.
[10] DARPA XG Working Group. The XG vision. Request for comments. BBN Technologies, Cambridge MA, USA, Tech. Rep. Version 2.0, January 2004.
[11] L. Berlemann, S. Mangold, and B. H. Walke. Policy-based reasoning for spectrum sharing in radio networks. In Proceedings of IEEE International Symposium on New Frontiers in Dynamic Spectrum Access Networks (DySPAN), Baltimore, MD, USA, November 2005.
[12] T. W. Rondeau, D. Maldonado, D. Scaperoth, and C.W. Bostian. Cognitive radio formulation and implementation. IEEE Proceedings CROWNCOM, Mykonos, Greece, 2006.
[13] N. Baldo and M. Zorzi. Fuzzy logic for cross-layer optimization in cognitive radio networks. IEEE Consumer Communications and Networking Conference, January 2007.
[14] C. Clancy, J. Hecker, and E. Stuntebeck. Applications of machine learning to cognitive radio networks. IEEE Wireless Communications Magazine, 14, 2007.
[15] T. Weingart, D. Sicker, and D. Grunwald. A statistical method for reconfiguration of cognitive radios. IEEE Wireless Commun. Mag., vol. 14, no. 4, pp. 34–40, August 2007.
[16] N. Kasabov. ECOS: Evolving connectionist systems and the eco learning paradigm. International Conference on Neural Information Processing, Kitakyushu, Japan, Oct. 1998.
[17] N. Kasabov. Evolving connectionist systems: the knowledge engineering approach. 2nd ed. New York: Springer, 2007.
[18] P. Auer, N. Cesa-Bianchi, and P. Fischer. Finite time analysis of multi-armed bandit problems. Machine learning, 47(2/3):235–256, 2002.
[19] J.-Y. Audibert, R. Munos, and C. Szepesvári. Tuning bandit algorithms in stochastic environments. In Proceedings of the 18th international conference on Algorithmic Learning Theory, 2007.