Optimal learning control of oxygen saturation using
a policy iteration algorithm and a proof-of-concept
in an interconnecting three-tank system
Anake Pomprapa*, Steffen Leonhardt, Berno J.E. Misgeld
Philips Chair for Medical Information Technology, RWTH Aachen University, Aachen, Germany
article info
Article history:
Received 30 November 2015
Received in revised form 18 June 2016
Accepted 25 July 2016
Keywords:
Policy iteration algorithm
Optimal control
Reinforcement learning
Control of oxygen saturation
Biomedical control system
Closed-loop ventilation
abstract
In this work, a policy iteration algorithm (PIA) is applied for controlling arterial oxygen saturation without requiring a mathematical model of the plant. The technique is based on nonlinear optimal control and solves the Hamilton-Jacobi-Bellman equation. The controller is synthesized in a state feedback configuration based on an unidentified model of the complex pathophysiology of the pulmonary system in order to control gas exchange in ventilated patients, because under some circumstances (such as emergency situations) a proper, individualized model for designing and tuning controllers may not be available in time. The simulation results demonstrate the optimal control of oxygenation based on the proposed PIA, which iteratively evaluates the Hamiltonian cost functions and synthesizes the control actions until the converged optimal criteria are achieved. Furthermore, as a practical example, we examine the performance of this control strategy on an interconnecting three-tank system as a real nonlinear system.
© 2016 Elsevier Ltd. All rights reserved.
1. Introduction
Hypoxia, or oxygen deficiency, is a common consequence of respiratory insufficiency. If a patient with hypoxia is not treated properly and on time, the prolonged state of impaired oxygenation can lead to potentially lethal conditions, such as cerebral hypoxia, cardiac malfunction, or multiple organ failure. The most effective therapy is to supply the patient with an increased oxygen fraction in the ventilated air, which is referred to as oxygen therapy (Tarpy & Celli, 1995). A suitable controlled output of this therapy is the resulting oxygen content in the body, including peripheral oxygen saturation (SpO2), arterial oxygen saturation (SaO2), or the partial pressure of arterial oxygen (PaO2), which can be used in a negative feedback configuration. A single-input single-output (SISO) system can thus be formulated, and closed-loop control of oxygen therapy can be realized in clinical scenarios, which are unique, complex, and life-threatening.
To our knowledge, closed-loop ventilation for oxygen therapy developed gradually (Brunner, 2002) and was dependent on progress in control engineering. Early publications date back to 1975: this paper describes the control of SpO2 (measured at the ear) with bang-bang control of the input fraction of inspired oxygen (FiO2) and the resulting limit cycles achieved in anesthetized dogs (Mitamura, Mikami, & Yamamoto, 1975). In 1985, a linear quadratic regulator (LQR) was demonstrated for the dual control of oxygen and carbon dioxide (Giard, Bertrand, Robert, & Pernier, 1985). In 1987, multiple-model adaptive control (MMAC) was applied for the control of SpO2 in mongrel dogs and the results were compared with a proportional-integral (PI) controller (Yu et al., 1987). In 1991, the multivariable inputs of FiO2 and positive end-expiratory pressure (PEEP) were applied to control PaO2 using a proportional-integral-derivative (PID) controller in four mongrel dogs (East, Tolle, McJames, Farrell, & Brunner, 1991). In 2004, a nonlinear adaptive neuro-fuzzy inference system (ANFIS) model was used to estimate shunt in combination with dynamic changes of blood gases for controlling PaO2 (Kwok, Linkens, Mahfouf, & Mills, 2004). A further simulation of septic patients was carried out based on a hybrid knowledge/model-based advisory control. Recently, feedback-oriented oxygen therapy was presented in preterm infants, where the controlled variable was SpO2 (Claure & Bancalari, 2009). In addition, based on the work in our research group, a proportional-integral (PI) controller with gain scheduling (Walter et al., 2009), a Smith predictor with an internal PI controller (Lüpschen, Zhu, & Leonhardt, 2009), a knowledge-based controller (Pomprapa, Misgeld, Lachmann, & Leonhardt, 2013), and, potentially, a self-tuning adaptive controller (Pomprapa, Pikkemaat, Lüpschen, Lachmann, & Leonhardt, 2010) and a funnel controller (Pomprapa, Alfocea, Göbel, Misgeld, & Leonhardt, 2014) can be used for this particular control problem. Moreover, in 2014, an SpO2 between 92% and 94% was targeted based on the
* Corresponding author. E-mail address: [email protected] (A. Pomprapa).
Please cite this article as: Pomprapa, A., et al. Optimal learning control of oxygen saturation using a policy iteration algorithm and a proof-of-concept in an interconnecting three-tank system. Control Engineering Practice (2016), http://dx.doi.org/10.1016/j.conengprac.2016.07.014
automatic adjustment of FiO2 using a step change approach (Saihi et al., 2014). Based on a literature search, three different categories were found for the control system design; their advantages and disadvantages are given below.
A model-based approach is the traditional design procedure in the control community. Generally, it requires a mathematical description of the system behavior. The difficulty of this approach lies in the selection of an appropriate model structure for this particular problem, and it has to deal with model uncertainties arising from varying conditions of the system. Therefore, self-tuning adaptive control was introduced for this control problem (Pomprapa et al., 2010). With this model-based approach, system performance and further corrective actions can be pre-analyzed.
A model-free approach requires neither prior knowledge of the system nor system identification, which is a major advantage. Therefore, the time-consuming process of determining an individual model can be significantly shortened, which is certainly beneficial for a patient with severe hypoxia, who requires immediate therapeutic action. Hence, this approach is beneficial in real clinical scenarios. Proposed control techniques include, for example, a knowledge-based controller (Pomprapa, Misgeld, Lachmann et al., 2013) or a funnel controller (Pomprapa, Alfocea, et al., 2014). Nevertheless, the drawback is that experience is required to reach good performance.
An intelligent hybrid approach combining model-based and model-free methods (Kwok et al., 2004) uses a knowledge- and model-based advisory system for intensive care ventilation. It shares the strengths and weaknesses of both the model-based and the model-free approach. The design of this approach was based on a neuro-fuzzy system (ANFIS), which is intended to closely resemble mental operations. However, neither an animal experiment nor a clinical result has yet been reported.
For the best patient benefit, we employ a model-free approach in this work. Therefore, we examine the application of a generally known control strategy to the control of oxygenation based on the SaO2 measurement. For the first time (first results on this new approach have been prepublished in Pomprapa, Mir Wais, Walter, Misgeld, & Leonhardt, 2015), we propose to use a reinforcement learning optimal control method, called the policy iteration algorithm (PIA) (Bhasin et al., 2013; Vamvoudakis & Lewis, 2010; Vrabie, Pastravanu, Abu-Khalaf, & Lewis, 2009), for this particular application. This control strategy should provide an optimal solution not only for treating hypoxia, but also for avoiding hyperoxia (excess oxygen in the lungs) (Clark, 1974; Kallet & Matthay, 2013; Mach, Thimmesch, & Pierce, 2011). The PIA is classified as reinforcement learning with the actor-critic architecture framework, which conventionally learns behavior through interactions with a dynamic environment; a survey of the development of reinforcement learning can be found in Kaelbling, Littman, and Moore (1996). This particular approach improves the effectiveness of learning by using a policy gradient for a stochastic search for an optimal result. The technique is based on a recursive two-step iteration, namely policy improvement and policy evaluation. These steps are repeated iteratively until the stopping criteria for the converged optimal solution are achieved (Vamvoudakis & Lewis, 2010; Vrabie et al., 2009). Practical implementations of PIA control in other areas include optimal load-frequency control in a power plant (Wang, Zhou, & Wen, 1993), a DC-DC converter (Wernrud, 2007), adaptive steering control of a tanker ship (Xu, Hu, & Lu, 2007), and a hybrid system of a jumping robot (Suda & Yamakita, 2013). We are therefore motivated to apply this modern control strategy to biomedical applications and, specifically, to closed-loop ventilation therapy. Computer simulations are used to investigate the feasibility of this controller in oxygen therapy. Subsequently, a proof-of-concept is implemented in a real application, a nonlinear interconnecting three-tank system, in order to control the water level in the last tank. In fact, the dynamics of this three-tank system resemble those of a patient with respiratory deficiency, i.e. nonlinear and with time-delay behavior.
In this article, Sections 2.1 and 2.2 provide the statement of the medical perspective and the formulation of the optimal control problem. Section 3 presents a control system design using the PIA for nonlinear optimal control based on the Hamilton-Jacobi-Bellman (HJB) equation. The system identification of the cardiopulmonary system, which yields a mathematical model for the further design of the PIA controllers, and a simulation of this control strategy are presented in Sections 4 and 5, respectively. A practical example of this controller is presented in Section 6 for controlling the water level in a nonlinear interconnecting three-tank system. Section 7 presents a discussion, and our conclusions are presented in Section 8.
2. Problem formulation
2.1. Statement of the medical perspective
The overall transfer function from the setting of the inspired oxygen fraction FiO2 to the oxygen saturation SaO2 measured in arterial blood depends on many factors, including lung function and blood transport (which in turn depends on cardiac output, the state of circulation, etc.). Let G(s) = SaO2(s)/FiO2(s) be the transfer function describing the system under investigation. G(s) certainly is nonlinear (e.g. due to the nonlinear saturation function, see West, 2011) and depends on many unknown and time-variant conditions (e.g. fluid status, heart function, or various diseases). Often, for the individual patient requiring artificial ventilation, this model is not available. Furthermore, as there usually is no time for individual model selection and parameter identification, this has been the motivation to investigate the performance of control algorithms that do not require explicit models.
As an illustrative example, and in order to roughly understand the dynamics, we provide the transfer function G(s) of a female domestic pig weighing 34 kg, which we obtained during a laboratory experiment (Lüpschen et al., 2009). In this special case, respiratory distress was induced by repeated lung lavage (ARDS model, acute respiratory distress syndrome; Ashbaugh, Bigelow, Petty, & Levine, 1967; Lachmann, Robertson, & Vogel, 1980).
2.2. Optimal control problem
We consider a specific class of nonlinear systems, namely affine nonlinear systems of the following form:

ẋ(t) = f(x(t)) + g(x(t))·u(t),    (1)

where x(t) ∈ χ ⊆ ℝⁿ denotes the states of the system in a vector form of dimension n, u(t) ∈ υ ⊆ ℝ represents the control input (or FiO2), and f: χ × υ → ℝⁿ is Lipschitz continuous on χ × υ, such that the state vector x(t) is unique for a given initial condition x₀. Initially, it is assumed that the system is stabilizable. This type of model has been used to describe the dynamics of various plants, for example a robot manipulator (Sun, Sun, Li, & Zhang, 1999), a continuous stirred-tank reactor (CSTR) (Kamalabady & Salahshoor, 2009) and a non-interconnecting three-tank system (Orani, Pisano, Franceschelli, Giua, & Usai, 2011).
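As a minimal sketch of how a system of the affine form of Eq. (1) can be simulated, the snippet below uses forward-Euler integration with an illustrative scalar plant and feedback law; the plant, the feedback gain, and the step size are assumptions for illustration, not the paper's model.

```python
import numpy as np

def simulate_affine(f, g, u, x0, dt=1e-3, t_end=5.0):
    """Forward-Euler rollout of the affine form of Eq. (1): x_dot = f(x) + g(x)*u."""
    n_steps = round(t_end / dt)
    x = np.asarray(x0, dtype=float)
    traj = np.empty((n_steps + 1, x.size))
    traj[0] = x
    for k in range(n_steps):
        x = x + dt * (f(x) + g(x) * u(k * dt, x))
        traj[k + 1] = x
    return traj

# Illustrative scalar plant (an assumption, not the identified oxygenation model):
# x_dot = -x^3 + u, with a stabilizing state feedback u = -2x.
f = lambda x: -x ** 3
g = lambda x: np.ones_like(x)
u = lambda t, x: -2.0 * x[0]
traj = simulate_affine(f, g, u, x0=[1.0])
print(traj[-1, 0])  # state decays toward zero under the stabilizing policy
```

Such a rollout is the basic building block used later by the policy iteration algorithm, which only needs measured trajectories of x(t) and u(t).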
Let us consider a cost function V(x) given by Eq. (2), which is to be minimized:

V(x) = ∫₀ᵀ r(x(τ), u(τ)) dτ,    (2)

where r(x, u) is determined by Q(x) + uᵀRu. Here, Q(x) is positive-definite and continuously differentiable (i.e. Q(x) = 0 if x = 0, and Q(x) > 0 for all x ≠ 0), R represents a positive-definite matrix that penalizes or weights the control input, and T denotes a time when the states are sufficiently close to zero.
The Hamiltonian of this control problem can be defined by

H(x, u, ∇Vₓ*) = r(x(t), u(t)) + ∇Vₓ*ᵀ (f(x(t)) + g(x(t))·u(t)),    (3)

where the gradient ∇Vₓ* satisfies the following HJB equation and the subscript x symbolizes a partial derivative with respect to x. We note that

min_u H(x, u, ∇Vₓ*) = 0.    (4)

The optimal solution is provided by

u* = −(1/2) R⁻¹ gᵀ(x) ∇Vₓ*,    (5)
which is the policy improvement step (Ohtake & Yamakita, 2010). For a linear continuous-time system, the HJB equation transforms to the well-known Riccati equation (Vamvoudakis, Vrabie, & Lewis, 2009). The converged solution is equivalent to the result of a linear-quadratic regulator (LQR) controller. If the optimal control policy (Eq. (5)) is substituted into the nonlinear Lyapunov equation, the HJB equation can be derived in terms of the Hamiltonian cost function (Vamvoudakis & Lewis, 2010).
For this particular control problem, we insert the optimal solution of Eq. (5) into Eq. (3) with the condition provided in Eq. (4). This yields

0 = Q(x) + u*ᵀRu* + ∇Vₓ*ᵀ (f(x(t)) + g(x(t))·u*(t)).    (6)

This equation can be rewritten as

0 = Q(x) + ∇Vₓ*ᵀ f(x) − (1/4) ∇Vₓ*ᵀ g(x) R⁻¹ gᵀ(x) ∇Vₓ*,    (7)

where V*(0) = 0; this is a formulation of the HJB equation. The
aim of implementing optimal control is to solve the HJB equation (Eq. (7)) in order to find the Hamiltonian cost function and substitute it into Eq. (5) for the control policy. However, owing to its nonlinear nature, it is generally difficult to find the solution of the HJB equation. In this article, the policy iteration algorithm is introduced for the optimal control of oxygen saturation based on an animal model of induced acute respiratory distress syndrome (ARDS), which is given in the subsequent sections. An iterative process is therefore introduced for solving this optimal control problem (Vamvoudakis & Lewis, 2010; Vrabie et al., 2009).
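For the linear case mentioned above, the reduction of the HJB equation to the algebraic Riccati equation can be verified numerically. The sketch below uses a hypothetical scalar plant (not the identified oxygenation model) and SciPy's continuous-time Riccati solver; with V*(x) = xᵀPx, Eq. (5) with ∇Vₓ* = 2Px gives the familiar LQR gain.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Hypothetical scalar linear plant x_dot = A x + B u with cost x^T Q x + u^T R u.
A = np.array([[1.0]])   # unstable open-loop pole, so control is needed
B = np.array([[1.0]])
Q = np.array([[1.0]])
R = np.array([[1.0]])

# For V*(x) = x^T P x, the HJB equation reduces to the algebraic Riccati
# equation A^T P + P A - P B R^-1 B^T P + Q = 0.
P = solve_continuous_are(A, B, Q, R)

# Eq. (5) with grad V* = 2 P x gives u* = -R^-1 B^T P x, i.e. the LQR gain.
K = np.linalg.solve(R, B.T @ P)

print(P[0, 0], K[0, 0])  # for this scalar case, P = 1 + sqrt(2)
```

The analytic check follows from the scalar Riccati equation 2P − P² + 1 = 0, whose positive root is P = 1 + √2.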
3. Policy iteration algorithm
The evaluation function of Eq. (2) can be modified into the following form:

Ṽ(x(t)) = ∫ₜ^(t+T) r(x(τ), u(τ)) dτ + Ṽ(x(t+T)),    (8)

where Ṽ(0) = 0 and T represents the sampling time used in the evaluation process. Hence, Ṽ(x(t)) can be solved in the time domain using Eq. (8), and the control signal u* can be updated based on Eq. (5). Subsequently, the function Ṽ(x(t)) can be estimated by either a linear or a nonlinear function of x(t), so that all unknown parameters can be computed using a simple least-squares algorithm, a gradient descent algorithm, a recursive least-squares algorithm (Vrabie et al., 2009) or a neural network (Bhasin et al., 2013) for parameter estimation. Proofs of stability and guaranteed convergence of the control policy have been provided by Vrabie et al. (2009) for linear systems and by Vamvoudakis and Lewis (2010) for nonlinear systems, under the assumption that the system is stabilizable in the linear case and that the solution of the nonlinear Lyapunov equation is smooth and positive definite in the nonlinear case. The block diagram of the PIA is shown in Fig. 1, with an internal actor and critic subsystem in a state feedback architecture.
Based on Fig. 2, the procedure for PIA implementation is as follows:

1. Assign an initial controller that stabilizes the system.
2. Compute the evaluation function Ṽ(x(t)) numerically in the time domain, with Ṽ(0) = 0.
3. Based on the response of Ṽ(x(t)) in the previous step, estimate all unknown parameters in the evaluation function using the least-squares algorithm (Vrabie et al., 2009) or another parameter estimation technique.
4. Evaluate the norm of the difference between the updated parameter estimates and the previous ones, and compare it to a predefined value ϵ (a convergence test). The PIA is terminated if the norm criterion is satisfied, i.e. the estimated parameters have converged. Otherwise, the controller is updated based on Eq. (5) for policy improvement, and the algorithm returns to Step 2 to compute the evaluation function Ṽ(x(t)) for further policy evaluation.
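The four steps above can be sketched for a scalar linear plant, where the value model reduces to Ṽ(x) = p·x². The snippet is a minimal illustration of integral policy evaluation with least-squares fitting in the spirit of Vrabie et al. (2009), not the oxygenation controller itself; the plant parameters a and b, the cost weights q and r, the initial gain, and the evaluation interval are all illustrative assumptions. Note that the learner uses only measured trajectories, never (a, b).

```python
import numpy as np

# Hypothetical scalar plant x_dot = a x + b u with cost integrand q x^2 + r u^2.
a, b, q, r = 1.0, 1.0, 1.0, 1.0      # acts as the unknown environment
K = 2.0                               # Step 1: initial stabilizing gain, u = -K x
dt, T_eval, n_intervals = 1e-4, 0.1, 10
eps = 1e-6

for iteration in range(20):
    # Step 2: roll out the closed loop and record states and running cost.
    x, t_steps = 1.0, round(T_eval / dt)
    d, y = [], []                     # regression data for the interval equation
    for _ in range(n_intervals):
        x_start, cost = x, 0.0
        for _ in range(t_steps):
            u = -K * x
            cost += (q * x ** 2 + r * u ** 2) * dt
            x += dt * (a * x + b * u)
        # Eq. (8) with V~(x) = p x^2:  p * x(t)^2 = integral cost + p * x(t+T)^2
        d.append(x_start ** 2 - x ** 2)
        y.append(cost)
    # Step 3: least-squares estimate of the value parameter p from p * d = y.
    d, y = np.array(d), np.array(y)
    p = float(d @ y / (d @ d))
    # Step 4: policy improvement via Eq. (5) with grad V~ = 2 p x: u = -(b p / r) x.
    K_new = b * p / r
    if abs(K_new - K) < eps:          # convergence test
        break
    K = K_new

print(K)  # converges toward the LQR-optimal gain of this plant
```

For these illustrative numbers the iterates converge in a few passes to the gain that the Riccati equation would give, which is the behavior the convergence test in Step 4 detects.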
Fig. 1. Block diagram of the policy iteration algorithm with a full state feedback structure.
Fig. 2. Flowchart for the implementation of the policy iteration algorithm.
A flowchart of this algorithm is presented in Fig. 2. A prerequisite for this algorithm is that an initial controller that stabilizes the system exists. In practice, a simple proportional-integral (PI) controller can be used as the controller for the first evaluation of the system performance. The PIA then optimizes the Hamiltonian cost function at each iteration in order to achieve an extremal cost for the underlying nonlinear system. Using these steps, an online implementation can be realized, leading to an optimal controller for the nonlinear system. It is worth noting that the iteration of policy evaluation and policy improvement must be repeated until the estimated unknown parameters (wᵢ) have converged to the optimal result. Therefore, a repeated control implementation must be active in order to update the new control policy and compute further values of the cost function, which is a basic characteristic of this control scheme. The details of the complete algorithm can be found in Vrabie et al. (2009).
4. System identification of the oxygenation model
A model of the cardiopulmonary system was identified using a large animal model based on induced acute lung injury. The mathematical model was identified based on measurements from a female domestic pig weighing 34 kg. The animal received premedication, and was then anesthetized and tracheotomized in supine position. A catheter for the measurement of SaO2 (CeVOX, Pulsion Medical System SE, Feldkirchen, Germany) was inserted into the carotid artery. Under the condition of a stable ventilatory response, the animal underwent mechanical ventilation (SERVO 300, Maquet Critical Care AB, Solna, Sweden). The system configuration is shown in Fig. 3; a medical panel PC (POC-153, Advantech Co., Ltd, Taipei, Taiwan) was used for data acquisition.
During baseline ventilation of the animal, acute respiratory distress syndrome (ARDS) was induced using repetitive lung lavage to wash out surfactant (Lachmann et al., 1980). Warm isotonic saline solution (0.9%) was used during the entire process of ARDS induction. All animal procedures were approved by the local ethics authorities. Surfactant depletion results in increased alveolar surface tension, which facilitates alveolar collapse or partial lung collapse (Ballard-Croft, Wang, Sumpter, & Zhou, 2012).
To implement the system identification, multi-step FiO2 settings were applied to extract the underlying porcine dynamics during induced ARDS. The measured input-output relationship is presented in Fig. 4. We separated the data into two sets: the first part, containing 75% of all measured data, was used for system identification, while the second set was used for validation. System identification was initially carried out under the assumption of a first-order model with time delay (Lüpschen et al., 2009). After estimating the unknown parameters and checking them against the validation data set, the following plant transfer function was derived at a certain operating point (FiO2 = 0.59 and SaO2 = 93.3%). This transfer function, given in Eq. (9), incorporates the dynamics of the mechanical ventilator, the porcine dynamics, and the dynamics of the SaO2 sensor:

G(s) = SaO2(s)/FiO2(s) = 0.153/(17.29s + 1) · e^(−11s).    (9)
It should be noted that a healthy subject normally has SaO2 > 97% in ambient air at standard altitude air pressure. A state of hypoxia can be observed in Fig. 4. The FiO2 value was set higher than that of ambient air (FiO2 = 0.21) in order to maintain the output SaO2 in the range of 85-100%. For this particular system, a time delay of 11 s was identified. For the further implementation of a control system design, the Padé approximation technique was applied to approximate the delay. This is in fact a nonlinear system, but we use only a small operating region and can thus roughly approximate the system by a linear model. The plant transfer function is simplified as follows:
G(s) = 0.153/(17.29s + 1) · (−s³ + 1.091s² − 0.4959s + 0.09016)/(s³ + 1.091s² + 0.4959s + 0.09016).    (10)
This mathematical model is used for the design of the PIA controller for controlling SaO2. Closed-loop ventilation is formed based on oxygen therapy in order to optimally adjust the controlling input FiO2 for this hypoxic cardiopulmonary system.
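The delay-polynomial coefficients of Eq. (10) can be checked against the standard [3/3] Padé table of the exponential, e^(−x) ≈ (1 − x/2 + x²/10 − x³/120)/(1 + x/2 + x²/10 + x³/120). The sketch below reconstructs the normalized polynomials for the 11 s delay; only the delay θ and the normalization convention are taken from the text.

```python
import numpy as np

# Third-order Pade approximation of the 11 s delay, e^(-theta*s) ~ N(s)/D(s).
theta = 11.0

# Denominator coefficients in descending powers of s, from the [3/3] Pade
# table of exp(-x) with x = theta*s, normalized to a monic polynomial
# (the convention used in Eq. (10)).
den = np.array([theta ** 3 / 120, theta ** 2 / 10, theta / 2, 1.0])
den = den / den[0]

# The numerator is the same polynomial with alternating signs (an all-pass).
num = den * np.array([-1.0, 1.0, -1.0, 1.0])

print(den)  # -> [1, 1.091, 0.4959, 0.09016] (rounded), matching Eq. (10)
print(num)
```

This confirms that the numerator and denominator of the delay factor in Eq. (10) form a unity-magnitude all-pass with a right-half-plane zero, which is the source of the non-minimum-phase behavior discussed in Section 5.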
The frequency responses are also provided in Fig. 5 for both the system identification data set and the validation data set. On the system identification data set, the model represents the measured data well in the least-mean-squares sense of the parameter estimation.
Alternative models up to higher order have also been studied, with the model evaluation based on the mean squared error (MSE). The summary of the model evaluation is given in Table B1. According to the results in Table B1 in the appendix, the first-order model with time delay based on the Padé approximation (Eq. (10)) gives the best prediction in terms of MSE on the model validation data set. Therefore, this model is used for the further simulation of closed-loop oxygen therapy.
5. Simulation of the closed-loop oxygen therapy
Based on the identified model of Eq. (10), the derived transfer function serves as an unknown model of the pathophysiology of the pulmonary system for this simulation of closed-loop oxygen therapy. It therefore replaces the "System" block of the diagram previously given in Fig. 4. The full procedure for implementing the PIA as given in the flowchart is then carried out with an initial
Fig. 3. Block diagram of an oxygen therapy system based on a porcine model of induced ARDS.
Fig. 4. Input-output response for system identification based on a porcine model of induced ARDS (Lüpschen et al., 2009).
proportional-integral (PI) controller with parameters Kp = 1 and Ki = 0.5 in order to stabilize the system. Based on this predetermined control structure, the simulation can be set up, and all system states and the control input u(t) are recorded with a fixed sampling time of 100 ms. A cost function V(x) based on Eq. (2) can be computed by setting Q(x) = I and R = 1, regardless of the system model. The evaluation function Ṽ(x) can thereafter be calculated based on Eq. (8). Subsequently, ∇Vₓ* can be computed for a control policy, as given in Eq. (5). The synthesized control action is iteratively implemented until the converged optimal solution is achieved.
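As a rough sketch of this initial closed loop, the plant of Eq. (10) can be simulated under a PI controller with Kp = 1 and Ki = 0.5 using SciPy. The plant polynomials are taken directly from Eq. (10); the simulation horizon and grid are assumptions, and zero initial conditions are assumed throughout.

```python
import numpy as np
from scipy import signal

# Plant of Eq. (10): first-order lag in series with the third-order Pade all-pass.
num_G = 0.153 * np.array([-1.0, 1.091, -0.4959, 0.09016])
den_G = np.polymul([17.29, 1.0], [1.0, 1.091, 0.4959, 0.09016])

# PI controller C(s) = Kp + Ki/s = (Kp s + Ki)/s with Kp = 1, Ki = 0.5.
num_C, den_C = [1.0, 0.5], [1.0, 0.0]

# Open loop L = C*G; unity-feedback closed loop T = L / (1 + L).
num_L = np.polymul(num_C, num_G)
den_L = np.polymul(den_C, den_G)
num_T = num_L
den_T = np.polyadd(den_L, num_L)

t = np.linspace(0.0, 2000.0, 20001)
t, y = signal.step(signal.TransferFunction(num_T, den_T), T=t)

# The integral action forces zero steady-state error for the unit step,
# while the right-half-plane zero causes an initial inverse response.
print(y[-1])
```

The slow, lightly damped settling of this response is consistent with the roughly 600 s settling time reported for the initial PI controller in Fig. 6.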
For the PIA control system design, the evaluation function Ṽ(x) is set up to approximate Ṽ(x(t)) based on four different states, namely x₁, x₂, x₃ and x₄, where the time delay is accounted for by a third-order Padé approximation. It is defined as follows:

Ṽ(x₁, x₂, x₃, x₄) = w₁x₁² + w₂x₁x₂ + w₃x₁x₃ + w₄x₁x₄ + w₅x₂² + w₆x₂x₃ + w₇x₂x₄ + w₈x₃² + w₉x₃x₄ + w₁₀x₄²,    (11)

where {wᵢ : i ∈ ℕ⁺ and i ≤ 10} is the set of unknown parameters, whose basis functions are linearly independent. This equation is the simplest parametric form, a second-order surface in the four states, for fitting the evaluation function, and it requires further mathematical manipulation for a policy update. A higher-order surface could alternatively be applied for a better fit, but this was not pursued due to the potential danger of overfitting.
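The least-squares estimation of the weights in Eq. (11) (Step 3 of the algorithm) can be sketched as follows. The sampled states and value targets below are synthetic, generated from an arbitrary known weight vector purely to illustrate the regression; only the ten basis functions are taken from Eq. (11).

```python
import numpy as np

def quad_features(x):
    """The ten second-order basis functions of Eq. (11) for a 4-state vector:
    x1^2, x1x2, x1x3, x1x4, x2^2, x2x3, x2x4, x3^2, x3x4, x4^2."""
    x1, x2, x3, x4 = x
    return np.array([x1 * x1, x1 * x2, x1 * x3, x1 * x4,
                     x2 * x2, x2 * x3, x2 * x4,
                     x3 * x3, x3 * x4, x4 * x4])

# Synthetic data for illustration: value samples generated from known weights.
rng = np.random.default_rng(0)
w_true = np.array([6.82, 8.45, 14.47, 8.35, 10.52,
                   18.30, 9.63, 14.18, 15.81, 14.32])
X = rng.normal(size=(200, 4))                    # 200 sampled state vectors
Phi = np.array([quad_features(x) for x in X])    # 200 x 10 regressor matrix
V = Phi @ w_true                                 # noiseless evaluation samples

# Least-squares fit: find w minimizing || Phi w - V ||^2.
w_hat, *_ = np.linalg.lstsq(Phi, V, rcond=None)
print(np.round(w_hat, 2))
```

In the actual algorithm, the targets V would come from the interval evaluations of Eq. (8) along measured trajectories rather than from known weights, and the regression would be repeated at every policy iteration.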
Based on the animal model of induced acute lung injury provided in Eq. (9) (Lüpschen et al., 2009), we assumed that all internal states are accessible for the state feedback configuration of the PIA controller. This is a drawback: if the system states are not physically accessible, the further design of a state observer is definitely required (Lu & Ho, 2014). With this strong assumption for the simulation, the stable converged solution of wᵢ at the third iteration, using the initial PI controller, is w₁ = 6.82, w₂ = 8.45, w₃ = 14.47, w₄ = 8.35, w₅ = 10.52, w₆ = 18.30, w₇ = 9.63, w₈ = 14.18, w₉ = 15.81, and w₁₀ = 14.32. The evolution of wᵢ is given in the Appendix (Fig. A1).
Fig. 6 shows the control performance for a step response (y_ref(t) = 1) at the initial iteration, with a stable result for the control of oxygenation.
The control performance at the third iteration is also shown, with the optimal result for the control of oxygenation, in Fig. 7: all states are driven to zero with a settling time of about 60 s, compared to about 600 s for the initial PI controller, as shown in Fig. 6.
This simulation shows non-minimum-phase behavior at the onset of control, in which the output y(t), or SaO2, initially responds in the reverse direction. This is due to the delay effect, because of the positive zero introduced by the Padé approximation. Furthermore, the optimal response can be achieved after application of the PIA at the third iteration in terms of cost function and
Fig. 5. Comparison of Bode diagrams for the identified system model and the measured data set.
Fig. 6. Control performance at the initial iteration for the control of oxygenation.
Fig. 7. Comparative control performance of the policy iteration algorithm for the control of oxygenation.