Control Engineering Practice ∎ (∎∎∎∎) ∎∎∎–∎∎∎
Control Engineering Practice
journal homepage: www.elsevier.com/locate/conengprac
Optimal learning control of oxygen saturation using
a policy iteration algorithm and a proof-of-concept
in an interconnecting three-tank system
Anake Pomprapa, Steffen Leonhardt, Berno J.E. Misgeld
Philips Chair for Medical Information Technology, RWTH Aachen University, Aachen, Germany
Article info

Article history: Received 30 November 2015; received in revised form 18 June 2016; accepted 25 July 2016.

Abstract
In this work, a policy iteration algorithm (PIA) is applied for controlling arterial oxygen saturation without requiring a mathematical model of the plant. The technique is based on nonlinear optimal control, solving the Hamilton–Jacobi–Bellman equation. The controller is synthesized in a state feedback configuration based on an unidentified model of the complex pathophysiology of the pulmonary system, in order to control gas exchange in ventilated patients; under some circumstances (such as emergency situations), a proper, individualized model for designing and tuning controllers may not be available in time. The simulation results demonstrate optimal control of oxygenation based on the proposed PIA by iteratively evaluating the Hamiltonian cost functions and synthesizing the control actions until the converged optimal criteria are achieved. Furthermore, as a practical example, we examined the performance of this control strategy using an interconnecting three-tank system as a real nonlinear system.
© 2016 Elsevier Ltd. All rights reserved.
Keywords:
Policy iteration algorithm
Optimal control
Reinforcement learning
Control of oxygen saturation
Biomedical control system
Closed-loop ventilation
1. Introduction
Hypoxia, or oxygen deficiency, is a common consequence of
respiratory insufficiency. If a patient with hypoxia is not treated properly and in time, the prolonged state of impaired oxygenation can
lead to potentially lethal conditions, such as cerebral hypoxia,
cardiac malfunction, or multiple organ failure. The most effective
therapy is to supply the patient with an increased oxygen fraction
in the ventilated air, which is referred to as oxygen therapy (Tarpy & Celli, 1995). A measure of the resulting oxygen content in the body, such as peripheral oxygen saturation (SpO2), arterial oxygen saturation (SaO2) or the arterial partial pressure of oxygen (PaO2), can be used as the controlled output in a negative feedback configuration. A single-input single-output (SISO) system can thus be formulated, and closed-loop control of oxygen therapy can be realized in clinical scenarios, which are unique, complex, and life-threatening.
(Corresponding author: A. Pomprapa. E-mail address: [email protected].)

To our knowledge, closed-loop ventilation for oxygen therapy developed gradually (Brunner, 2002) and was dependent on progress in control engineering. Early publications date back to 1975: this first paper describes the control of SpO2 (positioned at the ear) with 'bang–bang' control of the inspired oxygen fraction (FiO2) as the controlling input, with limit cycles observed in anesthetized dogs (Mitamura, Mikami, & Yamamoto, 1975). In
1985, a linear quadratic regulator (LQR) was shown for the dual
control of oxygen and carbon dioxide (Giard, Bertrand, Robert, &
Pernier, 1985). In 1987, a multiple-model adaptive control (MMAC)
was applied for the control of SpO2 in mongrel dogs and the results were compared with a proportional integral (PI) controller
(Yu et al., 1987). In 1991, the multivariable inputs of FiO2 and positive end-expiratory pressure (PEEP) were applied to control PaO2
using a proportional–integral–derivative (PID) controller in four
mongrel dogs (East, Tolle, MCJames, Farrell, & Brunner, 1991). In
2004, a non-linear adaptive neuro-fuzzy inference system (ANFIS)
model was used to estimate shunt in combination with dynamic
changes of blood gases for controlling PaO2 (Kwok, Linkens,
Mahfouf, & Mills, 2004). Further simulation of septic patients was
carried out based on a hybrid knowledge/model-based advisory
control. Recently, feedback-oriented oxygen therapy was presented in preterm infants, where the controlled variable was SpO2
(Claure & Bancalari, 2009). In addition, based on the works in our
research group, a proportional–integral (PI) controller with gain
scheduling (Walter et al., 2009), a Smith predictor with an internal
PI controller (Lüpschen, Zhu, & Leonhardt, 2009), a knowledge-based controller (Pomprapa, Misgeld, Lachmann, & Leonhardt,
2013), and, potentially, a self-tuning adaptive controller (Pomprapa, Pikkemaat, Lüpschen, Lachmann, & Leonhardt, 2010) and a
funnel controller (Pomprapa, Alfocea, Göbel, Misgeld, & Leonhardt,
2014) can be used for this particular control problem. Moreover, in
2014, SpO2 between 92% and 94% was targeted based on the
http://dx.doi.org/10.1016/j.conengprac.2016.07.014
0967-0661/© 2016 Elsevier Ltd. All rights reserved.
Please cite this article as: Pomprapa, A., et al. Optimal learning control of oxygen saturation using a policy iteration algorithm and a
proof-of-concept in an interconnecting three-tank system. Control Engineering Practice (2016), http://dx.doi.org/10.1016/j.
conengprac.2016.07.014i
automatic adjustment of FiO2 using a step change approach (Saihi
et al., 2014). Based on a literature search, three different categories of control system design were found; their advantages and disadvantages are given below.
A model-based approach is the traditional design procedure in the control community. Generally, it requires a mathematical description of the system behavior. The difficulty of this approach lies in selecting an appropriate model structure for this particular problem and in dealing with model uncertainties arising from varying conditions of the system. Therefore, self-tuning adaptive control was introduced for this control problem (Pomprapa et al., 2010). With this model-based approach, system performance and further corrective actions can be pre-analyzed.
A model-free approach requires neither prior knowledge of the
system nor system identification, which is a major advantage.
Therefore, the time-consuming process for determining an individual model can be significantly reduced, which is certainly
beneficial for a patient with severe hypoxia, who requires immediate therapeutic action. Hence, this approach is advantageous in real clinical scenarios. Examples of proposed control techniques are a knowledge-based controller (Pomprapa, Misgeld, Lachmann et al., 2013) and a funnel controller (Pomprapa, Alfocea, et al., 2014). Nevertheless, the drawback is that experience is required to achieve good performance.
An intelligent hybrid approach combining model-based and model-free methods (Kwok et al., 2004) uses a knowledge- and model-based advisory system for intensive care ventilation. It shares the strengths and weaknesses of both the model-based and the model-free approach. The design of this approach was based on an adaptive neuro-fuzzy inference system (ANFIS), which is intended to closely resemble human mental operation. However, neither animal experiments nor clinical results have yet been reported.
For best patient benefits, we employ a model-free approach in
this work. Therefore, we examine the application of a generally
known control strategy to the control of oxygenation based on
SaO2 measurement. For the first time (first results on this new
approach have been prepublished in Pomprapa, Mir Wais, Walter,
Misgeld, & Leonhardt, 2015), we propose to use a reinforcement
learning optimal control method, called the policy iteration algorithm (PIA) (Bhasin et al., 2013; Vamvoudakis & Lewis, 2010;
Vrabie, Pastravanu, Abu-Khalaf, & Lewis, 2009), for this particular
application. This control strategy should provide an optimal solution for not only treating hypoxia, but also avoiding hyperoxia
(excess oxygen in the lungs) (Clark, 1974; Kallet & Matthay, 2013; Mach, Thimmesch, & Pierce, 2011). The PIA is classified as reinforcement learning within the actor–critic architecture framework, which conventionally learns behavior through interactions with a dynamic environment; a survey of the development of reinforcement learning can be found in Kaelbling, Littman, and Moore (1996). This particular approach improves the effectiveness of learning using a policy gradient for a stochastic search for the optimal result. The technique is based on a
recursive two-step iteration, namely policy improvement and
policy evaluation. These steps are repeated iteratively until
achieving the stopping criteria for the converged optimal solution
(Vamvoudakis & Lewis, 2010; Vrabie et al., 2009). Practical implementations of PIA control in other areas include optimal load-frequency control in a power plant (Wang, Zhou, & Wen, 1993), a
DC–DC converter (Wernrud, 2007), adaptive steering control of a
tanker ship (Xu, Hu, & Lu, 2007), and a hybrid system of a jumping
robot (Suda & Yamakita, 2013). This motivates applying this modern control strategy to biomedical applications and, specifically, to closed-loop ventilation therapy. Computer simulations
are used to investigate the feasibility of this controller in oxygen
therapy. Subsequently, a proof-of-concept is implemented in a real
application for a nonlinear interconnecting three-tank system in
order to control the water level in the last tank. In fact, the dynamics of this three-tank system resembles that of a patient with
respiratory deficiency, i.e. nonlinear and with time-delay behavior.
In this article, Sections 2.1 and 2.2 provide the statement of the medical perspective and the formulation of the optimal control problem. Section 3 presents the control system design using the PIA for nonlinear optimal control based on the Hamilton–Jacobi–Bellman (HJB) equation. System identification of the cardiopulmonary system, which yields a mathematical model for the further design of the PIA controller, and a simulation of this control strategy are presented in Sections 4 and 5, respectively. A practical example of this
controller is presented in Section 6 for controlling the water level
in a nonlinear interconnecting three-tank system. Section 7 presents a discussion, and our conclusions are presented in Section 8.
2. Problem formulation
2.1. Statement of the medical perspective
The overall transfer function from the settings of inspired
oxygen fractions FiO2 to the oxygen saturation SaO2 measured in
arterial blood depends on many factors, including lung function
and blood transport (which then depends on cardiac output, state
of circulation, etc.). Let G(s) = SaO2(s)/FiO2(s) be the transfer function describing the system under investigation. G(s) certainly is
nonlinear (e.g. due to the nonlinear saturation function, see West,
2011) and depends on many unknown and time-variant conditions
(e.g., fluid status, heart function, or various diseases). Often, for
the individual patient requiring artificial ventilation, this model is
not available. Furthermore, as there usually is no time for individual model selection and parameter identification, this has
been the motivation to investigate the performance of control algorithms which do not require explicit models.
As an illustrative example and in order to roughly understand
the dynamics, we provide the transfer function G(s) of a female
domestic pig weighing 34 kg, which we obtained during a laboratory experiment (Lüpschen et al., 2009). In this special case,
respiratory distress was induced by repeated lung lavage (ARDSmodel, acute respiratory distress syndrome; Ashbaugh, Bigelow,
Petty, & Levine, 1967; Lachmann, Robertson, & Vogel, 1980).
2.2. Optimal control problem
We consider a specific class of nonlinear systems, namely affine nonlinear systems of the following form:

\[ \dot{x}(t) = f(x) + g(x)\,u(t), \qquad (1) \]

where x(t) ∈ χ ⊆ ℝⁿ denotes the state of the system in a vector form of dimension n, u(t) ∈ υ ⊆ ℝ represents the control input (or FiO2), and the right-hand side f(x) + g(x)u : χ × υ → ℝⁿ is Lipschitz continuous on χ × υ, such that the state vector x(t) is unique for a given initial condition x0. Initially, it is assumed that the system is stabilizable. This type of model has been used to describe the dynamics of various plants, for example a robot manipulator (Sun, Sun, Li, & Zhang, 1999), a continuous stirred-tank reactor (CSTR) (Kamalabady & Salahshoor, 2009) and a non-interconnecting three-tank system (Orani, Pisano, Franceschelli, Giua, & Usai, 2011).
Let us consider a cost function V(x), given by Eq. (2), which is to be minimized:

\[ V(x) = \int_0^{T_\infty} r(x(\tau), u(\tau))\,d\tau, \qquad (2) \]

where r(x, u) = Q(x) + uᵀRu. Here, Q(x) ∈ ℝ is positive definite and continuously differentiable (i.e., Q(x) = 0 if x = 0, and Q(x) > 0 for all x ≠ 0), R ∈ ℝ represents a positive-definite matrix for a penalty or weighting of the control input, and T∞ denotes a time when the states are sufficiently close to zero.
The Hamiltonian of this control problem can be defined by

\[ H(x, u, \nabla_x V^{*}) = r(x(t), u(t)) + \nabla_x V^{*T}(x)\big(f(x(t)) + g(x(t))u(t)\big), \qquad (3) \]

where the gradient ∇ₓV* satisfies the following HJB equation and ∇ₓ symbolizes the partial derivative with respect to x. We note that

\[ \min_u H(x, u, \nabla_x V^{*}) = 0. \qquad (4) \]

The optimal solution is provided by

\[ u^{*} = -\tfrac{1}{2} R^{-1} g^{T}(x)\, \nabla_x V^{*}(x), \qquad (5) \]

which is the policy improvement step (Ohtake & Yamakita, 2010). For a linear continuous-time system, the HJB equation transforms into the well-known Riccati equation (Vamvoudakis, Vrabie, & Lewis, 2009), and the converged solution is equivalent to the result of a linear–quadratic regulator (LQR) controller. If the optimal control policy (Eq. (5)) is substituted into the nonlinear Lyapunov equation, the HJB equation can be derived in terms of the Hamiltonian cost function (Vamvoudakis & Lewis, 2010).

For this particular control problem, we insert the optimal solution of Eq. (5) into Eq. (3) with the condition provided in Eq. (4). It yields
\[ 0 = Q(x) + u^{*T} R u^{*} + \nabla_x V^{*T}(x)\big(f(x(t)) + g(x(t))u^{*}\big). \qquad (6) \]

This equation can be rewritten as

\[ 0 = Q(x) + \nabla_x V^{*T}(x) f(x) - \tfrac{1}{4} \nabla_x V^{*T}(x)\, g(x) R^{-1} g^{T}(x)\, \nabla_x V^{*}(x), \qquad (7) \]

where V*(0) = 0; this is a formulation of the HJB equation. The aim of implementing optimal control is to solve the HJB equation (Eq. (7)) in order to find the Hamiltonian cost function and substitute it into Eq. (5) for the control policy. However, owing to its nonlinear nature, it is generally difficult to find the solution of the HJB equation. In this article, the policy iteration algorithm is introduced for the optimal control of oxygen saturation based on the animal model of induced acute respiratory distress syndrome (ARDS), which is given in the subsequent sections. The iterative process is therefore introduced for solving this optimal control problem (Vamvoudakis & Lewis, 2010; Vrabie et al., 2009).

Fig. 1. Block diagram of the policy iteration algorithm with full state feedback structure.

3. Policy iteration algorithm

The evaluation function of Eq. (2) can be modified into the following form:

\[ \tilde{V}(x(t)) = \int_t^{t+T} r(x(\tau), u(\tau))\,d\tau + \tilde{V}(x(t+T)), \qquad (8) \]

where Ṽ(0) = 0 and T represents the sampling time used in the evaluation process. Hence, Ṽ(x(t)) can be solved in the time domain using Eq. (8), and the control signal u* can be updated based on Eq. (5). Subsequently, Ṽ(x(t)) can be approximated by either a linear or a nonlinear function of x(t), so that all unknown parameters can be computed using a simple least-squares algorithm, a gradient-descent algorithm, a recursive least-squares algorithm (Vrabie et al., 2009) or a neural network (Bhasin et al., 2013) for parameter estimation. The proof of stability and guaranteed convergence of the control policy has been provided by Vrabie et al. (2009) for a linear system and by Vamvoudakis and Lewis (2010) for a nonlinear system, under the assumption that the system is stabilizable in the linear case and that the solution of the nonlinear Lyapunov equation is smooth and positive definite in the nonlinear case. The block diagram of the PIA is shown in Fig. 1, with an internal actor and critic subsystem in a state feedback architecture.

Based on Fig. 2, the procedure for PIA implementation is as follows:

1. Assign an initial controller that can stabilize the system.
2. Compute the evaluation function Ṽ(x(t)) numerically in the time domain, with Ṽ(0) = 0.
3. Based on the response of Ṽ(x(t)) in the previous step, estimate all unknown parameters in the evaluation function using the least-squares algorithm (Vrabie et al., 2009) or other parameter estimation techniques.
4. Evaluate the norm of the difference between the updated parameter estimates and the previous estimates and compare it to a predefined value ϵ (a convergence test). The PIA terminates if the norm is below ϵ, i.e., the estimated parameters have converged. Otherwise, update the controller based on Eq. (5) for policy improvement and return to Step 2 to compute the evaluation function Ṽ(x(t)) for further policy evaluation.

Fig. 2. Flowchart for the implementation of the policy iteration algorithm.
A flowchart of this algorithm is presented in Fig. 2. A prerequisite for this algorithm is the existence of an initial controller that can stabilize the system. In practice, a simple proportional–integral (PI) controller can be used as the controller for the first evaluation of the system performance. The PIA then optimizes the Hamiltonian cost function at each iteration in order to achieve an extremal cost for the underlying nonlinear system. Using these steps, online implementation can be realized, leading to an optimal controller for the nonlinear system. It is worth noting that the iteration of policy evaluation and policy improvement is repeated until the estimated unknown parameters (ω) converge to the optimal result. Therefore, repeated control implementation must remain active in order to update the new control policy and compute further values of the cost function, which is a basic characteristic of this control scheme. The details of the complete algorithm can be found in Vrabie et al. (2009).
4. System identification of the oxygenation model
A model of the cardiopulmonary system is identified using a
large animal model based on induced acute lung injury. The
mathematical model was identified based on measurements from
a female domestic pig weighing 34 kg. The animal received premedication, and was then anesthetized and tracheotomized in
supine position. A catheter for measurement of SaO2 (CeVOX,
Pulsion Medical System SE, Feldkirchen, Germany) was inserted
into the carotid artery. Under the condition of stable ventilatory
response, the animal underwent mechanical ventilation (SERVO
300, Marquet Critical Care AB, Solna, Sweden). The system configuration is shown in Fig. 3; a medical panel PC (POC-153, Advantech Co., Ltd, Taipei, Taiwan) was used for data acquisition.
During baseline ventilation of the animal, acute respiratory
distress syndrome (ARDS) was induced using repetitive lung lavage for washout of surfactant (Lachmann et al., 1980). Warm
isotonic saline solution (0.9%) was used during the entire process of
ARDS induction. All animal procedures were approved by the local
ethical authorities. Surfactant depletion results in increased alveolar surface tension, which facilitates alveolar collapse or partial
lung collapse (Ballard-Croft, Wang, Sumpter, & Zhou, 2012).
To implement system identification, multi-step FiO2 settings
were given to extract the underlying porcine dynamics during
induced ARDS. Measurement of the input–output relationship is
presented in Fig. 4. We separated the data into two data sets. The
first part, containing 75% of all measured data, was used for system
identification, while the second data set was used for validation.
System identification was initially carried out with the assumption
of a first-order model with time delay (Lüpschen et al., 2009).
After estimating the unknown parameters with a validation data
set, the following plant transfer function was derived at a certain
operating point (FiO2 = 0.59 and SaO2 = 93.3%). This transfer function, given in Eq. (9), incorporates the dynamics of the mechanical
ventilator, the porcine dynamics, and the dynamics of the SaO2
sensor:
Fig. 4. Input–output response for system identification based on a porcine model of induced ARDS (Lüpschen et al., 2009).

\[ G(s) = \frac{SaO_2(s)}{FiO_2(s)} = \frac{0.153}{17.29s + 1} \cdot e^{-11s} \qquad (9) \]
It should be noted that a healthy subject normally has SaO2>
97% in ambient air at standard altitude air pressure. A state of
hypoxia can be observed in Fig. 4. The FiO2 value was set higher than that of ambient air (FiO2 = 0.21) in order to maintain the output SaO2 in the range of 85–100%. For this particular system, a
time delay of 11 s is identified. For further implementation of the control system design, the Padé approximation technique was applied to approximate the delay. The plant is in fact nonlinear, but we operate only in a small region and can thus roughly approximate the system by a linear model. The plant transfer function is simplified as follows:
\[ G(s) = \frac{0.153}{17.29s + 1} \cdot \frac{-s^3 + 1.091s^2 - 0.4959s + 0.09016}{s^3 + 1.091s^2 + 0.4959s + 0.09016} \qquad (10) \]
This mathematical model is used for a design of the PIA controller
for controlling SaO2. Closed-loop ventilation is formed based on
oxygen therapy in order to optimally adjust the controlling input
FiO2 for this hypoxic cardiopulmonary system.
The frequency responses are provided in Fig. 5 for both the system identification data set and the validation data set. For the system identification data set, the model represents the measured data well in the least-squares sense used for parameter estimation.
Alternative models up to higher order have also been studied, with model evaluation based on the mean squared error (MSE). Summaries of the model evaluation are given in Table B1 in the appendix. According to these results, the first-order model with time delay based on the Padé approximation (Eq. (10)) gives the best prediction in terms of MSE for the model validation data set. Therefore, this model will be used for further simulation of closed-loop oxygen therapy.
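Eq. (10) can be checked numerically by composing the first-order lag with the Padé polynomials. The sketch below (assuming scipy) reproduces both the DC gain of 0.153 and the initial undershoot caused by the positive zeros of the Padé approximation.

```python
import numpy as np
from scipy import signal

# First-order lag 0.153/(17.29 s + 1) from Eq. (9)
lag_num, lag_den = [0.153], [17.29, 1.0]
# Third-order Pade approximation of e^{-11 s}, coefficients as in Eq. (10)
pade_num = [-1.0, 1.091, -0.4959, 0.09016]
pade_den = [1.0, 1.091, 0.4959, 0.09016]

# Series connection: polynomial products of numerators and denominators
G = signal.TransferFunction(np.polymul(lag_num, pade_num),
                            np.polymul(lag_den, pade_den))

t, y = signal.step(G, T=np.linspace(0.0, 150.0, 3000))
print(y.min() < 0.0)           # non-minimum-phase undershoot from the Pade zeros
print(round(float(y[-1]), 3))  # settles at the DC gain of 0.153
```

The brief reverse response at the start of the step is the same non-minimum-phase behavior discussed for the closed-loop simulation in Section 5.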
Fig. 3. Block diagram of an oxygen therapy system based on a porcine model of induced ARDS.

5. Simulation of the closed-loop oxygen therapy

Based on the identified model of Eq. (10), the derived transfer function serves as an unknown model of the pathophysiology of the pulmonary system for this simulation of closed-loop oxygen therapy; it replaces the 'System' block of the block diagram given previously in Fig. 3. The full procedure for implementing the PIA as given in the flowchart is then carried out with an initial
Fig. 5. Comparison of Bode diagram for the identified system model and the measured data set.
proportional–integral (PI) controller with parameters Kp = 1 and Ki = 0.5 in order to stabilize the system. Based on this predetermined control structure, the simulation can be set up, and all system states and the control input u(t) are recorded with a fixed sampling time of 100 ms. A cost function V(x) based on Eq. (2) can be computed by setting Q(x) = I and R = 1, regardless of the system model. The evaluation function Ṽ(x) can thereafter be calculated based on Eq. (8). Subsequently, ∇ₓV* can be computed for a
control policy, as given in Eq. (5). The synthesized control action is
iteratively implemented until achieving the converged optimal
solution.
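A minimal sketch of this bootstrapped policy evaluation (Eq. (8)), assuming Q(x) = I, R = 1 and the 100 ms sampling time stated above; the recorded trajectory here is synthetic, for illustration only.

```python
import numpy as np

dt = 0.1                                           # 100 ms sampling time
t = np.arange(0.0, 60.0, dt)
x = np.exp(-t / 10.0)[:, None] * np.ones((1, 4))   # synthetic decaying states
u = -0.5 * x[:, 0]                                 # synthetic recorded control signal

# Running cost r(x, u) = x'Qx + u'Ru with Q = I and R = 1
r = np.sum(x ** 2, axis=1) + u ** 2

# Eq. (8): V(x(t)) = integral over [t, t+T] of r, plus V(x(t+T)),
# swept backward from the end of the record where the states are near zero
horizon = 50                                       # evaluation horizon T = 5 s
V = np.zeros(t.size)
for k in range(t.size - 2, -1, -1):
    kT = min(k + horizon, t.size - 1)
    V[k] = np.sum(r[k:kT]) * dt + V[kT]

print(V[0] > V[-1] >= 0.0)  # cost-to-go shrinks along the decaying trajectory
```

The resulting samples of V(x(t)) are exactly what Step 3 of the procedure feeds into the least-squares fit of the evaluation function's parameters.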
For the control system design of the PIA, the evaluation function Ṽ(x) is set up for the approximation of Ṽ(x(t)) based on four states, namely x1, x2, x3 and x4, where the time delay was modeled by a third-order Padé approximation. It is defined as follows:

\[ \tilde{V}(x_1, x_2, x_3, x_4) = w_1 x_1^2 + w_2 x_1 x_2 + w_3 x_1 x_3 + w_4 x_1 x_4 + w_5 x_2^2 + w_6 x_2 x_3 + w_7 x_2 x_4 + w_8 x_3^2 + w_9 x_3 x_4 + w_{10} x_4^2, \qquad (11) \]

where {wi : i ∈ I and i ≤ 10} is the set of unknown parameters, which multiply linearly independent basis functions. This equation is the simplest parametric form, a quadratic surface in the four states, for fitting the evaluation function, and it requires further mathematical manipulation for a policy update. A higher-order surface could alternatively be applied for a better fit, but this has not been pursued due to the potential danger of overfitting.
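Fitting the ten weights of Eq. (11) is an ordinary least-squares problem once the quadratic monomials are collected into a regressor. In this sketch, both the state samples and the "true" weights are invented purely to exercise the fit.

```python
import numpy as np

def quad_features(x):
    """Monomials x_i * x_j (i <= j) of a 4-state vector, matching Eq. (11)."""
    x1, x2, x3, x4 = x
    return np.array([x1 * x1, x1 * x2, x1 * x3, x1 * x4,
                     x2 * x2, x2 * x3, x2 * x4,
                     x3 * x3, x3 * x4, x4 * x4])

# Hypothetical state samples and the weights that generated the targets
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))       # 50 sampled state vectors
w_true = np.arange(1.0, 11.0)      # invented "true" weights w1..w10
y = np.array([quad_features(x) @ w_true for x in X])

# Step 3 of the procedure: least-squares estimate of the 10 unknown weights
Phi = np.array([quad_features(x) for x in X])
w_hat, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print(np.allclose(w_hat, w_true))  # True
```

With noiseless targets and 50 samples of 10 features, the regressor is full rank and the weights are recovered exactly; on measured data the same solve returns the best quadratic fit in the least-squares sense.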
Based on the animal model of induced acute lung injury provided in Eq. (9) (Lüpschen et al., 2009), we assumed that all internal states are accessible for the state feedback configuration of the PIA controller. This is a drawback: if the system states are not physically accessible, the further design of a state observer is required (Lu & Ho, 2014). With this strong assumption for the simulation, the stable converged solution of wi at the third iteration, using the initial PI controller, is w1 = 6.82, w2 = 8.45, w3 = 14.47, w4 = 8.35, w5 = 10.52, w6 = 18.30, w7 = 9.63, w8 = 14.18, w9 = 15.81, and w10 = 14.32. The evolution of wi is given in the Appendix (Fig. A1).
Fig. 6 shows the control performance for a step response (yref(t) = 1) at the initial iteration, with a stable result for the control of oxygenation. The control performance at the third iteration, with the optimal result for the control of oxygenation, is shown in Fig. 7: all states are driven to zero with a settling time of about 60 s, compared to about 600 s for the initial PI controller shown in Fig. 6.
Fig. 6. Control performance at an initial iteration for the control of oxygenation.
Fig. 7. Comparative control performance of policy iteration algorithm for the
control of oxygenation.
This simulation shows non-minimum-phase behavior at the onset of control, in which the output y(t), or SaO2, initially responds in the reverse direction. This is due to the delay effect, i.e., the positive zeros introduced by the Padé approximation. Furthermore, the optimal response is achieved after application of the PIA at the third iteration in terms of the cost function and
an improved steady-state time constant. From a practical perspective, in a real patient a step reference must be applied iteratively, multiple times, for policy improvement and policy evaluation until the stopping criteria for optimal control are met. In general, a patient with acute lung injury has poor oxygenation (SaO2 < 88%) and the oxygenation goal of standard ARDSNet ventilation therapy (Pomprapa, Schwaiberger et al., 2014) is to maintain SaO2 = 88–95%. Therefore, there are plenty of SaO2 targets for a step reference during a treatment that is compliant with this standardized ventilation protocol.
6. Practical application
To implement the PIA controller in a real application, an interconnecting three-tank system was set up with the configuration
shown in Fig. 8. Pressure sensors (RPUP001G2A, Sensortechnics
GmbH, Germany) were installed at the bottom of the tanks to
estimate the height of the fluid. A submersible pump (Barwig 0111,
Barwig Wasserversorgung, Germany) was used as an actuator to
draw the fluid from a reservoir into Tank 1 with a maximum flow
of 18 l/min. Due to the interconnection structure, fluid flows from
Tank 1 to Tank 2 and Tank 3, successively. The fluid in Tank 3 is
then drained back to the reservoir. The overall system is operated
under a dSPACE platform (MicroAutoBox II, dSPACE GmbH, Germany), with three sensor inputs corresponding to the heights of the fluid in each tank (x1, x2, x3) and one controllable current output to the pump, associated with the adjustable flow into Tank 1 (Q1).
A closed-loop system was formed with all accessible states, namely the heights of the fluid in the tanks (x1, x2, x3). The aim was to control the height x3 of Tank 3 with the adjustable flow Q1 as the controlling input to the system, where A0 represents the cross-sectional area of each tank. All states are used for the synthesis of the controller using the PIA for this nonlinear interconnecting three-tank system.
This three-tank system is proposed here as a practical example due to its resemblance to the blood circulation system in a human body. The submersible pump represents the heart as an energy source supplying fluid, or blood, to the whole system. The submersible pump is regulated by the dSPACE control unit, which takes over the function of the brain. Tank 1, Tank 2 and Tank 3 symbolize the blood store in tissue, the venous blood store and the arterial blood store, respectively (Kron, Böhm, & Leonhardt, 1999). This particular system has the special characteristic that all internal states, in terms of the fluid levels, can be fully accessed using low-cost pressure sensors. Therefore, the PIA is applicable to this particular example. Nevertheless, height control in a three-tank model is much simpler than the control of gas concentrations, and this particular model cannot fully represent or simulate the cardiopulmonary system under acute lung injury, which is unique. A real patient-in-the-loop system should be set up to test the performance of this controller. Due to the long process for new ethical approval and a limited budget, we cannot at the moment test this control algorithm in an animal trial. Instead, this interconnecting three-tank system is used as a nonlinear system for the implementation, in order to demonstrate the tracking performance and optimal solution of this control algorithm in a real practical setup.
This nonlinear system can be modeled using mass balance equations for each tank, which yields the following relationships:

\[ A_0 \frac{dx_1}{dt} = -Q_{12} + Q_1 \]
\[ A_0 \frac{dx_2}{dt} = -Q_{23} + Q_{12} \]
\[ A_0 \frac{dx_3}{dt} = -Q_{30} + Q_{23}, \qquad (12) \]
where Q12, Q23 and Q30 represent the fluid flow between Tank 1 and Tank 2, the fluid flow between Tank 2 and Tank 3, and the fluid flow from Tank 3 back to the reservoir, respectively.

Fig. 8. System configuration and system realization for computerized control of the water level in the interconnecting three-tank system as a simulation of the blood circulation system.

Please cite this article as: Pomprapa, A., et al. Optimal learning control of oxygen saturation using a policy iteration algorithm and a proof-of-concept in an interconnecting three-tank system. Control Engineering Practice (2016), http://dx.doi.org/10.1016/j.conengprac.2016.07.014
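These mass balances can be simulated numerically. The following is a minimal sketch, assuming Torricelli-type orifice flows Qij = Aij·√(2·g0·|xi − xj|); all parameter values (tank cross-section, orifice areas, pump flow) are illustrative assumptions, not the values of the experimental rig.

```python
import math

# Forward-Euler simulation of the three-tank mass balances, assuming
# Torricelli-type flows Qij = Aij*sqrt(2*g0*|xi - xj|). All parameter
# values below are illustrative assumptions for demonstration only.
A0 = 154.0                       # tank cross-section (cm^2), hypothetical
A12, A23, A30 = 0.5, 0.5, 0.5    # effective orifice areas (cm^2), hypothetical
G0 = 981.0                       # gravitational acceleration (cm/s^2)
Q1 = 100.0                       # pump inflow into Tank 1 (cm^3/s), hypothetical

def flow(area, head):
    """Signed Torricelli flow through an orifice for a given head difference."""
    return math.copysign(area * math.sqrt(2.0 * G0 * abs(head)), head)

def simulate(t_end=300.0, dt=0.01):
    """Integrate the mass balances from empty tanks; returns final levels (cm)."""
    x1 = x2 = x3 = 0.0
    for _ in range(int(t_end / dt)):
        q12 = flow(A12, x1 - x2)
        q23 = flow(A23, x2 - x3)
        q30 = flow(A30, x3)        # outflow back to the reservoir
        x1 += dt * (-q12 + Q1) / A0
        x2 += dt * (q12 - q23) / A0
        x3 += dt * (q23 - q30) / A0
    return x1, x2, x3
```

With a constant pump inflow, the simulated levels settle with x1 > x2 > x3, consistent with the assumption used to derive the flow equations.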
With the assumption that x1 ≥ x2 ≥ x3, these can be rewritten as

A0·dx1/dt = −A12·√(2·g0·|x1 − x2|) + Q1
A0·dx2/dt = A12·√(2·g0·|x1 − x2|) − A23·√(2·g0·|x2 − x3|)
A0·dx3/dt = A23·√(2·g0·|x2 − x3|) − A30·√(2·g0·x3),     (13)
where g0 denotes the gravitational acceleration (= 981 cm/s²).
These equations can be simplified into the state model

ẋ1 = f1(x1, x2) + Q1
ẋ2 = f2(x1, x2, x3)
ẋ3 = f3(x2, x3).     (14)

Based on this mathematical model, the system exhibits nonlinear coupling behavior. For the actual implementation, all initial states are zero (empty tanks) and neither leakage nor additional fluid input to any tank is allowed. For this application of the PIA controller, it is worth noting that no mathematical model of the nonlinear system is needed to control the height x3 of Tank 3. Only the state information and the computed evaluation function are required in every iteration, under the assumption that the system is stabilizable. To control x3 = 10 cm using the PIA, the following evaluation function was used, with the three states corresponding to the heights of Tanks 1, 2 and 3, respectively:

V(x1, x2, x3) = w1·x1² + w2·x1·x2 + w3·x1·x3 + w4·x2² + w5·x2·x3 + w6·x3²     (15)

The updated control policy took the following form:

u* = −(1/2)·R⁻¹·gᵀ(x)·∂V(x)/∂x + u0,     (16)

where u0 represents the controlling input at the operating point.
The control performance with an optimal result is presented in Fig. 9 for a sampling time of 100 ms, showing that the control objective of x3 = 10 cm was fulfilled with minimal knowledge of the system. A PID controller was used for the initial implementation in order to collect all information from the system for policy evaluation. Thereafter, a new controller can be designed based on Eq. (16) and applied to the system as a policy iteration. In the first iteration, the system reached steady state at t = 130 s, while the steady-state time improved significantly to 100 s in the second iteration. This shows that a PIA controller is realizable in actual practice and that results can be achieved which minimize the evaluation function V(x) in each iteration using the computed policy update. Consequently, all states return to their final states more quickly, and the control target is reached faster over a series of iteration episodes.
In this real application, u0 was introduced to reach the target reference point of x3 = 10 cm; the settling time was remarkably improved for this particular interconnecting three-tank system. Therefore, real implementation for oxygenation control should be feasible with this quasi non-identifier approach, which can provide a practical solution for treating patients with respiratory failure by means of closed-loop oxygen therapy.

7. Discussion
A number of potential controllers can be used for the control of
oxygenation. In general, the settling time was approximately 300 s
using a PI controller and 250 s using a Smith predictor for the
control of oxygenation in an extracorporeal membrane oxygenator
(ECMO) (Walter et al., 2009). Therefore, one of these controllers
can be applied for the initial implementation in a PIA. Such control
strategies fulfill the assumed requirement for an initial controller
that stabilizes the system. A PIA controller
should then be realizable in actual clinical practice for optimal
control of oxygenation with a settling time less than 250 s; this
should be of considerable benefit to patients suffering from respiratory insufficiency. In patients with ARDS, an acceptable range
for SaO2 is 88–95% based on treatment using the ARDSNet protocol
(ARDSNet, 2000; Pomprapa, Schwaiberger et al., 2014).
In general, when looking at input-output dynamics and considering identifiability, a good choice for the sampling time is to
set T equal to 1/10 or even 1/20 of the fastest time constants that
need to be identified (Isermann & Münchhof, 2010). When considering the process dynamics provided in Eq. (9), this would lead
to a sampling time of T < 1.7 s.
Considering the repetitive nature of both natural and artificial
respiration, another influencing factor is the rate of respiration
(RR). In fact, the alveolar gas transport is modulated by the inhalation and exhalation process. Hence, higher frequency components of respiration rate will, at least theoretically, also be
visible in the actual gas transport rates across the alveolar membrane. As the breathing rates of ARDS patients are, in addition, limited by the
therapeutic protocols, it is wise to consider these
practical boundary conditions as well. Specifically, the maximum
respiratory rate for the well-known ARDSNet protocol (ARDSNet,
2000) is 35 bpm (= 0.58 Hz), and it is 60 bpm (= 1 Hz) for
open lung management (Pomprapa et al., 2015). Finally, SaO2 is a
signal that is measured based on blood pulsations in the skin
(Wukitsch, Petterson, & Pologe, 1988). Hence, theoretically we also
have to consider the heart rate (HR) and Shannon's sampling theorem, which states that if a continuous-time signal contains no frequency content above f Hz, a sampling time T < 1/(2f) s will guarantee a perfect reconstruction of the original frequency content. However, in our application, we used a commercial device (the CeVOX) for saturation measurements, which handles the pulsatile nature of the saturation signal internally by using appropriate sampling times. Hence, as we only look at SaO2 as a quasi-continuous function of FiO2, it seems sufficient to select the sampling time T = 0.1 s.

Fig. 9. Control performance of a PIA controller for an interconnecting three-tank system with a sampling time of 100 ms.
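The two sampling-time rules discussed above can be checked numerically. In this sketch, the 17 s time constant is inferred from the stated bound T < 1.7 s together with the τ/10 rule; the respiratory rates are those of the cited protocols.

```python
# Numerical check of the two sampling-time rules discussed above.
# The 17 s time constant is an inference from the stated bound T < 1.7 s
# combined with the tau/10 identification rule of thumb.
tau_fastest = 17.0                 # fastest time constant to identify (s), inferred
T_ident = tau_fastest / 10.0       # identification rule of thumb -> 1.7 s

rr_ardsnet = 35.0 / 60.0           # 35 bpm maximum (ARDSNet), in Hz
rr_open_lung = 60.0 / 60.0         # 60 bpm maximum (open lung management), in Hz
f_max = max(rr_ardsnet, rr_open_lung)
T_shannon = 1.0 / (2.0 * f_max)    # Nyquist bound -> 0.5 s

T_chosen = 0.1                     # sampling time actually used (s)
assert T_chosen < min(T_ident, T_shannon)
```

The chosen T = 0.1 s satisfies both the identification rule and the Nyquist bound with a comfortable margin.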
A 34-kg pig model of induced acute lung injury was repeatedly lavaged with 3–6 l of saline solution in order to remove pulmonary surfactant (anti-inflammatory molecules), leading to
partial lung collapse and poor oxygenation. The pig was selected
because its pulmonary system structurally resembles that
of a human in terms of size, structure and the mechanism of gas
exchange. Other animal alternatives, such as sheep or monkeys, can
be considered for experiments on this particular lung disease.
A pig of this weight can therefore be used to predict the
dynamics of acute lung injury. For our specific application (i.e.
oxygen therapy), a pig of this body weight is an established animal
model which can serve as a patient model for a human weighing
two or three times as much. Note that many physiological parameters are
quite similar in humans and pigs, such as blood pressure, resting
heart rate or SaO2. There are, however, other physiological variables that scale with weight, such as the tidal volume, which is usually
given in ml/kg. However, an individual patient model is unique
and we require system identification in order to understand the
dynamical system. Hence, it is of great advantage that a PIA controller can give an optimal control performance with minimal
knowledge of the system model.
In the present study, the first-order model with time delay was
used for simulation and evaluation of the PIA. The system had a
time delay of 11 s, reflecting the translational effects in a change of
SaO2 according to a change in FiO2, which also depends on the
dynamics of the mechanical ventilator and of the subject, as well
as the positioning of the SaO2 catheter. In this case, the sensor was
placed in the carotid artery (Lüpschen et al., 2009), where the
artery supplies oxygenated blood to the brain. If the sensor is
positioned elsewhere (e.g. finger, ear or tongue), this leads to a
different time delay, which complicates the design of the control
system. Therefore, with the PIA controller for the control of oxygenation, all states of the system are assumed to be accessible but, in
practice, a state observer for a time-delay system may need to be
designed (Lu & Ho, 2014).
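The first-order-plus-dead-time behavior described above can be sketched as follows. Only the 11 s transport delay is taken from the text; the static gain K and the time constant TAU are placeholder assumptions, not the identified values of Eq. (9).

```python
from collections import deque

# First-order-plus-dead-time sketch of the SaO2 response to a FiO2 step,
# with the 11 s transport delay mentioned in the text. The static gain K
# and time constant TAU are placeholder assumptions, not identified values.
K, TAU, DELAY, DT = 1.0, 17.0, 11.0, 0.1

def simulate_fopdt(u_step=1.0, t_end=120.0):
    """Euler integration of TAU*y' = K*u(t - DELAY) - y for a step input."""
    buf = deque([0.0] * int(DELAY / DT))  # transport-delay line for the input
    y, out = 0.0, []
    for _ in range(int(t_end / DT)):
        buf.append(u_step)                # step applied from t = 0 onwards
        u_delayed = buf.popleft()
        y += DT * (K * u_delayed - y) / TAU
        out.append(y)
    return out
```

The output stays at zero for the first 11 s and then rises toward K, which is exactly the behavior that complicates controller design and motivates a state observer for the delayed states.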
In the present study, a simple least-squares algorithm was used
as the parameter estimation technique; this provided satisfactory
results in the simulative control of oxygenation and in the experimental three-tank system. The algorithm estimates the unknown parameters (wi) in the cost function and seeks the best
parametric model to describe the system. In this particular application, no high computational costs are involved. However, the
time of data collection should be long enough to collect all information on system dynamics, so that the cost function can be
approximated more accurately. However, other parameter estimation techniques can be applied in a real implementation as an
alternative, e.g. recursive least-squares algorithm for online implementation of the control algorithm (Ljung, 1999), the Kalman
filter (Kalman, 1960), the Volterra series (Volterra, 1959), a block-oriented structure (Pottmann & Pearson, 1998) or a neural network
(Abu-Khalaf & Lewis, 2005) to approximate the Hamiltonian
function and eventually solve the HJB equation.
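A minimal, self-contained sketch of this least-squares step is given below, using a two-state quadratic cost with three weights; the sample states and the "true" weights are synthetic, for illustration only.

```python
# Sketch of the least-squares step that estimates cost-function weights
# w_i in V(x) = w1*x1^2 + w2*x1*x2 + w3*x2^2 from sampled data. The sample
# states and the "true" weights are synthetic, for illustration only.

def features(x1, x2):
    return [x1 * x1, x1 * x2, x2 * x2]

def solve(a, y):
    """Solve the 3x3 normal equations A w = y by Gauss-Jordan elimination."""
    n = len(a)
    m = [row[:] + [yi] for row, yi in zip(a, y)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [mr - f * mc for mr, mc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

w_true = [2.0, -0.5, 1.0]                       # synthetic "true" weights
samples = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0), (1.0, -1.0)]
phi = [features(*s) for s in samples]
v = [sum(wt * p for wt, p in zip(w_true, row)) for row in phi]

# Normal equations: (Phi^T Phi) w = Phi^T v
ata = [[sum(r[i] * r[j] for r in phi) for j in range(3)] for i in range(3)]
atv = [sum(r[i] * vi for r, vi in zip(phi, v)) for i in range(3)]
w_hat = solve(ata, atv)
```

With noise-free synthetic data the estimate recovers the weights exactly; in practice, the quality of the fit depends on collecting sufficiently rich trajectory data, as noted above.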
A partially model-free algorithm based on the PIA was effectively applied in the simulation of the first-order oxygenation model with time delay using porcine dynamics. A gradual improvement of the system performance was iteratively achieved
and the criterion to stop the algorithm was almost no change in
the estimated parameters of the cost function. Therefore, the algorithm gives an optimal converged solution. This approach is
based not only on reinforcement learning with actor–critic architecture, but also on optimal control for long-term evaluation of
system performance. In this simulation, after the third iteration, a
satisfactory result was attained with no steady-state error and a
settling time of 60 s, compared to the initial settling time of 600 s
with the initial PI controller. This clearly demonstrates an optimal
performance based on minimization of the predefined Hamiltonian cost function and improvement of the settling time, leading
to optimal control of oxygenation and ultimately ameliorating the
adverse effects of hypoxia using this control algorithm.
Principally, the actor–critic structure is a two-step process of
two different subsystems: policy improvement and policy evaluation. Policy improvement applies the computed control input (u*) from the critic subsystem to the plant; this input is obtained
from Eq. (5) based on a weighting function of the control input (R),
the gain of the input dynamics (g(x)), and the gradient of the estimated cost function. Therefore, no knowledge of the internal system dynamics is required;
this is a clear advantage of this control strategy. However, knowledge of the system structure must be predetermined in terms of the system
order. Subsequently, the critic subsystem computes the infinite-horizon cost associated with the provided controller and decides whether to continue with another iteration or to stop. This structure leads to an adaptive learning process
to optimally control the predefined output variable. However, a
possible disadvantage can be an endless iteration (or a large
number of iterations) before achieving a converged solution,
which may be undesirable in real practice for closed-loop oxygen
therapy. Therefore, further implementation is required in animal
experiments to test this control strategy in a real clinical scenario.
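The two-step actor–critic loop described above can be sketched on a scalar surrogate plant. This is a hedged illustration, not the paper's implementation: the plant ẋ = ax + bu, all numerical values, and the value parameterization V(x) = w·x² are assumptions. Only the input gain b (playing the role of g(x)) and measured states are used by the learner; the drift a stays hidden, mirroring the partially model-free character of the PIA.

```python
# Scalar sketch of policy iteration with an integral-reinforcement-learning
# style policy evaluation. Hidden plant: x' = a*x + b*u (a is never used by
# the learner); running cost: q*x^2 + r*u^2. All values are illustrative.
a, b = -0.5, 1.0      # plant parameters; 'a' is hidden from the learner
q, r = 1.0, 1.0       # state and input weights of the running cost
dt, T = 0.001, 0.05   # Euler step and reinforcement interval

def rollout(k, x0, n_intervals):
    """Simulate u = -k*x and collect (V-feature difference, interval cost)
    pairs for the policy-evaluation least squares, with V(x) = w*x^2."""
    rows, x = [], x0
    for _ in range(n_intervals):
        x_start, cost = x, 0.0
        for _ in range(int(T / dt)):
            u = -k * x
            cost += (q * x * x + r * u * u) * dt
            x += (a * x + b * u) * dt
        # V(x_t) - V(x_{t+T}) equals the integrated cost along the trajectory
        rows.append((x_start * x_start - x * x, cost))
    return rows

def policy_iteration(k0, iters=10):
    k = k0
    for _ in range(iters):
        data = rollout(k, x0=5.0, n_intervals=40)
        # Policy evaluation: least-squares fit of w in  w*dphi = cost
        num = sum(d * c for d, c in data)
        den = sum(d * d for d, _ in data)
        w = num / den
        # Policy improvement: u* = -(1/2) R^-1 g^T dV/dx = -(b*w/r) * x
        k = b * w / r
    return k

k_star = policy_iteration(k0=1.0)
# For these values the analytic optimal gain is (sqrt(5) - 1)/2, about 0.618
```

Starting from a stabilizing initial gain, each evaluation/improvement cycle moves the gain toward the optimum, illustrating why a stabilizing initial controller is the key prerequisite discussed below.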
For both linear and nonlinear cases, the condition of stabilizability is assumed. Therefore, the first important issue for the
implementation of this control strategy is to find an initial controller that can stabilize the system. For this particular application
of ventilation therapy, oxygen therapy in common practice can
definitely improve oxygenation, and whether the system is stabilizable by giving additional FiO2 can be verified. Hence, the system
is typically stabilizable for a treatable patient. However, for extreme pulmonary malfunction where oxygen therapy may not be
sufficient, the PIA may not be applicable. In such a situation, other
therapeutic approaches, for example extracorporeal membrane
oxygenation (Hill et al., 1972; Walter et al., 2009), are required in
order to improve oxygenation.
In our implementation for the interconnecting three-tank system, the PIA gives the optimal result to control the water level
with no requirement for knowledge of the internal nonlinear dynamics of the system. The settling time was approximately 130 s.
This works well in the presence of uncertainty regarding the
diameter and length of the connecting tubes, noise from the
pressure sensors, and disturbance related to water ripple in the
tanks (especially in Tank 1). Based on the practical implementation, a PIA controller should be realizable with optimal performance to cope with the nonlinear time-varying cardiopulmonary
system of patients with respiratory insufficiency. As stated, it is a
considerable advantage of this control strategy that we require
minimal knowledge of the system dynamics to achieve an optimal
control solution; moreover this should offer optimal outcome for
the treatment of patients requiring oxygen therapy. In addition, it
should be noted that our experiment based on the interconnecting
three-tank system is implemented with a deterministic and time-invariant model that does not capture all challenges of the real
unknown and time-variant system dynamics. To introduce such
conditions, further system modifications should be made, for example, installing control valves between the tanks to introduce
variable flow and thereby induce unknown dynamics for
the whole system. Further system assessment with respect to this
setup is useful for the experimental validation.
Based on the practical implementation of this optimal control
technique, the benefits include the following: a quasi model-free
implementation of the control strategy, a smooth control performance, and an efficient control policy based on the gradient of the
estimated cost function (Doya, 2000). During the iteration process,
some system parameters might gradually change and the algorithm then adaptively tunes the control input to cope with this
change. Therefore, if the plant dynamics change significantly over
time and no stabilizing control input can be computed, the
optimal solution may not be achieved. However, this was not
experienced during testing of our practical model. For application
of this algorithm in a real clinical situation, only a small change of
patient dynamics, or time invariance, is assumed for each patient
over a certain time frame. This assumption means that the patient
cannot change his transfer function faster than the convergence time
of the PIA controller, which should then be able to provide
an optimal solution. It should be emphasized that the learning
process requires the iteration procedure and, if the system is stabilizable at the initial iteration, performance is guaranteed with respect to
patient safety, i.e., hypoxia will not appear during the learning
process.
Our plan is to implement this control strategy for the control of
oxygenation in the patient-in-the-loop configuration using porcine dynamics. For a feasibility study, both invasive measurement
of SaO2 and noninvasive measurement of peripheral oxygen saturation ( SpO2) should be implemented to stabilize and control
oxygenation. The online parameter estimation should also be
carried out during policy evaluation at every iteration. With this
promising technique, system integration in various clinical applications can be realized, e.g. in home-based mechanical ventilation
for patients with apnea, during anesthesia, in the intensive care
unit, or during the transport of patients with severe hypoxia. In
addition, a policy iteration algorithm with H2 and H∞ optimal
state feedback controllers may be introduced to
improve robustness and disturbance rejection (Al-Tamimi, Abu-Khalaf, & Lewis, 2007; Pomprapa, Misgeld, Sorgato et al., 2013).
8. Conclusion
A policy iteration algorithm (PIA) has been proposed for nonlinear control of oxygenation, which could be a potential candidate
for closed-loop oxygen therapy. Based on reinforcement learning
and optimal control theory, policy improvement and policy evaluation are alternately implemented and iterated until achieving
an optimal solution of the Hamilton–Jacobi–Bellman (HJB) equation in order to minimize the Hamiltonian cost function. One advantage of this control strategy is that there is no requirement for
knowledge of the system dynamics. Therefore, it should be suitable for this particular application, where the model of patient
dynamics is barely known and system identification to achieve
accurate model dynamics is time-consuming and critical for patients with severe hypoxia. In addition, this algorithm is implemented in a real nonlinear plant, the interconnecting three-tank system, to demonstrate its control performance. The results
acquired with this optimal controller indicate a promising control
algorithm for possible implementation for the control of
oxygenation.
Acknowledgment

The authors thank the three anonymous reviewers for their constructive comments that have significantly helped to improve this paper.

Appendix A. Evolution of unknown parameters wi

The unknown parameters wi are computed based on the least-squares algorithm. The curve of parameter evolution in every iteration is presented on a natural logarithmic scale to show the converged stable results of all parameters for the control of oxygenation (Fig. A1).

Fig. A1. Evolution of the unknown parameters (wi) for oxygenation control over three iterations on a natural logarithmic scale.

Appendix B. Evaluation of model selection

The evaluation of different model alternatives is carried out based on the input-output response given in Fig. 4. Each model structure is defined by (np, nz, nd), where np, nz and nd are the number of poles, the number of zeros and the value of the time delay, respectively. For example, the identified first-order model with time delay of Eq. (9) can be represented by (1,0,11) (Table B1).

Table B1
Evaluation of different model alternatives.

Model (np, nz, nd)   MSE based on system identification (%)   MSE for validation (%)
(1,0,11)             1.6379                                   0.5070
(2,0,11)             1.6586                                   0.5167
(2,1,11)             1.6631                                   0.5103
(3,0,11)             1.6970                                   0.5377
(3,1,11)             1.7040                                   0.5353
(3,2,11)             1.6956                                   0.5326
(4,0,11)             1.6944                                   0.5364
(4,1,11)             1.7012                                   0.5337
(4,2,11)             1.6880                                   0.5120
(4,3,11)             1.6889                                   0.4929
(4,3,0)              1.8310                                   0.5345

References
Abu-Khalaf, M., & Lewis, F. (2005). Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach. Automatica, 41(5), 779–791.
Al-Tamimi, A., Abu-Khalaf, M., & Lewis, F. (2007). Model-free Q-learning designs for discrete-time zero-sum games with application to H-infinity control. Automatica, 43(3), 473–482.
ARDSNet (2000). Ventilation with lower tidal volume as compared with traditional
tidal volumes for acute lung injury. New England Journal of Medicine, 342,1301–
1308.
Ashbaugh, D., Bigelow, D., Petty, T., & Levine, B. (1967). Acute respiratory distress in
adults. Lancet, 2(7511), 319–323.
Ballard-Croft, C., Wang, D., Sumpter, R., Zhou, X., & Zwischenberger, J. B. (2012).
Large-animal models of acute respiratory distress syndrome. The Annals of
Thoracic Surgery, 93 (4), 1331–1339.
Bhasin, S., Kamalapurkar, R., Johnson, M., Vamvoudakis, K., Lewis, F., & Dixon, W.
(2013). A novel actor–critic–identifier architecture for approximate optimal
control of uncertain nonlinear systems. Automatica, 49, 82–92.
Brunner, J. (2002). History and principles of closed-loop control applied to mechanical ventilation. Nederlandse Vereniging voor Intensive Care, 6, 6–9.
Clark, J. (1974). The toxicity of oxygen. American Review of Respiratory Disease, 110
(6Pt2), 40–50.
Claure, N., & Bancalari, E. (2009). Automated respiratory support in newborn infants. Seminars in Fetal and Neonatal Medicine, 14, 35–41.
Doya, K. (2000). Reinforcement learning in continuous time and space. Neural
Computation, 12(1), 219–245.
East, T., Tolle, C., McJames, S., Farrell, R., & Brunner, J. (1991). A non-linear closed-loop controller for oxygenation based on a clinically proven fifth dimensional
quality surface. Anesthesiology, 75(3A), A468.
Giard, M., Bertrand, F., Robert, D., & Pernier, J. (1985). An algorithm for automatic
control of O2 and CO2 in artificial ventilation. IEEE Transactions on Biomedical
Engineering BME, 32, 658–667.
Hill, J., O'Brien, T., Murray, J., Dontigny, L., Bramson, M., Osborn, J., & Gerbode, F.
(1972). Prolonged extracorporeal oxygenation for acute post-traumatic respiratory failure (shock-lung syndrome). Use of the bramson membrane lung.
New England Journal of Medicine, 286, 629–634.
Isermann, R., & Münchhof, M. (2010). Identification of dynamic systems: An introduction with applications. Springer.
Kaelbling, L., Littman, M., & Moore, A. (1996). Reinforcement learning: A survey.
Journal of Artificial Intelligence Research, 4, 237–285.
Kallet, R., & Matthay, M. (2013). Hyperoxic acute lung injury. Respiratory Care, 58(1),
123–141.
Kalman, R. (1960). A new approach to linear filtering and prediction problems.
Journal of Basic Engineering, 82(1), 35–45.
Kamalabady, A., & Salahshoor, K. (2009). Affine modeling of nonlinear multivariable
processes using a new adaptive neural network-based approach. Journal of
Process Control, 19(3), 380–393.
Kron, A., Böhm, S., & Leonhardt, S. (1999). Analytic model for the CO2 mass balance
in mechanically ventilated model. In European medical and biomedical engineering conference.
Kwok, H., Linkens, D., Mahfouf, M., & Mills, G. (2004). Adaptive ventilator FiO2
advisor: Use of non-invasive estimations of shunt. Artificial Intelligence, 32,
157–169.
Lachmann, B., Robertson, B., & Vogel, J. (1980). In vivo lung lavage as an experimental model of the respiratory distress syndrome. Scandinavian Society of
Anaesthesiology, 24, 231–236.
Ljung, L. (1999). System identification theory for the user. Prentice-Hall.
Lu, G., & Ho, D. (2014). Robust H∞ observer for nonlinear discrete systems with time delay and parameter uncertainties. IEE Proceedings—Control Theory and Applications, 151(4), 439–444.
Lüpschen, H., Zhu, L., & Leonhardt, S. (2009). Robust closed-loop control of the
inspired fraction of oxygen for the online assessment of recruitment maneuvers. In Conference proceedings of the IEEE engineering in medicine and biology
society (pp. 495–498).
Mach, W., Thimmesch, A., Pierce, J. T., & Pierce, J. D. (2011). Consequences of hyperoxia and the toxicity of oxygen in the lung. Nursing Research and Practice,
2011, 260482.
Mitamura, Y., Mikami, T., & Yamamoto, K. (1975). A dual control system for assisting
respiration. Medical & Biological Engineering, 13(6), 846–853.
Ohtake, S., & Yamakita, M. (2010). Adaptive output optimal control algorithm for
unknown system dynamics based on policy iteration. In American control
conference (pp. 1671–1676).
Orani, N., Pisano, A., Franceschelli, M., Giua, A., & Usai, E. (2011). Robust reconstruction of the discrete state for a class of nonlinear uncertain switched
systems. Nonlinear Analysis Hybrid Systems, 5, 220–232.
Pomprapa, A., Alfocea, S., Göbel, C., Misgeld, B., & Leonhardt, S. (2014). Funnel
control for oxygenation during artificial ventilation therapy. In: The 19th IFAC
world congress.
Pomprapa, A., Mir Wais, M., Walter, M., Misgeld, B., & Leonhardt, S. (2015). Policy
iteration algorithm for the control of oxygenation. In The 19th IFAC symposium
on biological and medical system (pp. 517–522).
Pomprapa, A., Misgeld, B., Lachmann, B., & Leonhardt, S. (2013). Closed-loop ventilation of oxygenation and end-tidal CO2 . In IEEE systems, man and cybernetics
society.
Pomprapa, A., Misgeld, B., Sorgato, V., Stollenwerk, A., Walter, M., & Leonhardt, S. (2013). Robust control of end-tidal CO2 using the H∞ loop-shaping approach. Acta Polytechnica, 53(6), 895–900.
Pomprapa, A., Pikkemaat, R., Lüpschen, H., Lachmann, B., & Leonhardt, S. (2010).
Self-tuning adaptive control of the arterial oxygen saturation for automated
“Open Lung” maneuvers. In 3. Dresdner Medizintechnik-Symposium.
Pomprapa, A., Schwaiberger, D., Pickerodt, P., Tjarks, O., Lachmann, B., & Leonhardt,
S. (2014). Automatic protective ventilation using the ARDSNet protocol with the
additional monitoring of electrical impedance tomography. Critical Care, 18,
R128.
Pottmann, M., & Pearson, R. (1998). Block-oriented NARMAX models with output
multiplicities. AIChE Journal, 44, 131–140.
Saihi, K., Richard, J.-C., Gonin, X., Krüger, T., Dojat, M., & Brochard, L. (2014). Feasibility and reliability of an automated controller of inspired oxygen concentration during mechanical ventilation. Critical Care, 18, R35.
Suda, H., & Yamakita, M. (2013). On adaptive optimal control for hybrid systems. In
IEEE multi-conference on systems and control (pp. 859–864).
Sun, F., Sun, Z., Li, N., & Zhang, L. (1999). Stable adaptive control for robot trajectory
tracking using dynamic neural networks. Machine Intelligence & Robotic Control,
1(2), 71–78.
Tarpy, S., & Celli, B. (1995). Long-term oxygen therapy. New England Journal of
Medicine, 333, 710–714.
Vamvoudakis, K., & Lewis, F. (2010). Online actor–critic algorithm to solve the
continuous-time infinite horizon optimal control problem. Automatica, 46,
878–888.
Vamvoudakis, K., Vrabie, D., & Lewis, F. (2009). Online policy iteration based algorithms to solve the continuous-time infinite horizon optimal control problem. In IEEE symposium on adaptive dynamic programming and reinforcement
learning (pp. 36–41).
Volterra, V. (1959). Theory of functionals and of integrals and integro-differential
equations. Dover Publications,.
Vrabie, D., Pastravanu, O., Abu-Khalaf, M., & Lewis, F. (2009). Adaptive optimal
control for continuous-time linear systems based on policy iteration. Automatica, 45, 477–484.
Walter, M., Wartzek, T., Kopp, R., Stollewerk, A., Kashefi, A., & Leonhardt, S. (2009).
Automation of long term extracorporeal oxygenation systems. In European control conference (pp. 2482–2486).
Wang, Y., Zhou, R., & Wen, C. (1993). Robust load-frequency controller design for
power systems. In IEE proceedings generation, transmission and distribution (pp.
11–16).
Wernrud, A. (2007). Strategies for computing switching feedback controllers. In
American control conference (pp. 1389–1394).
West, J., (2012). Respiratory physiology: The essentials (9th edition). LWW.
Wukitsch, M., Petterson, M., Tobler, D. R., & Pologe, J. (1988). Pulse oximetry: Analysis of theory, technology, and practice. Journal of Clinical Monitoring, 4(4), 290–301.
Xu, X., Hu, D., & Lu, X. (2007). Kernel-based least squares policy iteration for reinforcement learning. IEEE Transactions on Neural Networks, 18(4), 973–992.
Yu, C., He, W., So, J., Roy, R., Kaufman, H., & Newell, J. (1987). Improvement in arterial oxygen control using multiple model adaptive control procedures. IEEE
Transactions on Biomedical Engineering, 8, 567–574.