Control Engineering Practice (journal homepage: www.elsevier.com/locate/conengprac)

Optimal learning control of oxygen saturation using a policy iteration algorithm and a proof-of-concept in an interconnecting three-tank system

Anake Pomprapa, Steffen Leonhardt, Berno J.E. Misgeld
Philips Chair for Medical Information Technology, RWTH Aachen University, Aachen, Germany

Article history: received 30 November 2015; received in revised form 18 June 2016; accepted 25 July 2016.

Abstract: In this work, the "policy iteration algorithm" (PIA) is applied to the control of arterial oxygen saturation without requiring a mathematical model of the plant. The technique is based on nonlinear optimal control and solves the Hamilton–Jacobi–Bellman equation. The controller is synthesized in a state feedback configuration on top of an unidentified model of the complex pathophysiology of the pulmonary system, in order to control gas exchange in ventilated patients: under some circumstances (such as emergency situations), a proper, individualized model for designing and tuning controllers may not be available in time. The simulation results demonstrate optimal control of oxygenation by the proposed PIA, which iteratively evaluates the Hamiltonian cost function and synthesizes the control action until the criteria for the converged optimum are met. Furthermore, as a practical example, we examine the performance of this control strategy on an interconnecting three-tank system as a real nonlinear system.

Keywords: policy iteration algorithm; optimal control; reinforcement learning; control of oxygen saturation; biomedical control system; closed-loop ventilation

© 2016 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.conengprac.2016.07.014

1. Introduction

Hypoxia, or oxygen deficiency, is a common consequence of respiratory insufficiency.
If a patient with hypoxia is not properly treated in time, the prolonged state of impaired oxygenation can lead to potentially lethal conditions, such as cerebral hypoxia, cardiac malfunction, or multiple organ failure. The most effective therapy is to supply the patient with an increased oxygen fraction in the ventilated air, which is referred to as oxygen therapy (Tarpy & Celli, 1995). A suitable control output for this therapy is a measure of the oxygen content in the body, such as peripheral oxygen saturation (SpO2), arterial oxygen saturation (SaO2), or the arterial partial pressure of oxygen (PaO2), which can be used in a negative feedback configuration. A single-input single-output (SISO) system can thus be formulated, and closed-loop control of oxygen therapy can be realized in clinical scenarios, which are unique, complex, and life-threatening.

To our knowledge, closed-loop ventilation for oxygen therapy developed gradually (Brunner, 2002) and was dependent on progress in control engineering. Early publications date back to 1975: that paper describes the control of SpO2 (measured at the ear) with 'bang–bang' control of the input fraction of inspired oxygen (FiO2), with limit cycles observed in anesthetized dogs (Mitamura, Mikami, & Yamamoto, 1975). In 1985, a linear quadratic regulator (LQR) was shown for the dual control of oxygen and carbon dioxide (Giard, Bertrand, Robert, & Pernier, 1985). In 1987, multiple-model adaptive control (MMAC) was applied to the control of SpO2 in mongrel dogs and the results were compared with a proportional–integral (PI) controller (Yu et al., 1987). In 1991, the multivariable inputs of FiO2 and positive end-expiratory pressure (PEEP) were applied to control PaO2 using a proportional–integral–derivative (PID) controller in four mongrel dogs (East, Tolle, McJames, Farrell, & Brunner, 1991).

Corresponding author: A. Pomprapa, [email protected].
In 2004, a nonlinear adaptive neuro-fuzzy inference system (ANFIS) model was used to estimate shunt in combination with dynamic changes of blood gases for controlling PaO2 (Kwok, Linkens, Mahfouf, & Mills, 2004). Further simulation of septic patients was carried out based on a hybrid knowledge/model-based advisory control. Recently, feedback-oriented oxygen therapy was presented in preterm infants, where the controlled variable was SpO2 (Claure & Bancalari, 2009). In addition, based on the work of our research group, a proportional–integral (PI) controller with gain scheduling (Walter et al., 2009), a Smith predictor with an internal PI controller (Lüpschen, Zhu, & Leonhardt, 2009), a knowledge-based controller (Pomprapa, Misgeld, Lachmann, & Leonhardt, 2013), and, potentially, a self-tuning adaptive controller (Pomprapa, Pikkemaat, Lüpschen, Lachmann, & Leonhardt, 2010) and a funnel controller (Pomprapa, Alfocea, Göbel, Misgeld, & Leonhardt, 2014) can be used for this particular control problem. Moreover, in 2014, SpO2 between 92% and 94% was targeted through automatic adjustment of FiO2 using a step-change approach (Saihi et al., 2014).

Based on a literature search, three different categories of control system design were found; their advantages and disadvantages are given below. A model-based approach is the traditional design procedure in the control community. Generally, it requires a mathematical description of the system behavior.
The difficulty of this approach lies in selecting an appropriate model structure for this particular problem and in dealing with model uncertainties arising from varying conditions of the system. Therefore, self-tuning adaptive control was introduced for this control problem (Pomprapa et al., 2010). With a model-based approach, system performance and further corrective actions can be analyzed in advance.

A model-free approach requires neither prior knowledge of the system nor system identification, which is a major advantage. The time-consuming process of determining an individual model can therefore be avoided, which is certainly beneficial for a patient with severe hypoxia, who requires immediate therapeutic action. Hence, this approach is advantageous in real clinical scenarios. Proposed control techniques of this type include a knowledge-based controller (Pomprapa, Misgeld, Lachmann et al., 2013) and a funnel controller (Pomprapa, Alfocea, et al., 2014). Nevertheless, the drawback is that experience is required to reach good performance.

An intelligent hybrid approach combining model-based and model-free methods (Kwok et al., 2004) uses a knowledge- and model-based advisory system for intensive care ventilation; it shares the strengths and weaknesses of both the model-based and the model-free approach. Its design was based on a neuro-fuzzy system (ANFIS), which is intended to closely resemble human reasoning. However, neither animal experiments nor clinical results have yet been reported.

For best patient benefit, we employ a model-free approach in this work. Therefore, we examine the application of a generally known control strategy to the control of oxygenation based on SaO2 measurement.
For the first time (first results on this new approach have been prepublished in Pomprapa, Mir Wais, Walter, Misgeld, & Leonhardt, 2015), we propose to use a reinforcement learning optimal control method, called the policy iteration algorithm (PIA) (Bhasin et al., 2013; Vamvoudakis & Lewis, 2010; Vrabie, Pastravanu, Abu-Khalaf, & Lewis, 2009), for this particular application. This control strategy should provide an optimal solution not only for treating hypoxia, but also for avoiding hyperoxia (excess oxygen in the lungs) (Clark, 1974; Kallet & Matthay, 2013; Mach, Thimmesch, & Pierce, 2011). The PIA is classified as reinforcement learning within the actor–critic architecture framework, which conventionally learns behavior through interactions with a dynamic environment; a survey of the development of reinforcement learning can be found in Kaelbling, Littman, and Moore (1996). This approach improves the effectiveness of learning by using a policy gradient for the stochastic search for an optimal result. The technique is based on a recursive two-step iteration, namely policy improvement and policy evaluation. These steps are repeated until the stopping criteria for the converged optimal solution are achieved (Vamvoudakis & Lewis, 2010; Vrabie et al., 2009). Practical implementations of PIA control in other areas include optimal load-frequency control in a power plant (Wang, Zhou, & Wen, 1993), a DC–DC converter (Wernrud, 2007), adaptive steering control of a tanker ship (Xu, Hu, & Lu, 2007), and a hybrid system of a jumping robot (Suda & Yamakita, 2013). This motivates the application of this modern control strategy to biomedical applications and, specifically, to closed-loop ventilation therapy. Computer simulations are used to investigate the feasibility of this controller in oxygen therapy.
Subsequently, a proof-of-concept is implemented in a real application, a nonlinear interconnecting three-tank system, in order to control the water level in the last tank. In fact, the dynamics of this three-tank system resemble those of a patient with respiratory deficiency, i.e. nonlinear and with time-delay behavior.

In this article, Sections 2.1 and 2.2 provide the statement of the medical perspective and the formulation of the optimal control problem. Section 3 presents a control system design using the PIA for nonlinear optimal control based on the Hamilton–Jacobi–Bellman (HJB) equation. The system identification of the cardiopulmonary system, which yields a mathematical model for the subsequent design of the PIA controllers, and a simulation of this control strategy are presented in Sections 4 and 5, respectively. A practical example of this controller is presented in Section 6, controlling the water level in a nonlinear interconnecting three-tank system. Section 7 presents a discussion, and our conclusions are given in Section 8.

2. Problem formulation

2.1. Statement of the medical perspective

The overall transfer function from the setting of the inspired oxygen fraction FiO2 to the oxygen saturation SaO2 measured in arterial blood depends on many factors, including lung function and blood transport (which in turn depends on cardiac output, state of circulation, etc.). Let G(s) = SaO2(s)/FiO2(s) be the transfer function describing the system under investigation. G(s) is certainly nonlinear (e.g. due to the nonlinear saturation function, see West, 2011) and depends on many unknown and time-variant conditions (e.g. fluid status, heart function, or various diseases). Often, for the individual patient requiring artificial ventilation, this model is not available.
Furthermore, as there usually is no time for individual model selection and parameter identification, this has motivated us to investigate the performance of control algorithms that do not require explicit models. As an illustrative example, and in order to roughly understand the dynamics, we provide the transfer function G(s) of a female domestic pig weighing 34 kg, which we obtained during a laboratory experiment (Lüpschen et al., 2009). In this special case, respiratory distress was induced by repeated lung lavage (ARDS model, acute respiratory distress syndrome; Ashbaugh, Bigelow, Petty, & Levine, 1967; Lachmann, Robertson, & Vogel, 1980).

2.2. Optimal control problem

We consider a specific class of nonlinear systems, namely affine nonlinear systems of the following form:

ẋ(t) = f(x) + g(x)·u(t),  (1)

where x(t) ∈ χ ⊆ ℝⁿ denotes the state vector of dimension n, u(t) ∈ υ ⊆ ℝ represents the control input (FiO2), and f: χ × υ → ℝⁿ is Lipschitz continuous on χ × υ, such that the state trajectory x(t) is unique for a given initial condition x0. Initially, it is assumed that the system is stabilizable. This type of model has been used to describe the dynamics of various plants, for example a robot manipulator (Sun, Sun, Li, & Zhang, 1999), a continuous stirred-tank reactor (CSTR) (Kamalabady & Salahshoor, 2009) and a non-interconnecting three-tank system (Orani, Pisano, Franceschelli, Giua, & Usai, 2011).

Let us consider the cost function V(x) given by Eq. (2), which is to be minimized:

V(x) = ∫₀^{T∞} r(x(τ), u(τ)) dτ,  (2)

where r(x, u) = Q(x) + uᵀRu. Here Q(x) ∈ ℝ is positive definite and continuously differentiable (i.e. Q(x) = 0 if x = 0, and Q(x) > 0 for all x ≠ 0), R ∈ ℝ represents a positive-definite penalty or weighting of the control input, and T∞ denotes a time at which the states are sufficiently close to zero.

The Hamiltonian of this control problem can be defined as

H(x, u, ∇ₓV*) = r(x(t), u(t)) + ∇ₓV*ᵀ(f(x(t)) + g(x(t))·u(t)),  (3)

where the gradient ∇ₓV* satisfies the following HJB equation, with ∇ₓ symbolizing the partial derivative with respect to x:

min_u H(x, u, ∇ₓV*) = 0.  (4)

The optimal solution is provided by

u* = −(1/2)·R⁻¹·gᵀ(x)·∇ₓV*(x),  (5)

which is an algorithm for policy improvement (Ohtake & Yamakita, 2010). For a linear continuous-time system, the HJB equation reduces to the well-known Riccati equation (Vamvoudakis, Vrabie, & Lewis, 2009), and the converged solution is equivalent to the result of a linear–quadratic regulator (LQR) controller. If the optimal control policy (Eq. (5)) is substituted into the nonlinear Lyapunov equation, the HJB equation can be derived in terms of the Hamiltonian cost function (Vamvoudakis & Lewis, 2010). For this particular control problem, we insert the optimal solution of Eq. (5) into Eq. (3) under the condition provided in Eq. (4). This yields

0 = Q(x) + u*ᵀRu* + ∇ₓV*ᵀ(x)·(f(x(t)) + g(x(t))·u*).  (6)

This equation can be rewritten as

0 = Q(x) + ∇ₓV*ᵀ(x)·f(x) − (1/4)·∇ₓV*ᵀ(x)·g(x)·R⁻¹·gᵀ(x)·∇ₓV*(x),  (7)

where V*(0) = 0; this is a formulation of the HJB equation. The aim of the optimal control implementation is to solve the HJB equation (Eq. (7)) in order to find the Hamiltonian cost function and substitute it into Eq. (5) for the control policy. However, given the nonlinear nature of the problem, it is generally difficult to find the solution of the HJB equation. An iterative process is therefore introduced for solving this optimal control problem (Vamvoudakis & Lewis, 2010; Vrabie et al., 2009). In this article, the policy iteration algorithm is applied to the optimal control of oxygen saturation based on the animal model of induced acute respiratory distress syndrome (ARDS), as described in the subsequent sections.

3. Policy iteration algorithm

The evaluation function of Eq. (2) can be rewritten in the following form:

Ṽ(x(t)) = ∫ₜ^{t+T} r(x(τ), u(τ)) dτ + Ṽ(x(t+T)),  (8)

where Ṽ(0) = 0 and T represents the sampling time used in the evaluation process. Hence, Ṽ(x(t)) can be solved in the time domain using Eq. (8), and the control signal u* can be updated based on Eq. (5). Subsequently, the function Ṽ(x(t)) can be approximated by either a linear or a nonlinear function of x(t), so that all unknown parameters can be computed using a simple least-squares algorithm, a gradient-descent algorithm, a recursive least-squares algorithm (Vrabie et al., 2009), or a neural network (Bhasin et al., 2013) for parameter estimation. The proof of stability and guaranteed convergence of the control policy has been provided by Vrabie et al. (2009) for a linear system and by Vamvoudakis and Lewis (2010) for a nonlinear system, under the assumptions that the system is stabilizable in the linear case and that the solution of the nonlinear Lyapunov equation is smooth and positive definite in the nonlinear case.

The block diagram of the PIA is shown in Fig. 1, with an internal actor and critic subsystem in a state feedback architecture.

Fig. 1. Block diagram of the policy iteration algorithm with full state feedback structure.

Based on Fig. 2, the procedure for PIA implementation is as follows:

1. Assign an initial controller that can stabilize the system.
2. Compute the evaluation function Ṽ(x(t)) numerically in the time domain, with Ṽ(0) = 0.
3. Based on the response of Ṽ(x(t)) in the previous step, estimate all unknown parameters in the evaluation function using the least-squares algorithm (Vrabie et al., 2009) or another parameter estimation technique.
4. Compare the norm of the difference between the updated and the previously estimated parameters against a predefined value ϵ (convergence test). The PIA is terminated if the norm criterion is satisfied, i.e. the estimated parameters have converged. Otherwise, update the controller based on Eq. (5) for policy improvement and return to Step 2 to compute the evaluation function Ṽ(x(t)) for further policy evaluation.

Fig. 2. Flowchart for the implementation of the policy iteration algorithm.

A flowchart of this algorithm is presented in Fig. 2. A prerequisite for this algorithm is that an initial controller that stabilizes the system exists. In practice, a simple proportional–integral (PI) controller can be used as the controller for the first evaluation of the system performance. The PIA then optimizes the Hamiltonian cost function at each iteration in order to achieve an extremal cost for the underlying nonlinear system. Using these steps, an online implementation can be realized, leading to an optimal controller for the nonlinear system. It is worth noting that the iteration of policy evaluation and policy improvement must be repeated until the estimated unknown parameters (ω) converge to the optimal result. Therefore, a repeated control implementation must remain active in order to update the new control policy and compute further values of the cost function, which is a basic characteristic of this control scheme.
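The two-step iteration above becomes especially transparent in the linear special case, where the evaluation function is quadratic, Ṽ(x) = xᵀPx, policy evaluation reduces to a Lyapunov equation, and policy improvement reduces to a gain update. The following is a minimal sketch of this discrete-time analogue (a Hewer-style iteration); the plant matrices A and B are purely illustrative and are not the identified porcine model:

```python
import numpy as np

# Illustrative discrete-time linear plant; NOT the identified porcine model.
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)           # state penalty, Q(x) = x'Qx
R = np.array([[1.0]])   # control input penalty

def policy_evaluation(K):
    """Cost matrix P of the policy u = -Kx, i.e. the solution of the
    Lyapunov equation P = Q + K'RK + (A - BK)' P (A - BK)."""
    Acl = A - B @ K
    Qk = Q + K.T @ R @ K
    n = A.shape[0]
    # vectorized linear solve of the discrete Lyapunov equation
    M = np.eye(n * n) - np.kron(Acl.T, Acl.T)
    return np.linalg.solve(M, Qk.reshape(-1)).reshape(n, n)

def policy_improvement(P):
    """Greedy gain for the evaluated cost: K = (R + B'PB)^{-1} B'PA."""
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

K = np.zeros((1, 2))    # initial stabilizing policy (A itself is stable here)
for i in range(20):
    P = policy_evaluation(K)        # Steps 2-3: evaluate the current policy
    K_new = policy_improvement(P)   # policy improvement, cf. Eq. (5)
    if np.linalg.norm(K_new - K) < 1e-9:   # Step 4: convergence test
        break
    K = K_new
```

On convergence, K is the LQR gain and P solves the corresponding Riccati equation, mirroring the statement above that the converged PIA solution coincides with the LQR result in the linear case.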
The details of the complete algorithm can be found in Vrabie et al. (2009).

4. System identification of the oxygenation model

A model of the cardiopulmonary system is identified using a large-animal model of induced acute lung injury. The mathematical model was identified based on measurements from a female domestic pig weighing 34 kg. The animal received premedication and was then anesthetized and tracheotomized in supine position. A catheter for measurement of SaO2 (CeVOX, Pulsion Medical System SE, Feldkirchen, Germany) was inserted into the carotid artery. Under the condition of a stable ventilatory response, the animal underwent mechanical ventilation (SERVO 300, Maquet Critical Care AB, Solna, Sweden). The system configuration is shown in Fig. 3; a medical panel PC (POC-153, Advantech Co., Ltd, Taipei, Taiwan) was used for data acquisition. During baseline ventilation of the animal, acute respiratory distress syndrome (ARDS) was induced by repetitive lung lavage to wash out surfactant (Lachmann et al., 1980). Warm isotonic saline solution (0.9%) was used during the entire process of ARDS induction. All animal procedures were approved by the local ethical authorities. Surfactant depletion results in increased alveolar surface tension, which facilitates alveolar collapse or partial lung collapse (Ballard-Croft, Wang, Sumpter, & Zhou, 2012).

To implement system identification, multi-step FiO2 settings were applied to extract the underlying porcine dynamics during induced ARDS. The measured input–output relationship is presented in Fig. 4. We separated the data into two sets: the first, containing 75% of all measured data, was used for system identification, while the second was used for validation. System identification was initially carried out under the assumption of a first-order model with time delay (Lüpschen et al., 2009).
After estimating the unknown parameters and checking against the validation data set, the following plant transfer function was derived at a certain operating point (FiO2 = 0.59 and SaO2 = 93.3%). This transfer function, given in Eq. (9), incorporates the dynamics of the mechanical ventilator, the porcine dynamics, and the dynamics of the SaO2 sensor:

G(s) = SaO2(s)/FiO2(s) = 0.153/(17.29s + 1) · e^(−11s)  (9)

Fig. 4. Input–output response for system identification based on a porcine model of induced ARDS (Lüpschen et al., 2009).

It should be noted that a healthy subject normally has SaO2 > 97% in ambient air at standard altitude air pressure. A state of hypoxia can be observed in Fig. 4. The FiO2 value was set higher than that of ambient air (FiO2 = 0.21) in order to maintain the output SaO2 in the range of 85–100%. For this particular system, a time delay of 11 s is identified. For the subsequent control system design, the Padé approximation technique was applied to approximate the delay. This is in fact a nonlinear system, but we operate only in a small region and can thus roughly approximate the system by a linear model. The plant transfer function is simplified as follows:

G(s) = 0.153/(17.29s + 1) · (−s³ + 1.091s² − 0.4959s + 0.09016)/(s³ + 1.091s² + 0.4959s + 0.09016)  (10)

This mathematical model is used for the design of the PIA controller for controlling SaO2. Closed-loop ventilation is formed based on oxygen therapy in order to optimally adjust the controlling input FiO2 for this hypoxic cardiopulmonary system. The frequency responses of both the system identification data set and the validation data set are provided in Fig. 5. On the system identification data set, the model represents the measured data well in the least-squares sense of the parameter estimation. Alternative models up to higher order have been studied, with model evaluation based on the mean squared error (MSE). A summary of the model evaluation is given in Table B1.
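The delay polynomial in Eq. (10) can be checked against the closed-form coefficients of the third-order Padé approximant of e^(−Ls), which, after normalizing to a monic denominator, are 12/L, 60/L² and 120/L³. A short sketch, assuming only the identified delay L = 11 s:

```python
# Third-order Pade approximation of the transport delay e^{-Ls}:
#   e^{-Ls} ~ (-s^3 + a2*s^2 - a1*s + a0) / (s^3 + a2*s^2 + a1*s + a0),
# where, after normalizing to a monic denominator,
#   a2 = 12/L, a1 = 60/L^2, a0 = 120/L^3.
L = 11.0  # identified time delay in seconds
a2, a1, a0 = 12 / L, 60 / L**2, 120 / L**3
print(round(a2, 3), round(a1, 4), round(a0, 5))  # 1.091 0.4959 0.09016
```

The printed values reproduce the numerator and denominator coefficients of Eq. (10).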
According to the results given in Table B1 in the appendix, the first-order model with time delay based on the Padé approximation (Eq. (10)) gives the best prediction in terms of MSE on the validation data set. Therefore, this model will be used for the subsequent simulation of closed-loop oxygen therapy.

5. Simulation of the closed-loop oxygen therapy

Fig. 3. Block diagram of an oxygen therapy system based on a porcine model of induced ARDS.

Based on the identified model of Eq. (10), the derived transfer function serves as an 'unknown' model of the pathophysiology of the pulmonary system for this simulation of closed-loop oxygen therapy; it replaces the 'System' block shown previously in Fig. 3. The full procedure for implementing the PIA as given in the flowchart is then carried out with an initial proportional–integral (PI) controller with parameters Kp = 1 and Ki = 0.5 in order to stabilize the system.

Fig. 5. Comparison of the Bode diagrams of the identified system model and the measured data set.

Based on this predetermined control structure, the simulation can be set up, and all system states and the control input u(t) are recorded with a fixed sampling time of 100 ms. A cost function V(x) based on Eq. (2) can be computed by setting Q(x) = I and R = 1, regardless of the system model. The evaluation function Ṽ(x) can thereafter be calculated based on Eq. (8). Subsequently, ∇ₓV* can be computed for a control policy, as given in Eq. (5). The synthesized control action is iteratively implemented until the converged optimal solution is achieved.
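The policy evaluation step amounts to a least-squares regression based on Eq. (8): over each sampling interval the accumulated running cost must equal the decrease of the parameterized evaluation function. A minimal sketch with a quadratic basis in four states; the helper names (`quad_basis`, `estimate_weights`) and the synthetic, contracting state samples are illustrative, not the recorded experimental data:

```python
import numpy as np

def quad_basis(x):
    """Ten quadratic basis functions: all monomials x_i*x_j, 1 <= i <= j <= 4."""
    return np.array([x[i] * x[j] for i in range(4) for j in range(i, 4)])

def estimate_weights(x_t, x_tT, rho):
    """Least-squares solution of Eq. (8) in parametric form:
    w'(phi(x(t)) - phi(x(t+T))) = rho(t), where rho(t) is the running cost
    accumulated over the interval [t, t+T]."""
    Phi = np.vstack([quad_basis(a) - quad_basis(b) for a, b in zip(x_t, x_tT)])
    w, *_ = np.linalg.lstsq(Phi, rho, rcond=None)
    return w

# Synthetic consistency check with known weights (illustrative only)
rng = np.random.default_rng(1)
w_true = rng.uniform(1.0, 5.0, size=10)
x_t = rng.normal(size=(200, 4))
x_tT = 0.8 * x_t        # states contracting toward zero between samples
rho = np.array([w_true @ (quad_basis(a) - quad_basis(b))
                for a, b in zip(x_t, x_tT)])
w_hat = estimate_weights(x_t, x_tT, rho)
```

With noiseless synthetic data the regression recovers the generating weights exactly, which is the property the convergence test of the PIA relies on.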
For the control system design of the PIA, the evaluation function Ṽ(x) is set up to approximate Ṽ(x(t)) based on four states, namely x1, x2, x3, and x4, where the time delay is represented by a third-order Padé approximation. It is defined as follows:

Ṽ(x1, x2, x3, x4) = w1x1² + w2x1x2 + w3x1x3 + w4x1x4 + w5x2² + w6x2x3 + w7x2x4 + w8x3² + w9x3x4 + w10x4²,  (11)

where {wi : i ∈ I and i ≤ 10} is the set of unknown parameters, which multiply linearly independent basis functions. This equation is the simplest parametric form, a quadratic surface in the four states, for fitting the evaluation function, and it requires further mathematical manipulation for a policy update. A higher-order surface could alternatively be applied for a better fit, but this was not pursued due to the potential danger of overfitting.

Based on the animal model of induced acute lung injury provided in Eq. (9) (Lüpschen et al., 2009), we assumed that all internal states are accessible for the state feedback configuration of the PIA controller. This is a drawback: if the system states are not physically accessible, the additional design of a state observer is required (Lu & Ho, 2014). With this strong assumption for simulation, the stable converged solution of wi at the third iteration, using the initial PI controller, is w1 = 6.82, w2 = 8.45, w3 = 14.47, w4 = 8.35, w5 = 10.52, w6 = 18.30, w7 = 9.63, w8 = 14.18, w9 = 15.81, and w10 = 14.32. The evolution of wi is given in the Appendix (Fig. A1).

Fig. 6 shows the control performance of a step response (yref(t) = 1) at the initial iteration, with a stable result for the control of oxygenation. The control performance at the third iteration, with the optimal result for the control of oxygenation, is shown in Fig. 7: all states are driven to zero with a settling time of about 60 s, compared to about 600 s for the initial PI controller, as shown in Fig. 6.

Fig. 6.
Control performance at an initial iteration for the control of oxygenation.

Fig. 7. Comparative control performance of the policy iteration algorithm for the control of oxygenation.

This simulation shows non-minimum-phase behavior at the onset of control, where the output y(t), i.e. SaO2, initially responds in the reverse direction. This is due to the delay, which introduces a positive zero after implementation of the Padé approximation. Furthermore, the optimal response is achieved after application of the PIA at the third iteration, in terms of both the cost function and an improved settling behavior. From a practical perspective, in a real patient the step reference must be applied iteratively, multiple times, for policy improvement and policy evaluation until the stopping criteria for optimal control are met. In general, a patient with acute lung injury has poor oxygenation (SaO2 < 88%), and the oxygenation goal of standard ARDSNet ventilation therapy (Pomprapa, Schwaiberger et al., 2014) is to maintain SaO2 = 88–95%. Therefore, there are plenty of SaO2 targets available for a step reference during a treatment that is compliant with this standardized ventilation protocol.

6. Practical application

To implement the PIA controller in a real application, an interconnecting three-tank system was set up with the configuration shown in Fig. 8. Pressure sensors (RPUP001G2A, Sensortechnics GmbH, Germany) were installed at the bottom of the tanks to estimate the height of the fluid. A submersible pump (Barwig 0111, Barwig Wasserversorgung, Germany) was used as an actuator to draw the fluid from a reservoir into Tank 1 with a maximum flow of 18 l/min.
Due to the interconnection structure, fluid flows from Tank 1 to Tank 2 and Tank 3 successively. The fluid in Tank 3 is then drained back to the reservoir. The overall system is operated under a dSPACE platform (MicroAutoBox II, dSPACE GmbH, Germany) with three sensor inputs corresponding to the height of the fluid in each tank (x1, x2, x3) and one controllable current output to the pump, associated with the adjustable flow Q1 into Tank 1. A closed-loop system was formed with all accessible states, namely the heights of the fluid in the tanks (x1, x2, x3). The aim was to control the height x3 of Tank 3 with the adjustable flow Q1 as the controlling input to the system, where A0 represents the cross-sectional area of each tank. All states are used for the synthesis of the controller using the PIA for this nonlinear interconnecting three-tank system.

The three-tank system is proposed here as a practical example due to its equivalence to the blood circulation system in the human body. The submersible pump represents the heart as an energy source that supplies fluid, or blood, to the whole system. The pump is regulated by the dSPACE control unit, which takes the role of the brain. Tank 1, Tank 2 and Tank 3 symbolize the blood stored in tissue, the venous blood store and the arterial blood store, respectively (Kron, Böhm, & Leonhardt, 1999). This particular system has the special characteristic that all internal states, in terms of fluid levels, can be fully accessed using low-cost pressure sensors. Therefore, the PIA is applicable to this particular example. Nevertheless, height control in a three-tank model is much simpler than control of gas concentrations, and this particular model cannot fully represent or simulate the cardiopulmonary system under acute lung injury, which is unique. A real patient-in-the-loop system should be set up to test the performance of this controller.
Due to the long process for obtaining new ethical approval and a limited budget, we cannot at present evaluate this control algorithm in an animal trial. Instead, the interconnecting three-tank system is used as a real nonlinear system in order to demonstrate the tracking performance and the optimal solution of this control algorithm in a practical setup. The nonlinear system can be modeled using a mass balance equation for each tank, which yields the following relationship:

A0·(dx1/dt) = −Q12 + Q1
A0·(dx2/dt) = −Q23 + Q12
A0·(dx3/dt) = −Q30 + Q23,   (12)

where Q12, Q23, and Q30 represent the fluid flow between Tank 1 and Tank 2, the fluid flow between Tank 2 and Tank 3, and the fluid flow from Tank 3 back to the reservoir, respectively.

Fig. 8. System configuration and system realization for computerized control of the water level in the interconnecting three-tank system as a simulation of the blood circulation system.

Please cite this article as: Pomprapa, A., et al. Optimal learning control of oxygen saturation using a policy iteration algorithm and a proof-of-concept in an interconnecting three-tank system. Control Engineering Practice (2016), http://dx.doi.org/10.1016/j.conengprac.2016.07.014

With the assumption that x1 ≥ x2 ≥ x3, these equations can be rewritten as

A0·(dx1/dt) = −A12·√(2·g0·|x1 − x2|) + Q1
A0·(dx2/dt) = A12·√(2·g0·|x1 − x2|) − A23·√(2·g0·|x2 − x3|)
A0·(dx3/dt) = A23·√(2·g0·|x2 − x3|) − A30·√(2·g0·x3),   (13)

where g0 denotes the gravitational acceleration (= 981 cm/s²). They can be simplified to the state model

ẋ1 = f1(x1, x2) + Q1
ẋ2 = f2(x1, x2, x3)
ẋ3 = f3(x2, x3).   (14)

Based on this mathematical model, the system has a nonlinear coupling behavior. For the actual implementation, all initial states are zero (empty tanks) and neither leakage nor additional fluid input into any tank is allowed. For this application of the PIA controller, it is worth noting that no mathematical model of the nonlinear system is needed to control the height x3 of Tank 3; only the state information and the computed evaluation function are required at every iteration, under the assumption that the system is stabilizable. To control x3 = 10 cm using the PIA, the following evaluation function was used, with the three states corresponding to the heights of Tanks 1, 2 and 3, respectively:

V(x1, x2, x3) = w1·x1² + w2·x1·x2 + w3·x1·x3 + w4·x2² + w5·x2·x3 + w6·x3².   (15)

The updated control policy had the form

u* = −(1/2)·R⁻¹·gᵀ(x)·(∂V(x)/∂x) + u0,   (16)

where u0 represents the controlling input at the operating point. The control performance with an optimal result is presented in Fig. 9 for a sampling time of 100 ms, showing that the control objective of x3 = 10 cm was fulfilled with minimal knowledge of the system. A PID controller was used for the initial implementation in order to collect information from the system for policy evaluation. Thereafter, a new controller was designed based on Eq. (16) and applied to the system as a policy iteration. In the first iteration, the system reached steady state at t = 130 s, while the settling time improved significantly to 100 s in the second iteration. This shows that a PIA controller is realizable in actual practice and that results can be achieved which minimize the evaluation function V(x) in each iteration using the computed policy update. Consequently, all states return to their final values more quickly and the control target is reached faster over the series of iteration episodes. In this real application, u0 was introduced to reach the target reference point of x3 = 10 cm; the settling time was thereby remarkably improved for this particular interconnecting three-tank system. Therefore, a real implementation for oxygenation control should be feasible with this quasi non-identifier-based approach, which can provide a practical solution for treating patients with respiratory failure by means of closed-loop oxygen therapy.

7. Discussion

A number of potential controllers can be used for the control of oxygenation. In general, the settling time was approximately 300 s using a PI controller and 250 s using a Smith predictor for the control of oxygenation in an extracorporeal membrane oxygenator (ECMO) (Walter et al., 2009). Therefore, one of these controllers can be applied for the initial implementation in a PIA; such control strategies fulfill the assumed requirement of an initial controller that stabilizes the system. A PIA controller should then be realizable in actual clinical practice for optimal control of oxygenation with a settling time of less than 250 s, which should be of considerable benefit to patients suffering from respiratory insufficiency. In patients with ARDS, an acceptable range for SaO2 is 88–95% based on treatment using the ARDSNet protocol (ARDSNet, 2000; Pomprapa, Schwaiberger et al., 2014). In general, when looking at input–output dynamics and considering identifiability, a good choice for the sampling time is to set T equal to 1/10 or even 1/20 of the fastest time constant that needs to be identified (Isermann & Münchhof, 2010). When considering the process dynamics given in Eq. (9), this leads to a sampling time T < 1.7 s. Considering the repetitive nature of both natural and artificial respiration, another influencing factor is the respiratory rate (RR). In fact, the alveolar gas transport is modulated by the inhalation and exhalation process. Hence, higher frequency components of the respiratory rate will, at least theoretically, also be visible in the actual gas transport rates across the alveolar membrane.
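The policy evaluation and policy improvement cycle used throughout this paper (the quadratic value function of Eq. (15) fitted by least squares, followed by the policy update of Eq. (16)) can be sketched on a scalar linear-quadratic example for which the HJB equation reduces to a Riccati equation with the known solution w = √2 − 1. This is an illustrative sketch, not the paper's implementation: the plant parameters (a, b), the costs (q, R) and the sampling interval T are assumed values; as in Eq. (16), the input gain b is taken as known, while the drift a is only used to generate measurement data.

```python
a, b = -1.0, 1.0          # plant x' = a*x + b*u (assumed; the learner sees only data)
q, R = 1.0, 1.0           # stage cost q*x^2 + R*u^2
dt, T = 0.001, 0.05       # integration step and data-sampling interval [s]

def rollout(k, x0=1.0, n_intervals=150):
    """Simulate the closed loop u = -k*x and return, per sampling interval,
    (state at start, state at end, integrated stage cost)."""
    x, data = x0, []
    for _ in range(n_intervals):
        x_start, cost = x, 0.0
        for _ in range(int(T / dt)):
            u = -k * x
            cost += (q * x * x + R * u * u) * dt
            x += (a * x + b * u) * dt          # plant acts as a black box here
        data.append((x_start, x, cost))
    return data

k = 0.5                                        # initial stabilizing gain (assumed given)
for _ in range(5):
    data = rollout(k)
    # Policy evaluation (critic): fit V(x) = w*x^2 by least squares from
    # w*(x_start^2 - x_end^2) = integrated cost over each interval.
    num = sum((xs * xs - xe * xe) * c for xs, xe, c in data)
    den = sum((xs * xs - xe * xe) ** 2 for xs, xe, c in data)
    w = num / den
    # Policy improvement (actor): u* = -(1/2) R^-1 b dV/dx = -(b*w/R) x.
    k = b * w / R
# The Riccati solution for these parameters is w = sqrt(2) - 1.
```

With these parameters the learned gain converges to the Riccati-optimal value within two to three iterations, mirroring the rapid iteration-to-iteration improvement reported for the three-tank experiment.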
As the breathing rates of ARDS patients are additionally limited by therapeutic protocols, it is wise to consider these practical boundary conditions as well. Specifically, the maximum respiratory rate is 35 bpm (= 0.58 Hz) for the well-known ARDSNet protocol (ARDSNet, 2000) and 60 bpm (= 1 Hz) for the open lung management (Pomprapa et al., 2015). Finally, SaO2 is a signal that is measured based on blood pulsations in the skin (Wukitsch, Petterson, & Pologe, 1988). Hence, theoretically, we also have to consider the heart rate (HR) and Shannon's sampling theorem, which states that if a continuous-time signal contains no frequency components higher than f Hz, a sampling time T < 1/(2f) s guarantees a perfect reconstruction of the original frequency content. However, in our application we used a commercial device (the CeVOX) for saturation measurements, which handles the pulsatile nature of the saturation signal internally by using appropriate sampling times. Hence, as we only consider SaO2 as a quasi-continuous function of FiO2, it seems sufficient to select the sampling time T = 0.1 s.

Fig. 9. Control performance of a PIA controller for an interconnecting three-tank system with a sampling time of 100 ms.

A 34-kg pig model of acute lung injury was induced by repeated lavage with 3–6 l of saline solution in order to remove pulmonary surfactant (anti-inflammatory molecules), leading to partial lung collapse and poor oxygenation. The pig was selected because its pulmonary system structurally resembles that of a human in terms of size, structure and the mechanism of gas exchange.
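The sampling-time bounds discussed above can be checked with a few lines of arithmetic. The dominant time constant is not stated explicitly here; the value below is inferred from the quoted bound T < 1.7 s together with the 1/10 rule of thumb, so it should be read as an assumption.

```python
# Sampling-time bounds discussed in the text: T < tau/10 for identification
# of the dominant time constant, and T < 1/(2f) (Shannon) for the fastest
# periodic component. tau = 17 s is inferred from the quoted bound T < 1.7 s.
tau = 17.0                  # dominant process time constant [s] (inferred)
rr_ardsnet_hz = 35 / 60     # 35 breaths/min (ARDSNet protocol)
rr_olm_hz = 60 / 60         # 60 breaths/min (open lung management)

t_ident = tau / 10                  # identification rule of thumb -> 1.7 s
t_shannon = 1 / (2 * rr_olm_hz)     # Shannon bound for the fastest RR -> 0.5 s

# The chosen sampling time T = 0.1 s satisfies both bounds.
assert 0.1 < t_shannon < t_ident
```

The chosen T = 0.1 s therefore leaves a comfortable margin with respect to both the identification rule and the Shannon bound for the respiratory-rate components.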
Other animal alternatives, such as sheep or monkeys, could be considered for experiments on this particular lung disease. A pig of this weight can be used to predict the dynamics of acute lung injury; for our specific application (i.e. oxygen therapy), a pig of that body weight is an established animal model that can serve as a patient model for a human weighing two to three times as much. Note that many physiological parameters, such as blood pressure, resting heart rate or SaO2, are quite similar in humans and pigs. There are, however, other physiological variables that scale with weight, such as the tidal volume, which is usually given in ml/kg. Nonetheless, an individual patient model is unique, and system identification is required in order to understand the dynamical system. Hence, it is of great advantage that a PIA controller can provide optimal control performance with minimal knowledge of the system model. In the present study, a first-order model with time delay was used for simulation and evaluation of the PIA. The system had a time delay of 11 s, reflecting the transport effects in a change of SaO2 in response to a change in FiO2; this delay also depends on the dynamics of the mechanical ventilator and of the subject, as well as on the positioning of the SaO2 catheter. In this case, the sensor was placed in the carotid artery (Lüpschen et al., 2009), which supplies oxygenated blood to the brain. If the sensor is positioned elsewhere (e.g. finger, ear or tongue), a different time delay results, which complicates the design of the control system. Therefore, with the PIA controller for the control of oxygenation, all states of the system are assumed to be accessible; in practice, however, a state observer for a time-delay system may need to be designed (Lu & Ho, 2014).
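The first-order-plus-dead-time behavior described above can be reproduced with a simple delay-line simulation. The 11 s delay is from the text; the static gain and time constant below are illustrative assumptions, since Eq. (9) is not reproduced in this section.

```python
# First-order-plus-dead-time response of SaO2 to a step in FiO2.
# The 11 s transport delay is from the text; gain K and time constant tau
# are assumed illustrative values, not the identified porcine parameters.
from collections import deque

K, tau = 0.9, 17.0          # static gain and time constant (assumed)
delay, dt = 11.0, 0.1       # dead time [s] and solver step [s]
n_delay = int(delay / dt)
buf = deque([0.0] * n_delay, maxlen=n_delay)   # fixed-length delay line

y, history = 0.0, []
for step in range(int(300 / dt)):
    u = 10.0 if step * dt >= 1.0 else 0.0      # +10 % FiO2 step at t = 1 s
    u_delayed = buf[0]                         # input from 11 s ago
    buf.append(u)
    y += dt * (K * u_delayed - y) / tau        # first-order lag
    history.append(y)
# The response stays at zero for the dead time, then rises toward K * 10 = 9
# with time constant tau.
```

The dead time is what dominates the achievable closed-loop speed here, which is why a different sensor position (and hence a different delay) changes the control design.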
In the present study, a simple least-squares algorithm was used as the parameter estimation technique; this provided satisfactory results in the simulative control of oxygenation and in the experimental three-tank system. The algorithm estimates the unknown parameters (wi) in the cost function and seeks the best parametric model to describe the system. In this particular application, no high computational costs are involved. However, the data collection period should be long enough to capture all information on the system dynamics, so that the cost function can be approximated more accurately. Other parameter estimation techniques can be applied as alternatives in a real implementation, e.g. the recursive least-squares algorithm for online implementation of the control algorithm (Ljung, 1999), the Kalman filter (Kalman, 1960), the Volterra series (Volterra, 1959), block-oriented structures (Pottmann & Pearson, 1998) or a neural network (Abu-Khalaf & Lewis, 2005) to approximate the Hamiltonian function and eventually solve the HJB equation. A partially model-free algorithm based on the PIA was effectively applied in the simulation of the first-order model with time delay of oxygenation using porcine dynamics. A gradual improvement of the system performance was achieved iteratively, and the criterion to stop the algorithm was a negligible change in the estimated parameters of the cost function. The algorithm therefore converges to an optimal solution. This approach is based not only on reinforcement learning with an actor–critic architecture, but also on optimal control for long-term evaluation of the system performance. In this simulation, a satisfactory result was attained after the third iteration, with no steady-state error and a settling time of 60 s, compared to the initial settling time of 600 s with the initial PI controller.
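The recursive least-squares alternative mentioned above can be sketched in plain Python. This is a generic RLS update for a model d = wᵀφ, not the paper's implementation; the regressor and the demonstration data below are synthetic.

```python
import math

def rls_update(w, P, phi, d, lam=1.0):
    """One recursive least-squares step for the model d = w^T phi + noise;
    returns the updated estimate w and covariance P (forgetting factor lam)."""
    n = len(phi)
    Pphi = [sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]
    denom = lam + sum(phi[i] * Pphi[i] for i in range(n))
    gain = [v / denom for v in Pphi]                       # k = P phi / denom
    err = d - sum(w[i] * phi[i] for i in range(n))         # prediction error
    w = [w[i] + gain[i] * err for i in range(n)]
    P = [[(P[i][j] - gain[i] * Pphi[j]) / lam for j in range(n)] for i in range(n)]
    return w, P

# Synthetic demonstration: recover w = [1.0, -0.5] from noiseless data.
w, P = [0.0, 0.0], [[1000.0, 0.0], [0.0, 1000.0]]
for t in range(200):
    phi = [math.sin(0.1 * t), math.cos(0.3 * t)]
    w, P = rls_update(w, P, phi, 1.0 * phi[0] - 0.5 * phi[1])
```

For an online implementation, each new state sample would update the cost-function weights wi in this recursive fashion instead of refitting a whole data batch at every iteration.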
This clearly demonstrates optimal performance based on minimization of the predefined Hamiltonian cost function and improvement of the settling time, leading to optimal control of oxygenation and ultimately ameliorating the adverse effects of hypoxia. In principle, the actor–critic structure is a two-step process involving two different subsystems: policy improvement and policy evaluation. Policy improvement applies the computed control input (u*) to the plant; this input is obtained from Eq. (5) based on a weighting function of the control input (R), the gain of the input dynamics (g(x)), and the gradient of the estimated cost function. Therefore, no knowledge of the internal system dynamics is required, which is a clear advantage of this control strategy; however, the system structure, in terms of the system order, must be predetermined. Subsequently, the critic subsystem computes the infinite-horizon cost associated with the current controller and decides whether to continue with another iteration or to stop. This structure leads to an adaptive learning process that optimally controls the predefined output variable. A possible disadvantage, however, is an endless iteration (or a large number of iterations) before a converged solution is achieved, which may be undesirable in real practice for closed-loop oxygen therapy. Therefore, further implementation in animal experiments is required to test this control strategy in a real clinical scenario. For both the linear and the nonlinear case, stabilizability is assumed. Therefore, the first important issue for the implementation of this control strategy is to find an initial controller that can stabilize the system.
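The two-step actor–critic loop and its stopping criterion can be captured in a short skeleton. This is a generic sketch: `evaluate_policy` and `improve_policy` are placeholder names for the critic and actor subsystems, and the hard iteration cap addresses the endless-iteration disadvantage mentioned above.

```python
def policy_iteration(evaluate_policy, improve_policy, controller, tol=1e-3, max_iter=20):
    """Alternate critic and actor steps until the estimated cost parameters
    converge, or a hard cap is reached (guard against endless iteration)."""
    w_old = None
    for i in range(max_iter):
        w = evaluate_policy(controller)        # critic: estimate cost parameters w_i
        controller = improve_policy(w)         # actor: new u* from the gradient of V
        if w_old is not None and max(abs(p - q) for p, q in zip(w, w_old)) < tol:
            return controller, i + 1           # parameters settled: converged
        w_old = w
    return controller, max_iter                # stopped by the iteration cap

# Toy check on a scalar linear-quadratic example with closed-form critic/actor:
gain, n_iter = policy_iteration(
    lambda k: [(1 + k * k) / (2 * (k + 1))],   # Lyapunov value of policy u = -k*x
    lambda w: w[0],                            # improved gain b*w/R with b = R = 1
    controller=0.5)
# gain converges toward the Riccati-optimal value sqrt(2) - 1.
```

In a clinical deployment, the cap on iterations is exactly the kind of safeguard needed so that the learning phase cannot run indefinitely while the patient is in the loop.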
For this particular application of ventilation therapy, oxygen therapy in common practice can definitely improve oxygenation, and whether the system is stabilizable can be tested by supplying additional FiO2. Hence, the system is typically stabilizable for a treatable patient. However, for extreme pulmonary malfunction where oxygen therapy may not be sufficient, the PIA may not be applicable; in such situations, other therapeutic approaches, for example extracorporeal membrane oxygenation (Hill et al., 1972; Walter et al., 2009), are required in order to improve oxygenation. In our implementation on the interconnecting three-tank system, the PIA gives the optimal result for controlling the water level without requiring knowledge of the internal nonlinear dynamics of the system. The settling time was approximately 130 s. The approach works well in the presence of uncertainty regarding the diameter and length of the connecting tubes, noise from the pressure sensors, and disturbances related to water ripple in the tanks (especially in Tank 1). Based on this practical implementation, a PIA controller should be realizable with optimal performance to cope with the nonlinear time-varying cardiopulmonary system of patients with respiratory insufficiency. As stated, it is a considerable advantage of this control strategy that minimal knowledge of the system dynamics is required to achieve an optimal control solution; moreover, this should offer an optimal outcome for the treatment of patients requiring oxygen therapy. In addition, it should be noted that our experiment with the interconnecting three-tank system is implemented on a deterministic and time-invariant plant, which does not capture all challenges of the real unknown and time-variant system dynamics. To introduce such conditions, further system modifications should be made, for example installing control valves between the tanks to introduce variable flow and thereby induce unknown dynamics in the whole system. Further system assessment with respect to this setup would be useful for experimental validation. Based on the practical implementation of this optimal control technique, the benefits include the following: a quasi model-free implementation of the control strategy, smooth control performance, and an efficient control policy based on the gradient of the estimated cost function (Doya, 2000). During the iteration process, some system parameters might gradually change, in which case the algorithm adaptively tunes the control input to cope with this change. Therefore, if the plant dynamics change significantly over time and no stabilizing control input can be computed, the optimal solution may not be achieved; however, this was not experienced during testing of our practical model. For the application of this algorithm in a real clinical situation, only a small change of the patient dynamics (or time invariance) is assumed for each patient within a certain time frame. This assumption means that the patient's transfer function must not change faster than the convergence time of the PIA controller; the PIA controller should then be able to provide an optimal solution. It should be emphasized that the learning process requires an iteration procedure and, if the system is stabilizable at the initial iteration, performance with respect to patient safety is guaranteed, i.e. hypoxia will not appear during the learning process. Our plan is to implement this control strategy for the control of oxygenation in a patient-in-the-loop configuration using porcine dynamics.
For a feasibility study, both invasive measurement of SaO2 and noninvasive measurement of peripheral oxygen saturation (SpO2) should be implemented to stabilize and control oxygenation. Online parameter estimation should also be carried out during policy evaluation at every iteration. With this promising technique, system integration in various clinical applications can be realized, e.g. in home-based mechanical ventilation for patients with apnea, during anesthesia, in the intensive care unit, or during the transport of patients with severe hypoxia. In addition, a policy iteration algorithm with an H2- or H∞-optimal state feedback controller may be introduced to improve robustness and disturbance rejection (Al-Tamimi, Abu-Khalaf, & Lewis, 2007; Pomprapa, Misgeld, Sorgato et al., 2013).

8. Conclusion

A policy iteration algorithm (PIA) has been proposed for nonlinear control of oxygenation, which could be a potential candidate for closed-loop oxygen therapy. Based on reinforcement learning and optimal control theory, policy improvement and policy evaluation are implemented alternately and iterated until an optimal solution of the Hamilton–Jacobi–Bellman (HJB) equation is achieved, minimizing the Hamiltonian cost function. One advantage of this control strategy is that no knowledge of the system dynamics is required. Therefore, it should be suitable for this particular application, where the model of the patient dynamics is barely known and system identification to achieve accurate model dynamics is time-consuming and critical for patients with severe hypoxia. In addition, the algorithm was implemented on a real nonlinear plant, the interconnecting three-tank system, to demonstrate its control performance. The results acquired with this optimal controller indicate a promising control algorithm for possible implementation for the control of oxygenation.

Acknowledgment

The authors thank the three anonymous reviewers for their constructive comments that have significantly helped to improve this paper.

Appendix A. Evolution of unknown parameters wi

The unknown parameters wi are computed based on the least-squares algorithm. The evolution of the parameters in every iteration is presented on a natural logarithmic scale to show the converged, stable results of all parameters for the control of oxygenation (Fig. A1).

Fig. A1. Evolution of the unknown parameters (wi) for oxygenation control over three iterations on a natural logarithmic scale.

Appendix B. Evaluation of model selection

The evaluation of different model alternatives is carried out based on the input–output response given in Fig. 4. A model structure is defined by (np, nz, nd), where np, nz and nd are the number of poles, the number of zeros and the value of the time delay, respectively. For example, the identified first-order model with time delay of Eq. (9) can be represented by (1,0,11) (Table B1).

Table B1
Evaluation of different model alternatives.

Model (np, nz, nd) | MSE based on system identification (%) | MSE for validation (%)
(1,0,11) | 1.6379 | 0.5070
(2,0,11) | 1.6586 | 0.5167
(2,1,11) | 1.6631 | 0.5103
(3,0,11) | 1.6970 | 0.5377
(3,1,11) | 1.7040 | 0.5353
(3,2,11) | 1.6956 | 0.5326
(4,0,11) | 1.6944 | 0.5364
(4,1,11) | 1.7012 | 0.5337
(4,2,11) | 1.6880 | 0.5120
(4,3,11) | 1.6889 | 0.4929
(4,3,0) | 1.8310 | 0.5345

References

Abu-Khalaf, M., & Lewis, F. (2005). Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach. Automatica, 41(5), 779–791.
Al-Tamimi, A., Abu-Khalaf, M., & Lewis, F. (2007).
Model-free Q-learning designs for discrete-time zero-sum games with application to H-infinity control. Automatica, 43(3), 473–482.
ARDSNet (2000). Ventilation with lower tidal volumes as compared with traditional tidal volumes for acute lung injury and the acute respiratory distress syndrome. New England Journal of Medicine, 342, 1301–1308.
Ashbaugh, D., Bigelow, D., Petty, T., & Levine, B. (1967). Acute respiratory distress in adults. Lancet, 2(7511), 319–323.
Ballard-Croft, C., Wang, D., Sumpter, R., Zhou, X., & Zwischenberger, J. B. (2012). Large-animal models of acute respiratory distress syndrome. The Annals of Thoracic Surgery, 93(4), 1331–1339.
Bhasin, S., Kamalapurkar, R., Johnson, M., Vamvoudakis, K., Lewis, F., & Dixon, W. (2013). A novel actor–critic–identifier architecture for approximate optimal control of uncertain nonlinear systems. Automatica, 49, 82–92.
Brunner, J. (2002). History and principles of closed-loop control applied to mechanical ventilation. Nederlandse Vereniging voor Intensive Care, 6, 6–9.
Clark, J. (1974). The toxicity of oxygen. American Review of Respiratory Disease, 110(6 Pt 2), 40–50.
Claure, N., & Bancalari, E. (2009). Automated respiratory support in newborn infants. Seminars in Fetal and Neonatal Medicine, 14, 35–41.
Doya, K. (2000). Reinforcement learning in continuous time and space. Neural Computation, 12(1), 219–245.
East, T., Tolle, C., McJames, S., Farrell, R., & Brunner, J. (1991). A non-linear closed-loop controller for oxygenation based on a clinically proven fifth dimensional quality surface. Anesthesiology, 75(3A), A468.
Giard, M., Bertrand, F., Robert, D., & Pernier, J. (1985). An algorithm for automatic control of O2 and CO2 in artificial ventilation. IEEE Transactions on Biomedical Engineering, BME-32, 658–667.
Hill, J., O'Brien, T., Murray, J., Dontigny, L., Bramson, M., Osborn, J., & Gerbode, F. (1972). Prolonged extracorporeal oxygenation for acute post-traumatic respiratory failure (shock-lung syndrome). Use of the Bramson membrane lung. New England Journal of Medicine, 286, 629–634.
Isermann, R., & Münchhof, M. (2010). Identification of dynamic systems: An introduction with applications. Springer.
Kaelbling, L., Littman, M., & Moore, A. (1996). Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4, 237–285.
Kallet, R., & Matthay, M. (2013). Hyperoxic acute lung injury. Respiratory Care, 58(1), 123–141.
Kalman, R. (1960). A new approach to linear filtering and prediction problems. Journal of Basic Engineering, 82(1), 35–45.
Kamalabady, A., & Salahshoor, K. (2009). Affine modeling of nonlinear multivariable processes using a new adaptive neural network-based approach. Journal of Process Control, 19(3), 380–393.
Kron, A., Böhm, S., & Leonhardt, S. (1999). Analytic model for the CO2 mass balance in mechanically ventilated model. In European medical and biomedical engineering conference.
Kwok, H., Linkens, D., Mahfouf, M., & Mills, G. (2004). Adaptive ventilator FiO2 advisor: Use of non-invasive estimations of shunt. Artificial Intelligence in Medicine, 32, 157–169.
Lachmann, B., Robertson, B., & Vogel, J. (1980). In vivo lung lavage as an experimental model of the respiratory distress syndrome. Acta Anaesthesiologica Scandinavica, 24, 231–236.
Ljung, L. (1999). System identification: Theory for the user. Prentice-Hall.
Lu, G., & Ho, D. (2014). Robust H∞ observer for nonlinear discrete systems with time delay and parameter uncertainties. IEE Proceedings—Control Theory and Applications, 151(4), 439–444.
Lüpschen, H., Zhu, L., & Leonhardt, S. (2009). Robust closed-loop control of the inspired fraction of oxygen for the online assessment of recruitment maneuvers. In Conference proceedings of the IEEE engineering in medicine and biology society (pp. 495–498).
Mach, W., Thimmesch, A., Pierce, J. T., & Pierce, J. D. (2011). Consequences of hyperoxia and the toxicity of oxygen in the lung. Nursing Research and Practice, 2011, 260482.
Mitamura, Y., Mikami, T., & Yamamoto, K. (1975). A dual control system for assisting respiration. Medical & Biological Engineering, 13(6), 846–853.
Ohtake, S., & Yamakita, M. (2010). Adaptive output optimal control algorithm for unknown system dynamics based on policy iteration. In American control conference (pp. 1671–1676).
Orani, N., Pisano, A., Franceschelli, M., Giua, A., & Usai, E. (2011). Robust reconstruction of the discrete state for a class of nonlinear uncertain switched systems. Nonlinear Analysis: Hybrid Systems, 5, 220–232.
Pomprapa, A., Alfocea, S., Göbel, C., Misgeld, B., & Leonhardt, S. (2014). Funnel control for oxygenation during artificial ventilation therapy. In The 19th IFAC world congress.
Pomprapa, A., Mir Wais, M., Walter, M., Misgeld, B., & Leonhardt, S. (2015). Policy iteration algorithm for the control of oxygenation. In The 9th IFAC symposium on biological and medical systems (pp. 517–522).
Pomprapa, A., Misgeld, B., Lachmann, B., & Leonhardt, S. (2013). Closed-loop ventilation of oxygenation and end-tidal CO2. In IEEE systems, man and cybernetics society.
Pomprapa, A., Misgeld, B., Sorgato, V., Stollenwerk, A., Walter, M., & Leonhardt, S. (2013). Robust control of end-tidal CO2 using the H∞ loop-shaping approach. Acta Polytechnica, 53(6), 895–900.
Pomprapa, A., Pikkemaat, R., Lüpschen, H., Lachmann, B., & Leonhardt, S. (2010). Self-tuning adaptive control of the arterial oxygen saturation for automated "Open Lung" maneuvers. In 3. Dresdner Medizintechnik-Symposium.
Pomprapa, A., Schwaiberger, D., Pickerodt, P., Tjarks, O., Lachmann, B., & Leonhardt, S. (2014). Automatic protective ventilation using the ARDSNet protocol with the additional monitoring of electrical impedance tomography. Critical Care, 18, R128.
Pottmann, M., & Pearson, R. (1998). Block-oriented NARMAX models with output multiplicities. AIChE Journal, 44, 131–140.
Saihi, K., Richard, J.-C., Gonin, X., Krüger, T., Dojat, M., & Brochard, L. (2014). Feasibility and reliability of an automated controller of inspired oxygen concentration during mechanical ventilation. Critical Care, 18, R35.
Suda, H., & Yamakita, M. (2013). On adaptive optimal control for hybrid systems. In IEEE multi-conference on systems and control (pp. 859–864).
Sun, F., Sun, Z., Li, N., & Zhang, L. (1999). Stable adaptive control for robot trajectory tracking using dynamic neural networks. Machine Intelligence & Robotic Control, 1(2), 71–78.
Tarpy, S., & Celli, B. (1995). Long-term oxygen therapy. New England Journal of Medicine, 333, 710–714.
Vamvoudakis, K., & Lewis, F. (2010). Online actor–critic algorithm to solve the continuous-time infinite horizon optimal control problem. Automatica, 46, 878–888.
Vamvoudakis, K., Vrabie, D., & Lewis, F. (2009). Online policy iteration based algorithms to solve the continuous-time infinite horizon optimal control problem. In IEEE symposium on adaptive dynamic programming and reinforcement learning (pp. 36–41).
Volterra, V. (1959). Theory of functionals and of integrals and integro-differential equations. Dover Publications.
Vrabie, D., Pastravanu, O., Abu-Khalaf, M., & Lewis, F. (2009). Adaptive optimal control for continuous-time linear systems based on policy iteration. Automatica, 45, 477–484.
Walter, M., Wartzek, T., Kopp, R., Stollenwerk, A., Kashefi, A., & Leonhardt, S. (2009). Automation of long term extracorporeal oxygenation systems. In European control conference (pp. 2482–2486).
Wang, Y., Zhou, R., & Wen, C. (1993). Robust load-frequency controller design for power systems. In IEE proceedings generation, transmission and distribution (pp. 11–16).
Wernrud, A. (2007). Strategies for computing switching feedback controllers. In American control conference (pp. 1389–1394).
West, J. (2012). Respiratory physiology: The essentials (9th ed.). LWW.
Wukitsch, M., Petterson, M. T., Tobler, D. R., & Pologe, J. A. (1988). Pulse oximetry: Analysis of theory, technology, and practice. Journal of Clinical Monitoring, 4(4), 290–301.
Xu, X., Hu, D., & Lu, X. (2007). Kernel-based least squares policy iteration for reinforcement learning. IEEE Transactions on Neural Networks, 18(4), 973–992.
Yu, C., He, W., So, J., Roy, R., Kaufman, H., & Newell, J. (1987). Improvement in arterial oxygen control using multiple model adaptive control procedures. IEEE Transactions on Biomedical Engineering, 34(8), 567–574.