
Talk 5: Optimization

10 mars 2019
Plan
1. Fixed point algorithms
2. Proximal operator
3. Primal-dual problem
4. Alternating Direction Method of Multipliers
Fixed point algorithms
Recall:
(Fixed point): Let f : X → R be a continuous function on X. We say that α is a fixed point of f if f(α) = α.
(Fermat's rule): Let f be a proper function and x ∈ X. We have the equivalence: x ∈ argmin f ⇔ 0 ∈ ∂f(x). In other words,
argmin f = zer ∂f = {x ∈ X : 0 ∈ ∂f(x)}
Fixed point algorithms
Objective:
We want to compute numerically a minimizer of a convex function f : X → ]−∞, +∞]. According to Fermat's rule, we have to find a point x such that 0 ∈ ∂f(x). When f is differentiable, this is equivalent to finding a zero of the gradient, 0 = ∇f(x). In this case a popular algorithm is the gradient algorithm, which generates a sequence (x^k : k = 0, 1, ...) defined by:
x^{k+1} = x^k − γ∇f(x^k).
Setting T(x) = x − γ∇f(x), the algorithm takes the form x^{k+1} = T(x^k), and any fixed point of T is a minimizer of f. Many optimization algorithms are written in this form, where T is a mapping chosen so that its fixed points are solutions of the problem. It is therefore necessary to identify conditions on T which guarantee the convergence of the algorithm to a fixed point.
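As an illustration (a sketch not taken from the slides; the quadratic example and step-size choice are assumptions), the gradient method can be coded as a generic fixed-point iteration x^{k+1} = T(x^k):

```python
import numpy as np

def fixed_point_iteration(T, x0, max_iter=1000, tol=1e-10):
    """Iterate x_{k+1} = T(x_k) until the iterates stop moving."""
    x = x0
    for _ in range(max_iter):
        x_next = T(x)
        if np.linalg.norm(x_next - x) < tol:
            break
        x = x_next
    return x

# Example: minimize f(x) = 0.5 * ||A x - b||^2 with T(x) = x - gamma * grad f(x).
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
grad_f = lambda x: A.T @ (A @ x - b)
L = np.linalg.norm(A.T @ A, 2)   # Lipschitz constant of grad f
gamma = 1.0 / L                  # any step in ]0, 2/L[ works here
T = lambda x: x - gamma * grad_f(x)

x_star = fixed_point_iteration(T, np.zeros(2))
```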
α-averaged applications
Notation
Let L > 0 and let R, T : X → X be two mappings. The image of x under R is written R(x) or Rx. The composition T ∘ R will also be written more compactly as TR. The identity mapping (I(x) = x) is denoted I.
Definition 1 :
A mapping R is said to be L-Lipschitz if ∀(x, y) ∈ X²: ‖Rx − Ry‖ ≤ L‖x − y‖. If L < 1, we say that R is a contraction, and if L = 1, R is said to be non-expansive.
α-averaged applications
Definition 2 :
Let α ∈ [0, 1]. A mapping T is said to be α-averaged if there exists a non-expansive mapping R such that T = αR + (1 − α)I. A 1/2-averaged mapping is said to be firmly non-expansive.
proposition
Let α ∈ ]0, 1[. The following assertions are equivalent:
1. T is α-averaged.
2. ∀(x, y) ∈ X²: ‖Tx − Ty‖² ≤ ‖x − y‖² − ((1 − α)/α)‖(I − T)x − (I − T)y‖².
α-averaged applications
proposition
Let T : X → X be a mapping such that ∀(x, y) ∈ X²: ⟨Tx − Ty, x − y⟩ ≥ ‖Tx − Ty‖². Then T is firmly non-expansive.
lemma
Let T and S be two mappings from X to X, respectively α-averaged and β-averaged, with 0 < α, β < 1. Then there exists δ, 0 < δ < 1, such that TS is δ-averaged.
Cocoercive functions
Definition
A function B : X → X is called µ-cocoercive (µ > 0) if µB is firmly non-expansive. The definition means that:
∀(x, y) ∈ X², ⟨Bx − By, x − y⟩ ≥ µ‖Bx − By‖².
proposition
Let B be a µ-cocoercive function and 0 < γ ≤ 2µ. Then I − γB is (γ/2µ)-averaged.
Cocoercive functions
Theorem (Krasnosel’skii Mann)
Let 0 < α < 1 and let T be an α-averaged mapping such that Fix(T) ≠ ∅. Then any sequence (x^k) satisfying the recursion x^{k+1} = T(x^k) converges to a fixed point of T.
Corollary
Let B be a µ-cocoercive function such that zer B ≠ ∅, and let 0 < γ < 2µ. Then every sequence (x^k) satisfying x^{k+1} = x^k − γB(x^k) converges to a point of zer B.
Example : Gradient algorithm
Hypothesis
Let f : X → ]−∞, +∞] be convex and differentiable on X, with ∇f L-Lipschitz.
Theorem (Baillon-Haddad)
Under the above hypothesis,
∀(x, y) ∈ X², ⟨∇f(x) − ∇f(y), x − y⟩ ≥ (1/L)‖∇f(x) − ∇f(y)‖².
In particular, L⁻¹∇f is firmly non-expansive.
Theorem
We assume that the hypothesis is satisfied and that argmin f ≠ ∅. Let 0 < γ < 2/L. Then any sequence (x^k) satisfying the recursion x^{k+1} = x^k − γ∇f(x^k) converges to a minimizer of f.
proximal operator
Definition:
Given a function f : E → ]−∞, +∞], the proximal operator (proximal mapping) of f is the operator given by:
prox_f(x) = argmin_{y ∈ E} { f(y) + (1/2)‖y − x‖² }
The scaled proximal operator is defined by:
prox_{γf}(x) = argmin_{y ∈ E} { f(y) + (1/2γ)‖y − x‖² }
remark
Given a vector x ∈ E, the set argmin{ f + (1/2)‖· − x‖² } might be empty, a singleton, or contain multiple vectors; we therefore focus on the case where the operator is well defined.
proximal operator
proposition(*):
Let f ∈ Γ₀(E). Then:
p = prox_f(x) ⇔ x ∈ p + ∂f(p);
prox_f is well defined as a mapping from E to E;
for any x, y ∈ E: ⟨x − y, prox_f(x) − prox_f(y)⟩ ≥ ‖prox_f(x) − prox_f(y)‖².
proximal operator
Examples:
prox_c(x) = argmin_{y ∈ E} { c + (1/2)‖y − x‖² } = x   (constant function)
prox_{⟨·, a⟩ + b}(x) = argmin_{y ∈ E} { ⟨y, a⟩ + b + (1/2)‖y − x‖² } = x − a
prox_{(1/2)⟨·, A·⟩ − ⟨b, ·⟩}(x) = argmin_{y ∈ E} { (1/2)⟨y, Ay⟩ − ⟨b, y⟩ + (1/2)‖y − x‖² } = (A + I)⁻¹(x + b)   (for A symmetric positive semidefinite)
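As a quick numerical sanity check (not from the slides; the matrices and step size below are hypothetical), the quadratic-prox formula can be verified by minimizing f(y) + (1/2)‖y − x‖² directly, assuming A is symmetric positive definite:

```python
import numpy as np

# f(y) = 0.5 * y^T A y - b^T y, with A symmetric positive definite (assumption).
A = np.array([[2.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -2.0])
x = np.array([0.3, 0.7])

# Closed form: prox_f(x) = (A + I)^{-1} (x + b).
p_closed = np.linalg.solve(A + np.eye(2), x + b)

# Numerical check: minimize f(y) + 0.5 * ||y - x||^2 by plain gradient descent.
grad = lambda y: A @ y - b + (y - x)
y = np.zeros(2)
for _ in range(5000):
    y -= 0.1 * grad(y)

assert np.allclose(y, p_closed, atol=1e-6)
```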
proximal operator
proposition:
Let f₁, ..., fₙ be functions in Γ₀(E). For any x = (x₁, ..., xₙ) ∈ E, set f(x) = f₁(x₁) + f₂(x₂) + ... + fₙ(xₙ). Then:
prox_f(x) = (prox_{f₁}(x₁), prox_{f₂}(x₂), ..., prox_{fₙ}(xₙ))
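As a small illustration (a sketch; the three one-dimensional functions below are arbitrary choices), separability means the prox is applied coordinate by coordinate:

```python
import numpy as np

def prox_separable(prox_list, x):
    """prox of f(x) = f_1(x_1) + ... + f_n(x_n): apply each prox_i componentwise."""
    return np.array([prox_i(xi) for prox_i, xi in zip(prox_list, x)])

# Hypothetical choices: f_1(t) = |t|, f_2(t) = t^2 / 2, f_3(t) = 0.
prox_abs  = lambda t: np.sign(t) * max(abs(t) - 1.0, 0.0)   # prox of |t|
prox_quad = lambda t: t / 2.0                               # prox of t^2 / 2
prox_zero = lambda t: t                                     # prox of the zero function

x = np.array([3.0, -1.0, 0.5])
print(prox_separable([prox_abs, prox_quad, prox_zero], x))  # -> [ 2.  -0.5  0.5]
```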
theorem:
A point x* ∈ Rⁿ is a minimizer of a proper closed convex function f if and only if x* = prox_f(x*).
Proximal gradient algorithm
In this part we are interested in two functions f and g, where f is convex and differentiable with a Lipschitz gradient. According to the above, minimizing the sum f + g amounts to looking for an element x* such that 0 ∈ ∂(f + g)(x*) = ∇f(x*) + ∂g(x*), which is equivalent to −∇f(x*) ∈ ∂g(x*), or again x* − ∇f(x*) ∈ x* + ∂g(x*). Since a point p satisfies p = prox_g(y) if and only if y ∈ p + ∂g(p), this last inclusion means exactly
x* = prox_g(x* − ∇f(x*)).
We can extend the remark by observing that the minimizers of f + g coincide with the minimizers of γf + γg for all γ > 0. In other words we have:
Proximal gradient algorithm
proposition:
Let f, g ∈ Γ₀(E) be two functions such that f is differentiable with a Lipschitz gradient. Then
x* ∈ argmin(f + g) if and only if x* = prox_{γg}(x* − γ∇f(x*)).
This property suggests the following algorithm, called the proximal gradient algorithm:
x^{k+1} = prox_{γg}(x^k − γ∇f(x^k))
theorem:
Let f, g ∈ Γ₀(E) be two functions such that f is differentiable with ∇f L-Lipschitz. For 0 < γ < 2/L, any sequence (x^k) satisfying the previous recursion converges to a minimizer of f + g.
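A minimal sketch of the proximal gradient iteration (illustrative; grad_f and prox_g are assumed to be supplied by the user, with prox_g(v, gamma) returning prox_{γg}(v)):

```python
import numpy as np

def proximal_gradient(grad_f, prox_g, x0, gamma, max_iter=1000, tol=1e-8):
    """Iterate x_{k+1} = prox_{gamma g}(x_k - gamma * grad f(x_k))."""
    x = x0
    for _ in range(max_iter):
        x_next = prox_g(x - gamma * grad_f(x), gamma)
        if np.linalg.norm(x_next - x) < tol:
            break
        x = x_next
    return x
```

Here gamma should be taken in ]0, 2/L[, with L the Lipschitz constant of ∇f, as in the theorem above.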
application
Let C ⊂ E be a closed convex set; we are interested in the problem:
inf_{x ∈ C} f(x)   (*)
We define the indicator function i_C of the set C by:
i_C(x) = 0 if x ∈ C, +∞ otherwise.   (1)
We verify immediately that prox_{i_C}(x) = P_C(x), the projection onto C. Because problem (*) is equivalent to
inf_{x ∈ E} f(x) + i_C(x),
the proximal gradient algorithm is given by:
x^{k+1} = P_C(x^k − γ∇f(x^k))
Under the hypotheses of the theorem, this algorithm converges to a minimizer of f + i_C, that is to say a minimizer of f on C.
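A small sketch of the resulting projected gradient method, taking C to be the Euclidean unit ball as an illustrative assumption (its projection has a simple closed form):

```python
import numpy as np

def project_unit_ball(x):
    """Projection onto the closed Euclidean unit ball {x : ||x|| <= 1}."""
    n = np.linalg.norm(x)
    return x if n <= 1.0 else x / n

def projected_gradient(grad_f, project, x0, gamma, max_iter=1000):
    """x_{k+1} = P_C(x_k - gamma * grad f(x_k)), i.e. proximal gradient with g = i_C."""
    x = x0
    for _ in range(max_iter):
        x = project(x - gamma * grad_f(x))
    return x

# Example: minimize 0.5 * ||x - c||^2 over the unit ball (hypothetical data).
c = np.array([2.0, 2.0])
x_star = projected_gradient(lambda x: x - c, project_unit_ball, np.zeros(2), gamma=0.5)
# x_star is approximately c / ||c||, the projection of c onto the ball.
```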
Iterative soft-thresholding
We set E = Rⁿ and we are interested in the problem:
inf_{x ∈ E} { f(x) + η‖x‖₁ }
where ‖x‖₁ is the ℓ1 norm of the vector x, defined by ‖x‖₁ = |x₁| + |x₂| + ... + |xₙ| for all x = (x₁, x₂, ..., xₙ).
proposition:
The function prox_{η|·|} coincides with the function called soft-thresholding, defined for any x ∈ R by:
S_η(x) = x − η if x > η;  0 if x ∈ [−η, η];  x + η if x < −η.   (2)
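A direct sketch of the soft-thresholding operator; the vectorized (componentwise) form is a convenience, anticipating the next slide:

```python
import numpy as np

def soft_threshold(x, eta):
    """S_eta applied componentwise: the prox of eta * ||.||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - eta, 0.0)

print(soft_threshold(np.array([2.0, 0.3, -1.5]), 1.0))   # -> [ 1.   0.  -0.5]
```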
Iterative soft-thresholding
proposition:
In this case, the proximal gradient algorithm takes the form:
x^{k+1} = prox_{γη‖·‖₁}(x^k − γ∇f(x^k))
Setting y^k = x^k − γ∇f(x^k), this reads x_i^{k+1} = S_{γη}(y_i^k) for all i = 1, ..., n. If f is convex and differentiable with ∇f L-Lipschitz, then this iteration converges to a minimizer of f + η‖·‖₁.
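A minimal sketch of the resulting iterative soft-thresholding algorithm (ISTA), shown on a least-squares data term f(x) = (1/2)‖Ax − b‖²; the matrix A, vector b and weight η below are hypothetical:

```python
import numpy as np

def ista(A, b, eta, max_iter=500):
    """Minimize 0.5 * ||A x - b||^2 + eta * ||x||_1 by iterative soft-thresholding."""
    L = np.linalg.norm(A.T @ A, 2)   # Lipschitz constant of the gradient
    gamma = 1.0 / L                  # step size in ]0, 2/L[
    x = np.zeros(A.shape[1])
    for _ in range(max_iter):
        y = x - gamma * A.T @ (A @ x - b)                          # gradient step on f
        x = np.sign(y) * np.maximum(np.abs(y) - gamma * eta, 0.0)  # prox step S_{gamma*eta}
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 10))
b = rng.standard_normal(20)
print(ista(A, b, eta=1.0))
```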
Monotone operators
remark:
In the previous sections, we have seen that the proximal operator prox_f of a function f ∈ Γ₀(X) maps x to the point p satisfying:
x ∈ p + ∂f(p)
Monotone operators
Proposition (*) ensures that the inclusion x ∈ p + ∂f(p) defines a single point p, and that the resulting mapping prox_f is firmly non-expansive. Re-reading the proof of proposition (*), we see that this result follows from the next property of the sub-differential, called monotonicity:
∀(u, v) ∈ ∂f(x) × ∂f(y), ⟨u − v, x − y⟩ ≥ 0
The purpose of this paragraph is to extend the proposition to mappings A : X → P(X) which are not necessarily written as sub-differentials, but which still satisfy the monotonicity property. For such mappings, we are able to extend the notion of proximal operator.
Monotone operators
definition
The operator A : X → P(X) is monotone if the following holds for all (x, y) ∈ X²:
∀(u, v) ∈ A(x) × A(y), ⟨u − v, x − y⟩ ≥ 0
proposition:
For any A : X → P(X), we denote by A⁻¹ the mapping that associates to each x ∈ X the set:
A⁻¹(x) = {y ∈ X : x ∈ A(y)}
In other words, we have the equivalence y ∈ A⁻¹(x) ⇔ x ∈ A(y).
operation on sub-differentials
proposition:
Let f : X → ]−∞, +∞] and g : Y → ]−∞, +∞] be convex functions and M ∈ L(X, Y). We assume that
0 ∈ ri(M dom f − dom g).   (3)
Then
∂(f + g ∘ M)(x) = ∂f(x) + M* ∂g(Mx).   (4)
primal-dual problem
Position of the problem
Let X and Y be two Euclidean spaces and let M : X → Y be a linear operator. Given two convex functions, f on X and g on Y, we consider the minimization problem:
inf_{x ∈ X} f(x) + g(Mx)   (5)
A minimizer of (5) is called an optimal primal point. Under the hypotheses of the previous proposition, finding an optimal primal point is equivalent to finding x such that
0 ∈ ∂f(x) + M* ∂g(Mx)
primal-dual problem
By the above, x is optimal primal if and only if there exists λ ∈ ∂g(Mx) such that 0 ∈ ∂f(x) + M*λ. By a simple rewriting, the condition λ ∈ ∂g(Mx) amounts to Mx ∈ (∂g)⁻¹(λ). Finally, in order to find an optimal primal point x, it suffices to find a couple (x, λ) ∈ X × Y such that
0 ∈ ∂f(x) + M*λ
0 ∈ −Mx + (∂g)⁻¹(λ)   (6)
Since (∂g)⁻¹ = ∂g*, the system can be rewritten:
0 ∈ ∂f(x) + M*λ
0 ∈ −Mx + ∂g*(λ)   (7)
This problem is sometimes called the primal-dual problem.
Alternating direction method of multipliers
dual ascent
Consider the problem
min_x f(x) subject to Ax = b
where f is strictly convex and closed. Denote the Lagrangian by:
L(x, u) = f(x) + uᵀ(Ax − b)
Dual gradient ascent repeats, for k = 1, 2, 3, ...
x^(k) = argmin_x L(x, u^(k−1))
u^(k) = u^(k−1) + t_k (Ax^(k) − b)
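A small sketch of dual ascent for an equality-constrained quadratic f(x) = (1/2)xᵀPx + qᵀx, so the x-update is a linear solve; the data P, q, A, b and the step size t are hypothetical:

```python
import numpy as np

def dual_ascent(P, q, A, b, t=0.05, max_iter=2000):
    """Dual gradient ascent for min 0.5 x^T P x + q^T x  subject to  Ax = b."""
    u = np.zeros(A.shape[0])
    for _ in range(max_iter):
        # x-step: minimize L(x, u) in x, i.e. solve P x = -(q + A^T u).
        x = np.linalg.solve(P, -(q + A.T @ u))
        # u-step: gradient ascent on the dual; the gradient is Ax - b.
        u = u + t * (A @ x - b)
    return x, u

P = np.diag([2.0, 1.0, 4.0])
q = np.array([1.0, -1.0, 0.0])
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([1.0])
x, u = dual_ascent(P, q, A, b)
print(x, A @ x - b)   # the constraint residual should be close to 0
```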
Alternating direction method of multipliers
Augmented Lagrangian method
This method considers the modified problem, for a parameter ρ > 0,
min_x f(x) + (ρ/2)‖Ax − b‖₂² subject to Ax = b
uses the modified Lagrangian
L_ρ(x, u) = f(x) + uᵀ(Ax − b) + (ρ/2)‖Ax − b‖₂²
and repeats, for k = 1, 2, 3, ...
x^(k) = argmin_x L_ρ(x, u^(k−1))
u^(k) = u^(k−1) + ρ(Ax^(k) − b)
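A sketch of the augmented Lagrangian method on the same quadratic example as above; the closed-form x-update is a feature of this particular f, not of the method:

```python
import numpy as np

def method_of_multipliers(P, q, A, b, rho=1.0, max_iter=100):
    """Augmented Lagrangian method for min 0.5 x^T P x + q^T x  subject to  Ax = b."""
    u = np.zeros(A.shape[0])
    for _ in range(max_iter):
        # x-step: minimize L_rho(x, u); for quadratic f this is a linear solve.
        x = np.linalg.solve(P + rho * A.T @ A, -(q + A.T @ u) + rho * A.T @ b)
        # u-step: dual update with the fixed step size rho.
        u = u + rho * (A @ x - b)
    return x, u
```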
Alternating direction method of multipliers
The alternating direction method of multipliers, or ADMM, combines the best of both methods. Consider a problem of the form:
min_{x,z} f(x) + g(z) subject to Ax + Bz = c
We define the augmented Lagrangian, for a parameter ρ > 0,
L_ρ(x, z, u) = f(x) + g(z) + uᵀ(Ax + Bz − c) + (ρ/2)‖Ax + Bz − c‖₂²
We repeat, for k = 1, 2, 3, ...
x^(k) = argmin_x L_ρ(x, z^(k−1), u^(k−1))
z^(k) = argmin_z L_ρ(x^(k), z, u^(k−1))
u^(k) = u^(k−1) + ρ(Ax^(k) + Bz^(k) − c)
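A sketch of the ADMM iteration for quadratic f and g, where both subproblems reduce to linear solves; the splitting functions and data are illustrative assumptions:

```python
import numpy as np

def admm_quadratic(p, q, A, B, c, rho=1.0, max_iter=200):
    """ADMM for min 0.5*||x - p||^2 + 0.5*||z - q||^2  subject to  A x + B z = c."""
    nx, nz = A.shape[1], B.shape[1]
    x, z, u = np.zeros(nx), np.zeros(nz), np.zeros(A.shape[0])
    for _ in range(max_iter):
        # x-step: minimize L_rho over x (quadratic, hence a linear solve).
        x = np.linalg.solve(np.eye(nx) + rho * A.T @ A,
                            p - A.T @ u + rho * A.T @ (c - B @ z))
        # z-step: minimize L_rho over z.
        z = np.linalg.solve(np.eye(nz) + rho * B.T @ B,
                            q - B.T @ u + rho * B.T @ (c - A @ x))
        # dual update.
        u = u + rho * (A @ x + B @ z - c)
    return x, z
```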
Alternating direction method of multipliers
Connection to proximal operators
Consider
min_x f(x) + g(x) ⇔ min_{x,z} f(x) + g(z) subject to x = z
ADMM steps:
x^(k) = prox_{f,1/ρ}(z^(k−1) − w^(k−1))
z^(k) = prox_{g,1/ρ}(x^(k) + w^(k−1))
w^(k) = w^(k−1) + x^(k) − z^(k)
where prox_{f,1/ρ} is the proximal operator of f with parameter 1/ρ, and similarly for prox_{g,1/ρ}.
In general, the update for a block of variables reduces to a prox update whenever the corresponding linear transformation is the identity.
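A sketch of these prox-form ADMM steps on the lasso problem min_x (1/2)‖Ax − b‖² + η‖x‖₁, splitting f(x) = (1/2)‖Ax − b‖² and g(z) = η‖z‖₁ with the constraint x = z; the data and parameters are hypothetical:

```python
import numpy as np

def admm_lasso(A, b, eta, rho=1.0, max_iter=300):
    """ADMM for min 0.5*||A x - b||^2 + eta*||x||_1, split as f(x) + g(z) with x = z."""
    n = A.shape[1]
    x, z, w = np.zeros(n), np.zeros(n), np.zeros(n)
    M = A.T @ A + rho * np.eye(n)   # system matrix of the x-step
    for _ in range(max_iter):
        x = np.linalg.solve(M, A.T @ b + rho * (z - w))                  # x-step: prox_{f,1/rho}(z - w)
        z = np.sign(x + w) * np.maximum(np.abs(x + w) - eta / rho, 0.0)  # z-step: prox_{g,1/rho}(x + w)
        w = w + x - z                                                    # scaled dual update
    return z

rng = np.random.default_rng(1)
A = rng.standard_normal((30, 15))
b = rng.standard_normal(30)
print(admm_lasso(A, b, eta=0.5))
```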