Difference of two norms-regularizations for Q-Lasso

Revised 6 July 2018. Accepted 8 July 2018.

Abstract

The focus of this paper is on Q-Lasso, introduced in Alghamdi et al. (2013), which extended the Lasso of Tibshirani (1996). The closed convex subset $Q$, belonging to a Euclidean $m$-space for $m \in \mathbb{N}$, is the set of errors when linear measurements are taken to recover a signal/image via the Lasso. Based on a recent work by Wang (2013), we are interested in two new penalty methods for Q-Lasso relying on two types of difference of convex functions (DC for short) programming, in which the DC objective functions are the difference of the $\ell_1$ and $\ell_{\sigma_q}$ norms and the difference of the $\ell_1$ and $\ell_r$ norms with $r > 1$. By means of a generalized $q$-term shrinkage operator that exploits the special structure of the $\ell_{\sigma_q}$ norm, we design a proximal gradient algorithm for handling the DC $\ell_1 - \ell_{\sigma_q}$ model. Then, based on the majorization scheme, we develop a majorized penalty algorithm for the DC $\ell_1 - \ell_r$ model. The convergence results of our new algorithms are presented as well. We would like to emphasize that extensive simulation results in the case $Q = \{b\}$ show that these two new algorithms offer improved signal recovery performance and require reduced computational effort relative to state-of-the-art $\ell_1$ and $\ell_p$ ($p \in (0,1)$) models; see Wang (2013). We also devise two DC algorithms in the spirit of a paper in which an exact DC representation of the cardinality constraint is investigated, which also used the largest-$q$ norm $\ell_{\sigma_q}$ and presented numerical results showing the efficiency of the DC algorithm in comparison with methods using other penalty terms in the context of quadratic programming; see Gotoh et al. (2017).

Keywords: Q-Lasso, Split feasibility, Soft-thresholding, DC-regularization, Proximal gradient algorithm, Majorized penalty algorithm, Shrinkage, DCA algorithm

Paper type: Original Article

© Abdellatif Moudafi. Published in Applied Computing and Informatics (Emerald Publishing Limited), Vol. 17 No. 1, pp. 79-89, DOI 10.1016/j.aci.2018.07.002. This article is published under the Creative Commons Attribution (CC BY 4.0) licence. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial and non-commercial purposes), subject to full attribution to the original publication and authors. The full terms of this licence may be seen at http://creativecommons.org/licences/by/4.0/legalcode

1. Introduction and preliminaries

The process of compressive sensing (CS) [8], which consists of encoding and decoding, is consolidating rapidly year after year owing to the blooming of large datasets, which become increasingly important and available. The encoding process involves taking a set of (linear) measurements, $b = Ax$, where $A$ is a matrix of size $m \times n$. If $m < n$, we can compress the signal $x \in \mathbb{R}^n$, whereas the decoding process is to recover $x$ from $b$, where $x$ is assumed to be sparse. It can be formulated as an optimization problem, namely

$$\min_x \|x\|_0 \quad \text{subject to } Ax = b, \qquad (1.1)$$

where $\|\cdot\|_0$ is the $\ell_0$ norm, which counts the number of nonzero entries of $x$, namely

$$\|x\|_0 = |\{x_i : x_i \neq 0\}|, \qquad (1.2)$$

with $|\cdot|$ being here the cardinality, i.e., the number of elements of a set. Hence minimizing the $\ell_0$ norm amounts to finding the sparsest solution.
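As a purely illustrative aside (not part of the original paper), the following short Python sketch evaluates the quantities just defined on a toy example: the $\ell_0$ count of (1.2) and the $\ell_1$ norm that will serve as its convex surrogate below. The vector, matrix, and names are arbitrary illustrations.

```python
import numpy as np

# Toy sparse signal and under-determined measurement matrix (arbitrary data).
x = np.array([0.0, 3.0, 0.0, -1.5, 0.0, 0.0])
A = np.arange(12, dtype=float).reshape(2, 6)   # m = 2 < n = 6
b = A @ x                                      # exact linear measurements b = Ax

l0 = np.count_nonzero(x)   # ||x||_0 as in (1.2): number of nonzero entries
l1 = np.abs(x).sum()       # ||x||_1: the convex surrogate used below

print(l0, l1)              # -> 2 4.5
```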
One of the difficulties in CS is solving the decoding problem above, since $\ell_0$ optimization is NP-hard. An approach that has gained popularity is to replace $\ell_0$ by the convex norm $\ell_1$, since it often gives a satisfactory sparse solution and has been applied in many different fields such as geology and ultrasound imaging. More recently, nonconvex metrics were used as alternatives to $\ell_1$, especially the nonconvex metric $\ell_p$ for $p \in (0,1)$ in [6], which can be interpreted as a continuous approximation strategy of $\ell_0$ as $p \to 0$. A great deal of research has been conducted into $\ell_p$ problems, including all kinds of variants and related algorithms; see [4] and the references therein. Compared with the convex $\ell_1$ relaxation, the nonconvex problem ($\ell_p$) is generally more difficult to handle. However, it was shown in [12] that the potential reduction method can solve this special nonconvex problem in polynomial time with arbitrarily given accuracy. Most recently, the majority of such sparsity inducing functions were unified under the notion of DC programming in [9], including the log-sum, the smoothly clipped absolute deviation and the capped-$\ell_1$ penalties. Generally, a DC programming problem can be solved through a primal-dual convex relaxation algorithm, which is well known in the literature of DC programming [11]. Other algorithms have appeared for solving application problems of DC programming in the areas of finance and insurance, data analysis, machine learning, as well as signal processing. However, as noted in [18], most of the above mentioned DC programming approaches for sparse reconstruction mainly preserve the separability properties of both the $\ell_0$ and $\ell_1$ norms.

To begin with, let us recall that the Lasso of Tibshirani [16] is given by the following minimization problem

$$\min_{x \in \mathbb{R}^n} \frac{1}{2}\|Ax - b\|_2^2 + \gamma \|x\|_1, \qquad (1.3)$$

$A$ being an $m \times n$ real matrix, $b \in \mathbb{R}^m$ and $\gamma > 0$ a tuning parameter. The latter is nothing else than the basis pursuit (BP) of Chen et al. [7], namely

$$\min_{x \in \mathbb{R}^n} \|x\|_1 \quad \text{such that } Ax = b. \qquad (1.4)$$

However, the constraint $Ax = b$ being inexact due to errors of measurement, problem (1.4) can be reformulated as

$$\min_{x \in \mathbb{R}^n} \|x\|_1 \quad \text{subject to } \|Ax - b\|_p \leq \varepsilon, \qquad (1.5)$$

where $\varepsilon > 0$ is the tolerance level of errors and $p$ is often $1$, $2$ or $\infty$. It is noticed in [1] that (1.5) can be rewritten as

$$\min_{x \in \mathbb{R}^n} \|x\|_1 \quad \text{subject to } Ax \in Q, \qquad (1.6)$$

in the case when $Q := B_\varepsilon(b)$, the closed ball in $\mathbb{R}^m$ with center $b$ and radius $\varepsilon$. Now, when $Q$ is a nonempty closed convex set of $\mathbb{R}^m$ and $P_Q$ is the orthogonal projection from $\mathbb{R}^m$ onto $Q$, observing that the constraint is equivalent to the condition $Ax - P_Q(Ax) = 0$ leads to the following Lagrangian formulation

$$\min_{x \in \mathbb{R}^n} \frac{1}{2}\|(I - P_Q)Ax\|_2^2 + \gamma\|x\|_1, \qquad (1.7)$$

$\gamma > 0$ being a Lagrangian multiplier.
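To make the Q-Lasso formulation (1.7) concrete, here is a hedged Python sketch under the assumption that $Q$ is the closed ball $B_\varepsilon(b)$ singled out in [1], so that $P_Q$ has a closed form; it evaluates the objective of (1.7) and the gradient $A^T(I - P_Q)Ax$ of its smooth part. The helper names, data, and parameter values are illustrative, not from the paper.

```python
import numpy as np

def proj_ball(z, center, eps):
    """Orthogonal projection of z onto the closed ball Q = B_eps(center)."""
    d = z - center
    nrm = np.linalg.norm(d)
    return z if nrm <= eps else center + (eps / nrm) * d

def q_lasso_objective(x, A, b, eps, gamma):
    """(1/2)||(I - P_Q)Ax||^2 + gamma*||x||_1 with Q = B_eps(b), cf. (1.7)."""
    r = A @ x - proj_ball(A @ x, b, eps)        # residual (I - P_Q)Ax
    return 0.5 * np.dot(r, r) + gamma * np.abs(x).sum()

def smooth_gradient(x, A, b, eps):
    """Gradient A^T (I - P_Q) A x of the smooth part of (1.7)."""
    return A.T @ (A @ x - proj_ball(A @ x, b, eps))

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 12))
b = rng.standard_normal(5)
x = rng.standard_normal(12)
print(q_lasso_objective(x, A, b, eps=0.1, gamma=0.5))
print(smooth_gradient(x, A, b, eps=0.1).shape)   # (12,)
```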
A link is also made in [1] with split feasibility problems [5], which consist in finding $x$ satisfying

$$x \in C, \quad Ax \in Q, \qquad (1.8)$$

with $C$ and $Q$ two nonempty closed convex subsets of $\mathbb{R}^n$ and $\mathbb{R}^m$, respectively. An equivalent formulation of (1.8) as a minimization problem is given by

$$\min_{x \in C} \frac{1}{2}\|(I - P_Q)Ax\|_2^2, \qquad (1.9)$$

and its $\ell_1$-regularization is

$$\min_{x \in C} \frac{1}{2}\|(I - P_Q)Ax\|_2^2 + \gamma\|x\|_1, \qquad (1.10)$$

with $\gamma > 0$ a regularization parameter. This convex relaxation approach was frequently employed; see for example [1,20] and the references therein. As the level curves of $\ell_1 - \ell_2$ are closer to those of $\ell_0$ than the level curves of $\ell_1$ are, this motivated us in [14] to propose a regularization of split feasibility problems by means of the nonconvex $\ell_1 - \ell_2$, namely

$$\min_{x \in C} \frac{1}{2}\|(I - P_Q)Ax\|_2^2 + \gamma\bigl(\|x\|_1 - \|x\|_2\bigr), \qquad (1.11)$$

and to present three algorithms with their convergence properties [14].

Unlike the separable sparsity inducing functions involved in the aforementioned DC programming for problem ($\ell_0$), we are interested, in the first two sections of this work, in two specific types of DC programming with non-separable objective functions, which take the form of a difference between two norms: namely the new notion $\ell_{\sigma_q}$, denoting the sum of the $q$ largest elements of a vector in magnitude (i.e., the $\ell_1$ norm of the $q$-term best approximation of a vector), introduced in [18], and the classical $\ell_r$ norm with $r > 1$. Obviously $\ell_{\sigma_q}$ and $\ell_r$ ($r > 1$) are regular convex norms. The corresponding DC programs are as follows:

$$\min_{x \in \mathbb{R}^n} \bigl\{\|x\|_1 - \varepsilon\|x\|_{\sigma_q} : Ax = b\bigr\}, \qquad (1.12)$$

and

$$\min_{x \in \mathbb{R}^n} \bigl\{\|x\|_1 - \varepsilon\|x\|_r : Ax = b\bigr\}, \qquad (1.13)$$

where $\varepsilon \in (0, 1]$, $\|x\|_{\sigma_q}$ is defined as the sum of the $q$ largest elements of $x$ in magnitude, $q \in \{1, 2, \dots, n\}$ and $r > 1$. We would like to emphasize that the following least-squares variants of (1.12) and (1.13) were studied in the recent work by Wang [18]:

$$\min_x f(x) := \frac{1}{2}\|Ax - b\|_2^2 + \mu\bigl(\|x\|_1 - \varepsilon\|x\|_{\sigma_q}\bigr), \qquad (1.14)$$

where $\mu > 0$ and $\varepsilon \in (0,1)$, and

$$\min_x f(x) := \frac{1}{2}\|Ax - b\|_2^2 + \mu\bigl(\|x\|_1 - \varepsilon\|x\|_r\bigr), \qquad (1.15)$$

where $r > 0$ and $\varepsilon \in (0,1)$. This paper proposes generalizations to Q-Lasso, namely

$$\min_x f(x) := \frac{1}{2}\|(I - P_Q)Ax\|_2^2 + \mu\bigl(\|x\|_1 - \varepsilon\|x\|_{\sigma_q}\bigr),$$

where $\mu > 0$ and $\varepsilon \in (0,1)$, as well as

$$\min_x f(x) := \frac{1}{2}\|(I - P_Q)Ax\|_2^2 + \mu\bigl(\|x\|_1 - \varepsilon\|x\|_r\bigr),$$

where $r > 0$ and $\varepsilon \in (0,1)$, and our attention will be focused on the algorithmic aspect.

The rest of the paper is organized as follows. In Sections 2 and 3, two DC-penalty methods, instead of conventional methods such as $\ell_1$ or $\ell_1 - \ell_2$ minimization, are proposed, and their convergence to a stationary point is analyzed. The first iterative minimization method is based on the proximal gradient algorithm and the second one is designed by means of the majorized penalty strategy. Furthermore, relying on the DCA (difference of convex algorithm), two other algorithms are proposed and their convergence results are established in Sections 3 and 4.
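Before describing the algorithms, the following small Python sketch (illustrative only; the function names are ours) evaluates the two nonconvex penalties $\|x\|_1 - \varepsilon\|x\|_{\sigma_q}$ and $\|x\|_1 - \varepsilon\|x\|_r$ appearing in the models above, and checks their nonnegativity for $\varepsilon \in (0,1)$, which follows from $\|x\|_{\sigma_q} \leq \|x\|_1$ and $\|x\|_r \leq \|x\|_1$.

```python
import numpy as np

def sigma_q_norm(x, q):
    """||x||_{sigma_q}: sum of the q largest entries of x in magnitude."""
    return np.sort(np.abs(x))[::-1][:q].sum()

def dc_penalty_sigma(x, q, eps):
    """||x||_1 - eps * ||x||_{sigma_q}, the penalty of the first DC model."""
    return np.abs(x).sum() - eps * sigma_q_norm(x, q)

def dc_penalty_r(x, r, eps):
    """||x||_1 - eps * ||x||_r, the penalty of the second DC model."""
    return np.abs(x).sum() - eps * np.linalg.norm(x, ord=r)

x = np.array([2.0, -0.5, 0.0, 1.0, -3.0])      # arbitrary test vector
for eps in (0.3, 0.9):
    p1 = dc_penalty_sigma(x, q=2, eps=eps)
    p2 = dc_penalty_r(x, r=2.0, eps=eps)
    assert p1 >= 0 and p2 >= 0                  # guaranteed since eps < 1
    print(eps, p1, p2)
```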
2. Proximal gradient algorithm

First, we recall that the subdifferential of a convex function $f$ is given by

$$\partial f(x) := \bigl\{u \in \mathbb{R}^n : f(y) \geq f(x) + \langle u, y - x\rangle \ \ \forall y \in \mathbb{R}^n\bigr\}. \qquad (2.1)$$

Each element of $\partial f(x)$ is called a subgradient. If $f(x) = \frac{1}{2}\|(I - P_Q)Ax\|_2^2$, it is well known that

$$\partial f(x) = \nabla f(x) = A^T(I - P_Q)Ax, \qquad (2.2)$$

and when $f(x) = \|x\|_1$, we have

$$(\partial f(x))_i = \begin{cases} \operatorname{sgn}(x_i) & \text{if } x_i \neq 0, \\ [-1, 1] & \text{if } x_i = 0. \end{cases} \qquad (2.3)$$

The indicator function of a set $C \subseteq \mathbb{R}^n$ is defined by

$$i_C(x) = \begin{cases} 0 & \text{if } x \in C, \\ +\infty & \text{otherwise.} \end{cases} \qquad (2.4)$$

Moreover, the normal cone of a set $C$ at $x \in C$, denoted by $N_C(x)$, is defined as

$$N_C(x) := \bigl\{d \in \mathbb{R}^n \mid \langle d, y - x\rangle \leq 0 \ \ \forall y \in C\bigr\}. \qquad (2.5)$$

The connection between the above definitions is given by the key relation $\partial i_C = N_C$.

In this section our interest is in solving the DC program

$$\min_x f(x) := \frac{1}{2}\|(I - P_Q)Ax\|_2^2 + \mu\bigl(\|x\|_1 - \varepsilon\|x\|_{\sigma_q}\bigr), \qquad (2.6)$$

where $\mu > 0$ and $\varepsilon \in (0,1)$. Similarly to the $\ell_1$ norm, the $\ell_2$ norm, etc., we adopt the notation $\|x\|_{\sigma_q}$ to denote the norm of $\ell_{\sigma_q}$, which is defined a line below (1.13), and we design an iterative algorithm based both on a generalized $q$-term shrinkage operator and on the proximal gradient algorithm framework. At this stage, observe that the restriction on $\varepsilon$ guarantees that $f(x) \geq 0$ for all $x$.

To solve (2.6), we consider the following standard proximal gradient algorithm:

1. Initialization: Let $x_0$ be given and set $L > \lambda_{\max}(A^TA)$, with $\lambda_{\max}(A^TA)$ the maximal eigenvalue of $A^TA$.
2. For $k = 0, 1, \dots$ find

$$x_{k+1} \in \operatorname*{Argmin}_{x \in \mathbb{R}^n} \Bigl\{\bigl\langle A^T(I - P_Q)Ax_k, x - x_k\bigr\rangle + \frac{L}{2}\|x - x_k\|_2^2 + \mu\bigl(\|x\|_1 - \varepsilon\|x\|_{\sigma_q}\bigr)\Bigr\}. \qquad (2.7)$$

Observe that subproblem (2.7) can be equivalently formulated as

$$\min_x \frac{L}{2}\Bigl\|x - \Bigl(x_k - \frac{1}{L}A^T(I - P_Q)Ax_k\Bigr)\Bigr\|_2^2 + \mu\bigl(\|x\|_1 - \varepsilon\|x\|_{\sigma_q}\bigr). \qquad (2.8)$$

Thus, it suffices to consider the solutions of the following minimization problem

$$\min_x \frac{1}{2}\|x - y\|_2^2 + \lambda_1\|x\|_1 - \lambda_2\|x\|_{\sigma_q}, \qquad (2.9)$$

with a given vector $y$ and positive numbers $\lambda_1 > \lambda_2 > 0$. An explicit solution of this problem is given by the following result; see [18].

Proposition 2.1. Let $\{i_1, \dots, i_n\}$ be the indices such that

$$|y_{i_1}| \geq |y_{i_2}| \geq \dots \geq |y_{i_n}|.$$

Then $x^* := \operatorname{prox}_{\lambda_1\|\cdot\|_1 - \lambda_2\|\cdot\|_{\sigma_q}}(y)$ with

$$x^*_i = \begin{cases} \operatorname{sign}(y_i)\max\{|y_i| - (\lambda_1 - \lambda_2), 0\} & \text{if } i = i_1, i_2, \dots, i_q, \\ \operatorname{sign}(y_i)\max\{|y_i| - \lambda_1, 0\} & \text{otherwise,} \end{cases} \qquad (2.10)$$

is a solution of (2.9).

The proximal operator above (called the generalized $q$-term shrinkage operator in [18]) allows us to write the algorithm as follows.

Proximal Gradient Algorithm:
1. Start: Let $x_0$ be given and set $L > \lambda_{\max}(A^TA)$, with $\lambda_{\max}(A^TA)$ the maximal eigenvalue of $A^TA$.
2. For $k = 0, 1, \dots$ compute

$$y_{k+1} = x_k - \frac{1}{L}A^T(I - P_Q)Ax_k,$$

sort $y_{k+1}$ as $|y_{i_1}| \geq |y_{i_2}| \geq \dots \geq |y_{i_n}|$, and set

$$(x_{k+1})_i = \begin{cases} \operatorname{sign}(y_i)\max\bigl\{|y_i| - \frac{\mu}{L}(1 - \varepsilon), 0\bigr\} & \text{if } i = i_1, \dots, i_q, \\ \operatorname{sign}(y_i)\max\bigl\{|y_i| - \frac{\mu}{L}, 0\bigr\} & \text{otherwise.} \end{cases} \qquad (2.11)$$

End.

Now, we are in a position to show the following convergence result for the scheme (2.7).

Proposition 2.2. The sequence $(x_k)$ generated by the Proximal Gradient Algorithm above converges to a stationary point of problem (2.6).

Proof. Remember that $h(x) = \frac{1}{2}\|(I - P_Q)Ax\|_2^2$ is differentiable and that its gradient $\nabla h(x) = A^T(I - P_Q)Ax$ is Lipschitz continuous with constant $\bar{L} := \lambda_{\max}(A^TA)$. By [3, Proposition A.24], we have

$$f(x_{k+1}) \leq \frac{1}{2}\|(I - P_Q)Ax_k\|_2^2 + \bigl\langle A^T(I - P_Q)Ax_k, x_{k+1} - x_k\bigr\rangle + \frac{\bar{L}}{2}\|x_{k+1} - x_k\|_2^2 + \mu\bigl(\|x_{k+1}\|_1 - \varepsilon\|x_{k+1}\|_{\sigma_q}\bigr).$$

Combining this with the definition of $x_{k+1}$, we obtain

$$f(x_{k+1}) \leq f(x_k) - \frac{L - \bar{L}}{2}\|x_{k+1} - x_k\|_2^2. \qquad (2.12)$$

Since $L > \bar{L}$, we see immediately that $f(x_{k+1}) \leq f(x_k)$, and thus the sequence $(f(x_k))$ is convergent since $f$ is a non-negative function. Furthermore, we obtain that $\sum_k \|x_{k+1} - x_k\|_2^2 < +\infty$, which follows by summing (2.12) from $k = 0$ to $\infty$. As a further consequence, we note that

$$\mu(1 - \varepsilon)\|x_k\|_1 \leq \mu\bigl(\|x_k\|_1 - \varepsilon\|x_k\|_{\sigma_q}\bigr) \leq f(x_k) \leq f(x_0).$$

Since $\mu(1 - \varepsilon) > 0$, we have that $(x_k)$ is bounded. Moreover, the objective function $f$ is a squared term plus a piecewise linear function, which ensures that $f$ is semi-algebraic and hence satisfies the Kurdyka-Lojasiewicz inequality. [2, Theorem 5.1] is then applicable, and we obtain that $(x_k)$ converges to a stationary point of (2.6). □
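One possible Python reading of the Proximal Gradient Algorithm above is sketched below, under the assumption that $Q$ is a ball so that $P_Q$ is explicit; the update applies the generalized $q$-term shrinkage of Proposition 2.1 with $\lambda_1 = \mu/L$ and $\lambda_2 = \mu\varepsilon/L$, as in (2.11). It is an illustration rather than the author's reference implementation; the helper names and toy data are ours.

```python
import numpy as np

def proj_ball(z, center, eps):
    """Projection onto the closed ball Q = B_eps(center)."""
    d = z - center
    nrm = np.linalg.norm(d)
    return z if nrm <= eps else center + (eps / nrm) * d

def q_term_shrinkage(y, lam1, lam2, q):
    """Generalized q-term shrinkage of Proposition 2.1 (lam1 > lam2 > 0)."""
    x = np.sign(y) * np.maximum(np.abs(y) - lam1, 0.0)       # ordinary soft-thresholding
    top_q = np.argsort(np.abs(y))[::-1][:q]                  # indices of the q largest |y_i|
    x[top_q] = np.sign(y[top_q]) * np.maximum(np.abs(y[top_q]) - (lam1 - lam2), 0.0)
    return x

def proximal_gradient_q_lasso(A, b, eps_ball, mu, eps, q, iters=500):
    """Scheme (2.7)/(2.11) for model (2.6), assuming Q = B_eps_ball(b)."""
    L = np.linalg.eigvalsh(A.T @ A).max() * 1.01             # any L > lambda_max(A^T A)
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - proj_ball(A @ x, b, eps_ball))  # A^T (I - P_Q) A x
        y = x - grad / L
        x = q_term_shrinkage(y, mu / L, mu * eps / L, q)
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((40, 100))
x_true = np.zeros(100); x_true[[3, 17, 42]] = [1.0, -2.0, 0.5]
b = A @ x_true
x_hat = proximal_gradient_q_lasso(A, b, eps_ball=1e-3, mu=0.05, eps=0.5, q=3)
print(np.round(x_hat[[3, 17, 42]], 2), np.count_nonzero(np.abs(x_hat) > 1e-3))
```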
3. Majorized penalty algorithm

Consider the following minimization problem

$$\min_x \tilde{f}(x) := \frac{1}{2}\|(I - P_Q)Ax\|_2^2 + \mu\bigl(\|x\|_1 - \varepsilon\|x\|_r\bigr), \qquad (3.1)$$

where $A \in \mathbb{R}^{m \times n}$, $Q$ is a nonempty closed convex set of $\mathbb{R}^m$, $r > 0$ and $\varepsilon \in (0,1)$. First, observe again that the condition on $\varepsilon$ guarantees that $\tilde{f}(x) \geq 0$ for all $x$. We will now describe an algorithm for solving (3.1), based on the majorized penalty approach; see, for example, [18] and the references therein. Following the same lines as in [18], we start by constructing a majorization of $\tilde{f}$. To that end, let $L > \lambda_{\max}(A^TA)$; then for any $x, y \in \mathbb{R}^n$ we have

$$\frac{1}{2}\|(I - P_Q)Ax\|_2^2 \leq \frac{1}{2}\|(I - P_Q)Ay\|_2^2 + \bigl\langle A^T(I - P_Q)Ay, x - y\bigr\rangle + \frac{L}{2}\|x - y\|_2^2.$$

Moreover, by invoking the convexity of the norm $\|x\|_r$ and the definition of its subdifferential, we also have

$$\|x\|_r \geq \|y\|_r + \langle g(y), x - y\rangle \quad \text{with } g(y) \in \partial\|y\|_r,$$

where

$$[g(y)]_i = \begin{cases} \dfrac{\operatorname{sign}(y_i)\,|y_i|^{r-1}}{\|y\|_r^{r-1}} & \text{if } y \neq 0, \\[2mm] 0 & \text{otherwise.} \end{cases} \qquad (3.2)$$

Hence, if we define

$$F(x, y) = \frac{1}{2}\|(I - P_Q)Ay\|_2^2 + \bigl\langle A^T(I - P_Q)Ay, x - y\bigr\rangle + \frac{L}{2}\|x - y\|_2^2 + \mu\bigl(\|x\|_1 - \varepsilon\|y\|_r - \varepsilon\langle g(y), x - y\rangle\bigr),$$

then, for every $x, y \in \mathbb{R}^n$, we get

$$F(x, y) \geq \tilde{f}(x) \quad \text{and} \quad F(y, y) = \tilde{f}(y).$$

Starting with an initial iterate $x_0$, the majorized penalty approach above updates $x_k$ by solving

$$x_{k+1} = \operatorname*{argmin}_x F(x, x_k). \qquad (3.3)$$

This leads to the following explicit formulation of $x_{k+1}$ by means of the proximity (shrinkage) operator of $\|x\|_1$:

$$\begin{aligned} x_{k+1} &= \operatorname*{argmin}_x \Bigl\{\bigl\langle A^T(I - P_Q)Ax_k, x - x_k\bigr\rangle + \frac{L}{2}\|x - x_k\|_2^2 + \mu\bigl(\|x\|_1 - \varepsilon\langle g(x_k), x - x_k\rangle\bigr)\Bigr\} \\ &= \operatorname*{argmin}_x \Bigl\{\frac{L}{2}\Bigl\|x - x_k + \frac{1}{L}\bigl(A^T(I - P_Q)Ax_k - \mu\varepsilon g(x_k)\bigr)\Bigr\|_2^2 + \mu\|x\|_1\Bigr\} \\ &= \operatorname{prox}_{\frac{\mu}{L}\|\cdot\|_1}\Bigl(x_k - \frac{1}{L}\bigl(A^T(I - P_Q)Ax_k - \mu\varepsilon g(x_k)\bigr)\Bigr) \\ &= \operatorname{sgn}(v_k)\circ\max\Bigl\{|v_k| - \frac{\mu}{L}, 0\Bigr\}, \end{aligned}$$

where $v_k = x_k - \frac{1}{L}\bigl(A^T(I - P_Q)Ax_k - \mu\varepsilon g(x_k)\bigr)$ with $g(x_k) \in \partial\|x_k\|_r$.

We summarize the algorithm as follows.

Majorized Penalty Algorithm:
1. Initialization: Let $x_0$ be given and set $L > \lambda_{\max}(A^TA)$.
2. For $k = 0, 1, \dots$ compute

$$x_{k+1} = \operatorname{sgn}(v_k)\circ\max\Bigl\{|v_k| - \frac{\mu}{L}, 0\Bigr\}, \qquad v_k = x_k - \frac{1}{L}\bigl(A^T(I - P_Q)Ax_k - \mu\varepsilon g(x_k)\bigr). \qquad (3.4)$$

End.
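Analogously, a hedged Python sketch of the Majorized Penalty Algorithm is given below, again assuming $Q$ is a ball and taking $r = 2$, for which the subgradient (3.2) reduces to $x/\|x\|_2$. It illustrates the update (3.4) on arbitrary toy data and is not an official implementation.

```python
import numpy as np

def proj_ball(z, center, eps):
    d = z - center
    nrm = np.linalg.norm(d)
    return z if nrm <= eps else center + (eps / nrm) * d

def subgrad_lr(x, r):
    """A subgradient g(x) of ||.||_r at x, following (3.2)."""
    nrm = np.linalg.norm(x, ord=r)
    if nrm == 0.0:
        return np.zeros_like(x)
    return np.sign(x) * np.abs(x) ** (r - 1) / nrm ** (r - 1)

def majorized_penalty(A, b, eps_ball, mu, eps, r=2.0, iters=500):
    """Majorized penalty iteration (3.4) for model (3.1), assuming Q = B_eps_ball(b)."""
    L = np.linalg.eigvalsh(A.T @ A).max() * 1.01
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - proj_ball(A @ x, b, eps_ball))   # A^T (I - P_Q) A x
        v = x - (grad - mu * eps * subgrad_lr(x, r)) / L
        x = np.sign(v) * np.maximum(np.abs(v) - mu / L, 0.0)   # soft-thresholding prox of ||.||_1
    return x

rng = np.random.default_rng(2)
A = rng.standard_normal((40, 100))
x_true = np.zeros(100); x_true[[5, 30, 77]] = [1.5, -1.0, 2.0]
b = A @ x_true
x_hat = majorized_penalty(A, b, eps_ball=1e-3, mu=0.05, eps=0.5)
print(np.count_nonzero(np.abs(x_hat) > 1e-3))
```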
The following proposition contains the convergence result for this penalty algorithm.

Proposition 3.1. Let $(x_k)$ be the sequence generated by the Majorized Penalty Algorithm above. Then

$$\frac{L}{2}\|x_k - x_{k+1}\|_2^2 \leq \tilde{f}(x_k) - \tilde{f}(x_{k+1}). \qquad (3.5)$$

Furthermore, the sequence $(x_k)$ is bounded and any cluster point is a stationary point of problem (3.1).

Proof. Since $x_{k+1}$ minimizes $F(x, x_k)$, thanks to the first-order optimality condition we can write

$$0 \in A^T(I - P_Q)Ax_k + L(x_{k+1} - x_k) + \mu\,\partial\|x_{k+1}\|_1 - \mu\varepsilon g(x_k), \qquad (3.6)$$

$g(x_k)$ being a subgradient of $\|x\|_r$ at $x_k$. This, combined with the definition of the subdifferential of $\|x\|_1$ at $x_{k+1}$, gives

$$\begin{aligned} \mu\|x_k\|_1 - \mu\|x_{k+1}\|_1 &\geq \bigl\langle -A^T(I - P_Q)Ax_k - L(x_{k+1} - x_k) + \mu\varepsilon g(x_k),\; x_k - x_{k+1}\bigr\rangle \\ &= \bigl\langle -A^T(I - P_Q)Ax_k + \mu\varepsilon g(x_k),\; x_k - x_{k+1}\bigr\rangle + L\|x_{k+1} - x_k\|_2^2 \\ &= \bigl\langle A^T(I - P_Q)Ax_k - \mu\varepsilon g(x_k),\; x_{k+1} - x_k\bigr\rangle + L\|x_{k+1} - x_k\|_2^2. \end{aligned}$$

Hence

$$\mu\|x_{k+1}\|_1 - \mu\|x_k\|_1 + \bigl\langle A^T(I - P_Q)Ax_k - \mu\varepsilon g(x_k),\; x_{k+1} - x_k\bigr\rangle \leq -L\|x_{k+1} - x_k\|_2^2.$$

This, together with the definition of $F$, leads for any $k \geq 1$ to

$$\begin{aligned} \tilde{f}(x_{k+1}) - \tilde{f}(x_k) &\leq F(x_{k+1}, x_k) - \tilde{f}(x_k) \\ &= \bigl\langle A^T(I - P_Q)Ax_k, x_{k+1} - x_k\bigr\rangle + \frac{L}{2}\|x_{k+1} - x_k\|_2^2 + \mu\bigl(\|x_{k+1}\|_1 - \|x_k\|_1 - \varepsilon\langle g(x_k), x_{k+1} - x_k\rangle\bigr) \\ &= \frac{L}{2}\|x_{k+1} - x_k\|_2^2 + \mu\|x_{k+1}\|_1 - \mu\|x_k\|_1 + \bigl\langle A^T(I - P_Q)Ax_k - \mu\varepsilon g(x_k),\; x_{k+1} - x_k\bigr\rangle \\ &\leq \frac{L}{2}\|x_{k+1} - x_k\|_2^2 - L\|x_{k+1} - x_k\|_2^2. \end{aligned}$$

Consequently,

$$\tilde{f}(x_{k+1}) - \tilde{f}(x_k) \leq -\frac{L}{2}\|x_{k+1} - x_k\|_2^2. \qquad (3.7)$$

Hence $\tilde{f}(x_{k+1}) \leq \tilde{f}(x_k)$, and thus the sequence $(\tilde{f}(x_k))$ is convergent since $\tilde{f}$ is a non-negative function. Furthermore, the sequence $(x_k)$ is such that

$$\sum_{k=0}^{\infty}\|x_{k+1} - x_k\|_2^2 < +\infty.$$

Indeed, by summing (3.7) from $k = 0$ to $\infty$, we obtain that

$$\frac{L}{2}\sum_{k=0}^{\infty}\|x_{k+1} - x_k\|_2^2 \leq \tilde{f}(x_0) - \lim_{k\to+\infty}\tilde{f}(x_k) \leq \tilde{f}(x_0) < +\infty.$$

Consequently, the sequence $(x_k)$ is asymptotically regular, i.e., $\lim_{k\to+\infty}\|x_k - x_{k+1}\| = 0$. On the other hand, observe that the definition of $\tilde{f}$, for any $k \geq 1$, leads to

$$\mu\bigl(\|x_k\|_1 - \varepsilon\|x_k\|_r\bigr) \leq \frac{1}{2}\|(I - P_Q)Ax_k\|_2^2 + \mu\bigl(\|x_k\|_1 - \varepsilon\|x_k\|_r\bigr) = \tilde{f}(x_k) \leq \tilde{f}(x_0).$$

Since $\|x_k\|_1 \geq \|x_k\|_r$, we obtain that $\mu(1 - \varepsilon)\|x_k\|_r \leq \tilde{f}(x_0)$. This implies that $(x_k)$ is bounded since $0 < \varepsilon < 1$. To conclude, we prove that every cluster point of $(x_k)$ is a stationary point of (3.1). Let $x^*$ be a cluster point of $(x_k)$; then $x^* = \lim_{\nu} x_{k_\nu}$, $(x_{k_\nu})$ being a subsequence of $(x_k)$. By passing to the limit in (3.6) along the subsequence $(x_{k_\nu})$, and in the light of the upper semicontinuity of (Clarke) subdifferentials, we obtain the desired result, namely

$$0 \in A^T(I - P_Q)Ax^* + \mu\,\partial\|x^*\|_1 - \mu\varepsilon g(x^*),$$

which is nothing else than the first-order optimality condition of (3.1). □

4. DCA algorithm

Now we turn our attention to a DC Algorithm (DCA), whose dual step at each iteration can be carried out efficiently thanks to the readily available subgradients of the largest-$q$ norm $\|\cdot\|_{\sigma_q}$ and of the $\|\cdot\|_r$ norm. Remember that, to find critical points of $f := \varphi - \psi$, the DCA consists in designing sequences $(x_k)$ and $(y_k)$ by the following rules:

$$\begin{cases} y_k \in \partial\psi(x_k), \\ x_{k+1} = \operatorname*{argmin}_{x \in \mathbb{R}^n}\bigl(\varphi(x) - (\psi(x_k) + \langle y_k, x - x_k\rangle)\bigr). \end{cases} \qquad (4.1)$$

Note that, by the definition of the subdifferential, we can write

$$\psi(x_{k+1}) \geq \psi(x_k) + \langle y_k, x_{k+1} - x_k\rangle.$$

Since $x_{k+1}$ minimizes $\varphi(x) - (\psi(x_k) + \langle y_k, x - x_k\rangle)$, we also have

$$\varphi(x_{k+1}) - \bigl(\psi(x_k) + \langle y_k, x_{k+1} - x_k\rangle\bigr) \leq \varphi(x_k) - \psi(x_k).$$

Combining the last inequalities, we obtain

$$f(x_k) = \varphi(x_k) - \psi(x_k) \geq \varphi(x_{k+1}) - \bigl(\psi(x_k) + \langle y_k, x_{k+1} - x_k\rangle\bigr) \geq f(x_{k+1}).$$

Therefore, the DCA leads to a monotonically decreasing sequence $(f(x_k))$ that converges as long as the objective function $f$ is bounded below.

Now, we can decompose the objective function in (2.6) as follows:

$$\min_x f(x) := \frac{1}{2}\|(I - P_Q)Ax\|_2^2 + \mu\|x\|_1 - \mu\varepsilon\|x\|_{\sigma_q}, \qquad (4.2)$$

where $\mu > 0$ and $\varepsilon \in (0,1)$; here $\varphi(x) = \frac{1}{2}\|(I - P_Q)Ax\|_2^2 + \mu\|x\|_1$ and $\psi(x) = \mu\varepsilon\|x\|_{\sigma_q}$. At each iteration, the DCA solves the convex subproblem obtained by linearizing the concave term $-\varepsilon\|x\|_{\sigma_q}$, until a convergence condition is satisfied. More precisely, we have

$$\begin{cases} y_k \in \mu\varepsilon\,\partial\|x_k\|_{\sigma_q}, \\ x_{k+1} = \operatorname*{argmin}_{x \in \mathbb{R}^n}\Bigl\{\frac{1}{2}\|(I - P_Q)Ax\|_2^2 + \mu\|x\|_1 - \bigl(\mu\varepsilon\|x_k\|_{\sigma_q} + \langle y_k, x - x_k\rangle\bigr)\Bigr\}. \end{cases} \qquad (4.3)$$

In particular, if either the function $\varphi$ or $\psi$ is polyhedral, the DCA is said to be polyhedral and terminates in finitely many iterations [15]. Note that our proposed DCA is polyhedral, since the largest-$q$ norm term $-\varepsilon\|x\|_{\sigma_q}$ can be expressed as a pointwise maximum of $2^q\binom{n}{q}$ linear functions; see [10]. On the other hand, the subdifferential of $\|x\|_{\sigma_q}$ at a point $x_k$ is given (see for example [19]) by

$$\partial\|x_k\|_{\sigma_q} = \operatorname*{argmax}_y\Bigl\{\sum_{i=1}^n |[x_k]_i|\,y_i : \sum_{i=1}^n y_i = q,\ 0 \leq y_i \leq 1,\ i = 1, \dots, n\Bigr\}, \qquad (4.4)$$

that is,

$$\partial\|x_k\|_{\sigma_q} = \bigl\{(y_1, \dots, y_n) : y_{i_1} = \dots = y_{i_q} = 1,\ y_{i_{q+1}} = \dots = y_{i_n} = 0\bigr\},$$

where $y_{i_j}$ denotes the element of $y$ corresponding to $x_{i_j}$ in the linear program (4.4). Observe that a subgradient $y \in \partial\|x_k\|_{\sigma_q}$ can be computed efficiently by first sorting the elements $|x_i|$ in decreasing order, namely $|x_{i_1}| \geq |x_{i_2}| \geq \dots \geq |x_{i_n}|$, and then assigning $1$ to the components $y_{i_j}$ which correspond to $x_{i_1}, \dots, x_{i_q}$.
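The sorting recipe just described translates directly into code. The sketch below (illustrative only) returns one subgradient of $x \mapsto \|x\|_{\sigma_q}$ by placing the sign of the entry on the $q$ coordinates of largest magnitude (the sign factor ensures the subgradient inequality with respect to $x$), and verifies that inequality numerically on random samples.

```python
import numpy as np

def sigma_q_norm(x, q):
    """||x||_{sigma_q}: sum of the q largest entries of x in magnitude."""
    return np.sort(np.abs(x))[::-1][:q].sum()

def subgrad_sigma_q(x, q):
    """One subgradient of ||.||_{sigma_q} at x, via the sorting rule above:
    sign(x_i) on the q coordinates of largest magnitude, 0 elsewhere."""
    y = np.zeros_like(x)
    top_q = np.argsort(np.abs(x))[::-1][:q]
    y[top_q] = np.sign(x[top_q])
    return y

# Numerical check of ||z||_{sigma_q} >= ||x||_{sigma_q} + <y, z - x> for y in the subdifferential.
rng = np.random.default_rng(3)
x = rng.standard_normal(8)
y = subgrad_sigma_q(x, q=3)
for _ in range(100):
    z = rng.standard_normal(8)
    assert sigma_q_norm(z, 3) >= sigma_q_norm(x, 3) + y @ (z - x) - 1e-12
print("subgradient inequality verified on random samples")
```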
To conclude, let us consider the following DC formulation of (3.1):

$$\min_x f(x) := \frac{1}{2}\|(I - P_Q)Ax\|_2^2 + \mu\|x\|_1 - \mu\varepsilon\|x\|_r, \qquad (4.5)$$

where $r > 0$ and $\varepsilon \in (0,1)$; here $\varphi(x) = \frac{1}{2}\|(I - P_Q)Ax\|_2^2 + \mu\|x\|_1$ and $\psi(x) = \mu\varepsilon\|x\|_r$. The subgradient $y_k \in \mu\varepsilon\,\partial\|x_k\|_r$ is also available via formula (3.2), and the DCA in this context takes the following form:

$$\begin{cases} y_k \in \mu\varepsilon\,\partial\|x_k\|_r, \\ x_{k+1} = \operatorname*{argmin}_{x \in \mathbb{R}^n}\Bigl\{\frac{1}{2}\|(I - P_Q)Ax\|_2^2 + \mu\|x\|_1 - \bigl(\mu\varepsilon\|x_k\|_r + \langle y_k, x - x_k\rangle\bigr)\Bigr\}, \end{cases} \qquad (4.6)$$

where one can take $y_k = \mu\varepsilon\, g(x_k)$ with $g(x_k) \in \partial\|x_k\|_r$ given, as in (3.2), by

$$[g(x_k)]_i = \begin{cases} \dfrac{\operatorname{sign}([x_k]_i)\,|[x_k]_i|^{r-1}}{\|x_k\|_r^{r-1}} & \text{if } x_k \neq 0, \\[2mm] 0 & \text{otherwise.} \end{cases}$$

For the details of the DCA convergence properties, see [15].

5. Concluding remarks

The focus of this paper is on Q-Lasso, relying on two new DC-penalty methods instead of conventional methods such as $\ell_1$ or $\ell_1 - \ell_2$ minimization developed in [13,17] and [21]. Two iterative minimization methods, based on the proximal gradient algorithm and on the majorized penalty algorithm, are designed and their convergence to a stationary point is proved. Furthermore, by means of the DC (difference of convex) Algorithm, two other algorithms are devised and their convergence results are also stated.

References

[1] M.A. Alghamdi, M. Ali Alghamdi, N. Shahzad, H.-K. Xu, Properties and iterative methods for the Q-Lasso, Abstr. Appl. Anal. (2013), Article ID 250943, 8 pages.
[2] H. Attouch, J. Bolte, B.F. Svaiter, Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods, Math. Program., Ser. A 137 (2013) 91-129.
[3] D.P. Bertsekas, Nonlinear Programming, Athena Scientific, 1999.
[4] A.M. Bruckstein, D.L. Donoho, M. Elad, From sparse solutions of systems of equations to sparse modeling of signals and images, SIAM Rev. 51 (2009) 34-81.
[5] Y. Censor, T. Elfving, A multiprojection algorithm using Bregman projections in a product space, Numer. Algorithms 8 (1994) 221-239.
[6] R. Chartrand, Exact reconstruction of sparse signals via nonconvex minimization, IEEE Signal Process. Lett. 14 (2007) 707-710.
[7] S.S. Chen, D.L. Donoho, M.A. Saunders, Atomic decomposition by basis pursuit, SIAM J. Sci. Comput. 20 (1998) 33-61.
[8] D. Donoho, Compressed sensing, IEEE Trans. Inf. Theory 52 (2006) 1289-1306.
[9] G. Gasso, A. Rakotomamonjy, S. Canu, Recovering sparse signals with a certain family of nonconvex penalties and DC programming, IEEE Trans. Signal Process. 57 (12) (2009) 4686-4698.
[10] J. Gotoh, A. Takeda, K. Tono, DC formulations and algorithms for sparse optimization problems, Math. Program. (2017) 1-36.
[11] R. Horst, N.V. Thoai, DC programming: overview, J. Optim. Theory Appl. 103 (1999) 1-41.
[12] S. Ji, K.-F. Sze, Z. Zhou, A.M.-C. So, Y. Ye, Beyond convex relaxation: a polynomial-time non-convex optimization approach to network localization, in: Proceedings of the 32nd IEEE International Conference on Computer Communications (INFOCOM 2013), Torino, 2013.
[13] Y. Lou, M. Yan, Fast $\ell_1 - \ell_2$ minimization via a proximal operator, J. Sci. Comput. (2017) 1-19.
[14] A. Moudafi, A. Gibali, $\ell_1 - \ell_2$ regularization of split feasibility problems, Numer. Algorithms (2017) 1-19, http://dx.doi.org/10.1007/s11075-017-0398-6.
[15] T. Pham Dinh, H.A. Le Thi, Convex analysis approach to D.C. programming: theory, algorithms and applications, Acta Math. Vietnamica 22 (1) (1997) 289-355.
[16] R. Tibshirani, Regression shrinkage and selection via the lasso, J. R. Stat. Soc., Ser. B 58 (1996) 267-288.
[17] P. Yin, Y. Lou, Q. He, J. Xin, Minimization of $\ell_{1-2}$ for compressed sensing, SIAM J. Sci. Comput. 37 (2015) 536-563.
[18] Y. Wang, New improved penalty methods for sparse reconstruction based on difference of two norms, Technical Report (2013) 1-11.
[19] B. Wu, C. Ding, D.F. Sun, K.C. Toh, On the Moreau-Yosida regularization of the vector k-norm related functions, SIAM J. Optim. 24 (2014) 766-794.
[20] H.-K. Xu, M.A. Alghamdi, N. Shahzad, Regularization for the split feasibility problem, J. Nonlinear Convex Anal. 17 (3) (2015) 513-525.
[21] Z. Xu, X. Chang, F. Xu, H. Zhang, $\ell_{1/2}$ regularization: a thresholding representation theory and a fast solver, IEEE Trans. Neural Networks Learn. Syst. 23 (2012) 1013-1027.

Corresponding author
Abdellatif Moudafi can be contacted at: abdellatif.moudafi@univ-amu.fr
