A Convex Approach for Image Restoration with Exact Poisson-Gaussian Likelihood

,


Introduction
The recovery of a target image in the presence of degradations (i.e. noise) has been extensively studied in the literature. Image reconstruction problems are often formulated into the Maximum A Posteriori (MAP) framework. The MAP estimator is computationally simple and offers significant flexibility in the choice of a prior, while the data fidelity term usually consists of the noise neg-log-likelihood or some approximation of it. A Gaussian noise model is commonly assumed in imaging due to the simple and intuitive form of the related data fidelity function, which is convex and Lipschitz differentiable. However, recent methodological progress has ensured that more sophisticated variational problems can be efficiently solved numerically. As a consequence, studies are no longer limited to the simplest Gaussian model. Some contributions concern various standard noise distributions, e.g. Poisson [4,11,20,33,44,48,67], impulsive [16] or multiplicative [5,19,31,46] noise, as well as some mixed noise models, e.g. the mixture of Gaussian and impulsive noise [72,73].
A growing interest in Poisson-Gaussian probabilistic models has recently been observed. The Poisson component is often related to the quantum nature of light and accounts for photoncounting principles in signal acquisition, whereas the Gaussian component is typically related 1 2 to noise present in the electronic part of the imaging system. Indeed, despite constant improvements, electronic noise usually cannot be neglected. In addition, the Poisson noise component cannot usually be accurately described through Gaussian statistics, especially in the context of low-count imaging. As a consequence, a mixed Poisson-Gaussian noise model is frequently considered in astronomy [9,68], remote sensing [3], medicine [60] and biology [29]. There has been a growing interest for denoising problems involving images corrupted in this fashion [8,12,57]. Recent advances have been made in Poisson-Gaussian noise parameters estimation procedures [38,47,49]. However, the literature in deconvolution and reconstruction problems involving such noise model remains scarce. Among existing works, Benvenuto et al. [9] proposed a scaled gradient method and more recently Gil-Rodrigo et al. [41] developed an alternating minimization algorithm. An augmented Lagrangian method grounded on a Poisson approximation of the noise characteristics was proposed in [17], while a similar framework with a weighted squared ℓ 2 norm noise approximation was proposed in [55]. So far, restoration strategies have relied on some approximations of the noise statistics, which may have been detrimental to the quality of the results. The use of approximations is motivated by the mathematical difficulties raised by the Poisson-Gaussian model. Indeed, the corresponding probability distribution has a discretecontinuous nature, and the expression of the associated log-likelihood function involves infinite sums. For simplification, one usually neglects either the Poisson or the Gaussian component, or performs an approximation of the Poisson-Gaussian model based on variance stabilization techniques [37,59].
In this paper, we investigate the properties of the Poisson-Gaussian negative log-likelihood, showing that it is a convex Lipschitz differentiable function. Since the gradient of the Poisson-Gaussian neg-log-likelihood requires the computation of infinite series, we propose to approximate it using finite sums whose bounds depend on the current signal estimate. Next, we employ a proximal optimization method whose convergence is guaranteed even in the presence of summable numerical errors. The retained approach belongs to the recent class of primaldual splitting algorithms (see [52] and the references therein) and can cope with the sum of a gradient Lipschitz term and possibly nonsmooth penalty terms. Such terms can model a wide range of prior information, e.g. range value constraints, criteria promoting sparsity in a frame, total-variation and more generally hybrid regularization functions [62].
Our main contributions in this paper are as follows: • a proof of the Lipschitz differentiability of the Poisson-Gaussian neg-log-likelihood function; • a proof of the convexity of the mixture of Poisson and Generalized-Gaussian neg-loglikelihood function; • the establishment of new summation bounds, along with error control, for the infinite sums appearing in the gradient of the Poisson-Gaussian neg-log-likelihood; • new tools for comparing the existing approximations of the Poisson-Gaussian model; and • validation of the numerical performance of our restoration method applied to the model with data fidelity given by exact Poisson-Gaussian neg-log-likelihood and also by its several approximations.
The paper is organized as follows: Section 2 introduces the notation used in this work and investigates the Poisson-Gaussian model. In particular, convexity and Lipschitz-differentiability results are presented. Section 3 describes the proposed optimization framework. Next, our restoration approach is illustrated via experiments in Section 4. Finally, some conclusions are drawn in Section 5.

2 Degradation model
Let y = (y i ) 1≤i≤Q ∈ R Q be a vector of observations related to an original signal x ∈ [0, +∞) N , which is degraded by a matrix H in [0, +∞) Q×N (e.g., a convolutive model) and further corrupted with Poisson-Gaussian noise. More specifically, the observations y are related to x by where z(x) = z i (x) 1≤i≤Q and w = (w i ) 1≤i≤Q are realizations of mutually independent random vectors Z(x) = Z i (x) 1≤i≤Q and W = (W i ) 1≤i≤Q featuring independent components. It is further assumed that, for every i ∈ {1, . . . , Q}, where [Hx] i is the i-th component of vector Hx, and σ ∈ (0, +∞) is the standard deviation of the zero-mean Gaussian noise component. Hence, y is a realization of a random vector Y with probability density function given by the convolution of the components resulting from the Gaussian and Poisson distributions, i.e.
In the context of inverse problems, the original signal can be recovered by minimizing a penalized criterion: min where Φ is the so-called data fidelity term and ρ is a regularization function incorporating a priori information, so as to guarantee the stability of the solution w.r.t. the observation noise.
In the Bayesian framework, this allows us to compute the MAP estimate [30] of the original image. In this context, the data fidelity term is defined as the negative logarithm of p Y |x (y): where, for every i ∈ {1, . . . , Q} and u ∈ [0, +∞), and the regularization term ρ corresponds to the potential of the chosen prior probability density function of the target image. Let 1 denote the vector of R Q with all its components equal to 1. The gradient and Hessian of Φ on the positive orthant are respectively given by: for every x ∈ [0, +∞) N , and, for every (a, b) ∈ R 2 , A common useful technique for solving large-size optimization problems such as those encountered in image recovery consists of splitting (i.e. decomposing) the cost function f in a sum of simpler functions, which are then processed individually. For example, each of these functions can be dealt with through its gradient if the function is µ-Lipschitz differentiable 1 , or through its proximity operator [58,65] if the latter has a closed form expression.
In view of this, the two following results are essential to our approach.
Proof. Our proof consists of showing that, for every i ∈ {1, · · · , Q}, η i is upper bounded on [0, +∞) by η i (0), where η i (0) is defined in (30). Firstly, we recall that for a = 0 the infinite sum (12) simplifies to the first sum element. Consequently, for every z i ∈ (0, +∞), we have the following equivalences: 5 The latter inequality holds, since the left-hand side is a sum of nonnegative terms. Hence, the expression of the Lipschitz constant µ in (13) is obtained by searching the maximum value of η i (0) for all possible values of i ∈ {1, . . . , Q}.
The next convexity result can be regarded as an extension to the one presented in [9].
Theorem 2. The neg-log-likelihood Φ (β) of a mixture of Generalized-Gaussian and Poisson variables defined over the positive orthant as where, for every i ∈ {1, . . . , Q} and u ∈ [0, +∞), Proof. It is sufficient to show that, for every i ∈ {1, · · · , Q}, and, for every b ∈ R, with ζ The proof reduces to studying the sign of the numerator ofΦ (β) i . For every a ∈ [0, +∞), b ∈ R and β ∈ (0, +∞), the following equivalences hold: Furthermore, for all a ∈ (0, +∞), b ∈ R, and (n, m) ∈ N 2 with m > n, The above inequality holds if the function is increasing. To prove this fact, let us study the sign of the derivative of the function π (β) over its domain. The singularity points, i.e. u ∈ {0, 1}, can be excluded from the study due to the continuity of π (β) . Then, for β ≥ 1 and u ∈ R \ {0, 1} the derivative is given by where sign denotes the signum function, and Hence, it is positive (resp. nonnegative) for β > 1 (resp. β = 1), so that π (β) is strictly increasing (resp. increasing).
The Poisson-Generalized Gaussian offers more flexibility than the Poisson-Gaussian model, but it has not been yet studied in the area of image processing. The results obtained above should be of potential use for future applications. In the rest of the paper we concentrate on Poisson-Gaussian model (β = 2), which is of main interest of this paper.

Name
Expression Table 1: . Ei denotes the exponential integral function. We provide conditions on (y i ) 1≤i≤Q for Φ to be convex as well as the expression of coefficients (µ i ) 1≤i≤Q involved in the Lipschitz constant µ = H 2 max i∈{1,...,Q} µ i of ∇ Φ. Table 1 summarizes several data fidelity terms that have been proposed in the literature as approximations of (6), with the aim to solve image restoration problems with data corrupted by Poisson-Gaussian noise. All these approximations are convex (up to some assumptions on y), and Lipschitz differentiable.
For the optimization methods developed in the next section, it is important to note that the definition of the negative log-likelihood (6) can be extended to the whole space R N by setting, where For every i ∈ {1, . . . , Q}, function ϕ i identifies with Φ i on the positive half line, while a quadratic extension is used on the negative one, guaranteeing that the resulting function is twice differentiable at 0, and Lipschitz differentiable on R with minimum Lipschitz constant, i.e.
where functions Φ i , ξ i and η i are defined in (7), (10) and (11) respectively. We note that for a = 0 the infinite sum (12) simplifies to the first term, i.e. s(0, b) = e − b 2 2σ 2 . Hence, for every i ∈ {1, . . . , Q}, the expressions of ξ i (0) and η i (0) in (28) read Consequently, h is a convex function with a µ-Lipschitzian gradient on R N . The positivity constraint in the original problem is imposed by the indicator function (27).
3 Proposed optimization method

Minimization problem
According to the analysis carried out in Section 2, the objective function has the following form where the regularization term is split into a sum of simpler functions. More precisely, it will be assumed that ψ 0 ∈ Γ 0 (R N ) and, for every r ∈ {1, . . . , R}, ψ r ∈ Γ 0 (R Pr ) and V r ∈ R Pr×N . 2 Note that (31) covers a large range of penalization strategies. For instance, a sparsity prior in an analysis frame with frame operator V r is introduced by taking ψ r equal to λ r · 1 with λ r > 0 [34]. Block sparsity measures [35] can also be easily addressed in the proposed framework. Another popular example in image restoration is the total variation penalization [66]. 3 In this case, corresponds to a horizontal (resp. vertical) gradient operator, and, for every Another interesting choice is the Hessianbased penalization [40], [54] i.e., P r = 3N and and ∆ vv ∈ R N ×N model the second-order partial finite difference operators between neighbooring pixels as described in [54, Sec.III-A] and, for every Another example is the non-local total variation (NLTV) [42]. NLTV is associated with image-driven gradient directions i.e. directions are chosen for all n ∈ {1, . . . , N } independently, based on a similarity score between pixel intensity in a local neighborhood, e.g. patch based score [15]. In this where, or every n ∈ {1, . . . , N } and d ∈ {1, . . . , D n }, ∆ d n ∈ R 1×N , D n ∈ N states for the number of gradient directions and ω d n is a positive weight and, for every x ∈ R N , ψ r (V r x) = λ r N n=1 Dn d=1 (∆ d n x) 2 1/2 with λ r > 0. The above penalties can be considered individually (R = 1) or combined in a hybrid manner (R > 1) [62]. Finally, following (25), ψ 0 should be the indicator function ι [0,+∞) N . However, to take into account the dynamic range of the expected output image, it can be more generally chosen equal to the indicator function ι C of a nonempty closed convex subset C of [0, +∞) N .

Primal-dual splitting algorithm
Problem (5) where f takes the form (31) can be efficiently addressed using proximal splitting algorithms [14,23,26,63]. The solution is obtained iteratively by evaluating the individual proximity operators of the functions (ψ r ) 0≤r≤R , provided that they have an explicit expression. We first require the notion of proximity operator. 9 Definition 1. [58,65] The value at x ∈ R N of the proximity operator of a function f ∈ Γ 0 (R N ), denoted by prox f : R N → R N , is the unique solution to the following minimization problem: The proximity operator is firmly nonexpansive, i.e. for every (x, and its fixed points are the minimizers of function f . Numerous convex optimization algorithms are based on this concept (see [24,61] for tutorials) due to these properties. In the context of the minimization of (31), we are interested in algorithms which incorporate functions h and (ψ r ) 0≤r≤R either via their proximity operators or via their gradients. The presence of a smooth term is of paramount importance as we have shown our data fidelity term h to be µ-Lipschitz differentiable, while its proximity operator does not have a closed form expression. Note that not all proximal methods offer the required flexibility to address the minimization of (31). More precisely, in the case when R = 0, one can employ either the forward-backward (FB) [ We are now ready to present our primal-dual splitting algorithm. The main advantage of the chosen primal-dual splitting algorithm is that it allows us to solve (5) for any Lipschitz differentiable function h while allowing arbitrary linear operators (V r ) 1≤r≤R . Another strong point of this algorithm is that it does not require any matrix inversion to be performed. Our primal-dual method is summarized in Algorithm 1. It corresponds to an instance of the algorithm proposed in [26] under a generic form.

Convergence result
The convergence of the proposed primal-dual proximal splitting algorithm is guaranteed by the following result deduced from Theorem 1 and [26, Theorem 4.2]: Theorem 3. Given the following assumptions: (i) f is coercive, i.e. lim x →+∞ f (x) = +∞, (ii) for every r ∈ {1, . . . , R}, ψ r is finite valued, where ǫ ∈ (0, 1/(δ + 1)) and (iv) (c k ) k∈N and (e k ) k∈N are absolutely summable sequences, then there exists a minimizer x of (31) such that the sequences (x k ) k∈N and (p 1,k ) k∈N produced by Algorithm 1 converge to x.

Implementation issues
Algorithm 1 is tolerant to numerical errors (c k ) k∈N and (e k ) k∈N . This feature is essential in our context, as the gradient of the Poisson-Gaussian negative log-likelihood given by (8) involves infinite sums. We propose finite summation bounds depending on the current estimate of x that allow to compute practically accurate approximations of these sums. Let us firstly rewrite s(a, b) defined in (12) Then, the approximation accuracy is guaranteed by the following result deduced from [50, Proposition A.2]: Proposition 1. Let ∆ > 0 and set n − = ⌊n * − ∆σ⌋, n + = ⌈n * + ∆σ⌉ (35) with n * given by where W(·) denotes the Lambert function. Then, where erf denotes the error function.
One can observe in Fig. 1 that the bounds defined in (35) can be quite precise. On the contrary, the summation bounds n − = 0 and n + = b + 4σ proposed in [9,53] are not always guaranteed to include all the significant coefficients or to be very effective. Note that efficient numerical methods exist to compute the Lambert function [39].
In order to secure convergence of the proposed algorithm, the errors made when computing the gradient terms need to summable so that ∆ should be increased at each iteration. In the experimental study conducted in the next section, we will show that a good numerical performance of Algorithm 1 is obtained even with a fixed, sufficiently large, parameter ∆.

Simulations
We now demonstrate the practical performance of Algorithm 1 for the restoration of images corrupted with blur and Poisson-Gaussian noise. More specifically, our study aims at illustrating the usefulness of our approach and at comparing the merits of the various approximations of the Poisson-Gaussian likelihood, in the context of low-count confocal microscopy images with fairly low initial signal to noise ratio (SNR). The considered ground truth images, i.e. x 1 of size 190 × 190, x 2 of size 128 × 128, x 3 of size 350 × 350 and x 4 of size 256 × 256, are illustrated in Fig. 2(top). The first two images result from the time series noise identification procedure described in [50] applied to real data acquired with a macro confocal laser scanning microscope (Leica TCS-LSI). The third and fourth images are publicly free confocal microscopy images available respectively at http://meyerinst.com/ confocals/ tcs-spe/index.htm and www.gensat.org/ imagenavigator.jsp?imageID=29109 [1]. The intensities of these original images have been rescaled so as to impose a given pixel value range [x − , x + ], with x − = 0 and x + ∈ (0, +∞). 5 The images are then degraded with a convolution operator H modeling a spatially invariant blur, and further corrupted with Poisson and zeromean additive white Gaussian noise, with standard-deviation σ > 0, according to model (2). The resulting images are displayed in Fig. 2(bottom).

Method settings
The restored images are obtained by applying Algorithm 1 to Problem (31), where the definition of h, (ψ r ) 0≤r≤R and (V r ) 1≤r≤R depends on the retained data fidelity and regularization strategies. More precisely, we present restoration results obtained for a data fidelity term either derived from the Gaussian likelihood, the Poisson likelihood, or the GAST, EXP, SPoiss, WL2 or (6) functions described in Table 1. For the regularization term, we consider the sum of ι C , the indicator function of C = [x − , x + ] N with a TV-like prior, defined either as the standard isotropic TV prior with weight λ > 0, as the NLTV prior with weight λ > 0, or as a hybrid penalization being the sum of the Hessian prior from [54] with TV penalty, with respective weights λ H > 0 and λ TV > 0. In all our experiments, the regularization weights are tuned manually so as to minimize the Mean Absolute Error (MAE) between the original imagex and its final estimate x. Moreover, the linear operators involved in the NLTV prior are computed from an initial Wiener filter restoration using code provided by Bresson [13]. 6 . The Gaussian likelihood, and the GAST, EXP, WL2, (6) functions are Lipschitz differentiable, so that the resulting optimization problem takes the form (31) where h is the considered data fidelity term, ψ 0 = ι C , the indicator function of C = [x − , x + ] N , and (ψ r ) 1≤r≤R , (V r ) 1≤r≤R model the TV-like prior. GAST is thus handled in a manner similar to [32], i.e. by taking advantage of Lipschitz-differentiability properties of the function Φ presented in the first line of Table 1. In the cases of the non-Lipschitz differentiable SPoiss and Poisson likelihoods, the optimization problem is written under the form (31) where h ≡ 0, ψ 0 is the considered data fidelity term, ψ 1 = ι C , and (ψ r ) 2≤r≤R , (V r ) 2≤r≤R model the TV-like prior. Note that, in both cases, the proximity operators of the involved functions (ψ r ) 0≤r≤R have explicit forms [25].
In order to guarantee the convergence of Algorithm 1, the parameter γ involved in the algorithm is chosen so as to take the maximum possible value satisfying Theorem 3 (iii). Moreover, in the cases of the Poisson, SPoiss, WL2 and GAST data fidelity terms, a data truncation is performed as a pre-processing step on the observed images in order to satisfy the required convexity conditions (see Table 1).

Restoration results
We start this section with a study, in Table 2, of the influence of the choice of parameter ∆ in the restoration quality of image x 1 , when the exact Gaussian-Poisson likelihood (6) is used as data fidelity term. One can observe that the restoration performance is stable for ∆ ≥ 3. This setting will be adopted for the remaining experiments. Tables 3, 4, 5, and 6 present the evaluation scores MAE (Mean Absolute Error), SNR and SSIM [71] for the initial and restored images using the various data fidelity criteria. Note that the provided relative MAE values are normalized with a factor 255/x + .  Table 2: Influence of ∆ on restoration results of imagex 1 using Hessian-TV prior.
One can observe that the exact model always leads to the best qualitative results. Conversely, Poisson, WL2 and GAST strategies lead to relatively poor performance. Although Gaussian and EXP data fidelity terms may be competitive in some situations (see, for instance, Table 3), the performance gain in terms of SNR between the exact and EXP data fidelity term reaches up to 0.25 dB (see Table 4). Complementary to these numerical results, Fig. 3 illustrates the visual differences resulting from various data fidelity terms in the case of image x 1 . One can observe that a high number of low intensity components are lost when using a data fidelity term derived from Poisson statistics. Similarly, the shape of low intensity components is not well reconstructed when using a WL2 data fidelity term. Most artifacts are corrected when using the GAST data fidelity term while even better results are obtained with the Gaussian and SPoiss. Finally, one can hardly notice a visual difference between the EXP approximation and the exact data fidelity term. Finally, we indicate in the caption of Fig. 3 the computational time required to reach the algorithm convergence (typically, when the difference in norm between two consecutive iterates becomes small), when tests are performed on a quad-CPU, 32-core Intel Xeon L7555 1.8 GHz with 144 GB of RAM running RedHat Enterprise Linux 6.5 using a Matlab code implementation. One can observe that the best tradeoff between image quality and convergence time is reached by the EXP model. The computational time difference between Exact and EXP data fidelity term may result from: i) the relatively high value of Lipschitz constant and ii) the gradient of the data fidelity which is computationally more expensive. The latter issue could be alleviated by a better optimized implementation. Finally, one can observe by inspecting the final MAE relative values and the SNR improvements for the four images that the hybrid Hessian-TV regularization strategy leads to very good results (see Tables 3, 4, 5, and 6). This can be also validated by a visual inspection of Fig. 4 (bottom) which shows that this penalization allows a reduction of the undesired staircase effect visible in the TV result ( Fig. 4 (top)). The NLTV regularization strategy should also have reduced the undesired staircase effect visible in the TV result. However, one can see that this penalization does not always lead to an improvement with respect to TV or Hessian-TV (Fig. 4 (middle)). This reflects the fact that the similarity between local image features is essentially lost in the observed images characterized with a very low SNR, such as y 1 , so that the NLTV weights cannot be adjusted properly in that cases. 14

Conclusion
We have shown that Poisson-Gaussian neg-log-likelihood is a convex, Lipschitz-differentiable function. The provided convexity result is actually more general as it applies to the neg-log likelihood of a mixture of Generalized-Gaussian and Poisson variables. Building upon these results, we have proposed a new variational approach for solving data recovery problems in the presence of Poisson-Gaussian noise. We have developed a practical implementation of an efficient primal-dual algorithm, which is particularly flexible, i.e. for which a large range of penalization strategies and data fidelity terms are applicable. Among those we employed, the hybrid TV-Hessian prior was shown to produce naturally looking, high quality results for low count microscopy image restoration problems in the presence of Poisson-Gaussian noise. We have shown the good performance of our restoration algorithm using the exact data fidelity term and various approximations. In addition, we also conclude that the EXP and SPoiss fidelity terms provide good results compared to the exact solution. In a future work we would like to extend our approach to encompass the case of Generalized Gaussian-Poisson noise and to study the performance of other regularization strategies than those we have considered here. Although only TV, NLTV and hybrid TV-Hessian priors were analyzed in this work, the versatility of our approach should allow us to address a variety of applications by making use of various forms of convex penalty functions. The extension of the proposed method to β = 2 would require to prove that the corresponding neg log likelihood is Lipschitz differentiable, which remains an open problem.