Disparities selection controlled by the compensated image quality for a given bitrate

A stereoscopic image consists of two views rendering a sense of depth. Each eye is constrained to look at one view, and the small displacements of objects across the two views are interpreted as an indication of depth. From a compression viewpoint, these displacements are exploited as specific inter-view redundancies. The classical still-image compression scheme, called the disparity-compensated compression scheme, compresses one view independently of the second view, and a block-based disparity map modeling the displacements is losslessly compressed. The difference between the original view and its disparity-predicted view is then compressed and used by the decoder to compute the compensated view, which improves the disparity-predicted view. However, a proof-of-concept work has already shown that selecting disparities according to the compensated view, instead of the predicted view, yields increased rate-distortion performance. This paper derives, from the JPEG coder, a disparity-dependent analytic expression of the distortion induced by the compensated view. This expression is embedded into an algorithm with a reasonable numerical complexity that approaches the performance obtained with the proof-of-concept work. The proposed algorithm, called the fast disparity-compensated block matching algorithm, provides at the same bitrate an average performance increase as compared to the classical stereoscopic image coding schemes.


Introduction
A stereoscopic image is composed of two views which are perceived as two viewpoints of a single 3D scene, thanks to a technical device. Applications concern the entertainment industry, video games, the medical field and cartography [1]. From an information technology viewpoint, all these displayed contents require a very large amount of data, which causes issues with storage, transmission and sometimes real-time display. Such data is used in many 3D-research activities [2] to estimate the depth map, generally assuming that objects look the same when seen from different views, which happens to be not so common [3]. Research in compression aims at reducing that amount of data by exploiting redundancies. This paper focuses on stereoscopic image compression [4][5][6], where the depth map is not by itself an issue and is needed only to explain the differences between the two views. The horizontal distance between two corresponding points is called the disparity and is inversely related to the depth. The depth map is sometimes encoded as a disparity map, as in lifting schemes where the view synthesis is achieved using a set of predict and update filters in a multi-resolution context. Correlations between depth map texture and motion are exploited in [3]. In [7], the authors also used view synthesis optimization, meaning that the choice of the depth map also takes into account the reconstruction of the other view; although the framework differs, this idea is at the core of our present work. Besides, high performance is achieved when different techniques are combined, as in the Multiview Video Coding (MVC) extension of the H264/AVC video coding standard [8], which has been subjectively evaluated in [9].
As in [10,11], this paper proposes to work with the original framework, called the disparity-compensated compression scheme (DCC), exploiting the stereoscopic image redundancy. It consists in coding a reference view separately, losslessly encoding an estimated disparity map and then encoding a residual image. The transmitted information enables the decoder to reconstruct the reference view and, using the disparity map, to compute a predicted view to which the decoded residual image is added. Note that the DCC scheme shares some similarities with the depth and view synthesis representation, in that depth information is here modeled as a block-based disparity map and the texture information is featured by the lossy-encoded residual image. The DCC scheme is very similar to the motion/disparity compensation implemented in the HEVC/MVC (extension of the H264/AVC) video coding standards.
Research within this framework has achieved increased performance when estimating the disparity map, by taking into account its own bit cost in [12,13] and its limited predicting capacity [14], by using blocks of arbitrary shapes in [15], and by also addressing the illumination compensation in [16]. Investigating the statistical properties of the residual, reference [17] uses a DCT-based coder for nonoccluded 8×8 blocks and a 3-level Haar-based coder for occluded 8×8 blocks to encode the residual, instead of the JPEG coder [18]. Reducing the numerical complexity is also a significant research issue. Examples include selecting optimal hyperparameter values thanks to allocation modeling in [19], as opposed to an exhaustive search in [20], reducing the search area in [21], and using an embedded coding scheme that can be truncated at any point to obtain the best reconstruction for a given bitrate [17].
At the core of our work is the idea that the estimation of the disparity should take into account the ability of the residual coder to refine the predicted view, instead of assuming that the best predicted view yields the best compensated view. In the context of the JPEG residual encoder, a proof of concept using a very greedy algorithm has already shown increased performance in [22]. Our contribution is the design of an algorithm with a reasonable numerical complexity, able to select the disparity according to the compensated predicted view in order to improve the rate-distortion performance of the compressed stereoscopic image.
This paper is organized as follows. Section 2 summarizes the basic concepts of the classical DCC scheme. Section 3 shows how to find the best performing disparity map. Section 4 reviews the greedy disparity-compensated block matching (DCBM) algorithm used to solve the optimization problem. Section 5 proposes a fast extension of the DCBM algorithm. Section 6 discusses the simulation results. Section 7 concludes the paper.

Basic concepts and notations
This paper deals with rectified stereoscopic images using the classical DCC scheme. The notations used in the following sections are summarized in Fig. 1, which presents the DCC scheme with a dashed line separating the encoder (above) from the decoder (below).
In Fig. 1, $I_l$ (upper left corner) denotes the left view, chosen here as the reference view. It feeds a lossy encoder denoted $C_{q_l}$ (upper left corner), where $q_l \in Q_l$ is its quality factor and $Q_l$ is the set of all allowed values. The output bit stream is transmitted to the decoder (left downward arrow crossing the dashed line). This bit stream is decoded by $D_l$, yielding a reconstructed left view denoted $\hat{I}_l$ (lower left corner) as follows:
$$\hat{I}_l = D_l\big(C_{q_l}(I_l)\big) \tag{1}$$
Note that the chosen framework uses a closed loop, as this bit stream also yields $\hat{I}_l$ in the encoder through $D_l$ (center upper part). $\hat{I}_l$ feeds the remaining compressing part. Such a choice reduces the distortion, as $I_l$ is not available to the decompressing part, but it also increases the numerical complexity, as the remaining compressing part depends on the choice of $q_l$. $I_r$ (center of the upper part) represents the original right view.
Together with $\hat{I}_l$, it is used by the disparity estimator (DE) to yield a disparity map denoted $d$ using the well-known block matching (BM) algorithm. $d$ is then used by the image predictor (IP) to transform $\hat{I}_l$ into the predicted view, denoted $I_p$. More specifically, $\hat{I}_l$ and $I_r$ are decomposed into $K$ nonoverlapping blocks of the same size. The upper left corner of the $k$-block is indicated by coordinates $(i_k, j_k)$. The pixels contained in the $k$-block are referred to by $(i_k + \Delta i, j_k + \Delta j)$, where $(\Delta i, \Delta j)$ spans $B$, a set listing all internal-block displacements (including $(0, 0)$). The predicted view is obtained by shifting each block of $\hat{I}_l$ horizontally by its disparity $d_k$:
$$I_p(i_k + \Delta i,\, j_k + \Delta j) = \hat{I}_l(i_k + \Delta i,\, j_k + \Delta j + d_k) \tag{2}$$
where $k$ ranges from 1 to $K$ and $(\Delta i, \Delta j)$ spans $B$. This IP-block is shown in the upper right part of Fig. 1. To simplify notations, we do not indicate here the $d$-dependency of $I_p$.
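The BM algorithm, in the DE-block, consists in selecting for each $k$-block the disparity value $d_k$ for which the $k$-block $I_p$-values most resemble the $k$-block $I_r$-values, in the sense that the mean squared error is minimized as follows:
$$d_k = \arg\min_{s \in S} \sum_{(\Delta i, \Delta j) \in B} \Big( I_r(i_k + \Delta i,\, j_k + \Delta j) - \hat{I}_l(i_k + \Delta i,\, j_k + \Delta j + s) \Big)^2 \tag{3}$$
where $S$ contains all allowed disparity values.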
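As an illustration of this block-wise search, the following minimal sketch selects the disparity of a single block by exhaustive search. The function name `block_matching`, the convention that the right-view block is compared with the left-view block shifted by `+s` columns, and the choice to skip shifts that leave the image are assumptions made for this example, not details taken from the paper.

```python
def block_matching(left_hat, right, i_k, j_k, size, disparities):
    """Return (best disparity, block MSE) for the block whose corner is (i_k, j_k)."""
    width = len(left_hat[0])
    best_s, best_err = None, float("inf")
    for s in disparities:
        if j_k + size - 1 + s >= width:  # shifted block would leave the image
            continue
        err = 0.0
        for di in range(size):
            for dj in range(size):
                diff = right[i_k + di][j_k + dj] - left_hat[i_k + di][j_k + dj + s]
                err += diff * diff
        if err < best_err:  # keep the first disparity reaching the minimum
            best_s, best_err = s, err
    return best_s, best_err / (size * size)


# Toy data (hypothetical): the right view is the left view shifted by 3 columns.
left = [[(7 * i + 13 * j * j) % 251 for j in range(32)] for i in range(8)]
right = [[left[i][min(j + 3, 31)] for j in range(32)] for i in range(8)]
d_k, mse_k = block_matching(left, right, 0, 8, 8, range(0, 11))
print(d_k, mse_k)
```

On this toy pair the search recovers the shift used to build the right view, since only that disparity gives a zero prediction error.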
As $\hat{I}_l$ is $q_l$-dependent, the disparity value found, $d_k$, is also $q_l$-dependent. $C$ (center upper part) is a lossless encoding operation of the disparity map $d$. The resulting bit stream is transmitted to the decoder (center downward arrow crossing the dashed line), which recovers the exact disparity map $d$ through $D$, the inverse operation of $C$, as follows:
$$d = D\big(C(d)\big) \tag{4}$$
The recovered disparity map is used with $\hat{I}_l$ by the second IP-block to yield $I_p$ according to Eq. (2), this time in the decoder. This second IP-block is at the bottom of Fig. 1. $R$ (upper right corner) represents the residual image, that is, the difference between the original right view and its prediction:
$$R = I_r - I_p \tag{5}$$
$C_{q_r}$ (upper right corner) is a lossy encoding operation, where $q_r \in Q_r$ is its quality factor and $Q_r$ is the set of all allowed values. $C_{q_r}$ compresses $R$ into a bit stream transmitted to the decoder (right downward arrow crossing the dashed line). $D_r$, the decoding operation associated with $C_{q_r}$, is used in the decoder to get an approximation of $R$ denoted $\hat{R} = D_r\big(C_{q_r}(R)\big)$. By reversing Eq. (5), the decoder gets an approximation of $I_r$ denoted $\hat{I}_r$ and given by:
$$\hat{I}_r = I_p + \hat{R} \tag{6}$$
In general, $\hat{I}_r$ is closer to $I_r$ than $I_p$, and this improvement of $I_p$ is referred to as compensation.
The bitrate, denoted by $b$, is deduced from the bit streams $C_{q_l}(I_l)$, $C(d)$ and $C_{q_r}(R)$:
$$b = \frac{|C_{q_l}(I_l)| + |C(d)| + |C_{q_r}(R)|}{|I_l| + |I_r|} \tag{7}$$
where $|\cdot|$ denotes the cardinality of a set: it counts the number of bits in the numerator and the number of pixels in the denominator.

Optimization problem statement
The aim of a coding/decoding scheme is a trade-off between getting the highest quality (i.e., visual rendering) and using the least amount of bits, as accounted for by Eq. (7). In this paper, this trade-off is rephrased as finding the best quality within a constrained bit budget. The mean squared error between $(\hat{I}_l, \hat{I}_r)$ and $(I_l, I_r)$ is used as the cost function to be minimized with respect to a bit budget, $b_a$. More specifically, the mean squared error of the $k$-block of an image $\hat{I}$ as compared to that of an image $I$ is:
$$J_k(\hat{I}, I) = \frac{1}{|B|} \sum_{(\Delta i, \Delta j) \in B} \Big( \hat{I}(i_k + \Delta i,\, j_k + \Delta j) - I(i_k + \Delta i,\, j_k + \Delta j) \Big)^2 \tag{8}$$
Averaging $J_k$ over all blocks yields $J$:
$$J(\hat{I}, I) = \frac{1}{K} \sum_{k=1}^{K} J_k(\hat{I}, I) \tag{9}$$
The cost function is then defined as:
$$J(\hat{I}_l, I_l, \hat{I}_r, I_r) = \frac{1}{2} \Big( J(\hat{I}_l, I_l) + J(\hat{I}_r, I_r) \Big) \tag{10}$$
This choice of cost function gives way to an optimization problem. $\hat{I}_r$ is actually $(q_l, q_r, d)$-dependent, as stated by Eqs. (1), (2), (5) and (6). $\hat{I}_l$ is $q_l$-dependent (see Eq. (1)). Such dependencies are indicated here:
$$\min_{d(q_l, q_r) \in S^K,\; q_l \in Q_l,\; q_r \in Q_r,\; b \le b_a} J\Big( \hat{I}_l(q_l),\, I_l,\, \hat{I}_r\big(q_l, q_r, d(q_l, q_r)\big),\, I_r \Big) \tag{11}$$
where $b$, defined in Eq. (7), depends on $I_l$, $d$, $I_r$, $q_l$, $q_r$. $S^K$ is the set of all arrays of size $K$ whose components are in $S$, and $b_a$ is the expected bitrate.
Investigating the link between the BM algorithm and this optimization problem, Eq. (3) is recast into:
$$d_k = \arg\min_{d_k \in S} J_k(I_p, I_r) \tag{12}$$
When considering the whole array of disparities, Eq. (12) becomes:
$$d = \arg\min_{d \in S^K} J(I_p, I_r) \tag{13}$$
Equation (13) is different from Eq. (11) only in that $I_p$ is considered instead of $\hat{I}_r$. This difference is actually the decoded-encoded residual, as stated by Eqs. (5) and (6):
$$\hat{I}_r - I_p = D_r\big(C_{q_r}(I_r - I_p)\big) \tag{14}$$
Hence, the BM algorithm can be regarded as a suboptimal solution of Eq. (11), where the effect of the choice of the disparity on the residual, and the impact of the residual on the distortion, are neglected. Note that from now on, this DCC algorithm is referred to as the BM algorithm.
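The block error, its average over blocks and the resulting cost can be sketched as follows. This is a minimal illustration: the function names are hypothetical, and the equal weighting of the two views in `stereo_cost` is an assumption about the form of Eq. (10), chosen because the mean squared error of a pair of same-sized views is the average of the two view errors.

```python
def block_mse(img_hat, img, i_k, j_k, size):
    """Mean squared error restricted to the block whose corner is (i_k, j_k)."""
    total = 0.0
    for di in range(size):
        for dj in range(size):
            diff = img_hat[i_k + di][j_k + dj] - img[i_k + di][j_k + dj]
            total += diff * diff
    return total / (size * size)


def image_mse(img_hat, img, size):
    """Average of the block errors over all K nonoverlapping blocks."""
    corners = [(i, j) for i in range(0, len(img), size)
               for j in range(0, len(img[0]), size)]
    return sum(block_mse(img_hat, img, i, j, size) for i, j in corners) / len(corners)


def stereo_cost(left_hat, left, right_hat, right, size):
    """Assumed cost: equal-weight average of the left-view and right-view errors."""
    return 0.5 * (image_mse(left_hat, left, size) + image_mse(right_hat, right, size))


# Hypothetical 8x16 views: the left view is decoded exactly, while the first
# block of the decoded right view is off by 2 gray levels.
left = [[10] * 16 for _ in range(8)]
right = [[20] * 16 for _ in range(8)]
right_hat = [[22] * 8 + [20] * 8 for _ in range(8)]
cost = stereo_cost(left, left, right_hat, right, 8)
print(cost)
```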

Review of DCBM algorithm
This section presents the strategy of the disparity-compensated block matching (DCBM) algorithm already developed in [22]. The DCBM algorithm is different from the BM algorithm in that Eq. (11) is no longer simplified into Eq. (13).
The DCBM algorithm is derived from a different suboptimal solution involving a much greater numerical complexity. Indeed, the algorithm is computed in $K + 1$ steps. In the first step, the disparity map is computed using the BM algorithm. This initial disparity map has the following $K$ components:
$$d_k(0, q_l) = \arg\min_{s \in S} \sum_{(\Delta i, \Delta j) \in B} \Big( I_r(i_k + \Delta i,\, j_k + \Delta j) - \hat{I}_l(i_k + \Delta i,\, j_k + \Delta j + s) \Big)^2 \tag{15}$$
where $k$ ranges from 1 to $K$. Note that at this point $d(0, q_l)$ does not depend on $q_r$. The goal at step $t \in \{1, \ldots, K\}$ is to select the disparity of the $t$th block, denoted, for now, as $s$. We assume that a disparity map $d(t-1, q_l, q_r)$ has already been computed at step $t-1$. For each $s \in S$, a predicted image $I_p(t, q_l, q_r, s)$ is computed, taking into account $s$ on the $t$th block and $d_k(t-1, q_l, q_r)$ for all other blocks:
$$I_p(t, q_l, q_r, s)(i_k + \Delta i,\, j_k + \Delta j) = \begin{cases} \hat{I}_l(i_k + \Delta i,\, j_k + \Delta j + s) & \text{if } k = t \\ \hat{I}_l\big(i_k + \Delta i,\, j_k + \Delta j + d_k(t-1, q_l, q_r)\big) & \text{otherwise} \end{cases} \tag{16}$$
with $(\Delta i, \Delta j)$ spanning $B$ and $k$ ranging from 1 to $K$.
Compensation transforms $I_p(t, q_l, q_r, s)$ into $\hat{I}_r(t, q_l, q_r, s)$ as follows:
$$\hat{I}_r(t, q_l, q_r, s) = I_p(t, q_l, q_r, s) + D_r\Big(C_{q_r}\big(I_r - I_p(t, q_l, q_r, s)\big)\Big) \tag{17}$$
Finally, $J(\hat{I}_r, I_r)$ is computed and the best disparity is selected as follows:
$$d_k(t, q_l, q_r) = \begin{cases} \arg\min_{s \in S} J\big(\hat{I}_r(t, q_l, q_r, s),\, I_r\big) & \text{if } k = t \\ d_k(t-1, q_l, q_r) & \text{otherwise} \end{cases} \tag{18}$$
Note that the increased numerical complexity of DCBM stems from the necessity to code and decode a new image for each block and, within each block, each time a new disparity value is considered. The DCBM algorithm is summarized in Algorithm 1.

Algorithm 1 DCBM algorithm
Input: $I_l$, $I_r$, $q_l$, $q_r$
Output: $C_{q_l}(I_l)$, $C(d)$, $C_{q_r}(R)$, $b$, $J$
Compute $C_{q_l}(I_l)$, $\hat{I}_l$ with Eq. (1) and $J(\hat{I}_l, I_l)$ with Eqs. (8) and (9)
Compute $d(0, q_l)$ with Eq. (15) using $I_p$ defined by Eq. (2)
for all $t \in \{1 \ldots K\}$ do
    for all $s \in S$ do
        Compute $I_p(t, q_l, q_r, s)$ with Eq. (16) using $d(t-1, q_l, q_r)$
        Compute $\hat{I}_r(t, q_l, q_r, s)$ with Eq. (17)
        Compute $J(\hat{I}_r(t, q_l, q_r, s), I_r)$ with Eq. (9)
    end for
    Select $d(t, q_l, q_r)$ with Eq. (18) using all $s$-values of $J(\hat{I}_r, I_r)$
end for
Get $d = d(K, q_l, q_r)$ and compute $C(d)$
Compute $I_p$ with Eq. (2) using $d$
Compute $R = I_r - I_p$ and $C_{q_r}(R)$ with Eq. (5)
Compute $\hat{I}_r$ with Eq. (6) and $J(\hat{I}_r, I_r)$ with Eq. (9)
Compute $J$ with Eq. (10) using $J(\hat{I}_l, I_l)$ and $J(\hat{I}_r, I_r)$
Compute $b(I_l, d, I_r, q_l, q_r)$ with Eq. (7) using $C_{q_l}(I_l)$, $C(d)$, $C_{q_r}(R)$
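The greedy loop of Algorithm 1 can be sketched as follows. This is only an illustration of the control flow: `toy_codec` is a stand-in uniform scalar quantizer and not the JPEG coder used in the paper, the border clamping in `predict` is an assumption, and all names are hypothetical. The point is that every candidate disparity of every block triggers a full predict, encode, decode and compare cycle on the whole image.

```python
def predict(left_hat, disparities, size):
    """Build the predicted view: each block is a shifted block of the decoded left view."""
    h, w = len(left_hat), len(left_hat[0])
    corners = [(i, j) for i in range(0, h, size) for j in range(0, w, size)]
    pred = [[0] * w for _ in range(h)]
    for (i_k, j_k), d_k in zip(corners, disparities):
        for di in range(size):
            for dj in range(size):
                col = min(j_k + dj + d_k, w - 1)  # border clamping is an assumption
                pred[i_k + di][j_k + dj] = left_hat[i_k + di][col]
    return pred


def toy_codec(residual, step):
    """Stand-in for the lossy residual coder: uniform scalar quantization, NOT JPEG."""
    return [[step * round(v / step) for v in row] for row in residual]


def mse(a, b):
    n = len(a) * len(a[0])
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n


def dcbm(left_hat, right, d_init, candidates, size, step):
    d = list(d_init)
    for t in range(len(d)):  # one step per block
        best_s, best_j = d[t], float("inf")
        for s in candidates:
            trial = d[:t] + [s] + d[t + 1:]  # candidate s on block t only
            pred = predict(left_hat, trial, size)
            res = [[r - p for r, p in zip(rr, rp)] for rr, rp in zip(right, pred)]
            res_hat = toy_codec(res, step)
            comp = [[p + e for p, e in zip(rp, rq)] for rp, rq in zip(pred, res_hat)]
            j = mse(comp, right)  # distortion of the compensated view
            if j < best_j:
                best_s, best_j = s, j
        d[t] = best_s
    return d


# Toy data (hypothetical): the right view is the left view shifted by 3 columns.
left = [[(7 * i + 13 * j * j) % 251 for j in range(16)] for i in range(8)]
right = [[left[i][min(j + 3, 15)] for j in range(16)] for i in range(8)]
d = dcbm(left, right, [0, 0], range(6), 8, 16)
print(d)
```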

Proposed FDCBM algorithm
Due to the interesting performance of the DCBM algorithm (see [22]), this section proposes a fast version of this algorithm, called the FDCBM algorithm. The novelty is that disparity selection is no longer based on the computation of $\hat{I}_r$ with all its pixel values. The underlying idea of the developed algorithm is first discussed, and then an explicit formula of the JPEG-codec distortion is derived. Blocks of size 8×8 pixels are considered, knowing that an extension to a larger block size is possible.

FDCBM algorithm underlying idea
This section considers that the size of B is 8×8 and more specifically that the disparity-related blocks are exactly the JPEG-related blocks.
We first introduce some new notations. Define $\hat{R} = D_r\big(C_{q_r}(R)\big)$, the reconstructed residual at the decoder, and, for any image $I$, $I^k$ the 8×8 matrix holding its $k$-block:
$$I^k(\Delta i, \Delta j) = I(i_k + \Delta i,\, j_k + \Delta j) \tag{19}$$
So as to be consistent with the notations defined in Sect. 2, indexes of these 8×8 matrices start from 0: $\Delta i, \Delta j \in \{0, \ldots, 7\}$. Note that because of the above block-related assumption, $\hat{R}^k$ can also be considered as the decoded-encoded 8×8 matrix $R^k$:
$$\hat{R}^k = D_r\big(C_{q_r}(R^k)\big) \tag{20}$$
Our main claim is that the relevant pixel values are those of $R^k$ and that $J_k$ measures the mean squared distortions yielded by the compression and decompression of $R^k$:
$$J_k(\hat{I}_r, I_r) = J_k(I_p + \hat{R},\, I_p + R) = J_k(\hat{R}, R) = \frac{1}{64} \Big\| D_r\big(C_{q_r}(R^k)\big) - R^k \Big\|^2 \tag{21}$$
The first equality is obtained with Eqs. (5) and (6). The second equality uses an additive-invariance property derived from Eq. (8). The third equality is computed using Eqs. (8), (19) and (20).

JPEG encoding modeling
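This section focuses on the part of JPEG encoding that causes distortions, namely the quantization of the DCT components:
$$D_r\big(C_{q_r}(R^k)\big) = \mathrm{DCT}^{-1}\Big( Q_{q_r}\big( \mathrm{DCT}(R^k) \big) \Big) \tag{22}$$
where $Q_{q_r}$ is the 8×8 JPEG quantizer.
As the DCT is an orthogonal transformation, it preserves the L2 norm of any 8×8 matrix $M$:
$$\| \mathrm{DCT}(M) \| = \| M \| \tag{23}$$
Combining Eqs. (21) to (23), a simplified formula of the mean squared distortions is obtained:
$$J_k(\hat{I}_r, I_r) = \frac{1}{64} \Big\| Q_{q_r}\big( \mathrm{DCT}(R^k) \big) - \mathrm{DCT}(R^k) \Big\|^2 \tag{24}$$
The explicit formula uses the following information extracted from the JPEG codec (see [23]). The DCT of an 8×8 matrix $M$ is:
$$\mathrm{DCT}(M) = T M T^{\top} \tag{25}$$
where $T$ is an 8×8 orthogonal matrix defined as follows:
$$T(\Delta i, \Delta j) = \begin{cases} \dfrac{1}{2\sqrt{2}} & \text{if } \Delta i = 0 \\[2mm] \dfrac{1}{2} \cos\left( \dfrac{(2 \Delta j + 1)\, \Delta i\, \pi}{16} \right) & \text{otherwise} \end{cases} \tag{26}$$
The JPEG quantizer transforms an 8×8 matrix into an 8×8 matrix by rounding each component to a multiple of its quantization step:
$$Q_{q_r}(M)(\Delta i, \Delta j) = \alpha(q_r)\, Q(\Delta i, \Delta j)\; \mathrm{round}\left( \frac{M(\Delta i, \Delta j)}{\alpha(q_r)\, Q(\Delta i, \Delta j)} \right) \tag{27}$$
where $Q$ is the JPEG quantization table and $\alpha(q_r)$ is obtained using a nonlinear mapping that transforms $q_r$ into a scaling factor (see [24]):
$$\alpha(q_r) = \begin{cases} \dfrac{50}{q_r} & \text{if } q_r < 50 \\[2mm] 2 - \dfrac{q_r}{50} & \text{otherwise} \end{cases} \tag{28}$$
Experiments have shown that $J_k(\hat{I}_r, I_r)$ is not exactly equal to the right-hand side of Eq. (24), and the latter depends on $q_l$, $q_r$ and on the $k$-block disparity, $s$. So the following notation is used:
$$\tilde{J}_k(q_l, q_r, s) = \frac{1}{64} \Big\| Q_{q_r}\big( \mathrm{DCT}(R^k) \big) - \mathrm{DCT}(R^k) \Big\|^2 \tag{29}$$
where $R^k$ is the $k$-block of the residual obtained with the disparity $s$. Finally, the $k$-block disparity is selected as:
$$d_k = \arg\min_{s \in S} \tilde{J}_k(q_l, q_r, s) \tag{30}$$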
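The norm-preservation argument can be checked numerically. The sketch below builds the orthonormal 8×8 DCT matrix, quantizes the coefficients of a block with a hypothetical step table (`steps` is made up for the example and is not a JPEG table), and compares the error measured on the coefficients with the error measured on the pixels after the inverse transform.

```python
import math

N = 8
# Orthonormal DCT-II matrix: the first row is constant, the others are cosines.
T = [[(math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N))
      * math.cos((2 * x + 1) * u * math.pi / (2 * N)) for x in range(N)]
     for u in range(N)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]


def transpose(a):
    return [list(row) for row in zip(*a)]


def dct2(m):
    """Forward transform T M T^t."""
    return matmul(matmul(T, m), transpose(T))


def idct2(c):
    """Inverse transform T^t C T (T is orthogonal)."""
    return matmul(matmul(transpose(T), c), T)


def quantize(c, steps):
    """Round every coefficient to the nearest multiple of its step."""
    return [[steps[u][v] * round(c[u][v] / steps[u][v]) for v in range(N)]
            for u in range(N)]


def mean_sq_diff(a, b):
    return sum((a[u][v] - b[u][v]) ** 2 for u in range(N) for v in range(N)) / (N * N)


# Hypothetical residual block and step table, for illustration only.
residual = [[((3 * i + 5 * j) % 17) - 8 for j in range(N)] for i in range(N)]
steps = [[4 + i + j for j in range(N)] for i in range(N)]

coeffs = dct2(residual)
coeffs_q = quantize(coeffs, steps)
j_dct = mean_sq_diff(coeffs_q, coeffs)             # error measured on coefficients
j_pix = mean_sq_diff(idct2(coeffs_q), residual)    # error measured on pixels
print(j_dct, j_pix)
```

The two values agree up to floating-point rounding, which is why the distortion can be evaluated in the DCT domain without applying the inverse transform. A real decoder also rounds and clips the reconstructed pixels, which this sketch ignores.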

Derived FDCBM algorithm
Instead of computing full-size images as in the DCBM algorithm, only 8×8 matrices are computed, yielding an approximation of $J_k(\hat{I}_r, I_r)$ (i.e., $\tilde{J}_k(q_l, q_r, s)$) using Eq. (29). Moreover, instead of selecting the $k$-block disparity based on $J(\hat{I}_r, I_r)$, the selection is based on the minimization of $\tilde{J}_k(q_l, q_r, s)$. The numerical complexity of the FDCBM algorithm is therefore much lower than that of the DCBM algorithm. It remains higher than that of the BM algorithm, not only because of the complexity of Eq. (29) but also because it takes into account both $q_l$ and $q_r$, whereas BM takes into account only $q_l$. The FDCBM algorithm is summarized in Algorithm 2.

Algorithm 2 FDCBM algorithm
Input: $I_l$, $I_r$, $q_l$, $q_r$
Output: $C_{q_l}(I_l)$, $C(d)$, $C_{q_r}(R)$, $b$, $J$
Compute $C_{q_l}(I_l)$, $\hat{I}_l$ with Eq. (1) and $J(\hat{I}_l, I_l)$ with Eqs. (8) and (9)
for all $k \in \{1 \ldots K\}$ do
    for all $s \in S$ do
        Compute $R^k$ using $\hat{I}_l$ and $I_r$ with Eqs. (19), (2) and (5)
        Compute $\tilde{J}_k(q_l, q_r, s)$ with Eq. (29)
    end for
    Select $d_k$ with Eq. (30) using all $s$-values of $\tilde{J}_k(s)$
end for
Collect $d = (d_1, \ldots, d_K)$ and compute $C(d)$
Compute $I_p$ with Eq. (2) using $d$
Compute $R = I_r - I_p$ and $C_{q_r}(R)$ with Eq. (5)
Compute $\hat{I}_r$ with Eq. (6) and $J(\hat{I}_r, I_r)$ with Eq. (9)
Compute $J$ with Eq. (10) using $J(\hat{I}_l, I_l)$ and $J(\hat{I}_r, I_r)$
Compute $b(I_l, d, I_r, q_l, q_r)$ with Eq. (7) using $C_{q_l}(I_l)$, $C(d)$, $C_{q_r}(R)$
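The inner loop of Algorithm 2 can be sketched for a single block as follows. Several ingredients are assumptions made for the example: the step table is the example luminance table of the JPEG standard scaled with the libjpeg (IJG) quality convention, which may differ from the mapping of [24]; the left-view block is shifted by `+s` columns; and all function names are hypothetical.

```python
import math

N = 8
T = [[(math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N))
      * math.cos((2 * x + 1) * u * math.pi / (2 * N)) for x in range(N)]
     for u in range(N)]

# Example luminance quantization table from the JPEG standard (Annex K).
BASE_TABLE = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def step_table(quality):
    """Scale the base table with the libjpeg quality convention (an assumption here)."""
    scale = 5000 / quality if quality < 50 else 200 - 2 * quality
    return [[min(255, max(1, int((q * scale + 50) // 100))) for q in row]
            for row in BASE_TABLE]


def dct2(m):
    """Compute T M T^t for an 8x8 block."""
    tm = [[sum(T[u][k] * m[k][x] for k in range(N)) for x in range(N)] for u in range(N)]
    return [[sum(tm[u][k] * T[v][k] for k in range(N)) for v in range(N)] for u in range(N)]


def approx_distortion(residual_block, steps):
    """Quantization error of the residual block, measured on its DCT coefficients."""
    c = dct2(residual_block)
    return sum((steps[u][v] * round(c[u][v] / steps[u][v]) - c[u][v]) ** 2
               for u in range(N) for v in range(N)) / (N * N)


def select_disparity(left_hat, right, i_k, j_k, candidates, steps):
    best_s, best_j = None, float("inf")
    for s in candidates:
        if j_k + N - 1 + s >= len(left_hat[0]):  # shifted block would leave the image
            continue
        block = [[right[i_k + a][j_k + b] - left_hat[i_k + a][j_k + b + s]
                  for b in range(N)] for a in range(N)]
        j = approx_distortion(block, steps)
        if j < best_j:
            best_s, best_j = s, j
    return best_s, best_j


# Toy data (hypothetical): the right view is the left view shifted by 3 columns.
left = [[(7 * i + 13 * j * j) % 251 for j in range(32)] for i in range(8)]
right = [[left[i][min(j + 3, 31)] for j in range(32)] for i in range(8)]
d_k, j_k_value = select_disparity(left, right, 0, 8, range(11), step_table(50))
print(d_k, j_k_value)
```

Only one 8×8 transform per candidate disparity is needed, instead of coding and decoding the whole image.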

Performance of the proposed algorithm
This section starts with a discussion on the validity of Eq. ( 29) on which the proposed FDCBM algorithm is based.To do so, simulations are conducted on synthetic data to measure the ability of this equation to reduce distortions more than the BM algorithm.
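Figure 2 illustrates the behavior of the ratio ρ(q r ) of Eq. (31), which compares the distortion reduction achieved by the FDCBM algorithm over the BM algorithm with that achieved by the DCBM algorithm over the BM algorithm, when q r ranges from 1 to 100. When q r is between 15 and 90, on average and compared to the distortions left when using the BM algorithm, the FDCBM algorithm is able to remove at least 90% of the distortions that the DCBM algorithm is able to remove.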
The second part of this section concerns the simulation results obtained on stereoscopic images from the Middlebury dataset [2]. To simplify the experiment, the left view is not compressed. Assume that the pixel values, on both views, range from 0 to 255. The distortion of the predicted right view is measured using the peak signal-to-noise ratio (PSNR) given by $\mathrm{PSNR} = 10 \log_{10}\big( 255^2 / J(\hat{I}_r, I_r) \big)$. The rate, in bits per pixel (bpp), is measured only on the right view according to $b = \big( |C(d)| + |C_{q_r}(R)| \big) / |I_r|$. The lossless coder, $C$, is here an arithmetic coder (see [25]). To reduce the numerical complexity, the set of quality factor values is reduced to $Q_r = \{5, 10, 15, \ldots, 90\}$. The set of all available disparities is $S = \{0, \ldots, 120\}$.
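The two measurements used in this experiment can be written as small helper functions. This is a minimal sketch: the function names and all numerical values in the usage lines are hypothetical and are not results from the paper.

```python
import math


def psnr(mse, peak=255.0):
    """Peak signal-to-noise ratio in dB for pixel values in [0, peak]."""
    return 10.0 * math.log10(peak * peak / mse)


def right_view_rate(bits_disparity_map, bits_residual, n_pixels_right):
    """Rate in bits per pixel, counting only the right-view bit streams."""
    return (bits_disparity_map + bits_residual) / n_pixels_right


# Hypothetical values chosen only to exercise the functions.
quality_db = psnr(65.025)
rate_bpp = right_view_rate(20_000, 100_000, 400_000)
print(quality_db, rate_bpp)
```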
The rate-distortion curves, provided in Fig. 3, confirm the results stated above using the "Art" stereoscopic image of the Middlebury dataset (2005) and blocks of size 8×8. Indeed, the rate-distortion performance of the proposed FDCBM algorithm is similar to that of the DCBM algorithm, and both are better than the classical BM algorithm and the reference-based block matching algorithm (called R algorithm) proposed in [26]. Figure 5 presents the decompressed right image "Aloe" extracted from the Middlebury dataset (2006) using the BM algorithm on the left side, the R algorithm in the middle and the FDCBM algorithm on the right side. For each algorithm, blocks are of size 8×8 and q r ∈ Q r is set so that b = 0.3 bpp. When comparing the reconstructed views with the original view, it appears that the background cloth in the right neighborhood of each vertical leaf is wrongly drawn. The reason may be that these neighborhoods are occluded in the left view. The BM and R algorithms yield a dotted structure, whereas the FDCBM algorithm yields a slightly blurred square texture. From a PSNR viewpoint, the FDCBM-reconstructed view is closer to the original view (30.14 dB) than the BM-reconstructed view (29.5 dB) and the R-reconstructed view (29.6 dB).
Figure 4 shows, for the same experiment, the histograms of the BM-disparity map on the left side, the R-disparity map in the middle and the FDCBM-disparity map on the right side. More specifically, selected disparity values are sorted into 10 bins, and each bin is referred to by its average disparity value on the horizontal axis. The vertical axis indicates the number of blocks for which the disparity value falls into a given bin. (The total number of blocks for that image is 2726.) All histograms are right skewed, showing that for most blocks it did not prove useful to consider disparity values greater than 50. A closer look shows that, on the right-hand side, the first two columns are slightly bigger and the two following columns are slightly smaller. This means that for this specific image, on average, the FDCBM algorithm tends to select smaller disparity values than the BM and R algorithms (Fig. 5).
Figure 6 provides reconstructions of the "Dwarves" right view from the Middlebury dataset (2005), including the one obtained with the BM algorithm.
As for numerical complexity, FDCBM algorithm (consuming 17 s) is 3388 times quicker than DCBM algorithm (consuming 4 h), 6.8 times slower than BM algorithm (consuming 2.5 s) and 1.5 times slower than R algorithm (consuming 12 s). This has been measured on the "Aloe" stereoscopic image with blocks of size 8×8, using Matlab in a Windows environment on a computer with one four-core processor running at 3.7 GHz.
The Bjøntegaard metric [27] is used here to quantify the increase in performance of the FDCBM algorithm as compared to the BM and R algorithms. Based on four rate-distortion points for each algorithm (roughly [0.3, 0.4, 0.5, 0.6] bpp), it computes an average PSNR increase or an average bitrate decrease. As for the "Art" stereoscopic image, the FDCBM algorithm yields on average a PSNR increase of, respectively, 0.78 dB and 0.52 dB compared to the BM and R algorithms. To simplify the reading of Table 1, the stereoscopic images have been sorted by the increase in PSNR performance of FDCBM over BM.
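The average PSNR increase of the Bjøntegaard metric can be sketched as follows: each rate-distortion curve is modeled by a cubic polynomial of the logarithm of the rate, and the difference between the two polynomials is averaged over the common rate interval. With exactly four points per curve the cubic fit is an interpolation, written here in Lagrange form, and Simpson's rule integrates a cubic exactly. The function names and the rate-distortion points are hypothetical and are not results from the paper.

```python
import math


def lagrange(xs, ys, x):
    """Evaluate at x the polynomial passing through the points (xs, ys)."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for k, xk in enumerate(xs):
            if k != i:
                term *= (x - xk) / (xi - xk)
        total += term
    return total


def bd_psnr(rates_ref, psnr_ref, rates_test, psnr_test):
    """Average PSNR gain (dB) of the test curve over the reference curve."""
    x_ref = [math.log10(r) for r in rates_ref]
    x_test = [math.log10(r) for r in rates_test]
    lo = max(min(x_ref), min(x_test))  # common log-rate interval
    hi = min(max(x_ref), max(x_test))

    def diff(x):
        return lagrange(x_test, psnr_test, x) - lagrange(x_ref, psnr_ref, x)

    mid = (lo + hi) / 2
    integral = (hi - lo) / 6 * (diff(lo) + 4 * diff(mid) + diff(hi))  # Simpson's rule
    return integral / (hi - lo)


# Hypothetical rate-distortion points: the test curve is 0.5 dB above the reference.
rates = [0.3, 0.4, 0.5, 0.6]
psnr_reference = [29.5, 30.6, 31.5, 32.2]
psnr_tested = [p + 0.5 for p in psnr_reference]
gain = bd_psnr(rates, psnr_reference, rates, psnr_tested)
print(gain)
```

Swapping the roles of rate and PSNR in the same construction gives the average bitrate decrease.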
Table 1 shows in columns 2 and 3 that, on average, for all stereoscopic images, FDCBM performs better than BM, and the difference ranges from 0.42 up to 1.69 dB. It seems difficult to understand why this difference is higher for some images and lower for others. For instance, "Cloth3" and "Cloth4" appear at both ends of the table and yet have a similar appearance. The same comment applies to "Baby1" and "Baby3." Likewise, "Midd1" and "Midd2," as well as "Lampshade1" and "Lampshade2," have a similar appearance, and yet each pair shows quite different performance increases. It is interesting to note that the stereoscopic image having the least PSNR-performance increase (+0.17 dB), namely "Plastic," has a rather large bitrate decrease (−15.73%).
exploit the parameters tables, as specified in the standards, to better choose the disparities to improve the compensated view quality.Indeed, the residual error coding is traditionally based on an orthogonal transformation followed by a quantization process controlled by some parameters associated with quantization tables which need to be studied in future work.Moreover, only equal size blocks have been considered to show the interest of the proposed strategy.Blocks of variable size will be investigated in the near future.

Fig. 1 DCC scheme where the encoder (above) is separated from the decoder (below) by a dashed line

Fig. 2 Average distortion reduction ratio of BM-FDCBM compared to BM-DCBM on synthetic data (function of q r )
