3/03/2010

03-03-10 - Image Compression - Color , ScieLab - Part 2

Follow-up to the last post on color.

First a correction : what I said about downsampling there is mostly wrong. I made the classic amateur's blunder of testing on too small a data set and drawing conclusions from it. I'm a little embarrassed to make that mistake, but hey, this is a blog, not a research journal. Any expectations of rigor are unfounded. For example, this is one of the test images that convinced me downsampling was bad :


aikmi
-i7 qtable ; CoCg optimized joint for min SCIELAB

downsample :

   262,144 ->    32,823 =  1.001 bpb =  7.986 to 1 (per pixel)
Q : 11.0000  Co scale = Cg Scale = 1.525
bits DC : 19636|5151|3832 , bits AC : 175319|38483|19879
bits DC = 10.9% bits AC = 89.1%
bits Y = 74.3% bits CoCg = 25.7%
rmse : 7.3420 , psnr : 30.8485
ssim : 0.9134 , perc : 73.3109%
scielab rmse : 2.200

no downsample :

   262,144 ->    32,679 =  0.997 bpb =  8.021 to 1 (per pixel)
Q : 12.0000  Co scale = Cg Scale = 0.625
bits DC : 19185|13535|9817 , bits AC : 160116|39407|19091
bits DC = 16.3% bits AC = 83.7%
bits Y = 68.7% bits CoCg = 31.3%
rmse : 6.9877 , psnr : 31.2781
ssim : 0.9111 , perc : 72.9532%
scielab rmse : 1.980

You can see that downsampling is worse on RMSE, PSNR and especially SCIELAB (2.200 vs. 1.980), even though SCIELAB cares less about chroma differences than about luma; only SSIM is essentially a tie (marginally in favor of downsampling). This particular image has a lot of high-detail color bits, and the downsampled version looks significantly worse; it's easy to pick out visually.
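The summary lines in those logs are plain arithmetic on the byte counts and the RGB RMSE. A small Python check, assuming the rate is 8 * packed / 262,144 and PSNR is 20*log10(peak/rmse): the printed PSNRs are reproduced with a peak of 256 rather than the more common 255. That peak value is inferred from the numbers, not stated in the post.

```python
import math

def log_summary(raw_bytes, packed_bytes, rmse, peak=256.0):
    """Recompute the rate, ratio and PSNR fields of the log lines.

    Assumptions (not from the original tool): rate = 8 * packed / raw,
    and PSNR = 20*log10(peak/rmse) with peak = 256, which is the value
    that reproduces the printed PSNRs.
    """
    rate = 8.0 * packed_bytes / raw_bytes   # the "bpb" field
    ratio = raw_bytes / packed_bytes        # the "N to 1" field
    psnr = 20.0 * math.log10(peak / rmse)
    return rate, ratio, psnr

for label, packed, rmse in [("downsample", 32823, 7.3420),
                            ("no downsample", 32679, 6.9877)]:
    rate, ratio, psnr = log_summary(262144, packed, rmse)
    print(f"{label:14s} rate={rate:.4f} ratio={ratio:.4f} psnr={psnr:.4f}")
```

Rate and ratio agree with the log to within the last printed digit.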

However, in general this is not true; in fact downsampling is often a small win.

Without further ado I present lots of stats :

Columns (each configuration lists RGB rmse and SCIELAB; E and F also list the per-image chroma scales chosen; the last row is the column sum over all 24 images) :
  A = "i0 Cg=1 Co=1"
  B = "i0 Cg=0.6 Co=0.575"
  C = "i7 Cg=0.6 Co=0.575"
  D = "i4/i7 opt per image"
  E = "i7 CoCg optimized independently per image"
  F = "i7 CoCg optimized jointly per image downsampled" (a single Co = Cg scale)

         A                 B                 C                 D                 E                             F
file     rmse    scielab   rmse    scielab   rmse    scielab   rmse    scielab   Co    Cg    rmse    scielab   Co/Cg  rmse    scielab
kodim01  12.6809 4.8898    12.5848 4.8413    12.6567 4.3415    12.7018 4.238     0.455 0.455 12.623  4.3153    1.225  12.486  4.2525
kodim02  6.235   2.1961    6.1733  2.1793    6.2836  2.0519    6.2544  1.9542    0.58  0.58  6.2285  1.978     1.3375 6.4866  1.9841
kodim03  4.0098  1.7135    3.974   1.7173    4.0621  1.5587    3.9778  1.5883    0.705 0.83  4.0853  1.5359    1.6    4.1235  1.6102
kodim04  6.3981  2.4661    6.3658  2.4929    6.4083  2.2579    6.4083  2.2579    0.705 0.705 6.4092  2.248     1.5625 6.3698  2.1977
kodim05  14.2903 7.2293    14.0531 7.1756    14.1613 6.5253    14.2296 6.452     0.58  0.58  14.167  6.5291    1.5625 13.9658 6.4326
kodim06  8.9416  3.6338    8.836   3.5923    8.9622  3.2131    9.0316  3.1608    0.455 0.58  8.9664  3.2184    1.3    8.8455  3.1733
kodim07  5.147   2.316     5.1145  2.1919    5.2338  2.0167    5.2388  1.9815    0.58  0.58  5.202   2.0047    1.225  5.1601  1.9462
kodim08  14.6964 7.5082    14.5479 7.5237    14.5675 6.8769    14.6411 6.7521    0.58  0.83  14.5726 6.8285    1.4875 14.3053 6.692
kodim09  4.4789  1.8149    4.439   1.8574    4.5303  1.675     4.5303  1.675     0.705 0.955 4.5467  1.6359    1.4125 4.5389  1.6906
kodim10  4.9926  2.0932    4.9477  2.1196    5.0678  1.9887    5.0398  1.9514    0.58  0.955 5.0585  1.9109    1.6    5.0449  1.9556
kodim11  7.9484  3.2677    7.9006  3.2315    8.0441  2.9234    8.0441  2.9234    0.58  0.58  8.0478  2.9276    1.375  7.939   2.858
kodim12  4.6495  1.8486    4.6326  1.8529    4.7335  1.6862    4.7259  1.6663    0.58  0.705 4.7041  1.6776    1.2625 4.7001  1.6457
kodim13  18.5372 8.3568    18.3502 8.2634    18.5334 7.2841    18.6579 7.1262    0.455 0.58  18.5013 7.2697    1.1125 18.381  7.2327
kodim14  11.076  4.8628    10.972  4.7473    11.0146 4.3268    11.064  4.2636    0.58  0.58  11.0151 4.3308    1.3    10.9818 4.3614
kodim15  5.8269  2.4099    5.8082  2.4665    5.9134  2.2246    5.8383  2.2457    0.705 0.705 5.9158  2.2098    1.525  5.8699  2.1497
kodim16  5.689   2.3266    5.6289  2.3199    5.7372  2.0534    5.7372  2.0534    0.58  0.58  5.7373  2.055     1.375  5.6667  2.0276
kodim17  5.5166  2.3244    5.47    2.2994    5.6716  2.0774    5.5853  2.0874    0.455 0.705 5.6523  2.0574    1.4125 5.6014  2.037
kodim18  10.8501 4.8609    10.7131 4.7903    10.9517 4.3169    10.9639 4.2627    0.58  0.705 10.9266 4.3006    1.3375 10.8048 4.2189
kodim19  7.1545  2.8338    7.0872  2.8518    7.2311  2.4977    7.2637  2.4362    0.58  0.705 7.2158  2.4758    1.5625 7.1314  2.4396
kodim20  4.7872  1.8258    4.7183  1.8042    4.9208  1.6441    4.863   1.6524    0.455 0.83  4.9265  1.6306    1.1875 4.9427  1.656
kodim21  7.7757  3.3671    7.6338  3.3427    7.9293  3.0078    7.8541  3.0018    0.705 0.705 7.9204  2.95      1.3    7.7688  2.9302
kodim22  8.279   3.2205    8.1788  3.1253    8.3292  2.8656    8.3542  2.8114    0.455 0.58  8.3026  2.8379    1.45   8.267   2.8436
kodim23  3.917   1.5567    3.8968  1.5138    3.953   1.4315    3.961   1.4157    0.58  0.58  3.9481  1.4146    1.6    4.3382  1.573
kodim24  10.9877 5.2479    10.8105 5.0477    11.0256 4.6141    11.0435 4.5882    0.455 0.455 11.0413 4.6005    1.3375 10.9372 4.503
sum      194.86  84.17     192.84  83.35     195.92  75.46     196.01  74.54     -     -     195.71  74.94     -      194.65  74.41

explanation :

output bit rate 1 bpb in all cases
parameters are optimized to minimize E = ( 2 * SCIELAB + 1 * RMSE )
RMSE is on RGB
SCIELAB is perceptual color difference metric

i0 = flat quantization matrix
i7 = tweaked perceptual quantization matrix to minimize E
i4/i7 = optimized blend of flat to perceptual matrices
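The objective is easy to recompute from the column sums in the last row of the table above. A minimal Python sketch (the labels are the column headers; the 2:1 weights are the ones given above):

```python
# Column sums over the 24 Kodak images: (rmse, scielab), copied from the table.
TOTALS = {
    "i0 Cg=1 Co=1":                                    (194.86, 84.17),
    "i0 Cg=0.6 Co=0.575":                              (192.84, 83.35),
    "i7 Cg=0.6 Co=0.575":                              (195.92, 75.46),
    "i4/i7 opt per image":                             (196.01, 74.54),
    "i7 CoCg optimized independently per image":       (195.71, 74.94),
    "i7 CoCg optimized jointly per image downsampled": (194.65, 74.41),
}

def objective(rmse, scielab, w_scielab=2.0, w_rmse=1.0):
    """E = 2 * SCIELAB + 1 * RMSE, the quantity the parameters were tuned on."""
    return w_scielab * scielab + w_rmse * rmse

for name, (rmse, sci) in TOTALS.items():
    print(f"{name:48s} E = {objective(rmse, sci):.2f}")
```

Summed E drops from 363.20 for the first column to 343.47 for the downsampled one; the only step that goes the wrong way is the independently tuned CoCg column (345.59), slightly above the i4/i7 column (345.09), which is why the ordering is only "roughly" left to right.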


The table reads roughly left to right in terms of decreasing perceptual error.  

"i0 Cg=1 Co=1" : flat q-matrix, standard lossless YCoCg transform without extra scaling

"i0 Cg=0.6 Co=0.575" ; optimize CoCg scale for E ; interestingly this also helps RMSE

"i7 Cg=0.6 Co=0.575" ; non-flat constant Q-matrix ; hurts RMSE a bit, helps SCIELAB a lot

"i4/i7 opt per image" ; per-image non-flat Q-matrix ; not a big difference

"i7 CoCg optimized independently per image" : independently optimize Co and Cg for each image

"i7 CoCg optimized jointly per image downsampled" : downsample test, CoCg optimized with Co=Cg
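For reference, "standard lossless YCoCg" presumably means the YCoCg-R lifting transform (Malvar and Sullivan's reversible variant); that is my assumption, since the post doesn't say which integer form it uses. The sketch below is that transform plus a hypothetical `scale_chroma` step showing where the Co/Cg scales in the table would apply, if they simply multiply the chroma planes ahead of a quantizer shared with Y.

```python
import itertools

def ycocg_r_forward(r, g, b):
    """Lossless YCoCg-R lifting transform: integer RGB in, integer Y/Co/Cg out.

    Python's >> is an arithmetic (floor) shift, which is what the lifting
    steps need for negative Co/Cg values.
    """
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, co, cg

def ycocg_r_inverse(y, co, cg):
    """Exact inverse: undo the lifting steps in reverse order."""
    t = y - (cg >> 1)
    g = cg + t
    b = t - (co >> 1)
    r = b + co
    return r, g, b

def scale_chroma(y, co, cg, co_scale, cg_scale):
    """Hypothetical: multiply the chroma planes before a quantizer shared with Y.

    Under this convention a scale below 1 makes chroma effectively coarser.
    """
    return float(y), co * co_scale, cg * cg_scale

# Round-trip check on a coarse grid of 8-bit RGB values.
for r, g, b in itertools.product(range(0, 256, 15), repeat=3):
    assert ycocg_r_inverse(*ycocg_r_forward(r, g, b)) == (r, g, b)

y, co, cg = ycocg_r_forward(200, 120, 40)
print(y, co, cg, scale_chroma(y, co, cg, co_scale=0.575, cg_scale=0.6))
```

The lifting structure is what makes the transform lossless: each step adds a function of already-known values, so it can be subtracted back out exactly, at the cost of Co and Cg needing one extra bit of range.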

On the full Kodak set, downsampling is a slight net win. There are a few cases (kodim03, kodim23) where it hurts a lot, as I saw before, but in most cases it is a slight win or close to neutral. The conclusion is that, given the speed benefit, you should downsample. However, there are occasional cases where it will hurt a lot.
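To put a number on "slight net win" and "a few cases where it hurts a lot", here is a small Python sketch that scores the last two columns of the table with the same objective E = 2*SCIELAB + RMSE. It only redoes arithmetic on the published numbers; nothing is re-encoded. Note that the full-resolution column has Co and Cg tuned independently while the downsampled one uses Co = Cg, which is the comparison the post makes. By this measure downsampling lowers E on 17 of the 24 images and summed E drops by about 2.1; kodim23 is by far the worst case, with kodim02 and kodim03 next.

```python
# (rmse, scielab) per image: first pair from "i7 CoCg optimized independently
# per image" (full-res chroma), second from "i7 CoCg optimized jointly per
# image downsampled", copied from the table above.
ROWS = {
    "kodim01": ((12.623, 4.3153), (12.486, 4.2525)),
    "kodim02": ((6.2285, 1.978), (6.4866, 1.9841)),
    "kodim03": ((4.0853, 1.5359), (4.1235, 1.6102)),
    "kodim04": ((6.4092, 2.248), (6.3698, 2.1977)),
    "kodim05": ((14.167, 6.5291), (13.9658, 6.4326)),
    "kodim06": ((8.9664, 3.2184), (8.8455, 3.1733)),
    "kodim07": ((5.202, 2.0047), (5.1601, 1.9462)),
    "kodim08": ((14.5726, 6.8285), (14.3053, 6.692)),
    "kodim09": ((4.5467, 1.6359), (4.5389, 1.6906)),
    "kodim10": ((5.0585, 1.9109), (5.0449, 1.9556)),
    "kodim11": ((8.0478, 2.9276), (7.939, 2.858)),
    "kodim12": ((4.7041, 1.6776), (4.7001, 1.6457)),
    "kodim13": ((18.5013, 7.2697), (18.381, 7.2327)),
    "kodim14": ((11.0151, 4.3308), (10.9818, 4.3614)),
    "kodim15": ((5.9158, 2.2098), (5.8699, 2.1497)),
    "kodim16": ((5.7373, 2.055), (5.6667, 2.0276)),
    "kodim17": ((5.6523, 2.0574), (5.6014, 2.037)),
    "kodim18": ((10.9266, 4.3006), (10.8048, 4.2189)),
    "kodim19": ((7.2158, 2.4758), (7.1314, 2.4396)),
    "kodim20": ((4.9265, 1.6306), (4.9427, 1.656)),
    "kodim21": ((7.9204, 2.95), (7.7688, 2.9302)),
    "kodim22": ((8.3026, 2.8379), (8.267, 2.8436)),
    "kodim23": ((3.9481, 1.4146), (4.3382, 1.573)),
    "kodim24": ((11.0413, 4.6005), (10.9372, 4.503)),
}

def objective(rmse, scielab):
    """E = 2 * SCIELAB + 1 * RMSE, the same objective the parameters were tuned on."""
    return 2.0 * scielab + rmse

# Positive delta means downsampling made E worse on that image.
deltas = {name: objective(*down) - objective(*full)
          for name, (full, down) in ROWS.items()}

wins = sum(d < 0 for d in deltas.values())
print(f"downsampling lowers E on {wins} of {len(deltas)} images")
print(f"net change in summed E: {sum(deltas.values()):+.2f}")
for name, d in sorted(deltas.items(), key=lambda kv: kv[1], reverse=True):
    if d > 0:
        print(f"  {name}: +{d:.3f}")
```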

I think most of the results are pretty intuitive and not extremely dramatic.

It's a little non-intuitive what exactly is going on with the per-image customized chroma scales. Your first thought might be "well, those images have different colors in them, so the color space scale is adapting to the color content in the image". That's not so. For one thing, more or less content of a certain color doesn't mean you need a different color space; it just means that that band of the color space will get more energy, and thus more bits. E.g. an image that has lots of "Co" component colors will simply have more energy in the Co plane; that doesn't mean scaling Co either up or down will help it.

If you think about the scaling another way it's more obvious what's going on. Scaling the color planes is equivalent to using different quantizers per plane. Optimizing the scalings is equivalent to doing an R/D optimization of the quantizer of each plane. Thus we see what the scaling is doing : it's taking bits away from hard to code planes and moving them to easier to code planes (in an R/D slope sense).
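A tiny check of that equivalence, assuming a uniform scalar quantizer with plain rounding (the real coder's deadzone, Q-matrix and rate control are not modeled): scaling a plane by s before a shared step Q gives exactly the same indices and reconstruction as quantizing the unscaled plane with step Q/s. So a chroma scale of 0.6 is just a chroma step about 1.67 times coarser than it would otherwise be. Q = 12 is a hypothetical value.

```python
from fractions import Fraction

def via_plane_scale(x, scale, q):
    """Scale the plane value, quantize with the shared step q, then undo the scale."""
    idx = round(Fraction(x) * scale / q)
    return idx, idx * q / scale

def via_plane_step(x, step):
    """Quantize the unscaled plane value directly with its own step size."""
    idx = round(Fraction(x) / step)
    return idx, idx * step

q = Fraction(12)           # shared quantizer step (hypothetical value)
scale = Fraction("0.6")    # chroma plane scale, e.g. the Cg = 0.6 setting
step = q / scale           # equivalent per-plane step: 12 / 0.6 = 20

# Exact rational arithmetic, so any mismatch would be real, not float noise.
for x in range(-255, 256):
    assert via_plane_scale(x, scale, q) == via_plane_step(x, step)
print("equivalent chroma step:", step)
```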

In particular, when I visually inspected some of the more extreme cases (cases where the per-image optimized scales were a big win vs. a constant overall scale, such as kodim10), what I found was that the optimized scalings were taking bits *away* from the dominant colors. One very obvious case was on photos of the ocean. The ocean is mostly one color and is very hard to code (expensive in an R/D sense) because it's all choppy and random. The optimized scaling took bits away from the ocean and moved them to other colors that had more R/D payoff.

(BTW rambling a bit : I've noticed that x264 Psy VAQ tends to do the same kind of thing - it takes bits away from areas that are a really noisy mess, such as water, and moves them to areas that have smooth patterns and edges. Intuitively you can guess that if an area is a mess and just really hard to code, you should just say "fuck it" and starve it for bits, even if MSE R/D tells you it wants bits. I also think that improving an area from an RMSE of 4 to 2 is better than improving it from 10 to 7, even though it's less of a distortion win. Visually there's a big difference when an area goes from "looks good" to "looks noisy", but not much of a difference when an area goes from "looks bad" to "looks really bad").
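A toy illustration of the "starve the mess" idea, in the spirit of variance-based adaptive quantization. This is not x264's algorithm: the mapping from block variance to a quantizer-step multiplier, the clamps, and every constant are invented for illustration.

```python
def block_variance(block):
    """Population variance of a flat list of pixel values."""
    n = len(block)
    mean = sum(block) / n
    return sum((v - mean) ** 2 for v in block) / n

def step_multiplier(block, strength=0.5, ref_variance=100.0, lo=0.5, hi=2.0):
    """Hypothetical rule: coarser steps for busy blocks, finer for flat ones.

    multiplier = (variance / ref_variance) ** (strength / 2), so a block with
    4x the reference variance gets a step 2**strength times larger. The result
    is clamped to [lo, hi] so flat blocks don't demand unbounded bits.
    All knobs here are made up.
    """
    var = block_variance(block)
    if var <= 0.0:
        return lo
    m = (var / ref_variance) ** (strength / 2.0)
    return min(hi, max(lo, m))

smooth = [100 + 2 * (i % 8) for i in range(64)]             # gentle ramp
water = [(37 * i * i + 11 * i) % 256 for i in range(64)]    # choppy, noise-like
print(step_multiplier(smooth), step_multiplier(water))
```

A pure MSE R/D allocation would not make this trade on its own; the rule encodes the perceptual guess that a noisy region which already "looks bad" gains little from extra bits.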

So this is in fact not really a surprising result. We already know that heavy R/D bit allocation can do wonders for lossy compressors. There are lots more areas to explore - optimization of every coefficient in the quantization matrix, optimization of the color transform, optimization of the transform basis functions, etc. - and in each case you need to be clever about the way you encode the extra rate control side information.

ADDENDUM : I thought I should write up what I think are the useful takeaway conclusions :

1. It is crucial to do the right kind of scaling to Co/Cg (or chroma more generally) depending on whether you downsample or not; in the table above the optimized chroma scales run from about 0.45 to 0.96 without downsampling and from about 1.1 to 1.6 with it. In particular, the way most people just turn downsample on or off without compensating by rescaling chroma is a mistake, i.e. not a fair comparison, because their scaling will be tuned for one or the other.

2. Downsample vs. no-downsample is pretty close to neutral. If you downsample for speed, that's probably fine. There are rare cases where it does hurt a whole lot though.

3. Using a non-flat Q matrix does in fact help perceptual quality significantly. And it doesn't hurt RGB RMSE nearly as much as it helps SCIELAB (helps SCIELAB by 10.35 % , hurts RMSE by 1.58 % ).

4. It does appear acceptable to use global tweaked values all the time rather than custom tweaking each image. Custom tweaks do give you another bit of benefit, but it's not huge, so it's not worth the very slow optimization step (see DCTune, for example).

2 comments:

ryg said...

It'd be interesting to know how exactly the Kodak images were shot and digitized (my initial search didn't turn up anything). I think they were scanned from high-resolution "analog" photos, which would be fine. But if there's a 1-CCD camera with a Bayer mosaic in between, that would probably bias the results towards downsampling, since there's less detail in the chroma planes to begin with.

cbloom said...

"But if there's a 1-CCD camera with a Bayer mosaic in between, that would probably bias the results towards downsampling, since there's less detail in the chroma planes to begin with."

Blurg, good point. Though maybe you want to test on some CCD images, since a large portion of what you will be working on comes from that source.

.. and ideally you should also have a model of your output device; if you're being shown on a TV that has some lower quality form of chroma reproduction that should be incorporated in the SCIELAB filters. blurg blurg
