11/20/2008

11-20-08 - DXTC Part 3

So we know DXT1 is bad compared to a variable bitrate coder, but it is a small block fixed rate coder, which is pretty hard to deal with.

Small block inherently gives up a lot of coding capability, because you aren't allowed to use any cross-block information. H264 and HDPhoto are both small block (4x4) but both make very heavy use of cross-block information for either context coding, DC prediction, or lapping. Even JPEG is not a true small block coder because it has side information in the Huffman table that captures whole-image statistics.

Fixed bitrate blocks inherently gives up even more. It kills your ability to do any rate-distortion type of optimization. You can't allocate bits where they're needed. You might have images with big flat sections where you are actually wasting bits (you don't need all 64 bits for a 4x4 block), and then you have other areas that desperately need a few more bits, but you can't gived them to them.

So, what if we keep ourselves constrained to the idea of a fixed size block and try to use a better coder? What is the limit on how well you can do with those constraints? I thought I'd see if I could answer that reasonably quickly.

What I made is an 8x8 pixel fixed rate coder. It has zero side information, eg. no per-image tables. (it does have about 16 constants that are used for all images). Each block is coded to a fixed bit rate. Here I'm coding to 4 bits per pixel (the same as DXT1) so that I can compare RMSE directly, which is a 32 byte block for 8x8 pixels. It also works pretty well at 24 byte blocks (which is 1 bit per byte), or 64 for high quality, etc.

This 8x8 coder does a lossless YCoCg transform and a lossy DCT. Unlike JPEG, there is no quantization, no subsampling of chroma, no huffman table, etc. Coding is via an embedded bitplane coder with zerotree-style context prediction. I haven't spent much time on this, so the coding schemes are very rough. CodeTree and CodeLinear are two different coding techniques, and neither one is ideal.

Obviously going to 8x8 instead of 4x4 is a big advantage, but it seems like a more reasonable size for future hardware. To really improve the coding significantly on 4x4 blocks you would have to start using something like VQ with a codebook which hardware people don't like.

In the table below you'll see that CodeTree and CodeLinear generally provide a nice improvement on the natural images, about 20%. In general they're pretty close to half way between DXTC and the full image coder "cbwave". They have a different kind of perceptual artifact when they have errors - unlike DXTC which just make things really blocky, these get the halo ringing artifacts like JPEG (it's inherent in truncating DCT's).

The new coders do really badly on the weird synthetic images from bragzone, like clegg, frymire and serrano. I'd have to fix that if I really cared about these things.

One thing that is encouraging is that this coder does *very* well on the simple synthetic images, like the "linear_ramp" and the "pink_green" and "orange_purple". I think these synthetic images are a lot like what game lightmaps are like, and the new schemes are near lossless on them.

BTW image compression for paging is sort of a whole other issue. For one thing, a per-image table is perfectly reasonable to have, and you could work on something like 32x32 blocks. But more important is that in the short term you still need to provide the image in DXT1 to the graphics hardware. So you either just have to page the data in DXT1 already, or you have to recompress it, and as we've seen here the "real-time" DXT1 recompressors are not high enough quality for ubiquitous use.

ADDENDUM I forgot but you may have noticed the "ryg" in this table is also not the same as previous "ryg" - I fixed a few of the little bugs and you can see the improvement here. It's still not competitive, I think there may be more errors in the best fit optimization portion of the code, but he's got that code so optimized and obfuscated I can't see what's going on.

Oh, BTW the "CB" in this table is different than the previous table; the one here uses 4-means instead of 2-means, seeded from the pca direction, and then I try using each of the 4 means as endpoints. It's still not quite as good as Squish, but it's closer. It does beat Squish on some of the more degenerate images at the bottom, such as "linear_ramp". It also beats Squish on artificial tests like images that contain only 2 colors. For example on linear_ramp2 without optimization, 4-means gets 1.5617 while Squish gets 2.3049 ; most of that difference goes away after annealing though.

RMSE per pixel :

file Squish opt Squish CB opt CB ryg D3DX8 FastDXT cbwave KLTY CodeTree CodeLinear
kodim01.bmp 8.2808 8.3553 8.3035 8.516 8.9185 9.8466 9.9565 2.6068 5.757835 5.659023
kodim02.bmp 6.1086 6.2876 6.1159 6.25 6.8011 7.4308 8.456 1.6973 4.131007 4.144241
kodim03.bmp 4.7804 4.9181 4.7953 4.9309 5.398 6.094 6.4839 1.3405 3.369018 3.50115
kodim04.bmp 5.6913 5.8116 5.7201 5.8837 6.3424 7.1032 7.3189 1.8076 4.254454 4.174228
kodim05.bmp 9.6472 9.7223 9.6766 9.947 10.2522 11.273 12.0156 2.9739 6.556041 6.637885
kodim06.bmp 7.1472 7.2171 7.1596 7.3224 7.6423 8.5195 8.6202 2.0132 5.013081 4.858232
kodim07.bmp 5.7804 5.8834 5.7925 5.9546 6.3181 7.2182 7.372 1.4645 3.76087 3.79437
kodim08.bmp 10.2391 10.3212 10.2865 10.5499 10.8534 11.8703 12.2668 3.2936 6.861067 6.927792
kodim09.bmp 5.2871 5.3659 5.3026 5.4236 5.7315 6.5332 6.6716 1.6269 3.473094 3.479715
kodim10.bmp 5.2415 5.3366 5.2538 5.3737 5.7089 6.4601 6.4592 1.7459 3.545115 3.593297
kodim11.bmp 6.7261 6.8206 6.7409 6.9128 7.3099 8.1056 8.2492 1.8411 4.906141 4.744971
kodim12.bmp 4.7911 4.8718 4.799 4.9013 5.342 6.005 6.0748 1.5161 3.210518 3.231271
kodim13.bmp 10.8676 10.9428 10.9023 11.2169 11.6049 12.7139 12.9978 4.1355 9.044009 8.513297
kodim14.bmp 8.3034 8.3883 8.3199 8.5754 8.8656 9.896 10.8481 2.4191 6.212482 6.222196
kodim15.bmp 5.8233 5.9525 5.8432 6.0189 6.3297 7.3085 7.4932 1.6236 4.3074 4.441998
kodim16.bmp 5.0593 5.1629 5.0595 5.1637 5.5526 6.3361 6.1592 1.546 3.476671 3.333637
kodim17.bmp 5.5019 5.6127 5.51 5.6362 6.0357 6.7395 6.8989 1.7166 4.125859 4.007367
kodim18.bmp 7.9879 8.0897 8.0034 8.225 8.6925 9.5357 9.7857 2.9802 6.743892 6.376692
kodim19.bmp 6.5715 6.652 6.5961 6.7445 7.2684 7.9229 8.0096 2.0518 4.45822 4.353687
kodim20.bmp 5.4533 5.5303 5.47 5.5998 5.9087 6.4878 6.8629 1.5359 4.190565 4.154571
kodim21.bmp 7.1318 7.2045 7.1493 7.3203 7.6764 8.4703 8.6508 2.0659 5.269787 5.05321
kodim22.bmp 6.43 6.5127 6.4444 6.6185 7.0705 8.0046 7.9488 2.2574 5.217884 5.142252
kodim23.bmp 4.8995 5.0098 4.906 5.0156 5.3789 6.3057 6.888 1.3954 3.20464 3.378545
kodim24.bmp 8.4271 8.5274 8.442 8.7224 8.9206 9.9389 10.5156 2.4977 7.618436 7.389021
clegg.bmp 14.9733 15.2566 15.1516 16.0477 15.7163 21.471 32.7192 10.5426 21.797655 25.199576
FRYMIRE.bmp 10.7184 12.541 11.9631 12.9719 12.681 16.7308 28.9283 6.2394 21.543401 24.225852
LENA.bmp 7.138 7.2346 7.1691 7.3897 7.6053 8.742 9.5143 4.288 7.936599 8.465576
MONARCH.bmp 6.5526 6.6292 6.5809 6.7556 7.0313 8.1053 8.6993 1.6911 5.880189 5.915117
PEPPERS.bmp 6.3966 6.5208 6.436 6.6482 6.9006 8.1855 8.8893 2.3022 6.15367 6.228315
SAIL.bmp 8.3233 8.3903 8.3417 8.5561 8.9823 9.7838 10.5673 2.9003 6.642762 6.564393
SERRANO.bmp 6.3508 6.757 6.5572 6.991 7.0722 9.0549 18.3631 4.6489 13.516339 16.036401
TULIPS.bmp 7.5768 7.656 7.5959 7.8172 8.0101 9.3817 10.5873 2.2228 5.963537 6.384049
lena512ggg.bmp 4.8352 4.915 4.8261 4.877 5.1986 6.0059 5.5247 2.054319 2.276361
lena512pink.bmp 4.5786 4.6726 4.581 4.6863 5.0987 5.8064 5.838 3.653436 3.815336
lena512pink0g.bmp 3.7476 3.8058 3.7489 3.8034 4.2756 5.0732 4.8933 4.091045 5.587278
linear_ramp1.BMP 1.4045 2.1243 1.3741 1.6169 2.0939 2.6317 3.981 0.985808 0.984156
linear_ramp2.BMP 1.3377 2.3049 1.3021 1.5617 1.9306 2.5396 4.0756 0.628664 0.629358
orange_purple.BMP 2.9032 3.0685 2.9026 2.9653 3.2684 4.4123 7.937 1.471407 2.585087
pink_green.BMP 3.2058 3.3679 3.2 3.2569 3.7949 4.5127 7.3481 1.247967 1.726312

No comments:

old rants