04-09-12 - Old Image Comparison Post Gathering

Perceptual Metrics, imdiff, and such. I don't think I ever did an "index post", so here it is :

01-18-11 - Hadamard
01-17-11 - ImDiff Release
01-12-11 - ImDiff Sample Run and JXR test
01-10-11 - Perceptual Results - PDI
01-10-11 - Perceptual Results - mysoup
01-10-11 - Perceptual Results - Moses
01-10-11 - Perceptual Metrics
01-10-11 - Perceptual Metrics Warmup - x264 Settin...
01-10-11 - Perceptual Metrics Warmup - JPEG Settin...
12-11-10 - Perceptual Notes of the Day
12-09-10 - Rank Lookup Error
12-09-10 - Perceptual vs TID
12-06-10 - More Perceptual Notes
12-02-10 - Perceptual Metric Rambles of the Day
11-18-10 - Bleh and TID2008
11-16-10 - A review of some perceptual metrics
11-08-10 - 709 vs 601
11-05-10 - Brief note on Perceptual Metric Mistakes
10-30-10 - Detail Preservation in Images
10-27-10 - Image Comparison - JPEG-XR
10-26-10 - Image Comparison - Hipix vs PDI
10-22-10 - Some notes on Chroma Sampling
10-18-10 - How to make a Perceptual Database
10-16-10 - Image Comparison Part 9 - Kakadu JPEG2000
10-16-10 - Image Comparison Part 11 - Some Notes on the Tests
10-16-10 - Image Comparison Part 10 - x264 Retry
10-15-10 - Image Comparison Part 8 - Hipix
10-15-10 - Image Comparison Part 7 - WebP
10-15-10 - Image Comparison Part 6 - cbwave
10-14-10 - Image Comparison Part 5 - RAD VideoTest
10-14-10 - Image Comparison Part 4 - JPEG vs NewDCT
10-14-10 - Image Comparison Part 3 - JPEG vs AIC
10-14-10 - Image Comparison Part 2
10-12-10 - Image Comparison Part 1


04-05-12 - DXT is not enough - Part 2

As promised last time, a bit of rambling on the future.

1. R-D (rate-distortion) optimized DXTC. If we stick with DXT encoding, this is certainly the right way to make DXTC smaller. I've been dancing around this idea for a while, but it wasn't until CRUNCH came out that it really clicked.

Imagine you're doing something like DXT1 + LZ. DXT1 produces a 4 bpp (bits per pixel) output, since each 4x4 block is stored in 64 bits, and the LZ makes it smaller, maybe to 2-3 bpp. But the output size after LZ depends on what your DXT1 encoder does. For example, a solid color block with all indices 0 will obviously be smaller after LZ than a more complex block.
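To make the size argument concrete, here is a minimal Python sketch. The helper `pack_dxt1_block` is hypothetical, and zlib simply stands in for "some LZ back end". It packs 8-byte DXT1 blocks and compares the deflated size of a texture made of solid blocks against one with the same end points but noisy indices.

```python
import random
import struct
import zlib

def pack_dxt1_block(color0, color1, indices):
    """Pack one DXT1 block: two little-endian RGB565 end points, then 16 2-bit indices."""
    assert len(indices) == 16
    bits = 0
    for i, idx in enumerate(indices):
        # texel i (row-major within the 4x4 block) occupies bits 2i..2i+1
        bits |= (idx & 3) << (2 * i)
    return struct.pack("<HHI", color0, color1, bits)

BLOCK_BYTES = 8
BITS_PER_PIXEL = BLOCK_BYTES * 8 / 16  # 64 bits per 16 texels -> 4.0 bpp

rng = random.Random(0)
num_blocks = 256
red, blue = 0xF800, 0x001F

# Solid color blocks: every index is 0, so every block is byte-identical.
solid = b"".join(pack_dxt1_block(red, blue, [0] * 16) for _ in range(num_blocks))
# "Complex" blocks: same end points, but random indices in every block.
complex_ = b"".join(
    pack_dxt1_block(red, blue, [rng.randrange(4) for _ in range(16)])
    for _ in range(num_blocks)
)

solid_lz = len(zlib.compress(solid, 9))
complex_lz = len(zlib.compress(complex_, 9))
print(f"raw {len(solid)} bytes each; LZ: solid {solid_lz}, complex {complex_lz}")
```

Both inputs are exactly 4 bpp before the back end. The exact compressed byte counts depend on the zlib version and settings, so only their ordering matters here.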

That is, we tend to think of DXT1 as a fixed-size encoding, so the optimizers I wrote for it a while ago were only about optimizing quality. But with a back end, it's no longer a fixed-size encoding: some choices are smaller than others.

So the first thing you can do is simply consider size (R) as well as quality (D) when choosing how to encode a block for DXTC. Often there are many ways of encoding the same data with only tiny differences in quality but very different rates.

One obvious case is a block that has only one or two colors in it. The smallest encoding is to send those colors as the end points, so your indices are only 0 or 1 (selecting the ends). Often a better quality encoding can be found by sending end point colors outside the range of the block and using indices 2 and 3 to select the interpolated 1/3 and 2/3 points, since the interpolated colors can land closer to the original colors than any RGB565 end point can represent exactly.
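A small brute-force search shows the precision effect. This is a sketch under simplifying assumptions: it looks at a single 5-bit channel only, uses idealized (unrounded) interpolation rather than any particular decoder's rounding, and ignores DXT1's color0 > color1 mode rule. `expand5` and `best_error` are illustrative names.

```python
def expand5(v5):
    # Expand a 5-bit channel value to 8 bits by bit replication,
    # the usual RGB565 -> RGB888 convention.
    return (v5 << 3) | (v5 >> 2)

def best_error(target, use_interpolated):
    """Smallest |target - reconstruction| over all 5-bit end point pairs.

    With use_interpolated=False only indices 0 and 1 (the end points) are allowed;
    with True, the 1/3 and 2/3 points (indices 2 and 3) are available too.
    """
    best = float("inf")
    for a5 in range(32):
        for b5 in range(32):
            a, b = expand5(a5), expand5(b5)
            palette = [a, b]
            if use_interpolated:
                # Idealized interpolation with no decoder rounding.
                palette += [(2 * a + b) / 3, (a + 2 * b) / 3]
            best = min(best, min(abs(target - p) for p in palette))
    return best

target = 136  # an 8-bit value that no 5-bit end point hits exactly
print(best_error(target, False), best_error(target, True))
```

For a solid block of value 136 the nearest end points are 132 and 140, an error of 4, while a pair like 189 and 33 puts its 1/3 point exactly at 137. The catch, in R-D terms, is that this better-quality choice may not be the cheapest one after the back end.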

Even beyond that, you might want to try encodings of a block that are definitely "bad" in terms of quality, e.g. sending a solid color block when the original data was not a solid color. This intentionally introduces loss to get a lower bit rate.

The correct way to do this is with an R-D optimized encoder. The simplest approach is to use Lagrange multipliers and minimize the cost J = R + lambda * D.
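The per-block decision can be sketched as a tiny chooser. The `Candidate` type and the rate/distortion numbers below are hypothetical; a real encoder would generate the candidates itself and estimate R and D for each one.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    rate_bits: float   # R: estimated bits after the back end
    distortion: float  # D: e.g. sum of squared error vs. the source block

def rd_cost(c, lam):
    # Lagrangian cost J = R + lambda * D
    return c.rate_bits + lam * c.distortion

def choose_encoding(candidates, lam):
    # Pick the candidate encoding with the lowest J for this lambda.
    return min(candidates, key=lambda c: rd_cost(c, lam))

# Hypothetical numbers for three ways of encoding one block.
candidates = [
    Candidate("solid", rate_bits=10, distortion=400),
    Candidate("two-color", rate_bits=30, distortion=50),
    Candidate("four-color", rate_bits=60, distortion=10),
]
for lam in (0.0, 0.1, 1.0):
    print(lam, choose_encoding(candidates, lam).name)
```

Small lambda favors rate and large lambda favors quality. Because the useful range of lambda depends on the units of R and D, this also hints at why it is awkward to expose directly.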

There are various difficulties with this in practice. For one thing, exposing lambda to clients is unintuitive. Another is that (good) DXTC encoding is already quite slow, so making the optimization metric J instead of D makes it even slower. For many simple back-end coders (like LZ), it's hard to measure R for a single block. And adaptive back-ends make parallel DXTC solvers difficult, since the rate of each block depends on the blocks coded before it.
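One crude way to put a number on R for a single block under LZ is to measure how much the compressed stream grows when that block is appended. This is an assumption-laden sketch, not a practical method. `marginal_rate_bytes` is a hypothetical helper, zlib stands in for the back end, and the block contents are arbitrary.

```python
import zlib

def marginal_rate_bytes(prefix: bytes, block: bytes, level: int = 9) -> int:
    """Crude estimate of one block's LZ cost: how much the compressed stream
    grows when the block is appended to everything encoded before it."""
    before = len(zlib.compress(prefix, level))
    after = len(zlib.compress(prefix + block, level))
    return after - before

# Hypothetical 8-byte DXT1 blocks (contents are arbitrary for this demo).
block_a = bytes([0x00, 0xF8, 0x1F, 0x00, 0x00, 0x00, 0x00, 0x00])
block_b = bytes([0x12, 0x34, 0x56, 0x78, 0x9A, 0xBC, 0xDE, 0xF0])
prefix = block_a * 16

repeat_cost = marginal_rate_bytes(prefix, block_a)  # matches earlier data
novel_cost = marginal_rate_bytes(prefix, block_b)   # mostly literals
print(repeat_cost, novel_cost)
```

The answer depends entirely on `prefix`, so a block's rate isn't known until the blocks before it are decided. That sequential dependency is what gets in the way of parallel solvers, and compressing the whole stream twice per candidate is far too slow for a real encoder.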

2. More generally, we should ask why we are stuck trying to optimize DXTC at all. I believe the answer is the privileged way DXTC is treated by current hardware. How could we get away from that?

I believe you could solve it by making texture fetch more programmable. Currently, texture fetch (and decode) is one of the few parts of a GPU that is still totally fixed function. DXTC-encoded blocks are fetched and decoded into a special cache on the texture unit. This means that DXTC-compressed textures can be rendered from directly, and also that rendering from DXTC-compressed textures is actually faster than rendering from RGB textures, thanks to the reduced memory bandwidth.

What we want is for future hardware to make this part of the pipeline programmable. One possibility is like this : give the texture unit its own little cache of RGB 4x4 blocks that it can fetch from. When you try to read a bit of texture that's not in the cache, it runs a "texture fetch shader", similar to a pixel shader, which outputs a 4x4 RGB block. So, for example, a texture fetch shader could decode DXTC. But it could also decode JPEG, or whatever.
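The proposed design can be modeled in software. Below is a minimal Python sketch. `TextureUnit`, its LRU eviction policy, and the `fetch_shader` callback signature are all hypothetical illustrations of the idea, not any real GPU interface. The fetch shader here decodes DXT1 using floor-rounded interpolation, which is an assumption, since real decoders round slightly differently.

```python
import struct
from collections import OrderedDict

def rgb565_to_rgb888(c):
    r, g, b = (c >> 11) & 31, (c >> 5) & 63, c & 31
    return ((r << 3) | (r >> 2), (g << 2) | (g >> 4), (b << 3) | (b >> 2))

def decode_dxt1_block(block):
    """One possible "texture fetch shader": decode an 8-byte DXT1 block to 16 RGB texels."""
    c0, c1, bits = struct.unpack("<HHI", block)
    p0, p1 = rgb565_to_rgb888(c0), rgb565_to_rgb888(c1)
    if c0 > c1:
        # 4-color mode: the 1/3 and 2/3 points between the end points
        p2 = tuple((2 * x + y) // 3 for x, y in zip(p0, p1))
        p3 = tuple((x + 2 * y) // 3 for x, y in zip(p0, p1))
    else:
        # 3-color mode: midpoint, and index 3 is transparent black (alpha ignored here)
        p2 = tuple((x + y) // 2 for x, y in zip(p0, p1))
        p3 = (0, 0, 0)
    palette = (p0, p1, p2, p3)
    return [palette[(bits >> (2 * i)) & 3] for i in range(16)]

class TextureUnit:
    """Hypothetical texture unit with a small LRU cache of decoded 4x4 RGB blocks."""

    def __init__(self, fetch_shader, capacity=64):
        self.fetch_shader = fetch_shader  # callable (bx, by) -> 16 RGB texels, row-major
        self.capacity = capacity
        self.cache = OrderedDict()
        self.misses = 0

    def texel(self, x, y):
        key = (x // 4, y // 4)
        if key in self.cache:
            self.cache.move_to_end(key)  # mark as most recently used
        else:
            self.misses += 1
            self.cache[key] = self.fetch_shader(*key)  # run the fetch shader on a miss
            if len(self.cache) > self.capacity:
                self.cache.popitem(last=False)  # evict the least recently used block
        return self.cache[key][(y % 4) * 4 + (x % 4)]

# An 8x8 texture = 2x2 DXT1 blocks (red/blue end points), stored row-major.
blocks = [
    struct.pack("<HHI", 0xF800, 0x001F, 0x00000000),  # all index 0: red
    struct.pack("<HHI", 0xF800, 0x001F, 0x55555555),  # all index 1: blue
    struct.pack("<HHI", 0xF800, 0x001F, 0xAAAAAAAA),  # all index 2: 2/3 red + 1/3 blue
    struct.pack("<HHI", 0xF800, 0x001F, 0xFFFFFFFF),  # all index 3: 1/3 red + 2/3 blue
]
tu = TextureUnit(lambda bx, by: decode_dxt1_block(blocks[by * 2 + bx]))
print(tu.texel(1, 2), tu.texel(2, 6), tu.misses)
```

Swapping in a different format only means passing a different callable to `TextureUnit`. The cache and the sampling code don't care how the 16 texels were produced.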
