Monday, October 2, 2017

Sprite blitting and Compression for Bitbox


Here are how and why the future sprite format is defined for bitbox sprites.

Here are the sprite format properties, why they are needed and what impacts it had on the format

  • binary based format
Since all of our data is generally compiled we could express the image as C code. This would be easier than to serialize a binary file and deserialize when loading from flash memory. However, we don't do that, in order to allow sprite data to be put to SD card, or be compressed in flash and the uncompressed to RAM.

  • line-based blitting
As the bitbox is issuing pixels line by line, objects in the bitbox blitter are defined by two methods : what to do each frame and what to do each line (on top of misc values as x,y position, height of blits ...). Thus it will be much better to think of sprites as RUNs of pixels instead of small squares by example.

  • no transparent color
Setting encoding of transparent pixels is not really the way to go for software blitters, because you'll need to check for that color for every pixel. Generally, on a give image the blanks go on strides (borders, by example) and then it's opaque strides. So it's better to encode a RUN of transparent pixels (which you just skip) and then a RUN of pixels you will blit.

  • run-length encoding and packbits algorithm

Coding same color ranges into one (color, number of pixels) pair is named RLE. It's the first useful compression one can use, it's fast and straightforward. It's also good for bitbox, since it's coherent with line-based blitting. It's also very fast because you just have one read and N writes for 2*N pixels if you're blitting 16bpp.

We can also group single pixels, since you generally have many similar pixels then a bunch of single pixels. In that case you can avoid transmitting (1,color) (1,color2) (1,color3).

  • back references

Another way to improve compression for "free" (on decompression side) is to avoid coding twice the same run of pixels. If you can retrieve a past group of pixels already encoded, it's not very expensive to blit from this address instead of the current address. Of course, this is only interesting if you're going to encode less than the size you'd need for encoding the address itself. This is in fact how deflate, gzip and other LZ77 encoders work; but with a modification : in lz77 you can reference the past blit (ie data already extracted). Here we're blitting sprites but once a line has been written, another object can have written over it for the next line, so you can't reference past blitted lines. You have to reference the source data, worsening a bit you ratio. You could reference the same line but generally the interesting parts - non handled by RLE- are in previous lines.

  • explicit EOL encoding
Adding a bit for explicit end of line uses a bit that could be used for length encoding, but it allows avoiding a byte or more to add the blank space. This is generally a net gain, because blits are generally smaller than 32 (sprites) or more than 64 (large background images)

  • extensible size encoding.
Sometimes, and in the majority of sprites by example, you have to encode blits with no more than 32 pixels wide, so having more than 5 bits by example would be wasteful.
But then you have to cut large blits into smaller ones, which can be slower to blit and decode the header, and bigger at the same time.
What's used here is a variable length encoding similar to LZ4 :you use 5 bits for the base and then if the len you want to encode is more than that, you add a byte of data (and another while you don't have to).

By example,
15 is encoded as 15,

30 as 30,

200 as 31 + another byte with 200-31 = 169
31 as 31  + another zero byte yes it's a useless zero byte but that's rare.


All of this leads to a 'blits' encoding, where we store blits with a header and some data.
Header is :
    2 bits for the type of blit,
    1 bit for end of line
    5 bits for length

    + N bytes extra for length as needed

Data is :
    blit "Skip" : nothing
    blit "Fill" : color to blit with
    blit "data" : "length" pixels encoding
    blit "back reference"  : 2 bytes as back reference in bytes since the current bytes. This sliding window allows for more than 64k resources

For encoding pixels, you can of course use raw u16 pixels which are quite big.
  • pixel palette can be slow but dense, but there is a better way.

In order to reduce the storage of individual pixels, you can use less colors and thus less bits per pixel. using 256 colors you can reduce your storage by two comarped to raw 15bpp, when using 16 colors you can reduce by four(but this will have an impact on what you can provide). A global 8bpp encoding is available for either the bitbox micro or the a global 8bpp palette reduction.

Pixel palette is not a panacea however, since it needs much longer to blit. While we can memcpy (setting aside alignment operations) one word at a time (i.e. one u32 word read and one write per couple of pixels), now need one read for 4 u8 pixels, but four reads (one of each palette lookup), adding to the contention of memory buses. What can be done is making a table for multiple pixels lookup but for 256 colors you'd need 65536 entries. Better to limit that number.

What's interesting is that generally you don't use the whole number of different couples for a given set of colors. Not every couple is used, and it's generally much better to limit oneself to 256 couples than 16 individual colors. And in that case, we have 8 pixels for 32 bits of data + one read + one write for 2 pixels, which is very close to the raw pixel encoding speed.

Couples also adds the possibility to allow run-length encoding of dithered patterns which in pixel art and limited color can be useful.

To encode such a picture, I made a small encoder which can extract common couples like you do with the same algorithms for color quantization, except it's now couples quantization.

Frame encoding :
  • multi-frame format & skiplist

Our format should be used to encode not a single image, but multiple frames animations.
First, it's generally simpler to manage one file per animation than one file per image.
Then you generally produce the content that way.
And then an animation generally contains many slightly different frames but not entirely different frames and colors, so you gain in not having to repeat the palette and the blits.

We do this by specifying a frame size and putting all the frames vertically.

What you need is then an index of each frame start since each frame is not the same size, in order to be able to skip directly to this and not having to unroll the whole sprite from frame zero. So we add a table with skip frame indices. We use a simple u32 list to allow big files, and direct access - we could use a delta encoded u16 since no frame is larger than 64k generally but the next paragraph will explain why - it's not very heavy generally.
  • frame deduplication
Often, an animation can repeat the same frames. for example, you can have an A-B-C-B-A-B... type animation (ping pong) which can be a looping ABCB animation. Why repeat the frame B when you just have to replace the fourth entry of the frame lookup table by the same pointer as the second ?  Therefore the encoder can detect similar frames.
Also, your frame step can be the same length and you can repeat a want a frame to be longer.
That means that frames are no longer unique or the frame index monotonous and cannot be delta-encoded efficiently.