Monday, January 14, 2013

storing pixels

So, since output is in theory solved by having the DMA send lines of 16-bit halfwords, we need to focus on storage and memory.
As with horizontal resolution, different choices lead to different modes, but I'll show some generic principles; you can adapt them and create your own mode. Or reuse one.
There are several general ways to store pixels in order to output them.


The first, direct approach is a frame buffer.

A frame buffer is a chunk of memory storing pixels exactly as they will be output, so that you send to the screen exactly what you've got in memory. Sometimes things can be a little messier, with color planes (i.e. one framebuffer each for R, G and B) and banks.

What resolution can we get with the current memory size?

We can, of course, use Flash for static images. But only showing static images is quite restrictive for a game console.

Let's assume 2 bytes per pixel. We have 192 KB of RAM on the STM32F4, but we can only use 128 KB of it, as the other 64 KB is core-coupled memory and isn't visible to the DMA. So we can store 128K one-byte pixels, or half that, 64K pixels, at 16 bits per pixel.

For a 4:3 aspect ratio, that's sqrt(3/4 × 64 × 1024) ≈ 221 lines.

We thus could store one screen of 294x221 pixels, or two screens of (only!) 208x156 if we want double buffering!

Which begins to enter the domain of lame for a 32-bit game console.

For a better resolution of 640x480 at 12 bpp (stored as 16-bit halfwords), we would need 640 × 480 × 2 = 614,400 bytes, which is about 5 times the RAM and more than half the Flash size.

For this, we need to compress the image we want to show in RAM, and decompress it on the fly at the 25 MHz pixel clock.

So we will need a translation function that prepares a buffer of pixels for the next line from the compressed data while the DMA outputs the current line buffer, and then exchanges those front and back buffers, exactly timed at about 31,000 times per second.
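A minimal sketch of that front/back exchange, with illustrative names (the buffers, the function and the commented-out DMA call are hypothetical, not the actual firmware):

```c
#include <stdint.h>

#define LINE_PIXELS 640

/* Two line buffers: while the DMA streams front_buffer to the screen,
   the CPU renders the next line into back_buffer. */
static uint16_t line_a[LINE_PIXELS], line_b[LINE_PIXELS];
static uint16_t *front_buffer = line_a, *back_buffer = line_b;

/* Called from the horizontal-blank interrupt, ~31,000 times per second:
   swap the buffers, restart the DMA on the new front buffer, then let
   the translation function draw the next line into the back buffer. */
static void hblank_swap(void)
{
    uint16_t *tmp = front_buffer;
    front_buffer = back_buffer;   /* just-rendered line goes out next */
    back_buffer  = tmp;           /* previous front is free to redraw */
    /* dma_start(front_buffer, LINE_PIXELS);  hardware-specific call */
}
```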

Needless to say we won’t use jpeg.

Note that we need this compressed data to be manipulable by our game, so PNG and the like (really, LZ77/LZW in-memory pixel storage) would be too CPU-intensive as well, and impractical to manipulate for non-static images.


First, we can use fewer bits per pixel by using indexed colors. For example, we could use a 256-color palette at 1 byte per pixel, giving nice colors and practical output. See the article about it on Wikipedia; it's well explained and has nice parrot pictures. Arrr!
The display function will translate the pixel color IDs through a table of pixel colors and store the result in the buffer. Generally this is done in hardware by means of a RAMDAC (in: pixel IDs, out: VGA signal, inside: some RAM for the palette plus a DAC, hence, RAMDAC; see the Wikipedia article). But those chips are becoming hard to find and expensive, it would mean an additional chip, and it would force us to use indexed color. So no.


Is it feasible in software?

Let's calculate how many lines per second we could output.
Let's consider a naive pixel-per-byte algorithm, not using any 32-bit word-aware method.
  • Excluding loop housekeeping, that's around 6 clocks per pixel, if I understand the ARM reference latencies correctly:
    • 1 read per pixel for the pixel ID,
    • 1 read to fetch the palette color from memory,
    • 1 store per pixel to the line buffer.
So the STM32 at 168 MHz can output 168e6 / (6 × 640) = 43,750 lines/sec. That's more than the 31 kHz horizontal refresh rate, so it's possible (note that this includes the hblank periods)!

(That takes about 70% of the CPU power, not counting the vblank periods where we're not outputting video, compared to none at all when using a framebuffer. That's a serious memory/CPU tradeoff, but if we can do a bit better, keeping even half of a 168 MHz CPU for the game isn't so bad after all.)
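The naive inner loop counted above might look like this in C (function name and signature are illustrative, not from any actual firmware):

```c
#include <stdint.h>
#include <stddef.h>

/* Naive palette expansion: one read for the pixel ID, one read for the
   palette entry, one store to the line buffer -- roughly the 6 clocks
   per pixel estimated above, excluding loop housekeeping. */
void palette_blit_8bpp(const uint8_t *src, const uint16_t *palette,
                       uint16_t *dst, size_t npix)
{
    for (size_t i = 0; i < npix; i++)
        dst[i] = palette[src[i]];
}
```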

Note that with a 16-color palette that's also half the RAM (4 bits per pixel), and with 4 colors that's 2 bpp, with a quarter of the pixel reads and the whole palette kept in registers, so no palette reads at all; combined with word-wide writes, that can be much faster.
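A sketch of the 2 bpp variant: each source byte holds 4 pixels, and a 4-entry palette is small enough for the compiler to keep in registers. The bit ordering here (most significant pair = leftmost pixel) is my assumption, one convention among several:

```c
#include <stdint.h>
#include <stddef.h>

/* 2 bpp expansion: one byte read yields four pixel stores, so a quarter
   of the pixel reads compared to the 8 bpp loop. */
void palette_blit_2bpp(const uint8_t *src, const uint16_t pal[4],
                       uint16_t *dst, size_t npix)
{
    for (size_t i = 0; i < npix / 4; i++) {
        uint8_t b = src[i];
        *dst++ = pal[(b >> 6) & 3];   /* leftmost pixel */
        *dst++ = pal[(b >> 4) & 3];
        *dst++ = pal[(b >> 2) & 3];
        *dst++ = pal[b & 3];          /* rightmost pixel */
    }
}
```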

Note also that with a palette you can manipulate the palette independently of the pixel data, so you can do fadeouts quite easily by swapping palettes (not for free, but cheaply, since you're already doing the translation work).
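For instance, a fade-to-black can scale each channel of every palette entry; this sketch assumes the colors are packed as RGB565 in the 16-bit halfwords the DMA sends out, which is one possible packing, not a given:

```c
#include <stdint.h>

/* Fade a palette toward black by scaling each RGB565 channel.
   level = 0 gives black, level = 32 leaves the palette unchanged. */
void palette_fade(const uint16_t *in, uint16_t *out, int n, int level)
{
    for (int i = 0; i < n; i++) {
        uint16_t c = in[i];
        uint16_t r = ((c >> 11) & 0x1f) * level / 32;
        uint16_t g = ((c >> 5)  & 0x3f) * level / 32;
        uint16_t b = ( c        & 0x1f) * level / 32;
        out[i] = (uint16_t)((r << 11) | (g << 5) | b);
    }
}
```

Stepping `level` from 32 down to 0 over a few frames, then pointing the translation loop at the faded palette, gives the fadeout without touching the pixel data.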

But that is still quite expensive, and another method will be used first.


Tiled backgrounds
Another technique, often used for backgrounds and very similar to text mode, goes further: instead of having a palette of pixels, let's have a palette of sub-images, composed to make a bigger image. One for a tree top, one for a tree bottom, one for grass: repeat many times and you have a big forest from 3 small images + a map.

It's similar to text modes in that instead of doing it with letters (a buffer of characters on screen plus small bitmaps representing the letters), you do it with small color images (which can be letters).

Nice editors exist for tiled data, and we will use one to compose our images. 

Storing such an image means storing the tiles plus a tilemap referencing them. The bigger the tiles, the fewer bits you need to store the tilemap, but the more you need to store the tiles. Note that tiles can themselves be stored with a palette.
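A per-scanline tile renderer might look like the sketch below. The 8x8 tile size, row-major tile storage and direct 16-bit tile pixels are assumptions for illustration; real modes would likely add palettes and different sizes:

```c
#include <stdint.h>

#define TILE_W 8
#define TILE_H 8

/* Render one scanline of a tiled background: for each 8-pixel column,
   look up the tile ID in the tilemap, then copy the matching row of
   that tile into the line buffer. */
void tile_render_line(const uint8_t *tilemap, int map_w,
                      const uint16_t *tiles,   /* 8x8 pixels per tile */
                      uint16_t *dst, int screen_w, int y)
{
    int trow  = y / TILE_H;   /* which row of the tilemap */
    int tline = y % TILE_H;   /* which line inside the tile */
    for (int col = 0; col < screen_w / TILE_W; col++) {
        uint8_t id = tilemap[trow * map_w + col];
        const uint16_t *src = &tiles[(id * TILE_H + tline) * TILE_W];
        for (int x = 0; x < TILE_W; x++)
            *dst++ = src[x];
    }
}
```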

Many other choices can be made, and combining them is possible, but we have few cycles to spare, so let's consider only tiles for now.
