Monday, October 21, 2013

An efficient scanline blitting algorithm

For the bitbox kernel, we need a fast library for blitting graphical objects, since most games will have the same needs. Blitting a fixed tiled background from low Z to high Z and then overdrawing small sprites (small, to stay within the 5k cycles budget) only goes so far: because we are overdrawing pixels, we will hit the limit of 5k cycles per line pretty soon.

We will sum up the needs of such a library and propose an algorithm.

Tuesday, October 8, 2013

Tutorial : blitting graphics

(this is the second part of the tutorial about making a Fire game for the bitbox)

If you need basic information about writing software for the bitbox console, please read this introductory tutorial 

Saturday, October 5, 2013

Tutorial : Creating a small game from scratch : game logic

Hello all !

To show the process of making a game for bitbox (or a toy 2D game in general : this post has very few bitbox specifics, the next ones will be more involved), I'll blog a few times and walk through it in detail.

The game will consist of a bunch of people jumping out of a building on fire, and a team of two firemen who make them bounce to the ambulance. It'll be programmed in C99, without any complex techniques.

Sounds familiar ?

Wednesday, October 2, 2013

bitbox mailing list

A new mailing list has been created to discuss bitbox issues, new revisions, games, ...
Here is the link on google groups :!forum/bitbox-console

Wednesday, September 25, 2013

bitbox updates

The Jump game has made progress : coins to collect (and lure you into places where you shouldn't go >:-D ), sounds (not very happy with some of them but it's working) and extra score handling. Game begins to be interesting. Now, level design !

I've also found that the USB port I've used is not the one that is used by the bootloader (duh). So, no upload by dfu for this one unless some fine soldering is done ...

I started to think about bitbox rev2 : getting rid of the bulky minidin and replacing it with USB OTG (I still have to figure out if a kernel can work with it reliably enough), getting stereo sound, 15-bit colors(?), a proper GPU.

OTG would be huge for using normal keyboards, mice, controllers, (maybe even MIDI) but also less standard controllers.

I'm also preparing a tutorial on making a simple game and, of course thinking about many games that could be done ...

Saturday, September 21, 2013

Bitbox repository available.

Just a quick note to say that a Bitbox github repository has been created. Not much in it now, apart from the SDL emulator (very simple), makefiles, preliminary kernel & test kernel demo. Also some articles as wiki pages. Github repo is in the links on the blog front page.

Friday, September 20, 2013

Tutorial : developing for Bitbox (the basics)

We're going over the course of this tutorial to program and build a simple program for the bitbox console on ubuntu. I'm trying it on a 13.04 Ubuntu, so adapt accordingly.

For Windows, there is a small guide on the blog; you can use it in parallel with this page.

Tuesday, September 17, 2013

First bitbox simple game preview : Jump ! (video)

I programmed a simple, very bare tile/sprite engine+game to test everything : JUMP !

Screenshots from the emulator

The game is a very simple vertical scrolling platformer where you only control left/right and die if you touch the base of the screen. Different platforms types make you jump higher, ... and there are other types :)

Debugging using gdb on the board itself is possible through SWD (the kernel giving you some perf data stats), but for game devel, I also developed a simple SDL-based emulator of the kernel for on-computer tests. Of course it's certainly not accurate but it can be useful for graphics and gameplay-related tuning.

Now the game is playable, although still very rough : the level needs a bit of work, there is no sound (edit : yes there is now !), collisions are imperfect, I need to add some bonuses, maybe particles, ... but it stays enjoyable nonetheless.

 Sample screenshots from the game, and a video (no sound).

bitbox first game prototype ! from mak apuf on Vimeo.

Bitbox news

I finally advanced a little bit on my sample game, while reworking the kernel for better image & needing to rewire a bit of the prototype (hence, the prototype qualification).

The kernel upgrade involved getting hardware sync (as did the new waha demo, jupiter and beyond - credit where it is due), which is much more reliable timing-wise than software sync. Even a few instructions run from flash can behave very differently depending on concurrent memory accesses : since the bus is used both by the DMA to transfer the current line and by the CPU to prepare the next one, the availability of cached flash instructions can vary quite a lot. With 7 cycles of latency from flash at the speeds used, one cache miss means one pixel off !

Of course, the pins I was using couldn't be driven directly by hardware, however I found a combination of Timers, DMA channels, pins ... (I'll try to explain the kernel in a future blog post) that implied only one pin change, h-sync.

So I went away and soldered a tiny wire directly on the new pin of the LQFP64 chip to the vsync VGA output (I'm quite proud to have managed that with my firestick iron), and cut the original pcb trace.

Having done that, the VGA is much more stable, and can now achieve 640x480 at 60fps with 4096 colors without distortion.

Again, the controller is a SNES-compatible gamepad with its connector replaced by a PS/2 mini-DIN plug (which is also useful to connect a keyboard).

The kernel exposes the following main elements for the graphics (as a C library) :
  • a game_init callback : initialize everything in your game
  • a game_frame callback : do what is needed every frame, like getting user input, moving sprites x/y/frame , .... , using the global integer "frame".
  • the gamepad is simply a global variable "gamepad1", read as a uint16 with 1 bit for each button.
  • a draw_buffer of 640 uint16 pixels and a "line" int to know which line it is (from 0 to 479)
  • a game_line callback : in this callback, you should draw each line (every frame) by blitting your screen line. So given a line number (from 0 to 479) and a pointer to a buffer of 16-bit unsigned pixels, you have ~5k cycles to blit your graphics.
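
To make the list above a bit more concrete, here is a minimal sketch of how a game could fill in these callbacks. The names game_init, game_frame, game_line, draw_buffer, line and gamepad1 come from the list; everything else is an assumption made for illustration : the exact signatures, the PAD_LEFT / PAD_RIGHT bit values, and the globals being defined here rather than provided by the kernel.

```c
#include <assert.h>
#include <stdint.h>

#define LINE_PIXELS  640
#define SCREEN_LINES 480

/* Stand-ins for what the kernel is described as exposing (declared here
 * only so that the sketch compiles on its own). */
uint16_t draw_buffer[LINE_PIXELS]; /* the line being prepared */
int      line;                     /* current line, 0..479 */
uint16_t gamepad1;                 /* one bit per button */

/* Hypothetical button bits: the real bit assignment is not given here. */
#define PAD_RIGHT 0x0001u
#define PAD_LEFT  0x0002u

#define SPRITE_SIZE 16
#define SPRITE_Y    200

static int player_x;

/* Called once at startup. */
void game_init(void)
{
    player_x = LINE_PIXELS / 2;
}

/* Called once per frame: read input, move things. */
void game_frame(void)
{
    if ((gamepad1 & PAD_RIGHT) && player_x < LINE_PIXELS - SPRITE_SIZE)
        player_x += 2;
    if ((gamepad1 & PAD_LEFT) && player_x >= 2)
        player_x -= 2;
}

/* Called once per line: fill draw_buffer for the current line. */
void game_line(void)
{
    assert(line >= 0 && line < SCREEN_LINES);

    /* Background: a solid color (0x000F is blue if blue is the low nibble). */
    for (int x = 0; x < LINE_PIXELS; x++)
        draw_buffer[x] = 0x000F;

    /* A 16x16 white square "sprite", only on the lines it covers. */
    if (line >= SPRITE_Y && line < SPRITE_Y + SPRITE_SIZE)
        for (int x = 0; x < SPRITE_SIZE; x++)
            draw_buffer[player_x + x] = 0x0FFF;
}
```

The point of the design is that nothing here ever sees a whole frame : the game only keeps its own state (here player_x) and regenerates each line when asked.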

You can blit however you want, do what you want in 5k cycles per line : from aligned memsets (2 pixels at a time, word by word transfers, very quick) to full antialiased, rotozoomed, alpha blurred sprites (very expensive - tiny sprites !) : you do it & tune it (that's part of the fun) !
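
As an illustration of the cheap end of that range, here is a minimal sketch of a solid color fill that writes two pixels per 32-bit store. fill_pixels is a hypothetical helper written for this example, not a kernel function.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Fill `count` 16-bit pixels with `color`, two pixels per 32-bit store
 * whenever possible. */
void fill_pixels(uint16_t *dst, uint16_t color, int count)
{
    assert(count >= 0);

    /* One leading pixel if dst is not on a 32-bit boundary. */
    if (count > 0 && ((uintptr_t)dst & 2)) {
        *dst++ = color;
        count--;
    }

    /* Same color in both halves, so byte order does not matter. */
    uint32_t pair = ((uint32_t)color << 16) | color;

    /* memcpy of 4 bytes avoids aliasing problems; optimizing compilers
     * usually turn it into a single word store. */
    for (; count >= 2; count -= 2, dst += 2)
        memcpy(dst, &pair, sizeof pair);

    /* One trailing pixel if the count was odd. */
    if (count)
        *dst = color;
}
```

Calling fill_pixels(draw_buffer, color, 640) from the line callback would paint a whole background line this way.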

Of course, building a library / engine that you can reuse & tune from game to game is useful, but sometimes, the ad-hoc just blit it (tm) engine is simpler !

I then made a sample game, which I'll talk about in a next entry.

Wednesday, September 11, 2013

Bitbox early prototype schematics

By popular request (or not), I'm posting the schematics of the console, under a CC BY-SA license.

There is already quite a list of errata, so expect a rev2 of the board corresponding better to what I have now, after having pcb traces cut, Boot0/1 resistors added (still learning !),  tiny wires soldered directly on lqfp pins ... But, that is the basis I have been working on. I will make another, better revision for sure (yeah, right, promises).

Monday, April 8, 2013

the bitbox prototype

Here ya go !

Based on my prototype work, the first bitbox "console" is (almost) born !

The idea is to have a small (5x5 cm), cheap console bitbanging VGA from a stm32f4.

This is rev 0.1 of the board, which means that it's the first board and there ARE a few hardware bugs, bad component layouts, missed opportunities ... some of them fixed manually with sharp eyes and small wires !

Anyway I got a board fabbed and hand soldered it (with a simple, non temperature-controlled 10€ soldering iron). The BOM is quite cheap : the most expensive part (and pretty much the only active component) is the STM32F405, costing ~10€.

It ... blinks !

Available hardware on the board (SMT based) :

- STM32F405RGT6, 64 pins, 1MB Flash, 192kB RAM
- VGA out + 12bits DAC (resistor-based) = 4096 colors
- output resolution is not fixed (it depends on the driving software, the current driver is 640x480 pixels!)
- card reader (microSD, topleft on board, no code for it, not soldered on my test board),
- top is a mini-din (aka ps2) connector to connect either a keyboard (not tested) or a SNES controller (tested) ! (snes connectors are big, difficult to find, incompatible with anything. Easier to rip the male connector and replace it with mini din!)
- mini usb on the right (used for power as well as a possible connection / DFU firmware loading from USB (to be tested))
- mini jack out with lowpass filtering from stm32 DAC (mono, to be tested)
- SWD connector (debug ! / ISP programming)
- power LED+  extra LED on a GPIO pin (debug, tests)
- serial ports output just in case.

On the software side, I currently have a test program for the game engine+driver (see video) and an ongoing game (a shoot'em up !). The difficulty is that to cram in a whole, pretty game, you need to be clever : decompressing and compositing video lines on the fly (racing the beam) is tricky even on that kind of beefy MCU. You can't just store data in flash or decompress PNG to RAM !

The VGA part is shamelessly based on the first waha demo, with some extras : no frame buffer (not enough mem!) but instead 2 alternating line buffers, generated on the fly by blitting routines and sent using FIFO DMA (so minimal memory use, allowing bigger resolutions), plus a sprite/tiles generator (exporting from png, midi, wav and tmx files from the Qt Tiled editor). Sound part is barely started.

Currently blitting routines are tile8 and 16 based, RLE sprites, opaque or transparent (more on this later) but it's a library, you'll be able to use whatever you want, of course.

I plan on getting a micro sampler + midi file converter also.

Many things don't work for now or are not tested as I can't devote as much time on it as I would like but things are progressing anyway !

more posts later

Monday, February 18, 2013

prototyping VGA out

The basics of the VGA generation board are set now, so I’ll present my different prototypes.

First, I tried a 8 bit approach, as I’ve seen a demo on the internets using one of those, and I wanted to try something that I knew worked. Look at it, it’s fine ; it’s called peridiummm, running 320x200, double buffered.

So I soldered a R2R 8 bit DAC to my STM32F4 discovery board (cheap and powerful .. so nice).

Finally got around to setting up a proper ARM toolset under linux, and compiled away : everything worked right away (yeah, right. Well, after a while pulling my hair, blasting my ears plugging an earphone to a sound output with no volume control and finally getting something compiled …  whoah It’s … alive !!)

The STM32F4DISCOVERY board. Cheap, nice, plenty of extras (accelerometer, I2S sound, DAC..pick any three. Alas, no Ethernet - we're spoiled !)

Top-of-the-line homemade radeon with 12 SMT resistors

Then, fast forward a few moments to a nicer 12 bits VGA “graphic card” using 1% SMD resistors. Inverting some bits took me from a nice RRRRGGGGBBBB bit layout to a not so nice layout of RRRR(descending)GGGG(descending)B0B0B0B0 - this is NOT a nice and natural layout but that's how it went ! The drawback is that computing R,G,B values won't be pretty ... well, keep in mind that

  1. it’s a prototype
  2. it’s software fixable since all my data is converted offline (although computing pixels for effects will be HARD like this)
SO, I got a ~600-er x 480 12bit colors ! Woot !
Other resolutions were around 700 pixels wide ; nice but less clocks per pixel available to put that image to screen.

But …. eh, what’s that ? At every try, output was a little off, colors not so nice, and vertical lines were not that perfect... What was wrong was that the lines were not shut off at the end, so the screen was trying to compensate to get blacks. So I overclocked the thing a bit (around 10%) to be able to have a nice 640x480 resolution - outputting a nice 25MHz pixel clock - and I fixed the end of lines so that the color bars ended in black at the end of the display, and …

tada ! it worked, the screen didn’t try to fix the colors / fight the sync by setting them completely off…cool, nice colors & all !

So here I am, having a nice 640x480 12-bit console … more of a single-image photo frame for now, but let's continue in a next installment.

more on storing pixels

I recently thought about an extra possibility to reduce storage of tiles.

Since we store many strings directly in flash, along with pointers to those strings, I will try to use "string tiling" (which is of course the same word as tiling but with a totally different meaning).

String tiling consists of finding how to store N strings efficiently, given pointers to them, by taking advantage of overlaps.

For example, if you want to store AAAABBBBCCCC and BBBBCCCCDDDD and AABBB, you might as well store only AAAABBBBCCCCDDDD + (offset, length) pairs (0,12), (4,12) and (2,5).

My current simple implementation detects when the end of a string matches the beginning of another (a partial match is enough, but only at the end and beginning of the strings respectively), or when a string is included within another (the string must be completely included, but it can sit anywhere).

This leads to a reduction of up to 17% with real life data, not bad for a free decompression !
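
Here is a minimal sketch of the two checks (inclusion, and end/beginning overlap) applied greedily, one string at a time. This is not my actual implementation : the names overlap_len, find_inside and pool_add are made up for this example, and it works on NUL-terminated text where real tile data would need explicit lengths.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Longest k such that the last k chars of a equal the first k chars of b. */
size_t overlap_len(const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b);
    size_t k = la < lb ? la : lb;
    for (; k > 0; k--)
        if (memcmp(a + la - k, b, k) == 0)
            return k;
    return 0;
}

/* Offset of needle inside hay, or -1 if it is not included. */
long find_inside(const char *hay, const char *needle)
{
    const char *p = strstr(hay, needle);
    return p ? (long)(p - hay) : -1;
}

/* Make sure s can be read from pool, appending only what is missing.
 * Returns the offset of s in pool, or -1 if pool (capacity cap) is full. */
long pool_add(char *pool, size_t cap, const char *s)
{
    assert(pool != NULL && s != NULL && cap > 0);

    /* Case 1: s is already completely included somewhere. */
    long at = find_inside(pool, s);
    if (at >= 0)
        return at;

    /* Case 2: the end of pool overlaps the beginning of s. */
    size_t k  = overlap_len(pool, s);
    size_t lp = strlen(pool), ls = strlen(s);
    if (lp + (ls - k) + 1 > cap)
        return -1;
    memcpy(pool + lp, s + k, ls - k + 1); /* the +1 copies the NUL */
    return (long)(lp - k);
}
```

Adding AAAABBBBCCCC, BBBBCCCCDDDD and AABBB in that order to an empty pool gives offsets 0, 4 and 2, and the pool ends up as AAAABBBBCCCCDDDD. The result depends on insertion order, so a real tool would probably try to order the strings cleverly.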

Monday, January 14, 2013

storing pixels

So as output is in theory solved by having the DMA send lines of 16-bit halfwords, we need to focus on storage and memory.
As with horizontal resolution, different choices can lead to different modes, but I’ll show some generic principles, you can adapt and create your own mode. Or reuse one.
There are several general ways to store pixels in order to output them.

The first, direct approach is a frame buffer.

A frame buffer is a chunk of memory storing pixels exactly as they will be output, so that you output to the screen what you’ve got in memory. Sometimes, things can be a little messier with color planes (i.e. one framebuffer each for R, G and B) and banks.

What resolution can we use if we want to use it with the current memory size ?

We can, of course use Flash for static images. But then, only using static images is quite restrictive for a game console.

Let’s assume we use 2 bytes per pixel. We have 192 kB of memory on the stm32f4, but we can only use 128 kB, as the other 64 kB is core coupled memory and can’t be seen by the DMA. So we can store 64k pixels at 16 bits per pixel.

For a 4/3 aspect ratio, that’s sqrt(3/4*64*1024) ≈ 221 lines.

We thus could store one screen of 294x221 pixels, or two screens of (only!) 208x156 if we want double buffering !
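
For those who want to redo the arithmetic with other numbers, here is a small sketch. largest_4_3_screen and screen_size are names invented for this example; the function just looks for the tallest 4/3 screen whose pixels fit in a given amount of RAM, for a given number of buffers.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t width;
    uint32_t height;
} screen_size;

/* Largest 4/3 screen (width = height * 4 / 3, rounded down) such that
 * `buffers` full frame buffers fit in ram_bytes. */
screen_size largest_4_3_screen(uint32_t ram_bytes,
                               uint32_t bytes_per_pixel,
                               uint32_t buffers)
{
    assert(bytes_per_pixel > 0 && buffers > 0);

    uint32_t pixels = ram_bytes / bytes_per_pixel / buffers;
    uint32_t h = 0;

    /* Grow the height while the next size still fits. */
    while (((h + 1) * 4 / 3) * (h + 1) <= pixels)
        h++;

    screen_size s = { h * 4 / 3, h };
    return s;
}
```

With 128 kB of DMA-visible RAM, 2 bytes per pixel and a single buffer, this gives 294x221.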

Which begins to enter the domain of lame for a 32bits game console.

For a better resolution of 640x480 @ 12bpp (stored as 2 bytes per pixel), we would need 640x480x2 = 614 400 bytes, which is about 5 times the usable RAM and more than half the Flash size.

For this, we need to compress the image we want in RAM and decompress it at 25 MHz pixel clock. 

So we will need a translation function that prepares the next line of pixels from the compressed frame data while the DMA outputs the current line buffer, and then exchanges those front and back buffers, exactly timed at about 31 000 times per second.

Needless to say we won’t use jpeg.

Note that we need this compressed data to be manipulated by our game, so PNG and the like (really LZ77/LZW in memory pixel storage) will be too CPU intensive also, as well as impractical to manipulate for non static images.

First we can use fewer bits per pixel by using indexed colors. For example, we could use a 256 colors palette using 1 byte per pixel, giving nice colors and practical output. See the article about it on wikipedia, it’s well explained and has nice parrot pictures. Arrr !
The display function will translate the pixel color ids into pixel colors through a table and store them in the buffer. Generally it’s done in hardware by means of a RAMDAC (in : pixel ids, out : VGA signal, inside : some RAM for the palette + a DAC, hence, a ramdac. See wikipedia article.), but those chips are becoming hard to find/expensive, that’s an additional chip, and it would force us to use indexed color, so no.

Is it feasible by software ?

Let’s calculate how many lines per second we could output.
Let’s consider a naive one pixel per byte algorithm, not using any 32bit word-aware method.
  • Excluding loop housekeeping, that’s around 6 clocks per pixel, if I understand the ARM reference latencies correctly, roughly :
    • 1 read per pixel for the pixelID, 
    • 1 read to read the palette color from memory 
    • 1 store per pixel to the buffer. 
So the STM32 @ 168MHz can output 168e6/(6*640) = 43750 lines/sec. That’s more than the 31kHz horizontal refresh rate, so that’s possible (note that we include hblank periods) !
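
The naive loop counted above can be sketched as follows, together with the budget computation. Both function names are made up for this example, and the 6 clocks per pixel figure is the rough estimate from above, not a measurement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Naive palette expansion: one read of the pixel id, one read of the
 * palette entry, one 16-bit store to the line buffer, per pixel. */
void expand_indexed_line(const uint8_t *src, const uint16_t *palette,
                         uint16_t *dst, size_t n)
{
    assert(src != NULL && palette != NULL && dst != NULL);
    for (size_t x = 0; x < n; x++)
        dst[x] = palette[src[x]];
}

/* How many lines per second the CPU could expand if it did nothing else. */
uint32_t max_lines_per_second(uint32_t cpu_hz, uint32_t cycles_per_pixel,
                              uint32_t pixels_per_line)
{
    assert(cycles_per_pixel > 0 && pixels_per_line > 0);
    return cpu_hz / (cycles_per_pixel * pixels_per_line);
}
```

max_lines_per_second(168000000, 6, 640) returns 43750, the figure used above. Plugging in a different cycles per pixel estimate shows quickly how little margin there is.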

(This takes about 70% of the CPU power, not counting V blank periods where we’re not outputting video, compared to none when we’re using a framebuffer. That’s a serious memory/cpu tradeoff, but if we can do better, having 50% of a 168MHz CPU isn’t so bad after all.)

Note that if we can use a 16 colors palette, that’s half the RAM also (4 bits per pixel), or with 4 colors that’s 2bpp (with a quarter of the pixel reads, and keeping the whole palette in registers so no palette reads … that, combined with word writes, can make it much faster).

Note also that with a palette you can manipulate the palette individually from the pixel data, so you can do fadeouts quite easily by switching palettes (not for free but because you’re already doing the translation work).

But that is quite expensive, and another method will be used first.

Tiled backgrounds
Another technique, often used for backgrounds and very similar to text mode, goes further : instead of having a palette of pixels, let's have a palette of sub-images, composed to make a bigger image : one for a tree top, one for a tree bottom, one for grass : repeat many times and you have a big forest with 3 small images + a map.

It’s similar to text modes in that instead of doing it with letters (buffer of characters on screen + small bitmaps representing letters), you do it with color images (which can be letters).

Nice editors exist for tiled data, and we will use one to compose our images. 

Storing such an image requires storing the tiles + a tilemap referencing them. The bigger the tiles, the fewer bits you need to store the tilemap, and the more you need to store the tiles. Note that tiles can be stored with a palette also.
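
As a rough idea of what generating one line from a tilemap looks like, here is a minimal sketch. It assumes 16x16 tiles stored as raw 16-bit pixels, a tilemap of one byte per tile and no scrolling; blit_tile_line and the constants are names chosen for this example only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TILE_SIZE      16
#define LINE_PIXELS    640
#define TILES_PER_LINE (LINE_PIXELS / TILE_SIZE) /* 40 tiles across */

/* Build screen line y from a tilemap.
 * tilemap: one tile index per byte, TILES_PER_LINE entries per map row.
 * tiles:   TILE_SIZE * TILE_SIZE pixels per tile, stored row by row.
 * out:     LINE_PIXELS pixels. */
void blit_tile_line(const uint8_t *tilemap, const uint16_t *tiles,
                    int y, uint16_t *out)
{
    assert(y >= 0);

    const uint8_t *map_row = tilemap + (y / TILE_SIZE) * TILES_PER_LINE;
    int row_in_tile = y % TILE_SIZE;

    for (int tx = 0; tx < TILES_PER_LINE; tx++) {
        /* Start of the wanted pixel row inside the referenced tile. */
        const uint16_t *src =
            tiles + (map_row[tx] * TILE_SIZE + row_in_tile) * TILE_SIZE;
        memcpy(out + tx * TILE_SIZE, src, TILE_SIZE * sizeof(uint16_t));
    }
}
```

With these sizes, a full 640x480 screen needs a 40x30 tilemap, i.e. 1200 bytes, plus 512 bytes per distinct tile.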

Many other choices can be made, and combining them is possible, but we have few cycles to spare for now, so let's consider only tiles for now.

bitbox VGA generation

While the preceding post was about generic video generation, this post will specify what is used by bitBox console for Video Generation.

First, the DAC : it will be a simple DAC made of resistors. An R2R ladder could be used; it is nice to only have a few resistor values when manufacturing. But for now we’ll be using fewer resistors since we manufacture by hand (duh), so a plain resistor DAC will be used. I first tried an 8-bit RRRGGGBB layout.

That’s what the uzebox (the 8-bit homebrew console, it’s great and has been a great inspiration) uses with an 8-bit microcontroller, but here we have the capacity (cpu and memory wise) to do a little more.

How many colors should we be able to display ?

It’s a question of balance : more bits in the DAC looks better, but more bits mean more CPU to build the signal, more RAM / Flash to store the nice graphics, and more hardware complexity.

I finally settled for 4096 colors, which is 4-4-4 = 12 bits + 4 unused bits on a 16 bits output bus. The use of a palette will be defined by the software, so let’s not talk about that now.
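
In C, building such a 12-bit color could look like the sketch below. The bit order (red in bits 11-8, green in bits 7-4, blue in bits 3-0, top 4 bits unused) is an assumption for illustration; the real order depends on how the resistors are wired to the output bus. rgb444 and rgb888_to_444 are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Pack three 4-bit components into one 16-bit pixel.
 * Assumed layout: 0000RRRRGGGGBBBB. */
static inline uint16_t rgb444(unsigned r, unsigned g, unsigned b)
{
    assert(r < 16 && g < 16 && b < 16);
    return (uint16_t)((r << 8) | (g << 4) | b);
}

/* Reduce a usual 8 bits per component color by keeping the top 4 bits
 * of each component. */
static inline uint16_t rgb888_to_444(uint8_t r, uint8_t g, uint8_t b)
{
    return rgb444(r >> 4, g >> 4, b >> 4);
}
```

For example rgb444(15, 15, 15) is 0x0FFF (white) and rgb888_to_444(255, 128, 0) is 0x0F80 (orange).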

15 bits could also have been done, but I think 12 bits will provide nice colors anyway. The games won’t be photorealistic, so the aim is vivid colors, not realistic ones.

Then, how many pixels should we be able to output ? That’s a software thing ! Nothing in hardware sets the number of pixels, as vertically it’s how often we fire the h-sync, and horizontally it’s how fast we make the pixel value vary.

Let's try defining a first video mode (all by software).

We should try to build on standard VGA timings, which should be easier for VGA screens to sync on, as well as being compatible with many screens.

The universal resolution is 640x480 at 60Hz, which is supported by almost everything (even HDMI supports it - but of course we are not generating hdmi with a few resistors).

Note, however, that this will be the resolution the screen thinks it gets. For example, there is no difference between varying the pixel levels twice as slowly and having pixels twice as wide horizontally : it’s the same thing.
Likewise, if you’re outputting each line twice, you effectively get half the vertical resolution. That gives you for example 320x240 @ 12 bits if you also change the pixel value only 320 times per line.

You can also "forget" to send anything for 20 lines after and 20 lines before your signal, so you’ll have black lines and 320x200. Which has the nice property of needing a 64k frame buffer if we use 1 byte per pixel. 128k for double buffering… but more on that later.
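
To illustrate, here is a minimal sketch that turns a 320x200, 1 byte per pixel frame buffer into 640-pixel VGA lines by doubling pixels and lines, with black borders. The palette lookup is an assumption (the byte could just as well be a direct RRRGGGBB color), and low_res_line and the constants are names invented for this example.

```c
#include <assert.h>
#include <stdint.h>

#define VGA_WIDTH   640
#define VGA_LINES   480
#define LOW_WIDTH   320
#define LOW_HEIGHT  200
#define BORDER_ROWS 20 /* black low-res rows above and below the picture */

/* Build VGA line vga_line (0..479) from a 320x200 8-bit frame buffer.
 * Returns 1 for a picture line, 0 for a black border line. */
int low_res_line(const uint8_t *fb, const uint16_t *palette,
                 int vga_line, uint16_t *out)
{
    assert(vga_line >= 0 && vga_line < VGA_LINES);

    /* Two VGA lines show the same low-res row. */
    int y = vga_line / 2 - BORDER_ROWS;
    if (y < 0 || y >= LOW_HEIGHT) {
        for (int x = 0; x < VGA_WIDTH; x++)
            out[x] = 0;
        return 0;
    }

    /* Each low-res pixel is written twice, making it twice as wide. */
    const uint8_t *src = fb + y * LOW_WIDTH;
    for (int x = 0; x < LOW_WIDTH; x++) {
        uint16_t color = palette[src[x]];
        out[2 * x]     = color;
        out[2 * x + 1] = color;
    }
    return 1;
}
```

The frame buffer here is 320 * 200 = 64000 bytes, which is where the 64k figure comes from.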

Extra reading on that subject :

Outputting pixels

The next thing to consider is how to store pixels in memory and how to output them.

Outputting can be done by bitbanging, ie writing them clocked by the instruction clock of the processor.

The problem is that we won’t have much time left to do anything else, and while the main CPU is perfectly good at outputting bytes or halfwords, it really is much more powerful so all those cycles could be spent doing more useful things such as adding 4 bytes in parallel or running nice effects. It would be nice if we had a small bit of silicon on the MCU able to move data from memory to a peripheral (here GPIOs).

As a matter of fact, we do! It’s called a DMA for direct memory access.  The stm32f4 has two of them.

The only thing we need to do is :
  • generate line hsyncs at 31 kHz with a clock-based interrupt (see VGA generation posts and VGA timings references)
  • for some of those lines, generate vsync 
  • for the actual lines, 
    • point the DMA to a part of memory, tell it the pace / width of output, 
    • let it run in the background
    • fill another place of memory with the next line of pixels (or the whole screen) 
    • return from the interrupt ASAP, letting the processor do interesting things in the foreground.
  • In the foreground, process user input, calculate the next frame or decompress a nice purple tentacle from a PNG to RAM, ...
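
The steps above can be sketched with two alternating line buffers as below. This is only a simulation of the idea, under loud assumptions : start_line_dma stands in for the real DMA setup (here it just records the source pointer), render_line is a dummy line generator, and sync pulses and vertical blanking are left out entirely.

```c
#include <assert.h>
#include <stdint.h>

#define LINE_PIXELS  640
#define SCREEN_LINES 480

static uint16_t line_buffers[2][LINE_PIXELS];
static int front;     /* index of the buffer the DMA is sending */
static int next_line; /* next line to prepare in the other buffer */

/* Hypothetical stand-in for pointing the DMA at a buffer and starting it. */
static const uint16_t *dma_source;
static void start_line_dma(const uint16_t *src)
{
    dma_source = src;
}

/* Dummy generator: fills the line with its own number. */
static void render_line(int y, uint16_t *dst)
{
    for (int x = 0; x < LINE_PIXELS; x++)
        dst[x] = (uint16_t)y;
}

/* Prepare line 0 in the back buffer before the first hsync. */
void video_start(void)
{
    front = 0;
    next_line = 0;
    render_line(next_line++, line_buffers[front ^ 1]);
}

/* Would be called from the hsync interrupt, once per line. */
void hsync_handler(void)
{
    assert(next_line >= 0 && next_line < SCREEN_LINES);

    /* The buffer filled during the previous line becomes the front one. */
    front ^= 1;
    start_line_dma(line_buffers[front]);

    /* While the DMA runs, prepare the following line in the other buffer. */
    render_line(next_line, line_buffers[front ^ 1]);
    next_line = (next_line + 1) % SCREEN_LINES;
}
```

The CPU always writes to the buffer the DMA is not reading, so a line is never modified while it is being sent.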
The next post will focus on storing & generating pixels.

Friday, January 11, 2013

VGA software generation

The VGA software generation from a chip is quite simple as well as quite tricky to achieve.
Simply said, to output a VGA signal, you should think of the screen as a cathode ray tube, scanning from top left to bottom right in lines, and being shut off while getting back to the left or back up to the first position, in a Z pattern (let’s think progressive scan here).

Then, to output a VGA signal, you need to generate three varying red, green, and blue signals (as 0-0.7 volts, 0 meaning black and 0.7 full color), as well as H sync (to tell the tube to go back to the left) and V sync (to tell the tube to go back to the top).

Nice tutorials are available, so instead of copying and paraphrasing them here, I’ll just link to them. Great links for VGA and Video signal generation are :
- and finally a GREAT tutorial for video generation :
- A search engine using “VGA signal timings” terms by example
Composite is a little trickier with separate luma+chroma
The principle is very simple, what can be tricky is having the timing perfectly done (or not too badly done) because you’re trying to generate three 20MHz signals on a microcontroller … as well as (hopefully) running a simple game !

Hardware considerations

So the idea is to deliver a simple, cheap hardware base that is home-reproducible and versatile to hack.
Video signals and sound generation and processing will be software-generated, so the exact characteristics (screen resolution, tile-based engine, frame buffer or even 3D raster, number of sound voices) will be defined by kernel software and will evolve as the hardware is pushed by the software.

Kernels are just sets of drivers that allow simpler game development by abstracting the lower level VGA generation (graphics signal generation) into libs.
The aim is to be simple and cheap, while using up to date hardware (not in the sense of powerful - that’s not the point - but easy to find and cheap).

The main CPU will be the STM32F4 from STmicro, which is a quite powerful platform to build on.

With 168 MHz, 192 kB RAM and 1MB Flash memory, fast DMAs and the 32 bit thumb2 cortex M4F instruction set with simd and float instructions, this little beast seems to have what it takes to bring us to the world of homemade snes (not ) ! It’s about 10-15$ also - even if the whole platform will be more expensive (whole car vs engine).

meet the bitbox console

Hi, this is a personal blog relating my adventures in developing a simple DIY console based on an ARM chip. The base of it will be a single chip, the STM32F4 from STMicroelectronics.
The minimal hardware design will hopefully allow for hackability, as almost everything will be based on this chip + software rendering of the video signal.
More on this later !