Optimizing programs

This page serves as a loose list of advice for getting the most out of the WonderSwan.

Optimizing C code

Optimizing for code speed

To optimize for speed, compile your code with -O2.

Optimizing for code size

To optimize for size, compile your code with -Os.

Optimizing for memory usage

  • For data stored in RAM, use the smallest type possible.
    • Exception: For argument passing, there is little reason to prefer char over int - the stack is always aligned to 2 bytes.
  • By default, GCC allows function call arguments to accumulate on the stack, then pops them all at once. To reduce peak stack usage at the cost of a larger and slightly slower program, compile your code with -fno-defer-pop.
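The first point can be sketched as follows. This is a minimal illustration, not code from any WonderSwan library: the struct names, the fields, and the value ranges given in the comments are all assumptions made up for the example.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

/* Hypothetical game object with every field stored as int. */
struct actor_wide {
    int x;
    int y;
    int health;
    int frame;
};

/* The same fields, each in the smallest type covering its assumed range. */
struct actor_small {
    int16_t x;       /* assumed to need more than 8 bits */
    int16_t y;       /* assumed to need more than 8 bits */
    uint8_t health;  /* assumed range: 0..255 */
    uint8_t frame;   /* assumed range: 0..255 */
};

/* The narrower layout never takes more RAM than the int-only one. */
static_assert(sizeof(struct actor_small) <= sizeof(struct actor_wide),
              "narrow fields should not enlarge the struct");

/* The argument stays an int: a char would occupy a 2-byte stack slot anyway. */
void actor_damage(struct actor_small *a, int amount) {
    int left = a->health - amount;
    if (left < 0) left = 0;      /* clamp to the assumed 0..255 range */
    if (left > 255) left = 255;
    a->health = (uint8_t)left;
}
```

The narrow types only pay off for data that lives in RAM; the function parameter is deliberately left as int, following the exception noted above.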

Optimizing assembly code

Optimizing for speed

While the V30MZ is an 80186-compatible CPU, its instruction timings differ wildly from what 8086/8088-era optimization advice assumes, and are more reflective of its 1990s-era design:

  • MUL is very fast on this CPU, taking 3-4 cycles. As a result, “shift plus add” ladders will almost always be slower or, at best, equal in performance.
  • Using XCHG over PUSH/POP (and XCHG AX, reg over MOV AX, reg) is a popular pattern on the 8088/8086 due to the speed benefit. However, on the V30MZ, that benefit does not exist:
    • XCHG on V30MZ always takes 3 cycles.
    • PUSH and POP take 1 cycle each for general registers. A PUSH/POP pair therefore takes 2 cycles, one cycle less than XCHG.
    • While MOV AX, reg is one byte larger than XCHG AX, reg, it takes only 1 cycle instead of 3.
  • XLAT takes 5 cycles. Computing a value directly with simple CPU operations (such as SHR reg, 4 or MUL reg - both taking 3 cycles) can actually be faster than looking it up in a table.
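The "shift plus add" ladder mentioned above can be illustrated with a multiplication by 10. The sketch below is written in C only to show that both forms compute the same value; the function names are hypothetical, and a compiler is free to pick its own instructions for either one. The timing argument applies to hand-written assembly, where the single MUL replaces two shifts and an add.

```c
#include <assert.h>
#include <stdint.h>

/* Shift-plus-add ladder: x * 10 == (x << 3) + (x << 1), modulo 2^16. */
uint16_t mul10_ladder(uint16_t x) {
    return (uint16_t)((x << 3) + (x << 1));
}

/* Plain multiply: the form that maps onto a single MUL. */
uint16_t mul10_mul(uint16_t x) {
    return (uint16_t)(x * 10u);
}

/* Checks that both forms agree for every 16-bit input. */
void mul10_selfcheck(void) {
    for (uint32_t i = 0; i <= 0xFFFFu; i++) {
        uint16_t x = (uint16_t)i;
        assert(mul10_ladder(x) == mul10_mul(x));
    }
}
```

Since the results are identical, the choice between the two is purely a matter of cycle counts, which on the V30MZ favor the multiply.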

You can study the instruction timings in detail on the WSdev wiki.

There are also some additional tricks you can take advantage of:

  • Avoid far calls between functions - branches are expensive, and far branches are significantly more expensive. If you're calling a far function from another far function in the same section, use the IA16_CALL_LOCAL macro over a far call to save a few cycles.
  • Try word-aligning loop labels by prepending them with .align 2, 0x90 - this pads with a NOP opcode (0x90) if necessary. This may help a little.
wswan/guide/optimization.txt · Last modified: by asie