This page serves as a loose list of advice for getting the most out of the WonderSwan.
To optimize for speed, compile your code with -O2.
To optimize for size, compile your code with -Os.
char over int - the stack is always aligned to 2 bytes.
-fno-defer-pop.
While the V30MZ is an 80186-compatible CPU, its instruction timings differ wildly from common expectations and are more reflective of its 1990s-era design:
MUL is very fast on this CPU, taking 3-4 cycles. As a result, "shift plus add" ladders will almost always be slower or, at best, equal in performance.
Using XCHG over PUSH/POP (and XCHG AX, reg over MOV AX, reg) is a popular pattern on the 8088/8086 due to the speed benefit. However, on the V30MZ, that is not the case: XCHG always takes 3 cycles, while PUSH and POP take 1 cycle each for general registers, so PUSH/POP will be one cycle faster. Likewise, while MOV AX, reg is one byte larger than XCHG AX, reg, it takes only 1 cycle.
XLAT takes 5 cycles. Many simple CPU operations (such as SHR reg, 4 or MUL reg, both taking 3 cycles) can actually be faster.
You can study the instruction timings in detail on the WSdev wiki.
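As a rough illustration of the MUL advice, the sketch below writes the same scaling operation two ways in C. The function names are hypothetical, and which instructions the compiler actually emits for either form depends on the compiler version and flags, so treat this as a starting point for checking the generated assembly rather than a guarantee.

```c
#include <stdint.h>

/* Hypothetical helper: plain multiplication.
 * This form leaves the compiler free to emit a single MUL,
 * which the timings above suggest is cheap on the V30MZ. */
uint16_t scale_by_10_mul(uint16_t x) {
    return (uint16_t)(x * 10u);
}

/* Hypothetical helper: hand-written "shift plus add" ladder,
 * computing x*8 + x*2. Common on the 8088/8086, but on the V30MZ
 * it is unlikely to beat the plain multiplication. */
uint16_t scale_by_10_ladder(uint16_t x) {
    return (uint16_t)((x << 3) + (x << 1));
}
```

Both functions return the same value for every input (modulo 65536), so the plain multiplication is usually the better default: it is easier to read and leaves the instruction choice to the compiler.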
There are also some additional tricks you can take advantage of:
Use the IA16_CALL_LOCAL macro over a far call to save a few cycles.
.align 2, 0x90 - this pads with a NOP opcode (0x90) if necessary. This may help a little.
Wonderful Toolchain comes with a tool for observing the RAM/ROM allocation and per-symbol sizes: wf-wswantool usage build/your_program.elf. It is also provided in the default Makefile as make usage.
The best option is to use Mesen 2's profiler. While Mesen 2 is not 100% cycle-accurate for the WonderSwan yet, it's close enough for non-demoscene use cases.
TODO: Document how to use it.