Optimizing programs
This page serves as a loose list of advice for getting the most out of the WonderSwan.
Optimizing C code
Optimizing for code speed
To optimize for speed, compile your code with -O2.
Optimizing for code size
To optimize for size, compile your code with -Os.
Optimizing for memory usage
- For data stored in RAM, use the smallest type possible.
  - Exception: For argument passing, there is little reason to prefer char over int - the stack is always aligned to 2 bytes.
- By default, GCC allows function call arguments to accumulate on the stack, then pops them all at once. To reduce stack usage at the cost of a larger and slightly slower program, compile your code with -fno-defer-pop.
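As a rough illustration of the first two points, the sketch below stores small values in byte-sized fields but takes its function arguments as int. The struct and function names are hypothetical and are not part of any WonderSwan library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sprite state kept in RAM: one byte per small field. */
struct sprite_small {
    uint8_t x, y;
    uint8_t tile;
    uint8_t flags;
};

/* The same fields stored as int: at least 2 bytes each, so the
 * structure is at least twice as large. */
struct sprite_wide {
    int x, y;
    int tile;
    int flags;
};

/* Arguments are plain int: a char argument would still occupy a full
 * 2-byte stack slot, so narrowing it saves nothing. */
void sprite_move(struct sprite_small *s, int dx, int dy) {
    assert(s != NULL);
    s->x = (uint8_t)(s->x + dx); /* narrow only when storing to RAM */
    s->y = (uint8_t)(s->y + dy);
}
```

With many such objects in an array, the difference between the two layouts adds up quickly on a system with limited RAM.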
Optimizing assembly code
Optimizing for speed
While the V30MZ is an 80186-compatible CPU, its instruction timings differ wildly from common expectations and are more reflective of its 1990s-era design:
- MUL is very fast on this CPU, taking 3-4 cycles. As a result, “shift plus add” ladders will almost always be slower or, at best, equal in performance.
- Using XCHG over PUSH/POP (and XCHG AX, reg over MOV AX, reg) is a popular pattern on the 8088/8086 due to the speed benefit. However, on the V30MZ, that is not the case:
  - XCHG on V30MZ always takes 3 cycles.
  - PUSH and POP take 1 cycle each for general registers. This means that PUSH/POP will be one cycle faster.
  - While MOV AX, reg is one byte larger than XCHG AX, reg, it also only takes 1 cycle.
- XLAT takes 5 cycles. Many simple CPU operations (such as SHR reg, 4 or MUL reg - both taking 3 cycles) can actually be faster.
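To make the MUL and XLAT points concrete, here is a minimal sketch written in C for readability. The function names are hypothetical, and the comments only indicate which instructions a hand-written assembly version would likely use. An optimizing compiler is free to pick either form on its own, so this mostly matters when you write the assembly yourself.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply by 10 directly. In hand-written assembly this would be a
 * single MUL, which the timings above put at 3-4 cycles. */
uint16_t times10_mul(uint16_t x) {
    return (uint16_t)(x * 10u);
}

/* The classic "shift plus add" ladder: x*8 + x*2. It needs several
 * shift, move and add instructions to reach the same result. */
uint16_t times10_ladder(uint16_t x) {
    return (uint16_t)((x << 3) + (x << 1));
}

/* Extract the high nibble by shifting (SHR reg, 4) instead of
 * indexing a 256-entry lookup table with XLAT. */
uint8_t high_nibble(uint8_t v) {
    return (uint8_t)(v >> 4);
}

/* Both multiplication forms agree, including on 16-bit wraparound. */
void times10_selfcheck(void) {
    uint16_t samples[] = { 0, 1, 7, 255, 6554, 65535 };
    for (unsigned i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
        assert(times10_mul(samples[i]) == times10_ladder(samples[i]));
    }
}
```

Since both multiplication forms return identical results, the choice between them is purely a question of cycle count and code size.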
You can study the instruction timings in detail on the WSdev wiki.
There are also some additional tricks you can take advantage of:
- Avoid far calls between functions - branches are expensive, and far branches are significantly more expensive. If you're calling a far function from another far function in the same section, use IA16_CALL_LOCAL over IA16_CALL to save a few cycles.
- Try word-aligning loop labels by prepending them with .align 2, 0x90 - this generates a NOP opcode if necessary. This may help a little.
wswan/guide/optimization.1767185366.txt.gz · Last modified: by asie
