The problem#

Several years ago, I was in a ROM hacking IRC room where another regular Alchemic was reverse engineering the text system of the SNES game Shadowrun. He figured it out and wrote a python script to decompress the text but had some questions about why it was designed the way it was. So we're going to walk through figuring out how the code works, with some help from his notes, and try to understand the design.

If you don't want spoilers and would rather try to reverse engineer it yourself, just read up to the end of the Trace format section and see how much you can figure out on your own.

Getting started#

The starting point for our reverse engineering will be an instruction-level trace of the text decompression routine which Alchemic provided in the IRC discussion.

To interpret it, we will need documentation on the SNES's processor. That unofficial documentation of the 65xx processor family is credited to Mark Ormston and is a surprisingly entertaining read for a processor manual due to its completeness and informal style that you don't expect for such a document; I actually got distracted just reading it while writing this post.

The specific 65xx variant used in the SNES is the 5A22 which based on the 65c816, so for the purposes of that document, we are looking at the 65c816.

Trace format#

The trace has one line for each instruction executed:

$80/FE49 9C 22 02    STZ $0222  [$7E:0222]   A:C2D5 X:001D Y:0008 P:envmxdIzc
$80/FE4C E2 20       SEP #$20                A:C2D5 X:001D Y:0008 P:envmxdIzc
$80/FE4E A0 00 00    LDY #$0000              A:C2D5 X:001D Y:0008 P:envMxdIzc
$80/FE51 C2 20       REP #$20                A:C2D5 X:001D Y:0000 P:envMxdIZc
$80/FE53 A9 00 00    LDA #$0000              A:C2D5 X:001D Y:0000 P:envmxdIZc
$80/FE56 AA          TAX                     A:0000 X:001D Y:0000 P:envmxdIZc

The first column is the program counter (PC), the location in the program code where the instruction is. After that is the machine code of the instruction in bytes followed by the assembly mnemonic for the instruction. For some of the instructions, there is an address in square brackets ([]) indicating the actual value for any computed addresses. At the end is the values of the registers and flags before the instruction is executed.

This part of the trace uses a feature of the 65c816 processor which is important to be aware of, which is the "Memory size flag" which is listed in the flags as m when clear (memory reads/writes are 16-bit) or M when set (memory reads/writes are 8-bit). SEP #$20 sets the flag putting the processor into 8-bit mode, and REP #$20 clears it, putting the processor into 16-bit mode.

Since this is a trace of what we expect to be a relatively simple and repetitive piece of code, the same instructions should be executed over and over again, which we will know by the PC in the first column being the same. We can search for other instances of the same instruction by searching for its PC, which is particularly useful for finding instances of a conditional branch that go the other way to better understand the branch.

Starting our analysis#

That code quoted above just clears all of the registers and the memory location $7E:0222, presumably because it expects to use those for its temporary values and doesn't want to get confused by whatever previous values they had.

Identifying the variables#

Since we don't have variable names, to get a high-level view of what the code is doing, we will first look at the reads from and writes to memory, and then try to figure out how they relate to each other. Those are done with the LDA ("LoaD Accumulator with value") and STA ("STore Accumulator into memory") instructions which transfer data between memory and the accumulator (register A).

The following sed command selects out all of the loads and stores (ignoring loads with arguments that start with # because that just loads a constant value):

$ sed -n '/\(LDA\|STA\)/s/[^SL]*\([SL][TD]A [^\#][^ []*\).*/\1/p' texttrace.txt | sort | uniq
LDA $9D8000,x
LDA [$CF]
STA $0222
STA $0224
STA $21A0,y
STA $CF

The loads#

Syntax#

There are loads from $9D8000,x and [$CF] (and also stores to $CF). The $9D8000,x syntax is the "Absolute Long with X Indexing" addressing mode, which means that it reads from an address in memory computed as a constant 24-bit address plus an offset in the X register, so $9D8000 is some sort of data structure which is indexed by some computed value. The [$CF] syntax is the "Direct Page Indirect Long" addressing mode, which means that an address is read from $CF, which is "Direct Page" addressing referring to memory location $0000CF. In other words, it deferences the pointer stored in temporary location CF.

One of those two has to be the actual data. As the text could be anywhere, and, in particular, there is likely more than 64 kibibytes of it, most likely LDA [$CF] loads the actual text data.

`$9D8000,x`#

Looking further into this, if look just before the LDA $9D8000,x instructions, we notice they occur after a conditional addition of 2 to the X register (by two INX "INcrement X" instructions in a row):

$ grep -F -B3 'LDA $9D8000,x' texttrace.txt 
$80/FE5B 90 02       BCC $02    [$FE5F]      A:0000 X:0000 Y:0000 P:eNvmxdIzC
$80/FE5D E8          INX                     A:0000 X:0000 Y:0000 P:eNvmxdIzC
$80/FE5E E8          INX                     A:0000 X:0001 Y:0000 P:envmxdIzC
$80/FE5F BF 00 80 9D LDA $9D8000,x[$9D:8002] A:0000 X:0002 Y:0000 P:envmxdIzC
--
...
--
$81/AED7 0E 24 02    ASL $0224  [$7E:0224]   A:04EC X:04EC Y:0000 P:envmxdIzC
$81/AEDA 6B          RTL                     A:04EC X:04EC Y:0000 P:envmxdIzc
$80/FE5B 90 02       BCC $02    [$FE5F]      A:04EC X:04EC Y:0000 P:envmxdIzc
$80/FE5F BF 00 80 9D LDA $9D8000,x[$9D:84EC] A:04EC X:04EC Y:0000 P:envmxdIzc
--
...

For now we are not worrying about what the BCC $02 is conditioned on, just that the lookup is a computation like that suggests that it is a lookup table of some sort, not the text data.

`[$CF]`#

Similarly looking around the uses of $CF, we notice that after every LDA [$CF], there's a double INC $CF a couple instructions later, indicating that $CF is pointing at a piece of memory being read linearly, making it a much better candidate for being the compressed text. Additionally, the very first line of the trace is STA $CF. The trace is for the text decompression function and it would make sense for it to take as input in the accumulator the pointer to the text to decompress.

The stores#

We already figured out $CF, but that leaves $0222, $0224 and $21A0,y. The first two are single locations (using the "Absolute" addressing mode), so they are temporary variables used in the computation while the last one is indexed by Y so it is an output to some range of memory.

`$21A0,y`#

Since we expect an output of the decompressed text, most likely $21A0,y is where the decompressed text gets written out to. As evidence, we see the writes are to adjacent increasing indexes:

$ grep -F -A1 'STA $21A0,y' texttrace.txt $80/FE6F 99 A0 21    STA $21A0,y[$7E:21A0]   A:6F59 X:04F8 Y:0000 P:envMxdIzC
$80/FE72 C8          INY                     A:6F59 X:04F8 Y:0000 P:envMxdIzC
--
$80/FE7A 99 A0 21    STA $21A0,y[$7E:21A1]   A:596F X:04F8 Y:0001 P:envMxdIzC
$80/FE7D C8          INY                     A:596F X:04F8 Y:0001 P:envMxdIzC
--
$80/FE6F 99 A0 21    STA $21A0,y[$7E:21A2]   A:2075 X:0374 Y:0002 P:envMxdIzC
$80/FE72 C8          INY                     A:2075 X:0374 Y:0002 P:envMxdIzC
--
...

Note that the M flag is set so this is writing only one byte at time, which is why there is only a single INY ("INcrement Y") instruction each time.

The temporaries `$0222` and `$0224`#

Here I have to admit I simplified a bit above: STA isn't the only instruction that can modify memory:

$ grep -F '$0222' texttrace.txt | cut -d' ' -f8-9 | sort | uniq -c
     40 DEC $0222
      3 STA $0222
      1 STZ $0222
$ grep -F '$0224' texttrace.txt | cut -d' ' -f8-9 | sort | uniq -c
     40 ASL $0224
      3 STA $0224

I included the counts with uniq -c to highlight the fact that most of the mutations of these values are in fact not through STA instructions. $0222 gets decremented (DEC) a lot and $0224 gets bit shifted a lot (ASL is "Arithmetic Shift Left").

The fact that both counts are 40 seems unlikely to be a coincidence. Looking closer we notice that the DEC $0222 is followed by a BPL ("Branch on PLus") instruction and usually branches straight to the ASL $0224 instruction but sometimes does some additional work in the middle:

$81/AEC0 CE 22 02    DEC $0222  [$7E:0222]   A:04AC X:04AC Y:0006 P:envmxdIzC
$81/AEC3 10 12       BPL $12    [$AED7]      A:04AC X:04AC Y:0006 P:eNvmxdIzC
...
$81/AED7 0E 24 02    ASL $0224  [$7E:0224]   A:04AC X:04AC Y:0006 P:envmxdIzC

BPL branches if the n ("Negative") flag is clear. The DEC section in the processor documentation says the effect on the n flag is

n - Was the high bit of the result set?

In other words, n is set if the new value is negative as the high bit is the sign bit. So $0222 is a value that counts the times ASL $0224 has been executed and does some extra work whenever it hits zero.

Setting `$0222`#

To find out what it is counting down from, we look for STA $0222 instructions:

$ grep -F -B1 'STA $0222' texttrace.txt 
$81/AEC0 CE 22 02    DEC $0222  [$7E:0222]   A:0000 X:0000 Y:0000 P:envmxdIZc
$81/AEC3 10 12       BPL $12    [$AED7]      A:0000 X:0000 Y:0000 P:eNvmxdIzc
$81/AEC5 48          PHA                     A:0000 X:0000 Y:0000 P:eNvmxdIzc
$81/AEC6 A9 0F 00    LDA #$000F              A:0000 X:0000 Y:0000 P:eNvmxdIzc
$81/AEC9 8D 22 02    STA $0222  [$7E:0222]   A:000F X:0000 Y:0000 P:envmxdIzc
--
...

It always gets set to the constant #$000F or 15 whenever it goes below zero, so it is counting off 16 shifts, which makes sense for a 16-bit value being stored in $0224 being shifted off one bit at a time.

Setting `$0224`#

So what is that value stored in $0224?

$ grep -F -B2 'STA $0224' texttrace.txt 
$81/AECC A7 CF       LDA [$CF]  [$9D:C2D5]   A:000F X:0000 Y:0000 P:envmxdIzc
$81/AECE EB          XBA                     A:B0F0 X:0000 Y:0000 P:eNvmxdIzc
$81/AECF 8D 24 02    STA $0224  [$7E:0224]   A:F0B0 X:0000 Y:0000 P:eNvmxdIzc
--
...

It's the value loaded from [$CF], which we guessed above is the pointer to the text data. So $0224 contains the current two bytes of compressed text data being decompressed.

That XBA ("XBA eXchange B (upper 8bits of accumulator) and A (lower 8bits of accumulator)") instruction in there is the confusing part which is why I went to analyze this in the first place: to try to understand the code well enough to figure out why the XBA is there as it seems like it would be simpler to just store the data in the ROM such that the XBA instruction is unneeded.

Analyzing the logic#

Now that we have some concept of the what variables do, it should be easier to look at the code in terms of functions.

`JSL $81AEC0`#

After the header code quoted above, the first instruction is a JSL ("Jump Subroutine Long"), which is the 65c816 instruction for calling a function. It pairs with RTL ("ReTurn from subroutine Long") which returns from a function called using JSL.

$ grep -Fc 'JSL $81AEC0' texttrace.txt
40

We see that number 40 again, so we expect this to be where the DEC $0222 and ASL $0224 instructions happen. Let's look through the code line-by-line:

$80/FE57 22 C0 AE 81 JSL $81AEC0[$81:AEC0]   A:0000 X:0000 Y:0000 P:envmxdIZc
$81/AEC0 CE 22 02    DEC $0222  [$7E:0222]   A:0000 X:0000 Y:0000 P:envmxdIZc
$81/AEC3 10 12       BPL $12    [$AED7]      A:0000 X:0000 Y:0000 P:eNvmxdIzc

This enters the function and runs the following code only if the value of $0222 on entry is zero (by decrementing it and then checking if it is negative afterward). If $0222 is positive entering the function, then execution skips ahead to the ASL instruction.

$81/AEC5 48          PHA                     A:0000 X:0000 Y:0000 P:eNvmxdIzc

PHA ("PusH Accumulator") stashes the accumulator on the stack so the accumulator can be used for something else. The value in the accumulator isn't actually used, so in this case it would be okay to clobber it, but the compiler doesn't know that so it stashes it anyway. This is good evidence that this code was generated by a compiler and not hand-written.

$81/AEC6 A9 0F 00    LDA #$000F              A:0000 X:0000 Y:0000 P:eNvmxdIzc
$81/AEC9 8D 22 02    STA $0222  [$7E:0222]   A:000F X:0000 Y:0000 P:envmxdIzc

Set $0222 to 0xf (15 in decimal).

$81/AECC A7 CF       LDA [$CF]  [$9D:C2D5]   A:000F X:0000 Y:0000 P:envmxdIzc
$81/AECE EB          XBA                     A:B0F0 X:0000 Y:0000 P:eNvmxdIzc
$81/AECF 8D 24 02    STA $0224  [$7E:0224]   A:F0B0 X:0000 Y:0000 P:eNvmxdIzc
$81/AED2 E6 CF       INC $CF    [$00:00CF]   A:F0B0 X:0000 Y:0000 P:eNvmxdIzc
$81/AED4 E6 CF       INC $CF    [$00:00CF]   A:F0B0 X:0000 Y:0000 P:eNvmxdIzc

Set $0224 to the big endian two byte value pointed at by $CF (using XBA to switch the endianness) and increment $CF to point to the next two byte value.

$81/AED6 68          PLA                     A:F0B0 X:0000 Y:0000 P:eNvmxdIzc

PLA ("PuLl Accumulator") restores the accumulator from the stack; as mentioned above, this has no effect on the behavior of the program because that value is never used.

$81/AED7 0E 24 02    ASL $0224  [$7E:0224]   A:0000 X:0000 Y:0000 P:envmxdIZc
$81/AEDA 6B          RTL                     A:0000 X:0000 Y:0000 P:eNvmxdIzC

The function ends with the ASL $0224 instruction. The relevant effect of the shift is that the bit shifted off is stored in the c ("carry") flag (and moves the next bit into position for the next run of that instruction).

Summary of `JSL $81AEC0`#

The JSL $81AEC0 function puts the next bit of the string of bits stored at $CF (the pointer to the compressed text in the accumulator when the function is called) into the carry flag.

Use of `JSL $81AEC0`#

Now that we know what that function does, we can skip over it when analyzing the rest of the code. After it returns, the next instruction is BCC ("Branch on Carry Clear") which reads the carry flag that it sets.

$80/FE5B 90 02       BCC $02    [$FE5F]      A:0000 X:0000 Y:0000 P:eNvmxdIzC
$80/FE5D E8          INX                     A:0000 X:0000 Y:0000 P:eNvmxdIzC
$80/FE5E E8          INX                     A:0000 X:0001 Y:0000 P:envmxdIzC
$80/FE5F BF 00 80 9D LDA $9D8000,x[$9D:8002] A:0000 X:0002 Y:0000 P:envmxdIzC

If the carry flag is clear, it skips over the INX instructions, so the logic is if the next bit of the compressed text is set then increment the index into the lookup table $9D8000 of two byte values.

Use of lookup table#

The first use of the loaded value isn't actually reading the value: it's reading the flags. LDA sets the n ("negative") flag if the value loaded is negative, and the next instruction is a BPL which branches conditioned on that flag:

$80/FE63 10 F1       BPL $F1    [$FE56]      A:04F0 X:04EC Y:0000 P:envmxdIzc

If the loaded value is positive (high bit clear), then it's stored in the X register as the new offset for the next read and then it restarts the loop with reading the next bit:

$80/FE56 AA          TAX                     A:04F0 X:04EC Y:0000 P:envmxdIzc
$80/FE57 22 C0 AE 81 JSL $81AEC0[$81:AEC0]   A:04F0 X:04F0 Y:0000 P:envmxdIzc

If the loaded value is negative (high bit set), then a lot more happens:

$80/FE65 29 7F 7F    AND #$7F7F              A:D9EF X:04F8 Y:0000 P:eNvmxdIzc
$80/FE68 EB          XBA                     A:596F X:04F8 Y:0000 P:envmxdIzc

Clean up the loaded value by swaping the bytes and clearing the high bits of both of them.

$80/FE69 E2 20       SEP #$20                A:6F59 X:04F8 Y:0000 P:envmxdIzc
$80/FE6B C9 0A       CMP #$0A                A:6F59 X:04F8 Y:0000 P:envMxdIzc
$80/FE6D F0 11       BEQ $11    [$FE80]      A:6F59 X:04F8 Y:0000 P:envMxdIzC
$80/FE6F 99 A0 21    STA $21A0,y[$7E:21A0]   A:6F59 X:04F8 Y:0000 P:envMxdIzC
$80/FE72 C8          INY                     A:6F59 X:04F8 Y:0000 P:envMxdIzC

Consider just the low byte; if it's not 0x0A (ASCII newline), then append that byte to the string at $21A0 where Y is the current length of that output string.

$80/FE73 EB          XBA                     A:6F59 X:04F8 Y:0001 P:envMxdIzC

Swap to look at the other byte.

$80/FE74 F0 DB       BEQ $DB    [$FE51]      A:596F X:04F8 Y:0001 P:envMxdIzC

If that byte is zero, then skip back to $FE51 which is in the initialization code. Specifically, it sets the lookup index X back to zero and puts the processor back in 16-bit mode before reading the next bit of the input:

$80/FE51 C2 20       REP #$20                A:C2D5 X:001D Y:0000 P:envMxdIZc
$80/FE53 A9 00 00    LDA #$0000              A:C2D5 X:001D Y:0000 P:envmxdIZc
$80/FE56 AA          TAX                     A:0000 X:001D Y:0000 P:envmxdIZc
$80/FE57 22 C0 AE 81 JSL $81AEC0[$81:AEC0]   A:0000 X:0000 Y:0000 P:envmxdIZc

If the byte is not zero, then it's handled the same as the first byte:

$80/FE76 C9 0A       CMP #$0A                A:596F X:04F8 Y:0001 P:envMxdIzC
$80/FE78 F0 06       BEQ $06    [$FE80]      A:596F X:04F8 Y:0001 P:envMxdIzC
$80/FE7A 99 A0 21    STA $21A0,y[$7E:21A1]   A:596F X:04F8 Y:0001 P:envMxdIzC
$80/FE7D C8          INY                     A:596F X:04F8 Y:0001 P:envMxdIzC

Finally it jumps back to $FE51 to continue reading the input with the X register cleared:

$80/FE7E 80 D1       BRA $D1    [$FE51]      A:596F X:04F8 Y:0002 P:envMxdIzC

Summary of use of lookup table#

Entries in the lookup table consist of two 16-bit values. The bit read from the input string is used to choose which of the values to read. The values are either pointers to elsewhere in the lookup table or characters to add to the output string. The second character may be null if there's only one character to decompress. Either character may be 0x0A to mark the end of the string.

The lookup table isn't best characterized as a table; it's a binary tree where X holds the pointer to the current node. The 16-bit values are tree nodes. The high bit is a flag distinguishing inner nodes from leaf nodes. Specifically, it's a Huffman coding tree, which is a standard form of text compression.

Why is there big endian data?#

Now that we know what every instruction in this trace is doing, we can try to figure out if there is some reason why some of the data is in big endian format requiring XBA instructions to fix it because the 65c816 is a little endian processor.

Strangely, while the compressed text and leaf nodes are both in big endian format and need XBA instructions after their loads, the internal pointers in the tree are in little endian format and do not. This suggests that the compression was done offline by some process that generated big endian data and the internal pointers in the tree were generated by the compiler.

As far as I can tell, the XBA instructions just waste time when the data could have been in little endian format in the first place. Which really just brings up another question: how did they manage to accidentally generate big endian data? Looking at the endianness support of different processors, most likely the Huffman encoding was done on a Motorola 68000 which was used as the processor for several desktop computers including the early Apple Macintosh.

A Weird Imagination

Shadowrun's text compression

The problem#

Getting started#

Trace format#

Starting our analysis#

Identifying the variables#

The loads#

Syntax#

`$9D8000,x`#

`[$CF]`#

The stores#

`$21A0,y`#

The temporaries `$0222` and `$0224`#

Setting `$0222`#

Setting `$0224`#

Analyzing the logic#

`JSL $81AEC0`#

Summary of `JSL $81AEC0`#

Use of `JSL $81AEC0`#

Use of lookup table#

Summary of use of lookup table#

Why is there big endian data?#

Comments

The problem#

Getting started#

Trace format#

Starting our analysis#

Identifying the variables#

The loads#

Syntax#

$9D8000,x#

[$CF]#

The stores#

$21A0,y#

The temporaries $0222 and $0224#

Setting $0222#

Setting $0224#

Analyzing the logic#

JSL $81AEC0#

Summary of JSL $81AEC0#

Use of JSL $81AEC0#

Use of lookup table#

Summary of use of lookup table#

Why is there big endian data?#

`$9D8000,x`#

`[$CF]`#

`$21A0,y`#

The temporaries `$0222` and `$0224`#

Setting `$0222`#

Setting `$0224`#

`JSL $81AEC0`#

Summary of `JSL $81AEC0`#

Use of `JSL $81AEC0`#