Parsing Lessons

Continuing on from the previous post, I spent some time trying to figure out how best to parse variable-length instructions, and figured I'd document the failures here.

Base assumptions

These tokens will be used for the examples below:

define token opcode8 (8)
  op8 = (0,7)
  op8_idx_bits = (2,3)
;
define token data8 (8)
  imm8 = (0,7)
;
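
To make the bit ranges concrete, here is a minimal Python sketch of how the fields carve up an opcode byte. It is not SLEIGH; the helper name field is made up for illustration, and it assumes bit 0 is the least significant bit of the token.

```python
def field(byte: int, lo: int, hi: int) -> int:
    """Extract bits lo..hi (inclusive) from a byte, assuming bit 0 is the LSB."""
    width = hi - lo + 1
    return (byte >> lo) & ((1 << width) - 1)

# The two opcodes used in the ADCA example further down.
for opcode in (0x73, 0x63):
    op8 = field(opcode, 0, 7)           # op8 = (0,7): the whole byte
    op8_idx_bits = field(opcode, 2, 3)  # op8_idx_bits = (2,3): two bits inside it
    print(f"op8={op8:#04x} op8_idx_bits={op8_idx_bits}")
```

Both opcodes have op8_idx_bits equal to 0, which is the case the examples below match on.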

Too much data consumed

For this example, consider the tables and constructor (i.e. instruction) below. I know these could really be done in one line, but if you're trying to make things modular and reusable, you'll see what I'm driving at:

imm_b:  "#"imm8 is imm8 { export *[const]:1 imm8; }

OP_b: is op8_idx_bits=0x00 ; imm_b { export imm_b; }
OP_b: is ...

:ADCA OP_b is (op8=0x73 | op8=0x63) ; OP_b { }

So I'd like OP_b to define what it exports based on the op8_idx_bits field of the opcode. Unfortunately, as defined above, this ends up consuming 3 bytes, as you can see on the ram 000002 line here.

overshooting disassembly

It first matches the opcode8 token because of the op8 field, and consumes those 8 bits from the data. It then attempts to match the op8_idx_bits field, but because OP_b sits after the ; (concatenation) operator, that field is read from the next 8 bits rather than from the opcode byte. If bits 2,3 of that next byte are 0, it consumes the whole byte and then exports the following 8 bits as the immediate data. This is because I've defined everything with the ; operator, so each piece is appended after the previous one. But just applying & here causes mismatches and fails to compile.
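
The overshoot can be modelled in a few lines of Python. This is only a rough sketch of the matching order described above, not how the SLEIGH compiler actually works; the function names and the test bytes 73 10 42 are made up for illustration.

```python
def field(byte, lo, hi):
    """Extract bits lo..hi (inclusive), assuming bit 0 is the LSB."""
    return (byte >> lo) & ((1 << (hi - lo + 1)) - 1)

def parse_concatenated(data):
    """Model of: (op8=0x73 | op8=0x63) ; OP_b, where OP_b is op8_idx_bits=0 ; imm_b."""
    pos = 0
    opcode = data[pos]; pos += 1      # first opcode8 token: the op8 constraint
    if opcode not in (0x73, 0x63):
        return None
    idx_byte = data[pos]; pos += 1    # second opcode8 token: op8_idx_bits read from the NEXT byte
    if field(idx_byte, 2, 3) != 0:
        return None
    imm = data[pos]; pos += 1         # data8 token: imm8
    return imm, pos

# Hypothetical bytes: the intent is "ADCA #0x10" in 2 bytes.
imm, consumed = parse_concatenated(bytes([0x73, 0x10, 0x42]))
print(f"imm={imm:#04x} consumed={consumed}")  # imm=0x42 consumed=3
```

The immediate comes from the third byte instead of the second, which is the same overshoot as in the listing.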

How to fix?

I fixed this by using the & operator and a context register. I added the definition below for a context register.

define context contextreg
  idx_reg=(0,2) # hold onto some data extracted
;

I was then able to adjust the instruction definition to reference the op8_idx_bits field and store its value in the idx_reg field of the context register. The table entry then states what it needs the context register's fields to match, and the instruction disassembles correctly. This can be expanded to handle the variations on the arguments.

OP_b: is idx_reg=0 & imm_b { export imm_b; }
OP_b: is idx_reg=...  

:ADCA OP_b is op8_idx_bits & (op8=0x73 | op8=0x63) ; OP_b [idx_reg=op8_idx_bits]{ }

The small test parses correctly after doing this.

working disassembly
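
For comparison, here is a rough Python model of the context-register version. Again, this is only a sketch of the idea, not of SLEIGH's internals; the dictionary standing in for contextreg, the function names, and the test bytes are assumptions for illustration.

```python
def field(byte, lo, hi):
    """Extract bits lo..hi (inclusive), assuming bit 0 is the LSB."""
    return (byte >> lo) & ((1 << (hi - lo + 1)) - 1)

def parse_with_context(data):
    """Model of: op8_idx_bits & (op8=0x73 | op8=0x63) ; OP_b [idx_reg=op8_idx_bits]."""
    pos = 0
    opcode = data[pos]; pos += 1                 # single opcode8 token
    if opcode not in (0x73, 0x63):
        return None
    context = {"idx_reg": field(opcode, 2, 3)}   # [idx_reg=op8_idx_bits]
    if context["idx_reg"] != 0:                  # OP_b: is idx_reg=0 & imm_b
        return None
    imm = data[pos]; pos += 1                    # data8 token: imm8
    return imm, pos

# Same hypothetical bytes as an "ADCA #0x10" followed by an unrelated byte.
imm, consumed = parse_with_context(bytes([0x73, 0x10, 0x42]))
print(f"imm={imm:#04x} consumed={consumed}")  # imm=0x10 consumed=2
```

Because the index bits are copied out of the opcode byte into the context, the table no longer needs a token of its own to test them, and only 2 bytes are consumed.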

I know, I know, this is actually a really contrived example, but you get the idea ;) .