Found an interesting post by someone here, which looks like an extension of your method: it builds tokens that represent more characters by letting tokens expand into other tokens. This, he claims, gives 50% compression!
If you were happy to go with an arbitrary bitstream rather than one char/token per byte, you could improve on this even more.
A character literal can be packed into 5 bits by assigning values like this:
0 = space
1-26 = a-z
27 = full stop
28 = comma
29 = apostrophe
30 = question mark
31 = escape code which modifies the next character literal, like this:
0 = hyphen
1-26 = A-Z (capitals)
27 = colon
28 = semicolon
29 = quote mark
30 = exclamation mark
31 = slash (or whatever)
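A minimal Python sketch of that 5-bit packing, just to make the escape mechanism concrete (the table values are the ones listed above; the names `BASE`, `ESCAPED` and `pack_chars` are my own):

```python
# Base 5-bit table: 0 = space, 1-26 = a-z, 27-30 = punctuation, 31 = escape.
BASE = {' ': 0, '.': 27, ',': 28, "'": 29, '?': 30}
for i, c in enumerate('abcdefghijklmnopqrstuvwxyz', start=1):
    BASE[c] = i

# Second table, reached via the escape code: 0 = hyphen, 1-26 = A-Z, etc.
ESCAPED = {'-': 0, ':': 27, ';': 28, '"': 29, '!': 30, '/': 31}
for i, c in enumerate('ABCDEFGHIJKLMNOPQRSTUVWXYZ', start=1):
    ESCAPED[c] = i

ESCAPE = 31  # code 31 in the base table modifies the next literal

def pack_chars(text):
    """Return the list of 5-bit values encoding `text`."""
    out = []
    for ch in text:
        if ch in BASE:
            out.append(BASE[ch])
        elif ch in ESCAPED:
            out.extend([ESCAPE, ESCAPED[ch]])  # escape, then second-table code
        else:
            raise ValueError(f"no code for {ch!r}")
    return out
```

So a capital costs two 5-bit codes, which is fine as long as capitals and the rarer punctuation are uncommon.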
Then, signify either a literal or a token like this:
* read 1 bit
* if 0, read 5 bits and treat as a character literal
* if 1, read 7 bits (or 8 if you want a bigger token table) and treat as a token
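In Python terms, that decode loop might look like this (a sketch only: the bit-list representation and the 7-bit token table are placeholders, and `decode_stream` is a name of my own invention):

```python
def read_bits(bits, pos, n):
    """Read n bits MSB-first from the list `bits`; return (value, new_pos)."""
    value = 0
    for i in range(n):
        value = (value << 1) | bits[pos + i]
    return value, pos + n

def decode_stream(bits, token_table):
    """Flag bit 0 -> 5-bit character literal; flag bit 1 -> 7-bit token index."""
    pos, out = 0, []
    while pos < len(bits):
        flag, pos = read_bits(bits, pos, 1)
        if flag == 0:
            if pos + 5 > len(bits):
                break  # trailing pad bits
            code, pos = read_bits(bits, pos, 5)
            out.append(('literal', code))
        else:
            if pos + 7 > len(bits):
                break
            index, pos = read_bits(bits, pos, 7)
            out.append(('token', token_table[index]))
    return out
```

Widening the token read from 7 to 8 bits doubles the table to 256 entries at the cost of one bit per token.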
Reading arbitrary numbers of bits from a bitstream is really easy in 6502; I can post some code from a decompression routine of mine if you're interested.
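For anyone who wants the idea without the 6502 source, here is the same technique sketched in Python: buffer one byte and shift bits out MSB-first, refilling when the mask runs dry, which mirrors the usual 6502 trick of shifting a buffered byte out through the carry flag (this is my own illustration, not the routine mentioned above):

```python
class BitReader:
    """Read arbitrary bit counts, MSB-first, from a byte string."""

    def __init__(self, data):
        self.data = data
        self.byte_pos = 0
        self.current = 0
        self.bit_mask = 0  # empty buffer: forces a refill on first read

    def read(self, n):
        value = 0
        for _ in range(n):
            if self.bit_mask == 0:          # buffer empty: fetch next byte
                self.current = self.data[self.byte_pos]
                self.byte_pos += 1
                self.bit_mask = 0x80
            bit = 1 if self.current & self.bit_mask else 0
            value = (value << 1) | bit
            self.bit_mask >>= 1
        return value
```

On the 6502 the inner step is just an ASL of the buffered byte and a ROL of the result, so it really is only a few instructions.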
Like this, simple untokenised text compresses to 75% of its original size (each literal costs 6 bits, 1 flag plus 5 data, instead of 8), and with tokens, even more.
Also, remember there are other tricks, e.g. omitting spaces after punctuation (they can be inferred at the rendering stage), and omitting capitalisation at the start of a paragraph or after a full stop.
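As a quick sketch of that inference at the rendering stage (assuming the space after sentence punctuation and the following capital were both stripped before encoding; `render` is a hypothetical name):

```python
def render(text):
    """Re-insert spaces after sentence punctuation and recapitalise."""
    out = []
    capitalise = True  # start of paragraph is always capitalised
    for ch in text:
        if capitalise and ch.isalpha():
            ch = ch.upper()
            capitalise = False
        out.append(ch)
        if ch in '.!?':
            out.append(' ')      # inferred space, never stored
            capitalise = True    # next letter must be a capital
    return ''.join(out).rstrip()
```

Every sentence boundary then costs just the 5-bit full stop instead of full stop, space and an escaped capital.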