BASIC tokenisation
http://www.retrosoftware.co.uk/forum/viewtopic.php?f=73&t=332

Author:  elaverick [ Mon Aug 24, 2009 1:58 am ]
Post subject:  BASIC tokenisation

I know when BASIC programs get saved to disk they are tokenized and not just saved as the raw listing. My question is how are they tokenized? Is there any documentation to say how the code is actually stored? Ideally I'd like to see some examples of code being tokenized and detokenized... anyone able to help?

Author:  MartinB [ Mon Aug 24, 2009 7:21 am ]
Post subject:  Re: BASIC tokenisation

You may already know the token values, but if not, the following may help as a starting point.

BASIC has a keyword/token look-up table in ROM which lists, in alphabetical order, each keyword in ASCII followed by two bytes: the first is the token (bit 7 is always set, i.e. $80 or greater) and the second contains special internal control bits, if any, pertaining to the keyword. The BASIC 1 table starts at $8060, BASIC 2 at $8071 and HiBASIC at $B871.

You could print them out with a simple BASIC peek-and-display loop, using the token byte (top bit set, remember) as the cue to step on to the next entry, or use a PC hex editor or similar on the ROM image(s).
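For anyone working from a ROM image on a PC, here is a minimal sketch of that table walk in Python rather than BASIC. It assumes only the entry layout described above (ASCII keyword, token byte with bit 7 set, one control byte). The function name and the sample bytes are made up for illustration: AND as $80 comes from this thread, the ABS value should be checked against a token list, and both control bytes are placeholders. The sketch does not know where the real table ends, so on a real ROM you would need to stop it yourself.

```python
from typing import Iterator, Tuple

ROM_BASE = 0x8000       # language ROMs are paged in at $8000
BASIC2_TABLE = 0x8071   # table address quoted above for BASIC 2


def walk_keyword_table(data: bytes, start: int = 0) -> Iterator[Tuple[str, int, int]]:
    """Yield (keyword, token, flags) for each entry from `start` onwards."""
    pos = start
    while pos < len(data):
        end = pos
        # The keyword is plain ASCII; the first byte with bit 7 set is the token.
        while end < len(data) and data[end] < 0x80:
            end += 1
        if end == pos or end + 1 >= len(data):
            break  # no keyword text, or the entry runs off the end of the data
        yield data[pos:end].decode("ascii"), data[end], data[end + 1]
        pos = end + 2  # skip the token and control bytes to reach the next keyword


# Hand-built stand-in for a slice of ROM (see the notes above on these values).
sample = b"AND\x80\x00ABS\x94\x00"
entries = list(walk_keyword_table(sample))
for keyword, token, flags in entries:
    print(f"{keyword:8} token=${token:02X} flags=${flags:02X}")
```

With a real BASIC 2 image loaded into `rom`, the starting offset into the file would be `BASIC2_TABLE - ROM_BASE`, i.e. `walk_keyword_table(rom, 0x71)`.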

Martin

Author:  jgharston [ Tue Aug 25, 2009 3:57 pm ]
Post subject:  Re: BASIC tokenisation

BASIC programs aren't tokenised when they're saved; they're tokenised when they are entered into memory.
http://mdfs.net/Docs/Comp/BBCBasic/Tokens gives a list of BBC BASIC tokens, and http://mdfs.net/Info/Comp/BBCBasic/ProgTips/ has code to use BASIC's inbuilt tokeniser.

Author:  MartinB [ Tue Aug 25, 2009 8:08 pm ]
Post subject:  Re: BASIC tokenisation

Hi Jonathan and welcome to RS :D

Quote:
BASIC programs aren't tokenised when they're saved...

That wording is perhaps slightly misleading. If you were to type 10 X = Y AND Z and then SAVE"TEST" followed by *DUMP TEST, you would see Blah, Blah, 80, Blah Blah in that the AND will have been tokenised to $80.

I agree it's been tokenised as it was entered into memory but the saved program will therefore always be tokenised on disc unless *SPOOL and LIST were used.
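To make the *DUMP output less of a "Blah, Blah", here is a rough Python sketch that walks a saved program and expands token bytes. It assumes the usual Acorn 6502 layout, which is not spelled out in this thread: each line is stored as $0D, line number high byte, line number low byte, a length byte that counts the whole line including those four bytes, then the line body, and the program ends with $0D $FF. The helper names are invented, the token table holds only the AND entry mentioned above, and line numbers after GOTO and friends (which BASIC encodes specially) are not handled.

```python
from typing import Dict, Iterator, Tuple

# Only AND -> $80 comes from the thread; a real table needs every keyword.
TOKENS: Dict[int, str] = {0x80: "AND"}


def make_line(number: int, body: bytes) -> bytes:
    """Build one stored line: $0D, number hi, number lo, length, body."""
    length = 4 + len(body)  # the length byte counts the 4-byte header too
    return bytes([0x0D, number >> 8, number & 0xFF, length]) + body


def iter_lines(program: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (line_number, body) until the $0D $FF end marker."""
    pos = 0
    while pos + 1 < len(program) and program[pos] == 0x0D:
        if program[pos + 1] == 0xFF:
            return  # end-of-program marker
        if pos + 3 >= len(program) or program[pos + 3] < 4:
            return  # truncated or corrupt line header
        number = (program[pos + 1] << 8) | program[pos + 2]
        length = program[pos + 3]
        yield number, program[pos + 4:pos + length]
        pos += length


def detokenise(body: bytes) -> str:
    """Expand token bytes outside string literals; copy other bytes as text."""
    out, in_string = [], False
    for b in body:
        if b == 0x22:  # a double quote toggles string-literal state
            in_string = not in_string
        if b >= 0x80 and not in_string:
            out.append(TOKENS.get(b, f"<{b:02X}>"))  # unknown tokens shown as <hex>
        else:
            out.append(chr(b))
    return "".join(out)


# The example line from this post, built by hand rather than read from disc.
saved = make_line(10, b"X = Y \x80 Z") + b"\x0D\xFF"
listing = [f"{number} {detokenise(body)}" for number, body in iter_lines(saved)]
print(listing)
```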

I know you know all that but I think it read as if saved programs aren't tokenised.

Martin

Author:  elaverick [ Tue Aug 25, 2009 11:32 pm ]
Post subject:  Re: BASIC tokenisation

Yep, it's a semantic distinction, but I know what you both mean... my bad for lacking precision in my original post.

Thanks for the info to both of you. I'm hoping to add BASIC to my IDE so I'm trying to understand how it's crunching everything down. It doesn't look too complex, but it's been ages since I've done anything this close to real programming (too used to having my hand held and being walked through things these days).

Author:  RichTW [ Wed Aug 26, 2009 7:16 am ]
Post subject:  Re: BASIC tokenisation

I'd never seen this trick of using EVAL to force tokenisation... interesting, thanks!

Author:  ThomasHarte [ Fri Aug 28, 2009 9:19 pm ]
Post subject:  Re: BASIC tokenisation

Sorry, slightly slow on the uptake...

ElectrEm contains a full and severable BASIC 2 tokeniser and detokeniser — it's used whenever you save or load a BASIC program as plaintext. The files are attached, in case they're useful. They're originally GPL2, but if you'd rather have them under a different licence then name it and I'll consider it (actually, I'll almost certainly say 'yes', but I don't want this text becoming a licence in itself, if you see what I mean).

EDIT: oh, one consequence of this code being written for the emulator is that it expects to tokenise into and detokenise from a RAM dump, not a file on disk. That should be trivial to adjust: just skip the bit where it grabs the value of PAGE to determine where to read/write BASIC from, along with the sanity checks on that and on TOP.

Attachments:
BASIC tokeniser.zip [6.37 KiB]

Author:  elaverick [ Sat Aug 29, 2009 8:31 pm ]
Post subject:  Re: BASIC tokenisation

Hi Thomas,
Thanks for the code, it looks pretty good. Am I right in thinking that ImportBASIC brings a standard TXT file into memory and tokenises it, while ExportBASIC takes a BAS memory dump and detokenises it out as plain text? If that is indeed the case then I think I probably want to be taking it the other way (i.e. taking plain text from a char* and then writing that out as tokenized BASIC). To keep things nice and modular I want to keep the program as a separate EXE, so I think I just need to get my head around exactly what's going on and then flip the process.

Author:  ThomasHarte [ Sun Aug 30, 2009 12:17 am ]
Post subject:  Re: BASIC tokenisation

Yep. You need to call SetupBASICTables() once so that it can prepare some look-up stuff (specifically, a sort-of hash table for faster string -> token lookup). Then, if you were to use it exactly as ElectrEm does, you'd call ImportBASIC to have an ASCII file tokenised and stuffed into memory in the format the BASIC 2 ROM expects, and ExportBASIC to have the in-memory contents written out as an ASCII text file. Each returns 'true' on success and 'false' on failure. If there is an error, you can call GetBASICError to get a textual description of it.
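As a rough illustration of what a "sort-of hash table" for string -> token lookup might look like, here is a small Python sketch. It is not ElectrEm's actual data structure: bucketing by first letter, trying the longest keyword first, and all of the function names are assumptions made for the example. The keyword list is a small subset, the control bits in the ROM table are ignored, and input is assumed to be plain ASCII. Skipping a whole variable name when no keyword matches at its start is my understanding of how BASIC avoids tokenising the AND in a name like BRAND.

```python
from typing import Dict, List, Optional, Tuple

Lookup = Dict[str, List[Tuple[str, int]]]

# A handful of BASIC 2 keywords; a real table lists them all.
KEYWORDS: Dict[str, int] = {
    "AND": 0x80, "END": 0xE0, "ENDPROC": 0xE1, "GOTO": 0xE5, "PRINT": 0xF1,
}


def build_lookup(keywords: Dict[str, int]) -> Lookup:
    """Bucket keywords by first letter, longest first within each bucket."""
    buckets: Lookup = {}
    for word, token in keywords.items():
        buckets.setdefault(word[0], []).append((word, token))
    for bucket in buckets.values():
        bucket.sort(key=lambda entry: len(entry[0]), reverse=True)
    return buckets


def match_keyword(lookup: Lookup, text: str, pos: int) -> Optional[Tuple[str, int]]:
    """Return the longest (keyword, token) starting at text[pos], if any."""
    for word, token in lookup.get(text[pos], []):
        if text.startswith(word, pos):
            return word, token
    return None


def tokenise(lookup: Lookup, text: str) -> bytes:
    """Replace keywords outside string literals with their token bytes."""
    out, pos, in_string = bytearray(), 0, False
    while pos < len(text):
        hit = None if in_string else match_keyword(lookup, text, pos)
        if hit:
            out.append(hit[1])
            pos += len(hit[0])
        elif not in_string and text[pos].isalpha():
            # Not a keyword: copy the whole name so keywords inside it survive.
            while pos < len(text) and (text[pos].isalnum() or text[pos] == "_"):
                out.append(ord(text[pos]))
                pos += 1
        else:
            if text[pos] == '"':
                in_string = not in_string
            out.append(ord(text[pos]))
            pos += 1
    return bytes(out)


LOOKUP = build_lookup(KEYWORDS)
print(tokenise(LOOKUP, 'PRINT "AND" : ENDPROC'))
print(tokenise(LOOKUP, "BRAND AND X"))
```

Trying the longest candidate first is what stops ENDPROC from being split into the END token followed by the letters PROC.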

I guess you'd want to use a version of ImportBASIC, as it does the ASCII-to-tokenised conversion. The only internal function you should need to change is int my_fgetc, which acts just like fgetc but ignores \r. Adapt it to get the next ASCII character from wherever you'd prefer. Then kill the stdio fopen/fclose/etc. stuff, seed the Addr variable within ImportBASIC to 0 rather than to PAGE, kill the TOP sanity check, and everything should work.
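The my_fgetc swap is small enough to sketch. The version below is in Python rather than C and is not the ElectrEm function. It only shows the same idea of handing out one character at a time from an in-memory string while dropping carriage returns. The class name, method name and EOF value are all assumptions for the example.

```python
EOF = -1  # stand-in for the end-of-input value fgetc reports


class TextSource:
    """Hands out one character code at a time from an in-memory string."""

    def __init__(self, text: str) -> None:
        self._text = text
        self._pos = 0

    def getc(self) -> int:
        """Return the next character code, skipping carriage returns."""
        while self._pos < len(self._text):
            ch = self._text[self._pos]
            self._pos += 1
            if ch != "\r":
                return ord(ch)
        return EOF  # keeps returning EOF once the text is exhausted


source = TextSource("10 PRINT\r\n20 END\r\n")
codes = list(iter(source.getc, EOF))  # call getc() until it returns EOF
print(bytes(codes))
```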

That is, assuming tokenised programs have exactly the same layout on disk/tape as in memory.

Author:  elaverick [ Sun Aug 30, 2009 6:48 pm ]
Post subject:  Re: BASIC tokenisation

ThomasHarte wrote:
That is, assuming tokenised programs have exactly the same layout on disk/tape as in memory.


Apparently they do :D

I need to tweak it a bit as it's currently producing fixed-size 16 KB files, which is a bit wasteful, but it works perfectly in the emulator. As soon as I've got that sorted I'll post it here.
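One possible way to trim the padding, sketched in Python: follow the per-line length bytes until the end-of-program marker and cut the file there. This assumes the usual Acorn 6502 layout (each line starts with $0D, two line-number bytes and a length byte that counts the whole line, and the program ends with $0D $FF), which is not confirmed anywhere in this thread, and the function names are hypothetical.

```python
def program_length(image: bytes) -> int:
    """Return the byte count up to and including the $0D $FF end marker."""
    pos = 0
    while pos + 1 < len(image) and image[pos] == 0x0D:
        if image[pos + 1] == 0xFF:
            return pos + 2
        if pos + 3 >= len(image) or image[pos + 3] < 4:
            break  # truncated or corrupt line header
        pos += image[pos + 3]  # length byte: offset to the next line's $0D
    raise ValueError("no end-of-program marker found")


def trim_image(image: bytes) -> bytes:
    """Drop everything after the end-of-program marker."""
    return image[:program_length(image)]


# "10 PRINT" as one stored line, then the end marker, padded out to 16 KB.
line = bytes([0x0D, 0x00, 0x0A, 0x05, 0xF1])
padded = (line + b"\x0D\xFF").ljust(16 * 1024, b"\x00")
trimmed = trim_image(padded)
print(len(padded), "->", len(trimmed))
```

Following the length bytes, rather than searching for the first $0D $FF pair, avoids stopping early if those two bytes happen to occur inside a line.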

Author:  elaverick [ Sun Aug 30, 2009 10:04 pm ]
Post subject:  Re: BASIC tokenisation

Ok here we go. This is the first release of BASICTok based on Thomas's excellent ElectrEm code.

I've uploaded both the EXE and the source (ready to compile in Visual Studio 2008). I've done next to nothing with the code, so it's still under the GPL2 licence. At the moment it's a plaintext-to-BBC converter only, but it won't take long to add the code to go the other way. It also shouldn't be too tricky to add support for generating BASIC IV code (as far as I can tell it just uses a different footer code, not a header, for some crazy reason I can't quite figure out).

My *nix box is screwed at the moment, so if anyone wants to compile a *nix version of this then please do feel free.

[EDIT] Temporarily removed the files... done something dumb before I uploaded that's knackered it...

Author:  elaverick [ Sun Aug 30, 2009 10:55 pm ]
Post subject:  Re: BASIC tokenisation

Ho hum, looks like I've done something really daft somewhere along the way and I now seem to be getting garbage in the tokenisation. I'll have a proper look at it tomorrow and see if I can post a fixed version.

All times are UTC [ DST ]