 Post subject: BASIC tokenisation
PostPosted: Mon Aug 24, 2009 1:58 am 

Joined: Sun Aug 09, 2009 6:55 pm
Posts: 13
I know when BASIC programs get saved to disk they are tokenized and not just saved as the raw listing. My question is how are they tokenized? Is there any documentation to say how the code is actually stored? Ideally I'd like to see some examples of code being tokenized and detokenized... anyone able to help?


 Post subject: Re: BASIC tokenisation
PostPosted: Mon Aug 24, 2009 7:21 am 

Joined: Fri Apr 25, 2008 7:55 pm
Posts: 147
You may already know the token values, but if not, this may help as a starting point.

BASIC has a keyword/token look-up table in ROM which lists, in roughly alphabetical order (grouped by initial letter), each keyword in ASCII followed by two bytes: the first is the token (bit 7 is always set, i.e. $80 or greater) and the second contains special internal control bits, if any, pertaining to the keyword. The BASIC 1 table starts at $8060, BASIC 2 at $8071 and HiBasic at $B871.

You could use a simple BASIC peek-and-display loop to print them out, using the token byte (top bit set, remember) to spot where each keyword ends and the next entry begins, or use a PC hex editor or similar on the ROM image(s).
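
For anyone working from a ROM image on a PC rather than on the machine itself, the table layout described above can be walked with a few lines of Python. This is only a sketch under that description: walk_keyword_table is a hypothetical name, the sample bytes are hand-built for illustration rather than read from a real ROM, and the flag bytes are zero placeholders.

```python
def walk_keyword_table(rom: bytes, start: int, limit: int):
    """Yield (keyword, token, flags) for entries laid out as keyword, token, flags."""
    pos = start
    end = min(limit, len(rom))
    while pos < end:
        name_start = pos
        # Keyword characters are plain ASCII, so bit 7 is clear.
        while pos < end and rom[pos] < 0x80:
            pos += 1
        # Stop if there is no keyword, or no room for token + flags bytes.
        if pos == name_start or pos + 1 >= end:
            break
        keyword = rom[name_start:pos].decode("ascii")
        yield keyword, rom[pos], rom[pos + 1]
        pos += 2  # skip the token byte and the flags byte


# Hand-built sample, not a real ROM dump.
sample = b"AND\x80\x00ABS\x94\x00"
entries = list(walk_keyword_table(sample, 0, len(sample)))
```

With a real image you would pass the table's offset within the file as start and a sensible upper bound as limit, since the sketch has no way of knowing where the table ends.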

Martin


 Post subject: Re: BASIC tokenisation
PostPosted: Tue Aug 25, 2009 3:57 pm 

Joined: Sat Aug 22, 2009 7:45 pm
Posts: 34
BASIC programs aren't tokenised when they're saved; they're tokenised when they are entered into memory.
http://mdfs.net/Docs/Comp/BBCBasic/Tokens gives a list of BBC BASIC tokens, and http://mdfs.net/Info/Comp/BBCBasic/ProgTips/ has code to use BASIC's inbuilt tokeniser.


 Post subject: Re: BASIC tokenisation
PostPosted: Tue Aug 25, 2009 8:08 pm 

Joined: Fri Apr 25, 2008 7:55 pm
Posts: 147
Hi Jonathan and welcome to RS :D

Quote:
BASIC programs aren't tokenised when they're saved...

That wording is perhaps slightly misleading. If you were to type 10 X = Y AND Z and then SAVE"TEST" followed by *DUMP TEST, you would see a byte of 80 in amongst the rest of the line, because the AND will have been tokenised to $80.

I agree it's been tokenised as it was entered into memory but the saved program will therefore always be tokenised on disc unless *SPOOL and LIST were used.
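
To make that concrete, here is a minimal Python sketch of how such a line might end up as bytes. It is not the ROM's algorithm or ElectrEm's code: crunch and encode_line are hypothetical names, the table holds only the AND token quoted above, and the record layout (a $0D byte, line number high then low, a length byte counting the whole record, then the crunched text) is an assumption about the 6502 BBC BASIC format that should be checked against the documentation linked earlier in the thread.

```python
TOKENS = {"AND": 0x80}  # only the value quoted in this thread; a real table has many more


def crunch(text: str) -> bytes:
    """Naively replace keywords outside string literals with their token bytes."""
    out = bytearray()
    i = 0
    in_string = False
    while i < len(text):
        ch = text[i]
        if ch == '"':
            in_string = not in_string
        if not in_string:
            for word, token in TOKENS.items():
                if text.startswith(word, i):
                    out.append(token)
                    i += len(word)
                    break
            else:
                # No keyword starts here: copy the character unchanged.
                out.append(ord(ch))
                i += 1
        else:
            # Inside a string literal nothing is tokenised.
            out.append(ord(ch))
            i += 1
    return bytes(out)


def encode_line(number: int, text: str) -> bytes:
    """Build one line record under the assumed layout: 0D, hi, lo, length, text."""
    body = crunch(text)
    return bytes([0x0D, number >> 8, number & 0xFF, len(body) + 4]) + body


line = encode_line(10, "X = Y AND Z")
print(line.hex(" "))  # 0d 00 0a 0d 58 20 3d 20 59 20 80 20 5a
```

The real tokeniser has many more rules (REM and DATA, line numbers after GOTO, variable names that begin with a keyword, the treatment of spaces), so a genuine *DUMP may differ in detail; the point is only that AND becomes the single byte 80.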

I know you know all that but I think it read as if saved programs aren't tokenised.

Martin


 Post subject: Re: BASIC tokenisation
PostPosted: Tue Aug 25, 2009 11:32 pm 

Joined: Sun Aug 09, 2009 6:55 pm
Posts: 13
Yep, it's a semantic distinction, but I know what you both mean... my bad for lacking precision in my original post.

Thanks for the info to both of you. I'm hoping to add BASIC to my IDE so I'm trying to understand how it's crunching everything down. It doesn't look too complex, but it's been ages since I've done anything this close to real programming (too used to having my hand held and being walked through things these days).


 Post subject: Re: BASIC tokenisation
PostPosted: Wed Aug 26, 2009 7:16 am 

Joined: Mon Jan 07, 2008 6:46 pm
Posts: 380
Location: Málaga, Spain
Never seen this trick to use EVAL to force tokenisation... interesting, thanks!


 Post subject: Re: BASIC tokenisation
PostPosted: Fri Aug 28, 2009 9:19 pm 

Joined: Mon Aug 04, 2008 1:54 pm
Posts: 55
Sorry, slightly slow on the uptake...

ElectrEm contains a full and severable BASIC 2 tokeniser and detokeniser — it's used whenever you save or load a BASIC program as plaintext. The files are attached, in case they're useful. They're originally GPL2, but if you'd rather have them under a different licence then name it and I'll consider it (actually, I'll almost certainly say 'yes', but I don't want this text becoming a licence in itself, if you see what I mean).

EDIT: oh, an explicit part of this code being used in the emulator is that it expects to tokenise and detokenise from a RAM dump, not a file on disk. Should be trivial to adjust — just skip the bit where it grabs the value of PAGE to determine where to read/write BASIC from and the sanity checks on that and on TOP.


Attachments:
BASIC tokeniser.zip [6.37 KiB]
 Post subject: Re: BASIC tokenisation
PostPosted: Sat Aug 29, 2009 8:31 pm 

Joined: Sun Aug 09, 2009 6:55 pm
Posts: 13
ThomasHarte wrote:
Sorry, slightly slow on the uptake...

ElectrEm contains a full and severable BASIC 2 tokeniser and detokeniser — it's used whenever you save or load a BASIC program as plaintext. The files are attached, in case they're useful. They're originally GPL2, but if you'd rather have them under a different licence then name it and I'll consider it (actually, I'll almost certainly say 'yes', but I don't want this text becoming a licence in itself, if you see what I mean).

EDIT: oh, an explicit part of this code being used in the emulator is that it expects to tokenise and detokenise from a RAM dump, not a file on disk. Should be trivial to adjust — just skip the bit where it grabs the value of PAGE to determine where to read/write BASIC from and the sanity checks on that and on TOP.


Hi Thomas,
Thanks for the code, it looks pretty good. Am I right in thinking that ImportBASIC brings a standard TXT file into memory and tokenises it, while ExportBASIC takes a BAS memory dump and detokenises it out as plain text? If that is indeed the case then I think I probably want to be taking it the other way (i.e. taking plain text from a char* and then writing that out as tokenised BASIC). To keep things nice and modular I want to keep the program as a separate EXE, so I think I just need to get my head around exactly what's going on and then flip the process.


 Post subject: Re: BASIC tokenisation
PostPosted: Sun Aug 30, 2009 12:17 am 

Joined: Mon Aug 04, 2008 1:54 pm
Posts: 55
elaverick wrote:
Hi Thomas,
Thanks for the code, it looks pretty good. Am I right in thinking that ImportBASIC brings a standard TXT file into memory and tokenises it, while ExportBASIC takes a BAS memory dump and detokenises it out as plain text? If that is indeed the case then I think I probably want to be taking it the other way (i.e. taking plain text from a char* and then writing that out as tokenised BASIC). To keep things nice and modular I want to keep the program as a separate EXE, so I think I just need to get my head around exactly what's going on and then flip the process.

Yep. You need to call SetupBASICTables() once so that it can prepare some look-up stuff (specifically, a sort-of hash table for faster string -> token lookup). Then, if you were to use it exactly as ElectrEm does, you'd call ImportBASIC to have an ASCII file tokenised and stuffed into memory in the format the BASIC 2 ROM expects, and ExportBASIC to have the in-memory contents written out as an ASCII text file. Each returns 'true' on success and 'false' on failure. If there is an error, you can call GetBASICError to get a textual description of it.
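
As an illustration of what a "sort-of hash table" for string -> token lookup could look like, here is one plausible shape in Python: keywords bucketed by first letter, with longer keywords tried first so that ENDPROC is not mistaken for END. This is a guess at the general idea, not ElectrEm's actual structure; build_lookup and match_keyword are hypothetical names, and the token values for END and ENDPROC are taken from the standard BBC BASIC token list and are worth verifying against the list linked earlier.

```python
from collections import defaultdict


def build_lookup(tokens: dict) -> dict:
    """Group (keyword, token) pairs by first letter, longest keyword first."""
    buckets = defaultdict(list)
    for word, token in tokens.items():
        buckets[word[0]].append((word, token))
    for entries in buckets.values():
        entries.sort(key=lambda entry: len(entry[0]), reverse=True)
    return dict(buckets)


def match_keyword(lookup: dict, text: str, pos: int):
    """Return (keyword, token) for the keyword starting at pos, or None."""
    for word, token in lookup.get(text[pos:pos + 1], []):
        if text.startswith(word, pos):
            return word, token
    return None


lookup = build_lookup({"AND": 0x80, "END": 0xE0, "ENDPROC": 0xE1})
```

Only the bucket for the current character is searched, which is where the speed-up over scanning the whole keyword list comes from.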

I guess you'd want to use a version of ImportBASIC, as it does the ASCII to tokenised conversion. The only internal function you should need to change is int my_fgetc, which acts just like fgetc but ignores \r. Adapt it to get the next ASCII character from wherever you'd prefer. Then kill the stdio fopen/fclose/etc stuff, seed the Addr variable within ImportBASIC to 0 rather than to PAGE, and kill the TOP sanity check, and everything should work.
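
The my_fgetc idea (return the next character, silently skip \r, signal end of input) is easy to show in isolation. The sketch below is Python rather than the C of the attached files, and StringSource, getc and EOF are hypothetical names chosen for illustration; it only demonstrates the behaviour described above for an in-memory string instead of a FILE*.

```python
EOF = -1  # stand-in for C's EOF


class StringSource:
    """Serve characters from an in-memory string, skipping carriage returns."""

    def __init__(self, text: str):
        self._data = text.encode("ascii")
        self._pos = 0

    def getc(self) -> int:
        """Return the next byte value that is not \\r, or EOF when exhausted."""
        while self._pos < len(self._data):
            byte = self._data[self._pos]
            self._pos += 1
            if byte != 0x0D:
                return byte
        return EOF


src = StringSource('10 PRINT "HI"\r\n')
chars = []
while (c := src.getc()) != EOF:
    chars.append(c)
```

Here chars ends with a single newline (10) and contains no 13, which is all the tokeniser should need from its input routine.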

That is, assuming tokenised programs have exactly the same layout on disk/tape as in memory.


 Post subject: Re: BASIC tokenisation
PostPosted: Sun Aug 30, 2009 6:48 pm 

Joined: Sun Aug 09, 2009 6:55 pm
Posts: 13
ThomasHarte wrote:
That is, assuming tokenised programs have exactly the same layout on disk/tape as in memory.


Apparently they do :D

I need to tweak it a bit, as it's currently producing fixed-size 16 KB files, which is a bit wasteful, but it works perfectly in the emulator. As soon as I've got that sorted I'll post it here.
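
One way to trim a fixed-size image is to walk the line records until the end-of-program marker and cut the file there. The Python sketch below assumes the layout commonly documented for 6502 BBC BASIC (each line is $0D, line number high, line number low, a length byte covering the whole record, then the text, and the program ends with $0D $FF); that layout is an assumption here rather than something confirmed in this thread, and program_length and trim are hypothetical names.

```python
def program_length(image: bytes) -> int:
    """Return the byte count up to and including the assumed 0D FF end marker."""
    pos = 0
    while pos + 1 < len(image) and image[pos] == 0x0D:
        if image[pos + 1] == 0xFF:
            return pos + 2
        if pos + 3 >= len(image) or image[pos + 3] < 4:
            break  # truncated or corrupt line record
        pos += image[pos + 3]  # length byte covers the whole record
    raise ValueError("no end-of-program marker found")


def trim(image: bytes) -> bytes:
    """Drop everything after the end-of-program marker."""
    return image[:program_length(image)]


# One five-byte line record, the end marker, then padding up to 16 KB.
padded = bytes([0x0D, 0x00, 0x0A, 0x05, 0xF1, 0x0D, 0xFF]) + bytes(16384 - 7)
trimmed = trim(padded)
```

In this example trimmed is 7 bytes long; anything that does not look like a line record raises ValueError instead of being silently truncated.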


 Post subject: Re: BASIC tokenisation
PostPosted: Sun Aug 30, 2009 10:04 pm 

Joined: Sun Aug 09, 2009 6:55 pm
Posts: 13
Ok here we go. This is the first release of BASICTok based on Thomas's excellent ElectrEm code.

I've uploaded both the EXE and source (ready to compile in Visual Studio 2008). I've done next to nothing with the code, so it's still under the GPL2 license. At the moment it's a plaintext to BBC converter only, but it won't take long to add the code to go the other way. It also shouldn't be too tricky to add support for generating BASIC IV code as well: as far as I can tell it just uses a different footer code (not a header, for some crazy reason I can't quite figure).

My *nix box is screwed at the moment, so if anyone wants to compile a *nix version of this then please do feel free.

[EDIT] Temporarily removed the files... done something dumb before I uploaded that's knackered it...


 Post subject: Re: BASIC tokenisation
PostPosted: Sun Aug 30, 2009 10:55 pm 

Joined: Sun Aug 09, 2009 6:55 pm
Posts: 13
Ho hum, looks like I've done something really daft somewhere along the way and I now seem to be getting garbage in the tokenisation. I'll have a proper look at it tomorrow and see if I can post a solved version.

