All times are UTC [ DST ]




Post subject: BASIC USR() vs. ASM JSR
Posted: Sun Oct 28, 2012 9:23 am
Joined: Tue Oct 19, 2010 7:45 am
Posts: 25
Hi all, some help would be appreciated.

To get back into assembler I'm trying to code some routines to display hexagonal maps in MODE 2. It's coming along nicely, but I can't figure out this behaviour.

The following routine (with some data parts left out) displays a hexagonal sprite at one of 150 screen positions (numbered 0 to 149), which are arranged in a hexagonal grid.

Due to the EOR it nicely removes the hex again when called a second time. This way it displays the map defined from line 1510 onwards.

To speed things up I want to move the map drawing to assembler as well. As a first test to see how fast that would be, I made the test loop at line 1320, which I presumed would behave the same way. However, somehow the sprite reference seems to get mixed up in this case (see screenshot).

So can anybody tell me what the USR call at line 1480 is doing to make everything (seemingly) correct? I'm not using the stack at the moment, so I presume I don't have to save registers as usual.



Code:
>LIST
   10MODE2
   20REMVDU28,16,31,19,0
   30DIMMC% &500
   40FORopt%=0TO2STEP2
   50P%=MC%
   60[
   70OPT opt%
   80.hexer  ROL A:STA hexnr
   90        TXA:CLC:ROL A:TAX:BCCloscr
  100       \Carry is set, so use hscreen position
  110.hiscr  LDA hscreen,X:STA &81:INX:LDA hscreen,X:STA&80:JMPldspr
  120.loscr  LDA screen,X:STA &81:INX:LDA screen,X:STA&80
  130.ldspr  LDX hexnr:LDA spstrt,X:STA &82:INX:LDA spstrt,X:STA&83
  140:
  150.main  \Loop for 6 rows
  160        LDA#&06
  170        STA &84
  180.row   \Loop for 6 blocks
  190        LDX #&06
  200.blk    LDY #&00
  210        LDA (&82),Y:EOR (&80),Y:STA (&80),Y:INY
  220        LDA (&82),Y:EOR (&80),Y:STA (&80),Y:INY
  230        LDA (&82),Y:EOR (&80),Y:STA (&80),Y:INY
  240        LDA (&82),Y:EOR (&80),Y:STA (&80),Y
  250       \Increase screen base address by 8 (skip 4 bytes)
  260        CLC:LDA #&08:ADC &80:STA &80:BCC spradd
  270        LDA #&00:ADC &81:STA &81
  280       \Increase sprite base address by 4
  290.spradd CLC:LDA #&04:ADC &82:STA &82:BCC nxtblk
  300        LDA #&00:ADC &83:STA &83
  310.nxtblk DEX:BNE blk
  320       \Check last nibble of screen address
  330        LDA &80:AND #&0F:CMP #&0C:BEQ fulrow
  340        LDA &80:AND #&0F:CMP #&04:BEQ fulrow
  350       \Nibble=0 or 8, so subtract &2C
  360        SEC:LDA &80:SBC #&2C:STA &80:BCS nxtrow
  370        LDA &81:SBC #&00:STA &81
  380        JMP nxtrow
  390.fulrow CLC:LDA #&4C:ADC &80:STA &80:LDA #&02:ADC &81:STA &81
  400.nxtrow DEC &84:BNE row
  410        RTS
  420:
  430.spstrt
  440EQUW water:EQUW plains:EQUW hills:EQUW desert:EQUW city
  450.hexnr   EQUB &00       \Hex to be plotted
  460:
  470\6 rows of 6 blocks of 4 bytes top to bottom
  480\block is 4 bytes bottom to top
  490.plains
  500EQUD&00000000:EQUD&0C040400:EQUD&0C0C0C00
  510EQUD&0C0C0400:EQUD&0C080800:EQUD&00000000
  520EQUD&04040000:EQUD&0C0C0C08:EQUD&0C0C0C0C
  530EQUD&0C0C080C:EQUD&0C0C0C0C:EQUD&08080000
  540EQUD&0C0C0C04:EQUD&0C0C0C04:EQUD&0C080C0C
  550EQUD&0C0C0C0C:EQUD&0C0C080C:EQUD&0C0C0C08
  560EQUD&040C0C0C:EQUD&0C0C080C:EQUD&0C0C0C0C
  570EQUD&0C040C0C:EQUD&0C0C080C:EQUD&080C0C0C
  580EQUD&00000404:EQUD&0C080C0C:EQUD&0C0C0C0C
  590EQUD&0C0C0C0C:EQUD&0C0C0C0C:EQUD&00000808
  600EQUD&00000000:EQUD&0004040C:EQUD&00040C0C
  610EQUD&000C0C08:EQUD&0008080C:EQUD&00000000
  620.water
<SNIP>
  750.hills
<SNIP>
  880.city
<SNIP>
 1010.desert
<SNIP>
 1140.screen \vertical rows of 10 hexes
 1150EQUD&80370030:EQUD&8046003F:EQUD&8055004E:EQUD&8064005D:EQUD&8073006C
<SNIP>
 1280.hscreen
 1290EQUD&6075E06D
 1300EQUD&0C3C8C34:EQUD&0C4B8C43:EQUD&0C5A8C52:EQUD&0C698C61:EQUD&0C788C70
 1310EQUD&B0393032:EQUD&B0483041:EQUD&B0573050:EQUD&B066305F:EQUD&B075306E
 1311:
 1320.test LDA#100:STA&85
 1330.dal  LDA#1:LDX&85:JSRhexer:DEC&85:BNEdal:RTS
 1340]
 1350NEXTopt%
 1360CLS
 1370DIMMAP%(10,15)
 1380FORY%=0TO9:FORX%=0TO14:READ MAP%(Y%,X%):NEXTX%:NEXTY%
 1390FORY%=0TO9:FORX%=0TO14:I%=X%*10+Y%:PROChex(I%,MAP%(Y%,X%)):NEXTX%:NEXTY%
 1400REMFORi%=0TO149:PROChex(i%,0):NEXTi%
 1410REMCALL(test)
 1420:
 1430END
 1440:
 1450DEFPROChex(pos%,type%)
 1460LOCALA%,X%,Y%,R%,C%
 1470A%=type%:X%=pos%:Y%=0
 1480R%=USR(hexer)
 1490ENDPROC
 1500:
 1510DATA 0,0,0,0,1,1,1,1,2,2,2,2,1,1,1
 1520DATA 0,0,0,0,0,1,1,2,2,2,3,3,1,2,3
 1530DATA 0,0,0,0,0,0,1,1,2,2,2,3,1,2,3
 1540DATA 1,0,0,0,0,0,0,1,1,2,2,2,1,2,3
 1550DATA 1,0,0,0,0,0,0,1,1,4,4,2,2,2,2
 1560DATA 1,1,1,0,0,0,0,0,1,1,4,4,1,2,2
 1570DATA 2,1,1,1,1,0,0,0,1,1,4,1,1,1,2
 1580DATA 2,2,1,1,1,1,1,0,1,1,1,0,0,0,2
 1590DATA 3,2,2,2,1,1,1,1,0,1,1,0,0,2,2
 1600DATA 3,3,2,2,2,1,2,1,0,1,0,1,0,0,2
>


Attachments:
faultyhex.png [134.54 KiB]

Posted: Sun Oct 28, 2012 1:56 pm
Joined: Tue Oct 19, 2010 7:45 am
Posts: 25
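Found the culprit! It always helps to formulate the question :|

The first ROL A rolls in whatever is in the carry flag, and when hexer is reached via JSR from the test loop the carry can be left set by the previous pass. BASIC's USR loads the carry from the low bit of C% on entry, and PROChex declares C% as LOCAL (so it is 0), which is why that path worked. Putting a CLC at the start fixed it.

However, if anyone has pointers on how to speed up the routine, let me know. Although displaying the map through machine code is much faster, I'm afraid it still won't make for a smooth refresh.

Now to move everything to Sideways RAM and make it an OSWORD.....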


Posted: Mon Oct 29, 2012 2:06 pm
Joined: Mon Jan 07, 2008 6:46 pm
Posts: 380
Location: Málaga, Spain
Hi! Looking nice! :)

ASL A is a left-shift which shifts in a zero rather than the carry flag, so this is preferable to CLC:ROL A.
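
To see why the stray carry matters, here is a minimal Python model of the two shifts (a sketch, not emulator code; the function names `rol` and `asl` are just illustrative). The hex type is doubled to index the 2-byte entries of the spstrt table, so an odd result points into the middle of an entry.

```python
def rol(a, carry):
    """Model of 6502 ROL A: the old carry enters bit 0, old bit 7 becomes the new carry."""
    result = ((a << 1) | carry) & 0xFF
    return result, (a >> 7) & 1


def asl(a):
    """Model of 6502 ASL A: a zero enters bit 0, old bit 7 becomes the new carry."""
    return (a << 1) & 0xFF, (a >> 7) & 1


# Hex type 1 should give table offset 2 (the second 2-byte entry).
print(rol(1, 0))  # (2, 0)  carry clear on entry: correct offset
print(rol(1, 1))  # (3, 0)  carry set on entry: offset lands mid-entry
print(asl(1))     # (2, 0)  ASL ignores the incoming carry
```

With the carry set, offset 3 would pair the high byte of one sprite address with the low byte of the next, which fits the mixed-up sprites in the screenshot.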

I can see a few additional improvements, but nothing major.

Instead of:
Code:
  250       \Increase screen base address by 8 (skip 4 bytes)
  260        CLC:LDA #&08:ADC &80:STA &80:BCC spradd
  270        LDA #&00:ADC &81:STA &81
  280       \Increase sprite base address by 4
  290.spradd
do:
Code:
  250       \Increase screen base address by 8 (skip 4 bytes)
  260        CLC:LDA #&08:ADC &80:STA &80:BCC spradd
  270        INC &81
  280       \Increase sprite base address by 4
  290.spradd

(and similar for the block below)

These lines:
Code:
  320       \Check last nibble of screen address
  330        LDA &80:AND #&0F:CMP #&0C:BEQ fulrow
  340        LDA &80:AND #&0F:CMP #&04:BEQ fulrow
can, I think, be more simply expressed as:
Code:
  320       \Check last nibble of screen address
  330        LDA &80:AND #&07:CMP #&04:BEQ fulrow
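
For anyone who wants to convince themselves of the mask trick, a brute-force check in Python over every possible low byte (a sketch; the function names are made up for the comparison):

```python
def original_test(lo):
    """Lines 330/340 as posted: branch if the low nibble is &0C or &04."""
    nibble = lo & 0x0F
    return nibble == 0x0C or nibble == 0x04


def simplified_test(lo):
    """Single test: mask off bit 3 as well, so &04 and &0C both become &04."""
    return (lo & 0x07) == 0x04


mismatches = [lo for lo in range(256) if original_test(lo) != simplified_test(lo)]
print(mismatches)  # []
```

&04 is %0100 and &0C is %1100, so they differ only in bit 3, and masking with &07 discards exactly that bit.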


Then, at line 360, a similar optimisation to the first one:
Code:
  360        SEC:LDA &80:SBC #&2C:STA &80:BCS nxtrow
  370        LDA &81:SBC #&00:STA &81
becomes:
Code:
  360        SEC:LDA &80:SBC #&2C:STA &80:BCS nxtrow
  370        DEC &81
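
Both high-byte shortcuts (INC after an add, DEC after a subtract) can be checked against plain 16-bit arithmetic. The Python below is a rough model of the instruction sequences, assuming binary (non-decimal) mode; the helper names are illustrative only.

```python
def add_to_pointer(lo, hi, n):
    """Model of CLC:LDA #n:ADC lo:STA lo:BCC skip:INC hi"""
    total = lo + n
    lo = total & 0xFF
    if total > 0xFF:            # carry set, so the BCC is not taken
        hi = (hi + 1) & 0xFF    # INC hi
    return lo, hi


def sub_from_pointer(lo, hi, n):
    """Model of SEC:LDA lo:SBC #n:STA lo:BCS skip:DEC hi"""
    diff = lo - n
    lo = diff & 0xFF
    if diff < 0:                # borrow: carry clear, so the BCS is not taken
        hi = (hi - 1) & 0xFF    # DEC hi
    return lo, hi


def check(n):
    """Compare both shortcuts with true 16-bit arithmetic for every pointer value."""
    for ptr in range(0x10000):
        lo, hi = ptr & 0xFF, ptr >> 8
        alo, ahi = add_to_pointer(lo, hi, n)
        slo, shi = sub_from_pointer(lo, hi, n)
        if ((ahi << 8) | alo) != ((ptr + n) & 0xFFFF):
            return False
        if ((shi << 8) | slo) != ((ptr - n) & 0xFFFF):
            return False
    return True


print(check(0x08), check(0x04), check(0x2C))  # True True True
```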


And finally, if it gets to fulrow, we know C is already set (from the CMP earlier), so we can save a CLC by changing:
Code:
  390.fulrow CLC:LDA #&4C:ADC &80:STA &80:LDA #&02:ADC &81:STA &81
to:
Code:
  390.fulrow LDA #&4B:ADC &80:STA &80:LDA #&02:ADC &81:STA &81
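
The reasoning here is that CMP leaves C set whenever A is greater than or equal to the operand, so a taken BEQ always arrives with C=1, and ADC then adds that extra 1. A small Python model (a sketch assuming binary mode; `adc` and the two `fulrow_*` names are illustrative) confirms the two versions agree for every screen pointer:

```python
def adc(a, operand, carry):
    """Model of 6502 ADC in binary mode: returns (result, carry_out)."""
    total = a + operand + carry
    return total & 0xFF, total >> 8


def fulrow_original(lo, hi):
    """CLC:LDA #&4C:ADC &80:STA &80:LDA #&02:ADC &81:STA &81"""
    lo, c = adc(0x4C, lo, 0)
    hi, _ = adc(0x02, hi, c)
    return lo, hi


def fulrow_optimised(lo, hi):
    """LDA #&4B:ADC &80:... entered with the carry still set by the CMP."""
    lo, c = adc(0x4B, lo, 1)
    hi, _ = adc(0x02, hi, c)
    return lo, hi


same = all(
    fulrow_original(p & 0xFF, p >> 8) == fulrow_optimised(p & 0xFF, p >> 8)
    for p in range(0x10000)
)
print(same)  # True
```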


If you could find an efficient way to do so, i.e. with little overhead for the non-true case, it might be worth checking if you're plotting the top or bottom row, and if so only plotting the middle 4 byte columns (as the first and last will be blank in the case of a hexagonal sprite). This'd only be worthwhile though if you could incur little overhead in the other rows, otherwise the extra checks would negate any optimisation gained.


Posted: Mon Oct 29, 2012 2:48 pm
Joined: Tue Oct 19, 2010 7:45 am
Posts: 25
Tnx, gonna give those tips a go. :D

Actually I already pondered leaving out the corner blocks. The added benefit would be that the definitions would actually align to pages (2 sprites per page).
When that's the case the sprite base address calculation would not have to take the high byte into account if the sprite start address is correctly aligned.
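
The arithmetic behind "2 sprites per page", as a quick Python sketch. It assumes one 4-byte block is dropped at each of the four corners, which is what makes the numbers come out to half a page; the constant names are just for illustration.

```python
BYTES_PER_BLOCK = 4
ROWS, BLOCKS_PER_ROW = 6, 6
CORNER_BLOCKS = 4          # assumption: one block removed per corner
PAGE = 256

full_sprite = ROWS * BLOCKS_PER_ROW * BYTES_PER_BLOCK         # sprite as listed
trimmed_sprite = full_sprite - CORNER_BLOCKS * BYTES_PER_BLOCK
print(full_sprite, trimmed_sprite, PAGE // trimmed_sprite)    # 144 128 2

# With sprites at offsets 0 and 128 in a page, every block start stays in
# the same page, so adding 4 to the low byte never needs the high byte.
block_offsets = [start + 4 * i
                 for start in (0, trimmed_sprite)
                 for i in range(trimmed_sprite // BYTES_PER_BLOCK)]
print(max(block_offsets))  # 252
```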

I'd have to count the cycles though to see if it will actually speed things up.

Of course the speed doesn't matter that much on a fixed map, but I'd like this to window a much larger map and do some kind of scrolling. If the hex plotting routine is moved to SWR and can be called through an osword, the map can be whatever a 2nd processor can hold.


Posted: Mon Oct 29, 2012 4:20 pm
Joined: Mon Jan 07, 2008 6:46 pm
Posts: 380
Location: Málaga, Spain
Yep, that's a good idea, and if you align the sprite data to page boundaries, you won't incur the 1-cycle penalty that LDA (zp),Y takes when the indexed address crosses a page boundary. I'd be inclined to handle the top row and bottom row as special cases outside of the main loop, maybe just as a subroutine which is called before and afterwards.
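
The extra cycle applies when adding Y to the low byte of the pointer carries into the high byte. A short Python sketch of that condition, using a hypothetical page address (&1900 is only an example, not where the sprites actually live) and assuming 128-byte sprites stored two per page:

```python
def crosses_page(base, y):
    """True if base + y falls in a different 256-byte page than base."""
    return (base & 0xFF) + y > 0xFF


PAGE_BASE = 0x1900      # hypothetical page-aligned sprite area
SPRITE_BYTES = 128      # assumed size once the corner blocks are removed

# Every 4-byte block start for the two sprites sharing this page.
aligned_bases = [PAGE_BASE + slot * SPRITE_BYTES + offset
                 for slot in range(2)
                 for offset in range(0, SPRITE_BYTES, 4)]

aligned_crossings = [(b, y) for b in aligned_bases for y in range(4)
                     if crosses_page(b, y)]
print(aligned_crossings)          # []
print(crosses_page(0x19FE, 3))    # True: a misaligned block would pay the penalty
```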


Posted: Tue Oct 30, 2012 7:54 am
Joined: Tue Oct 19, 2010 7:45 am
Posts: 25
Added your suggestions and they work fine.

I like your take on the nibble check. Had to write it down to see how that worked.

As expected it doesn't really have much visible impact on a complete screen refresh.
I expect removing the corners won't do that either.

First priority now is the SWR OSWORD code. Going through Bruce Smith's book.


Posted: Sun Nov 04, 2012 5:35 am
Joined: Sat Aug 22, 2009 7:45 pm
Posts: 34
pstnotpd wrote:
Now to move everything to Sideways Ram and make it an OSWORD.....
Make sure you use the correct layout for the control block:
BeebWiki:OSWORD
and make sure you don't clash with an existing OSWORD:
BeebWiki:OSWORDs
It's best practice to float the proposed OSWORD number and parameter block layout on the BBC Micro Mailing List before developing the code.


Posted: Sun Nov 04, 2012 8:55 am
Joined: Tue Oct 19, 2010 7:45 am
Posts: 25
jgharston wrote:
It's best practice to float the proposed OSWORD number and parameter block layout on the BBC Micro Mailing List before developing the code.

I've been a "lurker" on the mailing list for quite a while now ;)

For testing I'm going with OSWORD &65 as in Bruce Smith's book. I first want to get it working.
And by working I mean it should be possible to show hexes from the ARM7 Scheme interpreter. 8-)

For now I managed to remove the 4*4 byte blocks at the corners like RichTW suggested, so the hex definitions are now (and must now be) nicely page-aligned, two per page.

And I'm pleasantly surprised the code seems to work fine in all 20K modes (0-2). For the "wargame" purpose I have in mind it might be better to go with detailed mode 0 hexes.....

To be cont'd

