www.retrosoftware.co.uk
http://www.retrosoftware.co.uk/forum/

Line drawing: the horror!
http://www.retrosoftware.co.uk/forum/viewtopic.php?f=73&t=344
Page 1 of 1

Author:  ThomasHarte [ Tue Sep 15, 2009 9:58 pm ]
Post subject:  Line drawing: the horror!

Please excuse my brevity; the forum's auto-logout cost me the original version of this post... the title refers to the fun of dealing with a non-linear framebuffer, of course!

I'm experimenting with vector graphics. My primary target machine is the Electron. My plan is currently to use a 256x256 window on the lefthand side of a standard Mode 4 (no CRTC = no real control here, alas) screen. My screen start address will always be 256-byte aligned, though I can't recall if the code I'm about to post still relies on that assumption. All clipping will occur before drawing, so none of the drawing parts need to worry about clipping and all coordinates can be dealt with safely as 8bit numbers.

Today's fun is a line plotter. It's Bresenham. So far I've only bothered to implement the case where the x delta is greater than the y delta and both x and y increase from the first specified coordinate to the second. I've written the following (BeebAsm-type) code:
Code:
.SetTable
   equb 128, 64, 32, 16, 8, 4, 2, 1

.GetTLAddressForXY
{
   lda #248
   equb $8f   ;
   equw temp1   ; "sax temp1", so temp1 = x&~7 (x offset to 8x8 block)

   lda #0
   sta temp2
   tya
   and #248   ; temp2:a = (y >> 3)*8
   pha

   pla
   asl a
   rol temp2
   asl a
   rol temp2
   asl a
   rol temp2   ; temp2:a = (y >> 3)*64

   sta temp3
   lda temp2
   sta temp4
   lda temp3   ; temp2:a copied to temp4:temp3

   asl a
   rol temp2
   asl a
   rol temp2   ; temp2:a = (y >> 3)*256

   adc temp3
   sta address   ; add low byte of (y >> 3)*64 to low byte of (y >> 3)*256, store as low byte of address to modify

   lda temp2
   adc temp4
   adc #(ScrBuffer DIV 256)
   sta address+1   ; add high byte of (y >> 3)*64 to high byte of (y >> 3)*256 and screen buffer start address
         ; store as high byte of address to modify

   rts
}

.DrawLine
{
   ; plots a line from (x1, y1) to (x2, y2)
   ; clobbers everything

   ; (0, 100) to (100, 150)

.xplyplxmajor
   ; loop for x increasing, y increasing, x difference greater than y difference

   ldx x1
   ldy y1
   jsr GetTLAddressForXY

   sec
   lda y2
   sbc y1
   sta y2      ; y2 = deltay

   sec
   lda x2
   sbc x1
   sta x2      ; x2 = deltax

   lsr a      ; a = error

   pha      ; perform y = y1&7, x1 &= 7
      lda y1
      and #7
      tay

      lda x1
      and #7
      sta x1

      tax
      lda SetTable, x
      sta pixelmask+1
   pla

   ldx x2      ; x = x2 = pixel count

.pixelloopx0
   ; plot current pixel
   pha
      lda (address),y
.pixelmask   ora #128
      sta (address),y
   pla

   ; subtract deltay from error
   sec
   sbc y2
   bcs noyinc   ; don't increment y if error >= 0

      clc
      adc x2      ; add deltax to error

      pha
      iny      ; increment y
      tya
      and #7
      bne noyoverflow   ; if y hasn't just hit 8 then everything is fine

         tay      ; if it has then add 320 to the storage address
         clc
         lda address
         adc #64
         sta address
         lda address+1
         adc #1
         sta address+1

   .noyoverflow
      pla

.noyinc
   inc x1      ; increment x

   lsr pixelmask+1   ; shift the pixel mask right one, so that the next plot will be to the next pixel along
   pha
   lda x1      ; check if x just crossed an 8 pixel boundary
   and #7
   bne noxoverflow

      sta x1   ; reset pixel mask to 128 if so, and add 8 to the current screen address
      lda #128
      sta pixelmask+1
      clc
      lda address
      adc #8
      sta address
      lda address+1
      adc #0
      sta address+1

.noxoverflow
   pla
   dex      ; decrease x count, go again if necessary
   bne pixelloopx0

   rts
}

GetTLAddressForXY is a hasty chop up of my individual pixel plotter. It may still contain some redundancy in this version, since here it is used only to get the address at the top of the 8-byte column of pixels.

My plan is to sort co-ordinates so that I'm always drawing in increasing x, and to hastily dynamically reprogram to deal with decreasing y. Then the y major greater than x major version of the code will be a recut of this stuff.

Before I do that, I thought I'd post it here for comment, since I figure probably there are a bunch of clever optimisations I am missing or maybe I've just taken the wrong approach in some little bits. I know that a Bresenham plotter is not exactly a complicated piece of code, but I'm not exactly overly experienced with 6502 programming, so it's possible that I just have some bad habits.

Anyone? Any thoughts?

Author:  ThomasHarte [ Tue Sep 15, 2009 11:42 pm ]
Post subject:  Re: Line drawing: the horror!

Definitely going to bed now, but I thought I'd post my quickly modified version to do left-to-right, deltax > deltay but with deltay either positive or negative. My current plan is to look at it again later in the week, see if I can optimise it a little more (still without spending too much memory), add the deltay > deltax equivalent, write a clearing version (which returns at least the pixels along a line to black and is allowed to reset any other pixels in addition) and put them in the wiki in case they're useful for anyone else and in the hope that if anyone else has any smart optimisation ideas or even just better overall code then they'll turf mine out.
Code:
.DrawLine
{
   ; plots a line from (x1, y1) to (x2, y2)
   ; clobbers everything

.xplyplxmajor
   ; loop for x increasing, y increasing, x difference greater than y difference

   ldx x1
   ldy y1
   jsr GetTLAddressForXY

   sec
   lda y2
   sbc y1
   sta y2      ; y2 = deltay
   bcc ynegative   ; dynamically reprogram loop depending on whether deltay is negative

   ldx #&c8
   stx yinc   ; set yinc to be in
   ldx #64
   stx yincadc1+1
   ldx #1
   stx yincadc2+1   ; add 320 to the base address each time an 8x8 is stepped out of in the y direction
   ldx #&29
   stx ycomp   ; set the potential cmp to 7 to be a duplicate of the preceding or
   jmp yend

.ynegative
   sec      ; negate deltay, since it needs to be an absolute value for the Bresenham loop
   lda #0
   sbc y2
   sta y2   
   ldx #&88
   stx yinc   ; set yin to be dey
   ldx #&c0
   stx yincadc1+1
   ldx #&fe
   stx yincadc2+1   ; add -320 to the base address each time an 8x8 is stepped out of in the y direction
   ldx #&c9
   stx ycomp   ; set the potential cmp to 7 to be an actual cmp - 8x8 was left just now if modified y is 7

.yend

   sec
   lda x2
   sbc x1
   sta x2      ; x2 = deltax

   lsr a      ; a = error

   pha      ; perform y = y1&7, x1 &= 7
      lda y1
      and #7
      tay

      lda x1
      and #7
      sta x1

      tax
      lda SetTable, x
      sta pixelmask+1
   pla

   ldx x2      ; x = x2 = pixel count

.pixelloopx0
   ; plot current pixel
   pha
      lda (address),y
.pixelmask   ora #128
      sta (address),y
   pla

   ; subtract deltay from error
   sec
   sbc y2
   bcs noyinc   ; don't increment y if error >= 0

      clc
      adc x2      ; add deltax to error

      pha
.yinc      dey      ; increment[/decrement] y
      tya
      and #7
.ycomp      cmp #7
      bne noyoverflow   ; if y hasn't just hit 8 then everything is fine

         tay      ; if it has then add 320 to the storage address
         clc
         lda address
.yincadc1      adc #64
         sta address
         lda address+1
.yincadc2      adc #1
         sta address+1

   .noyoverflow
      pla

.noyinc
   inc x1      ; increment x

   lsr pixelmask+1   ; shift the pixel mask right one, so that the next plot will be to the next pixel along
   pha
   lda x1      ; check if x just crossed an 8 pixel boundary
   and #7
   bne noxoverflow

      sta x1   ; reset pixel mask to 128 if so, and add 8 to the current screen address
      lda #128
      sta pixelmask+1
      clc
      lda address
      adc #8
      sta address
      lda address+1
      adc #0
      sta address+1

.noxoverflow
   pla
   dex      ; decrease x count, go again if necessary
   bne pixelloopx0

   rts
}


EDIT: in case anyone is wondering about some of the weird comments about yin, etc (hopefully fixed now, but maybe I missed some), it's that I hadn't yet disabled the new silent autocorrect in OS X's TextEdit.

Author:  RichTW [ Wed Sep 16, 2009 9:17 am ]
Post subject:  Re: Line drawing: the horror!

Hi Thomas,

Yeah, looks pretty good! But a few comments (on your first piece of code)...

ThomasHarte wrote:
Code:
.SetTable
   equb 128, 64, 32, 16, 8, 4, 2, 1

.GetTLAddressForXY
{
   lda #248
   equb $8f   ;
   equw temp1   ; "sax temp1", so temp1 = x&~7 (x offset to 8x8 block)
This won't run on a Master, so probably best avoided.

Quote:
Code:
   lda #0
   sta temp2
   tya
   and #248   ; temp2:a = (y >> 3)*8
   pha

   pla
   asl a
   rol temp2
   asl a
   rol temp2
   asl a
   rol temp2   ; temp2:a = (y >> 3)*64

   sta temp3
   lda temp2
   sta temp4
   lda temp3   ; temp2:a copied to temp4:temp3

   asl a
   rol temp2
   asl a
   rol temp2   ; temp2:a = (y >> 3)*256

   adc temp3
   sta address   ; add low byte of (y >> 3)*64 to low byte of (y >> 3)*256, store as low byte of address to modify

   lda temp2
   adc temp4
   adc #(ScrBuffer DIV 256)
   sta address+1   ; add high byte of (y >> 3)*64 to high byte of (y >> 3)*256 and screen buffer start address
         ; store as high byte of address to modify

   rts
}
I know this routine is only called once, but if you wanted to speed it up a bit more, I think the best way is from 2 32 byte lookup tables which return the LSB and MSB of the screen address of each row:
Code:
tya
lsr a
lsr a
lsr a
tax
lda scraddrlo,x
sta address
lda scraddrhi,x
sta address+1


Quote:
Code:
   ; subtract deltay from error
   sec
   sbc y2
   bcs noyinc   ; don't increment y if error >= 0

      clc
Obvious optimisation, but this clc isn't required.

Quote:
Code:
   lsr pixelmask+1   ; shift the pixel mask right one, so that the next plot will be to the next pixel along
   pha
   lda x1      ; check if x just crossed an 8 pixel boundary
   and #7
   bne noxoverflow
You can check whether it has crossed a boundary simply by testing the zero flag after the lsr pixelmask+1 (it will be zero if you have shifted the last bit out).

Quote:
Code:
      sta x1   ; reset pixel mask to 128 if so, and add 8 to the current screen address
      lda #128
      sta pixelmask+1
      clc
      lda address
      adc #8
      sta address
      lda address+1
      adc #0
      sta address+1
Don't need to do this - for 16-bit + 8-bit add, just write:
Code:
clc
lda address
adc #8
sta address
bcc nocarry
inc address+1
.nocarry

You don't actually need the clc either. You know carry will always be set following the lsr pixelmask+1 (it will have shifted out a 1), so omit the clc, and adc #7 instead.

Using pha...pla to preserve A around a block of code is not optimal; you get slightly better performance from storing to a zp location and loading it back - or even just writing:
Code:
sta preserveme+1
  ...
  ...
  ...
.preserveme
lda #0

I think that ensuring that you always plot from left to right (by ensuring startx <= endx) is a good plan, and then self-modifying the code to inc or dec in y. Another possible optimisation in the routine where dx > dy might be to only do the lda/ora/sta to screen memory when you are moving to the next byte, and meanwhile to build up an image of the entire byte to be ORed separately (because (indirect,Y) addressing is slower, especially for sta).

Of course, if you were targeting the Beeb, you'd save a lot by shrinking the screen to 32 columns wide, thereby giving you 256 byte wide screen rows :)

One thing you should consider right from the start, by the way, is clipping - assuming that you might sometimes need to render lines which clip off the side of the screen. If it's a full 3D game, then probably you will hold your vertex coordinates as 16-bit values anyway. You then have two choices as to how to plot arbritrarily clipped lines:

* Pass 16-bit values to your line plotter, and let it determine whether the line is totally offscreen (for a quick reject), otherwise run the Bresenham algorithm in 16-bit, and skip plotting for any coordinates where the x or y are out of range. This is a robust method, but is computationally wasteful if you know that everything you plot is going to be within screen bounds (if, for example, a simple bounding box check has shown all the coordinates to be in range).

* Do your clipping before you call your line plotter, and feed it in-range 8-bit coordinates, as you are now. This will involve you probably needing to do some division (you could do repeated subtraction, like Bresenham does, but this will be slower), something like:
Code:
if startx < 0:
  startx_clipped = 0
  starty_clipped = starty - (dy/dx) * startx

if endx > 255:
  endx_clipped = 255
  endy_clipped = endy - (dy/dx) * (endx - 255)
...and so on.

Hope there's something of use there for you :)

(Edit: just read your caveat about clipping in the 2nd paragraph - looks like you've already got this one wrapped up then!)

Author:  RichTW [ Wed Sep 16, 2009 9:25 am ]
Post subject:  Re: Line drawing: the horror!

Oh, and another small point - you might find that the routine is more useful if it doesn't plot the final point in the line, since the end point of one line is most likely the start point of another.

Even the graphics hardware of games consoles tends to make this same optimisation, and not plot the bottom or right hand edges of triangles, simply because those pixels are likely to be part of an adjoining triangle (and this would leave ugly overprinting on the edges if you a plotting a mesh with some kind of transparency).

Doing this, you might even be able to use EOR plotting and use the same routine to plot and unplot (like Elite)!

Author:  RichTW [ Wed Sep 16, 2009 12:55 pm ]
Post subject:  Re: Line drawing: the horror!

ThomasHarte wrote:
Code:
   sec
   lda y2
   sbc y1
   sta y2      ; y2 = deltay
   bcc ynegative   ; dynamically reprogram loop depending on whether deltay is negative
     .....
     .....
.ynegative
   sec      ; negate deltay, since it needs to be an absolute value for the Bresenham loop
   lda #0
   sbc y2
   sta y2
Another nice little 6502 trick; if you want to negate a value in A, you can do it a bit more quickly like this:
Code:
sec
lda y1
sbc y2
bcc ynegative
sta y2
   .....
   .....
.ynegative
eor #255
adc #1
sta y2

Author:  ThomasHarte [ Wed Sep 16, 2009 2:16 pm ]
Post subject:  Re: Line drawing: the horror!

Have suddenly realised I'm busy for most of the rest of the week, but I do intend to return with full code for this tiny segment of my stuff as soon as it is ready. In the meantime:
Quote:
This won't run on a Master, so probably best avoided.

The program won't run on any BBC, so I'm not bothered by that. It's no more Electron-specific than the code that disables all uninteresting interrupts, for example (as I don't think the OS has an entry point for that sort of stuff?), and the other code that subsequently reads the keyboard. So I just need to remember to flag this stuff up so that I can find it later should I want to do a BBC port.
Quote:
I know this routine is only called once, but if you wanted to speed it up a bit more, I think the best way is from 2 32 byte lookup tables which return the LSB and MSB of the screen address of each row

Smart move. And it's used much more than it looks, being essentially a copy and paste from my pixel plotter. And 32 bytes isn't exactly a bother, even if it actually were 32 bytes and not 32 bytes – (saved instructions * 2).
Quote:
You can check whether it has crossed a boundary simply by testing the zero flag after the lsr pixelmask+1 (it will be zero if you have shifted the last bit out).

Neat spot! I will vainly pretend that it would have occurred to me.
Quote:
Using pha...pla to preserve A around a block of code is not optimal; you get slightly better performance from storing to a zp location and loading it back

Shouldn't both alternative be 3 cycles?
Quote:
Another possible optimisation in the routine where dx > dy might be to only do the lda/ora/sta to screen memory when you are moving to the next byte

Oh, yes. I've just switched back to the 6502 from the z80 and haven't fully mentally adjusted. I don't think my thinking went beyond "can't keep it in a register, will have to leave it in memory", never got to "should keep it in faster memory, move it out to slower later". Which I guess doesn't just refer to addressing mode access speeds as any users with a turbo board get 2Mhz access on the areas below where the OS tends to put the screen, 1Mhz above, so I might as well be mindful of them.
Quote:
Oh, and another small point - you might find that the routine is more useful if it doesn't plot the final point in the line, since the end point of one line is most likely the start point of another.

Yep, yep. I'm an OpenGL programmer mostly (ES usually, since I now do iPhone full time), so completely au fait with the diamond exit rule. I just hadn't stopped to think how Bresenham maps onto that. I guess I'll need to be a bit careful in that if I'm always flipping coords so that x1 is to the left of x2 where the line is wider than it it tall then I may well be switching which is the last pixel before entering the substantial drawing.
Quote:
Doing this, you might even be able to use EOR plotting and use the same routine to plot and unplot (like Elite)!

I'm not 100% certain exactly what the plotting loop will look like yet. I was thinking I'd go for a separate blanking routine that is allowed to blank arbitrarily many pixels of the 3d area as long as it gets at least the ones on the line, which I think would allow me to divide the x resolution by 8 and write whole 0 bytes (or, rather, pairs of them owing to rounding) at a time. But I accept that with EORing a shape you have the obvious advantage that when you're drawing the next frame you can calculate one object, update that object, calculate the next object, etc, to give a better feeling of movement versus the limited expected framerate. I guess you could even update the player position every frame, update objects only on alternate frames, etc. Not sure which would look less disorientating.
Quote:
Another nice little 6502 trick; if you want to negate a value in A, you can do it a bit more quickly like this:

Oh, neat. It hadn't occurred to me that I know that carry isn't set from the comparison immediately above.
Quote:
If it's a full 3D game, then probably you will hold your vertex coordinates as 16-bit values anyway.

It's not really anything in particular, just a mess around, maybe get together some maths and plotting routines that I like. The same approach left me with a full 3d engine on a z80 machine (video below), but I've no actual aims and it isn't meant to be a port of that code.

That said, my prior approach (for an Elite-style view) was (1) map user into object space, determine face visibility; (2) from face visibility determine line visibility; (3) clip each line in 3d space; (4) project vertices and draw. Separate objects are much, much smaller than the space they inhabit, meaning that I can use optimised table-based multiplication (f(x) = (x^2)/4, unoriginally) most of the time and break out the full precision stuff very rarely.

EDIT: forgot the video, and it seems my memory is faulty if I believe that you can embed videos here: http://www.youtube.com/watch?v=j0xN_Mi3B_I

Author:  RichTW [ Wed Sep 16, 2009 9:48 pm ]
Post subject:  Re: Line drawing: the horror!

ThomasHarte wrote:
Quote:
Using pha...pla to preserve A around a block of code is not optimal; you get slightly better performance from storing to a zp location and loading it back

Shouldn't both alternative be 3 cycles?
No, PLA is 4 cycles, so you save one cycle doing STA zp...LDA zp instead of PHA...PLA. And, as they say, it all adds up...

Quote:
Yep, yep. I'm an OpenGL programmer mostly (ES usually, since I now do iPhone full time), so completely au fait with the diamond exit rule. I just hadn't stopped to think how Bresenham maps onto that. I guess I'll need to be a bit careful in that if I'm always flipping coords so that x1 is to the left of x2 where the line is wider than it it tall then I may well be switching which is the last pixel before entering the substantial drawing.
Hmmm, just thinking about it a little bit more - there's no real analogue with lines, only with 2 dimensional shapes, so the best you can do is omit the final point, and then assume that lines are drawn to form a face with a consistent winding order. Of course, this means you have to be a little more clever if you swap the start and end points over.

Quote:
Quote:
Doing this, you might even be able to use EOR plotting and use the same routine to plot and unplot (like Elite)!

I'm not 100% certain exactly what the plotting loop will look like yet. I was thinking I'd go for a separate blanking routine that is allowed to blank arbitrarily many pixels of the 3d area as long as it gets at least the ones on the line, which I think would allow me to divide the x resolution by 8 and write whole 0 bytes (or, rather, pairs of them owing to rounding) at a time. But I accept that with EORing a shape you have the obvious advantage that when you're drawing the next frame you can calculate one object, update that object, calculate the next object, etc, to give a better feeling of movement versus the limited expected framerate. I guess you could even update the player position every frame, update objects only on alternate frames, etc. Not sure which would look less disorientating.
I think Elite actually updated its graphics line-at-a-time, rather than object-at-a-time, in order to reduce the flicker as much as possible. Since it doesn't do any kind of double-buffering, there has to be as little time as possible inbetween erasing the old line and plotting the new one. Probably when it plots the new line, it notes the coordinates so it can unplot it trivially the following frame. It may even plot the new one first, and then erase the old one! And of course EOR is the only way this can work without risking leaving little holes everywhere in your graphics!

Quote:
It's not really anything in particular, just a mess around, maybe get together some maths and plotting routines that I like. The same approach left me with a full 3d engine on a z80 machine (video below), but I've no actual aims and it isn't meant to be a port of that code.
Nice video! You must be the only person who ever bought a Sam Coupé :)

Quote:
That said, my prior approach (for an Elite-style view) was (1) map user into object space, determine face visibility; (2) from face visibility determine line visibility; (3) clip each line in 3d space; (4) project vertices and draw. Separate objects are much, much smaller than the space they inhabit, meaning that I can use optimised table-based multiplication (f(x) = (x^2)/4, unoriginally) most of the time and break out the full precision stuff very rarely.
Yes, I'd define the model vertices with 8-bit values - that's bound to be sufficient. You could then do much of the transform to screen space simply in 8-bit values too.

<thinking out loud>
If we have:
Code:
 x = point in local space
 x' = point in screen space
 LW = local to world transform
 CW = camera to world transform


then, the full transform from local to screen space is:
 x' = x . LW . CW^-1

Treating the transforms as a 3x3 unit matrix rotation, followed by a translation, we can write them as:

 x . LW = x . LWr + LWt
 x . CW = x . CWr + CWt

Expanding the inverse, we have:

 x . CW^-1 = (x - CWt) . CWr^-1
           = x . CWr^-1 - CWt . CWr^-1

so, expanding the full transform, we get:

 x' = (x . LWr + LWt) . (CWr + CWt)^-1
   = (x . LWr + LWt) . CWr^-1 - CWt . CWr^-1
   = x . (LWr . CWr^-1) + [(LWt - CWt) . CWr^-1]
          ^^^^^^^^^^^^     ^^^^^^^^^^^^^^^^^^^^
         8-bit rotation     16-bit translation

where: CWr^-1 is simply the transpose of CWr
</thinking out loud>

Sorry if my maths has gone a bit squiffy somewhere, but I think that's right. Anyway, the point is that the rotation bit of the transform will always be values between -1 and 1, so you can do an 8-bit * 8-bit multiply and discard the LSB of the result. Then you just add this result to the 16-bit constant which represents the 'translation' part of the transform, and that's it... super-cheap 3d transformations, once you've calculated your rotation and translation transforms for each object.

Then you need to do the perspective divide, scissor to viewport, and the hidden line removal...hmmmm....

I wonder if it'd possible to write better code than Elite? Somehow I have the feeling that those guys just got it absolutely right first time - it's incredible really, because these days there's so much to read about 3d transformations and their ilk, but back then they were real pioneers - I doubt there was so much written on the subject.

Ok, I've written enough - it's 10.50 at night and my brain aches... cheerio!

Author:  ThomasHarte [ Thu Sep 17, 2009 1:15 pm ]
Post subject:  Re: Line drawing: the horror!

RichTW wrote:
Hmmm, just thinking about it a little bit more - there's no real analogue with lines, only with 2 dimensional shapes, so the best you can do is omit the final point, and then assume that lines are drawn to form a face with a consistent winding order. Of course, this means you have to be a little more clever if you swap the start and end points over.

Yeah, hairline lines are a major hassle. I know the OpenGL rule does exactly what you'd hope for and aims for lines that straddle polygons to, generally, have half their pixels in one and half in the other. And then the line never actually exits the diamond it comes to rest in, so the final pixel is omitted.

RichTW wrote:
I think Elite actually updated its graphics line-at-a-time, rather than object-at-a-time, in order to reduce the flicker as much as possible. Since it doesn't do any kind of double-buffering, there has to be as little time as possible inbetween erasing the old line and plotting the new one. Probably when it plots the new line, it notes the coordinates so it can unplot it trivially the following frame. It may even plot the new one first, and then erase the old one! And of course EOR is the only way this can work without risking leaving little holes everywhere in your graphics!

I was thinking of a small buffer for lines (allowing, say, 64/frame), the result of all 3d transforms queues up the new results and when the frame is sufficiently prepared, it waits for vsync and plots. Or probably I'd triple buffer the lines (since the description is very compact), to eliminate the vsync wait if I stick to full-frame at a time updates.

I guess if per-line updates is fine (and it'll probably have to be), I can maintain a simple line draw queue, have object updates push erase old line & draw new lines pair to the end of the queue, have the vsync routine process as much of the list that exists up to there. But I wonder if it'd end up looking odd.
RichTW wrote:
Nice video! You must be the only person who ever bought a Sam Coupé :)

Yes. An error, in retrospect. The screen start address is constant (ala the Spectrum), and the CPU takes something like 4 frames to scroll a screen of its own graphics. The CPU sounds like it'll be a lot faster than the Spectrum, but ends up not being because the 4bpp graphics fetch makes the memory contention much, much worse and the design means that all RAM is always contended. But at least they managed to introduce the wait cycles only when pixels are actually being read, unlike the geniuses behind the Electron. Though it's very easy to criticise 20+ years after the fact and with no idea of the original time/money/transistor budgets.
RichTW wrote:
<thinking out loud> ...

It ends up falling out like that without even really doing the prepatory thinking if you're lucky. If the camera and object have 3x3 orientation matrices and position vectors, obviously you need to compose those (well, the inverse of the former, as you say) to figure out what to do with the object geometry. And you end up with a 3x3 orientation matrix and position vector, with the 3x3 and the model geometry being in the range -1 to +1 and the position vector being where the stuff with a larger range naturally ends up. The small range multiplications and full range multiplications end up naturally separated.
RichTW wrote:
Then you need to do the perspective divide, scissor to viewport, and the hidden line removal...hmmmm....

On the Sam, at least, I scissor in 3d (including to z=1) then project. I couldn't think of an in-principle reason why it would be slower for a line that needs to be scissored and figured I'd at least be able to limit my divide routine to an 8bit output, as well as throwing away lines that are completely off screen with no projection (albeit that I could have done that anyway, regardless of where I clip).
RichTW wrote:
I wonder if it'd possible to write better code than Elite? Somehow I have the feeling that those guys just got it absolutely right first time - it's incredible really, because these days there's so much to read about 3d transformations and their ilk, but back then they were real pioneers - I doubt there was so much written on the subject.

I've always figured that, logically, someone should be able to do better now due to the huge range of literature and research that has occurred in the interim, but I am definitely not going to do better and the lack of any commercial reason for anyone to dedicate serious blocks of time to attempting to do so makes it improbable. To be honest, I'll be extremely happy if I'm able to come up with output that is a bit like Elite but only much, much worse.

For anyone that wants to do better, there's a paper on run-slice line drawing with no divide operation from about 1994 that I can't find a free download. And I think Wu did some post-Bresenham work on line drawing as a finite-state automata that's meant to cut out cycles, but I think it's a much more elaborate algorithm and may end up costing too much for a BBC/Electron in pure bytes. I need to look that up. But I guess nothing about 3d transforms has advanced.

Anyway, I'll keep at it, post something new if I ever write anything new. I'll definitely be a lot happier to mess around when I've got at least line plotting going.

Author:  ThomasHarte [ Thu Sep 17, 2009 8:44 pm ]
Post subject:  Re: Line drawing: the horror!

Essentially the same code as before, with all of your suggested optimisations rolled in:
Code:
.SetTable
   equb 128, 64, 32, 16, 8, 4, 2, 1

.ScreenAddressPointersLow
   equb (ScrBuffer MOD 256),      ((ScrBuffer+320) MOD 256),   ((ScrBuffer+640) MOD 256),   ((ScrBuffer+960) MOD 256)
   equb ((ScrBuffer+1280) MOD 256),   ((ScrBuffer+1600) MOD 256),   ((ScrBuffer+1920) MOD 256),   ((ScrBuffer+2240) MOD 256)
   equb ((ScrBuffer+2560) MOD 256),   ((ScrBuffer+2880) MOD 256),   ((ScrBuffer+3200) MOD 256),   ((ScrBuffer+3520) MOD 256)
   equb ((ScrBuffer+3840) MOD 256),   ((ScrBuffer+4160) MOD 256),   ((ScrBuffer+4480) MOD 256),   ((ScrBuffer+4800) MOD 256)
   equb ((ScrBuffer+5120) MOD 256),   ((ScrBuffer+5440) MOD 256),   ((ScrBuffer+5760) MOD 256),   ((ScrBuffer+6080) MOD 256)
   equb ((ScrBuffer+6400) MOD 256),   ((ScrBuffer+6720) MOD 256),   ((ScrBuffer+7040) MOD 256),   ((ScrBuffer+7360) MOD 256)
   equb ((ScrBuffer+7680) MOD 256),   ((ScrBuffer+8000) MOD 256),   ((ScrBuffer+8320) MOD 256),   ((ScrBuffer+8640) MOD 256)
   equb ((ScrBuffer+8960) MOD 256),   ((ScrBuffer+9280) MOD 256),   ((ScrBuffer+9600) MOD 256),   ((ScrBuffer+9920) MOD 256)

.ScreenAddressPointersHigh
   equb (ScrBuffer DIV 256),      ((ScrBuffer+320) DIV 256),   ((ScrBuffer+640) DIV 256),   ((ScrBuffer+960) DIV 256)
   equb ((ScrBuffer+1280) DIV 256),   ((ScrBuffer+1600) DIV 256),   ((ScrBuffer+1920) DIV 256),   ((ScrBuffer+2240) DIV 256)
   equb ((ScrBuffer+2560) DIV 256),   ((ScrBuffer+2880) DIV 256),   ((ScrBuffer+3200) DIV 256),   ((ScrBuffer+3520) DIV 256)
   equb ((ScrBuffer+3840) DIV 256),   ((ScrBuffer+4160) DIV 256),   ((ScrBuffer+4480) DIV 256),   ((ScrBuffer+4800) DIV 256)
   equb ((ScrBuffer+5120) DIV 256),   ((ScrBuffer+5440) DIV 256),   ((ScrBuffer+5760) DIV 256),   ((ScrBuffer+6080) DIV 256)
   equb ((ScrBuffer+6400) DIV 256),   ((ScrBuffer+6720) DIV 256),   ((ScrBuffer+7040) DIV 256),   ((ScrBuffer+7360) DIV 256)
   equb ((ScrBuffer+7680) DIV 256),   ((ScrBuffer+8000) DIV 256),   ((ScrBuffer+8320) DIV 256),   ((ScrBuffer+8640) DIV 256)
   equb ((ScrBuffer+8960) DIV 256),   ((ScrBuffer+9280) DIV 256),   ((ScrBuffer+9600) DIV 256),   ((ScrBuffer+9920) DIV 256)

.GetTLAddressForXY
{
   tya
   lsr a
   lsr a
   lsr a
   tay

   txa
   and #248

   clc
   adc ScreenAddressPointersLow, y
   sta address
   lda ScreenAddressPointersHigh, y
   adc #0
   sta address+1

   rts
}

.DrawLine
{
   ; plots a line from (x1, y1) to (x2, y2)
   ; clobbers everything

.xplyplxmajor
   ; loop for x increasing, x difference greater than y difference

   ldx x1
   ldy y1
   jsr GetTLAddressForXY

   sec
   lda y2
   sbc y1
   bcc ynegative   ; reprogram loop depending on whether deltay is negative

      sta y2      ; y2 = deltay

      ldx #&c8
      stx yinc   ; set yinc to be iny

      ldx #64
      stx yincadc1+1
      ldx #1
      stx yincadc2+1   ; add 320 to the base address each time an 8x8 is stepped out of in the y direction

      ldx #8
      stx ycomp+1   ; just stepped outside 8x8 block on a y increment if y is now 8

   jmp yend

.ynegative
      eor #255   ; negate deltay, since it needs to be an absolute value for the Bresenham loop
      adc #1      ; carry is clear from the calculation of deltay, sothis is safe
      sta y2      ; y2 = deltay

      ldx #&88
      stx yinc   ; set yinc to be dey

      ldx #&c0
      stx yincadc1+1
      ldx #&fe
      stx yincadc2+1   ; add -320 to the base address each time an 8x8 is stepped out of in the y direction

      ldx #255
      stx ycomp+1   ; just stepped outside 8x8 block on a y decrement if y is now 255
.yend

   sec
   lda x2
   sbc x1
   sta x2      ; x2 = deltax

   lsr a      ; temp0 = error
   sta temp0

   lda y1      ; perform y = y1&7, set initial pixel mask
   and #7
   tay

   lda x1      ; set initial pixel mask, using a quick table lookup
   and #7
   tax
   lda SetTable, x
   sta pixelmask+1

   ldx x2      ; x1 = pixel count
   stx x1

   ldx #0      ; seed current pixel mask to 0

.pixelloopx0
   ; add current pixel to pixel mask
      txa
.pixelmask   eor #128
      tax

   ; subtract deltay from error
   sec
   lda temp0
   sbc y2
   bcs noyinc   ; don't increment y if error >= 0

      adc x2      ; add deltax to error - carry is clear, or we wouldn't even be here

      sta temp1   ; save a for later

      txa         ; store pixel mask to memory, reset
      eor (address), y
      sta (address), y
      ldx #0

.yinc      iny      ; increment[/decrement] y
      tya
.ycomp      cmp #8      ; compare to 8[/255] to see if the y step has exited the block
      bne noyoverflow   ; if y hasn't just jumped beyond its current 8x8 block then everything is fine

         and #7
         tay

         clc      ; if it has then add[/subtract] 320 to the storage address
         lda address
.yincadc1      adc #64
         sta address
         lda address+1
.yincadc2      adc #1
         sta address+1

   .noyoverflow
      lda temp1

.noyinc
   sta temp0

   lsr pixelmask+1   ; shift the pixel mask right one, so that the next plot will be to the next pixel along
   bcc noxoverflow   ; if x has not just crossed an 8 pixel boundary then there's no need to update the address

      txa   ; store pixelbyte to memory, reset
      eor (address), y
      sta (address), y
      ldx #0

      lda #128   ; reset pixel mask
      sta pixelmask+1

      lda address   ; add 8 to screen address
      adc #7      ; carry is always set, so this is equivalent to add 8 (kudos to RichTW for an optimisation I would not have spotted)
      sta address
      bcc noxoverflow
      inc address+1

.noxoverflow
   dec x1      ; decrease x count, go again if necessary
   bne pixelloopx0

   txa   ; store pixelbyte to memory, reset
   eor (address), y
   sta (address), y

   rts
}


Still no y major code, no actual check that x2 > x1.

Author:  RichTW [ Fri Sep 18, 2009 7:17 am ]
Post subject:  Re: Line drawing: the horror!

I don't have time to reply in full right now, but do realise that in BeebAsm you can write stuff like this:
Code:
.ScreenAddressPointersLow
FOR n, 0, 31
EQUB (&5800 + n * 320) AND 255
NEXT

.ScreenAddressPointersHigh
FOR n, 0, 31
EQUB (&5800 + n * 320) DIV 256
NEXT

Definitely a time-saver, and rather clearer too! (You can also use functions like SIN and COS to build the appropriate tables).

Author:  RichTW [ Fri Sep 18, 2009 8:54 am ]
Post subject:  Re: Line drawing: the horror!

I did have another question though - on the SAM version, how many bits precision were you using for your rotation matrix elements? It seems to me there are a few choices:

  • If you want each value to fit in a byte, and you want to be able to represent -1 and +1 exactly, then you have to choose 6 fractional bits and the other two become the integer portion. This allows you to actually represent -2.0...1.984, but of course you restrict the range from -1.0...1.0. Then you have to take bits 6-13 of the result. This doesn't sound like a lot of precision though - 6 bits for the matrix element? That's rubbish! Perhaps I'm spoiled with my Wii graphics hardware...

  • If you don't care about being able to represent 1.0 exactly, you can choose 7 fractional bits and 1 sign bit, which gives you an effective range of -1.0...0.992, and you then take bits 7-14 of the result. My instinct though is that this could cause appreciable errors, particularly when multiplying many sines/cosines to create the final transform.

    (Edit: to clarify, I guess there are two ways to get round this:
    • The first is to multiply all your sine/cosine values by 127, but then in the multiplication to effectively divide the result by 128 when you extract the result. This way, every multiplication loses some precision in the result.

    • The better way I guess is to fudge the entry in the sine table so that just sin(90º) yields a value slightly less than 1, which will fit ok into the 8 bits. The question is whether the slight discontinuity would be jarring when the one of the transform components passes through 90º?)

  • Then I got to thinking about using a 10-bit value for transform values, essentially the first byte is the 8-bit two's complement fractional value, and then the second byte has only three valid values:

    • If it's 0, then the first byte represents a regular positive number
    • If it's 1, then this is how we represent 1.0 - therefore the first byte is simply 0.
    • If it's 255, then the first byte represents a negative number in two's complement. In addition, if the first byte is 0, then this is our representation of -1.0

    What I figured was that the multiply routine would make a special case of inputs with a low byte of 0, and return the appropriate value (either val * 0, val * 1, val * -1; or if both zero, 0, 1 or -1). By checking the two signs of the high bytes, you would jump to slightly different multiply routines which would calculate different variations of f(a+b)-f(a-b), e.g. if a is negative and b is positive, you would calculate f(abs(a)-b)-f(abs(a)+b), or in other words, f(-a-b)-f(b-a). This is quicker than "test input signs/negate inputs" at the top, and then "test input signs/negate result" at the bottom. (Perhaps there's a better way which involves fixing the table somehow, but I haven't yet figured it out... my experience so far has shown this routine to work only with unsigned inputs).

    Of course, with this method you only need to keep the top 8 bits of the multiply result (no shifting required), and maintain this basic high byte (0, 1 or 255). Then adding/subtracting these transforms (as has to be done in matrix construction / multiplies) is a simple 16-bit operation.

    Anyway, perhaps all this is overkill, and it's sufficient to approximate 1.0 to a slightly lower value... I'd be interested to know what you did in the Z80 code...

Author:  ThomasHarte [ Sat Sep 19, 2009 12:03 am ]
Post subject:  Re: Line drawing: the horror!

I'll respond to the multiplication stuff in the dedicated thread. For now, here's my current, complete exclusive-or Bresenham for 256x256 portions of 1bpp screens. No doubt further cycles could be shaved. Would probably need to take a step back. Or, if anyone else can spot anything then I'd be appreciative.

The only 'new' things of interest are that I've implemented the y-major part of the routine, which works from bottom to top (because dey will set a flag when it exits an 8x8 flag going upward, but won't when going downwards), and am writing deltax and deltay directly into the inner loops rather than storing them on the zero page. Which is a couple more cycles saved per pixel. On the Electron in Mode 4 there are 64 useful cycles/scanline, so that's like saving almost 8 scanlines of processing on the longest possible line... definitely worth trying to squeeze everything out of this code.
Code:
.ScreenAddressPointersLow
   FOR n, 0, 31
   EQUB (ScrBuffer + n * 320) MOD 256
   NEXT

.ScreenAddressPointersHigh
   FOR n, 0, 31
   EQUB (ScrBuffer + n * 320) DIV 256
   NEXT

.SetTable
   equb 128, 64, 32, 16, 8, 4, 2, 1

.GetTLAddressForXY
{
   tya
   lsr a
   lsr a
   lsr a
   tay

   txa
   and #248

   clc
   adc ScreenAddressPointersLow, y
   sta address
   lda ScreenAddressPointersHigh, y
   adc #0
   sta address+1

   rts
}

.DrawLine
{
   ; plots a line from (x1, y1) to (x2, y2)
   ; clobbers everything

   ; decide whether x major or y major

   lda #0
   sta temp2
   sta temp3      ; temp2 will be the x negative flag, temp3 will be the y negative flag; assume both positive

   sec
   lda y2
   sbc y1
   bcs ypositive

      eor #255   ; negate deltay
      adc #1      ; carry is clear from the calculation of deltay, so this is safe
      sta temp3   ; store deltay in temp3 so that something non-zero is there
.ypositive

   sta temp1      ; store deltay in temp1

   sec
   lda x2
   sbc x1
   bcs xpositive

      eor #255   ; negate deltax
      adc #1
      sta temp2

.xpositive
   sta temp0      ; store deltax in temp0
   cmp temp1      ; compare to deltay

   bcs xmajor      ; if deltax was larger, jump into the xmajor loop
   jmp ymajor      ; otherwise, head on over to the ymajor



.xmajor            ; temp0 = abs(deltax), temp1 = abs(deltay), temp2 is non-zero if x direction is negative, temp3 is non-zero if y direction is negative
{
   ; if here then x difference greater than y difference

   lda temp2      ; copy (x2, y2) to (x1, y1) if this line goes from right to left, to ensure that the loop need only ever go left to right
   beq noswap

      ldx x2   ; copy x2 to x1
      stx x1

      ldy y2   ; copy y2 to y1
      sty y1

      lda temp3   ; if y was flagged negative, flag it positive - and vice versa
      bne setypos

      inc temp3
      jmp getaddr

.setypos
      lda #0
      sta temp3

.getaddr

   jsr GetTLAddressForXY   ; this is very slightly faster than just rolling over into the code immediately below
   jmp addressgot

.noswap
   ldx x1
   ldy y1
   jsr GetTLAddressForXY

.addressgot

   lda temp3
   bne ynegative   ; reprogram loop depending on whether deltay is negative

      ldx #&c8
      stx yinc   ; set yinc to be iny

      ldx #64
      stx yincadc1+1
      ldx #1
      stx yincadc2+1   ; add 320 to the base address each time an 8x8 is stepped out of in the y direction

      ldx #8
      stx ycomp+1   ; just stepped outside 8x8 block on a y increment if y is now 8

   jmp yend

.ynegative

      ldx #&88
      stx yinc   ; set yinc to be dey

      ldx #&c0
      stx yincadc1+1
      ldx #&fe
      stx yincadc2+1   ; add -320 to the base address each time an 8x8 is stepped out of in the y direction

      ldx #255
      stx ycomp+1   ; just stepped outside 8x8 block on a y decrement if y is now 255
.yend

   lda y1      ; y = y1&7
   and #7
   tay

   lda x1      ; set initial pixel mask, using a quick table lookup
   and #7
   tax
   lda SetTable, x
   sta pixelmask+1

   lda temp0
   sta adcdeltax+1   ; fill in deltax
   inc temp0      ; temp0 = pixel count

   lsr a
   sta temp2   ; temp2 = error

   lda temp1
   sta sbcdeltay+1      ; fill in deltay

   ldx #0      ; seed current pixel mask to 0

.pixelloop
   ; add current pixel to pixel mask
      txa
.pixelmask   ora #128
      tax

   ; subtract deltay from error
   sec
   lda temp2
.sbcdeltay   sbc #0   ; will be dynamically written in
   bcs noyinc   ; don't increment y if error >= 0

.adcdeltax   adc #0      ; add deltax to error - carry is clear, or we wouldn't even be here

      sta temp1   ; save a for later

      txa         ; store pixel mask to memory, reset
      eor (address), y
      sta (address), y
      ldx #0

.yinc      iny      ; increment[/decrement] y
      tya
.ycomp      cmp #8      ; compare to 8[/255] to see if the y step has exited the block
      bne noyoverflow   ; if y hasn't just jumped beyond its current 8x8 block then everything is fine

         and #7
         tay

         clc      ; if it has then add[/subtract] 320 to the storage address
         lda address
.yincadc1      adc #64
         sta address
         lda address+1
.yincadc2      adc #1
         sta address+1

   .noyoverflow
      lda temp1

.noyinc
   sta temp2

   lsr pixelmask+1   ; shift the pixel mask right one, so that the next plot will be to the next pixel along
   bcc noxoverflow   ; if x has not just crossed an 8 pixel boundary then there's no need to update the address

      txa   ; store pixelbyte to memory, reset
      eor (address), y
      sta (address), y
      ldx #0

      lda #128   ; reset pixel mask
      sta pixelmask+1

      lda address   ; add 8 to screen address
      adc #7      ; carry is always set, so this is equivalent to add 8
      sta address
      bcc noxoverflow
      inc address+1

.noxoverflow
   dec temp0      ; decrease x count, go again if necessary
   bne pixelloop

   txa   ; store pixelbyte to memory, reset
   eor (address), y
   sta (address), y

   rts
}


.ymajor      ; temp0 = abs(deltax), temp1 = abs(deltay), temp2 is non-zero if x direction is negative, temp3 is non-zero if y direction is negative
{

   lda temp3      ; copy (x2, y2) to (x1, y1) if this line goes from top to bottom, to ensure that the loop need only ever go bottom to top (yes, backwards)
   bne noswap
   
      ldx x2      ; copy x2 to x1
      stx x1

      ldy y2      ; copy y2 to y1
      sty y1

      lda temp2   ; if x was flagged negative, flag it positive - and vice versa
      bne setxpos

      inc temp2
      jmp getaddr

.setxpos
      lda #0
      sta temp2

.getaddr

   jsr GetTLAddressForXY   ; this is very slightly faster than just rolling over into the code immediately below
   jmp addressgot
   
.noswap
   ldx x1
   ldy y1
   jsr GetTLAddressForXY

.addressgot

   lda temp2
   bne xnegative   ; reprogram loop depending on whether deltax is negative
   
      lda #7
      sta xadc+1         ; add 7(+1 = 8) to low byte to move across a column
      
      lda #&e6
      sta xadc2         ; inc top byte...
      
      lda #&90
      sta xbcc         ; ...if carry

      lda #&4e
      sta pixelshift   ; lsr to move from one pixel to the next

      lda #128
      sta pixelreload+1

   jmp xend
   
.xnegative

      lda #&f7
      sta xadc+1         ; subtract 9(-1 = 8) to low byte to move across a column

      lda #&c6
      sta xadc2         ; dec top byte...

      lda #&b0
      sta xbcc         ; ...if not carry

      lda #&0e
      sta pixelshift   ; asl to move from one pixel to the next

      lda #1
      sta pixelreload+1
.xend

   lda y1      ; y = y1&7
   and #7
   tay

   lda x1      ; set initial pixel mask, using a quick table lookup
   and #7
   tax
   lda SetTable, x
   sta pixelmask+1

   lda temp0
   sta subdeltax+1      ; write in deltax

   lda temp1
   sta adddeltay+1      ; write in deltay
   inc temp1         ; temp1 = pixel count

   lsr a
   tax   ; x = error

.pixelloop
   ; add a pixel to the display
         lda (address),y
.pixelmask   eor #128
         sta (address),y

   ; subtract deltax from error
   sec
   txa
.subdeltax   sbc #0
   bcs noxinc   ; don't increment x if error >= 0

.adddeltay   adc #0      ; add deltay to error - carry is clear, or we wouldn't even be here

.pixelshift   asl pixelmask+1
      bcc noxinc

      tax   ; save a for later

.pixelreload
      lda #128
      sta pixelmask+1

      ; carry is definitely set - following will be reprogrammed to add or subtract 8
      lda address
.xadc   adc #7
      sta address
.xbcc   bcc noad1inc
.xadc2   inc address+1
      
   .noad1inc

      txa

.noxinc
   tax

   dey
   bpl noyoverflow

      clc            ;add -320 to move up one line
      lda address
      adc #&c0
      sta address
      lda address+1
      adc #&fe
      sta address+1

      ldy #7

.noyoverflow
   dec temp1
   bne pixelloop

.drawline
   rts
}
}

Page 1 of 1 All times are UTC [ DST ]
Powered by phpBB © 2000, 2002, 2005, 2007 phpBB Group
http://www.phpbb.com/