| www.retrosoftware.co.uk http://www.retrosoftware.co.uk/forum/ |
|
| Line drawing: the horror! http://www.retrosoftware.co.uk/forum/viewtopic.php?f=73&t=344 |
Page 1 of 1 |
| Author: | ThomasHarte [ Tue Sep 15, 2009 9:58 pm ] |
| Post subject: | Line drawing: the horror! |
Please excuse my brevity; the forum's auto-logout cost me the original version of this post... the title refers to the fun of dealing with a non-linear framebuffer, of course! I'm experimenting with vector graphics. My primary target machine is the Electron. My plan is currently to use a 256x256 window on the lefthand side of a standard Mode 4 (no CRTC = no real control here, alas) screen. My screen start address will always be 256-byte aligned, though I can't recall if the code I'm about to post still relies on that assumption. All clipping will occur before drawing, so none of the drawing parts need to worry about clipping and all coordinates can be dealt with safely as 8bit numbers. Today's fun is a line plotter. It's Bresenham. So far I've only bothered to implement the case where the x delta is greater than the y delta and both x and y increase from the first specified coordinate to the second. I've written the following (BeebAsm-type) code: Code: .SetTable equb 128, 64, 32, 16, 8, 4, 2, 1 .GetTLAddressForXY { lda #248 equb $8f ; equw temp1 ; "sax temp1", so temp1 = x&~7 (x offset to 8x8 block) lda #0 sta temp2 tya and #248 ; temp2:a = (y >> 3)*8 pha pla asl a rol temp2 asl a rol temp2 asl a rol temp2 ; temp2:a = (y >> 3)*64 sta temp3 lda temp2 sta temp4 lda temp3 ; temp2:a copied to temp4:temp3 asl a rol temp2 asl a rol temp2 ; temp2:a = (y >> 3)*256 adc temp3 sta address ; add low byte of (y >> 3)*64 to low byte of (y >> 3)*256, store as low byte of address to modify lda temp2 adc temp4 adc #(ScrBuffer DIV 256) sta address+1 ; add high byte of (y >> 3)*64 to high byte of (y >> 3)*256 and screen buffer start address ; store as high byte of address to modify rts } .DrawLine { ; plots a line from (x1, y1) to (x2, y2) ; clobbers everything ; (0, 100) to (100, 150) .xplyplxmajor ; loop for x increasing, y increasing, x difference greater than y difference ldx x1 ldy y1 jsr GetTLAddressForXY sec lda y2 sbc y1 sta y2 ; y2 = deltay sec lda x2 sbc x1 sta x2 ; x2 = deltax lsr a ; a = error pha ; perform y = y1&7, x1 &= 7 lda y1 and #7 tay lda x1 and #7 sta x1 tax lda SetTable, x sta pixelmask+1 pla ldx x2 ; x = x2 = pixel count .pixelloopx0 ; plot current pixel pha lda (address),y .pixelmask ora #128 sta (address),y pla ; subtract deltay from error sec sbc y2 bcs noyinc ; don't increment y if error >= 0 clc adc x2 ; add deltax to error pha iny ; increment y tya and #7 bne noyoverflow ; if y hasn't just hit 8 then everything is fine tay ; if it has then add 320 to the storage address clc lda address adc #64 sta address lda address+1 adc #1 sta address+1 .noyoverflow pla .noyinc inc x1 ; increment x lsr pixelmask+1 ; shift the pixel mask right one, so that the next plot will be to the next pixel along pha lda x1 ; check if x just crossed an 8 pixel boundary and #7 bne noxoverflow sta x1 ; reset pixel mask to 128 if so, and add 8 to the current screen address lda #128 sta pixelmask+1 clc lda address adc #8 sta address lda address+1 adc #0 sta address+1 .noxoverflow pla dex ; decrease x count, go again if necessary bne pixelloopx0 rts } GetTLAddressForXY is a hasty chop up of my individual pixel plotter. It may still contain some redundancy in this version, since here it is used only to get the address at the top of the 8-byte column of pixels. My plan is to sort co-ordinates so that I'm always drawing in increasing x, and to hastily dynamically reprogram to deal with decreasing y. Then the y major greater than x major version of the code will be a recut of this stuff. Before I do that, I thought I'd post it here for comment, since I figure probably there are a bunch of clever optimisations I am missing or maybe I've just taken the wrong approach in some little bits. I know that a Bresenham plotter is not exactly a complicated piece of code, but I'm not exactly overly experienced with 6502 programming, so it's possible that I just have some bad habits. Anyone? Any thoughts? |
|
| Author: | ThomasHarte [ Tue Sep 15, 2009 11:42 pm ] |
| Post subject: | Re: Line drawing: the horror! |
Definitely going to bed now, but I thought I'd post my quickly modified version to do left-to-right, deltax > deltay but with deltay either positive or negative. My current plan is to look at it again later in the week, see if I can optimise it a little more (still without spending too much memory), add the deltay > deltax equivalent, write a clearing version (which returns at least the pixels along a line to black and is allowed to reset any other pixels in addition) and put them in the wiki in case they're useful for anyone else and in the hope that if anyone else has any smart optimisation ideas or even just better overall code then they'll turf mine out. Code: .DrawLine { ; plots a line from (x1, y1) to (x2, y2) ; clobbers everything .xplyplxmajor ; loop for x increasing, y increasing, x difference greater than y difference ldx x1 ldy y1 jsr GetTLAddressForXY sec lda y2 sbc y1 sta y2 ; y2 = deltay bcc ynegative ; dynamically reprogram loop depending on whether deltay is negative ldx #&c8 stx yinc ; set yinc to be in ldx #64 stx yincadc1+1 ldx #1 stx yincadc2+1 ; add 320 to the base address each time an 8x8 is stepped out of in the y direction ldx #&29 stx ycomp ; set the potential cmp to 7 to be a duplicate of the preceding or jmp yend .ynegative sec ; negate deltay, since it needs to be an absolute value for the Bresenham loop lda #0 sbc y2 sta y2 ldx #&88 stx yinc ; set yin to be dey ldx #&c0 stx yincadc1+1 ldx #&fe stx yincadc2+1 ; add -320 to the base address each time an 8x8 is stepped out of in the y direction ldx #&c9 stx ycomp ; set the potential cmp to 7 to be an actual cmp - 8x8 was left just now if modified y is 7 .yend sec lda x2 sbc x1 sta x2 ; x2 = deltax lsr a ; a = error pha ; perform y = y1&7, x1 &= 7 lda y1 and #7 tay lda x1 and #7 sta x1 tax lda SetTable, x sta pixelmask+1 pla ldx x2 ; x = x2 = pixel count .pixelloopx0 ; plot current pixel pha lda (address),y .pixelmask ora #128 sta (address),y pla ; subtract deltay from error sec sbc y2 bcs noyinc ; don't increment y if error >= 0 clc adc x2 ; add deltax to error pha .yinc dey ; increment[/decrement] y tya and #7 .ycomp cmp #7 bne noyoverflow ; if y hasn't just hit 8 then everything is fine tay ; if it has then add 320 to the storage address clc lda address .yincadc1 adc #64 sta address lda address+1 .yincadc2 adc #1 sta address+1 .noyoverflow pla .noyinc inc x1 ; increment x lsr pixelmask+1 ; shift the pixel mask right one, so that the next plot will be to the next pixel along pha lda x1 ; check if x just crossed an 8 pixel boundary and #7 bne noxoverflow sta x1 ; reset pixel mask to 128 if so, and add 8 to the current screen address lda #128 sta pixelmask+1 clc lda address adc #8 sta address lda address+1 adc #0 sta address+1 .noxoverflow pla dex ; decrease x count, go again if necessary bne pixelloopx0 rts } EDIT: in case anyone is wondering about some of the weird comments about yin, etc (hopefully fixed now, but maybe I missed some), it's that I hadn't yet disabled the new silent autocorrect in OS X's TextEdit. |
|
| Author: | RichTW [ Wed Sep 16, 2009 9:17 am ] |
| Post subject: | Re: Line drawing: the horror! |
Hi Thomas, Yeah, looks pretty good! But a few comments (on your first piece of code)... ThomasHarte wrote: Code: .SetTable equb 128, 64, 32, 16, 8, 4, 2, 1 .GetTLAddressForXY { lda #248 equb $8f ; equw temp1 ; "sax temp1", so temp1 = x&~7 (x offset to 8x8 block) Quote: Code: lda #0 sta temp2 tya and #248 ; temp2:a = (y >> 3)*8 pha pla asl a rol temp2 asl a rol temp2 asl a rol temp2 ; temp2:a = (y >> 3)*64 sta temp3 lda temp2 sta temp4 lda temp3 ; temp2:a copied to temp4:temp3 asl a rol temp2 asl a rol temp2 ; temp2:a = (y >> 3)*256 adc temp3 sta address ; add low byte of (y >> 3)*64 to low byte of (y >> 3)*256, store as low byte of address to modify lda temp2 adc temp4 adc #(ScrBuffer DIV 256) sta address+1 ; add high byte of (y >> 3)*64 to high byte of (y >> 3)*256 and screen buffer start address ; store as high byte of address to modify rts } Code: tya lsr a lsr a lsr a tax lda scraddrlo,x sta address lda scraddrhi,x sta address+1 Quote: Code: ; subtract deltay from error sec sbc y2 bcs noyinc ; don't increment y if error >= 0 clc Quote: Code: lsr pixelmask+1 ; shift the pixel mask right one, so that the next plot will be to the next pixel along pha lda x1 ; check if x just crossed an 8 pixel boundary and #7 bne noxoverflow Quote: Code: sta x1 ; reset pixel mask to 128 if so, and add 8 to the current screen address lda #128 sta pixelmask+1 clc lda address adc #8 sta address lda address+1 adc #0 sta address+1 Code: clc lda address adc #8 sta address bcc nocarry inc address+1 .nocarry You don't actually need the clc either. You know carry will always be set following the lsr pixelmask+1 (it will have shifted out a 1), so omit the clc, and adc #7 instead. Using pha...pla to preserve A around a block of code is not optimal; you get slightly better performance from storing to a zp location and loading it back - or even just writing: Code: sta preserveme+1 ... ... ... .preserveme lda #0 I think that ensuring that you always plot from left to right (by ensuring startx <= endx) is a good plan, and then self-modifying the code to inc or dec in y. Another possible optimisation in the routine where dx > dy might be to only do the lda/ora/sta to screen memory when you are moving to the next byte, and meanwhile to build up an image of the entire byte to be ORed separately (because (indirect,Y) addressing is slower, especially for sta). Of course, if you were targeting the Beeb, you'd save a lot by shrinking the screen to 32 columns wide, thereby giving you 256 byte wide screen rows One thing you should consider right from the start, by the way, is clipping - assuming that you might sometimes need to render lines which clip off the side of the screen. If it's a full 3D game, then probably you will hold your vertex coordinates as 16-bit values anyway. You then have two choices as to how to plot arbritrarily clipped lines: * Pass 16-bit values to your line plotter, and let it determine whether the line is totally offscreen (for a quick reject), otherwise run the Bresenham algorithm in 16-bit, and skip plotting for any coordinates where the x or y are out of range. This is a robust method, but is computationally wasteful if you know that everything you plot is going to be within screen bounds (if, for example, a simple bounding box check has shown all the coordinates to be in range). * Do your clipping before you call your line plotter, and feed it in-range 8-bit coordinates, as you are now. This will involve you probably needing to do some division (you could do repeated subtraction, like Bresenham does, but this will be slower), something like: Code: if startx < 0: ...and so on.startx_clipped = 0 starty_clipped = starty - (dy/dx) * startx if endx > 255: endx_clipped = 255 endy_clipped = endy - (dy/dx) * (endx - 255) Hope there's something of use there for you (Edit: just read your caveat about clipping in the 2nd paragraph - looks like you've already got this one wrapped up then!) |
|
| Author: | RichTW [ Wed Sep 16, 2009 9:25 am ] |
| Post subject: | Re: Line drawing: the horror! |
Oh, and another small point - you might find that the routine is more useful if it doesn't plot the final point in the line, since the end point of one line is most likely the start point of another. Even the graphics hardware of games consoles tends to make this same optimisation, and not plot the bottom or right hand edges of triangles, simply because those pixels are likely to be part of an adjoining triangle (and this would leave ugly overprinting on the edges if you a plotting a mesh with some kind of transparency). Doing this, you might even be able to use EOR plotting and use the same routine to plot and unplot (like Elite)! |
|
| Author: | RichTW [ Wed Sep 16, 2009 12:55 pm ] |
| Post subject: | Re: Line drawing: the horror! |
ThomasHarte wrote: Code: sec lda y2 sbc y1 sta y2 ; y2 = deltay bcc ynegative ; dynamically reprogram loop depending on whether deltay is negative ..... ..... .ynegative sec ; negate deltay, since it needs to be an absolute value for the Bresenham loop lda #0 sbc y2 sta y2 Code: sec
lda y1 sbc y2 bcc ynegative sta y2 ..... ..... .ynegative eor #255 adc #1 sta y2 |
|
| Author: | ThomasHarte [ Wed Sep 16, 2009 2:16 pm ] |
| Post subject: | Re: Line drawing: the horror! |
Have suddenly realised I'm busy for most of the rest of the week, but I do intend to return with full code for this tiny segment of my stuff as soon as it is ready. In the meantime: Quote: This won't run on a Master, so probably best avoided. The program won't run on any BBC, so I'm not bothered by that. It's no more Electron-specific than the code that disables all uninteresting interrupts, for example (as I don't think the OS has an entry point for that sort of stuff?), and the other code that subsequently reads the keyboard. So I just need to remember to flag this stuff up so that I can find it later should I want to do a BBC port. Quote: I know this routine is only called once, but if you wanted to speed it up a bit more, I think the best way is from 2 32 byte lookup tables which return the LSB and MSB of the screen address of each row Smart move. And it's used much more than it looks, being essentially a copy and paste from my pixel plotter. And 32 bytes isn't exactly a bother, even if it actually were 32 bytes and not 32 bytes – (saved instructions * 2). Quote: You can check whether it has crossed a boundary simply by testing the zero flag after the lsr pixelmask+1 (it will be zero if you have shifted the last bit out). Neat spot! I will vainly pretend that it would have occurred to me. Quote: Using pha...pla to preserve A around a block of code is not optimal; you get slightly better performance from storing to a zp location and loading it back Shouldn't both alternative be 3 cycles? Quote: Another possible optimisation in the routine where dx > dy might be to only do the lda/ora/sta to screen memory when you are moving to the next byte Oh, yes. I've just switched back to the 6502 from the z80 and haven't fully mentally adjusted. I don't think my thinking went beyond "can't keep it in a register, will have to leave it in memory", never got to "should keep it in faster memory, move it out to slower later". Which I guess doesn't just refer to addressing mode access speeds as any users with a turbo board get 2Mhz access on the areas below where the OS tends to put the screen, 1Mhz above, so I might as well be mindful of them. Quote: Oh, and another small point - you might find that the routine is more useful if it doesn't plot the final point in the line, since the end point of one line is most likely the start point of another. Yep, yep. I'm an OpenGL programmer mostly (ES usually, since I now do iPhone full time), so completely au fait with the diamond exit rule. I just hadn't stopped to think how Bresenham maps onto that. I guess I'll need to be a bit careful in that if I'm always flipping coords so that x1 is to the left of x2 where the line is wider than it it tall then I may well be switching which is the last pixel before entering the substantial drawing. Quote: Doing this, you might even be able to use EOR plotting and use the same routine to plot and unplot (like Elite)! I'm not 100% certain exactly what the plotting loop will look like yet. I was thinking I'd go for a separate blanking routine that is allowed to blank arbitrarily many pixels of the 3d area as long as it gets at least the ones on the line, which I think would allow me to divide the x resolution by 8 and write whole 0 bytes (or, rather, pairs of them owing to rounding) at a time. But I accept that with EORing a shape you have the obvious advantage that when you're drawing the next frame you can calculate one object, update that object, calculate the next object, etc, to give a better feeling of movement versus the limited expected framerate. I guess you could even update the player position every frame, update objects only on alternate frames, etc. Not sure which would look less disorientating. Quote: Another nice little 6502 trick; if you want to negate a value in A, you can do it a bit more quickly like this: Oh, neat. It hadn't occurred to me that I know that carry isn't set from the comparison immediately above. Quote: If it's a full 3D game, then probably you will hold your vertex coordinates as 16-bit values anyway. It's not really anything in particular, just a mess around, maybe get together some maths and plotting routines that I like. The same approach left me with a full 3d engine on a z80 machine (video below), but I've no actual aims and it isn't meant to be a port of that code. That said, my prior approach (for an Elite-style view) was (1) map user into object space, determine face visibility; (2) from face visibility determine line visibility; (3) clip each line in 3d space; (4) project vertices and draw. Separate objects are much, much smaller than the space they inhabit, meaning that I can use optimised table-based multiplication (f(x) = (x^2)/4, unoriginally) most of the time and break out the full precision stuff very rarely. EDIT: forgot the video, and it seems my memory is faulty if I believe that you can embed videos here: http://www.youtube.com/watch?v=j0xN_Mi3B_I |
|
| Author: | RichTW [ Wed Sep 16, 2009 9:48 pm ] |
| Post subject: | Re: Line drawing: the horror! |
ThomasHarte wrote: Quote: Using pha...pla to preserve A around a block of code is not optimal; you get slightly better performance from storing to a zp location and loading it back Shouldn't both alternative be 3 cycles? Quote: Yep, yep. I'm an OpenGL programmer mostly (ES usually, since I now do iPhone full time), so completely au fait with the diamond exit rule. I just hadn't stopped to think how Bresenham maps onto that. I guess I'll need to be a bit careful in that if I'm always flipping coords so that x1 is to the left of x2 where the line is wider than it it tall then I may well be switching which is the last pixel before entering the substantial drawing. Hmmm, just thinking about it a little bit more - there's no real analogue with lines, only with 2 dimensional shapes, so the best you can do is omit the final point, and then assume that lines are drawn to form a face with a consistent winding order. Of course, this means you have to be a little more clever if you swap the start and end points over.Quote: Quote: Doing this, you might even be able to use EOR plotting and use the same routine to plot and unplot (like Elite)! I'm not 100% certain exactly what the plotting loop will look like yet. I was thinking I'd go for a separate blanking routine that is allowed to blank arbitrarily many pixels of the 3d area as long as it gets at least the ones on the line, which I think would allow me to divide the x resolution by 8 and write whole 0 bytes (or, rather, pairs of them owing to rounding) at a time. But I accept that with EORing a shape you have the obvious advantage that when you're drawing the next frame you can calculate one object, update that object, calculate the next object, etc, to give a better feeling of movement versus the limited expected framerate. I guess you could even update the player position every frame, update objects only on alternate frames, etc. Not sure which would look less disorientating. Quote: It's not really anything in particular, just a mess around, maybe get together some maths and plotting routines that I like. The same approach left me with a full 3d engine on a z80 machine (video below), but I've no actual aims and it isn't meant to be a port of that code. Nice video! You must be the only person who ever bought a Sam Coupé Quote: That said, my prior approach (for an Elite-style view) was (1) map user into object space, determine face visibility; (2) from face visibility determine line visibility; (3) clip each line in 3d space; (4) project vertices and draw. Separate objects are much, much smaller than the space they inhabit, meaning that I can use optimised table-based multiplication (f(x) = (x^2)/4, unoriginally) most of the time and break out the full precision stuff very rarely. Yes, I'd define the model vertices with 8-bit values - that's bound to be sufficient. You could then do much of the transform to screen space simply in 8-bit values too.<thinking out loud> If we have: Code: x = point in local space </thinking out loud>x' = point in screen space LW = local to world transform CW = camera to world transform then, the full transform from local to screen space is: x' = x . LW . CW^-1 Treating the transforms as a 3x3 unit matrix rotation, followed by a translation, we can write them as: x . LW = x . LWr + LWt x . CW = x . CWr + CWt Expanding the inverse, we have: x . CW^-1 = (x - CWt) . CWr^-1 = x . CWr^-1 - CWt . CWr^-1 so, expanding the full transform, we get: x' = (x . LWr + LWt) . (CWr + CWt)^-1 = (x . LWr + LWt) . CWr^-1 - CWt . CWr^-1 = x . (LWr . CWr^-1) + [(LWt - CWt) . CWr^-1] ^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^ 8-bit rotation 16-bit translation where: CWr^-1 is simply the transpose of CWr Sorry if my maths has gone a bit squiffy somewhere, but I think that's right. Anyway, the point is that the rotation bit of the transform will always be values between -1 and 1, so you can do an 8-bit * 8-bit multiply and discard the LSB of the result. Then you just add this result to the 16-bit constant which represents the 'translation' part of the transform, and that's it... super-cheap 3d transformations, once you've calculated your rotation and translation transforms for each object. Then you need to do the perspective divide, scissor to viewport, and the hidden line removal...hmmmm.... I wonder if it'd possible to write better code than Elite? Somehow I have the feeling that those guys just got it absolutely right first time - it's incredible really, because these days there's so much to read about 3d transformations and their ilk, but back then they were real pioneers - I doubt there was so much written on the subject. Ok, I've written enough - it's 10.50 at night and my brain aches... cheerio! |
|
| Author: | ThomasHarte [ Thu Sep 17, 2009 1:15 pm ] |
| Post subject: | Re: Line drawing: the horror! |
RichTW wrote: Hmmm, just thinking about it a little bit more - there's no real analogue with lines, only with 2 dimensional shapes, so the best you can do is omit the final point, and then assume that lines are drawn to form a face with a consistent winding order. Of course, this means you have to be a little more clever if you swap the start and end points over. Yeah, hairline lines are a major hassle. I know the OpenGL rule does exactly what you'd hope for and aims for lines that straddle polygons to, generally, have half their pixels in one and half in the other. And then the line never actually exits the diamond it comes to rest in, so the final pixel is omitted. RichTW wrote: I think Elite actually updated its graphics line-at-a-time, rather than object-at-a-time, in order to reduce the flicker as much as possible. Since it doesn't do any kind of double-buffering, there has to be as little time as possible inbetween erasing the old line and plotting the new one. Probably when it plots the new line, it notes the coordinates so it can unplot it trivially the following frame. It may even plot the new one first, and then erase the old one! And of course EOR is the only way this can work without risking leaving little holes everywhere in your graphics! I was thinking of a small buffer for lines (allowing, say, 64/frame), the result of all 3d transforms queues up the new results and when the frame is sufficiently prepared, it waits for vsync and plots. Or probably I'd triple buffer the lines (since the description is very compact), to eliminate the vsync wait if I stick to full-frame at a time updates. I guess if per-line updates is fine (and it'll probably have to be), I can maintain a simple line draw queue, have object updates push erase old line & draw new lines pair to the end of the queue, have the vsync routine process as much of the list that exists up to there. But I wonder if it'd end up looking odd. RichTW wrote: Nice video! You must be the only person who ever bought a Sam Coupé Yes. An error, in retrospect. The screen start address is constant (ala the Spectrum), and the CPU takes something like 4 frames to scroll a screen of its own graphics. The CPU sounds like it'll be a lot faster than the Spectrum, but ends up not being because the 4bpp graphics fetch makes the memory contention much, much worse and the design means that all RAM is always contended. But at least they managed to introduce the wait cycles only when pixels are actually being read, unlike the geniuses behind the Electron. Though it's very easy to criticise 20+ years after the fact and with no idea of the original time/money/transistor budgets. RichTW wrote: <thinking out loud> ... It ends up falling out like that without even really doing the prepatory thinking if you're lucky. If the camera and object have 3x3 orientation matrices and position vectors, obviously you need to compose those (well, the inverse of the former, as you say) to figure out what to do with the object geometry. And you end up with a 3x3 orientation matrix and position vector, with the 3x3 and the model geometry being in the range -1 to +1 and the position vector being where the stuff with a larger range naturally ends up. The small range multiplications and full range multiplications end up naturally separated. RichTW wrote: Then you need to do the perspective divide, scissor to viewport, and the hidden line removal...hmmmm.... On the Sam, at least, I scissor in 3d (including to z=1) then project. I couldn't think of an in-principle reason why it would be slower for a line that needs to be scissored and figured I'd at least be able to limit my divide routine to an 8bit output, as well as throwing away lines that are completely off screen with no projection (albeit that I could have done that anyway, regardless of where I clip). RichTW wrote: I wonder if it'd possible to write better code than Elite? Somehow I have the feeling that those guys just got it absolutely right first time - it's incredible really, because these days there's so much to read about 3d transformations and their ilk, but back then they were real pioneers - I doubt there was so much written on the subject. I've always figured that, logically, someone should be able to do better now due to the huge range of literature and research that has occurred in the interim, but I am definitely not going to do better and the lack of any commercial reason for anyone to dedicate serious blocks of time to attempting to do so makes it improbable. To be honest, I'll be extremely happy if I'm able to come up with output that is a bit like Elite but only much, much worse. For anyone that wants to do better, there's a paper on run-slice line drawing with no divide operation from about 1994 that I can't find a free download. And I think Wu did some post-Bresenham work on line drawing as a finite-state automata that's meant to cut out cycles, but I think it's a much more elaborate algorithm and may end up costing too much for a BBC/Electron in pure bytes. I need to look that up. But I guess nothing about 3d transforms has advanced. Anyway, I'll keep at it, post something new if I ever write anything new. I'll definitely be a lot happier to mess around when I've got at least line plotting going. |
|
| Author: | ThomasHarte [ Thu Sep 17, 2009 8:44 pm ] |
| Post subject: | Re: Line drawing: the horror! |
Essentially the same code as before, with all of your suggested optimisations rolled in: Code: .SetTable equb 128, 64, 32, 16, 8, 4, 2, 1 .ScreenAddressPointersLow equb (ScrBuffer MOD 256), ((ScrBuffer+320) MOD 256), ((ScrBuffer+640) MOD 256), ((ScrBuffer+960) MOD 256) equb ((ScrBuffer+1280) MOD 256), ((ScrBuffer+1600) MOD 256), ((ScrBuffer+1920) MOD 256), ((ScrBuffer+2240) MOD 256) equb ((ScrBuffer+2560) MOD 256), ((ScrBuffer+2880) MOD 256), ((ScrBuffer+3200) MOD 256), ((ScrBuffer+3520) MOD 256) equb ((ScrBuffer+3840) MOD 256), ((ScrBuffer+4160) MOD 256), ((ScrBuffer+4480) MOD 256), ((ScrBuffer+4800) MOD 256) equb ((ScrBuffer+5120) MOD 256), ((ScrBuffer+5440) MOD 256), ((ScrBuffer+5760) MOD 256), ((ScrBuffer+6080) MOD 256) equb ((ScrBuffer+6400) MOD 256), ((ScrBuffer+6720) MOD 256), ((ScrBuffer+7040) MOD 256), ((ScrBuffer+7360) MOD 256) equb ((ScrBuffer+7680) MOD 256), ((ScrBuffer+8000) MOD 256), ((ScrBuffer+8320) MOD 256), ((ScrBuffer+8640) MOD 256) equb ((ScrBuffer+8960) MOD 256), ((ScrBuffer+9280) MOD 256), ((ScrBuffer+9600) MOD 256), ((ScrBuffer+9920) MOD 256) .ScreenAddressPointersHigh equb (ScrBuffer DIV 256), ((ScrBuffer+320) DIV 256), ((ScrBuffer+640) DIV 256), ((ScrBuffer+960) DIV 256) equb ((ScrBuffer+1280) DIV 256), ((ScrBuffer+1600) DIV 256), ((ScrBuffer+1920) DIV 256), ((ScrBuffer+2240) DIV 256) equb ((ScrBuffer+2560) DIV 256), ((ScrBuffer+2880) DIV 256), ((ScrBuffer+3200) DIV 256), ((ScrBuffer+3520) DIV 256) equb ((ScrBuffer+3840) DIV 256), ((ScrBuffer+4160) DIV 256), ((ScrBuffer+4480) DIV 256), ((ScrBuffer+4800) DIV 256) equb ((ScrBuffer+5120) DIV 256), ((ScrBuffer+5440) DIV 256), ((ScrBuffer+5760) DIV 256), ((ScrBuffer+6080) DIV 256) equb ((ScrBuffer+6400) DIV 256), ((ScrBuffer+6720) DIV 256), ((ScrBuffer+7040) DIV 256), ((ScrBuffer+7360) DIV 256) equb ((ScrBuffer+7680) DIV 256), ((ScrBuffer+8000) DIV 256), ((ScrBuffer+8320) DIV 256), ((ScrBuffer+8640) DIV 256) equb ((ScrBuffer+8960) DIV 256), ((ScrBuffer+9280) DIV 256), ((ScrBuffer+9600) DIV 256), ((ScrBuffer+9920) DIV 256) .GetTLAddressForXY { tya lsr a lsr a lsr a tay txa and #248 clc adc ScreenAddressPointersLow, y sta address lda ScreenAddressPointersHigh, y adc #0 sta address+1 rts } .DrawLine { ; plots a line from (x1, y1) to (x2, y2) ; clobbers everything .xplyplxmajor ; loop for x increasing, x difference greater than y difference ldx x1 ldy y1 jsr GetTLAddressForXY sec lda y2 sbc y1 bcc ynegative ; reprogram loop depending on whether deltay is negative sta y2 ; y2 = deltay ldx #&c8 stx yinc ; set yinc to be iny ldx #64 stx yincadc1+1 ldx #1 stx yincadc2+1 ; add 320 to the base address each time an 8x8 is stepped out of in the y direction ldx #8 stx ycomp+1 ; just stepped outside 8x8 block on a y increment if y is now 8 jmp yend .ynegative eor #255 ; negate deltay, since it needs to be an absolute value for the Bresenham loop adc #1 ; carry is clear from the calculation of deltay, sothis is safe sta y2 ; y2 = deltay ldx #&88 stx yinc ; set yinc to be dey ldx #&c0 stx yincadc1+1 ldx #&fe stx yincadc2+1 ; add -320 to the base address each time an 8x8 is stepped out of in the y direction ldx #255 stx ycomp+1 ; just stepped outside 8x8 block on a y decrement if y is now 255 .yend sec lda x2 sbc x1 sta x2 ; x2 = deltax lsr a ; temp0 = error sta temp0 lda y1 ; perform y = y1&7, set initial pixel mask and #7 tay lda x1 ; set initial pixel mask, using a quick table lookup and #7 tax lda SetTable, x sta pixelmask+1 ldx x2 ; x1 = pixel count stx x1 ldx #0 ; seed current pixel mask to 0 .pixelloopx0 ; add current pixel to pixel mask txa .pixelmask eor #128 tax ; subtract deltay from error sec lda temp0 sbc y2 bcs noyinc ; don't increment y if error >= 0 adc x2 ; add deltax to error - carry is clear, or we wouldn't even be here sta temp1 ; save a for later txa ; store pixel mask to memory, reset eor (address), y sta (address), y ldx #0 .yinc iny ; increment[/decrement] y tya .ycomp cmp #8 ; compare to 8[/255] to see if the y step has exited the block bne noyoverflow ; if y hasn't just jumped beyond its current 8x8 block then everything is fine and #7 tay clc ; if it has then add[/subtract] 320 to the storage address lda address .yincadc1 adc #64 sta address lda address+1 .yincadc2 adc #1 sta address+1 .noyoverflow lda temp1 .noyinc sta temp0 lsr pixelmask+1 ; shift the pixel mask right one, so that the next plot will be to the next pixel along bcc noxoverflow ; if x has not just crossed an 8 pixel boundary then there's no need to update the address txa ; store pixelbyte to memory, reset eor (address), y sta (address), y ldx #0 lda #128 ; reset pixel mask sta pixelmask+1 lda address ; add 8 to screen address adc #7 ; carry is always set, so this is equivalent to add 8 (kudos to RichTW for an optimisation I would not have spotted) sta address bcc noxoverflow inc address+1 .noxoverflow dec x1 ; decrease x count, go again if necessary bne pixelloopx0 txa ; store pixelbyte to memory, reset eor (address), y sta (address), y rts } Still no y major code, no actual check that x2 > x1. |
|
| Author: | RichTW [ Fri Sep 18, 2009 7:17 am ] |
| Post subject: | Re: Line drawing: the horror! |
I don't have time to reply in full right now, but do realise that in BeebAsm you can write stuff like this: Code: .ScreenAddressPointersLow FOR n, 0, 31 EQUB (&5800 + n * 320) AND 255 NEXT .ScreenAddressPointersHigh FOR n, 0, 31 EQUB (&5800 + n * 320) DIV 256 NEXT Definitely a time-saver, and rather clearer too! (You can also use functions like SIN and COS to build the appropriate tables). |
|
| Author: | RichTW [ Fri Sep 18, 2009 8:54 am ] |
| Post subject: | Re: Line drawing: the horror! |
I did have another question though - on the SAM version, how many bits precision were you using for your rotation matrix elements? It seems to me there are a few choices:
|
|
| Author: | ThomasHarte [ Sat Sep 19, 2009 12:03 am ] |
| Post subject: | Re: Line drawing: the horror! |
I'll respond to the multiplication stuff in the dedicated thread. For now, here's my current, complete exclusive-or Bresenham for 256x256 portions of 1bpp screens. No doubt further cycles could be shaved. Would probably need to take a step back. Or, if anyone else can spot anything then I'd be appreciative. The only 'new' things of interest are that I've implemented the y-major part of the routine, which works from bottom to top (because dey will set a flag when it exits an 8x8 flag going upward, but won't when going downwards), and am writing deltax and deltay directly into the inner loops rather than storing them on the zero page. Which is a couple more cycles saved per pixel. On the Electron in Mode 4 there are 64 useful cycles/scanline, so that's like saving almost 8 scanlines of processing on the longest possible line... definitely worth trying to squeeze everything out of this code. Code: .ScreenAddressPointersLow
FOR n, 0, 31 EQUB (ScrBuffer + n * 320) MOD 256 NEXT .ScreenAddressPointersHigh FOR n, 0, 31 EQUB (ScrBuffer + n * 320) DIV 256 NEXT .SetTable equb 128, 64, 32, 16, 8, 4, 2, 1 .GetTLAddressForXY { tya lsr a lsr a lsr a tay txa and #248 clc adc ScreenAddressPointersLow, y sta address lda ScreenAddressPointersHigh, y adc #0 sta address+1 rts } .DrawLine { ; plots a line from (x1, y1) to (x2, y2) ; clobbers everything ; decide whether x major or y major lda #0 sta temp2 sta temp3 ; temp2 will be the x negative flag, temp3 will be the y negative flag; assume both positive sec lda y2 sbc y1 bcs ypositive eor #255 ; negate deltay adc #1 ; carry is clear from the calculation of deltay, so this is safe sta temp3 ; store deltay in temp3 so that something non-zero is there .ypositive sta temp1 ; store deltay in temp1 sec lda x2 sbc x1 bcs xpositive eor #255 ; negate deltax adc #1 sta temp2 .xpositive sta temp0 ; store deltax in temp0 cmp temp1 ; compare to deltay bcs xmajor ; if deltax was larger, jump into the xmajor loop jmp ymajor ; otherwise, head on over to the ymajor .xmajor ; temp0 = abs(deltax), temp1 = abs(deltay), temp2 is non-zero if x direction is negative, temp3 is non-zero if y direction is negative { ; if here then x difference greater than y difference lda temp2 ; copy (x2, y2) to (x1, y1) if this line goes from right to left, to ensure that the loop need only ever go left to right beq noswap ldx x2 ; copy x2 to x1 stx x1 ldy y2 ; copy y2 to y1 sty y1 lda temp3 ; if y was flagged negative, flag it positive - and vice versa bne setypos inc temp3 jmp getaddr .setypos lda #0 sta temp3 .getaddr jsr GetTLAddressForXY ; this is very slightly faster than just rolling over into the code immediately below jmp addressgot .noswap ldx x1 ldy y1 jsr GetTLAddressForXY .addressgot lda temp3 bne ynegative ; reprogram loop depending on whether deltay is negative ldx #&c8 stx yinc ; set yinc to be iny ldx #64 stx yincadc1+1 ldx #1 stx yincadc2+1 ; add 320 to the base address each time an 8x8 is stepped out of in the y direction ldx #8 stx ycomp+1 ; just stepped outside 8x8 block on a y increment if y is now 8 jmp yend .ynegative ldx #&88 stx yinc ; set yinc to be dey ldx #&c0 stx yincadc1+1 ldx #&fe stx yincadc2+1 ; add -320 to the base address each time an 8x8 is stepped out of in the y direction ldx #255 stx ycomp+1 ; just stepped outside 8x8 block on a y decrement if y is now 255 .yend lda y1 ; y = y1&7 and #7 tay lda x1 ; set initial pixel mask, using a quick table lookup and #7 tax lda SetTable, x sta pixelmask+1 lda temp0 sta adcdeltax+1 ; fill in deltax inc temp0 ; temp0 = pixel count lsr a sta temp2 ; temp2 = error lda temp1 sta sbcdeltay+1 ; fill in deltay ldx #0 ; seed current pixel mask to 0 .pixelloop ; add current pixel to pixel mask txa .pixelmask ora #128 tax ; subtract deltay from error sec lda temp2 .sbcdeltay sbc #0 ; will be dynamically written in bcs noyinc ; don't increment y if error >= 0 .adcdeltax adc #0 ; add deltax to error - carry is clear, or we wouldn't even be here sta temp1 ; save a for later txa ; store pixel mask to memory, reset eor (address), y sta (address), y ldx #0 .yinc iny ; increment[/decrement] y tya .ycomp cmp #8 ; compare to 8[/255] to see if the y step has exited the block bne noyoverflow ; if y hasn't just jumped beyond its current 8x8 block then everything is fine and #7 tay clc ; if it has then add[/subtract] 320 to the storage address lda address .yincadc1 adc #64 sta address lda address+1 .yincadc2 adc #1 sta address+1 .noyoverflow lda temp1 .noyinc sta temp2 lsr pixelmask+1 ; shift the pixel mask right one, so that the next plot will be to the next pixel along bcc noxoverflow ; if x has not just crossed an 8 pixel boundary then there's no need to update the address txa ; store pixelbyte to memory, reset eor (address), y sta (address), y ldx #0 lda #128 ; reset pixel mask sta pixelmask+1 lda address ; add 8 to screen address adc #7 ; carry is always set, so this is equivalent to add 8 sta address bcc noxoverflow inc address+1 .noxoverflow dec temp0 ; decrease x count, go again if necessary bne pixelloop txa ; store pixelbyte to memory, reset eor (address), y sta (address), y rts } .ymajor ; temp0 = abs(deltax), temp1 = abs(deltay), temp2 is non-zero if x direction is negative, temp3 is non-zero if y direction is negative { lda temp3 ; copy (x2, y2) to (x1, y1) if this line goes from top to bottom, to ensure that the loop need only ever go bottom to top (yes, backwards) bne noswap ldx x2 ; copy x2 to x1 stx x1 ldy y2 ; copy y2 to y1 sty y1 lda temp2 ; if x was flagged negative, flag it positive - and vice versa bne setxpos inc temp2 jmp getaddr .setxpos lda #0 sta temp2 .getaddr jsr GetTLAddressForXY ; this is very slightly faster than just rolling over into the code immediately below jmp addressgot .noswap ldx x1 ldy y1 jsr GetTLAddressForXY .addressgot lda temp2 bne xnegative ; reprogram loop depending on whether deltax is negative lda #7 sta xadc+1 ; add 7(+1 = 8) to low byte to move across a column lda #&e6 sta xadc2 ; inc top byte... lda #&90 sta xbcc ; ...if carry lda #&4e sta pixelshift ; lsr to move from one pixel to the next lda #128 sta pixelreload+1 jmp xend .xnegative lda #&f7 sta xadc+1 ; subtract 9(-1 = 8) to low byte to move across a column lda #&c6 sta xadc2 ; dec top byte... lda #&b0 sta xbcc ; ...if not carry lda #&0e sta pixelshift ; asl to move from one pixel to the next lda #1 sta pixelreload+1 .xend lda y1 ; y = y1&7 and #7 tay lda x1 ; set initial pixel mask, using a quick table lookup and #7 tax lda SetTable, x sta pixelmask+1 lda temp0 sta subdeltax+1 ; write in deltax lda temp1 sta adddeltay+1 ; write in deltay inc temp1 ; temp1 = pixel count lsr a tax ; x = error .pixelloop ; add a pixel to the display lda (address),y .pixelmask eor #128 sta (address),y ; subtract deltax from error sec txa .subdeltax sbc #0 bcs noxinc ; don't increment x if error >= 0 .adddeltay adc #0 ; add deltay to error - carry is clear, or we wouldn't even be here .pixelshift asl pixelmask+1 bcc noxinc tax ; save a for later .pixelreload lda #128 sta pixelmask+1 ; carry is definitely set - following will be reprogrammed to add or subtract 8 lda address .xadc adc #7 sta address .xbcc bcc noad1inc .xadc2 inc address+1 .noad1inc txa .noxinc tax dey bpl noyoverflow clc ;add -320 to move up one line lda address adc #&c0 sta address lda address+1 adc #&fe sta address+1 ldy #7 .noyoverflow dec temp1 bne pixelloop .drawline rts } } |
|
| Page 1 of 1 | All times are UTC [ DST ] |
| Powered by phpBB © 2000, 2002, 2005, 2007 phpBB Group http://www.phpbb.com/ |
|