Fast multiplication routines

From Retrosoftware

(Difference between revisions)
Jump to: navigation, search
m
m
Line 1: Line 1:
-
Here's a little trick I'd never come across before: doubtless it's well known, but thought I'd share the goodies. Normally, multiplication of two 8-bit numbers can be achieved with a little routine such as this:
+
''[[SampleCodeLibrary|Sample Code Library]] > Fast multiplication routines''
 +
 
 +
 
 +
= Fast multiplication routines in 6502 =
 +
 
 +
This is a copy of an article I posted to the Beeb mailing list yonks ago - perhaps it's useful to someone...?
 +
 
 +
 
 +
== Multiplication by iteration and shifting ==
 +
 
 +
Normally, multiplication of two 8-bit numbers can be achieved with a little routine such as this:
<tt>
<tt>
Line 26: Line 36:
This completes in an average of 113 cycles (excluding the multiply by zero special case) - not bad, but it's possible to do better, provided you're happy to set aside some space for some tables...
This completes in an average of 113 cycles (excluding the multiply by zero special case) - not bad, but it's possible to do better, provided you're happy to set aside some space for some tables...
 +
 +
== Faster multiplication by table lookup ==
There is a method which yields the product of two values from the difference between two squares.
There is a method which yields the product of two values from the difference between two squares.

Revision as of 07:20, 28 June 2008

Sample Code Library > Fast multiplication routines


Fast multiplication routines in 6502

This is a copy of an article I posted to the Beeb mailing list yonks ago - perhaps it's useful to someone...?


Multiplication by iteration and shifting

Normally, multiplication of two 8-bit numbers can be achieved with a little routine such as this:

.multiply
\ Multiplies A*X, and stores result in product/product+1

CPX #0:BEQ zero
DEX:STX product+1
LSR A:STA product
LDA #0
BCC s1:ADC product+1:.s1 ROR A:ROR product
BCC s2:ADC product+1:.s2 ROR A:ROR product
BCC s3:ADC product+1:.s3 ROR A:ROR product
BCC s4:ADC product+1:.s4 ROR A:ROR product
BCC s5:ADC product+1:.s5 ROR A:ROR product
BCC s6:ADC product+1:.s6 ROR A:ROR product
BCC s7:ADC product+1:.s7 ROR A:ROR product
BCC s8:ADC product+1:.s8 ROR A:ROR product
STA product+1
RTS
.zero
STX product:STX product+1
RTS

This completes in an average of 113 cycles (excluding the multiply by zero special case) - not bad, but it's possible to do better, provided you're happy to set aside some space for some tables...


Faster multiplication by table lookup

There is a method which yields the product of two values from the difference between two squares.

Mathematically:

 (a+b)^2 = a^2 + b^2 + 2ab     --> (I)
 (a-b)^2 = a^2 + b^2 - 2ab     --> (II)

(I) minus (II) gives:

 (a+b)^2 - (a-b)^2 = 4ab

or, in other words:

       (a+b)^2     (a-b)^2
 ab =  -------  -  -------
          4           4

So this means we can store a table of f(n) = n^2 / 4 for n = 0..510, and then can achieve multiplication via a single 16-bit subtract! In reality, we use 4 lookup tables; 2 for the LSB/MSBs of the 16-bit value when n is less than 256, and a further 2 for when n is 256 or greater.

Curiously, the integer division by 4 - and hence, truncation of the values stored in the tables - doesn't appear to sacrifice any accuracy. It's as if there's never any 'borrow' from bit 2 to bit 1 in any of the subtractions it ever performs, but I haven't investigated why this should be. Perhaps someone else can explain? I honestly expected to have to implement some fiddle factor to make up for truncation when I was writing this, and yet it seems totally unnecessary.

Here's some code:

 10 sqrlo256%=&900
 20 sqrlo512%=&A00
 30 sqrhi256%=&B00
 40 sqrhi512%=&C00
 50 :
 60 FOR N%=0 TO 255
 70 s256%=(N%*N%) DIV 4
 80 s512%=((N%+256)*(N%+256)) DIV 4
 90 N%?sqrlo256%=s256% AND 255
100 N%?sqrhi256%=s256% DIV 256
110 N%?sqrlo512%=s512% AND 255
120 N%?sqrhi512%=s512% DIV 256
130 NEXT
140 :
150 num1=&70:num2=&71:result=&72
160 FOR N%=0 TO 2 STEP 2:P%=&700:[OPT N%
170 .mult
180 SEC:LDA num1:SBC num2:BCS positive
190 EOR #255:ADC #1:.positive TAY
200 CLC:LDA num1:ADC num2:TAX
210 BCS morethan256:SEC
220 LDA sqrlo256%,X:SBC sqrlo256%,Y:STA result
230 LDA sqrhi256%,X:SBC sqrhi256%,Y:STA result+1
240 RTS
250 .morethan256
260 LDA sqrlo512%,X:SBC sqrlo256%,Y:STA result
270 LDA sqrhi512%,X:SBC sqrhi256%,Y:STA result+1
280 RTS
290 ]:NEXT
300 :
310 !result=0
320 REPEAT
330 ?num1=RND(256)-1:?num2=RND(256)-1:CALL mult
340 PRINT;?num1;"*";?num2;"=";!result;" (";?num1*?num2;")"
350 IF ?num1*?num2<>!result:END
360 UNTIL FALSE

So there's a routine which achieves an 8 bit * 8 bit multiplication in just 55 cycles - over twice the speed of the original routine. The only problem with this is the amount of table space required - 1k.

It turns out that we can reduce the table space by 256 bytes by unifying sqrlo256 and sqrlo512. They are virtually the same, apart from that sqrlo512(n) = sqrlo256(n) EOR &80 when n is odd.

Which leads us to the following space optimisation:

 10 sqrlo%=&900
 20 sqrhi256%=&A00
 30 sqrhi512%=&B00
 40 :
 50 FOR N%=0 TO 255
 60 s256%=(N%*N%) DIV 4
 70 s512%=((N%+256)*(N%+256)) DIV 4
 80 N%?sqrlo%=s256% AND 255
 90 N%?sqrhi256%=s256% DIV 256
100 N%?sqrhi512%=s512% DIV 256
110 NEXT
120 :
130 num1=&70:num2=&71:result=&72
140 FOR N%=0 TO 2 STEP 2:P%=&700:[OPT N%
150 .mult
160 SEC:LDA num1:SBC num2:BCS positive
170 EOR #255:ADC #1:.positive TAY
180 CLC:LDA num1:ADC num2:TAX
190 BCS morethan256
200 LDA sqrhi256%,X:STA result+1
210 LDA sqrlo%,X:SEC:BCS lessthan256
220 .morethan256
230 LDA sqrhi512%,X:STA result+1
240 TXA:AND #1:BEQ skip:LDA #&80:.skip
250 EOR sqrlo%,X:.lessthan256
260 SBC sqrlo%,Y:STA result
270 LDA result+1:SBC sqrhi256%,Y:STA result+1
280 RTS
290 ]:NEXT
300 :
310 !result=0
320 REPEAT
330 ?num1=RND(256)-1:?num2=RND(256)-1:CALL mult
340 PRINT;?num1;"*";?num2;"=";!result;" (";?num1*?num2;")"
350 IF ?num1*?num2<>!result:END
360 UNTIL FALSE

...which is an average of about 67 cycles, i.e. still substantially better than the original routine.

I wish I'd thought of this trick when I was writing multiplication-intensive demos years ago...