Divides on the ARM7TDMI

Piglet · Jun 9, 2008

The ARM7TDMI is a great processor. My only gripe is that there is no divide instruction. Now, The inner-loop of a divide in ARM7TDMI code is 4 instructions so even if unrolled then it's going to take a Loooonnnggg time. OK, I know most people doing 3D on the thing will point out that drawing is the biggest overhead (especially with textures) but I'm looking at CELP speech decoding and for my purposes, it IS most decidedly the slowest part.

I mean, you can find the MSB and index your jump into the code (in fact, if you do this, maybe you can have a 3-instructions per bit method) but even allowing for this, it may take 96 cycles + setup overheads.

I did consider using a log/antilog table because I don't need totally perfect accuracy. Maybe combining the two will be fast and accurate....

input, anyone?

Sean;-)

Piglet · Jun 9, 2008

Here is the GNU implementation

; Enter with numbers in Ra and Rb.
MOV Rcnt,#1 ; Bit to control the division.
Div1

CMP Rb,#0x80000000 ; Move Rb until greater than Ra.
CMPCC Rb,Ra
MOVCC Rb,Rb,ASL#1
MOVCC Rcnt,Rcnt,ASL#1
BCC Div1
MOV Rc,#0

Div2
CMP Ra,Rb ; Test for possible subtraction.
SUBCS Ra,Ra,Rb ; Subtract if ok,
ADDCS Rc,Rc,Rcnt ; Put relevant bit into result
MOVS Rcnt,Rcnt,LSR#1 ; Shift control bit
MOVNE Rb,Rb,LSR#1 ; Halve unless finished.
BNE Div2 ; Divide result in Rc, remainder in Ra.

The Div1 loop can be replaced with a matrix of:

CMP Ra,Rb,ASL (x)

with branches

The Div2 loop can similarly loose the CMP Ra,RB. Since it's unrolled and you branch into it at the right place, you can also loos MOVS Rcnt,Rcnt,LSR#1, the MOVNO Rb,Rb,LSR #1 and the BNE Div2

Therefore you end up with a lot of:

AND rd,rb,#1 LSL (x)
ADD rc,rc,rd

So, instead if shifting, you bit-test Rb and place the result into the NEW register Rd. You would need 31 of these for a 16.16/16.16 divide and some normalisation at the end. I imagine a divide would then take a fixed amount of time. About 80 cycles at a guess...

Piglet · Jun 9, 2008

Sorry, one one point I was talking rubbish. The number of cycles is NOT fixed. 80 would be maximum since it branches into the unrolled and pared down Div2 'loop'. Later tonight I will get it working (on paper at least) so I will be able to give a proper cycle-count and such.

Of course, since the multiply gives a 64 bit result (if you want), then for most applications, the reSPAMSPAMSPAMSPAMSPAMcal of the values can be used and then only multiplies are needed.

I remember talking to a representative of the 3DFX 3-D card manufacturers who explained that rather than a Z buffer, they used a W buffer, which was 1/Z.... so you multiply UP rather than divide down to find the Z...

This was about 14 years ago and I know I was impressed at the time...

marshallh · Jun 10, 2008

The W-buffer is actually a very good idea and concept. I don't know why it wasn't adopted more widely, I suppose it was probably too late in the development of the modern graphics pipline.

Direct3D supported it and tried to mainstream it, I used it in the software reference rasterizer mode and it totally eliminated z-fighting (rounding errors and loss of precision in the zbuffer causing objects to overlap)

If I was doing this, I'd probably just use a lookup table and some rough linear interpolation.

DevL · Jun 15, 2008

are we talking in DS context here ?
Hint May be we can count on Co-Processor in divide...

Piglet · Jun 15, 2008

I'm thinking of the GBA with it's ARM7TDMI... so no FPU.

DevL · Jun 20, 2008

check the site http://www.btinternet.com/~ahcox/GBA/index.html with links on Division:

Piglet · Jun 24, 2008

Thanks for the link, it's most helpful. Can anyone comment on the source I produced? Can it be made faster, for example? I know it doesn't give a remainder, but a quick multuply and subtract will do that, one assumes...

blasty · Aug 20, 2008

There are also a bunch of SWI's for division if I recall correctly. Dunno how costy those are. And try to avoid floats at all costs, since there's no FPU.

Divides on the ARM7TDMI

Piglet Spirited Member

Piglet Spirited Member

Piglet Spirited Member

marshallh N64 Coder

DevL Robust Member

Piglet Spirited Member

DevL Robust Member

Piglet Spirited Member

blasty Member

Share This Page

Divides on the ARM7TDMI

Piglet Spirited Member

Piglet Spirited Member

Piglet Spirited Member

marshallh N64 Coder

DevL Robust Member

Piglet Spirited Member

DevL Robust Member

Piglet Spirited Member

blasty Member

Share This Page

Useful Searches