This 1/sqrt(x) implementation is a 12 bits fixed point implementation
and a bit faster than a 32 bits divide (it takes about 11% less time
to complete) and could be even faster if one requires only 8 bits.
Also, precision starts getting poor for big values of n which are
likely to be required by small acceleration values.
Implementation by Roland Brochard <zuzuf86@gmail.com>.
Note: If you wonder how code doing multiplications can be faster than
code doing just shifts and increments: I've measured it. One million
square roots in 30 seconds with the new code instead of 220 seconds
with the old code on a Gen7 20 MHz. That's just 30 microseconds or
600 CPU cycles per root.
Code used for the measurement (by a stopwatch) in mendel.c:
...
*include "dda_maths.h"
*include "delay.h"
int main (void)
{
uint32_t i, j;
serial_init();
sei();
serial_writestr_P(PSTR("start\n"));
for (i = 0; i < 1000000; i++) {
j = int_sqrt(i);
}
serial_writestr_P(PSTR("done\n"));
delay_ms(20);
cli();
init();
...
--Traumflug
This is a version of muldiv() with qn and rn precalculated,
so it can be avoided to re-calclulate it on every instance.
Yet another 116 bytes, unfortunately.
We have multiplies followed by divides all over the place and
most of them are difficult to handle regarding overflows. This
new algorithm handles this fine in all cases, as long as all
three operators and the overall result fits into 32 bits.