Implementation by Roland Brochard <zuzuf86@gmail.com>.
Note: If you wonder how code doing multiplications can be faster than
code doing just shifts and increments: I've measured it. One million
square roots in 30 seconds with the new code instead of 220 seconds
with the old code on a Gen7 20 MHz. That's just 30 microseconds or
600 CPU cycles per root.
Code used for the measurement (by a stopwatch) in mendel.c:
...
#include "dda_maths.h"
#include "delay.h"
int main (void)
{
  uint32_t i, j;
  serial_init();
  sei();
  serial_writestr_P(PSTR("start\n"));
  for (i = 0; i < 1000000; i++) {
    j = int_sqrt(i);
  }
  serial_writestr_P(PSTR("done\n"));
  delay_ms(20);
  cli();
  init();
...
--Traumflug
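To illustrate how a multiplication can replace a chain of shifts and
increments, here is a minimal sketch of a bitwise square root that
decides each result bit with one trial multiplication. It is not
necessarily the code that was measured; the name isqrt_sketch() is made
up for this example, and the sketch says nothing about timing on an AVR.

```c
#include <stdint.h>

/* Hypothetical helper, not the firmware's int_sqrt(): returns the
   largest x with x * x <= a. */
uint16_t isqrt_sketch(uint32_t a)
{
  uint16_t x = 0;
  uint16_t bit;

  /* Try result bits from the most significant one downwards. */
  for (bit = 0x8000; bit; bit >>= 1) {
    uint16_t trial = x | bit;

    /* One 16 x 16 -> 32 bit multiplication decides whether the bit stays. */
    if ((uint32_t)trial * trial <= a)
      x = trial;
  }
  return x;
}
```

The loop always runs 16 times, so the run time of this sketch does not
depend on the input value.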
This is a version of muldiv() that takes qn and rn precalculated,
so they don't have to be recalculated on every call.
It costs yet another 116 bytes, unfortunately.
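A minimal sketch of how such a split could look, assuming that qn is
the integer quotient multiplier / divisor and rn the remainder
multiplier % divisor. The names muldiv_qr_sketch() and muldiv_sketch()
are made up for this example, and the sketch assumes a non-zero divisor
below 2^31 and a result that fits into 32 bits.

```c
#include <stdint.h>

/* Hypothetical sketch: computes multiplicand * multiplier / divisor,
   rounded to nearest (an exact half rounds towards zero), without ever
   forming the full product. Assumes qn = multiplier / divisor,
   rn = multiplier % divisor and 0 < divisor < 2^31. */
int32_t muldiv_qr_sketch(int32_t multiplicand, uint32_t qn, uint32_t rn,
                         uint32_t divisor)
{
  uint32_t quotient = 0;
  uint32_t remainder = 0;
  int negative = multiplicand < 0;
  /* Work on the magnitude; unsigned negation is safe for INT32_MIN. */
  uint32_t m = negative ? 0u - (uint32_t)multiplicand
                        : (uint32_t)multiplicand;

  while (m) {
    if (m & 1) {
      /* Add the current multiple of (multiplier / divisor). */
      quotient += qn;
      remainder += rn;
      if (remainder >= divisor) {
        quotient++;
        remainder -= divisor;
      }
    }
    m >>= 1;
    /* Double qn and rn for the next bit, keeping rn < divisor. */
    qn <<= 1;
    rn <<= 1;
    if (rn >= divisor) {
      qn++;
      rn -= divisor;
    }
  }

  if (remainder > divisor / 2)
    quotient++;

  return negative ? -(int32_t)quotient : (int32_t)quotient;
}

/* Convenience wrapper that calculates qn and rn on every call. */
int32_t muldiv_sketch(int32_t multiplicand, uint32_t multiplier,
                      uint32_t divisor)
{
  return muldiv_qr_sketch(multiplicand, multiplier / divisor,
                          multiplier % divisor, divisor);
}
```

A caller that uses the same multiplier and divisor many times can
compute qn and rn once and call muldiv_qr_sketch() directly, which
saves one division and one modulo per call.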
We have multiplications followed by divisions all over the place and
most of them are difficult to protect against overflows. This
new algorithm handles this fine in all cases, as long as all
three operands and the overall result fit into 32 bits.
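A worked example of the overflow problem, as a sketch: 100000 * 100000
/ 100000 should give 100000, but the intermediate product 10^10 does
not fit into 32 bits and wraps around to 1410065408. The function names
are made up for this example, and the 64-bit version serves only as a
reference for comparison; it is not the algorithm described above.

```c
#include <stdint.h>

/* Naive version: the 32-bit product wraps around before the division. */
uint32_t muldiv_naive32(uint32_t a, uint32_t b, uint32_t c)
{
  return a * b / c;
}

/* Reference version: a 64-bit intermediate product avoids the wrap. */
uint32_t muldiv_wide64(uint32_t a, uint32_t b, uint32_t c)
{
  return (uint32_t)((uint64_t)a * b / c);
}
```

With a = b = c = 100000, muldiv_naive32() returns 14100 while
muldiv_wide64() returns the expected 100000, although all three
operands and the result fit into 32 bits comfortably.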