Adds G29 commands to register bed level points. When three points
are registered, the plane of the bed is calculated and dynamic bed
leveling takes effect.
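
For illustration, the plane calculation from three registered points can be sketched as below. This is a minimal sketch, not the code in this commit: the names point_t, plane_t, plane_from_points() and plane_z_at() are hypothetical, and double arithmetic is an assumption made for readability.

```c
#include <assert.h>

/* Hypothetical types for this sketch only. */
typedef struct { double x, y, z; } point_t;
typedef struct { double a, b, c, d; } plane_t;  /* a*x + b*y + c*z + d = 0 */

/* Build the plane through three points from the cross product of two
   edge vectors. Returns 1 on success, 0 if the points are collinear in
   the X-Y projection (no usable Z component in the normal). */
int plane_from_points(const point_t *p1, const point_t *p2,
                      const point_t *p3, plane_t *out) {
  double ux = p2->x - p1->x, uy = p2->y - p1->y, uz = p2->z - p1->z;
  double vx = p3->x - p1->x, vy = p3->y - p1->y, vz = p3->z - p1->z;

  out->a = uy * vz - uz * vy;
  out->b = uz * vx - ux * vz;
  out->c = ux * vy - uy * vx;
  out->d = -(out->a * p1->x + out->b * p1->y + out->c * p1->z);
  return out->c != 0.0;
}

/* Z height of the bed plane at position (x, y). */
double plane_z_at(const plane_t *pl, double x, double y) {
  assert(pl->c != 0.0);
  return -(pl->a * x + pl->b * y + pl->d) / pl->c;
}
```

With such a plane, a Z correction for any (x, y) target is the value returned by plane_z_at().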
Add a warning if bed-leveling is enabled when MAX_JERK_Z is zero.
In this case lookahead will always fail when bed-leveling is active
since the Z-axis is not allowed to move during lookahead.
We need the fastest axis instead of its steps.
Also eliminates an overflow when ACCELERATION > 596.
We save 118 bytes program and 2 bytes data.
Reviewer Traumflug's note: I see 100 bytes program and 32 bytes
RAM saving on ATmegas here. 16 and 32 on the LPC 1114. Either way:
great stuff!
Should be done for temptable in ThermistorTable.h, too, but this
would mess up existing users' configurations.
This tries to put emphasis on the fact that you have to read
these values with pgm_read_*() instead of just using the variable.
Unfortunately, the gcc compiler neither inserts PROGMEM reading
instructions automatically when reading data stored in flash,
nor does it complain or warn about the missing read instructions.
As such it's very easy to accidentally handle data stored in flash
just like normal data. It'll compile and work ... you just read
arbitrary data (often, but not always, zeros) instead of what you
intend.
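
The pitfall can be sketched as follows. This is an illustration only: demo_table_P, DEMO_TABLE_SIZE and demo_table_read() are hypothetical names, the _P suffix is an assumed naming convention, and the non-AVR fallback macros exist just so the sketch also compiles on a host.

```c
#include <assert.h>
#include <stdint.h>

#ifdef __AVR__
  #include <avr/pgmspace.h>
#else
  /* Host fallback (assumption for this sketch): flash and RAM share one
     address space there, so a plain read is sufficient. */
  #define PROGMEM
  #define pgm_read_word(addr) (*(const uint16_t *)(addr))
#endif

#define DEMO_TABLE_SIZE 4

/* Hypothetical table stored in flash; the _P suffix flags that. */
static const uint16_t demo_table_P[DEMO_TABLE_SIZE] PROGMEM = {
  100, 200, 300, 400
};

uint16_t demo_table_read(uint8_t index) {
  assert(index < DEMO_TABLE_SIZE);
  /* Wrong on AVR, yet compiles without a warning:
       return demo_table_P[index];
     This would read RAM at the table's flash address. */
  return pgm_read_word(&demo_table_P[index]);
}
```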
A generic implementation here will allow callers to pass the
target axis in as a parameter so the callers can also be made more
generic.
Traumflug notes:
Split out application of the new implementation in dda.c into its
own commit.
This actually costs 128 bytes, but as we can access axes from within
a loop now, I expect to get more savings elsewhere.
Interestingly, binary size is raised by another 18 bytes if
um_to_steps(int32_t, enum axis_e)
is changed to
um_to_steps(enum axis_e, int32_t)
even on the 8-bit ATmega. While putting the axis number to the
front might be a bit more logical (think of additional parameters,
the axis number position would move), NXP application note
AN10963 states on page 10ff, 16-bit data should be 16-bit aligned
and 32-bit data should be 32-bit aligned for best performance.
Well, so let's do it this way.
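
A generic conversion with the axis as the second parameter might look roughly like this. It is a sketch under assumptions, not the committed code: the enum values, the steps_per_m table and its numbers are hypothetical, and the 64-bit intermediate is used only to keep the example short.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed axis enumeration for this sketch. */
enum axis_e { X = 0, Y, Z, E, AXIS_COUNT };

/* Hypothetical steps-per-meter values, one entry per axis. */
static const uint32_t steps_per_m[AXIS_COUNT] = {
  80000, 80000, 400000, 96000
};

/* Convert a distance in micrometers to steps on the given axis.
   The 32-bit value comes first, the axis second, matching the
   parameter order discussed above. Truncates toward zero. */
int32_t um_to_steps(int32_t distance_um, enum axis_e axis) {
  assert(axis < AXIS_COUNT);
  return (int32_t)((int64_t)distance_um * steps_per_m[axis] / 1000000);
}
```

Callers can then loop over the axes, for example for (i = X; i < AXIS_COUNT; i++), instead of repeating the same code per axis.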
This macro is pretty expensive (700 bytes, well, stuff is now
calculated at runtime), so there's no chance to use it in multiple
places and we likely also need this in dda_lookahead.c to achieve
full 4 axis compatibility there.
This 1/sqrt(x) implementation is a 12-bit fixed-point implementation
and a bit faster than a 32-bit divide (it takes about 11% less time
to complete) and could be even faster if one requires only 8 bits.
Also, precision starts getting poor for big values of n which are
likely to be required by small acceleration values.
Implementation by Roland Brochard <zuzuf86@gmail.com>.
Note: If you wonder how code doing multiplications can be faster than
code doing just shifts and increments: I've measured it. One million
square roots in 30 seconds with the new code instead of 220 seconds
with the old code on a Gen7 20 MHz. That's just 30 microseconds or
600 CPU cycles per root.
Code used for the measurement (by a stopwatch) in mendel.c:
...
#include "dda_maths.h"
#include "delay.h"
int main (void)
{
  uint32_t i, j;
  serial_init();
  sei();
  serial_writestr_P(PSTR("start\n"));
  for (i = 0; i < 1000000; i++) {
    j = int_sqrt(i);
  }
  serial_writestr_P(PSTR("done\n"));
  delay_ms(20);
  cli();
  init();
...
--Traumflug
This is a version of muldiv() with qn and rn precalculated,
so they need not be recalculated on every call.
Yet another 116 bytes, unfortunately.
We have multiplies followed by divides all over the place and
most of them are difficult to handle regarding overflows. This
new algorithm handles this fine in all cases, as long as all
three operands and the overall result fit into 32 bits.
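
One way to do an overflow-safe multiply followed by a divide is a shift-and-add loop over the bits of the multiplicand, carrying quotient and remainder separately. The following is a sketch of that idea, not necessarily the committed code: the function names are hypothetical, results are truncated toward zero, and the divisor is assumed to be at most 2^31 so that doubling a remainder cannot overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Compute multiplicand * multiplier / divisor, where the multiplier is
   given as qn = multiplier / divisor and rn = multiplier % divisor.
   No intermediate exceeds 32 bits as long as the result fits. */
int32_t muldiv_qr_sketch(int32_t multiplicand, uint32_t qn, uint32_t rn,
                         uint32_t divisor) {
  assert(divisor != 0 && rn < divisor);

  uint8_t negative = multiplicand < 0;
  uint32_t m = negative ? 0u - (uint32_t)multiplicand
                        : (uint32_t)multiplicand;
  uint32_t quotient = 0, remainder = 0;

  while (m) {
    if (m & 1) {
      /* Add the contribution of this bit. */
      quotient += qn;
      remainder += rn;
      if (remainder >= divisor) {
        quotient++;
        remainder -= divisor;
      }
    }
    m >>= 1;
    /* Double (qn, rn) for the next bit, keeping rn < divisor. */
    qn <<= 1;
    rn <<= 1;
    if (rn >= divisor) {
      qn++;
      rn -= divisor;
    }
  }
  return negative ? -(int32_t)quotient : (int32_t)quotient;
}

/* Convenience wrapper that derives qn and rn on each call. */
int32_t muldiv_sketch(int32_t multiplicand, uint32_t multiplier,
                      uint32_t divisor) {
  assert(divisor != 0);
  return muldiv_qr_sketch(multiplicand, multiplier / divisor,
                          multiplier % divisor, divisor);
}
```

When multiplier and divisor are constants, qn and rn can be computed once and passed to the first function directly, which is the saving a precalculated variant aims for.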