We need the fastest axis instead of its steps.
Eleminates also an overflow when ACCELERATION > 596.
We save 118 bytes program and 2 bytes data.
Reviewer Traumflug's note: I see 100 bytes program and 32 bytes
RAM saving on ATmegas here. 16 and 32 on the LPC 1114. Either way:
great stuff!
Should be done for temptable in ThermistorTable.h, too, but this
would mess up an existing users' configuration.
This tries to put emphasis on the fact that you have to read
these values with pgm_read_*() instead of just using the variable.
Unfortunately, gcc compiler neither inserts PROGMEM reading
instructions automatically when reading data stored in flash,
nor does it complain or warn about the missing read instructions.
As such it's very easy to accidently handle data stored in flash
just like normal data. It'll compile and work ... you just read
arbitrary data (often, but not always zeros) instead of what you
intend.
A generic implementation here will allow callers to pass the
target axis in as a parameter so the callers can also be made more
generic.
Traumflug notes:
Split out application of the new implementation in dda.c into its
own commit.
This actually costs 128 bytes, but as we can access axes from within
a loop now, I expect to get more savings elsewhere.
Interestingly, binary size is raised by another 18 bytes if
um_to_steps(int32_t, enum axis_e)
is changed to
um_to_steps(enum axis_e, int32_t)
even on the 8-bit ATmega. While putting the axis number to the
front might be a bit more logical (think of additional parameters,
the axis number position would move), NXP application note
AN10963 states on page 10ff, 16-bit data should be 16-bit aligned
and 32-bit data should be 32-bit aligned for best performance.
Well, so let's do it this way.
Test code which wants to customize config.h can do so without
touching config.h itself by wrapping config.h in a macro variable
which is passed in to the compiler. It defaults to "config.h" if
no override is provided.
This change would break makefile dependency checking since the selection
of a different header file on the command line is not noticed by make
as a build-trigger. To solve this, we add a layer to the BUILDDIR path
so build products are now specific to the USER_CONFIG choice if it is
not "config.h".
This macro is pretty expensive (700 bytes, well, stuff is now
calculated at runtime), so there's no chance to use it in multiple
places and we likely also need this in dda_lookahead.c to achieve
full 4 axis compatibility there.
For now this is for the initial rampup calculation, only, notably
for moving the Z axis (which else gets far to few rampup steps on
a typical mendel-like printer).
The used macro was verified with this test code (in mendel.c):
[...]
int main (void) {
init();
uint32_t speed, spm;
char string[128];
for (spm = 2000; spm < 4099000; spm <<= 1) {
for (speed = 11; speed < 65536; speed *= 8) {
sersendf_P(PSTR("spm = %lu speed %lu ==> macro %lu "),
spm, speed, ACCELERATE_RAMP_LEN_SPM(speed, spm));
delay_ms(10);
sprintf(string, "double %f\n",
(double)speed * (double)speed / ((double)7200000 * (double)ACCELERATION / (double)spm));
serial_writestr((uint8_t *)string);
delay_ms(10);
}
}
[...]
Note: to link the test code, this linker flag is required to add
the full printf library (which does print doubles):
LDFLAGS += -Wl,-u,vfprintf -lprintf_flt -lm
His implementation was done on every step and as it turns out,
the very same maths works just fine in the clock interrupt.
Reason for the clock interrupt is: it allows about 3 times
higher step rates.
This strategy is not only substantially faster, but also
a bit smaller.
One funny anecdote: the acceleration initialisation value, C0,
was taken from elsewhere in the code as-is. Still it had to be
adjusted by a factor of sqrt(2) to now(!) match the physics
formulas and to get ramps reasonably matching the prediction
(and my pocket calculator). Apparently the code before
accumulated enough rounding errors to compensate for the
wrong formula.
This 1/sqrt(x) implementation is a 12 bits fixed point implementation
and a bit faster than a 32 bits divide (it takes about 11% less time
to complete) and could be even faster if one requires only 8 bits.
Also, precision starts getting poor for big values of n which are
likely to be required by small acceleration values.
This means, modify existing code to let the lookahead algorithms
do their work. It also means to remove some unused code in
dda_lookahead.c and reordering some code to make it work with
LOOKAHEAD undefined.
This is a version of muldiv() with qn and rn precalculated,
so it can be avoided to re-calclulate it on every instance.
Yet another 116 bytes, unfortunately.
This gets rid of overflows at micrometer to step conversion as
much as possible within 31 bits. It also opens the door to get
STEPS_PER_M configurable at runtime.
This also costs 290 bytes, unfortunately.
We have multiplies followed by divides all over the place and
most of them are difficult to handle regarding overflows. This
new algorithm handles this fine in all cases, as long as all
three operators and the overall result fits into 32 bits.