Previously some features were excluded based on whether SIMULATOR
was defined. But in fact these should have been included when __AVR__
was defined. These used to be the same thing, but now with ARM coming
into the picture, they are not. Fix the situation so AVR includes are
truly only used when __AVR__ is defined.
The _crc16_update function appears to be specific to AVR; I've kept the
alternate implementation limited to AVR in that case in crc.c. I think
this is the right thing to do, but I am not sure. Maybe ARM has some
equivalent function in their libraries.
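A minimal sketch of how such a guard could look. The fallback follows the C equivalent which the avr-libc documentation gives for _crc16_update() (polynomial 0xA001). The names crc16_update_portable() and crc_update() are made up for this illustration and are not taken from crc.c.

```c
#include <stdint.h>

#ifdef __AVR__
  /* Only AVR toolchains ship this header. */
  #include <util/crc16.h>
#endif

/* Portable bitwise CRC-16 step (reflected polynomial 0xA001), modelled
   on the C equivalent shown in the avr-libc documentation. */
static uint16_t crc16_update_portable(uint16_t crc, uint8_t data) {
  uint8_t i;

  crc ^= data;
  for (i = 0; i < 8; i++) {
    if (crc & 1)
      crc = (crc >> 1) ^ 0xA001;
    else
      crc >>= 1;
  }
  return crc;
}

/* Hypothetical wrapper: use the avr-libc routine where it exists,
   the portable one everywhere else (ARM, simulator on a PC, ...). */
static uint16_t crc_update(uint16_t crc, uint8_t data) {
#ifdef __AVR__
  return _crc16_update(crc, data);
#else
  return crc16_update_portable(crc, data);
#endif
}
```

Keying on __AVR__ rather than on SIMULATOR means an ARM build takes the portable path without any additional define.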
The trick is to use doubles earlier. As these calculations are
optimised out anyway, binary size and performance are unchanged.
Verified to have an identical outcome on a few common steps/mm and
acceleration cases.
... instead of trying to fire an interrupt as quickly as possible.
This affects ACCELERATION_TEMPORAL only. It almost doubles the
achievable step rate. Measured maximum step rate (X axis only,
100 mm moves) is 40'000 steps/s on 16 MHz electronics, so
approx. 50'000 steps/s on a 20 MHz controller, which is even
a bit faster than the ACCELERATION_RAMPING algorithm.
Tests with temporary test code were run and judging by these
tests, clock interrupts are now very reliable up to the point
where processing speed is simply exhausted.
Performance with ACCELERATION_RAMPING: this costs 10 bytes of
binary size and exactly 2 clock cycles per step interrupt, or
about 0.6% performance. We could avoid this with a lot
of #ifdefs, but considering ACCELERATION_TEMPORAL will one
day be the default acceleration, skip these #ifdefs, also
for better code readability.
$ cd testcases
$ ./run-in-simulavr.sh short-moves.gcode smooth-curves.gcode triangle-odd.gcode
SIZES          ATmega...  '168   '328(P)   '644(P)     '1280
FLASH : 20528 bytes       144%       67%       33%       16%
RAM   :  2188 bytes       214%      107%       54%       27%
EEPROM:    32 bytes         4%        2%        2%        1%
short-moves.gcode statistics:
LED on occurences: 838.
LED on time minimum: 304 clock cycles.
LED on time maximum: 715 clock cycles.
LED on time average: 310.717 clock cycles.
smooth-curves.gcode statistics:
LED on occurences: 8585.
LED on time minimum: 309 clock cycles.
LED on time maximum: 712 clock cycles.
LED on time average: 360.051 clock cycles.
triangle-odd.gcode statistics:
LED on occurences: 1636.
LED on time minimum: 304 clock cycles.
LED on time maximum: 710 clock cycles.
LED on time average: 332.32 clock cycles.
Performance for ACCELERATION_RAMPING unchanged:
$ cd testcases
$ ./run-in-simulavr.sh short-moves.gcode smooth-curves.gcode triangle-odd.gcode
[...]
SIZES          ATmega...  '168   '328(P)   '644(P)     '1280
FLASH : 20518 bytes       144%       67%       33%       16%
RAM   :  2188 bytes       214%      107%       54%       27%
EEPROM:    32 bytes         4%        2%        2%        1%
short-moves.gcode statistics:
LED on occurences: 838.
LED on time minimum: 302 clock cycles.
LED on time maximum: 713 clock cycles.
LED on time average: 308.72 clock cycles.
smooth-curves.gcode statistics:
LED on occurences: 8585.
LED on time minimum: 307 clock cycles.
LED on time maximum: 710 clock cycles.
LED on time average: 358.051 clock cycles.
triangle-odd.gcode statistics:
LED on occurences: 1636.
LED on time minimum: 302 clock cycles.
LED on time maximum: 708 clock cycles.
LED on time average: 330.322 clock cycles.
Purely cosmetic change.
Performance check:
$ cd testcases
$ ./run-in-simulavr.sh short-moves.gcode smooth-curves.gcode triangle-odd.gcode
[...]
SIZES          ATmega...  '168   '328(P)   '644(P)     '1280
FLASH : 20518 bytes       144%       67%       33%       16%
RAM   :  2188 bytes       214%      107%       54%       27%
EEPROM:    32 bytes         4%        2%        2%        1%
short-moves.gcode statistics:
LED on occurences: 838.
LED on time minimum: 302 clock cycles.
LED on time maximum: 713 clock cycles.
LED on time average: 308.72 clock cycles.
smooth-curves.gcode statistics:
LED on occurences: 8585.
LED on time minimum: 307 clock cycles.
LED on time maximum: 710 clock cycles.
LED on time average: 358.051 clock cycles.
triangle-odd.gcode statistics:
LED on occurences: 1636.
LED on time minimum: 302 clock cycles.
LED on time maximum: 708 clock cycles.
LED on time average: 330.322 clock cycles.
Forgotten in commit 74808610c7,
"DDA: Move axis calculations into loops, part 5.".
This and the previous commit make ACCELERATION_TEMPORAL build
(and work!) again.
Next time, please at least try to compile the code section in
question when explicitly changing the section. In this case,
with ACCELERATION_TEMPORAL enabled. It didn't build.
Was broken with commit 95926a3f113809bde8ff0c84b94c55c73e398f67,
"DDA: Rename confusing variable name.".
It was certainly a good idea, but it was also always suspected of
causing malfunctions and, as such, almost never used. Newer code
organisation moves most of the code behind it to dda_clock()
anyway, so it also became mostly obsolete.
Rest In Peace, STEP_INTERRUPT_INTERRUPTIBLE, you were the subject
of quite a number of interesting discussions and investigations.
Changes for Configtool by jbernardis <jeff.bernardis@gmail.com>
As we can always only move towards one end of an axis, one common
variable to count debouncing is sufficient.
Binary size 12 bytes smaller (and faster).
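A rough sketch of the single-counter idea. DEBOUNCE_STEPS, debounce_count and endstop_debounced() are hypothetical names for this illustration; the actual firmware code is organised differently.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEBOUNCE_STEPS 4  /* hypothetical threshold, not a firmware value */

/* One counter for all endstops: a move heads towards only one end of
   an axis, so only one endstop is of interest at any given time. */
static uint8_t debounce_count = 0;

/* Feed one sample of the endstop the current move heads towards.
   Returns true once DEBOUNCE_STEPS consecutive samples were triggered. */
static bool endstop_debounced(bool triggered) {
  if (triggered) {
    if (debounce_count < DEBOUNCE_STEPS)
      debounce_count++;
  } else {
    /* A single contrary sample restarts the count. */
    debounce_count = 0;
  }
  return debounce_count >= DEBOUNCE_STEPS;
}
```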
Previously, when backing off of X_MIN, X_MAX was also checked,
which of course was already open, so endstop release was signalled
even while X_MIN was still closed. The issue showed only when
endstops on both ends of an axis were defined, a rather rare situation.
Essentially the fix simply makes a distinct endstop check case
for each side of each axis.
This even makes binary size 40 bytes smaller for the standard case.
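To illustrate the difference, here is a minimal sketch for the X axis only. The flag names and the function x_endstop_released() are assumptions made for this example, not the identifiers used in the firmware.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flags: one bit per side of the axis instead of one
   bit per axis. */
#define CHECK_X_MIN (1 << 0)
#define CHECK_X_MAX (1 << 1)

/* Report release only for the side we are actually backing off from.
   The state of the opposite endstop is deliberately ignored. */
static bool x_endstop_released(uint8_t check, bool x_min_closed,
                               bool x_max_closed) {
  if (check & CHECK_X_MIN)
    return !x_min_closed;
  if (check & CHECK_X_MAX)
    return !x_max_closed;
  return false;
}
```

With a per-axis flag, an open X_MAX would have counted as "released" while backing off of X_MIN. With per-side flags, the first case below stays false until X_MIN itself opens.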
This also introduces dda_kinematics.c/.h and a KINEMATICS definition,
which allows doing different distance calculations depending on the
bot kinematics in use. So far there is only KINEMATICS_STRAIGHT, which
matches what we had before, but other kinematics types are present in
comments already.
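A sketch of how a compile-time kinematics switch could be laid out. Only KINEMATICS and KINEMATICS_STRAIGHT come from this commit; the numeric value, the function axes_to_steps(), its parameters and the units are assumptions for illustration.

```c
#include <stdint.h>

#define KINEMATICS_STRAIGHT 1              /* value is an assumption */
#define KINEMATICS KINEMATICS_STRAIGHT

enum { X = 0, Y, Z, E, AXIS_COUNT };

/* Hypothetical helper: convert per-axis distances (micrometers) into
   motor steps, given steps per meter for each axis. */
static void axes_to_steps(const int32_t delta_um[AXIS_COUNT],
                          const uint32_t steps_per_m[AXIS_COUNT],
                          int32_t steps[AXIS_COUNT]) {
#if KINEMATICS == KINEMATICS_STRAIGHT
  /* Straight kinematics: each axis maps directly to one motor. */
  int i;

  for (i = X; i < AXIS_COUNT; i++)
    steps[i] = (int32_t)(((int64_t)delta_um[i] * steps_per_m[i]) / 1000000);
#else
  #error "Unknown KINEMATICS."
#endif
}
```

Other kinematics types would add further #elif branches, each with its own mapping from axis distances to motor steps.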
The goal is to calculate steps in a separate function to allow different
methods of steps calculation, which is necessary for supporting
different kinematics types. Accordingly we have to calculate steps
for all axes before setting directions and such stuff.
This was the goal: to not bit-shift when calling setTimer(). Binary
size another 40 bytes off, about 1.2 % better performance:
SIZES          ATmega...  '168   '328(P)   '644(P)     '1280
FLASH : 20136 bytes       141%       66%       32%       16%
RAM   :  2318 bytes       227%      114%       57%       29%
EEPROM:    32 bytes         4%        2%        2%        1%
short-moves.gcode statistics:
LED on occurences: 888.
LED on time minimum: 302 clock cycles.
LED on time maximum: 718 clock cycles.
LED on time average: 311.258 clock cycles.
smooth-curves.gcode statistics:
LED on occurences: 9124.
LED on time minimum: 307 clock cycles.
LED on time maximum: 708 clock cycles.
LED on time average: 357.417 clock cycles.
triangle-odd.gcode statistics:
LED on occurences: 1636.
LED on time minimum: 302 clock cycles.
LED on time maximum: 708 clock cycles.
LED on time average: 330.322 clock cycles.
Admittedly it looks like advancing in baby steps, but really
catching every bit shifting instance isn't trivial. Sometimes
these shifts are already embedded in other calculations.
Still no binary size or performance change.
While this shifting was meant to increase accuracy, there's no actual
use of it, other than that this value gets shifted back and forth.
Let's start to get rid of it.
Performance stays exactly the same:
SIZES          ATmega...  '168   '328(P)   '644(P)     '1280
FLASH : 20188 bytes       141%       66%       32%       16%
RAM   :  2318 bytes       227%      114%       57%       29%
EEPROM:    32 bytes         4%        2%        2%        1%
short-moves.gcode statistics:
LED on occurences: 888.
LED on time minimum: 306 clock cycles.
LED on time maximum: 722 clock cycles.
LED on time average: 315.253 clock cycles.
smooth-curves.gcode statistics:
LED on occurences: 9124.
LED on time minimum: 311 clock cycles.
LED on time maximum: 712 clock cycles.
LED on time average: 361.416 clock cycles.
triangle-odd.gcode statistics:
LED on occurences: 1636.
LED on time minimum: 306 clock cycles.
LED on time maximum: 712 clock cycles.
LED on time average: 334.319 clock cycles.
This finally brings Z axis up to speed.
So far we always assumed the fastest axis to have the same steps/mm
as the X axis. In cases where this wasn't true, the movement
wouldn't do sufficient acceleration steps and, accordingly,
not reach the expected maximum speed. This was particularly visible
on a typical Mendel printer, where the Z axis would reach only a
6th of the commanded speed in some configurations.
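To see why the steps/mm value matters here: the distance needed to accelerate from standstill to speed v at constant acceleration a is v^2 / (2 a), and the number of acceleration steps is that distance times the steps/mm of the axis doing the stepping. The function below is a simplified illustration with made-up names and units, not the firmware's fixed-point calculation.

```c
#include <stdint.h>

/* Hypothetical helper: steps needed to ramp from 0 to v_mm_s at
   accel_mm_s2, for an axis with the given steps per millimeter. */
static uint32_t accel_steps(uint32_t v_mm_s, uint32_t accel_mm_s2,
                            uint32_t steps_per_mm) {
  /* distance [mm] = v^2 / (2 a); steps = distance * steps/mm.
     64-bit intermediate to avoid overflow before the division. */
  return (uint32_t)(((uint64_t)v_mm_s * v_mm_s * steps_per_mm)
                    / (2ULL * accel_mm_s2));
}
```

With these example numbers (20 mm/s, 1000 mm/s^2), an axis with 80 steps/mm needs 16 acceleration steps while one with 400 steps/mm needs 80. Using the lower value for the finer axis ends the ramp far too early.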
'all_time' sounds like forever to me, but this variable really
tracks the last time we hit one of "all the axes". It sticks
out more now in looping, so rename it to make sense.
Clean up code to reduce duplication by consolidating code into
loops for per-axis actions.
Part 9 is, finally use this set_direction() thing. As a dessert
topping, it reduces binary size by another 122 bytes.
SIZES          ATmega...  '168   '328(P)   '644(P)     '1280
FLASH : 19988 bytes       140%       66%       32%       16%
RAM   :  2302 bytes       225%      113%       57%       29%
EEPROM:    32 bytes         4%        2%        2%        1%
Clean up code to reduce duplication by consolidating code into
loops for per-axis actions.
Part 8 is, move remaining update_current_position() into a loop.
This makes the binary 134 bytes smaller. As it's not critical,
no performance test.
SIZES          ATmega...  '168   '328(P)   '644(P)     '1280
FLASH : 20134 bytes       141%       66%       32%       16%
RAM   :  2302 bytes       225%      113%       57%       29%
EEPROM:    32 bytes         4%        2%        2%        1%
Clean up code to reduce duplication by consolidating code into
loops for per-axis actions.
Part 7 is, turn update_current_position() in dda.c partially into
a loop. Surprise, surprise, this changes neither binary size nor
performance. Looking into the generated assembly, the loop is
indeed completely unrolled. Apparently that's smaller than a
real loop.
SIZES          ATmega...  '168   '328(P)   '644(P)     '1280
FLASH : 20270 bytes       142%       66%       32%       16%
RAM   :  2302 bytes       225%      113%       57%       29%
EEPROM:    32 bytes         4%        2%        2%        1%
short-moves.gcode
Statistics (assuming a 20 MHz clock):
LED on occurences: 888.
Sum of all LED on time: 279945 clock cycles.
LED on time minimum: 306 clock cycles.
LED on time maximum: 722 clock cycles.
LED on time average: 315.253 clock cycles.
smooth-curves.gcode
Statistics (assuming a 20 MHz clock):
LED on occurences: 9124.
Sum of all LED on time: 3297806 clock cycles.
LED on time minimum: 311 clock cycles.
LED on time maximum: 712 clock cycles.
LED on time average: 361.443 clock cycles.
triangle-odd.gcode
Statistics (assuming a 20 MHz clock):
LED on occurences: 1636.
Sum of all LED on time: 546946 clock cycles.
LED on time minimum: 306 clock cycles.
LED on time maximum: 712 clock cycles.
LED on time average: 334.319 clock cycles.
Clean up code to reduce duplication by consolidating code into
loops for per-axis actions.
Part 6c removes do_step(), but still tries to keep a loop. This is
about the maximum of performance I (Traumflug) can think of.
Binary size is as good as with the former attempt, but performance
is actually pretty bad, 45% worse than without looping:
SIZES          ATmega...  '168   '328(P)   '644(P)     '1280
FLASH : 19876 bytes       139%       65%       32%       16%
RAM   :  2302 bytes       225%      113%       57%       29%
EEPROM:    32 bytes         4%        2%        2%        1%
short-moves.gcode
Statistics (assuming a 20 MHz clock):
LED on occurences: 888.
Sum of all LED on time: 406041 clock cycles.
LED on time minimum: 448 clock cycles.
LED on time maximum: 864 clock cycles.
LED on time average: 457.253 clock cycles.
smooth-curves.gcode
Statistics (assuming a 20 MHz clock):
LED on occurences: 9124.
Sum of all LED on time: 4791132 clock cycles.
LED on time minimum: 453 clock cycles.
LED on time maximum: 867 clock cycles.
LED on time average: 525.113 clock cycles.
triangle-odd.gcode
Statistics (assuming a 20 MHz clock):
LED on occurences: 1636.
Sum of all LED on time: 800586 clock cycles.
LED on time minimum: 448 clock cycles.
LED on time maximum: 867 clock cycles.
LED on time average: 489.356 clock cycles.
Clean up code to reduce duplication by consolidating code into
loops for per-axis actions.
Part 6b moves do_step() from the "tidiest" place into where it's
currently used, dda.c. Binary size goes down another 34 bytes, to
a total savings of 408 bytes and performance is much better, but
still 16% lower than without using loops:
SIZES          ATmega...  '168   '328(P)   '644(P)     '1280
FLASH : 19874 bytes       139%       65%       32%       16%
RAM   :  2302 bytes       225%      113%       57%       29%
EEPROM:    32 bytes         4%        2%        2%        1%
short-moves.gcode
Statistics (assuming a 20 MHz clock):
LED on occurences: 888.
Sum of all LED on time: 320000 clock cycles.
LED on time minimum: 351 clock cycles.
LED on time maximum: 772 clock cycles.
LED on time average: 360.36 clock cycles.
smooth-curves.gcode
Statistics (assuming a 20 MHz clock):
LED on occurences: 9124.
Sum of all LED on time: 3875874 clock cycles.
LED on time minimum: 356 clock cycles.
LED on time maximum: 773 clock cycles.
LED on time average: 424.8 clock cycles.
triangle-odd.gcode
Statistics (assuming a 20 MHz clock):
LED on occurences: 1636.
Sum of all LED on time: 640357 clock cycles.
LED on time minimum: 351 clock cycles.
LED on time maximum: 773 clock cycles.
LED on time average: 391.416 clock cycles.
Clean up code to reduce duplication by consolidating code into
loops for per-axis actions.
Part 6a is putting stuff inside the step interrupt into a loop,
too. do_step() is put into the "tidiest" place. Binary size goes
down a remarkable 374 bytes, but stepping performance suffers by
almost 30%.
Traumflug's performance measurements:
SIZES          ATmega...  '168   '328(P)   '644(P)     '1280
FLASH : 19908 bytes       139%       65%       32%       16%
RAM   :  2302 bytes       225%      113%       57%       29%
EEPROM:    32 bytes         4%        2%        2%        1%
short-moves.gcode
Statistics (assuming a 20 MHz clock):
LED on occurences: 888.
Sum of all LED on time: 354537 clock cycles.
LED on time minimum: 390 clock cycles.
LED on time maximum: 806 clock cycles.
LED on time average: 399.253 clock cycles.
smooth-curves.gcode
Statistics (assuming a 20 MHz clock):
LED on occurences: 9124.
Sum of all LED on time: 4268896 clock cycles.
LED on time minimum: 395 clock cycles.
LED on time maximum: 807 clock cycles.
LED on time average: 467.875 clock cycles.
triangle-odd.gcode
Statistics (assuming a 20 MHz clock):
LED on occurences: 1636.
Sum of all LED on time: 706846 clock cycles.
LED on time minimum: 390 clock cycles.
LED on time maximum: 807 clock cycles.
LED on time average: 432.057 clock cycles.
Should be done for temptable in ThermistorTable.h, too, but this
would mess up existing users' configurations.
This tries to put emphasis on the fact that you have to read
these values with pgm_read_*() instead of just using the variable.
Unfortunately, the gcc compiler neither inserts PROGMEM reading
instructions automatically when reading data stored in flash,
nor does it complain or warn about the missing read instructions.
As such it's very easy to accidentally handle data stored in flash
just like normal data. It'll compile and work ... you just read
arbitrary data (often, but not always zeros) instead of what you
intend.
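A small sketch of the pitfall. The table name, its values and the _P suffix used to flag flash storage are assumptions for illustration; pgm_read_dword() and PROGMEM come from avr-libc's <avr/pgmspace.h>. The #else branch exists only so the example also compiles on a PC.

```c
#include <stdint.h>

#ifdef __AVR__
  #include <avr/pgmspace.h>
#else
  /* Host fallback: no separate flash address space on a PC. */
  #define PROGMEM
  #define pgm_read_dword(addr) (*(const uint32_t *)(addr))
#endif

/* Hypothetical table stored in flash; the suffix is a reminder that
   plain indexing is not enough on AVR. */
static const uint32_t steps_per_m_P[4] PROGMEM = {
  80000, 80000, 400000, 96000
};

static uint32_t steps_per_m(uint8_t axis) {
  /* Wrong on AVR, yet compiles without warning:
       return steps_per_m_P[axis];
     That reads RAM at the table's flash address. */
  return pgm_read_dword(&steps_per_m_P[axis]);
}
```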
Clean up code to reduce duplication by consolidating code into
loops for per-axis actions.
Part 5 is move ACCELERATION_TEMPORAL's step delay calculations
into loops. Not tested, binary size change unknown.
Clean up code to reduce duplication by consolidating code into
loops for per-axis actions.
Part 4 is move ACCELERATION_TEMPORAL's maximum feedrate limitation
into a loop. Not tested, binary size change unknown.
Clean up code to reduce duplication by consolidating code into
loops for per-axis actions.
Part 3 is moving fast axis detection into a loop.
Binary size 84 bytes smaller.
Clean up code to reduce duplication by consolidating code into
loops for per-axis actions.
Part 2 is moving maximum speed limit calculations into loops.
Binary size another 160 bytes smaller.
Clean up code to reduce duplication by consolidating code into
loops for per-axis actions.
Traumflug notes:
Split this once huge commit into smaller ones for ease of
reviewing and bisecting (in case something went wrong).
Part 1 is to put dda_create() distance calculations into loops.
This reduces binary size by another whopping 756 bytes.
This was contributed by Phil Hord as part of another commit.
It saves 168 bytes, so it already more than outweighs the overhead
of introducing a generic implementation.
Many places in the code use individual variables for int/uint values
for X, Y, Z, and E. A tip from a comment suggests making these into
arrays for scalability in the future. Replace the discrete variables
with arrays so the code can be simplified in the future.
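A minimal sketch of what the array form makes possible. The enum, the function fast_axis() and its parameters are invented for this example and do not mirror the actual data structures.

```c
#include <stdint.h>

enum axis_e { X = 0, Y, Z, E, AXIS_COUNT };

/* Hypothetical helper: compute the per-axis travel and return the index
   of the axis with the longest travel. With discrete x/y/z/e variables
   this takes four copies of the same code; with arrays it is one loop. */
static uint8_t fast_axis(const int32_t start[AXIS_COUNT],
                         const int32_t target[AXIS_COUNT],
                         uint32_t delta[AXIS_COUNT]) {
  uint8_t i, fast = X;

  for (i = X; i < AXIS_COUNT; i++) {
    int32_t d = target[i] - start[i];

    delta[i] = (uint32_t)(d < 0 ? -d : d);
    if (delta[i] > delta[fast])
      fast = i;
  }
  return fast;
}
```

Adding another axis then means extending the enum rather than touching every calculation.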