As we're in an interrupt already, we can simplify the test for an
empty queue. Slowest step down to 446 clock cycles, another 26
ticks less. Binary size only 36 bytes up:
ATmega sizes '168 '328(P) '644(P) '1280
Program: 19472 bytes 136% 64% 31% 16%
Data: 2177 bytes 213% 107% 54% 27%
EEPROM: 32 bytes 4% 2% 2% 1%
short-moves.gcode statistics:
LED on occurences: 888.
LED on time minimum: 226 clock cycles.
LED on time maximum: 403 clock cycles.
LED on time average: 262.922 clock cycles.
smooth-curves.gcode statistics:
LED on occurences: 23648.
LED on time minimum: 251 clock cycles.
LED on time maximum: 446 clock cycles.
LED on time average: 286.203 clock cycles.
triangle-odd.gcode statistics:
LED on occurences: 1636.
LED on time minimum: 251 clock cycles.
LED on time maximum: 403 clock cycles.
LED on time average: 276.561 clock cycles.
Not queuing up waits for the heaters in the movement queue removes
some code in performance critical paths. What a luck we just
implemented an alternative M116 functionality with the previous
commit :-)
Performance of the slowest step is decreased a nice 29 clock
cycles and binary size decreased by a whoppy 472 bytes. That's
still 210 bytes less than before implementing the alternative
heater wait.
Best of all, average step time is down some 21 clock cycles, too,
so we increased general stepping performance by no less than 5%.
ATmega sizes '168 '328(P) '644(P) '1280
Program: 19436 bytes 136% 64% 31% 16%
Data: 2177 bytes 213% 107% 54% 27%
EEPROM: 32 bytes 4% 2% 2% 1%
short-moves.gcode statistics:
LED on occurences: 888.
LED on time minimum: 259 clock cycles.
LED on time maximum: 429 clock cycles.
LED on time average: 263.491 clock cycles.
smooth-curves.gcode statistics:
LED on occurences: 23648.
LED on time minimum: 251 clock cycles.
LED on time maximum: 472 clock cycles.
LED on time average: 286.259 clock cycles.
triangle-odd.gcode statistics:
LED on occurences: 1636.
LED on time minimum: 251 clock cycles.
LED on time maximum: 429 clock cycles.
LED on time average: 276.616 clock cycles.
Nullmoves are movements which don't actually move a stepper. For
example because it's a velocity change only or the movement is
shorter than a single motor step.
Not queueing them up removes the necessity to check for them,
which reduces code in critical areas. It also removes the
necessity to run dda_start() twice to get past a nullmove.
Best of this is, it also makes lookahead perform better. Before,
a nullmove just changing speed interrupted the lookahead chain,
now it no longer does. See straight-speeds.gcode and
...-Fsep.gcode, which produced different timings before, now
results are identical.
Also update the function description for dda_create().
Performance increase is impressive: another 75 clock cycles off
the slowest step, only 36 bytes binary size increase:
ATmega sizes '168 '328(P) '644(P) '1280
Program: 19652 bytes 138% 64% 31% 16%
Data: 2175 bytes 213% 107% 54% 27%
EEPROM: 32 bytes 4% 2% 2% 1%
short-moves.gcode statistics:
LED on occurences: 888.
LED on time minimum: 280 clock cycles.
LED on time maximum: 458 clock cycles.
LED on time average: 284.653 clock cycles.
smooth-curves.gcode statistics:
LED on occurences: 23648.
LED on time minimum: 272 clock cycles.
LED on time maximum: 501 clock cycles.
LED on time average: 307.275 clock cycles.
triangle-odd.gcode statistics:
LED on occurences: 1636.
LED on time minimum: 272 clock cycles.
LED on time maximum: 458 clock cycles.
LED on time average: 297.625 clock cycles.
Performance of straight-speeds{-Fsep}.gcode before:
straight-speeds.gcode statistics:
LED on occurences: 32000.
LED on time minimum: 272 clock cycles.
LED on time maximum: 586 clock cycles.
LED on time average: 298.75 clock cycles.
straight-speeds-Fsep.gcode statistics:
LED on occurences: 32000.
LED on time minimum: 272 clock cycles.
LED on time maximum: 672 clock cycles.
LED on time average: 298.79 clock cycles.
Now:
straight-speeds.gcode statistics:
LED on occurences: 32000.
LED on time minimum: 272 clock cycles.
LED on time maximum: 501 clock cycles.
LED on time average: 298.703 clock cycles.
straight-speeds-Fsep.gcode statistics:
LED on occurences: 32000.
LED on time minimum: 272 clock cycles.
LED on time maximum: 501 clock cycles.
LED on time average: 298.703 clock cycles.
There we save even 171 clock cycles :-)
This shaves off another 3 clock cycles without drawback. It
increases binary size by 8 bytes, but apparently only in places
where it doesn't matter.
Performance:
ATmega sizes '168 '328(P) '644(P) '1280
Program: 19616 bytes 137% 64% 31% 16%
Data: 2175 bytes 213% 107% 54% 27%
EEPROM: 32 bytes 4% 2% 2% 1%
short-moves.gcode statistics:
LED on occurences: 888.
LED on time minimum: 280 clock cycles.
LED on time maximum: 545 clock cycles.
LED on time average: 286.187 clock cycles.
smooth-curves.gcode statistics:
LED on occurences: 23648.
LED on time minimum: 272 clock cycles.
LED on time maximum: 576 clock cycles.
LED on time average: 307.431 clock cycles.
triangle-odd.gcode statistics:
LED on occurences: 1636.
LED on time minimum: 272 clock cycles.
LED on time maximum: 535 clock cycles.
LED on time average: 297.724 clock cycles.
This shaves off just 2 bytes binary size and saves only one clock
cycle for the slowest movement step. But heck, that's better than
nothing and comes without drawback, so let's keep this experiment.
Performance:
ATmega sizes '168 '328(P) '644(P) '1280
Program: 19608 bytes 137% 64% 31% 16%
Data: 2175 bytes 213% 107% 54% 27%
EEPROM: 32 bytes 4% 2% 2% 1%
short-moves.gcode statistics:
LED on occurences: 888.
LED on time minimum: 280 clock cycles.
LED on time maximum: 548 clock cycles.
LED on time average: 286.252 clock cycles.
smooth-curves.gcode statistics:
LED on occurences: 23648.
LED on time minimum: 272 clock cycles.
LED on time maximum: 579 clock cycles.
LED on time average: 307.437 clock cycles.
triangle-odd.gcode statistics:
LED on occurences: 1636.
LED on time minimum: 272 clock cycles.
LED on time maximum: 538 clock cycles.
LED on time average: 297.73 clock cycles.
Point of this change is to allow using these functions for
writing to the display, too, without duplicating all the code.
To reduce confusion, functions were renamed (they're no longer
'serial', after all:
serwrite_xxx() -> write_xxx()
sersendf_P() -> sendf_P()
To avoid changing all the existing code, a couple of macros
with the old names are provided. They might even be handy as
convenience macros.
Nicely, this addition costs no additional RAM. Not surprising, it
costs quite some binary size, 278 bytes. Sizes now:
Program: 24058 bytes 168% 79% 38% 19%
Data: 1525 bytes 149% 75% 38% 19%
EEPROM: 32 bytes 4% 2% 2% 1%
Regarding USB Serial: code was adjusted without testing on
hardware.
A comment in dda_queue.h says that MOVEBUFFER_SIZE no longer needs to be
a power of 2 in size. The comment is from a commit in 2011, but only
queue_full seems to be modified to make this true. Other places in the
code still assume that MOVEBUFFER_SIZE is always a power of 2, wrapping
it with "& (MOVEBUFFER_SIZE-1)".
Add a MB_NEXT(x) macro which can be used to definitively find the next
slot in the queue without using any boolean math. Replace all the
queue-position functions with new code that uses this MB_NEXT function
instead.
Also change the queue_full function to use this simpler method instead
of the complicated multi-step confusion which it did.
No code changes, but quite a few removals of __ARMEL_NOTYET__
guards. 20 such guards left.
Test: M105 should work and report plausible temperatures.
Current code size:
SIZES ARM... lpc1114
FLASH : 9460 bytes 29%
RAM : 1258 bytes 31%
EEPROM : 0 bytes 0%
We know already wether we start from a pause or not, so let's
take advantage of this knowledge instead of checking for
plausibility of a timer delay at interrupt time.
Costs just 8 bytes binary size:
SIZES ARM... lpc1114
FLASH : 7764 bytes 24%
RAM : 960 bytes 24%
EEPROM : 0 bytes 0%
Due to the less code at interrupt time, maximum step rate was
raised from 127.9 kHz to 130.6 kHz.
Works nicely, much less code than on AVR, because we have 32-bit
hardware timers.
Test: steppers should move. Only slowly, because dda_clock() isn't
called, yet, so no acceleration.
Pulse time on the Debug LED is 5.21 us or 250 clock ticks.
All in one chunk, because it's all hardware-independent and doing
them one by one would end up on not more than some typing
exercises.
Compiles fine. For testing, remove if (DEBUG... for M114 in
gcode_process.c. Then one can see how the queue fills up when
sending movements and M114 repeatedly. This time with actual
coordinates.
No stepper movements, yet, because set_timer() is still empty.
Compiles fine. For testing, remove if (DEBUG... for M114 in
gcode_process.c. Then one can see how the queue fills up when
sending movements and M114 repeatedly.
queue_step() isn't called, yet, the stepper timer is still missing.
The recent switch to send 'ok' postponed requires also sending a
newline in a few places, because this 'ok' is no longer at the
start of the line. Now it appears in its own line.
Some whitespace at line end was removed in heater.c.
Costs 14 bytes binary size on AVR.
Previously some features were excluded based on whether SIMULATOR
was defined. But in fact these should have been included when __AVR__
was defined. These used to be the same thing, but now with ARM coming
into the picture, they are not. Fix the situation so AVR includes are
truly only used when __AVR__ is defined.
The _crc16_update function appears to be specific to AVR; I've kept the
alternate implementation limited to AVR in that case in crc.c. I think
this is the right thing to do, but I am not sure. Maybe ARM has some
equivalent function in their libraries.
... instead of trying to fire an interrupt as quickly as possible.
This affects ACCELERATION_TEMPORAL only. It almost doubles the
achievable step rate. Measured maximum step rate (X axis only,
100 mm moves) is 40'000 steps/s on a 16 MHz electronics, so
approx. 50'000 steps/s on a 20 MHz controller, which is even
a bit faster than the ACCELERATION_RAMPING algorithm.
Tests with temporary test code were run and judging by these
tests, clock interrupts are now very reliable up to the point
where processing speed is simply exhaused.
Performance with ACCELERATION_RAMPING: this costs 10 bytes
binary size and exactly 2 clock cycles per step interrupt or
0.6% performance even. We could avoid this with a lot
of #ifdefs, but considering ACCELERATION_TEMPORAL will one
day be the default acceleration, skip these #ifdefs, also
for better code readability.
$ cd testcases
$ ./run-in-simulavr.sh short-moves.gcode smooth-curves.gcode triangle-odd.gcode
SIZES ATmega... '168 '328(P) '644(P) '1280
FLASH : 20528 bytes 144% 67% 33% 16%
RAM : 2188 bytes 214% 107% 54% 27%
EEPROM : 32 bytes 4% 2% 2% 1%
short-moves.gcode statistics:
LED on occurences: 838.
LED on time minimum: 304 clock cycles.
LED on time maximum: 715 clock cycles.
LED on time average: 310.717 clock cycles.
smooth-curves.gcode statistics:
LED on occurences: 8585.
LED on time minimum: 309 clock cycles.
LED on time maximum: 712 clock cycles.
LED on time average: 360.051 clock cycles.
triangle-odd.gcode statistics:
LED on occurences: 1636.
LED on time minimum: 304 clock cycles.
LED on time maximum: 710 clock cycles.
LED on time average: 332.32 clock cycles.
Pure cosmetical change.
Performance check:
$ cd testcases
$ ./run-in-simulavr.sh short-moves.gcode smooth-curves.gcode triangle-odd.gcode
[...]
SIZES ATmega... '168 '328(P) '644(P) '1280
FLASH : 20518 bytes 144% 67% 33% 16%
RAM : 2188 bytes 214% 107% 54% 27%
EEPROM : 32 bytes 4% 2% 2% 1%
short-moves.gcode statistics:
LED on occurences: 838.
LED on time minimum: 302 clock cycles.
LED on time maximum: 713 clock cycles.
LED on time average: 308.72 clock cycles.
smooth-curves.gcode statistics:
LED on occurences: 8585.
LED on time minimum: 307 clock cycles.
LED on time maximum: 710 clock cycles.
LED on time average: 358.051 clock cycles.
triangle-odd.gcode statistics:
LED on occurences: 1636.
LED on time minimum: 302 clock cycles.
LED on time maximum: 708 clock cycles.
LED on time average: 330.322 clock cycles.
It was certainly a good idea, but also always a suspect of
malfunctions and as such, almost never used. Newer code
organisation moves most of the code behind it to dda_clock()
anyways, so it also became mostly obsolete.
Rest In Peace, STEP_INTERRUPT_INTERRUPTIBLE, you were matter
of quite a number of interesting discussions and investigations.
Changes for Configtool by jbernardis <jeff.bernardis@gmail.com>
Test code which wants to customize config.h can do so without
touching config.h itself by wrapping config.h in a macro variable
which is passed in to the compiler. It defaults to "config.h" if
no override is provided.
This change would break makefile dependency checking since the selection
of a different header file on the command line is not noticed by make
as a build-trigger. To solve this, we add a layer to the BUILDDIR path
so build products are now specific to the USER_CONFIG choice if it is
not "config.h".
This obviously requires less place on the stack and accordingly a
few CPU cycles less, but more importantly, it lets decide
dda_start() whether a previous movement is to be taken into account
or not.
To make this decision more reliable, add a flag for movements done.
Else it could happen we'd try to join with a movement done long
before.
This code was accidentally removed long ago in a botched merge. This
patch recovers it and makes it build again. I've done minimal testing
and some necessary cleanup. It compiles and runs, but it probably still
has a few dust bunnies here and there.
I added registers and pin definitions to simulator.h and
simulator/simulator.c which I needed to match my Gen7-based config.
Other configs or non-AVR ports will need to define more or different
registers. Some registers are 16-bits, some are 8-bit, and some are just
constant values (enums). A more clever solution would read in the
chip-specific header and produce saner definitions which covered all
GPIOs. But this commit just takes the quick and easy path to support my
own hardware.
Most of this code originated in these commits:
commit cbf41dd4ad
Author: Stephan Walter <stephan@walter.name>
Date: Mon Oct 18 20:28:08 2010 +0200
document simulation
commit 3028b297f3
Author: Stephan Walter <stephan@walter.name>
Date: Mon Oct 18 20:15:59 2010 +0200
Add simulation code: use "make sim"
Additional tweaks:
Revert va_args processing for AVR, but keep 'int' generalization
for simulation. gcc wasn't lying. The sim really aborts without this.
Remove delay(us) from simulator (obsolete).
Improve the README.sim to demonstrate working pronterface connection
to sim. Also fix the build instructions.
Appease all stock configs.
Stub out intercom and shush usb_serial when building simulator.
Pretend to be all chip-types for config appeasement.
Replace sim_timer with AVR-simulator timer:
The original sim_timer and sim_clock provided direct replacements
for timer/clock.c in the main code. But when the main code changed,
simcode did not. The main clock.c was dropped and merged into timer.c.
Also, the timer.c now has movement calculation code in it in some
cases (ACCELERATION_TEMPORAL) and it would be wrong to teach the
simulator to do the same thing. Instead, teach the simulator to
emulate the AVR Timer1 functionality, reacting to values written to
OCR1A and OCR1B timer comparison registers.
Whenever OCR1A/B are changed, the sim_setTimer function needs to be
called. It is called automatically after a timer event, so changes
within the timer ISRs do not need to bother with this.
A C++ class could make this requirement go away by noticing the
assignment. On the other hand, a chip-agnostic timer.c would help
make the main code more portable. The latter cleanup is probably
better for us in the long run.
Before, endstops were checked on every step, wasting precious time.
Checking them 500 times a second should be more than sufficient.
Additionally, an endstop stop now properly decelerates the movement.
This is one important step towards handling accidental endstop hits
gracefully, as it avoids step losses in such situations.
As a sample application, use it in queue_empty().
There's also ATOMIC_BLOCK() coming with avr-libc, but this requires
a C99 compiler while Arduino IDE insists on running avr-gcc in C89 mode.
This means, modify existing code to let the lookahead algorithms
do their work. It also means to remove some unused code in
dda_lookahead.c and reordering some code to make it work with
LOOKAHEAD undefined.
This makes the code cleaner and the reduction of code
probably easily compensates for keeping global interrupts
enabled for a bit longer. Talked to macscifi about this.
Saves about 300 bytes of binary size.
This is a intrusive patch and for now, it's done for the X axis only.
To make comparison with the former approach easier ...
The advantages of this change:
- Converting from mm to steps in gcode_parse.c and back in dda.c
wastes cycles and accuracy.
- In dda.c, UM_PER_STEP simply goes away, so distance calculations
work now with STEPS_PER_MM > 500 just fine. 1/16 microstepping
on threaded rods (Z axis) becomes possible.
- Distance calculations (feedrate, acceleration, ...) become much
simpler.
- A wide range of STEPS_PER_M can now be handled at reasonable
(4 decimal digit) accuracy with a simple macro. Formerly,
we were limited to 500 steps/mm, now we can do 4'096 steps/mm
and could easily raise this another digit.
Disadvantages:
- STEPS_PER_MM is gone in config.h, using STEPS_PER_M is required,
because the preprocessor refuses to compare numbers with decimal
points in them.
- The DDA has to store the position in steps anyways to avoid
rounding errors.
This prevents nasty surprises because this restriction is not mentioned in the configuration files. It also allows better tuning of performance against memory usage.
Traumflug: costs 18 bytes flash, so it's bearable.
Specifically, disable interrupts just before returning and then enable
the timer interrupt if appropriate. This means that the timer interrupt
cannot actually fire until after the RETI, so the function cannot be
entered recursively.
manage the movebuffer (mb_head, mb_tail, movebuffer).
The approach taken is to leave all of the variables non-volatile
and use memory barriers to control reads/writes. read/modify/write
operations that can be done outside of the interrupt context are
protected by disabling interrupts.
Also, manually cache the pointer to the movebuffer of interest
as the compiler does a poor job of reusing the pointer even in
places that it could.
There are still some groups of accesses to the movebuffer data that need
to be protected by disabling interrupts, but these are all related
to sending back debug data. The error will cause occassional mismatched
values to be sent back via the serial connection, but they do not
affect the actual operation of the code, so they will be addressed
in a separate checkin.
Conflicts:
dda_queue.c
This costs 2 bytes of ram, but saves 60 bytes of flash. Doing so
also eliminates the need to disable interrupts while clearing flags
in the ifclock macro.
Conflicts:
clock.c
timer.c
timer.h