On ARM we use only the 16 byte hardware buffer for sending and
receiving over the serial line, which is often too short for
debugging messages. This implementation works fine and still
neither blocks nor introduces delays for short messages.
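One common way to get this behaviour is a small software ring buffer in front of the hardware FIFO. The sketch below illustrates that idea only; whether this commit uses exactly this structure is an assumption, and `TX_BUF_SIZE`, `serial_try_write()` and `serial_tx_next()` are hypothetical names.

```c
#include <stdint.h>
#include <stdbool.h>

#define TX_BUF_SIZE 64u                 /* hypothetical; must be a power of two */
#define TX_BUF_MASK (TX_BUF_SIZE - 1u)

static volatile uint8_t tx_buf[TX_BUF_SIZE];
static volatile uint8_t tx_head;        /* next slot to write (main code) */
static volatile uint8_t tx_tail;        /* next slot to send (interrupt)  */

/* Queue one byte without waiting. Returns false if the buffer is full. */
bool serial_try_write(uint8_t data) {
  uint8_t next = (uint8_t)((tx_head + 1u) & TX_BUF_MASK);
  if (next == tx_tail)
    return false;                       /* full: the caller decides what to do */
  tx_buf[tx_head] = data;
  tx_head = next;
  return true;
}

/* Meant to be called from a (hypothetical) "transmitter empty" interrupt.
   Fetches the next byte for the hardware FIFO; false if nothing is queued. */
bool serial_tx_next(uint8_t *data) {
  if (tx_tail == tx_head)
    return false;
  *data = tx_buf[tx_tail];
  tx_tail = (uint8_t)((tx_tail + 1u) & TX_BUF_MASK);
  return true;
}
```

As long as a message fits into the free part of the buffer, the writer returns immediately and the interrupt drains the bytes in the background.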
Removed the while-loop. It looks like we need a few more microseconds
than on the LPC. With an extra 7 us we no longer lose characters.
We have only one UART, we use only one UART, so it's pointless to
do pin mapping calculations at runtime.
SIZES          ARM...     stm32f411
FLASH  :  4832 bytes             1%
RAM    :   404 bytes             1%
EEPROM :     0 bytes             0%
@phord summarizes this as: this happens only when !recalc_speed,
meaning we are cruising, not accelerating or decelerating. So it
pegs our dda->c at c_min if it never made it as far as c_min.
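A minimal sketch of that pegging, assuming dda->c is the current step interval and c_min the shortest allowed interval. The struct and the function name are hypothetical stand-ins, not the actual Teacup code:

```c
#include <stdint.h>

typedef struct {
  uint32_t c;      /* current step interval (larger = slower)           */
  uint32_t c_min;  /* shortest allowed step interval, i.e. cruise speed */
} dda_sketch_t;

/* While cruising (no speed recalculation pending), snap the interval to
   c_min in case the ramp stopped a little short of it. */
void peg_cruise_interval(dda_sketch_t *dda, uint8_t recalc_speed) {
  if (!recalc_speed && dda->c > dda->c_min)
    dda->c = dda->c_min;
}
```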
This commit fixes https://github.com/Traumflug/Teacup_Firmware/issues/69
delta_um can become very small, while maximum_feedrate_P is constant.
If this division is moved out of the loop, the result can be wrong.
dda->total_steps also becomes very small along with delta_um, so this fits perfectly.
This reverts commit cd66feb8d1.
So let's bring this part back.
We save 35 clock cycles at 'LED on time maximum'
ATmega sizes               '168   '328(P)   '644(P)     '1280
Program:  18038 bytes      126%       59%       29%       14%
   Data:   1936 bytes      190%       95%       48%       24%
 EEPROM:     32 bytes        4%        2%        2%        1%
short-moves.gcode statistics:
LED on occurences: 888.
LED on time minimum: 217 clock cycles.
LED on time maximum: 520 clock cycles.
LED on time average: 249.626 clock cycles.
smooth-curves.gcode statistics:
LED on occurences: 22589.
LED on time minimum: 217 clock cycles.
LED on time maximum: 537 clock cycles.
LED on time average: 284.747 clock cycles.
triangle-odd.gcode statistics:
LED on occurences: 1636.
LED on time minimum: 217 clock cycles.
LED on time maximum: 520 clock cycles.
LED on time average: 270.933 clock cycles.
ATmega sizes               '168   '328(P)   '644(P)     '1280
Program:  18266 bytes      128%       60%       29%       15%
   Data:   1936 bytes      190%       95%       48%       24%
 EEPROM:     32 bytes        4%        2%        2%        1%
short-moves.gcode statistics:
LED on occurences: 888.
LED on time minimum: 243 clock cycles.
LED on time maximum: 555 clock cycles.
LED on time average: 250.375 clock cycles.
smooth-curves.gcode statistics:
LED on occurences: 22589.
LED on time minimum: 243 clock cycles.
LED on time maximum: 572 clock cycles.
LED on time average: 292.139 clock cycles.
triangle-odd.gcode statistics:
LED on occurences: 1636.
LED on time minimum: 243 clock cycles.
LED on time maximum: 555 clock cycles.
LED on time average: 275.699 clock cycles.
Start the simulation with ./parse_clean xyz, where 'xyz' can be anything and is used to name the created files.
At the end you will get three pictures:
swan-reference-xyz.png shows how it should look.
swan-current-xyz.png shows how it looks now.
swan-diff-xyz.png shows the difference.
These three pictures show only the X axis.
You will also get a fourth file, pp-xyz.asc. You can open this file with MeshLab, for example, to see the current model in 3D.
If you want to use your own G-code, do the following:
Create a normal G-code file. Delete any M116 (temperature waits). You may also want to delete comments.
Then add an M114 after every x-th line.
For swan-test.gcode I do this after every second line:
sed '1~2 s/$/\nM114/g' < swan.gcode > swan-test.gcode
In `ACCELERATION_RAMPING` code we use the dda->id field even when we do
not enable `LOOKAHEAD`. Expose the variable and its related `idcnt`
when `ACCELERATION_RAMPING` is used.
Add a regression test to catch this in the future.
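Roughly, the field and its counter are now guarded by the ramping option instead of the look-ahead option. The following is a sketch under that assumption; the struct layout and `dda_sketch_assign_id()` are hypothetical, only `id`, `idcnt` and the two option names come from the text above.

```c
#include <stdint.h>

#define ACCELERATION_RAMPING   /* assumed to be set by the configuration */
/* LOOKAHEAD is deliberately left undefined in this sketch. */

typedef struct {
  uint32_t total_steps;
#ifdef ACCELERATION_RAMPING    /* no longer guarded by LOOKAHEAD */
  uint8_t id;                  /* used by ramping code even without look-ahead */
#endif
} dda_sketch_t;

#ifdef ACCELERATION_RAMPING
static uint8_t idcnt;          /* running counter handing out move IDs */

void dda_sketch_assign_id(dda_sketch_t *dda) {
  dda->id = idcnt++;
}
#endif
```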
Simple trick: raise the feedrate. There is no need to care about a
milling bit when running a simulation. This reduces simulated time
and, with it, the duration of the simulation (by about 50%).
Also remove G-code which was never executed because simulations
are chopped at 1 minute of simulation time and smooth-curves.gcode
took about 1.5 minutes.
Step pulse measurements remain about the same:
ATmega sizes               '168   '328(P)   '644(P)     '1280
Program:  17944 bytes      126%       59%       29%       14%
   Data:   1920 bytes      188%       94%       47%       24%
 EEPROM:     32 bytes        4%        2%        2%        1%
short-moves.gcode statistics:
LED on occurences: 888.
LED on time minimum: 202 clock cycles.
LED on time maximum: 380 clock cycles.
LED on time average: 232.092 clock cycles.
smooth-curves.gcode statistics:
LED on occurences: 22589.
LED on time minimum: 194 clock cycles.
LED on time maximum: 423 clock cycles.
LED on time average: 254.425 clock cycles.
triangle-odd.gcode statistics:
LED on occurences: 1636.
LED on time minimum: 220 clock cycles.
LED on time maximum: 380 clock cycles.
LED on time average: 245.575 clock cycles.
These values were queued up just for finding out individual axis
speeds in dda_find_crossing_speed(). Let's do this calculation
with other available movement properties and save 16 bytes of RAM
per movement queue entry.
The first version of this commit forgot to take care of the feedrate
sign (prevF, currF). This omission was found by @Wurstnase. The idea
of tweaking the calculation of 'dv' to achieve this is also by
@Wurstnase. We tried setting the sign immediately after calculating
the absolute values, but that resulted in larger (= slower) code.
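To illustrate the kind of calculation meant here, the sketch below derives a signed per-axis speed from a move's feedrate and that axis' share of the move, then takes the speed jump at the joint. Which movement properties the commit actually uses is not stated above, so the parameters and both function names are assumptions.

```c
#include <stdint.h>

/* Speed of one axis: the move's feedrate scaled by this axis' share of the
   whole move. The sign follows the direction of travel. */
int32_t axis_speed(int32_t feedrate, uint32_t axis_delta, uint32_t total,
                   int positive_dir) {
  if (total == 0)
    return 0;
  int32_t v = (int32_t)(((int64_t)feedrate * axis_delta) / total);
  return positive_dir ? v : -v;
}

/* Speed change one axis sees at the joint between two moves ('dv'). */
uint32_t axis_speed_jump(int32_t prev_v, int32_t curr_v) {
  int64_t dv = (int64_t)curr_v - prev_v;
  return (uint32_t)(dv < 0 ? -dv : dv);
}
```

With signed speeds, a direction reversal on an axis shows up as a large jump, while two moves continuing in the same direction give a small one.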
Binary size is down 132 bytes, two loops among that. RAM usage is
down 256 bytes for the standard test case:
ATmega sizes               '168   '328(P)   '644(P)     '1280
Program:  17944 bytes      126%       59%       29%       14%
   Data:   1920 bytes      188%       94%       47%       24%
 EEPROM:     32 bytes        4%        2%        2%        1%
We calculate a safe join speed in dda_join_moves using data from
two source DDA movements. We ensure the DDA values we use are sane
by atomically copying them to local variables before beginning our
calculation. But later we discard all our results if the DDA went
live in the meantime, as evidenced by changes in `DDA->live` or
`DDA->id`.
Since we will not use the results of our calculations if either of
these change, we can safely reference all the other DDA values
non-atomically. Change the ATOMIC section to protect only the
`DDA->id` values at the start.
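The pattern described above can be sketched as follows. This is a host-side illustration, not the actual dda_join_moves(): `ATOMIC_START`/`ATOMIC_END` are hypothetical no-op stand-ins for interrupt locking, the speed fields are invented for the example, and guarding the final check-and-commit is an assumption as well.

```c
#include <stdint.h>
#include <stdbool.h>

/* Stand-ins for interrupt locking; on a real target these would disable
   and restore interrupts. Hypothetical names, no-ops on a host build. */
#define ATOMIC_START
#define ATOMIC_END

typedef struct {
  volatile uint8_t live;   /* set once the move is being executed */
  volatile uint8_t id;     /* changes when the queue slot is reused */
  uint32_t end_speed;      /* hypothetical planner fields */
  uint32_t start_speed;
} dda_sketch_t;

bool join_moves_sketch(dda_sketch_t *prev, dda_sketch_t *curr) {
  uint8_t prev_id, curr_id;

  /* Only the IDs are snapshotted under protection. */
  ATOMIC_START
  prev_id = prev->id;
  curr_id = curr->id;
  ATOMIC_END

  /* Unprotected reads: a stale value is harmless because the result is
     thrown away below if either move changed in the meantime. */
  uint32_t join = prev->end_speed < curr->start_speed
                  ? prev->end_speed : curr->start_speed;

  bool committed = false;
  ATOMIC_START
  if (!prev->live && !curr->live &&
      prev->id == prev_id && curr->id == curr_id) {
    prev->end_speed = join;
    curr->start_speed = join;
    committed = true;
  }
  ATOMIC_END
  return committed;
}
```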
Added by Traumflug: this costs a negligible 4 bytes binary size:
ATmega sizes               '168   '328(P)   '644(P)     '1280
Program:  18082 bytes      127%       59%       29%       15%
   Data:   2176 bytes      213%      107%       54%       27%
 EEPROM:     32 bytes        4%        2%        2%        1%
Following the resounding success on ARMs, let's try LTO on AVRs,
too. The gain isn't clear-cut: binary size increases by 462
bytes and even an additional byte of RAM is needed.
According to @Wurstnase's research, this size increase is pretty
unique to the config.h.Profiling configuration. All other
configurations he tried actually showed a size drop.
Anyway, every step takes 15 to 17 clock cycles less, which is
a general stepping performance increase of about 7%.
ATmega sizes               '168   '328(P)   '644(P)     '1280
Program:  18078 bytes      127%       59%       29%       15%
   Data:   2176 bytes      213%      107%       54%       27%
 EEPROM:     32 bytes        4%        2%        2%        1%
short-moves.gcode statistics:
LED on occurences: 888.
LED on time minimum: 202 clock cycles.
LED on time maximum: 380 clock cycles.
LED on time average: 232.092 clock cycles.
smooth-curves.gcode statistics:
LED on occurences: 23648.
LED on time minimum: 220 clock cycles.
LED on time maximum: 423 clock cycles.
LED on time average: 255.22 clock cycles.
triangle-odd.gcode statistics:
LED on occurences: 1636.
LED on time minimum: 220 clock cycles.
LED on time maximum: 380 clock cycles.
LED on time average: 245.575 clock cycles.
After researching this issue for the third time, I finally found
a proper solution: one can't keep an entire section without
rewriting the entire linker script, but one can keep individual
symbols. That's what we do now, so we can use --gc-sections when
linking with SimulAVR support.
The problem came up again because -flto drops unused symbols, too.
This commit changes binary size drastically (1654 bytes less), so
let's take a new performance measurement snapshot:
ATmega sizes               '168   '328(P)   '644(P)     '1280
Program:  17616 bytes      123%       58%       28%       14%
   Data:   2175 bytes      213%      107%       54%       27%
 EEPROM:     32 bytes        4%        2%        2%        1%
short-moves.gcode statistics:
LED on occurences: 888.
LED on time minimum: 218 clock cycles.
LED on time maximum: 395 clock cycles.
LED on time average: 249.051 clock cycles.
smooth-curves.gcode statistics:
LED on occurences: 23648.
LED on time minimum: 237 clock cycles.
LED on time maximum: 438 clock cycles.
LED on time average: 272.216 clock cycles.
triangle-odd.gcode statistics:
LED on occurences: 1636.
LED on time minimum: 237 clock cycles.
LED on time maximum: 395 clock cycles.
LED on time average: 262.572 clock cycles.
Suggested by @Wurstnase. Apparently gcc got better, so it's
actually an advantage now.
Actually a pretty big advantage. While binary size decreases some
200 bytes, pulse length of the debug LED is a lot shorter
(measured on the scope):
without LTO: 4.59 us
with LTO: 3.65 us
That's a 25% performance increase by just turning on a flag!
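For reference, the two usual ways to turn these scope readings into a percentage are sketched below. The helper names are hypothetical; only the two pulse lengths come from the measurement above. Read as time saved per pulse this is roughly 20%, read as pulses per unit of time it is roughly 26%, which matches the figure quoted above.

```c
/* Share of the original pulse time that was saved, in percent. */
double time_saved_percent(double before_us, double after_us) {
  return (before_us - after_us) / before_us * 100.0;
}

/* Gain in pulses per unit of time, in percent. */
double rate_gain_percent(double before_us, double after_us) {
  return (before_us / after_us - 1.0) * 100.0;
}
```

Called with 4.59 and 3.65, the first helper gives the time saved and the second the throughput gain.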
Neither of them brought a performance improvement, so we revert
both. The commits as well as the revert are kept to preserve the
knowledge gained.
This reverts commits
"DDA, dda_start(): use mb_tail_dda directly." and
"DDA, dda_start(): don't pass mb_tail_dda as parameter."
Performance and binary size are back to what we had before:
ATmega sizes               '168   '328(P)   '644(P)     '1280
Program:  19270 bytes      135%       63%       31%       15%
   Data:   2179 bytes      213%      107%       54%       27%
 EEPROM:     32 bytes        4%        2%        2%        1%
short-moves.gcode statistics:
LED on occurences: 888.
LED on time minimum: 218 clock cycles.
LED on time maximum: 395 clock cycles.
LED on time average: 249.051 clock cycles.
smooth-curves.gcode statistics:
LED on occurences: 23648.
LED on time minimum: 237 clock cycles.
LED on time maximum: 438 clock cycles.
LED on time average: 272.216 clock cycles.
triangle-odd.gcode statistics:
LED on occurences: 1636.
LED on time minimum: 237 clock cycles.
LED on time maximum: 395 clock cycles.
LED on time average: 262.572 clock cycles.
Merely avoiding passing mb_tail_dda as a parameter didn't work out,
so how about using it directly? This is what this commit does.
Result: binary size is another 32 bytes bigger, the slowest step
another 16 clock cycles slower. No dice.
ATmega sizes               '168   '328(P)   '644(P)     '1280
Program:  19306 bytes      135%       63%       31%       15%
   Data:   2179 bytes      213%      107%       54%       27%
 EEPROM:     32 bytes        4%        2%        2%        1%
short-moves.gcode statistics:
LED on occurences: 888.
LED on time minimum: 218 clock cycles.
LED on time maximum: 414 clock cycles.
LED on time average: 249.436 clock cycles.
smooth-curves.gcode statistics:
LED on occurences: 23648.
LED on time minimum: 237 clock cycles.
LED on time maximum: 457 clock cycles.
LED on time average: 272.256 clock cycles.
triangle-odd.gcode statistics:
LED on occurences: 1636.
LED on time minimum: 237 clock cycles.
LED on time maximum: 414 clock cycles.
LED on time average: 262.595 clock cycles.
Instead, read the global variable directly.
The idea is that reading the global variable directly removes
the effort of building up a parameter stack, making things faster.
In practice, binary size increases by 4 bytes and the slowest step
takes 3 clock cycles longer. D'oh.
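The two variants compared here look roughly like this. It is a sketch with a hypothetical one-field struct and hypothetical function names; only mb_tail_dda is taken from the text above. Which variant ends up cheaper depends on how the compiler allocates registers, as the measurement shows.

```c
#include <stdint.h>

typedef struct {
  uint32_t c;                  /* stand-in for the real DDA contents */
} dda_sketch_t;

dda_sketch_t *mb_tail_dda;     /* global pointer to the move being executed */

/* Before: the caller hands the pointer over as a parameter. */
uint32_t next_interval_param(const dda_sketch_t *dda) {
  return dda->c;
}

/* After: the function reads the global pointer itself. */
uint32_t next_interval_global(void) {
  return mb_tail_dda->c;
}
```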
ATmega sizes               '168   '328(P)   '644(P)     '1280
Program:  19274 bytes      135%       63%       31%       15%
   Data:   2179 bytes      213%      107%       54%       27%
 EEPROM:     32 bytes        4%        2%        2%        1%
short-moves.gcode statistics:
LED on occurences: 888.
LED on time minimum: 218 clock cycles.
LED on time maximum: 398 clock cycles.
LED on time average: 249.111 clock cycles.
smooth-curves.gcode statistics:
LED on occurences: 23648.
LED on time minimum: 237 clock cycles.
LED on time maximum: 441 clock cycles.
LED on time average: 272.222 clock cycles.
triangle-odd.gcode statistics:
LED on occurences: 1636.
LED on time minimum: 237 clock cycles.
LED on time maximum: 398 clock cycles.
LED on time average: 262.576 clock cycles.
As we have mb_tail_dda now, that's no longer necessary. Using
something like movebuffer[mb_tail] is more expensive than
dereferencing mb_tail_dda directly.
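The difference between the two access paths, as a sketch: movebuffer, mb_tail and mb_tail_dda are the names used above, while the queue length, the one-field struct and the helper functions are hypothetical.

```c
#include <stdint.h>

#define MOVEBUFFER_SIZE 8u     /* hypothetical queue length */

typedef struct {
  uint32_t c;                  /* stand-in for the real DDA contents */
} dda_sketch_t;

static dda_sketch_t movebuffer[MOVEBUFFER_SIZE];
static uint8_t mb_tail;
static dda_sketch_t *mb_tail_dda = &movebuffer[0];

/* The pointer is refreshed once, whenever the tail moves on. */
void advance_tail(void) {
  mb_tail = (uint8_t)((mb_tail + 1u) % MOVEBUFFER_SIZE);
  mb_tail_dda = &movebuffer[mb_tail];
}

/* Index path: needs index * sizeof(element) + base on every access. */
uint32_t interval_via_index(void) {
  return movebuffer[mb_tail].c;
}

/* Pointer path: the address is already computed. */
uint32_t interval_via_pointer(void) {
  return mb_tail_dda->c;
}
```

On an 8-bit AVR the index path needs a multiplication or shift plus a 16-bit addition for each access, which is the cost the cached pointer avoids.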
This is the first time we see a stepping performance improvement
since introducing mb_tail_dda. 13 clock cycles faster on the
slowest step, which is 9 cycles faster than before that
introduction.
Binary size is also down by a nice 94 bytes.
ATmega sizes               '168   '328(P)   '644(P)     '1280
Program:  19270 bytes      135%       63%       31%       15%
   Data:   2179 bytes      213%      107%       54%       27%
 EEPROM:     32 bytes        4%        2%        2%        1%
short-moves.gcode statistics:
LED on occurences: 888.
LED on time minimum: 218 clock cycles.
LED on time maximum: 395 clock cycles.
LED on time average: 249.051 clock cycles.
smooth-curves.gcode statistics:
LED on occurences: 23648.
LED on time minimum: 237 clock cycles.
LED on time maximum: 438 clock cycles.
LED on time average: 272.216 clock cycles.
triangle-odd.gcode statistics:
LED on occurences: 1636.
LED on time minimum: 237 clock cycles.
LED on time maximum: 395 clock cycles.
LED on time average: 262.572 clock cycles.
For now, this costs 2 bytes RAM, 8 bytes binary size and slows
down the slowest step by 4 clock cycles. We expect opportunities
for improvements elsewhere, of course.
ATmega sizes               '168   '328(P)   '644(P)     '1280
Program:  19434 bytes      136%       64%       31%       16%
   Data:   2179 bytes      213%      107%       54%       27%
 EEPROM:     32 bytes        4%        2%        2%        1%
short-moves.gcode statistics:
LED on occurences: 888.
LED on time minimum: 230 clock cycles.
LED on time maximum: 407 clock cycles.
LED on time average: 263.008 clock cycles.
smooth-curves.gcode statistics:
LED on occurences: 23648.
LED on time minimum: 251 clock cycles.
LED on time maximum: 450 clock cycles.
LED on time average: 286.212 clock cycles.
triangle-odd.gcode statistics:
LED on occurences: 1636.
LED on time minimum: 251 clock cycles.
LED on time maximum: 407 clock cycles.
LED on time average: 276.568 clock cycles.