1697 Commits

Author SHA1 Message Date
Markus Hitter aa0ef9a3e0 AVR: turn on link time optimisation (LTO).
Following the resounding success on ARMs, let's try LTO on AVRs,
too. The advantage isn't all that great: binary size increases by
462 bytes and even an additional byte of RAM is needed.

According to @Wurstnase's research, this size increase is pretty
unique to the config.h.Profiling configuration. All other
configurations he tried actually showed a size drop.

Anyway, every step takes 15 to 17 clock cycles less, which is a
general stepping performance increase of about 7%.

  ATmega sizes               '168   '328(P)   '644(P)     '1280
  Program:  18078 bytes      127%       59%       29%       15%
     Data:   2176 bytes      213%      107%       54%       27%
   EEPROM:     32 bytes        4%        2%        2%        1%

  short-moves.gcode statistics:
  LED on occurences: 888.
  LED on time minimum: 202 clock cycles.
  LED on time maximum: 380 clock cycles.
  LED on time average: 232.092 clock cycles.

  smooth-curves.gcode statistics:
  LED on occurences: 23648.
  LED on time minimum: 220 clock cycles.
  LED on time maximum: 423 clock cycles.
  LED on time average: 255.22 clock cycles.

  triangle-odd.gcode statistics:
  LED on occurences: 1636.
  LED on time minimum: 220 clock cycles.
  LED on time maximum: 380 clock cycles.
  LED on time average: 245.575 clock cycles.
2016-12-08 20:06:02 +01:00
Markus Hitter b7bd1ad3d7 Makefile-AVR: solve the .siminfo section problem properly.
After researching this issue for the third time, I finally found
a proper solution: one can't keep an entire section without
rewriting the entire linker script, but one can keep individual
symbols. That's what we do now, so we can use --gc-sections when
linking with SimulAVR support.

The problem came up again because -flto drops unused symbols, too.

This commit changes binary size drastically (1654 bytes less), so
let's take a new performance measurement snapshot:

  ATmega sizes               '168   '328(P)   '644(P)     '1280
  Program:  17616 bytes      123%       58%       28%       14%
     Data:   2175 bytes      213%      107%       54%       27%
   EEPROM:     32 bytes        4%        2%        2%        1%

  short-moves.gcode statistics:
  LED on occurences: 888.
  LED on time minimum: 218 clock cycles.
  LED on time maximum: 395 clock cycles.
  LED on time average: 249.051 clock cycles.

  smooth-curves.gcode statistics:
  LED on occurences: 23648.
  LED on time minimum: 237 clock cycles.
  LED on time maximum: 438 clock cycles.
  LED on time average: 272.216 clock cycles.

  triangle-odd.gcode statistics:
  LED on occurences: 1636.
  LED on time minimum: 237 clock cycles.
  LED on time maximum: 395 clock cycles.
  LED on time average: 262.572 clock cycles.
2016-12-07 15:23:23 +01:00
Markus Hitter 877b9fae6f ARM: turn on link time optimisation (LTO).
Suggested by @Wurstnase. Apparently gcc got better, so it's
actually an advantage now.

Actually a pretty big advantage. While binary size decreases some
200 bytes, pulse length of the debug LED is a lot shorter
(measured on the scope):

  without LTO:  4.59 us
  with LTO:     3.65 us

That's a 25% performance increase by just turning on a flag!
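
The two readings of that number can be checked with a small C sketch.
The function names are made up for this illustration; only the two
pulse lengths come from the measurement above. Read as throughput
(pulses per time), the gain is about 25%; read as time saved per
pulse, it is about 20%.

```c
/* Relative throughput gain when a pulse shrinks from t_before to
   t_after: how many more pulses fit into the same time, in percent. */
double throughput_gain_percent(double t_before, double t_after) {
  return (t_before / t_after - 1.0) * 100.0;
}

/* Relative reduction of the pulse length itself, in percent. */
double time_saved_percent(double t_before, double t_after) {
  return (1.0 - t_after / t_before) * 100.0;
}
```

Calling both with 4.59 and 3.65 gives the two percentages.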
2016-12-07 12:22:23 +01:00
Markus Hitter 39f66ef6b0 dda.c: pretty-format dda_start().
Formatting was messed up during all the recent changes.

Only whitespace and comment changes, no functional change.
2016-12-06 20:25:36 +01:00
Markus Hitter 7726b3179c DDA: revert recent dda_start() changes.
Neither of them brought a performance improvement, so we revert
both. Commits as well as revert kept to preserve the knowledge
gained.

This reverts commits

  "DDA, dda_start(): use mb_tail_dda directly." and
  "DDA, dda_start(): don't pass mb_tail_dda as parameter."

Performance and binary size are back to what we had before:

  ATmega sizes               '168   '328(P)   '644(P)     '1280
  Program:  19270 bytes      135%       63%       31%       15%
     Data:   2179 bytes      213%      107%       54%       27%
   EEPROM:     32 bytes        4%        2%        2%        1%

  short-moves.gcode statistics:
  LED on occurences: 888.
  LED on time minimum: 218 clock cycles.
  LED on time maximum: 395 clock cycles.
  LED on time average: 249.051 clock cycles.

  smooth-curves.gcode statistics:
  LED on occurences: 23648.
  LED on time minimum: 237 clock cycles.
  LED on time maximum: 438 clock cycles.
  LED on time average: 272.216 clock cycles.

  triangle-odd.gcode statistics:
  LED on occurences: 1636.
  LED on time minimum: 237 clock cycles.
  LED on time maximum: 395 clock cycles.
  LED on time average: 262.572 clock cycles.
2016-12-06 20:24:38 +01:00
Markus Hitter e28afeca7d DDA, dda_start(): use mb_tail_dda directly.
Just avoiding passing mb_tail_dda as a parameter didn't work out,
so how about using it directly? This is what this commit does.

Result: binary size another 32 bytes bigger, slowest step another
16 clock cycles slower. No dice.

  ATmega sizes               '168   '328(P)   '644(P)     '1280
  Program:  19306 bytes      135%       63%       31%       15%
     Data:   2179 bytes      213%      107%       54%       27%
   EEPROM:     32 bytes        4%        2%        2%        1%

  short-moves.gcode statistics:
  LED on occurences: 888.
  LED on time minimum: 218 clock cycles.
  LED on time maximum: 414 clock cycles.
  LED on time average: 249.436 clock cycles.

  smooth-curves.gcode statistics:
  LED on occurences: 23648.
  LED on time minimum: 237 clock cycles.
  LED on time maximum: 457 clock cycles.
  LED on time average: 272.256 clock cycles.

  triangle-odd.gcode statistics:
  LED on occurences: 1636.
  LED on time minimum: 237 clock cycles.
  LED on time maximum: 414 clock cycles.
  LED on time average: 262.595 clock cycles.
2016-12-06 19:44:25 +01:00
Markus Hitter 480cc40618 DDA, dda_start(): don't pass mb_tail_dda as parameter.
Instead, read the global variable directly.

The idea is that reading the global variable directly removes
the effort to build up a parameter stack, making things faster.

Actually, binary size increases by 4 bytes and the slowest step
takes 3 clock cycles longer. D'oh.

  ATmega sizes               '168   '328(P)   '644(P)     '1280
  Program:  19274 bytes      135%       63%       31%       15%
     Data:   2179 bytes      213%      107%       54%       27%
   EEPROM:     32 bytes        4%        2%        2%        1%

  short-moves.gcode statistics:
  LED on occurences: 888.
  LED on time minimum: 218 clock cycles.
  LED on time maximum: 398 clock cycles.
  LED on time average: 249.111 clock cycles.

  smooth-curves.gcode statistics:
  LED on occurences: 23648.
  LED on time minimum: 237 clock cycles.
  LED on time maximum: 441 clock cycles.
  LED on time average: 272.222 clock cycles.

  triangle-odd.gcode statistics:
  LED on occurences: 1636.
  LED on time minimum: 237 clock cycles.
  LED on time maximum: 398 clock cycles.
  LED on time average: 262.576 clock cycles.
2016-12-06 19:44:23 +01:00
Markus Hitter d5eb8cd916 DDA: avoid looking up the movebuffer array.
As we have mb_tail_dda now, that's no longer necessary. Using
something like movebuffer[mb_tail] is more expensive than
dereferencing mb_tail_dda directly.
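
A minimal sketch of the difference, not the actual Teacup code. The
struct layout, the field steps_left, MOVEBUFFER_SIZE and both
function names are assumptions made for illustration; only
movebuffer, mb_tail and mb_tail_dda are names from the sources.

```c
#include <stddef.h>
#include <stdint.h>

#define MOVEBUFFER_SIZE 8      /* hypothetical queue length */

typedef struct {
  uint32_t steps_left;         /* hypothetical field, illustration only */
} DDA;

DDA movebuffer[MOVEBUFFER_SIZE];
uint8_t mb_tail = 0;
DDA *mb_tail_dda = NULL;       /* assumed to point at movebuffer[mb_tail] */

/* Indexed access: mb_tail has to be scaled by sizeof(DDA) and added
   to the array base before the field can be loaded. */
uint32_t steps_left_indexed(void) {
  return movebuffer[mb_tail].steps_left;
}

/* Pointer access: the element address is already known, only the
   pointer itself has to be loaded. */
uint32_t steps_left_pointer(void) {
  return mb_tail_dda->steps_left;
}
```

Both functions return the same value as long as mb_tail_dda is kept
in sync with mb_tail. How much the second one saves depends on the
compiler and the target.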

This is the first time we see a stepping performance improvement
since introducing mb_tail_dda. 13 clock cycles faster on the
slowest step, which is 9 cycles faster than before that
introduction.

Binary size also a nice 94 bytes down.

  ATmega sizes               '168   '328(P)   '644(P)     '1280
  Program:  19270 bytes      135%       63%       31%       15%
     Data:   2179 bytes      213%      107%       54%       27%
   EEPROM:     32 bytes        4%        2%        2%        1%

  short-moves.gcode statistics:
  LED on occurences: 888.
  LED on time minimum: 218 clock cycles.
  LED on time maximum: 395 clock cycles.
  LED on time average: 249.051 clock cycles.

  smooth-curves.gcode statistics:
  LED on occurences: 23648.
  LED on time minimum: 237 clock cycles.
  LED on time maximum: 438 clock cycles.
  LED on time average: 272.216 clock cycles.

  triangle-odd.gcode statistics:
  LED on occurences: 1636.
  LED on time minimum: 237 clock cycles.
  LED on time maximum: 395 clock cycles.
  LED on time average: 262.572 clock cycles.
2016-12-06 15:33:26 +01:00
Markus Hitter eec0e00f85 dda_queue.c/.h: eliminate queue_current_movement().
Again no stepping performance improvement, but another 34 bytes
off the binary size:

  ATmega sizes               '168   '328(P)   '644(P)     '1280
  Program:  19364 bytes      136%       64%       31%       16%
     Data:   2179 bytes      213%      107%       54%       27%
   EEPROM:     32 bytes        4%        2%        2%        1%
2016-12-06 15:08:50 +01:00
Markus Hitter 81cffde4e9 dda_queue.c/.h: eliminate queue_empty().
This is no longer needed, because mb_tail_dda gives the same
information, just faster. Welcome side effect: better encapsulation.

No stepping performance improvement, but binary size 36 bytes
smaller:

  ATmega sizes               '168   '328(P)   '644(P)     '1280
  Program:  19398 bytes      136%       64%       31%       16%
     Data:   2179 bytes      213%      107%       54%       27%
   EEPROM:     32 bytes        4%        2%        2%        1%
2016-12-06 14:16:54 +01:00
Markus Hitter 2e13d2bc9d dda_queue.c/.h: introduce mb_tail_dda.
For now, this costs 2 bytes RAM, 8 bytes binary size and slows
down the slowest step by 4 clock cycles. We expect opportunities
for improvements elsewhere, of course.

  ATmega sizes               '168   '328(P)   '644(P)     '1280
  Program:  19434 bytes      136%       64%       31%       16%
     Data:   2179 bytes      213%      107%       54%       27%
   EEPROM:     32 bytes        4%        2%        2%        1%

  short-moves.gcode statistics:
  LED on occurences: 888.
  LED on time minimum: 230 clock cycles.
  LED on time maximum: 407 clock cycles.
  LED on time average: 263.008 clock cycles.

  smooth-curves.gcode statistics:
  LED on occurences: 23648.
  LED on time minimum: 251 clock cycles.
  LED on time maximum: 450 clock cycles.
  LED on time average: 286.212 clock cycles.

  triangle-odd.gcode statistics:
  LED on occurences: 1636.
  LED on time minimum: 251 clock cycles.
  LED on time maximum: 407 clock cycles.
  LED on time average: 276.568 clock cycles.
2016-12-06 13:49:25 +01:00
Markus Hitter c181a813e7 dda_queue.c: take advantage of a special case.
No functional change or stepping performance improvement, but a
14 bytes smaller binary:

  ATmega sizes               '168   '328(P)   '644(P)     '1280
  Program:  19426 bytes      136%       64%       31%       16%
     Data:   2177 bytes      213%      107%       54%       27%
   EEPROM:     32 bytes        4%        2%        2%        1%
2016-12-06 13:20:41 +01:00
Markus Hitter 329dd14446 dda_queue.c: eliminate next_move() entirely.
All the simplifications before led to a simple three-line
function, one of which happened to duplicate a line of the calling
code. Also update comments mentioning this former function.

No stepping performance improvement, but cleaner code and 32 bytes
less binary size:

  ATmega sizes               '168   '328(P)   '644(P)     '1280
  Program:  19440 bytes      136%       64%       31%       16%
     Data:   2177 bytes      213%      107%       54%       27%
   EEPROM:     32 bytes        4%        2%        2%        1%
2016-12-06 13:19:48 +01:00
Markus Hitter 061924f448 dda_queue.c: inline a simplified version of next_move().
As we're in an interrupt already, we can simplify the test for an
empty queue. Slowest step down to 446 clock cycles, another 26
ticks less. Binary size only 36 bytes up:

  ATmega sizes               '168   '328(P)   '644(P)     '1280
  Program:  19472 bytes      136%       64%       31%       16%
     Data:   2177 bytes      213%      107%       54%       27%
   EEPROM:     32 bytes        4%        2%        2%        1%

  short-moves.gcode statistics:
  LED on occurences: 888.
  LED on time minimum: 226 clock cycles.
  LED on time maximum: 403 clock cycles.
  LED on time average: 262.922 clock cycles.

  smooth-curves.gcode statistics:
  LED on occurences: 23648.
  LED on time minimum: 251 clock cycles.
  LED on time maximum: 446 clock cycles.
  LED on time average: 286.203 clock cycles.

  triangle-odd.gcode statistics:
  LED on occurences: 1636.
  LED on time minimum: 251 clock cycles.
  LED on time maximum: 403 clock cycles.
  LED on time average: 276.561 clock cycles.
2016-12-06 12:01:51 +01:00
Markus Hitter fc70e00ca2 DDA: don't queue up heater waits.
Not queuing up waits for the heaters in the movement queue removes
some code in performance critical paths. How lucky that we just
implemented an alternative M116 functionality with the previous
commit :-)

The slowest step now takes a nice 29 clock cycles less and binary
size decreased by a whopping 472 bytes. That's
still 210 bytes less than before implementing the alternative
heater wait.

Best of all, average step time is down some 21 clock cycles, too,
so we increased general stepping performance by no less than 5%.

  ATmega sizes               '168   '328(P)   '644(P)     '1280
  Program:  19436 bytes      136%       64%       31%       16%
     Data:   2177 bytes      213%      107%       54%       27%
   EEPROM:     32 bytes        4%        2%        2%        1%

  short-moves.gcode statistics:
  LED on occurences: 888.
  LED on time minimum: 259 clock cycles.
  LED on time maximum: 429 clock cycles.
  LED on time average: 263.491 clock cycles.

  smooth-curves.gcode statistics:
  LED on occurences: 23648.
  LED on time minimum: 251 clock cycles.
  LED on time maximum: 472 clock cycles.
  LED on time average: 286.259 clock cycles.

  triangle-odd.gcode statistics:
  LED on occurences: 1636.
  LED on time minimum: 251 clock cycles.
  LED on time maximum: 429 clock cycles.
  LED on time average: 276.616 clock cycles.
2016-12-05 21:36:03 +01:00
Markus Hitter fb49aef14d Make temperature waiting independent from the movement queue.
The plan is to remove this stuff from the movement queue.

We still accept additional G-code ... until a G0 or G1 appears.
This allows, for example, homing or reading temperature reports
while waiting.
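
The rule can be sketched in C roughly like this. It is a sketch
under assumptions, not the code of this commit: the flag, both
function names and the way the temperature state is passed in are
invented for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag, set when a heater wait (M116) was requested. */
bool waiting_for_temperature = false;

void start_temperature_wait(void) {
  waiting_for_temperature = true;
}

/* Returns true if the command may be processed right away. While
   waiting, everything except G0 and G1 passes; movements are held
   back until the temperatures are reached. */
bool may_run_now(char letter, uint16_t number, bool temperatures_reached) {
  if (waiting_for_temperature && temperatures_reached)
    waiting_for_temperature = false;

  if (waiting_for_temperature && letter == 'G' &&
      (number == 0 || number == 1))
    return false;

  return true;
}
```

With a wait active, G28 or M105 would still run, while a G1 is held
back until the heaters are up to temperature.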

Keep messages exactly as they were before, as some host
applications might try to parse them.

This needs 2 bytes RAM and 138 bytes binary size. Performance is
unchanged. Let's see how this compares to the size reduction when
we remove the temperature handling code from the movement queue.

  ATmega sizes               '168   '328(P)   '644(P)     '1280
  Program:  19646 bytes      138%       64%       31%       16%
     Data:   2177 bytes      213%      107%       54%       27%
   EEPROM:     32 bytes        4%        2%        2%        1%

  short-moves.gcode statistics:
  LED on occurences: 888.
  LED on time minimum: 280 clock cycles.
  LED on time maximum: 458 clock cycles.
  LED on time average: 284.653 clock cycles.

  smooth-curves.gcode statistics:
  LED on occurences: 23648.
  LED on time minimum: 272 clock cycles.
  LED on time maximum: 501 clock cycles.
  LED on time average: 307.275 clock cycles.

  triangle-odd.gcode statistics:
  LED on occurences: 1636.
  LED on time minimum: 272 clock cycles.
  LED on time maximum: 458 clock cycles.
  LED on time average: 297.625 clock cycles.
2016-12-05 13:56:09 +01:00
Markus Hitter 7875b50f80 gcode_process.c: remove G30.
This was "Go home via point". The RepRap community has apparently
decided on a super complex Z probing command with this number:

  http://reprap.org/wiki/G-code#G30:_Single_Z-Probe

This reduces binary size by 18 bytes:

  ATmega sizes               '168   '328(P)   '644(P)     '1280
  Program:  19508 bytes      137%       64%       31%       16%
     Data:   2175 bytes      213%      107%       54%       27%
   EEPROM:     32 bytes        4%        2%        2%        1%
2016-11-28 13:39:32 +01:00
Markus Hitter 974c4b7de8 dda.c: simplify copy of startpoint.
This reduces binary size by 26 bytes without drawback.

  ATmega sizes               '168   '328(P)   '644(P)     '1280
  Program:  19526 bytes      137%       64%       31%       16%
     Data:   2175 bytes      213%      107%       54%       27%
   EEPROM:     32 bytes        4%        2%        2%        1%
2016-11-27 22:16:58 +01:00
Markus Hitter 6d95620f32 DDA: don't queue up nullmoves.
Nullmoves are movements which don't actually move a stepper. For
example because it's a velocity change only or the movement is
shorter than a single motor step.

Not queueing them up removes the necessity to check for them,
which reduces code in critical areas. It also removes the
necessity to run dda_start() twice to get past a nullmove.
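
A minimal sketch of the idea in C, assuming a much simplified queue.
The Target struct, QUEUE_SIZE and enqueue() are hypothetical and not
the actual dda_create() code. Whether the real code records the new
feedrate the same way is also an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

#define AXIS_COUNT 4           /* X, Y, Z, E */
#define QUEUE_SIZE 8           /* hypothetical queue length */

typedef struct {
  int32_t steps[AXIS_COUNT];   /* target position in motor steps */
  uint32_t feedrate;
} Target;

Target startpoint;             /* where the previous move ends */
Target queue[QUEUE_SIZE];
uint8_t queue_len = 0;

/* Returns true if the move was queued. A move which doesn't change
   any axis position is a nullmove: its feedrate is remembered for
   the next move, but it never enters the queue. */
bool enqueue(const Target *t) {
  if (queue_len == QUEUE_SIZE)
    return false;

  bool moves = false;
  for (uint8_t i = 0; i < AXIS_COUNT; i++) {
    if (t->steps[i] != startpoint.steps[i])
      moves = true;
  }

  startpoint = *t;
  if (!moves)
    return false;

  queue[queue_len++] = *t;
  return true;
}
```

Because a nullmove never reaches the queue, the code taking moves
out of it no longer has to test for one.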

Best of all, it also makes lookahead perform better. Before,
a nullmove just changing speed interrupted the lookahead chain,
now it no longer does. See straight-speeds.gcode and
...-Fsep.gcode, which produced different timings before, now
results are identical.

Also update the function description for dda_create().

Performance increase is impressive: another 75 clock cycles off
the slowest step, only 36 bytes binary size increase:

  ATmega sizes               '168   '328(P)   '644(P)     '1280
  Program:  19652 bytes      138%       64%       31%       16%
     Data:   2175 bytes      213%      107%       54%       27%
   EEPROM:     32 bytes        4%        2%        2%        1%

  short-moves.gcode statistics:
  LED on occurences: 888.
  LED on time minimum: 280 clock cycles.
  LED on time maximum: 458 clock cycles.
  LED on time average: 284.653 clock cycles.

  smooth-curves.gcode statistics:
  LED on occurences: 23648.
  LED on time minimum: 272 clock cycles.
  LED on time maximum: 501 clock cycles.
  LED on time average: 307.275 clock cycles.

  triangle-odd.gcode statistics:
  LED on occurences: 1636.
  LED on time minimum: 272 clock cycles.
  LED on time maximum: 458 clock cycles.
  LED on time average: 297.625 clock cycles.

Performance of straight-speeds{-Fsep}.gcode before:

  straight-speeds.gcode statistics:
  LED on occurences: 32000.
  LED on time minimum: 272 clock cycles.
  LED on time maximum: 586 clock cycles.
  LED on time average: 298.75 clock cycles.

  straight-speeds-Fsep.gcode statistics:
  LED on occurences: 32000.
  LED on time minimum: 272 clock cycles.
  LED on time maximum: 672 clock cycles.
  LED on time average: 298.79 clock cycles.

Now:

  straight-speeds.gcode statistics:
  LED on occurences: 32000.
  LED on time minimum: 272 clock cycles.
  LED on time maximum: 501 clock cycles.
  LED on time average: 298.703 clock cycles.

  straight-speeds-Fsep.gcode statistics:
  LED on occurences: 32000.
  LED on time minimum: 272 clock cycles.
  LED on time maximum: 501 clock cycles.
  LED on time average: 298.703 clock cycles.

There we save even 171 clock cycles :-)
2016-11-27 22:16:56 +01:00
Markus Hitter 766bd52337 Testcases: add straight-speeds-Fsep.gcode.
The distinction between straight-speeds.gcode and
straight-speeds-Fsep.gcode is that the latter has all speed
changes on a separate line. If queueing works properly and
nullmoves get removed, both should produce identical results.
2016-11-27 16:05:30 +01:00
Markus Hitter d03305b989 Renew GtkWave save file.
GtkWave has apparently changed its save file format without
backwards compatibility. Content should be the same as before.
2016-11-27 16:05:30 +01:00
Markus Hitter bf0760f883 dda_queue.h: replace all tabs by spaces.
Another file 'clean' and only few changes, so likely not producing
a mess in future rebases. No functional changes.
2016-11-25 22:04:25 +01:00
Markus Hitter c594a2b995 dda_queue.c/.h: make mb_head local.
This doesn't change binary size or performance, but it increases
encapsulation.
2016-11-25 22:04:17 +01:00
Markus Hitter 721649e8ef dda_queue.c: eliminate another local variable.
This shaves off another 3 clock cycles without drawback. It
increases binary size by 8 bytes, but apparently only in places
where it doesn't matter.

Performance:

  ATmega sizes               '168   '328(P)   '644(P)     '1280
  Program:  19616 bytes      137%       64%       31%       16%
     Data:   2175 bytes      213%      107%       54%       27%
   EEPROM:     32 bytes        4%        2%        2%        1%

  short-moves.gcode statistics:
  LED on occurences: 888.
  LED on time minimum: 280 clock cycles.
  LED on time maximum: 545 clock cycles.
  LED on time average: 286.187 clock cycles.

  smooth-curves.gcode statistics:
  LED on occurences: 23648.
  LED on time minimum: 272 clock cycles.
  LED on time maximum: 576 clock cycles.
  LED on time average: 307.431 clock cycles.

  triangle-odd.gcode statistics:
  LED on occurences: 1636.
  LED on time minimum: 272 clock cycles.
  LED on time maximum: 535 clock cycles.
  LED on time average: 297.724 clock cycles.
2016-11-25 21:42:29 +01:00
Markus Hitter 9fe3855c3e dda_queue.c: eliminate a local variable.
This shaves off just 2 bytes binary size and saves only one clock
cycle for the slowest movement step. But heck, that's better than
nothing and comes without drawback, so let's keep this experiment.

Performance:

  ATmega sizes               '168   '328(P)   '644(P)     '1280
  Program:  19608 bytes      137%       64%       31%       16%
     Data:   2175 bytes      213%      107%       54%       27%
   EEPROM:     32 bytes        4%        2%        2%        1%

  short-moves.gcode statistics:
  LED on occurences: 888.
  LED on time minimum: 280 clock cycles.
  LED on time maximum: 548 clock cycles.
  LED on time average: 286.252 clock cycles.

  smooth-curves.gcode statistics:
  LED on occurences: 23648.
  LED on time minimum: 272 clock cycles.
  LED on time maximum: 579 clock cycles.
  LED on time average: 307.437 clock cycles.

  triangle-odd.gcode statistics:
  LED on occurences: 1636.
  LED on time minimum: 272 clock cycles.
  LED on time maximum: 538 clock cycles.
  LED on time average: 297.73 clock cycles.
2016-11-25 21:23:00 +01:00
Markus Hitter 3ba52e5906 Revert "DDA: use bitmask to track active axes [...]"
While this was an improvement of 9 clocks on AVRs, it had more
than the opposite effect on ARMs: 25 clocks slower on the slowest
step. Apparently ARMs aren't as efficient in reading and writing
single bits.

  https://github.com/Traumflug/Teacup_Firmware/issues/189#issuecomment-262837660

Performance on AVR is back to what we had before:

  ATmega sizes               '168   '328(P)   '644(P)     '1280
  Program:  19610 bytes      137%       64%       31%       16%
     Data:   2175 bytes      213%      107%       54%       27%
   EEPROM:     32 bytes        4%        2%        2%        1%

  short-moves.gcode statistics:
  LED on occurences: 888.
  LED on time minimum: 280 clock cycles.
  LED on time maximum: 549 clock cycles.
  LED on time average: 286.273 clock cycles.

  smooth-curves.gcode statistics:
  LED on occurences: 23648.
  LED on time minimum: 272 clock cycles.
  LED on time maximum: 580 clock cycles.
  LED on time average: 307.439 clock cycles.

  triangle-odd.gcode statistics:
  LED on occurences: 1636.
  LED on time minimum: 272 clock cycles.
  LED on time maximum: 539 clock cycles.
  LED on time average: 297.732 clock cycles.
2016-11-25 20:58:25 +01:00
Phil Hord 00a28cd502 DDA: use bitmask to track active axes for faster dda_step().
In dda_step instead of checking our 32-bit-wide delta[n] value,
just check a single bit in an 8-bit field.  Should be a tad faster.
It does make the code larger, but also about 10% faster, I think.
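
Roughly, the approach looks like the following C sketch. Struct and
function names are invented for illustration and the real dda_step()
does a lot more per axis.

```c
#include <stdint.h>

enum { X = 0, Y, Z, E, AXIS_COUNT };

typedef struct {
  uint32_t delta[AXIS_COUNT];  /* steps to do per axis */
  uint8_t active_axes;         /* bit n set: axis n moves at all */
} Move;

/* Done once per move: condense the 32-bit deltas into one byte. */
void move_prepare(Move *m) {
  m->active_axes = 0;
  for (uint8_t i = 0; i < AXIS_COUNT; i++) {
    if (m->delta[i] != 0)
      m->active_axes |= (uint8_t)(1u << i);
  }
}

/* Done on every step: test a single bit instead of comparing a
   32-bit value against zero. Returns the number of axes handled;
   a real step function would do its per-axis work here. */
uint8_t move_step(const Move *m) {
  uint8_t handled = 0;
  for (uint8_t i = 0; i < AXIS_COUNT; i++) {
    if (m->active_axes & (1u << i))
      handled++;
  }
  return handled;
}
```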

Performance:

  ATmega sizes               '168   '328(P)   '644(P)     '1280
  Program:  19696 bytes      138%       65%       32%       16%
     Data:   2191 bytes      214%      107%       54%       27%
   EEPROM:     32 bytes        4%        2%        2%        1%

  short-moves.gcode statistics:
  LED on occurences: 888.
  LED on time minimum: 263 clock cycles.
  LED on time maximum: 532 clock cycles.
  LED on time average: 269.273 clock cycles.

  smooth-curves.gcode statistics:
  LED on occurences: 23648.
  LED on time minimum: 255 clock cycles.
  LED on time maximum: 571 clock cycles.
  LED on time average: 297.792 clock cycles.

  triangle-odd.gcode statistics:
  LED on occurences: 1636.
  LED on time minimum: 255 clock cycles.
  LED on time maximum: 522 clock cycles.
  LED on time average: 283.861 clock cycles.
2016-11-24 11:31:46 +01:00
Markus Hitter 1326db002f DDA, dda_step(): test for individual axes again.
This time we don't test for remaining steps, but whether the axis
moves at all. A much cheaper test, because this variable has to
be loaded into registers anyway.

Performance is now even better than without this test. Slowest
step down from 604 to 580 clock cycles.

  ATmega sizes               '168   '328(P)   '644(P)     '1280
  Program:  19610 bytes      137%       64%       31%       16%
     Data:   2175 bytes      213%      107%       54%       27%
   EEPROM:     32 bytes        4%        2%        2%        1%

  short-moves.gcode statistics:
  LED on occurences: 888.
  LED on time minimum: 280 clock cycles.
  LED on time maximum: 549 clock cycles.
  LED on time average: 286.273 clock cycles.

  smooth-curves.gcode statistics:
  LED on occurences: 23648.
  LED on time minimum: 272 clock cycles.
  LED on time maximum: 580 clock cycles.
  LED on time average: 307.439 clock cycles.

  triangle-odd.gcode statistics:
  LED on occurences: 1636.
  LED on time minimum: 272 clock cycles.
  LED on time maximum: 539 clock cycles.
  LED on time average: 297.732 clock cycles.
2016-11-23 12:47:46 +01:00
Markus Hitter 437cb08e42 dda.c: use muldiv in update_current_position().
Apparently gcc doesn't manage to sort nested calculations. Putting
all the muldiv()s into one line gives this error:

  dda.c: In function ‘update_current_position’:
  dda.c:969:1: error: unable to find a register to spill in class ‘POINTER_REGS’
   }
   ^
  dda.c:969:1: error: this is the insn:
  (insn 81 80 259 4 (set (reg:SI 82 [ D.3267 ])
          (mem:SI (post_inc:HI (reg:HI 2 r2 [orig:121 ivtmp.106 ] [121])) [4 MEM[base: _97, offset: 0B]+0 S4 A8])) dda.c:952 95 {*movsi}
       (expr_list:REG_INC (reg:HI 2 r2 [orig:121 ivtmp.106 ] [121])
          (nil)))
  dda.c:969: confused by earlier errors, bailing out

This problem was solved by doing the calculation step by step,
using intermediate variables. Glad I could help you, gcc :-)
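
The pattern looks roughly like this C sketch. muldiv_sketch() is a
stand-in written for this example with a 64-bit intermediate. It is
not Teacup's muldiv(), and position_um() with its parameters is
hypothetical, too.

```c
#include <stdint.h>

/* Stand-in for muldiv(): a * b / c without overflowing 32 bits in
   the intermediate product. */
int32_t muldiv_sketch(int32_t a, uint32_t b, uint32_t c) {
  return (int32_t)(((int64_t)a * b) / c);
}

/* Position in micrometers, calculated from the move's endpoint and
   the steps still to do. Each intermediate result gets its own
   variable instead of nesting everything into one expression. */
int32_t position_um(int32_t endpoint_um, int32_t steps_left,
                    uint32_t steps_per_m) {
  int32_t remaining_um = muldiv_sketch(steps_left, 1000000UL, steps_per_m);
  int32_t position = endpoint_um - remaining_um;

  return position;
}
```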

Moving performance unchanged, M114 accuracy should have improved,
binary size 18 bytes bigger:

  ATmega sizes               '168   '328(P)   '644(P)     '1280
  Program:  19582 bytes      137%       64%       31%       16%
     Data:   2175 bytes      213%      107%       54%       27%
   EEPROM:     32 bytes        4%        2%        2%        1%
2016-11-22 19:28:12 +01:00
Markus Hitter 20a0808887 DDA: don't count individual axis steps.
Using the Bresenham algorithm it's safe to assume that if the axis
with the most steps is done, all other axes are done, too.
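
A minimal C sketch of that assumption, not the actual dda_step().
The Move struct and both functions are invented for this example,
and the done[] counters exist only to check the result. The only
remaining-steps counter is the one for the axis with the most steps.

```c
#include <stdint.h>

#define AXIS_COUNT 3

typedef struct {
  uint32_t delta[AXIS_COUNT];   /* steps to do on each axis */
  int32_t counter[AXIS_COUNT];  /* Bresenham error accumulators */
  uint32_t total_steps;         /* steps of the axis with the most steps */
  uint32_t steps_left;          /* the only remaining-steps counter */
  uint32_t done[AXIS_COUNT];    /* steps emitted, for checking only */
} Move;

/* Expects at least one axis with a non-zero delta. */
void move_init(Move *m, const uint32_t delta[AXIS_COUNT]) {
  m->total_steps = 0;
  for (int i = 0; i < AXIS_COUNT; i++) {
    m->delta[i] = delta[i];
    m->done[i] = 0;
    if (delta[i] > m->total_steps)
      m->total_steps = delta[i];
  }
  for (int i = 0; i < AXIS_COUNT; i++)
    m->counter[i] = -(int32_t)(m->total_steps / 2);
  m->steps_left = m->total_steps;
}

/* One step interrupt. Returns non-zero while the move is running. */
int move_step(Move *m) {
  for (int i = 0; i < AXIS_COUNT; i++) {
    m->counter[i] += (int32_t)m->delta[i];
    if (m->counter[i] > 0) {
      m->done[i]++;             /* this is where a step pulse would go */
      m->counter[i] -= (int32_t)m->total_steps;
    }
  }
  return --m->steps_left != 0;
}
```

For deltas of 10, 3 and 0 steps, the move ends after 10 calls and
each axis has done exactly its own number of steps, without counting
them individually.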

This way we save a lot of variable loading in dda_step(). We also
save this very expensive comparison of all axis counters against
zero. Minor drawback: update_current_position() is now even slower.

About performance. The slowest step decreased from 719 to 604
clocks, which is quite an improvement. Average step time increased
for single axis movements by 16 clocks and decreased for multi-
axis movements. On the bottom line this should improve real-world
performance quite a bit, because a printer's movement speed isn't
limited by average timings, but by the time needed for the slowest
step.

Along the way, binary size dropped by a nice 244 bytes and RAM
usage by an also nice 16 bytes.

  ATmega sizes               '168   '328(P)   '644(P)     '1280
  Program:  19564 bytes      137%       64%       31%       16%
     Data:   2175 bytes      213%      107%       54%       27%
   EEPROM:     32 bytes        4%        2%        2%        1%

  short-moves.gcode statistics:
  LED on occurences: 888.
  LED on time minimum: 326 clock cycles.
  LED on time maximum: 595 clock cycles.
  LED on time average: 333.62 clock cycles.

  smooth-curves.gcode statistics:
  LED on occurences: 23648.
  LED on time minimum: 318 clock cycles.
  LED on time maximum: 604 clock cycles.
  LED on time average: 333.311 clock cycles.

  triangle-odd.gcode statistics:
  LED on occurences: 1636.
  LED on time minimum: 318 clock cycles.
  LED on time maximum: 585 clock cycles.
  LED on time average: 335.233 clock cycles.
2016-11-22 19:13:41 +01:00
Markus Hitter 92516b55ea run-in-simulavr.sh: enable size report.
Apparently this got forgotten earlier.
2016-11-21 19:56:19 +01:00
Markus Hitter 6e87ee5f96 Makefile-AVR: add a target for our standard performance test.
Our standard performance test is to run these three G-code files
in SimulAVR and record step pulse timings. While this certainly
doesn't cover everything related to possible performance
measurements, it's a good basic standard to compare code changes.
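
How such statistics can be derived from recorded pulse lengths is
sketched below in C. This is an illustration only: the struct and
the function are made up, and the actual test target may process
the SimulAVR output in an entirely different way.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
  size_t occurrences;
  uint32_t min;
  uint32_t max;
  double average;
} PulseStats;

/* Condense a list of LED-on times, in clock cycles, into the four
   numbers reported per G-code file. */
PulseStats pulse_stats(const uint32_t *cycles, size_t count) {
  PulseStats s = {count, 0, 0, 0.0};
  uint64_t sum = 0;

  if (count == 0)
    return s;

  s.min = s.max = cycles[0];
  for (size_t i = 0; i < count; i++) {
    if (cycles[i] < s.min) s.min = cycles[i];
    if (cycles[i] > s.max) s.max = cycles[i];
    sum += cycles[i];
  }
  s.average = (double)sum / (double)count;

  return s;
}
```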

Current performance:
  ATmega sizes               '168   '328(P)   '644(P)     '1280
  Program:  19808 bytes      139%       65%       32%       16%
     Data:   2191 bytes      214%      107%       54%       27%
   EEPROM:     32 bytes        4%        2%        2%        1%

  short-moves.gcode statistics:
  LED on occurences: 888.
  LED on time minimum: 308 clock cycles.
  LED on time maximum: 729 clock cycles.
  LED on time average: 317.393 clock cycles.

  smooth-curves.gcode statistics:
  LED on occurences: 23648.
  LED on time minimum: 308 clock cycles.
  LED on time maximum: 726 clock cycles.
  LED on time average: 354.825 clock cycles.

  triangle-odd.gcode statistics:
  LED on occurences: 1636.
  LED on time minimum: 308 clock cycles.
  LED on time maximum: 719 clock cycles.
  LED on time average: 336.327 clock cycles.
2016-11-21 19:54:59 +01:00
Markus Hitter 77a790e094 Makefile-AVR: add a target for running in SimulAVR.
This is useful for running in a standardised environment and also
demonstrates how to run an executable in SimulAVR.
2016-11-21 18:01:40 +01:00
Wurstnase 47dc3aed89 dda_lookahead.c: one more debug variable defined out by default.
Traumflug's note: if one uses #define LOOKAHEAD_DEBUG at line 177,
  one should use the same symbol in line 321. Edited the commit to
  do so.

This reduces binary size by 38 bytes and RAM usage by 4 bytes.
2016-11-20 15:18:37 +01:00
Matt Gilbert a691f2a8ab PCBScriber config files.
PCBScriber is a printer for the scratch 'n etch method, see

  http://reprap.org/wiki/PCBScriber

Commit reviewer Traumflug's note:

 - Rebased to current branch 'experimental', which adds
   USE_INTERNAL_PULLDOWNS.

 - Removed DEFINE_HOMING for now, this part isn't cooked, yet.
   For example, it doesn't pass regression tests.

 - Thank you very much for the contribution!
2016-11-20 14:45:45 +01:00
Markus Hitter 85fc7f86f4 Teacup_Firmware.ino: add a hint to Configtool. 2016-11-19 10:24:51 +01:00
Markus Hitter 417d519ca5 Rename the Arduino IDE sketch file from .pde to .ino.
Arduino IDE versions requiring .pde as extension have been gone
for quite a while now.
2016-11-19 10:15:58 +01:00
Markus Hitter a0fc485fa6 Move branch 'issue-196' to the attic.
This was an attempt to make Teacup sources compatible with
Arduino IDE 1.6.0 - 1.6.9 and became obsolete as of 1.6.10. The
problem was fixed on the Arduino IDE side.
2016-11-19 10:15:01 +01:00
wurstnase cd66feb8d1 dda.c: let's save 3 divisions. 2016-11-14 21:49:44 +01:00
wurstnase 5b11a39155 DDA: make lookahead independent of X.
We calculate all steps from the fastest axis now. So X and Y
steps_per_m don't have to be the same anymore.

Traumflug's note: another 16 bytes program size off on AVR, same
size on LPC1114.
2016-11-14 21:43:54 +01:00
wurstnase 2b1f3371c7 DDA: get rid of fast_spm.
We need the fastest axis instead of its steps.

Also eliminates an overflow when ACCELERATION > 596.

We save 118 bytes program and 2 bytes data.

Reviewer Traumflug's note: I see 100 bytes program and 32 bytes
RAM saving on ATmegas here. 16 and 32 on the LPC 1114. Either way:
great stuff!
2016-11-14 21:36:00 +01:00
Wurstnase 39cababb07 dda.c: don't apply feedrate multiplier when searching endstops.
With M220 we can increase the step rate while printing. But when
using this feature it could cause unexpected behaviour while
homing.
2016-11-08 21:21:34 +01:00
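The behaviour described in commit 39cababb07 above can be sketched as follows. This is only an illustration in Python, not the firmware's C code: the function name, its parameters, and the assumption that the M220 multiplier is stored as a percentage are all hypothetical.

```python
def effective_feedrate(feedrate, multiplier_percent, searching_endstop):
    """Return the feedrate to use for a move (hypothetical helper).

    The M220 multiplier is assumed to be a percentage (100 = unchanged).
    It is ignored while searching for an endstop, so homing always runs
    at the configured search speed.
    """
    if searching_endstop:
        return feedrate
    return feedrate * multiplier_percent // 100


printing = effective_feedrate(3000, 150, searching_endstop=False)
homing = effective_feedrate(3000, 150, searching_endstop=True)
```

With a 150% multiplier, the printing move is scaled while the homing move keeps its original feedrate.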
Phil Hord 9d42fa4ac1 Configtool: speed up startup with wx-tricks.
This should fix issue #235.

Recently ConfigTool has been very slow for me on Ubuntu Linux.
When I run the app there is a 15 second wait before the window is
first displayed.  I bisected the problem and found it was tied to
the number of pins in `pinNames`, and ultimately that it was
caused by a slow initializer in wx.Choice() when the choices are
loaded when the widget is created.  For some reason, moving the
load after the widget is created is significantly faster.  This
change reduces my startup time to just under 4 seconds.

Further speedup could be had by using lazy initialization of the
controls.  But the controls are too bound up in the loaded data
to make this simple.  Maybe I will attack it later.

There is still a significant delay when closing the window, but I
haven't tracked what causes it.  Maybe it is caused just by
destroying all these pin controls.

In the process of making this change, I wanted to reduce the
number of locations that bothered to copy the pinNames list and,
to support lazy loading, to try to keep the same list in all
pinChoice controls.  I noticed that all the pinChoice controls
already have the same parameters passed to the addPinChoice
function which makes them redundant and confusing.  I removed the
extra initializers and just rely on pinNames as the only list
option in addPinChoice for now.  Maybe this flexibility is needed
for some reason later, but I can't see a purpose for it now.

Notes by reviewer Traumflug:

First of all, which "trick"? That's an excellent code
simplification and if this happens to make startup faster (it
does), all the better.

Measured startup & shutdown time here (click window close as soon
as it appears):

  Before:                With this commit:
  real    0m4.222s       real    0m3.780s
  user    0m3.864s       user    0m3.452s
  sys     0m0.084s       sys     0m0.100s

As the speedup was far more significant on the commit author's
machine, it might be a memory consumption issue (leading to
swapping on a machine with little RAM). Linux allows viewing this in
/proc/<pid>/status.

           Before:        Now:
  VmPeak:  708360 kB      708372 kB
  VmSize:  658916 kB      658756 kB
  VmHWM:    73792 kB       73492 kB
  VmRSS:    73792 kB       73492 kB
  VmData:  402492 kB      402332 kB

Still no obvious indicator, but a 300 kB smaller memory footprint
is certainly nice.
2016-10-26 22:00:21 +02:00
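The memory figures in the review notes of commit 9d42fa4ac1 above come from /proc/<pid>/status. A minimal sketch of how such values could be collected is shown here; the helper name `read_vm_fields` is hypothetical, and the sample text simply reuses the "Now" column from the notes.

```python
def read_vm_fields(status_text,
                   fields=("VmPeak", "VmSize", "VmHWM", "VmRSS", "VmData")):
    """Extract selected memory fields (in kB) from /proc/<pid>/status text."""
    result = {}
    for line in status_text.splitlines():
        # Lines look like "VmRSS:\t   73492 kB".
        name, _, rest = line.partition(":")
        if name in fields:
            result[name] = int(rest.split()[0])
    return result


# Sample input built from the values quoted in the review notes.
sample = (
    "Name:\tpython\n"
    "VmPeak:\t  708372 kB\n"
    "VmSize:\t  658756 kB\n"
    "VmHWM:\t   73492 kB\n"
    "VmRSS:\t   73492 kB\n"
    "VmData:\t  402332 kB\n"
)
usage = read_vm_fields(sample)
```

On a Linux machine the same function could be fed the contents of /proc/self/status, or of the status file for the Configtool process ID.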
Markus Hitter 36f54adb7f thermistortablefile.py: fix output parameter list.
If you attempt a Steinhart-Hart table in the configtool with
parameters (4700, 25, 100000, 209, 475, 256, 201) it fails with a:

...
 File "/Users/drf/2014/RepRap/GIT/Teacup_Firmware/configtool/
   thermistortablefile.py", line 169, in SteinhartHartTable
   (i, int(t * 4), int(delta * 4 * 256), c, int(t), int(round(r))),
  TypeError: not enough arguments for format string

Caught and fix provided by dr5fn; this should fix issue #246.
2016-10-26 20:32:45 +02:00
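The TypeError quoted in commit 36f54adb7f above is what Python raises when a %-style format string has more placeholders than the tuple supplies. The following sketch reproduces that class of error with a made-up format string; it is not the code from thermistortablefile.py.

```python
# Hypothetical format string with three placeholders.
row_format = "%d, %d, %d"

try:
    row = row_format % (1, 2)        # one argument too few
except TypeError as error:
    message = str(error)

# Supplying one value per placeholder fixes the problem.
row = row_format % (1, 2, 3)
```

As with the original bug, the mismatch only shows up when the formatting line is actually executed.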
Markus Hitter ef94d0672d Configtool: don't assign values to tuples.
Heck, that's simply forbidden. A C compiler would have caught this in
a split second at compile time; Python didn't until the faulty code
section was actually executed (a section of code for rare cases).

The simple fix is to replace the old tuple with a changed, new
tuple.

This resolved issue #242.
2016-10-21 22:03:34 +02:00
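The kind of fix described in commit ef94d0672d above, replacing the old tuple with a changed, new one, can be illustrated like this. The data and the helper name `replace_item` are hypothetical and not taken from Configtool.

```python
def replace_item(values, index, new_value):
    """Return a new tuple with one element replaced; the input is untouched."""
    return values[:index] + (new_value,) + values[index + 1:]


pin = ("DIO13", "output")

try:
    pin[1] = "input"                 # tuples are immutable
except TypeError as error:
    failure = str(error)             # raised only when this line runs

# Build a new tuple instead of modifying the old one in place.
pin = replace_item(pin, 1, "input")
```

Because the failing assignment is only detected at run time, a rarely executed code path like the one mentioned in the commit can hide such a bug for a long time.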
Wurstnase dc5c9656ed dda_lookahead.c: remove use_lookahead.
Use of this variable has been gone since 2013. No functional change,
no binary size change.
2016-10-21 21:55:11 +02:00
Markus Hitter e9b2bf45cb Config files: update comment/help text for USE_INTERNAL_PULLUPS. 2016-09-30 13:51:01 +02:00
Markus Hitter e49de09f58 Config files: introduce USE_INTERNAL_PULLDOWN. 2016-09-30 13:48:46 +02:00
Markus Hitter b47e625d58 pinio.h: support USE_INTERNAL_PULLDOWNS.
That's the counterpart to USE_INTERNAL_PULLUPS and needed on the
Gen7-ARM board.
2016-09-30 13:16:01 +02:00
Markus Hitter aed6f5a14b pinio.h: support pulldown resistors.
These are ARM only, because ATmegas have no such feature.
2016-09-30 13:15:31 +02:00