Commit Graph

954 Commits

Author SHA1 Message Date
Markus Hitter 2ad7517e27 preprocessor_math.h, SQRT(): take a better initial guess.
Now results are apparently accurate across the whole uint32 range.
At least, this test passes with all numbers being exact:

  #include "preprocessor_math.h"
  #include <math.h>
  ... in main() ...
  sersendf_P(PSTR("0:  %lu  %lu\n"), (uint32_t)SQRT(0), (uint32_t)sqrt(0));
  sersendf_P(PSTR("1:  %lu  %lu\n"), (uint32_t)SQRT(1), (uint32_t)sqrt(1));
  sersendf_P(PSTR("2:  %lu  %lu\n"), (uint32_t)SQRT(2), (uint32_t)sqrt(2));
  sersendf_P(PSTR("3:  %lu  %lu\n"), (uint32_t)SQRT(3), (uint32_t)sqrt(3));
  sersendf_P(PSTR("4:  %lu  %lu\n"), (uint32_t)SQRT(4), (uint32_t)sqrt(4));
  sersendf_P(PSTR("5:  %lu  %lu\n"), (uint32_t)SQRT(5), (uint32_t)sqrt(5));
  sersendf_P(PSTR("6:  %lu  %lu\n"), (uint32_t)SQRT(6), (uint32_t)sqrt(6));
  sersendf_P(PSTR("7:  %lu  %lu\n"), (uint32_t)SQRT(7), (uint32_t)sqrt(7));
  sersendf_P(PSTR("8:  %lu  %lu\n"), (uint32_t)SQRT(8), (uint32_t)sqrt(8));
  sersendf_P(PSTR("9:  %lu  %lu\n"), (uint32_t)SQRT(9), (uint32_t)sqrt(9));
  sersendf_P(PSTR("10:  %lu  %lu\n"), (uint32_t)SQRT(10), (uint32_t)sqrt(10));
  sersendf_P(PSTR("20:  %lu  %lu\n"), (uint32_t)SQRT(20), (uint32_t)sqrt(20));
  sersendf_P(PSTR("30:  %lu  %lu\n"), (uint32_t)SQRT(30), (uint32_t)sqrt(30));
  sersendf_P(PSTR("40:  %lu  %lu\n"), (uint32_t)SQRT(40), (uint32_t)sqrt(40));
  sersendf_P(PSTR("50:  %lu  %lu\n"), (uint32_t)SQRT(50), (uint32_t)sqrt(50));
  sersendf_P(PSTR("60:  %lu  %lu\n"), (uint32_t)SQRT(60), (uint32_t)sqrt(60));
  sersendf_P(PSTR("70:  %lu  %lu\n"), (uint32_t)SQRT(70), (uint32_t)sqrt(70));
  sersendf_P(PSTR("80:  %lu  %lu\n"), (uint32_t)SQRT(80), (uint32_t)sqrt(80));
  sersendf_P(PSTR("90:  %lu  %lu\n"), (uint32_t)SQRT(90), (uint32_t)sqrt(90));
  sersendf_P(PSTR("100:  %lu  %lu\n"), (uint32_t)SQRT(100), (uint32_t)sqrt(100));
  sersendf_P(PSTR("200:  %lu  %lu\n"), (uint32_t)SQRT(200), (uint32_t)sqrt(200));
  sersendf_P(PSTR("300:  %lu  %lu\n"), (uint32_t)SQRT(300), (uint32_t)sqrt(300));
  sersendf_P(PSTR("400:  %lu  %lu\n"), (uint32_t)SQRT(400), (uint32_t)sqrt(400));
  sersendf_P(PSTR("500:  %lu  %lu\n"), (uint32_t)SQRT(500), (uint32_t)sqrt(500));
  sersendf_P(PSTR("600:  %lu  %lu\n"), (uint32_t)SQRT(600), (uint32_t)sqrt(600));
  sersendf_P(PSTR("700:  %lu  %lu\n"), (uint32_t)SQRT(700), (uint32_t)sqrt(700));
  sersendf_P(PSTR("800:  %lu  %lu\n"), (uint32_t)SQRT(800), (uint32_t)sqrt(800));
  sersendf_P(PSTR("900:  %lu  %lu\n"), (uint32_t)SQRT(900), (uint32_t)sqrt(900));
  sersendf_P(PSTR("1000:  %lu  %lu\n"), (uint32_t)SQRT(1000), (uint32_t)sqrt(1000));
  sersendf_P(PSTR("2000:  %lu  %lu\n"), (uint32_t)SQRT(2000), (uint32_t)sqrt(2000));
  sersendf_P(PSTR("3000:  %lu  %lu\n"), (uint32_t)SQRT(3000), (uint32_t)sqrt(3000));
  sersendf_P(PSTR("4000:  %lu  %lu\n"), (uint32_t)SQRT(4000), (uint32_t)sqrt(4000));
  sersendf_P(PSTR("5000:  %lu  %lu\n"), (uint32_t)SQRT(5000), (uint32_t)sqrt(5000));
  sersendf_P(PSTR("6000:  %lu  %lu\n"), (uint32_t)SQRT(6000), (uint32_t)sqrt(6000));
  sersendf_P(PSTR("7000:  %lu  %lu\n"), (uint32_t)SQRT(7000), (uint32_t)sqrt(7000));
  sersendf_P(PSTR("8000:  %lu  %lu\n"), (uint32_t)SQRT(8000), (uint32_t)sqrt(8000));
  sersendf_P(PSTR("9000:  %lu  %lu\n"), (uint32_t)SQRT(9000), (uint32_t)sqrt(9000));
  sersendf_P(PSTR("10000:  %lu  %lu\n"), (uint32_t)SQRT(10000), (uint32_t)sqrt(10000));
  sersendf_P(PSTR("20000:  %lu  %lu\n"), (uint32_t)SQRT(20000), (uint32_t)sqrt(20000));
  sersendf_P(PSTR("30000:  %lu  %lu\n"), (uint32_t)SQRT(30000), (uint32_t)sqrt(30000));
  sersendf_P(PSTR("40000:  %lu  %lu\n"), (uint32_t)SQRT(40000), (uint32_t)sqrt(40000));
  sersendf_P(PSTR("50000:  %lu  %lu\n"), (uint32_t)SQRT(50000), (uint32_t)sqrt(50000));
  sersendf_P(PSTR("60000:  %lu  %lu\n"), (uint32_t)SQRT(60000), (uint32_t)sqrt(60000));
  sersendf_P(PSTR("70000:  %lu  %lu\n"), (uint32_t)SQRT(70000), (uint32_t)sqrt(70000));
  sersendf_P(PSTR("80000:  %lu  %lu\n"), (uint32_t)SQRT(80000), (uint32_t)sqrt(80000));
  sersendf_P(PSTR("90000:  %lu  %lu\n"), (uint32_t)SQRT(90000), (uint32_t)sqrt(90000));
  sersendf_P(PSTR("100000:  %lu  %lu\n"), (uint32_t)SQRT(100000), (uint32_t)sqrt(100000));
  sersendf_P(PSTR("200000:  %lu  %lu\n"), (uint32_t)SQRT(200000), (uint32_t)sqrt(200000));
  sersendf_P(PSTR("300000:  %lu  %lu\n"), (uint32_t)SQRT(300000), (uint32_t)sqrt(300000));
  sersendf_P(PSTR("400000:  %lu  %lu\n"), (uint32_t)SQRT(400000), (uint32_t)sqrt(400000));
  sersendf_P(PSTR("500000:  %lu  %lu\n"), (uint32_t)SQRT(500000), (uint32_t)sqrt(500000));
  sersendf_P(PSTR("600000:  %lu  %lu\n"), (uint32_t)SQRT(600000), (uint32_t)sqrt(600000));
  sersendf_P(PSTR("700000:  %lu  %lu\n"), (uint32_t)SQRT(700000), (uint32_t)sqrt(700000));
  sersendf_P(PSTR("800000:  %lu  %lu\n"), (uint32_t)SQRT(800000), (uint32_t)sqrt(800000));
  sersendf_P(PSTR("900000:  %lu  %lu\n"), (uint32_t)SQRT(900000), (uint32_t)sqrt(900000));
  sersendf_P(PSTR("1000000:  %lu  %lu\n"), (uint32_t)SQRT(1000000), (uint32_t)sqrt(1000000));
  sersendf_P(PSTR("2000000:  %lu  %lu\n"), (uint32_t)SQRT(2000000), (uint32_t)sqrt(2000000));
  sersendf_P(PSTR("3000000:  %lu  %lu\n"), (uint32_t)SQRT(3000000), (uint32_t)sqrt(3000000));
  sersendf_P(PSTR("4000000:  %lu  %lu\n"), (uint32_t)SQRT(4000000), (uint32_t)sqrt(4000000));
  sersendf_P(PSTR("5000000:  %lu  %lu\n"), (uint32_t)SQRT(5000000), (uint32_t)sqrt(5000000));
  sersendf_P(PSTR("6000000:  %lu  %lu\n"), (uint32_t)SQRT(6000000), (uint32_t)sqrt(6000000));
  sersendf_P(PSTR("7000000:  %lu  %lu\n"), (uint32_t)SQRT(7000000), (uint32_t)sqrt(7000000));
  sersendf_P(PSTR("8000000:  %lu  %lu\n"), (uint32_t)SQRT(8000000), (uint32_t)sqrt(8000000));
  sersendf_P(PSTR("9000000:  %lu  %lu\n"), (uint32_t)SQRT(9000000), (uint32_t)sqrt(9000000));
  sersendf_P(PSTR("10000000:  %lu  %lu\n"), (uint32_t)SQRT(10000000), (uint32_t)sqrt(10000000));
  sersendf_P(PSTR("20000000:  %lu  %lu\n"), (uint32_t)SQRT(20000000), (uint32_t)sqrt(20000000));
  sersendf_P(PSTR("30000000:  %lu  %lu\n"), (uint32_t)SQRT(30000000), (uint32_t)sqrt(30000000));
  sersendf_P(PSTR("40000000:  %lu  %lu\n"), (uint32_t)SQRT(40000000), (uint32_t)sqrt(40000000));
  sersendf_P(PSTR("50000000:  %lu  %lu\n"), (uint32_t)SQRT(50000000), (uint32_t)sqrt(50000000));
  sersendf_P(PSTR("60000000:  %lu  %lu\n"), (uint32_t)SQRT(60000000), (uint32_t)sqrt(60000000));
  sersendf_P(PSTR("70000000:  %lu  %lu\n"), (uint32_t)SQRT(70000000), (uint32_t)sqrt(70000000));
  sersendf_P(PSTR("80000000:  %lu  %lu\n"), (uint32_t)SQRT(80000000), (uint32_t)sqrt(80000000));
  sersendf_P(PSTR("90000000:  %lu  %lu\n"), (uint32_t)SQRT(90000000), (uint32_t)sqrt(90000000));
  sersendf_P(PSTR("100000000:  %lu  %lu\n"), (uint32_t)SQRT(100000000), (uint32_t)sqrt(100000000));
  sersendf_P(PSTR("200000000:  %lu  %lu\n"), (uint32_t)SQRT(200000000), (uint32_t)sqrt(200000000));
  sersendf_P(PSTR("300000000:  %lu  %lu\n"), (uint32_t)SQRT(300000000), (uint32_t)sqrt(300000000));
  sersendf_P(PSTR("400000000:  %lu  %lu\n"), (uint32_t)SQRT(400000000), (uint32_t)sqrt(400000000));
  sersendf_P(PSTR("500000000:  %lu  %lu\n"), (uint32_t)SQRT(500000000), (uint32_t)sqrt(500000000));
  sersendf_P(PSTR("600000000:  %lu  %lu\n"), (uint32_t)SQRT(600000000), (uint32_t)sqrt(600000000));
  sersendf_P(PSTR("700000000:  %lu  %lu\n"), (uint32_t)SQRT(700000000), (uint32_t)sqrt(700000000));
  sersendf_P(PSTR("800000000:  %lu  %lu\n"), (uint32_t)SQRT(800000000), (uint32_t)sqrt(800000000));
  sersendf_P(PSTR("900000000:  %lu  %lu\n"), (uint32_t)SQRT(900000000), (uint32_t)sqrt(900000000));
  sersendf_P(PSTR("1000000000:  %lu  %lu\n"), (uint32_t)SQRT(1000000000), (uint32_t)sqrt(1000000000));
  sersendf_P(PSTR("2000000000:  %lu  %lu\n"), (uint32_t)SQRT(2000000000), (uint32_t)sqrt(2000000000));
  sersendf_P(PSTR("3000000000:  %lu  %lu\n"), (uint32_t)SQRT(3000000000), (uint32_t)sqrt(3000000000));
  sersendf_P(PSTR("4000000000:  %lu  %lu\n"), (uint32_t)SQRT(4000000000), (uint32_t)sqrt(4000000000));
2014-08-31 19:10:07 +02:00
Markus Hitter 6f83519a1d Add preprocessor math.
For now this is a square root function which should solve entirely
in the preprocessor. Test results described in the file.
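
As an illustration of the idea only, and not the actual contents of
preprocessor_math.h: a square root can be written as a chain of
unrolled Newton-Raphson steps which the compiler folds into a constant
when the argument is a literal. The macro names SQRT_GUESS, SQRT_STEP
and SQRT_SKETCH are made up for this sketch, and the decade-based
first guess is an assumption.

```c
#include <stdint.h>

/* Rough first guess: a value inside the decade the root falls into. */
#define SQRT_GUESS(n)                 \
  ((n) < 100UL       ? 3.0 :          \
   (n) < 10000UL     ? 30.0 :         \
   (n) < 1000000UL   ? 300.0 :        \
   (n) < 100000000UL ? 3000.0 : 30000.0)

/* One Newton-Raphson step towards sqrt(n), starting from guess g. */
#define SQRT_STEP(n, g) (((g) + (double)(n) / (g)) / 2.0)

/* Six unrolled steps; with the guess above this settles for the
   whole uint32 range. Hypothetical name, not the firmware's SQRT(). */
#define SQRT_SKETCH(n)                                  \
  SQRT_STEP(n, SQRT_STEP(n, SQRT_STEP(n,                \
  SQRT_STEP(n, SQRT_STEP(n, SQRT_STEP(n, SQRT_GUESS(n)))))))

/* Rounded to the nearest integer, like the runtime test in this commit. */
#define SQRT_SKETCH_ROUNDED(n) ((uint32_t)(SQRT_SKETCH(n) + .5))
```

Each nested step repeats its guess twice, so the expansion grows
quickly. That is why a good first guess matters: it keeps the number
of steps, and with it the size of the expanded expression, small.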

Test code for runtime results, inserted right before the main loop
in mendel.c:

  for (uint32_t i = 0; i < 10000000; i++) {
    uint32_t mathlib = (uint32_t)(sqrt(i) + .5);
    uint32_t preprocessor = (uint32_t)(SQRT(i) + .5);

    if (mathlib != preprocessor) {
      sersendf_P(PSTR("%lu: %lu %lu\n"), i, mathlib, preprocessor);
      break;
    }

    if ((i & 0x00001fff) == 0)
      sersendf_P(PSTR("%lu\n"), i);
  }
  sersendf_P(PSTR("Square root check done.\n"));

Test code for compile time results:

  sersendf_P(PSTR("10000000: %lu\n"), (uint32_t)SQRT(10000000));
  sersendf_P(PSTR("10000000: %lu\n"), (uint32_t)sqrt(10000000));
  sersendf_P(PSTR("20000000: %lu\n"), (uint32_t)SQRT(20000000));
  sersendf_P(PSTR("20000000: %lu\n"), (uint32_t)sqrt(20000000));
  sersendf_P(PSTR("30000000: %lu\n"), (uint32_t)SQRT(30000000));
  sersendf_P(PSTR("30000000: %lu\n"), (uint32_t)sqrt(30000000));
  sersendf_P(PSTR("40000000: %lu\n"), (uint32_t)SQRT(40000000));
  sersendf_P(PSTR("40000000: %lu\n"), (uint32_t)sqrt(40000000));
  sersendf_P(PSTR("50000000: %lu\n"), (uint32_t)SQRT(50000000));
  sersendf_P(PSTR("50000000: %lu\n"), (uint32_t)sqrt(50000000));
  sersendf_P(PSTR("60000000: %lu\n"), (uint32_t)SQRT(60000000));
  sersendf_P(PSTR("60000000: %lu\n"), (uint32_t)sqrt(60000000));
  sersendf_P(PSTR("70000000: %lu\n"), (uint32_t)SQRT(70000000));
  sersendf_P(PSTR("70000000: %lu\n"), (uint32_t)sqrt(70000000));
  sersendf_P(PSTR("80000000: %lu\n"), (uint32_t)SQRT(80000000));
  sersendf_P(PSTR("80000000: %lu\n"), (uint32_t)sqrt(80000000));
  sersendf_P(PSTR("90000000: %lu\n"), (uint32_t)SQRT(90000000));
  sersendf_P(PSTR("90000000: %lu\n"), (uint32_t)sqrt(90000000));
2014-08-31 19:09:59 +02:00
Phil Hord 76bf5ef75a Datalog: show traced data as signed ints, not unsigned. 2014-08-31 19:09:37 +02:00
Phil Hord 24f5416bba DDA: Rename confusing variable name.
'all_time' sounds like forever to me, but this variable really
tracks the last time we hit one of "all the axes".  It sticks
out more now in looping, so rename it to make sense.
2014-08-31 19:09:24 +02:00
Phil Hord bc4cf20341 Trivial cleanups.
Fix some formatting and hide a couple of variables when they're
not being used.
2014-08-31 19:09:15 +02:00
Phil Hord f9f068596d DDA: Move axis calculations into loops, part 9 (last part).
Clean up code to reduce duplication by consolidating code into
loops for per-axis actions.

Part 9 is, finally use this set_direction() thing. As a dessert
topping, it reduces binary size by another 122 bytes.

    SIZES             ATmega...  '168    '328(P)    '644(P)    '1280
    FLASH : 19988 bytes          140%       66%        32%       16%
    RAM   :  2302 bytes          225%      113%        57%       29%
    EEPROM:    32 bytes            4%        2%         2%        1%
2014-08-31 19:09:07 +02:00
Markus Hitter 96e9ae4dab dda.h: comment on these direction flags and other things. 2014-08-31 19:08:57 +02:00
Markus Hitter 41e76ca9fe dda.c: make update_current_position() even smaller.
Saves another 24 bytes.

    SIZES             ATmega...  '168    '328(P)    '644(P)    '1280
    FLASH : 20110 bytes          141%       66%        32%       16%
    RAM   :  2302 bytes          225%      113%        57%       29%
    EEPROM:    32 bytes            4%        2%         2%        1%

Using muldiv() would be more accurate, but unfortunately, the
compiler bails out:

   static const axes_uint32_t PROGMEM steps_per_mm_P = {
                                                           ^
dda.c:889:1: error: unable to find a register to spill in class ‘POINTER_REGS’
 }
 ^
dda.c:889:1: error: this is the insn:
(insn 81 80 83 6 (set (reg:SI 77 [ D.3086 ])
        (mem:SI (post_inc:HI (reg:HI 2 r2 [orig:103 ivtmp.106 ] [103])) [3 MEM[base: _82, offset: 0B]+0 S4 A8])) dda.c:881 94 {*movsi}
     (expr_list:REG_INC (reg:HI 2 r2 [orig:103 ivtmp.106 ] [103])
        (nil)))
dda.c:889: confused by earlier errors, bailing out

Another one is, calculating this:

   (int32_t)get_direction(dda, i) *
   move_state.steps[i] * 1000 / pgm_read_dword(&steps_per_mm_P[i]);

produces nonsense values for negative returns from get_direction().
Apparently, the compiler doesn't want to divide negative values???
Odd. Anyways, sufficient parentheses solve the problem.
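
A plausible explanation, not confirmed by this commit: if the divisor
is an unsigned 32-bit value (as pgm_read_dword() returns), C's usual
arithmetic conversions turn the signed product into an unsigned one
before dividing. The sketch below reproduces the effect on any
platform. The function names, the uint32_t step count and the value
80000 are assumptions for illustration.

```c
#include <stdint.h>

/* Hypothetical stand-in for pgm_read_dword(&steps_per_mm_P[i]). */
static uint32_t read_steps_per_m(void) { return 80000UL; }

/* Everything in one expression: the negative direction is converted
   to unsigned, so the division works on a huge positive number. */
int32_t position_naive(int32_t direction, uint32_t steps) {
  return (int32_t)(direction * steps * 1000 / read_steps_per_m());
}

/* Extra parentheses: divide the unsigned magnitude first, then apply
   the sign in signed arithmetic. */
int32_t position_fixed(int32_t direction, uint32_t steps) {
  return direction * (int32_t)(steps * 1000 / read_steps_per_m());
}
```

With direction = -1 and steps = 160, position_fixed() gives -2 while
position_naive() gives a large positive value.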
2014-08-31 19:08:49 +02:00
Phil Hord b552447789 DDA: Move axis calculations into loops, part 8.
Clean up code to reduce duplication by consolidating code into
loops for per-axis actions.

Part 8 is, move remaining update_current_position() into a loop.

This makes the binary 134 bytes smaller. As it's not critical,
no performance test.

    SIZES             ATmega...  '168    '328(P)    '644(P)    '1280
    FLASH : 20134 bytes          141%       66%        32%       16%
    RAM   :  2302 bytes          225%      113%        57%       29%
    EEPROM:    32 bytes            4%        2%         2%        1%
2014-08-31 19:08:42 +02:00
Phil Hord 80b29b727b DDA: Move axis calculations into loops, part 7.
Clean up code to reduce duplication by consolidating code into
loops for per-axis actions.

Part 7 is, turn update_current_position() in dda.c partially into
a loop. Surprise, surprise, this changes neither binary size nor
performance. Looking into the generated assembly, the loop is
indeed completely unrolled. Apparently that's smaller than a
real loop.

    SIZES             ATmega...  '168    '328(P)    '644(P)    '1280
    FLASH : 20270 bytes          142%       66%        32%       16%
    RAM   :  2302 bytes          225%      113%        57%       29%
    EEPROM:    32 bytes            4%        2%         2%        1%

short-moves.gcode
Statistics (assuming a 20 MHz clock):
LED on occurences: 888.
Sum of all LED on time: 279945 clock cycles.
LED on time minimum: 306 clock cycles.
LED on time maximum: 722 clock cycles.
LED on time average: 315.253 clock cycles.

smooth-curves.gcode
Statistics (assuming a 20 MHz clock):
LED on occurences: 9124.
Sum of all LED on time: 3297806 clock cycles.
LED on time minimum: 311 clock cycles.
LED on time maximum: 712 clock cycles.
LED on time average: 361.443 clock cycles.

triangle-odd.gcode
Statistics (assuming a 20 MHz clock):
LED on occurences: 1636.
Sum of all LED on time: 546946 clock cycles.
LED on time minimum: 306 clock cycles.
LED on time maximum: 712 clock cycles.
LED on time average: 334.319 clock cycles.
2014-08-31 19:08:34 +02:00
David Forrest 32481e2799 debug.h: Align M111 debug bit codes with Repetier-Host.
No code changes, binary size and performance kept.
2014-08-31 19:08:26 +02:00
Markus Hitter cc9c9ff7b4 DDA: Revert move axis calculations into loops, part 6a-c.
Sad but true, this experiment didn't work out. Performance loss
due to looping in dda_step() is still at least 16% with the best
algorithm found.
2014-08-31 19:08:15 +02:00
Markus Hitter 1fc4a26ccd DDA: Move axis calculations into loops, part 6c.
Clean up code to reduce duplication by consolidating code into
loops for per-axis actions.

Part 6c removes do_step(), but still tries to keep a loop. This
is about the maximum performance I (Traumflug) can think of.
Binary size is as good as with the former attempt, but performance
is actually pretty bad, 45% worse than without looping:

    SIZES             ATmega...  '168    '328(P)    '644(P)    '1280
    FLASH : 19876 bytes          139%       65%        32%       16%
    RAM   :  2302 bytes          225%      113%        57%       29%
    EEPROM:    32 bytes            4%        2%         2%        1%

short-moves.gcode
Statistics (assuming a 20 MHz clock):
LED on occurences: 888.
Sum of all LED on time: 406041 clock cycles.
LED on time minimum: 448 clock cycles.
LED on time maximum: 864 clock cycles.
LED on time average: 457.253 clock cycles.

smooth-curves.gcode
Statistics (assuming a 20 MHz clock):
LED on occurences: 9124.
Sum of all LED on time: 4791132 clock cycles.
LED on time minimum: 453 clock cycles.
LED on time maximum: 867 clock cycles.
LED on time average: 525.113 clock cycles.

triangle-odd.gcode
Statistics (assuming a 20 MHz clock):
LED on occurences: 1636.
Sum of all LED on time: 800586 clock cycles.
LED on time minimum: 448 clock cycles.
LED on time maximum: 867 clock cycles.
LED on time average: 489.356 clock cycles.
2014-08-31 19:08:07 +02:00
Markus Hitter 808f5dcfca DDA: Move axis calculations into loops, part 6b.
Clean up code to reduce duplication by consolidating code into
loops for per-axis actions.

Part 6b moves do_step() from the "tidiest" place into where it's
currently used, dda.c. Binary size goes down another 34 bytes, to
a total savings of 408 bytes and performance is much better, but
still 16% lower than without using loops:

    SIZES             ATmega...  '168    '328(P)    '644(P)    '1280
    FLASH : 19874 bytes          139%       65%        32%       16%
    RAM   :  2302 bytes          225%      113%        57%       29%
    EEPROM:    32 bytes            4%        2%         2%        1%

short-moves.gcode
Statistics (assuming a 20 MHz clock):
LED on occurences: 888.
Sum of all LED on time: 320000 clock cycles.
LED on time minimum: 351 clock cycles.
LED on time maximum: 772 clock cycles.
LED on time average: 360.36 clock cycles.

smooth-curves.gcode
Statistics (assuming a 20 MHz clock):
LED on occurences: 9124.
Sum of all LED on time: 3875874 clock cycles.
LED on time minimum: 356 clock cycles.
LED on time maximum: 773 clock cycles.
LED on time average: 424.8 clock cycles.

triangle-odd.gcode
Statistics (assuming a 20 MHz clock):
LED on occurences: 1636.
Sum of all LED on time: 640357 clock cycles.
LED on time minimum: 351 clock cycles.
LED on time maximum: 773 clock cycles.
LED on time average: 391.416 clock cycles.
2014-08-31 19:07:59 +02:00
Phil Hord b83449d8c3 DDA: Move axis calculations into loops, part 6a.
Clean up code to reduce duplication by consolidating code into
loops for per-axis actions.

Part 6a is putting stuff inside the step interrupt into a loop,
too. do_step() is put into the "tidiest" place. Binary size goes
down a remarkable 374 bytes, but stepping performance suffers by
almost 30%.

Traumflug's performance measurements:

    SIZES             ATmega...  '168    '328(P)    '644(P)    '1280
    FLASH : 19908 bytes          139%       65%        32%       16%
    RAM   :  2302 bytes          225%      113%        57%       29%
    EEPROM:    32 bytes            4%        2%         2%        1%

short-moves.gcode
Statistics (assuming a 20 MHz clock):
LED on occurences: 888.
Sum of all LED on time: 354537 clock cycles.
LED on time minimum: 390 clock cycles.
LED on time maximum: 806 clock cycles.
LED on time average: 399.253 clock cycles.

smooth-curves.gcode
Statistics (assuming a 20 MHz clock):
LED on occurences: 9124.
Sum of all LED on time: 4268896 clock cycles.
LED on time minimum: 395 clock cycles.
LED on time maximum: 807 clock cycles.
LED on time average: 467.875 clock cycles.

triangle-odd.gcode
Statistics (assuming a 20 MHz clock):
LED on occurences: 1636.
Sum of all LED on time: 706846 clock cycles.
LED on time minimum: 390 clock cycles.
LED on time maximum: 807 clock cycles.
LED on time average: 432.057 clock cycles.
2014-08-31 19:07:51 +02:00
Markus Hitter ad82907b98 testcases: Add config.h.
There's nothing special about this config.h, it's just the one I
happened to use for first profiling investigations. To allow
everybody else to do the very same profiling runs, I add it here.

Doing profiling isn't too complicated:

  mv config.h config.h.backup
  ln -s testcases/config.h.Profiling config.h
  git checkout -b work
  git cherry-pick simulavr # add tweaks convenient for simulation runs
  make
  cd testcases
  ./run-in-simulavr.sh short-moves.gcode smooth-curves.gcode triangle-odd.gcode

After being done you can restore your config.h and delete this work branch.

Currently, performance is as follows (with convenience commit applied):

    SIZES             ATmega...  '168    '328(P)    '644(P)    '1280
    FLASH : 20270 bytes          142%       66%        32%       16%
    RAM   :  2302 bytes          225%      113%        57%       29%
    EEPROM:    32 bytes            4%        2%         2%        1%

short-moves.gcode
Statistics (assuming a 20 MHz clock):
LED on occurences: 888.
Sum of all LED on time: 279945 clock cycles.
LED on time minimum: 306 clock cycles.
LED on time maximum: 722 clock cycles.
LED on time average: 315.253 clock cycles.

smooth-curves.gcode
Statistics (assuming a 20 MHz clock):
LED on occurences: 9124.
Sum of all LED on time: 3297806 clock cycles.
LED on time minimum: 311 clock cycles.
LED on time maximum: 712 clock cycles.
LED on time average: 361.443 clock cycles.

triangle-odd.gcode
Statistics (assuming a 20 MHz clock):
LED on occurences: 1636.
Sum of all LED on time: 546946 clock cycles.
LED on time minimum: 306 clock cycles.
LED on time maximum: 712 clock cycles.
LED on time average: 334.319 clock cycles.
2014-08-31 19:07:39 +02:00
David Forrest 003697ee0f gcode_parse.c: Debug S with serwrite_int32. 2014-08-31 19:07:30 +02:00
David Forrest f046c013e3 sermesg.c: Add documentation tag for variable floating point. 2014-08-31 19:07:21 +02:00
Markus Hitter e7707ea275 config.*.h: extend DEBUG_LED_PIN comment to all config templates. 2014-08-31 19:07:13 +02:00
David Forrest f356f64bdb config.default.h: Add DEBUG_LED_PIN to the pinout section. 2014-08-31 19:07:01 +02:00
David Forrest b12157cb6f gcode_process.c: Add comment on units of P, I, and D parameters. 2014-08-31 19:06:52 +02:00
David Forrest 5b5c44b523 dda_lookahead.c: Eliminate debug crossF variable compile warning.
Fix:
  dda_lookahead.c:327:17: warning: 'crossF' may be used
  uninitialized in this function [-Wmaybe-uninitialized]
       sersendf_P(PSTR("Initial crossing speed: %lu\n"), crossF);
                 ^
2014-08-31 19:06:43 +02:00
David Forrest 2496a95c6f dda_maths.h: Add comment on units of C0. 2014-08-31 19:06:34 +02:00
David Forrest f3666fc43f heater_sim.c: Note that the heater isn't implemented in the simulator. 2014-08-31 19:06:23 +02:00
Markus Hitter fdfd202e5d run-in-simulavr.sh: add statistics output for LED On Time.
As it's still a bit cumbersome to go through the whole .vcd file
to find the highest delay between On and Off, do this search
automatically and output statistics. It can look like this:

  Statistics (assuming a 20 MHz clock):
  LED on occurences: 838.
  Sum of all LED on time: 262055 clock cycles.
  LED on time minimum: 306 clock cycles.
  LED on time maximum: 717 clock cycles.
  LED on time average: 312.715 clock cycles.

This should give a reasonable overview of whether and roughly
how much a particular code change makes your code slower or
faster. It should also reveal showstoppers, like occasional
huge delays.

BTW., the above data was collected timing the step interrupt when
running short-moves.gcode with the current firmware.
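
How such statistics come about can be sketched in a few lines. This is
not the code in run-in-simulavr.sh. It assumes the LED's rising and
falling edges were already extracted from the .vcd file and converted
to clock cycles. The names led_stats and led_on_stats are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

struct led_stats {
  size_t   count;    /* number of LED-on periods      */
  uint64_t sum;      /* total on time, clock cycles   */
  uint32_t min;      /* shortest on period            */
  uint32_t max;      /* longest on period             */
  double   average;  /* sum / count, 0 if count is 0  */
};

/* on[i] and off[i] are the timestamps of the i-th rising and falling
   edge, in clock cycles, with off[i] >= on[i]. */
struct led_stats led_on_stats(const uint32_t *on, const uint32_t *off,
                              size_t n) {
  struct led_stats s = { n, 0, 0, 0, 0.0 };

  for (size_t i = 0; i < n; i++) {
    uint32_t duration = off[i] - on[i];

    s.sum += duration;
    if (i == 0 || duration < s.min)
      s.min = duration;
    if (duration > s.max)
      s.max = duration;
  }
  if (n > 0)
    s.average = (double)s.sum / (double)n;

  return s;
}
```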
2014-08-31 19:06:13 +02:00
Markus Hitter da08c35edd run-in-simulavr.sh: add support for timing measurements.
The idea is simple: if you want to time a portion of code
precisely, turn on the Debug LED (see config.h for
DEBUG_LED_PIN) at the start of the sequence and turn it off when
done. Running this in SimulAVR, you get two signal edges, precise
to the clock cycle, which exactly reflect the time taken to
run this code sequence. Ideally, you run this code in a loop
to get a number of samples, if it doesn't run in a loop anyway.

Time taken can then be measured in GTKWave. For convenience and
for a better overview, run-in-simulavr.sh also extracts all the
delays into its own signal, so it can be viewed as an ongoing
number.
2014-08-31 19:06:05 +02:00
Markus Hitter 4389e670bd run-in-simulavr.sh: start signals undefined.
Also a few aesthetic corrections.
2014-08-31 19:05:56 +02:00
Markus Hitter 35c4949965 run-in-simulavr.sh: run SimulAVR a bit more verbose.
SimulAVR doesn't always work exactly the way it should, so looking
at the command line it's started with is a first debugging step.
2014-08-31 19:05:47 +02:00
Markus Hitter 6250dbb9e0 Configuration: move DEBUG_LED definition.
Any debugging LEDs aren't part of the CPU, but part of the
electronics. Accordingly, define them in config.*.h, not in
arduino_*.h (which would be better named something like
"atmega_*.h").
2014-08-31 19:05:38 +02:00
Markus Hitter 9a08675576 Rename all these new PROGMEM variables to end in _P.
Should be done for temptable in ThermistorTable.h, too, but this
would mess up existing users' configurations.

This tries to put emphasis on the fact that you have to read
these values with pgm_read_*() instead of just using the variable.
Unfortunately, the gcc compiler neither inserts PROGMEM reading
instructions automatically when reading data stored in flash,
nor does it complain or warn about the missing read instructions.

As such it's very easy to accidentally handle data stored in flash
just like normal data. It'll compile and work ... you just read
arbitrary data (often, but not always zeros) instead of what you
intend.
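
A minimal sketch of the convention. The table name steps_per_m_P, its
values and the accessor are made up for illustration. PROGMEM and
pgm_read_dword() are the avr-libc ones. The #else branch is only a
fallback so the sketch also compiles on a host.

```c
#include <stdint.h>

#ifdef __AVR__
  #include <avr/pgmspace.h>
#else
  /* Host fallback: no separate flash address space. */
  #define PROGMEM
  #define pgm_read_dword(addr) (*(const uint32_t *)(addr))
#endif

/* The _P suffix is a reminder: this lives in flash on the AVR. */
static const uint32_t steps_per_m_P[4] PROGMEM = {
  80000UL, 80000UL, 400000UL, 96000UL   /* hypothetical values */
};

/* Correct: go through pgm_read_dword(). Reading steps_per_m_P[axis]
   directly would compile on the AVR, too, but fetch from RAM at the
   same address and return arbitrary data. */
uint32_t steps_per_m(uint8_t axis) {
  return pgm_read_dword(&steps_per_m_P[axis]);
}
```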
2014-08-31 19:05:25 +02:00
Phil Hord 74808610c7 DDA: Move axis calculations into loops, part 5.
Clean up code to reduce duplication by consolidating code into
loops for per-axis actions.

Part 5 is move ACCELERATION_TEMPORAL's step delay calculations
into loops. Not tested, binary size change unknown.
2014-08-31 19:05:09 +02:00
Phil Hord 8d729d499d DDA: Move axis calculations into loops, part 4.
Clean up code to reduce duplication by consolidating code into
loops for per-axis actions.

Part 4 is move ACCELERATION_TEMPORAL's maximum feedrate limitation
into a loop. Not tested, binary size change unknown.
2014-08-31 19:05:00 +02:00
Phil Hord cd0155b5f4 DDA: Move axis calculations into loops, part 3.
Clean up code to reduce duplication by consolidating code into
loops for per-axis actions.

Part 3 is moving fast axis detection into a loop.
Binary size 84 bytes smaller.
2014-08-31 19:04:52 +02:00
Phil Hord d3beb21225 DDA: Move axis calculations into loops, part 2.
Clean up code to reduce duplication by consolidating code into
loops for per-axis actions.

Part 2 is moving maximum speed limit calculations into loops.
Binary size another 160 bytes smaller.
2014-08-31 19:04:42 +02:00
Phil Hord 427d6637c3 dda_maths.h: remove now obsolete um_to_steps_[xyze]. 2014-08-31 19:04:33 +02:00
Phil Hord cec3c5f52e DDA: Move axis calculations into loops, part 1.
Clean up code to reduce duplication by consolidating code into
loops for per-axis actions.

Traumflug notes:

Split this once huge commit into smaller ones for ease of
reviewing and bisecting (in case something went wrong).

Part 1 is to put dda_create() distance calculations into loops.
This reduces binary size by another whopping 756 bytes.
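
What "distance calculations in a loop" can look like, as a sketch
only. This is not the code from dda_create(). axes_uint32_t appears
elsewhere in this log; axes_int32_t, AXIS_COUNT and the function
axis_deltas() are assumed names.

```c
#include <stdint.h>

enum axis_e { X = 0, Y, Z, E, AXIS_COUNT };

typedef int32_t  axes_int32_t[AXIS_COUNT];
typedef uint32_t axes_uint32_t[AXIS_COUNT];

/* One loop replaces four nearly identical per-axis code blocks:
   for each axis, store travel distance and direction of the move. */
void axis_deltas(const axes_int32_t start, const axes_int32_t target,
                 axes_uint32_t delta, uint8_t positive[AXIS_COUNT]) {
  for (uint8_t i = X; i < AXIS_COUNT; i++) {
    positive[i] = (uint8_t)(target[i] >= start[i]);
    /* Subtract in unsigned arithmetic to avoid signed overflow. */
    delta[i] = positive[i] ? (uint32_t)target[i] - (uint32_t)start[i]
                           : (uint32_t)start[i] - (uint32_t)target[i];
  }
}
```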
2014-08-31 19:04:25 +02:00
Markus Hitter 1c19158bbc DDA: use new generic um_to_steps_* in dda_new_startpoint().
This was contributed by Phil Hord as part of another commit.

It saves 168 bytes, so it more than outweighs the overhead of
introducing a generic implementation already.
2014-08-31 19:04:17 +02:00
Phil Hord 62bdbd86d6 DDA: convert um_to_steps_* to generic implementation.
A generic implementation here will allow callers to pass the
target axis in as a parameter so the callers can also be made more
generic.

Traumflug notes:

Split out application of the new implementation in dda.c into its
own commit.

This actually costs 128 bytes, but as we can access axes from within
a loop now, I expect to get more savings elsewhere.

Interestingly, binary size is raised by another 18 bytes if

  um_to_steps(int32_t, enum axis_e)

is changed to

  um_to_steps(enum axis_e, int32_t)

even on the 8-bit ATmega. While putting the axis number to the
front might be a bit more logical (think of additional parameters,
the axis number position would move), NXP application note
AN10963 states on page 10ff, 16-bit data should be 16-bit aligned
and 32-bit data should be 32-bit aligned for best performance.
Well, so let's do it this way.
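
A sketch of a generic conversion with the parameter order chosen
above, 32-bit value first, axis second. The table and its values are
hypothetical. The 64-bit intermediate is used here for brevity only;
on an 8-bit MCU the firmware may well use a cheaper multiply/divide
helper instead.

```c
#include <stdint.h>

enum axis_e { X = 0, Y, Z, E, AXIS_COUNT };

/* Hypothetical steps per metre, one entry per axis. */
static const uint32_t steps_per_m[AXIS_COUNT] = {
  80000UL, 80000UL, 400000UL, 96000UL
};

/* Convert a distance in micrometres into motor steps for the given
   axis. The result is truncated towards zero. */
int32_t um_to_steps(int32_t distance, enum axis_e a) {
  return (int32_t)((int64_t)distance * steps_per_m[a] / 1000000);
}
```

Callers can now loop over the axes, for example calling
um_to_steps(distance[i], i) for i from X to E, instead of calling
four separate um_to_steps_[xyze]() functions.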
2014-08-31 19:04:08 +02:00
Markus Hitter 84cbf2a42a home.c: no need to turn off Z axis here.
This is done in dda.c already, see dda.c, line 678.
2014-08-31 19:03:57 +02:00
Markus Hitter 94fa733ee8 home.c: don't move to zero after homing to max endstop.
This can be counterproductive if the actual zero point is
outside the available build room. For example, if an additional
bed probing is going to happen. It also costs quite some
time on the Z axis. If you actually want this behaviour,
send a simple G0 XYZ after homing.
2014-08-31 19:03:45 +02:00
Phil Hord e2f793c2b3 DDA: Convert more axis variables to arrays.
Many places in the code use individual variables for int/uint values
for X, Y, Z, and E.  A tip from a comment suggests making these into
arrays for scalability in the future. Replace the discrete variables
with arrays so the code can be simplified in the future.
2014-08-31 19:03:31 +02:00
Phil Hord d3f49b3e95 DDA: Convert TARGET axis vars to array.
In preparation for more efficient and scalable code using axis-loops
for common operations, add two new array-types for signed and unsigned
32-bit values per axis. Make the TARGET type use this array instead of
its current X, Y, Z, and E variables.

Traumflug notes:

- Did the usual conversion to spaces for changed lines.

- Added X = 0 to the enum. Just for peace of mind.

- Excellent patch!

Initially I wanted to make the new array an anonymous union with the
old variables to allow accessing values both ways. This way it would
have been possible to do the transition in smaller pieces. But as
the patch worked so flawlessly and binary size is precisely the
same, I abandoned this idea. Maybe it's a good idea in other areas.
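
For reference, the abandoned idea could be sketched like this. It
relies on C11 anonymous unions and structs. The array member name
axis, the F member and AXIS_COUNT are assumptions, not necessarily
what the firmware uses.

```c
#include <stdint.h>

enum axis_e { X = 0, Y, Z, E, AXIS_COUNT };

typedef int32_t axes_int32_t[AXIS_COUNT];

typedef struct {
  union {
    axes_int32_t axis;                 /* new style: t.axis[i]      */
    struct { int32_t X, Y, Z, E; };    /* old style: t.X, t.Y, ...  */
  };
  uint32_t F;                          /* assumed feedrate member   */
} TARGET;

/* Both views must cover exactly the same storage. */
_Static_assert(sizeof(axes_int32_t) == 4 * sizeof(int32_t),
               "array and named members differ in size");
```

With this layout, t.axis[Y] and t.Y name the same storage, so old and
new code could coexist during a step-by-step transition.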
2014-08-31 19:03:17 +02:00
Markus Hitter e76bfa0d05 gcode_process.c: more preprocessor conditions for homing movements.
Well, optimizer isn't _that_ smart. It apparently removes
empty functions in the same compilation unit ( = source code file),
but not ones across units.

This saves 10 bytes binary size per endstop not used, so 30 bytes
in a standard configuration. All without any drawbacks.
2014-07-11 01:38:34 +02:00
Markus Hitter d53407bdc3 home.c: remove some redundant preprocessor stuff.
Binary size is exactly the same, so the optimizer apparently
manages to drop empty functions.
2014-07-11 01:38:25 +02:00
Markus Hitter dc84e4dfe0 home.c: adaptive homing feedrates for all axes. 2014-07-11 01:38:18 +02:00
Markus Hitter a7adc66ae5 config.*.h: distribute adaptive homing feedrate to all templates. 2014-07-11 01:38:08 +02:00
Markus Hitter b275bfcc32 Implement adaptive homing feedrates.
For now for X min only, but it works excellently already.
Tested quite a few combinations and raising acceleration
or endstop clearance raises homing feedrate just as expected.

Quite a chunk of the code is for testing the given configuration,
only. A thing which would ideally be done for every macro
used in each code file.
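
Why feedrate rises with both values follows from a simple braking
distance argument. This is a sketch assuming constant deceleration a
and that the axis has to stop within the endstop clearance d; the
exact formula and units used in home.c are not shown in this log.

```latex
d = \frac{v^2}{2a}
\quad\Longrightarrow\quad
v_{\max} = \sqrt{2\,a\,d}
```

With hypothetical values a = 1000 mm/s^2 and d = 2 mm this gives
v_max = sqrt(4000) mm/s, about 63.2 mm/s or roughly 3795 mm/min.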
2014-07-11 01:37:57 +02:00
Markus Hitter 7611872baa Get rid of E_STARTSTOP_STEPS.
This was meant to be a firmware-provided retract feature but was
never really supported by G-code generators. Without their support
(by issuing M101/M103), it's pretty hard to detect extrusion
pauses, so this feature simply has no future.

As this was on by default, it saves over 200 bytes binary size
in a default configuration.
2014-07-11 01:37:48 +02:00
David Forrest c35d1c1caf heater.c, config.default.h: Make PID_CONDITIONAL_INTEGRATION non-optional.
See
https://github.com/Traumflug/Teacup_Firmware/issues/74#issuecomment-38999466
2014-07-11 01:37:35 +02:00
David Forrest 23679855a0 heater.c: Enable more anti-windup with PID_CONDITIONAL_INTEGRATION. 2014-07-11 01:37:24 +02:00