With the Bresenham algorithm it's safe to assume that once the
axis with the most steps is done, all other axes are done, too.
This saves a lot of variable loading in dda_step(), as well as the
very expensive comparison of all axis counters against zero.
Minor drawback: update_current_position() is now even slower.
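The idea can be sketched like this (a minimal, self-contained model; struct fields and names are made up for illustration and do not match Teacup's actual DDA code). The axis with the most steps takes a step on every tick, so a single counter comparison tells us when the whole move is finished:

```c
#include <assert.h>
#include <stdint.h>

#define AXES 3

/* Simplified model, not Teacup's actual DDA struct. */
struct dda {
  uint32_t total_steps;      /* step count of the fastest axis */
  uint32_t step_no;          /* steps taken so far             */
  int32_t  error[AXES];      /* Bresenham error accumulators   */
  uint32_t delta[AXES];      /* steps wanted per axis          */
  uint32_t steps_done[AXES];
};

/* One Bresenham tick; returns 1 while the move is still running. */
int dda_step(struct dda *d) {
  for (int i = 0; i < AXES; i++) {
    d->error[i] += (int32_t)d->delta[i];
    if (d->error[i] >= (int32_t)d->total_steps) {
      d->error[i] -= (int32_t)d->total_steps;
      d->steps_done[i]++;    /* real code would pulse a step pin here */
    }
  }
  d->step_no++;
  /* The fastest axis steps on every tick, so this one comparison
     replaces checking every axis counter against zero. */
  return d->step_no < d->total_steps;
}
```

After total_steps ticks each slower axis has taken exactly its delta[i] steps, which is why checking only the fastest axis is safe.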
On performance: the slowest step decreased from 719 to 604
clocks, which is quite an improvement. Average step time increased
by 16 clocks for single-axis movements and decreased for
multi-axis movements. The bottom line: this should improve
real-world performance quite a bit, because a printer's movement
speed isn't limited by average timings, but by the time needed for
the slowest step.
Along the way, binary size dropped by a nice 244 bytes and RAM
usage by an equally nice 16 bytes.
ATmega sizes          '168  '328(P)  '644(P)  '1280
Program: 19564 bytes  137%      64%      31%    16%
Data:     2175 bytes  213%     107%      54%    27%
EEPROM:     32 bytes    4%       2%       2%     1%
short-moves.gcode statistics:
LED on occurrences: 888.
LED on time minimum: 326 clock cycles.
LED on time maximum: 595 clock cycles.
LED on time average: 333.62 clock cycles.
smooth-curves.gcode statistics:
LED on occurrences: 23648.
LED on time minimum: 318 clock cycles.
LED on time maximum: 604 clock cycles.
LED on time average: 333.311 clock cycles.
triangle-odd.gcode statistics:
LED on occurrences: 1636.
LED on time minimum: 318 clock cycles.
LED on time maximum: 585 clock cycles.
LED on time average: 335.233 clock cycles.
Our standard performance test is to run these three G-code files
in SimulAVR and record step pulse timings. While this certainly
doesn't cover everything related to possible performance
measurements, it's a good baseline for comparing code changes.
Current performance:
ATmega sizes          '168  '328(P)  '644(P)  '1280
Program: 19808 bytes  139%      65%      32%    16%
Data:     2191 bytes  214%     107%      54%    27%
EEPROM:     32 bytes    4%       2%       2%     1%
short-moves.gcode statistics:
LED on occurrences: 888.
LED on time minimum: 308 clock cycles.
LED on time maximum: 729 clock cycles.
LED on time average: 317.393 clock cycles.
smooth-curves.gcode statistics:
LED on occurrences: 23648.
LED on time minimum: 308 clock cycles.
LED on time maximum: 726 clock cycles.
LED on time average: 354.825 clock cycles.
triangle-odd.gcode statistics:
LED on occurrences: 1636.
LED on time minimum: 308 clock cycles.
LED on time maximum: 719 clock cycles.
LED on time average: 336.327 clock cycles.
Traumflug's note: if one uses #define LOOKAHEAD_DEBUG at line 177,
one should use the same symbol at line 321. Edited the commit to
do so.
This reduces binary size by 38 bytes and RAM usage by 4 bytes.
PCBScriber is a printer for the scratch 'n etch method, see
http://reprap.org/wiki/PCBScriber
Commit reviewer Traumflug's note:
- Rebased to current branch 'experimental', which adds
USE_INTERNAL_PULLDOWNS.
- Removed DEFINE_HOMING for now, this part isn't cooked yet.
For example, it doesn't pass regression tests.
- Thank you very much for the contribution!
This was an attempt to make Teacup sources compatible with
Arduino IDE 1.6.0 - 1.6.9 and became obsolete as of 1.6.10. The
problem was fixed on the Arduino IDE side.
We now calculate all steps from the fastest axis, so X and Y
steps_per_m no longer have to be the same.
Traumflug's note: another 16 bytes of program size off on AVR,
same size on the LPC1114.
We need the fastest axis itself instead of its step count.
This also eliminates an overflow when ACCELERATION > 596.
We save 118 bytes of program memory and 2 bytes of data.
Reviewer Traumflug's note: I see 100 bytes of program and 32
bytes of RAM saved on ATmegas here, 16 and 32 on the LPC1114.
Either way: great stuff!
This should fix issue #235.
Recently ConfigTool has been very slow for me on Ubuntu Linux.
When I run the app there is a 15 second wait before the window is
first displayed. I bisected the problem and found it was tied to
the number of pins in `pinNames`, and ultimately that it was
caused by a slow initializer in wx.Choice() when the choices are
passed at widget creation time. For some reason, loading them
after the widget is created is significantly faster. This change
reduces my startup time to just under 4 seconds.
Further speedup could be had by using lazy initialization of the
controls. But the controls are too bound up in the loaded data
to make this simple. Maybe I will attack it later.
There is still a significant delay when closing the window, but I
haven't tracked what causes it. Maybe it is caused just by
destroying all these pin controls.
In the process of making this change, I wanted to reduce the
number of locations that bother to copy the pinNames list and,
to support lazy loading, to try to keep the same list in all
pinChoice controls. I noticed that all the pinChoice controls
already have the same parameters passed to the addPinChoice
function which makes them redundant and confusing. I removed the
extra initializers and just rely on pinNames as the only list
option in addPinChoice for now. Maybe this flexibility is needed
for some reason later, but I can't see a purpose for it now.
Notes by reviewer Traumflug:
First of all, which "trick"? That's an excellent code
simplification and if this happens to make startup faster (it
does), all the better.
Measured startup & shutdown time here (click window close as soon
as it appears):
          Before:     With this commit:
real      0m4.222s    0m3.780s
user      0m3.864s    0m3.452s
sys       0m0.084s    0m0.100s
As the speedup was far more significant on the commit author's
machine, it might be a memory consumption issue (leading to
swapping on a small-RAM machine). Linux allows viewing this in
/proc/<pid>/status.
          Before:      Now:
VmPeak:   708360 kB    708372 kB
VmSize:   658916 kB    658756 kB
VmHWM:     73792 kB     73492 kB
VmRSS:     73792 kB     73492 kB
VmData:   402492 kB    402332 kB
Still no obvious indicator, but a 300 kB smaller memory footprint
is certainly nice.
If you attempt a Steinhart-Hart table in the configtool with
parameters (4700, 25, 100000, 209, 475, 256, 201) it fails with a:
...
File "/Users/drf/2014/RepRap/GIT/Teacup_Firmware/configtool/
thermistortablefile.py", line 169, in SteinhartHartTable
(i, int(t * 4), int(delta * 4 * 256), c, int(t), int(round(r))),
TypeError: not enough arguments for format string
Caught and fix provided by dr5fn; this should fix issue #246.
Heck, that's simply forbidden. A C compiler would have caught this
in a split second at compile time; Python didn't until the faulty
code section was actually executed (a section of code for rare
cases).
The simple fix is to replace the old tuple with a changed, new
tuple.
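For comparison, a small C sketch of why the compiler catches this class of bug (format_row() is a made-up stand-in, not the configtool code):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* With -Wall/-Wformat, gcc and clang check printf-style format
 * strings against their arguments at compile time:
 *
 *   printf("%d, %d\n", i);   // warning: too few arguments for format
 *
 * Python's "%" operator does the same check, but only when the
 * statement actually executes -- which is how the broken tuple
 * survived in a rarely-run code path. */
int format_row(char *buf, size_t len, int i, int t) {
  return snprintf(buf, len, "%d, %d", i, t);
}
```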
This resolved issue #242.
Similar to M221 which sets a variable flow rate percentage, add
support for M220 which sets a percentage modifier for the
feedrate, F.
It seems a little disturbing that the flow rate modifies the next
G1 command and does not touch the buffered commands, but this is
the only reasonable thing to do, since the M221 setting could be
embedded in the source G-code for some use cases. Perhaps
an "immediate" setting using P1 could be considered later if
needed.
`target` is an input to dda_create, but we don't modify it. We
copy it into dda->endpoint and modify that instead, if needed.
Make `target` const so this treatment is explicit.
Rely on dda->endpoint to hold our "target" data so any decisions
we make leading up to using it will be correctly reflected in our
math.
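A minimal sketch of the intent (types reduced to stand-ins, not Teacup's real TARGET/DDA structs; the zero-feedrate fixup is a made-up example of an adjustment to the copy):

```c
#include <assert.h>
#include <stdint.h>

/* Reduced stand-ins for Teacup's TARGET and DDA types. */
typedef struct { int32_t X, Y, Z, E; uint32_t F; } TARGET;
typedef struct { TARGET endpoint; } DDA;

/* `target` is input only, so take it as const; anything we need to
 * adjust is changed on the private copy in dda->endpoint. */
void dda_create(DDA *dda, const TARGET *target) {
  dda->endpoint = *target;        /* copy first ...           */
  if (dda->endpoint.F == 0)
    dda->endpoint.F = 100;        /* ... then modify the copy */
}
```

The const qualifier makes the compiler enforce the "never modify the caller's target" rule instead of relying on convention.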
In a test, the system worked fine even for a change in config.h,
which is #included by a variable (config_wrapper.h, line 20).
This should speed up repeated regression tests, e.g. when doing a
'git regtest', substantially.
Disable it only when appropriate, of course.
Moving this code makes Teacup compile with both
ACCELERATION_REPRAP and LOOKAHEAD enabled. Such a configuration
makes no sense, but can happen anyway.
The flow rate is given as a percentage which is kept as
100 = 100% internally. But this means we must divide by 100 for
every movement which can be expensive. Convert the value to
256 = 100% so the compiler can optimize the division to a
byte-shift.
Also, avoid the math altogether in the normal case where the
flow rate is already 100% and no change is required.
Note: This also requires an increase in the size of e_multiplier
to 16 bits so values >= 100% can be stored. Previously flow
rates only up to 255% (2.55x) were supported, which may have
surprised some users. Now the flow rate can be as high as
10000% (100x), at least internally.
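A sketch of that conversion, with illustrative function names (set_flow_percent(), apply_flow()) rather than Teacup's actual ones:

```c
#include <assert.h>
#include <stdint.h>

/* Store the flow rate as 256 = 100% so applying it needs only a
 * shift, not a division by 100 on every movement. 16 bits so that
 * values of 100% (= 256) and above fit. */
uint16_t e_multiplier = 256;

/* M221 Sxx handler: convert percent to the 256-based value,
 * rounding with "+ 50" before the divide. */
void set_flow_percent(uint16_t percent) {
  e_multiplier = (uint16_t)(((uint32_t)percent * 256 + 50) / 100);
}

/* Scale extruder steps; the common 100% case skips the math. */
uint32_t apply_flow(uint32_t e_steps) {
  if (e_multiplier == 256)
    return e_steps;
  return (uint32_t)(((uint64_t)e_steps * e_multiplier) >> 8);
}
```

The ">> 8" is the byte-shift the compiler can emit instead of a costly runtime division by 100.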
Now it is possible to control the extruder's flow:
M221 S100 = 100% of the extruder's steps
M221 S90 = 90% of the extruder's steps
M221 is also used in other firmwares for this, and a lot of
hosts, like OctoPrint and Pronterface, use this M-code for this
behaviour.
Note a performance improvement opportunity.
Review note by Traumflug: the original commit didn't add a
comment, but replaced the existing code with what's in the
comment now.
According to the comment in issue #223:
Pre-unroll:
LED on time minimum: 3138.44 clock cycles.
LED on time maximum: 5108.8 clock cycles.
LED on time average: 4590.58 clock cycles.
Unrolled:
LED on time minimum: 3016.92 clock cycles.
LED on time maximum: 4987.28 clock cycles.
LED on time average: 4469.06 clock cycles.
Thermistors and AD595 can be faster in that mode.
The new strategy is:
1. read the value
2. start the adc
3. return the result
- next cycle
instead of:
1. start the adc
- wait 10ms
2. read the value
3. return the result
- next cycle
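The difference can be modelled like this (a sketch with stub globals standing in for the ADC hardware, not the actual AVR register code):

```c
#include <assert.h>
#include <stdint.h>

uint16_t sensor_input = 0;  /* stub: what the sensor reads right now */
uint16_t adc_result = 0;    /* stub: the last completed conversion   */

/* Stand-in for starting a hardware conversion; on AVR this would
 * set ADSC in ADCSRA and return immediately. */
void start_adc(void) {
  adc_result = sensor_input;
}

/* New strategy: return the result of the previous conversion and
 * immediately start the next one. The conversion time overlaps the
 * caller's other work instead of being a wait before every read. */
uint16_t temp_read(void) {
  uint16_t value = adc_result;  /* 1. read the value    */
  start_adc();                  /* 2. start the adc     */
  return value;                 /* 3. return the result */
}
```

The returned value is one cycle old, which is harmless for slow-moving temperatures and removes the per-read wait entirely.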
Review changes by Traumflug: fixed the warnings appearing in some
configurations (case NEEDS_START_ADC undefined and case
NEEDS_START_ADC defined, but TEMP_READ_CONTINUOUS == 0).
This allows using EWMA_ALPHA in an #if clause, which is needed
for the next commit.
Review changes by Traumflug: made changes to comments more
complete, added rounding ("+ 500") and also adjusted Configtool
for the change.
After firmware startup it's always in a valid range, even in the
unlikely case analog_init() is called twice.
This saves 4 bytes binary size without drawback.
If we have EWMA mode turned on, the user wants to average several
samples from the temperature sensors over time. But now we read
the sensors only 4 times per second, making this averaging take
much longer.
Read the temperatures continuously -- as fast as the probe type
supports -- if we are using weighted averaging (TEMP_EWMA < 1.0).
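For reference, one fixed-point EWMA step as such firmware typically implements it (a sketch; the alpha value and scaling here are illustrative, not Teacup's actual configuration):

```c
#include <assert.h>
#include <stdint.h>

/* Fixed-point EWMA with alpha scaled so that 256 = 1.0; the value
 * 64 (= 0.25) is illustrative only. "+ 128" rounds the final
 * shift instead of truncating. */
#define EWMA_SCALED_ALPHA 64

/* avg' = alpha * sample + (1 - alpha) * avg, in integer math. */
uint16_t ewma_update(uint16_t avg, uint16_t sample) {
  uint32_t next = (uint32_t)EWMA_SCALED_ALPHA * sample
                + (uint32_t)(256 - EWMA_SCALED_ALPHA) * avg;
  return (uint16_t)((next + 128) >> 8);
}
```

The smaller the alpha, the more samples the average effectively spans, which is why reading only 4 times per second stretches the settling time.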
Heater PID loops must be called every 250ms, and temperature
probes do not need to be called any more often than that. Some
probes require some asynchronous operations to complete before
they're ready. Handle these in a state machine that first begins
the conversion and finally completes it on some future tick.
Signal it is complete by setting the new state variable to IDLE.
Kick off the heater PID loop by simply beginning the temperature
conversion on all the temperature probes. When each completes,
it will finish the process by calling its PID routine.
Remove the "next_read_time" concept altogether and just run each
temp conversion at fixed 250ms intervals.
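The state machine can be sketched like this (a two-state reduction with illustrative names; a real implementation likely needs more states per probe type):

```c
#include <assert.h>

/* Minimal sketch of the conversion state machine. */
enum temp_state { TEMP_IDLE, TEMP_CONVERTING };

enum temp_state sensor_state = TEMP_IDLE;
int pid_runs = 0;

/* Called on each tick for this sensor. */
void temp_tick(void) {
  switch (sensor_state) {
  case TEMP_IDLE:
    /* the 250ms heater tick kicks us off: begin the conversion */
    sensor_state = TEMP_CONVERTING;
    break;
  case TEMP_CONVERTING:
    /* a later tick: conversion finished, run the PID for this
       sensor, then signal completion by returning to IDLE */
    pid_runs++;
    sensor_state = TEMP_IDLE;
    break;
  }
}
```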
Every type of temp sensor has its own special way to be read
and the once-simple loop has grown complex. Restore some sanity
by isolating the code for each sensor into its own inline
function.
In the process I noticed temp_flags is only ever set (never read)
and is used only in MAX6675. I don't understand what it was used
for, so for now, let's comment it out and revisit this later in
this series.
Integrate the next_read_time countdown into the loop as is common.
Check for start_adc() in the same loop -- before decrementing the
timer -- and call it when needed on the tick before we need the
results.
One concern I have still is that start_adc() may be called twice
within a few microseconds if two probes need to be read. I expect
it should only be called once, but I am not readily familiar with
the AVR ADC conversion protocol.
The extra clock next_start_adc_time was unnecessary. As @phord
observed, it is more understandable to explicitly call start_adc()
one cycle ahead during temp_sensor_tick(), for sensors which use
analog_read().
As @phord observed, the conditions and the meaning of
next_read_time were not very intuitive. Changed it so that it now
represents the number of 10ms clock ticks before the next sensor
reading, i.e. 1 means 10ms, 2 means 20ms, etc.
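A sketch of that countdown (the 10ms tick period and the 25-tick reload, i.e. 250ms, come from the text; the counter variables stand in for the real start_adc() and sensor-reading code):

```c
#include <assert.h>
#include <stdint.h>

uint8_t next_read_time = 25;  /* 25 ticks * 10ms = 250ms           */
int adc_starts = 0;           /* test counters standing in for the */
int readings = 0;             /* real ADC start and sensor reading */

void start_adc(void)    { adc_starts++; }
void read_sensors(void) { readings++; }

/* Called every 10ms. next_read_time is simply "ticks until the
 * next reading": 1 means 10ms away, 2 means 20ms away, and so on. */
void temp_sensor_tick(void) {
  if (next_read_time == 2)
    start_adc();              /* one tick ahead of the reading */
  if (--next_read_time == 0) {
    read_sensors();
    next_read_time = 25;
  }
}
```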