Currently at a fixed frequency of 1 kHz and with a fixed duty
cycle of 10%, but PWM does work.
As it turns out, PIO0_11 is not usable for PWM, because its timer
is already in use for the Step timer, and had to be disabled for
Gen7-ARM.
Test: define a heater in board.gen7-arm.h and a square signal
of 1 kHz with 10% duty cycle should appear on the heater pin.
This also implements more of the FastIO infrastructure.
Unfortunately, definitions aren't exactly straightforward, so we
need lots of tabular data. For example, for the pin function I
I had to step through the user manual, pin by pin.
We also learned a lesson here: Cortex-M0 has a 4 word ( = 16 bytes)
prefetch engine. Loops not starting at such a boundary take
additional 4 clock cycles, making them slower. The tight loop used
for testing previously happened to be 16-byte aligne by accident.
Adding just one line of code in the SET_OUTPUT() macro misaligned
it, so loop repetition rate dropped from 5.3 MHz to 3.7 MHz.
There are many measures to align code to 16-byte boundaries:
- -falign-functions=16 as gcc flag.
- -falign-loops=16 as gcc flag, found to not work.
- -falign-labels=16 as gcc flag, worked for aligning the loop,
but also bloated the binary by 10%.
- __attribute__ ((aligned(16))) attached to functions (not
tested)
- Adding this just before the loop worked fine and increased the
binary by just 16 bytes:
__ASM (".balign 16");
Take care of this when relying on exact execution times, e.g. when
implementing delay_us()!
Only SET_OUTPUT() and WRITE() for now, reading follows later.
A loop like this:
SET_OUTPUT(PIO0_1);
for (;;) {
WRITE(PIO0_1, 0);
WRITE(PIO0_1, 1);
}
toggles a pin at about 5.3 MHz. The low period is 63 ns on the
scope, so 3 clock cycles. With this loop, the binary is 1648
bytes.
Assembly shows four instructions inside the loop, which is about
as good as it can get:
movs r2, #0
str r2, [r3, #8]
adds r2, #2
str r2, [r3, #8]
For comparison, using the MBED provided gpio routines give a
toggle frequency of about 300 kHz, with a low period of 72 clock
cycles. Microoptimisation isn't just the last few percent ...
Tested with this code before main():
static void delay(uint32_t delay) {
while (delay) {
__ASM volatile ("nop");
delay--;
}
}
... and in main():
SET_OUTPUT(PIO0_1);
SET_OUTPUT(PIO0_2);
SET_OUTPUT(PIO0_3);
SET_OUTPUT(PIO0_4);
__ASM (".balign 16");
while (1) {
// 1 pulse on pin 1, two pulses on pin 2, ...
WRITE(PIO0_1, 0);
WRITE(PIO0_1, 1);
WRITE(PIO0_2, 0);
WRITE(PIO0_2, 1);
WRITE(PIO0_2, 0);
WRITE(PIO0_2, 1);
WRITE(PIO0_3, 0);
WRITE(PIO0_3, 1);
WRITE(PIO0_3, 0);
WRITE(PIO0_3, 1);
WRITE(PIO0_3, 0);
WRITE(PIO0_3, 1);
// PIO0_4 needs a pullup 10k to 3.3V
// to show a visible signal.
WRITE(PIO0_4, 0);
delay(10);
WRITE(PIO0_4, 1);
delay(10);
WRITE(PIO0_4, 0);
delay(10);
WRITE(PIO0_4, 1);
delay(10);
WRITE(PIO0_4, 0);
delay(10);
WRITE(PIO0_4, 1);
delay(10);
WRITE(PIO0_4, 0);
delay(10);
WRITE(PIO0_4, 1);
delay(1000);
}
With a 10k pullup, PIO0_4 has a rise time of about 1 microsecond.