Commit Graph

5 Commits

Author SHA1 Message Date
Markus Hitter feeb411eec ARM: split timer.c into platform specific files.
AVR and simulator are kept together, because the simulator
apparently simulates much of the AVR timer infrastructure.

ARM variant is empty, so far.
2015-08-12 14:26:36 +02:00
Markus Hitter a7240523e1 ARM: rename mbed-LPC11xx.h to cmsis-lpc11xx.h.
Part of the effort to rename all CMSIS-provided files to "cmsis-".
2015-08-12 14:26:36 +02:00
Markus Hitter 52e2585f13 ARM: use arduino.h for UART pinout.
This removes another 36 bytes binary size and six(!) MBED files.

The mess of MBED files is now pretty much resolved, only a few
essential ones left.
2015-08-12 14:26:35 +02:00
Markus Hitter 072e3f8ae5 ARM: also set GPIO function and mode.
This also implements more of the FastIO infrastructure.
Unfortunately, definitions aren't exactly straightforward, so we
need lots of tabular data. For example, for the pin function
I had to step through the user manual, pin by pin.

We also learned a lesson here: Cortex-M0 has a 4 word ( = 16 bytes)
prefetch engine. Loops not starting at such a boundary take an
additional 4 clock cycles, making them slower. The tight loop used
for testing previously happened to be 16-byte aligned by accident.
Adding just one line of code in the SET_OUTPUT() macro misaligned
it, so loop repetition rate dropped from 5.3 MHz to 3.7 MHz.
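
The two rates fit a 4-cycle penalty, which a little arithmetic can
confirm. The sketch below assumes a 48 MHz core clock (not stated in
this message; inferred from 63 ns corresponding to 3 clock cycles in
the FastIO commit). CORE_CLOCK_HZ, loop_rate_hz() and
cycles_per_iteration() are illustrative names, not part of the code
base.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 48 MHz core clock, inferred from "63 ns = 3 clock cycles". */
#define CORE_CLOCK_HZ 48000000u

/* Loop repetition rate in Hz for a loop taking `cycles` clock cycles. */
static uint32_t loop_rate_hz(uint32_t cycles) {
  assert(cycles > 0);
  return CORE_CLOCK_HZ / cycles;
}

/* Clock cycles per loop iteration, rounded to nearest, for a measured rate. */
static uint32_t cycles_per_iteration(uint32_t rate_hz) {
  assert(rate_hz > 0);
  return (CORE_CLOCK_HZ + rate_hz / 2) / rate_hz;
}
```

Under this assumption, 5.3 MHz corresponds to 9 cycles per iteration
and 3.7 MHz to 13 cycles, i.e. 4 cycles more; 48 MHz / 13 is about
3.69 MHz.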

There are several ways to align code to 16-byte boundaries:

 - -falign-functions=16 as gcc flag.

 - -falign-loops=16 as gcc flag, found to not work.

 - -falign-labels=16 as gcc flag, worked for aligning the loop,
   but also bloated the binary by 10%.

 - __attribute__ ((aligned(16))) attached to functions (not
   tested)

 - Adding this just before the loop worked fine and increased the
   binary by just 16 bytes:

     __ASM (".balign 16");

Take care of this when relying on exact execution times, e.g. when
implementing delay_us()!
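
A minimal sketch of how two of these measures might be combined in a
timing-critical busy loop, assuming GCC or Clang. spin() and
entry_is_aligned16() are hypothetical names, not functions from this
code base. The attribute only aligns the function entry and the
.balign directive only pads at the point where the compiler happens
to place it, so the disassembly still needs a look to confirm where
the loop actually starts.

```c
#include <assert.h>  /* only needed for the checks in the usage note */
#include <stdint.h>

/* Hypothetical busy loop. The attribute requests a 16-byte aligned
 * function entry; .balign 16 pads up to the next 16-byte boundary
 * just before the loop. */
__attribute__((aligned(16), noinline))
uint32_t spin(uint32_t count) {
  uint32_t iterations = 0;

  __asm__ volatile (".balign 16");
  while (count) {
    __asm__ volatile ("nop");
    count--;
    iterations++;
  }
  return iterations;
}

/* Returns 1 if the entry address of fn is 16-byte aligned.
 * Bit 0 is cleared first because it is the Thumb bit on Cortex-M. */
int entry_is_aligned16(uint32_t (*fn)(uint32_t)) {
  uintptr_t addr = (uintptr_t)fn & ~(uintptr_t)1;

  return (addr & 15u) == 0;
}
```

A quick sanity check could be assert(entry_is_aligned16(spin)) at
startup, followed by a scope measurement of the loop itself.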
2015-08-12 14:26:35 +02:00
Markus Hitter 2c90a2dfc7 ARM: get FastIO for writing into place.
Only SET_OUTPUT() and WRITE() for now, reading follows later.

A loop like this:

  SET_OUTPUT(PIO0_1);
  for (;;) {
    WRITE(PIO0_1, 0);
    WRITE(PIO0_1, 1);
  }

toggles a pin at about 5.3 MHz. The low period is 63 ns on the
scope, so 3 clock cycles. With this loop, the binary is 1648
bytes.

Assembly shows four instructions inside the loop, which is about
as good as it can get:

  movs  r2, #0
  str   r2, [r3, #8]
  adds  r2, #2
  str   r2, [r3, #8]
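
As a rough cross-check of the scope readings, the cycles can be
tallied per instruction. The sketch assumes the usual Cortex-M0
timings (1 cycle for movs and adds, 2 for str, 3 for a taken
branch), a 48 MHz core clock, and a closing branch that is not part
of the listing above. All names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed Cortex-M0 instruction timings, in clock cycles. */
enum {
  CYCLES_MOVS = 1,
  CYCLES_ADDS = 1,
  CYCLES_STR = 2,
  CYCLES_BRANCH = 3  /* taken branch closing the loop (assumed) */
};

/* The pin is low from the first store until the second one:
 * adds plus the second str. */
static uint32_t low_period_cycles(void) {
  return CYCLES_ADDS + CYCLES_STR;
}

/* One full iteration: the four listed instructions plus the branch. */
static uint32_t loop_cycles(void) {
  return CYCLES_MOVS + CYCLES_STR + CYCLES_ADDS + CYCLES_STR + CYCLES_BRANCH;
}

/* Duration of `cycles` clock cycles in picoseconds. */
static uint64_t cycles_to_ps(uint32_t cycles, uint32_t clock_hz) {
  assert(clock_hz > 0);
  return (uint64_t)cycles * 1000000000000ull / clock_hz;
}
```

Under these assumptions the low period is 3 cycles (62.5 ns at
48 MHz) and one iteration takes 9 cycles, or about 5.3 MHz.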

For comparison, using the MBED-provided gpio routines gives a
toggle frequency of about 300 kHz, with a low period of 72 clock
cycles. Microoptimisation isn't just the last few percent ...

Tested with this code before main():

static void delay(uint32_t delay) {
  while (delay) {
    __ASM volatile ("nop");
    delay--;
  }
}

... and in main():

  SET_OUTPUT(PIO0_1);
  SET_OUTPUT(PIO0_2);
  SET_OUTPUT(PIO0_3);
  SET_OUTPUT(PIO0_4);
  __ASM (".balign 16");
  while (1) {
    // 1 pulse on pin 1, two pulses on pin 2, ...
    WRITE(PIO0_1, 0);
    WRITE(PIO0_1, 1);
    WRITE(PIO0_2, 0);
    WRITE(PIO0_2, 1);
    WRITE(PIO0_2, 0);
    WRITE(PIO0_2, 1);
    WRITE(PIO0_3, 0);
    WRITE(PIO0_3, 1);
    WRITE(PIO0_3, 0);
    WRITE(PIO0_3, 1);
    WRITE(PIO0_3, 0);
    WRITE(PIO0_3, 1);
    // PIO0_4 needs a pullup 10k to 3.3V
    // to show a visible signal.
    WRITE(PIO0_4, 0);
    delay(10);
    WRITE(PIO0_4, 1);
    delay(10);
    WRITE(PIO0_4, 0);
    delay(10);
    WRITE(PIO0_4, 1);
    delay(10);
    WRITE(PIO0_4, 0);
    delay(10);
    WRITE(PIO0_4, 1);
    delay(10);
    WRITE(PIO0_4, 0);
    delay(10);
    WRITE(PIO0_4, 1);
    delay(1000);
  }

With a 10k pullup, PIO0_4 has a rise time of about 1 microsecond.
2015-08-12 14:26:34 +02:00