
The need for speed

Let’s find out what our little LPC810 is capable of in terms of processing speed…

One of the interesting features of many ARM microcontrollers, the LPC810 included, is a phase-locked loop (PLL). A PLL multiplies the frequency of a reference clock, so the CPU can run faster than the on-chip oscillator itself. This is why the LPC8xx series can run at up to 30 MHz without any external components, even though the internal oscillator runs at 12 MHz. That’s a 2.5x speedup, and all it takes to get there is a few extra statements after power-up.
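To make the 2.5x figure concrete, here is a host-side C++ sketch that searches for PLL settings turning the 12 MHz internal oscillator into a 30 MHz system clock. The constraints it encodes are our reading of the LPC81x user manual, so treat them as assumptions and check the manual before relying on them:

* PLL output = M × input, with M from 1 to 32.
* A post divider P of 1, 2, 4 or 8 must keep the internal oscillator (2 × P × output) within 156 to 320 MHz.
* The PLL output is limited to 100 MHz.
* An integer system clock divider follows the PLL.

The names `PllConfig` and `findPll` are made up for this example.

```cpp
#include <cassert>
#include <initializer_list>
#include <optional>

// Hypothetical result type: one way to reach the target clock.
struct PllConfig {
    int m;    // feedback multiplier: PLL output = m * input
    int p;    // post divider, only used to keep the CCO in its range
    int div;  // system clock divider applied to the PLL output
};

// Brute-force search for PLL settings that turn inputHz into targetHz.
// The limits below are assumptions taken from our reading of the
// LPC81x user manual; check them against the manual before use.
std::optional<PllConfig> findPll(long long inputHz, long long targetHz) {
    assert(inputHz > 0 && targetHz > 0);
    const long long maxOutHz = 100000000;  // assumed PLL output limit
    const long long ccoMinHz = 156000000;  // assumed CCO lower limit
    const long long ccoMaxHz = 320000000;  // assumed CCO upper limit
    for (int div = 1; div <= 255; ++div) {
        for (int m = 1; m <= 32; ++m) {
            const long long outHz = inputHz * m;
            if (outHz > maxOutHz || outHz != targetHz * div)
                continue;
            for (int p : {1, 2, 4, 8}) {
                const long long ccoHz = 2LL * p * outHz;  // CCO = 2 * P * output
                if (ccoHz >= ccoMinHz && ccoHz <= ccoMaxHz)
                    return PllConfig{m, p, div};
            }
        }
    }
    return std::nullopt;  // no combination meets the constraints
}
```

For 12 MHz in and 30 MHz out, this search finds M = 5, P = 2 and a divide-by-2. In other words, the PLL runs at 60 MHz and the system clock is half of that. Trying the smallest divider first means the search prefers the lowest PLL output that works. Whether `setMaxSpeed()` uses exactly these values isn't shown in this article.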

To verify this, and to find out how fast the LPC810 will actually run, we can make it toggle an I/O pin and then use a logic analyser or oscilloscope to look at the resulting signal.

The 50-line toggle demo in the jeelabs/embello area on GitHub uses the following code:

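int main () {
    setMaxSpeed();                    // raise the core clock to 30 MHz
    ...
    SysTick_Config(12000000/50000);   // start the periodic SysTick interrupt
    while (true)
        LPC_GPIO_PORT->NOT0 = 1<<2;   // toggle PIO0_2 (pin 4) as fast as possible
}

// C linkage, so the startup code's vector table picks up this handler
extern "C" void SysTick_Handler () {
    // four back-to-back writes to PIO0_3's byte pin register: two short pulses
    LPC_GPIO_PORT->B0[3] = 1;
    LPC_GPIO_PORT->B0[3] = 0;
    LPC_GPIO_PORT->B0[3] = 1;
    LPC_GPIO_PORT->B0[3] = 0;
}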

It does two things:

  • toggle PIO0_2 in a very tight loop (this is pin 4 on the 8-DIP LPC810 package)
  • generate SysTick interrupts at 50 kHz, each of which toggles PIO0_3 (pin 3) four times
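A reload count sets the SysTick rate. CMSIS's `SysTick_Config()` clocks the counter from the processor clock and fires an interrupt every `ticks` cycles. The argument needed for a given rate therefore depends on the core clock in effect when the call is made.

The small sketch below does that arithmetic. The helper names are hypothetical. It also enforces the 24-bit limit of the reload register, which is why `SysTick_Config()` rejects counts above 2^24:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: the tick count to pass to SysTick_Config() so that
// SysTick fires rateHz times per second while the core runs at coreHz.
// Returns 0 if the count doesn't fit: SysTick_Config() loads ticks - 1
// into the 24-bit reload register, so at most 2^24 ticks are allowed.
uint32_t sysTickTicks(uint32_t coreHz, uint32_t rateHz) {
    assert(rateHz > 0);
    const uint32_t ticks = coreHz / rateHz;  // truncating division
    if (ticks == 0 || ticks - 1 > 0xFFFFFFu)
        return 0;
    return ticks;
}

// Resulting interrupt period in microseconds.
double sysTickPeriodUs(uint32_t ticks, uint32_t coreHz) {
    assert(coreHz > 0);
    return ticks * 1e6 / coreHz;
}
```

For instance, 12000000/50000 is 240, which gives a 20 µs period while the core runs at 12 MHz. At 30 MHz the same 240 ticks last 8 µs, and a 20 µs period would take 600 ticks. It's worth rechecking this count whenever the clock setup changes.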

Here is the result, as seen on an oscilloscope – note that we’re running at 30 MHz:

[Oscilloscope capture SCR07]

The yellow trace is the periodic interrupt (pin 3); the blue trace is the tight loop (pin 4). This screen was captured by triggering on the interrupt pulsing pin 3 every 20 µs.

We can make several observations:

  • the tight loop takes 0.3 µs per iteration, generating a 1.67 MHz square wave
  • the interrupt suspends the tight loop for an additional 1.71 – 0.3 = 1.41 µs
  • the four back-to-back writes in the interrupt handler, free of loop overhead, generate pulses at 10 MHz
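These figures are plain arithmetic on the scope readings. A pin that changes state once every t produces a square wave with period 2t.

Here is a small C++ sketch of the conversions, which also expresses each span in clock cycles at 30 MHz. The function names are ours:

```cpp
#include <cassert>

// A pin that changes state once every toggleNs nanoseconds produces a
// square wave with period 2 * toggleNs. f = 1 / (2 * t); with t in ns
// and f in MHz this becomes 1000 / (2 * t).
double squareWaveMHz(double toggleNs) {
    assert(toggleNs > 0);
    return 1e3 / (2.0 * toggleNs);
}

// Extra time an interrupt adds to one iteration of the tight loop.
double interruptStallUs(double stretchedUs, double normalUs) {
    return stretchedUs - normalUs;
}

// Number of clock cycles in a time span, for a clock given in MHz.
double cyclesIn(double spanNs, double clockMHz) {
    return spanNs * clockMHz / 1e3;
}
```

Fed the readings above, this gives:

* 1.67 MHz for a 300 ns loop iteration.
* 10 MHz for one pin change every 50 ns in the interrupt handler.

At 30 MHz those two spans come to 9 and 1.5 clock cycles respectively.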

Looking back at the source code, we can deduce that the four writes inside SysTick_Handler take at least 0.2 µs (in fact a bit more, due to some register setup). The LPC810’s interrupt overhead, i.e. the total time for interrupt entry plus exit, is therefore at most 1.41 – 0.2 = 1.21 µs.
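The same deduction can be written out in a few lines. Subtract the handler's four writes (50 ns each, going by the 10 MHz pulses) from the 1.41 µs stall, then convert the remainder to clock cycles. The result is only an upper bound, because the handler also spends some instructions on register setup.

This is a sketch with hypothetical function names:

```cpp
#include <cassert>

// Upper bound on interrupt entry + exit time: the measured stall minus the
// minimum time the handler body needs (bodyWrites pin writes at nsPerWrite).
double overheadBoundNs(double stallNs, int bodyWrites, double nsPerWrite) {
    assert(bodyWrites >= 0 && nsPerWrite > 0);
    return stallNs - bodyWrites * nsPerWrite;
}

// The same bound in clock cycles, for a clock given in MHz.
double overheadBoundCycles(double stallNs, int bodyWrites, double nsPerWrite,
                           double clockMHz) {
    return overheadBoundNs(stallNs, bodyWrites, nsPerWrite) * clockMHz / 1e3;
}
```

With the measured 1.41 µs stall and four 50 ns writes, the bound comes to 1210 ns, or about 36 clock cycles at 30 MHz.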

One more factoid: the LPC810 draws 4.5 mA of current while running this test at 30 MHz.

But there’s a puzzle hiding in these timings. At 30 MHz each clock cycle lasts 33 ns, yet the 10 MHz pulses mean one pin change every 50 ns. How then can four instructions take six clock cycles? There’s no such thing as an instruction taking 1.5 cycles; it has to be an integral number of clock cycles!

The answer reveals a lot about how an ARM chip such as the LPC810 operates. Keep in mind that in this example, the µC fetches its instructions from flash memory. In the LPC810, flash memory is probably organised in units of 4 bytes. Most instructions on its ARM Cortex-M0+ core, however, are 16-bit Thumb instructions of 2 bytes each, so each flash memory fetch reads two instructions at a time.

On the LPC810, flash memory is configured with 2 “wait cycles”, so reading a word from flash requires three clock cycles. That explains it all: 2 instructions per 3 clock cycles!
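This explanation amounts to a simple throughput model. Straight-line code can't execute faster than flash delivers it, nor faster than one instruction per clock.

Here is a sketch of that model in C++. It assumes 4-byte flash words, 2-byte instructions that would otherwise each take a single cycle, and no branches or prefetch effects. The function names are made up for illustration:

```cpp
#include <algorithm>
#include <cassert>

// Simplified throughput model for straight-line code running from flash:
// each flash fetch takes flashAccessCycles and delivers wordBytes of code,
// i.e. wordBytes / instrBytes instructions, but the core never executes
// more than one instruction per cycle. Ignores branches and prefetching.
double cyclesPerInstruction(int flashAccessCycles, int wordBytes = 4,
                            int instrBytes = 2) {
    assert(flashAccessCycles > 0 && instrBytes > 0 && wordBytes >= instrBytes);
    const double instrPerFetch = static_cast<double>(wordBytes / instrBytes);
    return std::max(1.0, flashAccessCycles / instrPerFetch);
}

// Square-wave frequency (MHz) when every instruction toggles the pin once.
double toggleMHz(double clockMHz, int flashAccessCycles) {
    assert(clockMHz > 0);
    const double nsPerToggle =
        1e3 / clockMHz * cyclesPerInstruction(flashAccessCycles);
    return 1e3 / (2.0 * nsPerToggle);
}
```

With three-cycle flash access the model gives 1.5 cycles per instruction, i.e. 10 MHz toggling at 30 MHz. With single-cycle access the core itself becomes the limit at one cycle per instruction, which predicts 15 MHz.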

According to the LPC8xx data sheet, flash access can be set to a single clock cycle, but only if the system clock is 20 MHz or less. Let’s ignore that limit for a moment and overclock the flash by reducing the wait cycles anyway, while still running at 30 MHz:

[Oscilloscope capture SCR06a]

[Oscilloscope capture SCR06b]

Bingo: now we are toggling pin 3 every 33 ns, i.e. one pin change for each instruction. That darn little 8-DIP chip is generating a 15 MHz square wave output … in software!

Don’t get too carried away with this, though: real code adds loop overhead, the occasional interrupt, and branching paths through the program logic. Generating a 1 MHz-ish signal in software is possible, but expect glitches.

Luckily, the LPC810 also has hardware peripherals which can generate a clean 15 MHz signal.
