This is an amalgamation of two recent articles: Turning a Blue Pill into a Z80 and Getting started with the F407. I want to use this as a starting point for further retro Z80 explorations.

The first goal is very straightforward: get the same “ZEXALL” Z80 instruction exerciser working on the F407 µC as on the F103-based Blue Pill. The F407 has more memory (enough to emulate the Z80’s entire 64K address space) and as a bonus, it’s also quite a bit faster.

I’m not going to use the USB serial driver as the console for now, because it’s not convenient during development, i.e. during rapid edit - compile - upload cycles. With a serial UART connection, I can simply keep a terminal window open across uploads, whereas with on-board USB the connection is reset on every upload. USB can always be fitted in later.

Sooo, back to a Black Magic Probe setup, with the 6-wire power + SWD + serial hookup.

An easy port

It turns out that only two simple changes need to be made to make the F103 project work on an F407 µC. First, the obvious change, i.e. a different platformio.ini configuration:

[env:black407]
build_flags = -DSTM32F4 -I../common-z80
platform = ststm32
board = genericSTM32F407VET6
framework = stm32cube
upload_protocol = blackmagic
upload_port = /dev/cu.usbmodemDEE8C3AD
monitor_speed = 115200
lib_deps = jeeh

The other change works around a design issue in JeeH which I haven’t been able to clear up yet. One of the first lines in main sets the console UART baud rate and needs to be changed:

    console.baud(115200, fullSpeedClock());

The problem is that on the F407, the internal bus to which USART1 is attached runs at half the system clock speed (84 MHz, i.e. half of 168 MHz), whereas on the F103 it runs at the full 72 MHz. The modification is to change the above line to:

    console.baud(115200, fullSpeedClock()/2);
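To see why the clock argument matters, here is a minimal sketch of the divider arithmetic, assuming the usual STM32 USART setup with 16x oversampling, where the baud rate is the peripheral clock divided by the value in the baud rate register. The helper names `usartDivider` and `actualBaud` are made up for this illustration and are not part of JeeH, which may well compute this differently.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper, not part of JeeH: divider for an STM32 USART
// with 16x oversampling, where baud = peripheral clock / divider.
uint32_t usartDivider (uint32_t pclkHz, uint32_t baud) {
    assert(baud != 0);
    return (pclkHz + baud / 2) / baud;  // rounded to nearest
}

// Baud rate that actually appears on the wire for a given divider.
uint32_t actualBaud (uint32_t pclkHz, uint32_t divider) {
    assert(divider != 0);
    return pclkHz / divider;
}
```

With the F103's 72 MHz bus, `usartDivider(72000000, 115200)` gives 625. On the F407, the correct 84 MHz bus clock gives 729. Passing the full 168 MHz instead gives 1458, and since the bus really runs at 84 MHz, `actualBaud(84000000, 1458)` comes out at 57613: half the intended rate, which a terminal set to 115200 baud would show as garbage.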

With these tweaks, ZEXALL now also works on F407, producing the same output:

Z80 instruction exerciser
<adc,sbc> hl,<bc,de,hl,sp>....  OK
add hl,<bc,de,hl,sp>..........  OK
add ix,<bc,de,ix,sp>..........  OK
[...etc...]

It might seem like a very slow process, but it’s actually blindingly fast compared to a “real” 4 MHz Z80 CPU chip. This emulator runs at the F407’s full rated speed, i.e. 168 MHz. My multimeter tells me that the LED blinks at 5.3 Hz, so the emulated speed is over 21 MHz, more than five times that of the real chip. That’s very impressive, considering what’s going on under the hood!

Adding memory

So far, this was just a straight port to verify that everything still works.

What makes this port worthwhile is that the F407 can easily emulate a 64 KB machine, instead of just 16 KB. In src/context.h, this line needs to be changed:

    uint8_t   mem [1<<14];  // size should be a power of two

… into a whopping:

    uint8_t   mem [1<<16];  // size should be a power of two
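The “power of two” comment is presumably there because an emulator can then fold any 16-bit Z80 address into the array with a single AND. That is an assumption about the motivation, not a description of how this project maps addresses. A minimal sketch, with `MEM_BITS`, `emuMem` and `wrapMem` as made-up names:

```c
#include <assert.h>
#include <stdint.h>

#define MEM_BITS 14                 // 14 for 16 KB, 16 for 64 KB
#define MEM_SIZE (1u << MEM_BITS)

// A power of two has exactly one bit set, so size & (size-1) is zero.
static_assert((MEM_SIZE & (MEM_SIZE - 1)) == 0, "size must be a power of two");

uint8_t emuMem [MEM_SIZE];

// Hypothetical stand-in for the address mapping: fold a 16-bit Z80
// address into the array, so that out-of-range addresses wrap around.
uint8_t* wrapMem (uint16_t addr) {
    return emuMem + (addr & (MEM_SIZE - 1));
}
```

With 16 KB, address 0x4000 wraps back to offset 0, so the emulated machine sees its memory mirrored four times. With `MEM_BITS` set to 16 the mask becomes 0xFFFF and every address maps to itself.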

The resulting firmware still uses only a fraction of the F407’s memory resources:

DATA:    [=====     ]  50.7% (used 66480 bytes from 131072 bytes)
PROGRAM: [          ]   3.8% (used 19316 bytes from 514288 bytes)

Where did the RAM go?

The careful reader may have noticed that the RAM size reported in the PlatformIO build is “131072 bytes”. That’s 128 KB - what happened to the other 68 KB of the 196 KB total?

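Well… the F407 does have 196 KB of SRAM, but it’s segmented into 4 separate sections: 112 KB of main SRAM, 16 KB of auxiliary SRAM, 64 KB of core-coupled memory (CCM), and 4 KB of battery-backed SRAM.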

The whole story is shown in a nice diagram on p.60 of the RM0090 documentation.

Not all memory accesses are identical: the CCM is connected to the CPU only, so DMA cannot reach it.

The main detail to keep in mind is that these memory areas are not all next to each other. The first two (112K+16K) are, so we really do have a base section of 128 KB to treat as SRAM for our applications to use, but the rest is in other parts of the ARM’s large memory space. And battery-backed SRAM needs extra attention to enable and use correctly.
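The layout can be written down as a small table to check the arithmetic. This is only an illustrative sketch: the `Region` struct and the two helper functions are made up for this example, while the base addresses and sizes are the ones listed in the RM0090 memory map.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    const char* name;
    uint32_t    base;
    uint32_t    size;
} Region;

// Base addresses and sizes as listed in the RM0090 memory map.
const Region regions [] = {
    { "SRAM1",   0x20000000, 112 * 1024 },
    { "SRAM2",   0x2001C000,  16 * 1024 },
    { "CCM",     0x10000000,  64 * 1024 },
    { "BKPSRAM", 0x40024000,   4 * 1024 },
};

// True if region b starts exactly where region a ends.
int isAdjacent (const Region* a, const Region* b) {
    assert(a != 0 && b != 0);
    return a->base + a->size == b->base;
}

// Sum of all region sizes, in bytes.
uint32_t totalSize (void) {
    uint32_t sum = 0;
    for (unsigned i = 0; i < sizeof regions / sizeof regions[0]; ++i)
        sum += regions[i].size;
    return sum;
}
```

`totalSize()` returns 196 KB, and `isAdjacent` is true only for the SRAM1/SRAM2 pair, which is why a build sees one contiguous 128 KB block and nothing more.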

The definition in PlatformIO for the genericSTM32F407VET6 board defines RAM as that main 128 KB chunk. The rest must be defined manually in the code, for example as:

#define CCMEM ((uint8_t*) 0x10000000)  // available: 0x10000000..0x1000FFFF

It’s not hard at all, it’s just not automatic. With no DMA, up to 192 KB are easy to access. Here is the updated src/context.h, using the CCMEM as 64K emulated Z80 memory:

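#include "z80emu.h"
#include <stdint.h>

// 64 KB of core-coupled memory, used as the emulated Z80 address space
#define CCMEM ((uint8_t*) 0x10000000)  // available: 0x10000000..0x1000FFFF

typedef struct {
    Z80_STATE state;
    uint8_t   done;
} Context;

// map a 16-bit Z80 address to a host pointer, the context is not needed here
// "static" avoids link errors when this header is compiled as C
static inline uint8_t* mapMem (void* cp, uint16_t addr) {
    (void) cp;
    return CCMEM + addr;
}

extern void systemCall (Context *ctx, int request);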
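One consequence of reaching CCM through a raw pointer: the linker knows nothing about it, so the startup code that zeroes ordinary variables will not touch it, and its contents after power-up are undefined. A minimal sketch of clearing the emulated memory before loading a program follows. The name `loadImage` and its parameters are hypothetical and not taken from the project; on the F407, `base` would be `CCMEM`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define Z80_MEM_SIZE 0x10000u  // 64 KB, the full Z80 address space

// Hypothetical helper: wipe the emulated memory, then copy a program
// image to its load address. On the F407, base would be CCMEM.
void loadImage (uint8_t* base, const uint8_t* image, uint32_t len, uint16_t origin) {
    assert(base != 0 && image != 0);
    assert((uint32_t) origin + len <= Z80_MEM_SIZE);  // must fit in 64 KB
    memset(base, 0, Z80_MEM_SIZE);
    memcpy(base + origin, image, len);
}
```

Passing the base address as a parameter keeps the helper testable on a desktop machine, where a plain 64 KB array can stand in for the CCM region.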

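This firmware build now uses almost nothing from the F407’s “main” memory resources. DATA drops from 66480 to 944 bytes, a difference of exactly 65536 bytes: the emulated Z80 memory, which now lives in CCM: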

DATA:    [          ]   0.7% (used 944 bytes from 131072 bytes)
PROGRAM: [          ]   3.8% (used 19516 bytes from 514288 bytes)

Final results

That’s it. The time has come to patiently let the ZEXALL program run to completion:

Z80 instruction exerciser
<adc,sbc> hl,<bc,de,hl,sp>....  OK
add hl,<bc,de,hl,sp>..........  OK
add ix,<bc,de,ix,sp>..........  OK
add iy,<bc,de,iy,sp>..........  OK
aluop a,nn....................  OK
aluop a,<b,c,d,e,h,l,(hl),a>..  OK
aluop a,<ixh,ixl,iyh,iyl>.....  OK
aluop a,(<ix,iy>+1)...........  OK
bit n,(<ix,iy>+1).............  OK
bit n,<b,c,d,e,h,l,(hl),a>....  OK
cpd<r>........................  OK
cpi<r>........................  OK
<daa,cpl,scf,ccf>.............  OK
<inc,dec> a...................  OK
<inc,dec> b...................  OK
<inc,dec> bc..................  OK
<inc,dec> c...................  OK
<inc,dec> d...................  OK
<inc,dec> de..................  OK
<inc,dec> e...................  OK
<inc,dec> h...................  OK
<inc,dec> hl..................  OK
<inc,dec> ix..................  OK
<inc,dec> iy..................  OK
<inc,dec> l...................  OK
<inc,dec> (hl)................  OK
<inc,dec> sp..................  OK
<inc,dec> (<ix,iy>+1).........  OK
<inc,dec> ixh.................  OK
<inc,dec> ixl.................  OK
<inc,dec> iyh.................  OK
<inc,dec> iyl.................  OK
ld <bc,de>,(nnnn).............  OK
ld hl,(nnnn)..................  OK
ld sp,(nnnn)..................  OK
ld <ix,iy>,(nnnn).............  OK
ld (nnnn),<bc,de>.............  OK
ld (nnnn),hl..................  OK
ld (nnnn),sp..................  OK
ld (nnnn),<ix,iy>.............  OK
ld <bc,de,hl,sp>,nnnn.........  OK
ld <ix,iy>,nnnn...............  OK
ld a,<(bc),(de)>..............  OK
ld <b,c,d,e,h,l,(hl),a>,nn....  OK
ld (<ix,iy>+1),nn.............  OK
ld <b,c,d,e>,(<ix,iy>+1)......  OK
ld <h,l>,(<ix,iy>+1)..........  OK
ld a,(<ix,iy>+1)..............  OK
ld <ixh,ixl,iyh,iyl>,nn.......  OK
ld <bcdehla>,<bcdehla>........  OK
ld <bcdexya>,<bcdexya>........  OK
ld a,(nnnn) / ld (nnnn),a.....  OK
ldd<r> (1)....................  OK
ldd<r> (2)....................  OK
ldi<r> (1)....................  OK
ldi<r> (2)....................  OK
neg...........................  OK
<rrd,rld>.....................  OK
<rlca,rrca,rla,rra>...........  OK
shf/rot (<ix,iy>+1)...........  OK
shf/rot <b,c,d,e,h,l,(hl),a>..  OK
<set,res> n,<bcdehl(hl)a>.....  OK
<set,res> n,(<ix,iy>+1).......  OK
ld (<ix,iy>+1),<b,c,d,e>......  OK
ld (<ix,iy>+1),<h,l>..........  OK
ld (<ix,iy>+1),a..............  OK
ld (<bc,de>),a................  OK
Tests complete
Emulating zexall took 2160637 ms.

Excellent. All tests passed (in ≈ 36 min). The F407-as-Z80 emulator is ready for use.

References