This is an amalgamation of two recent articles: Turning a Blue Pill into a Z80 and Getting started with the F407. I want to use this as a starting point for further retro Z80 explorations.
The first goal is very straightforward: get the same “ZEXALL” Z80 instruction exerciser working on the F407 µC as in the F103-based Blue Pill. It has more memory (enough to emulate the Z80’s entire 64K address space) and as a bonus, it’s also quite a bit faster.
I’m not going to use the USB serial driver as console for now, because it’s not convenient during development, i.e. during rapid edit - compile - upload cycles. With a serial UART connection, I can simply keep a terminal window open across uploads - whereas with on-board USB, the connection will be reset every time. USB can always be fitted in later.
Sooo, back to a Black Magic Probe setup, with the 6-wire power + SWD + serial hookup.
An easy port
It turns out that only two simple changes need to be made to make the F103 project work on an F407 µC. First, the obvious change, i.e. a different platformio.ini configuration:
[env:black407]
build_flags = -DSTM32F4 -I../common-z80
platform = ststm32
board = genericSTM32F407VET6
framework = stm32cube
upload_protocol = blackmagic
upload_port = /dev/cu.usbmodemDEE8C3AD
monitor_speed = 115200
lib_deps = jeeh
The other change is a design issue in JeeH which I haven’t been able to clear up yet. One of the first lines in main sets the console UART baud rate and needs to be changed:
console.baud(115200, fullSpeedClock());
The problem is that on F407, the internal bus to which USART1 is attached (APB2) runs at half the system clock speed, i.e. 84 MHz with a 168 MHz system clock, whereas on F103 it runs at the full 72 MHz. The fix is to change the above line to:
console.baud(115200, fullSpeedClock()/2);
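To see why the factor of two matters, here is a small sketch of the divider arithmetic. It assumes the common 16x-oversampling case, where the USART divider is roughly the bus clock divided by the baud rate. The names usartDivider and actualBaud are made up for illustration and are not part of JeeH, whose internals may well differ.

```cpp
#include <cstdint>

// Hypothetical helper (not JeeH API): divider for 16x oversampling,
// rounded to the nearest integer.
constexpr uint32_t usartDivider (uint32_t busHz, uint32_t baud) {
    return (busHz + baud / 2) / baud;
}

// Baud rate the hardware would actually produce for a given divider.
constexpr uint32_t actualBaud (uint32_t busHz, uint32_t divider) {
    return busHz / divider;
}

// Assumed clocks: 168 MHz system clock, USART1 bus at half of that.
constexpr uint32_t kSysHz = 168000000;
constexpr uint32_t kBusHz = kSysHz / 2;

// Divider derived from the real bus clock: very close to 115200 baud.
static_assert(actualBaud(kBusHz, usartDivider(kBusHz, 115200)) == 115226, "");

// Divider derived from the system clock: only about half the intended rate.
static_assert(actualBaud(kBusHz, usartDivider(kSysHz, 115200)) == 57613, "");
```

With the unmodified call, the terminal would have to be set to roughly 57600 baud to see anything sensible, which is a handy way to recognise this kind of clock mix-up.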
With these tweaks, ZEXALL now also works on F407, producing the same output:
Z80 instruction exerciser
<adc,sbc> hl,<bc,de,hl,sp>.... OK
add hl,<bc,de,hl,sp>.......... OK
add ix,<bc,de,ix,sp>.......... OK
[...etc...]
It might seem like a very slow process, but it’s actually blindingly fast, compared to a “real” 4 MHz Z80 CPU chip. This emulator runs at the F407’s full rated speed, i.e. 168 MHz. My multimeter tells me that the LED blinks at 5.3 Hz, so the emulated speed is over 21 MHz. That’s very impressive, considering what’s going on under the hood!
Adding memory
So far, this was just a straight port to verify that everything still works. What makes this port worthwhile is that the F407 can easily emulate a 64 KB machine, instead of just 16 KB. In src/context.h, this line needs to be changed:
uint8_t mem [1<<14]; // size should be a power of two
… into a whopping:
uint8_t mem [1<<16]; // size should be a power of two
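A power-of-two size lets an emulated address be wrapped into the buffer with a single bit mask. The sketch below illustrates that idea. Note that wrapAddr is a hypothetical helper, and that the original mapMem works exactly this way is an assumption, not something shown here.

```cpp
#include <cstdint>

// Hypothetical helper: wrap a 16-bit Z80 address into a buffer
// of 2^bits bytes by masking off the upper address bits.
constexpr uint32_t wrapAddr (uint16_t addr, unsigned bits) {
    return addr & ((1u << bits) - 1u);
}

// With 16 KB (1<<14), addresses above 0x3FFF alias back into the buffer.
static_assert(wrapAddr(0x4000, 14) == 0x0000, "");
static_assert(wrapAddr(0x7FFF, 14) == 0x3FFF, "");

// With 64 KB (1<<16), every 16-bit address maps to its own byte.
static_assert(wrapAddr(0x4000, 16) == 0x4000, "");
static_assert(wrapAddr(0xFFFF, 16) == 0xFFFF, "");
```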
The resulting firmware still uses only a fraction of the F407’s memory resources:
DATA: [===== ] 50.7% (used 66480 bytes from 131072 bytes)
PROGRAM: [ ] 3.8% (used 19316 bytes from 514288 bytes)
Where did the RAM go?
The careful reader may have noticed that the RAM size reported in the PlatformIO build is “131072 bytes”. That’s 128 KB - what happened to the other 68 KB of the 196 KB total?
Well… the F407 does have 196 KB of SRAM, but it’s segmented into 4 separate sections:
- 112 KB of “ordinary” on-chip SRAM
- 16 KB of “auxiliary” SRAM, contiguous with the 112 KB
- 64 KB of “core-coupled” SRAM, not contiguous with either of the above
- 4 KB “battery backup” SRAM, again not contiguous
Here is the whole story in a nice diagram (p.60 in the RM0090 documentation):
Not all memory accesses are identical:
- ARM accesses through the “I-bus” and “D-bus” benefit from cache acceleration
- the 64 KB “core-coupled memory” is fast, but can’t be used for code or DMA
- lastly, the 4 KB battery-backed SRAM can retain its contents even when the µC is powered down - as long as a backup battery supplies power to the µC’s Vbat pin
The main detail to keep in mind is that these memory areas are not all next to each other. The first two (112K+16K) are, so we really do have a base section of 128 KB to treat as SRAM for our applications to use, but the rest is in other parts of the ARM’s large memory space. And battery-backed SRAM needs extra attention to enable and use correctly.
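The layout can also be written down as a small table and checked at compile time. This is only a sketch: the Region type and the names are invented for illustration, and the base addresses are the ones I recall from the RM0090 memory map, so they are worth double-checking against the reference manual.

```cpp
#include <cstdint>

// Hypothetical description of one SRAM area: start address and size in bytes.
struct Region { uint32_t base; uint32_t size; };

// Base addresses assumed from the RM0090 memory map (F405/F407).
constexpr Region kSram1  = { 0x20000000, 112 * 1024 };  // "ordinary" SRAM
constexpr Region kSram2  = { 0x2001C000,  16 * 1024 };  // "auxiliary" SRAM
constexpr Region kCcm    = { 0x10000000,  64 * 1024 };  // core-coupled
constexpr Region kBackup = { 0x40024000,   4 * 1024 };  // battery backup

constexpr uint32_t endOf (Region r) { return r.base + r.size; }
constexpr bool contiguous (Region a, Region b) { return endOf(a) == b.base; }

// Only the first two areas form one block: the 128 KB PlatformIO reports.
static_assert(contiguous(kSram1, kSram2), "");
static_assert(kSram1.size + kSram2.size == 128 * 1024, "");

// The other two live elsewhere in the address space.
static_assert(!contiguous(kSram2, kCcm), "");
static_assert(!contiguous(kSram2, kBackup), "");

// All four together add up to the 196 KB total.
static_assert(kSram1.size + kSram2.size + kCcm.size + kBackup.size
              == 196 * 1024, "");
```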
The definition in PlatformIO for the genericSTM32F407VET6 board defines RAM as that main 128 KB chunk. The rest must be defined manually in the code. For example as:
#define CCMEM ((uint8_t*) 0x10000000) // available: 0x10000000..0x1000FFFF
It’s not hard at all, it’s just not automatic. With no DMA, up to 192 KB are easy to access. Here is the updated src/context.h, using the CCMEM as 64K emulated Z80 memory:
#include "z80emu.h"
#include <stdint.h>
#define CCMEM ((uint8_t*) 0x10000000) // available: 0x10000000..0x1000FFFF
typedef struct {
    Z80_STATE state;
    uint8_t done;
} Context;
inline uint8_t* mapMem (void* cp, uint16_t addr) {
    return CCMEM + addr;
}
extern void systemCall (Context *ctx, int request);
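Because the emulated address is a uint16_t, the pointer returned by mapMem can never leave the 64 KB CCM window, so no masking or range check is needed. Here is a host-side sketch of that bound, using plain integers instead of real pointers so that it can be checked on any machine. The names ccmAddress, kCcmBase and kCcmSize are hypothetical.

```cpp
#include <cstdint>

constexpr uint32_t kCcmBase = 0x10000000;  // same base address as CCMEM
constexpr uint32_t kCcmSize = 1u << 16;    // 64 KB

// Integer equivalent of "CCMEM + addr" for a 16-bit Z80 address.
constexpr uint32_t ccmAddress (uint16_t addr) {
    return kCcmBase + addr;
}

// The lowest and highest Z80 addresses land on the two ends of the window.
static_assert(ccmAddress(0x0000) == 0x10000000, "");
static_assert(ccmAddress(0xFFFF) == 0x1000FFFF, "");

// Even the highest address stays inside the 64 KB CCM area.
static_assert(ccmAddress(0xFFFF) < kCcmBase + kCcmSize, "");
```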
This firmware build now uses almost nothing from the F407’s “main” memory resources, since the 64 KB buffer (65536 of the earlier 66480 bytes) has moved out to the CCM:
DATA: [ ] 0.7% (used 944 bytes from 131072 bytes)
PROGRAM: [ ] 3.8% (used 19516 bytes from 514288 bytes)
Final results
That’s it. The time has come to patiently let the ZEXALL program run to completion:
Z80 instruction exerciser
<adc,sbc> hl,<bc,de,hl,sp>.... OK
add hl,<bc,de,hl,sp>.......... OK
add ix,<bc,de,ix,sp>.......... OK
add iy,<bc,de,iy,sp>.......... OK
aluop a,nn.................... OK
aluop a,<b,c,d,e,h,l,(hl),a>.. OK
aluop a,<ixh,ixl,iyh,iyl>..... OK
aluop a,(<ix,iy>+1)........... OK
bit n,(<ix,iy>+1)............. OK
bit n,<b,c,d,e,h,l,(hl),a>.... OK
cpd<r>........................ OK
cpi<r>........................ OK
<daa,cpl,scf,ccf>............. OK
<inc,dec> a................... OK
<inc,dec> b................... OK
<inc,dec> bc.................. OK
<inc,dec> c................... OK
<inc,dec> d................... OK
<inc,dec> de.................. OK
<inc,dec> e................... OK
<inc,dec> h................... OK
<inc,dec> hl.................. OK
<inc,dec> ix.................. OK
<inc,dec> iy.................. OK
<inc,dec> l................... OK
<inc,dec> (hl)................ OK
<inc,dec> sp.................. OK
<inc,dec> (<ix,iy>+1)......... OK
<inc,dec> ixh................. OK
<inc,dec> ixl................. OK
<inc,dec> iyh................. OK
<inc,dec> iyl................. OK
ld <bc,de>,(nnnn)............. OK
ld hl,(nnnn).................. OK
ld sp,(nnnn).................. OK
ld <ix,iy>,(nnnn)............. OK
ld (nnnn),<bc,de>............. OK
ld (nnnn),hl.................. OK
ld (nnnn),sp.................. OK
ld (nnnn),<ix,iy>............. OK
ld <bc,de,hl,sp>,nnnn......... OK
ld <ix,iy>,nnnn............... OK
ld a,<(bc),(de)>.............. OK
ld <b,c,d,e,h,l,(hl),a>,nn.... OK
ld (<ix,iy>+1),nn............. OK
ld <b,c,d,e>,(<ix,iy>+1)...... OK
ld <h,l>,(<ix,iy>+1).......... OK
ld a,(<ix,iy>+1).............. OK
ld <ixh,ixl,iyh,iyl>,nn....... OK
ld <bcdehla>,<bcdehla>........ OK
ld <bcdexya>,<bcdexya>........ OK
ld a,(nnnn) / ld (nnnn),a..... OK
ldd<r> (1).................... OK
ldd<r> (2).................... OK
ldi<r> (1).................... OK
ldi<r> (2).................... OK
neg........................... OK
<rrd,rld>..................... OK
<rlca,rrca,rla,rra>........... OK
shf/rot (<ix,iy>+1)........... OK
shf/rot <b,c,d,e,h,l,(hl),a>.. OK
<set,res> n,<bcdehl(hl)a>..... OK
<set,res> n,(<ix,iy>+1)....... OK
ld (<ix,iy>+1),<b,c,d,e>...... OK
ld (<ix,iy>+1),<h,l>.......... OK
ld (<ix,iy>+1),a.............. OK
ld (<bc,de>),a................ OK
Tests complete
Emulating zexall took 2160637 ms.
Excellent. All tests passed (in ≈ 36 min). The F407-as-Z80 emulator is ready for use.
References
- The source code of this ZEXALL port to F407 can be found in this repository.