Diving deep into SPI
Sep 21, 2016

As a somewhat more substantial exercise to learn Verilog, I thought I’d write an SPI master controller - should be easy, right?

The Serial Peripheral Interface bus can deal with high speeds (at short distances), only uses 4 I/O pins, and it looks like quite an elegant bi-directional transfer mechanism:

Two shift registers, connected in a circle, with bits rotating through (MSB to LSB).
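
To make the ring of two shift registers concrete, here is a minimal C++ sketch of one byte exchange. The names `clockOnce` and `exchangeByte` are made up for this illustration, and the model ignores all wire delays and clock-edge details: it only shows how the bits rotate.

```cpp
#include <cstdint>
#include <utility>

// One SCLK cycle: each side shifts its MSB out and takes the other side's
// MSB in at the LSB end, so the 16 bits rotate around the ring.
void clockOnce(uint8_t& master, uint8_t& slave) {
    const uint8_t mosi = (master >> 7) & 1;  // bit leaving the master
    const uint8_t miso = (slave >> 7) & 1;   // bit leaving the slave
    master = static_cast<uint8_t>((master << 1) | miso);
    slave  = static_cast<uint8_t>((slave << 1) | mosi);
}

// Eight clocks move one full byte in each direction.
std::pair<uint8_t, uint8_t> exchangeByte(uint8_t master, uint8_t slave) {
    for (int i = 0; i < 8; ++i)
        clockOnce(master, slave);
    return {master, slave};
}
```

After eight clocks the two registers have simply swapped contents: `exchangeByte(0xA5, 0x3C)` returns the pair `(0x3C, 0xA5)`.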

The devil is in the details, though:

  • the actual SCLK/MOSI/MISO wires introduce some delay, so the master and slave need to be clocked on different edges of the SCLK signal
  • there are four combinations of how this can be done, in terms of which edges the master and slave shift on
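
The four combinations are conventionally numbered as SPI modes 0 to 3, using two flags usually called CPOL (clock idle level) and CPHA (clock phase). That naming comes from common SPI usage, not from this post, and the `SpiMode` struct below is a hypothetical helper. As a small sketch of how the flags map to the sampling edge:

```cpp
struct SpiMode {
    int mode;             // conventional mode number, 0..3
    bool cpol;            // idle level of SCLK
    bool cpha;            // false: sample on leading edge, true: on trailing edge
    bool sampleOnRising;  // edge on which data is sampled
};

SpiMode makeMode(int mode) {
    const bool cpol = (mode & 2) != 0;
    const bool cpha = (mode & 1) != 0;
    // The leading edge rises when CPOL=0 and falls when CPOL=1; CPHA picks
    // leading or trailing, so sampling is on a rising edge iff CPOL == CPHA.
    return SpiMode{mode, cpol, cpha, cpol == cpha};
}
```

Modes 0 and 3 sample on the rising edge of SCLK, modes 1 and 2 on the falling edge; data is shifted out on the opposite edge in each case.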

I’ve looked at a dozen implementations on the web, but they all appear to be more complex than necessary. My goal is to strip this task down to its essence and, hopefully, that’ll translate into simple Verilog code.

My first attempt is spiSim/top.v on GitHub.

It re-uses a single 8-bit shift register for input and output, as in the above diagram. It also gates SCLK, i.e. pulses are sent only to shift the data: the clock is not a free-running divider of the main clock, it only toggles while there is work to do.

One trick I’m using here is that the internal counter is actually 5 bits wide: the low bit is the (negated) SCLK signal, bits [3:1] form the 3-bit counter that steps through the 8 bits of the byte being shifted out, and bit 4 is the overflow, causing the driver to return to idle mode and stop shifting more bits. I’m assuming all flip-flops start at “0”.
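
The way one 5-bit counter yields the clock, the bit index, and the stop condition can be sketched in C++. This is a model of the idea only, not a transcription of top.v: `Fields`, `decode`, and `countClockPulses` are hypothetical names, and the counter is assumed to start at 0 and to increment once per main clock.

```cpp
#include <cstdint>

struct Fields {
    bool lowBit;        // bit 0: SCLK is the negation of this bit
    unsigned bitIndex;  // bits [3:1]: which of the 8 data bits is active
    bool overflow;      // bit 4: set once all 8 bits are done
};

Fields decode(uint8_t counter) {
    return Fields{(counter & 1) != 0,
                  static_cast<unsigned>((counter >> 1) & 7),
                  ((counter >> 4) & 1) != 0};
}

// Count full cycles of the low bit until the overflow bit stops the transfer.
int countClockPulses() {
    uint8_t counter = 0;
    int pulses = 0;
    while (!decode(counter).overflow) {
        const bool before = decode(counter).lowBit;
        counter = static_cast<uint8_t>((counter + 1) & 0x1F);
        if (!before && decode(counter).lowBit)
            ++pulses;
    }
    return pulses;
}
```

Counting from 0 up to 16 toggles the low bit through exactly 8 cycles, one per data bit, which is why a single extra bit on top suffices as the "done" flag.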

Here is the simulation, with MISO being a slightly delayed copy of MOSI for this test:

(you can see more detail in this PDF)

If “wr” stays high, this will be treated as a continuous write, and the outgoing bytes will be sent out nearly back-to-back.

This is based on the timing specs of the W25Q16 serial flash memory chip:

Is it correct? - I don’t know. I’m not happy with the timing of the “done” pulse yet …

It sure takes getting used to. Verilog may look a bit like C, but all this parallel stuff brings a very different approach to coding!

For comments, visit the forum.

Running a simulated FPGA
Sep 14, 2016

Who needs real hardware?

One of the very nice things you can do with Verilog (and VHDL) is to use it as the basis for a simulator. Especially with synchronous designs, which do all their work in lock-step with a central clock, simulators can provide a good insight into what’s going on without having to load the design onto real silicon.

Verilator is a great tool for this, and does something really clever: the Verilog code is first translated into C++, which you then include in a small main loop of your own, compile, and run - preferably on a modern laptop or desktop machine. Such an RTL simulation runs surprisingly fast.

I’ve set up a minimal demo on GitHub to try this out and get started really quickly.

Here is the Verilog code I’ll be trying out:

module top (
    input clk,
    output out
);

// a little 4-bit (0-to-15) counter; its top bit is the test signal
reg [3:0] counter;
always @(posedge clk)
    counter <= counter + 1;
wire in = counter[3];

// mystery circuit...
reg inPrev;
always @(posedge clk)
    inPrev <= in;
assign out = ~in & inPrev;

endmodule

We need a few lines of C++ boilerplate code to “drive” the simulation (and stop at some point!), see the main.cpp file on GitHub.

Here is the Makefile I’m using:

CFLAGS = -Wno-undefined-bool-conversion

all:
	verilator -cc top.v --trace --exe ../main.cpp -CFLAGS "$(CFLAGS)"
	make -C obj_dir -f Vtop.mk

clean:
	rm -rf obj_dir data.vcd

Once compiled and run, we can use the open-source GTKwave application to look at the results as a nice timing diagram.

BTW, I used “brew install verilator gtkwave” on my Mac laptop to get these nice tools. It sure is great to have such an effortless package installer, nowadays!

After typing in “gtkwave data.vcd”, GTKwave opens up in a window. If you now expand the “top” tree and then append all signals to the viewing window, you get this:

Note how all signals defined in Verilog can be inspected. It’s like having an oscilloscope which lets us peek inside the entire circuit.

GTKwave can also export the result to a PDF, which comes out looking like this:

Aha! This circuit is a falling-edge detector!
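
The same behaviour can be reproduced with a few lines of plain C++, independent of Verilator. This is only a hand-written model of the Verilog above: `TopModel` and `outHighCycles` are hypothetical names, and both registers are assumed to start at 0.

```cpp
#include <cstdint>
#include <vector>

struct TopModel {
    uint8_t counter = 0;  // 4-bit counter, assumed to start at 0
    bool inPrev = false;  // previous value of `in`, assumed to start at 0

    bool in() const { return ((counter >> 3) & 1) != 0; }
    bool out() const { return !in() && inPrev; }

    // One rising clock edge: both registers update from the old values,
    // like the non-blocking assignments in the Verilog.
    void tick() {
        const bool oldIn = in();
        counter = static_cast<uint8_t>((counter + 1) & 0x0F);
        inPrev = oldIn;
    }
};

// Return the cycle numbers (counted after each tick) at which `out` is high.
std::vector<int> outHighCycles(int cycles) {
    TopModel m;
    std::vector<int> result;
    for (int t = 1; t <= cycles; ++t) {
        m.tick();
        if (m.out())
            result.push_back(t);
    }
    return result;
}
```

Running 40 cycles gives a one-cycle pulse at cycles 16 and 32, exactly where the counter wraps from 15 to 0 and `in` drops from 1 to 0.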

The next simulation is of a multiplexed 7-segment LED display, showing “1234” - from this demo:

As you can imagine, simulation is extremely useful to check whether a circuit description is actually doing what you want. The only change to the original Verilog code is that the clock was sped up by using bits [4:3] instead of [19:18] (twice) from the counter, so that the simulation wouldn’t waste its cycles just counting down time.
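
To get a feel for how much simulation time that saves, here is a small C++ sketch. It assumes the two counter bits select which of the four digits is lit, which is the usual multiplexing scheme but is not spelled out in the post; `digitSelect` and `cyclesPerScan` are hypothetical names.

```cpp
#include <cstdint>

// Which of the four digits is lit, taken from the two adjacent counter
// bits [lowBit+1 : lowBit].
unsigned digitSelect(uint32_t counter, unsigned lowBit) {
    return (counter >> lowBit) & 3u;
}

// Clock cycles needed to step through all four digits once.
uint32_t cyclesPerScan(unsigned lowBit) {
    return uint32_t{1} << (lowBit + 2);
}
```

With bits [19:18], one pass over the four digits takes `cyclesPerScan(18)` = 1,048,576 clock cycles; with bits [4:3] it takes `cyclesPerScan(3)` = 32, so the simulated display scans 32,768 times faster.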

For a considerably more advanced example, check out this SDRAM controller, copied from the very nice fpga4fun.com website:

(see this PDF for a detailed view)

It’s not a full-blown implementation, but still a great example of how to design and debug a circuit before building hardware.

I never had an Intel 8080
Sep 7, 2016

… although I spent months tinkering with a Z80 chip in a previous life - long, long ago.

But with an FPGA, that can change, at least in the virtual sense: there are several “soft cores” implementing an 8080 chip in an HDL, including memory and peripherals.

The light8080 project at OpenCores is one such implementation (in Verilog), and it’s quite an interesting beast because it uses a microcode architecture. Microcode is like “the lowest of low-level stored-program execution models”: it sits between machine instructions and the gates, registers, and memory comprising its implementation.

There’s a repository on GitHub with the implementation, which I have modified slightly to set up my own demo (calling it “lite8080” to avoid confusion). I’ve also set up a lite8080.qpf (Quartus Project File) there, which will build the final bitstream for my EP4CE6-based FPGA board.

Here’s the result:

That’s 10% of the FPGA, with 2 KB used for the microcode ROM and 4 KB as actual 8080 memory (this particular FPGA could provide up to 28 KB). We could in fact fit multiple 8080s in there, if we wanted!

According to the Quartus timing analysis, this design should work up to 100 MHz.

Which is all nice and well, but how do you get 8080 code into this thing?

It turns out to be quite simple. Part of the bitstream can contain initialisation data for RAM. This is in fact how ROM works in an FPGA (such as the microcode in this design): preset the memory with data and don’t implement write access at all.

In this case, I’ve taken the easy way out and kept the demo code already present in the original light8080 project: a small “hello world” and serial echo demo, written in C.

This can then be compiled to 8080 code by the legendary Small-C compiler, and turned into a hex file of just over 1 KB. Then it can be converted into Verilog code (there are also other ways to do this), as shown here.
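
As a rough illustration of such a conversion, here is a C++ sketch that turns one hex record into Verilog assignments. It assumes the hex file is in Intel HEX format and that the memory is a Verilog array named `mem`; both are assumptions of mine, and the function name `recordToVerilog` is hypothetical. It is not the converter used for this demo.

```cpp
#include <cstdint>
#include <cstdio>
#include <stdexcept>
#include <string>
#include <vector>

// Parse two hex digits starting at position pos.
static uint8_t hexByte(const std::string& s, std::size_t pos) {
    return static_cast<uint8_t>(std::stoul(s.substr(pos, 2), nullptr, 16));
}

// Convert one Intel HEX record (":LLAAAATT<data>CC") into Verilog
// assignments; records other than data records (type 00) yield nothing.
std::vector<std::string> recordToVerilog(const std::string& line) {
    if (line.size() < 11 || line[0] != ':')
        throw std::runtime_error("not an Intel HEX record");
    const unsigned count = hexByte(line, 1);
    const std::size_t length = 11 + 2 * count;
    if (line.size() < length)
        throw std::runtime_error("record too short");
    const unsigned addr = (hexByte(line, 3) << 8) | hexByte(line, 5);
    const unsigned type = hexByte(line, 7);

    // All bytes of a valid record, checksum included, sum to 0 modulo 256.
    unsigned sum = 0;
    for (std::size_t i = 1; i < length; i += 2)
        sum += hexByte(line, i);
    if ((sum & 0xFF) != 0)
        throw std::runtime_error("bad checksum");

    std::vector<std::string> out;
    if (type != 0)
        return out;
    for (unsigned i = 0; i < count; ++i) {
        char buf[48];
        std::snprintf(buf, sizeof buf, "mem[%u] = 8'h%02X;", addr + i,
                      static_cast<unsigned>(hexByte(line, 9 + 2 * i)));
        out.push_back(buf);
    }
    return out;
}
```

For the record `:0300300002337A1E` this produces `mem[48] = 8'h02;`, `mem[49] = 8'h33;` and `mem[50] = 8'h7A;`, lines that could be placed inside a Verilog `initial` block.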

There were two small problems I had to fix:

  • The code was written for an FPGA running at 30 MHz, whereas my setup runs at 50 MHz. This affects the baud rate calculation, so I ended up patching the hex values to run at 19,200 baud.

  • The external interrupt lines, which I had connected to some pins on an unused PMOD header, were constantly triggering. So I decided to completely disable interrupt handling for now.
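
The clock dependency in the first point is plain arithmetic. The sketch below assumes the UART derives its bit timing from a simple "clock cycles per bit" divisor; how light8080 actually encodes its baud setting is not shown in this post, and `baudDivisor` and `actualBaud` are hypothetical names.

```cpp
#include <cstdint>

// Clock cycles per serial bit, rounded to the nearest integer.
uint32_t baudDivisor(uint32_t clockHz, uint32_t baud) {
    return (clockHz + baud / 2) / baud;
}

// Baud rate actually produced by an integer divisor.
double actualBaud(uint32_t clockHz, uint32_t divisor) {
    return static_cast<double>(clockHz) / divisor;
}
```

For 19,200 baud, a 30 MHz clock needs a divisor of 1563, while 50 MHz needs 2604. A divisor left at its 30 MHz value would make the serial port run about 1.67 times too fast on a 50 MHz board, which is why the values have to be patched.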

But after that: bingo, success!

$ picocom -b 19200 /dev/cu.usbserial-00002414 
picocom v2.1
Terminal ready
Hello World!!!
Dec value: 5678
Hex value: 0x4D2
Echoing received bytes: 
Thanks for using picocom

There are more complete implementations on FPGAs, such as the MultiComp design I built a while back, which runs CP/M and has a whopping 64 KB of RAM. But this “light” implementation of the Intel 8080 chip is a nice illustration of how small these designs can be and how microcode works.

Weblog © Jean-Claude Wippler. Generated by Hugo.