
Serial bridge

by Thorsten von Eicken

The main purpose of the esp-bridge is to serve as a serial bridge: it bridges a TCP connection to the serial port. This allows programs that reprogram AVR and ARM processors to do their work remotely by connecting to the processor via TCP instead of via an attached serial cable or a USB cable with a serial converter chip like an FTDI FT232R. The serial bridge also allows users to connect to the attached processor using a terminal program in order to see debug output or to send input keystrokes.

Any of these use cases starts by running an application on a laptop (or another Linux or Windows computer) that opens a TCP connection to the esp-bridge. For example, my Makefile for Arduino sketches ends up invoking avrdude like this:

/home/arduino-1.0.5/hardware/tools/avrdude \
    -C /home/arduino-1.0.5/hardware/tools/avrdude.conf -DV -patmega328p \
    -Pnet:esp-bridge:23 -carduino -b115200 -U flash:w:my_sketch.hex:i

Notice the -Pnet:esp-bridge:23 option, which directs avrdude to open a network connection to port 23 of host ‘esp-bridge’ instead of a local serial port. For this to work, the first piece necessary on the esp-bridge is a listener, that is, something that listens for TCP connections on port 23.

Before diving into the code, a word about names: esp-bridge is the specific prototype board being described in these episodes, and esp-link, which we’ll come to in a moment, is the software/firmware running on esp-bridge that implements all the features. Esp-link can also be used on other boards similar to the esp-bridge.

Creating a TCP server

The key lines of code to set up a listener for port 23 are:

static struct espconn serbridgeConn;     // descriptor for the listener
static esp_tcp serbridgeTcp;             // TCP-specific part of the descriptor
serbridgeConn.type = ESPCONN_TCP;
serbridgeConn.state = ESPCONN_NONE;
serbridgeTcp.local_port = port;
serbridgeConn.proto.tcp = &serbridgeTcp;
espconn_regist_connectcb(&serbridgeConn, serbridgeConnectCb);  // called for each new connection
espconn_accept(&serbridgeConn);                                // creates the listener
espconn_tcp_set_max_con_allow(&serbridgeConn, MAX_CONN);
espconn_regist_time(&serbridgeConn, SER_BRIDGE_TIMEOUT, 0);

(The code can be found on GitHub; the link points to the release 0.9.5 tag of esp-link. By the time you read this there are likely to be newer releases out.)

The lines before the first function call set up a socket descriptor for a TCP listener. The first call, to espconn_regist_connectcb, adds a callback function to the socket descriptor, and then the call to espconn_accept creates the listener socket itself. Readers familiar with the BSD socket API will notice that this call in the Espressif library, like many others, is misnamed: it should really have been called listen instead of accept. Sigh.

The following two calls set the maximum number of simultaneous connections to MAX_CONN (which is set to 4 elsewhere) and the timeout for inactive connections.
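For readers who know the BSD socket API better than the Espressif one, here is a minimal sketch of the equivalent listener set-up on a Linux or similar POSIX machine. It is not part of esp-link; make_listener and its backlog parameter are names made up for this illustration. It shows why "listen" would have been the better name: the work espconn_accept does corresponds roughly to bind plus listen, while the BSD accept call corresponds to the connect callback firing.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a listening TCP socket on the given port.
 * Returns the socket descriptor, or -1 on failure.
 * Pass port 0 to let the OS pick a free port. */
int make_listener(uint16_t port, int backlog)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* allow quick restarts while old connections linger in TIME_WAIT */
    int yes = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof yes);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    /* bind() + listen() play roughly the role of espconn_accept() */
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, backlog) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Calling make_listener(23, 4) would mirror the esp-link set-up, although binding to port 23 typically needs elevated privileges on a desktop OS.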

To understand what this code does, a little detour into the threading model is needed. The difficulty with network communication and bidirectional TCP communication is that multiple tasks need to be handled in parallel. For example, packets with data may arrive at any time on the TCP connection and need to be received and pushed into the UART. While this is happening, additional packets with more data may arrive and need to be queued. In addition, data can start coming in on the UART at the same time. It needs to be collected, and sent over the TCP connection. Furthermore, additional TCP connections can be established at the same time and add more concurrent activity to the mix.

In many programming environments this concurrency can be handled by using multiple processes or multiple threads. However, that is not a good idea on the esp8266 because, among other reasons, memory is very limited and the lwIP library is not multithreaded. Instead, a callback mechanism is used where all activity runs on a single thread and gets interleaved by the communication library calling back into the user code whenever something needs doing. Thus there isn’t really any “main loop” in the user program. Instead, the main program sets up callbacks, like the above, that the Espressif SDK communications code calls.

To come back to the above code snippet: it sets up one callback that gets invoked when a new TCP connection is established. That callback does a little book-keeping and sets up 4 additional callbacks:

  • one to receive an incoming packet (really an incoming byte buffer, since this is TCP)
  • one to be notified when the send buffer can accept more data
  • one to be notified when the connection is terminated cleanly
  • one to be notified if the connection is lost due to an error (this one is confusingly called recon by Espressif even though no reconnection occurs; presumably they mean “hey, you need to reconnect”)

The result of all these callbacks is that there is neither a main loop nor multiple threads. Instead the code is much more event-driven and the appropriate functions are called back by the SDK at the right time. All the callbacks occur on the same thread, which eliminates the need for locks or similar concurrency control mechanisms.
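The run-to-completion model can be sketched in a few lines of plain C. This is only a toy model of what the SDK does internally, not its actual code: post_event, run_pending, record and the queue length are all names and values invented for this illustration. The point it shows is that callbacks run strictly one after another, so no locks are needed.

```c
#include <stddef.h>

#define QUEUE_LEN 8   /* arbitrary size for this sketch */

typedef void (*handler_fn)(int arg);

struct event {
    handler_fn fn;
    int arg;
};

static struct event queue[QUEUE_LEN];
static size_t head, count;

/* Schedule a callback; returns 0 on success, -1 if the queue is full. */
int post_event(handler_fn fn, int arg)
{
    if (count == QUEUE_LEN)
        return -1;
    size_t tail = (head + count) % QUEUE_LEN;
    queue[tail].fn = fn;
    queue[tail].arg = arg;
    count++;
    return 0;
}

/* Run queued callbacks in order, one at a time; returns how many ran. */
int run_pending(void)
{
    int ran = 0;
    while (count > 0) {
        struct event ev = queue[head];
        head = (head + 1) % QUEUE_LEN;
        count--;
        ev.fn(ev.arg);   /* runs to completion before the next one starts */
        ran++;
    }
    return ran;
}

/* Example handler: records the order in which callbacks ran. */
int trace[QUEUE_LEN];
int trace_len;

void record(int arg)
{
    if (trace_len < QUEUE_LEN)
        trace[trace_len++] = arg;
}
```

Posting record twice with arguments 1 and 2 and then calling run_pending leaves 1 and 2 in trace, in that order. A slow handler delays everything queued behind it, which is exactly the concern the next section deals with.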

Bridging from TCP to the UART

It turns out that receiving is the simplest to implement. The receive callback is invoked when a buffer of characters is received. Its main task is to write the buffer character by character to the UART using programmed I/O. Nothing fancy: just wait until the UART transmit FIFO has a free slot and write the next character to it.

How can this possibly work and what about the concurrency? This activity indeed blocks the entire application on the task of feeding the UART at whatever rate it can consume because the SDK cannot make any callbacks for anything else: only one callback at a time. The reason this works is two-fold: the UART isn’t as slow as one might think and TCP has flow-control.

Typical UART rates are 115200 baud and higher, so the time to push out up to 1460 bytes (the maximum payload of a single TCP packet here) is around 130 milliseconds at 115200 baud, but the typical packets sent by avrdude are around 80 characters long, which takes only 7 milliseconds. In addition, the esp8266 has a cool UART FIFO of 126 characters, which means that such short packets can be pushed into the UART in microseconds without having to wait for any of them actually being transmitted.
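The timing figures are easy to check. The sketch below assumes the common 8N1 framing (one start bit, eight data bits, one stop bit, so 10 bits on the wire per character); uart_tx_ms is a name made up for this illustration.

```c
/* Milliseconds needed to shift `bytes` characters out of a UART.
 * Assumes 8N1 framing: 1 start + 8 data + 1 stop = 10 bits per character. */
double uart_tx_ms(unsigned bytes, unsigned baud)
{
    const unsigned bits_per_char = 10;
    return 1000.0 * bytes * bits_per_char / baud;
}
```

At 115200 baud, uart_tx_ms(80, 115200) comes out just under 7 ms and uart_tx_ms(1460, 115200) at roughly 127 ms, in line with the numbers above.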

The TCP flow-control argument is trickier than one might think and worth spending some time on. Flow-control starts in the TCP SYN and SYN-ACK request and response packets that open a TCP connection. In those packets the two parties establish the maximum packet size and the number of bytes of receive buffer that each has allocated (this is called the window size in TCP). Thus, when a laptop running avrdude opens a connection to the esp-bridge, the latter’s TCP stack responds with a window size. Here is how this looks on the wire using tcpdump:

18:39:49.475468 IP laptop.39944 > esp-bridge.telnet:
  Flags [S], seq 194149762, win 29200, options
  [mss 1460,sackOK,TS val 2851915984 ecr 0,nop,wscale 7], length 0

18:39:49.484865 IP esp-bridge.telnet > laptop.39944:
  Flags [S.], seq 623791, ack 194149763, win 5840, options
  [mss 1460], length 0

The first packet is the SYN “[S]” sent from the laptop to the esp-bridge indicating a max segment size of 1460 bytes (“mss 1460”) and a receive window of 29200 bytes (“win 29200”). The “wscale 7” option offers to left-shift later window values by 7 bits, but the window field of a SYN itself is never scaled, and because the esp8266’s reply carries no wscale option, scaling stays off for this connection. The second packet shows that, unlike the laptop with plenty of memory, the esp8266’s network stack sings a different song and indicates the same maximum segment size but a receive window of only 5840 bytes. That’s simply because there just isn’t as much memory to go around. The 5840-byte window size means that each connection may require up to 5840 bytes of memory! Since this adds up quickly, the code quoted above sets a limit of 4 simultaneous TCP connections.
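To put numbers on "adds up quickly", here is a small sketch of the arithmetic. Both function names are invented for this illustration, and the result is a worst-case estimate under the assumption that every connection's window fills up at the same time, not a measurement of what the esp8266 stack actually allocates.

```c
#include <stdint.h>

/* Worst-case bytes of receive buffering if every connection's
 * advertised window fills up at the same time. */
uint32_t rx_window_budget(uint32_t window_bytes, uint32_t max_conn)
{
    return window_bytes * max_conn;
}

/* How many full-size segments fit into one advertised window. */
uint32_t segments_per_window(uint32_t window_bytes, uint32_t mss)
{
    return window_bytes / mss;
}
```

With the values from the trace, rx_window_budget(5840, 4) is 23360 bytes and segments_per_window(5840, 1460) is 4, so the advertised window is exactly four full-size segments.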

But back to the flow-control: the window size means that if the laptop has oodles of data to send to the esp-bridge and the latter spends 130 milliseconds pushing the first 1460 bytes into the UART then the laptop must stop sending after 5840 bytes and wait for an acknowledgment (ACK), which the esp8266’s TCP stack only sends once the callback for the first packet returns. The net effect is that the rate at which the laptop sends data is gated by the ACKs which only happen at the rate at which the UART can actually transmit characters. But the esp8266 needs to have enough memory to store the bytes “in flight”.
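The gating effect can be illustrated with a toy simulation. This is a deliberately simplified model, not how a real TCP stack behaves in detail: it assumes the sender always fills the window, that one segment is acknowledged each time the receive callback returns, and that nothing is lost. peak_in_flight is a name made up for this sketch.

```c
/* Simulate a sender that may have at most `window` unacknowledged bytes
 * outstanding, while the receiver acknowledges one segment at a time
 * (standing in for "the receive callback returned").
 * Returns the peak number of bytes in flight, i.e. the memory the
 * receiver must be able to buffer. Returns 0 for invalid parameters. */
unsigned peak_in_flight(unsigned total, unsigned window, unsigned mss)
{
    if (window == 0 || mss == 0)
        return 0;

    unsigned sent = 0, acked = 0, peak = 0;
    while (acked < total) {
        /* sender: transmit segments while the window has room */
        while (sent < total && sent - acked < window) {
            unsigned room = window - (sent - acked);
            unsigned seg = total - sent;
            if (seg > mss)
                seg = mss;
            if (seg > room)
                seg = room;
            sent += seg;
        }
        if (sent - acked > peak)
            peak = sent - acked;

        /* receiver: one segment has gone out to the UART, ACK it */
        unsigned done = sent - acked;
        if (done > mss)
            done = mss;
        acked += done;
    }
    return peak;
}
```

However much data the laptop wants to send, peak_in_flight(100000, 5840, 1460) never exceeds 5840: the sender stalls until the receiver has drained a segment.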

Bridging from the UART to TCP

Bridging from the UART to TCP is a little more involved because something needs to produce callbacks! In the TCP-to-UART flow the SDK has the logic to invoke a callback when a packet is received. Now a mechanism is needed to do something similar when characters are received on the UART. The mechanism used in esp-link has three pieces: an interrupt routine, a UART-specific callback that empties the receive FIFO, and a TCP-specific transmit callback.

The interrupt routine can be found in the uart driver and is kept as simple as possible: it checks the interrupt cause and if characters have indeed been received it calls the SDK to schedule a callback using system_os_post. This does the same thing the TCP stack does when it receives a packet: it tells the SDK to make a callback into user code whenever the prior callback finishes.

The callback goes to another routine in the uart driver which empties the UART receive FIFO into a buffer and then calls another callback with that buffer. The two levels of callbacks are there to separate the code: the uart driver is in no way specific to the bridging function, and it can be configured to call something completely different (the callback registration is in the esp-link source).
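The separation between driver and consumer can be sketched as follows. All names here (uart_set_recv_cb, uart_deliver, count_bytes) are hypothetical and chosen for this illustration; they are not the identifiers used in esp-link. The driver side only knows a function pointer, so the bridge is just one possible consumer.

```c
#include <stddef.h>

/* Signature of the consumer that receives drained UART bytes. */
typedef void (*uart_recv_cb)(const char *buf, size_t len);

static uart_recv_cb recv_cb;   /* NULL until a consumer registers */

/* Register (or replace) the consumer. */
void uart_set_recv_cb(uart_recv_cb cb)
{
    recv_cb = cb;
}

/* Stand-in for the driver routine that runs after the interrupt has
 * scheduled it: hands the drained bytes to the registered consumer.
 * Returns the number of bytes delivered. */
size_t uart_deliver(const char *buf, size_t len)
{
    if (recv_cb == NULL || len == 0)
        return 0;
    recv_cb(buf, len);
    return len;
}

/* Example consumer: counts bytes where a bridge would send them on. */
size_t bridged_bytes;

void count_bytes(const char *buf, size_t len)
{
    (void)buf;
    bridged_bytes += len;
}
```

Swapping count_bytes for a different function changes what happens to received characters without touching the driver side at all.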

The second callback takes the buffer of received characters and transmits it on the TCP connection. *The* TCP connection? Not exactly! What happens if multiple clients connect to port 23? Is that even useful? It turns out to be useful for a number of reasons. It allows one connection to remain open for long periods as a terminal and other connections to be opened short-term to upload a fresh program. Or one connection may be stuck for a variety of reasons, and then it’s nice to be able to continue working with another one and not be blocked until the first one actually closes or times out. As a result, the best course of action is for the second callback to send the received characters to all currently open connections.
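The fan-out itself is just a loop over the connection table. The sketch below uses an invented struct conn_slot and takes the send function as a parameter so it can be tried without any network stack; the MAX_CONN value of 4 is the limit mentioned earlier, everything else is an assumption of this illustration.

```c
#include <stddef.h>

#define MAX_CONN 4

/* Sends len bytes on one connection; returns 0 on success. */
typedef int (*send_fn)(int conn_id, const char *buf, size_t len);

struct conn_slot {
    int in_use;   /* non-zero while a client is connected */
    int id;       /* whatever identifies the connection */
};

/* Send buf to every open connection; returns how many sends succeeded. */
int send_to_all(const struct conn_slot *slots, size_t n,
                send_fn do_send, const char *buf, size_t len)
{
    int ok = 0;
    for (size_t i = 0; i < n; i++) {
        if (!slots[i].in_use)
            continue;               /* skip free slots */
        if (do_send(slots[i].id, buf, len) == 0)
            ok++;
    }
    return ok;
}

/* Example sender for trying the loop out: always succeeds. */
int fake_send(int conn_id, const char *buf, size_t len)
{
    (void)conn_id;
    (void)buf;
    (void)len;
    return 0;
}
```

A failed send on one connection does not stop delivery to the others, which matches the goal of not letting one stuck client block the rest.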

This whole design raises two concerns: how many interrupts and tiny packets does this generate and what happens with TCP flow-control?

On the interrupt front the esp8266’s UART comes to the rescue. It has a 128-byte receive FIFO and thus does not need to generate an interrupt for every character. In addition, it has a built-in timer that can trigger an interrupt when the FIFO is non-empty and nothing has been received for a while. This is used in the UART driver to set up two interrupts: one when the FIFO holds 80 characters (i.e. not quite full yet, so there’s some headroom for the callback to occur) and one when the FIFO holds at least one character and nothing has been received for 4 character periods. What this means is that an interrupt only occurs when there is a decent number of characters in the FIFO or the attached processor “has nothing further to say”. Generating an interrupt and sending a packet even with just one or two characters is important, for example, during the programming sequence: avrdude sends a block of bytes to be programmed and then waits until the bootloader sends a short ACK back.
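The two interrupt conditions can be written down as a small predicate. This models the behaviour described above in plain C rather than programming the actual UART registers; the function names are invented, and the timeout calculation assumes 8N1 framing (10 bits per character).

```c
#include <stdbool.h>

#define RX_FULL_THRESHOLD  80   /* FIFO fill level that triggers an interrupt */
#define RX_TIMEOUT_CHARS    4   /* idle time, in character periods */

/* True if either condition for a receive interrupt is met. */
bool rx_interrupt_due(unsigned fifo_count, unsigned idle_char_periods)
{
    if (fifo_count >= RX_FULL_THRESHOLD)
        return true;                       /* plenty of data waiting */
    return fifo_count > 0 &&
           idle_char_periods >= RX_TIMEOUT_CHARS;  /* sender went quiet */
}

/* Idle timeout in microseconds at a given baud rate (8N1 assumed). */
unsigned rx_timeout_us(unsigned baud)
{
    return RX_TIMEOUT_CHARS * 10u * 1000000u / baud;
}
```

At 115200 baud rx_timeout_us returns 347, so a short reply such as a bootloader ACK is forwarded after about a third of a millisecond of silence.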

Starting the programming sequence

There is a final piece that the earlier description of the TCP receive callback omitted, which is how to issue the reset to place an AVR processor into programming mode or an ISP-plus-reset sequence in the case of an attached ARM processor.

This is done by looking at the first few characters received on a fresh TCP connection; if they correspond to those first sent by avrdude or an ARM programmer, then the appropriate GPIO pins are toggled.
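A minimal sketch of such a check follows. The byte patterns are assumptions made for this illustration: "0 " is the sync request of the STK500v1 protocol that avrdude speaks with the arduino programmer type, and "?" plus a line ending is assumed here as the opening probe of an ARM serial programmer. The exact patterns esp-link compares against should be looked up in its source; classify_first_packet and the enum are made-up names.

```c
#include <stddef.h>
#include <string.h>

enum reset_kind { RESET_NONE, RESET_AVR, RESET_ARM_ISP };

/* Decide which reset sequence, if any, the first bytes on a fresh
 * connection call for. */
enum reset_kind classify_first_packet(const char *data, size_t len)
{
    /* STK500v1 sync request: '0' (0x30) followed by ' ' (0x20) */
    if (len == 2 && memcmp(data, "0 ", 2) == 0)
        return RESET_AVR;

    /* assumed ARM serial-programmer probe: '?' plus a line ending */
    if ((len == 2 && memcmp(data, "?\n", 2) == 0) ||
        (len == 3 && memcmp(data, "?\r\n", 3) == 0))
        return RESET_ARM_ISP;

    return RESET_NONE;   /* ordinary terminal traffic: leave the pins alone */
}
```

Anything that does not match, such as a keystroke from a terminal program, leaves the attached processor running undisturbed.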

The receive callback has further logic to allow the reset and ISP GPIO pins to be toggled under direct control of the remote program. This uses the rather ancient telnet protocol, which defines a few simple escape sequences to send out-of-band commands for this purpose.
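The basic telnet escape mechanism is simple: the byte 255 ("interpret as command", IAC) introduces a command, and a literal 255 in the data is sent as two 255 bytes. The sketch below separates data from commands under strong simplifying assumptions: it only handles two-byte commands and the doubled escape, ignores option negotiation and subnegotiation, and does not carry state across buffers. Which commands esp-link maps to which pin is not shown here; telnet_strip is a name made up for this illustration.

```c
#include <stddef.h>

#define TELNET_IAC 255   /* "interpret as command" escape byte */

/* Copy in[] to out[], turning IAC IAC into one literal 0xFF data byte
 * and dropping two-byte "IAC <command>" sequences.
 * out must have room for in_len bytes. Returns the number of data bytes;
 * *commands receives the number of command sequences that were dropped. */
size_t telnet_strip(const unsigned char *in, size_t in_len,
                    unsigned char *out, size_t *commands)
{
    size_t n = 0;
    *commands = 0;
    for (size_t i = 0; i < in_len; i++) {
        if (in[i] != TELNET_IAC) {
            out[n++] = in[i];              /* ordinary data byte */
        } else if (i + 1 < in_len) {
            i++;                           /* consume the byte after IAC */
            if (in[i] == TELNET_IAC)
                out[n++] = TELNET_IAC;     /* escaped 0xFF data byte */
            else
                (*commands)++;             /* a real bridge would act on it */
        }
        /* a lone IAC at the very end is dropped in this sketch */
    }
    return n;
}
```

A real implementation would act on the command byte at the point where this sketch merely counts it, for example by toggling the reset pin.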

Wrap-up

The functionality of the serial UART to TCP bridge is deceptively simple. The code to implement it is also relatively straightforward, but if one stops to think about why it actually stands a chance of working, the details are less than simple! This is not to say that the esp-bridge code is a masterpiece, however! Far from it: it could definitely be improved and optimized!

The next episode will discuss the other major part of the firmware which is the web server that allows the esp-bridge to be configured.
