Most (all?) STM32 microcontrollers have a built-in hardware unit for calculating CRC checksums. From Wikipedia:

A cyclic redundancy check (CRC) is an error-detecting code commonly used in digital networks and storage devices to detect accidental changes to raw data. Blocks of data entering these systems get a short check value attached, based on the remainder of a polynomial division of their contents. On retrieval, the calculation is repeated and, in the event the check values do not match, corrective action can be taken against data corruption.

The STM32 hardware will calculate a CRC32 based on the standard Ethernet polynomial (0x04C11DB7) much faster than a software implementation: it processes 1 byte per system clock cycle, i.e. up to 168 MB/s on an STM32F407 running at 168 MHz.
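To make the "remainder of a polynomial division" concrete, here is a minimal bit-by-bit sketch of such a CRC32 in portable C++. It assumes the parameters suggested by the rest of this article: polynomial 0x04C11DB7, a start value of 0xFFFFFFFF, data fed in as 32-bit words with the most significant bit first, and no final inversion. The names crc32Word and crc32Words are made up for this illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Feed one 32-bit word into the CRC register, most significant bit first.
// Assumed parameters: polynomial 0x04C11DB7, no bit reflection, no final XOR.
uint32_t crc32Word (uint32_t crc, uint32_t word) {
    crc ^= word;
    for (int bit = 0; bit < 32; ++bit) {
        // when the top bit drops out, subtract (XOR) the polynomial
        if (crc & 0x80000000U)
            crc = (crc << 1) ^ 0x04C11DB7U;
        else
            crc <<= 1;
    }
    return crc;
}

// CRC over a buffer of words, starting from the assumed reset value.
uint32_t crc32Words (const uint32_t* words, size_t count) {
    uint32_t crc = 0xFFFFFFFFU;
    for (size_t i = 0; i < count; ++i)
        crc = crc32Word(crc, words[i]);
    return crc;
}
```

This takes 32 loop iterations per word, so it is only useful as a reference to compare against, not as a fast implementation.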

Hardware CRC

This STM32F407 example enables the CRC unit and calculates the CRC32 of 12 bytes, fed in as three 32-bit words:

void hardwareCrcDemo () {
    Periph::bit(Periph::rcc+0x30, 12) = 1;  // CRCEN, p.181
    constexpr uint32_t crc = 0x40023000;

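    MMIO32(crc+0x08) = 1; // CRC_CR: reset the unit, CRC_DR becomes 0xFFFFFFFF
    printf("%08x\n", MMIO32(crc));

    MMIO32(crc) = 0x12345678; // each write to CRC_DR feeds in one 32-bit word
    MMIO32(crc) = 0xDEADBEEF;
    MMIO32(crc) = 0x87654321;
    printf("%08x\n", MMIO32(crc)); // reading CRC_DR returns the CRC so far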
}

Sample output:

FFFFFFFF
BC462F4F

Some notes and observations:

This particular CRC32 (polynomial 0x04C11DB7) also has the interesting property that feeding the current CRC value back in as the next data word always results in 0x00000000:

void hardwareCrcDemo2 () {
    Periph::bit(Periph::rcc+0x30, 12) = 1;  // CRCEN, p.181
    constexpr uint32_t crc = 0x40023000;

    MMIO32(crc+0x08) = 1; // reset
    printf("%08x\n", MMIO32(crc));

    MMIO32(crc) = 0x12345678;
    MMIO32(crc) = 0xDEADBEEF;
    MMIO32(crc) = 0x87654321;
    printf("%08x\n", MMIO32(crc));

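    MMIO32(crc) = MMIO32(crc); // feed the current CRC back in as data
    printf("%08x\n", MMIO32(crc));

    MMIO32(crc) = 0; // the CRC is zero at this point, so this feeds it back in again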
    printf("%08x\n", MMIO32(crc));
}

Sample output:

FFFFFFFF
BC462F4F
00000000
00000000

This is very convenient: a sender can send a packet of data followed by its checksum, and the receiver can simply re-calculate the checksum over both - if the result is zero, then that final checksum is correct.
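The sender/receiver scheme can be sketched in portable C++ as follows. This is only an illustration: makeFrame and frameIsValid are hypothetical names, the frame is modelled as a vector of 32-bit words, and crc32Word is a bit-by-bit stand-in for the hardware unit, assuming polynomial 0x04C11DB7, start value 0xFFFFFFFF, and no final inversion.

```cpp
#include <cstdint>
#include <vector>

// Bit-by-bit stand-in for one write to the CRC unit (assumed parameters:
// polynomial 0x04C11DB7, most significant bit first, no final XOR).
uint32_t crc32Word (uint32_t crc, uint32_t word) {
    crc ^= word;
    for (int bit = 0; bit < 32; ++bit)
        crc = (crc & 0x80000000U) ? (crc << 1) ^ 0x04C11DB7U : (crc << 1);
    return crc;
}

// Sender: append the CRC of the payload as one extra word.
std::vector<uint32_t> makeFrame (const std::vector<uint32_t>& payload) {
    uint32_t crc = 0xFFFFFFFFU;
    for (uint32_t w : payload)
        crc = crc32Word(crc, w);
    std::vector<uint32_t> frame = payload;
    frame.push_back(crc);
    return frame;
}

// Receiver: run the CRC over payload and checksum alike, zero means intact.
bool frameIsValid (const std::vector<uint32_t>& frame) {
    uint32_t crc = 0xFFFFFFFFU;
    for (uint32_t w : frame)
        crc = crc32Word(crc, w);
    return crc == 0;
}
```

The receiver does not need to know where the payload ends and the checksum begins: it just feeds every received word through the same calculation and compares the result to zero.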

But also note that just after a CRC calculation returns zero, additional zeros will not change that CRC value. That’s because appending the CRC so far (i.e. zero) to the end leads to a new CRC which will again be zero.
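A small sketch of this effect, again using a bit-by-bit stand-in for the hardware (assumed parameters: polynomial 0x04C11DB7, start value 0xFFFFFFFF, no final inversion). The function residueWithPadding is a made-up helper: it feeds in a payload, then the payload's own CRC, then a number of extra zero words, and returns what is left in the CRC register.

```cpp
#include <cstddef>
#include <cstdint>

// Bit-by-bit stand-in for one write to the CRC unit (assumed parameters:
// polynomial 0x04C11DB7, most significant bit first, no final XOR).
uint32_t crc32Word (uint32_t crc, uint32_t word) {
    crc ^= word;
    for (int bit = 0; bit < 32; ++bit)
        crc = (crc & 0x80000000U) ? (crc << 1) ^ 0x04C11DB7U : (crc << 1);
    return crc;
}

// CRC register after the payload, its own CRC, and `padding` zero words.
uint32_t residueWithPadding (const uint32_t* payload, size_t count, size_t padding) {
    uint32_t crc = 0xFFFFFFFFU;
    for (size_t i = 0; i < count; ++i)
        crc = crc32Word(crc, payload[i]);
    crc = crc32Word(crc, crc);      // append the checksum: register is now zero
    for (size_t i = 0; i < padding; ++i)
        crc = crc32Word(crc, 0);    // zero data into a zero register stays zero
    return crc;
}
```

The result is zero for any amount of padding, so the zero check alone cannot tell a valid packet from the same packet followed by extra zero words. The receiver has to know the packet length by some other means.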

Software CRC

Software implementations of CRC32 are easy to find, e.g. this one from GNU Radio. It can be wrapped in a C++ class for a nice API (note the extra 256-word lookup table):

class CRC32 {
    uint32_t crc;

public:
    CRC32 () { reset(); }

    void reset () { crc = 0xFFFFFFFF; }

    operator uint32_t () const { return crc; }

    void update (uint32_t val) {
        static const uint32_t crcTab [256] = {
            0x00000000U, 0x04C11DB7U, 0x09823B6EU, 0x0D4326D9U,
            // etc...
            0xBCB4666DU, 0xB8757BDAU, 0xB5365D03U, 0xB1F740B4U,
        };

        // process the word one byte at a time, most significant byte first
        for (int i = 0; i < 4; ++i) {
            crc = crcTab[(val^crc)>>24] ^ (crc<<8);
            val <<= 8;
        }
    }
};
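The lookup table is abbreviated in the listing above. As a sketch, it can also be generated instead of typed in: entry i is what remains after pushing byte i through eight shift-and-XOR steps with polynomial 0x04C11DB7. The function name makeCrcTable is hypothetical, and std::array is used here only to be able to return the table by value.

```cpp
#include <array>
#include <cstdint>

// Build the 256-entry table for polynomial 0x04C11DB7, most significant bit first.
std::array<uint32_t, 256> makeCrcTable () {
    std::array<uint32_t, 256> table {};
    for (uint32_t i = 0; i < 256; ++i) {
        uint32_t c = i << 24;       // place the byte in the top 8 bits
        for (int bit = 0; bit < 8; ++bit)
            c = (c & 0x80000000U) ? (c << 1) ^ 0x04C11DB7U : (c << 1);
        table[i] = c;
    }
    return table;
}
```

The first and last entries produced this way match the values shown in the listing (0x00000000, 0x04C11DB7, ... 0xB1F740B4). Generating the table at startup trades 1 KB of flash for 1 KB of RAM and a bit of initialisation time.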

Here is the same calculation, using the software approach:

void softwareCrcDemo () {
    CRC32 crc;

    crc.update(0x12345678);
    crc.update(0xDEADBEEF);
    crc.update(0x87654321);
    printf("%08x\n", (uint32_t) crc);

    crc.update(crc);
    printf("%08x\n", (uint32_t) crc);
}

And sure enough, the results are identical. The software version also takes 16 times as many clock cycles as the hardware CRC unit.

References