I have spent the last four years writing C for Linux: DPDK packet processing, SPDK storage, kernel networking. In that world memory is abundant, the OS handles scheduling, and if you need a buffer you just ask for one. Then I picked up an STM32 and most of my assumptions stopped applying.

The malloc shock

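The first thing you notice is that malloc is effectively gone. The C library usually still ships it, but most bare-metal codebases ban it outright. On a microcontroller with 20 KB of RAM and no MMU, dynamic allocation is a reliability disaster waiting to happen: there is no virtual memory to hide fragmentation, and a heap that fragments on a device running for months without a reset will eventually fail an allocation you cannot recover from.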

The embedded answer is static allocation. Every buffer, every queue, every data structure is declared at compile time with a fixed size. The linker tells you exactly how much RAM your data occupies before you flash it; only the stack is left to reason about.

// Linux mindset
char *buf = malloc(256);

// Embedded mindset
static char buf[256];  // lives forever, size known at compile time
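
The same idea scales from a single buffer to a whole data structure. Below is a minimal sketch of a byte queue whose storage is entirely static. The names (queue_push, queue_pop, QUEUE_CAPACITY) and the capacity of 16 are made up for illustration. The point is that a full queue returns false rather than asking for more memory.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_CAPACITY 16u  /* hypothetical size; must be a power of two */

/* All storage is static: it lands in .bss and shows up in the linker map. */
static uint8_t  queue_storage[QUEUE_CAPACITY];
static uint32_t queue_head;  /* total bytes pushed so far */
static uint32_t queue_tail;  /* total bytes popped so far */

/* Returns false when the queue is full; nothing is ever allocated. */
bool queue_push(uint8_t byte) {
    if (queue_head - queue_tail == QUEUE_CAPACITY) {
        return false;
    }
    queue_storage[queue_head % QUEUE_CAPACITY] = byte;
    queue_head++;
    return true;
}

/* Returns false when the queue is empty. */
bool queue_pop(uint8_t *out) {
    if (queue_head == queue_tail) {
        return false;
    }
    *out = queue_storage[queue_tail % QUEUE_CAPACITY];
    queue_tail++;
    return true;
}

size_t queue_count(void) {
    return (size_t)(queue_head - queue_tail);
}
```

The caller has to decide what a full queue means (drop, overwrite, report an error), which is exactly the decision malloc lets you postpone on Linux.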

The stack is everything

On Linux, if you blow the stack you hit a guard page and the kernel kills the process with a segfault. On STM32, unless you have configured the MPU to guard the stack, you just corrupt whatever sits below it in memory (usually your global variables) and the program silently misbehaves. There is no safety net.

This makes you think carefully about stack depth. Deep recursion is out. Large local arrays are suspicious. You develop an instinct for how much stack each function frame uses.
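
Instinct can be backed by measurement. One common technique is stack painting: fill the stack region with a known pattern at startup, then later count how much of the pattern survives. The sketch below is written against a plain array so it can run anywhere. On a real target the region bounds would come from linker script symbols, whose names vary by project. The pattern value and function names are arbitrary choices of mine.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_PAINT 0xDEADBEEFu  /* arbitrary fill pattern */

/* Fill a region with the pattern. On a target this would run early in
 * startup over the part of the stack that is not yet in use. */
void stack_paint(uint32_t *lowest, size_t words) {
    for (size_t i = 0; i < words; i++) {
        lowest[i] = STACK_PAINT;
    }
}

/* Count words, starting from the lowest address, that still hold the
 * pattern. The stack grows downward on Cortex-M, so these words were
 * never reached: this is the remaining headroom. */
size_t stack_unused_words(const uint32_t *lowest, size_t words) {
    size_t n = 0;
    while (n < words && lowest[n] == STACK_PAINT) {
        n++;
    }
    return n;
}
```

If the headroom ever reads zero, the stack has at least touched its limit and may already have gone past it.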

You are the scheduler

With no RTOS, your main loop IS the scheduler. Everything runs sequentially unless an interrupt fires. This is called a superloop and it looks embarrassingly simple:

int main(void) {
    hardware_init();
    while (1) {
        read_sensors();
        update_state();
        drive_outputs();
    }
}

The trap is thinking this is naive. For a lot of embedded applications it is exactly right. Complexity only pays when you need real preemption — when one task genuinely cannot wait for another to finish.
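
A superloop can still run things at different rates without preemption. Here is a minimal sketch of one pass of such a loop, assuming a millisecond counter called tick_ms that something else increments (on a Cortex-M part that is typically the SysTick interrupt). The task_t type, the function names, and the stand-in tasks that only count their calls are all hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical millisecond counter, assumed to be incremented by a timer
 * interrupt; volatile because it changes outside the main loop. */
volatile uint32_t tick_ms;

typedef struct {
    uint32_t period_ms;    /* how often the task should run */
    uint32_t last_run_ms;  /* tick value when it last ran */
    void (*run)(void);
} task_t;

/* Stand-in tasks for illustration: they only count their invocations. */
unsigned sensor_runs, output_runs;
void read_sensors(void)  { sensor_runs++; }
void drive_outputs(void) { output_runs++; }

/* Unsigned subtraction keeps the comparison correct when tick_ms wraps. */
bool task_due(const task_t *t, uint32_t now) {
    return (uint32_t)(now - t->last_run_ms) >= t->period_ms;
}

/* One pass of the superloop: run every task whose period has elapsed. */
void superloop_pass(task_t *tasks, unsigned count) {
    uint32_t now = tick_ms;  /* read the counter once per pass */
    for (unsigned i = 0; i < count; i++) {
        if (task_due(&tasks[i], now)) {
            tasks[i].last_run_ms = now;
            tasks[i].run();
        }
    }
}
```

In main this would sit inside while (1), called over a static array of tasks. It only works if every task returns quickly, because a task that blocks stalls all the others.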

volatile is not a joke

In Linux C I used volatile maybe twice in four years. In embedded it is everywhere. When you write to a hardware register, the compiler has no idea that write has a physical effect. Without volatile it is free to drop the write, merge it with a later one, or reorder it relative to other non-volatile accesses.

// Without volatile — compiler may eliminate this write
*(uint32_t *)0x40020018 = 0x1;

// With volatile — guaranteed to happen
*(volatile uint32_t *)0x40020018 = 0x1;

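The same applies to variables shared between your main loop and an interrupt handler. The interrupt can fire at any time and modify a variable. Without volatile, the compiler may keep the value in a register and your main loop never sees the update. Note that volatile only forces the memory access to happen; it does not make a read-modify-write atomic, so anything more than a single load or store still needs interrupts masked around it.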
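
As a sketch of that pattern, here is a flag and a value handed from an interrupt handler to the main loop. The handler name and its argument are hypothetical: a real handler takes no arguments, must match the name in the startup file's vector table, and would read the value from a peripheral data register.

```c
#include <stdbool.h>
#include <stdint.h>

/* Written by the interrupt handler, read by the main loop. */
volatile bool     sample_ready;
volatile uint16_t sample_value;

/* Hypothetical handler; the argument stands in for a data register read. */
void adc_irq_handler(uint16_t raw) {
    sample_value = raw;
    sample_ready = true;  /* publish the flag after the data */
}

/* Called from the main loop. Copies the sample out if one has arrived. */
bool take_sample(uint16_t *out) {
    if (!sample_ready) {
        return false;
    }
    *out = sample_value;
    sample_ready = false;
    return true;
}
```

volatile makes the main loop re-read sample_ready on every call, but it does not make take_sample atomic. If the interrupt fires between the copy and the clearing of the flag, that newer sample is lost. Whether that matters depends on the application. If it does, the usual fix is to mask the interrupt briefly around those two lines.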

A real example — toggling a GPIO without HAL

HAL hides the registers. Let us not use it. On STM32F4, toggling pin PA5 (the onboard LED on a Nucleo board) looks like this:

#include <stdint.h>

#define RCC_BASE    0x40023800
#define RCC_AHB1ENR (*(volatile uint32_t *)(RCC_BASE + 0x30))

#define GPIOA_BASE  0x40020000
#define GPIOA_MODER (*(volatile uint32_t *)(GPIOA_BASE + 0x00))
#define GPIOA_ODR   (*(volatile uint32_t *)(GPIOA_BASE + 0x14))

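// Crude busy-wait; the real duration depends on clock speed and optimisation level.
void delay(volatile uint32_t n) {
    while (n--) __asm__ volatile ("nop");
}

int main(void) {
    RCC_AHB1ENR |= (1U << 0);       // enable GPIOA clock
    (void)RCC_AHB1ENR;              // dummy read-back: ST's errata asks for a short delay
                                    // after enabling a peripheral clock

    uint32_t moder = GPIOA_MODER;
    moder &= ~(3U << 10);           // clear PA5 mode field (bits 11:10)
    moder |=  (1U << 10);           // 01 = general-purpose output
    GPIOA_MODER = moder;

    while (1) {
        GPIOA_ODR ^= (1U << 5);     // toggle PA5
        delay(1000000);
    }
}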

Every line does exactly one thing to one register. There is no abstraction layer. You learn more from 30 lines like this than from a week of HAL examples.
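
One refinement worth knowing: toggling through ODR is a read-modify-write, so an interrupt that changes another pin on the same port between the read and the write has its change overwritten. The STM32F4 GPIO block has a write-only BSRR register at offset 0x18 that sets or clears individual pins in a single store. The sketch below takes the register address as a parameter so it can be exercised against an ordinary variable. The helper names are mine, not ST's.

```c
#include <stdint.h>

/* BSRR layout on STM32F4: bits 15:0 set a pin, bits 31:16 reset it.
 * Zero bits are ignored, so no read-modify-write is needed. */
static inline uint32_t bsrr_set_bits(unsigned pin) {
    return 1u << pin;
}

static inline uint32_t bsrr_reset_bits(unsigned pin) {
    return 1u << (pin + 16u);
}

/* Drive one pin high or low with a single store to the BSRR register. */
void gpio_write(volatile uint32_t *bsrr, unsigned pin, int level) {
    *bsrr = level ? bsrr_set_bits(pin) : bsrr_reset_bits(pin);
}
```

On the target, the call for the LED would be gpio_write((volatile uint32_t *)(GPIOA_BASE + 0x18), 5, 1) to turn it on and the same call with 0 to turn it off.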

What transfers from Linux, what does not

What transfers: understanding of memory layout, comfort with pointers and bit manipulation, reading datasheets, thinking about cache and alignment, interrupt-driven thinking from signal handlers and epoll.

What does not transfer: the assumption that the OS has your back. On bare metal, nothing catches your mistakes. The hardware does exactly what you tell it — including the wrong things.

That is actually what makes it interesting.