Vignesh Kondapally

Works in Debug, Dies in Release

Part of the series: From Silicon to System — Microcontroller and SoC Architecture Deep Dive


I was working with a feature called Low Power Deep Sleep (LPDS) on a TI mmWave radar SoC. The idea was straightforward: clock-gate portions of memory and various IP blocks between frames to hit aggressive power targets, then bring everything back up on schedule. Simple enough in theory. The catch was that the debugger is disabled in LPDS mode, which made debugging very difficult. I hit a bug where the debug image worked fine and everything looked right: the entry parameters checked out, the device would simulate the sleep cycle and wake cleanly. In release, it would enter LPDS and never come back. I never found the root cause. The schedule moved on, and so did I, but the question never left me. It turns out this class of bug has a name, a set of known causes, and a reason it hides so effectively in debug mode.

Why does this happen?

If you have ever compared the size of a debug image and a release image, you know the debug build is filled with symbols, unoptimized instructions, and extra metadata that make it easier to read and step through. When you debug a release image, the program counter jumps all over the place because the compiler has reordered instructions, inlined functions, and eliminated code it deemed unnecessary, all in the name of a faster and smaller binary. Debugging the debug version is like reading a book; you can follow the program sequentially from start to finish. The release version is the same story, rewritten by someone else. They share the same source, but they are fundamentally two different programs, and that gap is exactly where things go wrong.

The failures tend to fall into three categories: timing, optimization, and memory layout. Each one is subtle on its own. Together, they can produce bugs that are nearly impossible to reproduce — and invisible the moment you attach a debugger.

Timing

On the embedded time scale, the gap between how fast a release build and a debug build run is massive. The difference on any single operation may amount to microseconds or nanoseconds, but on embedded hardware that scale is everything. With a large number of peripherals running at high speed, interrupt-driven architectures where ISR timing is unpredictable, DMA transfers happening concurrently with CPU execution, clock domain crossings between peripherals, and bus contention when several peripherals are active at once (the list goes on), you are prone to timing issues that are hard to find. Run slowly in debug mode, the code might be fine. But what happens when it runs for millions of cycles in the span of minutes? A peripheral flag that wasn't ready when the CPU checked it, a register write that hadn't settled before the next read: it might only happen once every million cycles. But it will happen.
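Here is a minimal sketch of that kind of race. The register addresses, bit definitions, and timeout value are hypothetical placeholders, not from any real datasheet; the pattern is what matters.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical peripheral registers: addresses and bit layout are
 * placeholders, not taken from a real device. */
#define PERIPH_CTRL    (*(volatile uint32_t *)0x40010000u)
#define PERIPH_STATUS  (*(volatile uint32_t *)0x40010004u)
#define CTRL_ENABLE    (1u << 0)
#define STATUS_READY   (1u << 0)
#define READY_TIMEOUT  100000u

/* Intermittent in release: samples the flag exactly once. In a debug
 * build, the unoptimized prologue and redundant loads burn enough
 * cycles after the enable that READY is usually set by the time we
 * read it. A release build reaches the read a few cycles after the
 * write, before the hardware has settled. */
bool periph_start_racy(void)
{
    PERIPH_CTRL = CTRL_ENABLE;
    return (PERIPH_STATUS & STATUS_READY) != 0;
}

/* Robust version: poll with a bounded timeout, so correctness does not
 * depend on how fast the CPU gets from the write to the read. */
bool periph_start(void)
{
    PERIPH_CTRL = CTRL_ENABLE;
    for (uint32_t i = 0; i < READY_TIMEOUT; i++) {
        if (PERIPH_STATUS & STATUS_READY)
            return true;
    }
    return false;  /* hardware never came up; report it instead of hanging */
}
```

The racy version passes every time you single-step it, because you are the delay.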

Optimization

The compiler is not your friend in the way you think it is. Its job is to produce the fastest or smallest correct program it can, and "correct" by its definition is not the same as correct by yours. One of the most common ways this bites embedded developers is a missing volatile keyword. Peripheral registers change state based on hardware events the compiler knows nothing about. If you don't mark them volatile, the compiler sees a variable that never changes from its perspective and is free to cache it, eliminate the read entirely, or optimize away the loop waiting on it. In debug builds this often works by accident because the optimizer is largely off. In release the compiler does exactly what it's allowed to do, and your code breaks in ways that look like magic.
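A minimal sketch of the failure, using a made-up completion flag set from an ISR (the names are illustrative, not from any real SDK):

```c
#include <stdbool.h>

/* Set from an interrupt handler when a transfer completes. Without
 * volatile, the compiler has no reason to believe this variable can
 * change while wait_for_transfer() is running. */
static bool transfer_done = false;             /* broken */
/* static volatile bool transfer_done = false;    <- the fix */

void dma_complete_isr(void)
{
    transfer_done = true;
}

void wait_for_transfer(void)
{
    /* At -O0 this loop re-reads transfer_done from memory on every
     * pass, so it happens to work. At -O2 the compiler may hoist the
     * load out of the loop, turning this into the equivalent of
     * "if (!transfer_done) while (1);", an infinite loop whenever the
     * flag starts out false. */
    while (!transfer_done) {
        /* spin */
    }
}
```

Nothing here is exotic; the optimizer is applying a perfectly legal transformation to a program whose author forgot to state the one fact that made it illegal.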

Beyond volatile, undefined behavior is the other silent killer. Signed integer overflow, uninitialized variables, out-of-bounds reads — in debug these tend to produce the result you expected by accident. In release the compiler treats undefined behavior as a license to do whatever produces the best optimization, including deleting code paths entirely. The check you wrote to protect against a bad condition might simply not exist in the binary.
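Here is a concrete, well-known instance of a deleted check. The function names are illustrative; the behavior of mainstream compilers at -O2 on this pattern is not.

```c
#include <limits.h>

/* Intended as an overflow guard before incrementing a counter. */
int increment_is_safe(int x)
{
    /* If x == INT_MAX, then x + 1 overflows, which is undefined
     * behavior for signed int. The optimizer is therefore entitled to
     * assume x + 1 > x always holds and compile this function down to
     * "return 1;", deleting the guard you thought you wrote. At -O0
     * the wraparound typically happens and the check "works". */
    return x + 1 > x;
}

/* Well-defined version: compare against the limit instead of
 * performing the operation that can overflow. */
int increment_is_safe_fixed(int x)
{
    return x < INT_MAX;
}
```

The difference is invisible in the source; it only shows up in the disassembly, which is exactly why the debug build lies to you.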

Memory

Anyone who has worked in embedded systems knows that memory is never a luxury — it's a constraint you design around from day one. You are doing calculations, defining state machines, and running an entire system in a space so small that every byte has to earn its place. That tension never really goes away; it just becomes part of how you think. And that context is exactly what makes memory layout bugs in release builds so insidious.

Release builds are smaller — sometimes dramatically so. Dead code gets eliminated, functions get inlined, Link-Time Optimization (LTO) reshuffles everything. That might sound like a good thing, and it usually is, but it also means every variable, every function, every stack frame lands at a different address than it did in your debug build. On most microcontrollers there is no MMU, no page fault, nothing to catch you when the stack overflows into adjacent memory. In a debug build the larger binary might place that overflow somewhere harmless. In release the tighter layout puts it directly into your .bss section, your vector table, or a DMA buffer. The corruption is the same — the consequence is different. And it only has to happen once to bring everything down.
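You can't rely on a fault to tell you this happened, but you can make it observable. A common trick, sketched below assuming a descending stack on an ARM core, GCC-style inline assembly, and linker-provided symbols for the stack bounds (symbol names vary by toolchain), is to paint the stack with a known pattern at boot and check the high-water mark later:

```c
#include <stdint.h>

#define STACK_PAINT 0xA5A5A5A5u

/* Assumed to come from the linker script; actual symbol names are
 * toolchain- and project-specific. _stack_bottom marks the lowest
 * address of the stack region, _stack_top the highest. */
extern uint32_t _stack_bottom[];
extern uint32_t _stack_top[];

/* Call as early at boot as possible: fills the not-yet-used stack
 * region below the current stack pointer with a known pattern. */
void stack_paint(void)
{
    uint32_t sp;
    __asm volatile ("mov %0, sp" : "=r" (sp));  /* ARM: read current SP */
    for (uint32_t *p = _stack_bottom; (uint32_t)p < sp; p++)
        *p = STACK_PAINT;
}

/* Call periodically, or from a watchdog hook: counts how many bytes at
 * the bottom of the stack still hold the pattern. If this ever reaches
 * zero, the stack has hit bottom, and whatever sits below it in the
 * release layout (.bss, a DMA buffer, worse) may already be corrupt. */
uint32_t stack_bytes_untouched(void)
{
    const uint32_t *p = _stack_bottom;
    while (p < _stack_top && *p == STACK_PAINT)
        p++;
    return (uint32_t)((uintptr_t)p - (uintptr_t)_stack_bottom);
}
```

Run the check in both builds; if the release margin is thin or gone, you have found your suspect.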

So when your release build crashes and your debug build doesn't, the question isn't whether something changed — it's which of these three forces is responsible. Figuring that out is where the real work begins.

Debugging Without a Net

Debugging a release-only crash is mostly shots in the dark. Your first move should be attaching the debugger and checking for error codes; sometimes you get lucky. When that's not an option, the next best thing is leaving a trail. Send output over UART or SPI at key points in your expected program flow and see where it goes silent. The last message you receive is your crash boundary. It's a slow process, since you're recompiling every time you add or move an output statement, but it's non-invasive, and it works on the actual release build without changing its behavior the way a debugger would.
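A sketch of that trail through a sleep path like the one from earlier. uart_puts and the sleep-path function names are stand-ins for whatever your BSP provides, and the wfi instruction assumes an ARM core:

```c
/* Stand-ins for a blocking UART transmit and the real sleep-path
 * functions from your BSP; all names here are hypothetical. */
void uart_puts(const char *s);
void configure_wakeup_sources(void);
void gate_clocks(void);

/* Breadcrumb macro, cheap enough to sprinkle through a suspect path.
 * Keep the strings short: a slow UART perturbs timing, which matters
 * doubly when the bug you are chasing is itself timing-related. */
#define TRACE(tag) uart_puts("[" tag "]\r\n")

void enter_low_power(void)
{
    TRACE("lp:cfg");   /* last seen here -> crash is after config     */
    configure_wakeup_sources();
    TRACE("lp:gate");  /* last seen here -> crash is in clock gating  */
    gate_clocks();
    TRACE("lp:wfi");   /* never followed by lp:wake -> died asleep    */
    __asm volatile ("wfi");  /* ARM: wait for interrupt */
    TRACE("lp:wake");  /* first sign of life after wakeup             */
}
```

The last tag on the wire is your crash boundary; move the markers inward from there until the silence pins down a single call.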

Beyond that, external tools like oscilloscopes and logic analyzers can give you visibility into what the hardware is doing at the signal level, though setting up meaningful test points takes time of its own. On the software side, changing compiler flags and walking optimization back module by module can help you narrow down whether the issue is optimization-related at all. The honest truth is that there is no clean methodology here. You go back to the design docs, look at each part of the implementation, and vet them one by one. The documentation becomes your map when the debugger can't be. It comes down to staying systematic, being thorough, and accepting that the answer will show itself eventually; you just have to eliminate everything it isn't first.
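Walking optimization back doesn't have to mean rebuilding the whole image at -O0, which can change timing enough to hide the bug again. GCC and Clang both let you deoptimize a single function; a sketch, with suspect_function as a placeholder:

```c
/* Per-function deoptimization. Clang defines __GNUC__ too, so check
 * for it first: Clang supports optnone, GCC the optimize attribute. */
#if defined(__clang__)
#  define DEOPTIMIZE __attribute__((optnone))
#elif defined(__GNUC__)
#  define DEOPTIMIZE __attribute__((optimize("O0")))
#else
#  define DEOPTIMIZE
#endif

/* Mark one suspect at a time while the rest of the build keeps its
 * release settings. If the crash disappears, the bug lives in, or is
 * exposed by, this function's optimized code; bisect inward from here. */
DEOPTIMIZE void suspect_function(void)
{
    /* ... body under suspicion ... */
}
```

Because only one function moves at a time, the rest of the system keeps its release timing and layout, which keeps the experiment honest.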

Why It's Worth Understanding

Microcontroller and SoC architecture is so complex that you never know what you might run into. Despite decades of development and accumulated knowledge, you still can't say with certainty whether a design will work until you go from silicon to system and see it for yourself. It's not until you go through something like this that you start paying closer attention to every detail, the kind of attention that doesn't come from reading about it. You could go through hell and back only to find that the culprit the whole time was a missing volatile keyword, a misread datasheet footnote, or some minute detail you didn't think mattered. That's embedded systems. And that's exactly why it's worth understanding.