{"id":3953,"date":"2018-05-09T09:30:53","date_gmt":"2018-05-09T16:30:53","guid":{"rendered":"https:\/\/visualgdb.com\/w\/?p=3953"},"modified":"2018-05-09T09:30:53","modified_gmt":"2018-05-09T16:30:53","slug":"diagnosing-complex-memory-corruption-problems-with-segger-j-trace","status":"publish","type":"post","link":"https:\/\/visualgdb.com\/tutorials\/arm\/tracing\/traceback\/","title":{"rendered":"Diagnosing Complex Memory Corruption Problems with Segger J-Trace"},"content":{"rendered":"<p>This\u00a0tutorial shows how to diagnose complex memory corruption problems using the ARM ETM tracing with VisualGDB and Segger J-Trace.<\/p>\n<p>ETM tracing is a powerful debug technology that allows recording\u00a0each and every instruction executed by the ARM processor, so you can conveniently step back in time and understand\u00a0the events that led to a strange, unexpected crash.<\/p>\n<p>In this tutorial we will\u00a0create a basic FreeRTOS-based program with incompatible floating point settings (leading to a very tough-to-diagnose\u00a0memory\u00a0corruption under certain circumstances), reproduce the problem, and then show how to use the\u00a0tracing\u00a0functionality of Segger J-Trace to\u00a0pinpoint its cause.<\/p>\n<p>Before you begin, install VisualGDB 5.4 or later and ensure\u00a0you have a Segger J-Trace Pro and a board with a trace connector (in this tutorial we will use the reference tracing board\u00a0that comes with the J-Trace).<\/p>\n<ol>\n<li>Start Visual Studio and open the VisualGDB Embedded Project Wizard:<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/01-prjname1.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3954\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/01-prjname1.png\" alt=\"01-prjname\" width=\"936\" height=\"572\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/01-prjname1.png 936w, 
https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/01-prjname1-300x183.png 300w\" sizes=\"(max-width: 936px) 100vw, 936px\" \/><\/a><\/li>\n<li>Select &#8220;New Project -&gt; Embedded binary&#8221;:<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/02-newprj1.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3955\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/02-newprj1.png\" alt=\"02-newprj\" width=\"856\" height=\"693\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/02-newprj1.png 856w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/02-newprj1-300x243.png 300w\" sizes=\"(max-width: 856px) 100vw, 856px\" \/><\/a><\/li>\n<li>Pick your toolchain and the device. The reference trace board comes with the STM32F407VE\u00a0microcontroller,\u00a0however if you are using a different board, select a device that matches it:<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/03-device1.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3956\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/03-device1.png\" alt=\"03-device\" width=\"856\" height=\"693\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/03-device1.png 856w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/03-device1-300x243.png 300w\" sizes=\"(max-width: 856px) 100vw, 856px\" \/><\/a><\/li>\n<li>On the\u00a0next page select the &#8220;LEDBlink (FreeRTOS)&#8221; sample.\u00a0You can try changing the LED numbers to match the Segger board layout, however this is not necessary as we won&#8217;t rely on the LEDs in this tutorial:<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/04-rtos.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3957\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/04-rtos.png\" alt=\"04-rtos\" width=\"856\" 
height=\"693\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/04-rtos.png 856w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/04-rtos-300x243.png 300w\" sizes=\"(max-width: 856px) 100vw, 856px\" \/><\/a><\/li>\n<li>Connect your J-Trace to your board and plug both into the USB ports of your computer:<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/hardware.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3982\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/hardware.jpg\" alt=\"hardware\" width=\"700\" height=\"509\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/hardware.jpg 700w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/hardware-300x218.jpg 300w\" sizes=\"(max-width: 700px) 100vw, 700px\" \/><\/a><\/li>\n<li>VisualGDB should automatically recognize the J-Trace\u00a0(shown as J-Link in the GUI) and pick the correct settings. Ensure you use the Segger software package v6.32 or later, as the previous versions don&#8217;t include the functionality used by VisualGDB:<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/05-jlink.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3958\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/05-jlink.png\" alt=\"05-jlink\" width=\"856\" height=\"693\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/05-jlink.png 856w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/05-jlink-300x243.png 300w\" sizes=\"(max-width: 856px) 100vw, 856px\" \/><\/a><\/li>\n<li>Press &#8220;Finish&#8221; to create the project. Now we will\u00a0introduce the bug. 
Switch the <strong>Floating Point Support<\/strong> setting to <strong>Hardware<\/strong>:<br \/>\n<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/06-hardware.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3959\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/06-hardware.png\" alt=\"06-hardware\" width=\"816\" height=\"656\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/06-hardware.png 816w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/06-hardware-300x241.png 300w\" sizes=\"(max-width: 816px) 100vw, 816px\" \/><\/a><\/li>\n<li>On the Embedded Frameworks page set the FreeRTOS CPU Core to <strong>ARM Cortex M3 or M4 with Software FP<\/strong>:<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/07-softfp.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3960\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/07-softfp.png\" alt=\"07-softfp\" width=\"816\" height=\"656\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/07-softfp.png 816w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/07-softfp-300x241.png 300w\" sizes=\"(max-width: 816px) 100vw, 816px\" \/><\/a><\/li>\n<li>Replace the main part of the source file with the following code:\n<pre class=\"\">osSemaphoreDef(s_Semaphore);\r\nosSemaphoreId(s_SemaphoreId);\r\n\r\nint main(void)\r\n{\r\n    HAL_Init(); \r\n \r\n    __GPIOD_CLK_ENABLE();\r\n    GPIO_InitTypeDef GPIO_InitStructure;\r\n\r\n    GPIO_InitStructure.Pin = GPIO_PIN_12 | GPIO_PIN_13;\r\n\r\n    GPIO_InitStructure.Mode = GPIO_MODE_OUTPUT_PP;\r\n    GPIO_InitStructure.Speed = GPIO_SPEED_HIGH;\r\n    GPIO_InitStructure.Pull = GPIO_NOPULL;\r\n    HAL_GPIO_Init(GPIOD, &amp;GPIO_InitStructure);\r\n\r\n    \/* Thread 1 definition *\/\r\n    osThreadDef(LED1, LED_Thread1, osPriorityNormal, 0, configMINIMAL_STACK_SIZE);\r\n \r\n    \/* Thread 2 
definition *\/\r\n    osThreadDef(LED2, LED_Thread2, osPriorityNormal, 0, configMINIMAL_STACK_SIZE);\r\n \r\n    \/* Start thread 1 *\/\r\n    LEDThread1Handle = osThreadCreate(osThread(LED1), NULL);\r\n \r\n    \/* Start thread 2 *\/\r\n    LEDThread2Handle = osThreadCreate(osThread(LED2), NULL);\r\n \r\n    s_SemaphoreId = osSemaphoreCreate(osSemaphore(s_Semaphore), 32);\r\n\r\n \r\n    \/* Start scheduler *\/\r\n    osKernelStart();\r\n\r\n    \/* We should never get here as control is now taken by the scheduler *\/\r\n    for (;;)\r\n        ;\r\n}\r\n\r\nvoid SysTick_Handler(void)\r\n{\r\n    osSystickHandler();\r\n}\r\n\r\n#include &lt;math.h&gt;\r\n\r\nstatic void LED_Thread1(void const *argument)\r\n{\r\n    for (;;)\r\n    {\r\n        osDelay(100);\r\n \r\n#ifdef CRASH\r\n        volatile float arg = 3.14;\r\n        volatile float result = sinf(arg);\r\n#endif\r\n \r\n        osSemaphoreRelease(s_SemaphoreId);\r\n    }\r\n}\r\n\r\nstatic void LED_Thread2(void const *argument)\r\n{\r\n    for (;;)\r\n    {\r\n        osSemaphoreWait(s_SemaphoreId, osWaitForever);\r\n    }\r\n}<\/pre>\n<\/li>\n<li>Open VisualGDB Project Properties and enable real-time tracing to support collecting\u00a0and analyzing trace data from the CPU:<img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3966\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/13-trace.png\" alt=\"13-trace\" width=\"888\" height=\"656\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/13-trace.png 888w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/13-trace-300x222.png 300w\" sizes=\"(max-width: 888px) 100vw, 888px\" \/><\/li>\n<li>The current version of the code doesn&#8217;t trigger the crash yet and is expected to run normally. 
Verify this by pressing F5 to build and start the program.\u00a0The easiest way to see\u00a0what the program is doing without stopping it is to enable recent code highlighting in the Live Tracing window (although this is not related to\u00a0the advanced troubleshooting that will be described later):<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/live.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3983\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/live.png\" alt=\"live\" width=\"1187\" height=\"832\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/live.png 1187w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/live-300x210.png 300w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/live-1024x718.png 1024w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/live-130x90.png 130w\" sizes=\"(max-width: 1187px) 100vw, 1187px\" \/><\/a><\/li>\n<li>You can also set a breakpoint in one of the\u00a0thread functions and wait for it to hit to ensure that the thread is running as expected:<br \/>\n<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/08-works.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3961\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/08-works.png\" alt=\"08-works\" width=\"1187\" height=\"724\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/08-works.png 1187w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/08-works-300x183.png 300w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/08-works-1024x625.png 1024w\" sizes=\"(max-width: 1187px) 100vw, 1187px\" \/><\/a><\/li>\n<li>Now we will trigger the crash. Open VisualGDB Project Properties and add &#8220;CRASH&#8221; to the &#8220;Preprocessor Macros&#8221; section. 
This will\u00a0enable the call to sinf() from the\u00a0LED1 thread:<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/09-enablecode.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3962\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/09-enablecode.png\" alt=\"09-enablecode\" width=\"816\" height=\"656\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/09-enablecode.png 816w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/09-enablecode-300x241.png 300w\" sizes=\"(max-width: 816px) 100vw, 816px\" \/><\/a><\/li>\n<li>Remove all breakpoints, build and run the program again. The Live Coverage view can easily show you that the program is now stuck in the Default_Handler() function and no other code is being executed:<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/stuck.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3984\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/stuck.png\" alt=\"stuck\" width=\"1187\" height=\"724\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/stuck.png 1187w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/stuck-300x183.png 300w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/stuck-1024x625.png 1024w\" sizes=\"(max-width: 1187px) 100vw, 1187px\" \/><\/a><\/li>\n<li>Select &#8220;Debug-&gt;Break All&#8221; to stop\u00a0the program and confirm that it is stuck in the Default_Handler():<br \/>\n<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/10-handler.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3963\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/10-handler.png\" alt=\"10-handler\" width=\"1187\" height=\"724\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/10-handler.png 1187w, 
https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/10-handler-300x183.png 300w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/10-handler-1024x625.png 1024w\" sizes=\"(max-width: 1187px) 100vw, 1187px\" \/><\/a><\/li>\n<li>The Default_Handler()\u00a0function is shared between all unimplemented interrupt\u00a0handlers, making it harder to understand which\u00a0interrupt triggered it. As the comment in the default handler suggests, add &#8220;DEBUG_DEFAULT_INTERRUPT_HANDLERS&#8221; to Preprocessor Macros to\u00a0define\u00a0a separate handler for each unused interrupt (at\u00a0a slight additional memory cost):<br \/>\n<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/11-debugisr.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3964\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/11-debugisr.png\" alt=\"11-debugisr\" width=\"816\" height=\"656\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/11-debugisr.png 816w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/11-debugisr-300x241.png 300w\" sizes=\"(max-width: 816px) 100vw, 816px\" \/><\/a><\/li>\n<li>If you run the program now, it will almost immediately stop in the\u00a0HardFault_Handler() function that\u00a0is called when the ARM core tries executing an invalid instruction or reading from an invalid address:<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/12-crashed.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3965\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/12-crashed.png\" alt=\"12-crashed\" width=\"1187\" height=\"724\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/12-crashed.png 1187w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/12-crashed-300x183.png 300w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/12-crashed-1024x625.png 1024w\" 
sizes=\"(max-width: 1187px) 100vw, 1187px\" \/><\/a>In most cases, the Call Stack window helps identify the caller of the current function. Here, however, it is of little use: it lists <strong>prvPortStartFirstTask()<\/strong> as the caller, which doesn&#8217;t make much sense. If we were limited to conventional debugging methods, pinpointing the cause of the crash would be very tough.<\/li>\n<li>As we are using J-Trace, we don&#8217;t have to rely on the Call Stack to reconstruct the preceding events. Instead, we can\u00a0simply look through the list of executed instructions reported by the ARM CPU via the trace interface. Open the Debug-&gt;Windows-&gt;Live Tracing window and switch it to the Recent Instructions view:<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/14-break.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3967\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/14-break.png\" alt=\"14-break\" width=\"1187\" height=\"724\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/14-break.png 1187w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/14-break-300x183.png 300w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/14-break-1024x625.png 1024w\" sizes=\"(max-width: 1187px) 100vw, 1187px\" \/><\/a><\/li>\n<li>Switch from the &#8220;Lines&#8221; subview to &#8220;Instructions&#8221;\u00a0and look through the recent instructions.\u00a0You\u00a0will see that the last function running before HardFault_Handler()\u00a0was\u00a0<strong>xQueueGenericReceive()<\/strong>.\u00a0Although the ETM trace shows the\u00a0previously executed\u00a0instructions, it won&#8217;t reconstruct the\u00a0register and memory values from the past, so we\u00a0still need to use\u00a0conventional breakpoints\u00a0to stop at the right moment and analyze the program state further. 
Click on the last instruction of this function in the trace view and set a breakpoint there. Then press the &#8220;Reset Embedded Device&#8221; button and hit F5 to continue:<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/15-reset.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3968\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/15-reset.png\" alt=\"15-reset\" width=\"1187\" height=\"724\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/15-reset.png 1187w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/15-reset-300x183.png 300w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/15-reset-1024x625.png 1024w\" sizes=\"(max-width: 1187px) 100vw, 1187px\" \/><\/a><\/li>\n<li>Once the program restarts and the breakpoint triggers, check the value of the <strong>$sp<\/strong> register and the <strong>((unsigned *)$sp)[1]<\/strong> memory slot (containing the\u00a0return address). 
The current return address (0x08003afb in this example) looks\u00a0like a valid FLASH memory address (you can check it by running &#8220;info\u00a0symbol &lt;address&gt;&#8221; in the GDB Session window):<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/16-goodsp.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3969\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/16-goodsp.png\" alt=\"16-goodsp\" width=\"1187\" height=\"724\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/16-goodsp.png 1187w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/16-goodsp-300x183.png 300w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/16-goodsp-1024x625.png 1024w\" sizes=\"(max-width: 1187px) 100vw, 1187px\" \/><\/a><\/li>\n<li>To try catching the\u00a0moment with\u00a0the incorrect address, add the following condition to the breakpoint:\n<pre class=\"\">((unsigned *)$sp)[1] != &lt;correct value shown currently&gt;<\/pre>\n<p><a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/17-cond.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3970\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/17-cond.png\" alt=\"17-cond\" width=\"786\" height=\"232\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/17-cond.png 786w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/17-cond-300x89.png 300w\" sizes=\"(max-width: 786px) 100vw, 786px\" \/><\/a><\/li>\n<li>Resume the program. 
The\u00a0breakpoint will now hit, showing\u00a0that\u00a0the stack pointer is still the same, although the return address stored after it is wrong:<br \/>\n<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/18-badvalue.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3971\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/18-badvalue.png\" alt=\"18-badvalue\" width=\"1187\" height=\"724\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/18-badvalue.png 1187w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/18-badvalue-300x183.png 300w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/18-badvalue-1024x625.png 1024w\" sizes=\"(max-width: 1187px) 100vw, 1187px\" \/><\/a><\/li>\n<li>We can\u00a0easily identify the instruction responsible for setting this value by adding a data breakpoint (via the Breakpoints window). Set it on the address\u00a0of the return\u00a0address slot (add 4 to the current $sp value) and\u00a0make it conditional so it only breaks\u00a0when the\u00a0written value is\u00a0actually incorrect:<br \/>\n<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/19-databp.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3972\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/19-databp.png\" alt=\"19-databp\" width=\"1187\" height=\"724\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/19-databp.png 1187w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/19-databp-300x183.png 300w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/19-databp-1024x625.png 1024w\" sizes=\"(max-width: 1187px) 100vw, 1187px\" \/><\/a><\/li>\n<li>Restart the program again. The breakpoint will now trigger in the prologue of the <strong>prvCopyDataFromQueue()<\/strong> function. 
Note that due to the internal logic of the ARM core, it will stop 2 instructions after the actual memory writing instruction:<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/20-badwrite.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3973\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/20-badwrite.png\" alt=\"20-badwrite\" width=\"1187\" height=\"724\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/20-badwrite.png 1187w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/20-badwrite-300x183.png 300w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/20-badwrite-1024x625.png 1024w\" sizes=\"(max-width: 1187px) 100vw, 1187px\" \/><\/a><\/li>\n<li>The instruction is a part of the regular function prologue responsible for saving\u00a0registers into the stack, so\u00a0it would only overwrite the return address if the stack pointer was set incorrectly when the function was called. 
Using\u00a0conventional\u00a0debugging methods would require looking through the related code and\u00a0trying to\u00a0find the parts responsible for stack pointer manipulation. With\u00a0the J-Trace, however, you can simply view the\u00a0list of recently executed functions or instructions and search there:<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/21-history.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3974\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/21-history.png\" alt=\"21-history\" width=\"1187\" height=\"724\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/21-history.png 1187w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/21-history-300x183.png 300w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/21-history-1024x625.png 1024w\" sizes=\"(max-width: 1187px) 100vw, 1187px\" \/><\/a><\/li>\n<li>The code after the <strong>PendSV_Handler()<\/strong>\u00a0doesn&#8217;t do anything special with the stack pointer; however, the <strong>PendSV_Handler()<\/strong> itself is\u00a0responsible for\u00a0restoring the stack pointer after a thread switch. 
Locate the first instruction called after\u00a0the\u00a0return from <strong>PendSV_Handler()\u00a0<\/strong>and set a breakpoint there:<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/22-change.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3975\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/22-change.png\" alt=\"22-change\" width=\"1187\" height=\"724\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/22-change.png 1187w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/22-change-300x183.png 300w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/22-change-1024x625.png 1024w\" sizes=\"(max-width: 1187px) 100vw, 1187px\" \/><\/a>Also set a breakpoint on the instruction that requests a PendSV interrupt. The\u00a0normal sequence of events should look as follows:\n<ol style=\"list-style-type: lower-alpha;\">\n<li>xQueueGenericReceive (running in thread #2) requests a PendSV interrupt.<\/li>\n<li>The PendSV interrupt handler stores the current stack pointer and registers inside the thread object and switches the thread context to thread #1.<\/li>\n<li>Eventually the thread #1 enters a wait state (or is preempted).<\/li>\n<li>The PendSV handler would then restore the context of thread #2 and continue executing it.<\/li>\n<\/ol>\n<\/li>\n<li>Once the first breakpoint hits, take a note of the $sp value. 
Normally it should stay the same once the thread context is\u00a0restored:<br \/>\n<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/24-before.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3977\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/24-before.png\" alt=\"24-before\" width=\"1187\" height=\"724\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/24-before.png 1187w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/24-before-300x183.png 300w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/24-before-1024x625.png 1024w\" sizes=\"(max-width: 1187px) 100vw, 1187px\" \/><\/a><\/li>\n<li>However in this case it doesn&#8217;t happen &#8211; the $sp value gets increased by 0x48 bytes after the\u00a0return from PendSV_Handler(), making the subsequent code overwrite\u00a0some of the stored values. Eventually the stack pointer is restored based on the frame pointer, but the damage done to the saved values triggers a crash later:<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/25-after.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3978\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/25-after.png\" alt=\"25-after\" width=\"1187\" height=\"724\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/25-after.png 1187w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/25-after-300x183.png 300w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/25-after-1024x625.png 1024w\" sizes=\"(max-width: 1187px) 100vw, 1187px\" \/><\/a><\/li>\n<li>The only remaining step for solving this puzzle is to understand why PendSV corrupts the stack pointer.\u00a0Restart the program again and let the first breakpoint (with the correct stack value) hit. 
Then set a breakpoint inside the PendSV handler and press F5 (don&#8217;t use single-stepping as it would suppress interrupts):<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/26-stored.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3979\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/26-stored.png\" alt=\"26-stored\" width=\"1187\" height=\"724\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/26-stored.png 1187w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/26-stored-300x183.png 300w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/26-stored-1024x625.png 1024w\" sizes=\"(max-width: 1187px) 100vw, 1187px\" \/><\/a>See\u00a0how the\u00a0$lr register contains a special value of 0xfffffffd. This indicates that the processor automatically pushed a basic register frame (without FP registers) onto the stack and switched from the thread mode to the handler mode (that has its own stack).\u00a0The &#8220;process&#8221; stack pointer used in thread mode can be\u00a0observed via the $psp register (that is equal to the previously\u00a0observed $sp value minus 0x20).<\/li>\n<li>Take a note of the pxCurrentTCB value, set a breakpoint\u00a0at the exit from PendSV_Handler() and continue\u00a0the\u00a0program. 
The first time the breakpoint is hit, pxCurrentTCB will point to a different thread:<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/27-exit.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3980\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/27-exit.png\" alt=\"27-exit\" width=\"1187\" height=\"724\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/27-exit.png 1187w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/27-exit-300x183.png 300w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/27-exit-1024x625.png 1024w\" sizes=\"(max-width: 1187px) 100vw, 1187px\" \/><\/a><\/li>\n<li>As we are\u00a0only\u00a0interested in\u00a0thread #2, continue the program until you see the old pxCurrentTCB value again (you can\u00a0set a breakpoint condition, or simply press F5 a couple of times):<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/28-bxlr.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3981\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/28-bxlr.png\" alt=\"28-bxlr\" width=\"1187\" height=\"724\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/28-bxlr.png 1187w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/28-bxlr-300x183.png 300w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2018\/05\/28-bxlr-1024x625.png 1024w\" sizes=\"(max-width: 1187px) 100vw, 1187px\" \/><\/a>Note that the $psp register was restored correctly (0x20000598 in this example). However, the $lr register contains\u00a00xffffffed, which\u00a0corresponds to a thread-to-handler mode switch with FP data preservation. If you let the &#8220;bx lr&#8221; instruction run now, it will &#8220;restore&#8221; the FP register values from the stack,\u00a0which never contained them. 
As a result, it will free more space from the stack than it should, corrupting the stack pointer.<\/li>\n<\/ol>\n<p>The reconstructed\u00a0sequence of events looks like this:<\/p>\n<ol style=\"list-style-type: lower-alpha;\">\n<li>Before running any FP-related code,\u00a0thread #2 triggers the PendSV interrupt that saves the non-FP\u00a0registers on the stack and invokes <strong>PendSV_Handler()<\/strong>.<\/li>\n<li>The PendSV handler\u00a0switches the context to\u00a0thread #1 and returns.<\/li>\n<li>Thread #1 invokes the sinf() function that uses some FP registers and then triggers the PendSV interrupt again.<\/li>\n<li>As thread #1 has used the FP registers, the\u00a0ARM processor\u00a0saves them to the stack as well and sets $lr to a special\u00a0value of 0xffffffed that indicates\u00a0the presence of the FP registers on the stack.<\/li>\n<li>The PendSV handler restores the context of thread #2 (that has not saved any FP registers).<\/li>\n<li>The PendSV handler returns using the 0xffffffed value, causing the ARM CPU to\u00a0free more stack than was originally\u00a0used for saving the non-FP registers.<\/li>\n<li>Subsequent code overwrites the return address stored in the stack before the context switch.<\/li>\n<li>Eventually the stack pointer is restored to the correct value, but the return address stored in the stack is already wrong.<\/li>\n<li>When xQueueGenericReceive() returns, it uses the incorrect return address,\u00a0triggering the HardFault exception.<\/li>\n<\/ol>\n<p>The problem can be easily solved by switching back to an FP-aware port of FreeRTOS that handles this case correctly. However, because several instances of register\/memory corruption occurred before the final crash, finding the root cause\u00a0was not easy. Segger J-Trace greatly simplified it by allowing us to reconstruct the preceding events\u00a0even though any evidence on the stack was long 
gone.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This\u00a0tutorial shows how to diagnose complex memory corruption problems using the ARM ETM tracing with VisualGDB and Segger J-Trace. ETM<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[27],"tags":[53,165,164,61],"_links":{"self":[{"href":"https:\/\/visualgdb.com\/w\/wp-json\/wp\/v2\/posts\/3953"}],"collection":[{"href":"https:\/\/visualgdb.com\/w\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/visualgdb.com\/w\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/visualgdb.com\/w\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/visualgdb.com\/w\/wp-json\/wp\/v2\/comments?post=3953"}],"version-history":[{"count":1,"href":"https:\/\/visualgdb.com\/w\/wp-json\/wp\/v2\/posts\/3953\/revisions"}],"predecessor-version":[{"id":3985,"href":"https:\/\/visualgdb.com\/w\/wp-json\/wp\/v2\/posts\/3953\/revisions\/3985"}],"wp:attachment":[{"href":"https:\/\/visualgdb.com\/w\/wp-json\/wp\/v2\/media?parent=3953"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/visualgdb.com\/w\/wp-json\/wp\/v2\/categories?post=3953"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/visualgdb.com\/w\/wp-json\/wp\/v2\/tags?post=3953"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}