Using VisualGDB FreeRTOS Tracing to Optimize Real-time Code
This tutorial shows how to use the FreeRTOS tracing feature of VisualGDB to optimize a simple FreeRTOS UART driver based on queues. We will create a basic project using queues to buffer the incoming and outgoing UART data, measure the delays in various components of our setup and show how to optimize them.
Before you begin, install VisualGDB 5.2 or later.
- Ensure that you are using a board that can communicate with your computer via a COM port. We will use the STM32F410-Nucleo board, whose on-board ST-Link 2.1 provides a virtual COM port connected to USART2 on the device. The steps below will work with any other USB-to-UART bridge as long as it is connected to the device. Plug in the board and note the COM port number in Device Manager:
- Start Visual Studio and open the VisualGDB Embedded Project Wizard:
- Select “Create a new project -> Embedded Binary”:
- Select the ARM toolchain and your device. We will use the STM32F410RB device installed on the STM32F410-Nucleo board:
- Select the “LEDBlink (FreeRTOS)” sample and ensure that the FreeRTOS CPU core matches your CPU core and the floating point setting on the previous page:
- Select OpenOCD as the debug method and click “Detect” to automatically detect the rest of the settings:
- Press “Finish” to generate the project. The first thing we will do is test that the UART connection is working. Change the main source file extension to .cpp and replace its contents with the following code:
    #include <stm32f4xx_hal.h>

    #ifdef __cplusplus
    extern "C"
    #endif
    void SysTick_Handler(void)
    {
        HAL_IncTick();
        HAL_SYSTICK_IRQHandler();
    }

    UART_HandleTypeDef g_UART;

    extern "C" void USART2_IRQHandler()
    {
        HAL_UART_IRQHandler(&g_UART);
    }

    int main(void)
    {
        HAL_Init();

        __USART2_CLK_ENABLE();
        __GPIOA_CLK_ENABLE();

        g_UART.Instance = USART2;
        g_UART.Init.BaudRate = 115200;
        g_UART.Init.WordLength = UART_WORDLENGTH_8B;
        g_UART.Init.StopBits = UART_STOPBITS_1;
        g_UART.Init.Parity = UART_PARITY_NONE;
        g_UART.Init.HwFlowCtl = UART_HWCONTROL_NONE;
        g_UART.Init.Mode = UART_MODE_TX_RX;

        if (HAL_UART_Init(&g_UART) != HAL_OK)
            asm("bkpt 255");

        GPIO_InitTypeDef GPIO_InitStruct;
        GPIO_InitStruct.Pin = GPIO_PIN_3;
        GPIO_InitStruct.Mode = GPIO_MODE_AF_PP;
        GPIO_InitStruct.Pull = GPIO_PULLUP;
        GPIO_InitStruct.Speed = GPIO_SPEED_HIGH;
        GPIO_InitStruct.Alternate = GPIO_AF7_USART2;
        HAL_GPIO_Init(GPIOA, &GPIO_InitStruct);

        GPIO_InitStruct.Pin = GPIO_PIN_2;
        GPIO_InitStruct.Alternate = GPIO_AF7_USART2;
        HAL_GPIO_Init(GPIOA, &GPIO_InitStruct);

        for (;;)
        {
            uint8_t tmp;
            HAL_UART_Receive(&g_UART, &tmp, 1, HAL_MAX_DELAY);
            tmp++;
            HAL_UART_Transmit(&g_UART, &tmp, 1, HAL_MAX_DELAY);
        }
    }
If you are using a different board, you will need to find out which UART is connected to the COM port you are using and change the following parts of the code:
| Value | New value |
| --- | --- |
| __USART2_CLK_ENABLE() | __USARTx_CLK_ENABLE() where x is your USART number |
| USART2 | USARTx where x is your USART number |
| __GPIOA_CLK_ENABLE() | __GPIOx_CLK_ENABLE() where x is the GPIO port that has your USART pins |
| GPIO_PIN_3 and GPIO_PIN_2 | GPIO pins that correspond to the RX and TX signals for your USART |
| GPIO_AF7_USART2 | Alternate function numbers that switch the RX and TX pins to USART mode |
| GPIOA | GPIOx where x is the GPIO port that has your USART pins |

- Build the project and ensure it succeeds:
- Open VisualGDB Project Properties and enable the raw terminal on the COM port connected to your device:
- Press F5 to start the program. Try typing some characters in the COM port window in Visual Studio to see that they are echoed back with 1 added to them:
- Now we will modify our code to use FreeRTOS. We will create 2 threads:
- The first thread will read numbers from the COM port and sleep for the number of milliseconds specified by each number
- The second thread will send the messages generated by the first thread character-by-character while the first one is sleeping
First, modify SysTick_Handler() to call the FreeRTOS tick handler once the scheduler has been started:
    #include <FreeRTOS.h>
    #include <task.h>
    #include <queue.h>
    #include <semphr.h>
    #include <stdarg.h>

    extern "C" void xPortSysTickHandler(void);

    extern "C" void SysTick_Handler(void)
    {
        HAL_IncTick();
        HAL_SYSTICK_IRQHandler();

        if (xTaskGetSchedulerState() != taskSCHEDULER_NOT_STARTED)
        {
            xPortSysTickHandler();
        }
    }
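Then declare the 2 queues that will store the received and sent data, together with the one-byte receive buffer and the semaphore that signals when the UART is ready to send:
    QueueHandle_t g_InQueue, g_OutQueue;
    static uint8_t s_ReceivedByte;
    SemaphoreHandle_t g_SendReadySemaphore;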
Then add a main thread that will read characters from g_InQueue, interpret them as numbers, wait and print status messages via UART_Printf():
    void UART_Printf(const char *pFormat, ...); // Defined in the next step

    void MainThread(void *)
    {
        HAL_UART_Receive_IT(&g_UART, &s_ReceivedByte, 1);
        NVIC_SetPriority(USART2_IRQn, configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY);
        NVIC_EnableIRQ(USART2_IRQn);

        for (;;)
        {
            int value = 0;
            for (;;)
            {
                char buf;
                asm("nop");
                while (xQueueReceive(g_InQueue, &buf, portMAX_DELAY) != pdPASS)
                {
                    asm("nop");
                }

                if (buf >= '0' && buf <= '9')
                {
                    value *= 10;
                    value += (buf - '0');
                }
                else
                    break;
            }

            if (!value)
                continue;

            UART_Printf("WAIT %d\r\n", value);
            vTaskDelay(value / portTICK_PERIOD_MS);
            UART_Printf("done\r\n");
        }
    }
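The inner loop of MainThread() accumulates decimal digits until the first non-digit character arrives and then ignores a zero result. The host-side sketch below models only that parsing step so it can be checked without the board. The function name ParseDelayMs and the use of a plain string in place of g_InQueue are assumptions made for illustration; they are not part of the tutorial's firmware.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical host-side model of the digit loop in MainThread().
// 'input' stands in for the bytes that would arrive through g_InQueue.
// Returns the parsed value; 'consumed' receives the number of bytes read,
// including the terminating non-digit character (if there was one).
int ParseDelayMs(const char *input, std::size_t &consumed)
{
    assert(input != nullptr);
    int value = 0;
    consumed = 0;
    while (input[consumed] != '\0')
    {
        char ch = input[consumed++];
        if (ch >= '0' && ch <= '9')
            value = value * 10 + (ch - '0'); // Same accumulation as in MainThread()
        else
            break; // Any non-digit (e.g. '\r' sent by ENTER) ends the number
    }
    return value;
}
```

A result of 0 corresponds to the `if (!value) continue;` branch: pressing ENTER alone, or entering 0, produces no "WAIT" message.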
Add a UART_Printf() function that will send the output to g_OutQueue:
    #include <stdio.h> // vsnprintf()

    void UART_Printf(const char *pFormat, ...)
    {
        char buffer[128];
        va_list lst;
        va_start(lst, pFormat);
        vsnprintf(buffer, sizeof(buffer), pFormat, lst);
        va_end(lst);

        // Queue the formatted text one character at a time
        for (size_t i = 0; i < sizeof(buffer) && buffer[i]; i++)
            xQueueSend(g_OutQueue, &buffer[i], portMAX_DELAY);
    }
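To see how many queue items a single UART_Printf() call produces, the formatting and per-character queuing can be modeled on the host. This is a sketch under stated assumptions: QueueFormatted is a hypothetical name, and std::queue<char> stands in for g_OutQueue, so nothing here blocks the way xQueueSend() can.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <queue>

// Hypothetical host-side model of UART_Printf().
// Returns the number of characters placed in 'out'.
int QueueFormatted(std::queue<char> &out, const char *pFormat, ...)
{
    assert(pFormat != nullptr);
    char buffer[128];
    va_list lst;
    va_start(lst, pFormat);
    // vsnprintf() null-terminates and truncates to sizeof(buffer) - 1 characters
    std::vsnprintf(buffer, sizeof(buffer), pFormat, lst);
    va_end(lst);

    int queued = 0;
    for (std::size_t i = 0; i < sizeof(buffer) && buffer[i]; i++, queued++)
        out.push(buffer[i]); // One queue item per character, as with xQueueSend()
    return queued;
}
```

For example, formatting "WAIT %d\r\n" with the value 100 queues 10 characters, so the send thread has to handle 10 separate queue items for that one message.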
Add a handler for the “UART character received” event that will put the received byte to g_InQueue:
    extern "C" void HAL_UART_RxCpltCallback(UART_HandleTypeDef *huart)
    {
        BaseType_t higherPriorityTaskWoken = pdFALSE;
        if (xQueueSendFromISR(g_InQueue, &s_ReceivedByte, &higherPriorityTaskWoken) != pdPASS)
            asm("bkpt 255");

        if (HAL_UART_Receive_IT(&g_UART, &s_ReceivedByte, 1) != HAL_OK)
        {
            asm("bkpt 255");
        }
    }
Add a similar handler for the “UART character sent” event that will notify the sending thread by giving g_SendReadySemaphore:
    extern "C" void HAL_UART_TxCpltCallback(UART_HandleTypeDef *huart)
    {
        BaseType_t higherPriorityTaskWoken = pdFALSE;
        xSemaphoreGiveFromISR(g_SendReadySemaphore, &higherPriorityTaskWoken);
    }
Add the sending thread that will wait until the UART hardware is ready (signaled through g_SendReadySemaphore) and then send the next byte queued in g_OutQueue:
    void SendThread(void *)
    {
        for (;;)
        {
            uint8_t buf;
            xSemaphoreTake(g_SendReadySemaphore, portMAX_DELAY);
            xQueueReceive(g_OutQueue, &buf, portMAX_DELAY);
            HAL_UART_Transmit_IT(&g_UART, &buf, 1);
        }
    }
Finally, put it all together by creating the threads, queues and the semaphore in the main() function after the last HAL_GPIO_Init():
    TaskHandle_t mainTask, sendTask;
    xTaskCreate(MainThread, "main", 1024, 0, tskIDLE_PRIORITY + 1, &mainTask);
    xTaskCreate(SendThread, "send", 1024, 0, tskIDLE_PRIORITY + 1, &sendTask);
    g_InQueue = xQueueCreate(1024, 1);
    g_OutQueue = xQueueCreate(1024, 1);
    g_SendReadySemaphore = xSemaphoreCreateCounting(1000, 1);
    vTaskStartScheduler();
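The stack depth passed to xTaskCreate() is counted in stack words rather than bytes, so the values above request more RAM than they may appear to. The following back-of-the-envelope sketch estimates the heap demand of these calls. It assumes 4-byte stack words (as on ARM Cortex-M ports) and dynamic allocation from the FreeRTOS heap, and it ignores task control blocks, queue headers and the idle task, so the real figure is somewhat higher. The helper names are hypothetical.

```cpp
#include <cstddef>

// Assumption: 4-byte stack words, as on ARM Cortex-M FreeRTOS ports.
constexpr std::size_t kStackWordBytes = 4;

// Bytes of stack requested by one xTaskCreate() call (depth is given in words).
constexpr std::size_t TaskStackBytes(std::size_t depthWords)
{
    return depthWords * kStackWordBytes;
}

// Bytes of item storage requested by one xQueueCreate(length, itemSize) call.
constexpr std::size_t QueueStorageBytes(std::size_t length, std::size_t itemSize)
{
    return length * itemSize;
}

// Two tasks with 1024-word stacks plus two queues of 1024 one-byte items.
constexpr std::size_t kApproxHeapDemand =
    2 * TaskStackBytes(1024) + 2 * QueueStorageBytes(1024, 1);

static_assert(kApproxHeapDemand == 10240, "8 KB of stacks + 2 KB of queue storage");
```

If xTaskCreate() does not return pdPASS or xQueueCreate() returns NULL, a heap that is too small for these requests (configTOTAL_HEAP_SIZE in FreeRTOSConfig.h) is a likely cause worth checking first.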
- Build your program, run it and try entering small numbers to get the main thread to sleep for some milliseconds. Ensure that the program responds properly:
- Now we will use the VisualGDB FreeRTOS tracing to see whether the driver could be optimized. Open the Dynamic Analysis page of VisualGDB Project Properties and enable the ‘Allow tracing various RTOS events’ checkbox. Then click the ‘add reference’ link to automatically reference the profiling framework:
- On the STM32F410 device, the profiler framework will not build because the device lacks the TIM2 timer used by the sampling profiler:
- You can easily fix this by adding SAMPLING_PROFILER_TIMER_INSTANCE=5 to preprocessor macros on the MSBuild Settings page:
- Build and start your program; wait until it loads and press ‘break all’. Use the Threads window to display the list of active RTOS threads:
Note that having the window open causes VisualGDB to analyze the threads after each stop, which reduces performance. It is recommended to close the Threads window when not actively using it.
- Open the Real-Time watch window and add the threads to the watch. VisualGDB will automatically suggest known thread names:
- Next add the g_InQueue and g_OutQueue queues. You can either type in their names, or right-click on them in the source code and select ‘Add to real-time watch’:
- Resume the program and type ‘100<ENTER>’ in the COM port window. Note how the real-time watch will show several bursts of activity on the input queue (one for each received character) followed by a longer burst on the output queue (when the ‘WAIT’ message is printed):
- Zoom in on the ‘out’ queue activity. You will see how the queue is quickly filled while the main thread is running and then is slowly emptied by the send thread. Measure the time between the maximum queue size (start of transmission) and the time when it is fully emptied (end of transmission):
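As a reference point for the measured value, the shortest possible transmission time follows from the line rate alone. The sketch below assumes the 115200 baud, 8 data bits, no parity, 1 stop bit configuration from the UART setup code, which puts 10 bits on the wire per byte. The helper names are hypothetical.

```cpp
// Assumption: 8N1 framing, i.e. 1 start bit + 8 data bits + 1 stop bit per byte.
constexpr long long kBitsPerFrame = 1 + 8 + 1;

// Time needed to shift out one byte, in nanoseconds (rounded down).
constexpr long long ByteTimeNs(long long baudRate)
{
    return kBitsPerFrame * 1000000000LL / baudRate;
}

// Lower bound for sending 'byteCount' bytes back to back, in nanoseconds.
constexpr long long MessageTimeNs(long long byteCount, long long baudRate)
{
    return byteCount * ByteTimeNs(baudRate);
}

static_assert(ByteTimeNs(115200) == 86805, "about 86.8 us per byte");
static_assert(MessageTimeNs(10, 115200) == 868050, "about 0.87 ms for 10 bytes");
```

The 10-byte "WAIT 100\r\n" message therefore cannot take less than roughly 0.87 ms on the wire. If the time measured in the real-time watch is much longer than that, the difference is spent in software rather than in the UART hardware.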
- To make things clearer we will use a custom real-time watch to plot the times of the ‘Receive complete’ and ‘Send complete’ events on the same scale. Include the <CustomRealTimeWatches.h> file, declare an instance of EventStreamWatch and post events there from both HAL_UART_RxCpltCallback() and HAL_UART_TxCpltCallback():
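The instrumentation described in this step can be sketched as follows. To keep the example compilable on a PC, it defines a small stand-in for EventStreamWatch that only records event names; on the target you would include <CustomRealTimeWatches.h> instead of defining the stub. The "RX" event name is an assumption, while the g_UARTInterrupts variable and the ReportEvent("TX") call follow the code used later in this tutorial.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Host-side STAND-IN for the class declared in <CustomRealTimeWatches.h>.
// It is not the real implementation; it merely records the reported names.
struct EventStreamWatch
{
    std::vector<std::string> Events;

    void ReportEvent(const char *pName)
    {
        assert(pName != nullptr);
        Events.push_back(pName);
    }
};

struct UART_HandleTypeDef; // Opaque on the host; defined by the STM32 HAL on the target

EventStreamWatch g_UARTInterrupts;

extern "C" void HAL_UART_RxCpltCallback(UART_HandleTypeDef *)
{
    g_UARTInterrupts.ReportEvent("RX"); // Event name "RX" is an assumption
    // ... existing xQueueSendFromISR() and HAL_UART_Receive_IT() logic ...
}

extern "C" void HAL_UART_TxCpltCallback(UART_HandleTypeDef *)
{
    g_UARTInterrupts.ReportEvent("TX");
    // ... existing xSemaphoreGiveFromISR() call ...
}
```

Reporting the event as the first statement of each callback keeps the recorded time as close as possible to the moment the interrupt was taken.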
- Run the program again, add g_UARTInterrupts and g_SendReadySemaphore to the real-time watch, then resume it, type ‘100<ENTER>’ in the COM port window and observe the timing:
The graph clearly shows that out of almost 1 ms spent between subsequent timer interrupts, most of the time is spent by the IDLE thread while the send thread is queued despite g_SendReadySemaphore being signaled.
- This can be fixed by adding the following line to the end of HAL_UART_RxCpltCallback() and HAL_UART_TxCpltCallback():
portYIELD_FROM_ISR(higherPriorityTaskWoken);
- This requests a context switch as soon as the interrupt handler returns (whenever the handler has unblocked a higher-priority thread), so the send thread is activated sooner and the delay between the interrupt and reading the output queue is considerably reduced:
- The 140 microsecond delay caused by the context switch is still relatively large. We can reduce it further by reading the next byte from the queue directly in the ‘send complete’ callback and sending it immediately:
    extern "C" void HAL_UART_TxCpltCallback(UART_HandleTypeDef *huart)
    {
        g_UARTInterrupts.ReportEvent("TX");
        BaseType_t higherPriorityTaskWoken = pdFALSE;
        uint8_t tmp;

        if (xQueueReceiveFromISR(g_OutQueue, &tmp, &higherPriorityTaskWoken))
            HAL_UART_Transmit_IT(&g_UART, &tmp, 1);
        else
            xSemaphoreGiveFromISR(g_SendReadySemaphore, &higherPriorityTaskWoken);

        portYIELD_FROM_ISR(higherPriorityTaskWoken);
    }
- This eliminates the extra context switches: g_OutQueue is now read by the interrupt handler from the context of the IDLE thread and once all data is sent, the send thread is activated:
Congratulations! By analyzing the timings and reducing the unnecessary delays you have managed to make the UART driver almost 5x faster.