{"id":4799,"date":"2019-06-11T20:41:37","date_gmt":"2019-06-12T03:41:37","guid":{"rendered":"https:\/\/visualgdb.com\/w\/?p=4799"},"modified":"2020-05-05T17:22:24","modified_gmt":"2020-05-06T00:22:24","slug":"using-the-dma-controller-on-stm32-devices","status":"publish","type":"post","link":"https:\/\/visualgdb.com\/tutorials\/arm\/stm32\/dma\/","title":{"rendered":"Using the DMA Controller on STM32 Devices"},"content":{"rendered":"<p>This tutorial shows how to use the DMA controller on the STM32 devices, letting it perform background memory operations without consuming any CPU cycles. We will show how to use DMA to copy data between different buffers in RAM and also between RAM and the peripherals.<\/p>\n<p>Before you begin, install VisualGDB 5.4 or later.<\/p>\n<ol>\n<li>Start Visual Studio and open the VisualGDB Embedded Project Wizard:<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2019\/06\/01-newprj-5.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-4800\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2019\/06\/01-newprj-5.png\" alt=\"\" width=\"880\" height=\"555\" \/><\/a><\/li>\n<li>On the first page of the wizard select &#8220;Embedded Binary -&gt; MSBuild&#8221; and press &#8220;Next&#8221; to proceed to the next page:<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2019\/06\/02-msbuild.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-4801\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2019\/06\/02-msbuild.png\" alt=\"\" width=\"886\" height=\"693\" \/><\/a><\/li>\n<li>On the Device Selection page pick the ARM toolchain and select your device from the list. 
In this tutorial we will use the Nucleo-F410RB board, so we select the STM32F410RB device; however, the steps shown here will work for most other STM32 devices as well:<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2019\/06\/03-library.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-4802\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2019\/06\/03-library.png\" alt=\"\" width=\"886\" height=\"693\" \/><\/a><\/li>\n<li>We will add the DMA-related code from scratch, so select the simplest option, LEDBlink (HAL), on the next page:<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2019\/06\/04-blink-1.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-4803\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2019\/06\/04-blink-1.png\" alt=\"\" width=\"886\" height=\"693\" \/><\/a><\/li>\n<li>Finally, choose the debug settings that will work with your board. For most of the STM32 Discovery and Nucleo boards, simply connect the board to a USB port and VisualGDB will detect the settings automatically. 
Once the debug settings are configured, press &#8220;Finish&#8221; to generate a basic project.<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2019\/06\/05-debug-4.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-4804\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2019\/06\/05-debug-4.png\" alt=\"\" width=\"886\" height=\"693\" \/><\/a><\/li>\n<li>Once the project has been created, ensure you can debug it and take note of the value of the <strong>SystemCoreClock<\/strong> variable after the call to <strong>HAL_Init()<\/strong> returns:<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2019\/06\/06-clock.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-4805\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2019\/06\/06-clock.png\" alt=\"\" width=\"1131\" height=\"807\" \/><\/a><\/li>\n<li>One last step before we begin experimenting with various DMA modes is to enable the Chronometer on the Embedded Debug Tweaking page of VisualGDB Project Properties (requires Custom edition or higher). 
It will automatically record the CPU cycles elapsed between different debug events, making it easier to understand the timings:<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2019\/06\/07-chrono-1.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-4811\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2019\/06\/07-chrono-1.png\" alt=\"\" width=\"942\" height=\"669\" \/><\/a><strong>Warning: Ensure you enable the Chronometer for both Debug and Release configurations, as we will be using both of them in this tutorial.<\/strong><\/li>\n<li>Now we will create a basic program that will run three memory-intensive operations:\n<ul>\n<li>Fill a memory buffer with a sequence of numbers: N<sub>i<\/sub> = i * 3.<\/li>\n<li>Copy the contents of the buffer to another buffer.<\/li>\n<li>Fill a third buffer with the first 1024 <a href=\"https:\/\/en.wikipedia.org\/wiki\/Fibonacci_number\">Fibonacci numbers.<\/a><\/li>\n<\/ul>\n<p>We will then measure how long each operation takes and show how using the DMA improves the performance. 
Replace the main() function in your project with the following code:<\/p>\n<pre class=\"\">#include &lt;memory.h&gt;\r\n\r\nstatic int s_Buffer1[1024], s_Buffer2[1024], s_Buffer3[1024];\r\n\r\n#define ARRAY_SIZE(x) (sizeof(x) \/ sizeof((x)[0]))\r\n\r\nvoid __attribute__((noinline)) FillMemory()\r\n{\r\n    for (int i = 0; i &lt; ARRAY_SIZE(s_Buffer1); i++)\r\n        s_Buffer1[i] = i * 3;\r\n}\r\n\r\nvoid __attribute__((noinline)) CopyMemory()\r\n{\r\n    memcpy(s_Buffer2, s_Buffer1, sizeof(s_Buffer2));\r\n}\r\n\r\nvoid __attribute__((noinline)) CalculateFibonacci()\r\n{\r\n    s_Buffer3[0] = 0;\r\n    s_Buffer3[1] = 1;\r\n\r\n    for (int i = 2; i &lt; ARRAY_SIZE(s_Buffer3); i++)\r\n    {\r\n        s_Buffer3[i] = s_Buffer3[i - 1] + s_Buffer3[i - 2];\r\n    }\r\n}\r\n\r\nint main(void)\r\n{\r\n    HAL_Init();\r\n\r\n    FillMemory();\r\n    CopyMemory();\r\n    CalculateFibonacci();\r\n\r\n    volatile int x = s_Buffer2[4];\r\n}\r\n<\/pre>\n<\/li>\n<li>To minimize the impact of unoptimized code on the measured numbers, switch the active configuration to Release, then set a breakpoint at the call to FillMemory() and start debugging:<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2019\/06\/08-step1.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-4807\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2019\/06\/08-step1.png\" alt=\"\" width=\"1131\" height=\"807\" \/><\/a><\/li>\n<li>Once the breakpoint hits, step over the <strong>FillMemory()<\/strong>, <strong>CopyMemory()<\/strong> and <strong>CalculateFibonacci()<\/strong> calls and check the Chronometer window for the timings:<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2019\/06\/09-timings.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-4808\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2019\/06\/09-timings.png\" alt=\"\" width=\"1131\" height=\"807\" \/><\/a>In this example, filling the first 
and the third buffer took exactly the same time, while copying the first buffer to the second one took slightly less time:<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2019\/06\/timing1.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-4812\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2019\/06\/timing1.png\" alt=\"\" width=\"1553\" height=\"339\" \/><\/a><\/li>\n<li>While the DMA cannot be used to compute Fibonacci numbers or to initialize arrays with non-constant values, it can be used for copying data between two memory locations. We will now demonstrate this by replacing the <strong>CopyMemory()<\/strong> function with the following one:\n<pre class=\"\">#include &lt;stm32f4xx_hal_dma.h&gt;\r\n\r\nstatic DMA_HandleTypeDef s_DMAHandle;\r\n\r\nvoid __attribute__((noinline)) CopyMemoryWithDMA()\r\n{\r\n    s_DMAHandle.Instance = DMA2_Stream0;\r\n    s_DMAHandle.Init.Channel = DMA_CHANNEL_0;\r\n\r\n    s_DMAHandle.Init.Direction = DMA_MEMORY_TO_MEMORY;\r\n    s_DMAHandle.Init.PeriphInc = DMA_PINC_ENABLE;\r\n    s_DMAHandle.Init.MemInc = DMA_MINC_ENABLE;\r\n    s_DMAHandle.Init.Mode = DMA_NORMAL;\r\n    s_DMAHandle.Init.Priority = DMA_PRIORITY_VERY_HIGH;\r\n\r\n    s_DMAHandle.Init.PeriphDataAlignment = DMA_PDATAALIGN_WORD;\r\n    s_DMAHandle.Init.MemDataAlignment = DMA_MDATAALIGN_WORD;\r\n\r\n    s_DMAHandle.Init.FIFOMode = DMA_FIFOMODE_ENABLE;\r\n    s_DMAHandle.Init.FIFOThreshold = DMA_FIFO_THRESHOLD_HALFFULL;\r\n\r\n    __DMA2_CLK_ENABLE();\r\n\r\n    HAL_StatusTypeDef status = HAL_DMA_Init(&amp;s_DMAHandle);\r\n    if (status != HAL_OK)\r\n        asm(\"bkpt 255\");\r\n\r\n    status = HAL_DMA_Start(&amp;s_DMAHandle, (uint32_t)s_Buffer1, (uint32_t)s_Buffer2, sizeof(s_Buffer1) \/ sizeof(s_Buffer1[0]));\r\n    if (status != HAL_OK)\r\n        asm(\"bkpt 255\");\r\n\r\n    HAL_DMA_PollForTransfer(&amp;s_DMAHandle, HAL_DMA_FULL_TRANSFER, HAL_MAX_DELAY);\r\n}\r\n<\/pre>\n<p>The function above enables the 
DMA controller #2 (according to the STM32F410RB documentation, DMA controller #1 cannot be used for memory-to-memory transfers) and performs a single memory-to-memory transfer operation between <strong>s_Buffer1<\/strong> and <strong>s_Buffer2<\/strong>.<\/li>\n<li>Run the new program and step over the function calls in <strong>main()<\/strong> to obtain the updated timings:<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2019\/06\/10-dma2.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-4809\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2019\/06\/10-dma2.png\" alt=\"\" width=\"1131\" height=\"807\" \/><\/a>Using the DMA actually took more time than calling <strong>memcpy()<\/strong> due to the initial setup (20 \u00b5s extra); however, the CPU spent most of that time looping inside <strong>HAL_DMA_PollForTransfer()<\/strong>, waiting for the DMA transfer to finish.<\/li>\n<li>As the DMA transfers do not actually involve the CPU, we can easily change the program to compute the Fibonacci numbers in parallel with the DMA transfer:\n<pre class=\"\">    status = HAL_DMA_Start(&amp;s_DMAHandle, (uint32_t)s_Buffer1, (uint32_t)s_Buffer2, sizeof(s_Buffer1) \/ sizeof(s_Buffer1[0]));\r\n    if (status != HAL_OK)\r\n        asm(\"bkpt 255\");\r\n\r\n    CalculateFibonacci();\r\n\r\n    HAL_DMA_PollForTransfer(&amp;s_DMAHandle, HAL_DMA_FULL_TRANSFER, HAL_MAX_DELAY);<\/pre>\n<\/li>\n<li>Run the modified version of the program and observe the new timings:<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2019\/06\/11-parallel.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-4810\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2019\/06\/11-parallel.png\" alt=\"\" width=\"1131\" height=\"807\" \/><\/a><\/li>\n<li>Now the DMA operation ran in parallel with the CalculateFibonacci() function, reducing the overall program time by 21%:<a 
href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2019\/06\/timing2.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-4814\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2019\/06\/timing2.png\" alt=\"\" width=\"1553\" height=\"341\" \/><\/a><\/li>\n<li>The DMA can be especially useful to optimize data transfers between the memory and various on-chip peripherals. E.g. we can modify the example above to output the contents of <strong>s_Buffer1<\/strong> via the UART interface. Before we can do that, add the following function to your program to initialize the on-board UART peripheral:\n<pre class=\"\">#include &lt;stm32f4xx_hal_uart.h&gt;\r\nstatic UART_HandleTypeDef s_UARTHandle;\r\n\r\nstatic void SetupUART()\r\n{\r\n    __USART2_CLK_ENABLE();\r\n    s_UARTHandle.Instance = USART2;\r\n\r\n    s_UARTHandle.Init.BaudRate = 115200;\r\n    s_UARTHandle.Init.WordLength = UART_WORDLENGTH_8B;\r\n    s_UARTHandle.Init.StopBits = UART_STOPBITS_1;\r\n    s_UARTHandle.Init.Parity = UART_PARITY_NONE;\r\n    s_UARTHandle.Init.HwFlowCtl = UART_HWCONTROL_NONE;\r\n    s_UARTHandle.Init.Mode = UART_MODE_TX_RX;\r\n    s_UARTHandle.Init.OverSampling = UART_OVERSAMPLING_16;\r\n\r\n    if (HAL_UART_Init(&amp;s_UARTHandle) != HAL_OK)\r\n        asm(\"bkpt 255\");\r\n\r\n    GPIO_InitTypeDef GPIO_InitStruct;\r\n    __GPIOA_CLK_ENABLE();\r\n\r\n    GPIO_InitStruct.Pin = GPIO_PIN_2;\r\n    GPIO_InitStruct.Mode = GPIO_MODE_AF_PP;\r\n    GPIO_InitStruct.Pull = GPIO_PULLUP;\r\n    GPIO_InitStruct.Speed = GPIO_SPEED_FAST;\r\n    GPIO_InitStruct.Alternate = GPIO_AF7_USART2;\r\n\r\n    HAL_GPIO_Init(GPIOA, &amp;GPIO_InitStruct);\r\n\r\n    GPIO_InitStruct.Pin = GPIO_PIN_3;\r\n    GPIO_InitStruct.Alternate = GPIO_AF7_USART2;\r\n\r\n    HAL_GPIO_Init(GPIOA, &amp;GPIO_InitStruct);\r\n}<\/pre>\n<p>Then call it from main() and try outputting a test string by calling HAL_UART_Transmit():<\/p>\n<pre class=\"\">    SetupUART();\r\n    uint8_t test[] = 
\"test\\n\";\r\n    HAL_UART_Transmit(&amp;s_UARTHandle, test, sizeof(test) - 1, HAL_MAX_DELAY);<\/pre>\n<p>Note that if you are using a board other than the Nucleo-F410RB, you may need to use a different UART (the one that is actually connected to the on-board ST-Link&#8217;s COM port) and different GPIO pins. You can find out the UART\/GPIO configuration for your board by cloning one of ST&#8217;s UART examples via the VisualGDB Embedded Project Wizard.<\/li>\n<li>Verify that the &#8220;test&#8221; output is printed to the COM port:<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2019\/06\/12-com.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-4816\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2019\/06\/12-com.png\" alt=\"\" width=\"1131\" height=\"807\" \/><\/a><\/li>\n<li>If you step into the <strong>HAL_UART_Transmit()<\/strong> function, you will see that the data is sent to the UART peripheral by writing it byte-by-byte into the <strong>USART2-&gt;DR<\/strong> register:<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2019\/06\/13-stepin.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-4817\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2019\/06\/13-stepin.png\" alt=\"\" width=\"1131\" height=\"807\" \/><\/a><\/li>\n<li>The DMA could do exactly that &#8211; copy a given buffer to the <strong>USART2-&gt;DR<\/strong> register byte-by-byte, although there would be several important differences compared to the memory-to-memory operation:\n<ol>\n<li>We cannot use an arbitrary DMA controller. Instead, we need to pick the controller, stream and channel that are connected to the USART2 TX function. This will ensure that the DMA controller will not start transferring another byte until the UART controller is ready to accept it (i.e. 
has finished physically transmitting the previous one).<\/li>\n<li>The DMA direction will need to be changed from <strong>DMA_MEMORY_TO_MEMORY<\/strong> to <strong>DMA_MEMORY_TO_PERIPH<\/strong>.<\/li>\n<li>Unlike the memory-to-memory transfer, where both the source and destination addresses must advance after each transferred word, the memory-to-UART transfer must always write to the same address (the address of <strong>USART2-&gt;DR<\/strong>). This is achieved by changing <strong>DMA_PINC_ENABLE<\/strong> to <strong>DMA_PINC_DISABLE<\/strong>.<\/li>\n<li>Because the UART transfers one byte at a time, both peripheral and memory data alignment need to be set to <strong>BYTE<\/strong> instead of <strong>WORD<\/strong>.<\/li>\n<\/ol>\n<p>You can find out the DMA stream and channel connected to the UART peripheral in the <strong>DMAx request mapping<\/strong> section of your STM32 device&#8217;s reference manual (not the datasheet):<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2019\/06\/port.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-4818\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2019\/06\/port.png\" alt=\"\" width=\"880\" height=\"341\" \/><\/a><\/p>\n<p>The updated <strong>CopyMemoryWithDMA()<\/strong> function outputting the buffer contents to UART will look as follows:<\/p>\n<pre class=\"\">void __attribute__((noinline)) CopyMemoryWithDMAAndCalclateFibonacci()\r\n{\r\n    s_DMAHandle.Instance = DMA1_Stream6;\r\n    s_DMAHandle.Init.Channel = DMA_CHANNEL_4;\r\n\r\n    s_DMAHandle.Init.Direction = DMA_MEMORY_TO_PERIPH;\r\n    s_DMAHandle.Init.PeriphInc = DMA_PINC_DISABLE;\r\n    s_DMAHandle.Init.MemInc = DMA_MINC_ENABLE;\r\n    s_DMAHandle.Init.Mode = DMA_NORMAL;\r\n    s_DMAHandle.Init.Priority = DMA_PRIORITY_VERY_HIGH;\r\n\r\n    s_DMAHandle.Init.PeriphDataAlignment = DMA_PDATAALIGN_BYTE;\r\n    s_DMAHandle.Init.MemDataAlignment = DMA_MDATAALIGN_BYTE;\r\n\r\n    
s_DMAHandle.Init.FIFOMode = DMA_FIFOMODE_ENABLE;\r\n    s_DMAHandle.Init.FIFOThreshold = DMA_FIFO_THRESHOLD_HALFFULL;\r\n\r\n    __DMA1_CLK_ENABLE();\r\n\r\n    HAL_StatusTypeDef status = HAL_DMA_Init(&amp;s_DMAHandle);\r\n    if (status != HAL_OK)\r\n        asm(\"bkpt 255\");\r\n\r\n    status = HAL_DMA_Start(&amp;s_DMAHandle, (uint32_t)s_Buffer1, (uint32_t)&amp;USART2-&gt;DR, sizeof(s_Buffer1));\r\n    if (status != HAL_OK)\r\n        asm(\"bkpt 255\");\r\n\r\n    SET_BIT(USART2-&gt;CR3, USART_CR3_DMAT);\r\n    HAL_DMA_PollForTransfer(&amp;s_DMAHandle, HAL_DMA_FULL_TRANSFER, HAL_MAX_DELAY);\r\n}\r\n<\/pre>\n<\/li>\n<li>Run the new code and verify that the data transferred to the COM port matches the contents of <strong>s_Buffer1<\/strong>: <a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2019\/06\/14-bufs.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-4820\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2019\/06\/14-bufs.png\" alt=\"\" width=\"1131\" height=\"807\" \/><\/a><\/li>\n<li>We can also replace the call to <strong>HAL_DMA_Start()<\/strong> that manually specifies the address of <strong>USART2-&gt;DR<\/strong> and the manual update of the <strong>USART2-&gt;CR3<\/strong> register with a higher-level call to <strong>HAL_UART_Transmit_DMA()<\/strong> that will do the necessary setup automatically:\n<pre class=\"\">    __HAL_LINKDMA(&amp;s_UARTHandle, hdmatx, s_DMAHandle);\r\n    status = HAL_UART_Transmit_DMA(&amp;s_UARTHandle, (uint8_t *)s_Buffer1, sizeof(s_Buffer1));<\/pre>\n<\/li>\n<li>Finally, we will show how to use the DMA in a scenario that is often impossible with CPU-driven data transfers &#8211; generating (or capturing) uninterrupted streams of data. We will update our example to continuously output the stream of Fibonacci numbers to UART without making any breaks by handling the DMA half-transfer interrupts. 
The DMA controller will transfer the same buffer to the UART peripheral over and over, while the CPU computes the next batch of values and places them into the part of the buffer that has already been transferred:<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2019\/06\/timing-1.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-4822\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2019\/06\/timing-1.png\" alt=\"\" width=\"1695\" height=\"1004\" \/><\/a><\/li>\n<li>To do this, change the mode assigned to <strong>s_DMAHandle.Init.Mode<\/strong> from <strong>DMA_NORMAL<\/strong> to <strong>DMA_CIRCULAR<\/strong>, enable the interrupt for the DMA stream you are using (in this example, by calling <strong>HAL_NVIC_EnableIRQ(DMA1_Stream6_IRQn)<\/strong>) and add the following handlers to your main source file:\n<pre class=\"\">extern \"C\" {\r\n    void DMA1_Stream6_IRQHandler()\r\n    {\r\n        HAL_DMA_IRQHandler(&amp;s_DMAHandle);\r\n    }\r\n\r\n    void HAL_UART_TxCpltCallback(UART_HandleTypeDef *huart)\r\n    {\r\n        asm(\"nop\");\r\n    }\r\n\r\n    void HAL_UART_TxHalfCpltCallback(UART_HandleTypeDef *huart)\r\n    {\r\n        asm(\"nop\");\r\n    }\r\n}<\/pre>\n<\/li>\n<li>Set a breakpoint in <strong>HAL_UART_TxHalfCpltCallback()<\/strong>, let it trigger and check the call stack:<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2019\/06\/15-bkpt.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-4823\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2019\/06\/15-bkpt.png\" alt=\"\" width=\"1131\" height=\"807\" \/><\/a>As the call stack shows, the DMA controller has raised an IRQ that invoked the <strong>DMA1_Stream6_IRQHandler()<\/strong> we created. 
In turn, our IRQ handler invoked the standard <strong>HAL_DMA_IRQHandler()<\/strong> function that identified it as a &#8220;Half of DMA buffer transferred&#8221; event, invoking the corresponding handler in the UART driver that finally called our <strong>HAL_UART_TxHalfCpltCallback()<\/strong> function.<\/li>\n<li>We can now modify <strong>HAL_UART_TxHalfCpltCallback()<\/strong> and <strong>HAL_UART_TxCpltCallback()<\/strong> to fill the second and first half of <strong>s_Buffer1<\/strong> respectively with the next batch of Fibonacci numbers. First of all, add the <strong>UpdateFibonacci()<\/strong> function shown below:\n<pre class=\"\">void __attribute__((noinline)) UpdateFibonacci(int *pBuf, size_t count)\r\n{\r\n    static int s_Num1 = 0, s_Num2 = 1;\r\n\r\n    int tmp1 = s_Num1, tmp2 = s_Num2;\r\n\r\n    for (int i = 0; i &lt; (count - 1); i += 2)\r\n    {\r\n        pBuf[i] = tmp1;\r\n        pBuf[i + 1] = tmp2;\r\n\r\n        tmp1 += tmp2;\r\n        tmp2 += tmp1;\r\n    }\r\n\r\n    s_Num1 = tmp1;\r\n    s_Num2 = tmp2;\r\n}<\/pre>\n<p>Then, update the UART callbacks as follows:<\/p>\n<pre class=\"\">    void HAL_UART_TxCpltCallback(UART_HandleTypeDef *huart)\r\n    {\r\n        UpdateFibonacci(s_Buffer1, sizeof(s_Buffer1) \/ sizeof(s_Buffer1[0]) \/ 2);\r\n    }\r\n\r\n    void HAL_UART_TxHalfCpltCallback(UART_HandleTypeDef *huart)\r\n    {\r\n        const int ElementCount = sizeof(s_Buffer1) \/ sizeof(s_Buffer1[0]);\r\n        UpdateFibonacci(s_Buffer1 + ElementCount \/ 2, ElementCount \/ 2);\r\n    }<\/pre>\n<\/li>\n<li>Now you can run the final version of the project and observe a continuous stream of numbers being sent to the COM port:<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2019\/06\/16-nonstop.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-4824\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2019\/06\/16-nonstop.png\" alt=\"\" width=\"1131\" height=\"807\" 
\/><\/a><\/li>\n<\/ol>\n","protected":false},"excerpt":{"rendered":"<p>This tutorial shows how to use the DMA controller on the STM32 devices, letting it perform background memory operations without<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[16],"tags":[180,61],"_links":{"self":[{"href":"https:\/\/visualgdb.com\/w\/wp-json\/wp\/v2\/posts\/4799"}],"collection":[{"href":"https:\/\/visualgdb.com\/w\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/visualgdb.com\/w\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/visualgdb.com\/w\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/visualgdb.com\/w\/wp-json\/wp\/v2\/comments?post=4799"}],"version-history":[{"count":4,"href":"https:\/\/visualgdb.com\/w\/wp-json\/wp\/v2\/posts\/4799\/revisions"}],"predecessor-version":[{"id":6004,"href":"https:\/\/visualgdb.com\/w\/wp-json\/wp\/v2\/posts\/4799\/revisions\/6004"}],"wp:attachment":[{"href":"https:\/\/visualgdb.com\/w\/wp-json\/wp\/v2\/media?parent=4799"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/visualgdb.com\/w\/wp-json\/wp\/v2\/categories?post=4799"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/visualgdb.com\/w\/wp-json\/wp\/v2\/tags?post=4799"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}