{"id":2149,"date":"2016-11-09T10:28:54","date_gmt":"2016-11-09T18:28:54","guid":{"rendered":"http:\/\/visualgdb.com\/w\/?p=2149"},"modified":"2024-02-28T16:21:25","modified_gmt":"2024-02-29T00:21:25","slug":"optimizing-stm32-usb-performance-with-real-time-watch","status":"publish","type":"post","link":"https:\/\/visualgdb.com\/tutorials\/profiler\/realtime\/usb\/","title":{"rendered":"Optimizing STM32 USB performance with Real-time watch"},"content":{"rendered":"<p>This tutorial shows how to analyze and optimize the performance of a USB device based on the STM32 microcontroller using the VisualGDB real-time watch feature.<\/p>\n<p>We will create a basic firmware that receives the data over USB and sends it back in chunks, will measure the throughput and use the real-time watch to analyze what exactly happens on the device and how to improve the USB performance.<\/p>\n<p>Before you begin, install VisualGDB 5.2 or later.<\/p>\n<ol>\n<li>Start Visual Studio and open the VisualGDB Embedded Project Wizard:<a href=\"http:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/01-prjname4.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2177\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/01-prjname4.png\" alt=\"01-prjname\" width=\"843\" height=\"619\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/01-prjname4.png 843w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/01-prjname4-300x220.png 300w\" sizes=\"(max-width: 843px) 100vw, 843px\" \/><\/a><\/li>\n<li>Proceed with the default &#8220;Embedded binary&#8221; setting:<a href=\"http:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/02-binary.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2178\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/02-binary.png\" alt=\"02-binary\" width=\"766\" height=\"609\" 
srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/02-binary.png 766w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/02-binary-300x239.png 300w\" sizes=\"(max-width: 766px) 100vw, 766px\" \/><\/a><\/li>\n<li>Select the ARM toolchain and choose your device. In this tutorial we will use the STM32F4Discovery board that has the STM32F407VG chip:<a href=\"http:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/03-device2.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2179\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/03-device2.png\" alt=\"03-device\" width=\"765\" height=\"609\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/03-device2.png 765w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/03-device2-300x239.png 300w\" sizes=\"(max-width: 765px) 100vw, 765px\" \/><\/a><\/li>\n<li>On the Sample Selection page choose the &#8220;USB Communications Device&#8221; sample and press &#8220;Next&#8221;:<a href=\"http:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/04-usbcomm.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2180\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/04-usbcomm.png\" alt=\"04-usbcomm\" width=\"768\" height=\"609\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/04-usbcomm.png 768w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/04-usbcomm-300x238.png 300w\" sizes=\"(max-width: 768px) 100vw, 768px\" \/><\/a><\/li>\n<li>Finally select your debugging method. 
The easiest way to get it to work is to select OpenOCD, plug in your board and click &#8220;Detect&#8221; to automatically detect the necessary settings:<a href=\"http:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/05-debug2.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2181\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/05-debug2.png\" alt=\"05-debug\" width=\"768\" height=\"609\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/05-debug2.png 768w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/05-debug2-300x238.png 300w\" sizes=\"(max-width: 768px) 100vw, 768px\" \/><\/a><\/li>\n<li>Press &#8220;Finish&#8221; to generate your project. Then replace the loop inside main() with the following loop:\n<pre class=\"\">\u00a0\u00a0\u00a0 char buffer[4096];\r\n\u00a0\u00a0 \u00a0for (;;)\r\n\u00a0\u00a0 \u00a0{\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0int total = 0;\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0while (total &lt; sizeof(buffer))\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0{\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0int done = VCP_read(buffer + total, sizeof(buffer) - total);\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0total += done;\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0}\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0if (VCP_write(buffer, total) != total)\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0{\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0asm(\"bkpt 255\");\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0}\r\n\u00a0\u00a0 \u00a0}<\/pre>\n<p>This code will read data in 4KB chunks and immediately send it back. For simplicity we don&#8217;t apply any transformation to the received data and just send it back as is.<\/li>\n<li>Before you can try out the code, go to the definition of VCP_write() and check that the loop at the beginning of it looks as shown below. 
If not, correct it:\n<pre class=\"\">\u00a0\u00a0\u00a0 if (size &gt; CDC_DATA_HS_OUT_PACKET_SIZE)\r\n\u00a0\u00a0 \u00a0{\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0for (int offset = 0; offset &lt; size; )\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0{\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0int todo = MIN(CDC_DATA_HS_OUT_PACKET_SIZE,\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0size - offset);\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0int done = VCP_write(((char *)pBuffer) + offset, todo);\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0if (done != todo)\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0return offset + done;\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0offset += done;\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0}\r\n\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0return size;\r\n\u00a0\u00a0 \u00a0}<\/pre>\n<\/li>\n<li>Switch the configuration to &#8216;release&#8217;, build it and run with F5:<a href=\"http:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/07-main.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2182\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/07-main.png\" alt=\"07-main\" width=\"1124\" height=\"760\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/07-main.png 1124w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/07-main-300x203.png 300w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/07-main-1024x692.png 1024w\" sizes=\"(max-width: 1124px) 100vw, 1124px\" \/><\/a><\/li>\n<li>Locate the USB device in the Device Manager and note its COM port number:<a href=\"http:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/devicemgr.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2205\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/devicemgr.png\" alt=\"devicemgr\" 
width=\"779\" height=\"439\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/devicemgr.png 779w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/devicemgr-300x169.png 300w\" sizes=\"(max-width: 779px) 100vw, 779px\" \/><\/a><\/li>\n<li>We will use the following benchmark program to measure the USB performance. Build it in another Visual Studio instance and run it:\n<pre class=\"\">using System;\r\nusing System.IO.Ports;\r\nusing System.Threading;\r\n\r\nnamespace USBBenchmark\r\n{\r\n\u00a0\u00a0\u00a0 class Program\r\n\u00a0\u00a0\u00a0 {\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 static void Main(string[] args)\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 {\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 SerialPort port = new SerialPort(\"COM6\");\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 port.Open();\r\n\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 var buf = new byte[65536];\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 for (int i = 0; i &lt; buf.Length; i++)\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 buf[i] = (byte)i;\r\n\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 new Thread(() =&gt; ReadThread(port)).Start();\r\n\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 DateTime start = DateTime.Now;\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 long bytesWritten = 0;\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 for (;;)\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 {\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 port.Write(buf, 0, buf.Length);\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 bytesWritten += 
buf.Length;\r\n\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 double msec = (DateTime.Now - start).TotalMilliseconds;\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 double bytesPerSecond = (bytesWritten * 1000) \/ msec;\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 double kbPerSecond = bytesPerSecond \/ 1024;\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Console.Write($\"\\rWritten: {bytesWritten \/ 1024}KB; time: {msec \/ 1000:f1} sec; Average speed: {kbPerSecond:f0} KB\/sec\");\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 }\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 }\r\n\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 static void ReadThread(SerialPort port)\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 {\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 var buf = new byte[65536];\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 for (;;)\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 {\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 int done = port.Read(buf, 0, buf.Length);\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 }\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 }\r\n\u00a0\u00a0\u00a0 }\r\n}<\/pre>\n<\/li>\n<li>Note down the speed it shows. 
In this example we have measured 309 KB\/sec:<a href=\"http:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/08-benchmark.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2183\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/08-benchmark.png\" alt=\"08-benchmark\" width=\"599\" height=\"253\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/08-benchmark.png 599w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/08-benchmark-300x127.png 300w\" sizes=\"(max-width: 599px) 100vw, 599px\" \/><\/a><\/li>\n<li>Now we will use the real-time watch to obtain some measurements and look for optimization opportunities. Before doing that, verify that the oscillator speed specified in the HAL configuration file matches the one on your board. The easiest way to check it is to insert the following loop in <strong>main()<\/strong> and confirm that the value of <strong>g_Counter<\/strong> increases exactly once per second:\n<pre class=\"\">\u00a0\u00a0\u00a0 for (;;)\r\n\u00a0\u00a0 \u00a0{\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0HAL_Delay(1000);\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0g_Counter++;\r\n\u00a0\u00a0 \u00a0}<\/pre>\n<\/li>\n<li>For STM32F4Discovery the default value in <strong>stm32f4xx_hal_conf.h<\/strong> is incorrect and the 1-second delay will actually take about 3 seconds:<a href=\"http:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/09-speed.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2184\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/09-speed.png\" alt=\"09-speed\" width=\"1124\" height=\"760\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/09-speed.png 1124w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/09-speed-300x203.png 300w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/09-speed-1024x692.png 1024w\" sizes=\"(max-width: 1124px) 100vw, 1124px\" 
\/><\/a><\/li>\n<li>Adjust the value if needed (on STM32F4Discovery the HSE_VALUE should be set to 8 MHz):<a href=\"http:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/10-hse.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2185\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/10-hse.png\" alt=\"10-hse\" width=\"1124\" height=\"760\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/10-hse.png 1124w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/10-hse-300x203.png 300w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/10-hse-1024x692.png 1024w\" sizes=\"(max-width: 1124px) 100vw, 1124px\" \/><\/a><\/li>\n<li>Now we can finally take some measurements. Go to the Dynamic Analysis page of VisualGDB Project Properties and check the &#8220;Allow tracing function calls&#8221; checkbox:<a href=\"http:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/11-profiler.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2186\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/11-profiler.png\" alt=\"11-profiler\" width=\"1179\" height=\"777\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/11-profiler.png 1179w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/11-profiler-300x198.png 300w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/11-profiler-1024x675.png 1024w\" sizes=\"(max-width: 1179px) 100vw, 1179px\" \/><\/a>If you have not referenced the profiler framework before, click the &#8220;Add Reference Manually&#8221; link and build your project.<\/li>\n<li>Run the project again and measure the speed. Instrumenting the functions slows them down, so the overall performance will reduce by around 10%. 
When you compare measurements taken with instrumentation turned on, use the reduced value as your baseline:<a href=\"http:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/12-bench2.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2187\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/12-bench2.png\" alt=\"12-bench2\" width=\"781\" height=\"394\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/12-bench2.png 781w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/12-bench2-300x151.png 300w\" sizes=\"(max-width: 781px) 100vw, 781px\" \/><\/a><\/li>\n<li>The first thing we will measure is the timing of VCP_read() and VCP_write(). Stop your program once the benchmark is running, add those functions to the real-time watch window and resume your program to collect some data. VisualGDB will show many calls to <strong>VCP_read()<\/strong> and a few calls to <strong>VCP_write()<\/strong> interrupted by profiling buffer overflows:<a href=\"http:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/14-overflows.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2189\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/14-overflows.png\" alt=\"14-overflows\" width=\"1124\" height=\"760\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/14-overflows.png 1124w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/14-overflows-300x203.png 300w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/14-overflows-1024x692.png 1024w\" sizes=\"(max-width: 1124px) 100vw, 1124px\" \/><\/a><\/li>\n<li>The buffer overflows happen when too much real-time data is captured and the program needs to wait for VisualGDB to read it. Most of the time <strong>VCP_read()<\/strong> immediately returns 0, so it generates huge amounts of real-time data and quickly fills the buffer. 
In order to get a precise picture of what is happening inside the loop we need to capture at least one entire loop iteration (i.e. the time between 2 calls to <strong>VCP_write()<\/strong>) without any buffer overflows. We can do this by either reducing the amount of measured data (e.g. by measuring some block inside VCP_read() that does not get invoked too often) or by simply increasing the buffer size. In this example we will use the second approach and increase the buffer size to 32KB:<a href=\"http:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/15-buffer.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2190\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/15-buffer.png\" alt=\"15-buffer\" width=\"1179\" height=\"777\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/15-buffer.png 1179w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/15-buffer-300x198.png 300w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/15-buffer-1024x675.png 1024w\" sizes=\"(max-width: 1179px) 100vw, 1179px\" \/><\/a><\/li>\n<li>Build and run your program, start the benchmark and put a breakpoint at the beginning of the loop. 
Then add <strong>VCP_read()<\/strong> and <strong>VCP_write()<\/strong> to real-time watch and enable the &#8216;stop on overflow&#8217; option:<a href=\"http:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/16-autostop.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2191\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/16-autostop.png\" alt=\"16-autostop\" width=\"1124\" height=\"760\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/16-autostop.png 1124w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/16-autostop-300x203.png 300w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/16-autostop-1024x692.png 1024w\" sizes=\"(max-width: 1124px) 100vw, 1124px\" \/><\/a><\/li>\n<li>Then resume your program and wait until it stops on an overflow event. Now we have enough data to see what is going on. It quickly becomes obvious from the graph that the <strong>VCP_write()<\/strong> function does not do any buffering and the next loop iteration does not start until the write() returns:<a href=\"http:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/17-reads.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2192\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/17-reads.png\" alt=\"17-reads\" width=\"1124\" height=\"760\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/17-reads.png 1124w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/17-reads-300x203.png 300w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/17-reads-1024x692.png 1024w\" sizes=\"(max-width: 1124px) 100vw, 1124px\" \/><\/a><\/li>\n<li>You could zoom in to see that most of the time <strong>VCP_read()<\/strong> returns immediately and sometimes it takes more time (when it has data to return):<a href=\"http:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/18-zoomread.png\"><img loading=\"lazy\" decoding=\"async\" 
class=\"aligncenter size-full wp-image-2193\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/18-zoomread.png\" alt=\"18-zoomread\" width=\"1124\" height=\"760\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/18-zoomread.png 1124w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/18-zoomread-300x203.png 300w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/18-zoomread-1024x692.png 1024w\" sizes=\"(max-width: 1124px) 100vw, 1124px\" \/><\/a><\/li>\n<li>The time between 2 calls to <strong>VCP_write()<\/strong> should be a good measure of how much time it takes to transfer 4 KB of data. In our example it was 14 msec, which corresponds to 4 KB * 1000 \/ 14 = 285 KB\/sec. This is about 2% more than the average throughput measured before (slight variance in measurements is normal due to USB bus events):<a href=\"http:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/19-cycle.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2194\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/19-cycle.png\" alt=\"19-cycle\" width=\"1124\" height=\"760\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/19-cycle.png 1124w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/19-cycle-300x203.png 300w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/19-cycle-1024x692.png 1024w\" sizes=\"(max-width: 1124px) 100vw, 1124px\" \/><\/a><\/li>\n<li>The easiest way to see what is happening on the USB bus is to modify the USB interrupt handler to generate custom events each time a &#8216;read complete&#8217; or &#8216;write complete&#8217; interrupt arrives:\n<pre class=\"\">#include &lt;CustomRealTimeWatches.h&gt;\r\nEventStreamWatch g_IN, g_OUT;\r\n\r\nvoid OTG_FS_IRQHandler(void)\r\n{\r\n\u00a0\u00a0 \u00a0if (__HAL_PCD_GET_FLAG(&amp;hpcd, USB_OTG_GINTSTS_IEPINT))\r\n\u00a0\u00a0 \u00a0{\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 
\u00a0g_IN.ReportEvent(\"in\");\r\n\u00a0\u00a0 \u00a0}\r\n\u00a0\u00a0 \u00a0if (__HAL_PCD_GET_FLAG(&amp;hpcd, USB_OTG_GINTSTS_OEPINT))\r\n\u00a0\u00a0 \u00a0{\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0g_OUT.ReportEvent(\"out\");\r\n\u00a0\u00a0 \u00a0}\r\n\u00a0\u00a0 \u00a0\r\n\u00a0\u00a0 \u00a0HAL_PCD_IRQHandler(&amp;hpcd);\r\n}<\/pre>\n<\/li>\n<li>Build the new code and add <strong>g_IN<\/strong> and <strong>g_OUT<\/strong> to the real-time watches. This confirms that reading and writing do not happen in parallel and suggests a way to optimize the transfer:<a href=\"http:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/21-events.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2195\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/21-events.png\" alt=\"21-events\" width=\"1124\" height=\"760\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/21-events.png 1124w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/21-events-300x203.png 300w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/21-events-1024x692.png 1024w\" sizes=\"(max-width: 1124px) 100vw, 1124px\" \/><\/a><\/li>\n<li>If you look into the <strong>VCP_write()<\/strong> function, you will see that it waits for the <strong>TxState<\/strong> field of the CDC state structure to be 0 before it returns. 
Quickly checking for references to TxState (write references are shown in red) shows that it&#8217;s assigned from the <strong>USBD_CDC_DataIn()<\/strong> function that is called when a USB packet is successfully sent:<a href=\"http:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/22-datain.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2196\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/22-datain.png\" alt=\"22-datain\" width=\"1124\" height=\"760\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/22-datain.png 1124w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/22-datain-300x203.png 300w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/22-datain-1024x692.png 1024w\" sizes=\"(max-width: 1124px) 100vw, 1124px\" \/><\/a><\/li>\n<li>We will now modify the <strong>VCP_write()<\/strong> function to be asynchronous, i.e. to start a multi-packet transfer and return immediately.\u00a0 First of all, add a TxTotalRemainingLength field to USBD_CDC_HandleTypeDef:\n<pre class=\"\">\u00a0\u00a0\u00a0 typedef struct\r\n\u00a0\u00a0 \u00a0{\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0uint32_t data[CDC_DATA_HS_MAX_PACKET_SIZE \/ 4];\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0uint8_t\u00a0 CmdOpCode;\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0uint8_t\u00a0 CmdLength;\u00a0\u00a0 \u00a0\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0uint8_t\u00a0 *RxBuffer; \u00a0\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0uint8_t\u00a0 *TxBuffer;\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0uint32_t RxLength;\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0uint32_t TxLength;\u00a0\u00a0 \u00a0\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0uint32_t TxTotalRemainingLength;\r\n\u00a0 \r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0__IO uint32_t TxState;\u00a0\u00a0\u00a0 \u00a0\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0__IO uint32_t RxState;\u00a0\u00a0 \u00a0\r\n\u00a0\u00a0 \u00a0}\r\n\u00a0\u00a0 
\u00a0USBD_CDC_HandleTypeDef;<\/pre>\n<p>Then change USBD_CDC_SetTxBuffer() as follows:<\/p>\n<pre class=\"\">uint8_t\u00a0 USBD_CDC_SetTxBuffer(USBD_HandleTypeDef\u00a0\u00a0 *pdev,\r\n\u00a0\u00a0 \u00a0uint8_t\u00a0 *pbuff,\r\n\u00a0\u00a0 \u00a0int packet_length,\r\n\u00a0\u00a0 \u00a0int total_length)\r\n{\r\n\u00a0\u00a0 \u00a0USBD_CDC_HandleTypeDef\u00a0\u00a0 *hcdc = (USBD_CDC_HandleTypeDef*) pdev-&gt;pClassData;\r\n\u00a0 \r\n\u00a0\u00a0 \u00a0hcdc-&gt;TxBuffer = pbuff;\r\n\u00a0\u00a0 \u00a0hcdc-&gt;TxLength = MIN(packet_length, total_length);\r\n\u00a0\u00a0 \u00a0hcdc-&gt;TxTotalRemainingLength = total_length;\r\n\u00a0 \r\n\u00a0\u00a0 \u00a0return USBD_OK; \u00a0\r\n}<\/pre>\n<p>Then update USBD_CDC_DataIn() to immediately start another packet if TxTotalRemainingLength is not 0:<\/p>\n<pre class=\"\">static uint8_t\u00a0 USBD_CDC_DataIn(USBD_HandleTypeDef *pdev, uint8_t epnum)\r\n{\r\n\u00a0\u00a0 \u00a0USBD_CDC_HandleTypeDef\u00a0\u00a0 *hcdc = (USBD_CDC_HandleTypeDef*) pdev-&gt;pClassData;\r\n\u00a0 \r\n\u00a0\u00a0 \u00a0if (pdev-&gt;pClassData != NULL)\r\n\u00a0\u00a0 \u00a0{\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0hcdc-&gt;TxTotalRemainingLength -= hcdc-&gt;TxLength;\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0hcdc-&gt;TxBuffer += hcdc-&gt;TxLength;\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0if (hcdc-&gt;TxTotalRemainingLength)\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0{\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0if (hcdc-&gt;TxLength &gt; hcdc-&gt;TxTotalRemainingLength)\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0hcdc-&gt;TxLength = hcdc-&gt;TxTotalRemainingLength;\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0hcdc-&gt;TxState = 2;\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0if (USBD_CDC_TransmitPacket(pdev) != USBD_OK)\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0asm(\"bkpt 255\");\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 
\u00a0}\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0else\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0hcdc-&gt;TxState = 0;\r\n\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0return USBD_OK;\r\n\u00a0\u00a0 \u00a0}\r\n\u00a0\u00a0 \u00a0else\r\n\u00a0\u00a0 \u00a0{\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0return USBD_FAIL;\r\n\u00a0\u00a0 \u00a0}\r\n}<\/pre>\n<p>Finally rename VCP_write() to VCP_write_async() and change its contents as follows:<\/p>\n<pre class=\"\">int VCP_write_async(const void *pBuffer, int size)\r\n{\r\n\u00a0\u00a0 \u00a0USBD_CDC_HandleTypeDef *pCDC = (USBD_CDC_HandleTypeDef *)USBD_Device.pClassData;\r\n\u00a0\u00a0 \u00a0while (pCDC-&gt;TxState) {} \/\/Wait for previous transfer\r\n\r\n\u00a0\u00a0 \u00a0USBD_CDC_SetTxBuffer(&amp;USBD_Device, (uint8_t *)pBuffer, CDC_DATA_FS_OUT_PACKET_SIZE, size);\r\n\u00a0\u00a0 \u00a0if (USBD_CDC_TransmitPacket(&amp;USBD_Device) != USBD_OK)\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0return 0;\r\n\r\n\u00a0\u00a0 \u00a0return size;\r\n}<\/pre>\n<\/li>\n<li>Now we can change the loop inside main() to use 2 buffers: one for sending the previous data chunk and another one for receiving a new one:\n<pre class=\"\">    const int bufferSize = 4096;\r\n\u00a0\u00a0 \u00a0char buffer1[bufferSize], buffer2[bufferSize];\r\n\u00a0\u00a0 \u00a0for (int iter = 0;;iter++)\r\n\u00a0\u00a0 \u00a0{\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0char *pBuffer = (iter % 2) ? 
buffer1 : buffer2;\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0int total = 0;\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0while (total &lt; bufferSize)\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0{\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0int done = VCP_read(pBuffer + total, bufferSize - total);\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0total += done;\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0}\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0if (VCP_write_async(pBuffer, total) != total)\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0{\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0asm(\"bkpt 255\");\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0}\r\n\u00a0\u00a0 \u00a0}<\/pre>\n<\/li>\n<li>Run the updated program and measure one full cycle in the real-time watch:<a href=\"http:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/23-cycle2.png\"> <img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2197\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/23-cycle2.png\" alt=\"23-cycle2\" width=\"1124\" height=\"760\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/23-cycle2.png 1124w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/23-cycle2-300x203.png 300w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/23-cycle2-1024x692.png 1024w\" sizes=\"(max-width: 1124px) 100vw, 1124px\" \/><\/a><\/li>\n<li>Read and write interrupts now overlap, and the time between calls to VCP_write_async() dropped to 11 msec, which corresponds to 4 KB * 1000 \/ 11 = 363 KB\/sec (~1.3x faster). 
Confirm this by pausing real-time watches and running the benchmark:<a href=\"http:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/24-speed2.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2198\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/24-speed2.png\" alt=\"24-speed2\" width=\"705\" height=\"273\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/24-speed2.png 705w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/24-speed2-300x116.png 300w\" sizes=\"(max-width: 705px) 100vw, 705px\" \/><\/a>Note that running the benchmark with real-time watches active will result in a much lower speed, because each time the real-time data buffer fills up, the program stalls until VisualGDB reads it.<\/li>\n<li>Another observation clearly visible from the real-time watch window is that sending data to the computer (IN endpoint) takes less time than receiving it (OUT endpoint):<a href=\"http:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/25-imbalance.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2199\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/25-imbalance.png\" alt=\"25-imbalance\" width=\"1124\" height=\"760\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/25-imbalance.png 1124w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/25-imbalance-300x203.png 300w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/25-imbalance-1024x692.png 1024w\" sizes=\"(max-width: 1124px) 100vw, 1124px\" \/><\/a><\/li>\n<li>If you compare the reading and writing code, you will see that we are sending data in 512-byte packets and receiving it in 64-byte ones. 512-byte packets are supported in USB 2.0 High Speed mode (but not in Full Speed mode) and cause strange timing side effects here. 
Adjust the <strong>VCP_write_async()<\/strong> function to use the 64-byte packets (CDC_DATA_FS_OUT_PACKET_SIZE):<a href=\"http:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/26-fspacket.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2200\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/26-fspacket.png\" alt=\"26-fspacket\" width=\"1124\" height=\"760\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/26-fspacket.png 1124w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/26-fspacket-300x203.png 300w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/26-fspacket-1024x692.png 1024w\" sizes=\"(max-width: 1124px) 100vw, 1124px\" \/><\/a><\/li>\n<li>This balances reading and writing times and reduces the cycle time to ~8 msec:<a href=\"http:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/27-newcycle.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2201\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/27-newcycle.png\" alt=\"27-newcycle\" width=\"1124\" height=\"760\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/27-newcycle.png 1124w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/27-newcycle-300x203.png 300w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/27-newcycle-1024x692.png 1024w\" sizes=\"(max-width: 1124px) 100vw, 1124px\" \/><\/a><\/li>\n<li>Disable the function tracing on the Dynamic Analysis page to get the maximum possible speed and run the benchmark again:<a href=\"http:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/28-notrace.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2202\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/28-notrace.png\" alt=\"28-notrace\" width=\"1179\" height=\"777\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/28-notrace.png 1179w, 
https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/28-notrace-300x198.png 300w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/28-notrace-1024x675.png 1024w\" sizes=\"(max-width: 1179px) 100vw, 1179px\" \/><\/a><\/li>\n<li>In our measurements, parallelizing reading and writing and balancing the IN\/OUT packets raised the speed to 445 KB\/sec (1.44x):<a href=\"http:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/29-fastest.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2203\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/29-fastest.png\" alt=\"29-fastest\" width=\"754\" height=\"273\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/29-fastest.png 754w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/29-fastest-300x109.png 300w\" sizes=\"(max-width: 754px) 100vw, 754px\" \/><\/a><\/li>\n<li>You can also quickly check how consistent the transfer speed is over time by adding a scalar real-time watch:\n<pre class=\"\">int g_BytesTransferred;\r\nScalarRealTimeWatch g_BytesTransferredWatch;\r\n\r\nint main(void)\r\n{\r\n\u00a0\u00a0 \u00a0\/\/...\r\n\u00a0\u00a0 \u00a0\r\n\u00a0\u00a0 \u00a0const int bufferSize = 4096;\r\n\u00a0\u00a0 \u00a0char buffer1[bufferSize], buffer2[bufferSize];\r\n\u00a0\u00a0 \u00a0for (int iter = 0;;iter++)\r\n\u00a0\u00a0 \u00a0{\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0int total = 0;\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0\/\/...\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0g_BytesTransferred += total;\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0g_BytesTransferredWatch.ReportValue(g_BytesTransferred);\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0\/\/...\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0\r\n\u00a0\u00a0 \u00a0}\r\n}<\/pre>\n<\/li>\n<li>Adding <strong>g_BytesTransferredWatch<\/strong> to the real-time watch will show how quickly the amount of 
transferred data grows over time:<a href=\"http:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/linear.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2206\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/linear.png\" alt=\"linear\" width=\"1234\" height=\"441\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/linear.png 1234w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/linear-300x107.png 300w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2016\/10\/linear-1024x366.png 1024w\" sizes=\"(max-width: 1234px) 100vw, 1234px\" \/><\/a>Measuring this will not introduce any significant slowdown, because a real-time event is produced only once per 4KB block of data, so VisualGDB has plenty of time to process the events without stopping your program.<\/li>\n<\/ol>\n","protected":false},"excerpt":{"rendered":"<p>This tutorial shows how to analyze and optimize the performance of a USB device based on the STM32 microcontroller 
using<\/p>\n","protected":false},"author":1,"featured_media":2207,"comment_status":"closed","ping_status":"closed","sticky":true,"template":"","format":"standard","meta":{"footnotes":""},"categories":[135],"tags":[109],"_links":{"self":[{"href":"https:\/\/visualgdb.com\/w\/wp-json\/wp\/v2\/posts\/2149"}],"collection":[{"href":"https:\/\/visualgdb.com\/w\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/visualgdb.com\/w\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/visualgdb.com\/w\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/visualgdb.com\/w\/wp-json\/wp\/v2\/comments?post=2149"}],"version-history":[{"count":2,"href":"https:\/\/visualgdb.com\/w\/wp-json\/wp\/v2\/posts\/2149\/revisions"}],"predecessor-version":[{"id":2208,"href":"https:\/\/visualgdb.com\/w\/wp-json\/wp\/v2\/posts\/2149\/revisions\/2208"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/visualgdb.com\/w\/wp-json\/wp\/v2\/media\/2207"}],"wp:attachment":[{"href":"https:\/\/visualgdb.com\/w\/wp-json\/wp\/v2\/media?parent=2149"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/visualgdb.com\/w\/wp-json\/wp\/v2\/categories?post=2149"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/visualgdb.com\/w\/wp-json\/wp\/v2\/tags?post=2149"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}