{"id":1208,"date":"2015-12-07T18:52:31","date_gmt":"2015-12-08T02:52:31","guid":{"rendered":"http:\/\/visualgdb.com\/w\/?p=1208"},"modified":"2015-12-07T19:06:16","modified_gmt":"2015-12-08T03:06:16","slug":"profiling-a-basic-stm32-application-with-visual-studio","status":"publish","type":"post","link":"https:\/\/visualgdb.com\/tutorials\/profiler\/embedded\/sampling\/","title":{"rendered":"Profiling a basic STM32 application with Visual Studio"},"content":{"rendered":"<p>This tutorial shows how to analyze the performance of an embedded application running on the STM32 board using the sampling profiler included in the Custom and Ultimate editions of VisualGDB 5.1 Preview 1. Before you begin, install the latest preview version of VisualGDB. Also check for updates via the Embedded Tools Manager as the profiling functionality requires the latest versions of the debug method packages.<\/p>\n<ol>\n<li>Start creating a new Embedded project using the VisualGDB Embedded Project Wizard:<a href=\"http:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/01-prj.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1209\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/01-prj.png\" alt=\"01-prj\" width=\"786\" height=\"471\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/01-prj.png 786w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/01-prj-300x180.png 300w\" sizes=\"(max-width: 786px) 100vw, 786px\" \/><\/a><\/li>\n<li>Select &#8220;Create a new project&#8221; -&gt; &#8220;Embedded binary&#8221;:<a href=\"http:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/02-prjtype.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1210\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/02-prjtype.png\" alt=\"02-prjtype\" width=\"688\" height=\"565\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/02-prjtype.png 688w, 
https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/02-prjtype-300x246.png 300w\" sizes=\"(max-width: 688px) 100vw, 688px\" \/><\/a><\/li>\n<li>On the next page, select the ARM toolchain and choose your device from the list. In this tutorial we will use the STM32F4-discovery board that has the STM32F407VG device:<a href=\"http:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/03-device.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1211\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/03-device.png\" alt=\"03-device\" width=\"688\" height=\"565\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/03-device.png 688w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/03-device-300x246.png 300w\" sizes=\"(max-width: 688px) 100vw, 688px\" \/><\/a><\/li>\n<li>In this tutorial we will demonstrate profiling by studying the time spent on various tasks of a basic USB device, so we select the USB Communications Device sample. However, the profiler will work with any other sample as well:<a href=\"http:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/04-sample.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1212\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/04-sample.png\" alt=\"04-sample\" width=\"688\" height=\"565\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/04-sample.png 688w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/04-sample-300x246.png 300w\" sizes=\"(max-width: 688px) 100vw, 688px\" \/><\/a><\/li>\n<li>Select OpenOCD as your debug method. 
Note that the profiler will only work with the debug methods that support live memory evaluation (OpenOCD and Segger J-Link):<a href=\"http:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/05-openocd.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1213\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/05-openocd.png\" alt=\"05-openocd\" width=\"688\" height=\"565\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/05-openocd.png 688w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/05-openocd-300x246.png 300w\" sizes=\"(max-width: 688px) 100vw, 688px\" \/><\/a><\/li>\n<li>Press Finish to create your project. Then put a breakpoint somewhere in main() and hit F5 to build and start it. Ensure that the project can be debugged:<a href=\"http:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/06-debug.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1214\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/06-debug.png\" alt=\"06-debug\" width=\"783\" height=\"619\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/06-debug.png 783w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/06-debug-300x237.png 300w\" sizes=\"(max-width: 783px) 100vw, 783px\" \/><\/a><\/li>\n<li>Now we will start analyzing the performance. 
Select Analyze-&gt;Analyze Performance with VisualGDB (in VS2005-2008 the command will be in the Debug menu instead):<a href=\"http:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/07-analyze.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1215\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/07-analyze.png\" alt=\"07-analyze\" width=\"783\" height=\"619\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/07-analyze.png 783w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/07-analyze-300x237.png 300w\" sizes=\"(max-width: 783px) 100vw, 783px\" \/><\/a><\/li>\n<li>VisualGDB will suggest installing and referencing the embedded profiling framework. Click &#8220;Yes&#8221;: <a href=\"http:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/08-confirm.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1216\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/08-confirm.png\" alt=\"08-confirm\" width=\"783\" height=\"619\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/08-confirm.png 783w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/08-confirm-300x237.png 300w\" sizes=\"(max-width: 783px) 100vw, 783px\" \/><\/a><\/li>\n<li>Before you can actually begin profiling, you need to initialize the profiler from your code. Include the &lt;SysprogsProfiler.h&gt; file in your main file and call the InitializeSamplingProfiler() function after the call to SystemClock_Config():\n<pre class=\"\">#include &lt;SysprogsProfiler.h&gt;\r\n\r\nint main(void)\r\n{\r\n\u00a0\u00a0 \u00a0HAL_Init();\r\n\u00a0\u00a0 \u00a0\r\n\u00a0\u00a0 \u00a0SystemClock_Config();\r\n\u00a0\u00a0 \u00a0InitializeSamplingProfiler();\r\n    \/\/...\r\n}<\/pre>\n<p>Note that <strong>InitializeSamplingProfiler()<\/strong> will not do anything unless you explicitly begin profiling using the corresponding command. 
Hence you can keep it in your code even when you are not profiling.<\/li>\n<li>Since it makes more sense to profile optimized release code, select the release configuration in the configuration manager and build your project:<a href=\"http:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/09-build.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1217\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/09-build.png\" alt=\"09-build\" width=\"783\" height=\"619\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/09-build.png 783w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/09-build-300x237.png 300w\" sizes=\"(max-width: 783px) 100vw, 783px\" \/><\/a><\/li>\n<li>If you encounter any build errors, open VisualGDB Project Properties and ensure that the profiler framework is referenced:<a href=\"http:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/10-frameworks.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1218\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/10-frameworks.png\" alt=\"10-frameworks\" width=\"794\" height=\"640\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/10-frameworks.png 794w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/10-frameworks-300x242.png 300w\" sizes=\"(max-width: 794px) 100vw, 794px\" \/><\/a><\/li>\n<li>Now you can start your profiling session by selecting Analyze-&gt;Analyze Performance with VisualGDB. 
Proceed with the default settings:<a href=\"http:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/11-sample.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1219\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/11-sample.png\" alt=\"11-sample\" width=\"546\" height=\"394\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/11-sample.png 546w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/11-sample-300x216.png 300w\" sizes=\"(max-width: 546px) 100vw, 546px\" \/><\/a><\/li>\n<li>VisualGDB will load your program into the device and begin profiling. Observe the Live Profiling window showing the most frequently encountered functions:\u00a0 <a href=\"http:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/13-vcp-read.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1221\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/13-vcp-read.png\" alt=\"13-vcp-read\" width=\"783\" height=\"619\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/13-vcp-read.png 783w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/13-vcp-read-300x237.png 300w\" sizes=\"(max-width: 783px) 100vw, 783px\" \/><\/a><\/li>\n<li>You can click the file icon on the toolbar to switch from the function-level view to the line-level view. 
VisualGDB will then show the specific lines in your source code that take the most time:<a href=\"http:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/14-linelevel.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1222\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/14-linelevel.png\" alt=\"14-linelevel\" width=\"783\" height=\"619\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/14-linelevel.png 783w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/14-linelevel-300x237.png 300w\" sizes=\"(max-width: 783px) 100vw, 783px\" \/><\/a><\/li>\n<li>The sampling profiler works by configuring one of your hardware timers to fire periodic interrupts. At each interrupt it quickly determines the currently executing line of code and searches the stack for clues about the previous frames. Then it compresses this information and stores it in an internal buffer that is later read by the debugger without stopping your program. The sampling rate is automatically adjusted to minimize buffer overruns. The default implementation uses the TIM2 timer; however, you can easily select another timer by changing the SAMPLING_PROFILER_TIMER_INSTANCE definition in the <strong>SamplingProfiler_&lt;platform name&gt;.cpp<\/strong> file:<a href=\"http:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/instance.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1229\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/instance.png\" alt=\"instance\" width=\"783\" height=\"619\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/instance.png 783w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/instance-300x237.png 300w\" sizes=\"(max-width: 783px) 100vw, 783px\" \/><\/a><\/li>\n<li>Now we will do some more meaningful exploration using the profiler. 
Modify your main() function as follows:\n<pre class=\"\">\u00a0\u00a0\u00a0 char byte;\r\n\u00a0\u00a0 \u00a0while (VCP_read(&amp;byte, 1) != 1)\r\n\u00a0\u00a0 \u00a0{\r\n\u00a0\u00a0 \u00a0}\r\n\u00a0\u00a0 \u00a0\r\n\u00a0\u00a0 \u00a0for (;;)\r\n\u00a0\u00a0 \u00a0{\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0static char buf[512];\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0memset(buf, 'Z', sizeof(buf));\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0VCP_write(buf, sizeof(buf));\r\n\u00a0\u00a0 \u00a0}<\/pre>\n<p>We will use the profiler to see whether running memset() takes considerable time compared to sending the data via USB.<\/li>\n<li>Begin a profiling session, connect to the virtual COM port created by our device using SmarTTY and type some character. The device will reply with a stream of &#8216;Z&#8217;-s:<a href=\"http:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/16-zzz.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1224\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/16-zzz.png\" alt=\"16-zzz\" width=\"674\" height=\"440\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/16-zzz.png 674w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/16-zzz-300x196.png 300w\" sizes=\"(max-width: 674px) 100vw, 674px\" \/><\/a><\/li>\n<li>Once that happens, click the &#8220;reset content&#8221; button in the Live Profiler window to remove the previous records. 
You will see that memset() is running much less than 1% of the time and is by no means a bottleneck:<a href=\"http:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/17-memset.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1225\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/17-memset.png\" alt=\"17-memset\" width=\"783\" height=\"619\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/17-memset.png 783w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/17-memset-300x237.png 300w\" sizes=\"(max-width: 783px) 100vw, 783px\" \/><\/a><\/li>\n<li>If some of the stack frames you see in the Live Profiling window do not make sense, try adding the <strong>-fno-omit-frame-pointer<\/strong> flag to common flags. This will give more context to the stack unwinding logic resulting in more consistent stack frame records:<a href=\"http:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/frameptr.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1233\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/frameptr.png\" alt=\"frameptr\" width=\"729\" height=\"594\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/frameptr.png 729w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/frameptr-300x244.png 300w\" sizes=\"(max-width: 729px) 100vw, 729px\" \/><\/a><\/li>\n<li>Now let&#8217;s see what happens if we run memset() 1000 times per cycle and not just once. 
Add a for() loop before memset(), build your project and start another session:<a href=\"http:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/18-x1k.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1226\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/18-x1k.png\" alt=\"18-x1k\" width=\"783\" height=\"619\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/18-x1k.png 783w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/18-x1k-300x237.png 300w\" sizes=\"(max-width: 783px) 100vw, 783px\" \/><\/a><\/li>\n<li>You will see that memset() now takes around 17% of the time and is beginning to affect performance.<\/li>\n<li>You can view the history of the profiling sessions for each project via the View-&gt;Profiling reports command:<a href=\"http:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/19-reports.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1227\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/19-reports.png\" alt=\"19-reports\" width=\"783\" height=\"619\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/19-reports.png 783w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/19-reports-300x237.png 300w\" sizes=\"(max-width: 783px) 100vw, 783px\" \/><\/a><\/li>\n<li>Simply double-click on a report in the Profiling Reports window and VisualGDB will open it in a separate tab:<a href=\"http:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/20-viewreport.png\"> <img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1228\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/20-viewreport.png\" alt=\"20-viewreport\" width=\"783\" height=\"619\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/20-viewreport.png 783w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2015\/12\/20-viewreport-300x237.png 300w\" sizes=\"(max-width: 783px) 
100vw, 783px\" \/><\/a><\/li>\n<\/ol>\n","protected":false},"excerpt":{"rendered":"<p>This tutorial shows how to analyze the performance of an embedded application running on the STM32 board using the sampling<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[108,107],"tags":[109,61],"_links":{"self":[{"href":"https:\/\/visualgdb.com\/w\/wp-json\/wp\/v2\/posts\/1208"}],"collection":[{"href":"https:\/\/visualgdb.com\/w\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/visualgdb.com\/w\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/visualgdb.com\/w\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/visualgdb.com\/w\/wp-json\/wp\/v2\/comments?post=1208"}],"version-history":[{"count":4,"href":"https:\/\/visualgdb.com\/w\/wp-json\/wp\/v2\/posts\/1208\/revisions"}],"predecessor-version":[{"id":1234,"href":"https:\/\/visualgdb.com\/w\/wp-json\/wp\/v2\/posts\/1208\/revisions\/1234"}],"wp:attachment":[{"href":"https:\/\/visualgdb.com\/w\/wp-json\/wp\/v2\/media?parent=1208"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/visualgdb.com\/w\/wp-json\/wp\/v2\/categories?post=1208"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/visualgdb.com\/w\/wp-json\/wp\/v2\/tags?post=1208"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}