{"id":3454,"date":"2017-10-31T21:24:48","date_gmt":"2017-11-01T04:24:48","guid":{"rendered":"https:\/\/visualgdb.com\/w\/?p=3454"},"modified":"2017-10-31T22:21:48","modified_gmt":"2017-11-01T05:21:48","slug":"profiling-linux-c-code-with-visual-studio","status":"publish","type":"post","link":"https:\/\/visualgdb.com\/tutorials\/profiler\/linux\/","title":{"rendered":"Profiling Linux C++ Code with Visual Studio"},"content":{"rendered":"<p>This tutorial shows how to profile C++ code using Visual Studio, valgrind and VisualGDB. We will show how to import the <a href=\"https:\/\/github.com\/nlohmann\/json\">JSON for Modern C++<\/a> parser (as of 31 October 2017) into a Visual Studio project, build it under Linux, run a benchmark, quickly identify the code consuming most of the time and optimize it. We will increase the performance of the release build by over 1.5x by locating and eliminating a few bottlenecks.<\/p>\n<p>The techniques shown in this tutorial can work with any other C\/C++ code as well; we are using the JSON parser to simply demonstrate the functionality on a real-world project rather than on a synthetic example.<\/p>\n<p>Before you begin, install Visual Studio and VisualGDB 5.3 or later. 
Also prepare a Linux machine that will be used to build and profile the JSON benchmark.<\/p>\n<ol>\n<li>Start Visual Studio and open the VisualGDB Linux Project Wizard:<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/01-newproj.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3455\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/01-newproj.png\" alt=\"01-newproj\" width=\"855\" height=\"489\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/01-newproj.png 855w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/01-newproj-300x172.png 300w\" sizes=\"(max-width: 855px) 100vw, 855px\" \/><\/a><\/li>\n<li>On the first page of the wizard select &#8220;Import a project -&gt; Import a CMake project -&gt; Use the advanced CMake Project Subsystem&#8221;:<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/02-import.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3456\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/02-import.png\" alt=\"02-import\" width=\"822\" height=\"662\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/02-import.png 822w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/02-import-300x242.png 300w\" sizes=\"(max-width: 822px) 100vw, 822px\" \/><\/a><\/li>\n<li>On the next page select the Linux machine you want to use for building:<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/03-machine.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3457\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/03-machine.png\" alt=\"03-machine\" width=\"822\" height=\"662\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/03-machine.png 822w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/03-machine-300x242.png 300w\" sizes=\"(max-width: 822px) 100vw, 822px\" 
\/><\/a><\/li>\n<li>Connect to your Linux machine over SSH and clone the https:\/\/github.com\/nlohmann\/json repository to a local directory:<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/04-clone.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3458\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/04-clone.png\" alt=\"04-clone\" width=\"1031\" height=\"367\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/04-clone.png 1031w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/04-clone-300x107.png 300w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/04-clone-1024x365.png 1024w\" sizes=\"(max-width: 1031px) 100vw, 1031px\" \/><\/a><\/li>\n<li>Get back to VisualGDB Project Wizard and specify the directory where you cloned the repository:<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/05-dir.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3459\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/05-dir.png\" alt=\"05-dir\" width=\"822\" height=\"662\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/05-dir.png 822w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/05-dir-300x242.png 300w\" sizes=\"(max-width: 822px) 100vw, 822px\" \/><\/a><\/li>\n<li>Although VisualGDB can access the remote files directly over SSH, it is recommended to copy them locally if you want to use the profiler. 
This will help VisualGDB adjust the function locations reported by valgrind based on the actual function declarations:<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/06-access.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3460\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/06-access.png\" alt=\"06-access\" width=\"822\" height=\"662\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/06-access.png 822w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/06-access-300x242.png 300w\" sizes=\"(max-width: 822px) 100vw, 822px\" \/><\/a><\/li>\n<li>Proceed with the default settings on the last page:<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/07-cmake.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3461\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/07-cmake.png\" alt=\"07-cmake\" width=\"822\" height=\"662\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/07-cmake.png 822w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/07-cmake-300x242.png 300w\" sizes=\"(max-width: 822px) 100vw, 822px\" \/><\/a><\/li>\n<li>Press &#8220;Finish&#8221; to create the project. 
VisualGDB will automatically detect the targets declared in the original CMakeLists.txt file and show them in Solution Explorer:<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/08-prj.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3462\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/08-prj.png\" alt=\"08-prj\" width=\"1123\" height=\"740\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/08-prj.png 1123w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/08-prj-300x198.png 300w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/08-prj-1024x675.png 1024w\" sizes=\"(max-width: 1123px) 100vw, 1123px\" \/><\/a><\/li>\n<li>The original CMakeLists.txt files don&#8217;t contain a target for the benchmark. You can add one manually by right-clicking on the .vgdbcmake project and selecting &#8220;Add-&gt;New Items-&gt;Executable&#8221;. Add an executable called &#8220;benchmark&#8221; in the &#8220;benchmarks&#8221; subdirectory:<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/10-bench.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3463\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/10-bench.png\" alt=\"10-bench\" width=\"1123\" height=\"740\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/10-bench.png 1123w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/10-bench-300x198.png 300w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/10-bench-1024x675.png 1024w\" sizes=\"(max-width: 1123px) 100vw, 1123px\" \/><\/a><\/li>\n<li>VisualGDB will automatically create the CMakeLists.txt file in that subdirectory. 
Remove the auto-generated source file and reference the benchmarks.cpp file instead:<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/11-simple.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3464\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/11-simple.png\" alt=\"11-simple\" width=\"1123\" height=\"740\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/11-simple.png 1123w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/11-simple-300x198.png 300w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/11-simple-1024x675.png 1024w\" sizes=\"(max-width: 1123px) 100vw, 1123px\" \/><\/a><\/li>\n<li>Add &#8220;thirdparty\/benchpress&#8221; and &#8220;thirdparty\/cxxopts&#8221; to the include directories for the new target:<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/12-dirs.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3465\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/12-dirs.png\" alt=\"12-dirs\" width=\"1123\" height=\"740\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/12-dirs.png 1123w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/12-dirs-300x198.png 300w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/12-dirs-1024x675.png 1024w\" sizes=\"(max-width: 1123px) 100vw, 1123px\" \/><\/a><\/li>\n<li>Add &#8220;pthread&#8221; to the linked library names:<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/13-pthread.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3466\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/13-pthread.png\" alt=\"13-pthread\" width=\"1123\" height=\"740\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/13-pthread.png 1123w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/13-pthread-300x198.png 300w, 
https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/13-pthread-1024x675.png 1024w\" sizes=\"(max-width: 1123px) 100vw, 1123px\" \/><\/a><\/li>\n<li>To make profiling faster, we will comment out all benchmarks except &#8220;parse canada.json&#8221; and focus on optimizing the parser for this set. Build the target by selecting &#8220;Build Target(s)&#8221; in Solution Explorer:<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/15-build1.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3467\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/15-build1.png\" alt=\"15-build\" width=\"1123\" height=\"740\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/15-build1.png 1123w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/15-build1-300x198.png 300w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/15-build1-1024x675.png 1024w\" sizes=\"(max-width: 1123px) 100vw, 1123px\" \/><\/a><\/li>\n<li>Ensure you have the <strong>RelWithDebInfo<\/strong> configuration selected. 
The regular <strong>Release<\/strong> configuration doesn&#8217;t generate debug symbols, so VisualGDB won&#8217;t be able to highlight performance of individual code lines:<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/16-built.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3468\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/16-built.png\" alt=\"16-built\" width=\"1123\" height=\"740\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/16-built.png 1123w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/16-built-300x198.png 300w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/16-built-1024x675.png 1024w\" sizes=\"(max-width: 1123px) 100vw, 1123px\" \/><\/a><\/li>\n<li>Open Debug Properties for the target and set the working directory to the &#8220;benchmarks&#8221; subdirectory:<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/17-workdir.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3469\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/17-workdir.png\" alt=\"17-workdir\" width=\"1123\" height=\"740\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/17-workdir.png 1123w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/17-workdir-300x198.png 300w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/17-workdir-1024x675.png 1024w\" sizes=\"(max-width: 1123px) 100vw, 1123px\" \/><\/a><\/li>\n<li>Run the program without debugger by pressing Ctrl-F5 and take note of the parsing time:<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/18-time.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3470\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/18-time.png\" alt=\"18-time\" width=\"806\" height=\"293\" 
srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/18-time.png 806w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/18-time-300x109.png 300w\" sizes=\"(max-width: 806px) 100vw, 806px\" \/><\/a><\/li>\n<li>Now we will begin optimizing the program performance. Select Analyze-&gt;Analyze Performance with VisualGDB:<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/19-analyze.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3471\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/19-analyze.png\" alt=\"19-analyze\" width=\"1123\" height=\"740\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/19-analyze.png 1123w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/19-analyze-300x198.png 300w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/19-analyze-1024x675.png 1024w\" sizes=\"(max-width: 1123px) 100vw, 1123px\" \/><\/a>Ensure that the &#8220;benchmark&#8221; target is selected as the startup target, as VisualGDB always profiles the startup target.<\/li>\n<li>Select &#8220;Profile without debugging&#8221; and press &#8220;OK&#8221;:<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/20-start.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3472\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/20-start.png\" alt=\"20-start\" width=\"574\" height=\"434\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/20-start.png 574w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/20-start-300x227.png 300w\" sizes=\"(max-width: 574px) 100vw, 574px\" \/><\/a><\/li>\n<li>VisualGDB will launch the benchmark under profiler and begin collecting performance data:<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/21-recording.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3473\" 
src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/21-recording.png\" alt=\"21-recording\" width=\"772\" height=\"511\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/21-recording.png 772w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/21-recording-300x199.png 300w\" sizes=\"(max-width: 772px) 100vw, 772px\" \/><\/a><\/li>\n<li>Once the profiling is complete, VisualGDB will automatically display the function call tree sorted by inclusive time. Expand the critical path until you see the &#8220;strtod&#8221; function. This function parses string representations of floating-point numbers (e.g. &#8220;1.2345&#8221;):<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/22-critical.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3474\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/22-critical.png\" alt=\"22-critical\" width=\"1123\" height=\"740\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/22-critical.png 1123w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/22-critical-300x198.png 300w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/22-critical-1024x675.png 1024w\" sizes=\"(max-width: 1123px) 100vw, 1123px\" \/><\/a><\/li>\n<li>You can enter the total time observed without profiling in the &#8220;total time&#8221; field so that VisualGDB can estimate how much time each function would take under normal circumstances (without the profiler). 
Note that this estimate is only approximate, as VisualGDB (and valgrind) count executed CPU instructions, which don&#8217;t always translate 1-to-1 to CPU cycles:<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/23-times.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3475\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/23-times.png\" alt=\"23-times\" width=\"1123\" height=\"740\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/23-times.png 1123w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/23-times-300x198.png 300w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/23-times-1024x675.png 1024w\" sizes=\"(max-width: 1123px) 100vw, 1123px\" \/><\/a><\/li>\n<li>Some quick <a href=\"https:\/\/tinodidriksen.com\/2011\/05\/cpp-convert-string-to-double-speed\/\">research<\/a> shows that the strtod() function is indeed relatively slow, as it handles the advanced &#8216;E&#8217; (exponent) syntax. We will address this by replacing it with a simpler implementation that handles the common cases and falls back to calling strtod() if it encounters an &#8216;e&#8217; or &#8216;E&#8217; character (the implementation is based on <a href=\"https:\/\/tinodidriksen.com\/uploads\/code\/cpp\/speed-string-to-double.cpp\">this one<\/a>). 
Create a file called &#8220;Optimizations.h&#8221; with the following contents:\n<pre class=\"\">#pragma once\r\n#include &lt;sys\/types.h&gt;\r\n#include &lt;cstdlib&gt; \/\/ std::strtod, malloc, free\r\n#include &lt;string&gt;\r\n#include &lt;vector&gt;\r\n\r\n\/\/ Defined in a header, so it must be inline to avoid duplicate definitions\r\ninline double strtod_fast(const char *start, char **end)\r\n{\r\n\u00a0\u00a0\u00a0 const char *p = start;\r\n\u00a0\u00a0\u00a0 double r = 0.0;\r\n\u00a0\u00a0\u00a0 bool neg = false;\r\n\u00a0\u00a0\u00a0 if (*p == '-') {\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 neg = true;\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ++p;\r\n\u00a0\u00a0\u00a0 }\r\n\u00a0\u00a0 \u00a0\r\n\u00a0\u00a0\u00a0 \/\/ Integer part\r\n\u00a0\u00a0\u00a0 while (*p &gt;= '0' &amp;&amp; *p &lt;= '9') {\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 r = (r * 10.0) + (*p - '0');\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ++p;\r\n\u00a0\u00a0\u00a0 }\r\n\u00a0\u00a0\u00a0 \/\/ Fractional part\r\n\u00a0\u00a0\u00a0 if (*p == '.') {\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 double f = 0.0;\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 double coef = 1.0;\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ++p;\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 while (*p &gt;= '0' &amp;&amp; *p &lt;= '9') {\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 f = (f * 10.0) + (*p - '0');\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ++p;\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 coef \/= 10.0;\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 }\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 r += f * coef;\r\n\u00a0\u00a0\u00a0 }\r\n\u00a0\u00a0\u00a0 if (neg) {\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 r = -r;\r\n\u00a0\u00a0\u00a0 }\r\n\u00a0\u00a0 \u00a0\r\n\u00a0\u00a0\u00a0 \/\/ Exponent ('e' or 'E') found: let the standard function reparse the number\r\n\u00a0\u00a0\u00a0 if ((*p | 0x20) == 'e')\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 return std::strtod(start, end);\r\n\u00a0\u00a0 \u00a0\r\n\u00a0\u00a0\u00a0 if (end)\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 *end = const_cast&lt;char *&gt;(p);\r\n\u00a0\u00a0\u00a0 return r;\r\n}<\/pre>\n<\/li>\n<li>Include the file from json.hpp and replace the call to 
std::strtod() with a call to strtod_fast():<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/24-fast.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3476\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/24-fast.png\" alt=\"24-fast\" width=\"1123\" height=\"740\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/24-fast.png 1123w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/24-fast-300x198.png 300w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/24-fast-1024x675.png 1024w\" sizes=\"(max-width: 1123px) 100vw, 1123px\" \/><\/a><\/li>\n<li>Build the program and run it without debugging. See how the execution time dropped by ~20%: <a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/25-newtime.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3477\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/25-newtime.png\" alt=\"25-newtime\" width=\"806\" height=\"293\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/25-newtime.png 806w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/25-newtime-300x109.png 300w\" sizes=\"(max-width: 806px) 100vw, 806px\" \/><\/a><\/li>\n<li>Running another profiling session will quickly reveal that <strong>strtod_fast()<\/strong> is indeed much faster than <strong>strtod()<\/strong> and now takes only 8% of the total program run time:<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/26-newfunc.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3478\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/26-newfunc.png\" alt=\"26-newfunc\" width=\"1123\" height=\"740\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/26-newfunc.png 1123w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/26-newfunc-300x198.png 300w, 
https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/26-newfunc-1024x675.png 1024w\" sizes=\"(max-width: 1123px) 100vw, 1123px\" \/><\/a><\/li>\n<li>You can double-click on the function in the report and then click &#8220;annotate lines&#8221; to highlight the performance of each individual line:<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/27-lines.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3479\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/27-lines.png\" alt=\"27-lines\" width=\"1123\" height=\"740\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/27-lines.png 1123w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/27-lines-300x198.png 300w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/27-lines-1024x675.png 1024w\" sizes=\"(max-width: 1123px) 100vw, 1123px\" \/><\/a>See how most of the time is spent parsing digits after the decimal point. If we wanted to optimize this function further, we could use a lookup table to handle 2 characters at once, or try first converting all digits to an integral value and then scaling it with a single floating-point division.<\/li>\n<li>Now we will show a few more optimizations. Enable the &#8220;show individual lines&#8221; switch and navigate to the &#8220;scan_number()&#8221; method. 
See that the &#8220;yytext.push_back()&#8221; line is now taking almost 20% of the entire program run time:<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/28-pushback.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3480\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/28-pushback.png\" alt=\"28-pushback\" width=\"1123\" height=\"740\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/28-pushback.png 1123w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/28-pushback-300x198.png 300w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/28-pushback-1024x675.png 1024w\" sizes=\"(max-width: 1123px) 100vw, 1123px\" \/><\/a><\/li>\n<li>Double-click on the line to open the corresponding source code. Note how <strong>yytext<\/strong> is defined as a string:<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/29-yytext.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3481\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/29-yytext.png\" alt=\"29-yytext\" width=\"1123\" height=\"740\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/29-yytext.png 1123w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/29-yytext-300x198.png 300w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/29-yytext-1024x675.png 1024w\" sizes=\"(max-width: 1123px) 100vw, 1123px\" \/><\/a><\/li>\n<li>As appending characters to it considerably slows down our program, we can replace it with a custom container optimized for quickly appending characters one by one:\n<ul>\n<li>The custom container will not be null-terminated until its contents are explicitly requested<\/li>\n<li>It will initially allocate a large chunk of memory for the string to avoid reallocating it when appending characters<\/li>\n<\/ul>\n<p>Add the following code to Optimizations.h:<\/p>\n<pre class=\"\">class 
FastAppendableString\r\n{\r\nprotected:\r\n\u00a0\u00a0\u00a0 char *m_pData;\r\n\u00a0\u00a0\u00a0 mutable char *m_pEnd;\r\n\u00a0\u00a0\u00a0 char *m_pEndOfAlloc;\r\n\u00a0\u00a0\u00a0 mutable bool m_NullTerminated = false;\r\n\u00a0\u00a0 \u00a0\r\n\u00a0\u00a0\u00a0 void DoAppendChar(char ch)\r\n\u00a0\u00a0\u00a0 {\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 if (m_pEnd &gt;= m_pEndOfAlloc)\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 asm(\"int3\");\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 *m_pEnd++ = ch;\r\n\u00a0\u00a0\u00a0 }\r\n\u00a0\u00a0 \u00a0\r\n\u00a0\u00a0\u00a0 void EnsureRawForm() const\r\n\u00a0\u00a0\u00a0 {\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 if (m_NullTerminated)\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 {\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 m_pEnd--;\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 m_NullTerminated = false;\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 }\r\n\u00a0\u00a0\u00a0 }\r\n\u00a0\u00a0 \u00a0\r\npublic:\r\n\u00a0\u00a0\u00a0 FastAppendableString()\r\n\u00a0\u00a0\u00a0 {\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 m_pData = (char *)malloc(4096);\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 m_pEnd = m_pData;\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 m_pEndOfAlloc = m_pData + 4096;\r\n\u00a0\u00a0\u00a0 }\r\n\u00a0\u00a0 \u00a0\r\n\u00a0\u00a0\u00a0 ~FastAppendableString()\r\n\u00a0\u00a0\u00a0 {\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 free(m_pData);\r\n\u00a0\u00a0\u00a0 }\r\n\u00a0\u00a0 \u00a0\r\n\u00a0\u00a0\u00a0 FastAppendableString(const FastAppendableString &amp;) = delete;\r\n\u00a0\u00a0\u00a0 void operator=(const FastAppendableString &amp;) = delete;\r\n\u00a0\u00a0 \u00a0\r\n\u00a0\u00a0\u00a0 const char *data()\r\n\u00a0\u00a0\u00a0 {\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 if (!m_NullTerminated)\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 
{\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 DoAppendChar(0);\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 m_NullTerminated = true;\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 }\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 return m_pData;\r\n\u00a0\u00a0\u00a0 }\r\n\u00a0\u00a0 \u00a0\r\n\u00a0\u00a0\u00a0 const size_t size()\r\n\u00a0\u00a0\u00a0 {\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 if (m_NullTerminated)\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 return m_pEnd - m_pData - 1;\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 else\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 return m_pEnd - m_pData;\r\n\u00a0\u00a0\u00a0 }\r\n\u00a0\u00a0 \u00a0\r\n\u00a0\u00a0\u00a0 void clear()\r\n\u00a0\u00a0\u00a0 {\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 m_pEnd = m_pData;\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 m_NullTerminated = false;\r\n\u00a0\u00a0\u00a0 }\r\n\u00a0\u00a0 \u00a0\r\n\u00a0\u00a0\u00a0 void push_back(char ch)\r\n\u00a0\u00a0\u00a0 {\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 EnsureRawForm();\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 DoAppendChar(ch);\r\n\u00a0\u00a0\u00a0 }\r\n\u00a0\u00a0 \u00a0\r\n\u00a0\u00a0\u00a0 operator std::string()\r\n\u00a0\u00a0\u00a0 {\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 return data();\r\n\u00a0\u00a0\u00a0 }\r\n};<\/pre>\n<p>Then replace the declaration of yytext to use the new custom container type: <a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/30-string.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3482\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/30-string.png\" alt=\"30-string\" width=\"1123\" height=\"740\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/30-string.png 1123w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/30-string-300x198.png 300w, 
https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/30-string-1024x675.png 1024w\" sizes=\"(max-width: 1123px) 100vw, 1123px\" \/><\/a><\/li>\n<li>Run the program again and see another 12% improvement in performance:<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/31-time3.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3483\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/31-time3.png\" alt=\"31-time3\" width=\"806\" height=\"293\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/31-time3.png 806w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/31-time3-300x109.png 300w\" sizes=\"(max-width: 806px) 100vw, 806px\" \/><\/a>Note that we have intentionally left the reallocation case unimplemented: appending past the end of the 4096-byte buffer will trigger a breakpoint via the x86 &#8220;int3&#8221; instruction. Adding code to reallocate the buffer will slightly reduce the performance, as it will force GCC to save the registers modified by <strong>realloc()<\/strong> before checking whether a call to realloc() is needed. This could be resolved by making a non-inlineable function that saves all CPU registers on entry and calling it via inline assembly.<\/li>\n<li>The JSON parser example shown in this tutorial contains another relatively easy optimization opportunity. Switch to the &#8220;All functions&#8221; view and sort the functions by exclusive time. See how the emplace_back() method is taking 10% of the time. 
It is called from 4 different locations and Valgrind normally doesn&#8217;t distinguish which of them contributes the most to the execution time:<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/32-emplace.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3484\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/32-emplace.png\" alt=\"32-emplace\" width=\"1123\" height=\"740\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/32-emplace.png 1123w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/32-emplace-300x198.png 300w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/32-emplace-1024x675.png 1024w\" sizes=\"(max-width: 1123px) 100vw, 1123px\" \/><\/a><\/li>\n<li>To easily distinguish between different invocation paths of emplace_back(), run another profiling session enabling the &#8220;distinguish up to 1000 parent frames&#8221; option:<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/33-stack.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3485\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/33-stack.png\" alt=\"33-stack\" width=\"574\" height=\"434\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/33-stack.png 574w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/33-stack-300x227.png 300w\" sizes=\"(max-width: 574px) 100vw, 574px\" \/><\/a><\/li>\n<li>Now you can expand the [all contexts] node and see that 8% of the total program runtime is spent on the emplace_back() call from lexer::get():<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/34-advstack.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3486\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/34-advstack.png\" alt=\"34-advstack\" width=\"1123\" height=\"740\" 
srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/34-advstack.png 1123w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/34-advstack-300x198.png 300w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/34-advstack-1024x675.png 1024w\" sizes=\"(max-width: 1123px) 100vw, 1123px\" \/><\/a><\/li>\n<li>This scenario is similar to the previous one (slow character-by-character appending):<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/35-tokenstring.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3487\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/35-tokenstring.png\" alt=\"35-tokenstring\" width=\"1123\" height=\"740\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/35-tokenstring.png 1123w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/35-tokenstring-300x198.png 300w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/35-tokenstring-1024x675.png 1024w\" sizes=\"(max-width: 1123px) 100vw, 1123px\" \/><\/a><\/li>\n<li>Add another optimized container to Optimizations.h and edit token_string to use it:\n<pre class=\"\">class FastAppendableString2 : public FastAppendableString\r\n{\r\npublic:\r\n\u00a0\u00a0\u00a0 void pop_back()\r\n\u00a0\u00a0\u00a0 {\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 EnsureRawForm();\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 if (m_pEnd &gt; m_pData)\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 m_pEnd--;\r\n\u00a0\u00a0\u00a0 }\r\n\u00a0\u00a0 \u00a0\r\n\u00a0\u00a0\u00a0 const char *begin() const\r\n\u00a0\u00a0\u00a0 {\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 EnsureRawForm();\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 return m_pData;\r\n\u00a0\u00a0\u00a0 }\r\n\u00a0\u00a0 \u00a0\r\n\u00a0\u00a0\u00a0 const char *end() const\r\n\u00a0\u00a0\u00a0 
{\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 return m_pEnd;\r\n\u00a0\u00a0\u00a0 }\r\n};<\/pre>\n<\/li>\n<li>Run the program again. See how we have reduced the runtime by another 7.5% (35% down from the original version):<a href=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/36-time4.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3488\" src=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/36-time4.png\" alt=\"36-time4\" width=\"806\" height=\"293\" srcset=\"https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/36-time4.png 806w, https:\/\/visualgdb.com\/w\/wp-content\/uploads\/2017\/10\/36-time4-300x109.png 300w\" sizes=\"(max-width: 806px) 100vw, 806px\" \/><\/a>Note that profiling helps you optimize the program for a specific scenario. For example, one of the 3 optimizations in this tutorial relied on heavy use of floating-point numbers in the data set and would not give much improvement for data sets containing mostly integral numbers and strings. To get the best final results, always profile your code with real data sets (or proportionally scaled-down subsets).<\/li>\n<\/ol>\n","protected":false},"excerpt":{"rendered":"<p>This tutorial shows how to profile C++ code using Visual Studio, valgrind and VisualGDB. 
We will show how to import<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[152],"tags":[33,109,153],"_links":{"self":[{"href":"https:\/\/visualgdb.com\/w\/wp-json\/wp\/v2\/posts\/3454"}],"collection":[{"href":"https:\/\/visualgdb.com\/w\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/visualgdb.com\/w\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/visualgdb.com\/w\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/visualgdb.com\/w\/wp-json\/wp\/v2\/comments?post=3454"}],"version-history":[{"count":7,"href":"https:\/\/visualgdb.com\/w\/wp-json\/wp\/v2\/posts\/3454\/revisions"}],"predecessor-version":[{"id":3495,"href":"https:\/\/visualgdb.com\/w\/wp-json\/wp\/v2\/posts\/3454\/revisions\/3495"}],"wp:attachment":[{"href":"https:\/\/visualgdb.com\/w\/wp-json\/wp\/v2\/media?parent=3454"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/visualgdb.com\/w\/wp-json\/wp\/v2\/categories?post=3454"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/visualgdb.com\/w\/wp-json\/wp\/v2\/tags?post=3454"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}