December 13th, 2022

Improving the State of Debug Performance in C++

Cameron DaCamara
Senior Software Engineer

In this blog we will explore one change the MSVC compiler has implemented in an effort to improve the codegen quality of applications in debug mode. We will highlight what the change does, and how it could be extended for the future. If debug performance is something you care about for your C++ projects, then Visual Studio 2022 version 17.5 is making that experience even better!

Please note that this blog will contain some assembly, but being an expert in assembly is not required.

Overview

Motivation

You might notice that the title of this blog is a play on words based on a recent popular blog post of a similar name, "the sad state of debug performance in c++". In that blog, Vittorio Romeo highlights some general C++ shortcomings when it comes to debug performance. Vittorio also filed this Developer Community ticket, "`std::move` (and similar functions) result in poor debug performance and worse debugging experience"; thanks to him and everyone who voted! Much of the reason for the observed slowdown is the cost of abstraction, with std::move as the notable example, where the following code:

int i = 0;
std::move(i);

Would generate a function call when the code is conceptually:

int i = 0;
static_cast<int&&>(i);

The function std::move is conceptually a named cast, much like static_cast but with a contextual meaning for code around it. The penalty for using this named cast is that you get a function call generated in the debug assembly. Here’s the assembly of the two examples above:

std::move:
main	PROC
sub	rsp, 56	; 00000038H
mov	DWORD PTR i$[rsp], 0
lea	rcx, QWORD PTR i$[rsp]
call	??$move@AEAH@std@@YA$$QEAHAEAH@Z
xor	eax, eax
add	rsp, 56	; 00000038H
ret	0
main	ENDP

static_cast:
main	PROC
sub	rsp, 24
mov	DWORD PTR i$[rsp], 0
xor	eax, eax
add	rsp, 24
ret	0
main	ENDP

Note to readers: all code samples in this blog were compiled with "/Od /std:c++latest".

On the surface, the compiler only generated 2 extra instructions in the std::move case, but the 'call' instruction in particular is expensive, and it causes this code to be executed in addition to the code above:

??$move@AEAH@std@@YA$$QEAHAEAH@Z PROC			; std::move<int &>, COMDAT
mov	QWORD PTR [rsp+8], rcx
mov	rax, QWORD PTR _Arg$[rsp]
ret	0
??$move@AEAH@std@@YA$$QEAHAEAH@Z ENDP			; std::move<int &>

Note: to generate the assembly above, the compiler can be provided with the /Fa option. Furthermore, a weird name like "??$move@AEAH@std@@YA$$QEAHAEAH@Z" is the mangled name of the function template specialization of std::move.

So really, your binary is now at a 5-instruction deficit compared to the static_cast code, and this cost is multiplied by the number of times std::move is used.

Some compilers already recognize meta functions like std::move and std::forward as compiler intrinsics (as noted in Vittorio’s blog), and this support is implemented entirely in the compiler front-end. As of 17.5, MSVC offers better debug performance by recognizing these meta functions as well! More on how we do it later in this blog, but first…

Show me some code!

Note to readers: to take advantage of the new codegen quality, you will need to provide the /permissive- compiler option. It is also worth noting that /permissive- is implied when /std:c++20 or /std:c++latest is used.
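
For reference, the samples in this blog can be compiled from a developer command prompt using the options mentioned above; the file name main.cpp below is just a placeholder for whichever sample you are building:

cl /Od /std:c++latest /permissive- /Fa main.cpp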

Let’s take the simple example above again and make it a full program:

#include <utility>

int main() {
    int i = 0;
    std::move(i);
    std::forward<int&>(i);
}

Here’s the generated assembly difference between 17.4 and 17.5:

17.4:
_Arg$ = 8
??$forward@AEAH@std@@YAAEAHAEAH@Z PROC
mov	QWORD PTR [rsp+8], rcx
mov	rax, QWORD PTR _Arg$[rsp]
ret	0
??$forward@AEAH@std@@YAAEAHAEAH@Z ENDP
_TEXT	ENDS
_TEXT	SEGMENT
_Arg$ = 8
??$move@AEAH@std@@YA$$QEAHAEAH@Z PROC
mov	QWORD PTR [rsp+8], rcx
mov	rax, QWORD PTR _Arg$[rsp]
ret	0
??$move@AEAH@std@@YA$$QEAHAEAH@Z ENDP
_TEXT	ENDS
_TEXT	SEGMENT
i$ = 32
main	PROC
sub	rsp, 56		; 00000038H
mov	DWORD PTR i$[rsp], 0
lea	rcx, QWORD PTR i$[rsp]
call	??$move@AEAH@std@@YA$$QEAHAEAH@Z
lea	rcx, QWORD PTR i$[rsp]
call	??$forward@AEAH@std@@YAAEAHAEAH@Z
xor	eax, eax
add	rsp, 56		; 00000038H
ret	0
main	ENDP

17.5:
i$ = 0
main	PROC
$LN3:
sub	rsp, 24
mov	DWORD PTR i$[rsp], 0
xor	eax, eax
add	rsp, 24
ret	0
main	ENDP

Assembly reading tip: the main PROC above is our main function in the C++ code. The instructions that follow main PROC are what your CPU will execute when your program is first invoked. In the case above, it is clear that the code produced by 17.5 is much smaller, which can sometimes be an indication of a performance win. Here the win comes both from the smaller code size and from the reduced indirection, since the 'call' instructions to std::move and std::forward are gone. For the purposes of this blog we will rely on the reduced complexity of the newly generated assembly as an indicator of possible performance wins.

Yes, you read that right, the generated code in 17.5 doesn’t even create assembly entries for std::move or std::forward—which makes sense, they’re never called.

Let’s look at a slightly more complicated code example:

#include <utility>

template <typename T>
void add_1_impl(T&& x) {
    std::forward<T>(x) += std::move(1);
}

template <typename T, int N>
void add_1(T (&arr)[N]) {
    for (auto&& e : arr) {
        add_1_impl(e);
    }
}

int main() {
    int arr[10]{};
    add_1(arr);
}

In this code, all we want to do is add 1 to every element of the array. Here's the comparison (only showing the add_1_impl function, which contains the std::forward and std::move calls):

17.4:
??$add_1_impl@AEAH@@YAXAEAH@Z PROC
$LN3:
mov	QWORD PTR [rsp+8], rcx
sub	rsp, 72	; 00000048H
mov	DWORD PTR $T1[rsp], 1
lea	rcx, QWORD PTR $T1[rsp]
call	??$move@H@std@@YA$$QEAH$$QEAH@Z
mov	eax, DWORD PTR [rax]
mov	DWORD PTR tv72[rsp], eax
mov	rcx, QWORD PTR x$[rsp]
call	??$forward@AEAH@std@@YAAEAHAEAH@Z
mov	QWORD PTR tv68[rsp], rax
mov	rax, QWORD PTR tv68[rsp]
mov	eax, DWORD PTR [rax]
mov	DWORD PTR tv70[rsp], eax
mov	eax, DWORD PTR tv72[rsp]
mov	ecx, DWORD PTR tv70[rsp]
add	ecx, eax
mov	eax, ecx
mov	rcx, QWORD PTR tv68[rsp]
mov	DWORD PTR [rcx], eax
add	rsp, 72	; 00000048H
ret	0
??$add_1_impl@AEAH@@YAXAEAH@Z ENDP

17.5:
??$add_1_impl@AEAH@@YAXAEAH@Z PROC
$LN3:
mov	QWORD PTR [rsp+8], rcx
sub	rsp, 24
mov	DWORD PTR $T1[rsp], 1
mov	rax, QWORD PTR x$[rsp]
mov	eax, DWORD PTR [rax]
add	eax, DWORD PTR $T1[rsp]
mov	rcx, QWORD PTR x$[rsp]
mov	DWORD PTR [rcx], eax
add	rsp, 24
ret	0
??$add_1_impl@AEAH@@YAXAEAH@Z ENDP

17.4 has 21 instructions while 17.5 has only 10, and the comparison is made that much more extreme by the fact that we are calling add_1_impl in a loop, so the cost of the executed instructions in 17.4 can be significantly higher than in 17.5. It is actually worse than that, because we are not accounting for the instructions executed inside std::forward and std::move themselves.

Let’s make the code sample even more interesting and extreme to illustrate the visible differences. It might be observed that if we manually unroll the loop above we can get a performance win, so let’s do that using templates:

#include <utility>

template <typename T, int N, std::size_t... Is>
void add_1_impl(std::index_sequence<Is...>, T (&arr)[N]) {
    ((std::forward<T&>(arr[Is]) += std::move(1)), ...);
}

template <typename T, int N>
void add_1(T (&arr)[N]) {
    add_1_impl(std::make_index_sequence<N>{}, arr);
}

int main() {
    int arr[10]{};
    add_1(arr);
}

The code above replaces the loop in the previous example with a single fold expression. Let's peek at the codegen (again showing only add_1_impl, which contains the std::forward and std::move calls; the mangled function name is replaced with add_1_impl<...>):

17.4:
add_1_impl<...> PROC 
$LN3: 
mov	QWORD PTR [rsp+16], rdx 
mov	BYTE PTR [rsp+8], cl 
sub	rsp, 248	; 000000f8H 
mov	DWORD PTR $T1[rsp], 1 
lea	rcx, QWORD PTR $T1[rsp] 
call	??$move@H@std@@YA$$QEAH$$QEAH@Z 
mov	eax, DWORD PTR [rax] 
mov	DWORD PTR tv74[rsp], eax 
mov	eax, 4 
imul	rax, rax, 0 
mov	rcx, QWORD PTR arr$[rsp] 
add	rcx, rax 
mov	rax, rcx 
mov	rcx, rax 
call	??$forward@AEAH@std@@YAAEAHAEAH@Z 
mov	QWORD PTR tv70[rsp], rax 
mov	rax, QWORD PTR tv70[rsp] 
mov	eax, DWORD PTR [rax] 
mov	DWORD PTR tv72[rsp], eax 
mov	eax, DWORD PTR tv74[rsp] 
mov	ecx, DWORD PTR tv72[rsp] 
add	ecx, eax 
mov	eax, ecx 
mov	rcx, QWORD PTR tv70[rsp] 
mov	DWORD PTR [rcx], eax 
mov	DWORD PTR $T2[rsp], 1 
lea	rcx, QWORD PTR $T2[rsp] 
call	??$move@H@std@@YA$$QEAH$$QEAH@Z 
mov	eax, DWORD PTR [rax] 
mov	DWORD PTR tv86[rsp], eax 
mov	eax, 4 
imul	rax, rax, 1 
mov	rcx, QWORD PTR arr$[rsp] 
add	rcx, rax 
mov	rax, rcx 
mov	rcx, rax 
call	??$forward@AEAH@std@@YAAEAHAEAH@Z 
mov	QWORD PTR tv82[rsp], rax 
mov	rax, QWORD PTR tv82[rsp] 
mov	eax, DWORD PTR [rax] 
mov	DWORD PTR tv84[rsp], eax 
mov	eax, DWORD PTR tv86[rsp] 
mov	ecx, DWORD PTR tv84[rsp] 
add	ecx, eax 
mov	eax, ecx 
mov	rcx, QWORD PTR tv82[rsp] 
mov	DWORD PTR [rcx], eax 
mov	DWORD PTR $T3[rsp], 1 
lea	rcx, QWORD PTR $T3[rsp] 
call	??$move@H@std@@YA$$QEAH$$QEAH@Z 
mov	eax, DWORD PTR [rax] 
mov	DWORD PTR tv130[rsp], eax 
mov	eax, 4 
imul	rax, rax, 2 
mov	rcx, QWORD PTR arr$[rsp] 
add	rcx, rax 
mov	rax, rcx 
mov	rcx, rax 
call	??$forward@AEAH@std@@YAAEAHAEAH@Z 
mov	QWORD PTR tv94[rsp], rax 
mov	rax, QWORD PTR tv94[rsp] 
mov	eax, DWORD PTR [rax] 
mov	DWORD PTR tv128[rsp], eax 
mov	eax, DWORD PTR tv130[rsp] 
mov	ecx, DWORD PTR tv128[rsp] 
add	ecx, eax 
mov	eax, ecx 
mov	rcx, QWORD PTR tv94[rsp] 
mov	DWORD PTR [rcx], eax 
mov	DWORD PTR $T4[rsp], 1 
lea	rcx, QWORD PTR $T4[rsp] 
call	??$move@H@std@@YA$$QEAH$$QEAH@Z 
mov	eax, DWORD PTR [rax] 
mov	DWORD PTR tv142[rsp], eax 
mov	eax, 4 
imul	rax, rax, 3 
mov	rcx, QWORD PTR arr$[rsp] 
add	rcx, rax 
mov	rax, rcx 
mov	rcx, rax 
call	??$forward@AEAH@std@@YAAEAHAEAH@Z 
mov	QWORD PTR tv138[rsp], rax 
mov	rax, QWORD PTR tv138[rsp] 
mov	eax, DWORD PTR [rax] 
mov	DWORD PTR tv140[rsp], eax 
mov	eax, DWORD PTR tv142[rsp] 
mov	ecx, DWORD PTR tv140[rsp] 
add	ecx, eax 
mov	eax, ecx 
mov	rcx, QWORD PTR tv138[rsp] 
mov	DWORD PTR [rcx], eax 
mov	DWORD PTR $T5[rsp], 1 
lea	rcx, QWORD PTR $T5[rsp] 
call	??$move@H@std@@YA$$QEAH$$QEAH@Z 
mov	eax, DWORD PTR [rax] 
mov	DWORD PTR tv154[rsp], eax 
mov	eax, 4 
imul	rax, rax, 4 
mov	rcx, QWORD PTR arr$[rsp] 
add	rcx, rax 
mov	rax, rcx 
mov	rcx, rax 
call	??$forward@AEAH@std@@YAAEAHAEAH@Z 
mov	QWORD PTR tv150[rsp], rax 
mov	rax, QWORD PTR tv150[rsp] 
mov	eax, DWORD PTR [rax] 
mov	DWORD PTR tv152[rsp], eax 
mov	eax, DWORD PTR tv154[rsp] 
mov	ecx, DWORD PTR tv152[rsp] 
add	ecx, eax 
mov	eax, ecx 
mov	rcx, QWORD PTR tv150[rsp] 
mov	DWORD PTR [rcx], eax 
mov	DWORD PTR $T6[rsp], 1 
lea	rcx, QWORD PTR $T6[rsp] 
call	??$move@H@std@@YA$$QEAH$$QEAH@Z 
mov	eax, DWORD PTR [rax] 
mov	DWORD PTR tv166[rsp], eax 
mov	eax, 4 
imul	rax, rax, 5 
mov	rcx, QWORD PTR arr$[rsp] 
add	rcx, rax 
mov	rax, rcx 
mov	rcx, rax 
call	??$forward@AEAH@std@@YAAEAHAEAH@Z 
mov	QWORD PTR tv162[rsp], rax 
mov	rax, QWORD PTR tv162[rsp] 
mov	eax, DWORD PTR [rax] 
mov	DWORD PTR tv164[rsp], eax 
mov	eax, DWORD PTR tv166[rsp] 
mov	ecx, DWORD PTR tv164[rsp] 
add	ecx, eax 
mov	eax, ecx 
mov	rcx, QWORD PTR tv162[rsp] 
mov	DWORD PTR [rcx], eax 
mov	DWORD PTR $T7[rsp], 1 
lea	rcx, QWORD PTR $T7[rsp] 
call	??$move@H@std@@YA$$QEAH$$QEAH@Z 
mov	eax, DWORD PTR [rax] 
mov	DWORD PTR tv178[rsp], eax 
mov	eax, 4 
imul	rax, rax, 6 
mov	rcx, QWORD PTR arr$[rsp] 
add	rcx, rax 
mov	rax, rcx 
mov	rcx, rax 
call	??$forward@AEAH@std@@YAAEAHAEAH@Z 
mov	QWORD PTR tv174[rsp], rax 
mov	rax, QWORD PTR tv174[rsp] 
mov	eax, DWORD PTR [rax] 
mov	DWORD PTR tv176[rsp], eax 
mov	eax, DWORD PTR tv178[rsp] 
mov	ecx, DWORD PTR tv176[rsp] 
add	ecx, eax 
mov	eax, ecx 
mov	rcx, QWORD PTR tv174[rsp] 
mov	DWORD PTR [rcx], eax 
mov	DWORD PTR $T8[rsp], 1 
lea	rcx, QWORD PTR $T8[rsp] 
call	??$move@H@std@@YA$$QEAH$$QEAH@Z 
mov	eax, DWORD PTR [rax] 
mov	DWORD PTR tv190[rsp], eax 
mov	eax, 4 
imul	rax, rax, 7 
mov	rcx, QWORD PTR arr$[rsp] 
add	rcx, rax 
mov	rax, rcx 
mov	rcx, rax 
call	??$forward@AEAH@std@@YAAEAHAEAH@Z 
mov	QWORD PTR tv186[rsp], rax 
mov	rax, QWORD PTR tv186[rsp] 
mov	eax, DWORD PTR [rax] 
mov	DWORD PTR tv188[rsp], eax 
mov	eax, DWORD PTR tv190[rsp] 
mov	ecx, DWORD PTR tv188[rsp] 
add	ecx, eax 
mov	eax, ecx 
mov	rcx, QWORD PTR tv186[rsp] 
mov	DWORD PTR [rcx], eax 
mov	DWORD PTR $T9[rsp], 1 
lea	rcx, QWORD PTR $T9[rsp] 
call	??$move@H@std@@YA$$QEAH$$QEAH@Z 
mov	eax, DWORD PTR [rax] 
mov	DWORD PTR tv202[rsp], eax 
mov	eax, 4 
imul	rax, rax, 8 
mov	rcx, QWORD PTR arr$[rsp] 
add	rcx, rax 
mov	rax, rcx 
mov	rcx, rax 
call	??$forward@AEAH@std@@YAAEAHAEAH@Z 
mov	QWORD PTR tv198[rsp], rax 
mov	rax, QWORD PTR tv198[rsp] 
mov	eax, DWORD PTR [rax] 
mov	DWORD PTR tv200[rsp], eax 
mov	eax, DWORD PTR tv202[rsp] 
mov	ecx, DWORD PTR tv200[rsp] 
add	ecx, eax 
mov	eax, ecx 
mov	rcx, QWORD PTR tv198[rsp] 
mov	DWORD PTR [rcx], eax 
mov	DWORD PTR $T10[rsp], 1 
lea	rcx, QWORD PTR $T10[rsp] 
call	??$move@H@std@@YA$$QEAH$$QEAH@Z 
mov	eax, DWORD PTR [rax] 
mov	DWORD PTR tv214[rsp], eax 
mov	eax, 4 
imul	rax, rax, 9 
mov	rcx, QWORD PTR arr$[rsp] 
add	rcx, rax 
mov	rax, rcx 
mov	rcx, rax 
call	??$forward@AEAH@std@@YAAEAHAEAH@Z 
mov	QWORD PTR tv210[rsp], rax 
mov	rax, QWORD PTR tv210[rsp] 
mov	eax, DWORD PTR [rax] 
mov	DWORD PTR tv212[rsp], eax 
mov	eax, DWORD PTR tv214[rsp] 
mov	ecx, DWORD PTR tv212[rsp] 
add	ecx, eax 
mov	eax, ecx 
mov	rcx, QWORD PTR tv210[rsp] 
mov	DWORD PTR [rcx], eax 
add	rsp, 248	; 000000f8H 
ret	0 
add_1_impl<...> ENDP

17.5:
add_1_impl<...> PROC
$LN3: 
mov	QWORD PTR [rsp+16], rdx 
mov	BYTE PTR [rsp+8], cl 
sub	rsp, 56	; 00000038H 
mov	DWORD PTR $T1[rsp], 1 
mov	eax, 4 
imul	rax, rax, 0 
mov	rcx, QWORD PTR arr$[rsp] 
mov	eax, DWORD PTR [rcx+rax] 
add	eax, DWORD PTR $T1[rsp] 
mov	ecx, 4 
imul	rcx, rcx, 0 
mov	rdx, QWORD PTR arr$[rsp] 
mov	DWORD PTR [rdx+rcx], eax 
mov	DWORD PTR $T2[rsp], 1 
mov	eax, 4 
imul	rax, rax, 1 
mov	rcx, QWORD PTR arr$[rsp] 
mov	eax, DWORD PTR [rcx+rax] 
add	eax, DWORD PTR $T2[rsp] 
mov	ecx, 4 
imul	rcx, rcx, 1 
mov	rdx, QWORD PTR arr$[rsp] 
mov	DWORD PTR [rdx+rcx], eax 
mov	DWORD PTR $T3[rsp], 1 
mov	eax, 4 
imul	rax, rax, 2 
mov	rcx, QWORD PTR arr$[rsp] 
mov	eax, DWORD PTR [rcx+rax] 
add	eax, DWORD PTR $T3[rsp] 
mov	ecx, 4 
imul	rcx, rcx, 2 
mov	rdx, QWORD PTR arr$[rsp] 
mov	DWORD PTR [rdx+rcx], eax 
mov	DWORD PTR $T4[rsp], 1 
mov	eax, 4 
imul	rax, rax, 3 
mov	rcx, QWORD PTR arr$[rsp] 
mov	eax, DWORD PTR [rcx+rax] 
add	eax, DWORD PTR $T4[rsp] 
mov	ecx, 4 
imul	rcx, rcx, 3 
mov	rdx, QWORD PTR arr$[rsp] 
mov	DWORD PTR [rdx+rcx], eax 
mov	DWORD PTR $T5[rsp], 1 
mov	eax, 4 
imul	rax, rax, 4 
mov	rcx, QWORD PTR arr$[rsp] 
mov	eax, DWORD PTR [rcx+rax] 
add	eax, DWORD PTR $T5[rsp] 
mov	ecx, 4 
imul	rcx, rcx, 4 
mov	rdx, QWORD PTR arr$[rsp] 
mov	DWORD PTR [rdx+rcx], eax 
mov	DWORD PTR $T6[rsp], 1 
mov	eax, 4 
imul	rax, rax, 5 
mov	rcx, QWORD PTR arr$[rsp] 
mov	eax, DWORD PTR [rcx+rax] 
add	eax, DWORD PTR $T6[rsp] 
mov	ecx, 4 
imul	rcx, rcx, 5 
mov	rdx, QWORD PTR arr$[rsp] 
mov	DWORD PTR [rdx+rcx], eax 
mov	DWORD PTR $T7[rsp], 1 
mov	eax, 4 
imul	rax, rax, 6 
mov	rcx, QWORD PTR arr$[rsp] 
mov	eax, DWORD PTR [rcx+rax] 
add	eax, DWORD PTR $T7[rsp] 
mov	ecx, 4 
imul	rcx, rcx, 6 
mov	rdx, QWORD PTR arr$[rsp] 
mov	DWORD PTR [rdx+rcx], eax 
mov	DWORD PTR $T8[rsp], 1 
mov	eax, 4 
imul	rax, rax, 7 
mov	rcx, QWORD PTR arr$[rsp] 
mov	eax, DWORD PTR [rcx+rax] 
add	eax, DWORD PTR $T8[rsp] 
mov	ecx, 4 
imul	rcx, rcx, 7 
mov	rdx, QWORD PTR arr$[rsp] 
mov	DWORD PTR [rdx+rcx], eax 
mov	DWORD PTR $T9[rsp], 1 
mov	eax, 4 
imul	rax, rax, 8 
mov	rcx, QWORD PTR arr$[rsp] 
mov	eax, DWORD PTR [rcx+rax] 
add	eax, DWORD PTR $T9[rsp] 
mov	ecx, 4 
imul	rcx, rcx, 8 
mov	rdx, QWORD PTR arr$[rsp] 
mov	DWORD PTR [rdx+rcx], eax 
mov	DWORD PTR $T10[rsp], 1 
mov	eax, 4 
imul	rax, rax, 9 
mov	rcx, QWORD PTR arr$[rsp] 
mov	eax, DWORD PTR [rcx+rax] 
add	eax, DWORD PTR $T10[rsp] 
mov	ecx, 4 
imul	rcx, rcx, 9 
mov	rdx, QWORD PTR arr$[rsp] 
mov	DWORD PTR [rdx+rcx], eax 
add	rsp, 56	; 00000038H 
ret	0 
add_1_impl<...> ENDP

Our 17.4 example clocks in at a whopping 226 instructions while our 17.5 example is only 106, and the instructions in 17.4 appear to be far more costly due to the number of call-frame setups and 'call' instructions, which are not present on the 17.5 side.

OK, perhaps the examples above are contrived, and it might be far-fetched to think that code like this would truly impact performance, so let's take some code that is all but guaranteed to have some kind of real-world application:

#include <vector>

int main() {
    std::vector<int> v;
    v.push_back(1); 
}

I will save you the massive assembly output on this one and simply call out the difference in the number of instructions generated:

  • 17.4: 3136
  • 17.5: 3063

Your assembly is more than 70 instructions shorter just from the compiler eliding these meta functions, and you can all but guarantee that in places where std::move and std::forward are used, they may be used in a loop (e.g. resizing the vector and moving the elements to a new memory block). Furthermore, since these meta functions are never instantiated, the corresponding .obj, .lib, and .pdb files will be slightly smaller after upgrading to 17.5.
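
To see why that multiplies, here is a simplified sketch of what a vector-like container conceptually does when it reallocates. This is purely an illustration, not the actual STL implementation; the point is that std::move is invoked once per element, so any per-call overhead in a debug build is paid N times:

#include <cstddef>
#include <new>
#include <utility>

// Illustration only: move `count` elements from `src` into raw storage at
// `dst`. Each iteration invokes std::move, which before 17.5 meant an extra
// function call per element in debug builds.
template <typename T>
void relocate(T* src, std::size_t count, T* dst) {
    for (std::size_t i = 0; i < count; ++i) {
        ::new (static_cast<void*>(dst + i)) T(std::move(src[i])); // move-construct into new storage
        src[i].~T();                                              // destroy the moved-from element
    }
}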

How we did it

Rather than trying to make the compiler aware of specific meta functions that act as named, no-op casts (i.e. casts that do not require a pointer adjustment), we took an alternative approach and implemented this new inlining ability using a C++ attribute: [[msvc::intrinsic]].

The new attribute will semantically replace a function call with a cast to that function's return type if the function definition is decorated with [[msvc::intrinsic]]. You can see how we applied this new attribute in the STL: GH3182. The reason we decided to go down the attribute route is that we want to eventually extend the scenarios it can cover and offer a data-driven approach to selectively decorating code with the new functionality. The latter is important for users of MSVC as well.

You can read more about the attribute and its constraints and semantics in the Microsoft-specific attributes section of our documentation.
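
As an illustration of what the attribute looks like in practice, here is a minimal sketch of a user-defined named cast decorated with it; my::move is a hypothetical stand-in rather than the actual STL code (see GH3182 for the real change), and the sketch assumes 17.5 or later with /permissive- enabled:

#include <type_traits>

namespace my {
    // With [[msvc::intrinsic]], a call to my::move is semantically replaced by a
    // cast to the return type, so no 'call' instruction is emitted even at /Od.
    template <typename T>
    [[msvc::intrinsic]] constexpr std::remove_reference_t<T>&& move(T&& arg) noexcept {
        return static_cast<std::remove_reference_t<T>&&>(arg);
    }
}

int main() {
    int i = 0;
    my::move(i); // behaves like static_cast<int&&>(i); no function call in the debug assembly
}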

Looking ahead…

The compiler front-end is not alone in this story of improving the performance of generated code for debugging purposes; the compiler back-end team is also working hard on some debug codegen scenarios that they will share in the coming months.

Call to action: what types of debugging optimizations matter to you? What optimizations for debug code would you like to see MSVC implement?

Especially if you work for a game studio, please help us find out what your debugging workflow looks like by taking this survey: https://aka.ms/MSVCDebugSurvey. Data like this helps the team focus on what workflows are important to you.

Onward and upward!

Closing

As always, we welcome your feedback. Feel free to send any comments through e-mail at visualcpp@microsoft.com or through Twitter @visualc. Also, feel free to follow Cameron DaCamara on Twitter @starfreakclone.

If you encounter other problems with MSVC in VS 2019/2022 please let us know via the Report a Problem option, either from the installer or the Visual Studio IDE itself. For suggestions or bug reports, let us know through DevComm.

Author

Cameron DaCamara
Senior Software Engineer

Senior Engineer, Visual C++ compiler front-end team at Microsoft.
