November 24th, 2023

On harmful overuse of std::move

The C++ std::move function casts its parameter to an rvalue reference, which enables its contents to be consumed by another operation. But in your excitement about this new expressive capability, take care not to overuse it.

std::string get_name(int id)
{
    std::string name = std::to_string(id);
    /* assume other calculations happen here */
    return std::move(name);
}

You think you are giving the compiler some help by saying “Hey, like, I’m not using my local variable name after this point, so you can just move the string into the return value.”

Unfortunately, your help is actually hurting. Adding a std::move causes the return statement to fail to satisfy the conditions for copy elision (commonly known as Named Return Value Optimization, or NRVO): The thing being returned must be the name of a local variable with the same type as the function return value.

The added std::move prevents NRVO, and the return value is move-constructed from the name variable.

std::string get_name(int id)
{
    std::string name = std::to_string(id);
    /* assume other calculations happen here */
    return name;
}

This time, we return name directly, and the compiler can now elide the copy and put the name variable directly in the return value slot with no copy. (Compilers are permitted but not required to perform this optimization, but in practice, all compilers will do it if all code paths return the same local variable.)

The other half of the overzealous std::move is on the receiving end.

extern void report_name(std::string name);

void sample1()
{
    std::string name = std::move(get_name());
}

void sample2()
{
    report_name(std::move(get_name()));
}

In these two sample functions, we take the return value from get_name and explicitly std::move it into a new local variable or into a function parameter. This is another case of trying to be helpful and ending up hurting.

Constructing a value (either a local variable or a function parameter) from a matching value of the same type will be elided: The matching value is stored directly into the local variable or parameter without a copy. But adding a std::move prevents this optimization from occurring, and the value will instead be move-constructed.

extern void report_name(std::string name);

void sample1()
{
    std::string name = get_name();
}

void sample2()
{
    report_name(get_name());
}

What’s particularly exciting is when you combine both mistakes. In that case, you took what would have been a sequence that had no copy or move operations at all and converted it into a sequence that creates two extra temporaries, two extra move operations, and two extra destructions.

#include <memory>

struct S
{
    S();
    S(S const&);
    S(S &&);
    ~S();
};

extern void consume(S s);

// Bad version
S __declspec(noinline) f1()
{
    S s;
    return std::move(s);
}

void g1()
{
    consume(std::move(f1()));
}

Here’s the compiler output for msvc:

; on entry, rcx says where to put the return value
f1:
    mov     qword ptr [rsp+8], rcx
    push    rbx
    sub     rsp, 48
    mov     rbx, rcx

    ; construct local variable s on stack
    lea     rcx, qword ptr [rsp+64]
    call    S::S()

    ; copy local variable to return value
    lea     rdx, qword ptr [rsp+64]
    mov     rcx, rbx
    call    S::S(S &&)

    ; destruct the local variable s
    lea     rcx, qword ptr [rsp+64]
    call    S::~S()

    ; return the result
    mov     rax, rbx
    add     rsp, 48
    pop     rbx
    ret

g1:
    sub     rsp, 40

    ; call f1 and store into temporary variable
    lea     rcx, qword ptr [rsp+56]
    call    f1()

    ; copy temporary to outbound parameter
    mov     rdx, rax
    lea     rcx, qword ptr [rsp+48]
    call    S::S(S &&)

    ; call consume with the outbound parameter
    mov     rcx, rax
    call    consume(S)

    ; clean up the temporary
    lea     rcx, qword ptr [rsp+56]
    call    S::~S()

    ; return
    add     rsp, 40
    ret

Notice that calling g1 resulted in the creation of a total of two extra copies of S, one in f1 and another to hold the return value of f1.

By comparison, if we use copy elision:

// Good version
S __declspec(noinline) f2()
{
    S s;
    return s;
}

void g2()
{
    consume(f2());
}

then the msvc code generation is

; on entry, rcx says where to put the return value
f2:
    push    rbx
    sub     rsp, 48
    mov     rbx, rcx

    ; construct directly into return value (still in rcx)
    call    S::S()

    ; and return it
    mov     rax, rbx
    add     rsp, 48
    pop     rbx
    ret

g2:
    sub     rsp, 40

    ; put return value of f1 directly into outbound parameter
    lea     rcx, qword ptr [rsp+48]
    call    f2()

    ; call consume with the outbound parameter
    mov     rcx, eax
    call    consume(S)

    ; return
    add     rsp, 40
    ret

You get similar results with gcc, clang, and icc icx.

In gcc, clang, and icx, you can enable the pessimizing-move warning to tell you when you make these mistakes.

Topics
Code

Author

Raymond has been involved in the evolution of Windows for more than 30 years. In 2003, he began a Web site known as The Old New Thing which has grown in popularity far beyond his wildest imagination, a development which still gives him the heebie-jeebies. The Web site spawned a book, coincidentally also titled The Old New Thing (Addison Wesley 2007). He occasionally appears on the Windows Dev Docs Twitter account to tell stories which convey no useful information.

10 comments

Discussion is closed. Login to edit/delete existing comments.

  • Jonathan Duncan · Edited

    Is there some side-effect or other reason I can’t see why the return std::move(name); case isn’t possible to elide? Or is this just a case of the standards missing an opportunity and compilers being bound to obey the standards?

    • Kevin Norris · Edited

      The standard requires elision if the return value is a prvalue ("pure rvalue"), but std::move(foo) is an xvalue ("expiring lvalue"). The standard allows elision in the NVRO case described by Raymond, as well as a few oddball scenarios involving exceptions and coroutines. In all other cases, elision is only permitted under the as-if rule, which requires the compiler to prove that there is no difference in observable behavior - in practice, this means it has...

      Read more
  • José Silva

    Raymond, now we now that you are six months ahead. So, maybe you can give us a glimpse about the future? Pleeease. 🙂

    • Simon Farnsworth

      Note, though, that unlike the C++17 and later NRVO, Rust's one is entirely optional. In C++17 and later, when a set of conditions are met - Prvalue semantics ("guaranteed copy elision") - the compiler is required to perform NRVO.

      In Rust, NRVO is entirely optional; indeed, the optimization you linked has been disabled due to https://github.com/rust-lang/rust/issues/111005 in rustc versions up to 1.74, and has not been re-enabled in nightly yet, since it was unsound in some...

      Read more
  • Kevin Norris

    What I'm getting from this: std::move is a cast. It should be treated with exactly the same suspicion as std::static_cast. Only use it if you can articulate what the cast is formally doing (in particular: Why do you expect the compiler to treat the value differently when it is cast to rvalue reference?), why the use case is not covered by copy elision (which includes, but is not limited to, NVRO), and what will become...

    Read more
  • 紅樓鍮 · Edited

    It gets even spicier when you combine abuse with abuse: whereas
    <code>
    will just needlessly create a reference that will inhibit NRVO,
    <code>
    will actually create a dangling reference, because C++ will not extend the lifetime of a temporary if there's a function call between it and the local reference you're declaring.

    The punchline, though, is that this will again create a valid reference:
    <code>
    You may already know that is just implemented...

    Read more
    • Neil Rashbrook

      Assuming I did it correctly, you don’t get the optimisation if you move or cast the value.

      • 紅樓鍮 · Edited

        z/qhhj18rKT

        Interestingly it seems like MSVC can optimize out the move with return static_cast<T &&>(...); but neither GCC nor Clang can. And while both GCC and Clang issue a warning about return std::move(...);, they don’t for return static_cast<T &&>(...);.

        (godbolt.org can’t execute MSVC’s output, but you can see it in the disassembly.)