August 19th, 2024

Constructing nodes of a hand-made linked list, how hard can it be?

Suppose you are writing your own circular doubly-linked list structure.

struct node
{
    node* prev;
    node* next;
};

A natural choice for the default constructor is to make the node the sole element of a circular doubly-linked list.

struct node
{
    node* prev = this;
    node* next = this;
};

What if you also want to add a node after an existing node? Well, we could add a constructor for that.

struct node
{
    node* prev = this;
    node* next = this;

    node() = default;

    // Construct a node after a specific node
    node(node* other) :                      
        prev(other),                         
        next(other->next)                    
    {                                        
        prev->next = this;                   
        next->prev = this;                   
    }                                        
};

(Note that the “construct after another node” constructor takes the other node by pointer, rather than by reference, so that it won’t be mistaken for a copy constructor.)

But maybe you also want to have a “before” constructor that inserts the new node before an existing node.

struct node
{
    node* prev = this;
    node* next = this;

    node() = default;

    // Construct a node after a specific node
    node(node* other) :
        prev(other),
        next(other->next)
    {
        prev->next = this;
        next->prev = this;
    }

    // Construct a node before a specific node
    node(node* other) :                       
        prev(other->prev),                    
        next(other)                           
    {                                         
        prev->next = this;                    
        next->prev = this;                    
    }                                         
};

Uh-oh, we have two constructors with the same parameter list. That’s not going to work.

And really, even if we had only one of the constructors, it would still be a bad design because it’s not clear whether the newly-created item goes before or after.

A traditional solution to this is to use a factory method, so that the method name tells you whether it’s going before or after.

struct node
{
    node* prev = this;
    node* next = this;

    node() = default;

    static node create_after(node* other) 
    {                                     
        node n{ other->prev, other };     
        n.prev->next = &n;                
        n.next->prev = &n;                
        return n;                         
    }                                     
                                          
    static node create_before(node* other)
    {                                     
        node n{ other, other->next };     
        n.prev->next = &n;                
        n.next->prev = &n;                
        return n;                         
    }                                     
};

// usage
node a;
node b = node::create_after(&a); // a->b
node c = node::create_before(&b); // a->c->b

Unfortunately, this doesn’t work, or at least isn’t guaranteed to work, because copy elision is not guaranteed for named return values.

I tried to think of ways to get guaranteed copy elision while simultaneously having access to the address of the object, but the only way I could come up with was to overload the constructor, say by using a tag type.

struct place_after_t {};                       
inline constexpr place_after_t place_after{};  
                                               
struct place_before_t {};                      
inline constexpr place_before_t place_before{};

struct node
{
    node* prev = this;
    node* next = this;

    node() = default;

    // Construct a node after a specific node
    node(place_after_t, node* other) :
        prev(other),
        next(other->next)
    {
        prev->next = this;
        next->prev = this;
    }

    // Construct a node before a specific node
    node(place_before_t, node* other) :
        prev(other->prev),
        next(other)
    {
        prev->next = this;
        next->prev = this;
    }

    static node create_after(node* other) 
    {                                     
        return { place_after, other };    
    }                                     
                                          
    static node create_before(node* other)
    {                                     
        return { place_before, other };   
    }                                     
};

// Sample usage 1: Using tags
node a;
node b(place_after, &a); // list is a->b
node c(place_before, &b); // list is  a->c->b

// Sample usage 2: Using factories
node a;
node b = node::create_after(&a); // list is a->b
node c = node::create_before(&b); // list is  a->c->b

Or maybe we are looking at it the wrong way, and really the functionality we want are create_after and create_before instance methods.

struct node
{
    node* prev = this;
    node* next = this;

    node() = default;

    node create_after() { return { place_after, this }; }
    node create_before() { return { place_before, this }; }

private:
    struct place_after_t {};
    inline constexpr place_after_t place_after{};

    struct place_before_t {};
    inline constexpr place_before_t place_before{};

    node(place_after_t, node* other) :
        prev(other),
        next(other->next)
    {
        prev->next = this;
        next->prev = this;
    }

    node(place_before_t, node* other) :
        prev(other->prev),
        next(other)
    {
        prev->next = this;
        next->prev = this;
    }
};

// Sample usage
node a;
node b = a.create_after(); // list is a->b
node c = b.create_before(); // list is a->c->b

The moral of the story, I think, is that if you want to force copy elision of an object whose address must be known before it is returned, you have to do it from the constructor, because that’s the only time guaranteed copy elision will give you the object’s address.

Topics
Code

Author

Raymond has been involved in the evolution of Windows for more than 30 years. In 2003, he began a Web site known as The Old New Thing which has grown in popularity far beyond his wildest imagination, a development which still gives him the heebie-jeebies. The Web site spawned a book, coincidentally also titled The Old New Thing (Addison Wesley 2007). He occasionally appears on the Windows Dev Docs Twitter account to tell stories which convey no useful information.

11 comments

Discussion is closed. Login to edit/delete existing comments.

Newest
Newest
Popular
Oldest
  • GL · Edited

    It took me a while before realizing what is supposed to happen. The first set of factory methods are functionally incorrect — the C++ specification defines them to alter the list by inserting pointers invalid upon return, because the object `n` of automatic storage duration has reached its end of lifetime, and since the defined effect is to insert invalid pointers, it is incorrect to do “NRVO” and construct directly into the final destination (i.e., the standard requires non-desired behavior). The overloaded constructors are correct, but only if you’re using a version of C++ with prvalue semantics — if I’m reading correctly, it guarantees that “this” in the constructor is the final destination. The member functions appear weird, because if you want to make a new node with `new`, then you have to say `new node(existing.create_after())`.

    Relying on prvalue semantics (“guaranteed copy elision”) for object identity (thanks to Uli Gerhardt for the phrase) seems to be obfuscating the code, and I would avoid that.

    • Raymond ChenMicrosoft employee Author 2 weeks ago

      I don’t see a clause in the C++ specification that prohibits NRVO if the object contains pointers. It seems to be explicitly permitted by [class.copy.elision] 1.1: “in a return statement in a function with a class return type, when the expression is the name of a non-volatile object with automatic storage duration (other than a function parameter or a variable introduced by the exception-declaration of a handler ([except.handle])) with the same type (ignoring cv-qualification) as the function return type, the copy/move operation can be omitted by constructing the object directly into the function call’s return object.” The exceptions are for function parameters or caught exceptions. Now, you can’t rely on NRVO (which is the source of the problem), but NRVO does not appear to be prohibited here, and in fact all three major compilers do it.

  • Alessandro Morelli · Edited

    Would it be enough to make node non-copyable and non-moveable to ensure NRVO?

  • 紅樓鍮

    Wouldn’t node need non-trivial move constructor and move assignment operator that properly update external pointers to self? Will the lack of guaranteed NRVO still pose a correctness issue if they’re added?

  • Rutger Ellen

    to be completely fair I at first did not spot the bug because my head converted these objects to pointers. I know this is for example purposes but creating an object on the stack and then adding it to a linked list that is implemented using pointers (duh) is such a strange thing for me that my mental parser missed this to begin with.

    • Raymond ChenMicrosoft employee Author

      They don’t have to be created on the stack. You can do new node(...) to create a node and attach it to an existing linked list.

      • anonymous

        this comment has been deleted.

  • Jyrki Vesterinen

    I don’t see copy elision as all that important. It’s just a performance optimization.

    • Uli Gerhardt

      Not in this case. It’s about the identity of the returned object because it’s address is stored.

      • Stuart Ballard

        Thank you for pointing (no pun intended) this out! It’s a really key point that’s easy to miss from the way the article is written.

        Seems like this might be a case for separating the allocation from the initialization – have just a single constructor, that always assigns the pointers to null or to this, and then use instance methods to connect it up, like

        void insert_before(node* other)
        {
           prev = other->prev;
           next = other;
           other->prev = this;
           prev->next = this;
        }

        Or be a bit cleverer and make it connect entire lists together:

        void insert_before(node* other)
        {
           other->prev->next = next;
           next->prev = other->prev;
           other->prev= this;
           next = other;
        }

Feedback