January 1st, 2026
intriguingmind blown2 reactions

Swapping two blocks of memory that reside inside a larger block, in constant memory

Suppose you have a large block of memory, and you want to swap two blocks that reside inside that large block. For concreteness, say you have a block of memory of the form A1, A2, B1, B2, C1, C2, D1, D2, D3, E1, and you want to swap the B’s and D’s, producing A1, A2, D1, D2, D3, C1, C2, B1, B2, E1.

A1 A2 B1 B2 C1 C2 D1 D2 D3 E1
     
A1 A2 D1 D2 D3 C1 C2 B1 B2 E1

Like, say, you have a double-null-terminated string and you want to swap two strings in it. Now, of course, the easy way would be just to allocate a second buffer equal in size to the original buffer and copy the strings to the second buffer in the desired order. But just as an exercise, can you do this without allocating additional memory?

Here’s a hint: There is an algorithm to swap two adjacent buffers using no additional space. In C++, this algorithm is implemented by the std::rotate method. For example, if you have a block of memory of the form X1, X2, Y1, Y2, Y3, then you can call std::rotate(v.begin(), v.begin() + 2, v.end()) to get Y1, Y2, Y3, X1, X2.

X1 X2 Y1 Y2 Y3
Y1 Y2 Y3 X1 X2

The function is called rotate because you can view it as taking the combined buffer and doing a circular rotation until Y1 is the new first element.

You can perform the desired exchange of non-adjacent blocks by performing three rotations. Starting with A1, A2, B1, B2, C1, C2, D1, D2, D3, E1, your first rotation exchanges the B’s with the C’s: your second rotation exchanges the B’s with the D’s, and your final rotation exchanges the C’s with the D’s.

A1 A2 B1 B2 C1 C2 D1 D2 D3 E1
   
A1 A2 C1 C2 B1 B2 D1 D2 D3 E1
   
A1 A2 C1 C2 D1 D2 D3 B1 B2 E1
   
A1 A2 D1 D2 D3 C1 C2 B1 B2 E1

(There are other ways to accomplish this with three rotations.)

But you can do better.

The way rotation works is by reversing the two buffers separately, and then reversing the combined buffer. Starting with X1, X2, Y1, Y2, Y3, we reverse the X’s (producing X2, X1) and reverse the Y’s (producing Y3, Y2, Y1), and then reverse the whole thing, giving a final result of Y1, Y2, Y3, X1, X2.

X1 X2 Y1 Y2 Y3
X2 X1 Y3 Y2 Y1
Y1 Y2 Y3 X1 X2

The cost of reversing n elements is n/2 swaps, so the cost of a rotation of n elements is n swaps. (You reverse the first part and the second part, which together add up to n/2 swaps; and then you reverse the whole thing, which is another n/2 swaps, for a total of n.)

Therefore, our three-rotation version costs a total of 2n swaps, where n is the combined size of the two blocks being swapped as well as the block that is sandwiched between them.

We can reduce the cost to n swaps by applying the reversal trick directly: Reverse the two blocks, reverse the block that is sandwiched between them, and then reverse the combo block.

A1 A2 B1 B2 C1 C2 D1 D2 D3 E1
A1 A2 B2 B1 C2 C1 D3 D2 D1 E1
A1 A2 D1 D2 D3 C1 C2 B1 B2 E1
template<typename BiDirIt>
void swap_discontiguous(BiDirIt first1, BiDirIt last1,
                        BiDirIt first2, BiDirIt last2)
{
    std::reverse(first1, last1);
    std::reverse(last1, first2);
    std::reverse(first2, last2);
    std::reverse(first1, last2);
}

But wait. If you look at the specification for std::rotate, you see that it requires only a forward iterator, not a bidirectional iterator. Yet std::reverse requires a bidirectional iterator. So how does std::rotate operate if it is given only forward iterators? We’ll look at that next time.

Bonus chatter: clang’s libcxx and gcc’s libstdc++ contain specializations of std::rotate for random-access iterators which perform only n/2 swaps, moving each element directly to its final destination by breaking the element movements into disjoint cycles and then moving the elements within each cycle. They also have special-cases for rotating zero elements (nop), as well as rotating a single element and swapping two equal-sized blocks (which reduces the number of swaps to n/2). I guess these cases happen often enough for the extra code to be worth it.

Topics
Code

Author

Raymond has been involved in the evolution of Windows for more than 30 years. In 2003, he began a Web site known as The Old New Thing which has grown in popularity far beyond his wildest imagination, a development which still gives him the heebie-jeebies. The Web site spawned a book, coincidentally also titled The Old New Thing (Addison Wesley 2007). He occasionally appears on the Windows Dev Docs Twitter account to tell stories which convey no useful information.

9 comments

Sort by :
  • Brent Lewis 5 days ago

    Am I misunderstanding something, or is this article unnecessarily complicating (or inversely, oversimplifying) things? Read the first item, put it in its new place (modulo addition in indexes.) Put the item that was originally there in it’s new place. Repeat. Cache locality is poor, but it works. You can fix the cache locality in O(|n – m|) space, which is maybe the oversimplifying part. Do we care about cash locality in this article? We must, if we’re starting with reverse as a building block.

    • Raymond ChenMicrosoft employee Author

      It’s not obvious how to break the permutation into cycles and make sure you visit each cycle exactly once. (I noted in the Bonus Chatter that libcxx and libstdc++ both do this extra work.)

  • Neil Rashbrook 7 days ago

    Am I missing something or can you do it in two std::rotates?

    A1, A2, B1, B2, C1, C2, D1, D2, D3, E1

    Rotate the block from B1 to D3 so that D1 is now at the start:

    A1, A2, D1, D2, D3, B1, B2, C1, C2, E1

    Rotate the block from B1 to C2 so that C1 is now at the start:

    A1, A2, D1, D2, D3, C1, C2, B1, B2, E1

    Job done?

  • Kris G

    if we disregard the ‘no additional space’ constraint and allow a constant small amount of space (1 byte/word/etc), you simply loop through swapping 1 byte/word/etc at a time.
    In that way, this problem setup seems very contrived.

    • 許恩嘉

      The two blocks may be different in size.

    • Raymond ChenMicrosoft employee Author 7 days ago

      Not sure what you mean by “loop through swapping 1 byte/word/etc at a time.” The hard part is deciding which elements to swap. Do you start by swapping B1 and D1? That puts D1 in the right spot, but B1 is in the wrong place. (It needs to go where D2 is.)

  • Adam Rosenfield 1 week ago · Edited

    Maybe I’m spoiling tomorrow post, but my first thought was this:

    * If len(B) == len(D), directly swap the B’s and D’s, and you’re done
    * If len(B) < len(D), swap the B’s with the first len(B) of the D’s, and then reiterate on C and the remaining D’s
    * If len(B) > len(D), swap the D’s with the first len(D) of the B’s, and then reiterate on C and the remaining B’s

    This runs like the Euclidean GCD algorithm, in terms of how it reduces the lengths of each subarray at each step.

  • GL 1 week ago

    Bonus bonus chatter: If the disjoint swap is using random-access iterators and the two intervals are of the same length, the algorithm should just do swaps between the corresponding elements, so that the complexity doesn’t grow with how far the two intervals are away from each other. Reductions to rotate/reverse won’t be as efficient.