January 1st, 2026

2 reactions

Swapping two blocks of memory that reside inside a larger block, in constant memory

Raymond Chen

Suppose you have a large block of memory, and you want to swap two blocks that reside inside that large block. For concreteness, say you have a block of memory of the form A1, A2, B1, B2, C1, C2, D1, D2, D3, E1, and you want to swap the B’s and D’s, producing A1, A2, D1, D2, D3, C1, C2, B1, B2, E1.

A1	A2	B1	B2	C1	C2	D1	D2	D3	E1


					⤩


A1	A2	D1	D2	D3	C1	C2	B1	B2	E1

Like, say, you have a double-null-terminated string and you want to swap two strings in it. Now, of course, the easy way would be just to allocate a second buffer equal in size to the original buffer and copy the strings to the second buffer in the desired order. But just as an exercise, can you do this without allocating additional memory?

Here’s a hint: There is an algorithm to swap two adjacent buffers using no additional space. In C++, this algorithm is implemented by the std::rotate method. For example, if you have a block of memory of the form X1, X2, Y1, Y2, Y3, then you can call std::rotate(v.begin(), v.begin() + 2, v.end()) to get Y1, Y2, Y3, X1, X2.

X1	X2	Y1	Y2	Y3


⤩


Y1	Y2	Y3	X1	X2

The function is called rotate because you can view it as taking the combined buffer and doing a circular rotation until Y1 is the new first element.

You can perform the desired exchange of non-adjacent blocks by performing three rotations. Starting with A1, A2, B1, B2, C1, C2, D1, D2, D3, E1, your first rotation exchanges the B’s with the C’s: your second rotation exchanges the B’s with the D’s, and your final rotation exchanges the C’s with the D’s.

A1	A2	B1	B2	C1	C2	D1	D2	D3	E1


			⤩


A1	A2	C1	C2	B1	B2	D1	D2	D3	E1


				⤩


A1	A2	C1	C2	D1	D2	D3	B1	B2	E1


		⤩


A1	A2	D1	D2	D3	C1	C2	B1	B2	E1

(There are other ways to accomplish this with three rotations.)

But you can do better.

The way rotation works is by reversing the two buffers separately, and then reversing the combined buffer. Starting with X1, X2, Y1, Y2, Y3, we reverse the X’s (producing X2, X1) and reverse the Y’s (producing Y3, Y2, Y1), and then reverse the whole thing, giving a final result of Y1, Y2, Y3, X1, X2.

X1	X2	Y1	Y2	Y3

⇆		⇆

X2	X1	Y3	Y2	Y1

⇆

Y1	Y2	Y3	X1	X2

The cost of reversing n elements is n/2 swaps, so the cost of a rotation of n elements is n swaps. (You reverse the first part and the second part, which together add up to n/2 swaps; and then you reverse the whole thing, which is another n/2 swaps, for a total of n.)

Therefore, our three-rotation version costs a total of 2n swaps, where n is the combined size of the two blocks being swapped as well as the block that is sandwiched between them.

We can reduce the cost to n swaps by applying the reversal trick directly: Reverse the two blocks, reverse the block that is sandwiched between them, and then reverse the combo block.

A1	A2	B1	B2	C1	C2	D1	D2	D3	E1

		⇄		⇄		⇄

A1	A2	B2	B1	C2	C1	D3	D2	D1	E1

		⇄

A1	A2	D1	D2	D3	C1	C2	B1	B2	E1

template<typename BiDirIt>
void swap_discontiguous(BiDirIt first1, BiDirIt last1,
                        BiDirIt first2, BiDirIt last2)
{
    std::reverse(first1, last1);
    std::reverse(last1, first2);
    std::reverse(first2, last2);
    std::reverse(first1, last2);
}

But wait. If you look at the specification for std::rotate, you see that it requires only a forward iterator, not a bidirectional iterator. Yet std::reverse requires a bidirectional iterator. So how does std::rotate operate if it is given only forward iterators? We’ll look at that next time.

Bonus chatter: clang’s libcxx and gcc’s libstdc++ contain specializations of std::rotate for random-access iterators which perform only n/2 swaps, moving each element directly to its final destination by breaking the element movements into disjoint cycles and then moving the elements within each cycle. They also have special-cases for rotating zero elements (nop), as well as rotating a single element and swapping two equal-sized blocks (which reduces the number of swaps to n/2). I guess these cases happen often enough for the extra code to be worth it.

Topics

Code

Author

Raymond Chen

Raymond has been involved in the evolution of Windows for more than 30 years. In 2003, he began a Web site known as The Old New Thing which has grown in popularity far beyond his wildest imagination, a development which still gives him the heebie-jeebies. The Web site spawned a book, coincidentally also titled The Old New Thing (Addison Wesley 2007). He occasionally appears on the Windows Dev Docs Twitter account to tell stories which convey no useful information.

10 comments

Discussion is closed. Login to edit/delete existing comments.

Brent Lewis January 4, 2026

Am I misunderstanding something, or is this article unnecessarily complicating (or inversely, oversimplifying) things? Read the first item, put it in its new place (modulo addition in indexes.) Put the item that was originally there in it’s new place. Repeat. Cache locality is poor, but it works. You can fix the cache locality in O(|n – m|) space, which is maybe the oversimplifying part. Do we care about cash locality in this article? We must, if we’re starting with reverse as a building block.
- Raymond Chen Author January 4, 2026
  
  It’s not obvious how to break the permutation into cycles and make sure you visit each cycle exactly once. (I noted in the Bonus Chatter that libcxx and libstdc++ both do this extra work.)
  - Neil Hulbert January 13, 2026
    
    I was thinking about this, and I believe you can prove with a bit of case work that the number of cycles in the permutation for this non-adjacent block swapping case is GCD(|B| + |C|, |D| − |B|):
    
    Assume without loss of generality that |D| >= |C|. Consider the action of the permutation on the indices of elements belonging to B or C. If you apply the permutation twice to the index of an element in B, the net effect is to shift it forward by |D| - |B|. If you apply the permutation once to the index of an element...
    Read more
    I was thinking about this, and I believe you can prove with a bit of case work that the number of cycles in the permutation for this non-adjacent block swapping case is GCD(|B| + |C|, |D| − |B|):
    
    Assume without loss of generality that |D| >= |C|. Consider the action of the permutation on the indices of elements belonging to B or C. If you apply the permutation twice to the index of an element in B, the net effect is to shift it forward by |D| – |B|. If you apply the permutation once to the index of an element in C, this likewise shifts it forward by |D| – |B|. If after either of these steps the resulting index lies in D, repeated applications of the permutation would shift it backward by |B| + |C| until the index ends up back in B or C. So using repeated applications of the permutation, any element indexed by i in B + C can be sent to index (i + |D| – |B|) mod (|B| + |C|). This guarantees no more than GCD(|B| + |C|, |D| – |B|) cycles.
    
    Now consider all possible index shifts produced by an application of the permutation: |D| + |C| for indices in B, |D| – |B| for indices in C, and -|B| – |C| for indices in D. The GCD of all these is GCD(|D| + |C|, |D| – |B|, |B| + |C|) which can be simplified to GCD(|B| + |C|, |D| – |B|). Since every displacement is a multiple of this gcd, indices in different residue classes mod this gcd cannot lie in the same cycle. So the permutation has at least GCD(|B| + |C|, |D| – |B|) cycles.
    
    This fixes the number of cycles at exactly GCD(length(B) + length(C), length(D) – length(B)). And since the first [number of cycles] elements starting at B must be in different residue classes mod the number of cycles, you could follow the cycle starting/ending at each of these to perform the swap.
    
    Read less
Neil Rashbrook January 2, 2026
Am I missing something or can you do it in two std::rotates?
```
A1, A2, B1, B2, C1, C2, D1, D2, D3, E1
```
Rotate the block from B1 to D3 so that D1 is now at the start:
```
A1, A2, D1, D2, D3, B1, B2, C1, C2, E1
```
Rotate the block from B1 to C2 so that C1 is now at the start:
```
A1, A2, D1, D2, D3, C1, C2, B1, B2, E1
```
Job done?
- Raymond Chen Author January 2, 2026
  
  I don’t think you’re missing anything. I think I’m the one who missed something.
Kris G January 1, 2026

if we disregard the ‘no additional space’ constraint and allow a constant small amount of space (1 byte/word/etc), you simply loop through swapping 1 byte/word/etc at a time.
In that way, this problem setup seems very contrived.
- 許恩嘉 January 3, 2026
  
  The two blocks may be different in size.
- Raymond Chen Author January 2, 2026
  
  Not sure what you mean by “loop through swapping 1 byte/word/etc at a time.” The hard part is deciding which elements to swap. Do you start by swapping B1 and D1? That puts D1 in the right spot, but B1 is in the wrong place. (It needs to go where D2 is.)
Adam Rosenfield January 1, 2026 · Edited

Maybe I’m spoiling tomorrow post, but my first thought was this:

* If len(B) == len(D), directly swap the B’s and D’s, and you’re done
* If len(B) < len(D), swap the B’s with the first len(B) of the D’s, and then reiterate on C and the remaining D’s
* If len(B) > len(D), swap the D’s with the first len(D) of the B’s, and then reiterate on C and the remaining B’s

This runs like the Euclidean GCD algorithm, in terms of how it reduces the lengths of each subarray at each step.

Stay informed

Get notified when new posts are published.

Email *

Country/Region *

I would like to receive the The Old New Thing Newsletter. Privacy Statement.

Follow this blog

Swapping two blocks of memory that reside inside a larger block, in constant memory

Category

Topics

Author

10 comments

Read next

How can you swap two adjacent blocks of memory using only forward iterators?

How can you swap two non-adjacent blocks of memory using only forward iterators?

A1	A2	B1	B2	C1	C2	D1	D2	D3	E1


			⤩


A1	A2	C1	C2	B1	B2	D1	D2	D3	E1


				⤩


A1	A2	C1	C2	D1	D2	D3	B1	B2	E1


		⤩


A1	A2	D1	D2	D3	C1	C2	B1	B2	E1

A1	A2	B1	B2	C1	C2	D1	D2	D3	E1

		⇄		⇄		⇄

A1	A2	B2	B1	C2	C1	D3	D2	D1	E1

		⇄

A1	A2	D1	D2	D3	C1	C2	B1	B2	E1

A1	A2	B1	B2	C1	C2	D1	D2	D3	E1


			⤩


A1	A2	C1	C2	B1	B2	D1	D2	D3	E1


				⤩


A1	A2	C1	C2	D1	D2	D3	B1	B2	E1


		⤩


A1	A2	D1	D2	D3	C1	C2	B1	B2	E1

A1	A2	B1	B2	C1	C2	D1	D2	D3	E1

		⇄		⇄		⇄

A1	A2	B2	B1	C2	C1	D3	D2	D1	E1

		⇄

A1	A2	D1	D2	D3	C1	C2	B1	B2	E1

Category

Topics

Share

Author

10 comments

Read next

How can you swap two adjacent blocks of memory using only forward iterators?

How can you swap two non-adjacent blocks of memory using only forward iterators?

Stay informed

A1	A2	B1	B2	C1	C2	D1	D2	D3	E1


			⤩


A1	A2	C1	C2	B1	B2	D1	D2	D3	E1


				⤩


A1	A2	C1	C2	D1	D2	D3	B1	B2	E1


		⤩


A1	A2	D1	D2	D3	C1	C2	B1	B2	E1

A1	A2	B1	B2	C1	C2	D1	D2	D3	E1

		⇄		⇄		⇄

A1	A2	B2	B1	C2	C1	D3	D2	D1	E1

		⇄

A1	A2	D1	D2	D3	C1	C2	B1	B2	E1