Rotation revisited: Another unidirectional algorithm

Some time ago, we looked at the problem of swapping two blocks of memory that reside inside a larger block, in constant memory, and along the way, we learned about std::rotate which swaps two adjacent blocks of memory (not necessarily the same size).

I noted in a postscript that clang’s libcxx and gcc’s libstdc++ contain specializations of std::rotate for random-access iterators that view the operation as a permutation and decomposes the permutation into cycles.

I was mistaken.

The implementation in gcc’s libstdc++ has special cases for single-element rotations, but in the general case, it uses a different algorithm.

Let’s call the blocks of memory to be exchanged A and B, where A is made up of elements A1, A2, A3, and so on; and block B has elements B1, B2, B3, and so on. Without loss of generality, suppose the A block is smaller. (If not, we can just mirror the algorithm.) And for concreteness let’s say that the elements are A1, A2, A3, B1, B2, B3, B4, B5.

A1	A2	A3	B1	B2	B3	B4	B5
↑			↑					↑
first			mid					last

Exchange elements at first and mid, then move both iterators forward. After the first step, we have this:

B1	A2	A3	A1	B2	B3	B4	B5
	↑			↑				↑
	first			mid				last

After three steps, we have moved all of the A’s out and replaced them with an equal number of B’s.

B1	B2	B3	A1	A2	A3	B4	B5
			↑			↑		↑
			first			mid		last

But don’t stop. Keep on going until mid reaches last.

B1	B2	B3	B4	B5	A3	A1	A2
					↑			↑
					first				mid last

All of the B’s have been swapped to their final positions, but the A’s are jumbled.

But you can predict the exact nature of the jumbling. The A block is in two chunks. If we let n be the total number of elements |A| + |B| and a be the number of elements in A, then the first chunk has the final n % a elements, and the second chunk has the initial a − (n % a) elements.

Therefore, we can recursively rotate the two pieces of the A block to finish the job. Move mid to first + (n % a) and restart the algorithm.

This algorithm performs n − 1 swaps. You can calculate this inductively by observing that we perform |B| swaps, and then recursively rotate |A|. Or you can calculate this directly by observing that each swap moves one element to its final position, except that the final swap moves two elements to their final position.

The locality of this algorithm fairly good. The first iterator moves steadily forward, and the mid iterator moves forward most of the time, with at most O(log (min(|A|, |B|)) backward resets.

Next time, we’ll make a shocking discovery about this algorithm.

Author

Raymond Chen

Raymond has been involved in the evolution of Windows for more than 30 years. In 2003, he began a Web site known as The Old New Thing which has grown in popularity far beyond his wildest imagination, a development which still gives him the heebie-jeebies. The Web site spawned a book, coincidentally also titled The Old New Thing (Addison Wesley 2007). He occasionally appears on the Windows Dev Docs Twitter account to tell stories which convey no useful information.

1 comment

Jacob Manaker June 3, 2026

You’ve just given an itemized version of Euclid’s Algorithm (for GCD computation). FWIW, the same ideas give a proof of Cantor-Bernstein-Schröder.

Rotation revisited: Another unidirectional algorithm

Category

Topics

Author

1 comment

Read next

Rotation revisited: A shocking discovery about gcc’s unidirectional rotation algorithm

Rotation revisited: Cycle decomposition in clang’s libcxx

Category

Topics

Share

Author

1 comment

Read next

Rotation revisited: A shocking discovery about gcc’s unidirectional rotation algorithm

Rotation revisited: Cycle decomposition in clang’s libcxx

Stay informed