Commenter Neil Rashbrook pointed out that my original rotation-based algorithm for swapping two blocks of memory inside a larger block did too much rotating. My solution had three rotations, but Neil was able to get it down to two.
The set-up is that we have a large block of memory, and you want to swap two blocks that reside inside that large block. For concreteness, let’s say that it’s A1, A2, B1, B2, C1, C2, D1, D2, D3, E1, and you want to exchange the B’s with the D’s. Neil pointed out that you can start by rotating the BCD block to move the D’s to the front, producing ADBCE; and then rotate the BC block to move the C’s to the front, producing ADCBE.
| A1 | A2 | B1 | B2 | C1 | C2 | D1 | D2 | D3 | E1 |
|  | ⤩ |  | |||||||
| A1 | A2 | D1 | D2 | D3 | B1 | B2 | C1 | C2 | E1 |
|  | ⤩ |  | |||||||
| A1 | A2 | D1 | D2 | D3 | C1 | C2 | B1 | B2 | E1 |
There is a symmetric version where you start by swapping the B’s to the end. If you choose to swap the larger block into position first, then the number of swaps is 2n − max(|B|,|D|), which is a improvement over my three-rotation version that performed 2n swaps.
(But still not as good as the n swaps that we developed later.)
I did wonder whether there was a difference between rotating the Ds to the start or the Bs to the end, but I had been thinking in terms of computing the rotation positions of the second rotation, not realising that it affects the number of swaps, so that’s nice to know!