Sometimes one have a memory block where we want to put the first `n`

bytes at the end of the block, rather than the beginning, without changing the blocks themselves, as in the figure below. `a`

, `b`

, and `e`

are pointers to the beginning of `A`

, the beginning of `B`

, and the end of `B`

.

```
a |=========| |=========|
| block A | want | block B |
b |---------| =====> | |
| | | |
| | | |
| block B | |---------|
| | | block A |
e |=========| |=========|
```

By using extra space, more specifically $O(n)$ space, this is trivial.

```
// external buffer to hold A
char *tmp = malloc(sizeof(A));
// copy A to the buffer
memmove(tmp, a, sizeof(A));
// move B up to the top
memmove(b, a, sizeof(B));
// insert A at the bottom
memmove(a + sizeof(B)), tmp, sizeof(A));
free(tmp);
```

Of course, `sizeof(A)`

will not work when `A`

is a pointer, but the meaning is still clear.
Additinally, we could use `memcpy`

instead of `memmove`

on the first and last call,
if we really cared about not copying the data too much around^{1}.

This is a fine solution; it works. But can we do better? Well, can we do it without the `malloc`

call?

`block_swap`

This algorithm is based on the simple idea that if we swap block `A`

to its final position,
we have swapped a block `B2`

, which is of the same size as `A`

, to the top.
We are then left with a similar but smaller problem, which is to swap `B2`

and `B1`

:

```
a |=========| |=========| |=========|
| block A | one pass | block B2| new problem | block B2|
b |---------| =========> |---------| ============> |---------|
| | | | | |
| block B1| | block B1| | block B1|
c |- - - - -| |---------| |=========|
| block B2| | block A | :(block A):
e |=========| |=========| ...........
```

With this we can sketch out the general idea of our alogrithm:

```
void one_by_one_swap(char *a, char *b, size_t n) {
for (size_t i = 0; i < n; i++) {
char tmp = a[i];
a[i] = b[i]:
b[i] = tmp;
}
}
void block_swap(char *a, char *b, char *e) {
char *c = e - sizeof(A);
one_by_one_swap(a, c, sizeof(A));
block_swap(a, b, c);
}
```

One assumption we have made so far is that the `B`

block is larger than the `A`

block.
In order to fix this, we could check which of the blocks is the larger, and swap around the logic,
such that we always swap the smaller ‘into’ the larger.
This allows us for a little optimization: if the blocks are of equal size, we can simply make one call to `one_by_one_swap`

. In the second and final listing, we have even added some argument checking.
This code is runable.

```
void block_swap(char *a, char *b, char *e) {
assert(a < b);
assert(b < e);
size_t a_size = b - a;
size_t b_size = e - b;
if (a_size < b_size) {
// The case we assumed above
char *c = e - a_size;
one_by_one_swap(a, c, a_size);
block_swap(a, b, c);
} else if (b_size < a_size) {
// The opposite case
// Now `c` is between `a` and `b`
char *c = a + b_size;
one_by_one_swap(a, b, b_size);
block_swap(c, b, e);
} else {
// The trivial case
one_by_one_swap(a, b, a_size);
}
}
```

We can still do one more thing, which is replacing the recursion with a loop. However, the function is tail-recursive, so this is a trivial transformation for the compiler. Additionally, the transformation isn’t very interesting, so we will keep this recursive definition.

What is the running time of this? We can figure this out pretty intuetively, by the observation that
each pass swaps a block of `a`

elements into their *correct* position. Then, these elements
are not touched again. This means that all elements are moved exacly once — no more, no less.
Hence, this is a linear algorithm, as one would expect from a memory moving algorithm.

What about the space complexity? Even though the function is recursive, it is tail recursive, as the last thing that happends in the code paths where recursion is used is the recursive call itself. The compiler transforms this into a simple loop, such that we keep the linear space complexity (which, after all, was our main motivation to do this).

Lastly, a final optimization that could be used is in the `one_by_one_swap`

. Instead of swapping one byte at a time, we could swap, say, eight bytes at a time while there are more than eight bytes left to swap, and swap the remaining bytes one by one.

We have shown a possible implementation of the problem of swapping two ajacent memory block, without using an auxiliary buffer.

- The GNU C library
^{2}implementation of memmove calls`memcpy`

if the memory blocks are not overlapping, so this would be a*minor*optimization.^{[return]} - https://www.gnu.org/software/libc/download.html
^{[return]}