11

In an interview, I was asked to write an implementation of strcpy and then fix it so that it properly handles overlapping strings. My implementation is below and it is very naive. How do I fix it so that:

  1. It detects overlapping strings and
  2. after detecting, how do we deal with the overlap and proceed?

char* my_strcpy(char *a, char *b) {

     if (a == NULL || b == NULL) {
         return NULL;
     }
     if (a > b) {
         //we have an overlap?
         return NULL;
     }
     char *n = a;

     while (*b != '\0') {
         *a = *b;
         a++;
         b++;
     }
     *a = '\0';
     return n;
}

int main(int argc, char *argv[])
{
    char str1[] = "wazzupdude";
    char *after_cpy = my_strcpy(str1 + 2, str1);
    return 0;
}

EDIT:

So one possible implementation based on @Secure's answer is:

char* my_strcpy(char *a, char *b) {

    if (a == NULL || b == NULL) {
        return NULL;
    }

    memmove(a, b, strlen(b) + 1);
    return a;
}

If we don't rely on memmove, then

char* my_strcpy(char *a, char *b) {

    if (a == NULL || b == NULL) {
        return NULL;
    }

    if (a == b) {
        return a;
    }

    // case1: b is placed further in the memory
    if ( a <= b && a + strlen(a) > b ) {
        char *n = a;

        while(*b != '\0') {
            *a = *b;
            a++; b++;
        }
        *a = '\0';
        return n;
    }

    // case 2: a is further in memory
    else if ( b <= a && b + strlen(b) > a ) { 
        char *src = b + strlen(b) - 1; // src points to end of b
        char *dest = a;

        while(src != b) {
            *dest = *src;
            dest--; src--;  // not sure about this..
        }
        *a = '\0';
        return a;
    }
}
Student
  • 769
  • 1
  • 6
  • 11
user7
  • 2,159
  • 5
  • 24
  • 29
  • 3
    How is `a > b` supposed to "detect an overlap"? It merely tests the two addresses. – Blagovest Buyukliev Sep 15 '11 at 08:08
  • You could do two copies: first copy to a local buffer, with no chance of overlap, then from the local buffer to the destination. – pmg Sep 15 '11 at 08:16
  • @pmg: you could, but then `my_strcpy` would have to be allowed to fail ENOMEM. – Steve Jessop Sep 15 '11 at 09:05
  • @Steve: right -- "There ain't no such thing as a free lunch"; although doing two copies is very far from a *free lunch* in the first place :-) – pmg Sep 15 '11 at 09:26
  • Regarding your edit, as an interviewer my very next question would be: Why would you not rely on memmove, and instead trade a one-liner against an unmaintainable pointer handling mess? – Secure Sep 15 '11 at 12:20
  • actually...i would first rely on memmove and if the interviewer asks for the gory details or insists i cannot use memmove i would give a detailed implementation...btw..is the implementation above correct? – user7 Sep 15 '11 at 12:27
  • Correct? I regard to portability, as said, you're invoking undefined behaviour when the two pointers are from different arrays. There's an obvious error, you're using `strlen(a)`, but that's easy to solve. `*dest = '\0';` sets the first byte of `a`, not the last byte. But most importantly, you've missed the case that the strings do *not* overlap... – Secure Sep 15 '11 at 12:52
  • @Secure: The case where they do not overlap would be implemented in the same way as case1. However,I am not able to visualize the second case(a is further in memory)..hence the coding errors.I have made some edits though but I will appreciate it if you can help me visualize case 2 – user7 Sep 15 '11 at 13:23
  • I have edited my answer with a visualization of case 2. – Secure Sep 15 '11 at 14:18
  • Note: `if(a == NULL || b == NULL){ return NULL; }` is not needed if `a`, `b` point to C strings. – chux - Reinstate Monica May 04 '15 at 20:51
  • Shouldn't the "char *dest = a;" to the end of the equivalent destination and count down from there, if you are counting down from current position, aren't you buffer under-flowing? – Anton Krug Jul 03 '19 at 17:47

9 Answers9

11

There is no portable way to detect this. You have to do pointer comparisons, and these are only defined within the same object. I.e. if the two strings do not overlap and are in fact different objects, then the pointer comparisons give you undefined behaviour.

I would let the standard library handle this, by using memmove(a, b, strlen(b) + 1).

EDIT:

As Steve Jessop pointed out in the comments, there actually is a portable but slow way to detect overlap in this case. Compare each address within b with the first and last address of a for equality. The equality comparison with == is always well defined.

So you have something like this:

l = strlen(b);
isoverlap = 0;
for (i = 0; i <= l; i++)
{
    if ((b + i == a) || (b + i == a + l))        
    {
        isoverlap = 1;
        break;
    }
}

EDIT 2: Visualization of case 2

You have something like the following array and pointers:

S t r i n g 0 _ _ _ _ _ _ _
^       ^
|       |
b       a

Note that b + strlen(b) results in a pointer to the terminating \0. Start one behind, else you need extra handling of edge cases. It is valid to set the pointers there, you just can't dereference them.

src = b + strlen(b) + 1;
dst = a + strlen(b) + 1;

S t r i n g 0 _ _ _ _ _ _ _
^       ^     ^       ^  
|       |     |       |
b       a     src     dst

Now the copy loop which copies the \0, too.

while (src > b)
{
    src--; dst--;
    *dst = *src;
}

The first step gives this:

src--; dst--;

S t r i n g 0 _ _ _ _ _ _ _
^       ^   ^       ^  
|       |   |       |
b       a   src     dst

*dst = *src;

S t r i n g 0 _ _ _ 0 _ _ _
^       ^   ^       ^  
|       |   |       |
b       a   src     dst

And so on, until src ends up equal to b:

S t r i S t r i n g 0 _ _ _
^       ^              
|       |            
b       a          
src     dst

If you want it a bit more hackish, you could compress it further, but I don't recommend this:

while (src > b)
    *(--dst) = *(--src);
Secure
  • 4,030
  • 1
  • 14
  • 15
  • 1
    It's not true that overlap can't be detected portably. There is a shockingly inefficent way. This is for memmove, but I believe it can be adapted for strcpy: http://stackoverflow.com/questions/4023320/how-to-implement-memmove-in-standard-c-without-an-intermediate-copy/4023563#4023563 – Steve Jessop Sep 15 '11 at 09:01
  • "You have to do pointer comparisons, and these are only defined within the same object." -How do I check if the two pointers point to the same object (an array in this case) – user7 Sep 15 '11 at 09:47
  • @Steve Jessop: Correct, but in this case it can't be detected portably, because you only have two pointers, but not the array boundaries (start address and size) they're pointing into. – Secure Sep 15 '11 at 09:57
  • @user639309: It must be given as a known precondition. Else you can't check for it without invoking undefined behaviour if they're not, except when the array boundaries are known as Steve pointed out. Then you can check each possible address inside one of the arrays to be equal to the other pointer. OTOH, if you know the boundaries for *both* arrays, you can simply check them for equality. An equality check of pointers with == is always well defined, the same-object rule applies only for less than or greater than checks. – Secure Sep 15 '11 at 10:02
  • @Secure: But as you've said yourself, `strlen(b)+1` gives you that size. It only goes wrong if the caller has done something invalid first, e.g. if `a` doesn't point to a big enough buffer, but that's not our fault. – Steve Jessop Sep 15 '11 at 10:03
  • @Steve Jessop: But the string can be in the middle of a much larger array. If `a` points to a position before `b` within this array, you can't check for this. – Secure Sep 15 '11 at 10:05
  • @Secure: yes you can. My code on that `memmove` question does. If `a` is before `b`, and the regions we care about for `strcpy` overlap, then for some `l < strlen(b)+1` you hit `a + l == b`. – Steve Jessop Sep 15 '11 at 10:06
  • @Steve: Forget my last comment, I understand what you mean. You're right when checking `a` and `a + strlen(b)` for equality within `b`. This will work fine, but slow. ;) – Secure Sep 15 '11 at 10:09
  • The strings overlap if `a + strlen(a) == b + strlen(b)`. – Carl Norum Aug 21 '13 at 14:56
  • "pointer comparisons are only defined within the same object" - so what? if string pointers do not belong to the same object, they can't overlap anyway, so the undefined behavior won't do any harm. which means, simple comparison is in fact portable! – szulat Nov 24 '14 at 23:21
  • @szulat: Comparisons of unrelated pointers do not merely yield an unspecified result. Some compilers may infer that because the Standard imposes no requirements on what should happen if unrelated pointers are compared, the Standard thus imposes no requirements on what a program should do if given inputs that would result in such pointers being compared. Such compilers could then further infer that if a condition could only be true when given inputs which would invoke Undefined Behavior, a compiler may assume the condition is always false. Consequently, there is no safe Undefined Behavior. – supercat Jul 09 '15 at 23:40
  • 1)src = b + strlen(b) + 1; dst = a + strlen(b) + 1; should be src = b + strlen(b);dst = a + strlen(b) ;2)it's wrong while b>a and a+strlen(b) > b. – John Zhang Dec 26 '16 at 08:35
  • @CarlNorum I know it's a late comment (almost 5 years to the day!) but (as I understand it) you can't safely use `strlen(a)` because you don't know how the memory at `a` is laid out. – TripeHound Aug 16 '18 at 11:47
  • @TripeHound - if you’re not passing strings, it’s game over, yes. That’s not really part of the problem, though. My point was just this: if the strings overlap, they necessarily share the same null terminator. So you have to make only one comparison, not many. – Carl Norum Aug 16 '18 at 14:13
  • @CarlNorum Technically, the _memory_ could overlap but the _strings_ don't. `a` (the destination) could be something like `abc\0def\0ghijklmnop\0` with `b` pointing at the `ghijklmnop` (you might get something like this from `strtok()`). The _strings_ at `a` (`abc`) and at `b` (`ghijklmnop`) don't overlap, but the memory that you would want `strcpy( a, b )` to copy (if this were allowed) _would_ overlap. – TripeHound Aug 16 '18 at 14:29
  • @Carl Not necessarily... what led me here was a recent compile under gcc (of some very old code) producing the "wrong" result for a "shift-left" copy. (Technically, the result can't be "wrong", because as we were [mis]using `strcpy`, the result is undefined, but wrong in the sense of "not what we expected and happened to work for years"). – TripeHound Aug 16 '18 at 14:36
4

You could probably use memmove() if you expect the strings to be overlapping.

char* my_strcpy(char *a, char *b)
{
    memmove(a, b, strlen(b) + 1);
    return a;
}
brain
  • 2,279
  • 10
  • 11
  • That's taking into account that one char == one byte. I'd change strlen(b) + 1 by ( strlen(b) + 1 ) * sizeof( char ) – Baltasarq Sep 15 '11 at 10:30
  • sizeof(char) is always exactly one byte. – brain Sep 15 '11 at 10:37
  • Yep, but memmove expects bytes, not chars, even if they incidentally have the same size. Anyway, I just said "I would". – Baltasarq Sep 15 '11 at 11:07
  • 2
    @Baltasarq "memmove expects bytes, not chars" is misleading. `memmove()` expects a size in characters and in C, "bytes" and character have the same size. "The memmove function copies n characters from the object pointed to by s2 into the object pointed to by s1. C11 "7.24.2.2" – chux - Reinstate Monica May 04 '15 at 20:21
4

Note: Here, b is the address of the source string and a is the address of the destination.

With a > b you wouldn't necessarily have an overlap. If

(a <= b && a+strlen(a) >= b) || (b <= a && b+strlen(b) >= a)

then you have an overlap.

However, besides detecting overlap for the sake of interview, a > b should do fine for strcpy. The idea is this:

If b is placed further in the memory (b > a), then you can normally copy b into a. Parts of b will be overwritten, but you are already past that part.

If a is placed further in the memory (a > b), it means that possibly by writing on the first location of a, you have already overwritten a location in b with a higher index. In such a case, you should copy in the opposite direction. So instead of copy from index 0 to strlen(b)-1, you should copy from strlen(b)-1 to 0.

If you are confused how that helps, draw two overlapping arrays on paper and try to copy once from the beginning of the array and once from the end. Try this with the overlapping arrays both in cases a > b and a < b.

Note, if a == b, you don't need to actually copy anything and you can just return.

Edit: I am not sure, but reading the other solutions, it seems like this answer may not be completely portable. Beware of that.

Shahbaz
  • 42,684
  • 17
  • 103
  • 166
  • if `a==b`, you could even just return :-) `strcpy` takes pointers-to-non-volatile, so there's no requirement to actually touch the memory. That said, it's not worth adding code to optimize that absurd case. – Steve Jessop Sep 15 '11 at 09:10
  • @chux, did you take the terminating NUL into account? – Shahbaz May 05 '15 at 07:26
3
if a > b; then
    copy a from the beginning
else if a < b; then
    copy a from the ending
else // a == b
    do nothing

You can refer to an implementation of memmove, it's quite like what I said.

Mu Qiao
  • 6,363
  • 28
  • 34
1

Even without using relational pointer comparisons, memmove, or equivalent, it is possible to code a version of strcpy which will be performed as an strlen and memcpy in the non-overlapping case, and as a top-down copy in the overlapping case. The key is to exploit the fact that if the first byte of the destination is read and then replaced with zero, calling strlen on the source and adding to the source pointer the value that was returned will yield a legitimate pointer which will equal the start of the destination in the "troublesome overlap" case. If the source and destination are different objects, the "source plus strlen" pointer may be safely computed and observed to be unequal to the destination.

In the event that adding the string length to the source pointer yields the destination pointer, replacing the zero byte with the earlier-read value and calling strlen on the destination will allow code to determine the ending address of the source and destination strings. Further, the length of the source string will indicate the distance between the pointers. If this value is large (probably greater than 16 or so), code may efficiently subdivide the "move" operation into a top-down sequence of memcpy operations. Otherwise the string may be copied with a top-down loop of single-byte copy operations, or with a sequence of "memcpy to source to buffer"/"memcpy buffer to destination" operations [if the per-byte cost of a large memcpy is less than half that of an individual-character-copy loop, using a ~256-byte buffer may be a useful optimization].

supercat
  • 69,493
  • 7
  • 143
  • 184
1
if (a>= b && a <= b+strlen(b))) || (b+strlen(b) >= a && b+strlen(b) <= a + strlen(b))

(*) you should cache strlen(b) to improve performance

What it does:
checks if the a+len [address of a + extra len bytes] is inside the string, or a [address of a] is inside the string, these are the only possibilities for a string overlapping.

amit
  • 166,614
  • 24
  • 210
  • 314
1

I was asked this in a recent interview. We don't have to 'detect' overlap. We can write strcpy in such a way that overlapping addresses are taken care of. The key is to copy from the end of source string instead of from the start.

Here is a quick code.

void str_copy(const char *src, char *dst) 
{
    /* error checks */

    int i = strlen(a); /* may have to account for null character */

    while(i >= 0) 
    {
        dst[i] = src[i];  
        i--; 
    }
}

EDIT: This only works when a < b. For a > b, copy from the start.

Bhaskar
  • 2,191
  • 1
  • 18
  • 22
  • 2
    The problem persists in case of overlapping strings. Like `memcpy`, you should copy from beginning or the end, based on if the destination to copy has lower or higher address than the source respectively. – Shahbaz Sep 15 '11 at 08:25
  • 1) Code does not compile. 2) Suggest re-working answer/code to use `src dest` throughout rather than `a b`. 3) `strlen()` returns type `size_t`, but then `size_t i` causes trouble with `while(i>=0)` test which is always true. – chux - Reinstate Monica May 04 '15 at 20:48
1

If these two strings overlap, then, while copying you'll run over the original a or b pointers.

Assuming that strcpy( a, b ) roughly means a <- b, i.e., the first parameter is the destination of the copy, then you only check whether the copy pointer reaches b's position.

You only need to save the b original position, and while copying, check you haven't reached it. Also, don't write the trailing zero if you have reached that position.

 char* my_strcpy(char *a, const char *b)
 {

    if ( a == NULL
      || b == NULL )
    {
        return NULL;
    }

    char *n = a;
    const char * oldB = b;

    while( *b != '\0'
       &&  a != oldB )
    {
        *a = *b;
        a++;
        b++;
    }

    if ( a != oldB ) {
        *a = '\0';
    }

    return n;
 }

This algorithm just stops copying. Maybe you want to do something else, such as marking the error condition, or add an end-of-the string mark to the previous position (though failing silently (as the algorithm does at the moment) isn't the best option).

Hope this helps.

Baltasarq
  • 11,266
  • 2
  • 34
  • 53
0

This SO entry is already older, but I am currently working on a old piece of code that copies overlapping strings with strcpy(). Characters were missing in the log output. I decided to use the following compact solution, which copies char by char.

static char *overlapped_strcpy(char *dest, const char *src)
{
  char *dst = dest;

  if (dest == NULL || src == NULL || dest == src)
    return dest;

  do {
    *dst++ = *src;
  } while (*src++);

  return dest;
}

EDIT:

As @Gerhardh pointed out, the code above only works if dest <= src (I just had to solve this case). For the case dest > src it is more complicated. However, copying from behind, as other answers have already mentioned, leads to success. For example:

if (dest <= src) {
  /* do the above */
} else {
  int i = (int)strlen(src);
  while (i >= 0) {
    dst[i] = src[i];
    i--;
  }
}
Andreas
  • 1
  • 2