5

When writing a C++ application, I normally limit myself to C++-specific language features. Mostly this means using the STL instead of the CRT wherever possible.

To me, STL is just so much more fluid and maintainable than using CRT. Consider the following:

std::string str( "Hello" );
if( str == "Hello" ) { ... }

The C-Runtime equivalent would be:

char const* str = "Hello";
if( strcmp( str, "Hello" ) == 0 ) { ... }

Personally, I find the former example much easier to read; it's clearer to me what's going on. When I write a first pass of my code, the first thing on my mind is always to write it in the most natural way.

One concern my team has with the former example is the dynamic allocation. If the string is static OR has already been allocated elsewhere, they argue it doesn't make sense to potentially cause fragmentation or have a wasteful allocation here. My argument against this is to write code in the most natural way first, and then go back and change it after getting proof that the code causes a problem.

Another reason I don't like the latter example is that it uses the C library. Typically I avoid it at all costs, simply because it's not C++, it's less readable, it's more error-prone, and it's more of a security risk.

So my question is: am I right to avoid the C Runtime? Should I really care about the extra allocation at this stage of coding? It's hard for me to tell if I'm right or wrong in this scenario.

jacknad
void.pointer
  • 1
    Technically speaking, the C library **is** C++. – K-ballo May 23 '12 at 17:10
  • You already give some pretty good and valid arguments yourself. – Christian.K May 23 '12 at 17:11
  • 1
    If you want an immutable compile-time string, use `const char(&)[N]`; in all other cases use `std::string`. – Mooing Duck May 23 '12 at 17:13
  • 7
    To those of you voting to close, I'd say this is a reasonably good subjective question: http://blog.stackoverflow.com/2010/09/good-subjective-bad-subjective/ – derekerdmann May 23 '12 at 17:13
  • 2
    To avoid dynamic allocation, use a class like [llvm::StringRef](http://llvm.org/docs/doxygen/html/classllvm_1_1StringRef.html), which has many of the conveniences of a proper string class, such as ease of comparison, but doesn't do any allocation. You just need to be careful that the referred string stays alive for the lifetime of the StringRef. – Benjamin Lindley May 23 '12 at 17:15
  • @Christian.K Problem is, my team isn't buying it. They're solely focused on the memory usage above all else. I just want to make sure I'm not being a C++ zealot – void.pointer May 23 '12 at 17:17
  • 5
    "am I right to avoid it the C Runtime?" That's an artificial restriction. If C runtime fits into current coding style, use it. If it doesn't fit, don't use it. **always** using it or **never** using it is a bad idea - your job is to select the most suitable tool for the task. – SigTerm May 23 '12 at 17:18
  • 1
    "my team isn't buying it" Your example might cause a bottleneck due to excessive memory allocation, operator overloads, etc - depending on compiler. Their example might create vulnerability. So both patterns can backfire and neither is perfect. However, code readability is more important than performance, and all performance-specific questions should be only solved using profiling. So they should provide profiling data to back up their position. It also depends on your platform. If there are strict memory restrictions, they might be right, however there are also allocators for this scenario. – SigTerm May 23 '12 at 17:23
  • @SigTerm: The relative value of readability and performance is not fixed. For example, if you have a piece of code whose behavior is well-defined, whose input space is highly finite, and for which you've tested all inputs, then the code *never needs to be touched or read again* except possibly to make it faster. Thus, it can be as ugly and unreadable as you like as long as it's fast. On the other hand, for code which will require lots of extension and integration with other components as the software evolves, readability is probably more valuable. – R.. GitHub STOP HELPING ICE May 23 '12 at 17:56
  • @R..: "For example, if you have a piece of" That's exactly what Murphy's law and "Don't fix it if it ain't broke" are for. – SigTerm May 23 '12 at 19:08
  • Are you interfacing your C++ code with some C code? Otherwise, why are you manipulating your variable strings using pointers to char in the first place? Regarding string literals, I usually reference them using a `const char * const`, which can be compared with an `std::string`. And I rarely need to compare two constant strings, so I neither have to convert a pointer to char to an `std::string` nor to use `strcmp`. – Luc Touraille May 24 '12 at 15:26
  • If you really need to compare two variable strings in the form of pointers to char, I would go with `strcmp`. If you need to compare a constant string with a variable string in the form of a pointer to char, I would store the constant string in a static constant `std::string`: `static const std::string hello("Hello");`. The constant string will be constructed only once in the application lifetime, which should not incur a boundless overhead. – Luc Touraille May 24 '12 at 15:31

5 Answers

5

I feel like my comment about llvm::StringRef went ignored, so I'll make an answer out of it.

llvm::StringRef str("Hello");

This essentially sets a pointer, calls strlen, then sets another pointer. No allocation.

if (str == "Hello") { do_something(); }

Readable, and still no allocation. It also works with std::string.

std::string str("Hello");
llvm::StringRef stref(str);

You have to be careful with that though, because if the string is destroyed or re-allocated, the StringRef becomes invalid.

if (str == stref) { do_something(); }

I have noticed quite substantial performance benefits when using this class in appropriate places. It's a powerful tool; you just need to be careful with it. I find that it is most useful with string literals, since they are guaranteed to last for the lifetime of the program. Another cool feature is that you can get substrings without creating a new string.

As an aside, there is a proposal to add a class similar to this to the standard library.
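For readers on a newer toolchain: C++17 later added `std::string_view`, a non-owning view in the same spirit. The following is only a minimal sketch assuming a C++17 compiler; `is_hello` and `first_word` are hypothetical names made up for illustration, and the same lifetime caveat applies as with StringRef.

```cpp
#include <string>
#include <string_view>

// Compares without constructing a std::string; accepts literals,
// const char*, and std::string alike (all convert to string_view).
bool is_hello(std::string_view text) {
    return text == "Hello";
}

// Returns the part of `text` before the first space (or all of it if
// there is no space). The result points into the caller's buffer, so it
// is only valid while that buffer is alive and unmodified.
std::string_view first_word(std::string_view text) {
    return text.substr(0, text.find(' '));
}
```

Calling `is_hello("Hello")` or `is_hello(some_std_string)` reads the same as the `std::string` version, and `first_word` illustrates taking a substring without creating a new string.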

Benjamin Lindley
  • The proposal is http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3334.html if anyone else is interested. I wasn't aware that was happening; it's good news. std::string has always been a bit of an eyesore. – mrec Jun 17 '13 at 15:09
4

Are you doing C++ or C? Those are completely different languages with completely different ways of thinking.

If C++:

std::string str( "Hello" );
if( str == "Hello" ) { ... }

If C:

char const* str = "Hello";
if( strcmp( str, "Hello" ) == 0 ) { ... }

Don't mix both.

user703016
  • Would your opinion be the same if this was done in an application with a long lifetime (server) that already has memory/fragmentation problems? – void.pointer May 23 '12 at 17:14
  • @RobertDailey Certainly. Memory fragmentation problems do not come from the style of your code. – user703016 May 23 '12 at 17:15
  • Why not mix this? Obviously `std::string("...")` produces far more code, and runs much slower than the "C-variant". So, in C++ when you have a compile-time constant string, why not use the "C-style" code? – valdo May 23 '12 at 17:16
  • 2
    @valdo Mixing styles is confusing. Pick one and stick to it. Your "obviously" is actually *very* far from being obvious. In C++ for a compile-time constant string, use `const char(&)[N]`. – user703016 May 23 '12 at 17:18
  • 1
    I agree with valdo. It **obviously** creates more code as you will call a constructor and a destructor for the string object — let alone you will allocate dynamic memory — when the C-variant will be a hardcoded string. – qdii May 23 '12 at 17:22
  • @valdo: with GHz mobile devices going quad core, many GB storage, it is more important that the source code is higher quality. – jacknad May 23 '12 at 17:22
  • 1
    Using C++ and objects does not automatically produce "better quality" code. I have seen numerous absurd object-oriented architectures that greatly decreased performance. The same can happen in C. Whether it is object-oriented is irrelevant here. – qdii May 23 '12 at 17:24
  • I very much agree with @qdii. And, honestly speaking, I'm simply very tired of discussions with "political" arguments, such as "redundant extra code *may* be almost that quick on a theoretically-existing platform" or "the difference in performance is unimportant". – valdo May 23 '12 at 17:43
  • 1
    @qdii I'm not saying that C++ and objects create "better quality" code (what is that, by the way?). I'm saying that since OP is using C++, he should stick to the C++-way-of-life. Else, what's the point of using C++? You don't cut meat with a fork, do you? – user703016 May 23 '12 at 17:49
  • 2
    Trying to stick to C++ only because "Mixing styles is confusing" seems irrelevant to me. One should code in the way that best fits the purpose: if the goal is to compare two hardcoded strings, creating two string objects is overkill. Coding in C in that case is relevant, just like using asm snippets in performance-critical sections can be appropriate. – qdii May 23 '12 at 18:11
  • @qdii You said it yourself: *"Coding in C" is relevant*. OP is doing C++, so he should use the C++ way for hard-coded strings. The one I posted in an above comment. Not the C version. – user703016 May 23 '12 at 18:15
  • 2
    @Cicada: We don't code in C++ for the sake of coding in C++. At least, I don't know anybody who does. The main reason I choose to code in C++ is because it allows high-level abstractions while *still*, most of the time, producing extremely efficient code. If writing something *"the C++ way"* (whatever that means) produces less efficient code than some other method (and I'm not saying that it does), then it's a perfectly valid decision to use that other method. – Benjamin Lindley May 23 '12 at 18:29
4

Using a compiler that implements the Small String Optimization, I get this result:

main    PROC                        ; COMDAT

; 6    : {

$LN124:
  00000 48 83 ec 48       sub    rsp, 72            ; 00000048H

; 7    :    std::string str( "Hello" );

  00004 8b 05 00 00 00
        00                mov    eax, DWORD PTR ??_C@_05COLMCDPH@Hello?$AA@

; 8    : 
; 9    :    if( str == "Hello" )

  0000a 48 8d 15 00 00
        00 00            lea     rdx, OFFSET FLAT:??_C@_05COLMCDPH@Hello?$AA@
  00011 48 8d 4c 24 20   lea     rcx, QWORD PTR str$[rsp]
  00016 89 44 24 20      mov     DWORD PTR str$[rsp], eax
  0001a 0f b6 05 04 00
        00 00            movzx   eax, BYTE PTR ??_C@_05COLMCDPH@Hello?$AA@+4
  00021 41 b8 05 00 00
        00               mov     r8d, 5
  00027 c6 44 24 37 00   mov     BYTE PTR str$[rsp+23], 0
  0002c 48 c7 44 24 38
        05 00 00 00      mov     QWORD PTR str$[rsp+24], 5
  00035 c6 44 24 25 00   mov     BYTE PTR str$[rsp+5], 0
  0003a 88 44 24 24      mov     BYTE PTR str$[rsp+4], al
  0003e e8 00 00 00 00   call    memcmp
  00043 85 c0            test    eax, eax
  00045 75 1d            jne     SHORT $LN123@main

; 10   :    { printf("Yes!\n"); }

  00047 48 8d 0d 00 00
        00 00            lea     rcx, OFFSET FLAT:??_C@_05IOIEDEHB@Yes?$CB?6?$AA@
  0004e e8 00 00 00 00   call    printf

; 11   : 
; 12   : }

Not a single memory allocation in sight!
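One way to check this on your own toolchain, without reading assembly, is to count calls to the global `operator new`. This is only a sketch under stated assumptions, not how the listing above was produced: `g_allocations` and `allocations_for` are hypothetical names, and the numbers depend on your standard library's small-string buffer size.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <string>

// Hypothetical counter: incremented on every call to the global operator new.
static std::size_t g_allocations = 0;

// Replacement global allocation function that counts, then defers to malloc.
void* operator new(std::size_t size) {
    ++g_allocations;
    if (void* p = std::malloc(size ? size : 1)) return p;
    throw std::bad_alloc();
}

// Matching deallocation functions (unsized and sized forms).
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Returns how many heap allocations happen while constructing a
// std::string from `text` and comparing it against `text`.
std::size_t allocations_for(const char* text) {
    const std::size_t before = g_allocations;
    {
        std::string str(text);
        volatile bool equal = (str == text);  // keep the comparison observable
        (void)equal;
    }
    return g_allocations - before;
}
```

On an implementation with the Small String Optimization, `allocations_for("Hello")` would be expected to report no allocations, while a string longer than the internal buffer should report at least one.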

Bo Persson
  • Since this is an optimization (and only for small strings, as its name says) it should not be a factor in your decision. – K-ballo May 23 '12 at 18:25
  • At least it means that the example in the question is not a real problem, so one problem less to worry about. – Bo Persson May 23 '12 at 18:32
  • This is because in some versions of STL implementation `std::string` is implemented as a hybrid of a fixed-size buffer (usually 32 characters long), **and** a pointer to the (optionally) allocated storage. Try running your example with a longer string. – valdo May 24 '12 at 09:01
  • @Valdo - The example shows that it is not a problem with short strings, true. However, how many programs spend a significant part of their time comparing lots of very long strings? Is this ever a bottleneck that using `strcmp` will solve? Hardly! – Bo Persson May 24 '12 at 11:08
0

Under the hood, std::string::operator== does essentially the same work as strcmp: it compares the characters, typically through char_traits::compare. Honestly, if fragmentation isn't an issue for you and you like the STL's more readable syntax, go for it and use the STL. If performance is an issue, and profiling shows that constant allocation/deallocation of std::string's internal data is a hotspot or bottleneck, optimize there. If you don't like the inconsistent coding style of mixing operator==() and strcmp, write something like this:

#include <cstring>  // std::strcmp
#include <string>   // std::string

// Uniform equality helper for any mix of C strings and std::string.
inline bool str_eq(const char* const lhs, const char* const rhs)
{
    return std::strcmp(lhs, rhs) == 0;
}
inline bool str_eq(const std::string& lhs, const char* const rhs)
{
    return str_eq(lhs.c_str(), rhs);
}
inline bool str_eq(const char* const lhs, const std::string& rhs)
{
    return str_eq(lhs, rhs.c_str());
}
inline bool str_eq(const std::string& lhs, const std::string& rhs)
{
    return lhs == rhs;
}

This shouldn't really be a religious conversation. Both work the same. Now if you see somebody writing

std::string str( "Hello" );
if( strcmp(str.c_str(), "Hello") == 0 ) { ... }

or

std::string str( "Hello" );
if( str.compare( "Hello" ) == 0) { ... }

then you can have a debate on mixing styles, because both of those obviously would have been clearer using operator==.
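One caveat to "both work the same": the three spellings agree for ordinary text, but `strcmp` stops at the first NUL character, whereas `operator==` and `compare` use the string's full length. The sketch below illustrates that; the helper names are hypothetical and exist only for this example.

```cpp
#include <cstring>
#include <string>

// Hypothetical helpers, one per spelling discussed above.
bool equal_by_operator(const std::string& lhs, const char* rhs) {
    return lhs == rhs;
}
bool equal_by_compare(const std::string& lhs, const char* rhs) {
    return lhs.compare(rhs) == 0;
}
bool equal_by_strcmp(const std::string& lhs, const char* rhs) {
    return std::strcmp(lhs.c_str(), rhs) == 0;
}

// Builds the 3-character string 'a', '\0', 'b' (an embedded NUL in the middle).
std::string with_embedded_nul() {
    return std::string("a\0b", 3);
}
```

For `"Hello"` all three helpers agree. For `with_embedded_nul()` compared against `"a"`, the `strcmp` version reports equality because it only sees the characters before the NUL, while the other two see a length mismatch. If your strings can carry binary data, that difference matters more than the style question.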

cppguy
0

If your team is coding in C++, you should use all the features it offers. Properly used, C++ takes care of memory management (through constructors and destructors) and gives you more natural syntax (for ==, +).

You may think OOP style is slower, but you must first measure whether string operations are actually the bottleneck. In most scenarios that is unlikely. Premature optimization is the root of all evil. Properly designed C++ classes will not lose to hand-written C code.

Coming back to your question: the worst variant is mixing libraries, for example replacing C strings with an OOP library while still using old-school I/O routines and maths.

demi