18

What's the easiest way to convert a C++ std::string to another std::string, which has all the unprintable characters escaped?

For example, for the string of two characters [0x61,0x01], the result string might be "a\x01" or "a%01".

Danra

5 Answers

11

Take a look at Boost's String Algorithm Library. You can use its is_print classifier (negated with its operator! overload) to pick out non-printable characters, and its find_format() functions can replace those with whatever formatting you wish.

#include <iostream>
#include <boost/format.hpp>
#include <boost/algorithm/string.hpp>

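// Formatter for find_format_all: replaces each matched range with \xNN escapes.
struct character_escaper
{
    template<typename FindResultT>
    std::string operator()(const FindResultT& Match) const
    {
        std::string s;
        for (typename FindResultT::const_iterator i = Match.begin();
             i != Match.end();
             ++i) {
            // Go through unsigned char so bytes >= 0x80 are not sign-extended
            // (which would print e.g. \xffffff80 where char is signed).
            s += str(boost::format("\\x%02x")
                     % static_cast<int>(static_cast<unsigned char>(*i)));
        }
        return s;
    }
};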

int main (int argc, char **argv)
{
    std::string s("a\x01");
    boost::find_format_all(s, boost::token_finder(!boost::is_print()), character_escaper());
    std::cout << s << std::endl;
    return 0;
}
Josh Kelley
9

This assumes the execution character set is a superset of ASCII and CHAR_BIT is 8. For OutIter, pass a back_inserter (e.g. into a vector<char> or another string), an ostream_iterator, or any other suitable output iterator.

template<class OutIter>
OutIter write_escaped(std::string const& s, OutIter out) {
  *out++ = '"';
  for (std::string::const_iterator i = s.begin(), end = s.end(); i != end; ++i) {
    unsigned char c = *i;
    if (' ' <= c and c <= '~' and c != '\\' and c != '"') {
      *out++ = c;
    }
    else {
      *out++ = '\\';
      switch(c) {
      case '"':  *out++ = '"';  break;
      case '\\': *out++ = '\\'; break;
      case '\t': *out++ = 't';  break;
      case '\r': *out++ = 'r';  break;
      case '\n': *out++ = 'n';  break;
      default:
        char const* const hexdig = "0123456789ABCDEF";
        *out++ = 'x';
        *out++ = hexdig[c >> 4];
        *out++ = hexdig[c & 0xF];
      }
    }
  }
  *out++ = '"';
  return out;
}
  • I thought && was a perfectly good operator. You can even use it without needing an extra header file. – Ben Voigt Mar 10 '10 at 15:10
  • You can use *and* without a header in standard C++ too. This was copied from another project and I forgot to change those to make up for MSVC's deficiencies. –  Mar 10 '10 at 15:21
  • Why did you write in such a way that it requires a `back_inserter` to be passed in? Isn't simply returning a string by value (which means moving it anyway) just fine? – notadam May 01 '16 at 14:11
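
On the last comment: one way to get both styles is a thin wrapper over the iterator version. The sketch below repeats write_escaped (with && in place of and) so that it compiles on its own; the wrapper name escaped is made up for illustration and is not part of the answer.

```cpp
#include <iterator>
#include <string>

// Same escaping rules as write_escaped in the answer, repeated here so the
// sketch is self-contained.
template <class OutIter>
OutIter write_escaped(std::string const& s, OutIter out) {
  static char const hexdig[] = "0123456789ABCDEF";
  *out++ = '"';
  for (std::string::const_iterator i = s.begin(); i != s.end(); ++i) {
    unsigned char c = static_cast<unsigned char>(*i);
    if (' ' <= c && c <= '~' && c != '\\' && c != '"') {
      *out++ = static_cast<char>(c);  // printable ASCII, copied as-is
    } else {
      *out++ = '\\';
      switch (c) {
        case '"':  *out++ = '"';  break;
        case '\\': *out++ = '\\'; break;
        case '\t': *out++ = 't';  break;
        case '\r': *out++ = 'r';  break;
        case '\n': *out++ = 'n';  break;
        default:  // everything else becomes \xNN
          *out++ = 'x';
          *out++ = hexdig[c >> 4];
          *out++ = hexdig[c & 0xF];
      }
    }
  }
  *out++ = '"';
  return out;
}

// Hypothetical convenience wrapper (illustrative name): returns the escaped
// text by value instead of writing through a caller-supplied iterator.
inline std::string escaped(std::string const& s) {
  std::string result;
  result.reserve(s.size() + 2);  // at least the two surrounding quotes
  write_escaped(s, std::back_inserter(result));
  return result;
}
```

With this, escaped("a\x01") yields the seven characters "a\x01" including the surrounding quotes. The iterator form is still handy when the output should go straight to a stream (via std::ostream_iterator<char>) without building a temporary string.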
7

Assuming that "easiest way" means short and easy to understand, without depending on any other resources (like libraries), I would go this way:

#include <cctype>
#include <sstream>
#include <string>

// Illustrative wrapper name (not from any library)
std::string escape_unprintable(const std::string& your_string)
{
    // s is our escaped output string
    std::string s;
    // loop through all characters
    for(char c : your_string)
    {
        // check if a given character is printable
        // the cast is necessary to avoid undefined behaviour
        if(std::isprint(static_cast<unsigned char>(c)))
            s += c;
        else
        {
            std::stringstream stream;
            // if the character is not printable
            // we'll convert it to a hex string using a stringstream
            // note that since char may be signed we have to cast it to unsigned char first
            stream << std::hex << static_cast<unsigned int>(static_cast<unsigned char>(c));
            std::string code = stream.str();
            s += std::string("\\x")+(code.size()<2?"0":"")+code;
            // alternatively for URL encodings:
            //s += std::string("%")+(code.size()<2?"0":"")+code;
        }
    }
    return s;
}
Scindix
  • I like this answer a lot, despite the fiddling with stringstream, std::hex, and multiple casts. Would something like `char hex[5] = ""; ssize_t len = snprintf(hex, 5, "\\x%02x", c); s += std::string(hex, len);` in the `else` block work, too, or is there some gotcha I'm not seeing? – joshtch Jun 21 '18 at 18:33
  • The OP didn't specifically ask for a pure C++ solution, but I thought it sounded like he preferred C++. And so I limited myself to that. But yes, as far as I'm concerned your code would work just the same. (And would be a bit shorter.) – Scindix Jun 24 '18 at 15:06
  • There is a bug in the solution. The isprint method expects an int >-1 and <= 255, you run into problems for characters with ASCII code >127, meaning a negative `char c` – Tom Mar 16 '19 at 23:24
  • Here is my version: https://gist.github.com/timmi-on-rails/173c496a9c5a33ad9df7c6428b9a077b – Tom Mar 17 '19 at 09:55
  • @Tom You are right. I correctly cast the character in the stringstream section, but not inside `isprint`. I missed that despite testing higher ASCII codes, because gcc's implementation always seems to return false unless it's a printable character. Nevertheless undefined behavior is evil, and so I corrected my original code. – Scindix Mar 21 '19 at 13:54
  • @Scindix In windows with MSVC2017 an assertion window pops up. – Tom Mar 22 '19 at 14:59
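
Regarding the snprintf variant from the first comment: as far as I can tell there is one gotcha. Passing a plain char means a negative value gets sign-extended, so on platforms where char is signed a byte like 0x80 would typically format as \xffffff80, get truncated by the 5-byte buffer, and leave len larger than the buffer. A minimal sketch that converts to unsigned char first (the function name escape_with_snprintf is made up for illustration):

```cpp
#include <cctype>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper name; escapes every non-printable byte as \xNN.
std::string escape_with_snprintf(const std::string& input) {
    std::string out;
    for (char c : input) {
        // Convert once to unsigned char: a negative char must not reach
        // std::isprint (undefined behaviour) or %02x (sign extension).
        const unsigned char uc = static_cast<unsigned char>(c);
        if (std::isprint(uc)) {
            out += c;
        } else {
            char hex[5];  // backslash, 'x', two hex digits, terminating NUL
            const int len = std::snprintf(hex, sizeof hex, "\\x%02x",
                                          static_cast<unsigned int>(uc));
            if (len > 0) {
                out.append(hex, static_cast<std::size_t>(len));
            }
        }
    }
    return out;
}
```

Because uc is at most 255, the formatted text is always exactly four characters, so the fixed-size buffer is sufficient.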
3

One person's unprintable character is another's multi-byte character. So you'll have to define the encoding before you can work out which bytes map to which characters, and which of those are unprintable.
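
To make that concrete: a byte-wise isprint check in the default "C" locale treats both bytes of a UTF-8 "é" (0xC3 0xA9) as unprintable and escapes them. If the input is known to be UTF-8, one possible policy is to escape only the ASCII control bytes and pass everything at or above 0x80 through untouched. The sketch below does just that; the name escape_ascii_controls is invented for illustration, it does not validate the UTF-8, and it will not catch non-ASCII control code points such as U+0085.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: escapes ASCII control bytes (0x00-0x1F and 0x7F) as
// \xNN and copies all other bytes unchanged, on the ASSUMPTION that the
// input is valid UTF-8 and multi-byte sequences should be preserved.
std::string escape_ascii_controls(const std::string& input) {
    std::string out;
    for (char c : input) {
        const unsigned char uc = static_cast<unsigned char>(c);
        if (uc < 0x20 || uc == 0x7F) {
            char hex[5];  // backslash, 'x', two hex digits, terminating NUL
            std::snprintf(hex, sizeof hex, "\\x%02x",
                          static_cast<unsigned int>(uc));
            out += hex;
        } else {
            out += c;  // printable ASCII or part of a multi-byte sequence
        }
    }
    return out;
}
```

Whether this is the right policy depends entirely on the encoding you have settled on; for Latin-1 or arbitrary binary data a different rule would be needed.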

Douglas Leeder
2

Have you seen the article about how to Generate Escaped String Output Using Spirit.Karma?

hkaiser