25

I want to split a std::string by a regex.

I have found some solutions on Stack Overflow, but most of them split the string by a single space or use external libraries like Boost.

I can't use Boost.

I want to split the string by the regex "\\s+".

I am using g++ (Debian 4.4.5-8) 4.4.5 and I can't upgrade.

nothing-special-here
  • Right now I am using this function to split strings: http://stackoverflow.com/a/236803/418518. It works only with a __single char__. The regex format is correct; I have already used it in a Java project, where it works brilliantly. – nothing-special-here May 25 '13 at 11:39
  • The problem is that I don't know C++ much... and I just want to know how to split `std::string` using old c++ standard (`C++03` probably). If you have some links / code just paste it. :) Thanks! – nothing-special-here May 25 '13 at 11:40
  • Can you show example input and desired output? – melwil May 25 '13 at 11:41
  • Using [boost](http://www.boost.org/doc/libs/1_53_0/libs/regex/doc/html/boost_regex/ref/regex_token_iterator.html) may be an option. – Bernhard Barker May 25 '13 at 11:43
  • @melwil: Desired input / output: https://gist.github.com/maciejkowalski/af7e0ce2b92d967e050c – nothing-special-here May 25 '13 at 11:45
  • @Dukeling: Unfortunately, I can't use boost. ;/ – nothing-special-here May 25 '13 at 11:45
  • If that version of g++ is C++11 compliant, [this](http://stackoverflow.com/a/13125497/1711796) / [this](http://en.cppreference.com/w/cpp/regex) may be a starting point. Otherwise, splitting by a regex pattern *without* an external library will probably require writing a regex parser (which is no small task, or a small copy-paste task, assuming you can find code to do it). However, if you just want to split by multiple spaces, a simple iterative solution probably won't be too difficult, or you can simply split by a single space and ignore empty strings. – Bernhard Barker May 25 '13 at 11:54
  • C++03 does not come with a regex library. C++11 does but your compiler won't support C++11. You need to either use an existing third-party regex library, or write one of your own. – n. 'pronouns' m. May 25 '13 at 11:55

4 Answers

56
#include <iostream>
#include <regex>
#include <string>

// string_to_split is the std::string to tokenize
std::regex rgx("\\s+");
std::sregex_token_iterator iter(string_to_split.begin(),
    string_to_split.end(),
    rgx,
    -1);
std::sregex_token_iterator end;
for ( ; iter != end; ++iter)
    std::cout << *iter << '\n';

The -1 is the key here: it tells the iterator to yield the text between matches instead of the matches themselves. When the iterator is constructed it points at the text that precedes the first match, and after each increment it points at the text that follows the previous match.
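To make the role of that last argument concrete, here is a minimal sketch that runs the same iterator with a selectable submatch index. The helper name `collect_tokens` and the sample pattern are assumptions for illustration, not part of the answer; it needs a standard library with a working C++11 `<regex>`.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not from the answer): collects whatever the token
// iterator yields for the given submatch selector.
//   -1 -> the text between matches (the split pieces)
//    0 -> the matches themselves (the separators)
std::vector<std::string> collect_tokens(const std::string& text,
                                        const std::string& pattern,
                                        int submatch) {
    const std::regex rgx(pattern);
    std::vector<std::string> out;
    std::sregex_token_iterator it(text.begin(), text.end(), rgx, submatch);
    const std::sregex_token_iterator end;
    for (; it != end; ++it)
        out.push_back(it->str());
    return out;
}
```

For the input "a, b,c" and the pattern ",\\s*", passing -1 gives the three pieces a, b and c, while passing 0 gives the two separators ", " and ",".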

If you don't have C++11, the same thing should work with TR1 or (possibly with slight modification) with Boost.

Duloren
Pete Becker
  • @Narek - either that, or add explicit template arguments: `regex_token_iterator`. `sregex_token_iterator` is easier. Fixed. Thanks. – Pete Becker May 05 '15 at 11:23
  • the last example on [cplusplus.com reference doc](http://www.cplusplus.com/reference/regex/regex_token_iterator/regex_token_iterator/) is similar to this answer – solstice333 Sep 06 '16 at 23:07
13

To expand on the answer by @Pete Becker, here is an example of a resplit function that splits text using a regex:

  #include <regex>
  #include <string>
  #include <vector>

  namespace my {

  // Split s on every match of rgx_str and return the pieces in between.
  std::vector<std::string>
  resplit(const std::string & s, const std::string & rgx_str = "\\s+") {

      std::vector<std::string> elems;

      std::regex rgx (rgx_str);

      // -1 selects the text between matches instead of the matches themselves
      std::sregex_token_iterator iter(s.begin(), s.end(), rgx, -1);
      std::sregex_token_iterator end;

      while (iter != end)  {
          elems.push_back(*iter);
          ++iter;
      }

      return elems;

  }

  } // namespace my

This works as follows:

   string s1 = "first   second third    ";
   vector<string> v22 = my::resplit(s1);

   for (const auto & e: v22) {
       cout <<"Token:" << e << endl;
   }


   //Token:first
   //Token:second
   //Token:third


   string s222 = "first|second:third,forth";
   vector<string> v222 = my::resplit(s222, "[|:,]");

   for (const auto & e: v222) {
       cout <<"Token:" << e << endl;
   }


   //Token:first
   //Token:second
   //Token:third
   //Token:forth
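One case the examples above do not cover: when the input starts with a separator, or contains two adjacent separators, the token iterator yields an empty token for the gap. If that is unwanted, a variant along these lines filters them out. The name `resplit_nonempty` is an assumption made for this sketch, not something from the answer.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical variant of resplit that drops empty tokens, such as the one
// produced when the input begins with a separator.
std::vector<std::string> resplit_nonempty(const std::string& s,
                                          const std::string& rgx_str = "\\s+") {
    std::vector<std::string> elems;
    const std::regex rgx(rgx_str);
    std::sregex_token_iterator iter(s.begin(), s.end(), rgx, -1);
    const std::sregex_token_iterator end;
    for (; iter != end; ++iter) {
        if (iter->length() > 0)          // skip empty pieces
            elems.push_back(iter->str());
    }
    return elems;
}
```

With this variant, "  first second " splits into just first and second, and "a,,b" split on "," gives a and b.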
Duloren
Marcin
12

You don't need to use regular expressions if you just want to split a string by multiple spaces. Writing your own regex library is overkill for something that simple.

The answer you linked to in your comments, Split a string in C++?, can easily be changed so that it doesn't include any empty elements if there are multiple spaces.

#include <sstream>
#include <string>
#include <vector>

std::vector<std::string> &split(const std::string &s, char delim, std::vector<std::string> &elems) {
    std::stringstream ss(s);
    std::string item;
    while (std::getline(ss, item, delim)) {
        // skip the empty items produced by consecutive delimiters
        if (item.length() > 0) {
            elems.push_back(item);
        }
    }
    return elems;
}


std::vector<std::string> split(const std::string &s, char delim) {
    std::vector<std::string> elems;
    split(s, delim, elems);
    return elems;
}

By checking that item.length() > 0 before pushing item onto the elems vector, you no longer get extra empty elements when your input contains several delimiters in a row (spaces in your case).
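As a related sketch for the specific "\\s+" case, stream extraction with operator>> is a different technique from the getline-based split above: it skips any run of whitespace (spaces, tabs, newlines), so no empty items arise in the first place. It uses only C++03 facilities; the function name `split_whitespace` is an assumption for illustration.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical C++03 helper: splits on runs of whitespace using formatted
// extraction, which skips leading whitespace before reading each token.
std::vector<std::string> split_whitespace(const std::string& s) {
    std::istringstream iss(s);
    std::vector<std::string> elems;
    std::string item;
    while (iss >> item)
        elems.push_back(item);
    return elems;
}
```

Unlike split(s, ' '), this also treats tabs and newlines as separators, which is closer to what "\\s+" matches, but it cannot split on any other pattern.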

Community
shf301
  • Well, we figured out the same way in the same time. :) But you were actually faster (~10 min) in pasting answer on SO. +1 & accept. – nothing-special-here May 25 '13 at 12:28
  • You should also agree that using C++ to split a string looks like even larger overkill; in C# you just do `str.split(...)` ;) – Lu4 Jul 30 '15 at 09:27
1
string s = "foo bar  baz";
regex e("\\s+");
regex_token_iterator<string::iterator> i(s.begin(), s.end(), e, -1);
regex_token_iterator<string::iterator> end;
while (i != end)
   cout << " [" << *i++ << "]";

prints [foo] [bar] [baz]

solstice333