Parse (split) a string in C++ using string delimiter (standard C++)

Question

I am parsing a string in C++ using the following:

using namespace std;

string parsed,input="text to be parsed";
stringstream input_stringstream(input);

if (getline(input_stringstream,parsed,' '))
{
     // do some processing.
}

Parsing with a single char delimiter is fine. But what if I want to use a string as delimiter?

Example: I want to split:

scott>=tiger

with >= as delimiter so that I can get scott and tiger.

https://stackoverflow.blog/2019/10/11/c-creator-bjarne-stroustrup-answers-our-top-five-c-questions scroll down to #5. — Wais Kamal, Mar 06 '21 at 07:40

score 737 · Accepted Answer · edited Sep 16 '13 at 08:18

You can use the std::string::find() function to find the position of your string delimiter, then use std::string::substr() to get a token.

Example:

std::string s = "scott>=tiger";
std::string delimiter = ">=";
std::string token = s.substr(0, s.find(delimiter)); // token is "scott"

The find(const string& str, size_t pos = 0) function returns the position of the first occurrence of str in the string, or npos if the string is not found.
The substr(size_t pos = 0, size_t n = npos) function returns a substring of the object, starting at position pos and of length n (or up to the end of the string if n is npos or exceeds the remaining length).
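A minimal sketch of how these two calls behave at the edges, assuming nothing beyond the standard library. The helper names `first_token` and `rest_after` are made up for illustration and are not part of the answer.

```cpp
#include <string>

// Hypothetical helper: text before the first occurrence of delim.
// If delim is absent, find() returns npos and substr(0, npos) yields all of s.
std::string first_token(const std::string& s, const std::string& delim) {
    return s.substr(0, s.find(delim));
}

// Hypothetical helper: text after the first occurrence of delim,
// or an empty string if delim does not occur.
std::string rest_after(const std::string& s, const std::string& delim) {
    const std::string::size_type pos = s.find(delim);
    if (pos == std::string::npos) {
        return std::string();
    }
    // The length argument defaults to npos, meaning "up to the end".
    return s.substr(pos + delim.length());
}
```

With these, `first_token("scott>=tiger", ">=")` gives "scott" and `rest_after("scott>=tiger", ">=")` gives "tiger". The explicit npos check in `rest_after` matters: adding the delimiter length to npos would wrap around instead of signalling "not found".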

If you have multiple delimiters, after you have extracted one token, you can remove it (delimiter included) to proceed with subsequent extractions (if you want to preserve the original string, just use s = s.substr(pos + delimiter.length());):

s.erase(0, s.find(delimiter) + delimiter.length());

This way you can easily loop to get each token.

Complete Example

std::string s = "scott>=tiger>=mushroom";
std::string delimiter = ">=";

size_t pos = 0;
std::string token;
while ((pos = s.find(delimiter)) != std::string::npos) {
    token = s.substr(0, pos);
    std::cout << token << std::endl;
    s.erase(0, pos + delimiter.length());
}
std::cout << s << std::endl;

Output:

scott
tiger
mushroom

answered Jan 10 '13 at 19:53

Vincenzo Pii

For those who don't want to modify the input string, do `size_t last = 0; size_t next = 0; while ((next = s.find(delimiter, last)) != string::npos) { cout << s.substr(last, next-last) << endl; last = next + 1; } cout << s.substr(last) << endl;` – hayk.mart Jan 31 '15 at 05:07
NOTE: `mushroom` outputs outside of the loop, i.e. `s = mushroom` – Don Larynx Jan 31 '15 at 22:22
Those samples do not extract the last token from the string. A sample of mine extracting an IPv4 address from one string: size_t last = 0; size_t next = 0; int index = 0; while (index<4) { next = str.find(delimiter, last); auto number = str.substr(last, next - last); IPv4[index++] = atoi(number.c_str()); last = next + 1; } – rfog Aug 17 '15 at 08:00
@hayk.mart Just a note, that would be the following, you need add 2 not 1 due to the size of the delimiter which is 2 characters :) : std::string s = "scott>=tiger>=mushroom"; std::string delimiter = ">="; size_t last = 0; size_t next = 0; while ((next = s.find(delimiter, last)) != std::string::npos) { std::cout << s.substr(last, next-last) << std::endl; last = next + 2; } std::cout << s.substr(last) << std::endl; – ervinbosenbacher Oct 30 '15 at 12:52
In order to get "tiger", use `std::string token = s.substr(s.find(delimiter) + 1);`, if you are sure that it exists (I use +1 in the length)... – gsamaras Apr 02 '19 at 16:45
Hi, I am using the code posted by @Vincenzo Pii, and it works fine; the only problem I have is that I can't get the last word of my sentence. Has anyone resolved this problem? – Jonathan Prieto May 10 '20 at 14:37
This answer is wrong, it fails to handle the last token – Alen Wesker Jul 17 '20 at 03:44
Wondering how many of the 615 upvoters missed the last line and are running hidden bugs in their production code. Judging from the comments, I'd wager at least a handful. IMO this answer would be much better suited if it didn't use `cout` and instead showed it as a function. – Qix - MONICA WAS MISTREATED Sep 09 '20 at 08:47

score 94 · Answer 2 · answered Jan 10 '13 at 21:20

This method uses std::string::find without mutating the original string by remembering the beginning and end of the previous substring token.

#include <iostream>
#include <string>

int main()
{
    std::string s = "scott>=tiger";
    std::string delim = ">=";

    std::size_t start = 0; // same type as the value returned by find(), so no narrowing
    auto end = s.find(delim);
    while (end != std::string::npos)
    {
        std::cout << s.substr(start, end - start) << std::endl;
        start = end + delim.length();
        end = s.find(delim, start);
    }

    std::cout << s.substr(start, end);
}

How do I perform this operation on vector where both strings in the vector are of same form and have same delimiters. I just want to output both strings parsed out in the same way as this works for one string. My "string delim" will remain same ofcourse — Areeb Muzaffar, Mar 16 '21 at 21:48
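For the question in the comment above, one possible sketch is to wrap the loop in a function and call it for each element. The names `split_one` and `split_all` are assumptions chosen for illustration; the answer itself only shows the loop inside `main`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Split one string on a multi-character delimiter without modifying it.
std::vector<std::string> split_one(const std::string& s, const std::string& delim) {
    assert(!delim.empty());  // an empty delimiter would never advance
    std::vector<std::string> tokens;
    std::string::size_type start = 0;
    std::string::size_type end = s.find(delim);
    while (end != std::string::npos) {
        tokens.push_back(s.substr(start, end - start));
        start = end + delim.length();
        end = s.find(delim, start);
    }
    tokens.push_back(s.substr(start));  // last token, or the whole string
    return tokens;
}

// Apply the same delimiter to every string in a vector.
std::vector<std::vector<std::string>> split_all(const std::vector<std::string>& lines,
                                                const std::string& delim) {
    std::vector<std::vector<std::string>> result;
    result.reserve(lines.size());
    for (const std::string& line : lines) {
        result.push_back(split_one(line, delim));
    }
    return result;
}
```

Calling `split_all({"scott>=tiger", "a>=b>=c"}, ">=")` would give one token list per input string, in the same order as the input vector.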

score 57 · Answer 3 · answered May 26 '16 at 07:25

You can use the following function to split a string:

vector<string> split(const string& str, const string& delim)
{
    vector<string> tokens;
    size_t prev = 0, pos = 0;
    do
    {
        pos = str.find(delim, prev);
        if (pos == string::npos) pos = str.length();
        string token = str.substr(prev, pos-prev);
        if (!token.empty()) tokens.push_back(token);
        prev = pos + delim.length();
    }
    while (pos < str.length() && prev < str.length());
    return tokens;
}

Sviatoslav

IMO it doesn't work as expected: `split("abc","a")` will return a vector of a single string, `"bc"`, where I think it would make more sense if it had returned a vector of elements `["", "bc"]`. Using `str.split()` in Python, it was intuitive to me that it should return an empty string in case `delim` was found either at the beginning or at the end, but that's just my opinion. Anyway, I just think it should be mentioned – kyriakosSt Nov 30 '18 at 18:26
Would strongly recommend removing the `if (!token.empty())` to prevent the issue mentioned by @kyriakosSt as well as other issues related to consecutive delimiters. – Steve Mar 12 '19 at 16:33
I would remove my upvote if I could, but SO won't let me. The issue brought up by @kyriakosSt is a problem, and removing `if (!token.empty())` does not seem to suffice to fix it. – bhaller Nov 21 '19 at 00:22
@bhaller this snippet was designed exactly to skip empty fragments. If you need to keep empty ones, I'm afraid you need to write another split implementation. I kindly suggest you post it here for the good of the community. – Sviatoslav Feb 26 '20 at 21:28

Arafat Hasan · Answer 4 · 2018-10-09T10:32:36.857

For string delimiter

Split string based on a string delimiter. Such as splitting string "adsf-+qwret-+nvfkbdsj-+orthdfjgh-+dfjrleih" based on string delimiter "-+", output will be {"adsf", "qwret", "nvfkbdsj", "orthdfjgh", "dfjrleih"}

#include <iostream>
#include <sstream>
#include <vector>

using namespace std;

// for string delimiter
vector<string> split (string s, string delimiter) {
    size_t pos_start = 0, pos_end, delim_len = delimiter.length();
    string token;
    vector<string> res;

    while ((pos_end = s.find (delimiter, pos_start)) != string::npos) {
        token = s.substr (pos_start, pos_end - pos_start);
        pos_start = pos_end + delim_len;
        res.push_back (token);
    }

    res.push_back (s.substr (pos_start));
    return res;
}

int main() {
    string str = "adsf-+qwret-+nvfkbdsj-+orthdfjgh-+dfjrleih";
    string delimiter = "-+";
    vector<string> v = split (str, delimiter);

    for (auto i : v) cout << i << endl;

    return 0;
}

Output

adsf
qwret
nvfkbdsj
orthdfjgh
dfjrleih

For single character delimiter

Split string based on a character delimiter. Such as splitting string "adsf+qwer+poui+fdgh" with delimiter "+" will output {"adsf", "qwer", "poui", "fdgh"}

#include <iostream>
#include <sstream>
#include <vector>

using namespace std;

vector<string> split (const string &s, char delim) {
    vector<string> result;
    stringstream ss (s);
    string item;

    while (getline (ss, item, delim)) {
        result.push_back (item);
    }

    return result;
}

int main() {
    string str = "adsf+qwer+poui+fdgh";
    vector<string> v = split (str, '+');

    for (auto i : v) cout << i << endl;

    return 0;
}

Output

adsf
qwer
poui
fdgh

You are returning `vector` I think it'll call copy constructor. — Mayur, Nov 27 '18 at 08:21
Every reference I've seen shows that the call to the copy constructor is eliminated in that context. — David Given, Jan 12 '19 at 09:50
With "modern" (C++03?) compilers I believe this is correct, RVO and/or move semantics will eliminate the copy constructor. — Kevin, Mar 13 '19 at 14:09
I tried the one for single character delimiter, and if the string ends in a delimiter (i.e., an empty csv column at the end of the line), it does not return the empty string. It simply returns one fewer string. For example: 1,2,3,4\nA,B,C, — kounoupis, Mar 26 '19 at 01:42
I also tried the one for string delimiter, and if the string ends in a delimiter, the last delimiter becomes part of the last string extracted. — kounoupis, Mar 26 '19 at 01:44
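The trailing-delimiter behaviour of the getline version reported in the comments can be worked around with one extra check. This is a sketch under the assumption that a trailing delimiter should produce a final empty field (as in a CSV row); the function name `split_fields` is made up for illustration.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Sketch: getline-based split that keeps a trailing empty field.
std::vector<std::string> split_fields(const std::string& s, char delim) {
    std::vector<std::string> result;
    std::stringstream ss(s);
    std::string item;
    while (std::getline(ss, item, delim)) {
        result.push_back(item);
    }
    // getline fails when nothing is left after the final delimiter,
    // so "A,B,C," would otherwise produce only three fields.
    if (!s.empty() && s.back() == delim) {
        result.push_back(std::string());
    }
    return result;
}
```

With this check, `split_fields("A,B,C,", ',')` yields four fields, the last one empty, while an empty input still yields an empty vector. Whether an empty input should instead yield one empty field is a design choice this sketch does not settle.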

score 20 · Answer 5 · edited Nov 14 '18 at 05:23

This code splits lines from text and adds each one to a vector.

vector<string> split(char *phrase, string delimiter){
    vector<string> list;
    string s = string(phrase);
    size_t pos = 0;
    string token;
    while ((pos = s.find(delimiter)) != string::npos) {
        token = s.substr(0, pos);
        list.push_back(token);
        s.erase(0, pos + delimiter.length());
    }
    list.push_back(s);
    return list;
}

Called by:

vector<string> listFilesMax = split(buffer, "\n");

fret

answered Jun 12 '17 at 08:54

William Cuervo

it's working great! I've added list.push_back(s); because it was missing. – Stoica Mircea May 25 '18 at 11:51
it misses out the last part of the string. After the while loop ends, we need to add the remaining of s as a new token. – whihathac Jun 01 '18 at 03:00
I've made an edit to the code sample to fix the missing push_back. – fret Nov 14 '18 at 01:25
It would be nicer as `vector split(char *phrase, const string delimiter="\n")` – Mayur May 17 '19 at 09:28
I know kinda late but, it would work much better if this if statement was added before push `if (token != "") list.push_back(token);` to prevent appending empty strings. – Oliver Tworkowski Jul 16 '20 at 21:37
@OliverTworkowski A lot of the time, what is viewed as being the "correct" behaviour involves leaving the empty strings in. Of course, this may be undesirable in your use case, in which case your suggestion is completely valid. – squ1dd13 Nov 11 '20 at 17:39

Rika · Answer 6 · 2021-01-09T02:02:45.893

You can also use regex for this:

std::vector<std::string> split(const std::string str, const std::string regex_str)
{
    std::regex regexz(regex_str);
    std::vector<std::string> list(std::sregex_token_iterator(str.begin(), str.end(), regexz, -1),
                                  std::sregex_token_iterator());
    return list;
}

which is equivalent to :

std::vector<std::string> split(const std::string str, const std::string regex_str)
{
    std::regex regexz(regex_str); // the regex object must be declared before the iterator uses it
    std::sregex_token_iterator token_iter(str.begin(), str.end(), regexz, -1);
    std::sregex_token_iterator end;
    std::vector<std::string> list;
    while (token_iter != end)
    {
        list.emplace_back(*token_iter++);
    }
    return list;
}

and use it like this :

#include <iostream>
#include <string>
#include <regex>
#include <vector>

std::vector<std::string> split(const std::string str, const std::string regex_str)
{   // a yet more concise form!
    return { std::sregex_token_iterator(str.begin(), str.end(), std::regex(regex_str), -1), std::sregex_token_iterator() };
}

int main()
{
    std::string input_str = "lets split this";
    std::string regex_str = " "; 
    auto tokens = split(input_str, regex_str);
    for (auto& item: tokens)
    {
        std::cout<<item <<std::endl;
    }
}

play with it online! http://cpp.sh/9sumb

You can simply use substrings, characters, etc. as usual, or use actual regular expressions to do the splitting.
It's also concise and C++11!
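One caveat with passing a plain delimiter as a regex: characters such as `+` or `.` are regex metacharacters, so a delimiter like "-+" or "." would not be matched literally. A hedged sketch of one way to handle that is below. The helpers `escape_regex` and `split_literal` are hypothetical names, and the escaping covers only the usual ECMAScript metacharacters.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: backslash-escape ECMAScript regex metacharacters
// so that the delimiter is matched literally.
std::string escape_regex(const std::string& text) {
    static const std::string special = R"(\^$.|?*+()[]{})";
    std::string out;
    for (char c : text) {
        if (special.find(c) != std::string::npos) {
            out += '\\';
        }
        out += c;
    }
    return out;
}

std::vector<std::string> split_literal(const std::string& str, const std::string& delim) {
    assert(!delim.empty());
    // A named regex object is used because the token iterator refers to it,
    // and some standard libraries reject a temporary regex here.
    const std::regex re(escape_regex(delim));
    std::vector<std::string> tokens;
    for (std::sregex_token_iterator it(str.begin(), str.end(), re, -1), end;
         it != end; ++it) {
        tokens.push_back(it->str());
    }
    return tokens;
}
```

For example, `split_literal("a.b.c", ".")` splits on the literal dot, whereas passing "." directly as a regex would match every character.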

answered Nov 18 '20 at 03:46

This should be the correct answer, provided C++11 is on the table, which if it isn't...you should be using C++>=11, it's a game-changer! – DeusXMachina Jan 08 '21 at 21:47
Please can you explain the return statement in the function `split()`? I am trying to figure how the tokens are pushed into the `std::vector` container. Thanks. – BFamz Feb 03 '21 at 13:53
Would writing it as ```return std::vector{ std::sregex_token_iterator(str.begin(), str.end(), std::regex(regex_str), -1), std::sregex_token_iterator() };``` make it more obvious to you that how a temporary std::vector is being created and returned? we are using list initialization here. have a look [here](https://en.cppreference.com/w/cpp/language/list_initialization) – Rika Feb 03 '21 at 15:29
@DeusXMachina: a fine solution, certainly. One caveat: the "yet more concise form!" in the last code segment will not compile with _LIBCPP_STD_VER > 11, as the method is marked as "delete"... but the earlier code segments that don't implicitly require rvalue reference && compile and run fine under C++2a. – pob Mar 25 '21 at 04:19
@Rika yes that works. Thanks for your help. – BFamz Apr 12 '21 at 13:51

ryanbwork · Answer 7 · 2013-01-10T19:49:55.877

strtok allows you to pass in multiple chars as delimiters. I bet if you passed in ">=" your example string would be split correctly (even though the > and = are counted as individual delimiters).

EDIT if you don't want to use c_str() to convert from string to char*, you can use substr and find_first_of to tokenize.

string token, mystring("scott>=tiger");
while(token != mystring){
  token = mystring.substr(0,mystring.find_first_of(">="));
  mystring = mystring.substr(mystring.find_first_of(">=") + 1);
  printf("%s ",token.c_str());
}
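To make the "counted as individual delimiters" point concrete, here is a small sketch of a find_first_of-based split. The name `split_on_any_char` is an assumption for illustration. Every character of the second argument acts as a separate one-character delimiter, which is different from searching for the whole string ">=".

```cpp
#include <string>
#include <vector>

// Splits at every character that appears in `chars`
// (the same notion of "delimiter" that strtok and find_first_of use).
std::vector<std::string> split_on_any_char(const std::string& s, const std::string& chars) {
    std::vector<std::string> tokens;
    std::string::size_type start = 0;
    std::string::size_type pos;
    while ((pos = s.find_first_of(chars, start)) != std::string::npos) {
        tokens.push_back(s.substr(start, pos - start));
        start = pos + 1;  // skip exactly one delimiter character
    }
    tokens.push_back(s.substr(start));
    return tokens;
}
```

On the input "a=b>=c" with ">=", this yields "a", "b", an empty token, and "c", because the lone '=' also splits and the pair ">=" counts as two delimiters. A substring search for ">=" would instead give "a=b" and "c".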

answered Jan 10 '13 at 19:18

Thanks. But I want to use only C++ and not any C functions like `strtok()` as it would require me to use char array instead of string. – TheCrazyProgrammer Jan 10 '13 at 19:26
@TheCrazyProgrammer So? If a C function does what you need, use it. This isn't a world where C functions aren't available in C++ (in fact, they have to be). `.c_str()` is cheap and easy, too. – Qix - MONICA WAS MISTREATED Oct 14 '16 at 04:00
The check for if(token != mystring) gives wrong results if you have repeating elements in your string. I used your code to make a version that does not have this. It has many changes that change the answer fundamentally, so I wrote my own answer instead of editing. Check it below. – Amber Elferink Aug 28 '19 at 16:06

Shubham Agrawal · Answer 8 · 2020-10-16T09:29:03.807

An answer already exists, but the accepted answer uses the erase function, which is costly for very large strings (think of megabytes). Therefore I use the function below.

vector<string> split(const string& i_str, const string& i_delim)
{
    vector<string> result;
    
    size_t found = i_str.find(i_delim);
    size_t startIndex = 0;

    while(found != string::npos)
    {
        result.push_back(string(i_str.begin()+startIndex, i_str.begin()+found));
        startIndex = found + i_delim.size();
        found = i_str.find(i_delim, startIndex);
    }
    if(startIndex != i_str.size())
        result.push_back(string(i_str.begin()+startIndex, i_str.end()));
    return result;      
}

answered Aug 04 '19 at 13:17

I tested this, and it works. Thanks! In my opinion, this is the best answer because as the original answer-er states, this solution reduces the memory overhead, and the result is conveniently stored in a vector. (replicates the Python `string.split()` method.) – Robbie Capps Apr 28 '20 at 17:32

score 5 · Answer 9 · answered Jan 10 '13 at 19:40

I would use boost::tokenizer. Here's documentation explaining how to make an appropriate tokenizer function: http://www.boost.org/doc/libs/1_52_0/libs/tokenizer/tokenizerfunction.htm

Here's one that works for your case.

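#include <algorithm>
#include <iostream>
#include <iterator>
#include <string>
#include <boost/tokenizer.hpp>

struct my_tokenizer_func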
{
    template<typename It>
    bool operator()(It& next, It end, std::string & tok)
    {
        if (next == end)
            return false;
        char const * del = ">=";
        auto pos = std::search(next, end, del, del + 2);
        tok.assign(next, pos);
        next = pos;
        if (next != end)
            std::advance(next, 2);
        return true;
    }

    void reset() {}
};

int main()
{
    std::string to_be_parsed = "1) one>=2) two>=3) three>=4) four";
    for (auto i : boost::tokenizer<my_tokenizer_func>(to_be_parsed))
        std::cout << i << '\n';
}

Thanks. But I wish to use only standard C++ and not a third party library. — TheCrazyProgrammer, Jan 10 '13 at 19:49
@TheCrazyProgrammer: Okay, when I read "Standard C++", I thought that meant no non-standard extensions, not that you couldn't use standards conforming third party libraries. — Benjamin Lindley, Jan 10 '13 at 19:58

Beder Acosta Borges · Answer 10 · 2017-05-25T09:15:57.837

Here's my take on this. It handles the edge cases and takes an optional parameter to remove empty entries from the results.

bool endsWith(const std::string& s, const std::string& suffix)
{
    return s.size() >= suffix.size() &&
           s.substr(s.size() - suffix.size()) == suffix;
}

std::vector<std::string> split(const std::string& s, const std::string& delimiter, const bool& removeEmptyEntries = false)
{
    std::vector<std::string> tokens;

    for (size_t start = 0, end; start < s.length(); start = end + delimiter.length())
    {
         size_t position = s.find(delimiter, start);
         end = position != std::string::npos ? position : s.length();

         std::string token = s.substr(start, end - start);
         if (!removeEmptyEntries || !token.empty())
         {
             tokens.push_back(token);
         }
    }

    if (!removeEmptyEntries &&
        (s.empty() || endsWith(s, delimiter)))
    {
        tokens.push_back("");
    }

    return tokens;
}

Examples

split("a-b-c", "-"); // [3]("a","b","c")

split("a--c", "-"); // [3]("a","","c")

split("-b-", "-"); // [3]("","b","")

split("--c--", "-"); // [5]("","","c","","")

split("--c--", "-", true); // [1]("c")

split("a", "-"); // [1]("a")

split("", "-"); // [1]("")

split("", "-", true); // [0]()

hmofrad · Answer 11 · 2021-03-09T22:08:20.420

This should work for string (or single character) delimiters. Don't forget to #include <sstream>.

std::string input = "Alfa=,+Bravo=,+Charlie=,+Delta";
std::string delimiter = "=,+"; 
std::istringstream ss(input);
std::string token;
std::string::iterator it;

while(std::getline(ss, token, *(it = delimiter.begin()))) {
    std::cout << token << std::endl; // Token is extracted using '='
    it++;
    // Skip the rest of delimiter if exists ",+"
    while(it != delimiter.end() and ss.peek() == *(it)) { 
        it++; ss.get(); 
    }
}

The first while loop extracts a token using the first character of the string delimiter. The second while loop skips the rest of the delimiter and stops at the beginning of the next token.

answered Nov 06 '19 at 22:00

This is incorrect. If the input is modified as below, it would split using the first =, when it is not supposed to: `std::string input = "Alfa=,+Bravo=,+Charlie=,+Delta=Echo";` – Amitoj Mar 09 '21 at 07:27
@Amitoj Good catch. I revised my answer to even cover inputs with malformed delimiters. – hmofrad Mar 09 '21 at 16:09

score 3 · Answer 12 · answered Jul 11 '19 at 14:48

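This is a complete method that splits the string at any character contained in delimiter and returns a vector of the chopped-up strings. Because it uses find_first_of, a multi-character delimiter such as ">=" is treated as a set of single-character delimiters.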

It is an adaptation from the answer from ryanbwork. However, his check for: if(token != mystring) gives wrong results if you have repeating elements in your string. This is my solution to that problem.

vector<string> Split(string mystring, string delimiter)
{
    vector<string> subStringList;
    string token;
    while (true)
    {
        size_t findfirst = mystring.find_first_of(delimiter);
        if (findfirst == string::npos) //find_first_of returns npos if it couldn't find the delimiter anymore
        {
            subStringList.push_back(mystring); //push back the final piece of mystring
            return subStringList;
        }
        token = mystring.substr(0, findfirst); // reuse the position found above
        mystring = mystring.substr(findfirst + 1);
        subStringList.push_back(token);
    }
    return subStringList;
}

Something like `while (true)` is usually scary to see in a piece of code like this. Personally I'd recommend rewriting this so that the comparison to `std::string::npos` (or respectively a check against `mystring.size()`) makes the `while (true)` obsolete. — Joel Bodenmann, Dec 04 '19 at 16:58

Shubham Kumar Gupta Ggps · Answer 13 · 2020-09-07T09:44:02.573

A very simple/naive approach:

vector<string> words_seperate(string s){
    vector<string> ans;
    string w="";
    for(auto i:s){
        if(i==' '){
           ans.push_back(w);
           w="";
        }
        else{
           w+=i;
        }
    }
    ans.push_back(w);
    return ans;
}

Or you can use boost library split function:

vector<string> result; 
boost::split(result, input, boost::is_any_of("\t"));

Or you can try strtok:

char str[] = "DELIMIT-ME-C++"; 
char *token = strtok(str, "-"); 
while (token) 
{ 
    cout<<token; 
    token = strtok(NULL, "-"); 
}

Or You can do this:

char split_with=' ';
vector<string> words;
string token; 
stringstream ss(our_string);
while(getline(ss , token , split_with)) words.push_back(token);

score 2 · Answer 14 · answered Oct 10 '20 at 03:34

Since this is the top-rated Stack Overflow Google search result for C++ split string or similar, I'll post a complete, copy/paste runnable example that shows both methods.

splitString uses stringstream (probably the better and easier option in most cases)

splitString2 uses find and substr (a more manual approach)

// SplitString.cpp

#include <iostream>
#include <vector>
#include <string>
#include <sstream>

// function prototypes
std::vector<std::string> splitString(const std::string& str, char delim);
std::vector<std::string> splitString2(const std::string& str, char delim);
std::string getSubstring(const std::string& str, int leftIdx, int rightIdx);


int main(void)
{
  // Test cases - all will pass
  
  std::string str = "ab,cd,ef";
  //std::string str = "abcdef";
  //std::string str = "";
  //std::string str = ",cd,ef";
  //std::string str = "ab,cd,";   // behavior of splitString and splitString2 is different for this final case only, if this case matters to you choose which one you need as applicable
  
  
  std::vector<std::string> tokens = splitString(str, ',');
  
  std::cout << "tokens: " << "\n";
  
  if (tokens.empty())
  {
    std::cout << "(tokens is empty)" << "\n";
  }
  else
  {
    for (auto& token : tokens)
    {
      if (token == "") std::cout << "(empty string)" << "\n";
      else std::cout << token << "\n";
    }
  }
    
  return 0;
}

std::vector<std::string> splitString(const std::string& str, char delim)
{
  std::vector<std::string> tokens;
  
  if (str == "") return tokens;
  
  std::string currentToken;
  
  std::stringstream ss(str);
  
  while (std::getline(ss, currentToken, delim))
  {
    tokens.push_back(currentToken);
  }
  
  return tokens;
}

std::vector<std::string> splitString2(const std::string& str, char delim)
{
  std::vector<std::string> tokens;
  
  if (str == "") return tokens;
  
  int leftIdx = 0;
  
  int delimIdx = str.find(delim);
  
  int rightIdx;
  
  while (delimIdx != std::string::npos)
  {
    rightIdx = delimIdx - 1;
    
    std::string token = getSubstring(str, leftIdx, rightIdx);
    tokens.push_back(token);
    
    // prep for next time around
    leftIdx = delimIdx + 1;
    
    delimIdx = str.find(delim, delimIdx + 1);
  }
  
  rightIdx = str.size() - 1;
  
  std::string token = getSubstring(str, leftIdx, rightIdx);
  tokens.push_back(token);
  
  return tokens;
}

std::string getSubstring(const std::string& str, int leftIdx, int rightIdx)
{
  return str.substr(leftIdx, rightIdx - leftIdx + 1);
}

score 1 · Answer 15 · answered May 23 '17 at 09:37

If you do not want to modify the string (as in the answer by Vincenzo Pii) and want to output the last token as well, you may want to use this approach:

inline std::vector<std::string> splitString( const std::string &s, const std::string &delimiter ){
    std::vector<std::string> ret;
    size_t start = 0;
    size_t end = 0;
    size_t len = 0;
    std::string token;
    do{ end = s.find(delimiter,start); 
        len = end - start;
        token = s.substr(start, len);
        ret.emplace_back( token );
        start += len + delimiter.length();
        std::cout << token << std::endl;
    }while ( end != std::string::npos );
    return ret;
}

score 1 · Answer 16 · answered May 27 '20 at 17:34

std::vector<std::string> parse(std::string str,std::string delim){
    std::vector<std::string> tokens;
    char *str_c = strdup(str.c_str()); 
    char* token = NULL;

    token = strtok(str_c, delim.c_str()); 
    while (token != NULL) { 
        tokens.push_back(std::string(token));  
        token = strtok(NULL, delim.c_str()); 
    }

    free(str_c); // strdup allocates with malloc, so release with free (from <cstdlib>), not delete[]

    return tokens;
}
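As an alternative sketch that avoids manual memory management, the mutable buffer strtok needs can be held in a std::vector<char>. The function name `parse_chars` is hypothetical. Note that strtok treats `delims` as a set of single characters, skips empty tokens, and keeps internal state, so it is not safe to interleave calls or use from several threads.

```cpp
#include <cstring>
#include <string>
#include <vector>

std::vector<std::string> parse_chars(const std::string& str, const std::string& delims) {
    std::vector<std::string> tokens;
    // Mutable, NUL-terminated copy of the input; released automatically
    // when buf goes out of scope.
    std::vector<char> buf(str.begin(), str.end());
    buf.push_back('\0');
    for (char* tok = std::strtok(buf.data(), delims.c_str()); tok != nullptr;
         tok = std::strtok(nullptr, delims.c_str())) {
        tokens.emplace_back(tok);
    }
    return tokens;
}
```

For "scott>=tiger" with ">=" this yields "scott" and "tiger", because strtok skips runs of delimiter characters rather than reporting the empty token between '>' and '='.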

SridharKritha · Answer 17 · 2021-03-03T13:37:58.800

Yet another answer: Here I'm using find_first_not_of string function which returns the position of the first character that does not match any of the characters specified in the delim.

size_t find_first_not_of(const string& delim, size_t pos = 0) const noexcept;

Example:

int main()
{
    size_t start = 0, end = 0;
    std::string str = "scott>=tiger>=cat";
    std::string delim = ">=";
    while ((start = str.find_first_not_of(delim, end)) != std::string::npos)
    {
        end = str.find(delim, start); // finds the first occurrence at or after 'start'
        std::cout << str.substr(start, end - start)<<std::endl; // extract substring
    }
    return 0;
}

Output:

    scott
    tiger
    cat

score 1 · Answer 18 · answered Mar 08 '21 at 18:52

I made this solution. It is very simple: all the prints/values are in the loop (no need to check after the loop).

#include <iostream>
#include <string>

using std::cout;
using std::string;

int main() {
    string s = "it-+is-+working!";
    string d = "-+";

    int firstFindI = 0;
    int secendFindI = s.find(d, 0); // find if have any at all
    while (secendFindI != string::npos)
    {
        secendFindI = s.find(d, firstFindI);
        cout << s.substr(firstFindI, secendFindI - firstFindI) << "\n"; // print sliced part
        firstFindI = secendFindI + d.size(); // add to the search index
    }

}

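The only downside of this solution is that it performs the search twice at the start. Note that if the delimiter does not occur in the string at all, the loop never runs and nothing is printed.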

score 0 · Answer 19 · 2018-01-29T11:22:42.387

#include<iostream>
#include<algorithm>
using namespace std;

int split_count(string str,char delimit){
return count(str.begin(),str.end(),delimit);
}

void split(string str,char delimit,string res[]){
int a=0,i=0;
while(a<str.size()){
res[i]=str.substr(a,str.find(delimit));
a+=res[i].size()+1;
i++;
}
}

int main(){

string a="abc.xyz.mno.def";
int x=split_count(a,'.')+1;
string res[x];
split(a,'.',res);

for(int i=0;i<x;i++)
cout<<res[i]<<endl;
  return 0;
}

P.S: Works only if the lengths of the strings after splitting are equal

answered Jan 29 '18 at 08:15

This uses a GCC extension (variable-length arrays). – user202729 Apr 10 '18 at 08:58

score 0 · Answer 20 · edited Jun 18 '20 at 17:59

Function:

std::vector<std::string> WSJCppCore::split(const std::string& sWhat, const std::string& sDelim) {
    std::vector<std::string> vRet;
    size_t nPos = 0;
    size_t nLen = sWhat.length();
    size_t nDelimLen = sDelim.length();
    while (nPos < nLen) {
        std::size_t nFoundPos = sWhat.find(sDelim, nPos);
        if (nFoundPos != std::string::npos) {
            std::string sToken = sWhat.substr(nPos, nFoundPos - nPos);
            vRet.push_back(sToken);
            nPos = nFoundPos + nDelimLen;
            if (nFoundPos + nDelimLen == nLen) { // last delimiter
                vRet.push_back("");
            }
        } else {
            std::string sToken = sWhat.substr(nPos, nLen - nPos);
            vRet.push_back(sToken);
            break;
        }
    }
    return vRet;
}

Unit-tests:

bool UnitTestSplit::run() {
    bool bTestSuccess = true;

    struct LTest {
        LTest(
            const std::string &sStr,
            const std::string &sDelim,
            const std::vector<std::string> &vExpectedVector
        ) {
            this->sStr = sStr;
            this->sDelim = sDelim;
            this->vExpectedVector = vExpectedVector;
        };
        std::string sStr;
        std::string sDelim;
        std::vector<std::string> vExpectedVector;
    };
    std::vector<LTest> tests;
    tests.push_back(LTest("1 2 3 4 5", " ", {"1", "2", "3", "4", "5"}));
    tests.push_back(LTest("|1f|2п|3%^|44354|5kdasjfdre|2", "|", {"", "1f", "2п", "3%^", "44354", "5kdasjfdre", "2"}));
    tests.push_back(LTest("|1f|2п|3%^|44354|5kdasjfdre|", "|", {"", "1f", "2п", "3%^", "44354", "5kdasjfdre", ""}));
    tests.push_back(LTest("some1 => some2 => some3", "=>", {"some1 ", " some2 ", " some3"}));
    tests.push_back(LTest("some1 => some2 => some3 =>", "=>", {"some1 ", " some2 ", " some3 ", ""}));

    for (int i = 0; i < tests.size(); i++) {
        LTest test = tests[i];
        std::string sPrefix = "test" + std::to_string(i) + "(\"" + test.sStr + "\")";
        std::vector<std::string> vSplitted = WSJCppCore::split(test.sStr, test.sDelim);
        compareN(bTestSuccess, sPrefix + ": size", vSplitted.size(), test.vExpectedVector.size());
        int nMin = std::min(vSplitted.size(), test.vExpectedVector.size());
        for (int n = 0; n < nMin; n++) {
            compareS(bTestSuccess, sPrefix + ", element: " + std::to_string(n), vSplitted[n], test.vExpectedVector[n]);
        }
    }

    return bTestSuccess;
}

score 0 · Answer 21 · answered Nov 12 '20 at 15:38

As a bonus, here is a code example of a split function and macro that is easy to use and where you can choose the container type :

#include <iostream>
#include <vector>
#include <string>

#define split(str, delim, type) (split_fn<type<std::string>>(str, delim))
 
template <typename Container>
Container split_fn(const std::string& str, char delim = ' ') {
    Container cont{};
    std::size_t current, previous = 0;
    current = str.find(delim);
    while (current != std::string::npos) {
        cont.push_back(str.substr(previous, current - previous));
        previous = current + 1;
        current = str.find(delim, previous);
    }
    cont.push_back(str.substr(previous, current - previous));
    
    return cont;
}

int main() {
    
    auto test = std::string{"This is a great test"};
    auto res = split(test, ' ', std::vector);
    
    for(auto &i : res) {
        std::cout << i << ", "; // prints: This, is, a, great, test, 
    }
    
    
    return 0;
}

Greck · Answer 22 · 2021-05-11T12:09:14.027

template<typename C, typename T>
auto insert_in_container(C& c, T&& t) -> decltype(c.push_back(std::forward<T>(t)), void()) {
    c.push_back(std::forward<T>(t));
}
template<typename C, typename T>
auto insert_in_container(C& c, T&& t) -> decltype(c.insert(std::forward<T>(t)), void()) {
    c.insert(std::forward<T>(t));
}
template<typename Container>
Container splitR(const std::string& input, const std::string& delims) {
    Container out;
    size_t delims_len = delims.size();
    size_t begIdx = 0;  // size_t, so that endIdx + delims_len is not narrowed to unsigned int
    auto endIdx = input.find(delims, begIdx);
    if (endIdx == std::string::npos && input.size() != 0u) {
        insert_in_container(out, input);
    }
    else {
        size_t w = 0;
        while (endIdx != std::string::npos) {
            w = endIdx - begIdx;
            if (w != 0) insert_in_container(out, input.substr(begIdx, w));
            begIdx = endIdx + delims_len;
            endIdx = input.find(delims, begIdx);
        }
        w = input.length() - begIdx;
        if (w != 0) insert_in_container(out, input.substr(begIdx, w));
    }
    return out;
}

Radem · Answer 23 · 2021-05-08T09:11:01.313

I use pointer arithmetic. The inner while loop handles a string delimiter; if a single-character delimiter is enough for you, simply remove the inner while. I hope it is correct. If you notice any mistake or have an improvement, please leave a comment.

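std::vector<std::string> split(std::string s, std::string delim)
{
    std::vector<std::string> res = {""};
    if (delim.empty())
    {
        // an empty delimiter can never match, so return the input unsplit
        res[0] = s;
        return res;
    }

    char *p = &s[0];
    char *d = &delim[0];

    // stop at the terminating '\0' so that it is never appended to the last token
    while (*p)
    {
        // compare the text at p with the delimiter, character by character
        bool is_delim = true;
        char *pp = p;
        char *dd = d;
        while (*dd && is_delim == true)
            if (*pp++ != *dd++)
                is_delim = false;

        if (is_delim)
        {
            p = pp;              // jump past the whole delimiter
            res.push_back("");   // and start a new token
        }
        else
            res.back() += *p++;  // ordinary character: append to the current token
    }

    return res;
}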
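
The same scan can also be written without raw pointers. Here is a rough sketch that uses std::search from <algorithm> to locate each delimiter. The name split_search is hypothetical, and returning the input unsplit for an empty delimiter is an assumption; neither comes from the answer.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical iterator-based equivalent of the pointer loop.
std::vector<std::string> split_search(const std::string& s, const std::string& delim)
{
    std::vector<std::string> res;
    if (delim.empty())   // assumption: nothing to split on
    {
        res.push_back(s);
        return res;
    }

    auto start = s.begin();
    while (true)
    {
        // first position in [start, end) where the whole delimiter occurs
        auto hit = std::search(start, s.end(), delim.begin(), delim.end());
        res.emplace_back(start, hit);   // token in front of the delimiter
        if (hit == s.end())
            break;
        // continue right after the delimiter
        start = hit + static_cast<std::string::difference_type>(delim.size());
    }
    return res;
}
```

Like the pointer version, this keeps empty tokens, so an input that ends with the delimiter yields a trailing empty string.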

edited May 08 '21 at 09:11

answered May 08 '21 at 00:58

Welcome to Stack Overflow. While this code may solve the question, [including an explanation](https://meta.stackoverflow.com/questions/392712/explaining-entirely-code-based-answers) of how and why this solves the problem would really help to improve the quality of your post, and probably result in more up-votes. Remember that you are answering the question for readers in the future, not just the person asking now. Please edit your answer to add explanations and give an indication of what limitations and assumptions apply. – Pawara Siriwardhane May 08 '21 at 03:25

score -1 · Answer 24 · answered Feb 10 '21 at 12:44

Since C++11 it can be done like this:

#include <regex>
#include <string>
#include <vector>

std::vector<std::string> splitString(const std::string& str,
                                     const std::regex& regex)
{
  // submatch index -1 selects the text between matches, i.e. the tokens
  return {std::sregex_token_iterator{str.begin(), str.end(), regex, -1},
          std::sregex_token_iterator()};
}

// usually we have a predefined set of regular expressions: then
// let's build those only once and re-use them multiple times
static const std::regex regex1(R"(some-reg-exp1)", std::regex::optimize);
static const std::regex regex2(R"(some-reg-exp2)", std::regex::optimize);
static const std::regex regex3(R"(some-reg-exp3)", std::regex::optimize);

std::string str = "some string to split";
std::vector<std::string> tokens( splitString(str, regex1) );

Notes:

this is a small improvement to this answer
see also Optimization techniques used by std::regex_constants::optimize
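
Applied to the question's input, the regex approach might look like the sketch below. The wrapper name splitOnGreaterEqual is hypothetical and only there for illustration. The delimiter >= contains no regex metacharacters; a delimiter such as | or . would have to be escaped before being used as a pattern.

```cpp
#include <regex>
#include <string>
#include <vector>

// Same helper as in the answer: submatch index -1 selects the text between matches.
std::vector<std::string> splitString(const std::string& str, const std::regex& regex)
{
  return {std::sregex_token_iterator{str.begin(), str.end(), regex, -1},
          std::sregex_token_iterator()};
}

// Hypothetical convenience wrapper for the ">=" delimiter from the question.
std::vector<std::string> splitOnGreaterEqual(const std::string& str)
{
  static const std::regex delimiter(">=", std::regex::optimize);  // built once, reused
  return splitString(str, delimiter);
}
```

Calling splitOnGreaterEqual("scott>=tiger") returns the tokens scott and tiger.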

This is an incomplete answer, not really doing or explaining anything. — not2qubit, May 18 '21 at 21:52
@not2qubit other than your pointless opinion, do you have something useful to contribute ? — luca, May 20 '21 at 12:38

score -4 · Answer 25 · answered Feb 27 '17 at 20:45

std::vector<std::string> split(const std::string& s, char c) {
  std::vector<std::string> v;
  std::size_t i = 0;          // start of the current token
  std::size_t j = s.find(c);  // position of the next delimiter
  while (j != std::string::npos) {
    v.push_back(s.substr(i, j - i));
    i = ++j;
    j = s.find(c, j);
  }
  // remainder after the last delimiter (the whole string if there was none)
  v.push_back(s.substr(i));
  return v;
}

Yilei

Please be more accurate. Your code will not compile. See declaration of "i" and the comma instead of a dot. – jstuardo Mar 30 '17 at 12:28
