
How can I split a string on multiple multi-character delimiters?

I want a function like vector<string> split_string(string input, vector<string> delims)

For example, split_string("foo+bar := baz",{"+"," ",":="}) = {"foo","+","bar"," "," ",":="," ","baz"}

fread2281
  • A very good answer here http://stackoverflow.com/questions/7621727/split-a-string-into-words-by-multiple-delimiters-in-c And more.. http://stackoverflow.com/questions/14265581/parse-split-a-string-in-c-using-string-delimiter-standard-c – Acha Bill Aug 22 '15 at 21:51
  • @jafar please link to a specific answer that works for multiple multi-char delimiters. – fread2281 Aug 22 '15 at 21:54
  • @Jonathan Potter please link to a specific answer that works for this question – fread2281 Aug 22 '15 at 21:56
  • The answer in the first link solves that problem. – Acha Bill Aug 22 '15 at 21:56
  • The first link uses single character delimiters while the second link uses string delimiters precisely `=>`. Combining the two solves the issue. – Acha Bill Aug 22 '15 at 22:01
  • That question has **57** answers, I'm pretty sure you can find one there that suits you. – Jonathan Potter Aug 22 '15 at 22:12
  • @fread2281 my solution. http://pastebin.com/zxK2t55b – Acha Bill Aug 22 '15 at 22:46
  • Say you have two delimiters, ":" and ":=". Which one wins? – user4581301 Aug 22 '15 at 23:33
  • Based on the comments you wrote so far, it appears that you might be under the impression that stackoverflow.com is a web site where someone writes your code for you, without you bothering to invest any effort in it yourself. You are mistaken. – Sam Varshavchik Aug 23 '15 at 00:06

2 Answers

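Here is my cut at the same problem. I first went with divide and conquer: split the input on one delimiter, then split each resulting piece on the next delimiter, and so on. It is not fast. It is not efficient. But it is simple.

Unfortunately it doesn't work in this case, because we are preserving the delimiters in the output. Each pass also runs over the delimiters already found, so a later delimiter can split an earlier one.

E.g.: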

Source :=foo+bar  .   :=baz+quaax:=  C++
Delims [+][ ][:=][:]
Result [:][=][foo][+][bar][ ][ ][.][ ][ ][ ][:][=][baz][+][quaax][:][=][ ][ ][C][+][+]

Yuck.
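
To make the failure mode concrete, here is a minimal sketch of what a naive divide-and-conquer split might look like. It is an illustration under assumptions, not necessarily the code that produced the result above; the names `splitKeeping` and `naiveSplit` are hypothetical.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: split one piece on a single delimiter,
// keeping the delimiter itself as a token.
std::vector<std::string> splitKeeping(const std::string &piece,
                                      const std::string &delim)
{
    std::vector<std::string> out;
    size_t current = 0;
    size_t next;
    while ((next = piece.find(delim, current)) != std::string::npos)
    {
        if (current < next)
        {
            out.push_back(piece.substr(current, next - current));
        }
        out.push_back(delim);
        current = next + delim.length();
    }
    if (current < piece.length())
    {
        out.push_back(piece.substr(current));
    }
    return out;
}

// Naive divide and conquer: apply each delimiter in turn to every piece
// produced so far, including pieces that are themselves delimiters.
std::vector<std::string> naiveSplit(const std::string &source,
                                    const std::vector<std::string> &delims)
{
    std::vector<std::string> pieces{source};
    for (const std::string &delim : delims)
    {
        if (delim.empty())
        {
            continue; // an empty delimiter would never advance
        }
        std::vector<std::string> next;
        for (const std::string &piece : pieces)
        {
            std::vector<std::string> parts = splitKeeping(piece, delim);
            next.insert(next.end(), parts.begin(), parts.end());
        }
        pieces = next;
    }
    return pieces;
}
```

With this sketch, `naiveSplit(":=foo", {":=", ":"})` first yields `:=` and `foo`, and the second pass then breaks the `:=` token into `:` and `=`, which is the same kind of damage shown in the result above.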

I finally settled on an approach similar to jafar's: scan forward from the current position, find the earliest delimiter, emit the text before it and the delimiter itself, then continue after it. I added it to my support library to try out in a job I'm working on, as a replacement for the divide and conquer approach, because it looks to be faster. I wouldn't have bothered posting this, but jafar's is a bit overcomplicated for my taste. I haven't done any profiling, so his may be faster.

#include <iostream>
#include <string>
#include <vector>

// easy vector output
template<class TYPE>
std::ostream & operator<<(std::ostream & out,
                          const std::vector<TYPE> & in)
{
    for (const TYPE &val: in)
    {
        out << "["<< val << "]";
    }
    return out;
}

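// find the earliest of many string delimiters at or after start;
// on a tie, the delimiter listed first in delims wins
size_t multifind(size_t start,
                 const std::string & source,
                 const std::vector<std::string> &delims,
                 size_t & delfound)
{
    size_t lowest = std::string::npos;
    for (size_t i = 0; i < delims.size(); i++)
    {
        if (delims[i].empty())
        {
            continue; // an empty delimiter matches everywhere and never advances
        }
        size_t pos = source.find(delims[i], start);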
        if (pos == start)
        {
            lowest = pos;
            delfound = i;
            break;
        }
        else if (pos < lowest)
        {
            lowest = pos;
            delfound = i;
        }
    }
    return lowest;
}

// do the grunt work
std::vector<std::string> splitString(const std::string &source,
                                     const std::vector<std::string> &delims)
{
    std::vector<std::string> tokens;

    size_t current = 0;
    size_t delfound = 0; // index of the delimiter that multifind matched
    size_t next = multifind(current,
                            source,
                            delims,
                            delfound);
    while(next != std::string::npos)
    {
        if (current < next)
        {
            tokens.push_back(source.substr(current, next - current));
        }
        tokens.push_back(delims[delfound]);
        current = next + delims[delfound].length();
        next = multifind(current,
                         source,
                         delims,
                         delfound);
    }
    if (current < source.length())
    {
        tokens.push_back(source.substr(current, std::string::npos));
    }
    return tokens;
}


void test(const std::string &source,
          const std::vector<std::string> &delims)
{
    std::cout << "Source " << source << std::endl;
    std::cout << "Delims " << delims << std::endl;
    std::cout << "Result " << splitString(source, delims) << std::endl << std::endl;
}

int main()
{
    test(":=foo+bar  .   :=baz+quaax:=  C++", { " ",":=","+" });
    test(":=foo+bar  .   :=baz+quaax:=  C++", { ":=","+"," " });
    test(":=foo+bar  .   :=baz+quaax:=  C++", { "+"," ",":=" });
    test(":=foo+bar  .   :=baz+quaax:=  C++", { "+"," ",":=",":" });
    test(":=foo+bar  .   :=baz+quaax:=  C++", { ":"," ",":=","+" });
    test("foo+bar  .   :=baz+quaax:=  C++lalala", { "+"," ",":=",":" });
}
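
On the question of which delimiter wins: as written, multifind picks the earliest match, and when two delimiters match at the same position the one listed first in delims wins. That is why the { "+"," ",":=",":" } and { ":"," ",":=","+" } tests differ. If you would rather have the longest delimiter win regardless of order, a variant could look like the sketch below. This is only an assumption about one reasonable tie-breaking rule; `multifindLongest` is a hypothetical name.

```cpp
#include <string>
#include <vector>

// Hypothetical variant of multifind: the earliest match wins;
// on a tie at the same position, the longest delimiter wins.
size_t multifindLongest(size_t start,
                        const std::string &source,
                        const std::vector<std::string> &delims,
                        size_t &delfound)
{
    size_t lowest = std::string::npos;
    for (size_t i = 0; i < delims.size(); i++)
    {
        if (delims[i].empty())
        {
            continue; // an empty delimiter would match everywhere
        }
        size_t pos = source.find(delims[i], start);
        if (pos == std::string::npos)
        {
            continue; // this delimiter does not occur again
        }
        // delfound is only read when lowest already holds a real match
        if (pos < lowest ||
            (pos == lowest && delims[i].length() > delims[delfound].length()))
        {
            lowest = pos;
            delfound = i;
        }
    }
    return lowest;
}
```

Swapping this in for multifind inside splitString should make { ":", ":=" } and { ":=", ":" } produce the same tokens, at the cost of always checking every delimiter instead of stopping at the first match at the current position.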
user4581301

Try this:

#include <iostream>
#include <string>
#include <vector>
#include <map>

std::vector<std::string> splitString(std::string input, std::vector<std::string> delimeters);
std::string findFirstOf(std::string input, std::vector<std::string> del);

int main()
{
    std::vector<std::string> words = splitString(":=foo+bar :=baz+quaax", { " ",":=","+" });
    for (const std::string &str : words)
        std::cout << str << ",";
    std::cout << std::endl;
}
std::vector<std::string> splitString(std::string input, std::vector<std::string> delimeters)
{
    std::vector<std::string> result;
    size_t pos = 0;
    std::string token;
    std::string delimeter = findFirstOf(input, delimeters);

    while(delimeter != "")
    {
        if ((pos = input.find(delimeter)) != std::string::npos)
        {
            token = input.substr(0, pos);
            result.push_back(token);
            result.push_back(delimeter);
            input.erase(0, pos + delimeter.length());
        }
        delimeter = findFirstOf(input, delimeters);
    }
    result.push_back(input);
    return result;
}
//find the delimiter that occurs earliest in the string ("" if none occurs)
std::string findFirstOf(std::string input, std::vector<std::string> del)
{

    //build a map from each delimiter present to its first position
    size_t pos;
    std::map<std::string, size_t> m;

    for (size_t i = 0; i < del.size(); i++)
    {
        pos = input.find(del[i]);
        if (pos != std::string::npos)
            m[del[i]] = pos;
    }

    //find the smallest position of all delimiters, i.e. the smallest value in the map

    if (m.size() == 0)
        return "";

    size_t v = m.begin()->second;
    std::string k = m.begin()->first;

    for (auto it = m.begin(); it != m.end(); it++)
    {
        if (it->second < v)
        {
            v = it->second;
            k = it->first;
        }
    }
    return k;
}

Output: ,:=,foo,+,bar, ,,:=,baz,+,quaax,
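
The output contains empty tokens wherever a delimiter starts the input or directly follows another delimiter, because splitString pushes the text before each delimiter even when that text is empty. If those are unwanted, one option is to filter them afterwards. The helper below is a minimal sketch; `dropEmpty` is a hypothetical name, not part of the code above.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: return the token list with empty strings removed.
std::vector<std::string> dropEmpty(std::vector<std::string> tokens)
{
    tokens.erase(std::remove_if(tokens.begin(), tokens.end(),
                                [](const std::string &s) { return s.empty(); }),
                 tokens.end());
    return tokens;
}
```

It could be applied as `dropEmpty(splitString(input, delimeters))`. Alternatively, the push of `token` inside splitString could be guarded with a check that `token` is not empty.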

Acha Bill