
Target string:

Hello("hey",a1,a2,a3(a4()),1)

Output I want to get:

"hey",a1,a2,a3(a4()),1

What have I tried:

#include <iostream>
#include <regex>
#include <string>

int main()
{
    std::string input = "Hello(\"hey\",a1,a2,a3(a4()),1)";

    // lazy match: stops at the first ')' it meets
    std::string regx = R"(Hello\(([\s\S]*?)\))";
    std::smatch matches;
    if (std::regex_search(input, matches, std::regex(regx)))
    {
        std::cout << matches[1] << std::endl;
    }

    return 0;
}

The wrong output I got:

"hey",a1,a2,a3(a4(
 
cigien
send but
  • OT: I found this site great for testing regular expressions: [https://regexr.com/](https://regexr.com/) – drescherjm Dec 06 '20 at 23:19
  • @drescherjm I do not understand how clean I ask, I get all kinds of minuses. This site is not for asking questions? – send but Dec 06 '20 at 23:20
  • You don't want a regex for this job. See https://stackoverflow.com/questions/546433 – cigien Dec 06 '20 at 23:20
  • Don't worry about the downvotes. Your question is clear, you've stated your requirements, provided an MCVE, and shown expected/output. The only reason I can think of for a down-vote is that you tagged the question with both C and C++. Note that they are different languages, and both tags rarely apply for a question. – cigien Dec 06 '20 at 23:22
  • @cigien finally a real help. This must be exactly what I'm looking for. Thank you very much. – send but Dec 06 '20 at 23:22
  • No problem. Take some time to read that link, and if it answers your question, you can close your question as a duplicate of it. – cigien Dec 06 '20 at 23:23
  • @cigien I see that there are two different approaches here. but I guess this is my answer: R"([^\(]*Hello(\(.*\))[^\)]*)"; – send but Dec 06 '20 at 23:27
  • Sure, test it thoroughly, and if it solves your problem, go ahead and post it as an answer. That will be helpful to future visitors. – cigien Dec 06 '20 at 23:28
  • @cigien yes, works well. i am going to close this thread. thank you – send but Dec 06 '20 at 23:39
  • Wouldn't `.find_first_of()` and `.find_last_of()` find the parenthesis you are interested in and then a simple `.substr()` to extract? – David C. Rankin Dec 07 '20 at 00:34
  • @DavidC.Rankin I was reading something just about this, but I couldn't imagine it. Do you know of an example? ( There are many such strings in my input string. I put them in a while loop and clear as I find them. ) – send but Dec 07 '20 at 00:37
  • Please don't add a solution to the question. I've rolled back your last edit. Go ahead and add an answer instead. You can find the text of your solution in the revision history. See [answer] – cigien Dec 07 '20 at 01:03
  • @cigien but this solution didn't help me. because it works very slowly. – send but Dec 07 '20 at 01:04
  • @sendbut That's fine. You don't have to add it as answer. But you shouldn't add it to the question. Note that you can write an answer where you show your solution, and point out its weaknesses and strengths. That would be a nice answer. – cigien Dec 07 '20 at 01:05

2 Answers


Probably the simplest way is just to use the standard member functions .find_first_of() to find the first '(' and .find_last_of() to find the last ')' and then extract the .substr() between them. For example:

#include <iostream>
#include <string>

int main (void) {
    
    std::string input = "Hello(\"hey\",a1,a2,a3(a4()),1)";
    size_t start =  input.find_first_of ("("),
             end =  input.find_last_of (")");
    
    if (start != std::string::npos && end != std::string::npos)
        std::cout << input.substr (start+1, end-start-1) << '\n';
}

(you can add an additional validation that end > start if you like)

Example Use/Output

$ ./bin/str_first_last_of_substr
"hey",a1,a2,a3(a4()),1

Note: it is probably better to use .find() and .rfind() in this case as @RemyLebeau suggests in the comments. The changes are as follows:

    size_t start =  input.find ("("),
             end =  input.rfind (")");

Multiple Patterns In Single String

As you changed the input string to contain multiple blocks to extract, the simple .find_first_of() and .find_last_of() approach will no longer work. Instead, use .find(), or simply iterate over each character in the string. Keep a counter to balance the parentheses. When you encounter an open parenthesis, increment the counter. When you find a close parenthesis, decrement the counter. When the counter returns to zero, extract the substring, e.g.

#include <iostream>
#include <string>

int main (void) {
    
    std::string input = "bla bla bla bla Hello(\"hey\",a1,a2,a3(a4()),1) "
                        "bla bla bla bla Hello(\"as\",d,c,v(f()),f) bla bla bla";
    size_t start = 0,
             end = start;
    int balance = 0;
    
    for (size_t i = start; input[i]; i++) {
        if (input[i] == '(') {
            if (balance == 0)
                start = end = i;
            balance++;
        }
        else if (input[i] == ')') {
            balance--;
            if (balance == 0)
                end = i;
        }
        if (start != end && balance == 0) {
            std::cout << input.substr (start+1, end-start-1) << '\n';
            start = end;
        }
    }
}

Example Use/Output

$ ./bin/str_first_last_of_substr2
"hey",a1,a2,a3(a4()),1
"as",d,c,v(f()),f

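The loop above extracts every top-level parenthesized group, whether or not it follows "Hello". If only the blocks after "Hello(" matter, one option is to let .find() locate each "Hello(" and run the same balance counter from there. The following is only a sketch under a few assumptions: extract_call_args is a hypothetical helper name (it is not part of the answer), parentheses inside quoted strings are not treated specially, and a name such as "SayHello(" would also match.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: collect the argument list of every `name(...)`
// occurrence in `input`, using find() plus a parenthesis balance counter.
std::vector<std::string> extract_call_args(const std::string& input,
                                           const std::string& name)
{
    std::vector<std::string> blocks;
    const std::string opener = name + "(";
    std::size_t pos = 0;

    while ((pos = input.find(opener, pos)) != std::string::npos) {
        const std::size_t first = pos + opener.size(); // first char after '('
        int balance = 1;                               // the opener's '(' is open
        std::size_t i = first;

        // walk forward until the opener's '(' has been closed
        for (; i < input.size() && balance != 0; ++i) {
            if (input[i] == '(')
                ++balance;
            else if (input[i] == ')')
                --balance;
        }
        if (balance != 0)   // ran out of string: unbalanced, stop searching
            break;

        // i is one past the matching ')', so the block spans [first, i - 1)
        blocks.push_back(input.substr(first, i - 1 - first));
        pos = i;            // resume the search after the closing ')'
    }
    return blocks;
}
```

Calling extract_call_args(input, "Hello") on the two-block input above should return the same two strings the loop prints, and taking only the first element gives the first block.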
Extracting Only 1st Block After "Hello("

To extract only the first block of interest, you just need to limit the number of blocks you process. So you can simply keep a counter, initialized to zero, and after processing the first block, increment the counter, and add the non-zero counter as part of your for loop exit clause, e.g.

#include <iostream>
#include <string>

int main (void) {
    
    std::string input = "bla bla bla bla Hello(\"hey\",a1,a2,a3(a4()),1) "
                        "bla bla bla bla Hello(\"as\",d,c,v(f()),f) bla bla bla";
    size_t start = 0,
             end = start,
           count = 0;
    int balance = 0;
    
    for (size_t i = 0; !count && input[i]; i++) {
        if (input[i] == '(') {
            if (balance == 0)
                start = end = i;
            balance++;
        }
        else if (input[i] == ')') {
            balance--;
            if (balance == 0)
                end = i;
        }
        if (start != end && balance == 0) {
            std::cout << input.substr (start+1, end-start-1) << '\n';
            count++;
            start = end;
        }
    }
}

Example Use/Output

$ ./bin/str_first_last_of_substr3
"hey",a1,a2,a3(a4()),1

Parsing information from strings is really simple once you make friends with it. The key is always (1) know what your current position is in the string, (2) know what you are looking for, and (3) once found, properly handle extracting what you need and reset for the next search, ... repeat until you run out of string or reach some other predefined exit condition :)
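As a rough illustration of those three steps, the balance loop can be wrapped in a function so it is easy to try against awkward inputs (an empty string, "((((", "()", a stray ')'). This is a sketch, not a drop-in fix: top_level_blocks is a hypothetical name, the loop is bounded by input.size() rather than the terminating nul, and ignoring a stray ')' is an assumption about what behavior is wanted.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: return the contents of every top-level "(...)"
// group in `input`. Unclosed groups produce nothing.
std::vector<std::string> top_level_blocks(const std::string& input)
{
    std::vector<std::string> blocks;
    std::size_t start = 0;  // (1) position of the '(' that opened the group
    int balance = 0;        // number of currently open parentheses

    for (std::size_t i = 0; i < input.size(); ++i) {
        if (input[i] == '(') {          // (2) what we are looking for
            if (balance == 0)
                start = i;
            ++balance;
        }
        else if (input[i] == ')') {
            if (balance == 0)
                continue;               // stray ')': assumed safe to ignore
            if (--balance == 0)         // (3) group closed: extract it
                blocks.push_back(input.substr(start + 1, i - start - 1));
        }
    }
    return blocks;
}
```

Because start + 1 is never larger than the index of the closing ')', the substr() call stays in range even for an empty group such as "()", which simply yields an empty string.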

Look things over and let me know if you have further questions.

David C. Rankin
  • if we make input : "bla bla bla bla Hello(\"hey\",a1,a2,a3(a4()),1) bla bla bla bla Hello(\"as\",d,c,v(f()),f) bla bla bla" it wont work – send but Dec 07 '20 at 00:58
  • No, of course not. This was for the sample input. If you have multiple groupings, then you can either tokenize the blocks using `.find()` and work your way down the string. This was for the case shown. – David C. Rankin Dec 07 '20 at 01:08
  • `find()` would be better than `find_first_of()`, and `rfind()` instead of `find_last_of()`. – Remy Lebeau Dec 07 '20 at 01:08
  • @RemyLebeau - agreed for multiple searches. I'll add an example. – David C. Rankin Dec 07 '20 at 01:24
  • @DavidC.Rankin i am checking and trying to learn algoritm. so i can figure out how to make it works for "Hello(" thank you much! – send but Dec 07 '20 at 01:28
  • There it won't output anything because there is only one `'('`, so without a closing `')'` to make `balance == 0`, nothing will ever print. If you look, the algorithm just works down the string. When it finds a `'('` it adds `1` to `balance`. When it finds `')'` it subtracts `1` from `balance`. When `balance == 0` you want to extract what is between the opening `'('` and closing `')'`. The `start != end` comparison is used as a flag so that nothing is extracted when `balance == 0` but no block has just been closed. When working with balancing parens, there won't be one size-fits-all that handles unbalanced too. – David C. Rankin Dec 07 '20 at 01:38
  • @DavidC.Rankin I look like someone asking everything from here. but I couldn't find a way to do this. What I wanted was not to get the contents of both functions. it was just to get the value that was in the first Hello (). but I don't understand anything about indexes. – send but Dec 07 '20 at 02:16
  • That's not much different. Just use `size_t start = input.find ("Hello(");` (this is optional) and then keep a counter and when you have extracted one block, exit. E.g. `size_t count = 0;`. Then after `std::cout << input.substr (start+1, end-start-1) << '\n';` add `count++;` and then to exit after 1 output use `for (size_t i = start; !count && input[i]; i++)` and it should work fine. – David C. Rankin Dec 07 '20 at 02:20
  • @DavidC.Rankin You are a genius, nothing else <3 – send but Dec 07 '20 at 02:25
  • You will get it. We all learned the same way. Sitting there figuratively beating our heads into the wall until the light-bulb comes on and you find the solution. The only variable in the game is how big the lump on your head gets before the light-bulb comes on. After you have done it long enough, it becomes easier to put it altogether -- but that is the same no matter what it is you are learning. As with all things in life, you have got to pay the price of admission -- if you want to see the show. (there really are no shortcuts `:)` – David C. Rankin Dec 07 '20 at 02:34
  • :) I hit the head again. I am getting a string subscript out of range error. @DavidC.Rankin – send but Dec 07 '20 at 02:44
  • The only way that happens is if `start` or `end` isn't properly initialized. You also may need to add a check on whether your substring begins at the first or last character in the string. (I didn't check all different permutations there). If you post the string, I'll see what is happening. Otherwise, your loop will exit on either `count != 0` or reaching the *nul-terminating* character in `input`. Did you look at the example for this case I added? (the one with executable named `str_first_last_of_substr3`) Looking -- all corner-cases are covered -- so that isn't it. – David C. Rankin Dec 07 '20 at 02:48
  • @DavidC.Rankin I checked your sample. unfortunately I can't see what the string is because I'm downloading the strings from a website. kind of a boat. I tried try catch but it didn't work. I'll first print out each string and then implement it. so I will try to determine which one has an error. – send but Dec 07 '20 at 02:56
  • Chuckling..... Beware of Hackerrank and other similar sites -- they are all about the odd corner-cases. What happens if the string is `""` (empty-string). What happens if it is a string of all `"(((("` or only `"((()))"`. You need to think of all possible cases and add checks to make sure your code doesn't fail. The code will fail for `"()"` because `.substr (start+1, end-start-1)` will be out of range. Add a check to ensure `end - start > 1`, etc... For example change to `if (end - start > 1 && balance == 0)` – David C. Rankin Dec 07 '20 at 03:03
  • This will be difficult :)) end - start> 1 flag didn't work. The strings are too long and it's hard to debug this. what i'm wondering is is i getting this error while i print this or find the substring? I'm looking at this now – send but Dec 07 '20 at 03:08
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/225600/discussion-between-david-c-rankin-and-send-but). – David C. Rankin Dec 07 '20 at 03:11

For a search as simple as this one, a regular expression is overkill. Brute force is much simpler. Here's a sketch:

std::string::size_type first = input.find('(');
if (first == std::string::npos)
    throw match_not_found();    // match_not_found: placeholder exception type
std::string::size_type last = first;
int depth = 1;
while (depth != 0) {
    // jump straight to the next parenthesis of either kind
    last = input.find_first_of("()", last + 1);
    if (last == std::string::npos)
        throw match_not_found();
    else if (input[last] == '(')
        ++depth;
    else // input[last] == ')'
        --depth;
}
// exclude both the opening '(' and the closing ')'
std::string result = input.substr(first + 1, last - first - 1);

Not tested.
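For anyone who wants to try the idea, here is one possible self-contained version of the find_first_of() approach. It is only a sketch with some assumptions of mine: first_paren_block is a hypothetical name, failure is reported through a bool return value with the result in an out-parameter rather than through an exception, and the search starts at the first '(' in the string regardless of what precedes it.

```cpp
#include <string>

// Hypothetical helper: copy the contents of the first balanced "(...)"
// group in `input` into `out`. Returns false if there is no '(' or the
// group is never closed; `out` is left untouched in that case.
bool first_paren_block(const std::string& input, std::string& out)
{
    std::string::size_type first = input.find('(');
    if (first == std::string::npos)
        return false;

    std::string::size_type last = first;
    int depth = 1;                      // the '(' at `first` is open
    while (depth != 0) {
        // skip directly to the next '(' or ')'
        last = input.find_first_of("()", last + 1);
        if (last == std::string::npos)
            return false;               // unbalanced input
        if (input[last] == '(')
            ++depth;
        else
            --depth;
    }
    // exclude both the opening '(' and the matching ')'
    out = input.substr(first + 1, last - first - 1);
    return true;
}
```

Compared with a character-by-character loop, find_first_of() skips over everything that is not a parenthesis, which may matter on long inputs.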

Pete Becker