
I have the following sentence:

[[Image:Levellers declaration and standard.gif|thumb|200px|Woodcut from a [[Diggers]] document by william everard]]

Regular expression

I am using the following regular expression to extract the text between the double brackets [[ ]]:

regex = "\\[\\[(.*?)\\]\\]"

The output should be the following:

Image:Levellers declaration and standard.gif|thumb|200px|Woodcut from a [[Diggers]] document by william everard

I want the match to take into consideration only the outermost closing brackets ]], not the first ]] it meets.


Problem

The regex matches [[Image:Levellers declaration and standard.gif|thumb|200px|Woodcut from a [[Diggers]] and leaves document by william everard]] unmatched.


Question

How can I make the regex ignore the inner ]] brackets?


Update V0

I wrote a simple program, inspired by BalancedParentheses.cpp, to locate the start and end positions of each bracketed span in a string.


Source code

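    #include <iostream>
    #include <stack>
    #include <string>
    #include <utility>
    #include <vector>
    
    using namespace std;
    
    // Returns true if 'open' is the opening bracket that matches 'close'.
    bool ArePair(char open, char close)
    {
        return (open == '(' && close == ')') ||
               (open == '{' && close == '}') ||
               (open == '[' && close == ']');
    }
    
    // Checks that all brackets in exp are balanced and prints the
    // (opening, closing) index of every matched pair, innermost first.
    bool AreParanthesesBalanced(const string& exp)
    {
        stack<char> S;                  // currently open bracket characters
        vector<int> start;              // indices of the currently open brackets
        vector<pair<int, int>> index;   // matched (open, close) positions
    
        for (int i = 0; i < static_cast<int>(exp.length()); i++)
        {
            if (exp[i] == '(' || exp[i] == '{' || exp[i] == '[')
            {
                S.push(exp[i]);
                start.push_back(i);
            }
            else if (exp[i] == ')' || exp[i] == '}' || exp[i] == ']')
            {
                if (S.empty() || !ArePair(S.top(), exp[i]))
                {
                    return false;   // unmatched or mismatched closing bracket
                }
                S.pop();
                index.push_back(make_pair(start.back(), i));
                start.pop_back();
            }
        }
    
        for (size_t i = 0; i < index.size(); i++)
        {
            cout << index[i].first << "  " << index[i].second << endl;
        }
        return S.empty();   // balanced only if nothing is left open
    }
    
    int main()
    {
        string exp = "[[Image:Levellers declaration and standard.gif|thumb|200px|Woodcut from a [[Diggers]] document by william everard]] ";
    
        AreParanthesesBalanced(exp);
        return 0;
    }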

Output

    75  83
    74  84
    1  113
    0  114

Asked by Hani Goc; edited by Community.

1 Answer


I think your issue is that you use a lazy (*?) quantifier, instead of a greedy (*) one.

The lazy quantifier will stop as soon as it meets the first ]] in the string, even if it is not the final one.

You can simply modify your regex to this:

regex = "\\[\\[(.*)\\]\\]"
Answered by Theox; edited by Maximilian Ast.
  • If it were that simple, it would suffice to trim the leading and trailing `[`s and `]`s. – Wiktor Stribiżew Aug 06 '15 at 11:15
  • @Theox So I will prevent the regular expression from exiting when it meets the first ]]. – Hani Goc Aug 06 '15 at 11:20
  • @HaniGoc: On a string with two of these on one line, you will get only one match. This solution is broken. – nhahtdh Aug 06 '15 at 11:20
  • Yes, you are right, @nhahtdh; my mistake, I didn't ask the question correctly. Will I be able to solve this problem by using the PCRE library? – Hani Goc Aug 06 '15 at 11:22
  • @nhahtdh I think this link will help: http://stackoverflow.com/questions/546433/regular-expression-to-match-outer-brackets – Hani Goc Aug 06 '15 at 11:23
  • @HaniGoc: It should be possible. You should look up questions on bracket balancing in PHP to see how to write such a pattern. – nhahtdh Aug 06 '15 at 11:23