
I have a file of the following form:

some text
some more text
. . .
. . .
data {
1 2 3 5 yes 10
2 3 4 5 no  11
}
some text
some text

I want to extract the data portion of the file with a regular expression, using the following procedure:

proc ExtractData {fileName} {
    set sgd [open $fileName r]
    set sgdContents [read $sgd]
    regexp "data \\{(?.*)\\}" $sgdContents -> data
    puts $data
}

But this is giving the following error:

couldn't compile regular expression pattern: quantifier operand invalid

I am not able to figure out what is wrong with the regular expression. Any help would be highly appreciated.

Dronacharya
    Funny, you misplaced the question mark. Use `.*?` instead of `?.*` :) – HamZa Jul 18 '13 at 08:48
  • That works, thank you so much :) I was looking at examples in http://www.tcl.tk/man/tcl8.4/TclCmd/regexp.htm where the mechanism of `?` isn't clear enough. If you could write a short explanation about the same it would be very helpful to a lot of people. – Dronacharya Jul 18 '13 at 08:49
  • Take a look at [this question](http://stackoverflow.com/questions/2301285). As for the question you asked, it's about a "typo", so it isn't really suited for SO, and if you changed it, it would be a duplicate :) – HamZa Jul 18 '13 at 08:56
  • Ah [this answer](http://stackoverflow.com/questions/3075130/difference-between-and-for-regex/3075532#3075532) seems more juicy :p – HamZa Jul 18 '13 at 09:00
  • I agree with you about that, thanks for the link. But there is one more problem, I want to capture only the text between the braces, how do I do that? I know that is possible, but don't know exactly how to do that. – Dronacharya Jul 18 '13 at 09:04
  • You have a capturing group, so you could just use group 1. Otherwise, use a lookahead/lookbehind. – HamZa Jul 18 '13 at 09:06
  • Oops, it seems that Tcl doesn't support lookbehinds. So your only bet is to use group 1 :P – HamZa Jul 18 '13 at 09:09
  • `()` denotes a capturing group, I assume, and it seems to be placed correctly here, but I am still getting all the lines from `data` up to `}`. – Dronacharya Jul 18 '13 at 09:10
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/33686/discussion-between-dronacharya-and-hamza) – Dronacharya Jul 18 '13 at 09:15
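A minimal sketch of the fix discussed in the comments, assuming Tcl 8.x and a `sgdContents` string standing in for the file (the sample text is made up): the misplaced `?` reproduces the compile error, while `.*?` compiles. Because `->` is used as the whole-match variable, `data` receives only capture group 1.

```tcl
# Hypothetical stand-in for the file contents from the question.
set sgdContents "some text\ndata {\n1 2 3 5 yes 10\n2 3 4 5 no  11\n}\nsome text"

# Original pattern: "?" right after "(" has no atom to quantify,
# so the pattern fails to compile and catch returns 1.
set failed [catch {regexp {data \{(?.*)\}} $sgdContents -> data} msg]
puts $msg
# -> couldn't compile regular expression pattern: quantifier operand invalid

# Corrected pattern: "?" after "*" makes the star non-greedy.
# "->" receives the whole match; data receives only group 1.
if {[regexp {data \{(.*?)\}} $sgdContents -> data]} {
    # Group 1 still contains the newline after "{" and before "}".
    puts [string trim $data]
}
```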

2 Answers


Use this regular expression:

regexp {data \{(.*)\}} $sgdContents wholematch submatch
puts $submatch

`wholematch` receives the text matched by the entire pattern. In your case it is:

data {
1 2 3 5 yes 10
2 3 4 5 no  11
}

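and `submatch` receives only the text captured by the parenthesized group, i.e. the content inside the braces (including the newline right after `{` and the one right before `}`):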

1 2 3 5 yes 10
2 3 4 5 no  11
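Wrapped into the question's `ExtractData` proc, this pattern might look like the sketch below. It assumes the file holds a single `data { ... }` block, because the greedy `.*` runs on to the last `}` in the file. It also closes the channel and trims the surrounding newlines from `submatch`; the file name `sample_data.txt` is hypothetical.

```tcl
proc ExtractData {fileName} {
    set sgd [open $fileName r]
    set sgdContents [read $sgd]
    close $sgd
    # Braces around the pattern suppress Tcl's backslash substitution,
    # so the regexp engine sees \{ and \} and treats them as literal braces.
    if {![regexp {data \{(.*)\}} $sgdContents wholematch submatch]} {
        error "no data block found in $fileName"
    }
    # submatch begins and ends with a newline; strip them.
    return [string trim $submatch]
}

# Usage: write a hypothetical sample file, extract, then clean up.
set fileName sample_data.txt
set fh [open $fileName w]
puts $fh "some text\ndata {\n1 2 3 5 yes 10\n2 3 4 5 no  11\n}\nsome text"
close $fh
puts [ExtractData $fileName]
file delete $fileName
```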
Varun

The following `regexp` line works:

regexp "data \\{\\\n(.*?)\\\n\\s*\\}" $sgdContents -> data

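The main thing wrong with the original regular expression was the misplaced non-greedy modifier `?`. Written directly after `(`, it has no atom to quantify, which is why compilation fails with "quantifier operand invalid". Placed after the `*`, as in `.*?`, it makes the star non-greedy: the engine matches as few characters as possible, so the capture stops at the first `}` where the rest of the pattern can match instead of running on to the last one.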
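To see what the non-greedy modifier changes, here is a small comparison on a hypothetical string that has a second brace pair after the data block (a sketch assuming Tcl 8.x regexp semantics). The last pattern is an equivalent braced spelling of the pattern above: inside braces Tcl performs no backslash substitution, so `\n` and `\s` reach the regexp engine directly and no doubled backslashes are needed.

```tcl
# Hypothetical input with another brace pair after the data block.
set text "data {\n1 2 3\n}\nother {\nx\n}"

# Greedy: .* runs on to the LAST "}" in the string.
regexp {data \{(.*)\}} $text -> greedy
# Non-greedy: .*? stops at the FIRST "}" that lets the match succeed.
regexp {data \{(.*?)\}} $text -> lazy
# Braced form of this answer's pattern: the newlines around the rows
# are matched outside the group, so they are not captured.
regexp {data \{\n(.*?)\n\s*\}} $text -> anchored

puts [string trim $greedy]
puts [string trim $lazy]
puts $anchored
```

Here `$greedy` captures everything up to the final `}`, including the `other` block, `$lazy` holds just the data rows with their surrounding newlines, and `$anchored` holds the rows without them.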

Dronacharya