3

I have a block of text as such.

google.sbox.p50 && google.sbox.p50(["how to",[["how to tie a tie",0],["how to train your dragon 2 trailer",0],["how to do the cup song",0],["how to get a six pack in 3 minutes",0],["how to make a paper gun that shoots",0],["how to basic",0],["how to love lil wayne",0],["how to sing like your favorite artist",0],["how to be a heartbreaker marina and the diamonds",0],["how to tame a horse in minecraft",0]],{"q":"XJW--0IKH6sqOp0ME-x5B7b_5wY","j":"5","k":1}])

Using \\[([^]]+)\\] I am able to get everything I need, plus a little extra that I don't. I do not need the ["how to",[[ part. I only need the blocks that are formatted like this:

["how to tie a tie",0]

Can someone please help me modify my expression to only get what I need? I've been at it for hours and I can't grasp the idea of RegEx.

Jerry
ArtW

5 Answers

3

Put both the opening and closing square brackets in the negated character class?

\\[([^][]+)\\]

\\[ matches a literal [

\\] matches a literal ]

[^][] is a negated character class that matches any single character except ] and [. It might be a little difficult to see, but it's equivalent to [^\\]\\[]. The escapes are not required here because a ] placed first in a character class is taken literally, and a [ inside a class is literal as well (just like \\. is equivalent to [.]).

([^][]+) captures everything within square brackets, making sure there's no ] or [ inside.

In C#, you can use the @ symbol (a verbatim string) to avoid having to double escape every time, which makes the regex look like this:

var regex = new Regex(@"\[([^][]+)\]");

Note: This regex will capture everything within square brackets. If you wish to specifically get the format ["how to tie a tie",0], you can be more precise. After all, the regex will only match what you tell it to match:

var regex = new Regex(@"\[""[^""]+"",0\]");

Here, we have another negated character class: [^"]. This will match any character which is not a quote character. Note that inside a verbatim string each literal quote has to be written as "".

This one assumes that the digit is always 0, as depicted in your sample text block. If you have multiple possibilities of numbers, you can use the character class [0-9]+:

var regex = new Regex(@"\[""[^""]+"",[0-9]+\]");

You can use \d+ as well, but in .NET \d also matches decimal digits from other Unicode scripts (unless RegexOptions.ECMAScript is set), which may or may not matter for your data. If you want to be even more cautious by allowing optional whitespace (spaces, tabs, newlines, form feeds) between the tokens, you can use this regex:

var regex = new Regex(@"\[\s*""[^""]+""\s*,\s*[0-9]+\s*\]");

In conclusion, many regexes might suit what you need; just make sure you know what your data looks like so you can pick one with the right amount of leeway.
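To see the whitespace-tolerant pattern in action, here is a minimal sketch. The class name SuggestionRegexDemo and the helper FindPairs are made up for illustration, and the input is a shortened copy of the text block from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SuggestionRegexDemo
{
    // Hypothetical helper: returns every ["text",number] block found in the input.
    public static List<string> FindPairs(string input)
    {
        // In a verbatim string, "" stands for one literal quote character.
        var regex = new Regex(@"\[\s*""[^""]+""\s*,\s*[0-9]+\s*\]");
        var results = new List<string>();
        foreach (Match m in regex.Matches(input))
        {
            results.Add(m.Value);
        }
        return results;
    }

    public static void Main()
    {
        // Shortened version of the text block from the question.
        string input = "google.sbox.p50 && google.sbox.p50([\"how to\",[[\"how to tie a tie\",0],"
                     + "[\"how to basic\",0]],{\"q\":\"XJW--0IKH6sqOp0ME-x5B7b_5wY\",\"j\":\"5\",\"k\":1}])";
        foreach (string pair in FindPairs(input))
        {
            Console.WriteLine(pair);
        }
    }
}
```

The leading ["how to",[[ is skipped because the pattern requires a number right after the comma, and there the comma is followed by another bracket.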

Community
Jerry
1

I think this is what you are looking for to match the format of ["how to tie a tie",0]:

(\["[^"]+",\d\])

( ) - around the whole thing so it all gets captured in this group
\[" - find ["
[^"]+ - find one or more of anything except "
", - find ",
\d - find a single digit; if you want to allow more than one digit, use \d+
\] - match the ending ]

The only variable things in this regex are whatever is within the quotes ([^"]+) and the number (\d, or \d+ for several digits).

If you don't want the square brackets in the capture group, you can do it like this:

\[("[^"]+",\d+)\]

I assume you don't want to match if there are quotes within your quotes as it would probably break whatever purpose you are using it for, but if you do, this should work:

\[("[^[\]]+",\d+)\]
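As a rough sketch of how capture groups are read back in C#, the variation below uses two groups instead of one, so the text and the number come out separately. That split, the class name CaptureGroupDemo, and the helper ExtractPairs are assumptions made for illustration, not part of the patterns above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CaptureGroupDemo
{
    // Hypothetical helper: returns the quoted text and the number of each block.
    public static List<(string Text, string Number)> ExtractPairs(string input)
    {
        // Group 1 captures the text between the quotes, group 2 the digits.
        var regex = new Regex(@"\[""([^""]+)"",(\d+)\]");
        var pairs = new List<(string Text, string Number)>();
        foreach (Match m in regex.Matches(input))
        {
            // Groups[0] is the whole match; Groups[1] and Groups[2] are the parenthesized parts.
            pairs.Add((m.Groups[1].Value, m.Groups[2].Value));
        }
        return pairs;
    }

    public static void Main()
    {
        string input = "[\"how to\",[[\"how to tie a tie\",0],[\"how to basic\",0]]";
        foreach (var pair in ExtractPairs(input))
        {
            Console.WriteLine(pair.Text + " -> " + pair.Number);
        }
    }
}
```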
Dallas
  • Thanks a million. Explaining it like that to me actually prevented me from asking about my other expression. – ArtW Aug 29 '13 at 19:44
  • @ArtW, out of curiosity, if you don't mind sharing - what was the other question/expression? – Dallas Aug 29 '13 at 19:45
  • I wanted to get "XJW--0IKH6sqOp0ME-x5B7b_5wY". so with your help I came up with \{\"[a-z]\"\:\"\w*\" it gives me a tad more then I want but with .split(':')[1] I can get what I want. I understand I have a lot to learn with RegEx – ArtW Aug 29 '13 at 20:23
  • @ArtW, several of your escapes are unnecessary in that regex. Also, you could use a capture group `()` to get specifically what you want using a regex like this: `{"[a-z]":"(\w+)"`. [This link](http://msdn.microsoft.com/en-us/library/bs2twtah.aspx) has details on how to use capture groups in c#. – Dallas Aug 29 '13 at 20:28
0

You must use this pattern

@"\[[^][]+\]"

More information about square brackets here.

Community
Casimir et Hippolyte
0

I think you need this one: (\[[^\[\]]+?\])

What you missed is the ? (lazy, smallest match) and excluding both [ and ] inside the brackets.

Jeroen van Langen
0

The text inside the outer parentheses appears to be JSON (an array). Instead of a regular expression I'd just:

  1. Strip off everything up to and including the opening parenthesis (google.sbox.p50 && google.sbox.p50() and strip off the trailing parenthesis ). There are several ways to do this, and it can be more efficient than a regex.
  2. JSON parse the remaining inner part.
  3. From that point you have the parsed representation: you can skip the first element of the array, which you don't need, and everything else is in a traversable form.

There's session information at the end along with parameters anyway (in {} braces), so you may end up parsing that part too. Better not to reinvent the wheel (JSON parsing).
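The three steps above can be sketched roughly as follows. This assumes System.Text.Json is available (it ships with .NET Core 3.0 and later; another JSON library would work the same way) and that the response always has the wrapper shape shown in the question. The class name SuggestionJsonDemo and the helper ParseSuggestions are hypothetical, and there is no error handling.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public static class SuggestionJsonDemo
{
    // Hypothetical helper: returns the suggestion texts from the wrapped response.
    public static List<string> ParseSuggestions(string response)
    {
        // Step 1: keep only what sits between the first '(' and the last ')'.
        int start = response.IndexOf('(') + 1;
        int end = response.LastIndexOf(')');
        string json = response.Substring(start, end - start);

        // Step 2: parse the remaining text as JSON.
        using JsonDocument doc = JsonDocument.Parse(json);

        // Step 3: element 0 is the query itself, element 1 holds the [text, number] pairs.
        var suggestions = new List<string>();
        foreach (JsonElement pair in doc.RootElement[1].EnumerateArray())
        {
            suggestions.Add(pair[0].GetString());
        }
        return suggestions;
    }

    public static void Main()
    {
        // Shortened version of the text block from the question.
        string response = "google.sbox.p50 && google.sbox.p50([\"how to\",[[\"how to tie a tie\",0],"
                        + "[\"how to basic\",0]],{\"q\":\"XJW--0IKH6sqOp0ME-x5B7b_5wY\",\"j\":\"5\",\"k\":1}])";
        foreach (string s in ParseSuggestions(response))
        {
            Console.WriteLine(s);
        }
    }
}
```

The object in {} braces is the third element of the same array, so doc.RootElement[2] would give access to the q, j and k values without any further string handling.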

Csaba Toth