
I need to extract the value of one node nested inside a particular node of a long JSON string.

For example, the description node of the person node:

"person":{"age":"10", "description":"example", "job":{"title":"sales","salary":"$3000"}, "sex":"male"}

Because deserializing the whole long JSON string with a JSON library took too much time, I searched online and found a regex that matches a single JSON node value. It is fast most of the time (a few ms):

Regex regex = new Regex("\"person\":{(?:[^{}]|(?<open>{)|(?<-open>}))*\"description\":\"(.*?)\"(?:.*?)(?(open)(?!))}");

However, for a reason I don't understand, it becomes slow (a full second) when the same node name also appears in an inner node after the match:

"person":{"age":"10", "description":"example", "job":{"title":"sales", "salary":"$3000", "description":"example"}, "sex":"male"}

I want to make this regex more efficient so that it only looks at the outermost level of the person node (`"age"`, `"description"`, `"job"` and `"sex"`) and ignores everything inside inner nodes such as `job`.

I am new to regex, so this regex may not be the right approach for my situation.
Any ideas?

Lucas Trzesniewski
Isolet Chan
  • It should be easier (and probably faster) just to parse the JSON and pull the values you want. Ref. to [System.Web.Helpers.Json](http://msdn.microsoft.com/en-us/library/system.web.helpers.json%28v=vs.111%29.aspx) – Wagner DosAnjos Oct 03 '14 at 12:22
  • What happens when description contains quotes like `"description":"example \"blah\""` – Toto Oct 03 '14 at 12:22
  • What do you want to extract? – Avinash Raj Oct 03 '14 at 12:24
  • @wdosanjos I used JSON.NET to parse a fairly short JSON string with JObject.Parse(json) and it already took about 60 ms. – Isolet Chan Oct 03 '14 at 12:35
  • @M42 You are right. Luckily, in my case the value will never contain \", which makes this question simpler. – Isolet Chan Oct 03 '14 at 12:37
  • @AvinashRaj I want to extract the description node of the person node, but the regex needs to be efficient and ignore the content of all inner nodes such as the job node – Isolet Chan Oct 03 '14 at 12:39
  • Could the description be "behind" the job-part or do you know for certain it is in front of any sub-nodes? – asontu Oct 03 '14 at 13:53
  • @funkwurm Yes, it may be behind the job-part. The selected answer can match every situation as long as it belongs to the outermost node – Isolet Chan Oct 03 '14 at 14:07
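The comments suggest parsing the JSON instead of using a regex. If building a full `JObject` tree is the cost, one possible middle ground is a forward-only reader that stops as soon as it finds the value. The sketch below swaps in `System.Text.Json.Utf8JsonReader`, which ships with .NET Core 3.0 and later (it did not exist when this question was asked); the class and method names are hypothetical, and no timing claim is made.

```csharp
using System.Text;
using System.Text.Json;

public static class StreamingDescription
{
    // Hypothetical helper: returns person.description from the first "person"
    // object in the document, or null if there is none. Only the direct
    // properties of "person" are inspected; nested objects such as "job" are skipped.
    public static string FindPersonDescription(string json)
    {
        // Constructed from one complete buffer, the reader treats it as the
        // final block, which Skip() requires.
        var reader = new Utf8JsonReader(Encoding.UTF8.GetBytes(json));

        // Phase 1: advance to the first "person" property whose value is an object.
        bool found = false;
        while (!found && reader.Read())
        {
            if (reader.TokenType == JsonTokenType.PropertyName && reader.ValueTextEquals("person"))
            {
                reader.Read(); // move onto the property's value
                found = reader.TokenType == JsonTokenType.StartObject;
            }
        }
        if (!found) return null;

        // Phase 2: visit the person object's own properties only.
        while (reader.Read() && reader.TokenType == JsonTokenType.PropertyName)
        {
            bool isDescription = reader.ValueTextEquals("description");
            reader.Read(); // move onto the property's value
            if (isDescription && reader.TokenType == JsonTokenType.String)
                return reader.GetString(); // stop early: the rest of the input is never tokenized
            reader.Skip(); // jump over nested objects/arrays (e.g. "job"); no-op for scalars
        }
        return null; // reached the end of the person object without a description
    }
}
```

Unlike the regex, `GetString()` returns the unescaped value, so `\"` inside the description comes back as a plain `"`.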

1 Answer


This regex should work for your case and should be faster:

"person"\s*:\s*\{(?:
  (?(open)(?!)|(?>"description"\s*:\s*"(?<description>(?:\\.|(?>[^\\"]+))*)"))
  |(?>[^{}"]+)
  |(?>(?:"(?:\\.|(?>[^\\"]+))*"))
  |(?<open>\{)
  |(?<-open>\})
)*?
(?(open)(?!))
(?(description)|(?!))


Use it with RegexOptions.IgnorePatternWhitespace. It also handles escaped quotes (\") properly. The description value will be in the `description` named group.

It should be faster because I used atomic groups (`(?>...)`) in a few places where I know backtracking is useless, and I also made it stop scanning the text as soon as it finds the description it wants.
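To make the atomic-group point concrete, here is a tiny illustration (the class and method names are hypothetical, not from the answer). An atomic group keeps whatever it first matched and never gives characters back. In the answer's pattern that loses nothing, because a run like `[^{}"]+` already stops right before a `{`, `}` or `"`; without the atomic group, the same run could be split across loop iterations in many different ways, which is the classic nested-quantifier backtracking blow-up.

```csharp
using System.Text.RegularExpressions;

public static class AtomicGroupDemo
{
    // Ordinary group: a+ first takes "aaa", then gives one "a" back
    // so the trailing "a" can match.
    public static bool Backtracking(string s) => Regex.IsMatch(s, "^(?:a+)a$");

    // Atomic group: once (?>a+) has taken "aaa" it never gives any back,
    // so the trailing "a" has nothing left to match and the match fails.
    public static bool Atomic(string s) => Regex.IsMatch(s, "^(?>a+)a$");
}
```

On the input `"aaa"`, `Backtracking` returns `true` and `Atomic` returns `false`.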

Oh, and you should use verbatim strings for regexes so you don't have to escape every backslash (inside a verbatim string, a double quote is written as `""`):

var regex = new Regex(@".....");
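Putting the answer's advice together, here is a sketch of how the pattern might be used from C#. It assumes the pattern above unchanged, written as a verbatim string (so every `"` is doubled) and compiled with `RegexOptions.IgnorePatternWhitespace`; the class name `PersonDescriptionRegex` and the method `Extract` are hypothetical. The captured value is the raw JSON text, so escape sequences such as `\"` are not decoded.

```csharp
using System.Text.RegularExpressions;

public static class PersonDescriptionRegex
{
    // The answer's pattern as a verbatim string: backslashes are literal and
    // each " is written as "". IgnorePatternWhitespace lets it span lines.
    private static readonly Regex Pattern = new Regex(@"
        ""person""\s*:\s*\{(?:
          (?(open)(?!)|(?>""description""\s*:\s*""(?<description>(?:\\.|(?>[^\\""]+))*)""))
          |(?>[^{}""]+)
          |(?>(?:""(?:\\.|(?>[^\\""]+))*""))
          |(?<open>\{)
          |(?<-open>\})
        )*?
        (?(open)(?!))
        (?(description)|(?!))",
        RegexOptions.IgnorePatternWhitespace);

    // Hypothetical helper: returns the raw (still JSON-escaped) text of
    // person.description, or null when the pattern does not match.
    public static string Extract(string json)
    {
        Match match = Pattern.Match(json);
        return match.Success ? match.Groups["description"].Value : null;
    }
}
```

When `description` appears both inside `job` and later at the top level of `person`, `Extract` should return the top-level value: the conditional `(?(open)(?!)|...)` only allows the `description` alternative while no inner `{` is open.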
Lucas Trzesniewski
  • This doesn't catch properties "behind" the `job` leaf, in OP's example `"sex":"male"`. – asontu Oct 03 '14 at 13:43
  • @funkwurm in my understanding, the asker only needs the `description` node, so my regex stops as soon as it captures it. – Lucas Trzesniewski Oct 03 '14 at 13:45
  • Yes, I just need one node. And this regex works really well and is faster in every situation compared to the regex I was using! I wish I could build this kind of complicated but efficient regex like you. Thank you so much! – Isolet Chan Oct 03 '14 at 14:05
  • @IsoletChan You're welcome. If you want to master regexes see [this post](http://stackoverflow.com/a/22944075/3764814) and especially the advice about Jeffrey Friedl's book. – Lucas Trzesniewski Oct 03 '14 at 14:16
  • Will definitely buy the book! :) – Isolet Chan Oct 03 '14 at 14:22