
I'm struggling to parse a key:value pair in a JSON-like string. I know people will automatically say "Use JSON.parse() for this!" and I absolutely agree. The problem is that I'm not dealing with JSON strings, but JSON-like strings.

My attempts, at least, to parse these strings with JSON.parse have failed (I've tried sanitizing the string so that JSON.parse doesn't complain about malformed input).

The problem is that the JSON-like string is sometimes truncated and sometimes not. What is guaranteed is that the key publicProfileUrl will always be in the text (or at least that has been consistent with my observations), and I need to extract its value.

For example, here is one such string:

%%"fullName":"Eduardo Saverin",
"contactInfo":{
"publicProfileUrl":"https://sg.linkedin.com/in/saverin",
"twitterAccounts":["esaverin"],
"websites":[]},
"industry":"Internet",%%

All I'm interested in is extracting the value of publicProfileUrl.

This is my latest attempt at doing it:

\"publicProfileUrl\":\"(.*)\",

but it is matching all the way to the last comma (I added line breaks for formatting purposes only, but the original string doesn't have any line breaks).

Here's the original string:

%%"fullName":"Eduardo Saverin","contactInfo":{"publicProfileUrl":"https://sg.linkedin.com/in/saverin","twitterAccounts":["esaverin"],"websites":[]},"industry":"Internet",%%
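A quick way to see the greedy behavior is to run the attempted pattern against the posted string. The following is a minimal JavaScript sketch (Node.js or a browser console is assumed; the variable names are illustrative only). (.*) first consumes the rest of the string, then backtracks only as far as it must to find a `",`, which here is the last one in the string:

```javascript
// The original single-line string, exactly as posted in the question.
const original =
  '%%"fullName":"Eduardo Saverin","contactInfo":{"publicProfileUrl":"https://sg.linkedin.com/in/saverin","twitterAccounts":["esaverin"],"websites":[]},"industry":"Internet",%%';

// Greedy (.*) consumes everything to the end of the string, then backtracks
// until a `",` follows, i.e. the LAST `",` so the capture runs past the URL.
const greedy = original.match(/"publicProfileUrl":"(.*)",/);

console.log(greedy[1]);
// https://sg.linkedin.com/in/saverin","twitterAccounts":["esaverin"],"websites":[]},"industry":"Internet
```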
ILikeTacos
  • Make your match "non-greedy". See http://stackoverflow.com/questions/11898998/how-can-i-write-a-regex-which-matches-non-greedy – Carsten Massmann Apr 01 '17 at 19:04

3 Answers


So, something like

\"publicProfileUrl\":\"(.*?)\",

should work.
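A minimal JavaScript sketch of the non-greedy version, which also touches on the truncation concern from the question. The helper extractProfileUrl and the two truncated sample strings are hypothetical, made up for illustration:

```javascript
// Non-greedy (.*?) expands one character at a time and stops at the FIRST `",`.
const LAZY = /"publicProfileUrl":"(.*?)",/;

// Hypothetical helper: returns the captured value, or null when the pattern
// does not match (e.g. the string was truncated inside the value).
function extractProfileUrl(text) {
  const m = text.match(LAZY);
  return m ? m[1] : null;
}

const original =
  '%%"fullName":"Eduardo Saverin","contactInfo":{"publicProfileUrl":"https://sg.linkedin.com/in/saverin","twitterAccounts":["esaverin"],"websites":[]},"industry":"Internet",%%';

// Hypothetical truncated inputs, for illustration only.
const cutAfterValue = '{"publicProfileUrl":"https://sg.linkedin.com/in/saverin","twitterAcc';
const cutInsideValue = '{"publicProfileUrl":"https://sg.linked';

console.log(extractProfileUrl(original));       // https://sg.linkedin.com/in/saverin
console.log(extractProfileUrl(cutAfterValue));  // https://sg.linkedin.com/in/saverin
console.log(extractProfileUrl(cutInsideValue)); // null
```

Note that the pattern still needs a comma right after the closing quote, so if publicProfileUrl were ever the last key in its object (followed by } instead), it would not match.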

If you want to be absolutely safe:

As others have pointed out, this is not always "watertight". In your current application (a URL!) it is probably not an issue, but in the general case we might encounter an escaped " followed by a comma, as in "this is \"it\", no doubt!", where the whole text is supposed to be part of our target string. The pattern above would end the target string prematurely at that escaped quote. If we modify the regexp slightly by adding [^\\] at the end of the capture group, the last captured character can no longer be a backslash, so an escaped \" is never taken as the closing quote, and even this nasty little pattern can no longer do any harm:

\"publicProfileUrl\":\"(.*?[^\\])\",
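To check the difference, here is a small JavaScript sketch using a hypothetical value that contains the escaped-quote-plus-comma sequence described above (the input string is invented for illustration):

```javascript
// Hypothetical input whose value contains an escaped quote followed by a comma.
// String.raw keeps the backslashes exactly as they would appear in the raw text.
const tricky = String.raw`"publicProfileUrl":"this is \"it\", no doubt!","industry":"Internet",`;

// Plain lazy version: stops at the first `",`, which here is the escaped one.
const lazy = tricky.match(/"publicProfileUrl":"(.*?)",/)[1];

// [^\\] forces the last captured character to be something other than a
// backslash, so the escaped `\",` is skipped and the real closing `",` is used.
const safe = tricky.match(/"publicProfileUrl":"(.*?[^\\])",/)[1];

console.log(lazy); // this is \"it\
console.log(safe); // this is \"it\", no doubt!
```

One caveat worth keeping in mind: the capture now needs at least one character, so an empty value ("") cannot be captured as empty, and a value that genuinely ends in an escaped backslash (\\) would also be skipped past.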
Carsten Massmann

In the capture group, add ? after .*; the ? means "as little as possible" (it makes the quantifier non-greedy):

\"publicProfileUrl\":\"(.*?)\",
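To illustrate what "as little as possible" means, a short JavaScript sketch on a throwaway string (the input is arbitrary):

```javascript
const s = "aaaa";

// Greedy quantifiers take as many characters as they can...
console.log(s.match(/a+/)[0]);     // aaaa
console.log(s.match(/a*/)[0]);     // aaaa

// ...while adding ? makes them lazy: as few as the rest of the pattern allows.
console.log(s.match(/a+?/)[0]);    // a
console.log(s.match(/a*?/)[0]);    // (empty string)
console.log(s.match(/a{2,}?/)[0]); // aa

// Lazy does not mean "shortest anywhere": the rest of the pattern must still match.
console.log(s.match(/a+?$/)[0]);   // aaaa
```

The last line shows that a lazy quantifier still returns the leftmost match; it only changes how far the quantifier extends from that starting point.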
Gabriele Petrioli
  • Awesome. Despite someone else having an identical answer, I'm choosing yours because you explained what the `?` means – ILikeTacos Apr 01 '17 at 19:12

Try excluding the double quote character from your capture:

\"publicProfileUrl\":\"([^"]*)\",

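Normally, line breaks would work around the greedy matching, because . does not match a newline, so .* could not run past the end of the publicProfileUrl line. Your original string has no line breaks, though, so .* runs on to the last ", in the string.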
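A JavaScript sketch of both points: the negated character class on the posted string, and the effect of line breaks on greedy (.*), plus the escaped-quote limitation raised in the comment on this answer. The multiline string reproduces the formatted example from the question; the escaped-quote input is hypothetical:

```javascript
const original =
  '%%"fullName":"Eduardo Saverin","contactInfo":{"publicProfileUrl":"https://sg.linkedin.com/in/saverin","twitterAccounts":["esaverin"],"websites":[]},"industry":"Internet",%%';

// [^"]* can never cross a double quote, so it stops at the end of the URL.
const negated = original.match(/"publicProfileUrl":"([^"]*)",/)[1];

// The same data with the line breaks from the formatted example.
// Without the s (dotAll) flag, `.` does not match "\n", so even greedy (.*)
// can only backtrack within the publicProfileUrl line.
const multiline = [
  '%%"fullName":"Eduardo Saverin",',
  '"contactInfo":{',
  '"publicProfileUrl":"https://sg.linkedin.com/in/saverin",',
  '"twitterAccounts":["esaverin"],',
  '"websites":[]},',
  '"industry":"Internet",%%',
].join("\n");
const greedyMultiline = multiline.match(/"publicProfileUrl":"(.*)",/)[1];

// Limitation: any quote inside the value, even an escaped one, makes the
// negated class fail here, so there is no match at all.
const escaped = String.raw`"publicProfileUrl":"a \"b\" c","x":1`;
const negatedEscaped = escaped.match(/"publicProfileUrl":"([^"]*)",/);

console.log(negated);         // https://sg.linkedin.com/in/saverin
console.log(greedyMultiline); // https://sg.linkedin.com/in/saverin
console.log(negatedEscaped);  // null
```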

strider
  • Using the non-greedy modifier `?` is better, though, in case your value happens to contain a (doubly?) escaped `"` – strider Apr 01 '17 at 19:20