
I am using a data analysis package that exposes a Regex function for string parsing. I am trying to parse a response from a website that is in the format...

key1=val1&key2=val2&key3=val3 ...

[The keys and values may be percent-encoded, but the values currently returned are not: they are tokens and other alphanumeric data.]

I understand this format to be application/x-www-form-urlencoded, also known as query string format.

The goal is to extract the value for a given key when the order of the keys cannot be relied upon. For example, I might know that one of the keys I should receive is "token", so what regex pattern can I use to extract the value for the key "token"? I have searched for this but cannot find anything that does what I need; if this is a duplicate question, apologies in advance.

Wiktor Stribiżew
Steve Hibbert
  • Try the usual `[?&]token=([^&]*)`, a capturing group based pattern. Or a lookbehind one, `(?<=[?&]token=)[^&]*`. – Wiktor Stribiżew Oct 23 '17 at 08:20
  • Which programming language do you use? Probably there's a function for that common task... – Jan Oct 23 '17 at 08:26
  • @Wiktor Stribiżew Found something just now, seems it is simpler than I thought it would be... `key2=([^\&]+)`. Forgot to say thanks for the response, much obliged. – Steve Hibbert Oct 23 '17 at 08:29
  • @Jan It's an app called Alteryx, not a language, but I think it is written in C#. There are lovely query string parsers in C#, but they are not directly exposed. Thanks for the response. – Steve Hibbert Oct 23 '17 at 08:32

1 Answer


In Alteryx, you may use Tokenize with a regex containing a capturing group around the part you need to extract:

The Tokenize Method allows you to specify a regular expression to match on and that part of the string is parsed into separate columns (or rows). When using the Tokenize method, you want to match to the whole token, and if you have a marked group, only that part is returned.

The last clause of that description (if you have a marked group, only that part is returned) shows that when the pattern contains a capturing group, only that group is returned rather than the whole match.

Thus, you may use

(?:^|[?&])token=([^&]*)

Replace token with whichever key's value you want to extract.

See the regex demo.
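To try the pattern outside Alteryx, here is a minimal sketch in Python. It assumes Python's `re` module treats this particular pattern the same way the Alteryx regex engine does, and `extract_value` is a hypothetical helper name used only for illustration:

```python
import re


def extract_value(query, key):
    """Return the value for `key` in a key=value&key=value string, or None.

    Hypothetical helper: it builds the pattern from the answer and
    returns capturing group 1, which is the part Tokenize would return.
    """
    # re.escape guards against keys that contain regex metacharacters.
    pattern = r"(?:^|[?&])" + re.escape(key) + r"=([^&]*)"
    match = re.search(pattern, query)
    return match.group(1) if match else None


sample = "key1=val1&token=abc123&key3=val3"
reordered = "token=abc123&key3=val3&key1=val1"

print(extract_value(sample, "token"))
print(extract_value(reordered, "token"))
print(extract_value(sample, "missing"))
```

Because the pattern anchors on the start of the string, `?` or `&` rather than on a position, the key is found wherever it appears in the string.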

Details

  • `(?:^|[?&])` - the start of a string, `?` or `&` (if the string is just a plain key-value pair string, you may omit `?` and use `(?:^|&)` or `(?<![^&])`)
  • `token` - the key
  • `=` - an equal sign
  • `([^&]*)` - Group 1 (this will get extracted): 0 or more chars other than `&` (if you do not want to extract empty values, replace `*` with the `+` quantifier).
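The variations listed above can be compared side by side. The following is a rough sketch in Python rather than Alteryx, so it assumes the two regex engines agree on these constructs. The sample string and variable names are made up for illustration, and the `unquote_plus` step is an addition that only matters if the values ever arrive percent-encoded:

```python
import re
from urllib.parse import unquote_plus

# Made-up sample: a look-alike key, an empty value, and an encoded value.
data = "mytoken=wrong&token=&name=J%20Doe"

# Without the leading alternation, the pattern matches inside "mytoken".
unanchored = re.search(r"token=([^&]*)", data).group(1)

# Both anchored forms only accept "token" at the start or right after "&".
anchored = re.search(r"(?:^|&)token=([^&]*)", data).group(1)
lookbehind = re.search(r"(?<![^&])token=([^&]*)", data).group(1)

# With "+" instead of "*", an empty value is not matched at all.
non_empty = re.search(r"(?:^|&)token=([^&]+)", data)

# Percent-decoding is a separate step applied to the extracted value.
raw_name = re.search(r"(?:^|&)name=([^&]*)", data).group(1)
decoded_name = unquote_plus(raw_name)

print(repr(unanchored), repr(anchored), repr(lookbehind))
print(non_empty, raw_name, decoded_name)
```

The unanchored search picks up the value of `mytoken`, which is why the leading group matters whenever one key name can be the suffix of another.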
Wiktor Stribiżew