2

I'm looking for a regex to extract two numbers from the same text (they can be extracted independently; there is no need to get both in one go).

I'm using Yahoo Pipes.

Source Text: S$ 5,200 / month Negotiable, 1,475 sqft / 137 sqm (built-in) - Apartment, 10 Anson Road (D02)

I need to extract 1,475 as a number, and also 137 (which can be extracted in a separate pass).

I got the following pattern from someone quite helpful on a different forum:

\b(\d+(,\d+)*)\s+(sqft|sqm)

but when I use it with a replacement of $1, it returns the whole source text instead of just the number I want (i.e. 1,475 or 137, depending on whether I run \b(\d+(,\d+)*)\s+(sqft) or \b(\d+(,\d+)*)\s+(sqm)).

What am I doing wrong?

Ahmad Mageed
macutan
  • What language? Regex has many variations. Can you post source code for what you have tried so far? – Mark Byers Feb 11 '10 at 02:20
  • If he's using the manual Yahoo! Pipes system (and I'm not even sure there's an API), then it's whatever Yahoo! uses, which does not involve source code but rather a GUI. – Matchu Feb 11 '10 at 02:23
  • It appears to support PCRE. See the link posted in the comments in my answer. – Nick Presta Feb 11 '10 at 02:30
  • @Matchu I believe the GUI is it. All I found was http://pipes.yahoo.com/pipes/docs?doc=operators#Regex and it doesn't mention the regex flavor used. – Ahmad Mageed Feb 11 '10 at 02:30

4 Answers

2

Well you could do this by iterating through the matches and getting the results that way.
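
As a rough sketch of the iterating approach, here it is in Python (so outside Yahoo Pipes, purely to illustrate the idea). It reuses the pattern from the question; the variable names and the comma-stripping step are my own assumptions, not anything Pipes provides.

```python
import re

text = ("S$ 5,200 / month Negotiable, 1,475 sqft / 137 sqm (built-in) "
        "- Apartment, 10 Anson Road (D02)")

# Pattern from the question: a number with optional comma groups, then a unit.
pattern = re.compile(r"\b(\d+(,\d+)*)\s+(sqft|sqm)")

# Walk over every match and store each number under its unit.
sizes = {}
for match in pattern.finditer(text):
    number, unit = match.group(1), match.group(3)
    # Assumption: commas are thousands separators, so they can be dropped.
    sizes[unit] = int(number.replace(",", ""))

print(sizes)
```

Iterating avoids the replace problem entirely, because you only ever look at the captured groups and never at the rest of the string.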

But if you want to use the replace method then this could work:

^.*?(?<sqft>\d+(,\d+)*)\s?sqft.*?(?<sqm>\d+(,\d+)*)\s?sqm.*$

And then replace with:

${sqft}
${sqm}

Here it is in action.

This will work with or without a comma in the sqft or sqm numbers. And the .* at the beginning, middle, and end forces it to match the entire string so that the replacement text eliminates everything except for what you're after.
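
To make the difference concrete, here is a small Python sketch of both behaviours. Note that Python spells named groups `(?P<name>...)` and references them as `\g<name>`, which differs from the `(?<name>...)` / `${name}` syntax above, and the variable names are just for illustration.

```python
import re

text = ("S$ 5,200 / month Negotiable, 1,475 sqft / 137 sqm (built-in) "
        "- Apartment, 10 Anson Road (D02)")

# Unanchored pattern: only "1,475 sqft" is matched and replaced,
# so everything else in the source text is left in place.
partial = re.sub(r"\b(\d+(,\d+)*)\s+sqft", r"\1", text)

# Pattern padded with .* so the match spans the entire string;
# the replacement then keeps nothing except the chosen group.
full = re.compile(
    r"^.*?(?P<sqft>\d+(,\d+)*)\s?sqft.*?(?P<sqm>\d+(,\d+)*)\s?sqm.*$"
)
sqft_only = full.sub(r"\g<sqft>", text)
sqm_only = full.sub(r"\g<sqm>", text)

print(partial)
print(sqft_only)
print(sqm_only)
```

The first result still contains the address and the price, which is the "whole source text" symptom from the question; the other two contain only the number.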

Steve Wortham
  • this worked like a charm!!, how did you get it so fast?!?? thanks a lot! – macutan Feb 11 '10 at 02:31
  • @macutan: Be sure to click on the check mark next to an answer if it answered your question, so that the poster gets credit :) – Matchu Feb 11 '10 at 02:34
  • @macutan: Great. I actually just changed the number matching scheme a bit. The repeatable commas are a nice feature in your original post so it can match numbers like 1,234,567. So my revised regex above has incorporated that feature. And I've been practicing regular expressions a lot the past several months. I guess I've gotten faster. ;) – Steve Wortham Feb 11 '10 at 02:38
  • thanks steve. when try your pattern within the regex module in my pipe http://pipes.yahoo.com/pipes/pipe.edit?_id=c6af42d4ebb8a2afc2f139338bf9f627 (see middle column) i get a lot of the below problem in matching expression problem in matching expression problem in matching expression but when i put it on the site i use to test it http://www.gskinner.com/RegExr/ it seems to work,. what should i put within my yahoo pipe replace box for it to work? thanks to all once again for all of the answers... – macutan Feb 11 '10 at 03:30
  • thanks Steve, this worked, i just tried it again and refreshed the pipe and it worked!!!! Thank you All for your help. Ahmad, Nick, Don and Steve!!. – macutan Feb 11 '10 at 03:42
0

Since you didn't specify a language, here is some Python:

import re

s = "$ 5,200 / month Negotiable, 1,475 sqft / 137 sqm (built-in) - Apartment, 10 Anson Road (D02)"
print(re.search(r'\b([0-9.,]+) ?sqft ?/ ?([0-9.,]+) ?sqm', s).groups())
# prints ('1,475', '137')

This searches for a run of digits, commas, or periods after a word boundary, followed by an optional space and the word 'sqft', then an optional space, a slash, and another optional space, followed by a second run of digits, commas, or periods, an optional space, and the word 'sqm'.

This should allow your formatting to be pretty loose (optional spaces, thousands and decimal separators).

Nick Presta
  • wow, that was very fast, i am using whatever language the regex module uses in yahoo pipe, how do i check that? usually what i test there it works on http://www.gskinner.com/RegExr/ – macutan Feb 11 '10 at 02:27
  • Does Yahoo! Pipes allow usage in any actual programming language, or just in its GUI? I thought it was the latter, but it's been years since I've actually touched them. – Matchu Feb 11 '10 at 02:28
  • Pipes appears to support PCRE (http://rsscases.marketingstudies.net/content/yahoo_pipes_regex_module.php) so what I posted above should work: `\b([0-9.,]+) ?sqft ?/ ?([0-9.,]+) ?sqm` – Nick Presta Feb 11 '10 at 02:29
  • what i am trying to achieve is on the below pipe link http://pipes.yahoo.com/pipes/pipe.edit?_id=c6af42d4ebb8a2afc2f139338bf9f627 on the middle column within the regex module item.size_sqft, so in there i would like to put the pattern ideally get (within the debug panel below that item.size_sqft: 1,475 i try your patterns in http://www.gskinner.com/RegExr/ but when i go to the replace it doesn't seem to just give me the number and instead the line again with the number... where am i wrong? – macutan Feb 11 '10 at 03:26
0

In perl, I would write something like:

# The binding operator is =~ (not ~=); $1 holds the first capture group.
if ($line =~ m/\b([0-9.,]+) sqft/)
{
  $sqft = $1;
}
else
{
  $sqft = undef;
}

if ($line =~ m/\b([0-9.,]+) sqm/)
{
  $sqm = $1;
}
else
{
  $sqm = undef;
}
Don
0

You may wish to consider the situations discussed in this answer in crafting a regex for numbers.

Community
tchrist