1

Hi i'm trying to extract all numbers (including decimals) before the ' character using PostgreSQL.

I'm not sure how to specify this because my regular expression is contained between these characters itself. See:

SUBSTRING(long_text, '[+-]?([0-9]+([.][0-9]*)?|[.][0-9]+)(?=')') as substring_feet

This results in the following error:

syntax error at or near "') as substring_feet
            --,SUBSTRING(long_text, (position((substring(long_text FROM '"
LINE 6: ...ng_text, '[+-]?([0-9]+([.][0-9]*)?|[.][0-9]+)(?=')') as subs...
                                                         ^

I have tried various escapes etc. but can't figure it out.

The expression passes ok using the online tester because this doesn't account for the quotes needed for my query.

Beeman
  • 43
  • 8

1 Answers1

1

To extract all occurrences, you may use regex_matches function with g flag, but you must also make sure you have no capturing groups in the pattern, replace them with non-capturing ones (i.e. (?:...)). To use a single quote, double it (it is common knowledge).

Use something like

SELECT regexp_matches(long_text, '[+-]?(?:[0-9]+(?:[.][0-9]*)?|[.][0-9]+)(?='')', 'g') as substring_feet

Note you may get rid of the lookahead if you wrap the part you want to extract with a capturing group (i.e. (...)):

SELECT regexp_matches(long_text, '([+-]?(?:[0-9]+(?:[.][0-9]*)?|[.][0-9]+))''', 'g') as substring_feet

See an online demo

If you really have an inverted comma, use

SELECT regexp_matches(long_text, '([+-]?(?:[0-9]+(?:[.][0-9]*)?|[.][0-9]+))’', 'g') as substring_feet

See another online demo.

Wiktor Stribiżew
  • 484,719
  • 26
  • 302
  • 397
  • In your online example the input text is 'approx 3''' whereas in all the text fields I need to search it the text occurs as approx 5'. roughly 2'. height of 3'. etc. (i.e the inverted comma is used just once) – Beeman Apr 13 '18 at 11:42
  • @Beeman Was it an inverted comma? Why didn't you use it in the example in the question? Then use http://rextester.com/WBPB26247 – Wiktor Stribiżew Apr 13 '18 at 11:48
  • sorry i've got my terminology mixed up: not an inverted comma but a single quotation mark as described in the example. – Beeman Apr 14 '18 at 11:20
  • @Beeman Then my answer should be working for you. Please consider accepting then. – Wiktor Stribiżew Apr 14 '18 at 11:51
  • hi, no the example doesn't work because in my example the long_text field refers to a a column on a table that contains long passage of text. See my online example in the original question. – Beeman Apr 15 '18 at 12:13
  • @Beeman [It **WORKS**](https://regex101.com/r/D6q423/1), it extracts `3`. – Wiktor Stribiżew Apr 15 '18 at 12:53
  • got it! thanks for your help and sorry for my confusion. – Beeman Apr 15 '18 at 16:13