0

I have a column in which a string starts with - 'Chicago, IL, April 20, 2015 — and so on text here'. I want to extract the Date part from this string in Oracle. Any ideas on how to do this. I was able to find something for mm/dd/yyyy like below, but not for long date format.

SELECT REGEXP_SUBSTR(' the meeting will be on 8/8/2008', '[0-9]{1,}/[0-9]{1,}/[0-9]{2,}') FROM dual 
user747291
  • 791
  • 2
  • 15
  • 39

3 Answers3

2

You could use:

SELECT  TO_DATE(
          REGEXP_SUBSTR(
            'Chicago, IL, April 20, 2015 — and so on text here',
            '(JANUARY|FEBRUARY|MARCH|APRIL|MAY|JUNE|JULY|AUGUST|SEPTEMBER|'
              || 'OCTOBER|NOVEMBER|DECEMBER)'
              || '[[:space:]]+([012]?[0-9]|3[01])'
              || '[[:punct:][:space:]]+\d{4}',
            1,
            1,
            'i'
          ),
          'MONTH DD YYYY'
        )
FROM    DUAL;

If you want to validate the dates as well (so you don't get an error for February 29, 2001) then you could use a user-defined function:

CREATE FUNCTION parse_Date(
  in_string     VARCHAR2,
  in_format     VARCHAR2 DEFAULT 'YYYY-MM-DD',
  in_nls_params VARCHAR2 DEFAULT NULL
) RETURN DATE DETERMINISTIC
AS
BEGIN
  RETURN TO_DATE( in_string, in_format, in_nls_params );
EXCEPTION
  WHEN OTHERS THEN
    RETURN NULL;
END;
/

And replace the TO_DATE( ... ) function with PARSE_DATE( ... )

MT0
  • 86,097
  • 7
  • 42
  • 90
  • More general answer, should be the accepted one. I'd suggest changing `[12]?` to `[012]?` to catch dates like "April 01" though. Also it's still prone to errors like `February 31`, but catching all that with regexp is way too much work. – Timekiller Jun 06 '16 at 15:08
  • @Timekiller Updated to `[012]` and also added a way to handle erroneous dates (adapted from [my answer here](http://stackoverflow.com/a/35270553/1509264)). – MT0 Jun 06 '16 at 15:22
1

If your columns value is always start with 'Chicago, IL, April 20, 2015 — and so on text here' then you could simly use SUBSTR instead of REGEXP_SUBSTR

SELECT 
    SUBSTR(column_name
        ,INSTR(column_name, ',', 1, 2) + 1
        ,INSTR(column_name, '—') - INSTR(column_name, ',', 1, 2) - 1
    ) 
FROM 
    dual;

If not then you could use REGEXP_SUBSTR as other answer mention, my original answer is wrong as @MTO comment

Pham X. Bach
  • 4,501
  • 2
  • 21
  • 33
  • The regular expression will match: `Regarding point 42, 1984 is a fantastic book`. or even `99, 0000`. – MT0 Jun 06 '16 at 14:38
  • MTO got it, thank you. @user747291 you should update your query according to other answer. Mine is editted. – Pham X. Bach Jun 06 '16 at 14:43
  • Your `SUBSTR/INSTR` query will only work for 5-character month names and 2-digit days. If you want to match `May 1, 1900` or `December 25, 2016` then they are not 14 characters. It also relies on the date being appended with a hyphen (not sure this is always the case). – MT0 Jun 06 '16 at 14:47
  • @MT0 I used **INSTR - 14** to get `length_of_substr`, not get only 14 characters. **14** is started index of `SUBSTR(column_name, started_index, length_of_substr)` – Pham X. Bach Jun 06 '16 at 14:50
  • Ok, but then it won't work if the month is at the start of the string or if the OP picks a longer or shorter city name or if there is no hyphen. 14 is a magic number that might work in this specific case but not generally. – MT0 Jun 06 '16 at 14:56
  • MTO yes, so @user747291 you should check again to get your right answer – Pham X. Bach Jun 06 '16 at 14:59
1

Well, you can take a direct approach and use a regular expression like in the example that you've found:

SELECT 
  REGEXP_SUBSTR('Chicago, IL, April 20, 2015 - etc etc', '(January|February|March|April|May|June|July|August|September|October|November|December) [0-9]{1,2}, [0-9]{4}')
FROM dual;

But this will only work properly if all the dates are in the exact same format. Full month name with first letter uppercased, space, day, comma, space, 4-digit year. If there can be more than one space or no space at all, use \s* instead of spaces in the regular expression. If the month name isn't necessarily initcap, use initcap() on source or case-insensitive flag for regexp_substr function.

Additionally, this will catch bogus dates that fit the format, like "April 99, 1234", you'll have to filter them later.

Timekiller
  • 2,546
  • 1
  • 14
  • 15