I'm trying to match several date formats in Spanish with a regular expression.


For example:

  • 03 DE Mayo 2020
  • 03 DE May. 2020
  • 03 DE May. del 2020
  • 03 DE Mayo 20
  • 03 DE May. 20
  • 03 DE May. del 20
  • 3 DE Mayo 2020
  • 3 DE May. 2020
  • 3 DE May. del 2020
  • 3 DE Mayo 20
  • 3 DE May. 20
  • 3 DE May. del 20

What could be the right regex to match this?

Wiktor Stribiżew

1 Answer

The following regex matches all of the date formats you listed. However, you haven't described what you want the regex to exclude, so please let me know if this isn't selective enough.

^\d{1,2} DE (?:\w{3}\.|\w+) (?:del )?(?:\d{4}|\d{2})$

Explanation:

  • ^ Anchors the match to the start of the string
  • \d{1,2} Matches one or two digit characters (this is the day)
  • " DE " Literally matches " DE " (including the surrounding spaces), which is present in all of the formats
  • (?:\w{3}\.|\w+) Matches either three word characters followed by a literal dot, or one or more word characters (this is the month)
  • " " Matches a literal space (there is always a space after the month)
  • (?:del )? Matches "del " if it is there, but doesn't require it
  • (?:\d{4}|\d{2}) Matches either four or two digit characters (this is the year)
  • $ Anchors the match to the end of the string
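
As a quick sanity check, the pattern can be run against the sample dates from the question. This is only a minimal sketch: the class name SpanishDateCheck and the helper isSpanishDate are hypothetical names chosen for illustration, and the backslashes are doubled because the regex is written as a Java string literal.

```java
import java.util.regex.Pattern;

public class SpanishDateCheck {
    // Same pattern as above, written as a Java string literal (backslashes doubled).
    private static final Pattern DATE = Pattern.compile(
            "^\\d{1,2} DE (?:\\w{3}\\.|\\w+) (?:del )?(?:\\d{4}|\\d{2})$");

    // Hypothetical helper: true if the whole string is one of the listed formats.
    static boolean isSpanishDate(String s) {
        return DATE.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "03 DE Mayo 2020", "03 DE May. 2020", "03 DE May. del 2020",
            "03 DE Mayo 20", "03 DE May. 20", "03 DE May. del 20",
            "3 DE Mayo 2020", "3 DE May. 2020", "3 DE May. del 2020",
            "3 DE Mayo 20", "3 DE May. 20", "3 DE May. del 20",
            // Inputs that should be rejected: lowercase "de", 3-digit day, 3-digit year.
            "3 de Mayo 2020", "003 DE Mayo 2020", "3 DE Mayo 202"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + isSpanishDate(s));
        }
    }
}
```

Note that the pattern is case-sensitive, so a lowercase "de" is rejected; compiling with Pattern.CASE_INSENSITIVE would relax that if needed.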

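In Java, to test whether a string str is in one of those date formats, you could use the following (String.matches already requires the whole string to match, so the ^ and $ anchors are redundant here but harmless):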

str.matches("^\\d{1,2} DE (?:\\w{3}\\.|\\w+) (?:del )?(?:\\d{4}|\\d{2})$");

Or if you want to pull out a list of all occurrences of those date formats in str, you could use:

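import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// No ^ or $ anchors here, so the pattern can match anywhere inside str.
ArrayList<String> matches = new ArrayList<>();
Matcher regex = Pattern.compile("\\d{1,2} DE (?:\\w{3}\\.|\\w+) (?:del )?(?:\\d{4}|\\d{2})").matcher(str);
while (regex.find()) {
    matches.add(regex.group());
}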
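
If you also need the individual parts of each date, one possible variation is to use named capturing groups. The sketch below is an assumption on my part rather than something the question asked for: the class name SpanishDateExtract, the method extract, the group names day, month and year, and the \b word boundaries (added so the pattern does not match inside a longer run of digits or letters) are all illustrative choices.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SpanishDateExtract {
    // Unanchored variant with named groups; \b boundaries are an added assumption.
    private static final Pattern DATE = Pattern.compile(
            "\\b(?<day>\\d{1,2}) DE (?<month>\\w{3}\\.|\\w+) (?:del )?(?<year>\\d{4}|\\d{2})\\b");

    // Returns one {day, month, year} array per date found in the text.
    static List<String[]> extract(String text) {
        List<String[]> found = new ArrayList<>();
        Matcher m = DATE.matcher(text);
        while (m.find()) {
            found.add(new String[] { m.group("day"), m.group("month"), m.group("year") });
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "Firmado el 03 DE May. del 2020 y el 3 DE Mayo 20.";
        for (String[] parts : extract(text)) {
            System.out.println(String.join(" | ", parts));
        }
    }
}
```

The month is returned exactly as written (for example "May." with its dot), so mapping it to a month number would still need a separate lookup.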
Charlie Armstrong