10

I am struggling to build a regular expression for parsing strings like these (Bible references):

  'John 14:16–17, 25–26'
  'John 14:16–17'
  'John 14:16'
  'John 14'
  'John'

So the basic pattern is:

Book [[Chapter][:Verse]]

where the chapter and verse are optional.

Dr.Kameleon
Dziamid
  • So it should match even if it's just the book's name? Do you have a list of books that it should match? Otherwise it would just match every word. – JJJ Apr 02 '12 at 09:36
  • Just match any word, the real problem for me is having so many optional parts. – Dziamid Apr 02 '12 at 09:41

5 Answers

9

I think this does what you need:

\w+\s?(\d{1,2})?(:\d{1,2})?([-–]\d{1,2})?(,\s\d{1,2}[-–]\d{1,2})?

Assumptions:

  • The numbers are always in sets of either 1 or 2 digits
  • The dash will match either of the following: - (hyphen) and – (en dash)
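
The assumptions above can be checked against the five sample strings from the question. The following is a minimal sketch in Python rather than PHP (the pattern itself is unchanged); the names `PATTERN`, `SAMPLES`, and `parse` are made up for illustration, and `fullmatch` is used so that the whole string has to fit the pattern.

```python
import re

# Same pattern as above; Python 3 treats str patterns as Unicode,
# so the en dash inside [-–] is a single character.
PATTERN = re.compile(
    r"\w+\s?(\d{1,2})?(:\d{1,2})?([-–]\d{1,2})?(,\s\d{1,2}[-–]\d{1,2})?"
)

# The five sample strings from the question.
SAMPLES = [
    "John 14:16–17, 25–26",
    "John 14:16–17",
    "John 14:16",
    "John 14",
    "John",
]

def parse(reference):
    """Return the four captured groups, or None if the whole string does not match."""
    m = PATTERN.fullmatch(reference)
    return m.groups() if m else None

if __name__ == "__main__":
    for sample in SAMPLES:
        print(repr(sample), "->", parse(sample))
```

Because of the one-or-two-digit assumption, a reference such as "Psalm 119:105" is rejected by `parse`, and only a single trailing range after the comma is accepted.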

Below is the regex with comments:

"
\w         # Match a single character that is a “word character” (letters, digits, and underscores)
   +          # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
\s         # Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
   ?          # Between zero and one times, as many times as possible, giving back as needed (greedy)
(          # Match the regular expression below and capture its match into backreference number 1
   \d         # Match a single digit 0..9
      {1,2}      # Between one and 2 times, as many times as possible, giving back as needed (greedy)
)?         # Between zero and one times, as many times as possible, giving back as needed (greedy)
(          # Match the regular expression below and capture its match into backreference number 2
   :          # Match the character “:” literally
   \d         # Match a single digit 0..9
      {1,2}      # Between one and 2 times, as many times as possible, giving back as needed (greedy)
)?         # Between zero and one times, as many times as possible, giving back as needed (greedy)
(          # Match the regular expression below and capture its match into backreference number 3
   [-–]       # Match a single character present in the list “-–”
   \d         # Match a single digit 0..9
      {1,2}      # Between one and 2 times, as many times as possible, giving back as needed (greedy)
)?         # Between zero and one times, as many times as possible, giving back as needed (greedy)
(          # Match the regular expression below and capture its match into backreference number 4
   ,          # Match the character “,” literally
   \s         # Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
   \d         # Match a single digit 0..9
      {1,2}      # Between one and 2 times, as many times as possible, giving back as needed (greedy)
   [-–]       # Match a single character present in the list “-–”
   \d         # Match a single digit 0..9
      {1,2}      # Between one and 2 times, as many times as possible, giving back as needed (greedy)
)?         # Between zero and one times, as many times as possible, giving back as needed (greedy)
"

And here are some examples of its usage in php:

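# The u modifier makes PCRE treat pattern and subject as UTF-8; without it the en dash in [-–] is handled as separate bytes
if (preg_match('/\w+\s?(\d{1,2})?(:\d{1,2})?([-–]\d{1,2})?(,\s\d{1,2}[-–]\d{1,2})?/u', $subject)) {
    # Successful match
} else {
    # Match attempt failed
}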

Get an array of all matches in a given string:

# The u modifier is needed here as well so that the en dash is matched as one character
preg_match_all('/\w+\s?(\d{1,2})?(:\d{1,2})?([-–]\d{1,2})?(,\s\d{1,2}[-–]\d{1,2})?/u', $subject, $result, PREG_PATTERN_ORDER);
$result = $result[0];
Robbie
4

Try this:

\b[a-zA-Z]+(?:\s+\d+)?(?::\d+(?:–\d+)?(?:,\s*\d+(?:–\d+)?)*)?

See and test it here on Regexr

Because of the (?:,\s*\d+(?:–\d+)?)* at the end, the reference can finish with a list of verses or verse ranges.
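
As a rough illustration of how that trailing list could be consumed, here is a sketch in Python. It assumes two changes that are not in the pattern above: capturing groups around the book, chapter, and verse list, and `[-–]` in place of the en dash so that a plain hyphen is accepted too. The names `REFERENCE` and `parse_reference` are hypothetical.

```python
import re

# Variant of the pattern above with capturing groups and [-–] (both are assumptions).
REFERENCE = re.compile(
    r"\b([a-zA-Z]+)"                # book name
    r"(?:\s+(\d+))?"                # optional chapter
    r"(?::(\d+(?:[-–]\d+)?"         # optional first verse or verse range
    r"(?:,\s*\d+(?:[-–]\d+)?)*))?"  # any number of further verses or ranges
)

def parse_reference(text):
    """Split a reference into book, chapter, and a list of (start, end) verse pairs."""
    m = REFERENCE.fullmatch(text)
    if m is None:
        return None
    book, chapter, verse_list = m.groups()
    verses = []
    if verse_list:
        for part in re.split(r",\s*", verse_list):
            # Normalise the en dash to a hyphen, then split the range once.
            start, _, end = part.replace("–", "-").partition("-")
            verses.append((int(start), int(end) if end else int(start)))
    return {
        "book": book,
        "chapter": int(chapter) if chapter else None,
        "verses": verses,
    }

if __name__ == "__main__":
    print(parse_reference("John 14:16–17, 25–26"))
```

A single verse is returned as a pair with equal start and end, which keeps the result shape uniform for callers.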

stema
  • Yours is the most general one. I only added `[-–]` instead of hyphen as @Robby suggested and some capturing brackets to make it perfect. – Dziamid Apr 02 '12 at 09:59
3

Use this regex:

[A-Za-z]+( ([0-9]+)(:[0-9]+)?([\-–][0-9]+)?(, [0-9]+[\-–][0-9]+)?)?

Or in its 'prettier' version:

\w+( (\d+)(:\d+)?([\-–]\d+)?(, \d+[\-–]\d+)?)?

UPDATED: To match dashes or hyphens

NOTE: I've tested it and it matches ALL 5 possible versions.

Example : http://regexr.com?30h4q
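
The claim that all five versions match can be reproduced with a short script. This is a sketch in Python, with the hypothetical names `STRICT`, `SHORT`, and `matches_all`; it also shows that the two spellings are not strictly equivalent, because `\w` accepts digits and underscores where `[A-Za-z]` does not.

```python
import re

# The two patterns from this answer, unchanged.
STRICT = re.compile(r"[A-Za-z]+( ([0-9]+)(:[0-9]+)?([\-–][0-9]+)?(, [0-9]+[\-–][0-9]+)?)?")
SHORT = re.compile(r"\w+( (\d+)(:\d+)?([\-–]\d+)?(, \d+[\-–]\d+)?)?")

# The five sample strings from the question.
SAMPLES = [
    "John 14:16–17, 25–26",
    "John 14:16–17",
    "John 14:16",
    "John 14",
    "John",
]

def matches_all(pattern, samples=SAMPLES):
    """True if every sample is matched in full by the pattern."""
    return all(pattern.fullmatch(s) is not None for s in samples)

if __name__ == "__main__":
    print("strict:", matches_all(STRICT))
    print("short: ", matches_all(SHORT))
    # A string with no letters at all: only the \w version accepts it.
    print(STRICT.fullmatch("12 23:34"), SHORT.fullmatch("12 23:34"))
```

If the book name must consist of letters only, the `[A-Za-z]` version is the safer choice of the two.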

Dr.Kameleon
0
([123]?([iI]+)?(\s?)\w+(\s+?))((\d+)?(,?)(\s?)(\d+))+(:?)((\d+)?([\-–]\d+)?(,(\s?)\d+[\-–]\d+)?)?

works for almost every book...

laalto
0
   (\b[a-zA-Z]\w+\s\d+)(:\d+)+([-–]\d+)?([,;](\s)?(\d+:)?\d+([-–]\d+)?)?

This is a hybrid of the patterns presented here. The only formats it will not highlight are "book name only" and "book & chapter only" (as a workaround, add ":1-all" after the chapter number). I found that the other patterns accept too many variations that are not in line with Bible verse syntax.

These are the examples I tested in RegExr: (can't post images yet)

John humbolt 14:16–17, 25–26
John 14:16–17
John 14:16
John 77:3; 2:9-11
John 5:1-all brad 555-783-6867
John 6
hi there how are you
Ezra 32:5 John 14:16-17, 25-36
12 23:34
John 14:16-17,25-36
John 14:16-17; 32:25
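
Since the screenshots could not be posted, the behaviour on these lines can be reproduced with a small script. This is a sketch in Python using the pattern from this answer unchanged; `PATTERN`, `EXAMPLES`, and `find_references` are made-up names, and `finditer` is used to mimic how RegExr highlights every match in a line.

```python
import re

# The pattern from this answer, unchanged.
PATTERN = re.compile(
    r"(\b[a-zA-Z]\w+\s\d+)(:\d+)+([-–]\d+)?([,;](\s)?(\d+:)?\d+([-–]\d+)?)?"
)

# The test lines listed above.
EXAMPLES = [
    "John humbolt 14:16–17, 25–26",
    "John 14:16–17",
    "John 14:16",
    "John 77:3; 2:9-11",
    "John 5:1-all brad 555-783-6867",
    "John 6",
    "hi there how are you",
    "Ezra 32:5 John 14:16-17, 25-36",
    "12 23:34",
    "John 14:16-17,25-36",
    "John 14:16-17; 32:25",
]

def find_references(text):
    """Return the full text of every match found in the given line."""
    return [m.group(0) for m in PATTERN.finditer(text)]

if __name__ == "__main__":
    for line in EXAMPLES:
        print(repr(line), "->", find_references(line))
```

Note that in the first line only "humbolt 14:16–17, 25–26" is picked up, because the pattern expects a single word directly before the chapter number.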