14

I was learning regular expressions in iOS and saw this tutorial: http://www.raywenderlich.com/30288/nsregularexpression-tutorial-and-cheat-sheet

It reads like this for \b:

\b matches word boundary characters such as spaces and punctuation. to\b will match the "to" in "to the moon" and "to!", but it will not match "tomorrow". \b is handy for "whole word" type matching.

and \s:

\s matches whitespace characters such as spaces, tabs, and newlines. hello\s will match "hello " in "Well, hello there!".

I have two questions on this:

1) What is the difference between \s and \b? When should I use which?

2) "\b is handy for 'whole word' type matching": I don't understand what this means.

Need some guidance on these two.

georg
lakesh
  • Assertions in regexes are like "IFs" in conventional programming. `foo\b` matches "foo" IF it's followed by a non-word char. – georg Jun 10 '13 at 09:09
  • @thg435 First of all, thanks. I have a question: what are assertions in regexes? Do you have an example? – lakesh Jun 10 '13 at 09:22
  • `\b` in your question is an assertion. Other examples are anchors like `^`, `$` and lookarounds. – georg Jun 10 '13 at 09:27

4 Answers

22

\b Boundary characters

\b matches the boundary itself, not the boundary character (such as a comma or period). It has zero length, but it can be used to find, for example, an e at the end of a word.

For example in the sentence: "Hello there, this is one test. Testing"

The regex e\b will match an e if it's at the end of the word (followed by a word boundary). Notice in the image below that the e in "test" and "Testing" didn't match since the "e" is not followed by a boundary.

[Image 1: matches of e\b highlighted in the example sentence]
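The same check can be reproduced in code. This is a minimal sketch using Python's re module as a stand-in (an assumption, since the question is about NSRegularExpression; for plain ASCII text like this the two engines are expected to agree on \b). The variable names are illustrative.

```python
import re

sentence = "Hello there, this is one test. Testing"

# e\b: an "e" that is immediately followed by a word boundary.
# The boundary is zero-width, so each match is just the single "e".
e_boundary = [(m.start(), m.group()) for m in re.finditer(r"e\b", sentence)]
print(e_boundary)
```

Only the final e of "there" and of "one" is reported; the e in "test" and "Testing" is followed by another word character, so there is no boundary after it.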

\s Whitespace

\s on the other hand matches the actual white space characters (like spaces and tabs). In the same sentence it will match all the spaces between the words.

[Image 2: matches of \s highlighted in the example sentence]
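A small sketch of the \s case, again assuming Python's re module rather than NSRegularExpression. It lists the position of every whitespace character in the example sentence.

```python
import re

sentence = "Hello there, this is one test. Testing"

# \s consumes one whitespace character per match.
space_positions = [m.start() for m in re.finditer(r"\s", sentence)]
print(space_positions)
print(len(space_positions), "whitespace matches")
```

Each match here is a real character (a space), unlike \b, which matches a position between characters.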


Edit

Since \b doesn't make much sense on its own, I showed how to use it as e\b (above). The OP asked (in a comment) what e\s would match compared to e\b, to better explain the difference between \b and \s.

In the same string there is only one match for e\s, while there were two matches for e\b, since the comma is not whitespace. Note that the e\s match (image 3) includes the whitespace, whereas the e\b match (image 1) doesn't.

[Image 3: matches of e\s highlighted in the example sentence]
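The comparison can also be sketched side by side. This assumes Python's re module as a stand-in for NSRegularExpression; the names below are illustrative.

```python
import re

sentence = "Hello there, this is one test. Testing"

# e\b consumes only the "e"; the boundary after it has no width.
with_boundary = [m.group() for m in re.finditer(r"e\b", sentence)]

# e\s consumes the "e" AND the whitespace character that follows it.
with_whitespace = [m.group() for m in re.finditer(r"e\s", sentence)]

print(with_boundary, [len(s) for s in with_boundary])
print(with_whitespace, [len(s) for s in with_whitespace])
```

The "e" in "there," is found by e\b but not by e\s, because the character after it is a comma rather than whitespace.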

David Rönnqvist
  • @lakesh `e\s` would be a two-character match. `e\b` will match only one character. Comparison between these two would probably explain it better than `e\b` vs `\s`. – Sulthan Jun 10 '13 at 09:24
3
  • \b matches a word boundary. It is a zero-width assertion, meaning it does not match a character; it matches a position where a certain condition is true.

    \b is related to \w. \w defines "word characters": letters, digits, and underscores. So \b matches at a change from a word character to a non-word character, or the other way round. That means it matches at the start and end of a word, but does not match the character before or after the word.

  • \s is a predefined character class that matches any whitespace character.

See and try out what \bFoo\b matches here on Regexr

See and try out what \sFoo\s matches here on Regexr
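For a quick local check of the same two patterns, here is a minimal sketch in Python's re module (an assumption; it is not the Regexr demo, and the sample text is made up for illustration).

```python
import re

# Hypothetical sample text, chosen to show the different cases.
text = "Foo is Foo, not FooBar; say Foo twice"

# \bFoo\b: "Foo" as a whole word, wherever the word sits.
whole_word = [(m.start(), m.group()) for m in re.finditer(r"\bFoo\b", text)]

# \sFoo\s: "Foo" only when a whitespace character sits on both sides,
# and those whitespace characters become part of the match.
spaced = [m.group() for m in re.finditer(r"\sFoo\s", text)]

print(whole_word)
print(spaced)
```

\bFoo\b finds the Foo at the start of the string and the one before the comma, which \sFoo\s cannot, and neither pattern matches the Foo inside FooBar.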

stema
  • First, thanks for replying. Why do you use /\bFoo? Why do you need the extra /? Can you explain? – lakesh Jun 10 '13 at 14:31
  • Sorry that is Perl syntax, just a regex delimiter, but I used it only on one expression. Removed – stema Jun 10 '13 at 14:33
2

\b is zero-width. That is, it doesn't actually match any character. Meanwhile, \s does match a character. This is an important distinction for capturing and more complicated regular expressions.

For example, say you're trying to match numbers that begin with multiple zeros, like 007 or 000101101. You might try:

0+\d*

But see, that would also match inside 1007 and 101000101101! So then, you might try:

\s0+\d*

But see how that wouldn't match a 007 at the beginning of the string (because there's no space character)? Using \b allows you to get the "whole word (or number)":

\b0+\d*
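The three attempts can be compared on one input. This is a sketch in Python's re module (an assumption about the engine; the test string is made up from the numbers mentioned above).

```python
import re

# Hypothetical input containing the numbers discussed above.
text = "007 1007 000101101 101000101101"

# No anchor at all: also picks up zeros in the middle of other numbers.
plain = re.findall(r"0+\d*", text)

# Leading \s: requires (and consumes) a whitespace character, so the
# 007 at the very start of the string is missed.
with_space = re.findall(r"\s0+\d*", text)

# Leading \b: zero-width, works at the start of the string too.
with_boundary = re.findall(r"\b0+\d*", text)

print(plain)
print(with_space)
print(with_boundary)
```

Note that the \s version also returns the leading space as part of its match, which usually has to be trimmed or excluded with a capture group.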
slackwing
0

\b matches the position between a word character (a letter, digit, or underscore) and a non-word character, without including any character in the match.

\s matches only white space.

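For example: \b would match at the position between a word character and any of these: "!?,.@#$%^&*()+ ". (The underscore is not in this list because it counts as a word character.)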

$text = "Hello, Yo! moo .";
$regex = '~o\b~';
preg_match_all($regex, $text, $matches);

^---Will match three o's: the final 'o' of "Hello", "Yo", and "moo".

$text = "Hello, Yo! moo .";
$regex = '~o\s~';
preg_match_all($regex, $text, $matches);

^---Will only match the final 'o' in 'moo', together with the space after it.
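The same two patterns can be cross-checked outside PHP. This sketch assumes Python's re module, and the second string (with an underscore) is a made-up example to show that an underscore counts as a word character, as \w includes underscores.

```python
import re

text = "Hello, Yo! moo ."

# o\b: an "o" followed by a word boundary (comma, "!", or space here).
o_boundary = [m.start() for m in re.finditer(r"o\b", text)]

# o\s: an "o" followed by a whitespace character, which is consumed.
o_space = [m.group() for m in re.finditer(r"o\s", text)]

# Hypothetical extra case: "_" is a word character, so there is no
# boundary between "o" and "_"; only the "o" at the very end matches.
underscore_case = [m.start() for m in re.finditer(r"o\b", "foo_bar foo")]

print(o_boundary)
print(o_space)
print(underscore_case)
```

The first o of "moo" never matches either pattern, because it is followed by another word character.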

frosty