Insert spaces between the characters of upper case words with regex

Question

I would like to insert spaces between the characters of a word, but only for words with at least 2 upper case characters. I can use regex.

For example: "This is simple SEnTeNCE with a FEW word." -> "This is simple S E n T e N C E with a F E W word."

Well, you could first detect "words" with two or more uppercase letters: `(?=\w*[A-Z]\w*[A-Z])\w+`, then just insert the spaces. — tenub, Apr 25 '14 at 14:33
@user3572843: Welcome to Stack Overflow! Please consider bookmarking our [Regular Expressions FAQ](http://stackoverflow.com/a/22944075/2736496) for future reference. — aliteralmind, Apr 25 '14 at 16:03

Casimir et Hippolyte · Answer 1 · 2014-04-25T14:57:59.570


A way with PHP/PCRE:

$pattern = '~(?:\b(?=(?:\w*[A-Z]){2})|(?!^)\G)\w\B\K~';

$text = preg_replace($pattern, ' ', $text);

pattern details:

(?:                      # non-capturing group, starting with either:
    \b                   # a word boundary
    (?=(?:\w*[A-Z]){2})  # followed by a word with at least two uppercase letters
  |                      # OR
    (?!^)\G              # the end of the previous match (but not the start of the string)
)
\w\B                     # a word character followed by another word character
\K                       # discard everything matched so far from the match result

A way in JavaScript, using a callback:

var str = "This is simple SEnTeNCE with a FEW word.";

var res = str.replace(/\b(?:[a-z]*[A-Z]){2,}[a-z]*\b/g, function (m) {
    return m.split('').join(' ');
});

console.log(res);
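
JavaScript has neither `\G` nor `\K`, so the PCRE pattern above cannot be ported directly. As a rough sketch of a callback-free alternative, assuming an engine with ES2018 lookbehind support, the space can be inserted at a zero-width position. The function name `spaceOutWithLookbehind` is made up for illustration, and like the PCRE version it treats digits and underscores as word characters.

```javascript
// Sketch only: relies on ES2018 variable-length lookbehind.
// spaceOutWithLookbehind is a hypothetical name, not part of the answer above.
function spaceOutWithLookbehind(text) {
  // (?<=\b(?=(?:\w*[A-Z]){2})\w+)  behind us: at least one character of a word
  //                                 whose start is followed by two uppercase letters
  // (?=\w)                          ahead of us: another word character
  return text.replace(/(?<=\b(?=(?:\w*[A-Z]){2})\w+)(?=\w)/g, ' ');
}

console.log(spaceOutWithLookbehind("This is simple SEnTeNCE with a FEW word."));
// This is simple S E n T e N C E with a F E W word.
```

The match itself is empty, so nothing has to be captured and written back; the lookbehind plays the role that `\G` and `\K` play together in the PCRE pattern.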

edited Apr 25 '14 at 14:57

answered Apr 25 '14 at 14:37


Can you elaborate on how you're using `(?!^)\G` and `\K`? I understand the negative lookahead for start-of-input, but I've not used match anchors or resetters. – aliteralmind Apr 25 '14 at 16:06
@aliteralmind: `\G` matches the position in the string just after the previous match. Before the first match that position isn't defined, so `\G` acts as an anchor for the start of the string. To forbid a match at the start of the string you can simply add `(?!^)` or `(?!\A)` or `(? – Casimir et Hippolyte Apr 25 '14 at 16:41
@aliteralmind: About the `\K` feature: `\K` doesn't change what is matched; it only removes everything matched to its left from the result. As proof, you cannot obtain overlapping results. Example: with the string "abcd", the pattern `abc\Kd|abc` gives only "d", and the second branch of the alternation never produces a result, since "abc" has already been consumed by the first branch. – Casimir et Hippolyte Apr 25 '14 at 16:43
@aliteralmind: The use of `\K` here is a convenience that avoids putting everything on its left in a capture group and referencing that group in the replacement string. Another use of `\K`: it can help when you face a problem that would otherwise need a variable-length lookbehind. – Casimir et Hippolyte Apr 25 '14 at 16:58
@aliteralmind: To understand the role of `\G` in a global search: The schema of the pattern is `(?: entry-point | \G – Casimir et Hippolyte Apr 25 '14 at 17:06
@aliteralmind: Note that if I strictly followed the schema for the current pattern, I would have to write `\K\B` instead of `\B\K`, where `\B` is `the-condition-to-break-the-contiguity`. But it doesn't matter here. – Casimir et Hippolyte Apr 25 '14 at 17:15
This is great information Casimir. I think it would be a nice addition to the FAQ. There's a good entry on [`\G`](http://stackoverflow.com/questions/21971701/when-is-g-useful-application-in-a-regex), but the `\K` one could be improved, and there isn't any on using them together. If you have the time, put this information into your answer and consider adding a "walkthrough" of this specific example. Is that okay? – aliteralmind Apr 25 '14 at 18:00
@aliteralmind: Why not? I can write something like a "how to". – Casimir et Hippolyte Apr 25 '14 at 18:03
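
To make the remark about `\K` being a convenience concrete, here is a small sketch in JavaScript, which has no `\K`. It reuses the "abcd" string from the comments; the variable names and the replacement letter "D" are arbitrary choices for illustration. The part that `\K` would drop from the match is either captured and written back, or moved into a lookbehind (assuming ES2018 support).

```javascript
// In PCRE, abc\Kd matches "abcd" but reports only "d", so replacing the match
// with "D" would give "abcD". Two JavaScript stand-ins for that behaviour:

// 1. Capture the left part and put it back with $1 in the replacement string.
const viaCapture = "abcd".replace(/(abc)d/, "$1D");

// 2. Move the left part into a lookbehind so it is never part of the match.
const viaLookbehind = "abcd".replace(/(?<=abc)d/, "D");

console.log(viaCapture);    // abcD
console.log(viaLookbehind); // abcD
```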

Robin · Answer 2 · 2014-04-25T23:15:54.760

A single-regex solution would be (PCRE):

(?|(?=\b(?:[a-z]*[A-Z]){2})(\w)|(?!^)\G(\w))(?!\b)

(?|                             # branch reset group
  (?= \b (?:[a-z]* [A-Z]){2} )  # lookahead anchored at the beginning of the word:
                                # check we are at the beginning of a two-upper word
  (\w)                          # grab the first letter
|                               # OR
  (?!^)\G                       # we're following a previous match (and not
                                # at the beginning of the string)
  (\w)                          # if so we're inside a wanted word, so we grab
                                # a character
)
(?!\b)                          # except if it's the last one (we don't want
                                # too many spaces)

And replace with

\1 # <- there's a space after the \1


Note that it might be easier to do it in more steps (grabbing the words, treating them individually, joining everything)...
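
A minimal sketch of that step-by-step approach in JavaScript follows. The helper name `spaceOutWords` is hypothetical, and a "word" is assumed to be a run of `\w` characters, as in the regexes above.

```javascript
// Hypothetical helper illustrating the multi-step approach; not from the answers above.
function spaceOutWords(text) {
  return text
    .split(/(\W+)/)                      // 1. grab the words; the capture group keeps separators
    .map(function (part) {
      var upper = part.match(/[A-Z]/g);  // 2. collect the uppercase letters of this piece
      return upper && upper.length >= 2
        ? part.split('').join(' ')       // 3. space out words with two or more of them
        : part;                          //    leave everything else untouched
    })
    .join('');                           // 4. join everything back together
}

console.log(spaceOutWords("This is simple SEnTeNCE with a FEW word."));
// This is simple S E n T e N C E with a F E W word.
```

Each step stays readable on its own, at the cost of a few more lines than the single-regex versions.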
