26

The following code:

val sentence = "1 2  3   4".split(" ")

gives me:

Array(1, 2, "", 3, "", "", 4)

but I'd rather have only the words:

Array(1, 2, 3, 4)

How can I split the sentence when the words are separated by multiple spaces?

Jacek Laskowski
yalkris
  • http://stackoverflow.com/questions/225337/how-do-i-split-a-string-with-any-whitespace-chars-as-delimiters this worked. – yalkris Jan 22 '13 at 23:34

3 Answers

62

Use a regular expression:

scala> "1   2 3".split(" +")
res1: Array[String] = Array(1, 2, 3)

The "+" means "one or more of the previous" (previous being a space).

Better yet, if you want to split on all whitespace:

scala> "1   2 3".split("\\s+")
res2: Array[String] = Array(1, 2, 3)

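(Here "\\s" is a regular-expression character class that matches any whitespace character, including tabs and newlines. See the java.util.regex.Pattern documentation for more examples.)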
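
To make the difference between the two patterns concrete, here is a small sketch. The object name `WhitespaceSplit` and the sample input are made up for illustration; only `String.split` comes from the answer above.

```scala
object WhitespaceSplit {
  // Illustrative input: the tokens are separated by spaces, a tab, and a newline.
  val mixed = "1  2\t3\n4"

  // " +" only treats runs of the space character as a separator,
  // so the tab and the newline stay inside the second element.
  val bySpaces: Array[String] = mixed.split(" +")

  // "\\s+" treats any run of whitespace (space, tab, newline, ...) as a separator.
  val byWhitespace: Array[String] = mixed.split("\\s+")
}
```

With this input, `bySpaces` has two elements (`"1"` and `"2\t3\n4"`), while `byWhitespace` has the four tokens you would expect.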

Tim
  • cf. "1 2 3".split("""\s+""") is the same as "1 2 3".split("\\s+") in Scala. Triple-quoted strings (wrapped in """) need no backslash escaping, which is useful for complex patterns. – Naetmul Jan 01 '14 at 13:47
  • 2
    NB. For strings starting with whitespace, such as " 1 2 3".split("\\s+"), the first element of the result is an empty string. Is there a regex that will avoid this? – user48956 Jan 10 '14 at 18:18
  • @user48956 " 1 2 3".trim.split("\\s+") – Travis Kaufman Jan 20 '15 at 22:44
  • For text read from a file, "\n\n".trim.split(" +") gives Array[String] = Array(""), i.e. a single empty string. Any suggestions? – Antoni Jan 14 '18 at 18:28
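
One way to cover both cases raised in the comments above (leading whitespace and blank input) is to drop empty tokens after splitting. This is only a sketch: the object `Tokens` and the helper `tokens` are hypothetical names, not part of any library.

```scala
object Tokens {
  // Hypothetical helper: split on runs of whitespace, then discard empty strings.
  // The filter removes the empty first element produced by leading whitespace,
  // and the single "" that split returns for an empty input string.
  def tokens(s: String): Array[String] =
    s.split("\\s+").filter(_.nonEmpty)
}
```

For example, `Tokens.tokens(" 1 2 3")` yields the three tokens without a leading empty string, and `Tokens.tokens("\n\n")` or `Tokens.tokens("")` yields an empty array.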
5

You can filter out the "" from the split Array.

scala> val sentence = "1 2  3   4".split(" ").filterNot(_ == "")
sentence: Array[java.lang.String] = Array(1, 2, 3, 4)
Brian
2

The regular expression \\W+ matches runs of non-word characters, so splitting on it delivers the (alphanumeric) words, thus

val sentence = "1 2  3   4".split("\\W+")
sentence: Array[String] = Array(1, 2, 3, 4)

For ease of use, in Scala 2.10.* and 2.11.* consider

implicit class RichString(val s: String) extends AnyVal {
  def words = s.split("\\W+")
}

Thus,

"1 2  3   4".words
res: Array[String] = Array(1, 2, 3, 4)
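
One caveat worth knowing: "\\W" matches every non-word character, not just whitespace, so punctuation and apostrophes also act as separators. The sketch below compares it with "\\s+"; the object name `WordSplit` and the sample text are made up for illustration.

```scala
object WordSplit {
  // Illustrative input containing an apostrophe and a comma.
  val text = "don't stop, now"

  // "\\W+" splits on any run of characters other than letters, digits, or underscore,
  // so the apostrophe and the comma are treated as separators as well.
  val byNonWord: Array[String] = text.split("\\W+")

  // "\\s+" splits on whitespace only, so punctuation stays attached to the tokens.
  val byWhitespace: Array[String] = text.split("\\s+")
}
```

Here `byNonWord` breaks "don't" into "don" and "t", while `byWhitespace` keeps "don't" and "stop," intact. Which behaviour you want depends on whether punctuation should count as part of a word.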
elm