
I have to split a long string containing song lyrics into lines and then split each line into words. I'm going to hold this information in a two-dimensional array.

I've seen some similar questions that were solved using [NSRegularExpression](https://developer.apple.com/documentation/foundation/nsregularexpression), but I can't find a regular expression that means "everything except something", which is what I want to split on when breaking a string into words.

More specifically, I want to split on everything except alphanumerics, `'`, and `-`. In Java, this regular expression is written as `[^\\w'-]+`.

Below is the string, followed by my Swift attempt at this task. (It just splits on whitespace instead of actually splitting on `[^\w'-]+`, as I can't figure out how to do that.)

 1 Is this the real life?
 2 Is this just fantasy?
 3 Caught in a landslide,
 4 No escape from reality.
 5 
 6 Open your eyes,
 7 Look up to the skies and see,
 8 I'm just a poor boy, I need no sympathy,
 9 Because I'm easy come, easy go,
10 Little high, little low,
11 Any way the wind blows doesn't really matter to me, to me.
12 
13 Mama, just killed a man,

(etc.)


let lines = s?.components(separatedBy: "\n")
var all_words = [[String]]()
for i in 0..<lines!.count {
    let words = lines![i].components(separatedBy: " ")
    let new_words = words.filter { $0 != "" }
    all_words.append(new_words)
}
jscs
  • possible duplicate of https://stackoverflow.com/a/39667966/2303865 – Leo Dabus Mar 16 '19 at 01:03
  • You can use `enumerateSubstrings(in:options:)` to break your string into lines and then break each line up again into words. Using the extension from the link above, you could do something like `let byLinesByWords = s.byLines.map{$0.byWords}` – Leo Dabus Mar 16 '19 at 01:08
  • If you need to drop the numbers at the beginning of each line: `let byLinesByWords = s.byLines.map{$0.byWords.dropFirst()}` – Leo Dabus Mar 16 '19 at 01:11
  • Just match all occurrences using the reverse regex, `[\w'-]+`. – Wiktor Stribiżew Mar 16 '19 at 08:40

2 Answers


I suggest using the reverse pattern, `[\w'-]+`, to match the strings you need instead of splitting on what you don't need, together with a `matches(for:in:)` regex-matching helper function.

Your loop would then look like this:

for i in 0..<lines!.count {
    let new_words = matches(for: "[\\w'-]+", in: lines![i])
    all_words.append(new_words)
}

The following line of code:

print(matches(for: "[\\w'-]+", in: "11 Any way the wind blows doesn't really matter to me, to me."))

yields ["11", "Any", "way", "the", "wind", "blows", "doesn\'t", "really", "matter", "to", "me", "to", "me"].
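The `matches(for:in:)` helper is not defined in this answer. A minimal sketch of what it might look like, built on `NSRegularExpression`, follows. The function body, the `lyrics` sample string, and the `allWords` name are illustrative assumptions, not the exact helper the answer refers to.

```swift
import Foundation

// Hypothetical helper (an assumption, not the original implementation):
// returns every substring of `text` matched by the regex `pattern`.
func matches(for pattern: String, in text: String) -> [String] {
    guard let regex = try? NSRegularExpression(pattern: pattern) else {
        return []  // an invalid pattern yields no matches
    }
    // NSRegularExpression works on NSRange, so convert the full string range.
    let fullRange = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, range: fullRange).compactMap { result in
        // Convert each NSRange back to a Swift range before slicing.
        Range(result.range, in: text).map { String(text[$0]) }
    }
}

// Build the two-dimensional array: one inner array of words per line.
let lyrics = "1 Is this the real life?\n2 Is this just fantasy?"
let allWords = lyrics
    .components(separatedBy: "\n")
    .map { matches(for: "[\\w'-]+", in: $0) }
print(allWords)
// [["1", "Is", "this", "the", "real", "life"], ["2", "Is", "this", "just", "fantasy"]]
```

Mapping over the lines avoids the force-unwraps and index loop of the original attempt; blank lines simply produce empty inner arrays.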

Wiktor Stribiżew

One simple solution is to first replace each separator sequence with a special character that cannot occur in the words themselves, and then split on that character:

let words = string
    .replacingOccurrences(of: "[^\\w'-]+", with: "|", options: .regularExpression)
    .split(separator: "|")
print(words)

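However, if you can, use the system API for enumerating words, such as `enumerateSubstrings(in:options:)` with the `.byWords` option, instead of a hand-written pattern.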
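As a rough sketch of the system-function approach, assuming it refers to Foundation's `enumerateSubstrings(in:options:)` with `.byWords`: the `words(in:)` helper and the sample string below are hypothetical names introduced for illustration. Note that Foundation applies its own word-boundary rules, so results for hyphenated words may differ from the `[\w'-]+` pattern.

```swift
import Foundation

// Hypothetical helper: collects the words of `text` using
// Foundation's built-in word-boundary detection.
func words(in text: String) -> [String] {
    var result: [String] = []
    text.enumerateSubstrings(in: text.startIndex..., options: .byWords) { substring, _, _, _ in
        // `substring` is nil only when .substringNotRequired is passed.
        if let word = substring {
            result.append(word)
        }
    }
    return result
}

// One inner array of words per line of the lyrics.
let sample = "Open your eyes,\nLook up to the skies and see,"
let byLinesByWords = sample
    .components(separatedBy: "\n")
    .map { words(in: $0) }
print(byLinesByWords)
```

This avoids choosing a placeholder character such as `|`, which would break if it ever appeared in the input.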

Sulthan