How to get the first word of every line of a String in Groovy

Question

I have a method that returns a String with multiple lines. I want to parse the String and get the first word of each line.

The method getText() returns:

```
Lorem ipsum dolor
sit amet odio
magnis vitae iaculis
```

I want to get only:

```
Lorem
sit
magnis
```

My current code is:

```groovy
def projectString = getText()
def projects = projectString.substring(0, projectString.indexOf(' '))
```

Of course, that only gets the first word of the first line. I could loop over the string line by line and get the first word of each line with the substring call above, but I have a feeling that Groovy has a groovier way of doing this.

Initially I was thinking about piping the method call's result through awk, something like:

```groovy
def projects = getText() | sh "awk '{print $1}'"
```

But I couldn't get that to work.
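For reference, here is one way the awk idea could work in plain Groovy. The `|` in the attempt above is not a shell pipe, `sh` is not part of plain Groovy (it is best known as a Jenkins Pipeline step), and inside a double-quoted string Groovy reads `$1` as the start of an interpolation. The sketch below assumes a Unix-like system with `awk` on the PATH and uses a hypothetical helper name, `firstWordsViaAwk`. It starts awk as an external process with Groovy's `List.execute()`, writes the text to awk's standard input, and reads back its standard output.

```groovy
// Hypothetical helper; assumes awk is installed and on the PATH.
List<String> firstWordsViaAwk(String text) {
    // A List's execute() passes each element as a separate argument, so the
    // awk program is not split on spaces. Single quotes keep $1 literal.
    Process proc = ['awk', '{print $1}'].execute()
    // Write the input; withWriter closes stdin afterwards so awk sees EOF.
    // Fine for small inputs; large ones would need stdout drained concurrently.
    proc.outputStream.withWriter { it << text }
    String out = proc.text   // awk's standard output
    proc.waitFor()
    out.readLines()
}

def sample = """Lorem ipsum dolor
sit amet odio
magnis vitae iaculis"""

println firstWordsViaAwk(sample)
```

Spawning a process per call is heavyweight, so the pure-Groovy answers below are usually the better fit.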

I don't know who downvoted my post below, but I offer two one-liner solutions that achieve the same goal as the selected solution using different approaches. Because of the downvote, it will likely be dismissed outright as incorrect, which it absolutely is *not*. – solstice333, Sep 08 '17 at 21:08

score 5 · Accepted Answer · answered Sep 07 '17 at 21:36

Here is an example:

```groovy
def projectString = """Lorem ipsum dolor
sit amet odio
magnis vitae iaculis"""

projectString = projectString
    .readLines()                           // one String per line
    .collect { it[0.. it.indexOf(' ')] }   // first character through the first space (inclusive)
    .join("\n")

println projectString
```

You can check it online: https://groovyconsole.appspot.com/script/5132242514870272
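One subtlety in the range expression: Groovy's `..` range is inclusive, so `it[0.. it.indexOf(' ')]` also keeps the space after the first word (`"Lorem "` rather than `"Lorem"`), which is invisible in the printed output. If a line has no space, `indexOf` returns -1 and `0..-1` selects the whole line. A minimal sketch using an exclusive range (`..<`) and an explicit no-space check, with `firstWord` as a hypothetical helper name:

```groovy
// Hypothetical helper: returns the text before the first space,
// or the whole line when it contains no space.
String firstWord(String line) {
    int space = line.indexOf(' ')
    space < 0 ? line : line[0..<space]   // ..< excludes the space itself
}

def projectString = """Lorem ipsum dolor
sit amet odio
magnis vitae iaculis"""

println projectString.readLines().collect { firstWord(it) }.join("\n")
```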

Vyacheslav Enis

`it[0.. it.indexOf(' ')]` looks odd... I prefer `it.split().head()` – tim_yates Sep 07 '17 at 22:05

Worked like a charm. I used this with @tim_yates's suggestion of `it.split().head()` – wiredniko Sep 08 '17 at 12:17
I don't know who downvoted my post below, but I offer two one-liner solutions that achieve the same goal as this solution using different approaches. Because of the downvote, it will likely be dismissed outright as incorrect, which it absolutely is *not*. – solstice333 Sep 08 '17 at 21:07
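The variant from the comments swaps the range expression for `split().head()`. In Groovy, `split()` with no arguments splits on whitespace, so no trailing space is kept. A minimal sketch, with `firstWords` as a hypothetical name; it assumes every line contains at least one word, because `head()` on an empty array (from a blank line) throws.

```groovy
// Hypothetical helper: the first whitespace-delimited word of each line.
List<String> firstWords(String text) {
    text.readLines()
        .collect { it.split().head() }   // split() with no arguments splits on whitespace
}

def projectString = """Lorem ipsum dolor
sit amet odio
magnis vitae iaculis"""

println firstWords(projectString).join("\n")
```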

solstice333 · Answer 2 · score 0 · 2017-09-07T23:52:29.663

Groovy has Perl-style regex operators. The first solution below uses the find operator `=~`, which applies a regex to a string and evaluates to a `java.util.regex.Matcher`. The pattern uses `(?m)` to enable multiline mode, so that `^` matches at the start of every line, and `^\w+` to grab one or more word characters (`[A-Za-z0-9_]`) at the start of each line. All the matches (the first word of each line) are then collected into a list.

The second solution starts with `readLines()` to get a list of lines, then uses the closure form of `collect()` to map each line to its first word using `StringTokenizer`, which is reportedly faster than `String.split()`. `StringTokenizer` also appears to find tokens lazily, one call at a time, so only the first word of each line needs to be scanned rather than the entire line.

Examples below:

```groovy
def foo = """Lorem ipsum dolor
sit amet odio
magnis vitae iaculis"""

println((foo =~ /(?m)^\w+/).collect())
println foo.readLines().collect { new StringTokenizer(it).nextElement() }

// both print [Lorem, sit, magnis]
```
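The regex version can also skip the explicit `Matcher`. A minimal sketch that compiles the same pattern once with the pattern operator `~` and passes it to Groovy's `String.findAll(Pattern)`, which returns all matches as a list; the variable names are hypothetical. Note that `^\w+` skips a line that begins with whitespace.

```groovy
import java.util.regex.Pattern

// ~/.../ (the pattern operator) compiles the regex into a java.util.regex.Pattern.
// (?m) enables MULTILINE mode, so ^ matches at the start of every line.
Pattern firstWord = ~/(?m)^\w+/

def foo = """Lorem ipsum dolor
sit amet odio
magnis vitae iaculis"""

// findAll(Pattern) returns every match directly as a List<String>.
List<String> words = foo.findAll(firstWord)
println words   // prints [Lorem, sit, magnis]
```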

edited Sep 07 '17 at 23:52

answered Sep 07 '17 at 21:52

solstice333

Hm, not sure why this got downvoted. It does what the OP wants to do in two different simple ways with only a single line of code in each example. Try it out at least. – solstice333 Sep 08 '17 at 20:55
BTW, you may want to switch the `\w` to a `\S` depending on what you're trying to match. `\S` is looser: it matches any non-whitespace character. – solstice333 Sep 08 '17 at 21:10
Have you checked the javadocs for StringTokenizer at all? It's a legacy implementation. "StringTokenizer is a legacy class that is retained for compatibility reasons although its use is discouraged in new code. It is recommended that anyone seeking this functionality use the split method of String or the java.util.regex package instead." https://docs.oracle.com/javase/8/docs/api/index.html?java/util/StringTokenizer.html https://docs.oracle.com/javase/7/docs/api/index.html?java/util/StringTokenizer.html – Vyacheslav Enis Sep 09 '17 at 03:26
Yeah, I understand that, and was actually thinking about just using a generic string split at first, but I read up on this: https://stackoverflow.com/questions/691184/scanner-vs-stringtokenizer-vs-string-split. To summarize, there are profiling results showing that StringTokenizer is almost 2x as fast as split(). If you don't need the rest of the tokens or regular expressions, why bother using split? It's just extra work at that point. – solstice333 Sep 09 '17 at 03:55
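To see the difference the `\w` versus `\S` comment describes, here is a small comparison on hypothetical sample lines containing an apostrophe and a hyphen, using the same multiline anchor with Groovy's `String.findAll(regex)`:

```groovy
// Hypothetical sample text; the apostrophe and hyphen are not word characters.
def text = """don't stop
e-mail me"""

// \w+ stops at the first character outside [A-Za-z0-9_] ...
def wordChars = text.findAll(/(?m)^\w+/)
// ... while \S+ runs until the first whitespace character.
def nonSpace = text.findAll(/(?m)^\S+/)

println wordChars   // prints [don, e]
println nonSpace    // prints [don't, e-mail]
```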
