151

Given a string that isn't too long, what is the best way to read it line by line?

I know you can do:

BufferedReader reader = new BufferedReader(new StringReader(<string>));
reader.readLine();

Another way would be to take the substring on the eol:

final String eol = System.getProperty("line.separator");
output = output.substring(output.indexOf(eol) + eol.length());

Are there any other, perhaps simpler, ways of doing it? I have no problems with the above approaches; I'm just interested to know whether there is something that looks simpler and is more efficient.

Paolo Forgia
His
  • 5
    Well, your requirement said "read it line by line", which implies you don't need all the lines in memory at one time, so I would stick with the BufferedReader or Scanner approach, whichever you feel more comfortable with (I don't know which is more efficient). This way your memory requirements are lower. It will also allow you to "scale up" the application to use larger strings by potentially reading data from a file in the future. – camickr Jul 08 '09 at 16:38

11 Answers

215

There is also Scanner. You can use it just like the BufferedReader:

Scanner scanner = new Scanner(myString);
while (scanner.hasNextLine()) {
  String line = scanner.nextLine();
  // process the line
}
scanner.close();

I think this is a slightly cleaner approach than either of the suggested ones.

gregko
notnoop
  • 5
    I don't think it's a fair comparison though - String.split relies on the entire input being read into memory, which isn't always feasible (e.g. for large files). – Adamski Jul 08 '09 at 08:00
  • 3
    The input has to reside in memory, given that the input is String. The memory overhead is the array. Also, the resulting Strings reuse the same back-end character array. – notnoop Jul 09 '09 at 13:21
  • 1
    Beware: Scanner can produce wrong results if you scan a UTF-8 file with Unicode characters and don't specify the encoding in Scanner. It might interpret a different character as end of line. On Windows it uses the platform's default encoding. – live-love Nov 07 '17 at 04:12
141

You can also use the split method of String:

String[] lines = myString.split(System.getProperty("line.separator"));

This gives you all lines in a handy array.

I don't know about the performance of split. It uses regular expressions.
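
The comments on this answer raise two concerns: the separator may contain regex characters, and the string may use a different separator than the platform's. A minimal sketch of how both could be handled is shown here; the class and method names (`AnySeparatorSplit`, `splitLines`, `splitOnPlatformSeparator`) are made up for illustration. It precompiles one pattern covering the three common separators, and quotes the platform separator when a literal match is really wanted.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class AnySeparatorSplit {

    // Matches Windows (\r\n), classic Mac (\r) and Unix (\n) line breaks.
    // "\r\n" is listed first so that it is treated as a single break.
    private static final Pattern LINE_BREAK = Pattern.compile("\\r\\n|\\r|\\n");

    /** Splits on any of the three common separators, whatever the platform. */
    public static String[] splitLines(String text) {
        return LINE_BREAK.split(text);
    }

    /** Splits on the platform separator, quoted so it is matched literally. */
    public static String[] splitOnPlatformSeparator(String text) {
        return text.split(Pattern.quote(System.lineSeparator()));
    }

    public static void main(String[] args) {
        String mixed = "one\r\ntwo\nthree\rfour";
        System.out.println(Arrays.toString(splitLines(mixed)));
    }
}
```

Compiling the pattern once avoids recompiling the regex on every call, which may matter if many strings are split.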

Michael
ftl
  • 3
    And hope the line separator doesn't have regex characters in it. :) – Tom Hawtin - tackline Jul 08 '09 at 09:06
  • 47
    "line.separator" is not reliable anyway. Just because the code is running on (e.g.) Unix, what's to stop the file from having Windows-style "\r\n" line separators? BufferedReader.readLine() and Scanner.nextLine() always check for all three styles of separator. – Alan Moore Jul 09 '09 at 06:25
  • 6
    I know this comment is really old, but ... The question doesn't mention files at all. Assuming the String was not read from a file, this approach is probably safe. – Jolta Jun 04 '13 at 12:20
  • @Jolta This is not safe even for manually constructed Strings: if you're on Windows and constructed your String with '\n' and then split on line.separator, you get no lines. – masterxilo May 04 '16 at 11:47
  • Huh? If I create a string on my linux box using `line.separator` and someone else reads it on windows using `line.separator`, it's still humped. That's not incompetent coders from doing stupid things, it's just how things (don't always) work. – Larry Jan 12 '17 at 17:49
  • How about latest JDK/11 API - [`String.lines`](https://stackoverflow.com/a/50631407/1746118)? – Naman May 31 '18 at 19:27
47

Since I was especially interested in the efficiency angle, I created a little test class (below). Outcome for 5,000,000 lines:

Comparing line breaking performance of different solutions
Testing 5000000 lines
Split (all): 14665 ms
Split (CR only): 3752 ms
Scanner: 10005
Reader: 2060

As usual, exact times may vary, but the ratio holds true however often I've run it.

Conclusion: the "simpler" and "more efficient" requirements of the OP can't be satisfied simultaneously. The split solution (in either incarnation) is simpler, but the Reader implementation beats the others hands down.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

/**
 * Test class for splitting a string into lines at linebreaks
 */
public class LineBreakTest {
    /** Main method: pass in desired line count as first parameter (default = 10000). */
    public static void main(String[] args) {
        int lineCount = args.length == 0 ? 10000 : Integer.parseInt(args[0]);
        System.out.println("Comparing line breaking performance of different solutions");
        System.out.printf("Testing %d lines%n", lineCount);
        String text = createText(lineCount);
        testSplitAllPlatforms(text);
        testSplitWindowsOnly(text);
        testScanner(text);
        testReader(text);
    }

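    private static void testSplitAllPlatforms(String text) {
        long start = System.currentTimeMillis();
        // Match Windows (\r\n), classic Mac (\r) and Unix (\n) breaks;
        // \r\n comes first so that it counts as a single break.
        text.split("\r\n|\r|\n");
        System.out.printf("Split (all): %d ms%n", System.currentTimeMillis() - start);
    }

    private static void testSplitWindowsOnly(String text) {
        long start = System.currentTimeMillis();
        // Despite the method name and the label, "\n" is LF (line feed), not CR.
        text.split("\n");
        System.out.printf("Split (CR only): %d ms%n", System.currentTimeMillis() - start);
    }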

    private static void testScanner(String text) {
        long start = System.currentTimeMillis();
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNextLine()) {
                result.add(scanner.nextLine());
            }
        }
        System.out.printf("Scanner: %d%n", System.currentTimeMillis() - start);
    }

    private static void testReader(String text) {
        long start = System.currentTimeMillis();
        List<String> result = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line = reader.readLine();
            while (line != null) {
                result.add(line);
                line = reader.readLine();
            }
        } catch (IOException exc) {
            // quit
        }
        System.out.printf("Reader: %d%n", System.currentTimeMillis() - start);
    }

    private static String createText(int lineCount) {
        StringBuilder result = new StringBuilder();
        StringBuilder lineBuilder = new StringBuilder();
        for (int i = 0; i < 20; i++) {
            lineBuilder.append("word ");
        }
        String line = lineBuilder.toString();
        for (int i = 0; i < lineCount; i++) {
            result.append(line);
            result.append("\n");
        }
        return result.toString();
    }
}
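
The figures above come from single runs timed with `System.currentTimeMillis()`. As a rough sketch only, a slightly more careful measurement could add warm-up runs and use `System.nanoTime()`. The class `LineTimingSketch` and the helper `bestMillis` are hypothetical names introduced here, and this is no substitute for a proper benchmark harness.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class LineTimingSketch {

    /** Reads all lines with a BufferedReader, like testReader above. */
    public static List<String> readerLines(String text) {
        List<String> result = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                result.add(line);
            }
        } catch (IOException e) {
            // Not expected for a StringReader, but close() declares it.
            throw new UncheckedIOException(e);
        }
        return result;
    }

    /**
     * Hypothetical helper: runs the task untimed a few times so the JIT can
     * warm up, then returns the fastest of the timed runs in milliseconds.
     */
    public static long bestMillis(Function<String, ?> task, String text, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            task.apply(text);
        }
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.apply(text);
            best = Math.min(best, System.nanoTime() - start);
        }
        return best / 1_000_000;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100_000; i++) {
            sb.append("word word word word\n");
        }
        String text = sb.toString();
        System.out.printf("Reader: %d ms%n", bestMillis(LineTimingSketch::readerLines, text, 3, 5));
        System.out.printf("Split:  %d ms%n", bestMillis(s -> s.split("\n"), text, 3, 5));
    }
}
```

Taking the best of several runs reduces noise from garbage collection and JIT compilation, though the absolute numbers will still vary between machines and JVMs.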
Arend
  • 4
    As of Java 8, BufferedReader has a `lines()` method returning a `Stream` of the lines, which you can collect into a list if you wish, or process as a stream. – Steve K Sep 30 '15 at 05:54
25

Using Apache Commons IOUtils you can do this nicely via

List<String> lines = IOUtils.readLines(new StringReader(string));

It's not doing anything clever, but it's nice and compact. It'll handle streams as well, and you can get a LineIterator too if you prefer.

fabian
Brian Agnew
  • 2
    One drawback of this approach is that `IOUtils.readLines(Reader)` throws an `IOException`. Even though this will probably never happen with a StringReader, you'll have to catch or declare it. – sleske Jan 26 '12 at 14:41
  • There is a slight typo, it should be: List lines = IOUtils.readLines(new StringReader(string)); – tommy chheng Feb 06 '12 at 01:57
19

A solution using Java 8 features such as the Stream API and method references:

new BufferedReader(new StringReader(myString))
        .lines().forEach(System.out::println);

or

public void someMethod(String myLongString) {

    new BufferedReader(new StringReader(myLongString))
            .lines().forEach(this::parseString);
}

private void parseString(String data) {
    //do something
}
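
If the lines are needed as a collection rather than consumed one by one, the same stream can be collected. The following is a small sketch under that assumption; `StreamLinesSketch`, `toLines` and `nonBlankLines` are illustrative names, not part of the answer above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.stream.Collectors;

public class StreamLinesSketch {

    /** Collects every line of the string, including empty ones, into a list. */
    public static List<String> toLines(String text) {
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            return reader.lines().collect(Collectors.toList());
        } catch (IOException e) {
            // close() declares IOException; it is not expected for a StringReader.
            throw new UncheckedIOException(e);
        }
    }

    /** Same as toLines, but trims each line and skips the ones left empty. */
    public static List<String> nonBlankLines(String text) {
        return toLines(text).stream()
                .map(String::trim)
                .filter(line -> !line.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(toLines("first\r\n\nsecond"));
        System.out.println(nonBlankLines("first\r\n  \nsecond"));
    }
}
```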
Batiaev
13

Since Java 11, there is a new method String.lines:

/**
 * Returns a stream of lines extracted from this string,
 * separated by line terminators.
 * ...
 */
public Stream<String> lines() { ... }

Usage:

"line1\nline2\nline3"
    .lines()
    .forEach(System.out::println);
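
To illustrate how `String.lines` treats mixed terminators and a trailing line break, here is a short sketch (requires Java 11 or later). The wrapper class `StringLinesSketch` and its `lines` helper are names chosen for this example only.

```java
import java.util.List;
import java.util.stream.Collectors;

public class StringLinesSketch {

    /** Collects the result of String.lines() into a list. */
    public static List<String> lines(String text) {
        return text.lines().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // \r\n, \r and \n are all recognised; the trailing \n adds no empty line.
        System.out.println(lines("a\r\nb\rc\n"));   // prints [a, b, c]
        // An empty line in the middle is kept as an empty string.
        System.out.println(lines("a\n\nb"));         // prints [a, , b]
    }
}
```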
ZhekaKozlov
7

You can also use:

String[] lines = someString.split("\n");

If that doesn't work, try replacing \n with \r\n.

Tisho
Olin Kirkland
  • 3
    Hardcoding the representation of newline makes the solution platform-dependent. – thSoft Apr 07 '15 at 15:35
  • @thSoft I would argue the same can be said about *not hardcoding* it: if you don't hardcode it, you'll get a different outcome on different platforms for the same input (i.e. with exactly the same line breaks instead of platform-dependent line breaks in the input). This isn't really a yes/no question, and you have to think about what your input will be. – Jiri Tousek Jul 17 '19 at 17:09
  • Yeah, in practice I've used and seen the method I answered with hundreds of times. It's just more straightforward to have one line that breaks your text chunks than using the Scanner class. That is, if your string isn't abnormally massive. – Olin Kirkland Jul 18 '19 at 06:25
7

You can use the Stream API with a StringReader wrapped in a BufferedReader, which gained a lines() method returning a stream in Java 8:

import java.util.stream.*;
import java.io.*;
class test {
    public static void main(String... a) {
        String s = "this is a \nmultiline\rstring\r\nusing different newline styles";

        new BufferedReader(new StringReader(s)).lines().forEach(
            (line) -> System.out.println("one line of the string: " + line)
        );
    }
}

Gives

one line of the string: this is a
one line of the string: multiline
one line of the string: string
one line of the string: using different newline styles

Just like in BufferedReader's readLine, the newline character(s) themselves are not included. All kinds of newline separators are supported (in the same string even).

masterxilo
5

Or use the try-with-resources statement combined with Scanner:

    try (Scanner scanner = new Scanner(value)) {
        while (scanner.hasNextLine()) {
            String line = scanner.nextLine();
            // process the line
        }
    }
Mārcis
2

You can try the following regular expression:

\r?\n

Code:

String input = "\nab\n\n    \n\ncd\nef\n\n\n\n\n";
String[] lines = input.split("\\r?\\n", -1);
int n = 1;
for(String line : lines) {
    System.out.printf("\tLine %02d \"%s\"%n", n++, line);
}

Output:

Line 01 ""
Line 02 "ab"
Line 03 ""
Line 04 "    "
Line 05 ""
Line 06 "cd"
Line 07 "ef"
Line 08 ""
Line 09 ""
Line 10 ""
Line 11 ""
Line 12 ""
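
The `-1` limit is what keeps the trailing empty lines in the output above. A minimal sketch of the difference follows, with `SplitLimitSketch`, `dropTrailing` and `keepTrailing` as illustrative names: without a limit, `String.split` discards trailing empty strings, while a negative limit keeps them.

```java
import java.util.Arrays;

public class SplitLimitSketch {

    /** Default behaviour: trailing empty strings are removed from the result. */
    public static String[] dropTrailing(String input) {
        return input.split("\\r?\\n");
    }

    /** Negative limit: every empty string is kept, including trailing ones. */
    public static String[] keepTrailing(String input) {
        return input.split("\\r?\\n", -1);
    }

    public static void main(String[] args) {
        String input = "a\n\nb\n\n";
        System.out.println(Arrays.toString(dropTrailing(input)));  // 3 elements
        System.out.println(Arrays.toString(keepTrailing(input)));  // 5 elements
    }
}
```

Which variant fits depends on whether empty lines at the end of the input carry meaning for the caller.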
Paul Vargas
1

The easiest and most universal approach is to use the regex linebreak matcher \R, which matches any Unicode linebreak sequence:

Pattern NEWLINE = Pattern.compile("\\R");
String[] lines = NEWLINE.split(input);

@see https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/regex/Pattern.html
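
As a sketch of what "any Unicode linebreak sequence" covers in practice, the example below splits a string containing `\r\n`, the Unicode line separator U+2028, the next-line character U+0085 and a plain `\n`. The class name `UnicodeLineBreakSketch` and the method `splitLines` are assumptions made for this illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class UnicodeLineBreakSketch {

    // \R treats \r\n as one break and also matches \n, \r, U+000B, U+000C,
    // U+0085, U+2028 and U+2029.
    private static final Pattern NEWLINE = Pattern.compile("\\R");

    public static String[] splitLines(String input) {
        return NEWLINE.split(input);
    }

    public static void main(String[] args) {
        String input = "a\r\nb\u2028c\u0085d\ne";
        System.out.println(Arrays.toString(splitLines(input)));
    }
}
```

By contrast, `BufferedReader.readLine()` and `String.lines()` recognise only `\n`, `\r` and `\r\n`, so the choice depends on which separators the input may contain.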

rednoah