414

I'm trying to split the text in a JTextArea into lines, using a regex to split the String on \n. However, this does not work. I also tried \r\n|\r|n and many other combinations of regexes. Code:

public void insertUpdate(DocumentEvent e) {
    String split[], docStr = null;
    Document textAreaDoc = (Document)e.getDocument();

    try {
        docStr = textAreaDoc.getText(textAreaDoc.getStartPosition().getOffset(), textAreaDoc.getEndPosition().getOffset());
    } catch (BadLocationException e1) {
        // TODO Auto-generated catch block
        e1.printStackTrace();
    }

    split = docStr.split("\\n");
}
Can Berk Güder
dr.manhattan
  • 9
    What is the error that you get? Don't say "does not work"; that doesn't mean anything. Tell us the error/result you get. That is the first step in debugging code: figure out what the wrong result is, and how your program got to that. – Chii Jan 18 '09 at 10:18
  • What do you really want to do? Break lines as they are entered in the JTextArea? Find where the JTextArea is doing line wraps? Something else? – user85421 Apr 29 '09 at 12:05

20 Answers

786

This should cover you:

String lines[] = string.split("\\r?\\n");

There are really only two newline conventions (UNIX and Windows) that you need to worry about.
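A minimal sketch of how this pattern behaves, assuming a hypothetical helper class `NewlineSplitDemo` and made-up sample text. It only illustrates that `\\r?\\n` handles both Unix and Windows line endings, and that a lone `\r` is not treated as a separator:

```java
import java.util.Arrays;

class NewlineSplitDemo {
    // Splits on "\n", optionally preceded by "\r" (Unix and Windows endings).
    static String[] splitLines(String text) {
        return text.split("\\r?\\n");
    }

    public static void main(String[] args) {
        // Mixed Unix and Windows endings in one string.
        System.out.println(Arrays.toString(splitLines("one\ntwo\r\nthree")));
        // A lone "\r" does not match the pattern, so the text stays in one piece.
        System.out.println(splitLines("one\rtwo").length);
    }
}
```

If text with bare `\r` terminators has to be supported as well, a pattern such as `\\r?\\n|\\r` would be needed instead.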

Buhake Sindi
cletus
  • 43
    A JTextArea document SHOULD use only '\n'; its Views completely ignore '\r'. But if you're going to look for more than one kind of separator, you might as well look for all three: "\r?\n|\r". – Alan Moore Jan 18 '09 at 18:02
  • 12
    Mac 9 uses \r. OSX 10 uses \n – Raekye May 06 '13 at 05:25
  • ${fn:length(fn:split(data, '\\r?\\n'))} is not working in jstl –  Jun 17 '14 at 15:48
  • Isn't it: 'String[] lines = String.split("\\r?\\n");' ? – FeinesFabi Oct 30 '14 at 10:45
  • 5
    @antak yes, `split` by default removes trailing empty strings if they were the result of the split. To turn this mechanism off you need to use the overloaded version `split(regex, limit)` with a negative limit, like `text.split("\\r?\\n", -1)`. More info: [Java String split removed empty values](http://stackoverflow.com/questions/14602062/java-string-split-removed-empty-values) – Pshemo Jul 19 '16 at 13:08
  • String[] lines = string.split(System.getProperty("line.separator")); This will work fine while you use strings generated in your same OS/app, but if for example you are running your java application under linux and you retrieve a text from a database that was stored as a windows text, then it could fail. – ibai Mar 24 '17 at 23:40
  • 1
    The comment by @stivlo is misinformation, and it is unfortunate that it has so many upvotes. As @Raekye pointed out, OS X (now known as macOS) has used \n as its line separator since it was released in 2001. Mac OS 9 was released in 1999, and I have never seen a Mac OS 9 or below machine used in production. There is not a single modern operating system that uses \r as a line separator. NEVER write code that expects \r to be the line separator on Mac, unless a) you're into retro computing, b) have an OS 9 machine spun up, and c) can reliably determine that the machine is actually OS 9. – James McLaughlin May 03 '17 at 22:53
  • And what does it mean? – Lealo Aug 08 '17 at 23:22
  • This answer did not work for me. I just use "String pieces[] = text.split("\n") " or "String pieces[] = text.split(System.getProperty("line.separator")) "on java 8. – Maykel Llanes Garcia Dec 04 '17 at 21:24
  • What about Unicode? A next-line character ('\u0085'), a line-separator character ('\u2028'), or a paragraph-separator character ('\u2029'). – john ktejik Nov 11 '18 at 00:10
  • how about this: \v+ (one or more vertical whitespace character) – Ubeogesh Jan 14 '20 at 14:17
152

The String#split(String regex) method uses regex (regular expressions). Since Java 8, regex supports \R, which represents (from the documentation of the Pattern class):

Linebreak matcher
\R         Any Unicode linebreak sequence, is equivalent to \u000D\u000A|[\u000A\u000B\u000C\u000D\u0085\u2028\u2029]

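So we can use it to match:

  • \u000D\u000A -> the \r\n pair
  • \u000A -> line feed (\n)
  • \u000B -> line tabulation
  • \u000C -> form feed (\f)
  • \u000D -> carriage return (\r)
  • \u0085 -> next line (NEL)
  • \u2028 -> line separator
  • \u2029 -> paragraph separator

As you can see, \r\n is placed at the start of the regex, which ensures that the regex will try to match this pair first, and only if that match fails will it try to match single-character line separators.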


So if you want to split on a line separator, use split("\\R").

If you don't want trailing empty strings "" removed from the resulting array, use split(regex, limit) with a negative limit parameter, like split("\\R", -1).

If you want to treat one or more consecutive empty lines as a single delimiter, use split("\\R+").
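The three variants above can be compared side by side. This is only a sketch: the class name `LinebreakMatcherDemo`, its method names, and the sample text are made up for illustration, and it assumes Java 8 or newer for `\R`:

```java
import java.util.Arrays;

class LinebreakMatcherDemo {
    // Default split: trailing empty strings are dropped.
    static String[] splitDefault(String text) {
        return text.split("\\R");
    }

    // Negative limit: trailing empty strings are kept.
    static String[] splitKeepTrailing(String text) {
        return text.split("\\R", -1);
    }

    // One or more consecutive line breaks count as a single delimiter.
    static String[] splitCollapsed(String text) {
        return text.split("\\R+");
    }

    public static void main(String[] args) {
        String text = "a\r\nb\n\nc\n"; // made-up sample with mixed endings
        System.out.println(Arrays.toString(splitDefault(text)));
        System.out.println(Arrays.toString(splitKeepTrailing(text)));
        System.out.println(Arrays.toString(splitCollapsed(text)));
    }
}
```

With this sample, the default split yields four elements (one of them the empty line in the middle), the negative limit adds a fifth, empty element for the trailing line break, and `\\R+` yields three.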

Pshemo
  • 6
    Yes, it's the best answer. Unfortunate that the question was asked six years too early for this answer. – Dawood ibn Kareem Nov 22 '19 at 03:20
  • I ended up splitting on `\\R+`, to avoid any end-of-line characters that were not covered by `\\R` alone. – SeverityOne Jan 21 '20 at 06:45
  • **JAVA 9 PROBLEM with `find` `matches`**. Java 9 incorrectly allows regex like `\R\R` to match sequence `\r\n` which represents *single separation sequence*. To solve such problem we can write regex like `(?>\u000D\u000A)|[\u000A\u000B\u000C\u000D\u0085\u2028\u2029]` which thanks to [atomic group](https://www.regular-expressions.info/atomic.html) `(?>\u000D\u000A)` will prevent regex which already matched `\r\n` to backtrack and try to match `\r` and `\n` separately. – Pshemo Jan 29 '21 at 12:23
135

If you don’t want empty lines:

String.split("[\\r\\n]+")
Gumbo
  • 4
    double backslashes are unnecessary, see section "Backslashes, escapes, and quoting" http://docs.oracle.com/javase/1.4.2/docs/api/java/util/regex/Pattern.html – angryITguy Dec 05 '11 at 22:09
  • 8
    @giulio Yes, I know (see [Understanding regex in Java: split(“\t”) vs split(“\\t”) - when do they both work, and when should they be used](http://stackoverflow.com/questions/3762347/understanding-regex-in-java-split-t-vs-split-t-when-do-they-both-wor/3762377#3762377)). – Gumbo Dec 06 '11 at 08:54
  • 1
    This worked on Mac OSX when the above answer did not. – John Nov 01 '14 at 23:57
  • This also worked for me. Excellent solution. It worked for the following 2 cases: 1) i woke up at 3 o clock.\r\n\r\nI hope 2) this is real life\r\nso I – logixplayer Jul 17 '15 at 15:52
  • This answer is exactly correct. One little suggestion would be that it might be helpful to add _why_ it gets rid of the empty lines for people that might not be as familiar with regex and how it behaves. For anybody that might be wondering, it's because the "+" is a greedy operator and will match at least one but will continue to match the '\r\n' characters until it no longer can match them. See here: http://www.regular-expressions.info/repeat.html#greedy – greyseal96 Apr 08 '16 at 20:54
  • Why not `[\\r?\\n]+`? – tresf Feb 21 '19 at 18:19
  • 2
    @tresf You can't use quantifiers in square brackets. – Breina Dec 11 '19 at 08:58
52
String.split(System.getProperty("line.separator"));

This should be system independent

Shervin Asgari
  • 44
    It's an interesting idea, but you should take care that the text actually uses the system's line separator. I've got many text files under Unix (e.g. XML) that use "Windows" separators and quite a few under Windows that use Unix separators. – Maarten Bodewes Jul 30 '12 at 23:37
  • Works even on android – ruX Mar 07 '14 at 13:23
  • 7
    Files created on a Windows OS and transferred to a Unix OS will still contain \r\n separators. I think it's better to play it safe and take both separators into account. – bvdb Jul 18 '14 at 11:44
  • 17
    This is a very problematic approach! The file may not originate from the system running the code. I strongly discourage these kinds of "system independent" designs that actually depend on a specific system, the runtime system. – Martin Dec 11 '14 at 08:38
  • @Martin if you have control over the deployed system, this is fine. However, if you are deploying your code to the cloud and have no control, then it's not the best way to do it – Shervin Asgari Dec 11 '14 at 12:56
  • 4
    @Shervin It is never the best way to do it. It is in fact very bad practice. Consider some other programmer calling System.setProperty("line.separator", "you have no point"); Your code is broken. It might even be called similarly by a dependency you have no knowledge about. – Martin Dec 16 '14 at 13:34
  • This did not work as the file originated on Unix, and was being split on Windows. – Greg Oct 06 '15 at 17:06
  • @Martin -- "some other programmer calling System.setProperty("line.separator", "you have no point"); " --- Just wondering, wouldn't such idiocy/sabotage break a lot of expected behaviours in the JDK libraries, too? – Rop Jul 13 '17 at 13:52
  • @Rop I can't think of any cases right away, but there might exist dependencies to system properties that actually break code. I would strongly encourage configuration without use of system properties whenever possible. – Martin Aug 15 '17 at 14:53
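The caveat raised in these comments can be sketched as follows. The class `SeparatorMismatchDemo` and its `splitOn` helper are hypothetical; the separator is passed in explicitly so the example does not depend on the machine running it, standing in for whatever `System.lineSeparator()` would return there:

```java
import java.util.regex.Pattern;

class SeparatorMismatchDemo {
    // Splits on one fixed separator string, quoted so it is taken literally.
    static String[] splitOn(String text, String separator) {
        return text.split(Pattern.quote(separator));
    }

    public static void main(String[] args) {
        // Windows-style text split with a Unix separator ("\n"):
        // the split happens, but a stray "\r" stays at the end of the first line.
        String[] lines = splitOn("first\r\nsecond", "\n");
        System.out.println(lines[0].endsWith("\r"));

        // Unix-style text split with a Windows separator ("\r\n"):
        // nothing matches, so the whole text comes back as a single element.
        System.out.println(splitOn("first\nsecond", "\r\n").length);
    }
}
```

In both cases the result depends on where the text came from rather than on where the code runs, which is why a pattern covering several terminators tends to be the safer choice.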
19

A new method, lines, was introduced to the String class in Java 11; it returns Stream<String>.

Returns a stream of substrings extracted from this string partitioned by line terminators.

Line terminators recognized are line feed "\n" (U+000A), carriage return "\r" (U+000D) and a carriage return followed immediately by a line feed "\r\n" (U+000D U+000A).

Here are a few examples:

jshell> "lorem \n ipusm \n sit".lines().forEach(System.out::println)
lorem
 ipusm
 sit

jshell> "lorem \n ipusm \r  sit".lines().forEach(System.out::println)
lorem
 ipusm
  sit

jshell> "lorem \n ipusm \r\n  sit".lines().forEach(System.out::println)
lorem
 ipusm
  sit

String#lines()

Nicola Isotta
Anton Balaniuc
12

In JDK11 the String class has a lines() method:

Returning a stream of lines extracted from this string, separated by line terminators.

Further, the documentation goes on to say:

A line terminator is one of the following: a line feed character "\n" (U+000A), a carriage return character "\r" (U+000D), or a carriage return followed immediately by a line feed "\r\n" (U+000D U+000A). A line is either a sequence of zero or more characters followed by a line terminator, or it is a sequence of one or more characters followed by the end of the string. A line does not include the line terminator.

With this one can simply do:

Stream<String> stream = str.lines();

then if you want an array:

String[] array = str.lines().toArray(String[]::new);

Given that this method returns a Stream, it opens up a lot of options, as it enables one to write concise and declarative expressions of possibly-parallel operations.
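A small runnable sketch of the same idea, assuming Java 11 or newer. The class name `StringLinesDemo` and the sample strings are made up for illustration:

```java
import java.util.List;
import java.util.stream.Collectors;

class StringLinesDemo {
    // Collects the lines of the given text into a list (requires Java 11+).
    static List<String> toLines(String text) {
        return text.lines().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // "\n", "\r\n" and "\r" are all recognized as line terminators.
        System.out.println(toLines("a\nb\r\nc\rd"));
        // An empty line in the middle is kept; the final terminator
        // does not produce an extra empty line at the end.
        System.out.println(toLines("a\n\nb\n"));
    }
}
```

Note that, unlike `split` with a negative limit, `lines()` does not report an empty string after a trailing line terminator.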

Ousmane D.
12

You don't have to double escape characters in character groups.

For all non empty lines use:

String.split("[\r\n]+")
sth
Martin
  • Yes, you do. If they need double-escaping anywhere, they need it everywhere. Whitespace escapes like `\r` and `\n` can have one or two backslashes; they work either way. – Alan Moore Jun 06 '16 at 19:09
  • 2
    The double backslash `'\\'` in code becomes a `'\'` character and is then passed to the RegEx engine, so `"[\\r\\n]"` in code becomes `[\r\n]` in memory and RegEx will process that. I don't know how exactly Java handles RegEx, but it is a good practice to pass a "pure" ASCII string pattern to the RegEx engine and let it process rather than passing binary characters. `"[\r\n]"` becomes (hex) `0D0A` in memory and one RegEx engine might accept it while another will choke. So the bottom line is that even if Java's flavour of RegEx doesn't need them, keep double slashes for compatibility – nurchi Sep 15 '16 at 17:31
10

All answers given here actually do not respect Java's definition of a new line as given in e.g. BufferedReader#readLine. Java accepts \n, \r and \r\n as a new line. Some of the answers match multiple empty lines or malformed files. E.g. <sometext>\n\r\n<someothertext> when using [\r\n]+ would result in two lines.

String lines[] = string.split("(\r\n|\r|\n)", -1);

In contrast, the expression above has the following properties:

  • it complies with Java's definition of a new line, such as the one BufferedReader uses
  • it does not match multiple new lines
  • it does not remove trailing empty lines
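The difference between the two patterns can be sketched on the input mentioned above. The class `TerminatorPatternDemo` and its method names are hypothetical, chosen only for this illustration:

```java
class TerminatorPatternDemo {
    // One terminator per match, trailing empty strings kept.
    static String[] splitStrict(String text) {
        return text.split("(\r\n|\r|\n)", -1);
    }

    // Any run of "\r" and "\n" characters is treated as one delimiter.
    static String[] splitGreedy(String text) {
        return text.split("[\r\n]+");
    }

    public static void main(String[] args) {
        String text = "sometext\n\r\nsomeothertext";
        // "\n" and "\r\n" are two separate terminators, so there is an empty line between them.
        System.out.println(splitStrict(text).length);
        // The character class swallows "\n\r\n" as a single delimiter.
        System.out.println(splitGreedy(text).length);
    }
}
```

For this input the strict pattern reports three lines (the middle one empty), while the greedy character class reports two.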
Till Schäfer
8

If, for some reason, you don't want to use String.split (for example, because of regular expressions) and you want to use functional programming on Java 8 or newer:

List<String> lines = new BufferedReader(new StringReader(string))
        .lines()
        .collect(Collectors.toList());
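For completeness, a self-contained version of the snippet with its imports might look like this. The class name `ReaderLinesDemo` and the `toLines` helper are assumptions made for the example; the try-with-resources block is an addition that closes the reader once the stream has been collected:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.stream.Collectors;

class ReaderLinesDemo {
    // Reads the text line by line; BufferedReader treats "\n", "\r" and "\r\n"
    // as line terminators.
    static List<String> toLines(String text) {
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            return reader.lines().collect(Collectors.toList());
        } catch (IOException e) {
            // Closing a reader declares IOException; rethrow it unchecked.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toLines("a\r\nb\rc\nd"));
    }
}
```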
Danilo Piazzalunga
  • I know this may be an overkill solution. – Danilo Piazzalunga Mar 07 '18 at 19:52
  • 3
    Or `String[] lines = new BufferedReader(...).lines().toArray(String[]::new);` for an array instead of a list. The nice thing about this solution is that `BufferedReader` knows about all kinds of line terminators, so it can handle text in all sorts of formats. (Most of the regex-based solutions posted here fall short in this regard.) – Ted Hopp Apr 25 '18 at 03:48
  • 3
    This solution is obsolete since Java 11 and the introduction of the String.lines() method. – leventov Oct 04 '18 at 00:22
7

Maybe this would work:

Remove the double backslashes from the parameter of the split method:

split = docStr.split("\n");
Michael
  • 8
    Not really. When you write a regex in the form of a Java String literal, you can use "\n" to pass the regex compiler a linefeed symbol, or "\\n" to pass it the escape sequence for a linefeed. The same goes for all the other whitespace escapes except \v, which isn't supported in Java literals. – Alan Moore Jan 18 '09 at 20:55
  • 3
    @Yuval. Sorry that is incorrect, you don't need it at all "Backslashes, escapes, and quoting" http://docs.oracle.com/javase/1.4.2/docs/api/java/util/regex/Pattern.html – angryITguy Dec 05 '11 at 22:10
4

For preserving empty lines from getting squashed use:

String lines[] = String.split("\\r?\\n", -1);
sevenforce
3

The above code doesn't actually do anything visible; it just calculates and then discards the result. Is it the code you used, or just an example for this question?

Try doing textAreaDoc.insertString(int, String, AttributeSet) at the end?

Chii
  • insertUpdate() is a DocumentListener method. Assuming the OP is using it right, trying to modify the document from within the listener method will generate an exception. But you're right: the code in that question doesn't actually do anything. – Alan Moore Jan 18 '09 at 17:55
2

As an alternative to the previous answers, Guava's Splitter API can be used if other operations are to be applied to the resulting lines, like trimming lines or filtering empty lines:

import com.google.common.base.Splitter;

Iterable<String> split = Splitter.onPattern("\r?\n").trimResults().omitEmptyStrings().split(docStr);

Note that the result is an Iterable and not an array.

Thomas Naskali
2

The above answers did not help me on Android; Pshemo's response is what worked for me there. I will leave part of Pshemo's answer here:

split("\\\\n")
clasher
1

String lines[] = String.split(System.lineSeparator());

husayt
1

After failed attempts with all of the given solutions, I replaced \n with a special word and then split on it. For me the following did the trick:

// Assumes the text contains the two literal characters '\' and 'n', not a real line feed.
String article = "Alice phoned\\n bob.";
article = article.replace("\\n", " NEWLINE ");
String[] sen = article.split(" NEWLINE ");

I couldn't replicate the example given in the question. But, I guess this logic can be applied.

kravi
1

There is a new boy in town, so you need not deal with all the above complexities. From JDK 11 onward, you just need to write a single line of code; it splits the lines and returns a Stream of String.

import java.util.stream.Stream;

public class MyClass {
    public static void main(String[] args) {
        Stream<String> lines = "foo \n bar \n baz".lines();
        // Do whatever you want to do with lines
    }
}

Some references. https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/String.html#lines() https://www.azul.com/90-new-features-and-apis-in-jdk-11/

I hope this will be helpful to someone. Happy coding.

Red Boy
0
  • Try this; I hope it is helpful for you:

 String split[], docStr = null;
Document textAreaDoc = (Document)e.getDocument();

try {
    docStr = textAreaDoc.getText(textAreaDoc.getStartPosition().getOffset(), textAreaDoc.getEndPosition().getOffset());
} catch (BadLocationException e1) {
    // TODO Auto-generated catch block
    e1.printStackTrace();
}

split = docStr.split("\n");
Vishal Yadav
0

There are three different conventions (it could be said that those are de facto standards) to set and display a line break:

  • carriage return + line feed
  • line feed
  • carriage return

In some text editors, it is possible to exchange one for the other:

Notepad++

The simplest thing is to normalize to line feed and then split.

final String[] lines = contents.replace("\r\n", "\n")
                               .replace("\r", "\n")
                               .split("\n", -1);
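As a rough, runnable sketch of this normalize-then-split approach (the class name `NormalizeThenSplitDemo`, the `toLines` helper, and the sample text are made up for illustration):

```java
import java.util.Arrays;

class NormalizeThenSplitDemo {
    // Normalizes every line break to "\n", then splits, keeping empty lines.
    static String[] toLines(String contents) {
        return contents.replace("\r\n", "\n") // Windows pairs first
                       .replace("\r", "\n")   // then any remaining lone "\r"
                       .split("\n", -1);
    }

    public static void main(String[] args) {
        // Mixed terminators, including an empty line between "c" and "d".
        System.out.println(Arrays.toString(toLines("a\r\nb\rc\n\nd")));
    }
}
```

The order of the two `replace` calls matters: replacing a lone `\r` first would turn each `\r\n` pair into two line feeds and so introduce spurious empty lines.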
Paul Vargas
-1
package in.javadomain;

public class JavaSplit {

    public static void main(String[] args) {
        String input = "chennai\nvellore\ncoimbatore\nbangalore\narcot";
        System.out.println("Before split:\n");
        System.out.println(input);

        String[] inputSplitNewLine = input.split("\\n");
        System.out.println("\n After split:\n");
        for(int i=0; i<inputSplitNewLine.length; i++){
            System.out.println(inputSplitNewLine[i]);
        }
    }

}
bobble bubble
Naveen
  • This pales in comparison to the other answers, which are more explanatory and less code-heavy. Could you explain what it is you're accomplishing with this code, and why it would make a suitable answer? – Makoto May 19 '14 at 00:24
  • 2
    This has nothing to do with splitting a file into lines. Consider removing your answer. – Martin Dec 11 '14 at 08:47