The problem I am solving is replacing all occurrences of one String inside another String.

I solved this problem fairly easily on codingbat.com by calling String.replaceAll repeatedly until the first String no longer contains the other String.

However, I dislike this method as it is very slow. I have tried searching this website for more efficient methods, and came across these questions:

Fastest way to perform a lot of strings replace in Java

String.replaceAll is considerably slower than doing the job yourself

They solved the problem by using StringUtils and Patterns. I still think these methods are too slow!

When I code problems like these, I like to get my runtime under two seconds with Java. I'm testing this with a String of 1,000,000 characters. String.replaceAll went well over two seconds, and so did the other two methods.

Does anyone have a fast solution for this problem? Thanks!

EDIT: Unfortunately, the answers I received still run too slowly. And yes, I meant making a new String, not changing the old one; sorry for that mistake.

I'm not sure how it would work, but I think looping over each char and checking might work. Something with algorithms.

Jim Smith

5 Answers

1

Strings are immutable, so you can't remove anything from them; you need to create a new String without the parts you want removed. That is pretty much what String.replace does: it creates a new String.

Beware of String.replaceAll since it uses a regular expression that gets compiled every time you call it (so never use it in a long loop). This is likely your problem.

If you need to use regular expressions, use the Pattern class to compile your regex and reuse the instance to create a new Matcher for each string you process. If you don't reuse your Pattern instance, it is going to be slow.
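As a minimal sketch of that advice, the Pattern below is compiled once and reused for every input. The class name `PatternReuse`, the method `removeOnce`, and the search text "sun" are illustrative assumptions, not part of the question.

```java
import java.util.regex.Pattern;

public class PatternReuse {
    // Compiled once and reused; Pattern.quote makes the search text a literal.
    private static final Pattern SUN = Pattern.compile(Pattern.quote("sun"));

    // Removes every non-overlapping occurrence found in a single pass.
    static String removeOnce(String input) {
        return SUN.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(removeOnce("a sunny sunday"));
    }
}
```

Note that this does a single pass only; occurrences that form after a removal would need another pass.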

If you don't need a regular expression, StringUtils has a replaceEach() that does not rely on regular expressions.

If you are processing a large String, you may want to do things in a streaming fashion: loop over the characters and copy them over to a StringBuilder.
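One way the character loop could look is sketched below, using the StringBuilder as a stack: each character is appended, and whenever the builder ends with the target, that suffix is dropped. The names `SinglePassRemove`, `removeAll`, and `endsWith` are hypothetical. This is only a sketch under the assumption that the goal is a result with no remaining occurrence of the target; for some patterns the result may differ from calling replaceAll repeatedly.

```java
public class SinglePassRemove {
    // Removes occurrences of target, including ones that only appear
    // after an earlier removal joins the surrounding characters.
    static String removeAll(String input, String target) {
        if (input == null || target == null || target.isEmpty()) {
            return input;
        }
        int n = target.length();
        char last = target.charAt(n - 1);
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            out.append(c);
            // Only compare the suffix when the last character matches.
            if (c == last && out.length() >= n && endsWith(out, target)) {
                out.setLength(out.length() - n); // drop the matched suffix
            }
        }
        return out.toString();
    }

    // True if the builder currently ends with target.
    private static boolean endsWith(StringBuilder sb, String target) {
        int offset = sb.length() - target.length();
        for (int j = 0; j < target.length(); j++) {
            if (sb.charAt(offset + j) != target.charAt(j)) {
                return false;
            }
        }
        return true;
    }
}
```

The input is read once and the output is written once; the suffix comparison costs at most the target length per character, so for a short target the work grows roughly linearly with the input size.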

Alternatively, you could use a regular expression to search for a particular pattern in the String and loop over the matches it finds and for each match append everything from the previous match to the current match to a StringBuilder.
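The match loop described above might be sketched as follows. The class `MatchLoop` and method `replaceLiteral` are hypothetical names, and the sketch assumes a non-empty literal target (hence Pattern.quote).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchLoop {
    // Copies the text between matches into a StringBuilder and
    // appends the replacement in place of each match.
    static String replaceLiteral(String input, String target, String replacement) {
        Matcher m = Pattern.compile(Pattern.quote(target)).matcher(input);
        StringBuilder out = new StringBuilder(input.length());
        int last = 0; // end of the previous match
        while (m.find()) {
            out.append(input, last, m.start()); // text since the previous match
            out.append(replacement);
            last = m.end();
        }
        out.append(input, last, input.length()); // tail after the final match
        return out.toString();
    }
}
```

Passing an empty replacement turns this into a removal.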

Jilles van Gurp
  • I agree. The fastest way to do this is of course to write some native code (JNI). Failing that, loop through the chars and maintain a buffer of found matching characters etc. Though I'm not sure exactly how much of a speed increase even this would produce. – Richard Feb 20 '15 at 16:43
  • I wouldn't expect any dramatic improvements from that. Rather the overhead of using JNI might actually outweigh any benefits. – Jilles van Gurp Feb 20 '15 at 16:57
1

The problem is that your String is enormous. You only want to move or copy it once, and all the solutions that use multiple calls to replace will still end up doing an enormous amount of unnecessary work.

What you really want to use is Apache StringUtils.replaceEachRepeatedly, as that method handles searching for multiple strings while only building the result string once.

Emily Crutcher
0

Apart from the time that each method (replace, StringUtils, Patterns, ...) takes, you only have one Thread doing the work.

If you can split the work done by that thread into two or more, for example by having each Thread process its own range of positions in the string, you may get a faster solution.

The tricky part is dividing the work and then joining the results back together. That will depend, for example, on how you read the string and where you write the result in the end.

Regards,

0

I faced the same problem some time ago and came across this post: Replace all occurrences of a String using StringBuilder?

Using the implementation given in the post:

public static void main(String[] args) {
    String from = "A really long string full of ands and ors";
    String replaceFrom = "and";
    String replaceTo = "or";

    long initTime = System.nanoTime();
    String result1 = from.replace(replaceFrom, replaceTo);
    System.out.println("Time1: " + (System.nanoTime() - initTime));

    System.out.println(result1);

    StringBuilder sb1 = new StringBuilder(from);
    initTime = System.nanoTime();
    replaceAll(sb1, replaceFrom, replaceTo);
    System.out.println("Time2: " + (System.nanoTime() - initTime));

    System.out.println(sb1.toString());
}

// From https://stackoverflow.com/questions/3472663/replace-all-occurences-of-a-string-using-stringbuilder
public static void replaceAll(StringBuilder builder, String from, String to) {
    int index = builder.indexOf(from);
    while (index != -1) {
        builder.replace(index, index + from.length(), to);
        index += to.length(); // Move to the end of the replacement
        index = builder.indexOf(from, index);
    }
}

The second solution performs better because it relies on StringBuilder, a mutable object, rather than on String, an immutable one. See Immutability of Strings in Java for a better explanation.

This solution works with both StringBuffer and StringBuilder. However, as explained in Difference between StringBuilder and StringBuffer, StringBuffer is synchronized and StringBuilder is not, so if you don't need synchronisation you are better off using StringBuilder.
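One caveat, offered as a hedge rather than a measurement: StringBuilder.replace has to shift everything after the match on each call, so on a very large input with many matches it may help to append to a fresh builder instead. A minimal sketch of that variant follows; the class name `AppendReplace` is an assumption, and this is not the implementation from the linked post.

```java
public class AppendReplace {
    // Builds the result in one left-to-right pass using indexOf,
    // appending unmatched text and replacements to a new builder.
    static String replaceAll(String source, String from, String to) {
        if (from.isEmpty()) {
            return source; // nothing sensible to search for
        }
        int index = source.indexOf(from);
        if (index == -1) {
            return source; // no match, no copy needed
        }
        StringBuilder out = new StringBuilder(source.length());
        int last = 0; // position just after the previous match
        while (index != -1) {
            out.append(source, last, index).append(to);
            last = index + from.length();
            index = source.indexOf(from, last);
        }
        return out.append(source, last, source.length()).toString();
    }
}
```

Each character of the source is copied at most once, regardless of how many matches there are.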

antonio
0

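I just tried this, which printed the following durations in nanoseconds (first for the removeAll method below, then for the repeated replaceAll loop):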

100960923

197642683484

import java.util.Stack;
public class Test {   
     
public static String removeAll(final String stringToModify, final String stringToFindAndRemove) {
    // Nothing to remove: return the input as-is (new String(null) would throw, and copying is unnecessary).
    if (stringToModify==null||stringToModify.length()==0) return stringToModify;
    if (stringToFindAndRemove==null||stringToFindAndRemove.length()==0) return stringToModify;
    if (stringToModify.length()<stringToFindAndRemove.length()) return stringToModify;
    int lastChar = 0;
    int buffPos=0;
    Stack<Integer>stack = new Stack<Integer>();
    char[] chars = stringToModify.toCharArray();
    char[] ref = stringToFindAndRemove.toCharArray();
    char[] ret = new char[chars.length];        
    for (int a=0;a<chars.length;a++) {
        if (chars[a]==ref[buffPos]) {
            if (buffPos==ref.length-1) {
                buffPos=0;
                stack.pop();
            } else {
                if (buffPos==0) stack.push(lastChar);                   
                buffPos++;
            }
        } else {
            if (buffPos!=0) {
                for (int b=0;b<buffPos;b++) {
                    ret[lastChar]=ref[b];
                    lastChar++;
                }
                a--;
                buffPos = 0;
            }  else {
                ret[lastChar]=chars[a];
                lastChar++;                 
            }
        }                   
        if (stack.size()>0&&(lastChar-stack.peek()>=ref.length)) {
            while(stack.size()>0 && (lastChar-stack.peek()>=ref.length)) {
                int top = stack.pop();
                boolean f = true;                   
                for (int foo=0;foo<ref.length;foo++) {
                    if (ret[top+foo]!=ref[foo]) {
                        f=false;
                        break;
                    }
                }
                if (f) lastChar=top;                    
            }
        }           
    }
    if (buffPos!=0) {
        for (int b=0;b<buffPos;b++) {
            ret[lastChar]=ref[b];
            lastChar++;
        }
    }
    char[] out = new char[lastChar];
    System.arraycopy(ret,0,out,0,lastChar);
    return new String(out);
}
    
    public static void main(final String[] args) {        
        StringBuffer s = new StringBuffer();
        StringBuffer un = new StringBuffer();       
        for (int a=0;a<100000;a++) {
            s.append("s");
            un.append("un");
        }
        StringBuffer h = new StringBuffer(s);
        h.append(un);
        h.append("m");
        String huge = h.toString();
        String t = "sun";
        long startTime = System.nanoTime();             
        String rep = removeAll(huge,t);
        long endTime = System.nanoTime();
        long duration = (endTime - startTime);
        //System.out.println(rep);
        System.out.println(duration);
        startTime = System.nanoTime();      
        rep = new String(huge);
        int pos = rep.indexOf(t);
        while (pos!=-1) {
            rep = rep.replaceAll(t,"");
            pos = rep.indexOf(t);
        }       
        endTime = System.nanoTime();
        duration = (endTime - startTime);
        //System.out.println(rep);
        System.out.println(duration);
    }
}

I'd be interested to see how fast this runs on someone else's machine. Because my boss thinks my machine is fast enough! :)

Richard
  • This doesn't work for the following situation: removeAll("susunnm", "sun") returns "susunnm". It should return "m". – Jim Smith Feb 22 '15 at 18:27
  • By jingo you're right :) Modified the code.. You were correct, it does take well over 2 seconds! WARNING : This code is not properly tested. As you can see the actual method finishes in under a second, but the repetitive string replace takes what seems to be an hour or so! – Richard Feb 23 '15 at 22:41