11

I have a string built from the user's keystrokes, so it might contain '\b' characters (backspaces).

I want to clean the string so that it contains neither the '\b' characters nor the characters they are meant to erase. For instance, the string:

String str = "\bHellow\b world!!!\b\b\b.";

Should be printed as:

Hello world.

I have tried a few things with replaceAll, and what I have now is:

System.out.println(str.replaceAll("^\b+|.\b+", ""));

Which prints:

Hello world!!.

A single '\b' is handled fine, but a run of several '\b' erases only one character instead of one per backspace.
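The behaviour can be reproduced with the minimal sketch below. The class and method names (`ReplaceAllAttempt`, `singlePass`) are assumptions made for illustration only. The pattern `.\b+` consumes one character together with the whole run of backspaces that follows it, which is why three backspaces remove a single `!`.

```java
class ReplaceAllAttempt {
    // Hypothetical helper: applies the pattern from the question in one pass.
    static String singlePass(String s) {
        // "\b" in a Java string literal is the backspace character (U+0008),
        // so the regex engine sees a literal backspace, not a word boundary.
        return s.replaceAll("^\b+|.\b+", "");
    }

    public static void main(String[] args) {
        String str = "\bHellow\b world!!!\b\b\b.";
        // The run "!\b\b\b" is removed as one match, so two '!' survive.
        System.out.println(singlePass(str)); // prints "Hello world!!."
    }
}
```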

So, can I solve it with Java's regex?

EDIT:

I have seen this answer, but it seems not to apply to Java's replaceAll.
Maybe I'm missing something with the verbatim string...

Elist

5 Answers

5

It can't be done in one pass unless there is a practical limit on the number of consecutive backspaces (which there isn't), and there is a guarantee (which there isn't) that there are no "extra" backspaces for which there is no preceding character to delete.

This does the job (it's only 2 small lines):

while (str.contains("\b"))
    str = str.replaceAll("^\b+|[^\b]\b", "");

This also handles inputs like "x\b\by", where the second backspace has nothing left to erase: once the first backspace consumes the x, the remaining one sits at the start of the string and is trimmed, leaving just "y".
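For anyone who wants to try this out, here is a self-contained sketch of the loop wrapped in a method. The class and method names (`BackspaceLoop`, `eraseBackspaces`) are assumptions for illustration; the loop body is the same two lines as above.

```java
class BackspaceLoop {
    // Repeats the replacement until no backspace is left.
    // Each pass removes leading backspaces and every
    // "non-backspace followed by backspace" pair it can find.
    static String eraseBackspaces(String str) {
        while (str.contains("\b")) {
            str = str.replaceAll("^\b+|[^\b]\b", "");
        }
        return str;
    }

    public static void main(String[] args) {
        System.out.println(eraseBackspaces("\bHellow\b world!!!\b\b\b.")); // prints "Hello world."
        System.out.println(eraseBackspaces("x\b\by"));                     // prints "y"
        System.out.println(eraseBackspaces("ab\b\bc"));                    // prints "c"
    }
}
```

The loop always terminates: as long as the string contains a backspace, the first one is either at the start of the string or preceded by a non-backspace character, so every pass removes at least one.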

Bohemian
  • Thanks, I will adopt this approach. Will accept this answer (if no one else has a magical pure Regex to beat this...) – Elist May 11 '15 at 17:19
  • 1
    It just needs a simple fix. If the input is `\bbbbbHellow\b world!!!\b\b\b.`, the result is `Hello world!!.`, which I think is not expected; it should give an output of `bbbbHello world!!.` instead. Just remove the quantifier at the beginning or make `\b` a group. – Garis M Suero May 11 '15 at 17:25
  • @GarisMSuero - Your example prints `bbbbHello world.` as expected. – Elist May 11 '15 at 17:28
  • @GarisMSuero I am not sure what you mean. Your example produces `bbbbHello world.` which seems to be valid http://ideone.com/SlCuW7 – Pshemo May 11 '15 at 17:28
  • @Pshemo @elist Sorry about my confusion then. I still don't know what's the use of the quantifier `+` in this case. – Garis M Suero May 11 '15 at 17:32
  • @GarisMSuero It is to eliminate series of backspaces `\b\b\b...` placed at start of string which don't have any non `\b` character before them, so they can't be matched by `[^\b]\b`. This solution can be rewritten also as `str = str.replaceAll("[^\b]?\b", "");` which maybe will be easier to read for some people. – Pshemo May 11 '15 at 17:38
4

The problem you are trying to solve can't be solved with a single regular expression. The reason is that the language {any_symbol}*{any_symbol}^n{\b}^n (which is a special case of your input) isn't regular. You would need to store state somewhere (how many symbols precede the run of \b, and how many \b have been read), and a DFA can't do that, because it can't know how many consecutive \b it will encounter. All the proposed solutions are just regexes tailored to your example ("\bHellow\b world!!!\b\b\b.") and can easily be broken by a more complicated test.

The easiest solution for your case is to replace, in a loop, each pair of the form {any character except \b}{\b}.
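To illustrate why one fixed pass is not enough, the sketch below counts how many passes of the pair replacement are needed for an input of n characters followed by n backspaces. The names `PassCounter`, `passesNeeded` and `nested` are assumptions for illustration. Only the innermost pair is adjacent on each pass, so the number of passes grows with n.

```java
class PassCounter {
    // Repeats the pair replacement until the string stops changing
    // and returns how many passes actually changed something.
    static int passesNeeded(String s) {
        int passes = 0;
        while (true) {
            String next = s.replaceAll("[^\b]\b", "");
            if (next.equals(s)) {
                return passes;
            }
            s = next;
            passes++;
        }
    }

    // Builds n letters followed by n backspaces, e.g. "aaa\b\b\b" for n = 3.
    static String nested(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) sb.append('a');
        for (int i = 0; i < n; i++) sb.append('\b');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(passesNeeded(nested(1)));  // prints 1
        System.out.println(passesNeeded(nested(3)));  // prints 3
        System.out.println(passesNeeded(nested(10))); // prints 10
    }
}
```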

UPD: The solution proposed by @Bohemian seems perfectly correct.

UPD 2: It seems that Java's regexes can handle more than regular languages: with recursive lookahead they can also match inputs like {a}^n{b}^n, so in Java it is possible to match those groups with a single regex. Thanks to @Pshemo for the comments and @Elist for the edits!

qwwdfsad
  • 1
    I suspect that this can be done with regex, but this regex would be extremely unreadable so it would be better to create our own parser. – Pshemo May 11 '15 at 17:05
  • I'm referring again to the C# example referred to in my edit: http://stackoverflow.com/a/16604714/1609201. Is there an analogue in Java? If not, what's the difference in Regex features between the two languages? – Elist May 11 '15 at 17:17
  • Not even with this Java regex: https://stackoverflow.com/questions/3644266/how-can-we-match-an-bn-with-java-regex which is about a^n b^n? – Pshemo May 11 '15 at 17:19
  • Seems that I wasn't aware of all java regex possibilities. It looks like java's regexes can parse not only regular languages and can match inputs like a^n b^n with recursive lookahead, so answer is "It's possible" (but be aware of StackOverflowError for large inputs) – qwwdfsad May 11 '15 at 17:23
4

This looks like a job for a Stack!

Stack<Character> stack = new Stack<Character>();

// for-each character in the string
for (int i = 0; i < str.length(); i++) {
    char c = str.charAt(i);

    // push if it's not a backspace
    if (c != '\b') {
        stack.push(c);
    // else pop if possible
    } else if (!stack.empty()) {
        stack.pop();
    }
}

// convert stack to string
StringBuilder builder = new StringBuilder(stack.size());

for (Character c : stack) {
    builder.append(c);
}

// print it
System.out.println(builder.toString());

Regex, while nice, isn't well suited to every task. This approach is not as concise as Bohemian's, but it is more efficient. Using a stack is O(n) in every case, while a regex approach like Bohemian's is O(n²) in the worst case.
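A possible variant of the same idea, sketched here as an assumption rather than as part of the original answer, uses a `StringBuilder` as the stack, which avoids boxing each `char` and the final conversion loop. The class and method names (`BackspaceBuilder`, `eraseBackspaces`) are hypothetical.

```java
class BackspaceBuilder {
    // Single left-to-right pass: append ordinary characters,
    // and drop the last appended character on each backspace.
    static String eraseBackspaces(String str) {
        StringBuilder out = new StringBuilder(str.length());
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if (c != '\b') {
                out.append(c);
            } else if (out.length() > 0) {
                // "pop": shorten the builder by one character
                out.setLength(out.length() - 1);
            }
            // a backspace with nothing before it is simply ignored
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(eraseBackspaces("\bHellow\b world!!!\b\b\b.")); // prints "Hello world."
        System.out.println(eraseBackspaces("x\b\by"));                     // prints "y"
    }
}
```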

Luke
  • 1
    Obviously, a stack is the ultimate solution here, but I was looking for a quick and 'inline' way to solve this. Also, I have learned a few nice regex tricks... – Elist May 11 '15 at 20:44
  • 1
    @Luke and Elist, I like this solution, but when just "getting things working", it's amazing what you can do with regex in almost no code - it's a skill worth learning. And it performs OK too - sure not nanosecond fast, but a typical call to `replaceAll()` will only take a few microseconds; it's "fast enough" and you can quickly get on with doing the rest of the code and revisit it later if you need to squeeze more performance out of your app. – Bohemian May 11 '15 at 23:45
0

If I understand the question correctly, this is the solution to your question:

String str = "\bHellow\b world!!!\b\b\b.";
System.out.println(str.replace(".?\\\b", ""));
ioseb
0

This has been a nice riddle. I think you can use a regex to remove the same number of identical repeated characters and \bs (i.e. for your particular input string):

String str = "\bHellow\b world!!!\b\b\b.";
System.out.println(str.replaceAll("^\b+|(?:([^\b])(?=\\1*+(\\2?+\b)))+\\2", ""));

This is an adaptation of "How can we match a^n b^n with Java regex?".

See the IDEONE demo, where I added .replace("\b", "<B>") to see if there are any \bs left.

Output:

Hello world.

A generic regex-only solution is outside of regex scope... for now.

Wiktor Stribiżew
  • The String itself contains '.', the pattern doesn't – Elist May 11 '15 at 16:57
  • Yes, that is why I removed it from the pattern. – Wiktor Stribiżew May 11 '15 at 16:58
  • Interesting, but still prints Hellow\b world. in my console – Elist May 11 '15 at 16:59
  • Try changing the `!` in the input string to `x` and see what happens. (-1) – Bohemian May 11 '15 at 17:02
  • 1
    @Bohemian: I have edited the answer with a modification of the "a^n b^n" regex. – Wiktor Stribiżew May 11 '15 at 21:23
  • @stribizhev you are totally missing the point. Your regex does not work for the general case, only a narrow edge case. See this [IDEONE](http://ideone.com/zCPY7K) fork of your code (with only the input changed) showing it failing. I don't believe it can be realistically done in one call, because the quantity of consecutive backspaces and their combinations with letters are effectively infinite. A repetitive solution, such as [mine](http://stackoverflow.com/a/30173614/256196), is the only way – Bohemian May 11 '15 at 23:33