Trying to edit character items into a single sentence

Question

I'm trying to turn a sentence that my participants entered into something that looks like an actual sentence. Unfortunately, the software I am using records their exact response, key by key. This means that when they press the delete key, it shows up as "backspace" instead of actually deleting the previous item. Additionally, the space bar shows up as "space".

Here's an example of one of the responses:

sentence<-c("[g", "r", "a", "d", "u", "a", "t", "i", "o", "n", "space", 
"f", "r", "o", "m", "space", "g", "o", "backspace", "backspace", 
"c", "o", "l", "l", "e", "g", "e", "space]")

I need to create code that, when it sees a "backspace", deletes both the backspace command and the item before it.

Here is what I've tried:

letters <- NULL
for (j in 1:length(sentence)) {
  if (sentence[j] != "backspace") {
    letters[j] = sentence[j]
  }
  if (sentence[j] == "backspace") {
    letters[j] = letters[-j]
    letters[j-1] = NA
  }
}

This does not work, as it is only providing one output at a time, instead of recursively editing the entire vector. Any insight would be appreciated!

EDIT: The tricky part here is the case of double backspaces. How can I tell it to function essentially like a keyboard, and delete both the "g" and the "o" in the example? But have it be flexible enough to know that when it sees a single backspace, it should only delete the item before it (not two)?

In your example, why does `g` have to be removed as well as the letter `o`? I'm trying to get a clear view of what I'm supposed to achieve. — Anoushiravan R, May 18 '21 at 19:09
@AnoushiravanR The OP is trying to recreate what the actual result of pressing the given keystrokes would be. In this case, two backspaces would undo the previous two character strokes (o and g). — Aaron Montgomery, May 18 '21 at 19:10

Chris Ruehlemann · Answer 1 · 2021-05-19T07:17:25.207

Here's a one-liner based on regex:

gsub("(\\b\\w\\b\\s){1,}(backspace\\s){1,}", "", paste0(sentence, collapse = ' '))
[1] "[g r a d u a t i o n space f r o m space c o l l e g e space]"

Here the idea is to paste0 the separate items together into a single string and then remove each backspace along with the single letters preceding it. The removal relies on these assumptions about the data:

a deleted letter is a single alphanumeric character
each single-letter deletion is recorded by the string backspace
if several letters were deleted consecutively, then both the deleted letters on the one hand and the backspace strings on the other concatenate, that is, form 'blocks'
the number of items in these blocks is identical; i.e., 2 deletions --> 2 backspace strings, 3 deletions --> 3 backspace strings etc.
both deleted characters and backspace strings are separated by a single whitespace character

The regex picks up on these rules:

(\\b\\w\\b\\s){1,}: this matches an expression (i) consisting of any single word character \\w (a letter, digit, or underscore) surrounded by zero-width word boundaries \\b (which ensure the token is a single character and help avoid matching something like space), followed by a single whitespace \\s, and (ii) occurring one or more times
(backspace\\s){1,}: this matches, immediately following the above expression, the string backspace followed by one \\s, occurring one or more times
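
One caveat worth checking against your own data: the `{1,}` quantifiers are greedy and do not count, so the pattern removes the entire run of single characters in front of a backspace block, even when there are fewer backspaces than characters (for instance `c o l l e g e e backspace` inside one word). Below is a minimal sketch of a count-preserving variant, assuming tokens never contain spaces. `drop_backspaced` is a hypothetical helper name, not part of the answer above. It cancels exactly one token per backspace by repeatedly removing the leftmost token-plus-backspace pair.

```r
# Hypothetical helper (an assumption, not from the answer): one backspace cancels one token.
drop_backspaced <- function(keys) {
  # End every token with a space so the pattern can treat all tokens alike.
  s <- paste0(keys, " ", collapse = "")
  pair <- "[^ ]+ backspace "        # one token plus the backspace that erases it
  while (grepl(pair, s)) {
    s <- sub(pair, "", s)           # sub() replaces only the leftmost match
  }
  # A backspace with nothing in front of it erases nothing, so drop it.
  s <- sub("^(backspace )+", "", s)
  strsplit(s, " ", fixed = TRUE)[[1]]
}

keys <- c("t", "h", "e", "e", "backspace", "space", "e", "n", "d")
drop_backspaced(keys)
# "t" "h" "e" "space" "e" "n" "d"
```

Looping with `sub()` is slower than a single `gsub()`, but it keeps the one-to-one pairing between backspaces and erased keys.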

If you need the separations back split the string on whitespace:

unlist(strsplit(gsub("(\\b\\w\\b\\s){1,}(backspace\\s){1,}", "", paste0(sentence, collapse = ' ')), " "))
 [1] "[g"     "r"      "a"      "d"      "u"      "a"      "t"      "i"      "o"      "n"      "space"  "f"     
[13] "r"      "o"      "m"      "space"  "c"      "o"      "l"      "l"      "e"      "g"      "e"      "space]"
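
Since the stated goal is a readable sentence rather than a token vector, a further step can map the remaining tokens to text. The sketch below assumes that "space" always stands for the space bar and that the "[" and "]" attached to the first and last tokens only mark the start and end of a response. Both are guesses from the single example shown, and `to_sentence` is a hypothetical helper name.

```r
# Hypothetical helper: turn cleaned key tokens into one string.
to_sentence <- function(tokens) {
  n <- length(tokens)
  tokens[1] <- sub("^\\[", "", tokens[1])   # strip the assumed start marker
  tokens[n] <- sub("\\]$", "", tokens[n])   # strip the assumed end marker
  tokens[tokens == "space"] <- " "          # the space bar becomes a real space
  trimws(paste(tokens, collapse = ""))      # join and drop the trailing space
}

cleaned <- c("[g", "r", "a", "d", "u", "a", "t", "i", "o", "n", "space",
             "f", "r", "o", "m", "space",
             "c", "o", "l", "l", "e", "g", "e", "space]")
to_sentence(cleaned)
# "graduation from college"
```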

This looks very nice (+1). I'm somewhat embarrassed to ask, but: how does this work? What in the regex syntax above "detects" the number of consecutive backspaces and behaves accordingly? Certainly it's the `{1, }` uses, but I'm not quite sure how and I'd like to understand it. — Aaron Montgomery, May 18 '21 at 20:27
Hi @AaronMontgomery Sorry for the late reply. Have added some explanations to the post. Hope this is the sought clarification — Chris Ruehlemann, May 19 '21 at 07:09
What kind of experiment were you conducting? I'd be glad to learn more about it. — Chris Ruehlemann, May 19 '21 at 07:10
This certainly deserves more than an upvote. Mr. @ChrisRuehlemann, do you have any suggestions on how to start learning `regex`? I mean beyond the very basic principles. — Anoushiravan R, May 19 '21 at 08:19
Thanks for this. I couldn't agree more: regex is powerful. There is a dedicated community effort page here: https://stackoverflow.com/questions/22937618/reference-what-does-this-regex-mean. And, with all due humility, I describe regex a lot in my R book: https://benjamins.com/catalog/z.228 — Chris Ruehlemann, May 19 '21 at 09:57
Thanks, @ChrisRuehlemann -- thoughtful explanations like this are the best thing about the SO network. — Aaron Montgomery, May 19 '21 at 14:16

Anoushiravan R · Answer 2 · 2021-05-18T19:35:51.027

Updated: I modified your sentence vector so that it contains 4 backspaces, to test index detection and removal in more than one place. This solution may look a bit verbose and odd, but I think it does the job. I collect the indices of all backspaces in a matrix. Following your example, I assume the backspaces always occur in consecutive pairs, so each row holds one pair. I then add the 2 indices preceding each pair, since those characters must be removed too, and finally drop all of these indices from the sentence:

library(dplyr)

indices <- matrix(grep("backspace", sentence, value = FALSE), ncol = 2, byrow = TRUE)
colnames(indices) <- c(3, 4)

# This gives us our primary index matrix to start with
      3  4
[1,]  7  8
[2,] 19 20

indices %>%
  as_tibble() %>%
  mutate(`1` = `3` - 2, 
         `2` = `3` - 1) %>%
  relocate(`1`, `2`) %>%
  as.matrix() -> indices

# Then we complete our matrix by all the indices that should be removed
      1  2  3  4
[1,]  5  6  7  8
[2,] 17 18 19 20

sentence <- sentence[-c(indices)]
sentence

 [1] "[g"     "r"      "a"      "d"      "o"      "n"      "space"  "f"      "r"      "o"     
[11] "m"      "space"  "c"      "o"      "l"      "l"      "e"      "g"      "e"      "space]"

Data

c("[g", "r", "a", "d", "u", "a", "backspace", "backspace", "o", 
"n", "space", "f", "r", "o", "m", "space", "g", "o", "backspace", 
"backspace", "c", "o", "l", "l", "e", "g", "e", "space]")

OP wanted the backspaces removed as well as the characters that should have been backspaced out, so the desired length is 24 for this example. — Aaron Montgomery, May 18 '21 at 19:07

Aaron Montgomery · Answer 3 · 2021-05-18T19:37:12.190

First, a note unrelated to your solution: letters is already a built-in vector in R (the lowercase alphabet), so it's probably better to give your vector some other name.

The issue with your attempt is the negative indexing: when you write letters[-j], R interprets this as "all elements of the vector letters except for the one in the jth position". That must not quite be what you wanted, because you attempted to shove that entire long-ish vector into letters[j], which is a vector of length 1.
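
A small illustration of that behavior, using made-up vectors `x` and `y` rather than the data from the question:

```r
x <- c("a", "b", "c", "d")

x[-2]        # every element except the 2nd: "a" "c" "d"
x[-(3:4)]    # every element except the 3rd and 4th: "a" "b"

# Assigning a multi-element vector to a single position:
y <- c("a", "b", "c")
y[3] <- y[-3]   # R warns and stores only the first element, "a"
y               # "a" "b" "a"
```

So a negative index never deletes anything in place. To drop an element, the shortened vector has to be assigned back to the whole variable, as in `x <- x[-2]`.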

You could implement this with a while loop, for instance:

clean_backspace <- function(x){
  while(any(x == "backspace")){
    first <- min(which(x == "backspace"))
    x <- if(first == 1) {x[-1]} else {x[-(first - 0:1)]}
  }
  x
}

clean_backspace(sentence)

The idea is to look for the first instance of backspace, then remove both it and the entry before it.

EDIT: The old algorithm would choke fairly hard if the user pressed a backspace key first, which is bad. Updated to include a check to handle that case.
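
Because the while loop above rescans the whole vector once per backspace, its cost grows with the number of backspaces. As a sketch of an alternative (a different technique from the loop above, not this answer's method), the keystrokes can be replayed once, left to right, into a buffer that acts as a stack. `replay_keys` is a hypothetical name.

```r
# Hypothetical single-pass alternative to clean_backspace().
replay_keys <- function(keys) {
  out <- character(length(keys))  # preallocated buffer; at most length(keys) keys survive
  n <- 0L                         # number of keys currently "on screen"
  for (k in keys) {
    if (k == "backspace") {
      if (n > 0L) n <- n - 1L     # erase the last key; ignore backspace on an empty buffer
    } else {
      n <- n + 1L
      out[n] <- k                 # push the key, overwriting anything erased earlier
    }
  }
  out[seq_len(n)]
}

keys <- c("backspace", "g", "o", "backspace", "backspace", "c", "o", "x", "backspace")
replay_keys(keys)
# "c" "o"
```

Each key is visited exactly once, and a leading backspace is handled by the `n > 0L` check, so no special case is needed.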

@akrun had the right idea with subtracting `0:1` as a vector instead of retyping `min(which(...))` twice as I originally did, so I've incorporated that here. — Aaron Montgomery, May 18 '21 at 19:22
I'll also note that this solution is pretty slow and scales poorly with the number of backspaces pressed by the user. — Aaron Montgomery, May 18 '21 at 19:27

AnilGoyal · Answer 4 · 2021-05-19T09:41:00.190

If you can use dplyr, this handles runs of consecutive backspaces of any length. Acknowledgement: the strategy builds on hints from @Anoushiravan's answer. The steps are:

indices of every backspace are collected
converted to a dataframe with column say V1
check for consecutive indices
group_by on each consecutive group
subtract number of consecutive indices from each respective group
c() with original indices
deselect these indices from sentence

library(dplyr)

sentence[-c(grep("backspace", sentence) %>% as.data.frame() %>%
  setNames("V1") %>% group_by(V2 = cumsum(c(0, diff(V1)) != 1)) %>%
  mutate(V2 = V1 - n()) %>% pull(), grep("backspace", sentence))]

 [1] "[g"     "r"      "a"      "d"      "u"      "a"      "t"      "i"      "o"      "n"      "space"  "f"     
[13] "r"      "o"      "m"      "space"  "c"      "o"      "l"      "l"      "e"      "g"      "e"      "space]"

Check on a more complex case, say:

sentence2 <- c("[g", "r", "a", "d", "u", "a", "c", "backspace","t", "i", "o", "n", "space", 
            "f", "r", "o", "m", "space", "g", "o", "t","backspace", "backspace", "backspace",
            "c", "o", "l", "l", "e", "g", "e", "e", "e","backspace","backspace", "space]")


sentence2[-c(grep("backspace", sentence2) %>% as.data.frame() %>%
  setNames("V1") %>% group_by(V2 = cumsum(c(0, diff(V1)) != 1)) %>%
  mutate(V2 = V1 - n()) %>% pull(), grep("backspace", sentence2))]

#>  [1] "[g"     "r"      "a"      "d"      "u"      "a"      "t"      "i"     
#>  [9] "o"      "n"      "space"  "f"      "r"      "o"      "m"      "space" 
#> [17] "c"      "o"      "l"      "l"      "e"      "g"      "e"      "space]"

Created on 2021-05-19 by the reprex package (v2.0.0)

To better understand the above, inspect the intermediate result:

grep("backspace", sentence2) %>% 
  as.data.frame() %>%
  setNames("V1") %>% 
  group_by(V2 = cumsum(c(0, diff(V1)) != 1)) %>%
  mutate(V2 = V1 - n())

# A tibble: 6 x 2
# Groups:   V2 [6]
     V1    V2
  <int> <int>
1     8     7
2    22    19
3    23    20
4    24    21
5    34    32
6    35    33
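
The same grouping logic can also be sketched in base R. `drop_runs` is a hypothetical helper name, and the two guards (no backspaces at all, and backspaces with nothing in front of them) are additions that the pipeline above does not have. Like the pipeline, it pairs each run only with the keys directly in front of it, so a run that reaches back past an earlier backspace (e.g. `a b backspace c backspace backspace`) is not replayed faithfully.

```r
# Hypothetical base-R helper mirroring the steps above (name and guards are assumptions).
drop_runs <- function(keys) {
  bs <- which(keys == "backspace")            # indices of every backspace
  if (length(bs) == 0) return(keys)           # keys[-integer(0)] would return nothing
  run_id  <- cumsum(c(TRUE, diff(bs) != 1))   # label each run of consecutive backspaces
  run_len <- ave(bs, run_id, FUN = length)    # length of the run each backspace belongs to
  erased  <- bs - run_len                     # index of the key each backspace cancels
  erased  <- erased[erased > 0]               # ignore backspaces with nothing before them
  keys[-c(erased, bs)]
}

keys <- c("a", "c", "backspace", "b", "space",
          "g", "o", "t", "backspace", "backspace", "backspace", "c")
drop_runs(keys)
# "a" "b" "space" "c"
```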
