How to remove space delimited columns in .txt

Question

I have a big space-delimited .txt file (about 50 MB), and its structure looks like this. I want to get rid of the first 8 space-delimited columns.

L1045 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ They do not!
L1044 +++$+++ u2 +++$+++ m0 +++$+++ CAMERON +++$+++ They do to!
L985 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ I hope so.
L984 +++$+++ u2 +++$+++ m0 +++$+++ CAMERON +++$+++ She okay?
L925 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ Let's go.
L924 +++$+++ u2 +++$+++ m0 +++$+++ CAMERON +++$+++ Wow
L872 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ Okay -- you're gonna need to learn how to lie.

Desired output (in .txt):

They do not!
They do to!
I hope so.
She okay?
...

How can I do it in Python 2.7 or 3.4 (please specify the version), in R, or with the Linux command line? Thank you!

Possible duplicate of [Using awk to print all columns from the nth to the last](http://stackoverflow.com/questions/2961635/using-awk-to-print-all-columns-from-the-nth-to-the-last). It's not exactly the same question because it says 'awk' instead of 'Python', but it pretty much is the same, and the answers cover many ways of doing that with Linux command line tools. — TessellatingHeckler, Nov 19 '15 at 02:30
In R, `sub("^[+]+ ", "", data.table::fread(filename, sep = "$", header = FALSE, select = 5)[[1L]])` works for me and might be sufficiently efficient. — Rich Scriven, Nov 19 '15 at 02:46
@BenBolker - I thought about it, but it seems a bit pointless when one could just read in the entire file as a single column then `gsub` out the unwanted parts. — Rich Scriven, Nov 19 '15 at 04:17
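
Rich Scriven's comment treats the `+++$+++` marker, not the single space, as the real separator. A hypothetical Python 3 sketch of that idea, assuming every line joins exactly five fields with ` +++$+++ ` (one space on each side); the names `last_field` and `convert` and the latin-1 encoding choice are illustrative assumptions:

```python
DELIM = " +++$+++ "  # assumed: five fields joined by " +++$+++ "


def last_field(line, delim=DELIM, n_fields=5):
    """Return the final field of a line split on delim, or '' if the line is short.

    With maxsplit = n_fields - 1 the last item is everything after the
    (n_fields - 1)th delimiter, so the dialogue is never split further.
    """
    parts = line.rstrip("\r\n").split(delim, n_fields - 1)
    return parts[-1] if len(parts) == n_fields else ""


def convert(in_path, out_path):
    """Write the last field of every line of in_path to out_path."""
    # latin-1 maps each byte to one character and back, so arbitrary input
    # bytes survive the round trip (apart from line endings, which text mode
    # normalises to \n) even if the file's real encoding is unknown.
    with open(in_path, encoding="latin-1") as src:
        with open(out_path, "w", encoding="latin-1") as dst:
            for line in src:
                dst.write(last_field(line) + "\n")
```

Calling `convert("in.txt", "out.txt")` (hypothetical file names) would write one dialogue line per input line. Splitting on the full marker rather than on spaces means that, if a character name ever contained a space, the columns would not shift.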

Ben Bolker · Accepted Answer · score 8 · 2015-11-19T02:47:10.647

On my Linux system (Ubuntu 12.04) this works fine:

cut -f 9- -d " " tmp.tmp >newfile.out

`-f 9-` selects fields 9 onwards; `-d " "` sets the field delimiter to a single space.

My guess would be that this is pretty fast, since `cut` is a tool built for exactly this purpose. It could probably be done in a couple of lines of Python, though that might be a little slower; doing it in R would probably be slow/inefficient.
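
For comparison, a hypothetical Python 3 sketch of the same field split that `cut -d " " -f 9-` performs: split each line on single spaces at most 8 times and keep whatever is left. It reads in binary mode, as `cut` works on bytes, so no encoding has to be guessed. The names `drop_fields` and `strip_file` are illustrative, and no speed comparison with `cut` is implied.

```python
def drop_fields(line, n=8, sep=b" "):
    """Return everything after the first n sep-delimited fields of a bytes line.

    Splitting at most n times leaves field n + 1 onward as one item with its
    own spacing untouched. Lines with fewer than n + 1 fields yield b"".
    """
    parts = line.split(sep, n)
    return parts[n] if len(parts) > n else b""


def strip_file(in_path, out_path, n=8):
    """Stream in_path to out_path, dropping the first n fields of every line."""
    with open(in_path, "rb") as src, open(out_path, "wb") as dst:
        for line in src:  # one line at a time; the whole file is never loaded
            body = line.rstrip(b"\r\n")  # endings are rewritten as \n below
            dst.write(drop_fields(body, n) + b"\n")
```

`strip_file("tmp.tmp", "newfile.out")` uses the same file names as the `cut` command above. Splitting on a single space, rather than calling `split()` with no argument, keeps runs of spaces inside the dialogue intact, just as `cut` does.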

edited Nov 19 '15 at 02:47

answered Nov 19 '15 at 02:19

Thank you! It works for printing the results. Could you also answer how to save the output to a .txt file? Thanks! – Meilin Nov 19 '15 at 02:39

@Meilin usually you just add `> filename` at the end of the command to redirect the output to a file – Rich Scriven Nov 19 '15 at 02:42

score 2 · Answer 2 · edited Nov 19 '15 at 03:44

An R approach:

txt <- "L1045 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ They do not!
L1044 +++$+++ u2 +++$+++ m0 +++$+++ CAMERON +++$+++ They do to!
L985 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ I hope so.
L984 +++$+++ u2 +++$+++ m0 +++$+++ CAMERON +++$+++ She okay?
L925 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ Let's go.
L924 +++$+++ u2 +++$+++ m0 +++$+++ CAMERON +++$+++ Wow
L872 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ Okay -- you're gonna need to learn how to lie."

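txt_obj <- readLines(textConnection(txt))  # for a file on disk: readLines("in.txt")
txt8 <- gsub("^(([^ ]+[ ]){8})", "", txt_obj)  # drop the first 8 space-terminated fields
txt8
# writeLines(txt8, "out.txt") would save the result to a file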
#----------
[1] "They do not!"                                  
[2] "They do to!"                                   
[3] "I hope so."                                    
[4] "She okay?"                                     
[5] "Let's go."                                     
[6] "Wow"                                           
[7] "Okay -- you're gonna need to learn how to lie."
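
A hypothetical Python 3 translation of the same regular expression: anchored with `^`, the group `([^ ]+ ){8}` matches exactly eight runs of non-space characters, each followed by one space. The name `drop_first_fields` is illustrative only.

```python
import re

# ^ anchors at the start of the line; (?:[^ ]+ ){8} is eight
# "run of non-spaces + one space" groups, i.e. the first 8 fields.
FIRST_EIGHT = re.compile(r"^(?:[^ ]+ ){8}")


def drop_first_fields(lines):
    """Return a new list with the first 8 space-terminated fields removed from each line."""
    return [FIRST_EIGHT.sub("", line) for line in lines]
```

As with the R version, a line without eight space-terminated fields does not match and is returned unchanged.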

score 1 · Answer 3 · answered Nov 19 '15 at 02:38

This is easy to do with a Python slice; the code runs in both Python 2.7 and 3.x:

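with open('in_file') as in_f, open('out_file', 'w') as out_f:
    for line in in_f:  # stream line by line instead of building a list in memory
        if line.strip():  # skip blank lines
            # split() on whitespace, drop the first 8 fields, rejoin the rest;
            # note that this also collapses repeated spaces inside the dialogue
            out_f.write(' '.join(line.split()[8:]) + '\n')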

Casimir Crystal

score 0 · Answer 4 · answered Nov 19 '15 at 02:25

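This removes everything up to and including the last `+++` on each line, plus the blanks that follow it (`\+`, meaning "one or more", is a GNU sed extension to basic regular expressions):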

sed 's/.*+++[[:blank:]]\+//' file
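
A hypothetical Python 3 rendering of the same greedy pattern with the `re` module. Because `.*` is greedy, the match backtracks only as far as the last `+++` that is followed by blanks, which is the end of the final `+++$+++` separator. `[ \t]+` stands in for `[[:blank:]]\+`, since Python's `re` has no POSIX bracket classes; the name `after_last_marker` is illustrative.

```python
import re

# Greedy ".*" runs to the end, then backtracks to the LAST "+++" that is
# followed by one or more spaces/tabs (the end of the final separator).
MARKER = re.compile(r"^.*\+\+\+[ \t]+")


def after_last_marker(line):
    """Remove everything up to and including the last '+++' plus the blanks after it."""
    return MARKER.sub("", line, count=1)
```

Because the match is greedy, a line whose dialogue itself contained `+++` followed by a blank would lose that part of its text too; counting a fixed number of fields does not have that particular weakness.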

Avinash Raj