-2

I have a text file with the following contents:

19810101 20
19810102 31
19810103 1
19810701 1
19811105 5

I want to split the date into year, month and day, like this, and save the result as a CSV file:

1981 01 01 20
1981 01 02 31
1981 01 03 1
1981 07 01 1
1981 11 05 5

Is there an easy way to do this in R, bash or awk?

I looked at similar posts, [1] Split a string every 5 characters and [2] Split into 3 character length, but these only apply to strings that all have the same length.

Liliputian
  • 3
    What did you try for yourself? – Inian Feb 18 '17 at 06:01
  • 2
    "Is there an easy way to do this in R, bash or awk?" If that is the question, then the answer is "yes". – anishsane Feb 18 '17 at 06:14
  • Why not parse the date? – alistaire Feb 18 '17 at 06:20
  • 1
    I think parsing the date would be overkill if the date is in such a standard format ("yyyymmdd")... `sed -r 's/(....)(..)(..)/\1 \2 \3/' file` is sufficient – anishsane Feb 18 '17 at 06:28
  • When you "save as a CSV file", do you expect any commas to be added, or do you keep the spaces? – Benjamin W. Feb 18 '17 at 06:33
  • Sorry, I was away for a while and didn't check my post. Before posting here, I read some similar posts, for example http://stackoverflow.com/questions/7452156/split-into-3-character-length and http://stackoverflow.com/questions/2247045/chopping-a-string-into-a-vector-of-fixed-width-character-elements, but these all apply to splitting strings of the same length. – Liliputian Feb 18 '17 at 11:06
  • Thank you all for your help. – Liliputian Feb 18 '17 at 11:06

4 Answers

2

We can use extract from tidyr, which is loaded as part of the tidyverse:

library(tidyverse)
extract(df1, v1, into = c("Year", "Month", "Day"), "(.{4})(.{2})(.{2})")

data

df1 <- structure(list(v1 = c(19810101L, 19810102L, 19810103L, 19810701L, 
 19811105L), v2 = c(20L, 31L, 1L, 1L, 5L)), .Names = c("v1", "v2"
), class = "data.frame", row.names = c(NA, -5L))
akrun
1

Input

$ cat f
19810101 20
19810102 31
19810103 1
19810701 1
19811105 5

Output

$ awk '{print substr($1,1,4),substr($1,5,2),substr($1,7),$2}' f
1981 01 01 20
1981 01 02 31
1981 01 03 1
1981 07 01 1
1981 11 05 5

For CSV

$ awk  '{print substr($1,1,4),substr($1,5,2),substr($1,7),$2}' OFS=, f
1981,01,01,20
1981,01,02,31
1981,01,03,1
1981,07,01,1
1981,11,05,5
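
To save the result rather than print it, the same command can be redirected to a file. The sketch below assumes the input file is named f as above; dates.csv is a hypothetical output name, and the guard that skips lines whose first field is not exactly eight digits is an addition, not part of the original command.

```shell
# Recreate the sample input from the question (file name f, as above).
printf '%s\n' '19810101 20' '19810102 31' '19810103 1' '19810701 1' '19811105 5' > f

# Split yyyymmdd into year, month and day; OFS=, joins the fields with commas.
# The guard (an assumption, not in the original) skips any line whose
# first field is not exactly 8 digits.
awk -v OFS=, 'length($1) == 8 && $1 ~ /^[0-9]+$/ {
    print substr($1, 1, 4), substr($1, 5, 2), substr($1, 7, 2), $2
}' f > dates.csv

# Show the result; dates.csv is a hypothetical output name.
cat dates.csv
```

Setting OFS with -v before the program has the same effect here as the trailing OFS=, assignment, since both take effect before the first line of f is read.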
Akshay Hegde
  • Please refrain from answering unless OP clearly shows his research towards solving their problem. – Inian Feb 18 '17 at 06:45
  • Wow, sorry Inian, I didn't read it fully. Should I delete my answer now? – Akshay Hegde Feb 18 '17 at 06:46
  • Just trying to maintain the _ethics_ of this community. The OP, being nearly 3 years on the site, should know better than to post without showing an effort. Refer to @anishsane's comment above: it already solves the problem, but he hasn't posted it as an answer; he is waiting for a proper effort to be shown. So kindly do likewise – Inian Feb 18 '17 at 06:48
  • @Inian : Sorry, I will take care of it from my next posts – Akshay Hegde Feb 18 '17 at 06:49
1

The following will work:

sed -r 's/([[:digit:]]{4})([[:digit:]]{2})([[:digit:]]{2})/\1 \2 \3/' lines.txt|tr ' ' , > newfile.csv

or

sed -r 's/(.{4})(.{2})(.{2})/\1 \2 \3/' lines.txt |tr ' ' ,  > newfile.csv
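
As a variant, offered only as a sketch: the substitution can emit the commas itself, so the tr step (which would also convert any other spaces on a line) is not needed. It uses -E, which both GNU and BSD sed accept for extended regular expressions, and assumes the same lines.txt and newfile.csv names as above.

```shell
# Recreate the sample input from the question (file name lines.txt, as above).
printf '%s\n' '19810101 20' '19810102 31' '19810103 1' '19810701 1' '19811105 5' > lines.txt

# Capture year, month and day from the leading 8 digits, then replace the
# run of spaces before the value with a single comma.
sed -E 's/^([0-9]{4})([0-9]{2})([0-9]{2}) +/\1,\2,\3,/' lines.txt > newfile.csv

cat newfile.csv
```

Lines that do not start with eight digits followed by a space pass through unchanged, so malformed input stays visible in the output instead of being silently rewritten.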
Peddipaga
1
awk '{sub(/..../,"& "); sub(/../,"& ",$2)}1' file

1981 01 01 20
1981 01 02 31
1981 01 03 1
1981 07 01 1
1981 11 05 5
Claes Wikner