5

Is there a way in bash to parse this filename:

$file = dos1-20120514104538.csv.3310686

into variables like $date = 2012-05-14 10:45:38 and $id = 3310686 ?

Thank you

dreftymac
pufos
  • I don't know how to do it .. I'm asking if someone did this because I don't know where to start... – pufos May 16 '12 at 12:02
  • 1
    Have you read the `bash` man page? There's a whole section on variable expansion that should give you some ideas. – larsks May 16 '12 at 12:02
  • 1
    possible duplicate of [How do you parse a filename in bash?](http://stackoverflow.com/questions/49403/how-do-you-parse-a-filename-in-bash) – Romain May 16 '12 at 12:05

4 Answers

14

All of this can be done with Parameter Expansion. Please read about it in the bash manpage.

$ file='dos1-20120514104538.csv.3310686'
$ date="${file#*-}" # Use Parameter Expansion to strip off the part before '-'
$ date="${date%%.*}" # Use PE again to strip after the first '.'
$ id="${file##*.}" # Use PE to get the id as the part after the last '.'
$ echo "$date"
20120514104538
$ echo "$id"
3310686

Combine PEs to put date back together in a new format. You could also parse the date with GNU date, but that would still require rearranging the date so it can be parsed. In its current format, this is how I would approach it:

$ date="${date:0:4}-${date:4:2}-${date:6:2} ${date:8:2}:${date:10:2}:${date:12:2}"
$ echo "$date"
2012-05-14 10:45:38
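
For reuse, the steps above could be wrapped in a function that also rejects names it does not understand. This is only a sketch: `parse_name` is a hypothetical helper name, and it assumes every file is shaped like `<prefix>-<14 digits>.<ext>.<id>`.

```shell
# parse_name (hypothetical helper): sets the globals "date" and "id"
# on success and returns non-zero if the timestamp is not 14 digits.
parse_name() {
    local file=$1 stamp
    stamp=${file#*-}      # drop everything through the first '-'
    stamp=${stamp%%.*}    # drop everything from the first '.'

    # Validate before slicing so malformed names fail loudly
    [[ $stamp =~ ^[0-9]{14}$ ]] || return 1

    date="${stamp:0:4}-${stamp:4:2}-${stamp:6:2} ${stamp:8:2}:${stamp:10:2}:${stamp:12:2}"
    id=${file##*.}        # keep what follows the last '.'
}

if parse_name 'dos1-20120514104538.csv.3310686'; then
    echo "$date"
    echo "$id"
fi
```

Checking the return status lets a loop over many files skip the ones that do not match instead of producing a garbled date.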
Victor Yarema
kojiro
3

Extract id:

f='dos1-20120514104538.csv.3310686'
echo ${f/*./}
# 3310686
id=${f/*./}

Remove prefix, and extract core date numbers:

noprefix=${f/*-/}
echo ${noprefix/.csv*/}
# 20120514104538
ds=${noprefix/.csv*/}

Format the date like this (only partially done):

echo $ds | sed -r 's/(.{4})(.{2})(.{2})/\1.\2.\3/'
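
One possible way to finish the formatting is to capture all six fields in a single substitution. This sketch hard-codes `ds` so it stands alone, and uses `-E`, which both GNU and BSD sed accept for extended regular expressions; the variable name `formatted` is just for illustration.

```shell
ds='20120514104538'

# Capture year, month, day, hour, minute, second and rejoin them
formatted=$(printf '%s\n' "$ds" |
    sed -E 's/^(.{4})(.{2})(.{2})(.{2})(.{2})(.{2})$/\1-\2-\3 \4:\5:\6/')

echo "$formatted"
# 2012-05-14 10:45:38
```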

You can alternatively split the initial variable into an array,

echo $f
# dos1-20120514104538.csv.3310686

after replacing `-` and `.` with spaces, like this:

echo ${f//[-.]/ }
# dos1 20120514104538 csv 3310686

ar=(${f//[-.]/ })
echo ${ar[1]}
# 20120514104538

echo ${ar[3]}
# 3310686

The date transformation can be done via an array similarly:

dp=($(echo 20120514104538  | sed -r 's/(.{2})/ \1/g'))
echo ${dp[0]}${dp[1]}-${dp[2]}-${dp[3]} ${dp[4]}:${dp[5]}:${dp[6]}

It splits everything into groups of 2 characters:

echo ${dp[@]}
# 20 12 05 14 10 45 38

and merges 2012 together in the output.

Victor Yarema
user unknown
  • @VictorYarema: Please don't put your favorite prompt in front of the commands. It makes it harder to copy/paste them and they aren't part of the code. – user unknown Nov 05 '15 at 01:37
  • Agreed. Sorry for that. I did it as in some other questions and answers, to make it easier to distinguish commands from output. Later I noticed the issue that you just pointed out. After that I found that some other users simply write the output as comments. The latter approach is even better: you can copy the commands with commented output and run them safely. I just didn't have time to change to that _style_. – Victor Yarema Nov 05 '15 at 01:47
  • Would you accept if I add hash signs at the beginning of each output line? – Victor Yarema Nov 05 '15 at 01:50
  • @VictorYarema: Since the output can't reasonably be interpreted by the shell as a new command, the worst thing that can happen is a reaction like 'command not found: 20120514104538'. And if a user reads the code and tries to understand it, it should be obvious. However, I did it myself (putting a hash in front for syntax highlighting/color decoration), so I would accept such edits. – user unknown Nov 05 '15 at 01:57
  • Fixed (commented outputs and quoted filename). Thanks for reviewing my edit, which turned into a revert. Thanks for being so attentive. :) – Victor Yarema Nov 05 '15 at 02:15
3

Using Bash's regular expression feature:

file='dos1-20120514104538.csv.3310686'
pattern='^[^-]+-([[:digit:]]{4})'
for i in {1..5}
do
    pattern+='([[:digit:]]{2})'
done
pattern+='\.[^.]+\.([[:digit:]]+)$'
[[ $file =~ $pattern ]]
read -r _ Y m d H M S id <<< "${BASH_REMATCH[@]}"
date="$Y-$m-$d $H:$M:$S"
echo "$date"
echo "$id"
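
A variant of the same idea that tests whether the match succeeded before using `BASH_REMATCH` might look like this. It is a sketch only: the pattern is written out in full instead of being built in a loop, and the error message is illustrative.

```shell
file='dos1-20120514104538.csv.3310686'
pattern='^[^-]+-([0-9]{4})([0-9]{2})([0-9]{2})([0-9]{2})([0-9]{2})([0-9]{2})\.[^.]+\.([0-9]+)$'

if [[ $file =~ $pattern ]]; then
    # BASH_REMATCH[0] holds the whole match; capture groups start at 1
    date="${BASH_REMATCH[1]}-${BASH_REMATCH[2]}-${BASH_REMATCH[3]} ${BASH_REMATCH[4]}:${BASH_REMATCH[5]}:${BASH_REMATCH[6]}"
    id=${BASH_REMATCH[7]}
    echo "$date"
    echo "$id"
else
    echo "unexpected file name: $file" >&2
fi
```

Without the test, a non-matching name leaves `BASH_REMATCH` empty and the variables end up blank without any warning.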
Dennis Williamson
1

You can tokenize the string first on `-` and then on `.`. There are various threads on SO on how to do this:

  1. How do I split a string on a delimiter in Bash?
  2. Bash: How to tokenize a string variable?

To transform 20120514104538 into 2012-05-14 10:45:38 :

Since we know that the first 4 characters are the year, the next 2 are the month, and so on, you first need to break this token into substrings and then recombine them into a single string. You can start with the following answer:

  1. https://stackoverflow.com/a/428580/365188
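
As a rough sketch of the tokenize-then-recombine approach described above, both delimiters can be handled in one `read` by setting `IFS` for that command only. The variable names `prefix`, `stamp` and `ext` are made up for this example, and the code assumes the name always has exactly these four fields.

```shell
file='dos1-20120514104538.csv.3310686'

# Split on '-' and '.'; IFS is changed only for this read
IFS='-.' read -r prefix stamp ext id <<< "$file"

# Slice the 14-digit stamp with bash substring expansion and rejoin
date="${stamp:0:4}-${stamp:4:2}-${stamp:6:2} ${stamp:8:2}:${stamp:10:2}:${stamp:12:2}"

echo "$date"
# 2012-05-14 10:45:38
echo "$id"
# 3310686
```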
Community
Ozair Kafray