
I have a file whose name follows the format below. I want to split the name into its date, time, and number fields and write them to separate columns of a CSV file.
Example file name:
2019-12-05_18:02:28.801656_104_1_1575549141338.jpg

I only need 2019-12-05, 18:02:28, 104, and 1575549141338.

How do I use a regex to do this? I appreciate your help and feedback.

tim
mohit guru

3 Answers

0

You could do this without using regex.

filename = '2019-12-05_18:02:28.801656_104_1_1575549141338.jpg'
parts = filename.split('_')  # split once and reuse the pieces
date1 = parts[0]
time1 = parts[1].split('.')[0]  # drop the fractional seconds
number2 = parts[2]
number1 = parts[-1].split('.')[0]  # drop the .jpg extension

or as a one-liner,

extract1 = filename.split('_')[0] + '_' + filename.split('_')[1].split('.')[0] + '_' + filename.split('_')[2] + '_' + filename.split('_')[-1].split('.')[0]
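
The one-liner above re-splits the same string several times. A minimal sketch that splits once and unpacks the pieces, assuming every name has exactly five underscore-separated parts like the example; `parse_name` is a hypothetical helper name, not something from the question:

```python
def parse_name(filename):
    """Return (date, time, number, timestamp) from a name like the example."""
    # Assumes exactly five underscore-separated parts; any other
    # count raises ValueError on unpacking.
    date, time_part, number, _, last = filename.split('_')
    time = time_part.split('.')[0]    # drop the fractional seconds
    timestamp = last.split('.')[0]    # drop the file extension
    return date, time, number, timestamp


print(parse_name('2019-12-05_18:02:28.801656_104_1_1575549141338.jpg'))
# ('2019-12-05', '18:02:28', '104', '1575549141338')
```

Unpacking into five names doubles as a cheap format check, since a name with a different number of underscores fails loudly instead of yielding wrong columns.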
Trollsors
0

With re

import os
import re

filename = '2019-12-05_18:02:28.801656_104_1_1575549141338.jpg'
res = re.split("_", filename)
# ['2019-12-05', '18:02:28.801656', '104', '1', '1575549141338.jpg']
date = res[0]
time = res[1].split('.', 1)[0]  # drop the fractional seconds
info2 = res[2]
info1 = res[3]
stem = os.path.splitext(res[-1])[0]  # strip the .jpg extension
print(date, time, info1, info2, stem)
# 2019-12-05 18:02:28 1 104 1575549141338

Output:

2019-12-05 18:02:28 1 104 1575549141338

Without re

import os

filename = '2019-12-05_18:02:28.801656_104_1_1575549141338.jpg'
res = filename.split("_")
# ['2019-12-05', '18:02:28.801656', '104', '1', '1575549141338.jpg']
date = res[0]
time = res[1].split('.', 1)[0]  # drop the fractional seconds
info2 = res[2]
info1 = res[3]
stem = os.path.splitext(res[-1])[0]  # strip the .jpg extension
print(date, time, info1, info2, stem)
# 2019-12-05 18:02:28 1 104 1575549141338

Output:

2019-12-05 18:02:28 1 104 1575549141338

Links:

https://docs.python.org/3/library/stdtypes.html

https://docs.python.org/3/library/re.html

PySaad
0

You can split the string without re.

>>> filename = '2019-12-05_18:02:28.801656_104_1_1575549141338.jpg'
>>> filename.split('_')
['2019-12-05', '18:02:28.801656', '104', '1', '1575549141338.jpg']

It doesn't give you exactly what you want. You could take it a step further and split on multiple characters ['_' and '.'].

>>> import re
>>> re.split(r"[._]", filename)
['2019-12-05', '18:02:28', '801656', '104', '1', '1575549141338', 'jpg']
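
To get from that seven-item list to the four wanted values, you can pick them out by position. A small sketch, assuming the name always splits into the same seven pieces; the variable names are only illustrative:

```python
import re

filename = '2019-12-05_18:02:28.801656_104_1_1575549141338.jpg'
parts = re.split(r'[._]', filename)

# Indexes 2 (microseconds), 4 (the unused "1") and 6 (extension) are skipped.
date, time, number, timestamp = parts[0], parts[1], parts[3], parts[5]
print(date, time, number, timestamp)
# 2019-12-05 18:02:28 104 1575549141338
```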

You could go further and build a re to match the entire string

>>> re.match(r'^(\d+-\d+-\d+)_(\d+:\d+:\d+)\.\d+_(\d+)_\d+_(\d+)\.jpg$', filename).groups()
('2019-12-05', '18:02:28', '104', '1575549141338')
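
The same full-string match can use named groups, which makes the columns self-describing and lets you handle names that do not fit the format. This is a sketch only: the group names, `NAME_RE`, and `parse_name` are made up for illustration, and the stricter `\d{4}`-style widths are an assumption about the format.

```python
import re

# Hypothetical pattern; group names are chosen for illustration.
NAME_RE = re.compile(
    r'^(?P<date>\d{4}-\d{2}-\d{2})_'
    r'(?P<time>\d{2}:\d{2}:\d{2})\.\d+_'
    r'(?P<number>\d+)_\d+_'
    r'(?P<timestamp>\d+)\.jpg$'
)


def parse_name(filename):
    """Return a dict of the four fields, or None if the name does not match."""
    m = NAME_RE.match(filename)
    return m.groupdict() if m else None


print(parse_name('2019-12-05_18:02:28.801656_104_1_1575549141338.jpg'))
# {'date': '2019-12-05', 'time': '18:02:28', 'number': '104', 'timestamp': '1575549141338'}
print(parse_name('notes.txt'))
# None
```

Checking for `None` avoids the `AttributeError` that calling `.groups()` directly on a failed match would raise.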

You could take that a step further and extract a datetime

>>> import datetime
>>> date, x, y = re.match(r'^(\d+-\d+-\d+_\d+:\d+:\d+\.\d+)_(\d+)_\d+_(\d+)\.jpg$', filename).groups()
>>> datetime.datetime.strptime(date, '%Y-%m-%d_%H:%M:%S.%f')
datetime.datetime(2019, 12, 5, 18, 2, 28, 801656)
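
Once you have a `datetime`, `strftime` can turn it back into separate date and time strings for the CSV columns. A short sketch; `date_col` and `time_col` are illustrative names:

```python
import datetime

stamp = datetime.datetime.strptime(
    '2019-12-05_18:02:28.801656', '%Y-%m-%d_%H:%M:%S.%f'
)
date_col = stamp.strftime('%Y-%m-%d')
time_col = stamp.strftime('%H:%M:%S')  # fractional seconds are dropped
print(date_col, time_col)
# 2019-12-05 18:02:28
```

Going through `strptime` also validates the values, so a name containing something like month 13 raises `ValueError` instead of slipping into the output.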
  • Thank you, Trevor, for your reply. I have multiple files in a folder with the same format. How do I read all the file names and write them to a CSV? Can you please let me know how to proceed? – mohit guru Dec 06 '19 at 07:13
  • Those are two separate questions in addition to parsing the filename. I'd look at [os.listdir()](https://www.tutorialspoint.com/python3/os_listdir.htm) and [csv](https://pymotw.com/3/csv/) – Trevor Ian Peacock Dec 06 '19 at 07:39
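
The two pointers in the last comment, `os.listdir()` and the `csv` module, can be combined with the full-match regex from the answer roughly as follows. This is a sketch under assumptions: `write_csv`, the header names, and the paths in the usage comment are all hypothetical, and names that do not match the format are skipped without any warning.

```python
import csv
import os
import re

NAME_RE = re.compile(r'^(\d+-\d+-\d+)_(\d+:\d+:\d+)\.\d+_(\d+)_\d+_(\d+)\.jpg$')


def write_csv(names, csv_path):
    """Write one CSV row per matching file name; return the number of rows."""
    count = 0
    with open(csv_path, 'w', newline='') as f:
        writer = csv.writer(f)
        # Header names are an assumption; the question does not name the columns.
        writer.writerow(['date', 'time', 'number', 'timestamp'])
        for name in sorted(names):
            m = NAME_RE.match(name)
            if m:  # skip names in any other format
                writer.writerow(m.groups())
                count += 1
    return count


# Usage (placeholder paths): write_csv(os.listdir('images'), 'names.csv')
```

Passing the list of names in, rather than the folder, keeps the parsing testable without touching the disk; `newline=''` is what the `csv` documentation recommends when opening the output file.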